Thank you. I think I need to be patient and wait for RC3 for the FORTRAN interfaces. That is not too bad, as I am back at the university and have lots of other things to do. I was so frustrated at not having any CUDA hardware last autumn that I bought my own, so it is at home, and I did a lot of the work over the holiday. I know a shop in London (Yoyotech), and the last time I was down there for a meeting I went to see them. When I saw they had a 2 GB version of the GTX 460, I bought one and a box to run it in.
I work with a colleague who has some large matrix problems which I help him to run. We currently run the larger ones with ScaLAPACK and PARPACK. I am using this work as a pilot to work out how much quicker things would run if we had some GPUs as well.
I agree with you about single precision speed, and I think that for the larger sizes mixed precision will be the best option when I want to solve Ax=b type problems. See the dsgesv results below.
Two thoughts there. For my particular problem it is more efficient to store the transpose of the matrix, because it is generated a row at a time. There is an option to solve the transposed problem when using DGETRF and DGETRS separately, and I see gains from that with GotoBLAS (without a GPU). DGESV and DSGESV do not have the same option. I suspect I could get inside the code and generate a transpose version, but could that be added as an option?
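To make the transposed solve concrete, here is a minimal NumPy sketch, not MAGMA or LAPACK code. It assumes the stored array is At = A^T, factors At once, and then reuses those factors to solve At^T x = b, which is the original A x = b. The helpers `lu_factor`, `forward`, `backward` and `lu_solve` are hypothetical, hand-written stand-ins for what DGETRF and DGETRS (with the transpose option) do; the test matrix is made up.

```python
import numpy as np

def lu_factor(a):
    """LU with partial pivoting, a[perm] = L @ U (plain stand-in for DGETRF)."""
    lu = np.array(a, dtype=np.float64)
    n = lu.shape[0]
    perm = np.arange(n)
    for k in range(n - 1):
        p = k + int(np.argmax(np.abs(lu[k:, k])))
        if p != k:  # swap rows k and p, and record the swap
            lu[[k, p]] = lu[[p, k]]
            perm[[k, p]] = perm[[p, k]]
        lu[k + 1:, k] /= lu[k, k]
        lu[k + 1:, k + 1:] -= np.outer(lu[k + 1:, k], lu[k, k + 1:])
    return lu, perm

def forward(t, b, unit):
    """Solve with the lower triangle of t (unit=True: implicit unit diagonal)."""
    x = np.zeros_like(b)
    for i in range(len(b)):
        s = b[i] - t[i, :i] @ x[:i]
        x[i] = s if unit else s / t[i, i]
    return x

def backward(t, b, unit):
    """Solve with the upper triangle of t (unit=True: implicit unit diagonal)."""
    x = np.zeros_like(b)
    for i in reversed(range(len(b))):
        s = b[i] - t[i, i + 1:] @ x[i + 1:]
        x[i] = s if unit else s / t[i, i]
    return x

def lu_solve(lu, perm, b, trans=False):
    """Solve M x = b (trans=False) or M^T x = b (trans=True) from the factors of M."""
    b = np.asarray(b, dtype=lu.dtype)
    if not trans:
        # M[perm] = L U, so L U x = b[perm]
        return backward(lu, forward(lu, b[perm], unit=True), unit=False)
    # M^T = U^T L^T P, so solve U^T w = b, then L^T z = w, then undo the permutation
    z = backward(lu.T, forward(lu.T, b, unit=False), unit=True)
    x = np.empty_like(z)
    x[perm] = z
    return x

# Made-up, well-conditioned test problem
rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)
At = A.T.copy()                        # what is actually held in memory
b = rng.standard_normal(n)

lu, perm = lu_factor(At)               # factor the stored transpose once
x = lu_solve(lu, perm, b, trans=True)  # solves At^T x = b, i.e. A x = b
print(np.allclose(A @ x, b))
```

The point of the sketch is that no explicit transpose of the stored matrix is needed: the same factors serve both problems, and only the order and orientation of the two triangular solves change.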
Also, in some codes I want to factorise once and then solve several times. What would be needed to use the mixed precision approach for that?
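One way to picture a factorise-once, solve-many version of the mixed precision approach is sketched below in NumPy. It is only an illustration under stated assumptions, not how MAGMA is organised: the matrix is factored once in single precision, and each right-hand side is then refined with residuals computed in double precision. The names `lu_factor`, `lu_solve` and `refined_solve`, the stopping test, and the test matrix are all hypothetical. LAPACK's DSGESV does the factor and refine steps in one call and falls back to a double precision factorisation when refinement fails; this sketch simply raises an error instead.

```python
import numpy as np

def lu_factor(a, dtype):
    """LU with partial pivoting in the given precision, a[perm] = L @ U."""
    lu = np.array(a, dtype=dtype)
    n = lu.shape[0]
    perm = np.arange(n)
    for k in range(n - 1):
        p = k + int(np.argmax(np.abs(lu[k:, k])))
        if p != k:  # swap rows k and p, and record the swap
            lu[[k, p]] = lu[[p, k]]
            perm[[k, p]] = perm[[p, k]]
        lu[k + 1:, k] /= lu[k, k]
        lu[k + 1:, k + 1:] -= np.outer(lu[k + 1:, k], lu[k, k + 1:])
    return lu, perm

def lu_solve(lu, perm, b):
    """Forward and back substitution, carried out in the precision of lu."""
    b = np.asarray(b, dtype=lu.dtype)[perm]
    n = len(b)
    y = np.zeros_like(b)
    for i in range(n):                    # L y = b[perm], unit diagonal
        y[i] = b[i] - lu[i, :i] @ y[:i]
    x = np.zeros_like(b)
    for i in reversed(range(n)):          # U x = y
        x[i] = (y[i] - lu[i, i + 1:] @ x[i + 1:]) / lu[i, i]
    return x

def refined_solve(a, lu32, perm, b, tol=1e-12, max_iter=30):
    """Solve a x = b from float32 factors, refining with float64 residuals."""
    x = lu_solve(lu32, perm, b).astype(np.float64)
    for it in range(max_iter + 1):
        r = b - a @ x                     # residual in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        x = x + lu_solve(lu32, perm, r).astype(np.float64)  # cheap correction
    raise RuntimeError("refinement did not converge")

# Made-up, well-conditioned test problem with three right-hand sides
rng = np.random.default_rng(1)
n = 40
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, 3))

lu32, perm = lu_factor(A, np.float32)     # the expensive step, done once
results = [refined_solve(A, lu32, perm, B[:, j]) for j in range(B.shape[1])]
for j, (x, its) in enumerate(results):
    print(j, its, np.linalg.norm(B[:, j] - A @ x))
```

What has to be kept between solves is the double precision matrix (for the residuals) plus the single precision factors and pivots. Each extra right-hand side then costs only triangular solves and matrix-vector products.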
Here are some results. This may be a mix of magma_blas and CUBLAS.
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory
testing_dsgesv_gpu -nrhs 1 -N 1024
N      DP-Factor  DP-Solve  SP-Factor  SP-Solve  MP-Solve  ||b-Ax||/||A||  NumIter
1024       32.70     26.48      43.38     37.64     11.56    1.565239e-16        3
2048       48.86     44.12     122.34    119.92     42.16    1.120209e-15        3
3072       58.57     54.45     176.71    168.60     66.74    1.795437e-16        3
4032       63.05     59.39     256.07    250.92     78.07    4.456418e-14        4
5184       66.34     63.62     286.66    282.16    118.01    3.926494e-16        3
6016       67.78     65.28     299.07    293.68    116.43    1.428188e-14        4
7040       69.05     66.99     307.25    304.99    147.38    1.091337e-15        3
8064       70.37     68.50     320.73    316.82    141.02    2.429935e-16        4
9088       71.22     69.49     327.65    322.49    156.68    8.076463e-15        4
10240      71.50     70.01     331.08    327.58    168.54    1.963176e-14        4
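To read the table as a comparison, the short Python sketch below divides the MP-Solve column by the DP-Solve column for each N. The numbers are copied from the output above. It assumes the columns are rates where higher is better (for example Gflop/s), which the output itself does not state; the variable names are only for illustration.

```python
# (N, DP-Solve, MP-Solve) copied from the results above.
# Assumption: both columns are rates, so a larger value means a faster solve.
rows = [
    (1024, 26.48, 11.56),
    (2048, 44.12, 42.16),
    (3072, 54.45, 66.74),
    (4032, 59.39, 78.07),
    (5184, 63.62, 118.01),
    (6016, 65.28, 116.43),
    (7040, 66.99, 147.38),
    (8064, 68.50, 141.02),
    (9088, 69.49, 156.68),
    (10240, 70.01, 168.54),
]

# Ratio above 1 means the mixed precision solve is ahead of double precision.
ratios = {n: mp / dp for n, dp, mp in rows}

for n, ratio in ratios.items():
    print(f"{n:6d}  MP/DP = {ratio:5.2f}")

# Smallest N in the table at which mixed precision is ahead
crossover = min(n for n, ratio in ratios.items() if ratio > 1.0)
print("mixed precision ahead from N =", crossover)
```

On these figures, and under that assumption, the mixed precision solve is behind at N = 1024 and 2048, ahead from N = 3072, and about 2.4 times the double precision rate at N = 10240.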