Thank you. I think I need to be patient and wait for RC3 for the FORTRAN interfaces. That is not too bad, as I am back at the university and have plenty of other things to do. I was so frustrated at not having any CUDA hardware last autumn that I bought my own, so it is at home, and I did a lot of the work over the holiday. The last time I was in London for a meeting I visited a shop I know there (Yoyotech), and when I saw they had a 2 GB version of the GTX 460 I bought one, along with a box to run it in.

I work with a colleague who has some large matrix problems, which I help him to run. We currently run the larger ones with ScaLAPACK and PARPACK. I am using this work as a pilot to find out how much quicker we could run things if we had some GPUs as well.

I agree with you about single-precision speed, and I think that for the larger sizes mixed precision will be the best option when I want to solve Ax=b type problems. See the dsgesv results below.

Two thoughts there. For my particular problem, it is more efficient to store the transpose of the matrix because it is generated a row at a time. There is an option to solve the transposed problem when using DGETRF and DGETRS separately, and I see gains from that with GotoBLAS (without a GPU). DGESV and DSGESV do not have the same option. I suspect I could get inside the code and produce a transpose version myself, but could that be added as an option?
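As a rough illustration of the transpose option (plain Python, not MAGMA or LAPACK code): once the stored matrix M has been factorised as PM = LU, the same factors also solve the transposed system, because M^T = U^T L^T P. The names `lu_factor`, `lu_solve` and the `trans` flag are made up for this sketch; the flag is only meant to mirror the TRANS argument of DGETRS.

```python
def lu_factor(m):
    """LU factorisation with partial pivoting: rows of P*M equal L*U.
    Returns L (unit diagonal) and U packed in one matrix, plus the row order."""
    n = len(m)
    lu = [row[:] for row in m]
    piv = list(range(n))              # piv[i] = original row now in position i
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b, trans=False):
    """Solve M x = b (trans=False) or M^T x = b (trans=True) from the factors of M."""
    n = len(lu)
    if not trans:
        x = [b[piv[i]] for i in range(n)]        # apply P to b
        for i in range(n):                       # L y = P b (unit diagonal)
            for j in range(i):
                x[i] -= lu[i][j] * x[j]
        for i in reversed(range(n)):             # U x = y
            for j in range(i + 1, n):
                x[i] -= lu[i][j] * x[j]
            x[i] /= lu[i][i]
        return x
    w = list(b)
    for i in range(n):                           # U^T w = b (forward substitution)
        for j in range(i):
            w[i] -= lu[j][i] * w[j]
        w[i] /= lu[i][i]
    for i in reversed(range(n)):                 # L^T z = w (unit diagonal)
        for j in range(i + 1, n):
            w[i] -= lu[j][i] * w[j]
    x = [0.0] * n
    for i in range(n):                           # undo the permutation: P x = z
        x[piv[i]] = w[i]
    return x


# M holds the transpose of A, as if A had been generated and stored that way.
m = [[0.0, 2.0, 1.0], [1.0, 3.0, 0.0], [4.0, 1.0, 2.0]]
lu, piv = lu_factor(m)
x = lu_solve(lu, piv, [14.0, 11.0, 7.0], trans=True)   # solves A x = b with A = M^T
```

Here the factorisation is done on the matrix as stored, and only the two triangular solves change, so no explicit transpose of the data is needed.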

Also, in some codes I want to factorise once and then solve several times. What would be needed to use the mixed-precision approach for that?
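To make the question concrete, here is a minimal sketch of what I have in mind, in plain Python rather than MAGMA code: factorise once in single precision, keep the factors, and for each right-hand side run iterative refinement with residuals computed in double precision. Single precision is emulated by rounding through `struct`; the function names, the tolerance and the iteration limit are all assumptions made for the example. In LAPACK terms this would correspond roughly to one SGETRF followed, per right-hand side, by SGETRS calls and a double-precision residual.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_single(a):
    """LU with partial pivoting, every intermediate rounded to single precision."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    piv = list(range(n))              # piv[i] = original row now in position i
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, piv


def lu_solve_single(lu, piv, b):
    """Forward and back substitution in emulated single precision."""
    n = len(lu)
    x = [f32(b[piv[i]]) for i in range(n)]
    for i in range(n):                           # L y = P b (unit diagonal)
        for j in range(i):
            x[i] = f32(x[i] - f32(lu[i][j] * x[j]))
    for i in reversed(range(n)):                 # U x = y
        for j in range(i + 1, n):
            x[i] = f32(x[i] - f32(lu[i][j] * x[j]))
        x[i] = f32(x[i] / lu[i][i])
    return x


def refine(a, lu, piv, b, tol=1e-14, max_iter=30):
    """Refine a single-precision solve; residual and update stay in double."""
    x = lu_solve_single(lu, piv, b)
    for it in range(max_iter):
        r = [bi - sum(aij * xj for aij, xj in zip(row, x))
             for row, bi in zip(a, b)]
        if max(abs(v) for v in r) <= tol * max(abs(v) for v in b):
            return x, it
        d = lu_solve_single(lu, piv, r)          # correction from the stored factors
        x = [xi + di for xi, di in zip(x, d)]
    return x, max_iter


a = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
lu, piv = lu_factor_single(a)                    # factorise once
rhs_list = [[7.0, 7.0, 9.0], [1.0, 0.0, 0.0], [0.1, 0.2, 0.3]]
solutions = [refine(a, lu, piv, b)[0] for b in rhs_list]   # reuse for every solve
```

The point of the split is that the expensive O(N^3) step is done once in the fast precision, while each later solve only costs triangular solves and matrix-vector products. It needs both the original double-precision matrix and the single-precision factors to be kept.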

Here are some results. These runs may use a mix of magma_blas and CUBLAS.

`fletcher@fletcher-desktop:~/magma_1.0.0-rc2/testing$ ./testing_dsgesv_gpu`

device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory

Usage:

testing_dsgesv_gpu -nrhs 1 -N 1024

Epsilon(double): 1.110223e-16

Epsilon(single): 5.960464e-08

| N | DP-Factor | DP-Solve | SP-Factor | SP-Solve | MP-Solve | \|\|b-Ax\|\|/\|\|A\|\| | NumIter |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 1024 | 32.70 | 26.48 | 43.38 | 37.64 | 11.56 | 1.565239e-16 | 3 |
| 2048 | 48.86 | 44.12 | 122.34 | 119.92 | 42.16 | 1.120209e-15 | 3 |
| 3072 | 58.57 | 54.45 | 176.71 | 168.60 | 66.74 | 1.795437e-16 | 3 |
| 4032 | 63.05 | 59.39 | 256.07 | 250.92 | 78.07 | 4.456418e-14 | 4 |
| 5184 | 66.34 | 63.62 | 286.66 | 282.16 | 118.01 | 3.926494e-16 | 3 |
| 6016 | 67.78 | 65.28 | 299.07 | 293.68 | 116.43 | 1.428188e-14 | 4 |
| 7040 | 69.05 | 66.99 | 307.25 | 304.99 | 147.38 | 1.091337e-15 | 3 |
| 8064 | 70.37 | 68.50 | 320.73 | 316.82 | 141.02 | 2.429935e-16 | 4 |
| 9088 | 71.22 | 69.49 | 327.65 | 322.49 | 156.68 | 8.076463e-15 | 4 |
| 10240 | 71.50 | 70.01 | 331.08 | 327.58 | 168.54 | 1.963176e-14 | 4 |
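A quick way to read these numbers is to take the ratio of MP-Solve to DP-Solve for each size. The short script below does that with values copied from the results above; it assumes the columns are rates where higher is better, which is how I read the tester output.

```python
# (N, DP-Solve, MP-Solve) copied from the results above
rows = [
    (1024, 26.48, 11.56), (2048, 44.12, 42.16), (3072, 54.45, 66.74),
    (4032, 59.39, 78.07), (5184, 63.62, 118.01), (6016, 65.28, 116.43),
    (7040, 66.99, 147.38), (8064, 68.50, 141.02), (9088, 69.49, 156.68),
    (10240, 70.01, 168.54),
]

# Ratio above 1 means the mixed-precision solve ran at the higher rate.
ratios = {n: mp / dp for n, dp, mp in rows}

# Smallest N in the results at which mixed precision overtakes double precision.
crossover = min(n for n, r in ratios.items() if r > 1.0)

for n, r in ratios.items():
    print(f"{n:6d}  MP/DP = {r:4.2f}")
print("mixed precision ahead from N =", crossover)
```

On these runs mixed precision is behind at the two smallest sizes and ahead from N = 3072 upwards.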

Best wishes

John