Hi,
I am having trouble with the SGEMM function. Running testing_sgemm gives the output below, where the reported error is either inf or on the order of 1e+38. In MAGMA 1.1 I tried to add an additional function, but I always get wrong results for the sgemm operation. When I tried simply copying the values of A in the kernel code, I could not even get the original data back on the CPU. Is the provided source code for the sgemm operation correct?
> ./testing_sgemm
device 0: GeForce GTX 560 Ti, 1800.0 MHz clock, 1023.6 MB memory, capability 2.1
Usage:
testing_sgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = N transB = N
M     N     K     MAGMA GFLop/s   CUBLAS GFlop/s   error
==================================================================
1024  1024  1024  513.63          442.96           1.088222e+38
1280  1280  1280  531.53          512.50           inf
1600  1600  1600  568.85          512.00           1.618371e+38
2000  2000  2000  590.19          515.28           inf
2500  2500  2500  562.16          488.20           inf
3125  3125  3125  591.78          559.36           inf
3906  3906  3906  600.21          560.27           inf
4882  4882  4882  608.12          563.61           inf
6102  6102  6102  605.01          519.26           inf
magma-1.2.0/testing> ./testing_sgemm
device 0: GeForce GTX 560 Ti, 1800.0 MHz clock, 1023.6 MB memory, capability 2.1
Usage:
testing_sgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = N transB = N
M     N     K     MAGMA GFLop/s   CUBLAS GFlop/s   error
==================================================================
1024  1024  1024  513.88          441.23           1.626685e+38
1280  1280  1280  533.97          513.88           inf
1600  1600  1600  568.69          512.45           inf
2000  2000  2000  589.58          514.11           inf
2500  2500  2500  561.38          592.78           inf
3125  3125  3125  591.03          559.03           inf
3906  3906  3906  600.06          560.36           inf
4882  4882  4882  607.80          563.23           inf
6102  6102  6102  604.80          591.99           inf
magma-1.2.0/testing> ./testing_sgemm
device 0: GeForce GTX 560 Ti, 1800.0 MHz clock, 1023.6 MB memory, capability 2.1
Usage:
testing_sgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = N transB = N
M     N     K     MAGMA GFLop/s   CUBLAS GFlop/s   error
==================================================================
1024  1024  1024  511.79          441.51           inf
1280  1280  1280  532.47          515.59           1.483815e+38
1600  1600  1600  568.14          511.71           inf
2000  2000  2000  589.54          514.82           1.407334e+38
2500  2500  2500  560.17          592.95           1.616552e+38
3125  3125  3125  591.05          559.18           inf
3906  3906  3906  600.19          560.44           inf
4882  4882  4882  607.78          563.55           inf
6102  6102  6102  604.76          591.47           inf
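To narrow down where the inf values come from, a CPU-side check along the following lines might help. This is only a sketch in plain C and not the code that testing_sgemm uses: ref_sgemm_nn and max_abs_diff are hypothetical helper names, and I am assuming column-major storage with no transposes (the NN case). The idea is to compute a reference result on the CPU, then compare a result copied back from the GPU against it while counting non-finite entries separately, so that a single inf or NaN does not hide the rest of the comparison.

```c
#include <math.h>

/* Hypothetical CPU reference: C = alpha*A*B + beta*C.
 * Column-major storage, no transposes (the NN case). */
void ref_sgemm_nn(int m, int n, int k, float alpha,
                  const float *A, int lda,
                  const float *B, int ldb,
                  float beta, float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];
            /* When beta is zero, do not read C at all, so that
             * uninitialized input cannot leak NaN/inf into the result. */
            if (beta == 0.0f)
                C[i + j * ldc] = alpha * acc;
            else
                C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}

/* Hypothetical comparison: returns the largest |X - Y| over the finite
 * entries of X and stores the number of non-finite entries of X
 * (inf or NaN) in *nonfinite. */
float max_abs_diff(int m, int n,
                   const float *X, int ldx,
                   const float *Y, int ldy,
                   int *nonfinite)
{
    float worst = 0.0f;
    *nonfinite = 0;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float x = X[i + j * ldx];
            if (!isfinite(x)) {
                ++*nonfinite;
                continue;
            }
            float d = fabsf(x - Y[i + j * ldy]);
            if (d > worst)
                worst = d;
        }
    }
    return worst;
}
```

Here X would be the matrix copied back from the GPU and Y the CPU reference. A nonzero count of non-finite entries would suggest that the result buffer contains data that was never written or was read before being initialized, rather than ordinary rounding differences.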
Best Regards,
Muhammad Kashif Hanif