Dear Mr. Stan Tomov,
I have compiled the testing_sgemm and executed it on my GTX 480M using Visual Studio 2010.
Because I ran into problems building MAGMA 1.1.0 with Visual Studio, I took only the testing_sgemm and sgemm_fermi*.* files from the library, just for performance testing. (I could not get the whole library to build, but once we start using MAGMA for the project I will need to build all of it with Visual Studio.)
However, I am consistently getting the following results:
test_Sgemm_fermi(1024x1024): dSeconds=0.624246, gflops=3.440124
test_cublasSgemm(1024x1024): dSeconds=0.006688, gflops=321.117214
test_Sgemm_fermi(1280x1280): dSeconds=1.241110, gflops=3.379477
test_cublasSgemm(1280x1280): dSeconds=0.011544, gflops=363.324047
test_Sgemm_fermi(1600x1600): dSeconds=2.298245, gflops=3.564459
test_cublasSgemm(1600x1600): dSeconds=0.022207, gflops=368.893005
test_Sgemm_fermi(2000x2000): dSeconds=3.159041, gflops=5.064828
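For reference, the gflops figures above are consistent with the usual 2*n^3 operation count for a square SGEMM divided by the elapsed time. A minimal C sketch of that calculation follows; the function name sgemm_gflops is my own, for illustration only, and is not part of MAGMA or cuBLAS:

```c
#include <stdio.h>

/* Hypothetical helper (not a MAGMA or cuBLAS function): converts the
   elapsed time of an n x n x n SGEMM into GFlop/s, assuming the
   conventional count of 2*n^3 floating-point operations. */
double sgemm_gflops(int n, double seconds)
{
    double flops = 2.0 * (double)n * (double)n * (double)n;
    return flops / seconds / 1e9;
}

/* Prints one result line in the same layout as the results above. */
void print_result(const char *name, int n, double seconds)
{
    printf("%s(%dx%d): dSeconds=%f, gflops=%f\n",
           name, n, n, seconds, sgemm_gflops(n, seconds));
}
```

For example, sgemm_gflops(1024, 0.624246) gives about 3.44, which matches the first line of the results, so the timings and the gflops values agree with each other.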
I have tried the following variations of the sgemm_fermi functions: magmablas_sgemm_fermi64 and magmablas_sgemm_fermi80
but the performance was similar.
Clearly I am doing something wrong: the MAGMA kernel runs roughly 100 times slower than cuBLAS on the same matrices. Ideally I would like to get the reported performance of >800 GFlops (or perhaps somewhat lower on the GTX 480M).
Could you please help me?