Strange slow execution of SGEMM test.

Postby gg31 » Thu Mar 15, 2012 3:50 am

Dear Mr. Stan Tomov,

I compiled testing_sgemm with Visual Studio 2010 and ran it on my GTX 480M.
Because I ran into problems building MAGMA 1.1.0 with Visual Studio, I took only the testing_sgemm and sgemm_fermi*.* files from the library, just for performance testing.
(I couldn't get the whole library to build, but once we start using MAGMA for the project I will need to build all of it with Visual Studio.)

However, I am consistently getting the following results:
test_Sgemm_fermi(1024x1024): dSeconds=0.624246, gflops=3.440124
test_cublasSgemm(1024x1024): dSeconds=0.006688, gflops=321.117214
test_Sgemm_fermi(1280x1280): dSeconds=1.241110, gflops=3.379477
test_cublasSgemm(1280x1280): dSeconds=0.011544, gflops=363.324047
test_Sgemm_fermi(1600x1600): dSeconds=2.298245, gflops=3.564459
test_cublasSgemm(1600x1600): dSeconds=0.022207, gflops=368.893005
test_Sgemm_fermi(2000x2000): dSeconds=3.159041, gflops=5.064828

I have also tried two variations of the sgemm_fermi functions, magmablas_sgemm_fermi64 and magmablas_sgemm_fermi80,
but the performance was similar.

Apparently I am doing something wrong. Ideally I would like to get the reported performance of >800 GFlops (or perhaps somewhat less on the GTX 480M).

Could you please help me?

Thank you,

Greg
gg31
 
Posts: 2
Joined: Thu Mar 15, 2012 3:19 am

Re: Strange slow execution of SGEMM test.

Postby Stan Tomov » Thu Mar 15, 2012 10:03 am

Greg,
Yes, something seems to be wrong. I just tested on our GTX480 (under Linux) and got:
tomov:ig /mnt/scratch/tomov/magma_1.1.0/testing> ./testing_sgemm            <-  9:36AM
device 0: GeForce GTX 480, 1401.0 MHz clock, 1535.6 MB memory
device 1: Tesla C870, 1350.0 MHz clock, 1535.8 MB memory
device 2: Tesla C870, 1350.0 MHz clock, 1535.8 MB memory

Usage:
  testing_sgemm [-NN|NT|TN|TT] [-N 1024]

Testing transA = N  transB = N
    M    N    K     MAGMA GFLop/s    CUBLAS GFlop/s       error
==================================================================
 1024  1024  1024       657.33           578.52         7.629395e-06
 1280  1280  1280       690.76           668.52         7.629395e-06
 1600  1600  1600       765.46           655.94         7.629395e-06
 2000  2000  2000       804.51           691.95         1.525879e-05

Is it possible that you are running on another card? CUBLAS seems to be slow
as well (e.g., is your GTX480M listed as device 0?).

I know several users have successfully compiled magma on Windows. Could you
please post the specific problems that you are running into, so that others
can give feedback on them?

Thanks,
Stan
Stan Tomov
 
Posts: 253
Joined: Fri Aug 21, 2009 10:39 pm

Re: Strange slow execution of SGEMM test.

Postby gg31 » Fri Mar 16, 2012 4:23 am

Mr. Stan Tomov,

Because I couldn't get magma1.1.0 to build on Windows, I downloaded magma1.0.0 and built it with Visual Studio 2010.

When running testing_dgemm on my GTX 480M, I got the following results (columns: M, N, K, MAGMA GFlop/s, CUBLAS GFlop/s, MAGMA time, error):
1024 1024 1024 6.79 74.04 316msec 0.000000e+000
1280 1280 1280 6.73 73.57 623msec 0.000000e+000
1600 1600 1600 6.76 73.79 1211msec 0.000000e+000

In single precision I am getting 390 GFlops with cublas.
Both 74 GFlops for double precision and 390 GFlops for single precision seem somewhat low, but more or less reasonable, given that the theoretical single-precision peak for the GTX 480M is 890 GFlops. The GTX 480M (352 cores @ 850 MHz) is a notebook version of the GTX480.

However, I can't explain why magma1.0.0 runs so slowly. My GTX 480M notebook is running Windows 7. I have not tried running magma1.0.0 on the GTX 480M under Linux.
I suspect that settings that work fine for the GTX 480 do not work as well for the 480M, but I was still expecting better than 6 GFlops.

I ran magma1.0.0 on a Tesla M2070 (Linux) today and got the following results:

Tesla M2070, Double Precision
M N K MAGMA GFLop/s CUBLAS GFlop/s error
===================================================
1024 1024 1024 275.99 301.11 0.000000e+00
1280 1280 1280 286.54 307.28 0.000000e+00
1600 1600 1600 290.18 310.96 0.000000e+00
2000 2000 2000 276.93 307.82 0.000000e+00
2500 2500 2500 277.92 307.20 0.000000e+00
3125 3125 3125 291.10 313.00 0.000000e+00
3906 3906 3906 284.87 301.82 0.000000e+00
4882 4882 4882 289.13 289.76 5.684342e-14
6102 6102 6102 291.18 291.54 5.684342e-14
7627 7627 7627 291.78 295.09 1.136868e-13
9533 9533 9533 295.95 291.10 1.136868e-13
11916 11916 11916 296.76 296.84 0.000000e+00

Tesla M2070, Single Precision
M N K MAGMA GFLop/s CUBLAS GFlop/s error
===================================================
1024 1024 1024 545.88 611.82 0.000000e+00
1280 1280 1280 557.38 633.29 0.000000e+00
1600 1600 1600 592.21 645.60 0.000000e+00
2000 2000 2000 607.69 639.67 0.000000e+00
2500 2500 2500 577.22 642.26 0.000000e+00
3125 3125 3125 613.40 654.63 0.000000e+00
3906 3906 3906 620.09 638.77 0.000000e+00
4882 4882 4882 630.73 638.36 0.000000e+00
6102 6102 6102 627.66 637.67 0.000000e+00
7627 7627 7627 628.34 635.88 0.000000e+00
9533 9533 9533 629.41 631.87 0.000000e+00
11916 11916 11916 633.67 633.66 0.000000e+00
14895 14895 14895 632.99 632.71 0.000000e+00
18618 18618 18618 637.54 637.39 0.000000e+00




Thank you for your help,

Greg
gg31
 
Posts: 2
Joined: Thu Mar 15, 2012 3:19 am

