magma_dsyevd_gpu performance

Open discussion for MAGMA


Post by luiceur » Tue Apr 30, 2013 11:32 am

I was wondering about the results shown by the testing_dsyevd_gpu program. Does anyone know whether the results would be much different using MKL with multiple threads?
Does this mean it's not worth using the GPU version for matrices with N < 10000?
Attachment: dsyevd.png (benchmark plot)
luiceur
 
Posts: 26
Joined: Tue Jul 10, 2012 4:38 am

Re: magma_dsyevd_gpu performance

Post by Stan Tomov » Wed May 01, 2013 6:00 pm

Hello,
It depends on your CPU/GPU configuration and on which GPU BLAS is used. With MAGMA BLAS's dsymv, the results on 16 Sandy Bridge CPU cores @ 2.6 GHz vs. a K20c GPU (13 x 192 CUDA cores @ 0.7 GHz) look like this:
Code:
tomov:bunsen ~/trunk/testing> ./testing_dsyevd -l
MAGMA 1.3.0
device 0: Tesla K20c, 705.5 MHz clock, 4799.6 MB memory, capability 3.5
device 1: Tesla K20c, 600.0 MHz clock, 3839.6 MB memory, capability 3.5
Usage: ./testing_dsyevd [options] [-h|--help]

    N   CPU Time (sec)   GPU Time (sec)
=======================================
 1088      0.14             0.16
 2112      0.56             0.82
 3136      1.14             1.76
 4160      3.24             3.31
 5184      7.81             5.43
 6208     14.18             8.33
 7232     23.55            11.99
 8256     36.93            16.53
 9280     48.98            22.31
10304     60.14            29.23

This is about 4x faster than what you get, and 2x faster than the 16 Sandy Bridge cores, with the break-even point at about N = 4K. The GPU cannot beat the CPUs at smaller sizes because this particular algorithm is memory bound: for large matrices the GPU's higher memory bandwidth pays off, while matrices up to about 4K fit almost entirely in certain caches on the CPU.

If I run with dsymv from CUBLAS instead, the result is:
Code:
tomov:bunsen ~/trunk/testing> ./testing_dsyevd -l
MAGMA 1.3.0
device 0: Tesla K20c, 705.5 MHz clock, 4799.6 MB memory, capability 3.5
device 1: Tesla K20c, 600.0 MHz clock, 3839.6 MB memory, capability 3.5
Usage: ./testing_dsyevd [options] [-h|--help]

    N   CPU Time (sec)   GPU Time (sec)
=======================================
 1088      0.13             0.14
 2112      0.45             0.80
 3136      1.21             2.10
 4160      3.26             4.30
 5184      6.85             7.67
 6208     11.27            12.57
 7232     18.33            18.95
 8256     28.74            27.16
 9280     44.57            37.61
10304     58.76            50.56


To get the faster result, open src/dsytrd.cpp and change
Code:
//#define FAST_HEMV

to
Code:
#define FAST_HEMV

and recompile.

Even if you are not running on a Kepler GPU, you should see times close to 30 sec for matrices of size 10K in double precision.

Stan
Stan Tomov
 
Posts: 251
Joined: Fri Aug 21, 2009 10:39 pm

Re: magma_dsyevd_gpu performance

Post by luiceur » Mon May 06, 2013 5:59 am

Hi, yes, you are right. For some reason I misused my data. I'll post my new plots later so that others reading the forum can see some results.
However, you said that you ran dsyevd from CUBLAS? I thought CUBLAS did not have eigensolvers...