dot slower in gpu

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

dot slower in gpu

Postby thanasis_giannis » Thu Nov 09, 2017 2:58 pm

So I found that magma_?dot() seems to be a lot slower than the CPU code... Can anyone tell me why this is happening?

Thank you!
thanasis_giannis
 
Posts: 9
Joined: Thu Aug 24, 2017 7:35 am

Re: dot slower in gpu

Postby mgates3 » Thu Nov 09, 2017 4:22 pm

Please be more specific about what CPU and GPU you are using, what CPU and GPU software, and what size and precision vectors. Specific output from a tester would be helpful.

For small to modest size vectors, I would expect the CPU to be faster -- especially if the vectors are in cache memory.
For large vectors, say several times the size of cache, I would expect the GPU with its faster memory to be faster.

Currently, MAGMA does not have a specific dot tester, since we use cuBLAS dot, but recent revisions available from Bitbucket do include an axpy tester, which should give similar performance to dot, and exemplifies this crossover.
Code: Select all
bunsen magma/testing> ./testing_daxpy -n 100 -n 1000 -n 10000 -n 100000 -n 1000000
% MAGMA 2.2.0 svn compiled for CUDA capability >= 3.5, 64-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7050, driver 9000. OpenMP threads 1. MKL 11.3.3, MKL threads 1.
% device 0: Tesla K40c, 745.0 MHz clock, 11439.9 MiB memory, capability 3.5
% device 1: Tesla K40c, 745.0 MHz clock, 11439.9 MiB memory, capability 3.5
% Thu Nov  9 15:17:12 2017
% Usage: ./testing_daxpy [options] [-h|--help]

%   M   cnt     cuBLAS Gflop/s (ms)       CPU Gflop/s (ms)  cuBLAS error
%===========================================================================
      100   100      0.0294 (   0.6809)      0.4877 (   0.0410)    0.00e+00   ok
     1000   100      0.3199 (   0.6251)      2.4105 (   0.0830)    0.00e+00   ok
    10000   100      2.8896 (   0.6921)      1.7243 (   1.1599)    0.00e+00   ok
   100000   100     11.1063 (   1.8008)      1.3741 (  14.5550)    0.00e+00   ok
  1000000   100     15.3857 (  12.9991)      1.3988 ( 142.9799)    0.00e+00   ok


-mark
mgates3
 
Posts: 750
Joined: Fri Jan 06, 2012 2:13 pm

Re: dot slower in gpu

Postby thanasis_giannis » Thu Nov 09, 2017 4:56 pm

I am using an i7-3770K CPU @ 3.50GHz with a Tesla K40c, on Linux (Ubuntu). The vectors have 10000 elements and I am not timing data transfers. I am using magma_ddot as part of a bigger code, and the CPU version is 5 times faster than the GPU version.

dot is called many times and the cumulative times are:

GPU: 7.19 seconds
CPU: 1.30 seconds

I know I shouldn't expect good performance, because dot doesn't involve much computation, but being 5 times slower seems strange to me.
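
For reference, a minimal self-contained sketch of this kind of comparison (not the code from this thread; it assumes MAGMA 2.x, uses made-up vector contents, and omits error checking) might look like the following, calling magma_ddot repeatedly on device vectors versus a plain loop over the same host vectors:

Code: Select all
// Hedged sketch: repeated dot of length-10000 vectors, GPU (magma_ddot) vs CPU loop.
// Assumes MAGMA 2.x; build with nvcc or gcc against the MAGMA and CUDA libraries.
#include <stdio.h>
#include "magma_v2.h"

int main( void )
{
    magma_init();
    magma_queue_t queue;
    magma_queue_create( 0, &queue );

    magma_int_t n = 10000, reps = 10000;
    double *x, *y, *dx, *dy;
    magma_dmalloc_cpu( &x, n );
    magma_dmalloc_cpu( &y, n );
    magma_dmalloc( &dx, n );
    magma_dmalloc( &dy, n );
    for (magma_int_t i = 0; i < n; ++i) { x[i] = 1.0; y[i] = 2.0; }
    magma_dsetvector( n, x, 1, dx, 1, queue );
    magma_dsetvector( n, y, 1, dy, 1, queue );

    // GPU: each magma_ddot call returns its result to the host, so it
    // implies a synchronization -- that latency adds up for small n.
    double t = magma_sync_wtime( queue );
    double gpu_result = 0;
    for (magma_int_t r = 0; r < reps; ++r)
        gpu_result = magma_ddot( n, dx, 1, dy, 1, queue );
    double gpu_time = magma_sync_wtime( queue ) - t;

    // CPU: 10000 doubles fit comfortably in cache, so this loop is fast.
    t = magma_wtime();
    double cpu_result = 0;
    for (magma_int_t r = 0; r < reps; ++r) {
        cpu_result = 0;
        for (magma_int_t i = 0; i < n; ++i)
            cpu_result += x[i] * y[i];
    }
    double cpu_time = magma_wtime() - t;

    printf( "gpu %.3f s (result %g), cpu %.3f s (result %g)\n",
            gpu_time, gpu_result, cpu_time, cpu_result );

    magma_free_cpu( x );  magma_free_cpu( y );
    magma_free( dx );     magma_free( dy );
    magma_queue_destroy( queue );
    magma_finalize();
    return 0;
}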
thanasis_giannis
 
Posts: 9
Joined: Thu Aug 24, 2017 7:35 am

Re: dot slower in gpu

Postby thanasis_giannis » Sat Nov 11, 2017 5:13 am

Actually, I timed the specific function (dot) every time it was called inside a certain function, and on the CPU it is way faster than on the GPU.
thanasis_giannis
 
Posts: 9
Joined: Thu Aug 24, 2017 7:35 am

Re: dot slower in gpu

Postby thanasis_giannis » Sat Nov 11, 2017 9:45 am

So I measured dot in a separate file, and indeed it starts to perform well for vectors of 50 000 elements. That's why I was getting bad timings; I hadn't tested my code with 50 000 elements.
thanasis_giannis
 
Posts: 9
Joined: Thu Aug 24, 2017 7:35 am

Re: dot slower in gpu

Postby mgates3 » Mon Nov 13, 2017 4:07 am

A vector of 10000 double precision values takes only 78 KiB. If you are calling dot many times on the same vector, then it will stay in L2 or even L1 cache, so the CPU will be quite fast.

On the other hand, the GPU has a hard time parallelizing a modest size vector like that. If it uses one thread block, it is limited to one SMX out of 15 SMXs. If it uses 15 thread blocks, each thread block has only about 667 elements to reduce, and then it has to synchronize the thread blocks somehow and do another reduction.
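
To make that concrete, a GPU dot is typically structured as the two-stage reduction sketched below (a simplified illustration only, not the actual cuBLAS kernel): each thread block reduces its own slice into a partial sum, and a second step still has to combine the per-block partials. With n = 10000 there is simply not much work per block.

Code: Select all
// Simplified two-stage dot reduction, launched e.g. as
//   dot_partial<<< 15, 256 >>>( n, dx, dy, d_partial );
// (illustration only -- not the cuBLAS implementation)
__global__ void dot_partial( int n, const double *x, const double *y,
                             double *partial )
{
    __shared__ double sdata[256];            // assumes blockDim.x == 256
    int tid = threadIdx.x;
    double sum = 0;
    // grid-stride loop: each thread accumulates a strided slice of x*y
    for (int i = blockIdx.x * blockDim.x + tid; i < n;
         i += gridDim.x * blockDim.x)
        sum += x[i] * y[i];
    sdata[tid] = sum;
    __syncthreads();
    // tree reduction within the thread block
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = sdata[0];
}
// A second kernel (or a host-side sum over 'partial') is still needed
// to combine the per-block results into the final dot product.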

Note that magma_ddot is simply a wrapper around cublasDdot, used for portability to other platforms.
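
If it helps to see what that wrapper amounts to, the call is essentially a cublasDdot on the handle attached to the MAGMA queue, something like the hedged sketch below (my_ddot is just an illustrative name; magma_queue_get_cublas_handle is assumed to be the MAGMA 2.x accessor, and error checking is omitted):

Code: Select all
#include <cublas_v2.h>
#include "magma_v2.h"

// Roughly what magma_ddot does: forward to cuBLAS on the queue's handle,
// returning the scalar result on the host (which implies a device sync).
double my_ddot( magma_int_t n,
                const double *dx, magma_int_t incx,
                const double *dy, magma_int_t incy,
                magma_queue_t queue )
{
    double result = 0;
    cublasDdot( magma_queue_get_cublas_handle( queue ),
                (int) n, dx, (int) incx, dy, (int) incy, &result );
    return result;
}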

-mark
mgates3
 
Posts: 750
Joined: Fri Jan 06, 2012 2:13 pm

