MAGMA SVD implementation on GPUs?

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

MAGMA SVD implementation on GPUs?

Postby edmundberry » Tue Sep 29, 2015 7:17 pm

Hello MAGMA experts,

What is the status of MAGMA's SVD implementation regarding GPUs?

- Does MAGMA use GPUs to implement SVD? It looks like MAGMA's dgesvd function only accepts CPU pointers as arguments (or maybe I have misinterpreted the documentation):
http://icl.cs.utk.edu/projectsfiles/mag ... river.html

- If I have a device pointer (pointer to a matrix on a GPU device), do I have to copy it back to the host (CPU) before I can use MAGMA's SVD functionality?

- Regardless, is there example code for dgesvd? I couldn't find any example code online.

Thank you!

Best,
Edmund
edmundberry
 
Posts: 1
Joined: Tue Sep 29, 2015 7:11 pm

Re: MAGMA SVD implementation on GPUs?

Postby mgates3 » Wed Sep 30, 2015 3:06 pm

Yes, it is GPU accelerated. The input matrix is given in CPU memory, but internally gets copied to the GPU for computation.

Unfortunately, this means currently if your matrix is on the GPU, you have to copy it to the CPU before calling magma_dgesvd.

There are examples in magma/testing/testing_dgesvd.cpp and testing_dgesdd.cpp. I recommend using dgesdd (divide and conquer) instead of dgesvd (QR iteration), as dgesdd is faster in both MAGMA and LAPACK when computing singular vectors.
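
To make that concrete, here is a rough, untested sketch of that workflow -- copy the device matrix back to the host, then call the hybrid routine. It assumes MAGMA 2.x (magma_v2.h), that magma_init() has already been called, and that magma_dgesdd keeps LAPACK's dgesdd argument order with a leading jobz; check testing_dgesdd.cpp for the exact prototype in your release:

Code: Select all
#include <magma_v2.h>

void svd_from_device_matrix( magma_int_t m, magma_int_t n,
                             magmaDouble_ptr dA, magma_int_t ldda )
{
    magma_int_t lda = m, min_mn = (m < n ? m : n), info = 0;

    // All arguments to magma_dgesdd live in CPU memory.
    double *A, *U, *VT, *s, *work;
    magma_int_t *iwork;
    magma_dmalloc_cpu( &A,  lda*n );
    magma_dmalloc_cpu( &U,  m*m );      // jobz = MagmaAllVec: full U (m x m)
    magma_dmalloc_cpu( &VT, n*n );      // and full V^T (n x n)
    magma_dmalloc_cpu( &s,  min_mn );
    magma_imalloc_cpu( &iwork, 8*min_mn );

    // Copy dA (device) -> A (host) before calling the SVD.
    magma_device_t dev;
    magma_queue_t queue;
    magma_getdevice( &dev );
    magma_queue_create( dev, &queue );
    magma_dgetmatrix( m, n, dA, ldda, A, lda, queue );
    magma_queue_destroy( queue );

    // LAPACK-style workspace query (lwork = -1), then the actual call.
    double query;
    magma_dgesdd( MagmaAllVec, m, n, A, lda, s, U, m, VT, n,
                  &query, -1, iwork, &info );
    magma_int_t lwork = (magma_int_t) query;
    magma_dmalloc_cpu( &work, lwork );

    magma_dgesdd( MagmaAllVec, m, n, A, lda, s, U, m, VT, n,
                  work, lwork, iwork, &info );
    // On success (info == 0), s holds the singular values, U and VT the vectors.

    magma_free_cpu( work );  magma_free_cpu( iwork );
    magma_free_cpu( s );     magma_free_cpu( VT );
    magma_free_cpu( U );     magma_free_cpu( A );
}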

-mark
mgates3
 
Posts: 736
Joined: Fri Jan 06, 2012 2:13 pm

Re: MAGMA SVD implementation on GPUs?

Postby RGomez » Wed Aug 10, 2016 4:18 pm

Hello,

I found it more appropriate to continue the discussion here rather than opening a new topic. In my case, I am using magma_dgesdd, and it works when I "magma_malloc_cpu" all the arguments, but it fails if I have them on the GPU. My question is which arguments (if any) should be passed from the GPU to make the running time optimal.


Originally my A is on the CPU, but I was trying to send it to the GPU, along with the workspace variable "work", before calling magma_dgesdd.

Otherwise it seems a bit strange that the calculations are done on the GPU if the workspace is on the CPU (maybe you can give a heuristic explanation of how this works?).


Thanks a lot!
RGomez
 
Posts: 3
Joined: Wed Aug 10, 2016 4:01 pm

Re: MAGMA SVD implementation on GPUs?

Postby mgates3 » Thu Aug 11, 2016 8:42 am

magma_dgesdd takes all its arguments on the CPU. It simply replaces LAPACK's dgesdd.

MAGMA is a hybrid CPU + GPU library. Some of its calculations are done on the CPU, so it needs workspace there. It also relies on some routines from LAPACK, such as dbdsdc (divide-and-conquer), which need CPU workspace. MAGMA internally allocates additional memory on the GPU.

Generally, MAGMA routines with no suffix take their input arguments in CPU memory, while routines with a _gpu suffix take (at least some of) their arguments in GPU memory. Generally, variables prefixed with "d" are on the GPU device, such as "dA" (on GPU) vs. "A" (on CPU). See the documentation:

http://icl.cs.utk.edu/projectsfiles/mag ... tines.html
http://icl.cs.utk.edu/projectsfiles/mag ... ables.html
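
For example, the LU factorization exists in both flavors (prototypes paraphrased; check magma_d.h in your release for the exact types):

Code: Select all
// CPU interface: A is a host pointer; MAGMA moves data to/from the GPU internally.
magma_int_t magma_dgetrf(
    magma_int_t m, magma_int_t n,
    double *A, magma_int_t lda,
    magma_int_t *ipiv, magma_int_t *info );

// GPU interface (_gpu suffix): dA is a device pointer, already on the GPU.
magma_int_t magma_dgetrf_gpu(
    magma_int_t m, magma_int_t n,
    magmaDouble_ptr dA, magma_int_t ldda,
    magma_int_t *ipiv, magma_int_t *info );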

-mark
mgates3
 
Posts: 736
Joined: Fri Jan 06, 2012 2:13 pm

Re: MAGMA SVD implementation on GPUs?

Postby RGomez » Thu Aug 11, 2016 1:43 pm

OK, I guess I was misinterpreting the documentation!

Thank you
RGomez
 
Posts: 3
Joined: Wed Aug 10, 2016 4:01 pm

Re: MAGMA SVD implementation on GPUs?

Postby cdeterman » Wed Aug 24, 2016 3:25 pm

mgates3 wrote:Yes, it is GPU accelerated. The input matrix is given in CPU memory, but internally gets copied to the GPU for computation.

Unfortunately, this means currently if your matrix is on the GPU, you have to copy it to the CPU before calling magma_dgesvd.

-mark


If the input matrix is internally copied to the GPU for computation, wouldn't it be relatively simple to create an additional function with a _gpu suffix that passes in an existing GPU matrix and omits the copy? Or are the internals more complex, such that they require the matrix to be on the CPU at different times?
cdeterman
 
Posts: 10
Joined: Wed Aug 24, 2016 2:39 pm

Re: MAGMA SVD implementation on GPUs?

Postby mgates3 » Wed Aug 24, 2016 8:20 pm

The SVD is a rather complex code. It doesn't just allocate dA on the GPU, copy A to dA, then do the work. It extensively uses other routines like geqrf, gebrd, gelqf, unmqr, ungqr, bdsdc, etc. We would need to replace all of those with _gpu variants. Some already exist; some do not. Some do not even have GPU-accelerated implementations yet (like bdsdc).

So it's a good goal to have, and we may eventually get there, but it isn't trivial.

-mark
mgates3
 
Posts: 736
Joined: Fri Jan 06, 2012 2:13 pm

Re: MAGMA SVD implementation on GPUs?

Postby RGomez » Thu Aug 25, 2016 1:08 pm

Yes, it's highly non-trivial. Right now I'm running tests for MAGMA dgesdd, and it's surprising to find that it is slower than a regular Maple (LAPACK-based) computation. I wonder if I'm doing something wrong or whether the code for SVD is not really optimized for the GPU yet.


Code: Select all
~/magma-2.0.2/testing$ ./testing_dgesdd
% MAGMA 2.0.2  compiled for CUDA capability >= 2.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7050, driver 7050. OpenMP threads 8. MKL 11.2.3, MKL threads 4.
% device 0: GeForce GTS 450, 1764.0 MHz clock, 1023.2 MB memory, capability 2.1
% Thu Aug 25 12:38:13 2016
% Usage: ./testing_dgesdd [options] [-h|--help]

% jobz   M     N  CPU time (sec)  GPU time (sec)   |S1-S2|   |A-USV^H|   |I-UU^H|/M   |I-VV^H|/N   S sorted
%==========================================================================================================
   N  1088  1088    ---              0.49            ---   
   N  2112  2112    ---              2.10            ---   
   N  3136  3136    ---              6.45            ---   
   N  4160  4160    ---             14.64            ---   
   N  5184  5184    ---             27.89            ---   
   N  6208  6208    ---             47.25            ---   
   N  7232  7232    ---             75.09            ---   



While Maple times (not even GPU time!) are:

Code: Select all
Maple time: n= 1100     0.285000
Maple time: n= 2100     2.770000
Maple time: n= 3100     2.959000
Maple time: n= 4100     6.757000
RGomez
 
Posts: 3
Joined: Wed Aug 10, 2016 4:01 pm

Re: MAGMA SVD implementation on GPUs?

Postby mgates3 » Thu Aug 25, 2016 5:52 pm

GeForce cards are designed for graphics and gaming, which primarily use single-precision. Their support for double-precision math is slow -- perhaps 8x slower than single-precision -- while with a high-end Tesla card, double would only be 2x slower than single (same as CPU). You may have better results with single-precision.

BTW, you can add the -l or --lapack flag to get LAPACK CPU times from testing_dgesdd or testing_sgesdd.
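
For example:

Code: Select all
./testing_dgesdd --lapack
./testing_sgesdd --lapack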

-mark
mgates3
 
Posts: 736
Joined: Fri Jan 06, 2012 2:13 pm

