MAGMA routines and CUDA kernels

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
Post Reply
uumami
Posts: 3
Joined: Wed Nov 06, 2019 3:47 pm

MAGMA routines and CUDA kernels

Post by uumami » Wed Nov 06, 2019 4:02 pm

Hello,
I have been using MAGMA(BLAS). I have been experiencing some bottlenecks in my code since some operations are performed in the cpu. Basically I perform some operations via MAGMA, bring the matrices to the host, back to the device and so forth. I have two options to speed up my code either use pthreads library or perform the operations in the GPU (they are simple comparisons/operations extremely suitable for the CUDA framework). My question is if I can access the arrays created by MAGMA routines via a CUDA kernel, perform some operations at the GPU and then either call MAGMA routines from a CUDA kernel or download them to the host and lauch the routine, and thus avoiding the overhead of multiple siple oeprations and/or the communication device-host.

I am using C, and MAGMA compiled with BLAS. The pseudocode is:

Set up matrices on the CPU
for each iteration:
    matrix multiplication via MAGMA
    download the result to the host
    check which coefficients are positive and which are negative
    depending on the result, multiply each column of the matrix by a scalar (a different scalar per column)

As you can see, I keep downloading everything to the host after the matrix multiplications, but I know that all the other operations are simple enough and completely suitable for the GPU and CUDA. I would be happy either with a new matrix holding the new coefficients, or with the original matrix modified in place on the GPU. I don't know whether I can access the coefficients from CUDA kernels via the pointers held on the host, or how they behave.
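The sign check and per-column scaling described above could be done in one small custom kernel that operates directly on the device pointer. This is only a sketch under assumptions not in the original post: a column-major m-by-n matrix with leading dimension ldda, and the per-column scalar chosen from the sign of the column's first entry; the names (scale_columns, pos_scale, neg_scale) are illustrative.

```cuda
// Sketch: scale each column of a column-major m x n matrix in place.
// One thread block per column; threads stride down the rows.
__global__ void scale_columns(int m, int n, int ldda, double *dA,
                              double pos_scale, double neg_scale)
{
    int col = blockIdx.x;
    if (col >= n) return;

    // Illustrative criterion: pick the scalar from the sign of the
    // first coefficient of this column.
    double s = (dA[col * ldda] >= 0.0) ? pos_scale : neg_scale;

    for (int row = threadIdx.x; row < m; row += blockDim.x)
        dA[row + col * ldda] *= s;
}

// Launched on the pointer returned by magma_dmalloc, e.g.:
//     scale_columns<<< n, 256 >>>(m, n, ldda, dA, 2.0, 0.5);
```

With this, the result of the MAGMA multiplication never has to leave the device between iterations.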

I am using double precision routines.

Hope my explanation is not a mess. Thanks for your time!

mgates3
Posts: 902
Joined: Fri Jan 06, 2012 2:13 pm

Re: MAGMA routines and CUDA kernels

Post by mgates3 » Wed Nov 06, 2019 4:31 pm

I'm not sure what you mean by "the arrays created by MAGMA routines". Do you mean arrays allocated by, say, magma_dmalloc? Yes, that's just a chunk of memory on the GPU, so you can process it equally well with MAGMA, cuBLAS, and your own custom CUDA kernels.

It sounds like checking the coefficients and multiplying columns by a scalar would be a relatively easy custom CUDA kernel to write. From that description, it doesn't seem to fit any routines that we already have available in MAGMA.

-mark

uumami
Posts: 3
Joined: Wed Nov 06, 2019 3:47 pm

Re: MAGMA routines and CUDA kernels

Post by uumami » Wed Nov 06, 2019 5:16 pm

mgates3 wrote:
Wed Nov 06, 2019 4:31 pm
I'm not sure what you mean by "the arrays created by MAGMA routines". Do you mean arrays allocated by, say, magma_dmalloc? Yes, that's just a chunk of memory on the GPU, so you can process it equally well with MAGMA, cuBLAS, and your own custom CUDA kernels.

It sounds like checking the coefficients and multiplying columns by a scalar would be a relatively easy custom CUDA kernel to write. From that description, it doesn't seem to fit any routines that we already have available in MAGMA.

-mark
Yes, I want to access the numbers stored in GPU memory via pointers. Say I have a magma_dmalloc pointer "p" where I store the result of a matrix multiplication. I want to check the coefficients of "p", apply some operations to them given some criteria, and then repeat the MAGMA routines.
Also, should I be worried about asynchronous execution? I don't want to call the MAGMA routines before all the operations from the CUDA kernels have finished.
Finally, can I access "p" as if it had been allocated by cudaMalloc?

Thanks for the super fast response!

mgates3
Posts: 902
Joined: Fri Jan 06, 2012 2:13 pm

Re: MAGMA routines and CUDA kernels

Post by mgates3 » Thu Nov 07, 2019 2:47 pm

Yes, magma_dmalloc is just a wrapper around cudaMalloc. It is type-safe (you don't need to use sizeof(double) as you do with cudaMalloc), but otherwise there is nothing special going on.

If you call asynchronous MAGMA routines that take a magma_queue, use the stream from the magma_queue to call CUDA functions to have them execute on the same stream. (See magma_queue_get_cuda_stream.) magma_queue is just a simple struct wrapping a CUDA stream and cuBLAS handle. Or you can explicitly synchronize after the MAGMA function using magma_queue_sync.

If you call MAGMA routines that don't take a stream, those are generally synchronous — they don't return until the computation is done.
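Putting the two answers together, one way to order a custom kernel after an asynchronous MAGMA call is to launch it on the queue's own CUDA stream. This is a minimal sketch, assuming device arrays dA, dB, dC and a kernel my_kernel that already exist (illustrative names, not from the thread); magma_dgemm, magma_queue_get_cuda_stream, and magma_queue_sync are real MAGMA v2 functions.

```cuda
#include <magma_v2.h>

// Sketch: GEMM via MAGMA, then a custom kernel on the same stream,
// so the kernel is ordered after the GEMM without a host-side sync.
void pipeline(magma_int_t m, magma_int_t n, magma_int_t k,
              magmaDouble_ptr dA, magmaDouble_ptr dB, magmaDouble_ptr dC,
              magma_queue_t queue)
{
    // Enqueued on the queue's stream; may return before completion.
    magma_dgemm(MagmaNoTrans, MagmaNoTrans, m, n, k,
                1.0, dA, m, dB, k,
                0.0, dC, m, queue);

    // Launch the custom kernel on the same CUDA stream:
    cudaStream_t stream = magma_queue_get_cuda_stream(queue);
    // my_kernel<<< grid, block, 0, stream >>>(m, n, dC);

    // Or synchronize explicitly before touching the results:
    magma_queue_sync(queue);
}
```

Either approach guarantees the CUDA kernel never reads dC before the GEMM has finished writing it.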

-mark
Last edited by mgates3 on Thu Nov 07, 2019 4:04 pm, edited 1 time in total.
Reason: clarify "magma_dmalloc", not "magma_malloc", is typesafe.

uumami
Posts: 3
Joined: Wed Nov 06, 2019 3:47 pm

Re: MAGMA routines and CUDA kernels

Post by uumami » Thu Nov 07, 2019 8:09 pm

Thanks for the help, I really appreciate it. You are really kind. I will experiment with the CUDA kernels!

Post Reply