magma_zgesv_batched on multiple gpus simultaneously


Postby jcc525 » Thu Apr 28, 2016 5:40 am

Hi,

I'm using magma_zgesv_batched to solve, say, 25 matrices at once. If I have 100 matrices to solve, I need to call magma_zgesv_batched 4 times on one GPU. But if I have 4 GPUs, can I run magma_zgesv_batched simultaneously on all of them, each with different matrices?

What I have tried so far hasn't worked. It runs on each GPU serially. Is this because the routine is hybrid CPU/GPU so I would need 4 CPU threads (OpenMP?), one for each GPU? At the moment I just have 1 CPU thread so maybe this is what is wrong?

If I comment out the magma_zgesv_batched call, my application (other kernels) does run on the 4 GPUs simultaneously. To move to a multi-GPU implementation, I have created MAGMA arrays for each GPU, selecting each GPU with cudaSetDevice(). I've also created a magma_queue for each device.

Is what I want to achieve possible? Any hints appreciated!

On a related topic, is there a sparse matrix equivalent to magma_zgesv_batched, and if so, can you point me to the documentation?

Many thanks,
Joe
jcc525
 
Posts: 3
Joined: Mon Apr 11, 2016 10:54 am

Re: magma_zgesv_batched on multiple gpus simultaneously

Postby mgates3 » Thu Apr 28, 2016 10:39 am

magma_zgesv_batched is a synchronous routine -- it does not return until its results are finished. So yes, you need to use 4 separate threads, one for each GPU. I think either pthreads or OpenMP would be fine. (We haven't tested using multiple threads, so there may be other issues, but at least that is required.)
-mark
mgates3
 
Posts: 750
Joined: Fri Jan 06, 2012 2:13 pm

Re: magma_zgesv_batched on multiple gpus simultaneously

Postby jcc525 » Thu Apr 28, 2016 1:17 pm

Hi Mark,

Thanks for the tip. I've used OpenMP to spawn one thread per GPU. Each thread uses its ID to set the GPU context, and since all the arrays are already allocated on each device, the thread ID also identifies the correct arrays in the call to magma_zgesv_batched. This is working. Many thanks for the advice.
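For anyone following along, here is a minimal sketch of the one-thread-per-GPU pattern described above. It uses std::thread rather than OpenMP purely to stay self-contained, and the names Slice, partition and run_per_device are hypothetical helpers, not part of MAGMA. The actual GPU work is left to a caller-supplied function, so treat this as an outline under those assumptions rather than a tested multi-GPU implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Half-open range [begin, end) of matrix indices assigned to one device.
struct Slice {
    std::size_t begin;
    std::size_t end;
};

// Split `total` matrices as evenly as possible across `devices` GPUs.
std::vector<Slice> partition(std::size_t total, std::size_t devices) {
    assert(devices > 0);
    std::vector<Slice> out;
    const std::size_t base = total / devices;
    const std::size_t extra = total % devices;
    std::size_t start = 0;
    for (std::size_t d = 0; d < devices; ++d) {
        const std::size_t len = base + (d < extra ? 1 : 0);
        out.push_back({start, start + len});
        start += len;
    }
    return out;
}

// Launch one host thread per device and wait for all of them.
// `solve(device, slice)` is supplied by the caller. In a real program it
// would first bind the thread to its GPU (cudaSetDevice or magma_setdevice)
// and then call magma_zgesv_batched on that device's arrays and queue,
// with a batch count of slice.end - slice.begin.
template <typename Solve>
void run_per_device(std::size_t total, std::size_t devices, Solve solve) {
    const std::vector<Slice> slices = partition(total, devices);
    std::vector<std::thread> workers;
    for (std::size_t d = 0; d < devices; ++d) {
        workers.emplace_back([&slices, &solve, d] { solve(d, slices[d]); });
    }
    for (std::thread& w : workers) {
        w.join();
    }
}
```

With 100 matrices and 4 devices, each thread receives a slice of 25 matrices. Because the batched solve blocks its calling thread until it finishes, the separate host threads are what allow the four devices to work at the same time.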

As a further question, I was wondering if there were any magma routines for a sparse version of magma_zgesv_batched and if so could you point me in the right direction to learn more about it please?

Thanks,
Joe
jcc525
 
Posts: 3
Joined: Mon Apr 11, 2016 10:54 am

Re: magma_zgesv_batched on multiple gpus simultaneously

Postby mgates3 » Sat Apr 30, 2016 12:37 pm

There are sparse routines for solving a system, but not batched. If the problems are small (say, 100 x 100), then likely using a dense direct method will be faster than a sparse iterative method (whether batched or not). See:

http://icl.cs.utk.edu/projectsfiles/mag ... lvers.html

-mark
mgates3
 
Posts: 750
Joined: Fri Jan 06, 2012 2:13 pm

Re: magma_zgesv_batched on multiple gpus simultaneously

Postby jcc525 » Fri May 06, 2016 11:31 am

Hi Mark,

Thanks for the info. Typically my matrices will be 2603*2603, which I'm guessing is too big for magma_zgesv_batched anyway? What I've read suggests this routine is meant for solving many small matrices, perhaps up to size 1024*1024. I have tried that size and it runs fine, but what is the upper limit on the size of matrices magma_zgesv_batched can solve?

The number of non-zero entries in my matrices will be approximately 120,000, so they are very sparse.

Typically the number of matrices I need to solve will be very large (over 1 million).
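To put those numbers in perspective, here is a rough back-of-the-envelope sketch of the storage involved. It assumes double-complex entries (16 bytes each, as the z in zgesv implies) and, for the sparse case, CSR storage with 32-bit indices, which is my assumption rather than anything MAGMA requires. The function names are hypothetical, and the capacity estimate ignores right-hand sides, pivot arrays and workspace.

```cpp
#include <cassert>
#include <cstdint>

// Bytes for one dense n x n double-complex matrix (16 bytes per entry).
std::uint64_t dense_bytes(std::uint64_t n) {
    return n * n * 16;
}

// Bytes for the same matrix in CSR form with 32-bit indices:
// nnz values + nnz column indices + (n + 1) row pointers.
std::uint64_t csr_bytes(std::uint64_t n, std::uint64_t nnz) {
    return nnz * (16 + 4) + (n + 1) * 4;
}

// Fraction of the n x n entries that are non-zero.
double density(std::uint64_t n, std::uint64_t nnz) {
    assert(n > 0);
    return static_cast<double>(nnz) / static_cast<double>(n * n);
}

// Upper bound on how many dense matrices fit in a given memory budget.
// Ignores right-hand sides, pivots and workspace, so the real limit is lower.
std::uint64_t dense_batch_capacity(std::uint64_t n, std::uint64_t budget_bytes) {
    return budget_bytes / dense_bytes(n);
}
```

For n = 2603 and 120,000 non-zeros, this gives about 108 MB per matrix when stored densely against about 2.4 MB in CSR, with a density of roughly 1.8%. Dense storage therefore limits a batch to a few hundred matrices at most on typical GPU memory sizes, so a million systems would have to be streamed through in many batches.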

Additionally, my matrices have a large dynamic range (the values differ by several orders of magnitude), and other methods I have tried, such as the cuSolverSp routines, don't work for my typical matrices. I tested matrices with the same structure but dummy values and the method worked, but when I put in some of my typical values the solver failed. magma_zgesv_batched is the only routine I've found (so far) that works for my matrices (when I've tested with smaller matrices).

I will take a look at some of the sparse routines and see if I can get any of these to work. Thanks for the link!

Joe
jcc525
 
Posts: 3
Joined: Mon Apr 11, 2016 10:54 am

Re: magma_zgesv_batched on multiple gpus simultaneously

Postby mgates3 » Wed May 11, 2016 5:00 pm

I don't think there's an upper bound on the size for batched routines, but at some point (e.g., matrix dimension around 1000) it becomes faster to solve the systems serially, one after another, than to run them as a batch.

We haven't investigated how to optimize small sparse problems. Typically, we have looked at large problems with millions of unknowns. Your matrices may need a custom preconditioner to converge, or a sparse direct solver may work better -- but MAGMA currently only has sparse iterative solvers.

-mark
mgates3
 
Posts: 750
Joined: Fri Jan 06, 2012 2:13 pm

