Hi,

I'm using magma_zgesv_batched to solve, say, 25 matrices at once. If I have 100 matrices to solve, I need to call magma_zgesv_batched 4 times on one GPU. But if I have 4 GPUs, can I run magma_zgesv_batched simultaneously on all of them, each with a different set of matrices?

What I have tried so far hasn't worked: the calls run on the GPUs one after another rather than concurrently. Is this because the routine is hybrid CPU/GPU, so I would need 4 CPU threads (OpenMP?), one for each GPU? At the moment I have just 1 CPU thread, so maybe that is the problem?

If I comment out the magma_zgesv_batched call, my application (the other kernels) does run on the 4 GPUs simultaneously. To move to a multi-GPU implementation, I have created MAGMA arrays for each GPU, selecting each device with cudaSetDevice(). I've also created a magma_queue for each device.
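
To make the one-CPU-thread-per-GPU idea concrete, here is a minimal sketch of the layout I have in mind, written in plain C++ so that it stands alone. The names BatchRange, split_batches, solve_on_device and solve_all are hypothetical names of my own, not MAGMA API, and the body of solve_on_device is only a placeholder that records which device would own each matrix. In the real code I assume that body would select the device with cudaSetDevice() and call magma_zgesv_batched on that device's queue. Whether doing so actually makes the solves overlap is exactly what I am asking.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Half-open range [begin, end) of matrix indices assigned to one GPU.
struct BatchRange {
    int device;
    int begin;
    int end;
};

// Split `total` matrices as evenly as possible over `num_devices` GPUs.
std::vector<BatchRange> split_batches(int total, int num_devices) {
    assert(num_devices > 0);
    std::vector<BatchRange> ranges;
    const int base = total / num_devices;
    const int extra = total % num_devices;
    int begin = 0;
    for (int d = 0; d < num_devices; ++d) {
        // The first `extra` devices take one additional matrix each.
        const int count = base + (d < extra ? 1 : 0);
        ranges.push_back({d, begin, begin + count});
        begin += count;
    }
    return ranges;
}

// Hypothetical placeholder for the per-device work. In the real application
// this is where I assume cudaSetDevice(r.device) and the batched solve on
// that device's queue would go. Here it only marks the owner of each matrix.
void solve_on_device(const BatchRange& r, std::vector<int>& owner) {
    for (int i = r.begin; i < r.end; ++i) {
        owner[i] = r.device;
    }
}

// Launch one host thread per device and wait for all of them to finish.
// Each thread writes to a disjoint slice of `owner`, so no locking is needed.
std::vector<int> solve_all(int total, int num_devices) {
    std::vector<int> owner(total, -1);
    std::vector<std::thread> workers;
    for (const BatchRange& r : split_batches(total, num_devices)) {
        workers.emplace_back(solve_on_device, r, std::ref(owner));
    }
    for (std::thread& t : workers) {
        t.join();
    }
    return owner;
}
```

Calling solve_all(100, 4) corresponds to my case: four host threads, each responsible for 25 of the 100 matrices. I used std::thread here only to keep the sketch free of extra dependencies; an OpenMP parallel loop over the devices would express the same thing.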

Is what I want to achieve possible? Any hints appreciated!

On a related topic, is there a sparse matrix equivalent to magma_zgesv_batched, and if so, can you point me to the documentation?

Many thanks,

Joe