how to release GPU process

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
tmalas
Posts: 4
Joined: Wed Aug 07, 2019 2:03 pm

how to release GPU process

Post by tmalas » Thu Aug 27, 2020 9:43 pm

When multiple processes start to use the GPU for magma_cgetrf, we observe crashes inside magma_cgetrf. As a possible remedy, we decided to use the GPU exclusively, i.e., one process at a time. However, in my process, magma_cgetrf accounts for around 20%-30% of the whole execution time. Matrix setup followed by a solve with magma_cgetrf is performed many times, and I want to release the GPU after each magma_cgetrf call so that any other processes can use the GPU during my next setup. However, once my process starts using the GPU through magma_init, the GPU is never released, even though I call magma_finalize after magma_cgetrf. I check this with "nvidia-smi --query-compute-apps=pid --format=csv,noheader". Is this expected, and is there a way to accomplish my goal?
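For anyone who wants to repeat the check described above, here is a minimal sketch in Python that wraps the same nvidia-smi query and reports whether a given PID is still listed as a compute application. The helper names (parse_compute_pids, gpu_compute_pids, holds_gpu) are made up for illustration and are not part of MAGMA or any NVIDIA tool; the sketch assumes nvidia-smi is on the PATH and prints one PID per line for this query.

```python
import subprocess

# The query is the one quoted in the post above.
QUERY = ["nvidia-smi", "--query-compute-apps=pid", "--format=csv,noheader"]


def parse_compute_pids(text):
    """Turn the CSV output (assumed: one PID per line) into a set of ints."""
    pids = set()
    for line in text.splitlines():
        line = line.strip()
        if line.isdigit():  # skip blank or unexpected lines
            pids.add(int(line))
    return pids


def gpu_compute_pids():
    """Run nvidia-smi and return the PIDs currently listed as compute apps."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_pids(result.stdout)


def holds_gpu(pid):
    """True if the given PID is still listed by nvidia-smi."""
    return pid in gpu_compute_pids()
```

Calling holds_gpu(os.getpid()) from the process itself, once before magma_init and once after magma_finalize, would show whether the process is still listed at each point.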

Stan Tomov
Posts: 283
Joined: Fri Aug 21, 2009 10:39 pm

Re: how to release GPU process

Post by Stan Tomov » Fri Aug 28, 2020 10:16 am

MAGMA is thread-safe when different routines are called from the same process (e.g., through different threads). Using MAGMA from different processes should have even fewer potential problems, at least none that we have heard of. If there is a problem, it is usually the GPU settings, and you may have to look at some log files for hints about what went wrong. Also check that the MPS service is available and in what mode it is running, that you do not exceed the specified number of MPI processes using it, etc. Here is a link to the MPS documentation:
https://docs.nvidia.com/deploy/mps/index.html
It may also be worth checking the MPS Known Issues section (https://docs.nvidia.com/deploy/mps/index.html#topic_5_4) and the Common Tasks section (https://docs.nvidia.com/deploy/mps/index.html#topic_6) on how to set up and test these services.
Hope this will help,
Stan
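
As a starting point for the "check the mode" step, the sketch below reads each GPU's compute mode through nvidia-smi. It is only an illustration: the function names are hypothetical, and it assumes the reply's "mode" includes the GPU compute mode (for example Default or Exclusive_Process), which the reply does not state explicitly. It does not check whether the MPS daemon itself is running.

```python
import subprocess


def parse_compute_modes(text):
    """Parse lines such as '0, Default' into {gpu_index: mode}."""
    modes = {}
    for line in text.splitlines():
        if "," not in line:
            continue  # skip blank or unexpected lines
        index, mode = line.split(",", 1)
        index = index.strip()
        if index.isdigit():
            modes[int(index)] = mode.strip()
    return modes


def gpu_compute_modes():
    """Ask nvidia-smi for the compute mode of every visible GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=index,compute_mode",
        "--format=csv,noheader",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_compute_modes(result.stdout)
```

Printing gpu_compute_modes() before launching the MAGMA processes shows which mode each device is in, which can then be compared against the MPS documentation linked above.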
