
how to release GPU process

Posted: Thu Aug 27, 2020 9:43 pm
by tmalas
When multiple processes start to use the GPU for magma_ccgetrf, we observe crashes inside magma_cgetrf. As a possible remedy, we decided to use the GPU exclusively, i.e., one process at a time. However, in my process, magma_cgetrf accounts for around 20%-30% of the whole execution time. The matrix setup followed by a solve with magma_ccgetrf is performed many times, and I want to release the GPU after each magma_cgetrf call so that any other processes can use the GPU during my next setup. However, once my process acquires the GPU through magma_init, the GPU is never released, even though I call magma_finalize after magma_cgetrf. I check this with "nvidia-smi --query-compute-apps=pid --format=csv,noheader". Is this expected, and is there a way to accomplish my goal?
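
The check described above can be scripted. The following is a minimal Python sketch, assuming nvidia-smi is on the PATH; the helper names parse_compute_pids and gpu_compute_pids are hypothetical and only illustrate how to test whether the current process still appears in the list of compute applications.

```python
import os
import shutil
import subprocess

# The exact query quoted in the post above.
QUERY = ["nvidia-smi", "--query-compute-apps=pid", "--format=csv,noheader"]


def parse_compute_pids(output):
    """Parse the one-PID-per-line output of the query into a list of ints."""
    pids = []
    for line in output.splitlines():
        line = line.strip()
        if line.isdigit():  # skip blank lines and any non-numeric message
            pids.append(int(line))
    return pids


def gpu_compute_pids():
    """Return the PIDs listed as compute apps, or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return None
    return parse_compute_pids(result.stdout)


if __name__ == "__main__":
    pids = gpu_compute_pids()
    if pids is None:
        print("nvidia-smi not available")
    else:
        # True means this process is still listed as a compute application.
        print("this process is listed on the GPU:", os.getpid() in pids)
```

Running the same check from inside the application, once before magma_init and once after magma_finalize, would show whether the process is still listed at each point.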

Re: how to release GPU process

Posted: Fri Aug 28, 2020 10:16 am
by Stan Tomov
MAGMA is thread-safe when different routines are called from the same process (e.g., through different threads). Using MAGMA from different processes should have even fewer potential problems, at least none that we have heard of. If there is a problem, it is usually in the GPU settings, and you may have to look at some log files for a hint about what went wrong. Also check that the MPS service is available and in what mode it is running, that you do not exceed the specified number of MPI processes using it, etc. Here is a link to the MPS documentation:
https://docs.nvidia.com/deploy/mps/index.html
Also check the MPS Known Issues section (https://docs.nvidia.com/deploy/mps/index.html#topic_5_4) and the Common Tasks section (https://docs.nvidia.com/deploy/mps/index.html#topic_6) for how to set up and test these services.
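
One GPU setting worth inspecting alongside MPS is the compute mode, since it determines whether several processes may create contexts on the same device. Below is a minimal Python sketch, assuming nvidia-smi is on the PATH and supports the compute_mode field of --query-gpu; the helper names are hypothetical, and the exact mode strings may differ between driver versions.

```python
import shutil
import subprocess

# Ask for one "index, compute_mode" line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=index,compute_mode", "--format=csv,noheader"]


def parse_compute_modes(output):
    """Turn lines such as '0, Default' into a {gpu_index: mode} dictionary."""
    modes = {}
    for line in output.splitlines():
        index, sep, mode = line.partition(",")
        if sep and index.strip().isdigit():  # ignore malformed or empty lines
            modes[int(index.strip())] = mode.strip()
    return modes


def gpu_compute_modes():
    """Return the compute mode of each GPU, or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return None
    return parse_compute_modes(result.stdout)


if __name__ == "__main__":
    modes = gpu_compute_modes()
    if modes is None:
        print("nvidia-smi not available")
    else:
        for index, mode in sorted(modes.items()):
            print(f"GPU {index}: compute mode {mode}")
```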
Hope this will help,
Stan