qr factorization magma_sgeqrf_gpu has a GPU memory leak

Open discussion for MAGMA

Postby napl » Thu Jan 30, 2014 4:20 pm

I'm using magma_sgeqrf_gpu to do a QR factorization of a large matrix, and the function does not appear to free CUDA memory after it finishes. My code tries to allocate more GPU memory after the magma_sgeqrf_gpu call, but the allocation fails with an out-of-memory error. I tried calling cudaMemGetInfo to see how much memory is left after the call, but it segfaults; the same call works fine before magma_sgeqrf_gpu. Has anyone encountered this GPU memory leak in magma_sgeqrf_gpu, and is there a solution?
napl
 
Posts: 3
Joined: Thu Jan 30, 2014 4:10 pm

Re: qr factorization magma_sgeqrf_gpu has a GPU memory leak

Postby napl » Thu Jan 30, 2014 6:42 pm

I just learned that magma_sgeqrf_gpu is returning the error MAGMA_ERR_HOST_ALLOC. It occurs where the function tries to allocate pinned host memory. I might try changing this to a regular host memory allocation and see whether that lets the allocation succeed.

Re: qr factorization magma_sgeqrf_gpu has a GPU memory leak

Postby mgates3 » Sun Feb 02, 2014 3:16 pm

How big is your matrix? How much CPU and GPU RAM do you have?
It's very surprising to have malloc_pinned fail -- it seems to imply that you are running out of physical RAM to hold the CPU workspace, which is only (m + n + nb)*nb, not the whole m*n matrix. It would help to replicate the problem with the MAGMA tester, testing/testing_sgeqrf_gpu, and post the tester's complete input and output here.

From a cursory examination, magma_sgeqrf_gpu does not appear to allocate any GPU memory (other than for streams/queues), since the matrix was already passed in on the GPU.

-mark
mgates3
 
Posts: 427
Joined: Fri Jan 06, 2012 2:13 pm

Re: qr factorization magma_sgeqrf_gpu has a GPU memory leak

Postby napl » Mon Feb 03, 2014 12:36 pm

The (float) matrix that I'm allocating is 4847595 by 24

For the workspace size workl = (m + n + nb)*nb I get (4847595 + 24 + 512)*512 = 2482243072; however, a regular 32-bit int can't hold a value that large, so the result wraps around to -1812724224. I changed the int for "workl" to an unsigned long int, but the function still fails with a segmentation fault.

I have 16 GB of CPU ram and 6 GB of GPU ram

Re: qr factorization magma_sgeqrf_gpu has a GPU memory leak

Postby mgates3 » Mon Feb 03, 2014 6:44 pm

Several points.
1) You can make magma_int_t 64-bit. You need to link with an ilp64 BLAS and LAPACK library. See make.inc.mkl-ilp64.
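For reference, a rough sketch of the make.inc changes involved (the flags and library names follow MKL's usual ilp64 conventions; treat them as assumptions and compare against the make.inc.mkl-ilp64 shipped with your MAGMA source):

```make
# Sketch only -- verify against make.inc.mkl-ilp64 in the MAGMA tree.
# The ILP64 defines make the integer types 64-bit; the ilp64 variants
# of the MKL libraries must be linked to match.
CFLAGS   += -DMKL_ILP64 -DMAGMA_ILP64
F77FLAGS += -fdefault-integer-8
LIB       = -L$(MKLROOT)/lib/intel64 \
            -lmkl_intel_ilp64 -lmkl_gnu_thread -lmkl_core \
            -lgomp -lpthread -lm
INC       = -I$(MKLROOT)/include
```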

2) MAGMA will probably not help with such a tall, skinny matrix. MAGMA does a panel factorization on the CPU, followed by an update of the trailing matrix on the GPU. The panel width depends on the matrix size, but is always >= 32. Since your entire matrix has fewer columns than that, it will do the entire factorization on the CPU and no work on the GPU. You could change nb to something small like 8, but I think the performance would be poor. See control/get_nb.cpp.

You could transpose the matrix and then do QR, resulting in LQ^T of the original matrix. That should be fast with MAGMA. (Sadly, doing LQ of the transposed matrix won't help; MAGMA's LQ does a transpose and QR.)
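Spelled out, the transpose trick is just: do QR on the (wide) transpose, then transpose the factorization back, so L = R^T is lower triangular with as many rows as the original matrix.

```latex
A^T = Q R
\;\Longrightarrow\;
A = (Q R)^T = R^T Q^T = L\,Q^T .
```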

3) PLASMA might be a better option for a tall skinny matrix, using multi-core CPUs. It has a hierarchical QR function to achieve parallelism.
http://icl.cs.utk.edu/plasma/

-mark

