Incorrect determination of free memory for OOC GETRF

Open discussion for MAGMA

Post by evanlezar » Tue Jun 05, 2012 2:25 pm

Hi,

I have just run across the following problem:
When I use my primary GPU under Windows to do some quick tests (it's not very fast, so this is just to check if the code runs), the calls to magma_zgetrf (and other LU-decomposition routines) fail in some cases when the OOC routines are used. Upon investigating, I saw the following (for example in zgetrf3_ooc.cpp):
Code:
    /* initialize nb */
    nb = magma_get_zgetrf_nb(m);
    maxm = ((m  + 31)/32)*32;

    /* figure out NB */
    cuDeviceGet( &dev, 0);
    cuDeviceTotalMem( &totalMem, dev );
    totalMem /= sizeof(cuDoubleComplex);
   
    /* number of columns in the big panel */
    NB = (magma_int_t)(0.8*totalMem/maxm-h*nb);

with cuDeviceTotalMem being used to determine the memory which can be allocated on the device. As far as I can tell, this routine returns the total memory installed on the device, not the amount that is currently free. If too much memory is already allocated, the later allocation fails and the MAGMA calculation terminates. Shouldn't this be a call to cuMemGetInfo or cudaMemGetInfo (although the latter is a runtime library call)?
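To make the suggestion concrete, here is a minimal sketch of the panel-size calculation driven by the free byte count instead of the total. The helper name panel_columns and its parameters are hypothetical and not part of MAGMA; the arithmetic simply mirrors the expression in the snippet above, with a clamp to zero added as an assumption. On a real device the free byte count would come from cudaMemGetInfo (runtime API) or cuMemGetInfo (driver API), both of which report free and total bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of MAGMA: number of columns in the big panel,
// computed from the bytes that are currently free on the device.
// On a real device, free_bytes would be obtained with something like:
//   size_t free_bytes, total_bytes;
//   cudaMemGetInfo(&free_bytes, &total_bytes);   // runtime API
//   cuMemGetInfo(&free_bytes, &total_bytes);     // driver API
long panel_columns(std::size_t free_bytes, std::size_t elem_size,
                   long maxm, long h, long nb)
{
    assert(elem_size > 0 && maxm > 0);

    // Convert bytes to a count of matrix elements, as the original code
    // does with totalMem /= sizeof(cuDoubleComplex).
    double free_elems = static_cast<double>(free_bytes / elem_size);

    // Same expression as in the snippet above, with free memory
    // substituted for total memory.
    long NB = static_cast<long>(0.8 * free_elems / maxm - h * nb);

    // Assumption: clamp to zero so the caller can detect "nothing fits".
    return NB > 0 ? NB : 0;
}
```

For example, with 16 MiB free, 16-byte elements (the size of cuDoubleComplex), maxm = 1024, h = 2 and nb = 64, the helper returns 691 columns. A caller would still need to handle a result of zero, since the 0.8 factor is only a safety margin and not a guarantee that the allocation succeeds.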

I have given the ZGETRF example here, but the situation is similar for other OOC routines.

Thanks in advance for any comments.

Regards
Evan

Re: Incorrect determination of free memory for OOC GETRF

Post by jah87 » Wed Jun 06, 2012 2:50 pm

Not sure if it's related, but I'm getting a segmentation fault at line 248 in dgetrf_gpu.cpp
Code:
magma_free_host( work );

work is an internal variable, so I don't think I'm screwing it up, but I wouldn't put it past me.