I have just run across the following problem:
When I use my primary GPU under Windows to do some quick tests -- it's not very fast, so this is just to check that the code runs -- the calls to magma_zgetrf (and other LU-decomposition routines) fail in some cases when the OOC routines are used. Upon investigating, I saw the following (for example in zgetrf3_ooc.cpp):
Code:
/* initialize nb */
nb = magma_get_zgetrf_nb(m);
maxm = ((m + 31)/32)*32;
/* figure out NB */
cuDeviceGet( &dev, 0);
cuDeviceTotalMem( &totalMem, dev );
totalMem /= sizeof(cuDoubleComplex);
/* number of columns in the big panel */
NB = (magma_int_t)(0.8*totalMem/maxm-h*nb);
with cuDeviceTotalMem being used to determine how much memory can be allocated on the device. As far as I can tell, this routine returns the total memory installed on the device, so if too much of that memory is already allocated elsewhere, the later allocation fails and the MAGMA computation terminates. Shouldn't this be a call to cuMemGetInfo (or cudaMemGetInfo, although that is a runtime API call)?
I have given the ZGETRF example here, but the situation is similar for other OOC routines.
Thanks in advance for any comments.