BUG in MAGMA 1.1.0 in dpotrf2_ooc.cpp?

Open discussion for MAGMA

Post by jnbntz » Fri Dec 09, 2011 2:37 pm

I'm trying to run magmaf_dpotrf from Fortran code, and I want it to run out-of-core (OOC) as well as on multiple GPUs. When I do so, dpotrf returns an error code of -6, which, according to the source, means that the GPU memory allocation failed. Looking at dpotrf2_ooc.cpp, I noticed that the first CUDA driver API call, cuDeviceGet, is at line 171, and checking its return value gives CUDA_ERROR_NOT_INITIALIZED. The driver API returns this error when cuInit(0) has not been called. If I insert cuInit(0) directly before this call, the code works fine.

Note that there is no problem with a single GPU, but in that case the code calls cudaStreamCreate(), and this CUDA runtime API call initializes CUDA implicitly.

Re: BUG in MAGMA 1.1.0 in dpotrf2_ooc.cpp?

Post by Stan Tomov » Mon Dec 12, 2011 2:33 pm

That's good to know. Thank you for finding it and pointing it out! The bug slipped through because it did not show up on any of our multi-GPU development and testing systems. We had, however, seen a problem on a system with 8 GPUs, most probably caused by this missing cuInit call.

