[SOLVED] magma_dprint(_gpu) and strange CuRand behavior


Postby ReL » Wed Jun 20, 2012 9:02 am

Hello,

I have noticed some really strange behavior when magma_dprint(_gpu) is used together with the NVIDIA CURAND library (random number generation on the GPU).

I create a uniform random number generator with a fixed seed, use it to generate 4 random vectors of 5 elements each, and print them with magma_dprint_gpu. I then repeat the same operation with the same seed.

Here is the corresponding output:
1:
[
   0.5130   0.1024   0.0680   0.0562
   0.8300   0.4196   0.6611   0.8106
   0.0058   0.7963   0.3174   0.2008
   0.2229   0.7701   0.3577   0.3821
   0.5795   0.3953   0.3689   0.9635
];

2:
[
   0.5130   0.5130   0.1024   0.0680
   0.8300   0.8300   0.4196   0.6611
   0.0058   0.0058   0.7963   0.3174
   0.2229   0.2229   0.7701   0.3577
   0.5795   0.5795   0.3953   0.3689
];


If I replace the first call to magma_dprint_gpu with my own implementation, the two matrices are identical, as expected:
1:
-----
   0.5130   0.1024   0.0680   0.0562
   0.8300   0.4196   0.6611   0.8106
   0.0058   0.7963   0.3174   0.2008
   0.2229   0.7701   0.3577   0.3821
   0.5795   0.3953   0.3689   0.9635
-----

2:
[
   0.5130   0.1024   0.0680   0.0562
   0.8300   0.4196   0.6611   0.8106
   0.0058   0.7963   0.3174   0.2008
   0.2229   0.7701   0.3577   0.3821
   0.5795   0.3953   0.3689   0.9635
];


Since the only difference between my implementation of dprint_gpu and magma_dprint_gpu is the is_devptr check, my guess is that this function is somehow responsible for the problem.

I have attached my sample code:
testDprint.cpp


Any ideas would be appreciated! :) If you think the problem is in CURAND, I can report it to NVIDIA.

Thanks in advance!

Rémi
Last edited by ReL on Fri Jun 22, 2012 4:07 am, edited 1 time in total.

Re: magma_dprint(_gpu) and strange CuRand behavior

Postby mgates3 » Thu Jun 21, 2012 5:03 pm

I can confirm the problem, and I think I found the solution. Indeed, you were looking in the right direction.

First, notice that in the second matrix the columns are shifted over by 1 column.
Second, check the error return values of all CUDA, CUBLAS, and CURAND functions. In this case, curandGenerateUniformDouble(...) returned error 202 on its first call in the second generate(...). curand.h defines that error as:

CURAND_STATUS_PREEXISTING_FAILURE = 202, ///< Preexisting failure on library entry

This is why everything was shifted: the call for the first column failed, so that column wasn't filled in and kept its old contents, while the 2nd and subsequent columns filled in okay, starting from the seed value.
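To make the shift concrete, here is a small CPU-only model of it in plain C++. It is only a sketch, not the code from testDprint.cpp: generate(...), runTwice(), the seed 1234, and the use of std::mt19937 in place of CURAND are all assumptions for illustration, and it assumes the same buffer is reused for both passes. A failed first call draws no numbers and writes nothing, so the fresh sequence lands one column to the right.

```cpp
#include <array>
#include <cassert>
#include <random>

constexpr int kRows = 5;
constexpr int kCols = 4;
using Column = std::array<double, kRows>;
using Matrix = std::array<Column, kCols>;  // indexed as m[col][row]

// Hypothetical stand-in for one generate(...) pass: a fresh generator with a
// fixed seed fills the matrix one column per "call". When failFirstCall is
// true, the first call is skipped entirely, mimicking a call that returns an
// error without drawing numbers or touching its output buffer.
void generate(Matrix& m, bool failFirstCall) {
    std::mt19937 gen(1234);  // fixed seed (arbitrary value)
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (int c = 0; c < kCols; ++c) {
        if (c == 0 && failFirstCall) {
            continue;  // failed call: column 0 keeps whatever it held before
        }
        for (int r = 0; r < kRows; ++r) {
            m[c][r] = dist(gen);
        }
    }
}

struct TwoRuns {
    Matrix first;
    Matrix second;
};

// Runs the two passes on the same buffer, with the failure injected into the
// first call of the second pass.
TwoRuns runTwice() {
    Matrix buffer{};
    generate(buffer, false);
    Matrix first = buffer;
    generate(buffer, true);
    assert(buffer[0] == first[0]);  // stale column survives the failed call
    return {first, buffer};
}
```

In this model, column 0 of the second matrix is the stale column 0 from the first pass, and columns 1 to 3 hold what the first pass put in columns 0 to 2, which is the same pattern as in the output posted above.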

The only error that previously occurred was when is_devptr(...) checked the host A pointer with cudaPointerGetAttributes(...). If we add cudaGetLastError() to clear that error, everything seems to work. Here's the relevant part of is_devptr(...), which is in magma_*print.cpp. (Unfortunately the function is duplicated in all 4 files; I'll see about moving it to remove the duplicates.)

    err = cudaPointerGetAttributes( &attr, A );
    if ( ! err ) {
        // definitely know type
        return (attr.memoryType == cudaMemoryTypeDevice);
    }
    else if ( err == cudaErrorInvalidValue ) {
        // clear error code
        err = cudaGetLastError();  // <== added this line
        // infer as host pointer
        return 0;
    }

I'm not sure whether this is the intended behavior of CURAND. It seems inconsistent with the behavior of other CUDA libraries like CUBLAS. E.g., if I put cublasSetVector instead of (or even just before) curandGenerateUniform, everything works fine. It may be good to check & clear errors before using CURAND.
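The mechanism can be sketched with a toy model in plain C++. This is not the CUDA or CURAND API: Runtime, queryPointer, strictGenerate, and isDevPtr are hypothetical names, and the assumption that the library rejects a call whenever an error is still pending on entry is just one plausible reading of CURAND_STATUS_PREEXISTING_FAILURE. The only behavior taken from the real API is that reading the last error also resets it, as cudaGetLastError() does.

```cpp
#include <cassert>

// Toy status codes. Only the value 202 is taken from curand.h; the rest are
// made up for this model.
enum Status { kSuccess = 0, kInvalidValue = 1, kPreexistingFailure = 202 };

// Hypothetical runtime that remembers the last error until it is read.
struct Runtime {
    Status lastError = kSuccess;

    // A pointer query that fails on host pointers and records the failure.
    Status queryPointer(bool isDevicePointer) {
        if (!isDevicePointer) {
            lastError = kInvalidValue;
            return kInvalidValue;
        }
        return kSuccess;
    }

    // Returns the pending error and resets it to success.
    Status getLastError() {
        Status e = lastError;
        lastError = kSuccess;
        return e;
    }
};

// Hypothetical library call that refuses to run if an error is pending on
// entry, and in that case leaves the output buffer untouched.
Status strictGenerate(const Runtime& rt, double* out, int n) {
    if (rt.lastError != kSuccess) {
        return kPreexistingFailure;
    }
    for (int i = 0; i < n; ++i) {
        out[i] = 0.5;  // placeholder values
    }
    return kSuccess;
}

// Model of is_devptr(...): with clearError set, the expected failure for a
// host pointer is consumed so it cannot leak into later library calls.
bool isDevPtr(Runtime& rt, bool actuallyDevice, bool clearError) {
    Status err = rt.queryPointer(actuallyDevice);
    if (err == kSuccess) {
        return true;
    }
    if (clearError) {
        rt.getLastError();
    }
    return false;  // infer host pointer
}
```

Calling isDevPtr(rt, false, false) and then strictGenerate(...) yields kPreexistingFailure in this model, while passing true for clearError lets the same call succeed. A less strict library that ignores pending errors would succeed in both cases, which is the kind of difference described above for cublasSetVector.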

Thanks for the nice bug report.
-mark

Re: magma_dprint(_gpu) and strange CuRand behavior

Postby ReL » Fri Jun 22, 2012 4:07 am

Thanks for the detailed answer.

I had a look at the CURAND documentation and it seems that this behavior is intended. I should have checked that before... Thanks again for the workaround.