Some questions on testing_sgetrf

Postby danilo » Thu Oct 29, 2009 6:31 am

Hi,
I'm trying MAGMA; in particular, I have two questions about the example testing_sgetrf.

1) I use two different GPUs, an 8400M GS and a 9500 GT. The program runs fine on the 9500 GT, but on the 8400M GS I get a segmentation fault (note that I run the examples with -N 128 because of the small memory of this GPU). Exploring the code and debugging it with some printf calls, I saw that the problem occurs after the call to cudaMallocHost, when I set the values of h_R: writing to h_R produces the segmentation fault. Why?

2) When I run the code on the 9500 GT, I always get NaN in the error column of the output (||PA-LU||/||A||*N). Exploring the code, the NaN is the value of the "residual" parameter in the "get_LU_error" function. What does it mean?

Thanks a lot for your answers.

Danilo
danilo
 
Posts: 7
Joined: Thu Oct 29, 2009 6:20 am

Re: Some questions on testing_sgetrf

Postby danilo » Thu Oct 29, 2009 12:21 pm

Some additional information:

I run the code on Ubuntu 9, 32-bit. I use the standard LAPACK and BLAS packages installed with the Synaptic package manager. I have no compilation errors, and the NaN result is due to a very high value returned in the residual parameter of "get_LU_error". If I modify the code to display the error of the LAPACK CPU version of sgetrf I obtain about 10e-10, so the standard CPU sgetrf works fine; I think there is some problem in the results of magma_sgetrf. Do you know possible causes of the problem? Thanks.
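For reference, here is a minimal sketch in C of how an error value of the form ||PA-LU||/(||A||*N) can be computed, and why a single NaN in the factors turns the whole value into NaN. This is not MAGMA's get_LU_error: the names lu_factor, lu_residual and abs_sum are hypothetical, the matrix is stored row-major for readability (LAPACK uses column-major), and the norm is an entrywise sum of absolute values chosen only to keep the sketch dependency-free. The norm used by the MAGMA tester is not stated in this thread.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

static float absf(float x) { return x < 0.0f ? -x : x; }

/* Entrywise sum of absolute values of an n-by-n matrix (assumed norm).
 * Any NaN entry makes the result NaN. */
static float abs_sum(const float *m, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n * n; ++i)
        s += absf(m[i]);
    return s;
}

/* In-place LU with partial pivoting on a row-major n-by-n matrix.
 * piv[k] is the row swapped with row k at step k (0-based).
 * Returns 0 on success, k+1 if the pivot in column k is exactly zero. */
int lu_factor(float *a, int n, int *piv)
{
    assert(a != NULL && piv != NULL && n > 0);
    for (int k = 0; k < n; ++k) {
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (absf(a[i * n + k]) > absf(a[p * n + k]))
                p = i;
        piv[k] = p;
        if (a[p * n + k] == 0.0f)
            return k + 1;
        if (p != k) {
            for (int j = 0; j < n; ++j) {
                float t = a[k * n + j];
                a[k * n + j] = a[p * n + j];
                a[p * n + j] = t;
            }
        }
        for (int i = k + 1; i < n; ++i) {
            a[i * n + k] /= a[k * n + k];          /* multiplier L(i,k) */
            for (int j = k + 1; j < n; ++j)
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
        }
    }
    return 0;
}

/* Returns ||P*A - L*U|| / (||A|| * n), where a0 is the original matrix and
 * lu/piv are the outputs of lu_factor. Returns NaN if allocation fails. */
float lu_residual(const float *a0, const float *lu, const int *piv, int n)
{
    assert(a0 != NULL && lu != NULL && piv != NULL && n > 0);
    float *pa = malloc((size_t)n * n * sizeof *pa);
    if (pa == NULL)
        return NAN;
    for (int i = 0; i < n * n; ++i)
        pa[i] = a0[i];
    for (int k = 0; k < n; ++k) {                  /* apply the row swaps: P*A */
        if (piv[k] != k) {
            for (int j = 0; j < n; ++j) {
                float t = pa[k * n + j];
                pa[k * n + j] = pa[piv[k] * n + j];
                pa[piv[k] * n + j] = t;
            }
        }
    }
    for (int i = 0; i < n; ++i) {                  /* subtract L*U entry by entry */
        for (int j = 0; j < n; ++j) {
            int kmax = (i < j) ? i : j;
            float s = 0.0f;
            for (int k = 0; k <= kmax; ++k) {
                float l = (k == i) ? 1.0f : lu[i * n + k];  /* unit diagonal of L */
                s += l * lu[k * n + j];
            }
            pa[i * n + j] -= s;
        }
    }
    float r = abs_sum(pa, n) / (abs_sum(a0, n) * (float)n);
    free(pa);
    return r;
}
```

With correct factors the returned value is tiny. If any entry of the factors is NaN, it propagates through the product and the norm, so the whole error value becomes NaN, which matches the symptom described above.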
danilo
 
Posts: 7
Joined: Thu Oct 29, 2009 6:20 am

Re: Some questions on testing_sgetrf

Postby Stan Tomov » Thu Oct 29, 2009 2:30 pm

Hello,
A user had a similar problem before, and in that case updating the driver fixed it. You can run an older CUDA on a newer driver (for example, CUDA 2.1 on the 190 driver) but not vice versa. For example:

CUDA 2.3 requires 190.xx
CUDA 2.2 requires 185.xx
CUDA 2.1 requires 180.xx

You can check your driver with
> cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  190.18  Wed Jul 22 15:36:09 PDT 2009
GCC version:  gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)

The above is the result on my system and it tells me that the driver is 190.18. On my system I have CUDA 2.3 so the combination is fine.
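The check described above can be sketched in C. This is only an illustration under stated assumptions: the function names parse_driver_major, min_driver_for_cuda and driver_supports_cuda are hypothetical, the CUDA release is encoded as major*10 + minor (so 23 means CUDA 2.3), and the table covers only the three pairs listed in this post.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the major driver version from the first line of
 * /proc/driver/nvidia/version, e.g.
 * "NVRM version: NVIDIA UNIX x86_64 Kernel Module  190.18  Wed Jul 22 ..."
 * Returns -1 if the "Kernel Module" marker or the number is missing. */
int parse_driver_major(const char *nvrm_line)
{
    assert(nvrm_line != NULL);
    const char *p = strstr(nvrm_line, "Kernel Module");
    int major = -1;
    if (p == NULL)
        return -1;
    p += strlen("Kernel Module");
    if (sscanf(p, " %d", &major) != 1)
        return -1;
    return major;
}

/* Minimum driver major version for a CUDA release, using only the pairs
 * given in this thread. Returns -1 for releases not covered here. */
int min_driver_for_cuda(int cuda_version)
{
    switch (cuda_version) {
    case 23: return 190;
    case 22: return 185;
    case 21: return 180;
    default: return -1;
    }
}

/* Returns 1 if the driver is new enough, 0 if it is too old,
 * and -1 if either value is unknown. */
int driver_supports_cuda(int driver_major, int cuda_version)
{
    int required = min_driver_for_cuda(cuda_version);
    if (required < 0 || driver_major < 0)
        return -1;
    return driver_major >= required;
}
```

For the output shown above, parse_driver_major gives 190, and driver_supports_cuda(190, 23) returns 1, while a 180 driver with CUDA 2.3 returns 0.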

What driver and CUDA version do you have? Also, did you take the 32-bit version of MAGMA? In addition, if you run the LU on size <= 128, we call the LAPACK implementation directly and the GPU is not used (i.e., in that case MAGMA acts as a wrapper around the LAPACK+BLAS combination that is on the system).
Stan
Stan Tomov
 
Posts: 247
Joined: Fri Aug 21, 2009 10:39 pm

Re: Some questions on testing_sgetrf

Postby danilo » Thu Oct 29, 2009 6:56 pm

Hi,
thanks, Stan, for your reply. I solved the first problem by updating the driver to 185.xx (I'm using CUDA 2.2). But I have not yet resolved the second problem: magma_sgetrf does not seem to work properly and gives wrong results. The error values are too high and the output is NaN.

Where is the problem?

I'm using the 32-bit magma and magmablas libraries, with no compilation errors.
danilo
 
Posts: 7
Joined: Thu Oct 29, 2009 6:20 am

Re: Some questions on testing_sgetrf

Postby danilo » Fri Oct 30, 2009 5:46 am

Hi,
more feedback: I also installed CUDA 2.3 with the 190.xx driver, and I tried MAGMA both with the standard LAPACK and BLAS and with the ACML package, using the included make.inc. No compilation errors, but when I run "testing_sgetrf" I get NaN as the result of the "get_LU_error" function. I also tried "testing_sgetrf_gpu" and printed the residual value for different n1 (as in the code), but I always get NaN.
danilo
 
Posts: 7
Joined: Thu Oct 29, 2009 6:20 am

Re: Some questions on testing_sgetrf

Postby Stan Tomov » Fri Oct 30, 2009 9:25 am

Hi,
It looks like we have to recompile a few CUDA kernels for your system. To make sure that's the problem, are the other functions O.K., e.g. what do you get when running testing_sgeqrf?
Thanks,
Stan
Stan Tomov
 
Posts: 247
Joined: Fri Aug 21, 2009 10:39 pm

Re: Some questions on testing_sgetrf

Postby danilo » Fri Oct 30, 2009 9:45 am

Hi Stan,
this is my hardware situation:

PC1 - Notebook - Ubuntu 9 32-bit - Nvidia 8400M GS --> compilation OK, "magma_sgetrf" gives an LU matrix with many NaN and Inf values. The other functions, "magma_sgeqrf" and "magma_spotrf", run perfectly (I think, because the error is very low, comparable to the results in the .txt files).

PC2 - Desktop - Ubuntu 8 32-bit - Nvidia 9500 GT --> compilation OK, "magma_sgetrf" gives an LU matrix with many NaN and Inf values. I have not run the other functions yet; I can next week.
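A quick way to quantify "many NaN and Inf values" is to scan the factored matrix after it has been copied back to the host. The following is a minimal sketch in C; the type nonfinite_counts and the function count_nonfinite are hypothetical helpers, not part of MAGMA or its testers.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Number of NaN and of +/-Inf entries found in a buffer. */
typedef struct {
    size_t nan_count;
    size_t inf_count;
} nonfinite_counts;

/* Scan `count` floats (for an n-by-n matrix, count = n*n) and tally the
 * entries that are NaN or infinite. */
nonfinite_counts count_nonfinite(const float *m, size_t count)
{
    assert(m != NULL || count == 0);
    nonfinite_counts c = {0, 0};
    for (size_t i = 0; i < count; ++i) {
        if (isnan(m[i]))
            c.nan_count++;
        else if (isinf(m[i]))
            c.inf_count++;
    }
    return c;
}
```

Calling it on the host copy of the LU output and printing both counters for each matrix size shows whether the bad entries appear for every size or only once the GPU path is used.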

Thanks for your help.

Danilo
danilo
 
Posts: 7
Joined: Thu Oct 29, 2009 6:20 am

Re: Some questions on testing_sgetrf

Postby Stan Tomov » Sat Oct 31, 2009 11:45 pm

Just for the record of this topic, recompiling the MAGMA CUDA kernels for this specific configuration fixed the problem.
Stan Tomov
 
Posts: 247
Joined: Fri Aug 21, 2009 10:39 pm

