by scho » Wed Mar 16, 2011 7:27 am
Thank you very much, Stan.
I summarized what I tried according to your guidance as follows:
The GTX580 should show excellent performance in single precision; I would be interested to see it, provided of course that we manage to find the problem first!
From what you report so far, it looks like there may be some incompatibility between the GPU, the CUDA version used, and the driver. I would first check whether you can run programs from the SDK. For example, what do you get after running deviceQuery? Run some of the other programs in the SDK to make sure the installation is correct, e.g., matrixMul and bandwidthTest.
---> What I have tried so far worked fine.
If everything is fine with the SDK you can move to the magma testing directory. Make sure you use the same CUDA (sometimes system administrators may have installed several) and nvcc as in the SDK. Which CUDA do you use and which driver (e.g., 'cat /proc/driver/nvidia/version' will give you the driver)?
--> I reinstalled the driver from the download page for cudatoolkit_3.2.16_linux_64_rhel5.5 & gpu..sdk_3.2.16.
--> 'cat /proc/driver/nvidia/version' yields
NVRM version: NVIDIA UNIX x86_64 Kernel Module 260.19.26 Mon Nov 29 00:53:44 PST 2010
GCC version: gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)
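To compare the driver across machines or CUDA installs, the version number can be pulled out of that NVRM line with a short script. This is only a sketch: the helper name parse_nvrm_version is made up for illustration, and the pattern assumes the line has the same "Kernel Module <version>" layout as the output above.

```python
import re


def parse_nvrm_version(text):
    """Return the driver version found after 'Kernel Module', or None.

    Hypothetical helper; assumes the layout shown in the post's
    /proc/driver/nvidia/version output.
    """
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


# Sample line copied from the output above.
sample = ("NVRM version: NVIDIA UNIX x86_64 Kernel Module 260.19.26 "
          "Mon Nov 29 00:53:44 PST 2010")
print(parse_nvrm_version(sample))
```

On a live system the same function could be fed the contents of /proc/driver/nvidia/version instead of the sample string.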
You can first check that the LAPACK and BLAS installations are fine. You can run for example
./testing_zgetrf -M 20 -N 20
This is a small problem that will not try to use the GPU. All the code executed will be on the CPU using LAPACK. If ||PA-LU||/(||A||*N) comes out close to zero (on the order of machine precision), your LAPACK is most probably not the problem.
--> device 0: GeForce GTX 580, 1564.0 MHz clock, 1535.2 MB memory
testing_zgetrf -M 20 -N 20
  M     N    CPU GFlop/s    GPU GFlop/s    ||PA-LU||/(||A||*N)
============================================================
 20    20       0.01           1.40          8.655930e-18
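For reference, the ||PA-LU||/(||A||*N) figure reported above can be reproduced on a small matrix without any GPU or LAPACK. The following is a rough pure-Python sketch, not MAGMA's actual tester: the function names are made up for illustration, and the Frobenius norm is an assumption about which norm is used.

```python
import math


def lu_partial_pivot(a):
    """Factor a square matrix so that P*A = L*U; return (perm, L, U).

    perm[i] is the index of the original row that ends up in row i.
    """
    n = len(a)
    u = [row[:] for row in a]
    l = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(u[i][k]))
        if u[p][k] == 0:
            raise ValueError("matrix is singular")
        u[k], u[p] = u[p], u[k]
        l[k], l[p] = l[p], l[k]  # carry the earlier multipliers along
        perm[k], perm[p] = perm[p], perm[k]
        l[k][k] = 1.0
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / u[k][k]
            for j in range(k + 1, n):
                u[i][j] -= l[i][k] * u[k][j]
            u[i][k] = 0.0  # eliminated entry
    return perm, l, u


def fro_norm(m):
    """Frobenius norm (assumed here; the tester's norm may differ)."""
    return math.sqrt(sum(abs(x) ** 2 for row in m for x in row))


def lu_residual(a):
    """Return ||PA - LU|| / (||A|| * N) for a square matrix a."""
    n = len(a)
    perm, l, u = lu_partial_pivot(a)
    diff = [[a[perm[i]][j] - sum(l[i][k] * u[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]
    return fro_norm(diff) / (fro_norm(a) * n)


a = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
print(lu_residual(a))  # expected to be at roundoff level, not exactly zero
```

A correct factorization gives a residual near double-precision roundoff, which is the same kind of value as the 8.655930e-18 in the table; a residual many orders of magnitude larger would point at a broken LAPACK/BLAS build.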
Next try some entirely GPU kernels, e.g.,
./testing_sgemm -M 200 -N 200 -K 200
--> device 0: GeForce GTX 580, 1564.0 MHz clock, 1535.2 MB memory
!!!! cublasAlloc failed for: d_A
testing_sgemm -M 20 -N 20
device 0: GeForce GTX 580, 1564.0 MHz clock, 1535.2 MB memory
!!!! cublasAlloc failed for: d_A
Next you can try some of the hybrid codes, e.g., the one that you mentioned is not working
./testing_zgetrf_gpu -M 1 -N 1
This call takes the 1x1 matrix from the GPU and copies it to the CPU, uses LAPACK to factor it, and moves the result back. So this is a small problem where no magma hybrid algorithms are used yet, only CUDA and CUBLAS calls related to initialization, memory allocation, and data transfers. If there is a problem here, it is most probably, as I thought at the beginning, the combination of CUDA, driver, and GPU used.
Also, does the magma testing recognize the card? In general this is what the testing drivers print first. Is your device 0 recognized as the GTX580?
--> ./testing_zgetrf -M 1 -N 1
device 0: GeForce GTX 580, 1564.0 MHz clock, 1535.2 MB memory
testing_zgetrf -M 1 -N 1
!!!! cublasAlloc failed for: d_A
==> HOWEVER, THE FORTRAN VERSION WORKS!
testing_zgetrf_gpu_f
Solving A x = b using LU factorization:
|| A || = 1.063E+03
|| b || = 9.994E-01
|| b - A x || / (||A|| ||b||) = 4.811E-16
Gflops = 35.0904631976274