testing_dgemm "can not bind to texture"

Open discussion for MAGMA

Post by bravegag » Wed Jun 19, 2013 4:39 pm

Hello,

I have successfully installed MAGMA 1.3 on Ubuntu 11.x, with the following CPU and GPU:
- i7 3930K with Intel Parallel Studio 2013 update 3, which includes MKL (downloaded and installed today)
- NVIDIA GTX 670 (Kepler architecture, compute capability 3.0) with CUDA 5.0 (downloaded and installed today)
- CUDA installer: cuda_5.0.35_linux_64_ubuntu11.10-1.run
- Driver installer: NVIDIA-Linux-x86_64-319.23.run

The problem: when I compile with "GPU_TARGET = Kepler" and run ./testing/testing_dgemm, I get the error "can not bind to texture", even though my architecture is clearly Kepler. The errors and Gflop/s are also totally off, but that could be a consequence of whatever causes "can not bind to texture":
Code:
bravegag@Zeus:~/Downloads/magma-1.3.0/testing$ ./testing_dgemm
MAGMA 1.3.0
device 0: GeForce GTX 670, 1084.5 MHz clock, 2047.7 MB memory, capability 3.0
Usage:
  testing_dgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = N  transB = N
    M    N    K     MAGMA GFLop/s    CUBLAS GFlop/s       error
==================================================================
can not bind to texture
 1024  1024  1024       45691.14           104.03         8.330523e+01
can not bind to texture
 1280  1280  1280       104857.60           105.08         1.030630e+02
can not bind to texture
 1600  1600  1600       227555.56           105.68         1.278515e+02
can not bind to texture
 2000  2000  2000       432432.43           109.60         1.592382e+02
can not bind to texture
 2500  2500  2500       844594.59           117.17         1.972784e+02
can not bind to texture
 3125  3125  3125       1695421.01           120.48         2.452298e+02
can not bind to texture
 3906  3906  3906       3221254.13           119.60         3.038157e+02
can not bind to texture
 4882  4882  4882       6648983.83           120.79         3.780258e+02
can not bind to texture
 6102  6102  6102       12622462.96           121.27         4.698986e+02
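
A quick sanity check on the numbers above, as a sketch rather than anything taken from the MAGMA tester: assuming the usual 2*M*N*K flop count for gemm, each reported Gflop/s value can be inverted to get the elapsed time it implies. The helper names gemm_flops and implied_time_ms are made up for this illustration. The MAGMA column works out to roughly 0.04 ms at every size, which would be consistent with the kernel returning almost immediately instead of doing the multiplication.

```python
def gemm_flops(m, n, k):
    """Flop count for C = A*B, assuming the usual 2*m*n*k convention."""
    return 2.0 * m * n * k


def implied_time_ms(m, n, k, gflops):
    """Elapsed time in milliseconds implied by a reported Gflop/s rate."""
    return gemm_flops(m, n, k) / (gflops * 1e9) * 1e3


# Rows copied from the Kepler-target output: (N, MAGMA Gflop/s, CUBLAS Gflop/s)
rows = [
    (1024, 45691.14, 104.03),
    (1280, 104857.60, 105.08),
    (6102, 12622462.96, 121.27),
]

for n, magma, cublas in rows:
    t_magma = implied_time_ms(n, n, n, magma)
    t_cublas = implied_time_ms(n, n, n, cublas)
    print(f"N={n}: MAGMA {t_magma:.3f} ms, CUBLAS {t_cublas:.1f} ms")
```

The CUBLAS times grow with N as expected, while the MAGMA times stay flat, so the huge MAGMA rates are most likely a timing artifact of the failed launch rather than real throughput.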


However, compiling with "GPU_TARGET = Fermi" and running ./testing/testing_dgemm produces:
Code:
bravegag@Zeus:~/Downloads/magma-1.3.0/testing$ ./testing_dgemm
MAGMA 1.3.0
device 0: GeForce GTX 670, 1084.5 MHz clock, 2047.7 MB memory, capability 3.0
Usage:
  testing_dgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = N  transB = N
    M    N    K     MAGMA GFLop/s    CUBLAS GFlop/s       error
==================================================================
 1024  1024  1024        97.30           104.24         0.000000e+00
 1280  1280  1280        99.67           111.52         0.000000e+00
 1600  1600  1600       108.83           113.81         0.000000e+00
 2000  2000  2000       109.06           117.83         0.000000e+00
 2500  2500  2500       109.62           118.42         0.000000e+00
 3125  3125  3125       115.06           120.49         0.000000e+00
 3906  3906  3906       112.49           119.62         0.000000e+00
 4882  4882  4882       114.04           120.79         0.000000e+00
 6102  6102  6102       114.92           121.27         nan


TIA,
Best regards,
Giovanni

Re: testing_dgemm "can not bind to texture"

Post by mgates3 » Wed Jun 19, 2013 8:01 pm

There's a bug in the Makefile for Kepler. In Makefile.internal, where it says:
Code:
else ifeq (${GPU_TARGET}, Kepler)
        NVOPTS += -DGPUSHMEM=300 -arch sm_35

change to:
Code:
else ifeq (${GPU_TARGET}, Kepler)
        NVOPTS += -DGPUSHMEM=300 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35

Basically, it was compiling only for compute capability 3.5, not 3.0, so no kernel code was generated for your GTX 670.
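
To make the flag pattern concrete, here is a minimal sketch that builds the same kind of -gencode list from one or more compute capabilities, and pulls the capability out of the "device 0: ..." line the tester prints. The function names device_capability and gencode_flags are hypothetical helpers for this example; they are not part of MAGMA or nvcc.

```python
import re


def device_capability(device_line):
    """Extract the compute capability (e.g. "3.0") from a tester device line."""
    match = re.search(r"capability (\d+\.\d+)", device_line)
    if match is None:
        raise ValueError("no compute capability found in: " + device_line)
    return match.group(1)


def gencode_flags(capabilities):
    """Build one nvcc -gencode option per compute capability such as "3.0"."""
    flags = []
    for cap in capabilities:
        major, minor = cap.split(".")
        sm = major + minor  # "3.0" -> "30"
        flags.append(f"-gencode arch=compute_{sm},code=sm_{sm}")
    return " ".join(flags)


line = "device 0: GeForce GTX 670, 1084.5 MHz clock, 2047.7 MB memory, capability 3.0"
print(device_capability(line))
print(gencode_flags(["3.0", "3.5"]))
```

Passing both "3.0" and "3.5" reproduces the two -gencode options in the corrected NVOPTS line, so one build covers the GTX 670 as well as 3.5 devices.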

-mark

Re: testing_dgemm "can not bind to texture"

Post by bravegag » Thu Jun 20, 2013 12:32 am

Thank you again! ;)

Now the results look much better, but what about those NaNs in the error column?

Code:
bravegag@Zeus:~/Downloads/magma-1.3.0/testing$ ./testing_dgemm
MAGMA 1.3.0
device 0: GeForce GTX 670, 1084.5 MHz clock, 2047.7 MB memory, capability 3.0
Usage:
  testing_dgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = N  transB = N
    M    N    K     MAGMA GFLop/s    CUBLAS GFlop/s       error
==================================================================
 1024  1024  1024        98.35            98.22         0.000000e+00
 1280  1280  1280       100.94           105.15         0.000000e+00
 1600  1600  1600       102.09           105.66         0.000000e+00
 2000  2000  2000       107.42           116.86         0.000000e+00
 2500  2500  2500       109.99           116.97         nan
 3125  3125  3125       114.91           119.12         nan
 3906  3906  3906       112.38           119.15         nan
 4882  4882  4882       114.30           120.79         nan
 6102  6102  6102       116.10           121.25         nan


However, using cuBLAS directly, though in single precision, I still get much higher GFlop/s:
Code:
bravegag@Zeus:/usr/local/cuda-5.0/samples/0_Simple/matrixMulCUBLAS$ ./matrixMulCUBLAS
[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "GeForce GTX 670" with compute capability 3.0
MatrixA(4000,4000), MatrixB(4000,4000), MatrixC(4000,4000)
Computing result using CUBLAS...done.
Performance= 1260.71 GFlop/s, Time= 101.530 msec, Size= 128000000000 Ops
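
As a cross-check of the sample's arithmetic (a sketch, assuming the 2*M*N*K flop convention, which matches the "Size=" field), the reported rate can be recomputed from the matrix size and the time. The function name gemm_gflops is made up for this illustration.

```python
def gemm_gflops(m, n, k, time_ms):
    """Gflop/s for an m x n x k gemm, assuming 2*m*n*k floating-point operations."""
    ops = 2.0 * m * n * k
    return ops / (time_ms * 1e-3) / 1e9


# Values copied from the matrixMulCUBLAS output (single precision).
sgemm_rate = gemm_gflops(4000, 4000, 4000, 101.530)
print(f"sgemm: {sgemm_rate:.2f} Gflop/s")

# Rough single/double ratio against the CUBLAS dgemm rate reported by
# testing_dgemm at N=4882; the sizes differ, so this is only indicative.
dgemm_rate = 120.79
print(f"sgemm/dgemm ratio: {sgemm_rate / dgemm_rate:.1f}x")
```

The recomputed rate agrees with the 1260.71 GFlop/s the sample prints, and the ratio comes out at about an order of magnitude, so the gap is a single- versus double-precision difference rather than a cuBLAS versus MAGMA one.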


TIA,
Best regards,
Giovanni

Re: testing_dgemm "can not bind to texture"

Post by mgates3 » Thu Jun 20, 2013 5:35 pm

I don't know why you are seeing NaNs there. The tester in MAGMA 1.4.0 beta 1 (just released today) can show the error for the MAGMA and CUBLAS gemm separately (with -l):

Code:
> ./testing_sgemm -l
MAGMA 1.3.0
device 0: Tesla K20c, 705.5 MHz clock, 4799.6 MB memory, capability 3.5
device 1: Tesla K20c, 600.0 MHz clock, 3839.6 MB memory, capability 3.5
Usage: ./testing_sgemm [options] [-h|--help]

If running lapack (option --lapack), MAGMA and CUBLAS error are both computed
relative to CPU BLAS result. Else, MAGMA error is computed relative to CUBLAS result.

transA = N, transB = N
    M     N     K   MAGMA Gflop/s (ms)  CUBLAS Gflop/s (ms)   CPU Gflop/s (ms)  MAGMA error  CUBLAS error
=========================================================================================================
 1088  1088  1088    135.63 (  18.99)     278.29 (   9.26)    113.86 (  22.62)    2.25e-06     2.33e-06
 2112  2112  2112    945.65 (  19.92)    2329.85 (   8.09)     31.74 ( 593.65)    3.00e-06     3.09e-06
 3136  3136  3136    957.07 (  64.45)    2464.71 (  25.03)    289.75 ( 212.88)    4.11e-06     4.11e-06
 4160  4160  4160    955.48 ( 150.69)    2509.80 (  57.37)    297.53 ( 483.93)    4.72e-06     4.72e-06


On the high-performance Tesla line (K20c), roughly:

MAGMA dgemm is 600 Gflop/s, CUBLAS dgemm is 1000 Gflop/s
MAGMA sgemm is 1000 Gflop/s, CUBLAS sgemm is 2500 Gflop/s

However, double-precision performance on the GTX line will be MUCH slower than single-precision, like the 10x difference you observe. The CUBLAS gemm kernel has been optimized for Kepler, while the MAGMA gemm was only optimized for Fermi architectures. On Fermi they have basically the same performance. Therefore, on Kepler it's best to use the CUBLAS gemm.

-mark