I used to get this kind of error when passing wrong arguments or data to an OpenCL kernel, which made it terminate with wrong results and near-zero computation time.
I can reproduce this behaviour only on the xgemm kernels: as you can see from the output below, the single-precision test kernel seems to work well until the -M 6102 -N 6102 test, while dgemm, zgemm and cgemm give very similar output for almost every problem size.
I made some hacks to the make.inc and Makefile.internal files to get an NVIDIA GPU working with clAmdBlas (I wanted an OpenCL-only setup), so maybe I made a configuration mistake.
Can you please check them and point me to a solution? Thank you in advance.
Environment (Linux Fedora 17):
. Intel i5 quad-core
. NVIDIA Quadro 600 (NVIDIA UNIX x86_64 Kernel Module 304.51)
. GCC version: gcc version 4.7.2 20120921 (Red Hat 4.7.2-2) (GCC)
. clAmdBlas 1.8.291
. AMD-APP-SDK 2.7 lnx64
. clmagma-1.0.0
Code:
[xxx@yyy testing]$ ./testing_dgemm && ./testing_sgemm
Initializing clMAGMA runtime ...
Usage:
testing_dgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = o transB = o
M N K clAmdBlas GFLop/s (sec) CPU GFlop/s (sec) error
===========================================================================
1024 1024 1024 16930.83 ( 0.00) 6.59 ( 0.33) 8.340644e+01
1280 1280 1280 37430.18 ( 0.00) 6.64 ( 0.63) 1.803064e+02
1600 1600 1600 73105.83 ( 0.00) 6.64 ( 1.23) 3.013679e+02
2000 2000 2000 134486.70 ( 0.00) 6.68 ( 2.40) 4.507322e+02
2500 2500 2500 276523.21 ( 0.00) 6.69 ( 4.67) 6.353319e+02
3125 3125 3125 507936.51 ( 0.00) 6.68 ( 9.13) 8.665594e+02
3906 3906 3906 1028609.07 ( 0.00) 6.71 ( 17.77) 1.154436e+03
4882 4882 4882 2059230.13 ( 0.00) 6.69 ( 34.78) 1.515831e+03
6102 6102 6102 4020945.33 ( 0.00) 6.69 ( 67.88) 1.963820e+03
Initializing clMAGMA runtime ...
Usage:
testing_sgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = o transB = o
M N K clAmdBlas GFLop/s (sec) CPU GFlop/s (sec) error
===========================================================================
1024 1024 1024 42.50 ( 0.05) 13.30 ( 0.16) 1.754761e-04
1280 1280 1280 42.76 ( 0.10) 13.46 ( 0.31) 2.746582e-04
1600 1600 1600 42.99 ( 0.19) 13.48 ( 0.61) 3.280640e-04
2000 2000 2000 39.73 ( 0.40) 13.55 ( 1.18) 4.882812e-04
2500 2500 2500 38.87 ( 0.80) 13.56 ( 2.30) 7.781982e-04
3125 3125 3125 39.64 ( 1.54) 13.58 ( 4.49) 9.918213e-04
3906 3906 3906 39.00 ( 3.06) 13.63 ( 8.75) 1.312256e-03
4882 4882 4882 39.59 ( 5.88) 13.57 ( 17.15) 2.075195e-03
6102 6102 6102 2950353.08 ( 0.00) 13.60 ( 33.40) 4.695436e+02
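
The "(0.00)" timings above can be made concrete with a quick back-of-the-envelope check. This is only a minimal sketch: it assumes the testers count 2*M*N*K floating-point operations for a real GEMM (an assumption, not checked against the clMAGMA source, although it reproduces the CPU timings printed above), and the helper names `gemm_flops` and `implied_seconds` are hypothetical. The sample rows are copied from the dgemm and sgemm output.

```python
# Sanity check: convert a reported GFlop/s figure back into elapsed time.
# Assumption: a real M x N x K GEMM is counted as 2*M*N*K flops.

def gemm_flops(m, n, k):
    """Flop count assumed for a real GEMM (one multiply + one add per term)."""
    return 2.0 * m * n * k

def implied_seconds(m, n, k, gflops):
    """Elapsed time implied by a reported GFlop/s rate."""
    return gemm_flops(m, n, k) / (gflops * 1e9)

# (label, M, N, K, reported GFlop/s) taken from the output above
rows = [
    ("dgemm clAmdBlas", 1024, 1024, 1024, 16930.83),
    ("dgemm CPU",       1024, 1024, 1024, 6.59),
    ("sgemm clAmdBlas", 1024, 1024, 1024, 42.50),
    ("sgemm clAmdBlas", 6102, 6102, 6102, 2950353.08),
]

for label, m, n, k, gflops in rows:
    t = implied_seconds(m, n, k, gflops)
    print(f"{label:16s} N={m:5d}  {gflops:12.2f} GFlop/s  ->  {t:.6f} s")
```

Under that assumption, the working rows come out at the printed times (about 0.33 s for the CPU dgemm and 0.05 s for the sgemm at N=1024), while the suspicious rows correspond to well under a millisecond regardless of problem size, which is consistent with the call returning without doing the computation.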
