clMAGMA 'compile errors on device 0'?

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

Postby cdeterman » Thu Aug 25, 2016 11:48 am

I have successfully compiled and executed a Cholesky factorization. The code runs and the results look good; however, the output also contains numerous repeated lines:

Code:
compile errors on device 0:


----
compile errors on device 0:


----
compile errors on device 0:


----
compile errors on device 0:
...


Any initial ideas about what could be causing this? Is there a 'debugging' version of clMAGMA that could be enabled with a #define, perhaps? The relevant code is at https://github.com/gpuRcore/gpuRclmagma/blob/master/src/chol.cpp.
cdeterman
 
Posts: 10
Joined: Wed Aug 24, 2016 2:39 pm

Postby cdeterman » Mon Aug 29, 2016 12:59 pm

I have just localized this problem to the single line

Code:
magma_init();


by placing print statements before and after it. Any idea why `magma_init()` would report compile errors while the actual `magma_dpotrf_gpu` function works without a problem?

Postby mgates3 » Mon Aug 29, 2016 2:21 pm

Because magma_init() compiles or loads all of the clMAGMA kernels up front; they are not compiled when first used, e.g., inside dpotrf. This behavior may have changed in the repository since the last release, though.

So whether the error affects dpotrf depends on which kernels dpotrf uses.

-mark
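As a rough illustration of the explanation above, here is a minimal C sketch of eager kernel building: every kernel in a table is built once at initialization, failures are reported but not fatal, and a later routine only breaks if it needs a kernel that failed. The kernel names, the `source_ok` flag standing in for an OpenCL build result, and the helper functions are all hypothetical; this is not clMAGMA's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical kernel table; names are placeholders, not clMAGMA kernels. */
typedef struct {
    const char *name;
    bool source_ok;  /* stands in for "the OpenCL build would succeed" */
    bool built;      /* filled in by init_all_kernels() */
} kernel_t;

static kernel_t kernels[] = {
    { "kernel_a", false, false },  /* pretend this one fails to build */
    { "kernel_b", true,  false },
    { "kernel_c", true,  false },
};
#define NUM_KERNELS ((int)(sizeof kernels / sizeof kernels[0]))

/* Eager strategy: build everything up front, report failures, keep going. */
int init_all_kernels(void)
{
    int failures = 0;
    for (int i = 0; i < NUM_KERNELS; ++i) {
        kernels[i].built = kernels[i].source_ok;
        if (!kernels[i].built) {
            fprintf(stderr, "build failed: %s\n", kernels[i].name);
            ++failures;
        }
    }
    return failures;
}

/* True only if the named kernel exists in the table and built successfully. */
static bool kernel_ready(const char *name)
{
    for (int i = 0; i < NUM_KERNELS; ++i)
        if (strcmp(kernels[i].name, name) == 0)
            return kernels[i].built;
    return false;
}

/* A routine is affected only if one of the kernels it needs failed to build. */
bool routine_can_run(const char *const needed[], int count)
{
    assert(count >= 0);
    for (int n = 0; n < count; ++n)
        if (!kernel_ready(needed[n]))
            return false;
    return true;
}
```

Calling `init_all_kernels()` once reports the failing kernel, yet a routine that needs only `kernel_b` and `kernel_c` still runs, which mirrors why dpotrf can succeed despite the messages printed during initialization.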
mgates3
 
Posts: 750
Joined: Fri Jan 06, 2012 2:13 pm

Postby cdeterman » Tue Aug 30, 2016 8:45 am

Okay, so if the errors refer to other kernels, what do you recommend? Should I just ignore them? Pull more recent changes? Is there any form of debugging you would recommend?

Postby mgates3 » Wed Aug 31, 2016 4:02 pm

Can you include the complete input & output of the tester? [You can omit excessive repetitions of lines.]

Your make.inc file and any other relevant information about your system, such as environment variables you needed to set, would be helpful.

For instance, I'm using make.inc.macos on Mac OS.

Code:
clmagma-1.3.0/testing> echo $clBLAS
/usr/local/clBLAS-2.4

clmagma-1.3.0/testing> setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${HOME}/src/clmagma-1.3.0/lib

clmagma-1.3.0/testing> ./testing_dpotrf --range 100:1000:100 -c
% clMAGMA 1.3.0
% OpenCL platform OpenCL 1.2 (Nov  2 2015 15:02:14). MAGMA not compiled with OpenMP.
% Device: GeForce GT 750M, 2048.0 MiB memory, max allocation 512.0 MiB, driver  8.26.29 310.40.55f01
Usage: ./testing_dpotrf [options] [-h|--help]

ngpu = 1, uplo = Lower
    N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   ||R_magma - R_lapack||_F / ||R_lapack||_F
========================================================
  100      6.51 (   0.00)      3.59 (   0.00)   0.00e+00   ok
  200      5.69 (   0.00)      0.16 (   0.02)   4.54e-17   ok
  300     11.32 (   0.00)      0.54 (   0.02)   8.64e-17   ok
  400     19.67 (   0.00)      0.80 (   0.03)   6.89e-17   ok
  500     23.42 (   0.00)      1.29 (   0.03)   6.71e-17   ok
  600     38.62 (   0.00)      1.76 (   0.04)   7.85e-17   ok
  700     38.04 (   0.00)      1.86 (   0.06)   7.52e-17   ok
  800     37.69 (   0.00)      1.93 (   0.09)   6.77e-17   ok
  900     53.26 (   0.00)      2.29 (   0.11)   6.33e-17   ok
 1000     55.19 (   0.01)      2.75 (   0.12)   5.97e-17   ok
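The `--range 100:1000:100` option in the command above produces the N = 100, 200, ..., 1000 rows of the table. A minimal C sketch of that expansion, assuming `start:stop:step` with an inclusive stop as the table suggests; `expand_range` is a hypothetical helper, not the tester's actual option parser.

```c
#include <assert.h>
#include <stdio.h>

/* Expand "start:stop:step" (inclusive stop) into sizes[].
   Returns the number of sizes, or -1 if the spec is malformed,
   the range is empty or backwards, or sizes[] is too small. */
int expand_range(const char *spec, int sizes[], int max_sizes)
{
    assert(spec != NULL && sizes != NULL && max_sizes >= 0);
    int start, stop, step;
    if (sscanf(spec, "%d:%d:%d", &start, &stop, &step) != 3 || step <= 0 || start > stop)
        return -1;
    int count = 0;
    for (int n = start; n <= stop; n += step) {
        if (count == max_sizes)
            return -1;  /* caller's buffer is too small */
        sizes[count++] = n;
    }
    return count;
}
```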

Postby cdeterman » Thu Sep 01, 2016 8:50 am

Here is the tester output, as requested:

Code:
./testing_dpotrf --range 100:1000:100 -c
compile errors on device 0:


----
compile errors on device 0:

... repeated many times

----
% clMAGMA 1.3.0 svn
% OpenCL platform OpenCL 1.2 CUDA 7.5.26. OpenMP threads 4.
% Device: GeForce GTX 970, 4095.3 MiB memory, max allocation 1023.8 MiB, driver  352.93
% Usage: ./testing_dpotrf [options] [-h|--help]

% ngpu = 1, uplo = Lower
%   N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   ||R_magma - R_lapack||_F / ||R_lapack||_F
%=======================================================
  100      1.84 (   0.00)      0.01 (   0.04)   0.00e+00   ok
dtrsm( R, L, C, N, 72, 128, 224, 224 )
dtrsm done
  200      6.09 (   0.00)      0.02 (   0.13)   8.30e-18   ok
dtrsm( R, L, C, N, 172, 128, 320, 320 )
dtrsm done
dtrsm( R, L, C, N, 44, 128, 320, 320 )
dtrsm done
  300     12.39 (   0.00)      0.03 (   0.29)   3.64e-17   ok
dtrsm( R, L, C, N, 272, 128, 416, 416 )
dtrsm done
dtrsm( R, L, C, N, 144, 128, 416, 416 )
dtrsm done
dtrsm( R, L, C, N, 16, 128, 416, 416 )
dtrsm done
  400     18.10 (   0.00)      3.34 (   0.01)   3.59e-17   ok
dtrsm( R, L, C, N, 372, 128, 512, 512 )
dtrsm done
dtrsm( R, L, C, N, 244, 128, 512, 512 )
dtrsm done
dtrsm( R, L, C, N, 116, 128, 512, 512 )
dtrsm done
  500     25.04 (   0.00)      5.60 (   0.01)   3.16e-17   ok
dtrsm( R, L, C, N, 472, 128, 608, 608 )
dtrsm done
dtrsm( R, L, C, N, 344, 128, 608, 608 )
dtrsm done
dtrsm( R, L, C, N, 216, 128, 608, 608 )
dtrsm done
dtrsm( R, L, C, N, 88, 128, 608, 608 )
dtrsm done
  600     30.72 (   0.00)     12.49 (   0.01)   4.25e-17   ok
dtrsm( R, L, C, N, 572, 128, 704, 704 )
dtrsm done
dtrsm( R, L, C, N, 444, 128, 704, 704 )
dtrsm done
dtrsm( R, L, C, N, 316, 128, 704, 704 )
dtrsm done
dtrsm( R, L, C, N, 188, 128, 704, 704 )
dtrsm done
dtrsm( R, L, C, N, 60, 128, 704, 704 )
dtrsm done
  700     17.74 (   0.01)     14.63 (   0.01)   4.04e-17   ok
dtrsm( R, L, C, N, 672, 128, 800, 800 )
dtrsm done
dtrsm( R, L, C, N, 544, 128, 800, 800 )
dtrsm done
dtrsm( R, L, C, N, 416, 128, 800, 800 )
dtrsm done
dtrsm( R, L, C, N, 288, 128, 800, 800 )
dtrsm done
dtrsm( R, L, C, N, 160, 128, 800, 800 )
dtrsm done
dtrsm( R, L, C, N, 32, 128, 800, 800 )
dtrsm done
  800     23.85 (   0.01)      1.52 (   0.11)   4.37e-17   ok
dtrsm( R, L, C, N, 772, 128, 928, 928 )
dtrsm done
dtrsm( R, L, C, N, 644, 128, 928, 928 )
dtrsm done
dtrsm( R, L, C, N, 516, 128, 928, 928 )
dtrsm done
dtrsm( R, L, C, N, 388, 128, 928, 928 )
dtrsm done
dtrsm( R, L, C, N, 260, 128, 928, 928 )
dtrsm done
dtrsm( R, L, C, N, 132, 128, 928, 928 )
dtrsm done
dtrsm( R, L, C, N, 4, 128, 928, 928 )
dtrsm done
  900     43.13 (   0.01)     15.08 (   0.02)   3.97e-17   ok
dtrsm( R, L, C, N, 872, 128, 1024, 1024 )
dtrsm done
dtrsm( R, L, C, N, 744, 128, 1024, 1024 )
dtrsm done
dtrsm( R, L, C, N, 616, 128, 1024, 1024 )
dtrsm done
dtrsm( R, L, C, N, 488, 128, 1024, 1024 )
dtrsm done
dtrsm( R, L, C, N, 360, 128, 1024, 1024 )
dtrsm done
dtrsm( R, L, C, N, 232, 128, 1024, 1024 )
dtrsm done
dtrsm( R, L, C, N, 104, 128, 1024, 1024 )
dtrsm done
 1000     42.20 (   0.01)     18.55 (   0.02)   3.77e-17   ok
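The `dtrsm(...)` debug lines follow a regular pattern. The lines printed after a result row belong to the next N (for example, the single line after the N = 100 row comes from the N = 200 run). Read as an interpretation of this output rather than anything stated in the thread: each run calls dtrsm once per 128-column panel that still has rows below it, with m = N - j - 128 rows for panel offsets j = 0, 128, 256, ..., and the last two arguments both equal N rounded up to a multiple of 32 (presumably the leading dimension). The values 128 and 32 are read off this output, not taken from clMAGMA's source. A small C check of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>

/* Round n up to the next multiple of align (align = 32 matches the output above). */
int round_up(int n, int align)
{
    assert(n >= 0 && align > 0);
    return ((n + align - 1) / align) * align;
}

/* Row counts m = n - j - nb of the trailing blocks solved by dtrsm for panel
   offsets j = 0, nb, 2*nb, ... while rows remain below the panel.
   Writes them into m_out and returns how many there are, or -1 if
   m_out (max_out entries) is too small. */
int trsm_row_counts(int n, int nb, int m_out[], int max_out)
{
    assert(n >= 0 && nb > 0);
    int count = 0;
    for (int j = 0; j + nb < n; j += nb) {
        if (count == max_out)
            return -1;
        m_out[count++] = n - j - nb;
    }
    return count;
}
```

For N = 1000 this gives the seven values 872, 744, ..., 104 and a leading dimension of 1024, matching the last group above; N = 100 gives no calls because 128 >= 100, so there is no trailing block to solve.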

Postby cdeterman » Tue Sep 06, 2016 4:01 pm

Any further thoughts regarding the output from the tester?

Postby mgates3 » Wed Sep 07, 2016 10:12 pm

Is there any particular reason you are using clMAGMA on a CUDA card? The CUDA version of MAGMA will perform much better.

Of course, if you have other OpenCL code that you are trying to integrate it with, that would explain it.

Otherwise, no, we don't have a solution yet.
-mark

Postby cdeterman » Thu Sep 08, 2016 11:51 am

Your assumption is correct: I have other OpenCL code I would like to integrate it with. I just happen to have an NVIDIA GPU to work with at the moment.

Postby mgates3 » Tue Sep 13, 2016 1:55 pm

[Note that this post is in relation to the clMAGMA 1.3.0 release, not what is currently available on BitBucket.]

I haven't been able to reproduce this error. I did encounter a different error, which is innocuous but annoying. clBLAS doesn't allow m = 0, n = 0, or k = 0 in most routines, while the BLAS standard does. Some MAGMA wrappers were missing the if statements needed to handle this condition, so they yielded an error message. However, the result is still correct:

Code:
mint clmagma-1.3.0/testing> ./testing_dpotrf --range 100:1000:100 -c
Error: file 'clmagma_kernels.co' not found in $LD_LIBRARY_PATH '/Users/mgates/src/hadoop-2.6.0/lib:/Users/mgates/Documents/cl-magma/lib:/usr/local/openblas/lib:/usr/local/cuda-7.5/lib:/usr/local/cuda-7.5/extras/CUPTI/lib:/usr/local/openmpi/lib'
% clMAGMA 1.3.0
% OpenCL platform OpenCL 1.2 (Nov  2 2015 15:02:14). MAGMA not compiled with OpenMP.
% Device: GeForce GT 750M, 2048.0 MiB memory, max allocation 512.0 MiB, driver  8.26.29 310.40.55f01
Usage: ./testing_dpotrf [options] [-h|--help]

ngpu = 1, uplo = Lower
    N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   ||R_magma - R_lapack||_F / ||R_lapack||_F
========================================================
  100      6.66 (   0.00)      3.49 (   0.00)   0.00e+00   ok
OpenCL runtime error: unknown OpenCL error code (-1017) in magma_dsyrk at blas_d.cpp:1268
  200      6.48 (   0.00)      0.26 (   0.01)   4.54e-17   ok
OpenCL runtime error: unknown OpenCL error code (-1017) in magma_dsyrk at blas_d.cpp:1268
  300     15.81 (   0.00)      0.86 (   0.01)   8.64e-17   ok
OpenCL runtime error: unknown OpenCL error code (-1017) in magma_dsyrk at blas_d.cpp:1268
  400     22.90 (   0.00)      1.14 (   0.02)   6.89e-17   ok
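The quick-return idea behind this fix can be sketched in C as below. `strict_dsyrk_lower` is a hypothetical stand-in for a backend that, like clBLAS in the log above, rejects zero dimensions; the error code -1017 appears to be clBLAS's `clblasInvalidDim`. The wrapper follows the reference BLAS rule for DSYRK: n == 0 is a no-op, but k == 0 is only a no-op when beta == 1; otherwise C must still be scaled by beta. In a blocked Cholesky the first panel's update has no previously factored columns (k = 0) and uses beta = 1, which is presumably how this case arises in dpotrf. This sketch is not the content of the attached patch.

```c
#include <assert.h>

#define SKETCH_INVALID_DIM (-1017)  /* code from the log above; assumed "invalid dimension" */

/* Hypothetical strict backend: lower triangle of C = alpha*A*A^T + beta*C,
   A is n-by-k column-major. Rejects zero sizes like the clBLAS call in the log. */
int strict_dsyrk_lower(int n, int k, double alpha, const double *A, int lda,
                       double beta, double *C, int ldc)
{
    if (n <= 0 || k <= 0)
        return SKETCH_INVALID_DIM;
    for (int j = 0; j < n; ++j)
        for (int i = j; i < n; ++i) {
            double s = 0.0;
            for (int p = 0; p < k; ++p)
                s += A[i + p * lda] * A[j + p * lda];
            /* beta == 0 overwrites C, so stale NaNs in C are not propagated */
            double c = (beta == 0.0) ? 0.0 : beta * C[i + j * ldc];
            C[i + j * ldc] = alpha * s + c;
        }
    return 0;
}

/* Wrapper with BLAS-style quick returns, so the backend never sees a zero size. */
int guarded_dsyrk_lower(int n, int k, double alpha, const double *A, int lda,
                        double beta, double *C, int ldc)
{
    assert(n >= 0 && k >= 0);
    if (n == 0)
        return 0;                          /* empty C: nothing to do */
    if (k == 0 || alpha == 0.0) {          /* A*A^T contributes nothing */
        if (beta != 1.0)                   /* but C := beta*C still applies */
            for (int j = 0; j < n; ++j)
                for (int i = j; i < n; ++i)
                    C[i + j * ldc] = (beta == 0.0) ? 0.0 : beta * C[i + j * ldc];
        return 0;
    }
    return strict_dsyrk_lower(n, k, alpha, A, lda, beta, C, ldc);
}
```

With beta = 1 and k = 0 the wrapper returns without touching C or calling the backend, which is the situation the -1017 messages above appear to come from.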


Attached is a fix for that problem. With that fix, it runs fine, with or without LD_LIBRARY_PATH set.

Code:
mint clmagma-1.3.0/testing> ./testing_dpotrf --range 100:400:100 -c
Error: file 'clmagma_kernels.co' not found in $LD_LIBRARY_PATH '.'
% clMAGMA 1.3.0
% OpenCL platform OpenCL 1.2 (Nov  2 2015 15:02:14). MAGMA not compiled with OpenMP.
% Device: GeForce GT 750M, 2048.0 MiB memory, max allocation 512.0 MiB, driver  8.26.29 310.40.55f01
Usage: ./testing_dpotrf [options] [-h|--help]

ngpu = 1, uplo = Lower
    N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   ||R_magma - R_lapack||_F / ||R_lapack||_F
========================================================
  100      5.57 (   0.00)      2.96 (   0.00)   0.00e+00   ok
  200      6.37 (   0.00)      0.19 (   0.01)   4.54e-17   ok
  300     12.44 (   0.00)      0.55 (   0.02)   8.64e-17   ok
  400     19.29 (   0.00)      0.69 (   0.03)   6.89e-17   ok

mint clmagma-1.3.0/testing> setenv LD_LIBRARY_PATH ../lib
mint clmagma-1.3.0/testing> ./testing_dpotrf --range 100:400:100 -c
% clMAGMA 1.3.0
% OpenCL platform OpenCL 1.2 (Nov  2 2015 15:02:14). MAGMA not compiled with OpenMP.
% Device: GeForce GT 750M, 2048.0 MiB memory, max allocation 512.0 MiB, driver  8.26.29 310.40.55f01
Usage: ./testing_dpotrf [options] [-h|--help]

ngpu = 1, uplo = Lower
    N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   ||R_magma - R_lapack||_F / ||R_lapack||_F
========================================================
  100      3.64 (   0.00)      1.93 (   0.00)   0.00e+00   ok
  200      4.52 (   0.00)      0.17 (   0.02)   4.54e-17   ok
  300     11.61 (   0.00)      0.56 (   0.02)   8.64e-17   ok
  400     19.20 (   0.00)      0.70 (   0.03)   6.89e-17   ok
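The "clmagma_kernels.co not found in $LD_LIBRARY_PATH" message suggests that magma_init() looks for a precompiled kernel file in each directory listed in LD_LIBRARY_PATH and otherwise compiles the kernels, consistent with the earlier note that it "compiles or loads" them. A minimal C sketch of such a lookup; `find_in_path_list` is hypothetical, and this sketch simply skips empty entries in the list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Look for `filename` in each directory of a colon-separated list.
   On success, writes "dir/filename" into `out` and returns 1; otherwise returns 0. */
int find_in_path_list(const char *dirs, const char *filename, char *out, size_t out_size)
{
    assert(filename != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    if (dirs == NULL)
        return 0;
    const char *start = dirs;
    for (;;) {
        const char *end = strchr(start, ':');
        size_t len = end ? (size_t)(end - start) : strlen(start);
        if (len > 0) {  /* skip empty entries such as the one in "a::b" */
            int n = snprintf(out, out_size, "%.*s/%s", (int)len, start, filename);
            if (n > 0 && (size_t)n < out_size) {
                FILE *f = fopen(out, "rb");
                if (f != NULL) {
                    fclose(f);
                    return 1;
                }
            }
        }
        if (end == NULL)
            break;
        start = end + 1;
    }
    out[0] = '\0';
    return 0;
}
```

A caller would pass `getenv("LD_LIBRARY_PATH")` (from `<stdlib.h>`), substituting "." when the variable is unset, as the `$LD_LIBRARY_PATH '.'` message in the first run above apparently indicates.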


I'll investigate a bit more to see what the issue might be.
-mark
Attachments
clmagma-1.3.0-blas-patch.tar.gz (13.41 KiB)
Quick return if m == 0, n == 0, or k == 0