Magma 1.2.1 is for CUDA, so it works only with NVIDIA cards, not AMD cards. You can, however, get clMagma 0.3, which uses OpenCL and works with AMD cards. For OpenCL, clMagma doesn't implement the gemm call itself; AMD implements it in their clAmdBlas library. The AMD libraries are available from their website:
http://developer.amd.com/tools/hc/appma ... fault.aspx

We do, however, provide a testing routine in clMagma that tests the performance of the gemm call. To compile, first copy either make.inc.acml4 or make.inc.mkl to make.inc, then edit it to reflect where the libraries are installed on your machine. Then type make to compile clMagma. After compiling clMagma, go into the testing directory and run:
./testing_sgemm
Usage: testing_sgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = No transpose transB = No transpose
    M     N     K   clAmdBlas GFLop/s (sec)   CPU GFlop/s (sec)
===========================================================================
 1024  1024  1024         327.01 ( 0.01)       53.40 ( 0.04)
 1280  1280  1280         522.85 ( 0.01)       84.55 ( 0.05)
 1600  1600  1600        1362.83 ( 0.01)      109.94 ( 0.07)
 2000  2000  2000        1285.78 ( 0.01)      109.36 ( 0.15)
 2500  2500  2500        1300.46 ( 0.02)      113.79 ( 0.27)
 3125  3125  3125        1420.51 ( 0.04)      116.30 ( 0.52)
 3906  3906  3906        1428.27 ( 0.08)      122.23 ( 0.98)
 4882  4882  4882        1456.54 ( 0.16)      124.28 ( 1.87)
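
If you want to sanity-check the GFlop/s column yourself, here is a small Python sketch. It assumes the tester uses the conventional 2*M*N*K operation count for gemm (one multiply and one add per inner-product term); that is my assumption, not something printed in the output, and gemm_gflops is just a name I made up for illustration. The times in the table are rounded to two decimals, so expect the result to match only approximately.

```python
def gemm_gflops(m, n, k, seconds):
    """Approximate GFlop/s of an m x n x k gemm that ran for `seconds`.

    Assumes the conventional 2*m*n*k flop count for gemm
    (an assumption here, not taken from the clMagma output).
    """
    flops = 2.0 * m * n * k  # one multiply + one add per inner-product term
    return flops / seconds / 1e9


if __name__ == "__main__":
    # Last row of the table: 4882 x 4882 x 4882 in about 0.16 s.
    rate = gemm_gflops(4882, 4882, 4882, 0.16)
    print(f"estimated rate: {rate:.0f} GFlop/s (table reports 1456.54)")
```

For the small sizes the printed time is rounded to 0.01 or even 0.00 seconds, so recomputing the rate from the table is only meaningful for the larger rows.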
If you want to test a particular size and No-transpose/Transpose combination, you can provide those on the command line:
./testing_sgemm -M 1000 -N 500 -K 800 -NT
Usage: testing_sgemm [-NN|NT|TN|TT] [-N 1024]
Testing transA = No transpose transB = Transpose
    M     N     K   clAmdBlas GFLop/s (sec)   CPU GFlop/s (sec)
===========================================================================
 1000   500   800         218.41 ( 0.00)       31.18 ( 0.03)
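
If you collect many of these runs, a short Python sketch like the one below can pull the numbers out of each result row and compute the GPU-to-CPU ratio. The row layout is assumed from the output shown above, and parse_row and speedup are hypothetical helper names, not part of clMagma.

```python
import re

# One result row, e.g. "1000 500 800 218.41 ( 0.00) 31.18 ( 0.03)".
# Layout assumed from the testing_sgemm output shown above.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)"      # M N K
    r"\s+([\d.]+)\s+\(\s*([\d.]+)\)"  # clAmdBlas GFlop/s (sec)
    r"\s+([\d.]+)\s+\(\s*([\d.]+)\)"  # CPU GFlop/s (sec)
    r"\s*$"
)


def parse_row(line):
    """Return a dict for a result row, or None for headers and other lines."""
    match = ROW.match(line)
    if match is None:
        return None
    m, n, k = (int(x) for x in match.group(1, 2, 3))
    gpu, gpu_sec, cpu, cpu_sec = (float(x) for x in match.group(4, 5, 6, 7))
    return {"m": m, "n": n, "k": k,
            "gpu_gflops": gpu, "gpu_sec": gpu_sec,
            "cpu_gflops": cpu, "cpu_sec": cpu_sec}


def speedup(row):
    """Ratio of the clAmdBlas rate to the CPU rate for one parsed row."""
    return row["gpu_gflops"] / row["cpu_gflops"]


if __name__ == "__main__":
    row = parse_row("1000 500 800 218.41 ( 0.00) 31.18 ( 0.03)")
    print(f"{row['m']}x{row['n']}x{row['k']}: {speedup(row):.1f}x over CPU")
```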