First, you are accessing the matrix row-wise, whereas MAGMA and LAPACK use column-wise (column-major) ordering. That is, element (i, j) is stored at
A[ i + j*size ]
Of course, for a symmetric matrix, this just changes whether the entries are in the upper or lower triangle, but I think it's best to use column-wise access for clarity (i.e., rather than flipping uplo).
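To make the column-wise layout concrete, here is a minimal sketch in plain C with no MAGMA calls. The helper names colmajor_index and fill_lower_hilbert are made up for illustration, and the Hilbert-style entries 1/(i+j+1) are just sample data. It fills only the lower triangle, which is what you would pair with uplo = lower.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in a column-major matrix with leading
   dimension lda (hypothetical helper, not part of MAGMA). */
static size_t colmajor_index(size_t i, size_t j, size_t lda)
{
    assert(i < lda);            /* row index must fit inside one column */
    return i + j * lda;
}

/* Fill the lower triangle (i >= j) of an n-by-n symmetric matrix stored
   column-major. The entries 1/(i+j+1) are sample data only; the strict
   upper triangle is left untouched. */
static void fill_lower_hilbert(double *A, size_t n, size_t lda)
{
    for (size_t j = 0; j < n; ++j) {        /* columns in the outer loop */
        for (size_t i = j; i < n; ++i) {    /* rows at or below the diagonal */
            A[colmajor_index(i, j, lda)] = 1.0 / (double)(i + j + 1);
        }
    }
}
```

Putting the column loop on the outside walks memory contiguously, which is the natural access pattern for this layout.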
magma_zheevd takes the matrix A in the CPU memory, and produces the result in the CPU memory. Internally, it allocates GPU memory and transfers data to the GPU as necessary.
magma_zheevd_gpu takes the matrix dA in the GPU memory, and produces the result in the GPU memory. Note that if you use the GPU interface, it's best to pad each column to a multiple of 32, i.e., to make the leading dimension ldda a multiple of 32 even when n is not. See the use of ldda in the magma/testing examples.
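The padding arithmetic can be sketched as follows. This is plain C, not MAGMA code: round_up_to, padded_ldda and padded_elems are hypothetical helper names, and the testers in magma/testing remain the reference for how ldda is actually computed and used.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of `multiple`
   (hypothetical helper, not part of MAGMA). */
static size_t round_up_to(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}

/* Leading dimension for an n-row GPU matrix, padded to a multiple of 32. */
static size_t padded_ldda(size_t n)
{
    return round_up_to(n, 32);
}

/* Number of elements to allocate on the GPU for an n-by-n matrix:
   ldda elements per column, n columns. */
static size_t padded_elems(size_t n)
{
    return padded_ldda(n) * n;
}
```

Element (i, j) of dA then lives at offset i + j*ldda, so the rows from n up to ldda-1 of each column are padding that is never referenced.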
Of course, if your matrix is real double-precision (symmetric), then you need one of the real double-precision versions, magma_dsyevd[_gpu]. The magma_zheevd routine is for double-complex (Hermitian) matrices. Documentation for the routines is in the source itself (magma/src/dsyevd.cpp), which is also accessible online at http://icl.cs.utk.edu/magma/docs/dsyevd_8cpp.html
For compiling, include the magma.h header. Link with -lmagma -lmagmablas, as well as your BLAS and LAPACK libraries. If the linker reports unresolved symbols, you may need to repeat a library, as in -lmagma -lmagmablas -lmagma, because the two libraries reference each other and static libraries are resolved in the order they are listed. The various make.inc.* files give some pointers about what MKL or other BLAS/LAPACK libraries to include. If you've already set up your make.inc, then just follow the examples in magma/testing/. For instance,
gcc -O3 -DADD_ -I/mnt/scratch/cuda/include -I../include -I../control -c testing_zheevd.cpp -o testing_zheevd.o
gcc testing_zheevd.o -o testing_zheevd \
libtest.a lin/liblapacktest.a -L../lib -lmagma -lmagmablas \
-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lcublas -lcudart -lm -fopenmp -liomp5
(Adjust paths as required. For clarity, I've removed some flags that are extraneous here. You would not need -I../control, libtest.a, and lin/liblapacktest.a, as those are specific to the MAGMA testers.)