MAGMA
2.5.4
Matrix Algebra for GPU and Multicore Architectures
First, create a make.inc file, using one of the examples as a template.
Set environment variables for where external packages are installed, either in your .cshrc/.bashrc file, or in the make.inc file itself.
All the make.inc files assume $CUDADIR is set in your environment. For bash (sh), put in ~/.bashrc (with your system's path):
export CUDADIR=/usr/local/cuda
For csh/tcsh, put in ~/.cshrc:
setenv CUDADIR /usr/local/cuda
MAGMA is tested with CUDA >= 7.5. Some functionality requires a newer version.
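Before running make, it can help to confirm that the variable is set and points at a real directory. The following is a minimal sketch, not part of MAGMA: check_dir_var is a hypothetical helper name, and the same check could be applied to the other variables described on this page.

```shell
# Hypothetical helper (not shipped with MAGMA): report whether the
# environment variable named by $1 is set and names an existing directory.
check_dir_var() {
    # eval performs the indirect lookup in a way that works in plain sh.
    eval "_cdv_value=\${$1:-}"
    if [ -z "$_cdv_value" ]; then
        echo "$1 is not set"
        return 1
    fi
    if [ ! -d "$_cdv_value" ]; then
        echo "$1=$_cdv_value is not a directory"
        return 1
    fi
    echo "$1=$_cdv_value"
}

# Check the variable that every make.inc file relies on.
for var in CUDADIR; do
    check_dir_var "$var" || echo "fix $var before running make" >&2
done
```

A misspelled path is reported as "not a directory", which is usually easier to diagnose than a later compile failure.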
The MKL make.inc files assume $MKLROOT is set in your environment. To set it, for bash (sh), put in ~/.bashrc (with your system's path):
source /opt/intel/bin/compilervars.sh intel64
For csh/tcsh, put in ~/.cshrc:
source /opt/intel/bin/compilervars.csh intel64
MAGMA is tested with MKL 11.3.3 (2016), both LP64 and ILP64; other versions may work.
The ACML make.inc file assumes $ACMLDIR is set in your environment. For bash (sh), put in ~/.bashrc (with your system's path):
export ACMLDIR=/opt/acml-5.3.1
For csh/tcsh, put in ~/.cshrc:
setenv ACMLDIR /opt/acml-5.3.1
MAGMA is tested with ACML 5.3.1; other versions may work. See comments in make.inc.acml regarding ACML 4; a couple of testers fail to compile with ACML 4.
The ATLAS make.inc file assumes $ATLASDIR and $LAPACKDIR are set in your environment. If LAPACK is not installed, install it from http://www.netlib.org/lapack/. For bash (sh), put in ~/.bashrc (with your system's path):
export ATLASDIR=/opt/atlas
export LAPACKDIR=/opt/LAPACK
For csh/tcsh, put in ~/.cshrc:
setenv ATLASDIR /opt/atlas
setenv LAPACKDIR /opt/LAPACK
The OpenBLAS make.inc file assumes $OPENBLASDIR is set in your environment. For bash (sh), put in ~/.bashrc (with your system's path):
export OPENBLASDIR=/opt/openblas
For csh/tcsh, put in ~/.cshrc:
setenv OPENBLASDIR /opt/openblas
Some bugs exist with OpenBLAS 0.2.19; see BUGS.txt.
Unfortunately, the macOS Accelerate framework uses an old ABI for BLAS and LAPACK, where single precision functions – such as sdot, cdot, slange, and clange – return a double precision result. This makes them incompatible with our C/C++ headers and with the Fortran code used in our testers. The fix is to substitute reference implementations of these functions, found in magma/blas_fix. Setting blas_fix = 1 in make.inc will compile these into magma/lib/libblas_fix.a, with which your application should link.
Depending on the Fortran compiler used for your BLAS and LAPACK libraries, the linking convention is one of:
gemm() in Fortran becomes gemm_() in C.
gemm() in Fortran becomes GEMM() in C.
gemm() in Fortran stays gemm() in C.
Set -DADD_, -DUPCASE, or -DNOCHANGE, respectively, in all FLAGS in your make.inc file to select the appropriate one. Use nm to examine your BLAS library:
acml-5.3.1/gfortran64_mp/lib> nm libacml_mp.a | grep -i 'T.*dgemm'
0000000000000000 T dgemm
00000000000004e0 T dgemm_
In this case, it shows that either -DADD_ (dgemm_) or -DNOCHANGE (dgemm) should work. The default in all make.inc files is -DADD_.
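The nm check can be turned into a small filter that prints the matching flag for each exported dgemm symbol. This is a sketch only: mangling_flags is a hypothetical helper, and it assumes three-column nm output of the form shown above (address, type, name). Platforms that prefix symbols with an underscore would need the patterns adjusted.

```shell
# Hypothetical helper (not part of MAGMA): read nm output on stdin and
# print the name-mangling flag for each defined dgemm text symbol.
mangling_flags() {
    awk '$2 == "T" && $3 == "dgemm_" { print "-DADD_" }
         $2 == "T" && $3 == "DGEMM"  { print "-DUPCASE" }
         $2 == "T" && $3 == "dgemm"  { print "-DNOCHANGE" }'
}

# Usage, run from the directory that holds your BLAS library
# (libacml_mp.a is the library from the example above):
#   nm libacml_mp.a | mangling_flags
```

If more than one flag is printed, any of them should work, and -DADD_ matches the default in the make.inc files.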
Several compiler defines, below, affect how MAGMA is compiled and might have a large performance impact. These are set in make.inc files using the -D compiler flag, e.g., -DMAGMA_WITH_MKL in CFLAGS.
MAGMA_WITH_MKL
If linked with MKL, allows MAGMA to get MKL's version and set MKL's number of threads.
MAGMA_WITH_ACML
If linked with ACML 5 or later, allows MAGMA to get ACML's version. ACML's number of threads is set via OpenMP.
MAGMA_NO_V1
Disables MAGMA v1.x compatibility. Skips compiling non-queue versions of MAGMA BLAS routines, and simplifies magma_init().
MAGMA_NOAFFINITY
Disables thread affinity, available in glibc 2.6 and later.
BATCH_DISABLE_CHECKING
For batched routines, disables the info_array that contains errors. For example, if you are sure your matrices are SPD and want better Cholesky factorization performance, you can compile with this flag.
BATCH_DISABLE_CLEANUP
For batched routines, disables the cleanup code. For example, the {sy|he}rk called with "lower" will write data on the upper triangular portion of the matrix.
BATCHED_DISABLE_PARCPU
In the testing directory, disables the parallel implementation of the batched computation on CPU. Can be used to compare a naive versus a parallelized CPU batched computation.
These variables control MAGMA, BLAS, and LAPACK run-time behavior.
$MAGMA_NUM_GPUS
For multi-GPU functions, set $MAGMA_NUM_GPUS to the number of GPUs to use.
$OMP_NUM_THREADS
$MKL_NUM_THREADS
$VECLIB_MAXIMUM_THREADS
For multi-core BLAS libraries, set $OMP_NUM_THREADS or $MKL_NUM_THREADS or $VECLIB_MAXIMUM_THREADS to the number of CPU threads, depending on your BLAS library. See the documentation for your BLAS and LAPACK libraries.
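As an illustration, the run-time variables can be exported in the shell before starting a MAGMA program. The values below are examples only, and ./my_app is a hypothetical program name; which thread variable applies depends on your BLAS library.

```shell
# Example values only; pick counts that match your hardware.
export MAGMA_NUM_GPUS=2     # GPUs used by multi-GPU MAGMA functions
export OMP_NUM_THREADS=8    # CPU threads for an OpenMP-threaded BLAS
# export MKL_NUM_THREADS=8           # alternative when the BLAS is MKL
# export VECLIB_MAXIMUM_THREADS=8    # alternative for the macOS Accelerate framework

# Or set them for a single run only (./my_app is hypothetical):
#   MAGMA_NUM_GPUS=2 OMP_NUM_THREADS=8 ./my_app
```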
If you do not have a Fortran compiler, comment out FORT in make.inc. MAGMA's Fortran 90 interface and Fortran testers will not be built. Also, many testers will not be able to check their results – they will print an error message, e.g.:
magma/testing> ./testing_dgehrd -N 100 -c
...
Cannot check results: dhst01_ unavailable, since there was no Fortran compiler.
100 --- ( --- ) 0.70 ( 0.00) 0.00e+00 0.00e+00 ok
By default, all make.inc files (except ATLAS) add the -fPIC option to CFLAGS, FFLAGS, F90FLAGS, and NVCCFLAGS, required for building a shared library. Note in NVCCFLAGS that -fPIC is passed via the -Xcompiler option. Running:
make
or
make lib
make test
make sparse-lib
make sparse-test
will create shared libraries:
lib/libmagma.so
lib/libmagma_sparse.so
and static libraries:
lib/libmagma.a
lib/libmagma_sparse.a
and testing drivers in testing and sparse-iter/testing.
The current exception is for ATLAS, in make.inc.atlas, which in our install is a static library, thus requiring MAGMA to be a static library.
Static libraries are always built along with the shared libraries above. Alternatively, comment out FPIC in your make.inc file to compile only a static library. Then, running:
make
will create static libraries:
lib/libmagma.a
lib/libmagma_sparse.a
and testing drivers in testing and sparse-iter/testing.
To install libraries and include files in a given prefix, run:
make install prefix=/usr/local/magma
The default prefix is /usr/local/magma. You can also set prefix in make.inc. This installs MAGMA libraries in ${prefix}/lib, MAGMA header files in ${prefix}/include, and ${prefix}/lib/pkgconfig/magma.pc for pkg-config.
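Once installed, the magma.pc file lets pkg-config supply compile and link flags for your own programs. The sketch below assumes the default prefix; my_app.c is a hypothetical source file, and your compiler may need further options depending on your BLAS library.

```shell
# Assumes the default install prefix; adjust to your own location.
prefix=/usr/local/magma

# Prepend MAGMA's pkgconfig directory, keeping any existing search path.
export PKG_CONFIG_PATH="$prefix/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"

# Compile and link a program against MAGMA (my_app.c is hypothetical):
#   cc my_app.c $(pkg-config --cflags --libs magma) -o my_app
```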
You can modify the blocking factors for the algorithms of interest in control/get_nb.cpp.
Performance results are included in results/vA.B.C/cudaX.Y-zzz/*.txt
for MAGMA version A.B.C, CUDA version X.Y, and GPU zzz.