MAGMA  2.5.4
Matrix Algebra for GPU and Multicore Architectures
Installing MAGMA

First, create a make.inc file, using one of the examples as a template.

Set environment variables for where external packages are installed, either in your .cshrc/.bashrc file, or in the make.inc file itself.

CUDA

All the make.inc files assume $CUDADIR is set in your environment. For bash (sh), put in ~/.bashrc (with your system's path):

export CUDADIR=/usr/local/cuda

For csh/tcsh, put in ~/.cshrc:

setenv CUDADIR /usr/local/cuda

MAGMA is tested with CUDA >= 7.5. Some functionality requires a newer version.
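
Before building, it can help to confirm that $CUDADIR points at a usable CUDA install. The sketch below is not part of MAGMA: the function name check_cudadir is hypothetical, and it assumes the usual CUDA layout, with nvcc under $CUDADIR/bin.

```shell
# Hypothetical helper, not shipped with MAGMA: check that CUDADIR is set
# and that an executable nvcc exists under $CUDADIR/bin (assumed layout).
check_cudadir() {
    if [ -z "${CUDADIR:-}" ]; then
        echo "CUDADIR is not set" >&2
        return 1
    fi
    if [ ! -x "$CUDADIR/bin/nvcc" ]; then
        echo "no executable nvcc in $CUDADIR/bin" >&2
        return 1
    fi
    echo "CUDADIR ok: $CUDADIR"
}
```

Run check_cudadir in the same shell you will run make from; a non-zero return status means the variable or path needs fixing.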

Intel MKL

The MKL make.inc files assume $MKLROOT is set in your environment. To set it, for bash (sh), put in ~/.bashrc (with your system's path):

source /opt/intel/bin/compilervars.sh intel64

For csh/tcsh, put in ~/.cshrc:

source /opt/intel/bin/compilervars.csh intel64

MAGMA is tested with MKL 11.3.3 (2016), both LP64 and ILP64; other versions may work.

AMD ACML

The ACML make.inc file assumes $ACMLDIR is set in your environment. For bash (sh), put in ~/.bashrc (with your system's path):

export ACMLDIR=/opt/acml-5.3.1

For csh/tcsh, put in ~/.cshrc:

setenv ACMLDIR  /opt/acml-5.3.1

MAGMA is tested with ACML 5.3.1; other versions may work. See comments in make.inc.acml regarding ACML 4; a couple of testers fail to compile with ACML 4.

ATLAS

The ATLAS make.inc file assumes $ATLASDIR and $LAPACKDIR are set in your environment. If LAPACK is not installed, install it from http://www.netlib.org/lapack/. For bash (sh), put in ~/.bashrc (with your system's path):

export ATLASDIR=/opt/atlas
export LAPACKDIR=/opt/LAPACK

For csh/tcsh, put in ~/.cshrc:

setenv ATLASDIR  /opt/atlas
setenv LAPACKDIR /opt/LAPACK

OpenBLAS

The OpenBLAS make.inc file assumes $OPENBLASDIR is set in your environment. For bash (sh), put in ~/.bashrc (with your system's path):

export OPENBLASDIR=/opt/openblas

For csh/tcsh, put in ~/.cshrc:

setenv OPENBLASDIR /opt/openblas

Some bugs exist with OpenBLAS 0.2.19; see BUGS.txt.

MacOS Accelerate (previously Veclib)

Unfortunately, the MacOS Accelerate framework uses an old ABI for BLAS and LAPACK, where single precision functions – such as sdot, cdot, slange, and clange – return a double precision result. This makes them incompatible with our C/C++ headers and with the Fortran code used in our testers. The fix is to substitute reference implementations of these functions, found in magma/blas_fix. Setting blas_fix = 1 in make.inc will compile these into magma/lib/libblas_fix.a, with which your application should link.

Linking to BLAS

Depending on the Fortran compiler used for your BLAS and LAPACK libraries, the linking convention is one of:

  • Add underscore, so gemm() in Fortran becomes gemm_() in C.
  • Uppercase, so gemm() in Fortran becomes GEMM() in C.
  • No change, so gemm() in Fortran stays gemm() in C.

Set -DADD_, -DUPCASE, or -DNOCHANGE, respectively, in all FLAGS in your make.inc file to select the appropriate one. Use nm to examine your BLAS library:

acml-5.3.1/gfortran64_mp/lib> nm libacml_mp.a | grep -i 'T.*dgemm'
0000000000000000 T dgemm
00000000000004e0 T dgemm_

In this case, it shows that either -DADD_ (dgemm_) or -DNOCHANGE (dgemm) should work. The default in all make.inc files is -DADD_.
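
The nm check above can be scripted. The following is a rough sketch, not a MAGMA tool: the function name blas_mangling is hypothetical, and it assumes the three-column "address type name" layout shown above. It prints each define whose matching dgemm symbol is exported with type T.

```shell
# Hypothetical helper: read `nm` output on stdin and print every
# name-mangling define whose dgemm symbol is defined in the text section (T).
# Assumes lines of the form "<address> T <symbol>".
blas_mangling() {
    awk '$2 == "T" && $3 == "dgemm_" { print "-DADD_" }
         $2 == "T" && $3 == "DGEMM"  { print "-DUPCASE" }
         $2 == "T" && $3 == "dgemm"  { print "-DNOCHANGE" }'
}

# Usage (the library name is only an example):
#   nm libacml_mp.a | blas_mangling
```

If more than one define is printed, any of them should work; a library that defines the symbol in several object files may print the same define more than once.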

Compile-time options

Several compiler defines, below, affect how MAGMA is compiled and might have a large performance impact. These are set in make.inc files using the -D compiler flag, e.g., -DMAGMA_WITH_MKL in CFLAGS.

  • MAGMA_WITH_MKL

    If linked with MKL, allows MAGMA to get MKL's version and set MKL's number of threads.

  • MAGMA_WITH_ACML

    If linked with ACML 5 or later, allows MAGMA to get ACML's version. ACML's number of threads is set via OpenMP.

  • MAGMA_NO_V1

    Disables MAGMA v1.x compatibility. Skips compiling non-queue versions of MAGMA BLAS routines, and simplifies magma_init().

  • MAGMA_NOAFFINITY

    Disables thread affinity, available in glibc 2.6 and later.

  • BATCH_DISABLE_CHECKING

    For batched routines, disables the info_array that contains errors. For example, if you are sure your matrices are SPD and want better performance from batched Cholesky factorization, you can compile with this flag.

  • BATCH_DISABLE_CLEANUP

    For batched routines, disables the cleanup code. For example, with cleanup disabled, {sy|he}rk called with "lower" will also write data to the upper triangular portion of the matrix.

  • BATCHED_DISABLE_PARCPU

    In the testing directory, disables the parallel implementation of the batched computation on CPU. Can be used to compare a naive versus a parallelized CPU batched computation.

Run-time options

These variables control MAGMA, BLAS, and LAPACK run-time behavior.

  • $MAGMA_NUM_GPUS

    For multi-GPU functions, set $MAGMA_NUM_GPUS to the number of GPUs to use.

  • $OMP_NUM_THREADS
  • $MKL_NUM_THREADS
  • $VECLIB_MAXIMUM_THREADS

    For multi-core BLAS libraries, set $OMP_NUM_THREADS or $MKL_NUM_THREADS or $VECLIB_MAXIMUM_THREADS to the number of CPU threads, depending on your BLAS library. See the documentation for your BLAS and LAPACK libraries.
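
As an illustration, these variables might be set in bash as follows before running a tester. The values are examples only. Using every online core is one possible choice, not a MAGMA recommendation, and getconf _NPROCESSORS_ONLN is assumed to be available (it is on common Linux and MacOS systems).

```shell
# Example values only; adjust to your machine.
export MAGMA_NUM_GPUS=2                  # use two GPUs in multi-GPU functions

# Number of online CPU cores; fall back to 1 if getconf cannot report it.
ncores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Only the variable matching your BLAS library matters; extras are harmless.
export OMP_NUM_THREADS="$ncores"         # OpenMP-threaded BLAS
export MKL_NUM_THREADS="$ncores"         # Intel MKL
export VECLIB_MAXIMUM_THREADS="$ncores"  # MacOS Accelerate
```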

Building without Fortran

If you do not have a Fortran compiler, comment out FORT in make.inc. MAGMA's Fortran 90 interface and Fortran testers will not be built. Also, many testers will not be able to check their results – they will print an error message, e.g.:

magma/testing> ./testing_dgehrd -N 100 -c
...
Cannot check results: dhst01_ unavailable, since there was no Fortran compiler.
  100     ---   (  ---  )      0.70 (   0.00)   0.00e+00        0.00e+00   ok
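
Commenting out FORT can be done by hand or scripted. The sketch below uses a hypothetical helper, comment_out_var, which is not part of MAGMA; it assumes the variable is assigned at the start of a line in make.inc using =, :=, ?=, or +=.

```shell
# Hypothetical helper: comment out the assignment of a make variable.
# Usage: comment_out_var FORT make.inc
comment_out_var() {
    # -i.bak edits in place and keeps a backup copy (<file>.bak);
    # this spelling is accepted by both GNU and BSD sed.
    sed -i.bak "s/^\([[:space:]]*$1[[:space:]]*[:?+]\{0,1\}=\)/#\1/" "$2"
}
```

The same helper could be used to comment out FPIC, as described under Building static libraries.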

Building shared libraries

By default, all make.inc files (except ATLAS) add the -fPIC option to CFLAGS, FFLAGS, F90FLAGS, and NVCCFLAGS, required for building a shared library. Note in NVCCFLAGS that -fPIC is passed via the -Xcompiler option. Running:

make

or

make lib
make test
make sparse-lib
make sparse-test

will create shared libraries:

lib/libmagma.so
lib/libmagma_sparse.so

and static libraries:

lib/libmagma.a
lib/libmagma_sparse.a

and testing drivers in testing and sparse-iter/testing.

The current exception is for ATLAS, in make.inc.atlas, which in our install is a static library, thus requiring MAGMA to be a static library.

Building static libraries

Static libraries are always built along with the shared libraries above. Alternatively, comment out FPIC in your make.inc file to compile only a static library. Then, running:

make

will create static libraries:

lib/libmagma.a
lib/libmagma_sparse.a

and testing drivers in testing and sparse-iter/testing.

Installation

To install libraries and include files in a given prefix, run:

make install prefix=/usr/local/magma

The default prefix is /usr/local/magma. You can also set prefix in make.inc. This installs MAGMA libraries in ${prefix}/lib, MAGMA header files in ${prefix}/include, and ${prefix}/lib/pkgconfig/magma.pc for pkg-config.
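
Once installed, the magma.pc file can supply compile and link flags to your own build. A minimal sketch, assuming the default prefix and that pkg-config is installed on your system:

```shell
# Assumes MAGMA was installed with the default prefix; change to match yours.
prefix=/usr/local/magma

# Prepend the MAGMA pkgconfig directory, keeping any existing search path.
export PKG_CONFIG_PATH="$prefix/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"

# Query the flags only if pkg-config is present and can find magma.pc.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists magma; then
    magma_cflags=$(pkg-config --cflags magma)
    magma_libs=$(pkg-config --libs magma)
    echo "compile with: $magma_cflags"
    echo "link with:    $magma_libs"
else
    echo "magma.pc not found under $prefix/lib/pkgconfig" >&2
fi
```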

Tuning

You can modify the blocking factors for the algorithms of interest in control/get_nb.cpp.

Performance results are included in results/vA.B.C/cudaX.Y-zzz/*.txt for MAGMA version A.B.C, CUDA version X.Y, and GPU zzz.