MAGMA 2.5.1-alpha1

MAGMA 2.5.1 Alpha is now released. Updates include:

  • Updated CMakeLists.txt for friendlier CMake and Spack installations;
  • Fixes related to MAGMA installation on GPUs and CUDA versions that do not support FP16 arithmetic;
  • Added support for Turing GPUs;
  • Removed some C++ features from MAGMA Sparse for friendlier compilation (using nvcc and various CPU compilers).
magma-2.5.1-alpha1.tar.gz   Download View License

MAGMA 2.5.0

MAGMA 2.5.0 is now released. Updates include:

  • New routines: MAGMA now includes an NVIDIA Tensor Cores version of its mixed-precision linear solver, which provides an FP64 solution with up to 4X speedup by using the fast FP16 Tensor Cores arithmetic. The release includes:
    magma_dhgesv_iteref_gpu (FP64-FP16 solver with FP64 input and solution);
    magma_dsgesv_iteref_gpu (FP64-FP32 solver with FP64 input and solution);
    magma_hgetrf_gpu        (mixed precision FP32-FP16 LU factorization);
    magma_htgetrf_gpu       (mixed precision FP32-FP16 LU factorization using Tensor Cores).
    Further details for the function names and the testing routines are given in file:
  • New routine: magmablas_Xgemm_batched_strided (X = {s, d, c, z}) is the stride-based variant of magmablas_Xgemm_batched;
  • New routine: magma_Xgetrf_native (X = {s, d, c, z}) performs the LU factorization with partial pivoting using the GPU only. It has the same interface as the hybrid (CPU+GPU) implementation provided by magma_Xgetrf_gpu. Testing the performance of this routine is possible through running testing_Xgetrf_gpu with the option (--version 3);
  • New routine: magma_Xpotrf_native (X = {s, d, c, z}) performs the Cholesky factorization using the GPU only. It has the same interface as the hybrid (CPU+GPU) implementation provided by magma_Xpotrf_gpu.
    Testing the performance of this routine is possible through running testing_Xpotrf_gpu with the option (--version 2);
  • Added benchmark for GEMM in FP16 arithmetic (HGEMM) as well as auxiliary functions to cast matrices from FP32 to FP16 storage (magmablas_slag2h) and from FP16 to FP32 (magmablas_hlag2s).
magma-2.5.0.tar.gz   Download View License
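
The mixed-precision solvers listed above are based on iterative refinement: factor the matrix once in a low precision, then correct the solution using residuals computed in FP64. The following is a minimal CPU-only sketch of that idea in plain C, using FP32 in place of FP16 and a fixed 3x3 size. It is not MAGMA code and does not call the MAGMA API; the names refine_solve, lu_factor_f32, and lu_solve_f32, the size N, and the stopping rule are hypothetical choices made for illustration.

```c
#include <assert.h>

#define N 3 /* illustrative fixed problem size (assumption) */

static float  absf(float v)  { return v < 0.0f ? -v : v; }
static double absd(double v) { return v < 0.0  ? -v : v; }

/* In-place LU factorization with partial pivoting, in single precision.
   Returns 0 on success, -1 if an exactly zero pivot is found. */
static int lu_factor_f32(float a[N][N], int piv[N]) {
    for (int k = 0; k < N; ++k) {
        int p = k;
        for (int i = k + 1; i < N; ++i)
            if (absf(a[i][k]) > absf(a[p][k])) p = i;
        if (a[p][k] == 0.0f) return -1;
        piv[k] = p;
        if (p != k)
            for (int j = 0; j < N; ++j) {   /* swap full rows k and p */
                float t = a[k][j]; a[k][j] = a[p][j]; a[p][j] = t;
            }
        for (int i = k + 1; i < N; ++i) {
            a[i][k] /= a[k][k];             /* multiplier, stored in L */
            for (int j = k + 1; j < N; ++j)
                a[i][j] -= a[i][k] * a[k][j];
        }
    }
    return 0;
}

/* Solve (P^T L U) v_out = v_in in single precision; v is overwritten. */
static void lu_solve_f32(float lu[N][N], const int piv[N], float v[N]) {
    for (int k = 0; k < N; ++k) {           /* apply the row swaps in order */
        float t = v[k]; v[k] = v[piv[k]]; v[piv[k]] = t;
    }
    for (int i = 1; i < N; ++i)             /* forward substitution, unit L */
        for (int j = 0; j < i; ++j) v[i] -= lu[i][j] * v[j];
    for (int i = N - 1; i >= 0; --i) {      /* back substitution with U */
        for (int j = i + 1; j < N; ++j) v[i] -= lu[i][j] * v[j];
        v[i] /= lu[i][i];
    }
}

/* Solve A x = b: FP32 factorization, FP64 residuals and updates.
   Returns the number of refinement steps taken, or -1 on failure
   (zero pivot or no convergence within max_iter steps). */
int refine_solve(double A[N][N], const double b[N], double x[N],
                 int max_iter, double tol) {
    assert(max_iter > 0);
    float lu[N][N], w[N];
    int piv[N];
    double bnorm = 0.0;

    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) lu[i][j] = (float)A[i][j];
    if (lu_factor_f32(lu, piv) != 0) return -1;

    for (int i = 0; i < N; ++i) {           /* initial low-precision solve */
        w[i] = (float)b[i];
        if (absd(b[i]) > bnorm) bnorm = absd(b[i]);
    }
    lu_solve_f32(lu, piv, w);
    for (int i = 0; i < N; ++i) x[i] = w[i];

    for (int it = 0; it < max_iter; ++it) {
        double r[N], rnorm = 0.0;
        for (int i = 0; i < N; ++i) {       /* residual r = b - A x in FP64 */
            r[i] = b[i];
            for (int j = 0; j < N; ++j) r[i] -= A[i][j] * x[j];
            if (absd(r[i]) > rnorm) rnorm = absd(r[i]);
        }
        if (rnorm <= tol * bnorm) return it;
        for (int i = 0; i < N; ++i) w[i] = (float)r[i];
        lu_solve_f32(lu, piv, w);           /* correction from FP32 factors */
        for (int i = 0; i < N; ++i) x[i] += w[i];
    }
    return -1;
}
```

The expensive step, the O(n^3) factorization, runs once in the low precision, while each refinement step costs only O(n^2). This is why a fast low-precision factorization can speed up an FP64 solve for reasonably well-conditioned matrices.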

MagmaDNN 0.2

MagmaDNN 0.2 is now available. MagmaDNN provides high-performance data analytics and machine learning tools using MAGMA as its computational backend. Updates in this release include:

  • Bug fixes and performance improvements;
  • Winograd convolutions to accelerate CNNs;
  • Hyperparameter optimization framework;
  • MNIST and CIFAR-10 benchmarks using MagmaDNN;
  • Performance comparisons, accuracy validations, etc. (with TensorFlow, Theano, and PyTorch).

More information on MagmaDNN 0.2 is given in this presentation.

MagmaDNN's repository is on Bitbucket:

magmadnn-0.2.0.tar.gz   Download View License

MAGMA 2.4.0

MAGMA 2.4.0 is now released. Updates include:

  • Added constrained least squares routines (magma_[sdcz]gglse) and dependencies:
    magma_zggrqf - generalized RQ factorization
    magma_zunmrq - multiply by orthogonal Q as returned by zgerqf
  • Performance improvements across many batch routines, including batched TRSM, batched LU, batched LU-nopiv, and batched Cholesky
  • Fixed some compilation issues with inf, nan, and nullptr.
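
For reference, the problem solved by the gglse routines can be written as follows. This assumes the MAGMA routines follow the convention of LAPACK's xGGLSE, which their names suggest but which the notes above do not state; the symbols A, B, c, d, m, n, and p are LAPACK's, not taken from this page.

```latex
\min_{x} \; \lVert c - A x \rVert_2
\quad \text{subject to} \quad B x = d,
\qquad A \in \mathbb{R}^{m \times n},\;
       B \in \mathbb{R}^{p \times n},\;
       p \le n \le m + p .
% LAPACK solves this through a generalized RQ factorization of (B, A):
B = \begin{pmatrix} 0 & R \end{pmatrix} Q, \qquad A = Z \, T \, Q ,
% with Q and Z orthogonal (unitary in the complex case) and R upper triangular.
```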


  • Changed how data from an external application is handled:
    There is now a clear distinction between memory allocated, used, and freed by MAGMA and memory owned by the user application. We added the functions magma_zvcopy and magma_zvpass, which do not allocate memory; instead, they copy values from/to application-allocated memory.
  • The examples (in example/example_sparse.c) demonstrate how these routines should be used.
magma-2.4.0.tar.gz   Download View License

MAGMA 2.3.0

MAGMA 2.3.0 is now released. Updates include:

  • Moved MAGMA's repository to Bitbucket:
  • Added support for Volta GPUs
  • Improved performance for batched LU and QR factorizations on small square sizes up to 32
  • Added test matrix generator to many testers
  • Launched Data Analytics tools (MagmaDNN 0.1 Alpha) using MAGMA as computational backend


  • Added support for CUDA 9.0
  • Improved the ParILUT algorithm w.r.t. stability and scalability
  • Added ParICT, a symmetry-exploiting version of the ParILUT algorithm
magma-2.3.0.tar.gz   Download View License

