http://icl.cs.utk.edu/magma/software/

Additionally, we have a survey for feedback on MAGMA, LAPACK, and other dense linear algebra libraries.

https://www.surveymonkey.com/r/2016DenseLinearAlgebra

This release includes a major interface change for all MAGMA BLAS functions; most higher-level functions, such as magma_zgetrf, keep their existing interface. Significant changes:

- Added a queue argument to magmablas routines and deprecated magmablas{Set,Get}KernelStream. This resolves a thread-safety issue caused by relying on the global magmablas{Set,Get}KernelStream setting.
- Fixed bugs related to relying on CUDA NULL stream implicit synchronization.
- Fixed memory leaks (zunmqr_m, zheevdx_2stage, etc.). Added -DDEBUG_MEMORY option to catch leaks.
- Fixed geqrf*_gpu bugs for m == nb, n >> m (e.g., -N 64,10000) and for m >> n, n == nb+i (e.g., -N 10000,129).
- Fixed zunmql2_gpu for rectangular sizes.
- Fixed zhegvdx_m itype 3.
- Added zunglq, zungbr, zgeadd2 (which takes both alpha and beta).
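
To illustrate what "takes both alpha and beta" means for zgeadd2, here is a minimal CPU sketch in plain C. It assumes the operation is B = alpha*A + beta*B on column-major matrices with leading dimensions, as is usual for BLAS-style routines. The name ref_zgeadd2 and its argument order are hypothetical; this is not the MAGMA routine, which runs on the GPU and, per the first item above, takes a queue argument.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical CPU reference, not MAGMA code.
 * Assumed semantics: B = alpha*A + beta*B for m-by-n column-major
 * matrices A (leading dimension lda) and B (leading dimension ldb). */
void ref_zgeadd2(int m, int n,
                 double complex alpha, const double complex *A, int lda,
                 double complex beta, double complex *B, int ldb)
{
    assert(m >= 0 && n >= 0);
    assert(lda >= (m > 0 ? m : 1));
    assert(ldb >= (m > 0 ? m : 1));

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            size_t ia = (size_t)i + (size_t)j * (size_t)lda;
            size_t ib = (size_t)i + (size_t)j * (size_t)ldb;
            /* Scale both operands, then accumulate into B. */
            B[ib] = alpha * A[ia] + beta * B[ib];
        }
    }
}
```

Under this assumption, beta = 1 gives a plain scaled add (B += alpha*A), and beta = 0 overwrites B with alpha*A.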

MAGMA sparse

- Added QMR, TFQMR, preconditioned TFQMR
- Added CGS, preconditioned CGS
- Added kernel-fused versions for CGS/PCGS, QMR, TFQMR/PTFQMR
- Changed relative stopping criterion to be relative to RHS
- Fixed bug in complex version of CG
- Accelerated version of Jacobi-CG
- Added very efficient IDR
- Performance tuning for SELLP SpMV
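
As a rough illustration of a stopping criterion that is relative to the RHS, the sketch below runs unpreconditioned CG on a small dense symmetric positive definite matrix and stops once ||r||_2 <= rtol * ||b||_2. The choice of the 2-norm, the dense row-major storage, and the names cg_rel_rhs, rtol, and maxiter are assumptions made for this example; it is not the MAGMA sparse implementation.

```c
#include <assert.h>
#include <stdlib.h>

static double dot(int n, const double *x, const double *y)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += x[i] * y[i];
    return s;
}

/* y = A*x for a dense n-by-n row-major matrix A. */
static void matvec(int n, const double *A, const double *x, double *y)
{
    for (int i = 0; i < n; ++i) y[i] = dot(n, A + (size_t)i * n, x);
}

/* Hypothetical sketch: CG that stops when ||r||_2 <= rtol * ||b||_2.
 * x holds the initial guess on entry and the approximate solution on exit.
 * Returns the iteration count, or -1 on allocation failure or if the
 * criterion is not met within maxiter iterations. */
int cg_rel_rhs(int n, const double *A, const double *b, double *x,
               double rtol, int maxiter)
{
    double *r = malloc((size_t)n * sizeof *r);
    double *p = malloc((size_t)n * sizeof *p);
    double *Ap = malloc((size_t)n * sizeof *Ap);
    if (!r || !p || !Ap) { free(r); free(p); free(Ap); return -1; }

    matvec(n, A, x, Ap);
    for (int i = 0; i < n; ++i) { r[i] = b[i] - Ap[i]; p[i] = r[i]; }

    /* Compare squared norms to avoid square roots. */
    double bb = dot(n, b, b);
    if (bb == 0.0) bb = 1.0;          /* b = 0: fall back to an absolute test */
    double limit = rtol * rtol * bb;

    double rr = dot(n, r, r);
    int it = 0;
    int converged = (rr <= limit);
    while (!converged && it < maxiter) {
        matvec(n, A, p, Ap);
        double alpha = rr / dot(n, p, Ap);
        for (int i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rr_new = dot(n, r, r);
        ++it;
        converged = (rr_new <= limit);
        double beta = rr_new / rr;
        for (int i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }

    free(r); free(p); free(Ap);
    return converged ? it : -1;
}
```

Measuring against ||b|| rather than the initial residual makes the tolerance independent of the initial guess: a good starting vector no longer forces the solver to reduce an already small residual by the full factor rtol.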