News
MAGMA 1.0 RC5 Released (updated on April 14th, 2011)
2011-04-14

MAGMA 1.0 RC5 is now available. This release includes the MAGMA sources! MAGMA 1.0 RC5 is intended for a single CUDA-enabled NVIDIA GPU. It extends version 0.2 by adding support for Fermi GPUs (see the sample performance results for LU, QR, and Cholesky factorizations and LS solvers in complex arithmetic). For more details, see the RC5 Release Notes and the MAGMA 1.0 presentation.

Included are routines for the following algorithms:

  • LU, QR, and Cholesky factorizations in both real and complex arithmetic (single and double);
  • Hessenberg, bidiagonal, and tridiagonal reductions in both real and complex arithmetic (single and double);
  • Linear solvers based on LU, QR, and Cholesky in both real and complex arithmetic (single and double);
  • Eigenvalue and singular value problem solvers in both real and complex arithmetic (single and double);
  • Mixed-precision iterative refinement solvers based on LU, QR, and Cholesky in both real and complex arithmetic;
  • MAGMA BLAS in real arithmetic (single and double), including gemm, gemv, symv, and trsm.
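
To illustrate the idea behind the mixed-precision iterative refinement solvers listed above, here is a minimal pure-Python sketch based on Cholesky. It is not MAGMA code and does not use the MAGMA API: all function names are hypothetical, single precision is simulated by rounding through the struct module, and the matrix is assumed to be symmetric positive definite. The factorization and the triangular solves run in (simulated) single precision, while the residual and the solution update are kept in double precision.

```python
import math
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cholesky_f32(a):
    """Lower Cholesky factor of an SPD matrix, rounding each operation to float32."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = f32(a[j][j])
        for k in range(j):
            s = f32(s - f32(l[j][k] * l[j][k]))
        l[j][j] = f32(math.sqrt(s))
        for i in range(j + 1, n):
            s = f32(a[i][j])
            for k in range(j):
                s = f32(s - f32(l[i][k] * l[j][k]))
            l[i][j] = f32(s / l[j][j])
    return l


def solve_with_factor(l, b):
    """Solve (L L^T) x = b by forward and backward substitution in float32."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = f32(b[i])
        for k in range(i):
            s = f32(s - f32(l[i][k] * y[k]))
        y[i] = f32(s / l[i][i])
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i]
        for k in range(i + 1, n):
            s = f32(s - f32(l[k][i] * x[k]))
        x[i] = f32(s / l[i][i])
    return x


def residual(a, x, b):
    """Residual r = b - A x, computed in double precision."""
    n = len(x)
    return [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(len(b))]


def refine_solve(a, b, tol=1e-12, max_iter=20):
    """Factor once in low precision, then refine the solution in double precision."""
    l = cholesky_f32(a)               # expensive step, done in float32
    x = solve_with_factor(l, b)       # initial low-precision solution
    for _ in range(max_iter):
        r = residual(a, x, b)         # residual in double precision
        if max(abs(v) for v in r) <= tol:
            break
        d = solve_with_factor(l, r)   # correction reuses the float32 factor
        x = [xi + di for xi, di in zip(x, d)]
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
    b = [1.0, 2.0, 3.0]
    x = refine_solve(a, b)
    print(x, max(abs(v) for v in residual(a, x, b)))
```

The point of the technique is that the costly factorization runs at the faster single-precision rate, and a few cheap refinement steps recover accuracy close to double precision when the matrix is reasonably well conditioned.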

See the Software section for a download link.


2010-11-15

2010-10-19

MAGMA GEMM Sources for Fermi Released
2010-08-04

The MAGMA BLAS SGEMM and DGEMM sources for Fermi GPUs are now released.
These improved GEMMs, developed by Rajib Nath and Stan Tomov, will be
part of the upcoming MAGMA 0.3 library release and will be included in
CUBLAS 3.2 as well.

The basic algorithm is described in:
Nath, R., Tomov, S., Dongarra, J. "An Improved MAGMA GEMM for Fermi GPUs,"
University of Tennessee Computer Science Technical Report, UT-CS-10-655
(also LAPACK working note 227), July 29, 2010.
http://icl.cs.utk.edu/projectsfiles/magma/pubs/fermi_gemm.pdf

On a C2050 GPU the new DGEMM gets up to 300 GFlop/s (58% of peak) and
the SGEMM up to 645 GFlop/s (63% of peak). On a GTX480 DGEMM gets up to
166 GFlop/s and SGEMM up to 844 GFlop/s.
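
The percent-of-peak figures for the C2050 can be reproduced with a short calculation. The sketch below assumes theoretical peaks of 515 GFlop/s (double precision) and 1030 GFlop/s (single precision) for the C2050. These peak values are not stated in this announcement and are an assumption here; the function name is hypothetical.

```python
# Assumed theoretical peaks for the C2050 in GFlop/s (not from the announcement).
ASSUMED_PEAK_GFLOPS = {"double": 515.0, "single": 1030.0}


def percent_of_peak(achieved_gflops, precision):
    """Return achieved performance as a rounded percentage of the assumed peak."""
    return round(100.0 * achieved_gflops / ASSUMED_PEAK_GFLOPS[precision])


dgemm_pct = percent_of_peak(300.0, "double")
sgemm_pct = percent_of_peak(645.0, "single")
print(f"DGEMM: {dgemm_pct}% of peak, SGEMM: {sgemm_pct}% of peak")
```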

The sources are available for download in the Software section of the web site.


MAGMA tutorial at SAAHPC
2010-07-10

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and the DPLASMA and StarPU Schedulers, by:

Stanimire Tomov, George Bosilca, and Cédric Augonnet

Learn how to develop numerical software for heterogeneous architectures of Multicore and GPUs through a hybridization methodology that is built on:
  • Representing algorithms as collections of tasks and data dependencies, and
  • Properly scheduling the tasks' execution over the available multicore and GPU hardware components.
Examples will be given from the Matrix Algebra on GPU and Multicore Architectures (MAGMA) project, which aims to develop a new generation of linear algebra libraries that extend the sequential LAPACK-style algorithms to highly parallel, heterogeneous GPU and multicore architectures. In addition to stand-alone hybrid algorithms, MAGMA provides hybrid kernels that serve as building blocks in tile and "communication-avoiding" algorithms, which must be scheduled efficiently. You will learn how to use dynamic schedulers to express these new algorithms easily while extracting high performance from heterogeneous systems of multicore and GPUs. In particular, we will consider the DPLASMA and StarPU schedulers. DPLASMA is related to the Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) project but extends its operation to the distributed-memory regime, while StarPU is a runtime system that specializes in scheduling tasks onto accelerator-based platforms.
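
The two ingredients of the methodology, tasks with data dependencies and a scheduler that runs whatever is ready, can be sketched in a few lines of Python. This is only an illustration and uses neither the DPLASMA nor the StarPU API: the task names, the tile Cholesky task list, and the static CPU/GPU placement rule are all assumptions made for the example. Dependencies are inferred from which tile each task reads and writes (write-after-read hazards are ignored, which is safe for this particular task list).

```python
from graphlib import TopologicalSorter


def tile_cholesky_tasks(nt):
    """Yield (name, tiles_read, tile_written) for a tile Cholesky on an nt x nt grid."""
    for k in range(nt):
        yield (f"POTRF({k})", [], (k, k))
        for i in range(k + 1, nt):
            yield (f"TRSM({i},{k})", [(k, k)], (i, k))
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                if i == j:
                    yield (f"SYRK({i},{j},{k})", [(i, k)], (i, j))
                else:
                    yield (f"GEMM({i},{j},{k})", [(i, k), (j, k)], (i, j))


def build_graph(tasks):
    """Map each task to its predecessors: the last writer of every tile it touches."""
    last_writer = {}
    deps = {}
    for name, reads, write in tasks:
        tiles = list(reads) + [write]  # the written tile is updated in place
        deps[name] = {last_writer[t] for t in tiles if t in last_writer}
        last_writer[write] = name
    return deps


def schedule_in_waves(deps):
    """Group tasks into waves; tasks in one wave are independent of each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def place(name):
    """Hypothetical static placement: small panel tasks on CPU, updates on GPU."""
    return "cpu" if name.startswith("POTRF") else "gpu"


if __name__ == "__main__":
    graph = build_graph(tile_cholesky_tasks(3))
    for step, wave in enumerate(schedule_in_waves(graph)):
        print(step, [(task, place(task)) for task in wave])
```

A real runtime makes the placement decision dynamically, for example from measured kernel timings, rather than from a fixed rule like the one above; the sketch only shows how a task graph exposes which kernels may run concurrently.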

Tutorial presentations:
