MAGMA GEMM Sources for Fermi Released

The MAGMA BLAS SGEMM and DGEMM sources for Fermi GPUs are now released.
These improved GEMMs, developed by Rajib Nath and Stan Tomov, will be
part of the upcoming MAGMA 0.3 library release and will be included in
CUBLAS 3.2 as well.

The basic algorithm is described in:
Nath, R., Tomov, S., Dongarra, J. "An Improved MAGMA GEMM for Fermi GPUs,"
University of Tennessee Computer Science Technical Report, UT-CS-10-655
(also LAPACK working note 227), July 29, 2010.

On a C2050 GPU the new DGEMM reaches up to 300 GFlop/s (58% of peak) and
the SGEMM up to 645 GFlop/s (63% of peak). On a GTX480, DGEMM reaches up to
166 GFlop/s and SGEMM up to 844 GFlop/s.
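
As a rough check on the figures above, the sketch below uses the standard 2*m*n*k operation count for GEMM to turn a run time into GFlop/s, and back-computes the peak rates implied by the quoted percentages. The function names are illustrative only, and the implied peaks are derived from the numbers in this announcement, not taken from a hardware data sheet.

```python
def gemm_gflops(m, n, k, seconds):
    """GFlop/s for C = alpha*A*B + beta*C, counting 2*m*n*k floating-point operations."""
    return 2.0 * m * n * k / seconds / 1e9


def implied_peak(gflops, fraction_of_peak):
    """Peak rate implied by an achieved rate and its stated fraction of peak."""
    return gflops / fraction_of_peak


# C2050 figures quoted in the announcement.
dgemm_peak = implied_peak(300.0, 0.58)
sgemm_peak = implied_peak(645.0, 0.63)

print(f"implied DGEMM peak on C2050: {dgemm_peak:.0f} GFlop/s")
print(f"implied SGEMM peak on C2050: {sgemm_peak:.0f} GFlop/s")
```

No percentages are quoted for the GTX480, so no peak is derived for it here.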

The sources are available for download in the Software section of the web site.

MAGMA tutorial at SAAHPC

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and the DPLASMA and StarPU Schedulers, by:

Stanimire Tomov, George Bosilca, and Cédric Augonnet

Learn how to develop numerical software for heterogeneous architectures of Multicore and GPUs through a hybridization methodology that is built on:
  • Representing algorithms as collections of tasks and data dependencies, and
  • Properly scheduling the tasks' execution over the available multicore and GPU hardware components.
Examples will be given from the Matrix Algebra on GPU and Multicore Architectures (MAGMA) project, which aims to develop a new generation of linear algebra libraries that extend the sequential LAPACK-style algorithms to highly parallel, heterogeneous GPU and multicore architectures. In addition to its stand-alone hybrid algorithms, MAGMA provides hybrid kernels that serve as building blocks in tile and "communication-avoiding" algorithms, which must be scheduled efficiently.

You will learn how to use dynamic schedulers to express these new algorithms easily while extracting high performance from heterogeneous systems of multicore and GPUs. In particular, we will consider the DPLASMA and StarPU schedulers. DPLASMA is related to the Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) project but extends its operation to the distributed-memory regime, while StarPU is a runtime system that specializes in scheduling tasks onto accelerator-based platforms.
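
To make the two ingredients above concrete, here is a minimal sketch of a tile algorithm expressed as tasks with data dependencies, followed by a toy scheduler. It is an illustration under stated assumptions and not the DPLASMA or StarPU API: tile Cholesky is chosen here as the example algorithm, all function names are hypothetical, and the rule that sends POTRF to the CPU and everything else to the GPU is a placeholder for a real scheduling policy.

```python
from collections import defaultdict


def tile_cholesky_tasks(nt):
    """Tasks of a tile Cholesky factorization on an nt x nt tile grid.

    Each task is (name, tiles_read, tiles_updated_in_place).
    """
    tasks = []
    for k in range(nt):
        tasks.append((f"POTRF({k})", [], [(k, k)]))
        for i in range(k + 1, nt):
            tasks.append((f"TRSM({i},{k})", [(k, k)], [(i, k)]))
        for i in range(k + 1, nt):
            tasks.append((f"SYRK({i},{k})", [(i, k)], [(i, i)]))
            for j in range(k + 1, i):
                tasks.append((f"GEMM({i},{j},{k})", [(i, k), (j, k)], [(i, j)]))
    return tasks


def build_dependencies(tasks):
    """Infer task dependencies from the sequential order of tile accesses."""
    last_writer = {}
    readers = defaultdict(list)  # readers of each tile since its last write
    deps = {}
    for name, reads, writes in tasks:
        d = set()
        for tile in reads:
            if tile in last_writer:
                d.add(last_writer[tile])      # read after write
        for tile in writes:
            if tile in last_writer:
                d.add(last_writer[tile])      # in-place update after write
            d.update(readers[tile])           # write after read
        d.discard(name)
        deps[name] = d
        for tile in reads:
            readers[tile].append(name)
        for tile in writes:
            last_writer[tile] = name
            readers[tile] = []
    return deps


def schedule(deps):
    """Group tasks into waves; tasks in one wave have no mutual dependencies."""
    done, waves = set(), []
    pending = dict(deps)
    while pending:
        ready = sorted(n for n, d in pending.items() if d <= done)
        if not ready:
            raise ValueError("cyclic dependencies")
        # Placeholder policy (assumption): factor diagonal tiles on the CPU,
        # run the remaining updates on the GPU.
        waves.append([(n, "cpu" if n.startswith("POTRF") else "gpu") for n in ready])
        done.update(ready)
        for n in ready:
            del pending[n]
    return waves


deps = build_dependencies(tile_cholesky_tasks(3))
waves = schedule(deps)
for step, wave in enumerate(waves):
    print(step, wave)
```

A real runtime would not wait for a whole wave to finish. It would release each task as soon as its own dependencies complete and pick the device from performance models and data locality, which is the part that a dynamic scheduler handles.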

Tutorial presentations:

MAGMA Library

Major chip manufacturers are developing next-generation microprocessor designs that are heterogeneous/hybrid in nature, integrating homogeneous x86-based multicore CPU components and GPU components. The goal of the MAGMA (Matrix Algebra on GPU and Multicore Architectures) project is to develop innovative linear algebra algorithms and to incorporate them into a library that is

• similar to LAPACK in functionality, data storage, and interface

but targets the

• next generation of highly parallel, heterogeneous processors.


U of Tennessee Named CUDA Center of Excellence

NVIDIA Corp. today recognized the University of Tennessee, Knoxville's (UTK's) Innovative Computing Laboratory (ICL) as a CUDA Center of Excellence, noting its adoption of the CUDA programming model in its curriculum, as well as its pioneering research into the development of linear algebra libraries for the high-performance computing community.



Why You Should Touch MAGMA

Hiding the details of the multi-core and GP-GPU hardware is a really cool goal. Read the full article at Linux Magazine.
