MAGMA GEMM Sources for Fermi Released

Post by admin » Wed Aug 04, 2010 12:54 pm

The MAGMA BLAS SGEMM and DGEMM sources for Fermi GPUs are now released.
These improved GEMMs, developed by Rajib Nath and Stan Tomov, will be
part of the upcoming MAGMA 0.3 library release and will be included in
CUBLAS 3.2 as well.

The basic algorithm is described in:
Nath, R., Tomov, S., Dongarra, J. "An Improved MAGMA GEMM for Fermi GPUs,"
University of Tennessee Computer Science Technical Report, UT-CS-10-655
(also LAPACK working note 227), July 29, 2010.
http://icl.cs.utk.edu/projectsfiles/mag ... i_gemm.pdf

On a C2050 GPU the new DGEMM reaches up to 300 GFlop/s (58% of peak) and
the SGEMM up to 645 GFlop/s (63% of peak). On a GTX480, DGEMM reaches up to
166 GFlop/s and SGEMM up to 844 GFlop/s.
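
As a quick sanity check on the C2050 percentages, the sketch below recomputes the fraction of peak. The peak figures are not stated in this post; they are an assumption based on the C2050's published 448 CUDA cores at 1.15 GHz, counting one fused multiply-add (2 flops) per core per cycle in single precision and half that rate in double precision. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Theoretical peak in GFlop/s: cores * clock (GHz) * flops per core per cycle.
// Assumed Tesla C2050 figures: 448 cores, 1.15 GHz, 2 flops/cycle (FMA) in
// single precision and 1 flop/cycle in double precision.
double peak_gflops(int cores, double clock_ghz, double flops_per_cycle) {
    return cores * clock_ghz * flops_per_cycle;
}

// Achieved rate as a percentage of peak, rounded to the nearest integer.
int percent_of_peak(double achieved_gflops, double peak) {
    assert(peak > 0.0);
    return static_cast<int>(std::lround(100.0 * achieved_gflops / peak));
}
```

Under these assumed peaks (515.2 GFlop/s double precision, 1030.4 GFlop/s single precision), 300 GFlop/s and 645 GFlop/s round to 58% and 63%, which agrees with the figures quoted above.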

Attachment: magmablas_gemm_fermi.tar.gz (9.95 KiB)

Post by mbibby » Thu Aug 05, 2010 10:04 am

When will we see the cgemm and zgemm equivalents?

Malcolm

Post by Stan Tomov » Thu Aug 05, 2010 12:32 pm

I am not sure whether we will write the equivalents ourselves. NVIDIA is preparing CUBLAS 3.2,
which will have improved c/z gemms using ideas from the s/d gemms.
Stan

Post by Boxed Cylon » Fri Aug 13, 2010 2:26 am

I preface this post with the declaration that I know next to nothing about the details of these routines...

I was looking through the fermi_sgemm.cu routine to get some sense of how the code was engineered. I noticed the __mul24 function and wondered what it did. A Google search turned up the Fermi Tuning Guide, which says:

32-Bit Integer Multiplication
On devices of compute capability 1.x, 32-bit integer multiplication is implemented using multiple instructions as it is not natively supported. 24-bit integer multiplication is natively supported via the __[u]mul24 intrinsic.

On devices of compute capability 2.0, however, 32-bit integer multiplication is natively supported, but 24-bit integer multiplication is not. __[u]mul24 is therefore implemented using multiple instructions and should not be used (Section 5.4.1).


Should the fermi_sgemm.cu routine be using __mul24? (Or perhaps there are reasons 24-bit integers are employed?)
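
To make the question concrete, here is a minimal host-side sketch of what the unsigned intrinsic computes, assuming the semantics given in the CUDA documentation for __umul24 (the low 32 bits of the product of the low 24 bits of each operand). The helper names are hypothetical and this is an emulation in plain C++, not the device code from fermi_sgemm.cu.

```cpp
#include <cassert>
#include <cstdint>

// Host-side emulation of the documented __umul24 semantics (hypothetical
// helper, not a CUDA API): multiply the low 24 bits of each operand and
// keep the low 32 bits of the product.
std::uint32_t umul24_emulated(std::uint32_t x, std::uint32_t y) {
    const std::uint64_t lo_x = x & 0x00FFFFFFu;
    const std::uint64_t lo_y = y & 0x00FFFFFFu;
    return static_cast<std::uint32_t>(lo_x * lo_y);
}

// Plain 32-bit multiply, the operation that is native on compute capability 2.0.
std::uint32_t mul32(std::uint32_t x, std::uint32_t y) {
    return x * y;  // unsigned arithmetic wraps modulo 2^32
}

// True when both operands fit in 24 bits, the case in which the two agree.
bool fits_in_24_bits(std::uint32_t x, std::uint32_t y) {
    return (x >> 24) == 0 && (y >> 24) == 0;
}

// Asserts that the 24-bit and 32-bit multiplies agree for small operands.
void check_agreement(std::uint32_t x, std::uint32_t y) {
    assert(fits_in_24_bits(x, y));
    assert(umul24_emulated(x, y) == mul32(x, y));
}
```

For index-sized operands such as 1000 and 4096 both functions return 4096000, so swapping in an ordinary multiply gives the same result. They diverge only when an operand needs more than 24 bits: for 1 << 24 and 3 the emulated 24-bit multiply returns 0 while the 32-bit multiply returns 50331648.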

Post by Stan Tomov » Tue Sep 07, 2010 1:15 pm

There is no reason to use __mul24. We will remove it. Thanks for pointing this out.

Post by Allan Menezes » Sun Sep 12, 2010 11:51 pm

Dear Stan,
Since this is just pointer arithmetic and is used in only a few places, it does not change the performance much at all, as my experiment below shows.
Just for fun, I added a single #define at the top of fermi_dgemm.cu and fermi_sgemm.cu, #define __mul24(a,b) ((a)*(b)), and there was no significant difference in GFlop/s; the error was still 0.00 on a GTX-480.
Device memory on currently available Fermi devices is still under 4 GB; this is going to change to 64-bit addresses in the future with the Tesla C2070 and CUDA 3.2.
Thank you,
Allan
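
For readers who want to see the substitution in isolation, here is a minimal sketch of the macro applied to an index computation. The macro is the one from the post; the functions and their parameters (block_index, tile_width, lane) are hypothetical stand-ins for blockIdx.x, a tile size and threadIdx.x, and are not taken from fermi_sgemm.cu. Names beginning with a double underscore are reserved for the implementation, so outside of a quick experiment like this a differently named wrapper would be cleaner.

```cpp
#include <cassert>

// The one-line substitution from the post: route every __mul24 call to an
// ordinary multiply. The parentheses keep operator precedence intact when
// an argument is itself an expression.
#define __mul24(a, b) ((a) * (b))

// Hypothetical offset of a thread's element within a tiled layout.
int tile_offset(int block_index, int tile_width, int lane) {
    assert(block_index >= 0 && tile_width > 0 && lane >= 0);
    return __mul24(block_index, tile_width) + lane;
}

// Same computation for the following tile; the argument "block_index + 1"
// expands to ((block_index + 1) * (tile_width)) thanks to the parentheses.
int next_tile_offset(int block_index, int tile_width, int lane) {
    return __mul24(block_index + 1, tile_width) + lane;
}
```

With block_index 3, tile_width 64 and lane 5, tile_offset gives 197 and next_tile_offset gives 261. Without the parentheses in the macro, the second expression would expand to block_index + 1 * tile_width and produce a wrong offset.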

Post by rramachand21 » Tue Nov 30, 2010 5:05 pm

Hello,

I am new to CUDA and this API. Could I please get generic source code for matrix-vector multiplication (sgemv and dgemv)?

Thanks,
Ranjith