Why "magma_dsyevd" performs better with parallel MKL?

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
xinwu
Posts: 8
Joined: Fri Jun 24, 2011 9:22 am

Why "magma_dsyevd" performs better with parallel MKL?

Post by xinwu » Thu Jun 30, 2011 9:05 am

Hi, everyone!

I succeeded in compiling MAGMA. But in my tests, the parallel-linked "testing_dsyevd" is faster on the GPU than the sequentially linked binary. Why is that? Does "magma_dsyevd" have something to run on the CPU?

Code: Select all

#
# this is a sequential linked binary
#
./testing_dsyevd -N 4000
device 0: Tesla C2070, 1147.0 MHz clock, 5375.2 MB memory
  testing_dsyevd -N 4000



  N     CPU Time(s)    GPU Time(s)     ||R||_F / ||A||_F
==========================================================
 4000      29.51          11.62         4.113991e-16 2.838989e-13
#
# this is a parallel linked binary
#
./testing_dsyevd -N 4000
device 0: Tesla C2070, 1147.0 MHz clock, 5375.2 MB memory
  testing_dsyevd -N 4000



  N     CPU Time(s)    GPU Time(s)     ||R||_F / ||A||_F
==========================================================
 4000       9.60           7.45         2.607371e-16 4.292615e-13

The parallel link line was:

Code: Select all

-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
The sequential link line was:

Code: Select all

-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread

xinwu
Posts: 8
Joined: Fri Jun 24, 2011 9:22 am

Re: Why "magma_dsyevd" performs better with parallel MKL?

Post by xinwu » Thu Jun 30, 2011 9:55 am

I took a look at the source code, and I now understand that "magma_dsyevd" is a hybrid function that runs on both the CPU and the GPU, so the MKL linking options affect its performance.
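
A quick way to confirm the CPU contribution is to keep the parallel-linked binary and limit MKL's thread count at run time (MKL_NUM_THREADS is a standard MKL environment variable, not a MAGMA option; the thread counts below are just examples for this machine):

Code: Select all

# restrict the threaded MKL to one CPU thread; this should behave
# much like the sequentially linked binary
MKL_NUM_THREADS=1 ./testing_dsyevd -N 4000

# allow several CPU threads for the host part of the hybrid algorithm
MKL_NUM_THREADS=8 ./testing_dsyevd -N 4000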

Stan Tomov
Posts: 263
Joined: Fri Aug 21, 2009 10:39 pm

Re: Why "magma_dsyevd" performs better with parallel MKL?

Post by Stan Tomov » Mon Jul 04, 2011 3:07 pm

Hi,
Actually, most of the MAGMA algorithms are hybrid.
In particular, for the dsyevd algorithm the most time-consuming part is the reduction to tridiagonal form (dsytrd). dsytrd becomes memory bound for large matrices (e.g., above ~2048), so MAGMA's dsytrd calls the CPU dsytrd (e.g., from MKL) for small matrices and switches to the hybrid code for larger ones.
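
To see where the switch happens on a given system, you can simply run the tester below and above the crossover point and compare the timings (the ~2048 figure above is only a rough guide and depends on the release and the hardware):

Code: Select all

# below the crossover: the reduction uses the CPU (e.g., MKL) dsytrd
./testing_dsyevd -N 1500

# above the crossover: the hybrid CPU+GPU dsytrd kicks in
./testing_dsyevd -N 4000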
Stan
