Why "magma_dsyevd" performs better with parallel MKL?


Post by xinwu » Thu Jun 30, 2011 9:05 am

Hi, everyone!

I succeeded in compiling MAGMA. In my tests, however, the "testing_dsyevd" binary linked against parallel (threaded) MKL runs faster on the GPU than the binary linked against sequential MKL. Why is that? Does "magma_dsyevd" run part of its work on the CPU?

Code:
#
# this is a sequential linked binary
#
./testing_dsyevd -N 4000
device 0: Tesla C2070, 1147.0 MHz clock, 5375.2 MB memory
  testing_dsyevd -N 4000



  N     CPU Time(s)    GPU Time(s)     ||R||_F / ||A||_F
==========================================================
 4000      29.51          11.62         4.113991e-16 2.838989e-13
#
# this is a parallel linked binary
#
./testing_dsyevd -N 4000
device 0: Tesla C2070, 1147.0 MHz clock, 5375.2 MB memory
  testing_dsyevd -N 4000



  N     CPU Time(s)    GPU Time(s)     ||R||_F / ||A||_F
==========================================================
 4000       9.60           7.45         2.607371e-16 4.292615e-13



The parallel link line was:
Code:
-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread


The sequential link line was:
Code:
-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread

Re: Why "magma_dsyevd" performs better with parallel MKL?

Post by xinwu » Thu Jun 30, 2011 9:55 am

I took a look at the source code, and I now understand that "magma_dsyevd" is a hybrid function that uses both the CPU and the GPU, so the linker options affect its performance.

Re: Why "magma_dsyevd" performs better with parallel MKL?

Post by Stan Tomov » Mon Jul 04, 2011 3:07 pm

Hi,
Actually, most of the MAGMA algorithms are hybrid.
In particular, for the dsyevd algorithm, the most time-consuming part is the reduction to tridiagonal form (dsytrd). The dsytrd becomes memory bound for large matrices (e.g., above ~2048), so the MAGMA dsytrd calls the CPU dsytrd (e.g., from MKL) for small matrices and switches to the hybrid code for larger ones.
Stan
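
For a rough sense of how much of the hybrid run depends on the CPU library, the timings posted above can be plugged into Amdahl's law. This is only a back-of-the-envelope sketch: it assumes the CPU-side part of the hybrid run speeds up by the same factor as the all-CPU run did when switching to threaded MKL, and it ignores any overlap between CPU and GPU work. The function name `cpu_fraction` is made up for this illustration.

```python
# Timings in seconds, copied from the two N = 4000 runs posted above.
cpu_seq, cpu_par = 29.51, 9.60   # "CPU Time(s)": sequential vs. threaded MKL
gpu_seq, gpu_par = 11.62, 7.45   # "GPU Time(s)": hybrid magma_dsyevd

def cpu_fraction(t_before, t_after, s):
    """Amdahl's law, t_after / t_before = (1 - f) + f / s, solved for f.

    f is the share of t_before spent in the part that was sped up by s.
    """
    return (1.0 - t_after / t_before) / (1.0 - 1.0 / s)

# ASSUMPTION: the CPU-side work inside the hybrid routine scales like the
# pure-CPU run, i.e. by s = cpu_seq / cpu_par. This is not measured.
s = cpu_seq / cpu_par
f = cpu_fraction(gpu_seq, gpu_par, s)

print(f"CPU-only speedup from threaded MKL: {s:.2f}x")
print(f"Hybrid speedup from threaded MKL:   {gpu_seq / gpu_par:.2f}x")
print(f"Estimated CPU-side share of the sequential hybrid run: {f:.0%}")
```

Under those assumptions the estimate comes out at a bit over half of the sequentially linked hybrid run, which is consistent with the CPU BLAS/LAPACK choice making a visible difference.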

