zheevd performance: GPU == CPU??

Postby RezaRob » Sun May 27, 2012 10:49 pm

On a Fermi card (GT-430), <testing_zheevd_gpu -N 800> reports almost exactly the same timing for the GPU and the CPU. Making the matrix larger makes no difference either. This is on a 2-core Intel box. Yet testing_zgemm reports a very acceptable 20 GFLOPS. Why is that?

EDIT: I should have mentioned that this is MAGMA 1.2.0 on Ubuntu Linux 10.04.
RezaRob
 
Posts: 2
Joined: Sun May 27, 2012 10:27 pm

Re: zheevd performance: GPU == CPU??

Postby mgates3 » Wed May 30, 2012 9:45 am

The GT-430 is a consumer-level card primarily intended for graphics applications like games. Its performance is fairly low. Consider that a Tesla 2050 achieves 340 Gflop/s on a zgemm, compared to the 20 Gflop/s you are reporting. In a quick test, I get 14 Gflop/s with a zgemm on 2 CPU cores (this depends on the CPU). Using the GPU also adds the overhead of copying the matrix back and forth between host and GPU, so it is not surprising that you see no performance improvement.
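As a rough sketch of that arithmetic, the Python snippet below compares the zgemm rates quoted in this thread for N = 800. The flop count (8 real flops per complex multiply-add) is the standard one; the host-to-GPU bandwidth is an assumed, illustrative figure and not something measured on this machine. Under these assumptions the GPU can win by only about 1.4x even on a pure zgemm, before any CPU-bound work is counted.

```python
def zgemm_flops(n):
    """Real flops for an n x n complex matrix multiply.

    Each complex multiply-add costs 8 real flops
    (6 for the multiply, 2 for the add), and there are n**3 of them.
    """
    return 8 * n**3


def seconds(flops, gflops_rate):
    """Time in seconds to do `flops` operations at `gflops_rate` Gflop/s."""
    return flops / (gflops_rate * 1e9)


n = 800
gpu_rate = 20.0  # Gflop/s, GT-430 zgemm figure reported in this thread
cpu_rate = 14.0  # Gflop/s, 2-core CPU zgemm figure reported in this thread
pcie_bytes_per_s = 3e9  # ASSUMPTION: illustrative transfer bandwidth, not measured

matrix_bytes = n * n * 16  # double-complex entries are 16 bytes each
transfer_s = 3 * matrix_bytes / pcie_bytes_per_s  # send A and B, fetch C

gpu_s = seconds(zgemm_flops(n), gpu_rate) + transfer_s
cpu_s = seconds(zgemm_flops(n), cpu_rate)

print(f"GPU: {gpu_s:.3f} s  CPU: {cpu_s:.3f} s  speedup: {cpu_s / gpu_s:.2f}x")
```

Substituting rates measured with testing_zgemm on your own hardware gives the corresponding upper bound for your setup.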

-mark
mgates3
 
Posts: 329
Joined: Fri Jan 06, 2012 2:13 pm

Re: zheevd performance: GPU == CPU??

Postby RezaRob » Wed May 30, 2012 12:50 pm

Okay, many thanks for the CPU test. Good point.

However, testing_zheevd_gpu is maxing out both my CPU cores at 170% even AFTER I comment out the LAPACK call (at the bottom) and switch off MAGMA_TESTINGS_CHECK completely. (CPU time is reported as 0.00 after doing these things.)

Why is that?! Is the GPU even being used??

PS: <Off_Topic> Games require _extremely_ high performance, which they get from cards like the GTX-580.

Reza.
RezaRob
 
Posts: 2
Joined: Sun May 27, 2012 10:27 pm

Re: zheevd performance: GPU == CPU??

Postby mgates3 » Wed May 30, 2012 3:24 pm

The MAGMA algorithm is hybrid: it uses both the CPU and the GPU. For eigenvalues, the initial reduction to tridiagonal or Hessenberg form uses the CPU for panels and the GPU for trailing matrix multiplies and updates. The computation of eigenvalues from the tridiagonal matrix is then done with LAPACK on the CPU, which is why the CPU cores stay busy. You can check whether the tridiagonal or Hessenberg reduction sees a speedup by running testing_zhetrd and testing_zgehrd, respectively.
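To see why a hybrid routine can show almost no overall gain, here is a minimal Amdahl's-law sketch in Python. The function name and the fractions used are hypothetical and chosen only for illustration; they are not measurements of zheevd. It models a run in which only part of the time (the GPU-assisted updates) is accelerated, while the rest (panels and the tridiagonal eigensolver) stays on the CPU.

```python
def overall_speedup(accelerated_fraction, stage_speedup):
    """Amdahl's law: whole-run speedup when only part of the run is accelerated.

    accelerated_fraction: share of the original run time spent in the part
        that the GPU helps with (0.0 to 1.0).
    stage_speedup: how much faster that part runs with the GPU.
    """
    cpu_only = 1.0 - accelerated_fraction
    return 1.0 / (cpu_only + accelerated_fraction / stage_speedup)


# ASSUMPTION: half of the run time is GPU-assisted, and that half runs
# 1.4x faster (roughly the 20 vs 14 Gflop/s zgemm ratio from this thread).
print(f"{overall_speedup(0.5, 1.4):.2f}x")
```

With a slow GPU the stage speedup is close to 1, so the overall figure stays close to 1 as well, whatever the matrix size.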

-mark
mgates3
 
Posts: 329
Joined: Fri Jan 06, 2012 2:13 pm

