Why is magma_ssyevd_gpu so slow?

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

Post by vivi_ » Fri Dec 07, 2018 8:02 am

Hi everybody.
I ran the code from section 4.7.13 of the following manual:
https://developer.nvidia.com/sites/defa ... /mygpu.pdf
I only removed the LAPACK parts.
I ran it 5 times to get a sense of the order of magnitude of the elapsed time, and the results ranged from 56.99 to 93.80 seconds.
I am using MAGMA 2.3.0 on a machine with two 8-core Intel Xeon E5-2630 v3 CPUs @ 2.40 GHz and two NVIDIA K80 GPUs, with CUDA 9.0.

If I generate a random symmetric matrix of the same size in MATLAB and compute its eigenvalues and eigenvectors on my dual-core MacBook Air (1.6 GHz Intel Core i5), it takes a couple of minutes.
I expected MAGMA to be much faster. Is there a function I should use instead of magma_ssyevd_gpu?
Thanks in advance.

Re: Why is magma_ssyevd_gpu so slow?

Post by mgates3 » Fri Dec 07, 2018 1:35 pm

What BLAS & LAPACK library are you using?
What size is your problem?
Are you timing just the magma_ssyevd_gpu call, or are you timing the runtime of the entire test program?

You can run MAGMA's testers in magma/testing to get both MAGMA and LAPACK timings. There are several versions to try (note that these are single precision; use the dsyevd versions for double precision):

./testing_ssyevd         -n 2000:20000:2000 -JV --lapack --niter 5
./testing_ssyevd_gpu     -n 2000:20000:2000 -JV --lapack --niter 5
./testing_ssyevdx_2stage -n 2000:20000:2000 -JV --lapack --niter 5

The 2stage version should be the fastest for large problems. Flags: -n gives the dimensions to try, -JV computes eigenvectors (jobz = V), --lapack also reports LAPACK timings, and --niter 5 repeats each test 5 times.
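
As a rough illustration of which sizes -n 2000:20000:2000 asks for, here is a minimal Python sketch that expands a start:stop:step specification into a list of sizes. It assumes the stop value is inclusive and the step is positive; the function name expand_range is hypothetical and not part of MAGMA.

```python
def expand_range(spec):
    """Expand a "start:stop:step" size specification into a list of sizes.

    Assumptions (not confirmed by MAGMA docs here): stop is inclusive and
    step is positive, so "2000:20000:2000" yields 2000, 4000, ..., 20000.
    """
    start, stop, step = (int(part) for part in spec.split(":"))
    # range() excludes its end point, so add 1 to keep stop itself.
    return list(range(start, stop + 1, step))


sizes = expand_range("2000:20000:2000")
print(len(sizes), sizes[0], sizes[-1])  # prints: 10 2000 20000
```

Under that assumption, each tester above runs 10 sizes, each repeated 5 times because of --niter 5.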

Here are some results for comparison, using Intel MKL for BLAS & LAPACK.

[mgates@b00 testing]$ ./testing_ssyevd -n 2000:20000:2000 -JV -l
% MAGMA 2.3.0 svn compiled for CUDA capability >= 3.5, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 9020, driver 10000. OpenMP threads 20. MKL 2018.0.1, MKL threads 20.
% device 0: Tesla K40c, 745.0 MHz clock, 11441.2 MiB memory, capability 3.5

% jobz = Vectors needed, uplo = Lower, ngpu = 1
%   N   CPU Time (sec)   GPU Time (sec)   |S-S_magma|   |A-USU^H|   |I-U^H U|
 2000      0.2239           0.2327         4.88e-10        ---         ---      ok
 4000      0.8543           1.3268         9.15e-11        ---         ---      ok
 6000      4.2844           3.0909         2.71e-11        ---         ---      ok
 8000     14.1701           5.8308         6.10e-11        ---         ---      ok
10000     30.9939          11.7388         4.88e-11        ---         ---      ok
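
To read the table as speedups, a short Python sketch that divides the CPU time by the GPU time for each N, using the numbers exactly as printed above. The names results and speedups are illustrative only; a ratio above 1 means MAGMA was faster than MKL.

```python
# (N, CPU time in seconds, GPU time in seconds), copied from the
# testing_ssyevd output above (Tesla K40c, MKL 2018.0.1 with 20 threads).
results = [
    (2000, 0.2239, 0.2327),
    (4000, 0.8543, 1.3268),
    (6000, 4.2844, 3.0909),
    (8000, 14.1701, 5.8308),
    (10000, 30.9939, 11.7388),
]


def speedups(rows):
    """Return (N, CPU time / GPU time) pairs; a value > 1 means the GPU was faster."""
    return [(n, cpu / gpu) for n, cpu, gpu in rows]


for n, s in speedups(results):
    print(f"N = {n:5d}: speedup {s:.2f}x")
```

With these numbers the ratio is below 1 at N = 2000 and 4000 and above 1 from N = 6000 on, reaching about 2.6 at N = 10000, so the GPU pays off only for the larger matrices.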

In MATLAB, you can use tic and toc to time individual routines. It seems you are running MATLAB on a different system from the one where you are running MAGMA; is that right?
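
The same idea applies outside MATLAB: time only the eigensolver, not the matrix generation. A minimal sketch in Python with NumPy (an assumption: neither appears in this thread) that builds a random symmetric single-precision matrix and times the setup and numpy.linalg.eigh separately with time.perf_counter, roughly what tic/toc around eig alone would measure. The function name time_eigh is hypothetical.

```python
import time

import numpy as np


def time_eigh(n, seed=0):
    """Build a random symmetric float32 matrix and time setup and eigensolve separately."""
    rng = np.random.default_rng(seed)

    t0 = time.perf_counter()
    a = rng.standard_normal((n, n), dtype=np.float32)
    a = (a + a.T) / 2  # symmetrize; stays float32 (single precision, like ssyevd)
    t_setup = time.perf_counter() - t0

    t0 = time.perf_counter()
    w, v = np.linalg.eigh(a)  # eigenvalues (ascending) and eigenvectors
    t_solve = time.perf_counter() - t0

    return a, w, v, t_setup, t_solve


if __name__ == "__main__":
    _, _, _, t_setup, t_solve = time_eigh(2000)
    print(f"setup {t_setup:.3f} s, eigh {t_solve:.3f} s")
```

Only t_solve is roughly comparable to the CPU Time column reported by the testers; a whole-program timing also includes matrix generation and, for MAGMA, initialization and host/device transfers.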

