Hi everybody.
I have run the code that you can find in paragraph 4.7.13 of the following manual:
https://developer.nvidia.com/sites/defa ... /mygpu.pdf
I only removed the LAPACK parts.
I ran it 5 times to get a sense of the order of magnitude of the elapsed time, and I got between 56.99 and 93.80 seconds.
I am using MAGMA 2.3.0 on a machine with two 8-core Intel Xeon E5-2630 v3 CPUs @ 2.40 GHz and two NVIDIA K80 GPUs, with CUDA 9.0.
If I generate a symmetric random matrix of the same size in Matlab and compute its eigenvalues and eigenvectors on my dual-core MacBook Air with a 1.6 GHz Intel Core i5, it takes a couple of minutes.
I was expecting MAGMA to be way faster. What function can I use instead of magma_ssyevd_gpu?
Thank you in advance
Why is magma_ssyevd_gpu so slow?
Re: Why is magma_ssyevd_gpu so slow?
What BLAS & LAPACK library are you using?
What size is your problem?
Are you timing just the magma_ssyevd_gpu call, or the runtime of the entire test program?
You can run MAGMA's testers in magma/testing to get both MAGMA and LAPACK timings. There are several versions to try (note these are single precision; use dsyevd for double precision):
Code:
./testing_ssyevd -n 2000:20000:2000 -JV --lapack --niter 5
./testing_ssyevd_gpu -n 2000:20000:2000 -JV --lapack --niter 5
./testing_ssyevdx_2stage -n 2000:20000:2000 -JV --lapack --niter 5
The 2stage version should be the fastest for large problems. Flags: -n gives the dimensions to try, -JV computes eigenvectors (job = V), --lapack also gives LAPACK timing results, and --niter 5 repeats each test 5 times.
Here are some results for comparison, using Intel MKL for BLAS & LAPACK.
Code:
[mgates@b00 testing]$ ./testing_ssyevd -n 2000:20000:2000 -JV -l
% MAGMA 2.3.0 svn compiled for CUDA capability >= 3.5, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 9020, driver 10000. OpenMP threads 20. MKL 2018.0.1, MKL threads 20.
% device 0: Tesla K40c, 745.0 MHz clock, 11441.2 MiB memory, capability 3.5
% jobz = Vectors needed, uplo = Lower, ngpu = 1
%   N     CPU Time (sec)   GPU Time (sec)   |S-S_magma|   |A-USU^H|   |I-U^H U|
%============================================================================
 2000        0.2239           0.2327         4.88e-10        ---         ---      ok
 4000        0.8543           1.3268         9.15e-11        ---         ---      ok
 6000        4.2844           3.0909         2.71e-11        ---         ---      ok
 8000       14.1701           5.8308         6.10e-11        ---         ---      ok
10000       30.9939          11.7388         4.88e-11        ---         ---      ok
In Matlab, you can use tic and toc to time routines. It seems you are running Matlab on a different system than the one where you run MAGMA; is that right?
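To separate the time of the solver call from the runtime of the whole program (matrix generation, allocation, transfers, library initialization), you can wrap only that call in a wall-clock timer. The following is a minimal sketch in plain C, not code from the manual or from MAGMA: wall_seconds, time_call and solver_stand_in are hypothetical names, and solver_stand_in is just a dummy loop standing in for the place where the magma_ssyevd_gpu call would go.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Wall-clock time in seconds, read from a monotonic clock. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Hypothetical stand-in for the routine being measured.
 * In a real program, this is where the eigensolver call would go;
 * here it is only a dummy loop so the sketch runs on its own. */
void solver_stand_in(void *arg)
{
    volatile double *sum = (volatile double *)arg;
    for (long i = 1; i <= 20000000L; ++i)
        *sum += 1.0 / (double)i;
}

/* Runs work(arg) niter times and returns the fastest single-run
 * wall-clock time in seconds. Only the call itself is inside the
 * timed region; setup and teardown stay outside. */
double time_call(void (*work)(void *), void *arg, int niter)
{
    assert(niter > 0);
    double best = -1.0;
    for (int i = 0; i < niter; ++i) {
        double t0 = wall_seconds();
        work(arg);
        double t = wall_seconds() - t0;
        if (best < 0.0 || t < best)
            best = t;
    }
    return best;
}
```

Calling time_call(solver_stand_in, &acc, 5) with a double acc = 0.0 repeats the measurement 5 times, similar in spirit to --niter 5 in the testers, and returns the fastest run. If the timed region contains asynchronous GPU work, synchronize the device before reading the clock, otherwise the measurement may stop before the work has finished.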
-mark