1.) Is ScaLAPACK a good choice for diagonalizing a real, symmetric matrix that
might require, say 10 GB of memory when I have a 48 node cluster, each with 1
GB of memory? How about a matrix that requires almost all 48 GB?
ScaLAPACK is definetely a good choice for diagonalizing symmetric matrices.
They are three routines right now to do this in ScaLAPACK:
PDSYEV: which is based on tridiagonal QR iteration
PDSYEVD: which is based on Divide and Conquer algorithm
PDSYEVX: which is based on Bisection and Inverse Iteration
In general
PDSYEVD is the fastest. They are different studies for comparing those
three methods in different papers (see for example [1,2]).
Here are some times for PDSYEV quickly gathered to give some ideas
- Code: Select all
------------------------------------------------------------------------------------------
| | || jobz='N' || jobz='V' |
| nb_proc | n || time(s) | GFlops/s | efficiency || time(s) | GFlops/s | efficiency |
------------------------------------------------------------------------------------------
| 1 | 1,000 || 1.39 | 1.92 | 1.00 || 10.29 | 0.51 | 1.00 |
| 2 | 2,000 || 9.14 | 1.17 | 0.60 || 111.13 | 0.19 | 0.37 |
| 4 | 4,000 || 32.26 | 1.32 | 0.69 || 522.91 | 0.16 | 0.31 |
| 8 | 8,000 || 142.60 | 1.20 | 0.62 || 1657.82 | 0.21 | 0.40 |
| 16 | 16,000 || 533.90 | 1.28 | 0.67 || 8147.92 | 0.17 | 0.32 |
------------------------------------------------------------------------------------------
Hardware: Intel(R) Xeon(TM) CPU 3.20GHz using Myrinet interconnect
grid_shape = [1:LAPACK 2:1x2 4:2x2 8:2x4 16:4x4]
BLAS = ATLAS
block size of the block cyclic distribution = 80
MPI = MPICH-MX (Mpich over Myrinet)
Number of flops of PDSYEV with option 'N' (just the eigenvalues) is taken as 8/3*n^3 (see ([3, p.213]).
Number of flops of PDSYEV with option 'V' (eigenvalues and eigenvectors)is taken as 16/3*n^3 (see ([3, p.213]).
A random matrix is taken as input.
Note also that, a fourth solver based an MRRR and way faster than the three others is in testing mode right now.
We hope to release it within the year 2006.
1.)
How about a matrix that requires almost all 48 GB?
It depends on what you want, if you want the eigenvectors and the eigenvalues
(i.e. JOBZ='V') then it is not possible since you need storage for A
and Z (the eigenvectors). (Note as well that A is destroyed on output.) (If you
look to the interface of PDSYEV you'll see that it is not the same as the one of DSYEV: the
eigenvectors are not returned in the matrix but in a separate array.)
To repeat if you want in output:
- eigenvalues: you need to be able to store A and a workspace O(n), sizeof(A)~48GB
- eigenvalues + eigenvectors: you need to be able to store A, Z and a workspace O(n), sizeof(A)~24GB
- eigenvalues + eigenvectors + initial matrix: you need to be able to store A, Z, a backup for A, and a workspace O(n), sizeof(A)~18GB
2.) Do you know of other solutions and how they might compare to ScaLAPACK?
Yes they are some other solutions. Some of them are mentionned in [1a], [1b] and [2].
ScaLAPACK is in general the best choice (according to [1a], [1b] and [2]]).
[1a] A.G. Sunderland, I.J. Bush,
Parallel Eigensolver Performance. CCLRC web page.
[1b] Elena Breitmoser, Andrew G. Sunderland
A performance study of the PLAPACK and ScaLAPACK Eigensolvers on HPCx for the standard problem. EPCC Technical report HPCxTR0406, 2004.
[2] Robert C. Ward, Yihua Bai and Justin Pratt.
Performance of Parallel Eigensolvers on Electronic Structure Calculations. UT-CS Technical report #ut-cs-05-560, February 2005.
[3] Jim Demmel.
Applied Numerical Linear Algebra. SIAM, 1997.