
### Simple question about diagonalization (I think).

Posted: Wed Dec 21, 2005 9:14 am
Hi -
I've been using LAPACK (with Matrix Toolkits for Java) on a Beowulf cluster. My problem involves diagonalizing (and finding the inverse of) large matrices. Currently I work with matrices of about 3200 x 3200 and get great performance (one matrix per node). I'd like to be able to handle larger matrices (no real limit on how large). If anyone could give me some feedback on these questions, I'd appreciate it:

1.) Is ScaLAPACK a good choice for diagonalizing a real, symmetric matrix that might require, say 10 GB of memory when I have a 48 node cluster, each with 1 GB of memory? How about a matrix that requires almost all 48 GB?

2.) Do you know of other solutions and how they might compare to ScaLAPACK?

Thanks!

Posted: Thu Jan 12, 2006 12:25 pm
1.) Is ScaLAPACK a good choice for diagonalizing a real, symmetric matrix that might require, say 10 GB of memory when I have a 48 node cluster, each with 1 GB of memory? How about a matrix that requires almost all 48 GB?

ScaLAPACK is definitely a good choice for diagonalizing symmetric matrices.
There are currently three routines to do this in ScaLAPACK:
PDSYEV: which is based on tridiagonal QR iteration
PDSYEVD: which is based on the divide-and-conquer algorithm
PDSYEVX: which is based on bisection and inverse iteration

In general PDSYEVD is the fastest. There are several studies comparing these
three methods in different papers (see for example [1a], [1b] and [2]).
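The same algorithmic trade-off exists in serial LAPACK (DSYEV, DSYEVD, DSYEVX), and SciPy exposes those drivers directly, so the three methods can be sketched without a cluster. This is only an illustrative serial analogue, not a ScaLAPACK call:

```python
# Serial LAPACK analogues of the three ScaLAPACK symmetric eigensolvers:
#   driver='ev'  ~ DSYEV  (~ PDSYEV,  tridiagonal QR iteration)
#   driver='evd' ~ DSYEVD (~ PDSYEVD, divide and conquer)
#   driver='evx' ~ DSYEVX (~ PDSYEVX, bisection + inverse iteration)
# All three return the same spectrum up to rounding error.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 500
a = rng.standard_normal((n, n))
a = (a + a.T) / 2  # symmetrize the random test matrix

for driver in ("ev", "evd", "evx"):
    w = eigh(a, eigvals_only=True, driver=driver)
    print(driver, float(w[0]), float(w[-1]))  # smallest / largest eigenvalue
```

For a matrix this small the timing differences are negligible; the divide-and-conquer advantage shows up at the problem sizes discussed below.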

Here are some timings for PDSYEV, gathered quickly to give an idea of performance:

```
 ------------------------------------------------------------------------------------------
|         |        ||             jobz='N'             ||             jobz='V'             |
| nb_proc |   n    || time(s) | GFlops/s | efficiency  || time(s) | GFlops/s | efficiency  |
 ------------------------------------------------------------------------------------------
|     1   |  1,000 ||    1.39 |    1.92  |    1.00     ||   10.29 |    0.51  |    1.00     |
|     2   |  2,000 ||    9.14 |    1.17  |    0.60     ||  111.13 |    0.19  |    0.37     |
|     4   |  4,000 ||   32.26 |    1.32  |    0.69     ||  522.91 |    0.16  |    0.31     |
|     8   |  8,000 ||  142.60 |    1.20  |    0.62     || 1657.82 |    0.21  |    0.40     |
|    16   | 16,000 ||  533.90 |    1.28  |    0.67     || 8147.92 |    0.17  |    0.32     |
 ------------------------------------------------------------------------------------------
```

Hardware: Intel(R) Xeon(TM) CPU 3.20GHz using Myrinet interconnect
grid_shape = [1:LAPACK 2:1x2 4:2x2 8:2x4 16:4x4]
BLAS = ATLAS
block size of the block cyclic distribution = 80
MPI = MPICH-MX (Mpich over Myrinet)
The number of flops of PDSYEV with option 'N' (just the eigenvalues) is taken as 8/3*n^3 (see [3, p. 213]).
The number of flops of PDSYEV with option 'V' (eigenvalues and eigenvectors) is taken as 16/3*n^3 (see [3, p. 213]).
A random matrix is taken as input.
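The GFlop/s column follows directly from those flop counts and the measured times; a quick check of the first row of the table:

```python
# Recompute the GFlop/s entries of the first table row from the
# flop-count formulas above: 8/3*n^3 for jobz='N', 16/3*n^3 for jobz='V'.
n = 1000
flops_n = 8 / 3 * n**3   # eigenvalues only
flops_v = 16 / 3 * n**3  # eigenvalues + eigenvectors

gflops_n = flops_n / 1.39 / 1e9   # 1.39 s measured for jobz='N'
gflops_v = flops_v / 10.29 / 1e9  # 10.29 s measured for jobz='V'
print(round(gflops_n, 2))  # ~1.92, matching the table
print(round(gflops_v, 2))  # ~0.5, matching the table's 0.51
```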

Note also that a fourth solver, based on MRRR (Multiple Relatively Robust Representations) and significantly faster than the other three, is in testing mode right now.
We hope to release it within the year 2006.

1.) How about a matrix that requires almost all 48 GB?

It depends on what you want. If you want the eigenvectors as well as the eigenvalues
(i.e. JOBZ='V'), then it is not possible, since you need storage for both A
and Z (the eigenvectors). (Note as well that A is destroyed on output.) If you
look at the interface of PDSYEV, you'll see that it is not the same as that of DSYEV: the
eigenvectors are not returned in the input matrix but in a separate array.

To summarize, depending on what you want in output:
1. eigenvalues: you need to be able to store A and a workspace O(n), sizeof(A)~48GB
2. eigenvalues + eigenvectors: you need to be able to store A, Z and a workspace O(n), sizeof(A)~24GB
3. eigenvalues + eigenvectors + initial matrix: you need to be able to store A, Z, a backup for A, and a workspace O(n), sizeof(A)~18GB
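These storage limits translate into a maximum matrix dimension. A back-of-envelope sketch, assuming double precision (8 bytes per entry), 48 GB of aggregate memory, and neglecting the O(n) workspace:

```python
# Largest n that fits in 48 GB total for each of the three cases above,
# counting only the n-by-n double-precision arrays that must be held:
# case 1 stores A; case 2 stores A and Z; case 3 stores A, Z, and a
# backup of A. The O(n) workspace is neglected.
import math

total_bytes = 48e9
cases = [
    (1, "eigenvalues only (A)"),
    (2, "eigenvalues + eigenvectors (A, Z)"),
    (3, "... + a backup of the input matrix (A, Z, copy of A)"),
]
for copies, label in cases:
    n_max = int(math.sqrt(total_bytes / (copies * 8)))
    print(f"{label}: n up to ~{n_max:,}")
```

So the 10 GB matrix from the original question (n around 35,000 in double precision) is feasible on the 48-node cluster even with eigenvectors, while a matrix filling nearly all 48 GB only works in the eigenvalues-only case.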

2.) Do you know of other solutions and how they might compare to ScaLAPACK?

Yes, there are some other solutions. Some of them are mentioned in [1a], [1b] and [2].
ScaLAPACK is in general the best choice (according to [1a], [1b] and [2]).

[1a] A.G. Sunderland and I.J. Bush. Parallel Eigensolver Performance. CCLRC web page.

[1b] Elena Breitmoser and Andrew G. Sunderland. A Performance Study of the PLAPACK and ScaLAPACK Eigensolvers on HPCx for the Standard Problem. EPCC Technical Report HPCxTR0406, 2004.

[2] Robert C. Ward, Yihua Bai and Justin Pratt. Performance of Parallel Eigensolvers on Electronic Structure Calculations. UT-CS Technical report #ut-cs-05-560, February 2005.

[3] Jim Demmel. Applied Numerical Linear Algebra. SIAM, 1997.

Posted: Tue Jan 17, 2006 6:50 pm
Thanks!

That's a great bunch of help!

-tom

Posted: Sat Apr 01, 2006 8:55 am
Julien is a bit vague about alternatives for this operation in answering part 2) of your question.

PLAPACK has a friendlier user interface and is competitive with ScaLAPACK, especially for very large matrices.

The reports that Julien cites only ran relatively small problem sizes. This was justified because they only wanted to run "real" problems. However, if your real problems are always relatively small (say n < 20,000), then you really should not be running them on a distributed-memory architecture; you would be much better off investing in an SMP.

For details on PLAPACK's eigensolver, see
Paolo Bientinesi, Inderjit S. Dhillon, and Robert A. van de Geijn. A Parallel Eigensolver for Dense Symmetric Matrices Based on Multiple Relatively Robust Representations. SIAM Journal on Scientific Computing, Vol. 27, No. 1, 2005.
(draft available from http://www.cs.utexas.edu/users/flame/pubs.html)