1.) Is ScaLAPACK a good choice for diagonalizing a real, symmetric matrix that

might require, say 10 GB of memory when I have a 48 node cluster, each with 1

GB of memory? How about a matrix that requires almost all 48 GB?

ScaLAPACK is definitely a good choice for diagonalizing symmetric matrices.

There are currently three routines in ScaLAPACK for this:

- PDSYEV: based on tridiagonal QR iteration
- PDSYEVD: based on the Divide and Conquer algorithm
- PDSYEVX: based on Bisection and Inverse Iteration

In general, PDSYEVD is the fastest. Several studies compare these three methods (see for example [1a, 1b, 2]).

Here are some timings for PDSYEV, gathered quickly to give a rough idea:

------------------------------------------------------------------------------------------
|         |        ||            jobz='N'             ||            jobz='V'             |
| nb_proc |      n || time(s) | GFlop/s  | efficiency || time(s) | GFlop/s  | efficiency |
------------------------------------------------------------------------------------------
|       1 |  1,000 ||    1.39 |     1.92 |       1.00 ||   10.29 |     0.51 |       1.00 |
|       2 |  2,000 ||    9.14 |     1.17 |       0.60 ||  111.13 |     0.19 |       0.37 |
|       4 |  4,000 ||   32.26 |     1.32 |       0.69 ||  522.91 |     0.16 |       0.31 |
|       8 |  8,000 ||  142.60 |     1.20 |       0.62 || 1657.82 |     0.21 |       0.40 |
|      16 | 16,000 ||  533.90 |     1.28 |       0.67 || 8147.92 |     0.17 |       0.32 |
------------------------------------------------------------------------------------------

Hardware: Intel(R) Xeon(TM) CPU 3.20GHz using Myrinet interconnect

grid_shape = [1:LAPACK 2:1x2 4:2x2 8:2x4 16:4x4]

BLAS = ATLAS

block size of the block cyclic distribution = 80

MPI = MPICH-MX (Mpich over Myrinet)
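To make the block-cyclic settings above concrete, here is a minimal Python sketch of how many rows and columns of the global matrix each process owns. It re-implements the counting logic of ScaLAPACK's NUMROC tool routine in pure Python for illustration; it does not call the library. The helper `local_shape` and the assumption that the distribution starts at process (0, 0) are mine, not part of the benchmark description.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) owned by process iproc for a block-cyclic
    distribution of n items in blocks of nb over nprocs processes."""
    # Distance of this process from the one holding the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of complete blocks
    count = (nblocks // nprocs) * nb       # every process gets at least this
    extra_blocks = nblocks % nprocs        # leftover complete blocks
    if mydist < extra_blocks:
        count += nb                        # one additional complete block
    elif mydist == extra_blocks:
        count += n % nb                    # the trailing partial block
    return count


def local_shape(n, nb, myrow, mycol, nprow, npcol):
    """Local array shape on process (myrow, mycol); hypothetical helper.
    Assumes the first block lives on process (0, 0)."""
    return (numroc(n, nb, myrow, 0, nprow), numroc(n, nb, mycol, 0, npcol))


if __name__ == "__main__":
    n, nb = 16_000, 80       # largest case in the table, block size 80
    nprow, npcol = 4, 4      # 4x4 process grid
    for myrow in range(nprow):
        for mycol in range(npcol):
            rows, cols = local_shape(n, nb, myrow, mycol, nprow, npcol)
            megabytes = rows * cols * 8 / 1e6   # 8 bytes per double
            print(f"process ({myrow},{mycol}): {rows} x {cols}, {megabytes:.0f} MB")
```

Because the matrix is spread almost evenly over the grid, it is the aggregate memory of the cluster, not the memory of a single node, that bounds the problem size.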

The number of flops for PDSYEV with option 'N' (eigenvalues only) is taken as 8/3*n^3 (see [3, p.213]).

The number of flops for PDSYEV with option 'V' (eigenvalues and eigenvectors) is taken as 16/3*n^3 (see [3, p.213]).

A random matrix is taken as input.
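The flop counts above are enough to recompute the rate and efficiency columns of the timing table. The Python sketch below does so under two assumptions that are not stated explicitly in the post: that the reported rate is per process, and that efficiency is the per-process rate relative to the single-process run. Under those assumptions the results agree with the table to within about 0.01. The function names are illustrative.

```python
# (processes, n, seconds with jobz='N', seconds with jobz='V'), from the table
TIMINGS = [
    (1, 1_000, 1.39, 10.29),
    (2, 2_000, 9.14, 111.13),
    (4, 4_000, 32.26, 522.91),
    (8, 8_000, 142.60, 1657.82),
    (16, 16_000, 533.90, 8147.92),
]


def flop_count(n, jobz):
    """Nominal flop count: 8/3 n^3 for 'N', 16/3 n^3 for 'V'."""
    factor = 8.0 if jobz == "N" else 16.0
    return factor / 3.0 * n**3


def rate_per_process(nprocs, n, seconds, jobz):
    """Gflop/s per process (assumed meaning of the table's rate column)."""
    return flop_count(n, jobz) / seconds / nprocs / 1e9


def summarize(jobz):
    """Return (nprocs, n, rate, efficiency) for every row of the table."""
    col = 2 if jobz == "N" else 3
    rates = [rate_per_process(r[0], r[1], r[col], jobz) for r in TIMINGS]
    base = rates[0]  # single-process run is the efficiency reference
    return [(r[0], r[1], rate, rate / base) for r, rate in zip(TIMINGS, rates)]


if __name__ == "__main__":
    for jobz in ("N", "V"):
        print(f"jobz='{jobz}'")
        for nprocs, n, rate, eff in summarize(jobz):
            print(f"  {nprocs:2d} procs, n={n:6d}: {rate:.2f} Gflop/s, eff {eff:.2f}")
```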

Note also that a fourth solver, based on MRRR and much faster than the other three, is currently in testing.

We hope to release it during 2006.

1.) How about a matrix that requires almost all 48 GB?

It depends on what you want. If you want both the eigenvalues and the eigenvectors (i.e. JOBZ='V'), then it is not possible, since you need storage for both A and Z (the eigenvectors). Note as well that A is destroyed on output. (If you look at the interface of PDSYEV, you'll see that it is not the same as that of DSYEV: the eigenvectors are not returned in the input matrix but in a separate array.)

To summarize, depending on what you want as output:

- eigenvalues only: you need to store A and an O(n) workspace, so A can take up to ~48 GB
- eigenvalues + eigenvectors: you need to store A, Z, and an O(n) workspace, so A can take up to ~24 GB
- eigenvalues + eigenvectors + initial matrix: you need to store A, Z, a backup copy of A, and an O(n) workspace, so A can take up to ~16 GB (48 GB split three ways)
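As a rough illustration of the three cases, the Python sketch below converts the memory budget into the largest matrix order n that fits. It assumes dense double-precision storage (8 bytes per entry), decimal gigabytes, and an even split of the 48 GB among the n-by-n arrays; an even three-way split of 48 GB gives 16 GB per array. Workspace, MPI buffers, and the operating system are ignored, so real limits will be somewhat lower. The function names are illustrative.

```python
import math

BYTES_PER_DOUBLE = 8
GB = 10**9  # assumption: decimal gigabytes


def per_array_gb(total_gb, n_arrays):
    """Memory available to each of n_arrays equally sized arrays."""
    return total_gb / n_arrays


def max_order(total_gb, n_arrays):
    """Largest n such that n_arrays dense n-by-n double arrays fit in total_gb."""
    entries_per_array = (total_gb * GB) // (BYTES_PER_DOUBLE * n_arrays)
    return math.isqrt(entries_per_array)


if __name__ == "__main__":
    cases = [
        ("eigenvalues only (A)", 1),
        ("eigenvalues + eigenvectors (A, Z)", 2),
        ("eigenvalues + eigenvectors + backup (A, Z, copy of A)", 3),
    ]
    for label, n_arrays in cases:
        print(f"{label}: {per_array_gb(48, n_arrays):.0f} GB per array, "
              f"n up to about {max_order(48, n_arrays)}")
    # The 10 GB matrix from the question, stored as a single array:
    print("a 10 GB matrix has order about", max_order(10, 1))
```

By this estimate, the 10 GB matrix from the question fits comfortably in all three cases on a 48 GB cluster.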

2.) Do you know of other solutions and how they might compare to ScaLAPACK?

Yes, there are other solutions. Some of them are mentioned in [1a], [1b] and [2].

ScaLAPACK is in general the best choice (according to [1a], [1b] and [2]).

[1a] A.G. Sunderland and I.J. Bush. Parallel Eigensolver Performance. CCLRC web page.

[1b] Elena Breitmoser and Andrew G. Sunderland. A performance study of the PLAPACK and ScaLAPACK Eigensolvers on HPCx for the standard problem. EPCC Technical report HPCxTR0406, 2004.

[2] Robert C. Ward, Yihua Bai and Justin Pratt. Performance of Parallel Eigensolvers on Electronic Structure Calculations. UT-CS Technical report #ut-cs-05-560, February 2005.

[3] Jim Demmel. Applied Numerical Linear Algebra. SIAM, 1997.