I am relatively new to this topic, so the question may be too basic — sorry about that.
I am working on diagonalizing huge matrices using the ScaLAPACK PDSYEVX subroutine. My problem is that I read the matrix from a file without parallelizing that step, so in an MPI run every process reads the full matrix, and only afterwards is the matrix redistributed over the process grid that I create. As soon as the matrix grows big enough, the program crashes — as far as I understand, either the local per-process memory is exceeded, or the total node memory is exhausted because every process on the node holds its own copy of the full matrix.
My question is: what would be the best way to solve this? Should I read the whole matrix on one process only and then distribute the local arrays to the other processes, or should each process read only the parts of the matrix it needs from the start? The latter seems more complicated to me, since the matrix is not even stored in its full form (only the non-zero elements are kept in the file), although I know the algorithm to reconstruct the whole matrix from it. Or maybe there is a way to use shared memory with MPI parallelization (as with OpenMP parallelization), so that the whole matrix is read once per node but is accessible to all the processes on that node?
Thank you very much in advance,