The problem is that accessing the eigenvectors using PDELGET takes
upwards of 10x longer than the original diagonalization routine (due
to the repeated broadcasts). This extra time eliminates the advantage
of using a parallel routine. Is it possible to broadcast the
distributed matrix of eigenvectors so that all nodes have a copy of the
total matrix (similar to the allgather operation in MPI)?

That's still a pretty heavy operation, since every node ends up holding the
full matrix. If you want a parallel code that scales, you need to figure out
a way to do your step differently. Anyway, if this really is the operation
you want to do, I believe you can use the famous/infamous pdgemr2d routine.
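
To make the cost concrete, here is a minimal sketch in plain Python of the index arithmetic behind a 2D block-cyclic layout, which is what any redistribution has to undo. It does not call ScaLAPACK or MPI; it only simulates which process owns each global entry and how the full matrix is put back together from the local pieces. The helper names (owner, local_index, distribute, gather_full) are made up for illustration, indices are 0-based, and the source process is assumed to be 0 in both grid dimensions.

```python
def owner(g, nb, nprocs):
    """Process coordinate that owns 0-based global index g (source process 0)."""
    return (g // nb) % nprocs


def local_index(g, nb, nprocs):
    """0-based local index of global index g on the process that owns it."""
    return (g // (nb * nprocs)) * nb + g % nb


def distribute(a, mb, nb, nprow, npcol):
    """Split a dense matrix (list of rows) into per-process local pieces.

    Returns a dict keyed by grid coordinates (pr, pc); each value maps
    local (row, col) indices to the stored entry.
    """
    pieces = {(pr, pc): {} for pr in range(nprow) for pc in range(npcol)}
    for i, row in enumerate(a):
        for j, value in enumerate(row):
            proc = (owner(i, mb, nprow), owner(j, nb, npcol))
            loc = (local_index(i, mb, nprow), local_index(j, nb, npcol))
            pieces[proc][loc] = value
    return pieces


def gather_full(pieces, m, n, mb, nb, nprow, npcol):
    """Rebuild the full m x n matrix from the local pieces.

    This is the result every process would hold after an
    all-gather style redistribution.
    """
    full = [[None] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            proc = (owner(i, mb, nprow), owner(j, nb, npcol))
            loc = (local_index(i, mb, nprow), local_index(j, nb, npcol))
            full[i][j] = pieces[proc][loc]
    return full


if __name__ == "__main__":
    # 5 x 5 matrix, 2 x 2 blocks, 2 x 2 process grid (all values arbitrary).
    a = [[10 * i + j for j in range(5)] for i in range(5)]
    pieces = distribute(a, 2, 2, 2, 2)
    print(gather_full(pieces, 5, 5, 2, 2, 2, 2) == a)
```

Fetching the entries one at a time pays a separate communication step per element, whereas a redistribution moves each process's local piece in bulk. Either way the total data delivered to every node is the whole matrix, which is why the operation stays heavy.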