LAPACK and ScaLAPACK Survey Results - ordered by question



ScaLAPACK Usage


Question #13. How could the ScaLAPACK interface be improved to feel more natural to your application and implementation language?

Responses
It would be nice to have tools or examples that help with setting up the matrix, distributing data, and reading/writing the matrix.
A consistent interface for the QR routines with pivoting would be most useful. The public version of the QR routines seems to fail for one-column matrices.
Same as LAPACK: F90/95 interfaces, overload. GET RID OF BLACS: use MPI instead. Memory for diagonalizers often seems an awful lot, and sometimes stops jobs that I think should run from running.
Better interface to BLACS. The redist utils are a great start, but they're poorly documented, and frequently when the system administrators install ScaLAPACK they don't even know that they should also build redist.
64-bit integer arguments are needed (Fortran Integer*8 or C/C++ long long).
A more extensive, and *universal*, PBLAS library. Portability of code is of paramount concern (for maintainability), and many of the support routines are not universal, so the underlying distributed data structures need to be deciphered and the necessary functions hand-coded. Very ppu (personal processing unit) time-consuming.
It is very inconvenient to prepare the data for ScaLAPACK. It should be more natural, at least as natural as LAPACK.
Some examples that call ScaLAPACK from C would be helpful.
The required data layout was quite difficult to implement.
The same: f77/C interfaces are sufficient.
1) There should be a function call to generate a global matrix descriptor using either a BLACS context or an MPI communicator. 2) Another function should generate a BLACS gridmap from an MPI communicator and vice versa. 3) Maybe the BLACS context should be taken out of the global matrix descriptor.
It is hard to sort out precisely how to split up and send a matrix to different processors. For example, in a situation where the matrix can be stored on one node, but ScaLAPACK is being used to speed up the linear algebra operations, there is no simple way to send the matrix to the nodes, perform the computation, and get the matrix sent back. Having a routine to do that would be very useful, especially if it would automatically determine the optimal number of processors for the ScaLAPACK routines.
object oriented
na
The symmetric packed
Automated workspace, more obvious routine names.
The interface used by PLAPACK, which allows submatrices to be submitted in a transparent fashion, is far superior to the ScaLAPACK interface.
Same as LAPACK: desperately needed abstraction from details, memory allocation, etc.
Getting matrix packings right is quite annoying.
Give a set of routines to help distribute the data. The packed format is for us a key feature.
The matrix layout and communication seem difficult.
The problem is that once you've used ScaLAPACK for a while, you get used to the shortcomings of the package. Often people who have used other packages (Trilinos subsets, PLAPACK, Global Array, etc.) are not willing to get used to things. Any step towards an interface like these would be helpful, especially to new users.
see above
Heavy use of C++
The data distribution is difficult. PLAPACK allows this to happen in a much more natural way, where the user does not have to worry about placing the data on the nodes (in the case of clusters); PLAPACK does it in the background for you.
Some object-orientation in terms of matrix types (maybe bundling the array descriptors with the matrix, etc.) would be useful from a C++ application's perspective.
similar to the call in serial jobs
It's not the interface per se that causes me problems, it's the functionality. I need a tridiagonal or band diagonal matrix solver which will allow for a two-D data distribution and dedicated IO nodes. The ScaLAPACK-based IBM library PESSL only allows for a 1D data decomposition, as (I think) does ScaLAPACK itself. Several years ago I got a 2D data decomposition to work on a Hitachi system by passing MPI communicators to the BLACS grid initialization routines. It would be great if this were standardized across platforms.
The key problem of using ScaLAPACK (as with other similar libraries) is data distribution. Block cyclic distribution is usually hard to stick to during a calculation. Perhaps enriching the types of data distribution, or letting users define/describe their own data distribution, could be a more natural way (but at what cost?).
Get rid of the dependence on BLACS. BLACS contexts in particular are unwieldy and difficult to use and understand.
I think that the use of block cyclic distribution of dense matrices is a little bit complicated. Thus it would be nice to find routines that help to distribute matrices.
The biggest problem is the errors that are given when the workspace is too small. The message that comes from ScaLAPACK is often incorrect and says that the problem is due to an incorrect argument to a routine. I would also like to use ScaLAPACK in a way that allows several diagonalisations to be carried out in parallel. This is mentioned in the BLACS documentation but does not seem to work.
Make it easier to use subsets of MPI_world in parallel LAPACK operations.
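
Editorial note: several responses above describe the block cyclic data layout as the main obstacle. As a rough illustration of what users have to work out by hand, the following is a minimal one-dimensional sketch in Python. It assumes 0-based indices and that the first block lives on process 0; the function names are hypothetical and are not part of ScaLAPACK or BLACS. The counting function is meant to mirror the arithmetic of ScaLAPACK's NUMROC tool routine under those assumptions.

```python
def owner_and_local_index(g, nb, nprocs):
    """Map a 0-based global index g to (owning process, 0-based local index).

    Hypothetical helper: assumes block size nb, nprocs processes in this
    dimension, and that block 0 is stored on process 0.
    """
    block = g // nb                      # which global block g falls in
    owner = block % nprocs               # blocks are dealt out round-robin
    local = (block // nprocs) * nb + g % nb
    return owner, local


def local_count(n, nb, p, nprocs):
    """Number of the n global rows (or columns) stored on process p.

    Hypothetical helper under the same assumptions as above.
    """
    full_blocks = n // nb
    count = (full_blocks // nprocs) * nb  # every process gets this many
    extra = full_blocks % nprocs          # leftover full blocks
    if p < extra:
        count += nb                       # one more full block
    elif p == extra:
        count += n % nb                   # the trailing partial block, if any
    return count


if __name__ == "__main__":
    n, nb, nprocs = 10, 2, 3
    for p in range(nprocs):
        rows = [g for g in range(n)
                if owner_and_local_index(g, nb, nprocs)[0] == p]
        print(p, rows, local_count(n, nb, p, nprocs))
```

A two-dimensional distribution applies the same mapping independently to rows and columns of the process grid, which is part of why respondents ask for helper routines rather than hand-coded index arithmetic.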








Thu May 23 14:17:31 2013