Thank you; it is good to know that you have some work in progress. You mention running MAGMA distributed across several GPUs on several nodes, but without saying how this may be achieved.
Our problems can generate matrices too big to fit in the memory of even a modern CPU system with a large memory, e.g. 16 Gbytes. So I want to harness more than one node, and at present I use ScaLAPACK for this. We want to gain the advantage of the GPU for these large problems. I know now, largely from working with my recently acquired copy of "CUDA Application Design and Development", that I can combine CUDA and MPI and run e.g. 4 MPI tasks on 2 CPUs, each CPU with a GPU, and have the GPU kernels do work for more than one MPI task. That opens the possibility for ScaLAPACK to take advantage of this.
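To illustrate the arrangement I mean, here is a minimal sketch (in plain Python, since this is just the mapping logic, not MAGMA or CUDA API code) of how node-local MPI ranks might be assigned to GPUs so that several tasks share one device. The function name and parameters are my own illustration, not anything from MAGMA:

```python
def device_for_rank(local_rank: int, gpus_per_node: int) -> int:
    """Return the GPU index a given node-local MPI rank should use.

    With more ranks than GPUs, ranks wrap around and share devices:
    e.g. 4 ranks on a node with 2 GPUs use devices 0, 1, 0, 1.
    """
    if gpus_per_node < 1:
        raise ValueError("node has no GPUs")
    return local_rank % gpus_per_node

# In a real MPI+CUDA program, each rank would call something like
#   cudaSetDevice(device_for_rank(local_rank, gpus_per_node))
# once at start-up, before launching any kernels.
```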
I hope that you have this somewhere on your road map; that is what I am asking. Do you envisage this as (a) worthwhile, and (b) is your team going to do it, or (c) is someone else going to do it and make it available?
An alternative would be for MAGMA to achieve this objective independently of ScaLAPACK, by having its own equivalent of e.g. BLACS. There would still have to be some way to distribute the matrix across the different tasks.
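For concreteness, the distribution I have in mind is the standard block-cyclic layout that ScaLAPACK/BLACS use (applied along each dimension of the matrix for the 2-D case). A small sketch of the one-dimensional index mapping, with my own illustrative function name:

```python
def owner_and_local(i: int, nb: int, p: int) -> tuple[int, int]:
    """Map global index i (0-based) to (owning process, local index)
    under a 1-D block-cyclic distribution with block size nb over p
    processes, as in ScaLAPACK's data layout."""
    block = i // nb            # which global block index i falls into
    proc = block % p           # blocks are dealt out to processes cyclically
    local_block = block // p   # position of that block among the owner's blocks
    return proc, local_block * nb + i % nb
```

For example, with block size 2 and 2 processes, global indices 0..5 land on processes 0, 0, 1, 1, 0, 0 respectively.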
We are end users and not CUDA specialists, so I am reluctant to take this on here.
If we know it is coming, that will affect our choices of new hardware for our problems.
It may be that I just have to be patient.