I work for the Numerical Algorithms Group (NAG). We've been working with a customer who is combining MAGMA with some of our own code, and we ran into this issue as well. The customer (and we!) would like to have a single MAGMA library which can be copied to multiple machines, possibly with different NVIDIA cards, and which detects the GPU architecture at runtime and selects the correct code path to run.
As I understand it, the biggest complication (from the software engineering perspective) is the fact that MAGMA implements its own GPU BLAS functions for several BLAS algorithms, instead of calling into CUBLAS. This is no doubt for performance reasons.
1.) Do you know whether the MAGMA BLAS functions have made it into CUBLAS 4.0?
We've gone ahead and refactored several parts of the MAGMA library (basically enough to have a working Cholesky decomposition) so that it detects the architecture at runtime and calls the correct code path. The easiest way to achieve this seems to be to turn MAGMA BLAS into a "separate library" (or at least to treat it that way conceptually), containing implementations of the BLAS functions you wish to override. Each MAGMA BLAS function should query the device and launch the correct code path, as CUBLAS does. Throughout the rest of the code one can then make plain CUBLAS function calls, and in a global config header one can #define those CUBLAS functions that have been overridden to point at the corresponding MAGMA BLAS functions. This is very similar to what MAGMA does at the moment.
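To make the idea concrete, here is a minimal host-only sketch of the dispatch-plus-#define arrangement described above. It is not the actual MAGMA or CUBLAS code: every name in it (DeviceInfo, query_current_device, magmablas_dgemm_sketch, vendor_dgemm_sketch, the CodePath values) is hypothetical, and the device query is faked so the example compiles without a GPU. In a real build I would expect the query to come from cudaGetDevice and cudaGetDeviceProperties, and each code path to launch a different kernel.

```cpp
#include <cassert>

// Hypothetical record of the device's compute capability. In a CUDA build
// these two numbers would be read from cudaDeviceProp (major, minor).
struct DeviceInfo {
    int cc_major;
    int cc_minor;
};

// Fake device state standing in for the runtime query (assumption: no GPU here).
static DeviceInfo g_fake_device = {2, 0};
DeviceInfo query_current_device() { return g_fake_device; }

// Which implementation ended up being called; only used to observe dispatch.
enum class CodePath { Sm1x, Sm2x, Vendor };

// Two hypothetical architecture-specific implementations of one BLAS routine.
// Real versions would launch kernels tuned for that architecture.
CodePath dgemm_sm1x(int n) { assert(n >= 0); return CodePath::Sm1x; }
CodePath dgemm_sm2x(int n) { assert(n >= 0); return CodePath::Sm2x; }

// The "MAGMA BLAS" entry point: query the device, then pick the code path.
CodePath magmablas_dgemm_sketch(int n) {
    const DeviceInfo d = query_current_device();
    return (d.cc_major >= 2) ? dgemm_sm2x(n) : dgemm_sm1x(n);
}

// Stand-in for the vendor (CUBLAS-like) routine that call sites refer to.
CodePath vendor_dgemm_sketch(int n) { assert(n >= 0); return CodePath::Vendor; }

// What the global config header would contain: routines that MAGMA BLAS
// overrides are redirected by name, so call sites stay unchanged.
#define vendor_dgemm_sketch magmablas_dgemm_sketch
```

With the #define in place, a call written as vendor_dgemm_sketch(n) anywhere after the header resolves to the dispatching routine; removing that one line restores the vendor implementation, which is what would let the set of overrides shrink over time without touching the callers.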
2.) Do you have any feel for how much of CUBLAS MAGMA might override in the future? I imagine the set would shrink as NVIDIA incorporates the BLAS improvements that you have made.
We are quite happy to contribute the changes we've made back to the MAGMA project. However, seeing as multi-architecture support is already in your plans, the obvious question is:
3.) How far has this work progressed?
If it is almost complete, then there is probably no need for our changes. If the work is not very advanced, it might make sense to coordinate efforts and perhaps discuss a design for how best to implement the multi-architecture support. Obviously, if we're going to contribute large changes like this, the MAGMA team would have to be happy with them. In a perfect world there would be no need for a MAGMA BLAS library, so one might hope that it would shrink over time until MAGMA relied only on CUBLAS. This is the rationale for modelling the design and behaviour of a MAGMA BLAS library on CUBLAS.
I would be very interested in any comments/questions/suggestions you may have.
Jacques du Toit