The paper http://www.hpca.uji.es/ficheros/remon/pdp11.pdf discusses how matrix inversion can be performed efficiently using the Gauss-Jordan algorithm. It uses cuBLAS for the Level-3 BLAS operations. The authors report impressive speedups, and I hope they are talking only about single precision.
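
For reference, here is a minimal sketch of Gauss-Jordan inversion with partial pivoting. It is plain, unblocked Python on the CPU and is only meant to show the elimination pattern; it is not the blocked cuBLAS-based scheme from the paper, and the function name `gauss_jordan_inverse` is just something I made up for illustration.

```python
def gauss_jordan_inverse(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination.

    Illustrative only: unblocked, partial pivoting, O(n^3) scalar work.
    """
    n = len(a)
    # Build the augmented matrix [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row, above and below.
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # The right half of [I | A^-1] is the inverse.
    return [row[n:] for row in aug]
```

Because every other row is updated at each step (not just the rows below the pivot), most of the work is a rank-1-style update of the whole matrix, which is presumably why a blocked version maps well onto Level-3 BLAS calls.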

Any ideas on how MAGMA might help with the inversion process? Should we go the LU route and then solve against the identity matrix (TRSM-style triangular solves)?

I assume we could make use of multiple GPUs for the LU factorization, so at least that part should be a bit easier.
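
To make the LU idea concrete, here is a rough sketch of what I mean: factor PA = LU once, then do a forward and a backward triangular solve for each column of the (row-permuted) identity. This is plain Python on the CPU, not MAGMA code; the names `lu_factor` and `lu_inverse` are hypothetical helpers of mine. On the GPU the factorization would be a getrf-style call and the two substitution loops would become TRSM-style solves over all columns at once.

```python
def lu_factor(a):
    """In-place style LU with partial pivoting: returns (lu, perm) with PA = LU.

    L (unit diagonal) is stored below the diagonal of `lu`, U on and above it.
    Row i of PA is row perm[i] of A.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[pivot] = lu[pivot], lu[k]
        perm[k], perm[pivot] = perm[pivot], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]                 # multiplier, stored as L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]  # trailing update
    return lu, perm


def lu_inverse(a):
    """Invert A by solving L U X = P I column by column."""
    n = len(a)
    lu, perm = lu_factor(a)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        # Column `col` of the row-permuted identity P I.
        b = [1.0 if perm[i] == col else 0.0 for i in range(n)]
        # Forward substitution: L y = b (L has a unit diagonal).
        for i in range(n):
            for j in range(i):
                b[i] -= lu[i][j] * b[j]
        # Back substitution: U x = y.
        for i in reversed(range(n)):
            for j in range(i + 1, n):
                b[i] -= lu[i][j] * b[j]
            b[i] /= lu[i][i]
        for i in range(n):
            inv[i][col] = b[i]
    return inv
```

For example, `lu_inverse([[0, 2], [1, 3]])` needs a row swap in the first step, which is why the right-hand sides are columns of P I rather than of I itself.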

