by **mgates3** » Wed May 02, 2012 7:18 pm

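Yes, there are data transfers between the CPU and the GPU. They happen in the Cholesky factorization, magma_potrf_gpu, which must run before magma_potrs_gpu; potrs itself only does triangular solves on the GPU with the computed factor. The factorization proceeds by blocks: each diagonal block is copied to the CPU, factored there, then copied back to the GPU, where the rest of the panel below the diagonal block is updated.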
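The blocked scheme described above can be sketched in pure Python, assuming a lower-triangular, left-looking variant. The "CPU" and "GPU" comments only mark where MAGMA would run each step. Here everything runs on the host, and the function names are hypothetical, not part of MAGMA's API.

```python
import math

def cholesky_unblocked(a):
    """Return lower-triangular L with L L^T = a for a small SPD block.
    Stands in for the LAPACK potrf call made on the CPU."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def blocked_cholesky(A, nb=2):
    """Left-looking blocked Cholesky A = L L^T with block size nb.
    Only the loop structure is meant to resemble the GPU routine."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(0, n, nb):
        jb = min(nb, n - j)
        # "GPU": update the diagonal block with the already-factored columns (syrk).
        D = [[A[j + r][j + c] - sum(L[j + r][k] * L[j + c][k] for k in range(j))
              for c in range(jb)] for r in range(jb)]
        # "GPU": update the panel below the diagonal block (gemm).
        P = [[A[i][j + c] - sum(L[i][k] * L[j + c][k] for k in range(j))
              for c in range(jb)] for i in range(j + jb, n)]
        # "CPU": copy the diagonal block over, factor it, copy the factor back.
        Ld = cholesky_unblocked(D)
        for r in range(jb):
            for c in range(r + 1):
                L[j + r][j + c] = Ld[r][c]
        # "GPU": triangular solve for the panel, L_panel * Ld^T = P (trsm).
        for p, i in enumerate(range(j + jb, n)):
            for c in range(jb):
                s = P[p][c] - sum(L[i][j + k] * Ld[c][k] for k in range(c))
                L[i][j + c] = s / Ld[c][c]
    return L
```

For example, `blocked_cholesky([[4.0, 2.0], [2.0, 5.0]], nb=1)` returns `[[2.0, 0.0], [1.0, 2.0]]`. Only the small jb-by-jb diagonal block crosses to the "CPU" in each iteration, which is why the transfers stay cheap relative to the GPU updates.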

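If your system is tridiagonal (and symmetric positive definite, as potrs requires), you will probably achieve higher performance using LAPACK's tridiagonal routines, pttrf to factor and pttrs to solve (or ptsv, which does both), rather than converting it to a full matrix and using a dense solver in MAGMA. The tridiagonal solver needs only O(n) work and storage, versus roughly n^3/3 flops and n^2 storage for a dense Cholesky factorization.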
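To show why the tridiagonal route is O(n), here is a minimal pure-Python sketch of the L*D*L^T factorization that LAPACK's pttrf computes for a symmetric positive definite tridiagonal matrix, and the solve that pttrs performs with it. It is not LAPACK's code; the function names `pt_factor` and `pt_solve` are hypothetical, and it assumes n >= 1.

```python
def pt_factor(d, e):
    """Factor an SPD tridiagonal matrix as L D L^T.
    d: diagonal (length n >= 1), e: off-diagonal (length n - 1).
    Returns (dd, l): the diagonal of D and the subdiagonal of unit lower bidiagonal L."""
    n = len(d)
    dd = [0.0] * n
    l = [0.0] * (n - 1)
    dd[0] = d[0]
    for i in range(n - 1):
        if dd[i] <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[i] = e[i] / dd[i]
        dd[i + 1] = d[i + 1] - l[i] * e[i]
    if dd[n - 1] <= 0.0:
        raise ValueError("matrix is not positive definite")
    return dd, l

def pt_solve(dd, l, b):
    """Solve A x = b given the L D L^T factors; each sweep is O(n)."""
    n = len(dd)
    x = list(b)
    for i in range(1, n):          # forward substitution: L y = b
        x[i] -= l[i - 1] * x[i - 1]
    for i in range(n):             # diagonal scaling: D z = y
        x[i] /= dd[i]
    for i in range(n - 2, -1, -1): # back substitution: L^T x = z
        x[i] -= l[i] * x[i + 1]
    return x
```

With diagonal `[2.0] * 5` and off-diagonal `[-1.0] * 4`, solving for `b = [0.0, 0.0, 0.0, 0.0, 6.0]` recovers `x = [1, 2, 3, 4, 5]` up to rounding. Only the three diagonals are stored, so the matrix never needs to be expanded to n^2 entries.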

You should almost never compute an explicit inverse. It is generally both more expensive and less accurate than using a factorization and forward/back solves, as in potrs or pttrs. Also, for sparse systems, the explicit inverse is generally dense.
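To illustrate the fill-in point, the sketch below uses the tridiagonal matrix tridiag(-1, 2, -1) (an example chosen here, not taken from the thread). Its inverse has the known closed form (i, j) -> min(i, j) * (n + 1 - max(i, j)) / (n + 1) with 1-based indices, which the code checks exactly using fractions; every entry of that inverse is nonzero. The helper names are hypothetical.

```python
from fractions import Fraction

def laplacian_1d(n):
    """Dense copy of the n x n tridiagonal matrix tridiag(-1, 2, -1)."""
    return [[Fraction(2) if i == j else Fraction(-1) if abs(i - j) == 1 else Fraction(0)
             for j in range(n)] for i in range(n)]

def laplacian_1d_inverse(n):
    """Closed-form inverse: entry (i, j), 1-based, is min(i, j) * (n + 1 - max(i, j)) / (n + 1)."""
    return [[Fraction(min(i, j) * (n + 1 - max(i, j)), n + 1)
             for j in range(1, n + 1)] for i in range(1, n + 1)]

def matmul(A, B):
    """Plain dense matrix product, used only to verify the inverse."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def nnz(M):
    """Count the nonzero entries of a dense matrix."""
    return sum(1 for row in M for v in row if v != 0)
```

For n = 6, `nnz(laplacian_1d(6))` is 16 (that is, 3n - 2) while `nnz(laplacian_1d_inverse(6))` is 36, so storing and applying the inverse costs O(n^2) where the tridiagonal factors from pttrf cost O(n).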

-mark