## potrs : does it do host-gpu memory transfers

Hi

I want to know whether the magma_dpotrs_gpu function involves any GPU-to-CPU and back-to-GPU memory transfers. I ask because the system we solve (with dpotrs) is better suited to sequential than to parallel computation.

I am using MAGMA 1.1 with CUDA 4.1 on a Tesla C2070 with a quad-core CPU.
rohit
itabhiyanta

### Re: potrs : does it do host-gpu memory transfers

The system we are trying to solve is a tridiagonal system.

Ey = s. Here E is tridiagonal [diagonal offsets (-1, 0, 1)]. If E is a d*d matrix, then y and s are d*1 vectors.

I request the moderators to please answer this question, since I am using MAGMA in my project and have to submit a paper in which I refer to my use of MAGMABLAS. I see a considerable difference between solving the system above directly and solving it via the explicit inverse of E (y = E^{-1}s), where I use the dgemv function in MAGMA for the matrix-vector multiplication.

I assume this is because part of the work is done on the CPU, or on the GPU but sequentially. Could you please advise whether this assumption is correct?

thanks and regards

rohit
itabhiyanta

### Re: potrs : does it do host-gpu memory transfers

Yes, there are data transfers between the CPU and the GPU. The Cholesky factorization that potrs relies on (potrf) factors the matrix by blocks. Each diagonal block is copied to the CPU, factored there, then copied back to the GPU. On the GPU, the rest of the panel below the diagonal block is updated.
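
To make the block scheme above concrete, here is a minimal plain-Python sketch of a generic blocked Cholesky factorization. It is not MAGMA's code and does no GPU work; the function names `cholesky_unblocked` and `blocked_cholesky` are made up for illustration, and the comments only mark which step a hybrid scheme like the one described would place on the CPU and which on the GPU.

```python
import math


def cholesky_unblocked(a):
    """Lower Cholesky factor of a small dense block, in place.

    Only the lower triangle is read and written. math.sqrt raises
    ValueError if the block is not positive definite.
    """
    n = len(a)
    for j in range(n):
        s = a[j][j] - sum(a[j][k] ** 2 for k in range(j))
        a[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            t = a[i][j] - sum(a[i][k] * a[j][k] for k in range(j))
            a[i][j] = t / a[j][j]
    return a


def blocked_cholesky(matrix, nb):
    """Return L (lower triangular) with L * L^T = matrix, using blocks of size nb."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for j in range(0, n, nb):
        jb = min(nb, n - j)

        # Step 1: diagonal block. In the hybrid scheme this is the part that
        # is copied to the CPU, factored there, and copied back.
        diag = [a[j + r][j:j + jb] for r in range(jb)]
        cholesky_unblocked(diag)
        for r in range(jb):
            a[j + r][j:j + jb] = diag[r]

        # Step 2: panel below the diagonal block (a triangular solve).
        # In the hybrid scheme this stays on the GPU.
        for i in range(j + jb, n):
            for c in range(jb):
                t = a[i][j + c] - sum(a[i][j + k] * diag[c][k] for k in range(c))
                a[i][j + c] = t / diag[c][c]

        # Step 3: update the lower triangle of the trailing matrix
        # (also GPU work in the hybrid scheme).
        for i in range(j + jb, n):
            for k in range(j + jb, i + 1):
                a[i][k] -= sum(a[i][j + c] * a[k][j + c] for c in range(jb))

    # Clear the strict upper triangle so the result is exactly L.
    for i in range(n):
        for k in range(i + 1, n):
            a[i][k] = 0.0
    return a
```

Calling `blocked_cholesky(A, nb)` on a symmetric positive definite `A` (a list of lists) should give the same factor for any block size `nb`; only the grouping of the work changes.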

If your system is tridiagonal, you will probably achieve higher performance using the tridiagonal solver in LAPACK, pttrs, rather than converting it to a full matrix and using a dense solver in MAGMA.
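
As a rough illustration of why a tridiagonal solver is so much cheaper, here is a plain-Python sketch of the idea behind an L*D*L^T factorization and solve for a symmetric positive definite tridiagonal matrix, which is the approach the LAPACK pttrf/pttrs pair is built on. This is not LAPACK's code: the function names `tridiag_ldlt_factor` and `tridiag_ldlt_solve` are hypothetical, and the sketch assumes the matrix is symmetric positive definite and does no error checking. Both steps cost O(d) operations and store only the diagonal and one off-diagonal.

```python
def tridiag_ldlt_factor(diag, offdiag):
    """Factor a symmetric positive definite tridiagonal matrix as L*D*L^T.

    diag    : the d diagonal entries
    offdiag : the d-1 sub-diagonal entries
    Returns (dfac, lfac): the diagonal of D and the sub-diagonal of the
    unit lower bidiagonal factor L.
    """
    dfac = list(diag)
    lfac = list(offdiag)
    for i in range(len(dfac) - 1):
        e = lfac[i]
        lfac[i] = e / dfac[i]
        dfac[i + 1] -= lfac[i] * e
    return dfac, lfac


def tridiag_ldlt_solve(dfac, lfac, rhs):
    """Solve (L*D*L^T) y = rhs using the factors from tridiag_ldlt_factor."""
    n = len(dfac)
    y = list(rhs)
    # Forward substitution with the unit lower bidiagonal L.
    for i in range(1, n):
        y[i] -= lfac[i - 1] * y[i - 1]
    # Diagonal scaling and back substitution with L^T, combined.
    y[n - 1] /= dfac[n - 1]
    for i in range(n - 2, -1, -1):
        y[i] = y[i] / dfac[i] - lfac[i] * y[i + 1]
    return y
```

For example, with `diag = [2, 2, 2, 2]` and `offdiag = [-1, -1, -1]`, one would call `tridiag_ldlt_factor(diag, offdiag)` once and then `tridiag_ldlt_solve` for each right-hand side s.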

You should almost never compute an explicit inverse. It is generally both more expensive and less accurate than using a factorization and forward/back solves, as in potrs or pttrs. Also, for sparse systems, the explicit inverse is generally dense.
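
The point about the inverse being dense is easy to check on a small case. The sketch below is plain Python with hypothetical helper names (`thomas_solve`, `inverse_of_tridiagonal`, `count_nonzeros`); it uses the textbook Thomas algorithm without pivoting, so it assumes a well-behaved matrix such as a diagonally dominant one. It builds the inverse of a tridiagonal matrix column by column, purely so the nonzero counts can be compared.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system (no pivoting; needs at least 2 unknowns).

    lower, upper : the n-1 sub- and super-diagonal entries
    diag, rhs    : the n diagonal entries and the right-hand side
    """
    n = len(diag)
    cp = [0.0] * (n - 1)
    dp = [0.0] * n
    cp[0] = upper[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = upper[i] / denom
        dp[i] = (rhs[i] - lower[i - 1] * dp[i - 1]) / denom
    x = [0.0] * n
    x[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def inverse_of_tridiagonal(lower, diag, upper):
    """Explicit inverse as a dense list of lists, one solve per unit vector."""
    n = len(diag)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        unit = [1.0 if i == j else 0.0 for i in range(n)]
        col = thomas_solve(lower, diag, upper, unit)
        for i in range(n):
            inv[i][j] = col[i]
    return inv


def count_nonzeros(rows, tol=1e-12):
    """Number of entries whose magnitude exceeds tol."""
    return sum(1 for row in rows for v in row if abs(v) > tol)


# A 5x5 example with 2 on the diagonal and -1 on both off-diagonals.
n = 5
lower = [-1.0] * (n - 1)
diag = [2.0] * n
upper = [-1.0] * (n - 1)
nnz_tridiagonal = 3 * n - 2  # entries actually stored for the tridiagonal matrix
nnz_inverse = count_nonzeros(inverse_of_tridiagonal(lower, diag, upper))
print(nnz_tridiagonal, nnz_inverse)
```

For this particular matrix every entry of the inverse is nonzero, so multiplying by the inverse touches d*d values where the tridiagonal solve touches about 3*d.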

-mark
mgates3