Tridiagonal Solver

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
Posts: 1
Joined: Mon Apr 02, 2012 5:30 am

Tridiagonal Solver

Post by yushan » Mon Apr 02, 2012 6:27 am

Hi everyone,

I am looking for a tridiagonal solver on the GPU. Does anyone have any ideas? Which routine should I use?

Posts: 916
Joined: Fri Jan 06, 2012 2:13 pm

Re: Tridiagonal Solver

Post by mgates3 » Tue Apr 03, 2012 1:28 pm

MAGMA does not have tridiagonal (or banded) solvers for the GPU. My guess is that the tridiagonal solver in LAPACK on the CPU (dgtsv for general matrices, or dptsv for symmetric positive definite ones) is faster than transferring a tridiagonal matrix to the GPU, solving there, and transferring the results back. This is because a tridiagonal solve performs O(n) operations on O(n) memory, so it is memory-bandwidth limited.

Why are you looking for a GPU implementation? Is the tridiagonal solver really a performance bottleneck in your code?
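
To make the O(n) operation count concrete, here is a minimal sketch of the classic Thomas algorithm (Gaussian elimination specialised to a tridiagonal matrix). It is an illustration only, not LAPACK's implementation: dgtsv uses partial pivoting, whereas this sketch does not and therefore assumes a matrix for which elimination without pivoting is stable (for example, a diagonally dominant one). The function name `thomas_solve` and the argument layout are assumptions made for the example.

```python
def thomas_solve(dl, d, du, b):
    """Solve A x = b for tridiagonal A without pivoting.

    dl: sub-diagonal (length n-1), d: diagonal (length n),
    du: super-diagonal (length n-1), b: right-hand side (length n).
    Assumes no pivoting is needed (e.g. A is diagonally dominant).
    """
    n = len(d)
    c = [0.0] * n  # modified super-diagonal
    y = [0.0] * n  # modified right-hand side

    # Forward elimination: one pass, a constant amount of work per row.
    if n > 1:
        c[0] = du[0] / d[0]
    y[0] = b[0] / d[0]
    for i in range(1, n):
        denom = d[i] - dl[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = du[i] / denom
        y[i] = (b[i] - dl[i - 1] * y[i - 1]) / denom

    # Back substitution: a second O(n) pass.
    x = [0.0] * n
    x[n - 1] = y[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = y[i] - c[i] * x[i + 1]
    return x
```

Each loop touches every entry once and does only a few flops per entry, which is why the solve tends to be limited by memory traffic rather than arithmetic.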


Posts: 18
Joined: Tue Jan 25, 2011 8:20 pm

Re: Tridiagonal Solver

Post by brom » Tue Apr 03, 2012 6:50 pm

mgates3 wrote: This is because a tridiagonal solve has O(n) operations on O(n) memory, so is memory bandwidth limited.
This isn't entirely true. Check out NVIDIA's paper on cyclic-reduction algorithms (not in MAGMA).
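
For readers unfamiliar with cyclic reduction, the following is a serial sketch of the basic idea, written in Python for clarity. It is not the code from the NVIDIA paper and not anything in MAGMA. The function name `cyclic_reduction_solve`, the restriction to n = 2^k - 1 unknowns, and the absence of pivoting are all assumptions made for the example. The point to notice is that the updates inside each level are independent of one another, so on a GPU every level could be done in parallel, giving O(log n) levels instead of a length-n sequential sweep.

```python
def cyclic_reduction_solve(a, b, c, r):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = r[i] by cyclic reduction.

    All four lists have length n, with a[0] == 0 and c[n-1] == 0.
    Assumes n = 2**k - 1 and that no pivoting is needed.
    """
    n = len(b)
    assert n > 0 and (n + 1) & n == 0, "sketch assumes n = 2**k - 1"
    a, b, c, r = list(a), list(b), list(c), list(r)

    # Forward reduction: at stride s, eliminate neighbours i-s and i+s
    # from equation i. All i within one level are independent.
    s = 1
    while 2 * s <= n:
        for i in range(2 * s - 1, n, 2 * s):
            alpha = -a[i] / b[i - s]
            gamma = -c[i] / b[i + s]
            b[i] += alpha * c[i - s] + gamma * a[i + s]
            r[i] += alpha * r[i - s] + gamma * r[i + s]
            a[i] = alpha * a[i - s]
            c[i] = gamma * c[i + s]
        s *= 2

    # One equation is left, for the middle unknown.
    x = [0.0] * n
    x[s - 1] = r[s - 1] / b[s - 1]

    # Back substitution: recover the unknowns eliminated at each level.
    s //= 2
    while s >= 1:
        for i in range(s - 1, n, 2 * s):
            left = x[i - s] if i - s >= 0 else 0.0
            right = x[i + s] if i + s < n else 0.0
            x[i] = (r[i] - a[i] * left - c[i] * right) / b[i]
        s //= 2
    return x
```

The total work is still O(n), somewhat more than the sequential algorithm, but it is spread over levels that each expose parallelism.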

Stan Tomov
Posts: 281
Joined: Fri Aug 21, 2009 10:39 pm

Re: Tridiagonal Solver

Post by Stan Tomov » Tue Apr 03, 2012 9:39 pm

Actually, we had discussions with collaborators about including in MAGMA banded and tridiagonal solvers that they had already developed, but as Mark pointed out, we haven't included them yet. If the solver is needed with a CPU interface (input on the CPU and output on the CPU), Mark's remark is correct: by the time the matrix has merely been sent to the GPU over a 5 GB/s connection, the CPU would already have solved the problem. With a GPU interface, though, one does not have to transfer data, and because GPUs also have very high memory bandwidth, one can solve a tridiagonal problem at a speed proportional to that bandwidth.
One application where we needed these is, for example, eigensolvers: first reduce to tridiagonal form and then solve the tridiagonal eigenproblem. If one wants to use shift-and-invert iteration, there is a need for fast (and many) tridiagonal linear solves on the GPU (with no data transfers between solves).
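
As an illustration of why shift-and-invert iteration needs many tridiagonal solves, here is a minimal CPU sketch in Python for a symmetric tridiagonal matrix T. Every iteration solves one system with T - sigma*I. This is not MAGMA code: the names `solve_tridiagonal` and `shift_invert_iteration`, the all-ones starting vector, and the fixed iteration count are assumptions made for the example. The helper does no pivoting, so it can break down if the shift is very close to an eigenvalue.

```python
import math


def solve_tridiagonal(dl, d, du, b):
    """Tridiagonal solve without pivoting (assumes it is stable)."""
    n = len(d)
    c = [0.0] * n
    y = [0.0] * n
    if n > 1:
        c[0] = du[0] / d[0]
    y[0] = b[0] / d[0]
    for i in range(1, n):
        denom = d[i] - dl[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = du[i] / denom
        y[i] = (b[i] - dl[i - 1] * y[i - 1]) / denom
    x = [0.0] * n
    x[n - 1] = y[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = y[i] - c[i] * x[i + 1]
    return x


def shift_invert_iteration(d, e, sigma, iters=50):
    """Approximate the eigenpair of symmetric tridiagonal T nearest sigma.

    d: diagonal (length n), e: off-diagonal (length n-1).
    Assumes the all-ones start vector is not orthogonal to the
    wanted eigenvector.
    """
    n = len(d)
    shifted = [di - sigma for di in d]
    x = [1.0] * n
    for _ in range(iters):
        # One tridiagonal solve per iteration: (T - sigma*I) y = x.
        y = solve_tridiagonal(e, shifted, e, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]

    # Rayleigh quotient x^T T x (x has unit norm).
    lam = 0.0
    for i in range(n):
        tx = d[i] * x[i]
        if i > 0:
            tx += e[i - 1] * x[i - 1]
        if i < n - 1:
            tx += e[i] * x[i + 1]
        lam += x[i] * tx
    return lam, x
```

When all the data already lives on the GPU, each of these solves would run without any host-device transfer in between, which is the situation described above.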
