Hi everyone,
I am looking for a tridiagonal solver on the GPU. Does anyone have any suggestions? Which routine should I use?
Tridiagonal Solver
Re: Tridiagonal Solver
MAGMA does not have tridiagonal (or banded) solvers for the GPU. My guess is that the tridiagonal solver in LAPACK (dgtsv or dptsv) on the CPU is faster than transferring a tridiagonal matrix to the GPU, solving, and transferring the results back. This is because a tridiagonal solve has O(n) operations on O(n) memory, so it is memory-bandwidth limited.
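To make the O(n) operation count concrete, here is a minimal Python sketch of the classic Thomas algorithm (Gaussian elimination without pivoting). It is not MAGMA or LAPACK code: the name `thomas_solve` and the array layout are assumptions chosen for illustration, and unlike dgtsv it does no pivoting, so it is only safe for well-behaved matrices such as diagonally dominant ones.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n) time, without pivoting.

    Assumed layout (hypothetical, for illustration only):
      lower[i] multiplies x[i-1] in row i (lower[0] is ignored),
      upper[i] multiplies x[i+1] in row i (upper[n-1] is ignored).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination: one pass, a constant amount of work per row.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / pivot if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot

    # Back substitution: a second pass, again constant work per row.
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Each row is read and written a fixed number of times and each loop iteration depends on the previous one, which is why the solve is limited by memory traffic rather than arithmetic.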
Why are you looking for a GPU implementation? Is the tridiagonal solver really a performance bottleneck in your code?
mark
Re: Tridiagonal Solver
mgates3 wrote: This is because a tridiagonal solve has O(n) operations on O(n) memory, so it is memory-bandwidth limited.
This isn't entirely true. Check out NVIDIA's paper on cyclic reduction algorithms (not in MAGMA).
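For readers unfamiliar with cyclic reduction, the idea can be sketched in a few lines of Python. This is a sequential illustration of the general technique, not the implementation from the NVIDIA paper; the name `cyclic_reduction_solve` and the array layout are assumptions, and no pivoting is done. Each level eliminates the even-indexed unknowns, leaving a tridiagonal system of about half the size.

```python
def cyclic_reduction_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system by recursive cyclic reduction (no pivoting).

    Assumed layout (hypothetical): lower[i] multiplies x[i-1] and
    upper[i] multiplies x[i+1] in row i; lower[0] and upper[n-1] are ignored.
    """
    n = len(diag)
    a = [0.0] + list(lower[1:])        # sub-diagonal with a[0] forced to 0
    b = list(diag)
    c = list(upper[:n - 1]) + [0.0]    # super-diagonal with c[n-1] forced to 0
    d = list(rhs)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: for each odd row i, eliminate x[i-1] and x[i+1] using
    # rows i-1 and i+1. The iterations are independent of one another,
    # which is what a GPU version would run in parallel.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            a_next, c_next, d_next = a[i + 1], c[i + 1], d[i + 1]
        else:
            gamma = a_next = c_next = d_next = 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - gamma * a_next)
        rc.append(-gamma * c_next)
        rd.append(d[i] - alpha * d[i - 1] - gamma * d_next)

    # Solve the half-size system for the odd-indexed unknowns.
    x = [0.0] * n
    x[1::2] = cyclic_reduction_solve(ra, rb, rc, rd)

    # Back substitution for the even-indexed unknowns (also independent).
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

The method performs more arithmetic than a sequential elimination, but it needs only about log2(n) dependent levels, each of which is fully parallel.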
Re: Tridiagonal Solver
Actually, we had discussions with collaborators about including in MAGMA the banded and tridiagonal solvers that they had already developed but, as Mark pointed out, we haven't included them yet. If the solver is needed with a CPU interface (input on the CPU and output on the CPU), Mark's remark is correct: by the time the matrix has merely been sent to the GPU through a 5 GB/s connection, the CPU would have solved the problem. With a GPU interface, though, one does not have to transfer data, and because GPUs also have very high memory bandwidth, one can solve a tridiagonal problem at a speed proportional to that bandwidth.
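As a rough back-of-the-envelope check of the transfer cost, the sketch below estimates the time to move one system over the link. Only the 5 GB/s figure comes from the post above; the function name `transfer_seconds`, the choice of double precision, and the assumption that three diagonals plus a right-hand side go in and one solution vector comes back are illustrative.

```python
def transfer_seconds(n, link_bytes_per_s=5e9, bytes_per_value=8):
    """Estimate the CPU<->GPU transfer time for one tridiagonal system.

    Assumptions (not from the thread): double-precision values, three
    diagonals and the right-hand side sent to the GPU (stored as n values
    each), and the length-n solution sent back.
    """
    values_in = 4 * n   # sub-, main and super-diagonal, plus right-hand side
    values_out = n      # solution vector
    return (values_in + values_out) * bytes_per_value / link_bytes_per_s


# One million unknowns: 5e6 values * 8 bytes / 5e9 bytes/s = 0.008 s.
print(transfer_seconds(1_000_000))
```

A CPU solve only has to stream roughly the same amount of data through main memory once, which is typically faster than the link, so the transfer alone can cost as much as the whole CPU solve.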
One application where we needed these is, for example, eigensolvers: first reduce to tridiagonal form and then solve the tridiagonal eigenproblem. If one wants to use shift-and-invert iteration, there would be a need for fast (and many) tridiagonal linear solves on the GPU (with no data transfers between solves).
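To illustrate why shift-and-invert needs many tridiagonal solves, here is a minimal Python sketch of inverse iteration on a symmetric tridiagonal matrix T: every step solves (T - shift*I) y = x. It is only an illustration, not MAGMA code. The names `solve_tridiagonal` and `inverse_iteration`, the fixed iteration count, and the all-ones starting vector are assumptions, and a compact non-pivoting solver is inlined so that the sketch runs on its own.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Compact non-pivoting tridiagonal solve (lower[0], upper[n-1] ignored)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / pivot if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def inverse_iteration(lower, diag, upper, shift, iterations=50):
    """Approximate the eigenpair of T whose eigenvalue is closest to `shift`."""
    n = len(diag)
    shifted = [v - shift for v in diag]  # diagonal of T - shift*I
    # Assumed start vector; it must not be orthogonal to the wanted eigenvector.
    x = [1.0] * n
    for _ in range(iterations):
        # One tridiagonal solve per iteration: this is the hot loop.
        y = solve_tridiagonal(lower, shifted, upper, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient x^T T x (x has unit length) as the eigenvalue estimate.
    tx = [
        (lower[i] * x[i - 1] if i > 0 else 0.0)
        + diag[i] * x[i]
        + (upper[i] * x[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]
    return sum(xi * ti for xi, ti in zip(x, tx)), x
```

If the vectors stay on the GPU, each iteration costs one bandwidth-bound solve and no host transfers, which is the scenario described above.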