Tridiagonal Solver

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

Post by yushan » Mon Apr 02, 2012 6:27 am

Hi everyone,

I am looking for a tridiagonal solver on the GPU. Does anyone have any ideas? Which routine should I use?

Re: Tridiagonal Solver

Post by mgates3 » Tue Apr 03, 2012 1:28 pm

MAGMA does not have tridiagonal (or banded) solvers for the GPU. My guess is that the tridiagonal solvers in LAPACK (dgtsv for general tridiagonal matrices, dptsv for symmetric positive definite ones) running on the CPU are faster than transferring a tridiagonal matrix to the GPU, solving, and transferring the results back. This is because a tridiagonal solve has O(n) operations on O(n) memory, so it is memory-bandwidth limited.
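For reference, the O(n) cost shows up clearly in the classic Thomas algorithm (Gaussian elimination specialised to tridiagonal matrices, without pivoting). The Python sketch below only illustrates that cost argument and is not LAPACK's code: dgtsv uses partial pivoting, whereas this version assumes the matrix is diagonally dominant or symmetric positive definite so that skipping pivoting is safe. The function name `thomas_solve` and its argument layout are hypothetical. Each row is touched once in the forward sweep and once in back substitution, with only a few flops per value loaded, which is why memory bandwidth rather than arithmetic sets the speed.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    a: sub-diagonal, length n-1 (a[i-1] multiplies x[i-1] in row i)
    b: main diagonal, length n
    c: super-diagonal, length n-1 (c[i] multiplies x[i+1] in row i)
    d: right-hand side, length n
    Assumes a diagonally dominant or SPD matrix, so no pivoting is needed.
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    # Forward sweep: eliminate the sub-diagonal, O(n) work.
    cp[0] = c[0] / b[0] if n > 1 else 0.0
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i - 1] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / m
    # Back substitution, O(n) work.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

For the matrix with 2 on the diagonal and -1 off the diagonal, `thomas_solve([-1.0]*3, [2.0]*4, [-1.0]*3, [0.0, 0.0, 0.0, 5.0])` should return approximately `[1.0, 2.0, 3.0, 4.0]`.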

Why are you looking for a GPU implementation? Is the tridiagonal solver really a performance bottleneck in your code?

Re: Tridiagonal Solver

Post by brom » Tue Apr 03, 2012 6:50 pm

mgates3 wrote: "This is because a tridiagonal solve has O(n) operations on O(n) memory, so is memory bandwidth limited."

This isn't entirely true. Check out NVIDIA's paper on cyclic-reduction algorithms (not in MAGMA).
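The cyclic-reduction idea can be sketched serially to show where the parallelism comes from. The version below is textbook cyclic reduction written in plain Python for readability; it is not NVIDIA's GPU code, and the function name and full-length array layout (with `a[0] = c[n-1] = 0`) are assumptions for illustration. Each level eliminates the odd-indexed unknowns by combining every odd row with its two even neighbours; those row updates are independent of each other, so a GPU can process a whole level at once, leaving about log2(n) sequential levels. Like the Thomas algorithm, it does no pivoting, so it assumes a diagonally dominant or SPD matrix.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by cyclic reduction (sketch, no pivoting).

    All four lists have length n >= 1, with a[0] == 0 and c[n-1] == 0.
    Row i reads: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    # Reduction: combine each odd row i with rows i-1 and i+1 so that
    # x[i-1] and x[i+1] drop out. Every iteration of this loop is
    # independent, which is what a parallel implementation exploits.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = -a[i] / b[i - 1]
        nb = b[i] + alpha * c[i - 1]
        nd = d[i] + alpha * d[i - 1]
        nc = 0.0
        if i + 1 < n:
            gamma = -c[i] / b[i + 1]
            nb += gamma * a[i + 1]
            nd += gamma * d[i + 1]
            nc = gamma * c[i + 1]
        ra.append(alpha * a[i - 1])
        rb.append(nb)
        rc.append(nc)
        rd.append(nd)
    # Recursively solve the half-size system for the odd unknowns.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)
    # Back-substitution: each even unknown needs only its odd neighbours,
    # so these updates are also independent of each other.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

The trade-off is that cyclic reduction performs somewhat more total arithmetic than the Thomas algorithm, in exchange for O(log n) sequential steps instead of O(n).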

Re: Tridiagonal Solver

Post by Stan Tomov » Tue Apr 03, 2012 9:39 pm

Actually, we have had discussions with collaborators about including in MAGMA the banded and tridiagonal solvers they have already developed, but, as Mark pointed out, we haven't included them yet. If the solver is needed with a CPU interface (input and output on the CPU), Mark's remark is correct: in the time it takes just to send the matrix to the GPU over a 5 GB/s connection, the CPU would already have solved the problem. With a GPU interface, though, no data has to be transferred, and because GPUs also have very high memory bandwidth, a tridiagonal problem can be solved at a speed proportional to that bandwidth.
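To put rough numbers on the transfer argument, here is an estimate assuming double precision (8 bytes per value), 1 GB = 10^9 bytes, a system size of n = 10^6 chosen purely for illustration, and ignoring latency. An n-by-n tridiagonal matrix has 3n - 2 nonzeros, plus n right-hand-side values:

```latex
V \approx (3n - 2 + n) \times 8\ \text{bytes} \approx 32n\ \text{bytes}
\qquad
t_{\text{to GPU}} \approx \frac{32 \times 10^{6}\ \text{bytes}}{5 \times 10^{9}\ \text{bytes/s}} = 6.4\ \text{ms}
\qquad
t_{\text{back}} \approx \frac{8 \times 10^{6}\ \text{bytes}}{5 \times 10^{9}\ \text{bytes/s}} = 1.6\ \text{ms}
```

A CPU solve must stream at least the same ~32n bytes from main memory, so whenever the CPU's memory bandwidth exceeds the 5 GB/s link, it finishes before the transfer alone would. With the data already resident on the GPU, the same volume instead moves at the GPU's much higher memory bandwidth.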
One application where we need these is eigensolvers, for example: first reduce the matrix to tridiagonal form, then solve the tridiagonal eigenproblem. If one wants to use shift-and-invert iteration, many fast tridiagonal linear solves are needed on the GPU (with no data transfers between solves).
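As an illustration of why such an eigensolver needs many tridiagonal solves, here is shifted inverse (shift-and-invert) iteration for a symmetric tridiagonal matrix T: every iteration is one solve with T - σI. This is a plain-Python sketch, not MAGMA or LAPACK code. The function names are hypothetical, the inner solve is the Thomas algorithm without pivoting (safe in the example only because the chosen shift keeps T - σI positive definite), and it refactors the matrix on every iteration for simplicity, whereas a production code would factor T - σI once per shift, with pivoting, and reuse the factorization. (LAPACK's dstein, for instance, computes eigenvectors of a symmetric tridiagonal matrix by inverse iteration.)

```python
import math


def solve_tridiag(sub, diag, sup, rhs):
    """Thomas algorithm, no pivoting. Row i reads
    sub[i-1]*x[i-1] + diag[i]*x[i] + sup[i]*x[i+1] = rhs[i]."""
    n = len(diag)
    cp = [0.0] * n
    dp = [0.0] * n
    for i in range(n):
        m = diag[i] - (sub[i - 1] * cp[i - 1] if i > 0 else 0.0)
        cp[i] = sup[i] / m if i < n - 1 else 0.0
        dp[i] = (rhs[i] - (sub[i - 1] * dp[i - 1] if i > 0 else 0.0)) / m
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = dp[i] - (cp[i] * x[i + 1] if i < n - 1 else 0.0)
    return x


def shifted_inverse_iteration(diag, off, sigma, iters=50):
    """Eigenpair of the symmetric tridiagonal T (diagonal `diag`,
    off-diagonal `off`) whose eigenvalue is closest to the shift `sigma`."""
    n = len(diag)
    shifted = [di - sigma for di in diag]  # diagonal of T - sigma*I
    x = [1.0] * n
    for _ in range(iters):
        # One tridiagonal solve per iteration: (T - sigma*I) y = x.
        y = solve_tridiag(off, shifted, off, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient x^T T x (x has unit 2-norm) estimates the eigenvalue.
    tx = [diag[i] * x[i]
          + (off[i - 1] * x[i - 1] if i > 0 else 0.0)
          + (off[i] * x[i + 1] if i < n - 1 else 0.0)
          for i in range(n)]
    lam = sum(u * v for u, v in zip(x, tx))
    return lam, x
```

For T = tridiag(-1, 2, -1) of size 10, whose smallest eigenvalue is 2 - 2cos(π/11) ≈ 0.081, `shifted_inverse_iteration([2.0]*10, [-1.0]*9, 0.05)` should converge to that eigenvalue, since 0.05 lies closer to it than to any other eigenvalue. Finding many eigenvectors means repeating this for many shifts, hence many tridiagonal solves.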
Stan Tomov