4 posts
• Page **1** of **1**

Hi everyone,

I am looking for a tridiagonal solver on the GPU. Does anyone have any ideas? Which routine should I use?

- yushan
**Posts:** 1 **Joined:** Mon Apr 02, 2012 5:30 am

MAGMA does not have tridiagonal (or banded) solvers for the GPU. My guess is that the tridiagonal solver in LAPACK (dgtsv or dptsv) on the CPU is faster than transferring a tridiagonal matrix to the GPU, solving, and transferring the results back. This is because a tridiagonal solve has O(n) operations on O(n) memory, so it is memory-bandwidth limited.
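To make the O(n) claim concrete, here is a minimal sketch of the Thomas algorithm, the textbook tridiagonal elimination without pivoting. It is not LAPACK's code: dgtsv uses Gaussian elimination with partial pivoting, so treat this as an illustration only. The function name `thomas_solve` and the array layout are assumptions made for this example.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system in O(n) time, without pivoting.

    a: sub-diagonal (a[0] is unused), b: main diagonal,
    c: super-diagonal (c[-1] is unused), d: right-hand side.
    Assumes no pivot becomes zero, e.g. a diagonally dominant matrix.
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward sweep: eliminate the sub-diagonal one row at a time.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution: each unknown depends on the one after it.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

Each row is read and written a constant number of times and only a few flops are done per value, which is why the solve is limited by memory traffic rather than arithmetic. Both loops are also strictly sequential, so this form does not map well to a GPU.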

Why are you looking for a GPU implementation? Is the tridiagonal solver really a performance bottleneck in your code?

-mark

- mgates3
**Posts:** 705 **Joined:** Fri Jan 06, 2012 2:13 pm

mgates3 wrote: This is because a tridiagonal solve has O(n) operations on O(n) memory, so it is memory-bandwidth limited.

This isn't entirely true. Check out NVIDIA's paper on cyclic-reduction algorithms (not in MAGMA).
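For readers who have not seen the idea, here is a rough sketch of parallel cyclic reduction (PCR), one member of the cyclic-reduction family. It is written as plain sequential Python for clarity; whether it matches the kernels in the paper is not something this thread confirms. The name `pcr_solve` and the array layout are assumptions made for this example, and there is no pivoting, so it assumes a well-conditioned (e.g. diagonally dominant) matrix.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system with parallel cyclic reduction.

    a: sub-diagonal (a[0] ignored), b: main diagonal,
    c: super-diagonal (c[-1] ignored), d: right-hand side.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0   # the first equation has no left neighbour
    c[-1] = 0.0  # the last equation has no right neighbour

    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
        # Every i is independent within one step; on a GPU this loop
        # would become one thread per equation.
        for i in range(n):
            lo, hi = i - stride, i + stride
            if lo >= 0:
                # Use equation lo to eliminate x[lo] from equation i.
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                # Use equation hi to eliminate x[hi] from equation i.
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2

    # After about log2(n) steps every equation involves only its own unknown.
    return [d[i] / b[i] for i in range(n)]
```

The trade-off is that PCR does O(n log n) total work instead of O(n), but it needs only about log2(n) dependent steps, each of which is fully parallel across the n equations.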

- brom
**Posts:** 18 **Joined:** Tue Jan 25, 2011 8:20 pm

Actually, we had discussions with collaborators about including in MAGMA the banded and tridiagonal solvers that they had already developed, but as Mark pointed out, we haven't included them yet. If the solver is needed in a CPU interface (input on the CPU and output on the CPU), Mark's remark is correct: by the time the matrix has merely been sent to the GPU over a 5 GB/s connection, the CPU would already have solved the problem. With a GPU interface, though, the data is already on the GPU and does not have to be transferred, and because GPUs also have very high memory bandwidth, one can solve a tridiagonal problem at a speed proportional to that bandwidth.
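As a back-of-the-envelope check on the transfer cost, the sketch below estimates how long the CPU-interface round trip takes at the 5 GB/s quoted above. The helper name `transfer_seconds`, the double-precision (8-byte) values, and the approximation of each diagonal as n entries are assumptions made for this example.

```python
def transfer_seconds(n, bandwidth_bytes_per_s=5e9, bytes_per_value=8):
    """Estimate the host<->GPU transfer time for one tridiagonal solve of size n.

    Counts three diagonals plus the right-hand side going to the GPU
    (about n values each) and the solution vector coming back.
    """
    values_in = 4 * n   # sub-, main and super-diagonal, plus right-hand side
    values_out = n      # solution vector
    total_bytes = (values_in + values_out) * bytes_per_value
    return total_bytes / bandwidth_bytes_per_s
```

For n = 1,000,000 this gives roughly 8 ms spent on transfers alone, before any GPU work is counted. Since the CPU solve touches a similar amount of data, it only has to stream that data through host memory at more than about 5 GB/s to finish first.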

One application where we needed these is, for example, eigensolvers: first reduce to tridiagonal form and then solve the tridiagonal eigenproblem. If one wants to use shift-and-invert iteration, there would be a need for fast (and many) tridiagonal linear solves on the GPU (with no data transfers between solves).
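To show where the many solves come from, here is a small sketch of shift-and-invert (inverse) iteration on a symmetric tridiagonal matrix. Every iteration is one tridiagonal solve with the shifted matrix. This is plain CPU Python using an unpivoted Thomas solve, not MAGMA code; the names `thomas_solve` and `inverse_iteration` and the storage layout are assumptions made for this example.

```python
import math

def thomas_solve(a, b, c, d):
    """O(n) tridiagonal solve without pivoting (a[0] and c[-1] unused)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def inverse_iteration(sub, diag, shift, iters=30):
    """Approximate the eigenpair of T whose eigenvalue is nearest `shift`.

    T is symmetric tridiagonal: diag[i] = T[i][i] and
    sub[i] = T[i][i-1] = T[i-1][i] (sub[0] is unused).
    """
    n = len(diag)
    lower = list(sub)
    upper = [sub[i + 1] for i in range(n - 1)] + [0.0]
    shifted = [t - shift for t in diag]
    x = [1.0 / math.sqrt(n)] * n  # arbitrary starting vector
    for _ in range(iters):
        y = thomas_solve(lower, shifted, upper, x)  # one solve per iteration
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient x^T T x gives the eigenvalue estimate.
    lam = 0.0
    for i in range(n):
        tx = diag[i] * x[i]
        if i > 0:
            tx += sub[i] * x[i - 1]
        if i < n - 1:
            tx += sub[i + 1] * x[i + 1]
        lam += x[i] * tx
    return lam, x
```

For example, `inverse_iteration([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], 0.5)` targets the smallest eigenvalue of the 3-by-3 matrix with 2 on the diagonal and -1 off it, which is 2 - sqrt(2). A shift very close to an eigenvalue makes the shifted matrix nearly singular, so a production solver would want pivoting or another safeguard.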

- Stan Tomov
**Posts:** 256 **Joined:** Fri Aug 21, 2009 10:39 pm
