Actually, we had discussions with collaborators about including in MAGMA banded and tridiagonal solvers that they had already developed, but as Mark pointed out, we haven't included them yet. If the solver is needed in a CPU interface (input on CPU and output on CPU) Mark's remark is correct - by the time the matrix is only sent to the GPU through a 5 GB/s connection, the CPU would have solved the problem. In the GPU interface though one does not have to transfer data, and because GPUs have also very high bandwidth, one can solve a tridiagonal problem in speed proportional to that bandwidth.
One application that we needed these are for example eigensolvers - first reduce to tridiagonal and then solve the tridiagonal eigenproblem. If one wants to use shift and invert iteration, there would be need for fast (and many) tridiagonal linear solvers on the GPU (no data transfers between solvers).