Hello!

I was looking around at options for generalized SVD solvers for CUDA, and MAGMA really seems to take the cake. However, I'm not sure it's really appropriate for my (modest) purposes.

I'm investigating whether I should move my current (Eigen-based) CPU solver to the GPU. The solves are small (640x480 point cloud alignment).

The questions:

1. Even if my problem is already initialized on the GPU, if I use MAGMA I must let MAGMA perform the CPU -> GPU copy, correct? This is not a complaint; I'm just trying to make sure I understand.

2. Does MAGMA allow users to specify the streams on which copy / compute operations are performed?

3. For a problem this small, does it even make sense to port this to the GPU? From my research and the implementations I've looked at, it seems that a generalized SVD solve is just not a good fit for the GPU anyway.

Thank you for any advice. I apologize if this question is not appropriate for this forum.

## High level advice: is MAGMA for me?

### Re: High level advice: is MAGMA for me?

Some answers:

1. Yes, currently MAGMA's SVD takes the matrix in CPU host memory, and internally allocates GPU memory and copies the matrix to the GPU as needed.

2. No, MAGMA's SVD internally creates its own CUDA streams. The SVD is synchronous. (Some MAGMA routines do take a MAGMA queue, which wraps a CUDA stream.)

3. Is your matrix 640 x 480? That's pretty small -- don't expect much, if any, performance boost from the GPU.

If you are using Eigen's Jacobi SVD, switching to LAPACK (e.g., dgesdd for Divide and Conquer, or dgejsv if you really want Jacobi) will give you a substantial performance boost. My tests of Eigen's accuracy show that it is not any more accurate than LAPACK's dgesvd or dgesdd routines. (One-sided Jacobi in dgejsv is more accurate for certain types of poorly scaled matrices.)
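For readers unfamiliar with the term, here is a rough sketch of what "one-sided Jacobi" means: plane rotations are applied directly to pairs of columns until all columns are mutually orthogonal, and the column norms are then the singular values. This is a plain-Python illustration of the textbook Hestenes scheme, not what dgejsv or Eigen actually implement (those add preconditioning, pivoting and blocking). The function name, `tol` and `max_sweeps` are made up for this example, and it assumes a real matrix with at least as many rows as columns.

```python
import math

def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=60):
    """Sketch: return (u, s, v) with a ~= u * diag(s) * v^T.

    `a` is an m x n list of lists of floats, m >= n (assumption).
    """
    m, n = len(a), len(a[0])
    w = [row[:] for row in a]  # working copy; its columns converge to u_j * s_j
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(w[i][p] * w[i][p] for i in range(m))
                beta = sum(w[i][q] * w[i][q] for i in range(m))
                gamma = sum(w[i][p] * w[i][q] for i in range(m))
                # Columns p and q already orthogonal to working precision.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that makes the two columns orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                sn = c * t
                # Apply the same rotation to columns p, q of both w and v.
                for mat, rows in ((w, m), (v, n)):
                    for i in range(rows):
                        xp, xq = mat[i][p], mat[i][q]
                        mat[i][p] = c * xp - sn * xq
                        mat[i][q] = sn * xp + c * xq
        if not rotated:
            break
    # Column norms are the singular values; sort them in descending order.
    sigma = [math.sqrt(sum(w[i][j] ** 2 for i in range(m))) for j in range(n)]
    order = sorted(range(n), key=lambda j: -sigma[j])
    u = [[w[i][j] / sigma[j] if sigma[j] > 0.0 else 0.0 for j in order] for i in range(m)]
    v_sorted = [[v[i][j] for j in order] for i in range(n)]
    return u, [sigma[j] for j in order], v_sorted

a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
u, s, v = one_sided_jacobi_svd(a)
```

Because the rotations act on the columns themselves and the matrix is never reduced to bidiagonal form first, small singular values tied to badly scaled columns tend to be computed with better relative accuracy, which is the property the parenthetical above refers to. For production use, call the LAPACK routine rather than a loop like this.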

-mark
