Hello,

We are happy you find MAGMA useful for your research!

The reason for the different versions (which, indeed, can make their use confusing!) is performance considerations. If you need to explicitly generate both Q and R, you must use version 3. To see how it is used, look at the files dgels3_gpu.cpp and dgeqrs3_gpu.cpp. For example, you may need to call:

    // nb must be the same in all of the calls below
    magma_int_t nb = magma_get_dgeqrf_nb(m);

    // QR factorization (version 3: R is kept separately in dT)
    magma_dgeqrf3_gpu(m, n, dA, ldda, tau, dT, &info);

    // copy dA to dQ (leading dimension m) to generate Q in dQ
    cudaMemcpy2D(dQ, m*sizeof(double), dA, ldda*sizeof(double),
                 m*sizeof(double), n, cudaMemcpyDeviceToDevice);

    // copy dA to dR (leading dimension m) to generate R in dR
    cudaMemcpy2D(dR, m*sizeof(double), dA, ldda*sizeof(double),
                 m*sizeof(double), n, cudaMemcpyDeviceToDevice);

    // generate Q in dQ
    magma_dorgqr_gpu(m, m, min(m, n), dQ, m, tau, dT, nb, &info);

    // generate R in dR: swap the diagonal blocks saved in dT back into dR
    magmablas_dswapdblk(min(m, n), nb, dR, m, 1, dT + min(m, n)*nb, nb, 0);
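The two cudaMemcpy2D calls copy an m x n column-major matrix whose leading dimension is ldda into a buffer whose leading dimension is m. A host-side sketch of the same copy is shown below; copy_colmajor is a hypothetical helper, not a CUDA or MAGMA function. In cudaMemcpy2D terms, each "row" is one column of m doubles (width = m*sizeof(double)), there are n of them (height = n), and the pitches are the two leading dimensions in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host analogue of the pitched device-to-device copies above.
// Copies the m x n column-major matrix src (leading dimension ldsrc) into
// dst (leading dimension lddst), one column of m doubles at a time.
void copy_colmajor(int m, int n, const double* src, int ldsrc,
                   double* dst, int lddst)
{
    assert(m >= 0 && n >= 0 && ldsrc >= m && lddst >= m);
    for (int j = 0; j < n; ++j) {
        // column j starts j*ld elements into each buffer
        std::memcpy(dst + static_cast<std::size_t>(j) * lddst,
                    src + static_cast<std::size_t>(j) * ldsrc,
                    static_cast<std::size_t>(m) * sizeof(double));
    }
}
```

With ldsrc = ldda and lddst = m this mirrors the copies into dQ and dR: only the leading m rows of each of the n columns are copied, and any padding rows between m and ldda are skipped.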

The reason for this is that, to make operations involving Q fast, we put zeros in the upper triangular parts of the panels and ones on the diagonals (the Householder vectors are stored below the diagonal, as in LAPACK). This destroys R, though, so we have to store it separately (in dT). To regenerate R, we provide the magmablas_dswapdblk routine (see above).

An example of how to avoid the copies above, if you just need to solve a least squares problem, is given in dgeqrs3_gpu.cpp. There we multiply by Q' and solve with R: we first generate R in place, use it to solve, and then move the data back to its original state so that Q can still be applied or used directly if needed.
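The storage trick described above can be illustrated on the host. This is only a sketch, not MAGMA's code: save_and_unitize_block and swap_diag_blocks are hypothetical names. We assume the workspace keeps each whole original nb x nb diagonal block. The block addressing in swap_diag_blocks (block k of A at offset k*nb*(lda + inca)) is our reading of the inc arguments in the magmablas_dswapdblk call above. It is chosen so that inc = 1 on dR walks down its diagonal, while ld = nb and inc = 0 on dT walk through consecutive nb x nb blocks. Unlike that call, which passes min(m, n), the sketch takes the number of full blocks directly.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: save the nb x nb diagonal block starting at A
// (column-major, leading dimension lda) into work (leading dimension nb),
// then make the block in A explicitly unit lower triangular: zeros above the
// diagonal, ones on it, Householder vector entries below it left unchanged.
void save_and_unitize_block(int nb, double* A, int lda, double* work)
{
    assert(nb >= 0 && lda >= nb);
    for (int j = 0; j < nb; ++j) {
        for (int i = 0; i < nb; ++i) {
            double& a = A[i + static_cast<std::size_t>(j) * lda];
            work[i + static_cast<std::size_t>(j) * nb] = a;  // keep original entry (R or V)
            if (i < j)       a = 0.0;                         // strictly upper part: zero
            else if (i == j) a = 1.0;                         // diagonal: one
        }
    }
}

// Hypothetical host analogue of a diagonal-block swap: exchange nblocks
// nb x nb blocks. Block k of A starts at A + k*nb*(lda + inca), block k of
// B at B + k*nb*(ldb + incb).
void swap_diag_blocks(int nblocks, int nb,
                      double* A, int lda, int inca,
                      double* B, int ldb, int incb)
{
    for (int k = 0; k < nblocks; ++k) {
        double* a = A + static_cast<std::size_t>(k) * nb * (lda + inca);
        double* b = B + static_cast<std::size_t>(k) * nb * (ldb + incb);
        for (int j = 0; j < nb; ++j)
            for (int i = 0; i < nb; ++i)
                std::swap(a[i + static_cast<std::size_t>(j) * lda],
                          b[i + static_cast<std::size_t>(j) * ldb]);
    }
}
```

Calling swap_diag_blocks a second time with the same arguments swaps the blocks back. This corresponds to restoring the data after R has been used in place.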
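For reference, the least squares path (multiply by Q', then solve with R) can be sketched on the CPU. The sketch uses an unblocked Householder QR in the LAPACK storage format mentioned above: R in the upper triangle, Householder vectors below the diagonal with an implicit unit diagonal, and scalar factors in tau. This is a plain textbook algorithm, not MAGMA's blocked GPU code. The function names are hypothetical, and the matrix is assumed to have full column rank with m >= n.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Unblocked Householder QR. A is m x n, column-major, leading dimension lda.
// On return R is in the upper triangle, v_k (with implicit v_k[k] = 1) is
// below the diagonal of column k, and H_k = I - tau[k] v_k v_k^T.
void householder_qr(int m, int n, double* A, int lda, double* tau)
{
    assert(m >= n && lda >= m);
    for (int k = 0; k < n; ++k) {
        double* a = A + static_cast<std::size_t>(k) * lda;  // column k
        double norm2 = 0.0;
        for (int i = k; i < m; ++i) norm2 += a[i] * a[i];
        const double norm = std::sqrt(norm2);
        if (norm == 0.0) { tau[k] = 0.0; continue; }        // nothing to eliminate
        const double alpha = a[k];
        const double beta = (alpha >= 0.0) ? -norm : norm;   // sign avoids cancellation
        tau[k] = (beta - alpha) / beta;
        const double scale = 1.0 / (alpha - beta);
        for (int i = k + 1; i < m; ++i) a[i] *= scale;       // store v_k below the diagonal
        a[k] = beta;                                         // R(k,k)
        for (int j = k + 1; j < n; ++j) {                    // apply H_k to trailing columns
            double* c = A + static_cast<std::size_t>(j) * lda;
            double w = c[k];
            for (int i = k + 1; i < m; ++i) w += a[i] * c[i];
            w *= tau[k];
            c[k] -= w;
            for (int i = k + 1; i < m; ++i) c[i] -= w * a[i];
        }
    }
}

// b := Q' b, applying H_0, H_1, ..., H_{n-1} in turn.
void apply_qt(int m, int n, const double* A, int lda, const double* tau, double* b)
{
    for (int k = 0; k < n; ++k) {
        const double* v = A + static_cast<std::size_t>(k) * lda;
        double w = b[k];
        for (int i = k + 1; i < m; ++i) w += v[i] * b[i];
        w *= tau[k];
        b[k] -= w;
        for (int i = k + 1; i < m; ++i) b[i] -= w * v[i];
    }
}

// Solve R x = b(0:n-1) by back substitution; x overwrites b(0:n-1).
void solve_r(int n, const double* A, int lda, double* b)
{
    for (int i = n - 1; i >= 0; --i) {
        double s = b[i];
        for (int j = i + 1; j < n; ++j) s -= A[i + static_cast<std::size_t>(j) * lda] * b[j];
        b[i] = s / A[i + static_cast<std::size_t>(i) * lda];
    }
}
```

For example, fitting y = x0 + x1 t to the points (0, 1), (1, 3), (2, 5), with A = [1 0; 1 1; 1 2] and b = (1, 3, 5), gives x = (1, 2).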

Stan