It seems that the right routines for doing this in MAGMA are magma_dgeqrf_gpu and magma_dorgqr_gpu.
However, the R matrix computed by magma_dgeqrf_gpu is not the same as the one returned by magma_dgeqrf2_gpu
and LAPACK. Does magma_dgeqrf_gpu use a different scheme to store R?
Also, what is the purpose of magma_dgeqrf3_gpu? The documentation says that some parts of R are stored
separately, but this is not quite clear to me.
Thanks for your wonderful library, its source code is an excellent way of learning how to use the GPU.
We are happy you find MAGMA useful for your research!
The reason for the different versions (which indeed can make their use confusing!) is performance. If you need to explicitly generate both Q and R, you must use version 3. For examples of how to use it, look at the files dgels3_gpu.cpp and dgeqrs3_gpu.cpp. For instance, you may have to call:
```
// get nb - it has to be the same in all of these calls
magma_int_t nb = magma_get_dgeqrf_nb(m);

// do the QR factorization
magma_dgeqrf3_gpu(m, n, dA, ldda, tau, dT, &info);

// copy dA to dQ (leading dimension m) to generate Q in dQ
cudaMemcpy2D(dQ, m*sizeof(double), dA, ldda*sizeof(double),
             m*sizeof(double), n, cudaMemcpyDeviceToDevice);

// copy dA to dR (leading dimension m) to generate R in dR
cudaMemcpy2D(dR, m*sizeof(double), dA, ldda*sizeof(double),
             m*sizeof(double), n, cudaMemcpyDeviceToDevice);

// generate the m-by-m Q in dQ (dQ needs room for m columns)
magma_dorgqr_gpu(m, m, min(m, n), dQ, m, tau, dT, nb, &info);

// generate R: swap the diagonal blocks kept in dT into dR
magmablas_dswapdblk(min(m, n), nb, dR, m, 1, dT + min(m, n)*nb, nb, 0);
```
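To make the last call concrete, here is a plain-C CPU sketch of what the swap appears to do, inferred only from the arguments passed above (an assumption, not something taken from MAGMA's documentation): the nb-by-nb diagonal blocks of dR (leading dimension ldr, increment 1, so block b presumably starts at b*nb*(ldr+1)) are exchanged with nb-by-nb blocks stored back to back in dT (leading dimension nb, increment 0, so block b starts at b*nb*nb). The helpers swap_diag_blocks and zero_below_diag are hypothetical names. The sketch swaps only the full blocks (k / nb of them); how a trailing partial block is handled is not stated in the thread, so check the MAGMA source for that case. zero_below_diag then clears the strictly lower part, which still holds reflector data rather than entries of R.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU analogue of
     magmablas_dswapdblk(k, nb, dR, ldr, 1, dT + k*nb, nb, 0)
   as inferred from its arguments: swap the full nb-by-nb diagonal
   blocks of R (column-major, leading dimension ldr) with nb-by-nb
   blocks stored back to back in T (leading dimension nb). */
static void swap_diag_blocks(int k, int nb, double *R, int ldr, double *T)
{
    assert(nb > 0 && ldr >= k);
    for (int b = 0; b < k / nb; ++b) {                /* full blocks only */
        double *Rb = R + (size_t)b * nb * (ldr + 1);  /* diagonal block b of R */
        double *Tb = T + (size_t)b * nb * nb;         /* block b of T */
        for (int j = 0; j < nb; ++j)
            for (int i = 0; i < nb; ++i) {
                double tmp = Rb[i + (size_t)j * ldr];
                Rb[i + (size_t)j * ldr] = Tb[i + (size_t)j * nb];
                Tb[i + (size_t)j * nb] = tmp;
            }
    }
}

/* Zero the strictly lower part of an m-by-n column-major matrix so that
   only the upper trapezoid (the R factor) remains. */
static void zero_below_diag(int m, int n, double *A, int lda)
{
    for (int j = 0; j < n; ++j)
        for (int i = j + 1; i < m; ++i)
            A[i + (size_t)j * lda] = 0.0;
}
```

Because the swap is an exchange rather than a copy, whatever dR held in those diagonal blocks ends up in dT afterwards, which is relevant to the question below about the order of the calls.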
Also, I'm attempting to compile my code and keep getting an error that there is an undefined reference to magma_stream in libmagmablas.a. Any thoughts?
```
cudaStream_t magma_stream = 0;
```
```
extern cudaStream_t magma_stream;
```
From looking at the code, magmablas_dswapdblk swaps values between dR and dT. It also seems to me, from the code and your reply, that if I compute Q before extracting R from dT, I would destroy the information about R that is stored in dT, yet that is precisely the order used in the example above. Additionally, since magmablas_dswapdblk swaps values, calling it before computing Q would destroy information in dT that is needed to compute Q. Therefore, shouldn't it be dT that is temporarily copied, and not dA?
Lastly, I still don't understand why I keep getting an error stating that magma_stream is undefined. I've stripped testing_sgeqrf_gpu.cpp and testing_sorgqr.cpp down to their bare minimums, and it seems that the only header needed to call magma_sgeqrf_gpu and magma_sorgqr_gpu is magma.h. Additionally, I link exactly the same libraries for my code as for the two testing files. The testing files compile and run, but my code still produces the same error.
As always, your help is much appreciated.
Thank you very much for your help, it works wonderfully.
I had the same problem: the linker cannot resolve the circular dependency between libmagmablas and libmagma on its own, so libmagma has to be listed again after libmagmablas.
Try linking with something like this: -L$(MAGMAPATH)/lib -lmagma -lmagmablas -lmagma
You can try out the suggestion about the copies; I am not sure I understand what you need to keep and generate. The example above works, but it may contain copies that you do not need in your specific case; it is only meant to show how to use the routines. Another example is solving the least squares problem QRx = b (see files dgels3_gpu.cpp and dgeqrs3_gpu.cpp): there Q is never generated, only applied as Q^T to b, and R is generated in order to do a triangular solve.
Thanks, that did the trick.
Thanks for all your help. I was able to figure out how to read off just the diagonal. I was wondering how you guys would like me to cite the use of MAGMA in a publication.
Here are a few publications that we cite for the one-sided factorizations and solvers, the two-sided factorizations and solvers, and MAGMA BLAS, respectively.
S. Tomov, J. Dongarra, and M. Baboulin, "Towards dense linear algebra for hybrid GPU accelerated manycore systems," Parallel Computing, vol. 36, no. 5-6, pp. 232-240, 2010.
S. Tomov, R. Nath, and J. Dongarra, "Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing," Parallel Computing, vol. 36, no. 12, pp. 645-654, December 2010.
R. Nath, S. Tomov, and J. Dongarra, "An Improved MAGMA GEMM for Fermi GPUs," International Journal of High Performance Computing Applications, vol. 24, no. 4, pp. 511-515, 2010, ISSN 1094-3420.
Here is also a site with the MAGMA-related publications.