MAGMA + pycuda + my own CUDA kernels

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

Re: MAGMA + pycuda + my own CUDA kernels

Post by mrader1248 » Mon Oct 16, 2017 4:07 am

Just as a reminder: I want to obtain both Q and R.

When I use magma_zgeqrf2_gpu, I have direct access to R, but there is no matching function to generate Q: magma_zungqr and magma_zungqr2 both require A to be in host memory, and magma_zungqr_gpu requires the dT array, which magma_zgeqrf2_gpu does not return.

If I use magma_zgeqrf3_gpu instead, can I use magma_zungqr_gpu to obtain Q and the code from testing_zgeqrf_gpu.cpp to restore R?

Just as a small side question: What are the computational complexities of *geqrf* and *ungqr*? Is the cost of *ungqr* negligible compared to *geqrf*, and is that the reason why there is only a CPU *ungqr* to go with magma_zgeqrf2_gpu?


Re: MAGMA + pycuda + my own CUDA kernels

Post by mgates3 » Wed Oct 18, 2017 5:37 pm

With the currently available functions, you can use either
  • magma_zgeqrf2_gpu( dA ), copy dA to A on host, magma_zungqr2( A )
  • magma_zgeqrf3_gpu( dA, dT ), copy dA to dQ, magma_zungqr_gpu( dQ, dT ), reconstruct R in dA using bits from dT
Another option might be
  • magma_zgeqrf2_gpu( dA ), copy dA to wA on host, set dQ = identity on GPU [magmablas_zlaset( zero, one, dQ )], magma_zunmqr2_gpu( dA, dQ, wA )
That magma_zunmqr2_gpu was written for a particular use in the eigenvalue codes, so it's weird in taking both dA (on GPU) and wA (on host).

There's no particular reason that magma_zungqr2_gpu doesn't exist. We've just never needed it.

For a real, square matrix:
geqrf is 4/3 n^3 flops
ungqr is 4/3 n^3 flops
In complex, those get multiplied by about 4.
For rectangular matrices, it depends on what part of Q you want. LAPACK Working Note (LAWN) 41 has detailed flop counts for most of the routines (listed under the single-precision names: sgeqrf, sorgqr, etc.).
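To make the counts above concrete, here is a small sketch that evaluates the standard leading-order flop formulas for real matrices. The function names are made up for illustration, the lower-order terms given in LAWN 41 are dropped, and the factor of 4 for complex is the approximate figure quoted above, not an exact count.

```python
def geqrf_flops(m, n):
    """Leading-order real flops for QR of an m x n matrix (assumes m >= n)."""
    if m < n:
        raise ValueError("this sketch only covers m >= n")
    return 2.0 * m * n**2 - (2.0 / 3.0) * n**3


def ungqr_flops(m, n, k):
    """Leading-order real flops to form the m x n matrix Q from k reflectors."""
    return 4.0 * m * n * k - 2.0 * (m + n) * k**2 + (4.0 / 3.0) * k**3


COMPLEX_FACTOR = 4.0  # rough multiplier for the z* routines, as quoted above

n = 1000
square_geqrf = geqrf_flops(n, n)      # equals 4/3 n^3
square_ungqr = ungqr_flops(n, n, n)   # also 4/3 n^3: not negligible

# Tall-skinny case: forming only the first n columns of Q (k = n)
tall_geqrf = geqrf_flops(10 * n, n)
tall_ungqr = ungqr_flops(10 * n, n, n)

print(square_geqrf, square_ungqr, tall_geqrf, tall_ungqr)
print(COMPLEX_FACTOR * square_geqrf)  # rough estimate for zgeqrf
```

With these leading-order terms, forming Q explicitly costs as much as the factorization itself, both for the square case and for the first n columns of a tall matrix.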

Often, you can use unmqr (multiply by Q) instead of ungqr (generate explicit Q), but not always.
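The difference between the two is easier to see on a toy example. The sketch below is plain Python, not MAGMA: it is an unblocked, real-valued Householder QR in the spirit of geqrf, with hypothetical helper names. Applying the stored reflectors to a vector plays the role of unmqr, and applying them to the identity gives an explicit Q, which is the same idea as the "set dQ = identity, then multiply by Q" option.

```python
import math


def apply_reflector(v, tau, c):
    """In place: C <- (I - tau * v v^T) C, with C stored as a list of rows."""
    for col in range(len(c[0])):
        dot = sum(v[i] * c[i][col] for i in range(len(v)))
        for i in range(len(v)):
            c[i][col] -= tau * v[i] * dot


def householder_qr(a):
    """Return (r, vs, taus) with Q = H_0 H_1 ... H_{n-1}, H_j = I - tau_j v_j v_j^T."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    vs, taus = [], []
    for j in range(n):
        x0 = r[j][j]
        norm = math.sqrt(sum(r[i][j] ** 2 for i in range(j, m)))
        v, tau = [0.0] * m, 0.0
        if norm > 0.0:
            beta = -math.copysign(norm, x0)   # new diagonal entry of R
            v[j] = 1.0                        # reflector scaled so v[j] = 1
            for i in range(j + 1, m):
                v[i] = r[i][j] / (x0 - beta)
            tau = (beta - x0) / beta
        apply_reflector(v, tau, r)
        for i in range(j + 1, m):
            r[i][j] = 0.0                     # clear rounding noise below the diagonal
        vs.append(v)
        taus.append(tau)
    return r, vs, taus


def apply_q(vs, taus, c, transpose=False):
    """unmqr-like: overwrite C with Q C (or Q^T C) without ever forming Q."""
    order = range(len(vs)) if transpose else reversed(range(len(vs)))
    for j in order:
        apply_reflector(vs[j], taus[j], c)
    return c


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
r, vs, taus = householder_qr(a)

# "Multiply by Q": compute Q^T b directly from the reflectors.
b = [[1.0], [1.0], [1.0]]
qtb = apply_q(vs, taus, [row[:] for row in b], transpose=True)

# "Generate explicit Q": apply Q to the identity matrix.
m = len(a)
identity = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
q = apply_q(vs, taus, identity)
```

If all you need is a product such as Q^T b (for example in a least-squares solve), the first form avoids storing Q at all. The explicit Q is only needed when later code requires the matrix entries themselves.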
