Stan, is there any plan to implement other cases for magma_dormqr_gpu?
It seems that only A = Q' * A is implemented, but my application requires computing A = A * Q.
magma_dormqr_gpu should be much faster than forming Q explicitly with magma_dorgqr_gpu and then multiplying with cublasDgemm.
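
To make the request concrete, here is a minimal NumPy sketch (not MAGMA code) of the operation in question. It is only an illustration under assumptions: the function names are hypothetical, and np.linalg.qr stands in for the GPU factorization. It shows the fallback of forming Q explicitly and multiplying, plus the identity A * Q = (Q' * A')', which suggests a left-side, transposed application might be usable on a transposed copy of A in the meantime.

```python
import numpy as np


def right_apply_explicit(a, m):
    """Compute A * Q by forming Q explicitly, then doing a plain matrix product.

    Analogue of the orgqr + gemm fallback. Q is the full orthogonal factor
    of the QR factorization of m.
    """
    q, _ = np.linalg.qr(m, mode="complete")
    return a @ q


def right_apply_via_left(a, m):
    """Compute A * Q using only a left-side product with Q' (Q transposed).

    Uses the identity A * Q = (Q' * A')'. NumPy still forms Q here; the
    product q.T @ a.T merely stands in for a left-side, transposed ormqr call.
    """
    q, _ = np.linalg.qr(m, mode="complete")
    return (q.T @ a.T).T


rng = np.random.default_rng(0)
m = rng.standard_normal((5, 3))   # matrix whose QR factorization defines Q (5 x 5)
a = rng.standard_normal((4, 5))   # matrix to be multiplied on the right by Q

same = np.allclose(right_apply_explicit(a, m), right_apply_via_left(a, m))
print("A*Q matches (Q'*A')':", same)
```

The transpose route costs two extra transposes of A, so it would only be a stopgap compared with a native right-side implementation.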
Thanks
