Hello everyone,

I'm doing some research on the QR factorization and its implementation on GPUs for my thesis. I am looking at MAGMA's dgeqrf_gpu and dgeqrf routines from the latest version, 2.0.1, and want to use one of them to factorize large dense matrices with more rows than columns.

My first question is: What is the difference between these two routines? Does dgeqrf_gpu run entirely on the GPU?

Furthermore, and even more importantly, I would like to understand how the QR factorization is implemented and what the GPU-CPU communication looks like. Is there any detailed documentation, or are there papers, on this topic? So far I have only found explanations of the dgeqrf routine from version 1.0.0 or 1.1.0. Does anybody know what has changed since then?

Is there perhaps a paper proposing improvements to an earlier version that have since been implemented in the current version?

I hope that you can help me and I'm looking forward to trying out some calculations on the GPU using MAGMA.

Thanks,

nahla