MAGMA 1.1 Released
Re: MAGMA 1.1 Released
The release announcement (http://icl.cs.utk.edu/magma/news/news.html?id=278) seems to indicate that version 1.1 can perform QR factorization on a GPU as well as on multiple GPUs, but the Magma 1.1 computational routines table (http://icl.cs.utk.edu/projectsfiles/mag ... ines2.png) lists QR factorization as a CPU-only function. Which is it?

 Posts: 264
 Joined: Fri Aug 21, 2009 10:39 pm
Re: MAGMA 1.1 Released
"CPU" does not mean that only CPUs are used; it means "CPU interface": the input data and the output result are expected to be in CPU memory. "GPU", or "GPU interface", means that both the input matrix and the output are in GPU memory. In either case, both the GPUs and the CPUs are used.
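A toy model of the distinction above may help. It is a hypothetical sketch, not MAGMA code, and it deliberately simplifies: the names `Buf`, `copy_to`, `hybrid_compute`, `factor_cpu_interface` and `factor_gpu_interface` are made up, and it models only where the caller's input and output live. The real hybrid routines also exchange panels between CPU and GPU during the factorization instead of doing a single copy each way.

```python
# Toy model (hypothetical, not MAGMA code): tracks only where the data lives.
log = []  # record of copies and computation, for illustration


class Buf:
    def __init__(self, space, data):
        self.space = space  # "host" (CPU memory) or "device" (GPU memory)
        self.data = data


def copy_to(buf, space):
    """Simulated transfer between host and device memory."""
    log.append(f"copy {buf.space}->{space}")
    return Buf(space, list(buf.data))


def hybrid_compute(dA):
    # Stand-in for the shared CPU+GPU work; both interfaces end up here.
    assert dA.space == "device"
    log.append("compute")
    return dA


def factor_gpu_interface(dA):
    """GPU interface: the input and the output are both in GPU memory."""
    assert dA.space == "device"
    return hybrid_compute(dA)


def factor_cpu_interface(A):
    """CPU interface: the input and the output are both in CPU memory."""
    assert A.space == "host"
    return copy_to(hybrid_compute(copy_to(A, "device")), "host")
```

Calling `factor_cpu_interface(Buf("host", [1.0, 2.0]))` logs a host-to-device copy, the computation and a device-to-host copy, while `factor_gpu_interface` on a device buffer logs only the computation; in both cases the same computational core runs.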
For the case of QR, if you have more than one GPU, you can set the environment variable MAGMA_NUM_GPUS to the number of GPUs you would like to use. For example, setting
Code:
setenv MAGMA_NUM_GPUS 4
will result in using 4 GPUs in subsequent calls to magma_{s,d,c,z}geqrf. (setenv is csh/tcsh syntax; in bash, use export MAGMA_NUM_GPUS=4.)
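If the runs are driven from a script, the same setting can be scoped to a single child process instead of the whole shell session. This is a sketch that assumes the MAGMA-linked program reads MAGMA_NUM_GPUS from its own environment, as the setenv example implies. The helper names and the binary path in the usage note are hypothetical; `os.environ` and `subprocess.run(..., env=..., check=True)` are standard Python.

```python
import os
import subprocess


def env_with_gpus(n, base=None):
    """Return a copy of the environment (os.environ by default) with MAGMA_NUM_GPUS=n."""
    if n < 1:
        raise ValueError("need at least one GPU")
    env = dict(os.environ if base is None else base)  # copy; never mutate the caller's env
    env["MAGMA_NUM_GPUS"] = str(n)                    # environment values must be strings
    return env


def run_with_gpus(cmd, n):
    """Run cmd (an argv list) with MAGMA_NUM_GPUS=n set only for that child process."""
    return subprocess.run(cmd, env=env_with_gpus(n), check=True)
```

For example, `run_with_gpus(["./testing_sgeqrf"], 4)` would run an (assumed) MAGMA test binary with 4 GPUs requested, leaving the parent environment unchanged.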
Re: MAGMA 1.1 Released
Thanks Stan. If this is the case, it would seem that all input data to the magma_sgeqrf functions would have to reside in CPU memory, but in testing_sgeqrf_gpu.cpp it appears to me that d_A is in device memory. So does SGEQRF also have a GPU interface, in which case the "Computation Routines in Magma 1.1" table I linked earlier needs to be updated?
Also, what is the difference between sgeqrf_gpu, sgeqrf2_gpu, and sgeqrf3_gpu?

Re: MAGMA 1.1 Released
Yes, this is a typo: QR has both a CPU and a GPU interface. Thanks for pointing this out. We will fix it.
Regarding the different versions: sgeqrf2_gpu is LAPACK-consistent in terms of input and output data layout. The sgeqrf_gpu version also stores the triangular matrices used in the factorization. sgeqrf3_gpu stores the triangular matrices too, but in addition it modifies the storage for the Householder vectors used in the factorization: 0s are put in the upper triangular parts of the panels and 1s on the diagonal, while the upper triangular parts are stored separately. See also this discussion topic.
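To make the layouts concrete, here is a small pure-Python sketch (not MAGMA code). `householder_qr` produces the LAPACK xGEQRF layout that sgeqrf2_gpu is described as matching: R on and above the diagonal, the essential part of each Householder vector below it (its leading 1 implicit), and the scalars tau returned separately. `split_panel_triangles` is a hypothetical illustration of the sgeqrf3_gpu change described above. The panel width `nb` and returning the saved triangles as a list are assumptions; where MAGMA actually keeps them is not shown here.

```python
import math


def householder_qr(A):
    """Unblocked Householder QR in LAPACK xGEQRF storage; A (list of rows) is overwritten.

    Returns tau, so that H_k = I - tau[k] * v_k v_k^T with v_k[k] = 1 implicit.
    """
    m, n = len(A), len(A[0])
    tau = []
    for k in range(min(m, n)):
        alpha = A[k][k]
        xnorm = math.sqrt(sum(A[i][k] ** 2 for i in range(k + 1, m)))
        if xnorm == 0.0:
            tau.append(0.0)  # nothing to annihilate: H_k = I
            continue
        beta = -math.copysign(math.hypot(alpha, xnorm), alpha)
        tau_k = (beta - alpha) / beta
        scale = 1.0 / (alpha - beta)
        for i in range(k + 1, m):
            A[i][k] *= scale  # essential part of v_k, stored below the diagonal
        A[k][k] = beta        # diagonal entry of R
        tau.append(tau_k)
        # Apply H_k to each trailing column c: c -= tau_k * v * (v^T c).
        for j in range(k + 1, n):
            w = A[k][j] + sum(A[i][k] * A[i][j] for i in range(k + 1, m))
            w *= tau_k
            A[k][j] -= w
            for i in range(k + 1, m):
                A[i][j] -= w * A[i][k]
    return tau


def split_panel_triangles(A, nb):
    """Hypothetical sketch of the sgeqrf3_gpu-style storage change.

    For each nb-wide panel, copy out the upper triangle of its diagonal block
    (part of R), then write 0s above the diagonal and 1s on it, so the panel
    holds explicit unit-lower-triangular Householder vectors.
    """
    kmax = min(len(A), len(A[0]))
    saved = []
    for p in range(0, kmax, nb):
        w = min(nb, kmax - p)
        saved.append([[A[p + i][p + j] if j >= i else 0.0 for j in range(w)]
                      for i in range(w)])
        for i in range(w):
            for j in range(i, w):
                A[p + i][p + j] = 1.0 if i == j else 0.0
    return saved
```

For A = [[3, 1], [4, 2]], `householder_qr` leaves R = [[-5, -2.2], [0, 0.4]] in the upper triangle, 0.5 below the diagonal, and returns tau = [1.6, 0.0]. The "triangular matrices" that sgeqrf_gpu additionally stores are, in LAPACK terms, the T factors of the blocked reflectors (see xLARFT), which this sketch does not compute.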