### MAGMA 1.1 Released

Posted: **Fri Nov 18, 2011 4:52 pm**

Posted: **Thu Nov 24, 2011 8:03 pm**

The release announcement http://icl.cs.utk.edu/magma/news/news.html?id=278 seems to indicate that version 1.1 can perform QR factorization on the GPU as well as on multiple GPUs, but the MAGMA 1.1 computational routines table http://icl.cs.utk.edu/projectsfiles/magma/magma-routines-2.png lists QR factorization as a CPU-only function. Which is it?

Posted: **Fri Nov 25, 2011 12:05 am**

"CPU" does not mean that only CPUs are used; it means "CPU interface" (the input data and the output result are expected to be in CPU memory). "GPU" or "GPU interface" means that both the input matrix and the output are in GPU memory. In either case, both the GPUs and the CPUs are used.

For QR, if you have more than one GPU, you can set the environment variable MAGMA_NUM_GPUS to the number of GPUs you would like to use. For example, setting

```csh
setenv MAGMA_NUM_GPUS 4
```

will make subsequent calls to magma_{s,d,c,z}geqrf use 4 GPUs.

Posted: **Sat Nov 26, 2011 1:57 pm**

Thanks Stan. If this is the case, it would seem that the input data to the magma_sgeqrf functions must all reside in CPU memory, yet in testing_sgeqrf_gpu.cpp d_A appears to be in device memory. So does SGEQRF also have a GPU interface? If so, the "Computation Routines in Magma 1.1" table I linked earlier needs to be updated.

Also, what is the difference between sgeqrf_gpu, sgeqrf2_gpu, and sgeqrf3_gpu?

Posted: **Sun Nov 27, 2011 2:14 am**

Yes, this is a typo: QR has both a CPU and a GPU interface. Thanks for pointing this out; we will fix it.

Regarding the different versions: sgeqrf2_gpu is LAPACK-consistent in its input and output data layout. The sgeqrf_gpu version stores the triangular matrices used in the factorization. sgeqrf3_gpu stores the triangular matrices as well, but in addition modifies the storage of the Householder vectors used in the factorization: 0s are put in the upper triangular parts of the panels and 1s on the diagonal, and the upper triangular parts are stored separately. See also this discussion topic.
