## variables in sgetrf_gpu.cpp

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

### variables in sgetrf_gpu.cpp

Hi everybody,

I am looking at the sgetrf_gpu.cpp file to understand how clmagma performs LU factorization.
However, I am very new to clmagma and to programming, so I am having a hard time understanding the code.
Here is the relevant part of sgetrf_gpu.cpp:

```cpp
/* Use hybrid blocked code. */
maxm = ((m + 31)/32)*32;
maxn = ((n + 31)/32)*32;

lddat = maxn;
lddwork = maxm;
```

Could anybody explain what the variables lddat and lddwork stand for, and also nb and dAT?
Also, could anybody explain why (m + 31) is divided by 32 and then multiplied by 32?

n_n


### Re: variables in sgetrf_gpu.cpp

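lda is the leading dimension of the matrix A: in column-major storage, it is the distance in memory (counted in elements) from the start of one column to the start of the next. An m x n matrix may be a submatrix of a larger lda x n matrix in memory. For example, using Matlab notation,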
A = [
11, 12, 13
21, 22, 23
31, 32, 33
41, 42, 43
]
has lda = 4.

A2 = A( 3:4, 2:3 )
would be
A2 = [
32, 33
42, 43
]
In this case, A2 is a 2x2 sub-matrix of A, so its leading dimension (lda) is still 4. (I mean A2 is literally a sub-matrix of A, not a copy of a sub-matrix of A.)
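To make the sub-matrix example concrete, here is a minimal sketch of column-major indexing with a leading dimension. The helper names (`at`, `submatrix`, `make_example`) are made up for illustration and are not part of MAGMA; indices are 0-based, unlike the Matlab notation above.

```cpp
#include <cassert>
#include <vector>

// The 4 x 3 example matrix A stored column by column, so lda = 4.
std::vector<float> make_example() {
    return {11, 21, 31, 41,    // column 1
            12, 22, 32, 42,    // column 2
            13, 23, 33, 43};   // column 3
}

// Element (i, j) of a column-major matrix with leading dimension lda (0-based).
inline float& at(float* A, int lda, int i, int j) {
    assert(lda > 0 && i >= 0 && j >= 0);
    return A[i + j * lda];
}

// Pointer to the sub-matrix whose top-left corner is row i0, column j0 of A.
// Nothing is copied: the sub-matrix lives inside A's storage and keeps A's lda.
inline float* submatrix(float* A, int lda, int i0, int j0) {
    assert(lda > 0 && i0 >= 0 && j0 >= 0);
    return A + i0 + j0 * lda;
}
```

With these helpers, `submatrix(A.data(), 4, 2, 1)` corresponds to A( 3:4, 2:3 ), and its elements are still addressed with lda = 4, even though the sub-matrix itself has only 2 rows.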

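We prefix matrices with "d" to mean on the device, so dA is the matrix A on the GPU device. The leading dimension of dA is then ldda, to distinguish it from lda for A. dAT is dA transposed, so that it is in row-major order instead of column-major order. By the same naming pattern, lddat is the leading dimension of dAT, and lddwork should be the leading dimension of a workspace array on the device.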

((m + 31) / 32) * 32
is a cryptic way of rounding m up to the next multiple of 32. Remember that integer division of non-negative values takes the floor, so read this as
floor( (m + 31) / 32 ) * 32
The +31 forces rounding up, so in effect it is
ceil( m / 32 ) * 32
The ldda is rounded up because the GPU is most efficient at reading data when each column is aligned on a 32-word boundary.
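The rounding trick is easy to check with a few values. Here is a small sketch; `round_up` is a name chosen for this example, not a MAGMA function.

```cpp
#include <cassert>

// Rounds x up to the next multiple of `multiple` using integer division.
// Assumes x >= 0 and multiple > 0, so the division truncates like floor().
inline int round_up(int x, int multiple) {
    assert(x >= 0 && multiple > 0);
    return ((x + multiple - 1) / multiple) * multiple;
}
```

For example, `round_up(33, 32)` gives 64 and `round_up(32, 32)` stays at 32, so a value that is already a multiple of 32 is left unchanged.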

nb is the block size. That is, we process nb columns of A at a time.
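To picture what "nb columns at a time" means, the sketch below only lists the column panels that a blocked loop would visit. It is an illustration of the loop structure, not the actual loop in sgetrf_gpu.cpp, and `column_panels` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Returns (first column, panel width) for each block of at most nb columns
// of a matrix with n columns. The last panel may be narrower than nb.
std::vector<std::pair<int, int>> column_panels(int n, int nb) {
    assert(n >= 0 && nb > 0);
    std::vector<std::pair<int, int>> panels;
    for (int j = 0; j < n; j += nb) {
        int jb = std::min(nb, n - j);  // width of the current panel
        panels.emplace_back(j, jb);
    }
    return panels;
}
```

With n = 10 and nb = 4, this yields panels starting at columns 0, 4, and 8 with widths 4, 4, and 2.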

-mark
mgates3
