## Question about linear solver sgetrf_gpu in Magma 0.2

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

### Question about linear solver sgetrf_gpu in Magma 0.2

Hi:

I am trying to understand how the linear solver "magma_sgetrf_gpu" works.

I think (but I am not sure) that the solver uses some kind of padding technique.

Am I right?

Looking at the code in "testing_sgesv_gpu.cpp", I have some questions:

- Allocating memory for the matrix on the GPU:

```cpp
status = cublasAlloc( (N+32)*(N+32) + 32*maxnb + lwork + 2*maxnb*maxnb,
                      sizeof(float), (void**)&d_A );
if (status != CUBLAS_STATUS_SUCCESS) {
```

- Sending the matrix to the GPU:

```cpp
int dlda = (N/32)*32;
if (dlda<N) dlda+=32;

...

cublasSetMatrix( N, N, sizeof(float), A, N, d_A, dlda );
```

This confuses me a little because dlda is not N... what happens here?

- Solving the system:

```cpp
magma_sgetrf_gpu(&N, &N, d_A, &dlda, IPIV, h_work_M_S, INFO);
```


### Re: Question about linear solver sgetrf_gpu in Magma 0.2

The solver first transposes the matrix in GPU memory. The CUDA kernel that does this uses a block size of 32, and we request a slightly larger matrix so that we do not have to code the transpose operation for general matrix sizes. This will most probably change in future releases. When the next panel has to be processed, it is first transposed (to move it back to the standard data layout that LAPACK expects) and then sent to the CPU, where it is factored using LAPACK. The workspace on the GPU needed for this and other operations is requested by the user, to be given as a single pointer.
```cpp
int dlda = (N/32)*32;
if (dlda<N) dlda+=32;
cublasSetMatrix( N, N, sizeof(float), A, N, d_A, dlda );
```

Here we just make the device lda of d_A divisible by 32 (and no smaller than N). This is where the matrix is copied and transposed in place. The rest of the memory is used as workspace. So, to answer your question, we do "padding" just for the transpose operation, not for BLAS, and in future releases we will remove the need for the "padding" in the transpose operation.
Stan Tomov


### Re: Question about linear solver sgetrf_gpu in Magma 0.2

My concern is whether this behaviour changes the way I work with my algorithms. I do several CUBLAS computations with the matrix A (matrix x matrix and matrix x vector) before applying the LU to it.

I think the only change this requires in my algorithms is the amount of space allocated for A on the GPU. I am not sure yet if I must change anything else.

Thanks again.