## what is the size of d_A (A matrix on device) for sgesv_gpu?

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

### what is the size of d_A (A matrix on device) for sgesv_gpu?

I'm having trouble trying to figure out why the size of the matrix A on the device (d_A) in the example testing_sgesv_gpu.cpp is set to ((N+32)*(N+32) + 32*maxnb + lwork+2*maxnb*maxnb). The manual says that rest of the space will be used for workspace.
So I have an N x N matrix A on device memory where I need update only the on-diagonal elements of A (e.g. elements of row 0 col 0, row 1 col1, row2 col2.....) before feeding into magma sgesv solver. Currently I have a simple kernel to do this:
Code: Select all
`__global__ void d_update_matrix(float * A, float * beta, int total, int offset){   int i = blockIdx.x * blockDim.x + threadIdx.x;   if (i<total)   {      //only update the on-diagonal elements for A      A[i*offset] += beta[i];   }}`

where offset = N+1.
How would these indices for the on-diagonal elements correspond in d_A of size ((N+32)*(N+32) + 32*maxnb + lwork+2*maxnb*maxnb)? I tried to update the on-diagonal elements my matrix A then copy to d_A using cudamemcpyDeviceToDevice but it does not seem to work. So I would like to copy the elements of A to d_A and update d_A directly but i cannot figure out how the indexing would correspond in d_A. When I print out ((N+32)*(N+32) + 32*maxnb + lwork+2*maxnb*maxnb) it gives me 25937, where N = 73.
I am in the process of comparing the performance of MAGMA, CULA (commercial), and CUSP (sparse matrix solver being developed by people from NVIDIA). So far MAGMA seems to have the best results, but if I can figure this out I would be able to present complete results.
Thanks!
egyptwangja

Posts: 3
Joined: Thu Apr 22, 2010 12:57 pm

### Re: what is the size of d_A (A matrix on device) for sgesv_gpu?

A strip around the matrix is required for padding (to make the new size divisible by 32). This would result in potentially increasing the leading dimension of the matrix, e.g., see in testing_sgesv_gpu.cpp
Code: Select all
`int dlda = (N/32)*32;if (dlda<N) dlda+=32;`

Thus, if you have allocated enough memory, your program should work, but to update elements on the diagonal (and in general any element of the matrix) you must use d_update_matrix with offset equal to dlda+1 instead of with offset N+1. Please let us know if this didn't work (i.e., to jump through columns you have to increase your indexes with multiples of dlda, not N) .
Thanks,
Stan
Stan Tomov

Posts: 258
Joined: Fri Aug 21, 2009 10:39 pm