## Very slow dgetmatrix

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

### Very slow dgetmatrix

I'm using MAGMA to speed up computing a matrix inverse. Here is the main part of the code.
```
   double *h_A, *h_R;
   double *d_A, *dwork, *work, tmp;
   int i, j;
   magma_int_t *ipiv;
   magma_int_t lda = n, ldda=((n+31)/32)*32;
   magma_int_t info, n2=n*n;
   magma_int_t ldwork, lwork;
   work=&tmp;

   lwork = int( MAGMA_D_REAL( *work ));
   ldwork = n * magma_get_dgetri_nb( n );
   TESTING_MALLOC(    ipiv,  magma_int_t,     n      );
   TESTING_MALLOC(    work,  double, lwork  );
   TESTING_MALLOC(    h_A,   double, n2     );
   TESTING_HOSTALLOC( h_R,   double, n2     );
   TESTING_DEVALLOC(  d_A,   double, ldda*n );
   TESTING_DEVALLOC(  dwork, double, ldwork );
   for (i = 0; i < n; i++) {
      for (j = 0; j < n; j++) {
         h_A[n * i + j] = a.data[i][j];
      }
   }
   magma_dsetmatrix( n, n, h_A, lda, d_A, ldda );
   magma_dgetrf_gpu( n, n, d_A, ldda, ipiv, &info );
   magma_dgetri_gpu(n, d_A, ldda, ipiv, dwork, ldwork, &info);
   magma_dgetmatrix( n, n, d_A, ldda, h_R, lda );
```

Using the time and difftime functions, I found that for a 5000x5000 matrix it takes 9 seconds to execute magma_dgetmatrix. Is it always this slow, or is the problem my video card, an NVidia GeForce GT 424M? The distro is Debian Wheezy, and I used the distro's drivers.
aptypr

Posts: 1
Joined: Tue Oct 16, 2012 7:36 am

### Re: Very slow dgetmatrix

The magma_dgetmatrix is a thin wrapper around cublasGetMatrix, mainly for platform independence and type checking. The performance issue may be your PCIe bus. It should be about the same time to do setmatrix as getmatrix.
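For a sense of scale, here is a rough back-of-the-envelope sketch of what the copy alone should cost. The helper names are made up for illustration, and the 1 GB/s figure in the usage note is only an assumed round number, not a measurement of this card or bus. A 5000x5000 matrix of doubles is 200 MB, so a 9 second copy would imply only about 22 MB/s.

```cpp
#include <cstddef>

// Size in bytes of an n x n matrix of doubles (hypothetical helper).
double matrix_bytes(std::size_t n) {
    return static_cast<double>(n) * static_cast<double>(n) * sizeof(double);
}

// Expected copy time in seconds at an assumed sustained bandwidth (bytes/s).
double transfer_seconds(std::size_t n, double bytes_per_second) {
    return matrix_bytes(n) / bytes_per_second;
}

// Bandwidth in bytes/s implied by a measured copy time.
double implied_bandwidth(std::size_t n, double measured_seconds) {
    return matrix_bytes(n) / measured_seconds;
}
```

For example, `transfer_seconds(5000, 1e9)` gives the time at an assumed 1 GB/s, and `implied_bandwidth(5000, 9.0)` gives the rate that the reported 9 seconds would correspond to. If that implied rate is far below what setmatrix achieves on the same machine, the timer is probably measuring something other than the copy.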

Also be aware of timing asynchronous functions. The getri_gpu may be asynchronous (i.e., return before the GPU is finished), in which case the getmatrix would appear to be much longer because it has to wait for getri to finish. Best to do cudaDeviceSynchronize() before each timer call if you're not sure whether calls are async or not. For example:

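```
struct timeval t1, t2, t3;   // needs <sys/time.h>

cudaDeviceSynchronize();     // make sure earlier GPU work is finished
gettimeofday( &t1, NULL );

magma_dgetri_gpu( n, d_A, ldda, ipiv, dwork, ldwork, &info );
cudaDeviceSynchronize();     // wait for getri to finish
gettimeofday( &t2, NULL );

magma_dgetmatrix( n, n, d_A, ldda, h_R, lda );
cudaDeviceSynchronize();
gettimeofday( &t3, NULL );

// t2 - t1 is the getri time, t3 - t2 is the getmatrix time
```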
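The effect can also be seen without a GPU. The sketch below is a CPU-only analogy, not MAGMA or CUDA code: `FakeDevice`, `launch`, `synchronize`, `blocking_copy` and `time_phases` are all hypothetical names. `launch` returns immediately while the "work" is still pending, and the blocking copy has to wait for it, just as getmatrix may have to wait for an asynchronous getri.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Seconds elapsed since `start`.
double seconds_since(Clock::time_point start) {
    return std::chrono::duration<double>(Clock::now() - start).count();
}

// Hypothetical stand-in for a GPU with one queue of pending work.
struct FakeDevice {
    Clock::time_point busy_until = Clock::now();

    // Asynchronous "kernel launch": records when the work will be done
    // and returns at once.
    void launch(int work_ms) {
        busy_until = Clock::now() + std::chrono::milliseconds(work_ms);
    }

    // Blocks until pending work is done (the role of cudaDeviceSynchronize()).
    void synchronize() const { std::this_thread::sleep_until(busy_until); }

    // Blocking copy: waits for pending work first; the copy itself is
    // modeled as free so that any measured time is pure waiting.
    void blocking_copy() const { synchronize(); }
};

struct Timings {
    double compute;   // seconds charged to the "getri" phase
    double transfer;  // seconds charged to the "getmatrix" phase
};

// Time both phases. Without the sync, the wait for the pending work is
// charged to the transfer phase instead of the compute phase.
Timings time_phases(int work_ms, bool sync_before_timer) {
    FakeDevice dev;

    Clock::time_point t1 = Clock::now();
    dev.launch(work_ms);
    if (sync_before_timer) dev.synchronize();
    double compute = seconds_since(t1);

    Clock::time_point t2 = Clock::now();
    dev.blocking_copy();
    double transfer = seconds_since(t2);

    return Timings{compute, transfer};
}
```

Calling `time_phases(300, false)` charges almost all of the 300 ms to `transfer`, while `time_phases(300, true)` charges it to `compute` and leaves `transfer` near zero. The total is the same in both cases; only the attribution changes.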

-mark
mgates3

Posts: 778
Joined: Fri Jan 06, 2012 2:13 pm
