Very slow dgetmatrix

Post by aptypr » Tue Oct 16, 2012 7:43 am

I'm using MAGMA to speed up computing a matrix inverse. Here is the main part of the code.
   double *h_A, *h_R;
   double *d_A, *dwork, *work, tmp;
   int i, j;
   magma_int_t *ipiv;
   magma_int_t lda = n, ldda = ((n+31)/32)*32;   // device leading dimension, rounded up to a multiple of 32
   magma_int_t info, n2 = n*n;
   magma_int_t ldwork, lwork;
   work = &tmp;

   // NOTE: tmp is never assigned in this excerpt. Presumably a workspace query
   // (not shown) fills it before this line; otherwise lwork is undefined.
   lwork = int( MAGMA_D_REAL( *work ));
   ldwork = n * magma_get_dgetri_nb( n );        // size of the GPU workspace for dgetri
   TESTING_MALLOC(    ipiv,  magma_int_t, n      );
   TESTING_MALLOC(    work,  double,      lwork  );
   TESTING_MALLOC(    h_A,   double,      n2     );
   TESTING_HOSTALLOC( h_R,   double,      n2     );
   TESTING_DEVALLOC(  d_A,   double,      ldda*n );
   TESTING_DEVALLOC(  dwork, double,      ldwork );

   // Copy the input matrix into a flat host array
   for (i = 0; i < n; i++) {
      for (j = 0; j < n; j++) {
         h_A[n * i + j] = a.data[i][j];
      }
   }
   magma_dsetmatrix( n, n, h_A, lda, d_A, ldda );                  // host -> device
   magma_dgetrf_gpu( n, n, d_A, ldda, ipiv, &info );               // LU factorization
   magma_dgetri_gpu( n, d_A, ldda, ipiv, dwork, ldwork, &info );   // inverse from the LU factors
   magma_dgetmatrix( n, n, d_A, ldda, h_R, lda );                  // device -> host

Using the time and difftime functions, I found that for a 5000x5000 matrix it takes 9 seconds to execute magma_dgetmatrix. Is it always this slow? Or is the problem my video card, an NVidia GeForce GT 424M? The distro is Debian Wheezy, and I used the distro's drivers.

Re: Very slow dgetmatrix

Post by mgates3 » Wed Oct 17, 2012 1:30 pm

magma_dgetmatrix is a thin wrapper around cublasGetMatrix, mainly for platform independence and type checking. The performance issue may be your PCIe bus. setmatrix and getmatrix should take about the same time.
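
As a rough sanity check on the 9-second figure, the transfer size can be worked out by hand: a 5000x5000 matrix of doubles is 5000 * 5000 * 8 bytes, or 200 MB. The sketch below does that arithmetic in C. The function names are made up for illustration, and the 1 GB/s bandwidth in the usage note is an assumed round number, not a measurement of any particular card or bus.

```c
#include <stdio.h>

/* Bytes occupied by an n-by-n matrix of doubles. */
static double matrix_bytes(long n) {
    return (double)n * (double)n * sizeof(double);
}

/* Effective transfer rate in MB/s (1 MB = 1e6 bytes) for a measured time. */
static double effective_mb_per_s(long n, double seconds) {
    return matrix_bytes(n) / 1e6 / seconds;
}

/* Time the transfer would take at an ASSUMED sustained bandwidth in GB/s. */
static double expected_seconds(long n, double assumed_gb_per_s) {
    return matrix_bytes(n) / (assumed_gb_per_s * 1e9);
}

/* Print the estimate for one matrix size and one measured time. */
static void print_transfer_estimate(long n, double measured_s,
                                    double assumed_gb_per_s) {
    printf("matrix size:    %.0f MB\n", matrix_bytes(n) / 1e6);
    printf("effective rate: %.1f MB/s\n", effective_mb_per_s(n, measured_s));
    printf("expected time at %.1f GB/s: %.2f s\n",
           assumed_gb_per_s, expected_seconds(n, assumed_gb_per_s));
}
```

Calling print_transfer_estimate(5000, 9.0, 1.0) gives an effective rate of roughly 22 MB/s for the measured 9 seconds, against about 0.2 s expected at the assumed 1 GB/s. A gap that large suggests the timer may be capturing more than the copy itself.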

Also be careful when timing asynchronous functions. getri_gpu may be asynchronous (i.e., it may return before the GPU has finished), in which case getmatrix would appear to take much longer because it has to wait for getri to finish. It is best to call cudaDeviceSynchronize() before each timer call if you're not sure whether calls are async or not. For example:

cudaDeviceSynchronize();
gettimeofday( &t1, NULL );

getri( ... );
cudaDeviceSynchronize();
gettimeofday( &t2, NULL );   // t2 - t1 = time spent in getri

getmatrix( ... );
cudaDeviceSynchronize();
gettimeofday( &t3, NULL );   // t3 - t2 = time spent in getmatrix
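
The same pattern can be wrapped in a small helper. This is only a sketch: time_step, step_fn and sync_fn are hypothetical names, and the synchronization call is passed in as a function pointer so the code compiles without CUDA. In a real program the sync hook would be a one-line wrapper that calls cudaDeviceSynchronize(), and each step would wrap one MAGMA call. The two functions at the bottom are stand-ins for trying the helper out.

```c
#include <stdio.h>
#include <sys/time.h>

/* Wall-clock time in seconds, from gettimeofday. */
static double wall_time(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

typedef void (*step_fn)(void *arg);  /* the call being timed */
typedef void (*sync_fn)(void);       /* placeholder for a device sync */

/* Time one step with a synchronization on both sides, so that work left
 * over from an earlier asynchronous call is not charged to this step and
 * the step itself has really finished when the clock stops. */
static double time_step(step_fn step, void *arg, sync_fn sync) {
    sync();                    /* drain any previously queued work */
    double t0 = wall_time();
    step(arg);
    sync();                    /* wait for this step to complete */
    return wall_time() - t0;
}

/* Stand-ins for illustration only: a sync hook that counts its calls
 * and a step that increments an integer. */
static int sync_calls = 0;
static void count_sync(void) { sync_calls++; }
static void bump(void *arg) { ++*(int *)arg; }
```

With the stand-ins, time_step(bump, &x, count_sync) runs the step once and calls the sync hook twice. Timing getri and getmatrix through a helper like this, one call each, would report them separately rather than folding the wait for getri into the getmatrix time.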

-mark

