Bug in ZINPLACE_TRANSPOSE?


Postby keitat » Tue Apr 24, 2012 12:47 am

Hi,

I have seen a strange error with zgetrf_gpu (a wrong answer, or a segfault in memory calls after the routine) on a Cray XK (Fermi+). The error occurs when (1) M=N, (2) M is a multiple of 32, and (3) LDA is a multiple of 32 but not equal to M. I thought this could be a bug in zinplace_transpose, so I changed the source to call magmablas_ztranspose2 instead, and the problem went away. I think the zinplace_transpose source might have a bug, but I haven't found it yet.

It would be great if you could help us.

Thanks,

Keita
keitat
 
Posts: 8
Joined: Tue Jan 24, 2012 2:19 pm

Re: Bug in ZINPLACE_TRANSPOSE?

Postby Stan Tomov » Tue Apr 24, 2012 2:53 am

Hi Keita,
Thank you for this bug report. We were able to reproduce it and to fix the bug. The fix will be in the next release (beginning of May). It is for a case where we do an in-place matrix transpose. There are two calls to the transpose routine (in file zgetrf_gpu.cpp). The first one is:
if ((m == n) && (m % 32 == 0) && (ldda%32 == 0))
     magmablas_zinplace_transpose( dAT, ldda, lddat );

and must be replaced by
if ((m == n) && (m % 32 == 0) && (ldda%32 == 0)){
     lddat = ldda;
     magmablas_zinplace_transpose( dAT, ldda, m );
}

The second call must be replaced by
magmablas_zinplace_transpose( dAT, lddat, m );
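
For readers following along, here is a minimal CPU sketch (not MAGMA code) of an in-place square transpose that takes the leading dimension and the matrix order as separate arguments, in the same (array, leading dimension, order) order as the calls above. The names `inplace_transpose_ref` and `at` are hypothetical. It illustrates that after the transpose the data still sits at stride ldda, which is presumably why the fix also sets lddat = ldda; that reading is an assumption on my part.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using zcomplex = std::complex<double>;

// Hypothetical CPU stand-in (not the MAGMA kernel): transposes the leading
// n x n block of a column-major array whose leading dimension is lda.
// Entries in rows n..lda-1 of each column (the padding) are never touched.
void inplace_transpose_ref(std::vector<zcomplex>& a, int lda, int n)
{
    assert(n <= lda);
    assert(a.size() >= static_cast<std::size_t>(lda) * n);
    for (int j = 0; j < n; ++j) {
        for (int i = j + 1; i < n; ++i) {
            std::swap(a[i + static_cast<std::size_t>(j) * lda],
                      a[j + static_cast<std::size_t>(i) * lda]);
        }
    }
}

// Reads element (i, j) assuming the array has stride ld between columns.
zcomplex at(const std::vector<zcomplex>& a, int ld, int i, int j)
{
    return a[i + static_cast<std::size_t>(j) * ld];
}
```

With n = 4 and lda = 6, reading the transposed result through `at(a, lda, i, j)` gives the expected element, while reading it with a stride of n lands on padding entries instead.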

Please let us know if this worked for your tests. Thanks again.
Stan
Stan Tomov
 
Posts: 251
Joined: Fri Aug 21, 2009 10:39 pm

Re: Bug in ZINPLACE_TRANSPOSE?

Postby keitat » Tue Apr 24, 2012 9:55 pm

Stan,

I can see the same problem with sgetrf_gpu, dgetrf_gpu and cgetrf_gpu.

Thanks,

Keita
keitat
 
Posts: 8
Joined: Tue Jan 24, 2012 2:19 pm

Re: Bug in ZINPLACE_TRANSPOSE?

Postby Stan Tomov » Wed Apr 25, 2012 1:09 am

Keita,
Yes, we fixed the other versions as well. Actually, we generate them from the double complex version. I will go through the other LU versions as well, in particular the CPU interface ones, to see if they also need this fix.
Stan
Stan Tomov
 
Posts: 251
Joined: Fri Aug 21, 2009 10:39 pm

Re: Bug in ZINPLACE_TRANSPOSE?

Postby keitat » Mon May 28, 2012 12:19 am

I am still seeing a wrong answer when A and X are allocated in contiguous memory, as described below. I suspect that zinplace_transpose is making out-of-bounds memory accesses when M != LDA. In my application, I finally get the correct answer after changing the source to apply zinplace_transpose only when LDA = M.


====Error case description==
(Definition of BIGA and A in Fortran90 notation.)
LDA = M + 32; (M is divisible by 32).
BIGA = Double Complex Array of (1:LDA,1:M)
A = BIGA(33:lda,33:lda)

In my application code:
magma_zgetrf_gpu ( m,m, &A[32*lda+32], lda, ipvt, info);

After the zgetrf call, the elements in A(1:32,33:M) are polluted.
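
The index arithmetic behind `&A[32*lda+32]` can be sketched as follows. This is only an illustration with hypothetical helper names (`fortran_offset`, `submatrix_fits`): it maps a 1-based Fortran element of a column-major array to a zero-based C offset, and checks whether an m x m submatrix fits inside the parent array. Note that a submatrix starting at column 33 with M columns needs a parent with at least M + 32 columns, so the sketch assumes the parent is LDA x LDA.

```cpp
#include <cassert>
#include <cstddef>

// Zero-based offset of the 1-based Fortran element (i, j) in a column-major
// array with leading dimension lda.
std::size_t fortran_offset(int i, int j, int lda)
{
    assert(i >= 1 && i <= lda && j >= 1);
    return static_cast<std::size_t>(i - 1)
         + static_cast<std::size_t>(j - 1) * lda;
}

// True if an m x m submatrix whose first element is the 1-based (i0, j0)
// lies entirely inside a rows x cols column-major array.
bool submatrix_fits(int i0, int j0, int m, int rows, int cols)
{
    return i0 >= 1 && j0 >= 1
        && i0 + m - 1 <= rows
        && j0 + m - 1 <= cols;
}
```

For example, with m = 64 and lda = 96, `fortran_offset(33, 33, lda)` equals `32*lda + 32`, which matches the pointer passed in the call above.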
keitat
 
Posts: 8
Joined: Tue Jan 24, 2012 2:19 pm

Re: Bug in ZINPLACE_TRANSPOSE?

Postby keitat » Tue May 29, 2012 11:47 am

It appears that the transpose routine is OK. I am now looking into magmablas_zpermute_long2 for out-of-bounds memory accesses.

To clarify, the error occurs when:
M=3584
LDA=3616
Input matrix starts at A(32,32).

keitat
 
Posts: 8
Joined: Tue Jan 24, 2012 2:19 pm

Re: Bug in ZINPLACE_TRANSPOSE?

Postby Stan Tomov » Tue May 29, 2012 12:09 pm

Keita,
I managed to reproduce this bug as well. Thanks for reporting it. It was in magmablas_zpermute_long2. Now it is fixed in the SVN and we will make it available soon. Meanwhile, the fix is to call zpermute as
magmablas_zpermute_long2( n, dAT, lddat, ipiv, nb, i*nb );

The added argument changes the implementation of zpermute in magmablas/zpermute-v2.cu (the other precisions are similar) as follows:
extern "C" void
magmablas_zpermute_long2( int n, cuDoubleComplex *dAT, int lda, int *ipiv, int nb, int ind )
{
    int k;

    for( k = 0; k < nb-BLOCK_SIZE; k += BLOCK_SIZE )
    {
        //zlaswp_params_t params = { dAT, lda, lda, ind + k };
        zlaswp_params_t2 params = { dAT, n, lda, ind + k, BLOCK_SIZE };
        ...
}

The bug was that, in this particular case, the code was permuting user data outside of the submatrix being factorized. We didn't have this issue in the CPU interface code, and for this case we were only checking the correctness of the factorization itself.
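
A small CPU sketch of the effect, for illustration only (this is not the CUDA kernel, and `swap_rows_ref` is a hypothetical name). It assumes that the second field of the params struct is the number of entries exchanged per row, which is how I read the change from `lda` to `n` above. It uses `double` instead of double complex to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU stand-in for one row interchange on the transposed matrix.
// Row r of the original matrix is assumed to be stored contiguously starting
// at dAT[r * lddat]; `len` is how many entries of each row are exchanged.
void swap_rows_ref(std::vector<double>& dAT, int lddat, int len, int r1, int r2)
{
    assert(len <= lddat);
    for (int c = 0; c < len; ++c) {
        std::swap(dAT[c + static_cast<std::size_t>(r1) * lddat],
                  dAT[c + static_cast<std::size_t>(r2) * lddat]);
    }
}
```

With n = 4 and lddat = 6, calling it with len = n leaves entries 4 and 5 of each row alone, while len = lddat also exchanges those entries, which in the layout reported above belong to the surrounding user array.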
Stan
Stan Tomov
 
Posts: 251
Joined: Fri Aug 21, 2009 10:39 pm

