
### Bug in ZINPLACE_TRANSPOSE?

Posted:

**Tue Apr 24, 2012 12:47 am**
by **keitat**

Hi,

I have seen a strange error with zgetrf_gpu (a wrong answer, or a segfault in memory calls after the routine) on a Cray XK (Fermi+). The error occurs when (1) M = N, (2) M is a multiple of 32, and (3) LDA is a multiple of 32 but not equal to M. I suspected a bug in zinplace_transpose, so I changed the source to call magmablas_ztranspose2 instead, and the problem went away. I think the zinplace_transpose source might have a bug, but I haven't tracked it down yet.

It would be great if you could help us.

Thanks,

Keita

### Re: Bug in ZINPLACE_TRANSPOSE?

Posted:

**Tue Apr 24, 2012 2:53 am**
by **Stan Tomov**

Hi Keita,

Thank you for this bug report. We were able to reproduce it and to fix the bug. The fix will be in the next release (beginning of May). It is for a case where we do an in-place matrix transpose. There are two calls to the transpose routine (in file zgetrf_gpu.cpp). The first one is:

    if ((m == n) && (m % 32 == 0) && (ldda % 32 == 0))
        magmablas_zinplace_transpose( dAT, ldda, lddat );

and must be replaced by

    if ((m == n) && (m % 32 == 0) && (ldda % 32 == 0)) {
        lddat = ldda;
        magmablas_zinplace_transpose( dAT, ldda, m );
    }

The second call must be replaced by

    magmablas_zinplace_transpose( dAT, lddat, m );
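
To illustrate why the matrix order and the leading dimension have to be passed separately, here is a minimal CPU sketch of an in-place square transpose. It is not the MAGMA kernel: the name `inplace_transpose` is hypothetical, it uses `double` instead of double complex for brevity, and it assumes column-major storage with `ld >= m`. Reading the third argument of the corrected calls as the order of the matrix is an inference from the fix above, not something stated in the MAGMA sources quoted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference, not MAGMA code.
 * Transposes the m x m matrix stored column-major in `a` with leading
 * dimension `ld`. Only indices i, j < m are visited, so the padding
 * rows m..ld-1 of every column are never read or written. */
static void inplace_transpose(double *a, int ld, int m)
{
    assert(a != NULL && m >= 0 && ld >= m);
    for (int j = 0; j < m; ++j) {
        for (int i = j + 1; i < m; ++i) {
            double tmp = a[(size_t)j * ld + i];             /* A(i,j) */
            a[(size_t)j * ld + i] = a[(size_t)i * ld + j];  /* A(j,i) */
            a[(size_t)i * ld + j] = tmp;
        }
    }
}
```

If the loop bounds used `ld` where they use `m`, the routine would also swap entries in the padding, which is the kind of out-of-range access the M != LDA case can expose.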

Please let us know if this worked for your tests. Thanks again.

Stan

### Re: Bug in ZINPLACE_TRANSPOSE?

Posted:

**Tue Apr 24, 2012 9:55 pm**
by **keitat**

Stan,

I can see the same problem with sgetrf_gpu, dgetrf_gpu and cgetrf_gpu.

Thanks,

Keita

### Re: Bug in ZINPLACE_TRANSPOSE?

Posted:

**Wed Apr 25, 2012 1:09 am**
by **Stan Tomov**

Keita,

Yes, we fixed the other versions as well. Actually, we generate them from the double complex version. I will go through the other LU versions as well, in particular the CPU interface ones, to see if they also need this fix.

Stan

### Re: Bug in ZINPLACE_TRANSPOSE?

Posted:

**Mon May 28, 2012 12:19 am**
by **keitat**

I am still seeing a wrong answer if A and X are allocated in contiguous memory, as described below. I suspect that zinplace_transpose is making out-of-bounds memory accesses when M != LDA. In my application, I finally get the correct answer after changing the source to apply zinplace_transpose only when LDA = M.

==== Error case description ====

(Definition of BIGA and A in Fortran90 notation.)

LDA = M + 32; (M is divisible by 32).

BIGA = Double Complex Array of (1:LDA,1:M)

A = BIGA(33:lda,33:lda)

In my application code:

magma_zgetrf_gpu ( m,m, &A[32*lda+32], lda, ipvt, info);

After zgetrf call, the elements in A(1:32,33:M) are polluted.
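
The index arithmetic in this report can be checked with a small sketch. The helpers `colmajor_offset` and `in_submatrix` are hypothetical names introduced here for illustration. They assume zero-based indices and column-major storage, so the Fortran element (33,33) is zero-based (32,32), which matches the offset `32*lda+32` in the call above.

```c
#include <assert.h>
#include <stddef.h>

/* Linear offset of zero-based element (i, j) in a column-major array
 * with leading dimension ld. Hypothetical helper for illustration. */
static size_t colmajor_offset(int i, int j, int ld)
{
    assert(i >= 0 && j >= 0 && i < ld);
    return (size_t)j * (size_t)ld + (size_t)i;
}

/* Returns 1 if linear offset `off` of the parent array falls inside the
 * m x n submatrix whose top-left corner is zero-based (row0, col0),
 * and 0 otherwise. */
static int in_submatrix(size_t off, int ld, int row0, int col0, int m, int n)
{
    int i = (int)(off % (size_t)ld);  /* row in the parent array    */
    int j = (int)(off / (size_t)ld);  /* column in the parent array */
    return i >= row0 && i < row0 + m && j >= col0 && j < col0 + n;
}
```

For example, with m = 64 and lda = 96, `colmajor_offset(32, 32, 96)` is the start of the submatrix, while `in_submatrix(colmajor_offset(0, 32, 96), 96, 32, 32, 64, 64)` returns 0: rows 1:32 of columns 33 onward lie outside the matrix handed to zgetrf, so a correct factorization should leave them untouched. This assumes the polluted indices in the report refer to the parent array BIGA.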

### Re: Bug in ZINPLACE_TRANSPOSE?

Posted:

**Tue May 29, 2012 11:47 am**
by **keitat**

It appears the transpose routine is OK. I am also looking into magmablas_zpermute_long2 to find any out-of-bounds memory access.

Just for clarification, the error occurs when:

M=3584

LDA=3616

Input matrix starts at A(32,32).


### Re: Bug in ZINPLACE_TRANSPOSE?

Posted:

**Tue May 29, 2012 12:09 pm**
by **Stan Tomov**

Keita,

I managed to reproduce this bug as well. Thanks for reporting it. It was in magmablas_zpermute_long2. It is now fixed in SVN and we will make the fix available soon. Meanwhile, the fix is to call zpermute as

    magmablas_zpermute_long2( n, dAT, lddat, ipiv, nb, i*nb );

The added argument changes the implementation of zpermute in magmablas/zpermute-v2.cu (the other precisions are similar) as follows:

    extern "C" void
    magmablas_zpermute_long2( int n, cuDoubleComplex *dAT, int lda, int *ipiv, int nb, int ind )
    {
        int k;
        for( k = 0; k < nb-BLOCK_SIZE; k += BLOCK_SIZE )
        {
            //zlaswp_params_t params = { dAT, lda, lda, ind + k };
            zlaswp_params_t2 params = { dAT, n, lda, ind + k, BLOCK_SIZE };
            ...
    }

The bug was that, in this particular case, the code was permuting user data outside the submatrix being factorized. We didn't have this issue in the CPU interface code, and for this case we were only checking the correctness of the factorization.
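
The effect described here can be reproduced on the CPU with a short sketch. This is not the MAGMA kernel: `swap_rows` is a hypothetical helper, it uses `double` rather than double complex, and it assumes the transposed matrix is laid out so that each row to be swapped is a contiguous run of `ld` entries, of which only the first `width` belong to the matrix. The point it tries to show is the change visible in the snippet above, where the second initializer went from `lda` to `n`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU sketch, not MAGMA code.
 * Swaps rows r1 and r2 of a matrix whose rows are contiguous runs of
 * `ld` entries, exchanging only the first `width` entries of each row.
 * With width == n the padding stays untouched; with width == ld the
 * swap also moves whatever the caller stored in the padding. */
static void swap_rows(double *at, int ld, int width, int r1, int r2)
{
    assert(at != NULL && width >= 0 && width <= ld);
    double *x = at + (size_t)r1 * ld;
    double *y = at + (size_t)r2 * ld;
    for (int j = 0; j < width; ++j) {
        double tmp = x[j];
        x[j] = y[j];
        y[j] = tmp;
    }
}
```

Calling `swap_rows(at, lda, n, r1, r2)` confines the pivot to the n entries being factorized, whereas `swap_rows(at, lda, lda, r1, r2)` mimics the reported behaviour and also exchanges the lda - n trailing entries, which in Keita's layout are user data outside the submatrix.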

Stan