It seems that magma_dtrtri_gpu fails for some specific matrix sizes because MAGMA's implementation of dtrsm can lead to incorrect memory accesses.
The condition on the matrix size seems to be directly related to the block sizes that MAGMA uses internally.
For example, this code from diag_dtrtri_kernel_upper seems to produce an error if the last block extends past the remainder of the matrix:
// load A
#pragma unroll
for (i=0; i<BLOCK_SIZE; i++)
    Bs[i*BLOCK_SIZE+tx] = ((double)(tx<=i))*(*(Aoff+i*lda+tx)); // read in the whole square block of my A and zero out the non data triangular
The error can be reproduced by calling magma_dtrtri_gpu with N = 129.
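To make the size condition concrete, here is a small host-side sketch in plain C of the arithmetic involved. It is not MAGMA code: the helper names (num_diag_blocks, last_block_overhang, guarded_load) are hypothetical, the block size is passed as a parameter because I have not checked the value MAGMA actually uses, and the bounds check in guarded_load is only one possible way to guard the read, assuming column-major storage as in the kernel above.

```c
/* Number of BLOCK_SIZE x BLOCK_SIZE diagonal blocks needed to cover an
 * n x n matrix (hypothetical helper, not part of MAGMA). */
static int num_diag_blocks(int n, int block_size)
{
    return (n + block_size - 1) / block_size;
}

/* Number of rows/columns of the last diagonal block that fall outside
 * the matrix. A nonzero value means an unguarded full-block load reads
 * past the end of the data. */
static int last_block_overhang(int n, int block_size)
{
    return num_diag_blocks(n, block_size) * block_size - n;
}

/* Guarded version of the load for the upper-triangular case:
 * returns A(row, col) when the entry lies inside the matrix and in the
 * upper triangle (row <= col), and 0.0 otherwise, without touching
 * memory outside the n x n matrix. A is column-major with leading
 * dimension lda. */
static double guarded_load(const double *A, int lda, int n, int row, int col)
{
    if (row >= n || col >= n)
        return 0.0;                 /* outside the matrix: do not read */
    return (row <= col) ? A[col * lda + row] : 0.0;
}
```

For instance, with an assumed block size of 16, N = 129 needs 9 diagonal blocks and the last one overhangs the matrix by 15 rows and columns, while N = 128 has no overhang. Since 129 = 3 * 43, any power-of-two block size larger than 1 leaves a partial last block for N = 129.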
Rémi
