I'm running into a problem with zgetri_gpu or zgesv_gpu when they perform the LU factorization at zgetrf_gpu.cpp:167. I have checked that all the pointers I pass to zgesv/zgetri are valid. Stranger still, the error does not occur when only the zgesv/zgetri tests are run; it only happens when other operations are performed before zgetri/zgesv.

cuda-gdb output:

```
Program received signal CUDA_EXCEPTION_10, Device Illegal Address.
[Switching focus to CUDA kernel 0, grid 94, block (0,124,0), thread (0,6,0), device 0, sm 2, warp 11, lane 0]
0x0000000005cee738 in ztranspose3_32<<<(2,125,1),(16,8,1)>>> (m32=0, n32=20, __val_paramB=0x136e880000, ldb=4000, __val_paramA=0x130fac0000, lda=3980, m=64, n=3980)
at ztranspose-v2.cu:85
85 sA[iny+ 8][inx] = A[ 8*lda];
```

The loads around line 85 of ztranspose-v2.cu:

```
sA[iny+ 0][inx] = A[ 0*lda];
sA[iny+ 8][inx] = A[ 8*lda];
sA[iny+16][inx] = A[16*lda];
sA[iny+24][inx] = A[24*lda];
```

The same loads with bounds checks against m and n:

```
int t2 = iby + iny;
if (ibx + inx < m) {
    if (t2 < n) {
        sA[iny+0][inx] = A[0*lda];
        if (t2 + 8 < n) {
            sA[iny+8][inx] = A[8*lda];
            if (t2 + 16 < n) {
                sA[iny+16][inx] = A[16*lda];
                if (t2 + 24 < n) {
                    sA[iny+24][inx] = A[24*lda];
                }
            }
        }
    }
}
```
```
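
As a sanity check on the `t2 + k < n` guards, here is a small host-side C++ sketch (not part of MAGMA) that redoes the index arithmetic for a given block and thread. The function names are hypothetical, and it assumes `iby = 32 * blockIdx.y`, which is not shown in the snippets above but would be consistent with the (2,125,1) grid for m=64, n=3980.

```cpp
#include <cassert>

// ASSUMPTION: each block covers a 32-column tile, i.e. iby = 32 * blockIdx.y.
constexpr int kTile = 32;
// blockDim.y from the launch configuration (16,8,1) in the cuda-gdb output.
constexpr int kBlockDimY = 8;

// First load offset (0, 8, 16 or 24) whose index t2 + offset is >= n,
// or -1 if all four strided loads stay in range.
int first_bad_offset(int block_y, int thread_y, int n) {
    assert(thread_y >= 0 && thread_y < kBlockDimY);
    const int t2 = block_y * kTile + thread_y;  // same t2 as in the kernel
    for (int offset = 0; offset < kTile; offset += kBlockDimY) {
        if (t2 + offset >= n) return offset;
    }
    return -1;
}

// Number of the four strided loads that the unguarded code would issue
// out of range for this thread.
int out_of_range_loads(int block_y, int thread_y, int n) {
    assert(thread_y >= 0 && thread_y < kBlockDimY);
    const int t2 = block_y * kTile + thread_y;
    int count = 0;
    for (int offset = 0; offset < kTile; offset += kBlockDimY) {
        if (t2 + offset >= n) ++count;
    }
    return count;
}
```

For the faulting thread in the debugger output, block (0,124,0) and thread (0,6,0) with n=3980, this gives t2 = 124*32 + 6 = 3974, so `first_bad_offset(124, 6, 3980)` returns 8. Under the stated assumption, that matches the reported line 85, `A[ 8*lda]`: index 3982 is past n. Whether such a read actually faults depends on what memory lies beyond the end of the allocation.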

Thanks,

Harshad