Hi,
I have seen strange error with zgetrf_gpu (wrong answer or segfault with memory calls after the routine) on Cray XK (Fermi+). The error occurs when (1) M=N, (2) M is multiple of 32, and (3) LDA is mulitple of 32, but not equal to M. I thought this could be a bug in zinplace_tranpsose. So I changed the source to call magmablas_ztransepose2, then the problem is fixed. I think zinplace_transpose source might have some bugs, but I haven't figured it yet.
It will be great, if you can help us.
Thanks,
Keita