I was not able to create a bug report in the MAGMA Mercurial issue tracker, so I'm reporting it here.
The subject of this message summarizes the issue. Here is a reproducer based on PyTorch:
>>> import torch
>>> m, n = 3, 3
>>> torch.ones(1, m, n, device='cuda').lu()
(tensor([[[1., 1., 1.],
          [1., 0., 0.],
          [1., nan, nan]]], device='cuda:0'),
 tensor([[1, 2, 3]], device='cuda:0', dtype=torch.int32))
The source of this issue is likely in the kernel functions implemented in magmablas/zgetrf_batched_smallsq_shfl.cu and magmablas/zgetrf_batched_smallsq_noshfl.cu.
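For comparison, here is a minimal pure-Python sketch of the result I would expect for this singular input. It is not MAGMA's code: `lu_packed` is a hypothetical helper written for this post, and it assumes the LAPACK getrf convention of skipping the column scaling when a pivot is exactly zero and recording that column in `info`. My guess (an assumption, not something I have verified in the kernels) is that the nan entries come from a 0/0 division at the zero pivot in the second column.

```python
def lu_packed(a):
    """LU with partial pivoting for a square matrix.

    Returns (packed L and U factors, 1-based pivots, info), following the
    LAPACK getrf convention: info > 0 is the first column with a zero pivot.
    Hypothetical reference helper, not taken from MAGMA or PyTorch.
    """
    a = [row[:] for row in a]  # work on a copy
    n = len(a)
    pivots = []
    info = 0
    for j in range(n):
        # Pick the row with the largest magnitude in column j (first on ties).
        p = max(range(j, n), key=lambda i: abs(a[i][j]))
        pivots.append(p + 1)
        if p != j:
            a[j], a[p] = a[p], a[j]
        if a[j][j] != 0.0:
            # Scale the subdiagonal entries only when the pivot is nonzero.
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]
        elif info == 0:
            info = j + 1  # remember the first zero pivot, do not divide
        # Update the trailing submatrix.
        for i in range(j + 1, n):
            for k in range(j + 1, n):
                a[i][k] -= a[i][j] * a[j][k]
    return a, pivots, info


packed, pivots, info = lu_packed([[1.0] * 3 for _ in range(3)])
print(packed)        # [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(pivots, info)  # [1, 2, 3] 2
```

With the zero-pivot guard in place, the last row comes out as [1, 0, 0] rather than [1, nan, nan], and the pivots match the ones reported above.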