Thanks for the nopiv versions.
Nevertheless, Fortran beats the GPU.
I have done some tests with sgetrf on the CPU and on the GPU.
At first I thought the GPU results were bad, but then I calculated the flops and realized that I got impossible results for the CPU variant.
My question is: how is that possible? The result matrices are correct, and I measure the time at the right place.
For a 5046x5046 matrix, the factorization on the CPU with Fortran sgetrf took 0.25 s ~ 514 Gflops.
OMG, I have a little supercomputer under my desk.
What is going on? Is the dense matrix converted to a sparse one? Most entries in my matrix are zero, because I converted a sparse matrix into a dense one.
I subtracted 1 from every entry in my matrix, and now the results are plausible:
Factorization on the CPU with Fortran sgetrf = 18.79 s ~ 6.8 Gflops
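
For reference, the two Gflops figures can be re-derived with a few lines of Python. This is only a sketch of the arithmetic: the helper name `gflops` is made up for illustration, and the timings are the ones quoted in this post. The quoted rates match a flop count of n^3; the usual leading-order count for an LU factorization is 2n^3/3, which would give roughly 343 and 4.6 Gflops instead.

```python
def gflops(flop_count, seconds):
    """Convert a flop count and a wall-clock time into Gflops."""
    return flop_count / seconds / 1e9


n = 5046

# Flop count that reproduces the figures quoted in the post (assumption: n^3).
cubic = n ** 3
# Usual leading-order flop count for LU factorization of an n x n matrix.
standard = 2 * n ** 3 / 3

# Timings taken from the post: mostly-zero matrix vs. matrix with 1 subtracted.
for label, seconds in (("mostly zeros", 0.25), ("after subtracting 1", 18.79)):
    print(f"{label}: {gflops(cubic, seconds):.1f} Gflops with n^3, "
          f"{gflops(standard, seconds):.1f} Gflops with 2n^3/3")
```

Either way, the 0.25 s run is far faster than the 18.79 s run on the same machine, so the conclusion does not depend on which count is used.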
That means the Fortran sgetrf really does recognize zeros, but the GPU version doesn't.
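
One possible mechanism, stated here as an assumption and not something measured in this post, is that the CPU code path tests an operand for zero and skips the row update when it is zero. The toy sketch below is plain Python, not LAPACK: the function `lu_flops` and the small test matrices are hypothetical. It counts the floating-point operations of an unpivoted LU elimination with and without such a zero test.

```python
def lu_flops(matrix, skip_zeros):
    """Run unpivoted LU on a copy of `matrix` and return the flops performed.

    If skip_zeros is True, a row update is skipped whenever the entry to be
    eliminated is already exactly zero (the assumed shortcut).
    """
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    flops = 0
    for k in range(n):
        for i in range(k + 1, n):
            if skip_zeros and a[i][k] == 0.0:
                continue  # nothing to eliminate in this row
            m = a[i][k] / a[k][k]  # multiplier: 1 division
            a[i][k] = m
            flops += 1
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]  # 1 multiply + 1 subtract
                flops += 2
    return flops


n = 6
# Mostly-zero test matrix: tridiagonal, diagonal 10, off-diagonals 2.
sparse_like = [[10.0 if i == j else 2.0 if abs(i - j) == 1 else 0.0
                for j in range(n)] for i in range(n)]
# Same matrix with 1 subtracted from every entry, as done in the post.
shifted = [[x - 1.0 for x in row] for row in sparse_like]

print("mostly zeros, no zero test:  ", lu_flops(sparse_like, False))  # 125
print("mostly zeros, with zero test:", lu_flops(sparse_like, True))   # 35
print("shifted, with zero test:     ", lu_flops(shifted, True))
```

In this toy model the zero test only pays off while the matrix is mostly zeros; once every entry is nonzero the shortcut rarely triggers, which is consistent with the two timings above.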