I've noticed that the factorization (sgetrf_gpu) always does some small computation on the CPU. Why is that? The synchronization, the get/setMatrix transfers, and the restart of the kernel take a lot of time.
Is it possible to turn this off, or to do this part on the GPU as well?
Edit: ahhhh, I finally found your paper "Dense Linear Algebra Solvers for..."
Now I think I understand: it's necessary for pivoting?! Is it possible to deactivate pivoting? I don't need it for a Laplace problem.
And why is the CPU doing the factorization of the panels?
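To show what I mean about not needing pivoting, here is a minimal sketch in plain C. It is not MAGMA code, and the function name count_pivot_swaps_laplace1d is made up for this post. It assumes the simplest case, a 1D Laplace matrix with 2 on the diagonal and -1 next to it, runs an ordinary LU with partial pivoting on the CPU, and counts how often a row swap actually happens:

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper, not part of MAGMA: builds the n x n 1D Laplace
   matrix (assumed stencil: 2 on the diagonal, -1 on the off-diagonals),
   runs LU with partial pivoting in single precision, and returns the
   number of row swaps that were performed (-1 on failure). */
int count_pivot_swaps_laplace1d(int n)
{
    float *a = calloc((size_t)n * (size_t)n, sizeof *a);  /* row-major */
    if (!a) return -1;

    for (int i = 0; i < n; ++i) {
        a[i * n + i] = 2.0f;
        if (i > 0)     a[i * n + i - 1] = -1.0f;
        if (i + 1 < n) a[i * n + i + 1] = -1.0f;
    }

    int swaps = 0;
    for (int k = 0; k < n; ++k) {
        /* Partial pivoting: pick the largest |entry| in column k, rows k..n-1.
           The strict '>' keeps the diagonal entry on ties. */
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (fabsf(a[i * n + k]) > fabsf(a[p * n + k])) p = i;

        if (a[p * n + k] == 0.0f) { free(a); return -1; }  /* singular */

        if (p != k) {
            for (int j = 0; j < n; ++j) {
                float t = a[k * n + j];
                a[k * n + j] = a[p * n + j];
                a[p * n + j] = t;
            }
            ++swaps;
        }

        /* Eliminate below the pivot; store the multipliers in place. */
        for (int i = k + 1; i < n; ++i) {
            float l = a[i * n + k] / a[k * n + k];
            a[i * n + k] = l;
            for (int j = k + 1; j < n; ++j)
                a[i * n + j] -= l * a[k * n + j];
        }
    }

    free(a);
    return swaps;
}
```

Calling count_pivot_swaps_laplace1d(8) returns 0: the pivots are 2, 3/2, 4/3, ... and always larger in magnitude than the -1 below them, so the pivot search never changes anything for this matrix. Whether the same holds for other discretizations is something I have not checked.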
Thanks tomac
