On a Fermi card (GT-430), <testing_zheevd_gpu -N 800> reports almost exactly the same timing for the GPU and the CPU. Increasing the matrix size makes no difference either. This is on a 2-core Intel box. Yet testing_zgemm reports a very acceptable 20 GFLOPS. Why is that?
EDIT: I should have mentioned that this is MAGMA 1.2.0 on Ubuntu Linux 10.04.
