I have been testing the performance of the MAGMA library for Xeon Phi, and I am a little disappointed with the results. In particular, I benchmarked the DGETRF function and compared it with the Intel MKL library running on a standard CPU. I was trying to reproduce the performance shown on slide 15 of the presentation http://icl.cs.utk.edu/projectsfiles/mag ... MIC_03.pdf. My system is very similar to the one used for those tests, but I can only reach 200-250 GFlops on the Xeon Phi, whereas on 16 Sandy Bridge cores I get close to 300 GFlops.
Can anyone help me reproduce those results? Thank you very much in advance for your help.
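For anyone comparing numbers, here is a minimal sketch of how a GFlops figure is commonly derived from a DGETRF timing. It assumes a square n x n matrix and the leading-order flop count of 2n^3/3 for LU factorization. The benchmark behind the slide may use a more exact count, so small differences are expected. The function names and the sample timing are hypothetical, not part of MAGMA or MKL.

```python
def lu_flops(n):
    """Approximate flop count for LU factorization of an n x n matrix.

    Assumption: only the leading-order term 2/3 * n^3 is kept;
    lower-order terms are ignored.
    """
    return (2.0 / 3.0) * n ** 3


def lu_gflops(n, seconds):
    """Convert a measured factorization time into GFlop/s."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return lu_flops(n) / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical measurement: n = 10000 factorized in 2.5 seconds.
    print(f"{lu_gflops(10000, 2.5):.1f} GFlop/s")
```

When comparing against published results, it helps to check that the same matrix size is used and whether the timing includes the host-to-coprocessor transfer, since both can change the reported rate.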
Another question: is an implementation of DGETRI planned for release on Xeon Phi?