Open forum for general discussions relating to PLASMA.


Postby mpettipher » Tue Feb 19, 2013 10:37 am


I have installed PLASMA and MAGMA on the Intel MIC. MAGMA performs very well, better than MKL, but PLASMA is relatively poor: significantly slower than MKL. I believe I have used the MKL BLAS in PLASMA and have set MKL_NUM_THREADS and OMP_NUM_THREADS to 1, but I have not tried any changes to the blocking sizes. (PLASMA is built explicitly for use on the MIC itself, not for offloading from the host.)

I know that for MAGMA, setting the thread affinity with KMP_AFFINITY is crucial for best performance (a factor of 2 difference), so I wondered whether this might also be the case with PLASMA. However, KMP_AFFINITY seems to be ignored: there is no output of thread/node binding even when verbose is specified. It seems that some aspect of the PLASMA implementation causes this. I know there is some reference to thread affinity within the code, specifically the sched_setaffinity function in plasmaos.c. However, I do not know the implications of this, nor whether it would be possible, or even desirable, to enable KMP_AFFINITY.
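For reference, here is a minimal sketch of what pinning a thread directly with sched_setaffinity looks like on Linux. It is not taken from plasmaos.c; the helper names (first_allowed_core, pin_calling_thread, pinned_to) are hypothetical and only illustrate the general technique. KMP_AFFINITY is read by the Intel OpenMP runtime, so a library that creates its own threads and binds them this way would not necessarily go through that runtime. That may be why no binding output appears, but this is an assumption, not something confirmed for PLASMA.

```c
#define _GNU_SOURCE          /* needed for sched_setaffinity and the CPU_* macros */
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: lowest-numbered core the calling thread may run on,
 * or -1 if the affinity mask cannot be read. */
int first_allowed_core(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; c++)
        if (CPU_ISSET(c, &set))
            return c;
    return -1;
}

/* Hypothetical helper: bind the calling thread to a single core.
 * A pid of 0 means "the calling thread". Returns 0 on success, -1 on failure. */
int pin_calling_thread(int core)
{
    cpu_set_t set;
    if (core < 0 || core >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Hypothetical helper: 1 if the calling thread is bound to exactly `core`,
 * 0 otherwise (including when the mask cannot be read). */
int pinned_to(int core)
{
    cpu_set_t set;
    if (core < 0 || core >= CPU_SETSIZE)
        return 0;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(core, &set);
}
```

A worker thread would call pin_calling_thread(core) once at start-up, and a check such as assert(pinned_to(core)) can confirm that the binding took effect. A binding made this way is done outside the OpenMP runtime, so how it interacts with KMP_AFFINITY would have to be checked by experiment.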

Given MAGMA works well and is designed (in part) for the MIC, I do not intend to pursue this much further, but it would be useful to identify the reason for poor performance, and whether enabling KMP_AFFINITY helps. Any thoughts on this would be appreciated.


Mike Pettipher

Re: PLASMA on Intel MIC

Postby admin » Tue Feb 19, 2013 1:13 pm

Putting PLASMA on the MIC is a complicated issue.
We would have to make sure that MKL plays along with PLASMA in terms of thread affinity, memory affinity, and what not.
You are welcome to make such experiments, but we are not venturing in that direction at this moment.
MAGMA is the preferred package for the MIC (a.k.a. Xeon Phi).
Site Admin
