The PLASMA user guide for 2.4.5 (section 6.3, Tuning Howto) directs you to LAWN #217 (Comparative Study of One-Sided Factorizations with Multiple Software Packages on Multi-Core Hardware) for information on tuning PLASMA.
Section 3.1 of that paper (Tuning.PLASMA) details the tuning process. Having read it, my understanding is as follows:
- Iterate through values of NB (40...500) and IB (factors of NB)
- Run the sequential core_blas routines used in the factorisations (with N=NB) with the selected IB and NB values
- Select the best performing combinations (from across the NB range)
- Run full parallel factorisations using IB and NB values from the selection
- Select the best performing combination as the tuned setting for that architecture and problem size.
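
To make the steps above concrete, this is roughly how I picture the pruned search, as a minimal Python sketch. Everything named here is hypothetical: `kernel_rate` and `full_rate` are placeholder callables standing in for "time the sequential kernel at N=NB" and "time the full parallel factorisation", and keeping the best IB for each NB is my assumption about what "best performing combinations from across the NB range" means, not something taken from the paper.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[int, int]  # (NB, IB)


def candidate_pairs(nb_values: Iterable[int], min_ib: int = 4) -> List[Pair]:
    """Enumerate (NB, IB) pairs where IB divides NB and IB >= min_ib."""
    pairs = []
    for nb in nb_values:
        for ib in range(min_ib, nb + 1):
            if nb % ib == 0:
                pairs.append((nb, ib))
    return pairs


def shortlist_per_nb(pairs: Iterable[Pair],
                     kernel_rate: Callable[[Pair], float]) -> List[Pair]:
    """Pruning step: for each NB keep the IB with the best kernel rate.

    kernel_rate is a placeholder for a sequential kernel benchmark
    (higher is better); it is not a PLASMA function.
    """
    best: Dict[int, Pair] = {}
    for pair in pairs:
        nb = pair[0]
        if nb not in best or kernel_rate(pair) > kernel_rate(best[nb]):
            best[nb] = pair
    return sorted(best.values())


def pick_best(shortlist: Iterable[Pair],
              full_rate: Callable[[Pair], float]) -> Pair:
    """Final step: run the full factorisation only on the shortlist.

    full_rate is a placeholder for a parallel factorisation benchmark
    at the problem size of interest (higher is better).
    """
    return max(shortlist, key=full_rate)
```

The point of the two stages is that the cheap sequential runs cover every (NB, IB) pair, while the expensive parallel runs only see one pair per NB.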
So my first question: is my understanding of the pruned tuning process correct?
My second question: the graphs in that paper show the core_blas routines used by each factorisation (DPOTRF - dgemm-seq, DGEQRF - dssrfb-seq, DGETRF - dsssm-seq), but these seem to have changed since the paper was written (dssrfb isn't even included with my version of PLASMA). Which routines should I tune against instead?