I am currently working on a benchmark comparing the performance of various Hermitian matrix eigenproblem implementations on an SMP system.
I already have some scaling results for LAPACK with multithreaded BLAS and for MKL, using the routines ZHEEV, ZHEEVD and ZHEEVR. I decided to try PLASMA 2.5.1, since its release notes mention support for the Hermitian eigenproblem. However, I do not get any speedup: there is only a very small improvement when using 2 cores instead of 1, and with 4 or more cores the run time actually increases. When I watch CPU usage with htop, I see that only one thread is busy, while the others are basically idle.
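To make the scaling observation above concrete, speedup and parallel efficiency can be computed from the measured wall-clock times. The following Python helper is a minimal sketch; the function name `scaling_table` is hypothetical and not part of PLASMA, LAPACK or MKL. It assumes one timing per core count for the same matrix size, with a 1-core run as the baseline.

```python
"""Strong-scaling summary for an eigensolver benchmark (illustrative sketch)."""

from typing import Dict, Tuple


def scaling_table(timings: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map core count -> (speedup, parallel efficiency).

    `timings` maps the number of cores to wall-clock seconds for the same
    problem size. The 1-core run is used as the baseline T(1).
    """
    if 1 not in timings:
        raise ValueError("a 1-core baseline timing is required")
    if any(t <= 0 for t in timings.values()):
        raise ValueError("timings must be positive")
    t1 = timings[1]
    result: Dict[int, Tuple[float, float]] = {}
    for p in sorted(timings):
        speedup = t1 / timings[p]   # S(p) = T(1) / T(p)
        efficiency = speedup / p    # E(p) = S(p) / p
        result[p] = (speedup, efficiency)
    return result
```

Running it separately on the ZHEEV, ZHEEVD, ZHEEVR and PLASMA timings gives directly comparable columns; an efficiency that collapses as cores are added, as in the PLASMA runs described above, stands out immediately.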
PLASMA was compiled with the installer, using multithreaded MKL:
./setup.py --cc=gcc --fc=gfortran --cflags="-DPLASMA_WITH_MKL -L/opt/intel/" --blaslib="-L/opt/intel/mkl/lib/intel64 -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -ldl -lpthread -lm"
The program calling PLASMA is compiled with:
mpif90 -ffree-form -ffree-line-length-256 -m64 -I/opt/intel/composer_xe_2013.1.117/mkl/include -I/home/stachon/include/plasma -O3 -fno-range-check -o bin.out source_codes...f -L/usr/lib64 -L/home/stachon/lib -lplasma -lcoreblas -llapacke -lquark -Wl,--start-group /opt/intel/composer_xe_2013.1.117/mkl/lib/intel64/libmkl_gf_lp64.a /opt/intel/composer_xe_2013.1.117/mkl/lib/intel64/libmkl_gnu_thread.a /opt/intel/composer_xe_2013.1.117/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -ldl -lpthread -lm -L/home/
Martin Stachon, VSB-Technical University of Ostrava, Czech Republic