No scaling of Hermitian eigenvalue routines

Postby stachon » Wed Jul 31, 2013 9:49 am

Hello,
I am currently working on a benchmark comparing the performance of various Hermitian matrix eigenproblem implementations on an SMP system.
I already have some scaling results for LAPACK with multithreaded BLAS and MKL, using the routines ZHEEV, ZHEEVD and ZHEEVR. I decided to try PLASMA 2.5.1, as the release notes mention support for the Hermitian eigenproblem. However, I do not get any real speedup: there is only a very small improvement when using 2 cores vs. 1, and with 4 or more cores the time actually increases. Watching CPU usage in htop, I see that only one thread is busy, while the others are basically idle.

PLASMA is compiled using the installer with multithreaded MKL:

./setup.py --cc=gcc --fc=gfortran --cflags=-DPLASMA_WITH_MKL -L/opt/intel/ --blaslib=-L/opt/intel/mkl/lib/intel64 -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -ldl -lpthread -lm

The program calling PLASMA is compiled with:
mpif90 -ffree-form -ffree-line-length-256 -m64 -I/opt/intel/composer_xe_2013.1.117/mkl/include -I/home/stachon/include/plasma -O3 -fno-range-check -o bin.out source_codes...f -L/usr/lib64 -L/home/stachon/lib -lplasma -lcoreblas -llapacke -lquark -Wl,--start-group /opt/intel/composer_xe_2013.1.117/mkl/lib/intel64/libmkl_gf_lp64.a /opt/intel/composer_xe_2013.1.117/mkl/lib/intel64/libmkl_gnu_thread.a /opt/intel/composer_xe_2013.1.117/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -ldl -lpthread -lm -L/home/

Best regards,
Martin Stachon, VSB-Technical University of Ostrava, Czech Republic
stachon
 
Posts: 5
Joined: Mon Jul 29, 2013 5:06 am

Re: No scaling of Hermitian eigenvalue routines

Postby haidar » Wed Jul 31, 2013 10:20 am

Dear Martin,
There is one thing that needs to be done in order to get performance from the PLASMA eigenvalue routines.
Please link with mkl_thread (OK in your case), and also compile with -DPLASMA_WITH_MKL.

Can you also send me how you call PLASMA?

Thanks
Azzam
haidar
 
Posts: 13
Joined: Tue Sep 07, 2010 12:01 pm

Re: No scaling of Hermitian eigenvalue routines

Postby stachon » Wed Jul 31, 2013 10:27 am

Hello Azzam,
I used --cflags=-DPLASMA_WITH_MKL with the PLASMA installer script. I call PLASMA this way (Fortran 90 code):

CALL PLASMA_INIT(NCPU, INFO)
CALL PLASMA_ALLOC_WORKSPACE_ZHEEVD(NDIM, NDIM, desc, INFO)
CALL PLASMA_ZHEEVD(PlasmaVec, PlasmaLower, NDIM, A, LDA, EVAL, desc, EVEC, LDEVEC, INFO)
CALL PLASMA_FINALIZE(INFO)

Martin

Re: No scaling of Hermitian eigenvalue routines

Postby haidar » Wed Jul 31, 2013 10:40 am

Martin,
Can you give me an idea of the timings you are getting, the number of threads you want to use, the matrix size, and the processor type?
Thanks
Azzam

Re: No scaling of Hermitian eigenvalue routines

Postby stachon » Wed Jul 31, 2013 11:01 am

I ran benchmarks of ZHEEV, ZHEEVR and ZHEEVD on matrix sizes 600x600 and 6000x6000, with 1-32 threads. The CPU is a Quad-Core AMD Opteron(tm) Processor 8380 (eight of these in the system).

Some timings in seconds, measured using MPI_WTIME:

ZHEEV, size 600
 1 thread      1.55
 2 threads     1.52
 4 threads     2.41
 8 threads     3.46
16 threads     4.84
32 threads     5.80

ZHEEV, size 6000
 1 thread   1393.73
 2 threads  1118.47
 4 threads  1333.89
 8 threads  1457.35
16 threads  1586.77
32 threads  1915.45

Similar results for ZHEEVD and ZHEEVR
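
To make the lack of scaling easier to read, the timings above can be turned into speedup and parallel efficiency relative to the 1-thread run. This is only a small sketch: the numbers are copied from the post, while the `timings` dictionary and the `scaling` helper are illustrative names and not part of PLASMA or of the original benchmark code.

```python
# Timings in seconds copied from the post above (ZHEEV with PLASMA 2.5.1).
# The dictionary layout and the helper below are illustrative assumptions.
timings = {
    600: {1: 1.55, 2: 1.52, 4: 2.41, 8: 3.46, 16: 4.84, 32: 5.80},
    6000: {1: 1393.73, 2: 1118.47, 4: 1333.89,
           8: 1457.35, 16: 1586.77, 32: 1915.45},
}


def scaling(times):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread time."""
    base = times[1]
    return {p: (base / t, base / (t * p)) for p, t in sorted(times.items())}


for n, times in timings.items():
    print(f"ZHEEV, size {n}")
    for p, (speedup, efficiency) in scaling(times).items():
        print(f"  {p:2d} threads  speedup {speedup:5.2f}  "
              f"efficiency {efficiency:6.1%}")
```

With these numbers the best case is a speedup of roughly 1.25 at 2 threads for size 6000, and the 32-thread runs are slower than the single-threaded ones for both sizes.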

Re: No scaling of Hermitian eigenvalue routines

Postby haidar » Wed Jul 31, 2013 11:19 pm

Hi again,
Can you please run it with the patch that I sent you and with the options -DENABLE_TIMER -DENABLE_DEBUG?
You need to recompile everything; please send me the output on 8 threads.
Thanks
Azzam

Re: No scaling of Hermitian eigenvalue routines

Postby shengguo » Thu Aug 15, 2013 9:19 am

haidar wrote:
Hi again,
Can you please run it with the patch that I sent you and with the options -DENABLE_TIMER -DENABLE_DEBUG?
You need to recompile everything; please send me the output on 8 threads.
Thanks
Azzam

Could you tell me how to use "-DENABLE_TIMER" to print the execution time? When I add this option,
I get an error that "PLASMA_Wtime" is undefined. Thanks.
shengguo
 
Posts: 1
Joined: Thu Aug 15, 2013 9:06 am

Re: No scaling of Hermitian eigenvalue routines

Postby stachon » Thu Aug 22, 2013 2:34 pm

There seems to be a problem with thread/core binding. For a brief moment, PLASMA threads are distributed among cores (1,2,3, etc.), but after a second, all the threads, including the main one, are bound to core 0, until the execution ends. I have recompiled PLASMA with -DPLASMA_AFFINITY_DISABLE and the threads are no longer bound to core 0. I will try some benchmarks tomorrow.
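
For anyone who wants to confirm this kind of binding problem without watching htop, the allowed-CPU list of each thread can be read from /proc on Linux. The following is a minimal sketch under that assumption; `allowed_cpus` and `thread_affinities` are hypothetical helper names and are not part of PLASMA.

```python
import os


def allowed_cpus(pid=0):
    """Return the sorted CPU ids the process may run on (Linux; 0 = this process)."""
    return sorted(os.sched_getaffinity(pid))


def thread_affinities(pid="self"):
    """Map each thread id of a process to its Cpus_allowed_list from /proc."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/status") as fh:
                for line in fh:
                    if line.startswith("Cpus_allowed_list:"):
                        result[int(tid)] = line.split(":", 1)[1].strip()
        except FileNotFoundError:
            # The thread exited between listing and reading; skip it.
            continue
    return result


if __name__ == "__main__":
    print("process may run on CPUs:", allowed_cpus())
    for tid, cpus in sorted(thread_affinities().items()):
        print(f"thread {tid}: Cpus_allowed_list = {cpus}")
```

Calling `thread_affinities(pid)` with the PID of the running benchmark would show a value of 0 for every thread if they were all pinned to core 0, as described above.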

Re: No scaling of Hermitian eigenvalue routines

Postby stachon » Wed Aug 28, 2013 10:31 am

I ran benchmarks of PLASMA compiled with -DPLASMA_AFFINITY_DISABLE and the scaling is very good for ZHEEV, ZHEEVD and ZHEEVR on 1-32 cores. PLASMA times beat MKL by more than 3x.

