Disappointing performance of DSYEV

Open forum for general discussions relating to PLASMA.


Postby jimy_b » Thu Aug 29, 2013 12:14 pm

I'm trying to benchmark PLASMA's dsyev, and I'm getting lousy scaling with it, which I wasn't expecting.

The program is just the test that comes with PLASMA, 'testing_dsyev.c'.
I run it on a 4096x4096 matrix that is randomly generated in the code, with static scheduling.

Timings (PLASMA):
cores   wtime (s)
    1   58.0
    2   44.7
    4   34.0
    8   30.0
   16   28.0

I've run the same thing with MKL DSYEV:
cores   wtime (s)
    1   46.0
    2   23.5
    4   14.6
    8    9.4
   16   13.4

It's running on a ccNUMA system of Intel Xeons. Like the poster who got bad scaling for Hermitian matrices, I also see from htop that it appears to run mostly serially. I tried his fix of disabling affinity in PLASMA and recompiling, but it hasn't improved things for me.
I compiled PLASMA with:
CFLAGS = -g -O2 -DADD_ -DPLASMA_WITH_MKL -diag-disable vec
CFLAGS += -DMKL_ILP64 -I$(MKLROOT)/include -openmp
CFLAGS += -DPLASMA_AFFINITY_DISABLE
FFLAGS = -g -O2 -diag-disable vec
FFLAGS += -I$(MKLROOT)/include -openmp
LDFLAGS = -nofor-main
LIBBLAS = -mkl=sequential -openmp

I compiled my test program with:
CFLAGS = -O2 -g -openmp -pthread
CFLAGS += -I/nfs/scratch/cosmos/dev/plasma/plasma_2.5.1/include
LAPACK = -mkl=sequential
LQUARK = /nfs/scratch/cosmos/dev/plasma/plasma_2.5.1/quark

LPATH = -L/nfs/scratch/cosmos/dev/plasma/plasma_2.5.1/lib -L${LQUARK}
LFLAGS = -lplasma -lcoreblas -lquark -mkl=sequential -lpthread -lhwloc -openmp

Can anyone point me in the right direction?
Jim
jimy_b
 
Posts: 12
Joined: Wed Jul 31, 2013 1:06 pm

Re: Disappointing performance of DSYEV

Postby haidar » Thu Aug 29, 2013 12:49 pm

Jim,
The first thing you should do is compile with multithreaded MKL, not sequential MKL.
So please try linking with mkl_thread and then let me know if you still get slow performance.
Thanks
Azzam
haidar
 
Posts: 13
Joined: Tue Sep 07, 2010 12:01 pm

Re: Disappointing performance of DSYEV

Postby jimy_b » Thu Aug 29, 2013 3:32 pm

OK, what I'm confused about with PLASMA is that I thought PLASMA had its own parallelism at the highest level and did away with BLAS-level parallelism. So I compiled with sequential BLAS, but you've said to compile with multithreaded MKL. Does it actually use both parallel BLAS and its own task parallelism at the same time, or something like that?

Re: Disappointing performance of DSYEV

Postby haidar » Thu Aug 29, 2013 5:07 pm

Well,
PLASMA uses its own parallelism on top of sequential BLAS, but the eigenvalue and singular value routines require multithreaded MKL (mkl_thread),
since those routines switch between our own parallelism and some calls to multithreaded MKL.

Thanks
Azzam

Re: Disappointing performance of DSYEV

Postby jimy_b » Thu Aug 29, 2013 5:27 pm

Ah! I see.
So do I have to set MKL_NUM_THREADS when I run it?

Cheers,
James

Re: Disappointing performance of DSYEV

Postby haidar » Thu Aug 29, 2013 5:37 pm

It doesn't matter, because PLASMA sets it to 1 when it needs sequential BLAS and then sets it to the number of PLASMA threads when it needs parallel BLAS.

