performance dgemm tile on Cray XT5

Post by vweber » Fri Sep 03, 2010 1:56 pm

Dear PLASMA team,

I am trying to run the time_dgemm_tile routine on one node of a Cray XT5
(each node has two six-core AMD Opteron 2.4 GHz Istanbul processors, 12 cores in total).

PLASMA was compiled with
./setup.py --cc=cc --fc=ftn --downall
and the PrgEnv-gnu/2.2.48B compilers.

The speedup is good up to 2 threads, but then it plateaus (see below).
Is there an explanation for this behavior? What would fix it?

Thanks
V


./time_dgemm_tile --threads=xxx
#    N  NRHS  threads  seconds  Gflop/s  Deviation
  3000     1        1   49.589     1.09       0.00
  3000     1        2   25.439     2.12       0.00
  3000     1        4   25.383     2.13       0.00
  3000     1        8   25.819     2.09       0.00
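As a sanity check on the figures above, the Gflop/s column is consistent with the usual 2*N^3 operation count for an N x N matrix multiply divided by the run time. The sketch below recomputes that rate and the speedup relative to one thread. The helper names gflops and speedup are illustrative only and are not part of PLASMA.

```shell
#!/bin/sh
# Hypothetical helpers (not part of PLASMA) to recompute the table's figures.

# gflops N SECONDS: rate assuming 2*N^3 floating-point operations.
gflops() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 2 * n * n * n / t / 1e9 }'
}

# speedup T1 TP: one-thread time divided by the p-thread time.
speedup() {
    awk -v t1="$1" -v tp="$2" 'BEGIN { printf "%.2f\n", t1 / tp }'
}

gflops 3000 49.589     # 1 thread
gflops 3000 25.439     # 2 threads
speedup 49.589 25.439  # 1 -> 2 threads
speedup 49.589 25.819  # 1 -> 8 threads
```

The speedup stays just under 2 for 4 and 8 threads as well, which is the plateau described in the post.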

Re: performance dgemm tile on Cray XT5

Post by admin » Fri Sep 03, 2010 2:48 pm

Make sure that you're using a fast, optimized BLAS (MKL, ACML, GOTO, ...).
See the PLASMA README for more details.

Make sure that you set the number of BLAS threads to 1:
> export OMP_NUM_THREADS=1
> export MKL_NUM_THREADS=1
> export GOTO_NUM_THREADS=1

Try using NUMA control with the flag "--interleave=all":
numactl --interleave=all ./time_dgemm_tile --threads=xxx
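Putting the suggestions above together, a thread sweep might look like the following sketch. The run_sweep helper and the list of thread counts (ending at 12, the number of cores per node mentioned in the original post) are assumptions for illustration. It also assumes that numactl is available on the node and that ./time_dgemm_tile is in the current directory. Passing echo as the first argument prints the commands instead of running them.

```shell
#!/bin/sh
# Keep the BLAS library single-threaded so that only PLASMA creates threads.
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export GOTO_NUM_THREADS=1

# run_sweep [PREFIX...]: hypothetical helper, not part of PLASMA.
# Any arguments are prepended to each command, so "run_sweep echo" is a dry run.
run_sweep() {
    for t in 1 2 4 8 12; do
        "$@" numactl --interleave=all ./time_dgemm_tile --threads="$t"
    done
}

run_sweep echo   # dry run: print the five commands without executing them
```

Calling run_sweep with no arguments runs the benchmark for real, once per thread count.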

Let us know if your numbers get better.
Jakub