Dear PLASMA team,
I am trying to run the time_dgemm_tile routine on one node of a Cray XT5
(each node has two six-core 2.4 GHz AMD Opteron Istanbul processors, i.e. 12 cores in total).
PLASMA was compiled with
./setup.py --cc=cc --fc=ftn --downall
and the PrgEnv-gnu/2.2.48B compilers.
The speedup is good up to 2 threads, but beyond that it plateaus (see below).
Any explanation for this behavior? What would fix it?
Thanks
V
./time_dgemm_tile --threads=xxx
#    N  NRHS  threads  seconds  Gflop/s  Deviation
  3000     1        1   49.589     1.09       0.00
  3000     1        2   25.439     2.12       0.00
  3000     1        4   25.383     2.13       0.00
  3000     1        8   25.819     2.09       0.00
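
To put numbers on the plateau, here is a minimal Python sketch that computes speedup and parallel efficiency from the timings above. The helper names are my own and not part of PLASMA, and the Gflop/s check assumes the usual 2*N^3 flop count for a square N x N matrix multiply, which may not be exactly what the timer uses.

```python
# Timings copied from the output above: threads -> seconds (N = 3000).
timings = {1: 49.589, 2: 25.439, 4: 25.383, 8: 25.819}


def speedup(timings, threads):
    """Speedup relative to the single-threaded run."""
    return timings[1] / timings[threads]


def efficiency(timings, threads):
    """Parallel efficiency: speedup divided by the thread count."""
    return speedup(timings, threads) / threads


def gflops(n, seconds):
    """Gflop/s, ASSUMING 2*n^3 flops for a square n x n dgemm."""
    return 2.0 * n**3 / seconds / 1e9


if __name__ == "__main__":
    for t in sorted(timings):
        print(
            f"{t:2d} threads: speedup {speedup(timings, t):.2f}, "
            f"efficiency {efficiency(timings, t):.2f}, "
            f"{gflops(3000, timings[t]):.2f} Gflop/s"
        )
```

With these timings the speedup stays just under 2 for 2, 4 and 8 threads, so the efficiency drops from about 0.97 at 2 threads to about 0.24 at 8.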
