Hi,
I am trying to tune PLASMA_dpotrf. When I vary the tile size, performance drops abruptly once the tile size is increased from 125 to 150 or beyond.
Implementation,tile_size,num_threads,N,GFlops
Plasma,40,12,6000,79.4
Plasma,48,12,6000,83.7
Plasma,50,12,6000,80.8
Plasma,60,12,6000,86
Plasma,75,12,6000,79.5
Plasma,80,12,6000,91.8
Plasma,100,12,6000,93.5
Plasma,120,12,6000,94.7
Plasma,125,12,6000,84.6
Plasma,150,12,6000,9.85
Plasma,200,12,6000,10.2
Plasma,240,12,6000,10.2
Plasma,250,12,6000,10.1
Plasma,300,12,6000,10
Plasma,375,12,6000,9.52
Plasma,400,12,6000,10
Plasma,500,12,6000,9.91
These numbers were measured on a dual-socket Intel Xeon X5650 @ 2.67 GHz system (6 cores per socket, 12 cores total) with a peak performance of 128 GFlops. The theoretical peak performance of a single core is around 10.5 GFlops. I am using PLASMA 2.4.6, built against Intel MKL 10.2. On a brief examination of the source code, I could not find anything that would cause this behavior. I would like to know whether anything similar has been observed or documented.
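
For reference, here is a rough sketch of the arithmetic behind the peak figures, comparing a few of the measurements above against them. It assumes 4 double-precision flops per cycle per core, which is my assumption and not something I measured; the function names are only illustrative. Under that assumption the results for tile sizes of 150 and above sit close to the peak of a single core, which might indicate that the run is effectively serialized, although I have not confirmed this.

```python
# Clock speed and core count are taken from the post.
CLOCK_GHZ = 2.67
TOTAL_CORES = 12
# Assumption: 4 double-precision flops per cycle per core (SSE add + multiply).
FLOPS_PER_CYCLE = 4


def peak_gflops(cores):
    """Theoretical peak in GFlops for the given number of cores."""
    return CLOCK_GHZ * FLOPS_PER_CYCLE * cores


def fraction_of_peak(gflops, cores):
    """Measured rate as a fraction of the theoretical peak on `cores` cores."""
    return gflops / peak_gflops(cores)


# A few (tile_size, GFlops) pairs copied from the table in the post.
measurements = [(120, 94.7), (125, 84.6), (150, 9.85), (200, 10.2)]

if __name__ == "__main__":
    for tile_size, gflops in measurements:
        all_cores = fraction_of_peak(gflops, TOTAL_CORES)
        one_core = fraction_of_peak(gflops, 1)
        print(f"tile={tile_size}: {all_cores:.1%} of 12-core peak, "
              f"{one_core:.1%} of 1-core peak")
```

Values above 100% in the single-core column simply mean that more than one core must have been doing useful work.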
