Open forum for general discussions relating to PLASMA.
I would like to understand the behavior shown below. I ran time_dgemm and time_dgemm_tile on my machine (Xeon X5650, 4 CPUs x 6 cores, 48 GB). See the results below.
Why is that happening? Any ideas would be much appreciated.
- dgemm.jpg (41.94 KiB)
As you probably know, dgemm_tile is faster than dgemm because it skips the layout translation.
So your question is really about the drop-off when exceeding 12 cores.
The first thing that comes to mind is a NUMA effect.
Try running with numactl --interleave=all, which spreads memory pages round-robin across all NUMA nodes.
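For example, something along these lines. This is only a rough sketch: it assumes numactl is installed and that the timing binary sits in the current directory. The BENCH path and the run_interleaved helper are placeholders of mine, and the benchmark's own arguments are left out, so pass whatever you normally use.

```shell
#!/bin/sh
# Placeholder path to the timing binary; adjust to your PLASMA build.
BENCH="${BENCH:-./time_dgemm}"

# Hypothetical helper: print the command line, then run it under
# numactl with pages interleaved across all NUMA nodes, provided
# numactl is installed and the target is an executable file.
run_interleaved() {
    echo "numactl --interleave=all $*"
    if command -v numactl >/dev/null 2>&1 && [ -x "$1" ]; then
        numactl --interleave=all "$@"
    fi
}

# Show the NUMA topology first (nodes, CPUs and memory per node).
if command -v numactl >/dev/null 2>&1; then
    numactl --hardware || true
fi

# Append your usual benchmark arguments after "$BENCH".
run_interleaved "$BENCH"
```

If the drop-off beyond 12 cores shrinks or disappears with interleaving, memory placement is the likely cause.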