Open discussion for MAGMA
I am using the MAGMA BLAS function sgemv in my application. I run it on a square 4096x4096 matrix multiplied by a 4096x1 vector. The occupancy is only 50%, but the bandwidth is pretty good at 80-85 GB/s on my Tesla card (C1060).
I increased the block size to 128 and recompiled my code with this version. The occupancy increased to 100%, but the performance did not change: both bandwidth and execution time remained the same.
Does this mean that my kernel is bandwidth bound?
Please don't mind if my question seems like a rookie one. I am doing this kind of analysis for the first time.
Thanks and regards
- Posts: 11
- Joined: Thu Jul 01, 2010 12:12 pm
Yes, the kernel is memory bound.
For n^2 * sizeof(float) bytes of data the flops are 2 n^2, i.e. only 0.5 flops per byte.
This means that if, for example, the memory bandwidth is 140 GB/s (as on the GTX280), the theoretical
peak for sgemv (due to the memory speed limitation) is 70 GFlop/s (assuming
we do the computations for "free"). The sgemv achieves up to 66 GFlop/s on the
GTX280, which is very good.
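As a rough check of the arithmetic above, here is a small Python sketch. The function names are made up for illustration and are not part of MAGMA. It assumes single precision and counts only the matrix traffic, ignoring the O(n) bytes for the vectors. The 80 and 85 GB/s figures are the measured C1060 bandwidths quoted in the question.

```python
def gemv_flops_per_byte(n, bytes_per_element=4):
    """Flops per byte for an n x n matrix-vector product.

    Assumption: only the matrix is counted as memory traffic;
    the input and output vectors (O(n) bytes) are ignored.
    """
    flops = 2 * n * n                        # one multiply and one add per matrix entry
    bytes_moved = n * n * bytes_per_element  # each matrix entry is read once
    return flops / bytes_moved


def bandwidth_bound_gflops(bandwidth_gb_s, flops_per_byte):
    """Upper bound on GFlop/s when the computation itself is treated as free."""
    return bandwidth_gb_s * flops_per_byte


intensity = gemv_flops_per_byte(4096)

# 140 GB/s bus, as in the GTX280 example above.
gtx280_bound = bandwidth_bound_gflops(140.0, intensity)

# Measured 80-85 GB/s on the C1060, expressed as equivalent GFlop/s.
c1060_low = bandwidth_bound_gflops(80.0, intensity)
c1060_high = bandwidth_bound_gflops(85.0, intensity)

print(intensity, gtx280_bound, c1060_low, c1060_high)
```

Under these assumptions, a measured 80-85 GB/s corresponds to roughly 40-42.5 GFlop/s for sgemv. Since this bound depends only on how fast the matrix can be read, raising occupancy alone would not be expected to change it.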
- Posts: 253
- Joined: Fri Aug 21, 2009 10:39 pm