magma sgemv 50% occupancy 86gb/s tesla

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

Postby itabhiyanta » Thu Jul 01, 2010 12:31 pm


I am using the MAGMA BLAS function sgemv in my application. I run it on a square 4096x4096 matrix, multiplying it by a 4096x1 vector. The occupancy is only 50%, but the bandwidth is pretty good at 80-85 GB/s on my Tesla card (C1060).

I increased the block size to 128 and recompiled my code with this version. The occupancy increased to 100%, but the performance did not change: both bandwidth and execution time remained the same.

Does this mean that my kernel is bandwidth bound?

Please don't mind if my question seems like a rookie one. I am doing this kind of analysis for the first time.

thanks and regards

Re: magma sgemv 50% occupancy 86gb/s tesla

Postby Stan Tomov » Thu Aug 05, 2010 11:10 am

Yes, the kernel is memory bound.
sgemv reads n^2 * sizeof(float) = 4 n^2 bytes of matrix data and performs 2 n^2 flops,
i.e. only 0.5 flops per byte. This means that if, for example, the memory bus delivers
140 GB/s (as on the GTX280), the theoretical peak for sgemv (due to the memory speed
limitation) is 0.5 * 140 = 70 GFlop/s (assuming we do the computations for "free").
The sgemv achieves up to 66 GFlop/s on the GTX280, which is very good.
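
The arithmetic above can be reproduced with a short script. This is only a rough sketch, not part of MAGMA: the function names are made up for illustration, the vector traffic (O(n) data) is ignored as in the estimate above, and the 85 GB/s figure is simply the bandwidth reported in the question.

```python
# Back-of-the-envelope bandwidth bound for sgemv (y = A*x, A is n x n, single precision).
# Function names are hypothetical; only the matrix traffic is counted.

def sgemv_bytes(n, elem_size=4):
    """Bytes of matrix data that must be read once: n^2 * sizeof(float)."""
    return n * n * elem_size

def sgemv_flops(n):
    """One multiply and one add per matrix element: 2 * n^2."""
    return 2 * n * n

def bandwidth_bound_gflops(n, bandwidth_gb_s):
    """Upper bound on GFlop/s if computation is free and memory is the only limit."""
    intensity = sgemv_flops(n) / sgemv_bytes(n)  # flops per byte (0.5 for floats)
    return intensity * bandwidth_gb_s

def min_time_ms(n, bandwidth_gb_s):
    """Shortest possible run time in milliseconds at the given bandwidth."""
    return sgemv_bytes(n) / (bandwidth_gb_s * 1e9) * 1e3

if __name__ == "__main__":
    n = 4096
    print("flops per byte:", sgemv_flops(n) / sgemv_bytes(n))
    print("bound at 140 GB/s (GFlop/s):", bandwidth_bound_gflops(n, 140.0))
    print("bound at 85 GB/s (GFlop/s):", bandwidth_bound_gflops(n, 85.0))
    print("minimum time at 85 GB/s (ms):", min_time_ms(n, 85.0))
```

If the measured time for the 4096x4096 case is close to the minimum time computed for the bandwidth the card actually sustains, raising occupancy cannot be expected to help, which matches what was observed in the question.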
Stan Tomov
