I understand that the CUBLAS GEMM is much better optimized than the MagmaBlas GEMM.

I am working on a problem that requires me to use the GPGPU simulator.

**My application calls the CUBLAS GEMM, but because CUBLAS is not open source, the GPGPU simulator cannot extract the PTX from it.**

The next best optimized implementation after the CUBLAS GEMM is the MagmaBlas GEMM.

Unfortunately, my matrices have the following dimensions:

M = 32

K = 27

N = 369664

With these sizes, I cannot use the texture-based, more optimized version of the MagmaBlas GEMM.

So I am planning to comment out the `#define texture1_D` line and run make again.

I just want to confirm with you that the MagmaBlas GEMM (global memory version) will work correctly even for these large sizes.
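One way I could check correctness myself is to compare the GPU result against a plain CPU reference GEMM. The following is only a minimal sketch, assuming column-major storage (as in BLAS) and no transposes. `ref_sgemm` and `max_abs_diff` are hypothetical helper names of my own, not part of MagmaBlas or CUBLAS.

```c
#include <stddef.h>

/* Reference single-precision GEMM on the CPU (hypothetical helper).
 * Computes C = alpha * A * B + beta * C, column-major, no transposes.
 * A is M x K (lda >= M), B is K x N (ldb >= K), C is M x N (ldc >= M). */
void ref_sgemm(size_t M, size_t N, size_t K, float alpha,
               const float *A, size_t lda,
               const float *B, size_t ldb,
               float beta, float *C, size_t ldc)
{
    for (size_t j = 0; j < N; ++j) {
        for (size_t i = 0; i < M; ++i) {
            /* Accumulate in double to limit rounding error in the reference. */
            double acc = 0.0;
            for (size_t k = 0; k < K; ++k)
                acc += (double)A[i + k * lda] * (double)B[k + j * ldb];
            C[i + j * ldc] = (float)(alpha * acc + beta * C[i + j * ldc]);
        }
    }
}

/* Largest absolute element-wise difference between two M x N matrices
 * (hypothetical helper). X would hold the GPU result copied back to the
 * host, Y the CPU reference. */
float max_abs_diff(size_t M, size_t N,
                   const float *X, size_t ldx,
                   const float *Y, size_t ldy)
{
    float worst = 0.0f;
    for (size_t j = 0; j < N; ++j) {
        for (size_t i = 0; i < M; ++i) {
            float d = X[i + j * ldx] - Y[i + j * ldy];
            if (d < 0.0f)
                d = -d;
            if (d > worst)
                worst = d;
        }
    }
    return worst;
}
```

With M = 32, K = 27 and N = 369664, the reference takes about 32 * 27 * 369664 multiply-adds, which is small enough to run on the CPU. A small `max_abs_diff` between the two results (relative to the magnitude of the entries) would suggest the global memory version is computing the right thing for my sizes.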

