Hello Mark,

I completely understand that the CUBLAS GEMM is much more optimized than the MagmaBlas GEMM.

I am working on a problem that requires me to use a GPGPU simulator.

My application uses the CUBLAS GEMM, but since CUBLAS is not open source, the GPGPU simulator cannot extract the PTX from it.

The next best optimized library after the CUBLAS GEMM is the MagmaBlas GEMM.

Unfortunately, because the matrix dimensions are

M = 32

K = 27

N = 369664

I cannot use the textured, more optimized version of the MagmaBlas GEMM.

So, I am planning to comment out the #define texture1_D line and rebuild.

I just want to confirm with you that the global-memory version of the MAGMA GEMM will still work correctly even for large sizes like these.

Thanks a lot for all the help, Mark.