Hello,

I have been trying to find a parallel (GPU + CUDA) tiled SGEMM implementation for multiplying large matrices that would otherwise not fit in GPU memory.

A friend of mine said he once saw a tiled SGEMM in MAGMA, but I can't find it.

Does such an implementation exist?

Thanks for your time...