I recently hit a curious hiccup using MAGMA's sgemm. sgemm, as you likely know, computes C = alpha*A*B + beta*C. When beta=0, the last term can in principle be dropped entirely.

After allocating C on the GPU, I had been initializing it with zeros. I was advised by the CULA people that I could skip this step when beta=0, since the beta*C term was then ignored. We strive for efficiency in our calculations, of course.

I adapted my code to MAGMA and had problems getting the right answers out of sgemm. It turned out that I needed to zero out C after allocation once again, even when beta=0. It seems that MAGMA carries out the beta*C calculation even when beta=0. (?)
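
One possible explanation, sketched here in plain Python rather than actual MAGMA code: freshly allocated memory can hold arbitrary bit patterns, including NaN or Inf, and in IEEE floating point both 0*NaN and 0*Inf are NaN. The function names below are hypothetical and are not taken from MAGMA, CULA, or any BLAS; they only contrast a gemm that always evaluates beta*C with one that never reads C when beta is 0.

```python
import math

def matmul(A, B):
    """Plain triple-loop product of two row-major lists of lists."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(inner))
             for j in range(cols)] for i in range(rows)]

def gemm_always_scale(alpha, A, B, beta, C):
    """Hypothetical variant: evaluates alpha*A*B + beta*C even when beta == 0."""
    AB = matmul(A, B)
    return [[alpha * ab + beta * c for ab, c in zip(ab_row, c_row)]
            for ab_row, c_row in zip(AB, C)]

def gemm_skip_zero_beta(alpha, A, B, beta, C):
    """Hypothetical variant: never reads C when beta == 0."""
    if beta == 0.0:
        return [[alpha * ab for ab in ab_row] for ab_row in matmul(A, B)]
    return gemm_always_scale(alpha, A, B, beta, C)

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
# Stand-in for uninitialized memory: a mix of NaN, Inf and a large finite value.
garbage = [[math.nan, math.inf], [1e30, -math.nan]]

print(gemm_always_scale(1.0, A, B, 0.0, garbage))    # NaN wherever C held NaN or Inf
print(gemm_skip_zero_beta(1.0, A, B, 0.0, garbage))  # unaffected by the garbage in C
```

Only the entries where the garbage happened to be NaN or Inf go wrong in the first variant, which would fit results that are bad in some places and fine in others. For comparison, the netlib reference sgemm documents that C need not be set on entry when beta is zero, which corresponds to the second variant.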

I was surprised by this, but happy that the numbers I was getting back from MAGMA at last agreed with MATLAB and CULA. Perhaps this can serve as a word of warning to all, and as a suggestion to the MAGMA developers for a (small, to be sure) improvement in the computational efficiency of sgemm.

If it matters: I was using the latest MAGMA on a 555M laptop running stock SUSE Linux 11.4. MAGMA is certainly a bear to get set up with BLAS, etc.!