SGEMM when beta=0

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

I recently had a curious hiccup using magma's sgemm. sgemm, as you likely know, computes C = alpha*A*B + beta*C. When beta=0, the last term can in theory be dropped entirely.

After allocating C on the GPU, I had been initializing it with zeros. I was advised by the CULA people that I could skip this step when beta=0, since the beta*C term was then ignored. We strive for efficiency in our calculations, of course.

I adapted my code to magma and was having problems getting the right answers out of sgemm... It turned out that I needed to once again zero out C after allocation, even when beta=0. It seems that magma carries out the beta*C calculation even when beta=0. (?)

I was surprised by this, but happy that the numbers I was getting back from magma agreed with matlab and CULA, at last. Perhaps a word of warning to all, and a suggestion to magma for a way to make a (small, to be sure) improvement in computational efficiency of sgemm.

If it matters: I was using the latest magma on a 555m/laptop and stock Suse linux 11.4. Magma is certainly a bear to get set up with BLAS, etc.!
Boxed Cylon

Posts: 29
Joined: Sat Nov 21, 2009 6:03 pm

Re: SGEMM when beta=0

When you didn't initialize the matrix, what result were you getting, and what result did you expect? A short sample code would be helpful.

Technically, the beta*C must be carried out to properly propagate NAN values that may be in the C matrix. I'll look into how some different libraries handle this case.
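To make the NAN point concrete, here is a minimal sketch of a naive host-side reference gemm with two ways of handling beta=0. The names ref_sgemm and BetaZero are hypothetical and are not MAGMA, CUBLAS, or LAPACK code; the sketch only illustrates that 0*NaN is NaN in IEEE 754 arithmetic, so an implementation that multiplies by beta keeps the NaN while one that skips the term overwrites it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical policy names for illustration; not taken from any library.
enum class BetaZero { Multiply, Skip };

// Naive row-major reference:
//   C (m x n) := alpha * A (m x k) * B (k x n) + beta * C
void ref_sgemm(std::size_t m, std::size_t n, std::size_t k,
               float alpha, const std::vector<float>& A,
               const std::vector<float>& B,
               float beta, std::vector<float>& C, BetaZero policy)
{
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t p = 0; p < k; ++p)
                sum += A[i * k + p] * B[p * n + j];
            float& c = C[i * n + j];
            if (beta == 0.0f && policy == BetaZero::Skip)
                c = alpha * sum;             // old value of C is never read
            else
                c = alpha * sum + beta * c;  // 0 * NaN is still NaN
        }
    }
}
```

With a NaN already sitting in C and beta=0, the Multiply policy leaves NaN in the output, while the Skip policy returns the plain product alpha*A*B.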
-mark
mgates3

Posts: 566
Joined: Fri Jan 06, 2012 2:13 pm

Re: SGEMM when beta=0

Here are some results on a Tesla T20 (Fermi), MAGMA 1.1, CUDA 4.0, Intel MKL. (Source code attached.) The first column of C is set to NAN on input. When beta=0, implementations differ on whether they propagate NAN values or not. In particular, MAGMA BLAS does propagate NAN values, while in this run CUBLAS and LAPACK (here Intel MKL) do not. Therefore, you do not need to zero out the C matrix if beta=0, but you do need to ensure that all of C has valid numbers, not NAN, INF, etc.

```
remus> ./cuda-gemm 3 5 9
m 3, n 5, k 9
A
  0.22  0.49  0.47  0.96  0.19  0.17  0.96  0.50  0.44
  0.22  0.57  0.03  0.15  0.36  0.08  0.21  0.79  0.50
  0.45  0.47  0.95  0.26  0.62  0.59  0.78  0.94  0.83
B
  0.90  0.76  0.60  0.91  0.45
  0.74  0.93  0.24  0.14  0.17
  0.24  0.06  0.66  0.17  0.32
  0.84  0.72  0.19  0.74  0.48
  0.96  0.89  0.20  0.04  0.93
  0.46  0.22  0.87  0.91  0.25
  0.29  0.98  0.97  0.99  0.54
  0.45  0.08  0.70  0.88  0.65
  0.03  0.58  0.66  0.87  0.13
C
   nan  0.34  0.00  0.87  0.54
   nan  0.23  0.42  0.39  0.30
   nan  0.45  0.65  0.35  0.49
==========
C := 1 A B + 1 C, with cublas
   nan  3.12  2.51  3.86  2.45
   nan  1.93  1.98  2.21  1.63
   nan  3.47  4.33  4.06  3.07
C := 1 A B + 1 C, with magmablas
   nan  3.12  2.51  3.86  2.45
   nan  1.93  1.98  2.21  1.63
   nan  3.47  4.33  4.06  3.07
C := 1 A B + 1 C, with lapack
   nan  3.12  2.51  3.86  2.45
   nan  1.93  1.98  2.21  1.63
   nan  3.47  4.33  4.06  3.07
==========
C := 1 A B + 0 C, with cublas
  2.26  2.78  2.51  2.99  1.91
  1.57  1.70  1.56  1.83  1.33
  2.75  3.02  3.67  3.70  2.58
C := 1 A B + 0 C, with magmablas
   nan  2.78  2.51  2.99  1.91
   nan  1.70  1.56  1.83  1.33
   nan  3.02  3.67  3.70  2.58
C := 1 A B + 0 C, with lapack
  2.26  2.78  2.51  2.99  1.91
  1.57  1.70  1.56  1.83  1.33
  2.75  3.02  3.67  3.70  2.58
==========
C := 1 A B + 1e-06 C, with cublas
   nan  2.78  2.51  2.99  1.91
   nan  1.70  1.56  1.83  1.33
   nan  3.02  3.67  3.70  2.58
C := 1 A B + 1e-06 C, with magmablas
   nan  2.78  2.51  2.99  1.91
   nan  1.70  1.56  1.83  1.33
   nan  3.02  3.67  3.70  2.58
C := 1 A B + 1e-06 C, with lapack
   nan  2.78  2.51  2.99  1.91
   nan  1.70  1.56  1.83  1.33
   nan  3.02  3.67  3.70  2.58
```

Note that Matlab follows MAGMA BLAS in propagating NANs. (The results differ slightly from those above because the input was rounded to 2 digits.)
>> A*B + 0*C
ans =
NaN 2.7848 2.4997 2.9946 1.9133
NaN 1.7041 1.5454 1.8214 1.3242
NaN 3.0259 3.6589 3.7124 2.5754

-mark
Attachments
cuda-gemm.cpp
mgates3

Posts: 566
Joined: Fri Jan 06, 2012 2:13 pm

Re: SGEMM when beta=0

Thanks for the reply - I think you are right that the issue was NaNs, etc. in the (uninitialized) C matrix. The code is an application I've developed to call Cuda/Magma from matlab - it's too complicated to post, and I haven't the energy to extract an example. I suspect your discussion above explains the issue. It's probably "best practice" to zero out C when it is allocated, in any case, even though in this case its contents are not used. Doing so takes up minimal cpu resources.

The application is well-tested and returns well-tested single precision numbers. I think that without zeroing C I was getting either garbage or NaNs back - an obvious and glaring discrepancy from what I had been getting before.
Boxed Cylon

Posts: 29
Joined: Sat Nov 21, 2009 6:03 pm

Re: SGEMM when beta=0

I believe I am experiencing something similar with sgemm. It is extremely hard to reproduce the effect, but I think I finally spotted it. From time to time, and seemingly depending on the frontend machine, part of the result matrix of sgemm with beta = 0 is NaN. Should I zero the output matrix of sgemm beforehand to ensure correctness?
luiceur

Posts: 26
Joined: Tue Jul 10, 2012 4:38 am

Re: SGEMM when beta=0

Yes, it's best practice to zero out the C matrix, even when beta=0.
Actually, setting C to any valid numbers should work, so long as it does not contain NaN, inf, etc.
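As a rough host-side sketch of that advice, the two helpers below (hypothetical names, not part of MAGMA) allocate an output buffer that starts out as zeros and check that a buffer holds only finite values before it is handed to a beta=0 gemm. For a buffer allocated on the GPU, the usual equivalent is a cudaMemset with value 0 over the whole allocation, since all-zero bytes represent 0.0f for IEEE 754 floats.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if every entry is finite (no NaN, no +/-inf).
bool all_finite(const std::vector<float>& c)
{
    return std::all_of(c.begin(), c.end(),
                       [](float x) { return std::isfinite(x); });
}

// Hypothetical helper: allocate an m x n output buffer filled with zeros,
// so that beta * C cannot inject NaN or inf even when beta = 0.
std::vector<float> make_zeroed_output(std::size_t m, std::size_t n)
{
    return std::vector<float>(m * n, 0.0f);
}
```

A debug build could assert all_finite on a host copy of C right before the gemm call, which may help narrow down intermittent NaNs that depend on whatever happened to be in memory.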
-mark
mgates3

Posts: 566
Joined: Fri Jan 06, 2012 2:13 pm