SGEMM when beta=0

Open discussion for MAGMA

Postby Boxed Cylon » Sun Mar 04, 2012 12:27 pm

I recently had a curious hiccup using magma's sgemm. As you likely know, sgemm computes C = alpha*A*B + beta*C. When beta=0, the last term can in theory be dropped entirely.

After allocating C on the GPU, I had been initializing it with zeros. I was advised by the CULA people that I could skip this step when beta=0, since the beta*C term was then ignored. We strive for efficiency in our calculations, of course.

I adapted my code to magma and was having problems getting the right answers out of sgemm. It turned out that I needed to once again zero out C after allocation, even when beta=0. It seems that magma carries out the beta*C calculation even when beta=0. (?)

I was surprised by this, but happy that the numbers I was getting back from magma agreed with matlab and CULA, at last. Perhaps a word of warning to all, and a suggestion to magma for a way to make a (small, to be sure) improvement in computational efficiency of sgemm.

If it matters: I was using the latest magma on a 555m/laptop and stock Suse linux 11.4. Magma is certainly a bear to get set up with BLAS, etc.!
Boxed Cylon
 
Posts: 27
Joined: Sat Nov 21, 2009 6:03 pm

Re: SGEMM when beta=0

Postby mgates3 » Fri May 04, 2012 10:49 am

When you didn't initialize the matrix, what result were you getting, and what result did you expect? A short sample code would be helpful.

Technically, the beta*C must be carried out to properly propagate NAN values that may be in the C matrix. I'll look into how some different libraries handle this case.
-mark
mgates3
 
Posts: 442
Joined: Fri Jan 06, 2012 2:13 pm

Re: SGEMM when beta=0

Postby mgates3 » Fri May 04, 2012 3:43 pm

Here are some results on a Tesla T20 (Fermi), Magma 1.1, CUDA 4.0, Intel MKL. (Source code attached.) The first column of C is set to NaN on input. When beta=0, implementations differ on whether they propagate NaN values: in the runs below, cuBLAS and LAPACK (Intel MKL) return finite values in the first column, while MAGMA BLAS propagates the NaNs. Therefore, with MAGMA BLAS you do not need to zero out the C matrix if beta=0, but you do need to ensure that all of C holds valid numbers (no NaN, Inf, etc.).

remus> ./cuda-gemm 3 5 9
m 3, n 5, k 9
A
  0.22  0.49  0.47  0.96  0.19  0.17  0.96  0.50  0.44
  0.22  0.57  0.03  0.15  0.36  0.08  0.21  0.79  0.50
  0.45  0.47  0.95  0.26  0.62  0.59  0.78  0.94  0.83

B
  0.90  0.76  0.60  0.91  0.45
  0.74  0.93  0.24  0.14  0.17
  0.24  0.06  0.66  0.17  0.32
  0.84  0.72  0.19  0.74  0.48
  0.96  0.89  0.20  0.04  0.93
  0.46  0.22  0.87  0.91  0.25
  0.29  0.98  0.97  0.99  0.54
  0.45  0.08  0.70  0.88  0.65
  0.03  0.58  0.66  0.87  0.13

C
   nan  0.34  0.00  0.87  0.54
   nan  0.23  0.42  0.39  0.30
   nan  0.45  0.65  0.35  0.49

==========
C := 1 A B + 1 C, with cublas
   nan  3.12  2.51  3.86  2.45
   nan  1.93  1.98  2.21  1.63
   nan  3.47  4.33  4.06  3.07

C := 1 A B + 1 C, with magmablas
   nan  3.12  2.51  3.86  2.45
   nan  1.93  1.98  2.21  1.63
   nan  3.47  4.33  4.06  3.07

C := 1 A B + 1 C, with lapack
   nan  3.12  2.51  3.86  2.45
   nan  1.93  1.98  2.21  1.63
   nan  3.47  4.33  4.06  3.07

==========
C := 1 A B + 0 C, with cublas
  2.26  2.78  2.51  2.99  1.91
  1.57  1.70  1.56  1.83  1.33
  2.75  3.02  3.67  3.70  2.58

C := 1 A B + 0 C, with magmablas
   nan  2.78  2.51  2.99  1.91
   nan  1.70  1.56  1.83  1.33
   nan  3.02  3.67  3.70  2.58

C := 1 A B + 0 C, with lapack
  2.26  2.78  2.51  2.99  1.91
  1.57  1.70  1.56  1.83  1.33
  2.75  3.02  3.67  3.70  2.58

==========
C := 1 A B + 1e-06 C, with cublas
   nan  2.78  2.51  2.99  1.91
   nan  1.70  1.56  1.83  1.33
   nan  3.02  3.67  3.70  2.58

C := 1 A B + 1e-06 C, with magmablas
   nan  2.78  2.51  2.99  1.91
   nan  1.70  1.56  1.83  1.33
   nan  3.02  3.67  3.70  2.58

C := 1 A B + 1e-06 C, with lapack
   nan  2.78  2.51  2.99  1.91
   nan  1.70  1.56  1.83  1.33
   nan  3.02  3.67  3.70  2.58


Note that Matlab follows MAGMA BLAS in propagating NaNs. (The results differ slightly from those above because the input printed above was rounded to 2 digits.)
>> A*B + 0*C
ans =
NaN 2.7848 2.4997 2.9946 1.9133
NaN 1.7041 1.5454 1.8214 1.3242
NaN 3.0259 3.6589 3.7124 2.5754

-mark
Attachments
cuda-gemm.cpp
mgates3
 

Re: SGEMM when beta=0

Postby Boxed Cylon » Sat May 12, 2012 3:14 pm

Thanks for the reply - I think you are right that the issue was NaNs, etc. in the (uninitialized) C matrix. The code is an application I've developed to call Cuda/Magma from matlab. It's too complicated to post, and I haven't the energy to extract an example from it, but I suspect your discussion above describes the issue. It's probably "best practice" to zero out C when it is allocated in any case, even though here its contents are not used. Doing so takes up minimal cpu resources.

The application is well-tested and returns well-tested single precision numbers. I think that without zeroing C I was getting either garbage or NaNs back - an obvious and glaring discrepancy from what I had been getting before.
Boxed Cylon
 

Re: SGEMM when beta=0

Postby luiceur » Mon Feb 25, 2013 6:06 am

I believe I am experiencing something similar with sgemm. The effect is extremely hard to reproduce, but I think I have finally spotted it. From time to time, part of the result matrix of sgemm with beta = 0 is NaN, and I've also noticed that this depends on the frontend machine. Should I zero the output matrix before calling sgemm to ensure correctness?
luiceur
 
Posts: 26
Joined: Tue Jul 10, 2012 4:38 am

Re: SGEMM when beta=0

Postby mgates3 » Wed Mar 13, 2013 3:48 pm

Yes, it's best practice to zero out the C matrix, even when beta=0.
Actually, setting C to any valid numbers should work, so long as it does not contain NaN, Inf, etc.
-mark
mgates3
 

