SGEMM when beta=0

 Posts: 34
 Joined: Sat Nov 21, 2009 6:03 pm
SGEMM when beta=0
I recently hit a curious hiccup using magma's sgemm. As you likely know, sgemm computes C = alpha*A*B + beta*C. When beta=0, the last term can in theory be dropped entirely.
After allocating C on the GPU, I had been initializing it with zeros. I was advised by the CULA people that I could skip this step when beta=0, since the beta*C term was then ignored. We strive for efficiency in our calculations, of course.
I adapted my code to magma and was having problems getting the right answers out of sgemm... It turned out that I needed to once again zero out C after allocation, even when beta=0. It seems that magma implements the beta*C calculation even when beta=0. (?)
I was surprised by this, but happy that the numbers I was getting back from magma agreed with matlab and CULA, at last. Perhaps a word of warning to all, and a suggestion to magma for a way to make a (small, to be sure) improvement in computational efficiency of sgemm.
If it matters: I was using the latest magma on a 555m/laptop and stock Suse linux 11.4. Magma is certainly a bear to get set up with BLAS, etc.!
Re: SGEMM when beta=0
When you didn't initialize the matrix, what result were you getting, and what result did you expect? A short sample code would be helpful.
Technically, the beta*C multiplication must be carried out to properly propagate NaN values that may be in the C matrix, since 0*NaN is NaN. I'll look into how some different libraries handle this case.
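To make the arithmetic concrete, here is a minimal sketch in plain C++ (no GPU needed). The helper names `scale_always` and `scale_skip_zero` are made up for illustration and are not taken from any BLAS library; they only contrast multiplying unconditionally with treating beta == 0 as "C is not read".

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: the beta*C term when the multiply is always performed.
inline float scale_always(float beta, float c) {
    return beta * c;
}

// Hypothetical helper: the alternative that never reads C when beta == 0.
inline float scale_skip_zero(float beta, float c) {
    return (beta == 0.0f) ? 0.0f : beta * c;
}

// Self-check of the IEEE 754 behaviour discussed above.
inline void check_zero_times_nonfinite() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    const float inf = std::numeric_limits<float>::infinity();
    assert(std::isnan(scale_always(0.0f, nan)));  // 0 * NaN is NaN
    assert(std::isnan(scale_always(0.0f, inf)));  // 0 * inf is NaN
    assert(scale_skip_zero(0.0f, nan) == 0.0f);   // NaN in C is ignored
    assert(scale_skip_zero(0.0f, inf) == 0.0f);   // inf in C is ignored
}
```

This assumes default floating-point compiler settings; options such as fast-math can change how NaN is handled.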
mark
Re: SGEMM when beta=0
Here are some results on a Tesla T20 (Fermi), MAGMA 1.1, CUDA 4.0, Intel MKL. (Source code attached.) The first column of C is set to NaN on input. When beta=0, implementations differ on whether they propagate NaN values or not: in these runs CUBLAS and LAPACK (MKL) overwrite the NaN column, while MAGMA BLAS propagates it. Therefore, with MAGMA BLAS you do not strictly need to zero out the C matrix if beta=0, but you do need to ensure that all of C holds valid numbers, not NaN, Inf, etc.
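The two behaviours can be reproduced on the CPU with a naive reference loop. This is only a sketch: `ref_sgemm` and the `BetaZero` switch are hypothetical names invented for this example, the storage is assumed column-major as in BLAS, and it says nothing about how MAGMA BLAS or CUBLAS are actually implemented.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical policy switch, not part of any BLAS interface.
enum class BetaZero { Multiply, Overwrite };

// Naive column-major C := alpha*A*B + beta*C.
// A is m x k, B is k x n, C is m x n.
void ref_sgemm(std::size_t m, std::size_t n, std::size_t k,
               float alpha, const std::vector<float>& A,
               const std::vector<float>& B,
               float beta, std::vector<float>& C, BetaZero policy) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p)
                acc += A[i + p * m] * B[p + j * k];
            float& c = C[i + j * m];
            if (beta == 0.0f && policy == BetaZero::Overwrite)
                c = alpha * acc;             // old C is never read
            else
                c = alpha * acc + beta * c;  // 0 * NaN stays NaN
        }
    }
}
```

With `BetaZero::Multiply`, a NaN already in C survives a beta=0 call; with `BetaZero::Overwrite`, it is replaced by the corresponding entry of alpha*A*B.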
Note Matlab follows MAGMA BLAS in propagating NaNs. (The results are slightly different from the cudagemm output below because the input entered into Matlab was rounded to 2 digits.)
>> A*B + 0*C
ans =
NaN 2.7848 2.4997 2.9946 1.9133
NaN 1.7041 1.5454 1.8214 1.3242
NaN 3.0259 3.6589 3.7124 2.5754
mark
Code:
remus> ./cudagemm 3 5 9
m 3, n 5, k 9
A
0.22 0.49 0.47 0.96 0.19 0.17 0.96 0.50 0.44
0.22 0.57 0.03 0.15 0.36 0.08 0.21 0.79 0.50
0.45 0.47 0.95 0.26 0.62 0.59 0.78 0.94 0.83
B
0.90 0.76 0.60 0.91 0.45
0.74 0.93 0.24 0.14 0.17
0.24 0.06 0.66 0.17 0.32
0.84 0.72 0.19 0.74 0.48
0.96 0.89 0.20 0.04 0.93
0.46 0.22 0.87 0.91 0.25
0.29 0.98 0.97 0.99 0.54
0.45 0.08 0.70 0.88 0.65
0.03 0.58 0.66 0.87 0.13
C
nan 0.34 0.00 0.87 0.54
nan 0.23 0.42 0.39 0.30
nan 0.45 0.65 0.35 0.49
==========
C := 1 A B + 1 C, with cublas
nan 3.12 2.51 3.86 2.45
nan 1.93 1.98 2.21 1.63
nan 3.47 4.33 4.06 3.07
C := 1 A B + 1 C, with magmablas
nan 3.12 2.51 3.86 2.45
nan 1.93 1.98 2.21 1.63
nan 3.47 4.33 4.06 3.07
C := 1 A B + 1 C, with lapack
nan 3.12 2.51 3.86 2.45
nan 1.93 1.98 2.21 1.63
nan 3.47 4.33 4.06 3.07
==========
C := 1 A B + 0 C, with cublas
2.26 2.78 2.51 2.99 1.91
1.57 1.70 1.56 1.83 1.33
2.75 3.02 3.67 3.70 2.58
C := 1 A B + 0 C, with magmablas
nan 2.78 2.51 2.99 1.91
nan 1.70 1.56 1.83 1.33
nan 3.02 3.67 3.70 2.58
C := 1 A B + 0 C, with lapack
2.26 2.78 2.51 2.99 1.91
1.57 1.70 1.56 1.83 1.33
2.75 3.02 3.67 3.70 2.58
==========
C := 1 A B + 1e-06 C, with cublas
nan 2.78 2.51 2.99 1.91
nan 1.70 1.56 1.83 1.33
nan 3.02 3.67 3.70 2.58
C := 1 A B + 1e-06 C, with magmablas
nan 2.78 2.51 2.99 1.91
nan 1.70 1.56 1.83 1.33
nan 3.02 3.67 3.70 2.58
C := 1 A B + 1e-06 C, with lapack
nan 2.78 2.51 2.99 1.91
nan 1.70 1.56 1.83 1.33
nan 3.02 3.67 3.70 2.58
 Attachments

 cudagemm.cpp
 (3.73 KiB) Downloaded 181 times

 Posts: 34
 Joined: Sat Nov 21, 2009 6:03 pm
Re: SGEMM when beta=0
Thanks for the reply. I think you are right that the issue was NaNs, etc. in the (uninitialized) C matrix. The code is an application I've developed to call Cuda/Magma from matlab; it's too complicated to post, and I haven't the energy to extract an example. I suspect your discussion above is the issue. It's probably "best practice" to zero out C when it is allocated in any case, even though here its contents are not used. Doing so takes up minimal cpu resources.
The application is well-tested and returns well-tested single precision numbers. I think without zeroing C I was getting either garbage or NaNs back, an obvious and glaring discrepancy from what I had been getting before.
Re: SGEMM when beta=0
I believe I am experiencing something similar with sgemm. It is extremely hard to reproduce the effect, but I think I finally spotted it. From time to time, and apparently depending on the front-end machine, part of the matrix returned by sgemm with beta = 0 is NaN. Should I zero the output matrix before calling sgemm to ensure correctness?
Re: SGEMM when beta=0
Yes, it's best practice to zero out the C matrix, even when beta=0.
Actually, setting C to any valid numbers should work, so long as it does not contain NaN, Inf, etc.
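As a rough host-side illustration of that advice, the sketch below checks a buffer for NaN/Inf and fills it with a known-valid value before a beta=0 call. Both `all_finite` and `prepare_output` are hypothetical helper names for this example, not MAGMA or CUBLAS functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: true if no element is NaN or +/-Inf.
inline bool all_finite(const std::vector<float>& c) {
    return std::all_of(c.begin(), c.end(),
                       [](float x) { return std::isfinite(x); });
}

// Hypothetical helper: give C a known-valid value before a beta == 0 call.
// Any finite fill value works; zero is simply the conventional choice.
inline void prepare_output(std::vector<float>& c, float fill = 0.0f) {
    assert(std::isfinite(fill));
    std::fill(c.begin(), c.end(), fill);
}
```

For a matrix that lives on the GPU, the equivalent would be a `cudaMemset(dC, 0, bytes)` right after allocation, since an all-zero bit pattern is 0.0f; `dC` and `bytes` here stand for your own device pointer and buffer size.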
mark