NaN bug in MAGMAblas dgemm

Open discussion for MAGMA

NaN bug in MAGMAblas dgemm

Postby eric_heinemeyer » Thu Feb 28, 2013 11:41 am

Hello,

I'd like to report the following bug (MAGMA 1.3.0 and all previous version) concerning the implementation
of the general dgemm for FERMI cards.

Consider the following case where matrices A and B equal
Code: Select all
 101. 104. 107. nan
 102. 105. 108. nan
 103. 106. 109. nan
 nan   nan  nan  nan

and C, alpha and beta are arbitrary (not containing any nan's).
Computing C = alpha*op(A)*op(B) + beta*C for the submatrix A (1:3,1:3) and B(1:3,1:3)
yields a result containing nan's.

Bug fix
The first problem is the following computation of the matrix size that is used for the texture mapping of the arrays A and B.

In the function magmablas_dgemm_fermi(...) replace
Code: Select all
    size_t sizeA = (size_t) lda * (size_t) (!TransA ? k : m);
    size_t sizeB = (size_t) ldb * (size_t) (!TransB ? n : k);

with
Code: Select all
    size_t sizeA = (size_t) (lda * ((!TransA ? k : m) - 1) + (!TransA ? m : k) );
    size_t sizeB = (size_t) (ldb * ((!TransB ? n : k) - 1) + (!TransB ? k : n) );


This fixes the bug for C = alpha*A*B^T + beta*C.

For C = alpha*A^T*B + beta*C the access to A and B must be protected
with a conditional load.
In fermiDgemm_v2_kernel_TN( ...) replace
Code: Select all
    int tll = tx2;

    #pragma unroll
    for(int y=0; y<4; y++)
        Abs[ty2*4+y][tx2] =  (tll<k)*  fetch_x_A(trackA + y*lda);

    #pragma unroll
    for(int y=0; y<4; y++)
        Bb[tx2][ty2*4+y] = /* (tll<k)* */ fetch_x_B( trackB + y*ldb );

    __syncthreads();

    double Axs[4];
    double Bxp[4];

    double Cb[16] = {0,0,0,0,    0,0,0,0, 0,0,0,0, 0,0,0,0};

    int k1;
    for(k1=0; k1<(k-16); k1+=16)
    {
        tll +=16;
        B += 16;
        A += 16  ;
        trackA+=16 ;
        trackB+=16;

        #pragma unroll
        for(int y=0; y<4; y++)
            xxA[y] =  (tll<k)*  fetch_x_A(trackA + y*lda);

        #pragma unroll
        for(int y=0; y<4; y++)
            xxB[y] = /* (tll<k)* */ fetch_x_B(trackB + y*ldb);
 

with
Code: Select all
 
     int tll = tx2;

    // +++ CHANGED +++
    #pragma unroll
    for(int y=0; y<4; y++)
        Abs[ty2*4+y][tx2] =  (tll<k) ? fetch_x_A(trackA + y*lda) : 0.0;

    // +++ CHANGED +++
    #pragma unroll
    for(int y=0; y<4; y++)
        Bb[tx2][ty2*4+y]  =  (tll<k) ? fetch_x_B(trackB + y*ldb) : 0.0;

    __syncthreads();

    double Axs[4];
    double Bxp[4];

    double Cb[16] = {0,0,0,0,    0,0,0,0, 0,0,0,0, 0,0,0,0};

    int k1;
    for(k1=0; k1<(k-16); k1+=16)
    {
        tll +=16;
        B += 16;
        A += 16  ;
        trackA+=16 ;
        trackB+=16;

        // +++ CHANGED +++
        #pragma unroll
        for(int y=0; y<4; y++)
            xxA[y] = (tll<k) ? fetch_x_A(trackA + y*lda) : 0.0;

        // +++ CHANGED +++
        #pragma unroll
        for(int y=0; y<4; y++)
            xxB[y] = (tll<k) ? fetch_x_B(trackB + y*ldb) : 0.0;


For C = alpha*A*B + beta*C we must protect the access to B.
In fermiDgemm_v2_kernel_NN( ...) replace
Code: Select all
    int tll = tx2;
    #pragma unroll
    for(int y=0; y<4; y++)
        Abs[ty2*4+y][tx2] =  (tll<k)*  fetch_x_A(trackA + y*lda);

    #pragma unroll
    for(int y=0; y<4; y++)
        Bb[tx2][ty2*4+y] = /* (tll<k)* */ fetch_x_B( trackB + y*ldb );

    __syncthreads();

    double Axs[4];
    double Bxp[4];

    double Cb[16] = {0,0,0,0,    0,0,0,0, 0,0,0,0, 0,0,0,0};

    int k1;
    for(k1=0; k1<(k-16); k1+=16)
    {
        tll +=16;
        B += 16;
        A += 16  ;
        trackA+=16 ;
        trackB+=16;

        #pragma unroll
        for(int y=0; y<4; y++)
            xxA[y] =  (tll<k)*  fetch_x_A(trackA + y*lda);

        #pragma unroll
        for(int y=0; y<4; y++)
            xxB[y] = /* (tll<k)* */ fetch_x_B(trackB + y*ldb);

with
Code: Select all
    int tll = tx2;
    #pragma unroll
    for(int y=0; y<4; y++)
        Abs[tx2+ y*16][ty2] = fetch_x_A(trackA + y*16) ;

    //+++ CHANGED  +++
    #pragma unroll
    for(int y=0; y<4; y++)
        Bb[tx2][ty2*4+y] = (tll<k) ? fetch_x_B( trackB + y * ldb) : 0.0;

    __syncthreads();

    double Axs[4];
    double Bxp[4];

    double Cb[16] = {0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0};

    int k1;
    for(k1=0; k1<(k-16); k1+=16)
    {
        tll+=16;
        A += lda *16  ;
        B += 16;
        trackA += 16*lda ;
        trackB += 16;

        #pragma unroll
        for( int y=0; y<4; y++)
            xxA[y] =  fetch_x_A(trackA + y*16);

        //+++ CHANGED  +++
        #pragma unroll
        for( int y=0; y<4; y++)
            xxB[y] = (tll<k) ? fetch_x_B( trackB + y*ldb) : 0.0;


For the case C = alpha*A^T*B^T + beta*C the access to A must be protected.
This can be done in analogy to the case C=A^T*B + beta*C.

It is very likely (I did not check) that the implementation of SGEMM,CGEMM and ZGEMM contain this bug
also. For software maintenance reasons it might be better do replace this check with the pointer redirection
approach to load the last row or column if the thread block gets out of bounds.
The nice feature of the texture mapping to return zero values outside the registered array only
works for the case of trailing nan's and only in the case of C = alpha*A*B^T+beta*C.

To explore the read only cache in TESLA K20 cards one has to make sure to only
access valid positions within the submatrix.

I do not think that a lot of people ever ran into this problem. We only found it by introducing checkword areas before and after
the data parts. The 32 bit integer checkwords that we used accidentaly yielded an nan value if interpreted as 64 bit
floating point value.

Cheers Eric
eric_heinemeyer
 
Posts: 1
Joined: Thu Feb 28, 2013 10:51 am

Re: NaN bug in MAGMAblas dgemm

Postby Stan Tomov » Thu Feb 28, 2013 4:08 pm

Hi Eric,
Thanks for the report. Indeed we had a wrong assumption that certain out of bound values will be multiplied by 0 and therefore would be irrelevant for the result - which resulted in this bug when out of bound values are nan, and then nans propagated in the result. The other precisions have the same issue. We will fix it for the next release.
Stan
Stan Tomov
 
Posts: 251
Joined: Fri Aug 21, 2009 10:39 pm

Re: NaN bug in MAGMAblas dgemm

Postby fletchjp » Tue Apr 02, 2013 1:39 pm

I wonder if this report could be the reason for the other bug which I have just reported for 1.3.0 with DSGESV - that a NaN is picked up from unset memory not actually used in the calculation. That would explain why the memory error happens only sometimes.

John
fletchjp
 
Posts: 175
Joined: Mon Dec 27, 2010 7:29 pm


Return to User discussion

Who is online

Users browsing this forum: Bing [Bot] and 3 guests