## Why is Dgemm giving me the answer transposed?

Open discussion regarding features, bugs, issues, vendors, etc.

### Why is Dgemm giving me the answer transposed?

I thought the dgemm would take care of the fortran column major problem but calling dgemm to solve for C = A*B only gives the C correctly in a transposed format. I can't get it to produce C non-transposed. I implemented a naive transpose function so I'd have the correct answer for my project, but profiling shows that transpose is taking 2x more time than dgemm itself. Which leads me to another question. Is there an efficient transpose blas routine?

Aside from that, here is my test code:

Code: Select all
//~returns random matrix
double* matrix_random(double *A, int row, int col)
{
int i,j;
srand(time(NULL));
for(i=0;i<row;i++)
{
for(j=0;j<col;j++)
{
A[i*col + j] = rand() % 5;
}
}
return A;
}

dgemm_test(int rowsA, int colsB, int common)
{
int i,j,k;
char trans = 'T';
char norm = 'N';
double one = 1.0;
double zero = 0.0;
double neg = -1.0;
double *A, *B, *C, *D;
A = malloc(sizeof(double)*rowsA*common);
B = malloc(sizeof(double)*colsB*common);
C = calloc(rowsA*colsB,sizeof(double));
D = calloc(rowsA*colsB,sizeof(double));

A = matrix_random(A,rowsA,common);
B = matrix_random(B,common,colsB);

//~ Do a matrix multiply: A*B = C
for(i=0;i<rowsA;i++)
{
for(j=0;j<colsB;j++)
{
C[i*colsB+j]=0;
for(k=0;k<common;k++)
{
C[i*colsB+j]+=A[i*common+k]*B[k*colsB+j];
}
}
}
dgemm_(&trans, &trans, &rowsA, &colsB, &common, &one, A, &common, B, &colsB, &zero, C, &rowsA);

}

int main(void)
{
dgemm_test(4, 3, 2);
return 0;
}

Example output:

A, resulting 4 by 2 matrix
0, 0,
0, 4,
3, 1,
3, 2,

B, resulting 2 by 3 matrix
0, 0, 0,
4, 3, 1,

C, resulting 4 by 3 matrix
0, 0, 0,
16, 12, 4,
4, 3, 1,
8, 6, 2,

D, resulting 4 by 3 matrix
0, 16, 4,
8, 0, 12,
3, 6, 0,
4, 1, 2,

If I print C and D here, they are just the same except transposed. Using the 'n' function does not help. It only specifies whether or not the input (A and B) is transposed. Because the output of A*B is transposed, I have to perform a transpose function before I can add another matrix to it. Is there a way to get the C result to not be transposed?
ctcsback

Posts: 10
Joined: Tue Oct 26, 2010 3:26 am

### Re: Why is Dgemm giving me the answer transposed?

Hi,
first your random routine is not working correctly.
Looks like the elements of A are the same than B.

Then , you should use cblas, it has a RowMajor / Col Major argument.
You can find it here: http://www.netlib.org/blas/blast-forum/cblas.tgz

Julie

Posts: 612
Joined: Wed Dec 08, 2004 7:07 pm

### Re: Why is Dgemm giving me the answer transposed?

first your random routine is not working correctly.
Looks like the elements of A are the same than B.

Then , you should use cblas, it has a RowMajor / Col Major argument.
You can find it here: http://www.netlib.org/blas/blast-forum/cblas.tgz

Julie

Thanks for the help. I finally got cblas working(ish) and got cblas_dgemm function to spit out what I wanted. However, I just wanted to ask about/point out its behavior, because following the documentation I found for the functions gave me illegal value errors and seg faults.

I ended up modifying my dgemm test function (and fixing my random function, thanks for pointing it out, even though it wasn't crucial). Code is here:

Code: Select all
dgemm_test(int rowsA, int colsB, int common)
{
int i,j,k,w;
double *A, *B, *C, *D;
char trans = 'T';
char norm = 'N';
double one = 1.0;
double zero = 0.0;
double neg = -1.0;
enum CBLAS_ORDER order;
enum CBLAS_TRANSPOSE transa;
enum CBLAS_TRANSPOSE transb;

order = CblasRowMajor;
transa = CblasNoTrans;
transb = CblasNoTrans;

A = (double *)malloc(sizeof(double)*rowsA*common);
B = (double *)malloc(sizeof(double)*colsB*common);
C = (double *)calloc(rowsA*colsB,sizeof(double));
D = (double *)calloc(rowsA*colsB,sizeof(double));

A = matrix_random(A,rowsA,common);
B = matrix_random(B,common,colsB);

//~ Do a matrix multiply: A*B = C
for(i=0;i<rowsA;i++)
{
for(j=0;j<colsB;j++)
{
C[i*colsB+j]=0;
for(k=0;k<common;k++)
{
C[i*colsB+j]+=A[i*common+k]*B[k*colsB+j];
}
}
}
//Print A, B,C here

//~ This was my previous call
//~ dgemm_(&trans, &trans, &rowsA, &colsB, &common, &one, A, &common, B, &colsB, &zero, D, &rowsA);

cblas_dgemm(order,transa,transb,rowsA,colsB,common,1.0,A,common,B,colsB,0.0,D,colsB);
//Print D here
}

This is what ended up working for me, but it wasn't close to what links like http://www.prism.gatech.edu/~ndantam3/cblas-doc/doc/html/cblas_8h.html#cbc643095eb54edbffb0c5e9080a2450 showed - my lda, ldb, ldc values are all different. Also, I was still wondering if there are any optimized versions of matrix transpose out there. Just curious if you know of any.

Thanks again.
ctcsback

Posts: 10
Joined: Tue Oct 26, 2010 3:26 am

### Re: Why is Dgemm giving me the answer transposed?

DGEMM routine are implemented in the BLAS libraries.
You can take a look here: http://en.wikipedia.org/wiki/Basic_Line ... ubprograms
Most of them (MKL, ATLAS, ACML, Accelerate ...) are also including the CBLAS.
Julie