Using sgemm for rectangular (non-square) matrix multiply

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

Dear All,

I am fairly new to GPU computing with CUDA but have gathered a lot of information from the internet over the last month or so. I am an academic researcher who has mostly been programming computational routines in Python, owing to the simplicity of scripting languages (of course I know, runtime speed kind of sucks for more complex problems...). I have identified CUDA GPGPU (and thus potentially also MAGMA) as a great way of "outsourcing" certain hotspot code fragments to the GPU. In particular I am thinking of cases such as bootstrapping techniques in econometrics and Monte Carlo integration in finance, where in the former case I may have to solve a linear algebra problem many thousands of times, which I want to do in parallel rather than serially.

There is one problem I have been experiencing: many CUDA-optimized routines - such as SGEMM in CUBLAS - appear to work well only with square matrices. Is this the case with MAGMA SGEMM as well? All I want is a general, "fire-and-forget", but very fast GPU SGEMM for a GENERAL matrix multiplication problem, where the matrices are [p,q] x [q,r] with p != q != r. I think the best option in my case would actually be to use PyCUDA, because I employ Python as my main language, but to my surprise I still have not found working source code for a kernel that delivers a fool-proof solution to my problem. Could anybody give me a hint as to what approach I ought to take? I have played around a lot with Vasily Volkov's fairly new SGEMM kernel source code, but again, so far this has only worked for the special case of square matrices. I am 100% certain that the alternative "solution" of padding general problems with zeros to make them square cannot be the answer to what I am looking for :-)
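To pin down exactly what I mean, here is the plain computation I want to offload, written as my own naive CPU sketch in C for general p, q, r (nothing GPU-specific, just the mathematical operation):

```c
/* Naive reference multiply: C (p x r) = A (p x q) * B (q x r),
   all matrices stored row-major in contiguous arrays. */
void matmul_ref(int p, int q, int r,
                const float *A, const float *B, float *C)
{
    for (int i = 0; i < p; ++i) {
        for (int j = 0; j < r; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < q; ++k)
                sum += A[i*q + k] * B[k*r + j];
            C[i*r + j] = sum;
        }
    }
}
```

A fast GPU version of exactly this, for arbitrary p, q, r, is what I am after.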

Thanks,
Eric
emscheffel

Posts: 1
Joined: Mon Apr 09, 2012 2:48 am

Re: Using sgemm for rectangular (non-square) matrix multiply

The gemm in CUBLAS (and in MAGMA BLAS) works for rectangular matrices. As far as I know, all the CUBLAS routines work for all matrix sizes. If you are seeing problems, please report the specific case. A short sample code would be helpful.
-mark
mgates3

Posts: 773
Joined: Fri Jan 06, 2012 2:13 pm

Re: Using sgemm for rectangular (non-square) matrix multiply

Hello,
I tried a non-square matrix multiplication using cublasSgemm, but I am getting the wrong output. I am very new to using these library functions. Could anyone help me?
I used "cublasSgemm('n','n',N1,M2,M1,alpha,d_a,M1,d_b,M1,beta,d_c,N1)" with
int N1 = 2; // row of matrix1
int M1 = 3; // col of matrix1
int N2 = 3; // row of matrix2
int M2 = 2; // col of matrix2

Anitha

Posts: 3
Joined: Thu Oct 04, 2012 2:35 am


Re: Using sgemm for rectangular (non-square) matrix multiply

It will help if you use the BLAS m, n, k convention. For C := alpha A B + beta C
C is m x n
A is m x k
B is k x n

It appears your leading dimensions may be wrong. The leading dimension (lda, ldb, ldc) is the number of rows allocated for the matrix, which may be greater than the number of rows in the gemm. If the gemm is the whole matrix without padding, then lda=m, ldb=k, and ldc=m. You can of course add padding (extra unused rows on bottom), so lda >= m, ldb >= k, ldc >= m. We usually pad the number of rows to a multiple of 32 on the GPU. See magma/testing/testing_sgemm.cpp for sample code.
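To make the indexing concrete, here is a plain CPU sketch of what sgemm computes (this is an illustration of the semantics, not the CUBLAS implementation). In column-major storage, element (i,j) of a matrix with leading dimension ld sits at X[i + j*ld], so a padded lda simply skips over the unused rows at the bottom of each column:

```c
/* CPU sketch of column-major sgemm semantics: C := alpha*A*B + beta*C.
   A is m x k with leading dimension lda >= m, B is k x n with ldb >= k,
   C is m x n with ldc >= m. Element (i,j) of a column-major matrix X
   with leading dimension ld is X[i + j*ld]; rows m..ld-1 of each
   column are padding and are never read or written. */
void sgemm_ref(int m, int n, int k, float alpha,
               const float *A, int lda,
               const float *B, int ldb,
               float beta, float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p)
                sum += A[i + p*lda] * B[p + j*ldb];
            C[i + j*ldc] = alpha*sum + beta*C[i + j*ldc];
        }
    }
}
```

Passing the wrong lda makes the loop walk down each column by the wrong stride, which is why a bad leading dimension gives wrong answers rather than an error.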

cublasSgemm( 'n', 'n', m, n, k, alpha, dA, lda, dB, ldb, beta, dC, ldc );

You used N for rows and M for columns, which is the opposite of the normal matrix convention (m rows x n cols). In your notation,
lda = N1;
ldb = N2; // == M1, the inner, k, dimension
ldc = N1;
cublasSgemm( 'n', 'n', N1, M2, M1, alpha, dA, lda /*N1*/, dB, ldb /*N2*/, beta, dC, ldc /*N1*/ );
In particular, the lda you passed was M1, not N1.
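To summarize the mapping, here is a small helper (my own sketch, not part of CUBLAS) that converts your (rows, cols) variables into the BLAS m, n, k and the leading dimensions for an unpadded column-major gemm, and rejects mismatched inner dimensions:

```c
/* Dimensions for an unpadded column-major gemm C = A*B. */
typedef struct { int m, n, k, lda, ldb, ldc; } gemm_dims;

/* Map (rows1, cols1) x (rows2, cols2) onto BLAS conventions.
   Returns 0 on success, -1 if the inner dimensions do not match. */
int make_gemm_dims(int rows1, int cols1, int rows2, int cols2,
                   gemm_dims *d)
{
    if (cols1 != rows2)   /* cols of A must equal rows of B */
        return -1;
    d->m   = rows1;       /* rows of A and of C             */
    d->n   = cols2;       /* cols of B and of C             */
    d->k   = cols1;       /* cols of A == rows of B         */
    d->lda = d->m;        /* unpadded: lda = m              */
    d->ldb = d->k;        /*           ldb = k              */
    d->ldc = d->m;        /*           ldc = m              */
    return 0;
}
```

With your values (N1=2, M1=3, N2=3, M2=2) this gives m=2, n=2, k=3, lda=2, ldb=3, ldc=2, matching the corrected call above.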

Hope that helps. If you still have problems, please post a more complete sample code. I can't tell how you are allocating the matrices, putting data onto the GPU, getting results from the GPU, or what your expected and actual outputs are.

-mark
mgates3

Posts: 773
Joined: Fri Jan 06, 2012 2:13 pm