Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)
Posts: 1
Joined: Wed Mar 03, 2010 11:14 am


Post by agruber » Wed Mar 03, 2010 11:45 am

Is there more documentation than the header file on how to use magmablas_stranspose?

Stan Tomov
Posts: 264
Joined: Fri Aug 21, 2009 10:39 pm

Re: magmablas_stranspose

Post by Stan Tomov » Thu Mar 04, 2010 1:45 pm

There isn't, because for now the function is only used internally. The function declaration is

Code:

extern "C"
void magmablas_stranspose(float *odata, int ldo,
                          float *idata, int ldi,
                          int m, int n );

It takes an m x n input matrix in idata with leading dimension ldi (>= m) and transposes it, writing the output to odata with leading dimension ldo (>= n). The implementation requires m and n to be divisible by 32. Coalesced memory accesses (and hence high performance) are achieved if ldo and ldi are divisible by 16 and the odata and idata addresses are divisible by 16*sizeof(float). The function does not give feedback on wrong input. For example, if m%32 != 0 and (m/32 + 1)*32 <= ldi (integer division, i.e. m rounded up to the next multiple of 32), the matrix will be transposed correctly, but so will the strip of padding rows from m up to that multiple (as a side effect; those rows do not need to be initialized).
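
To make the indexing and the requirements concrete, here is a minimal host-side sketch in plain C. It is not MAGMA code and does not call the GPU routine. It assumes column-major (LAPACK-style) storage, which is what the leading-dimension description suggests. The names round_up_32, stranspose_args_ok, stranspose_coalescing_ok and ref_stranspose are hypothetical helpers invented for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round m up to the next multiple of 32. */
static int round_up_32(int m)
{
    return ((m + 31) / 32) * 32;
}

/* Hypothetical helper: the hard requirements stated in the post
 * (m and n divisible by 32, ldi >= m, ldo >= n). */
static bool stranspose_args_ok(int ldo, int ldi, int m, int n)
{
    return m > 0 && n > 0 &&
           m % 32 == 0 && n % 32 == 0 &&
           ldi >= m && ldo >= n;
}

/* Hypothetical helper: the extra conditions for coalesced access
 * (leading dimensions divisible by 16, addresses divisible by
 * 16*sizeof(float) bytes). */
static bool stranspose_coalescing_ok(const float *odata, int ldo,
                                     const float *idata, int ldi)
{
    const uintptr_t align = 16 * sizeof(float);
    return ldo % 16 == 0 && ldi % 16 == 0 &&
           (uintptr_t)odata % align == 0 &&
           (uintptr_t)idata % align == 0;
}

/* CPU reference of the described result, assuming column-major storage:
 * input element (i, j) lives at idata[i + j*ldi], and the n x m output
 * element (j, i) lives at odata[j + i*ldo]. */
static void ref_stranspose(float *odata, int ldo,
                           const float *idata, int ldi,
                           int m, int n)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            odata[(size_t)j + (size_t)i * ldo] =
                idata[(size_t)i + (size_t)j * ldi];
}
```

A reference like this can be used to check the GPU result on small cases. For m = 33, round_up_32(33) is 64, so under the reading above ldi would have to be at least 64 for the padded strip to stay inside the input buffer.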

There is also another function that does an in-place transpose of square matrices.

Code:

extern "C" void
magmablas_sinplace_transpose( float *A, int lda, int n );

The requirements for lda and n are the same as in the stranspose function.
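
As a rough illustration of what the in-place variant should produce, here is a small CPU reference in plain C. It is only a sketch, again assuming column-major storage, and ref_sinplace_transpose is a hypothetical name, not a MAGMA function. It swaps each element below the diagonal with its mirror above it and leaves any padding rows (indices n to lda-1 in each column) untouched.

```c
#include <stddef.h>

/* CPU reference for an in-place transpose of an n x n matrix stored
 * column-major with leading dimension lda (lda >= n). */
static void ref_sinplace_transpose(float *A, int lda, int n)
{
    for (int j = 0; j < n; ++j) {
        for (int i = j + 1; i < n; ++i) {
            size_t lower = (size_t)i + (size_t)j * lda; /* element (i, j) */
            size_t upper = (size_t)j + (size_t)i * lda; /* element (j, i) */
            float tmp = A[lower];
            A[lower] = A[upper];
            A[upper] = tmp;
        }
    }
}
```

Applying it twice returns the original matrix, which makes for a cheap sanity check when comparing against the GPU routine.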
