There isn't, because for now the function is used only internally. The function declaration is:
extern "C"
void magmablas_stranspose(float *odata, int ldo,
                          float *idata, int ldi,
                          int m, int n);
It takes an m x n input matrix in idata with leading dimension ldi (>= m) and transposes it, writing the output to odata with leading dimension ldo (>= n). The implementation requires m and n to be divisible by 32. Coalesced memory accesses (and hence high performance) are achieved if ldo and ldi are divisible by 16 and the odata and idata addresses are divisible by 16*sizeof(float). The function does not give feedback on wrong input. For example, if m%32 != 0 and 32*(m/32 + 1) <= ldi (integer division, i.e. m rounded up to the next multiple of 32), the matrix will be transposed correctly, but so will the strip of padding rows from m to 32*(m/32 + 1), as a side effect; those rows do not need to be initialized.
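To make the indexing concrete, here is a minimal CPU sketch of what the description above implies, assuming the usual column-major (LAPACK-style) storage. It is not the MAGMA kernel. The names round_up_32, transpose_args_ok and ref_stranspose are hypothetical helpers made up for illustration; they are not part of MAGMA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round m up to the next multiple of 32
   (the padded row count a caller would allocate for). */
static int round_up_32(int m)
{
    return (m + 31) / 32 * 32;
}

/* Hypothetical check mirroring the stated requirements, since the
   GPU routine itself gives no feedback. Returns 1 if the arguments
   satisfy them, 0 otherwise. */
static int transpose_args_ok(int m, int n, int ldi, int ldo)
{
    return m > 0 && n > 0 &&
           m % 32 == 0 && n % 32 == 0 &&
           ldi >= m && ldo >= n;
}

/* CPU reference of the described semantics, assuming column-major
   storage: odata(j,i) = idata(i,j) for 0 <= i < m, 0 <= j < n.
   Unlike the GPU routine, this works for any m and n. */
static void ref_stranspose(float *odata, int ldo,
                           const float *idata, int ldi,
                           int m, int n)
{
    assert(ldi >= m && ldo >= n);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            odata[j + (size_t)i * ldo] = idata[i + (size_t)j * ldi];
}
```

A caller with, say, m = 40 could allocate round_up_32(40) = 64 rows (ldi >= 64) and pass m = 64, which is the padding situation described above. A reference like this is also handy for checking the GPU result on small cases.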
There is also another function that does an in-place transpose of square matrices.
extern "C" void
magmablas_sinplace_transpose(float *A, int lda, int n);
The requirements for lda and n are the same as in the stranspose function.
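For comparison, an in-place transpose of a square matrix amounts to swapping A(i,j) with A(j,i) above and below the diagonal. The following is a rough CPU sketch under the same column-major assumption. The name ref_sinplace_transpose is hypothetical, and this is not how the GPU routine is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference, assuming column-major storage: swap A(i,j) and
   A(j,i) for every pair with i > j. Entries in rows n..lda-1
   (the padding) are left untouched. */
static void ref_sinplace_transpose(float *A, int lda, int n)
{
    assert(lda >= n);
    for (int j = 0; j < n; ++j) {
        for (int i = j + 1; i < n; ++i) {
            float tmp = A[i + (size_t)j * lda];
            A[i + (size_t)j * lda] = A[j + (size_t)i * lda];
            A[j + (size_t)i * lda] = tmp;
        }
    }
}
```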