EDIT 1: Sorry, but I don't know why the [code] [/code] environment doesn't show the correct text...
EDIT 2: The version of MAGMA I have installed is 2.4.6.
I want to use the PLASMA_dgetrf() function to perform the LU decomposition of a matrix. The comments in dgetrf.c describe the A parameter of PLASMA_dgetrf() as follows:
* @param[in,out] A
* On entry, the M-by-N matrix to be factored.
* On exit, the tile factors L and U from the factorization.
So I assume that the output LU factors are stored in A in tile format.
I used this 6-by-5 matrix for some tests:
1.0 7.0 13.0 19.0 25.0
-2.0 8.0 14.0 20.0 26.0
3.0 -9.0 15.0 21.0 27.0
4.0 10.0 -16.0 22.0 28.0
5.0 11.0 17.0 -23.0 29.0
6.0 12.0 18.0 24.0 -30.0
The first test was a call to PLASMA_dgetrf() on this matrix, and the results for A and ipiv were:
6.00000 12.00000 18.00000 24.00000 -30.00000
0.50000 -15.00000 6.00000 9.00000 42.00000
0.66667 -0.13333 -27.20000 7.20000 53.60000
-0.33333 -0.80000 -0.91176 41.76471 98.47059
0.83333 -0.06667 -0.08824 -1.00000 160.00000
0.16667 -0.33333 -0.44118 0.50704 0.11074
6 3 4 4 5
I've checked the results using GNU Octave and they are right. But is the output matrix really stored in tile format? As can be seen, I could print it correctly as a normal matrix.
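The same check can be written down without Octave. The following is a minimal C sketch, not PLASMA code: it hard-codes the matrices printed above (rounded to 5 decimals), applies the row interchanges from ipiv to the original matrix, and compares the result against L*U. It assumes the usual LAPACK convention that the multipliers of L sit below the diagonal of the output with an implicit unit diagonal, and U sits on and above it. The names A0, LU, ipiv and lu_residual are made up for this illustration.

```c
#include <assert.h>

#define ROWS 6
#define COLS 5

/* Original test matrix, row by row as printed in the post. */
static const double A0[ROWS][COLS] = {
    { 1.0,  7.0,  13.0,  19.0,  25.0},
    {-2.0,  8.0,  14.0,  20.0,  26.0},
    { 3.0, -9.0,  15.0,  21.0,  27.0},
    { 4.0, 10.0, -16.0,  22.0,  28.0},
    { 5.0, 11.0,  17.0, -23.0,  29.0},
    { 6.0, 12.0,  18.0,  24.0, -30.0}
};

/* Printed output of the factorization (values rounded to 5 decimals). */
static const double LU[ROWS][COLS] = {
    { 6.00000,  12.00000,  18.00000,  24.00000, -30.00000},
    { 0.50000, -15.00000,   6.00000,   9.00000,  42.00000},
    { 0.66667,  -0.13333, -27.20000,   7.20000,  53.60000},
    {-0.33333,  -0.80000,  -0.91176,  41.76471,  98.47059},
    { 0.83333,  -0.06667,  -0.08824,  -1.00000, 160.00000},
    { 0.16667,  -0.33333,  -0.44118,   0.50704,   0.11074}
};

/* Printed pivot indices (1-based, as in LAPACK). */
static const int ipiv[COLS] = {6, 3, 4, 4, 5};

/* Return the largest absolute entry of P*A - L*U. */
double lu_residual(void)
{
    double PA[ROWS][COLS];
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            PA[i][j] = A0[i][j];

    /* Apply the interchanges in order: row k <-> row ipiv[k]-1. */
    for (int k = 0; k < COLS; k++) {
        int p = ipiv[k] - 1;
        assert(p >= k && p < ROWS);
        for (int j = 0; j < COLS; j++) {
            double t = PA[k][j];
            PA[k][j] = PA[p][j];
            PA[p][j] = t;
        }
    }

    double worst = 0.0;
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            /* (L*U)(i,j) = sum over k <= min(i,j) of L(i,k)*U(k,j). */
            int kmax = i < j ? i : j;
            double s = 0.0;
            for (int k = 0; k <= kmax; k++) {
                double l = (k == i) ? 1.0 : LU[i][k]; /* unit diagonal of L */
                s += l * LU[k][j];
            }
            double d = s - PA[i][j];
            if (d < 0.0) d = -d;
            if (d > worst) worst = d;
        }
    }
    return worst;
}
```

Because the printed factors are rounded, the residual is not exactly zero, but lu_residual() stays well below 1e-2, which is consistent with the output being an ordinary packed LU factorization read as a normal matrix.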
One might think that the matrix prints correctly because the internal tile size NB is greater than the matrix dimensions, so the entire matrix is contained in one tile. That is indeed the case here, because the internal tile size in my installation is
PLASMA_Get(PLASMA_TILE_SIZE,&nb); -> nb=128
So I have repeated the test changing the tile size via:
I have tested with NB values from 1 to 10 and the result is the same as in the original case. So the question is:
Is the A output argument (the LU decomposition) of PLASMA_dgetrf() stored by tiles or as a normal matrix? If the latter is correct (as the tests suggest), is the documentation in dgetrf.c (and dgetri.c, dgetrs.c, dgesv.c) wrong?
On the other hand, suppose I want to use the PLASMA_dgetrf_Tile() interface. In this case I should create the original matrix in tiled form before calling the function. In short, the steps are:
But in this case the results are not correct. They are correct only if I set NB to 1 or to a number equal to or greater than the maximum dimension of A (in this case NB>=6). So, how should I use the *_Tile() interface? Am I using the *_Desc_Create() function incorrectly? Or maybe the problem is in the *_Lapack_to_Tile() or *_Tile_to_Lapack() routines?
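To make clear what I mean by "tile format", here is a small C sketch of a simplified tile layout. It is only an assumption for illustration and not PLASMA's own storage scheme or API: tiles are visited one tile column after another, each tile is stored contiguously in column-major order, and edge tiles are packed with their actual size. The function names colmajor_to_tiles and tiles_match_colmajor are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Smaller of two ints. */
static int min_int(int a, int b) { return a < b ? a : b; }

/*
 * Copy a column-major M-by-N matrix A (leading dimension M) into T using a
 * simplified tile layout with tile size NB.
 * ASSUMPTION: illustrative layout only, not taken from the PLASMA sources.
 */
void colmajor_to_tiles(int M, int N, int NB, const double *A, double *T)
{
    assert(M > 0 && N > 0 && NB > 0);
    size_t pos = 0;
    for (int j0 = 0; j0 < N; j0 += NB) {        /* tile columns */
        int nb = min_int(NB, N - j0);
        for (int i0 = 0; i0 < M; i0 += NB) {    /* tile rows */
            int mb = min_int(NB, M - i0);
            /* One tile, column-major, packed with its actual size. */
            for (int j = 0; j < nb; j++)
                for (int i = 0; i < mb; i++)
                    T[pos++] = A[(size_t)(i0 + i) + (size_t)(j0 + j) * M];
        }
    }
}

/*
 * Return 1 if the tiled buffer is identical to the column-major buffer,
 * 0 if it differs, and -1 if memory allocation fails.
 */
int tiles_match_colmajor(int M, int N, int NB)
{
    size_t n = (size_t)M * (size_t)N;
    double *A = malloc(n * sizeof *A);
    double *T = malloc(n * sizeof *T);
    if (A == NULL || T == NULL) {
        free(A);
        free(T);
        return -1;
    }
    for (size_t k = 0; k < n; k++)
        A[k] = (double)(k + 1);                 /* distinct entries */
    colmajor_to_tiles(M, N, NB, A, T);
    int same = (memcmp(A, T, n * sizeof *A) == 0);
    free(A);
    free(T);
    return same;
}
```

For M = 6 and N = 5, tiles_match_colmajor() returns 1 for NB = 1 and NB = 6 and 0 for NB = 2 to 5. In this simplified layout, the tiled and the normal storage coincide for exactly the NB values for which my *_Tile() results look correct. I don't know whether PLASMA's real layout behaves the same way.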