## About LU et al. routines [and QR]

Open forum for general discussions relating to PLASMA.

### About LU et al. routines [and QR]

EDIT 1: Sorry, but I don't know why the [code] [/code] environment doesn't show the correct text...
EDIT 2: The version of MAGMA I have installed is the 2.4.6

Hello:

I want to use the PLASMA_dgetrf() function in order to perform the LU decomposition of a matrix. One can read in dgetrf.c about the A matrix input parameter for PLASMA_dgetrf():

[code]
* @param[in,out] A
* On entry, the M-by-N matrix to be factored.
* On exit, the tile factors L and U from the factorization.
[/code]

Then I suppose that the output LU matrix is stored in A in tile format.

I used this matrix in order to do some tests:

[code]
1.0 7.0 13.0 19.0 25.0
-2.0 8.0 14.0 20.0 26.0
3.0 -9.0 15.0 21.0 27.0
4.0 10.0 -16.0 22.0 28.0
5.0 11.0 17.0 -23.0 29.0
6.0 12.0 18.0 24.0 -30.0
[/code]

The first test was the execution of

[code]
PLASMA_dgetrf(6,5,A,6,ipiv);
[/code]

and the results for A and ipiv were

[code]
6.00000 12.00000 18.00000 24.00000 -30.00000
0.50000 -15.00000 6.00000 9.00000 42.00000
0.66667 -0.13333 -27.20000 7.20000 53.60000
-0.33333 -0.80000 -0.91176 41.76471 98.47059
0.83333 -0.06667 -0.08824 -1.00000 160.00000
0.16667 -0.33333 -0.44118 0.50704 0.11074

6 3 4 4 5
[/code]

I've checked the results using GNU Octave and they are right. But, it is really the output matrix stored in tile format? As it can be seen, I could print it right as a normal matrix.
One can think that this is due to the internal tile size NB is greater than the matrix dimensions, so the entire matrix is containded in one tile. That is right because the internal tile size in my installation is

[code]
PLASMA_Get(PLASMA_TILE_SIZE,&nb); -> nb=128
[/code]

So I have repeated the test changing the tile size via:

[code]
PLASMA_Disable(PLASMA_AUTOTUNING);
PLASMA_Set(PLASMA_TILE_SIZE,3);
PLASMA_dgetrf(6,5,A,6,ipiv);
[/code]

I have tested with NB values from 1 to 10 and the result is the same as in the original case. So the question is:

Is the A output argument (LU decomposition) in PLASMA_dgetrf() stored by tiles or as a normal matrix? If the last option is the correct one (as shows the tests), is the documentation in dgetrf.c (and dgetri.c, dgetrs.c, dgesv.c) wrong?

On the other hand, suppose I want to use the PLASMA_dgetrf_Tile() interface. In this case I should create the original matrix in tiled form prior to call the function. In short, the steps are:

[code]
int NB=3,M=6,N=5;LDA=6;
PLASMA_desc* descA=NULL;
PLASMA_Disable(PLASMA_AUTOTUNING);
PLASMA_Set(PLASMA_TILE_SIZE,NB);
PLASMA_Desc_Create(&descA,A,PlasmaRealDouble,NB,NB,NB*NB,LDA,N,0,0,M,N);
PLASMA_Lapack_to_Tile(A,LDA,descA);
PLASMA_dgetrf_Tile(descA,IPIV);
PLASMA_Tile_to_Lapack(descA,A,LDA);
PLASMA_Desc_Destroy(&descA);
[/code]

But in this case the results are not correct. They are correct only if I set NB to 1 or to a number equals or greater than the maximun A dimension (in this case NB>=6). So, how I should use the *_Tile() interface? Am I using incorrectly the *_Desc_Create() function? Or maybe the problem is in *_Lapack_to_Tile() or *_Tile_to_Lapack() routines?

Cheers
jgpallero

Posts: 29
Joined: Sat Jul 28, 2012 12:12 pm

### Re: About LU et al. routines [and QR]

Hello:

In this one year old thread (http://icl.cs.utk.edu/plasma/forum/viewtopic.php?f=2&t=931#p1387) I've seen that the behavior of the PLASMA_dgetrf (and I suppose the same for the all LU related PLASMA functions) is the same as for the LAPACK routine and the origin of my misunderstand was the dgetrf.c documentation is out of date. So now this is all clear and I know that PLASMA_dgetrf and all others functions related (but not the *_Tile interface) with LU decomposition uses the same arguments as the LAPACK routines.

And what about the PLASMA_dgeqrf and all the related QR and LQ functions? Are the same the output argument A from PLASMA_dgeqrf and PLASMA_dgelqf as the correspondent from the LAPACK routines? The documentation in dgeqrf.c and dgelqf.c says that on exit A stores the data 'by tiles', but as for the Lu functions I suspect that this is a documentation bug and the data are stored in the same way as for the LAPACK functions. Am I right?

On the other hand, the QR and LQ decomposition PLASMA functions returns a descT argument in the same sense as LAPACK routines returns the TAU vector. Is this suposition correct? I suppose also that I must allocate the correct amount of memory to descT before to call PLASMA_dgeqrf or PLASMA_dgelqf. Is this amount of memory the equivalent for a vector of length min(M,N), as for the LAPACK functions (this detail is not documented)? And if I need to transform the descT argument to a classical vector after the computation, how can I do it? Should I use the PLASMA_Tile_to_Lapack function or exists any way to extract the vector data from the descT variable.

And finally, suppose I have decomposed the matrix A=QR using PLASMA_dgeqrf and I have transformed the output descT to a regular vector variable TAU. Then I want to use PLASMA_dorgqr in order to reconstruct the Q matrix. So I need to transform the TAU vector again to a descT variable and I suppose I should use the function PLASMA_Tile_to_Lapack to do that. Should I use the same NB parameter for PLASMA_Tile_to_Lapack as it was used in the original PLASMA_dgeqrf function?

I think also the description of the K parameter in dorgqr.c contains a bug. It says that M >= K >= 0 but it should be N >= K >= 0, as it can be seen in http://www.netlib.org/lapack/double/dorgqr.f (M >= K >= 0 is the correspondent for dorglq function)

Cheers
jgpallero

Posts: 29
Joined: Sat Jul 28, 2012 12:12 pm

### Re: About LU et al. routines [and QR]

Thanks for pointing out the inconsistencies in the documentation.
My apologies. It's easy to get excited about the software and forget to keep the documentation in sync.
That being said, the T argument should be allocated using PLASMA's workspace allocation functions.
It cannot be translated to the TAU of LAPACK. It's not simply a matter of layout.
Allocate the workspace, provide it to DGEQRF. Pass what DGEQRF returns to DORMQR or DGEMQR.
Hope it helps.
Jakub

Posts: 84
Joined: Wed May 13, 2009 1:27 pm

### Re: About LU et al. routines [and QR]

Hello:

Thank you for your answer. And could be it possible to transform the descT to TAU Lapack vector in any other way via some computations, not simply transforming the struct layout?

And what about the way to use PLASMA_dgetrf_Tile() interface asked in my first post? I don't understand why is the reason because I obtain wrong results. I think it could be a misunderstand of the documentation, but I can't find the error

Cheers
jgpallero

Posts: 29
Joined: Sat Jul 28, 2012 12:12 pm

### Re: About LU et al. routines [and QR]

jgpallero wrote:Hello:

Thank you for your answer. And could be it possible to transform the descT to TAU Lapack vector in any other way via some computations, not simply transforming the struct layout?

Cheers

No, it is not possible.

jgpallero wrote:Hello:

And what about the way to use PLASMA_dgetrf_Tile() interface asked in my first post? I don't understand why is the reason because I obtain wrong results. I think it could be a misunderstand of the documentation, but I can't find the error

Cheers

I am sorry, I don't have the time to go through your example.
The only difference between the standard interface and the tile interface is the layout.
The tile interface takes the matrix in tiles and returns the results in tiles.
Jakub