remove unnecessary transpositions from lapacke_?_work layer

Posted: Tue Dec 10, 2013 7:14 am
by lawrence mulholland
lapacke_dpotrf_work.c and lapacke_dpotrs_work.c ...

In row major order, the dpotrf layer:
i. transposes a on input into a_t
ii. calls dpotrf_ with a_t
iii. transposes output a_t back into a
the dpotrs layer:
i. transposes a on input into a_t and b into b_t
ii. calls dpotrs_ with a_t and b_t
iii. transposes b_t back into b

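By switching UPLO 'L' <--> 'U' in both layers you can work purely with 'a'
and remove the need for a_t: the lower triangle of a row-major matrix occupies
the same memory as the upper triangle of its column-major reading.
b_t is still required, however.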

Similar tricks can be made in many other cases, e.g.:
SVD by switching U<-->VT, m<-->n;
orcsd and uncsd: these have a 'trans' argument, so no transposes are required at all.
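
For the SVD case, the reasoning behind the U<-->VT, m<-->n switch can be written out as below. This is a sketch of the argument only; the symbols B, U_B and V_B are introduced here for illustration and do not appear in LAPACKE.

```latex
% A is m x n and stored row-major; the same buffer read in column-major
% order is B = A^T, which is n x m.
\begin{aligned}
A &= U \Sigma V^{T}
  &&\text{(what the row-major caller asks for)} \\
B = A^{T} &= V \Sigma^{T} U^{T}
  &&\text{(a valid SVD of what the column-major routine sees)} \\
U_B &= V, \qquad V_B^{T} = U^{T}
  &&\text{(factors returned for } B\text{)}
\end{aligned}
% Column-major storage of U_B = V is byte-for-byte row-major storage of V^T,
% and column-major storage of V_B^T = U^T is row-major storage of U.
```

So a wrapper could presumably pass (n, m) in place of (m, n), hand the caller's row-major vt buffer to the routine as its U output and the caller's u buffer as its VT output, and swap the corresponding job options, without transposing a, u or vt.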

In some cases there is a change in algorithm between UPLO='L' and 'U' so
the wrappers would need to be consistent between parts of a set (as dpotrf,dpotrs above).

It would be a nice project for someone to work on these and produce benchmarks on
performance improvements and memory savings that these changes would make.

Re: remove unnecessary transpositions from lapacke_?_work lay

Posted: Tue Dec 10, 2013 4:10 pm
by Julien Langou
Hi Lawrence,

I agree with you. (This was actually mentioned in our discussion during the design of LAPACKE.) "One" could do this. It would (1) be "fun" and (2) be useful, saving (quite a lot of) memory and time. Yep: DGESVD, DPOTRF, DORMQR, etc. All these could be written with this in mind.

The idea is the same as for the CBLAS (C interface to the BLAS). Actually, the CBLAS supports Row Major Format and Column Major Format by (1) relying only on a Column Major Format implementation, and (2) not performing any memory allocation. This is done just by playing with the order of operands, the transpose arguments, and tricks like this. It's fun (and useful). Beautiful.
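
As an illustration of the operand-order trick, here is a minimal sketch for a plain matrix product. It is not the CBLAS source: gemm_colmajor is a hypothetical naive stand-in for a column-major GEMM without transpose options, and gemm_rowmajor is an assumed wrapper name. The wrapper uses C^T = B^T * A^T: the row-major buffers of A, B and C, read column-major, are exactly A^T, B^T and C^T.

```c
#include <assert.h>

/* Hypothetical stand-in for a column-major GEMM: C = A * B, where
   A is m x k, B is k x n and C is m x n, all stored column-major. */
void gemm_colmajor(int m, int n, int k,
                   const double *a, int lda,
                   const double *b, int ldb,
                   double *c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double s = 0.0;
            for (int p = 0; p < k; ++p)
                s += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = s;
        }
    }
}

/* Row-major C = A * B (A is m x k, B is k x n, C is m x n) with no copies:
   swap the two operands and swap m with n, then call the column-major
   routine on the untouched buffers. */
void gemm_rowmajor(int m, int n, int k,
                   const double *a, int lda,
                   const double *b, int ldb,
                   double *c, int ldc)
{
    assert(lda >= k && ldb >= n && ldc >= n);
    gemm_colmajor(n, m, k, b, ldb, a, lda, c, ldc);
}
```

No workspace is allocated and nothing is transposed in memory; only the argument order changes.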

Well anyway, yes, this is a good idea, and so I added it to our Wish List. This wish list item should already have been listed, but it never made it.
We might get to it at some point.

In the wish list as well, and on a related topic, we also wanted to try to have the in-place transposition algorithm of Fred Gustavson, Lars Karlsson and Bo Kågström. (See "Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion", ACM TOMS 2012.) The algorithm is in PLASMA already thanks to Mathieu Faverge, and it would be nice to have it in LAPACK and to have LAPACKE use it for transposition. When no trick is possible in the layer and transposition is necessary, this would avoid memory allocation.
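
To make "transposition without allocation" concrete, here is a minimal sketch of a textbook cycle-following in-place transpose. It is deliberately NOT the Gustavson/Karlsson/Kågström algorithm (which is parallel and cache-efficient); it only shows that a rectangular matrix can be transposed with O(1) extra storage. The function name transpose_inplace is an assumption for this example.

```c
#include <assert.h>

/* Transpose a row-major m x n matrix (contiguous, no padding) in place,
   leaving a row-major n x m matrix in the same buffer.
   The element at linear index i (0 < i < m*n - 1) moves to index
   (i * m) mod (m*n - 1); the first and last elements stay where they are.
   Each permutation cycle is rotated once, starting from its smallest index. */
void transpose_inplace(double *a, int m, int n)
{
    assert(m > 0 && n > 0);
    long long last = (long long)m * n - 1;
    for (long long start = 1; start < last; ++start) {
        /* Walk the cycle until we return to start or meet a smaller index.
           A smaller index means this cycle was already rotated. */
        long long i = start;
        do {
            i = (i * m) % last;
        } while (i > start);
        if (i < start)
            continue;

        /* Rotate the cycle, carrying one displaced value at a time. */
        double carried = a[start];
        i = start;
        do {
            long long dest = (i * m) % last;
            double tmp = a[dest];
            a[dest] = carried;
            carried = tmp;
            i = dest;
        } while (i != start);
    }
}
```

This simple version trades time for memory: the search for each cycle's smallest index revisits elements, which is the kind of overhead the cited algorithm is designed to avoid.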


Re: remove unnecessary transpositions from lapacke_?_work lay

Posted: Tue Dec 10, 2013 7:20 pm
by lawrence mulholland
Hi Julien,

Very glad to hear that this has made the wish list.

The reference for the in-place matrix conversion algorithm will certainly be of interest.
Looks like there are plenty of obvious applications, thanks for this.