Example:
lapacke_dpotrf_work.c and lapacke_dpotrs_work.c ...
In row-major order, the dpotrf layer:
i. transposes a on input into a_t
ii. calls dpotrf_ with a_t
iii. transposes output a_t back into a
The dpotrs layer:
i. transposes a on input into a_t and b into b_t
ii. calls dpotrs_ with a_t and b_t
iii. transposes b_t back into b
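By switching UPLO 'L' <--> 'U' in both layers, you can work purely with 'a'
and remove the need for a_t: a row-major symmetric matrix read in column-major
order is its own transpose, so its upper triangle occupies the lower triangle
and vice versa. b_t is still required, however, because a row-major b read in
column-major order is b transposed, and dpotrs_ has no option to solve with the
right-hand sides stored as rows.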
Similar tricks can be applied in many other cases, e.g.:
SVD: switch U <--> VT and m <--> n, together with their job and leading-dimension arguments;
orcsd and uncsd: these have a 'trans' argument, so no transposes are required at all.
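For the SVD, the reasoning behind switching U <--> VT and m <--> n can be written out as follows. This is a sketch that assumes a dgesvd-style routine returning A = U Σ V^T with U and V^T in separate output arrays; buf_row(X) and buf_col(X) are notation introduced here for the memory a matrix X occupies in row-major and column-major order, so that buf_col(X) = buf_row(X^T) for any X:

```latex
\begin{aligned}
&A \in \mathbb{R}^{m \times n}: \quad \mathrm{buf}_{\mathrm{row}}(A) = \mathrm{buf}_{\mathrm{col}}(A^{T}) \\
&\text{column-major SVD of the } n \times m \text{ matrix: } A^{T} = U' \Sigma' V'^{T} \\
&\Rightarrow\; A = V' \Sigma'^{T} U'^{T}, \quad \text{i.e. } U = V',\; V^{T} = U'^{T},\; \Sigma = \Sigma'^{T} \\
&\mathrm{buf}_{\mathrm{col}}(V'^{T}) = \mathrm{buf}_{\mathrm{row}}(V') = \mathrm{buf}_{\mathrm{row}}(U), \qquad
 \mathrm{buf}_{\mathrm{col}}(U') = \mathrm{buf}_{\mathrm{row}}(U'^{T}) = \mathrm{buf}_{\mathrm{row}}(V^{T})
\end{aligned}
```

So the column-major routine's vt output, read in row-major order, is the caller's U, its u output is the caller's V^T, and the singular values are unchanged. Passing the caller's u buffer as vt and vt as u, with m, n and the two job arguments swapped, therefore needs no transposes of a, u or vt.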
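In some cases the algorithm differs between UPLO='L' and 'U', so the factor
stored by one is not simply the transpose of the factor stored by the other.
The wrappers would then need to switch UPLO consistently across all parts of a
set (as with dpotrf and dpotrs above), so that each factorization is passed to
the solver with the UPLO it was computed with.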
It would be a nice project for someone to implement these changes and to
benchmark the performance improvements and memory savings they would bring.