## Least-squares methods for rank deficient matrices

Open discussion regarding features, bugs, issues, vendors, etc.

### Least-squares methods for rank deficient matrices

Hello,

My apologies if the question sounds too trivial or lacks of a rigorous methodology but I am not a mathematician.
My problem is the following.
I need to iteratively solve linear systems AX=B in a least-squares sense, where A and B are built from a stream of data coming from an external source.
So, since I have no control over the data, it happens that sometimes A is rank deficient and ill-conditioned.

From this post, I know that for such cases the best method to use is xGELSD.
However, I've noted that from that discussion the xGELSS method is missing
So my first question is: How does xGELSS compare with respect to xGELSD?

From the doc, they appear quite similar. So, to investigate further, I've created a C++ program that applies the four LAPACK methods (xGELS, xGELSD, xGELSS, xGELSY) to some test cases and evaluate their accuracy by computing the norm-2 of the residuals B-AX.
I've attached a ZIP file containing:
- lsq_rank_deficient.cpp: The C++ source code.
- lsq_rank_deficient.mk: a makefile for Linux systems
The makefile ca be used as follws:
Code: Select all
`\$ make -f lsq_rank_deficient.mk`

Please ignore all the test cases but the #4 (function test_case_4) which uses a "monster" A matrix (that I obtained in my real-world experiments) which has a dimension of 200x228. a rank of 144, and a very bad scaling with values ranging from 1.9e-40 to 1.
Under my system (Fedora Linux 22 64bit - GCC 5.1.1 - LAPACK 3.5.), for the test case #4 I've obtained the following results (on the left I show the method used to solve the system, while on the right there is the corresponding norm of residuals):

LAPACK GELS: 5.31935e+11
LAPACK GELSD: 0.0313709
LAPACK GELSS: 55143.1
LAPACK GELSY: 2.75552

It's seems the winner wrt accuracy is GELSD. But I'm not really convinced about its correctness due to the differences between the various SVD-based methods.
So, here the second question: Is it possible that the results differ so much?

Also, just for curiosity, I've tried to solve the same system with other programs. Here below the results:

Octave 3.8.2 (A\B): 0.0161223301978318
Octave 3.8.2 (pinv(A)*B): 0.0146152437722315
MATLAB 2012b (A\B): 0.014615243771877
MATLAB 2012b (pinv(A)*B): 0.014615243772290
Mathematica 10 (LeastSquares): 0.0146152

These results are quite similar to the ones obtained with GELSD, but I'm surprised that all of them are close to each other (but the one obtained with Octave's A\B method).
So, here my third and last question. According to your opinion, since my results differ from those above, am I doing something wrong in using LAPACK functions?

Thank you very much for your help.

Marco
Attachments lsq_rank_deficient.zip
C++ program (and Makefile) used for evaluating the different methods available in LAPACK to solve LSQ problems 