## A Bug in DGELSD?

Open discussion regarding features, bugs, issues, vendors, etc.

### A Bug in DGELSD?

I think that there is a bug in the LAPACK function DGELSD (to be more precise, in the function DLALSD which is called from DGELSD).

The problem I encountered is that I cannot pass the argument RCOND as zero in DGELSD. But in the documentation of DGELSD it is written that RCOND can be zero.

Code: Select all
*  RCOND   (input) DOUBLE PRECISION*          RCOND is used to determine the effective rank of A.*          Singular values S(i) <= RCOND*S(1) are treated as zero.*          If RCOND < 0, machine precision is used instead.

Here is the DGELSD prototype:
Code: Select all
      SUBROUTINE DGELSD( M, N, NRHS, A, LDA, B, LDB, S, RCOND, RANK,     \$                   WORK, LWORK, IWORK, INFO )

In the DGELSD code, if RCOND is zero it is immediately replaced by EPS and this behavior is not documented. This replacement is done in DLALSD.

Code: Select all
**     Set up the tolerance.*      IF( ( RCOND.LE.ZERO ) .OR. ( RCOND.GE.ONE ) ) THEN         RCOND = EPS      END IF

I think that in place of
Code: Select all
RCOND.LE.ZERO
above there should be
Code: Select all
RCOND.LT.ZERO

This small bug implies another problem. Since I cannot change LAPACK code (I use the MKL library from Intel in binary form) and I find some cases where RCOND = 0 is useful for me, I try to assign a small number to RCOND, say, RCOND = 1e-32. But even with this small value DGELSD still does not work as expected.

In my program I want to find the least-squares solution to this matrix
Code: Select all
        [ 1  0      1]A = [ 1  1e-20  0]    [ 0  0      1]

and to this right-hand side
Code: Select all
b = [0 1 0].

This is a simplified example of a much larger system I wanted to solve. Solving this problem by hand it is easy to find that the exact solution is
Code: Select all
x = [0 1e+20 0]

So I expected a similar solution from DGELSD with RCOND = 1e-32. I checked that the singular values of the matrix A are
Code: Select all
s = [1.73205  1.  5.77350e-21]

so this value of RCOND is small enough and it should not replace the smallest singular value 5.77350e-21 with zero. But the solution given by DGELSD is always the same
Code: Select all
x1 = [0.666667  0  -0.333333]

independently of what value of RCOND I choose. It also replaces the smallest singular value by zero. And to be sure that it is not a bug in MKL I linked my program with the original CLAPACK (ver. 3.0) library. The result is still the same.

I suspect that the problem is in the function DLALSD where in many places EPS (= DLAMCH( 'Epsilon' )) is used instead of TOL (but I can be wrong here). TOL is defined in DLALSD as
Code: Select all
TOL = RCOND*ABS( D( IDAMAX( N, D, 1 ) ) )

I am interested if this is a known problem.

Zbigniew
zibi14

Posts: 9
Joined: Tue Jan 30, 2007 6:57 pm
Location: College Station, Texas

Hello,

There are two questions in one in your email.

First, yes there is an error in the documentation of the LAPACK routine DGELSD. You

Code: Select all
*  RCOND   (input) DOUBLE PRECISION*          RCOND is used to determine the effective rank of A.*          Singular values S(i) <= RCOND*S(1) are treated as zero.*          If RCOND < =0 or RCOND >= 1, machine precision is used instead.

You suggest that we allow RCOND = 0 and that it would be useful for you but I do not
understand what you expect the behavior of the routine to be? For me LAPACK DGELSD
code is ok and the comment is not. Let me know if you agree.

Second, you complain that, for your matrix and RHS, the LAPACK routine DGELSD does
not do the job. Reasonning in exact arithmetic you are absoltutely correct. Now you
need to know that LAPACK routine DGELSD relies on a normwise stable SVD
computation that is to say the error term in sigma_i , the i-th singular value, is of the order
of sigma_1*epsilon where epsilon is the machine precision. When sigma_i is close from
sigma_1, you do not care much. Well when sigma_i is far from sigma_1, you begin to see
the difference between normwise accuracy and high relative accracy.

For example in your particular case, the SVD returned by DGELSD (check the array
s) is on my computer:
Code: Select all
    [ +1.732051e+00 ]s = [ +1.000000e+00 ]    [ +0.000000e+00 ]

s(3) shoud be 5.7e-21, and the computed value is fl(s(3))=0.0e+00.
If you think in term of relative accuracy, this is completely inaccurate, But notice that
this is normwise accurate since:
Code: Select all
|fl (s(3)) - s(3) | ~ 5.7e-21<= O(eps).s(1)

I think now you understand why DGELSD truncate s(3) while RCOND=1e-32 whereas
exact arithmetic tells you that s(3) should have been kept.

Remark list:
1. Now you should understand why the default value for RCOND in DGELSD is EPS.
DGELSD (same for DGELS) have no way to guarantee smaller RCOND than EPS.
2. In your particular case DGELSY (QR with column pivoting) give an ok solution. This is
not the recommended way to do it though. It is kind of odd that DGELSY give better
results than DGELSD.
3. Zlatko Drmac is working on SVD algorithm with high relative accuracy. This should
provide a reliable way to tackle this kind of problem in the future.
4. Using Remi's wrapper: http://icl.cs.utk.edu/~delmas/lapwrapmw.htm enables
you to have acess from Matlab to DGELS, DGELSD, DGESLY and DGELSS if you want
to play with all this.

Julien
Julien Langou

Posts: 835
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA

Julien,

Thank you for your detailed explanation.

Julien Langou wrote:
You suggest that we allow RCOND = 0 and that it would be useful for you but I do not understand what you expect the behavior of the routine to be? For me LAPACK DGELSD code is ok and the comment is not. Let me know if you agree.

This is what I would like to suggest. I do not understand why it is not possible to use RCOND = 0. What is there in the DGELSD code that does not permit to use this value of RCOND? And I think that the documentation is correct and the problem is in the code.

As to what I want to get from DGELSD when RCOND = 0. I would like to get this solution to my least squares problem
Code: Select all
x = [0 1e+20 0]

and not this one
Code: Select all
x1 = [0.666667  0  -0.333333]

The fact that I used RCOND = 1e-32 is because of the above limitation. I realy would not have used this value of RCOND if RCOND = 0 were permitted. I think that your explanation why RCOND = 1e-32 is replaced by EPS in DGELSD is reasonable and I do not complain about this DGELSD behavior although initially it was difficult for me to understand it.

Zbigniew
zibi14

Posts: 9
Joined: Tue Jan 30, 2007 6:57 pm
Location: College Station, Texas

Dear Zbigniew,

Julien's explanation is correct, in double precision arithmetic you cannot expect the SVD routines to find s(3) accurately. Like it or not, in double precision arithmetic the numerical rank of your matrix is 2 so that the least squares solution is not unique and DGELSD finds the minimum norm solution.

LAPACK is a numerical software package. If you want to treat your data as exact, then you need to use a symbolic package such as Maple or Mathematica. Is your actual data exact, or is it experimental data subject to errors?

Best wishes,

Sven Hammarling.
sven

Posts: 146
Joined: Wed Dec 22, 2004 4:28 am

Dear Sven,

Thank for your reply. Yes, you are right about SVD and DGELSD. The problem I posted seems to be easy and you can solve it by hand, but it is a surprise when you get a different answer. But after analyzing how DGELSD obtained the result it is evident that we cannot get anything else because the smallest singular value is too small with respect to the largest singular value and, therefore, is replaced by zero. By the way, my data are not experimental. They are test data which I used for testing DGELSD.

Unfortunately, although your reply answered the question about why DGELSD returns the result which it returns I still cannot understand why RCOND = 0 is incorrect or unavailable and why RCOND is replaced by EPS inside DGELSD. I do not think that such a replacement is necessary. Now I see that the example I posted is not a good one, and moreover, it shifted the meaning of my post into analyzing this particular problem which was not my goal. In the meantime I will try to find a better example.

Zbigniew
zibi14

Posts: 9
Joined: Tue Jan 30, 2007 6:57 pm
Location: College Station, Texas

Dear Zbigniew,

By looking at the definition of RCOND, RCOND=0 implies that you want to take
any nonzero singular value into account when solving the least-square problem.

In one hand it's dangerous. Keep in mind that we use the SVD: A= (Ur Sr Vr^T)
where r is determined with the numerical rank of A defined with RCOND and then
get the pseudo inverse via: A^{\dagger} = Vr*inv(Sr)*Ur^T So having small
entries in Sr is not a good idea. It polutes your solution with components
that are essentially in the null space of A. (So the need for a user-defined
threshold: RCOND.)

In the other hand, assume that you really know what you are doing or do not
really know actually, but for sure you want to avoid the ZERO singular values
and keep all the others whetver they are, then RCOND=0 makes sense.

Allowing RCOND to ZERO is not that worse that allowing RCOND to 2.3E-308 ...
and reveals a clear and neat intention from the user (use any nonzero singular
value in the solve). We might interpret RCOND=ZERO as any singular value that
does not overflow when inverted is accepted.

So you might indeed be in the right, when you say that the comments of the
LAPACK routine are correct but the code is not. (I was saying the opposite.)

I alert other lapackers and we come back to you. For sure something need to be
change.

(BTW, to play with RCOND without fighting with the numerical SVD problem,
simply take a diagonal matrix.)

Julien.
Julien Langou

Posts: 835
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA

Thanks for pointing out this problem, actually 2 problems

(1) I agree with Julien that the comments and code of DGELSD do not agree and should.
But before we change it we should look at similar logic elsewhere in LAPACK
and try to handle such tolerances as consistently as possible. RCOND in DGELSS
and ABSTOL in DSTEBZ come to mind.

(2) You are right that DGELSD lets the user pick a tiny RCOND and then uses
EPS = DLAMCH('E') internally to decide on convergence criteria etc that basically
override RCOND. So in some sense this is false advertising. But letting the user
change EPS, say, is not reasonable because the code would become quite unreliable,
because the algorithm cannot get better accuracy with any guarantee. This is different
from DSTEBZ, whose different algorithm permits a different error analysis,
so that in fact we can guarantee convergence for almost tiny nonzero ABSTOL
(this does not mean the eigenvalues are determined this accurately, just that the code
terminates with a "good" approximation.)

Depending on your matrix structure, it may or may not be possible to get tiny
singular values accurately. Your matrix had the property that scaling the columns to
make then have the same norm makes the matrix well-conditioned. In this case
perturbations to each column, that are small in norm compared to the column they
are perturbing, do only make small changes in the tiny singular values, and some
(but not all) algorithms can get them this accurately. DGELSD is not guaranteed.
Neither is the slower routine DGELSS, but it often works, especially if you sort the
columns so they are in order of decreasing norm. We have a guaranteed (but still
slower) algorithm in the works, based on Jacobi's method, but it is not ready yet.

(1) See if your matrix has the scaling property I mentioned, and try DGELSS.
(2) Run LAPACK in higher precision. We have a version in development, but
(3) Wait for Jacobi.

Jim Demmel
JimDemmel

Posts: 4
Joined: Tue Jan 09, 2007 2:17 pm