
Bugs in DLASD8 and DLASD6

Posted: Thu Sep 18, 2014 5:09 pm
by forswg
We got a fatal DLASD4 error message when running IpOpt: "On entry to DLASD4 parameter number -1 had an illegal value". We think the bug is in DLASD8, which calls DLASD4 and then handles the INFO=1 error return incorrectly (see line 282 in DLASD8). That error indicates that DLASD4 failed to converge, and it should have been reported to DLASD8's caller as an INFO=1 return. It should not have been passed to XERBLA, which terminates the executable. We also found similar problems in DLASD6.

The call graph in the Doxygen documentation shows that DLASDA calls DLASD6, which calls DLASD8. DLASDA shows what we think is the correct behavior (see lines 486 to 507 in DLASDA): it simply passes the INFO value from DLASD6 back to its caller (setting INFO = 1 before returning might be cleaner). This allows the higher-level routine that called DLASDA to recover from the low-level convergence failure. We think the fix needs to be made in both DLASD8 and DLASD6.

Re: Bugs in DLASD8 and DLASD6

Posted: Wed Feb 04, 2015 6:56 pm
by sherm1
(Julie, Justin Si [forswg] has left Stanford so I'm following up on this -- Sherm)

I see that this is listed in the LAPACK 3.5.0 errata as bug 121. It is marked "CORRECTED?" in rev 1505, on the assumption that it was the same as bug 115. I looked at the code changes in the rev 1505 check-in, and I don't think they fix bug 121. The check-in had this comment (abridged):
Commit fix from Osni Marquez for Bug 115 reported from Duncan Po (Mathworks) ... I suspect that the problem reported in an e-mail sent by Justin Weiguang Si to on Sep 22 is also related to those, although Justin has not yet replied to my request for the offending matrix...

The bug has been traced to [d/s]laed6, which computes the root closest to the origin of a secular equation and is used in the D&C tridiagonal eigensolver (DSTEDC) and D&C least squares solver (DGELSD). We have interacted with Ren-Cang Li (the original developer of [d/s]laed6) about possible fixes, and I am attaching a new version of [d/s]laed6.
The bug was related to a too stringent tolerance convergence criterion, line 390 in laed6. ... corrected bugs: 115 and 121

The problem we reported is not about the failure to converge but about how that failure is reported. Our users encounter it sporadically. It is caused by an intermediate matrix generated inside a third-party code we depend on, IpOpt, a nonlinear interior-point optimizer that uses LAPACK internally. IpOpt may generate matrix approximations that are genuinely unsolvable, but it is equipped to deal with a LAPACK failure by restarting and generating cleaner matrices. The problem is that it never gets the chance to do so, because the INFO=1 error first caught in DLASD8 is not returned properly. Due to what appears to be a fairly recent change in that code, it triggers a premature XERBLA call that aborts our running application (not to mention that the error message itself is misleading, since it reports a convergence failure as an illegal parameter). We found that the code still in DLASDA
  505             IF( info.NE.0 ) THEN
  506                RETURN
  507             END IF

appears to be the correct way to handle a lower-level failure, but the code in DLASD8 has been changed to:
  281          IF( info.NE.0 ) THEN
  282             CALL xerbla( 'DLASD4', -info )
  283             RETURN
  284          END IF

which does not let the error bubble up to where it can be dealt with (the same problem exists in DLASD6).

We propose that DLASD6 and DLASD8 be changed to pass INFO=1 failures up to their callers, as DLASDA does, rather than aborting execution through XERBLA. We tried making these changes ourselves, and IpOpt was then able to recover gracefully.
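The proposed control flow can be illustrated with a minimal, self-contained Fortran sketch. The module `info_chain` and the routines `root_finder`, `middle_step`, and `caller_with_retry` are hypothetical stand-ins for DLASD4, DLASD8/DLASD6, and a client such as IpOpt; they are not the LAPACK source. The only point is that a convergence failure is returned as INFO > 0 and passed up unchanged, so the top-level caller gets a chance to restart instead of the program being stopped.

```fortran
! Hypothetical sketch of LAPACK-style INFO propagation (not LAPACK code).
module info_chain
  implicit none
contains

  ! Stand-in for a root finder such as DLASD4: sets info = 1 when it
  ! "fails to converge" (simulated by converge = .false.).
  subroutine root_finder(converge, info)
    logical, intent(in)  :: converge
    integer, intent(out) :: info
    info = 0
    if (.not. converge) info = 1
  end subroutine root_finder

  ! Stand-in for an intermediate routine such as DLASD8 or DLASD6:
  ! on failure it returns at once with info unchanged, instead of
  ! calling XERBLA and stopping the program.
  subroutine middle_step(converge, info)
    logical, intent(in)  :: converge
    integer, intent(out) :: info
    call root_finder(converge, info)
    if (info /= 0) then
      return
    end if
    ! ... further work would go here on success ...
  end subroutine middle_step

  ! Stand-in for a client such as IpOpt: the first attempt fails, the
  ! client "restarts" with a cleaner problem (simulated by setting
  ! converge = .true.) and retries, giving up after 3 attempts.
  subroutine caller_with_retry(attempts, info)
    integer, intent(out) :: attempts, info
    logical :: converge
    converge = .false.
    attempts = 0
    do
      attempts = attempts + 1
      call middle_step(converge, info)
      if (info == 0 .or. attempts >= 3) exit
      converge = .true.
    end do
  end subroutine caller_with_retry

end module info_chain
```

The retry loop is only an assumed recovery policy standing in for IpOpt's restart; what matters on the LAPACK side is that the intermediate routine returns with INFO intact and leaves the decision to its caller.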

Please feel free to email me at if you have questions.

Thanks and regards,
Sherm (Michael Sherman)

Re: Bugs in DLASD8 and DLASD6

Posted: Thu Feb 05, 2015 1:05 am
by Julien Langou
Hi Sherm,

(*) Thanks for pointing this out. Indeed, calling XERBLA is a little "violent".

(*) Note that users can always override XERBLA. For example, if XERBLA is called with DLASD4 as its first input argument, an IpOpt-specific XERBLA could implement whatever behavior is intended; that would be one way to handle this.
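A rough sketch of that workaround follows, assuming LAPACK is linked so that an application-supplied XERBLA takes precedence over the library's own (whether it does depends on the build and the linker). It keeps the standard `XERBLA( SRNAME, INFO )` argument list but returns instead of stopping when SRNAME is 'DLASD4'. The `xerbla_state` module and its counter are hypothetical additions, used only to record the call. Because the DLASD8 code quoted earlier executes `RETURN` right after `CALL XERBLA` without changing INFO, a returning XERBLA would let INFO = 1 reach DLASD8's caller.

```fortran
! Hypothetical counter so the override's behavior can be observed.
module xerbla_state
  implicit none
  integer :: dlasd4_reports = 0
end module xerbla_state

! Sketch of an application-supplied XERBLA (external procedure with the
! same name and arguments as the reference version). Not LAPACK code.
subroutine xerbla(srname, info)
  use xerbla_state
  implicit none
  character(len=*), intent(in) :: srname
  integer, intent(in) :: info
  if (trim(srname) == 'DLASD4') then
    ! Record the report and return; the calling routine then decides
    ! what to do with its own INFO value.
    dlasd4_reports = dlasd4_reports + 1
    write (*, '(a, i0)') ' ** DLASD4 reported failure, code ', info
    return
  end if
  ! Any other routine: report and stop, as the reference XERBLA does.
  write (*, '(a, a, a, i0, a)') ' ** On entry to ', trim(srname), &
       ' parameter number ', info, ' had an illegal value'
  stop
end subroutine xerbla
```

One caveat of this design: XERBLA only receives a routine name and a number, so the override also silences genuine illegal-argument reports for DLASD4.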

(*) That said, as you write, I agree that calling XERBLA may be a little "violent"; simply returning with an error code (that is, with INFO.GT.ZERO) would be much better and more in the LAPACK "spirit".

(*) We'll check with a few people before committing the changes. We'll also make sure that INFO.GT.ZERO is correctly propagated from DLASD4 all the way up the chain to the driver. (We have cases where INFO.GT.ZERO set in a lower-level subroutine is erased by the routine just above it, and the overall driver returns INFO.EQ.ZERO with a completely wrong answer, so this needs to be checked.)

(*) Quick question: in your experience, was simply commenting out line 282 of DLASD8 and line 420 of DLASD6 all it took, or did you need other modifications?

(*) All this said, it is quite concerning to hear that IpOpt "sporadically" generates matrices that are "unsolvable" with the current LAPACK SVD algorithm. I understand that by unsolvable you mean that DLASD8 returns INFO.EQ.1 and currently calls XERBLA. It would be great for us to have a sample of these matrices, if someone can take the time to extract a few. We could look at what patterns these matrices have and try to understand why the LAPACK SVD algorithm does not converge; having a few of them at hand would be really useful for understanding and hopefully fixing the problem. Maybe this is "simply" about increasing MAXIT in DLASD4. It is currently set to 400, which already seems awfully large, but maybe we still need to increase it.

With best regards,

Re: Bugs in DLASD8 and DLASD6

Posted: Fri Feb 06, 2015 4:42 pm
by Julien Langou
Hi Sherm,

We applied your suggestion in SVN revision #1525. Thanks!

Please see my previous post; it would be great if you could answer some of the questions there. No worries if you cannot.


Re: Bugs in DLASD8 and DLASD6

Posted: Wed Feb 18, 2015 5:50 pm
by sherm1
Hi, Julien. Sorry for the delayed response -- somehow I wasn't getting notified of posts to this topic. I have now subscribed.

Thank you very much for the fix in rev 1525. The changes look perfect. I will build a new library and send it to some of our users who are having problems; we still don't have a way to reproduce the problem in house. If we can get it failing here, I will see whether we can grab the matrix out of IpOpt and send it to you. However, it seems possible that you already fixed the IpOpt SVD convergence problem in rev 1505 when fixing bug 115.