Those are really huge matrices you are talking about ... You are right, there is a problem in this case. You have hit one of the few LAPACK routines that require a huge workspace. For the divide-and-conquer SVD routine, the required workspace is larger than the input matrix itself (in the square case). Our LWORK mechanism was not designed for a workspace that large on matrices that large. With a huge workspace (divide and conquer) and huge matrices, you can potentially overflow the 32-bit INTEGER in which the workspace query (LWORK=-1) does its computations. This is a good point. Moreover, even assuming you allocate a workspace of the right size, I have no idea how you would pass that size to LAPACK xGESDD through LWORK, since LWORK will overflow as well. Oops, problem. If you can quickly recompile LAPACK with every INTEGER forced to 64 bits by default (instead of 32), that should fix the problem, I think. I do not know how to do this myself, but I believe some compilers have a flag for it.
So I am adding this to the bug list. Thanks for reporting this. See: http://www.netlib.org/lapack/Errata/
(send me an email for last name and affiliation if relevant)
For information, in your case (JOBZ='A', M=N=20020), using the default ILAENV, the required workspace size (to use the default block size) is 1202541340. This is still smaller than 2^31 = 2147483648, so I am not sure why you see an overflow at this level. Anyway, this is a good point.
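To make the comparison with 2^31 concrete, here is a minimal Python sketch that checks whether a workspace size fits in a 32-bit signed INTEGER and shows what it would wrap to if it does not. The helper names are hypothetical, and the expression 3*n*n + 7*n is only an assumption: it reproduces the figure quoted above for n = 20020, but it is not taken from the LAPACK source and may not hold for other sizes or other JOBZ values.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a 32-bit signed INTEGER can hold


def fits_int32(value):
    """Return True if value is representable as a 32-bit signed integer."""
    return -2**31 <= value <= INT32_MAX


def wrap_int32(value):
    """Return the value as it would appear after truncation to 32 bits."""
    return ctypes.c_int32(value).value


def workspace_estimate(n):
    # Hypothetical helper, NOT the LAPACK formula: an assumed expression that
    # happens to match the workspace size quoted for JOBZ='A', M=N=20020.
    return 3 * n * n + 7 * n


if __name__ == "__main__":
    for n in (20020, 30000):
        size = workspace_estimate(n)
        # Print n, the estimated size, whether it fits, and the wrapped value.
        print(n, size, fits_int32(size), wrap_int32(size))
```

Under that assumed expression, the size for N=20020 fits in 32 bits, while somewhat larger square matrices would not, which is the situation where LWORK itself can no longer carry the value.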
You wrote that LAPACK was taking hours for an SVD of a 1,000-by-1,000 matrix. Did you mean 10,000-by-10,000? Otherwise that is rather slow. Are you sure you are using an optimized BLAS library?