I am planning a fit that might require quadruple precision to find the least-squares solution, so I took some simple steps to build and test a quadruple-precision version of lapack/blas. I am posting those steps and the test results here for further comment.
The basic idea is to use the gcc-4.6.1 option -fdefault-real-8 so that the real, complex, double precision, and double complex variable types and constants are interpreted as 64-bit real, 128-bit complex, 128-bit real, and 256-bit complex, respectively. However, that option leaves definite types (those declared with an explicit size) just as they are, with no doubling of the precision. The only definite type that I could find in the code was COMPLEX*16. I used the following commands to convert that type to the indefinite DOUBLE COMPLEX type, which can be interpreted in "doubled" form by the -fdefault-real-8 option:
cp -a lapack-3.3.1 lapack-3.3.1_double_complex
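# Rewrite every COMPLEX*16 in the copied tree as DOUBLE COMPLEX
# (the g flag catches more than one occurrence on a line).
for FILE in $(find lapack-3.3.1_double_complex -name "*.f*"); \
do echo "$FILE"; NAME=$(basename "$FILE"); echo "$NAME"; \
sed 's?COMPLEX\*16?DOUBLE COMPLEX?g' <"$FILE" >| "/tmp/$NAME"; \
mv -f "/tmp/$NAME" "$FILE"; done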
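A quick way to double-check the conversion is to list any files that still contain COMPLEX*16. The following is only a sketch: the helper name remaining_complex16 is hypothetical, and it assumes GNU grep for the --include option.

```shell
# Hypothetical helper (not part of the original steps): print the names of
# Fortran files under the tree "$1" that still contain COMPLEX*16.
remaining_complex16() {
    # -r recurse, -l print only file names, -i also catch lower-case spellings
    grep -rli --include='*.f*' 'COMPLEX\*16' "$1"
}

# Usage (an empty listing means every occurrence was converted):
#   remaining_complex16 lapack-3.3.1_double_complex
```

The -i flag is there because the sed command above only matches the upper-case spelling, so any lower-case occurrence would otherwise go unnoticed.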
I then built and tested the code as follows:
export FC="gfortran-4.6"
# N.B. -fdefault-real-8 doubles all precision interpretation of non-definite types.
# -fPIC allows shared libraries to link to static library versions of lapack/blas
# and -ffixed-line-length-132 allows the longer "DOUBLE COMPLEX" strings above
# not to overflow the allowed line length.
export FFLAGS="-O3 -fdefault-real-8 -fPIC -ffixed-line-length-132"
mkdir build_quadruple_dir
cd build_quadruple_dir
cmake ../lapack-3.3.1_double_complex >& cmake.out
make VERBOSE=1 -j4 >& make.out
ctest --verbose --timeout 36000 >& ctest_quadruple_verbose.txt
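Independently of the full test suite, the effect of -fdefault-real-8 on the indefinite types can be checked with a tiny program. This is a sketch under stated assumptions: the file name check.f90, the helper write_kind_check, and the program itself are hypothetical and not part of lapack, and compiling it needs gfortran on the PATH.

```shell
# Hypothetical helper: write a small Fortran program to the file "$1" that
# reports the decimal precision of each indefinite type.
write_kind_check() {
    cat > "$1" <<'EOF'
program kind_check
  implicit none
  real :: r
  double precision :: d
  complex :: c
  double complex :: z
  ! precision() returns the decimal digits of the kind actually in use
  print *, 'real             digits:', precision(r)
  print *, 'double precision digits:', precision(d)
  print *, 'complex          digits:', precision(c)
  print *, 'double complex   digits:', precision(z)
end program kind_check
EOF
}

write_kind_check check.f90
# Compile once without and once with the flag, then compare the output:
#   gfortran check.f90 -o check_default && ./check_default
#   gfortran -fdefault-real-8 check.f90 -o check_promoted && ./check_promoted
```

If the promotion works as described, the digits reported for double precision and double complex in the second run should be roughly double those in the first run.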
I have attached ctest_quadruple_verbose.txt.gz (and also the equivalent ctest_double_verbose.txt.gz as a comparison for the case when the -fdefault-real-8 option is not used).
All tests passed for both the quadruple and double precision cases. However, a lapack/blas test can be reported as passed even though individual components of that test fail to pass the threshold. Here are the messages for those failing individual components.
From ctest_quadruple_verbose.txt:
15: DGB drivers: 6 out of 30969 tests failed to pass the threshold
16: ZGB drivers: 6 out of 30969 tests failed to pass the threshold
24: SST: 1 out of 4662 tests failed to pass the threshold
29: SXV drivers: 200 out of 5000 tests failed to pass the threshold
43: CST: 1 out of 4662 tests failed to pass the threshold
48: CXV drivers: 24 out of 5000 tests failed to pass the threshold
67: DXV drivers: 200 out of 5000 tests failed to pass the threshold
81: ZST: 1 out of 4662 tests failed to pass the threshold
86: ZXV drivers: 24 out of 5000 tests failed to pass the threshold
100% tests passed, 0 tests failed out of 98
From ctest_double_verbose.txt:
24: SST: 1 out of 4662 tests failed to pass the threshold
25: SBD: 1 out of 5510 tests failed to pass the threshold
29: SXV drivers: 37 out of 5000 tests failed to pass the threshold
43: CST drivers: 1 out of 11664 tests failed to pass the threshold
43: CST: 1 out of 4662 tests failed to pass the threshold
62: DST: 1 out of 4662 tests failed to pass the threshold
62: DST: 1 out of 4662 tests failed to pass the threshold
62: DST drivers: 1 out of 14256 tests failed to pass the threshold
67: DXV drivers: 200 out of 5000 tests failed to pass the threshold
81: ZST: 1 out of 4662 tests failed to pass the threshold
81: ZST: 1 out of 4662 tests failed to pass the threshold
81: ZST: 1 out of 4662 tests failed to pass the threshold
81: ZST: 1 out of 4662 tests failed to pass the threshold
86: ZXV drivers: 24 out of 5000 tests failed to pass the threshold
100% tests passed, 0 tests failed out of 98
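Lists like the two above can be pulled out of the verbose logs with grep. A minimal sketch, where the helper name threshold_failures is hypothetical and the patterns are simply the message texts that appear in the excerpts:

```shell
# Hypothetical helper: print the per-component threshold failures and the
# final ctest summary line from the verbose log file "$1".
threshold_failures() {
    grep -E 'failed to pass the threshold|tests passed' "$1"
}

# Usage:
#   threshold_failures ctest_quadruple_verbose.txt
#   threshold_failures ctest_double_verbose.txt
```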
Could somebody knowledgeable comment on those individual failing tests?
If it turns out those individual failing tests are reasonable/expected, then it appears that, thanks to the gfortran-4.6.1 option -fdefault-real-8, all Linux developers here will have access to a working quadruple-precision version of lapack/blas. However, there is one major caveat: all the 128-bit real and 256-bit complex tests took something like a factor of 100 (!) longer to complete on my ordinary Intel 64-bit box than the corresponding 64-bit real and 128-bit complex tests. So unless your computer has hardware support for 128-bit real and 256-bit complex arithmetic, quadruple-precision lapack/blas will be extremely slow for those types, and therefore only useful as a last resort when ill-conditioning is killing you with the default-precision lapack/blas.