Modification of Diagonalization Algorithms.

..........

..........

..........

To overcome limitations imposed by the 32-bit architecture of the processor, the theory presented in [5] was developed. It required considerable revision of many functions of LAPACK package. These modifications enhanced the performance of diagonalization algorithms in [13] with respect to two parameters: calculation rate (by up to 80%) and demand for computer resources (by a factor of 6). A decrease in the computer resource used allowed calculation of more complex systems. New diagonalization algorithms are also efficient on 64-bit processors.

..........

..........

..........

However, in the LAPACK code, certain nontrivial inaccuracies concerning efficient use of block methods for matrix processing have not yet been eliminated. They considerably reduce the performance of the code obtained by assembling the programs that use the LAPACK package.

Let us consider as example the required modification of the dlarfb function from the LAPACK package (the latest version, 3.2.2). The performance of this function is mainly determined by two calls of the matrix multiplication function dgemm. The choice of this example is governed by the fact that the dlarfb function has numerous applications, not restricted to diagonalization of real symmetric matrices. Let us isolate from the code of the dlarfb function the block responsible for setting the initial symmetric real matrix in the lower triangle of a square matrix, and let us extract from this block the call to the dgemm function responsible for slowing down of the diagonalization program.

CALL DGEMM('Transpose', 'No transpose',

$ LASTC, K, LASTV-K,

$ ONE, C(K+1, 1), LDC, V(K+1, 1), LDV,

$ ONE, WORK, LDWORK)

Here two matrices are multiplied: In the first matrix of dimension LASTC* (LASTV-K), the number of rows is smaller than the number of columns, and in the second matrix of dimension (LASTV-K)* K, the number of columns K is very small (64128 at optimal use of dlarfb function). Because of the small number of columns in the second matrix, the processor cache is used extremely inefficiently. The problem can be solved by passing to multiplication of the transposed matrices (corprespondingly, the multipliers switch places). Then multiplication of the matrices will be set by the following code:

CALL DGEMM('Transpose', 'No transpose',

$ K, LASTC, LASTV-K,

$ ONE, V(K+1, 1), LDV, C(K+1, 1), LDC,

$ ONE, WORK, K)

Now the number of columns in the second matrix in the course of calculation of eigenvectors of the initial real symmetric matrix from the eigenvectors of the tridiagonal matrix will be constant and equal to LASTC >> K. In this case, efficient use of the processor cache will be ensured.

..........

..........

..........

Such modification of the code, with proper matrix multiplication strategy, increases the multiplication rate by no less than 35% for the highly optimized code (on old processor, the gain will be still more significant). For the initial dgemm code from the LAPACK package, the difference reaches 45%.

..........

..........

..........

The developers of the LAPACK package and the INTEL Company using the LAPACK package in its work were informed on the possibility of considerable acceleration of the diagonalization procedure.

..........

..........

..........

1. Semenov, S.G. and Sigolaev, Yu.F., Zh. Obshch. Khim., 2006, vol. 76, no. 6, p. 1006.

2. Semenov, S.G. and Sigolaev, Yu.F., Zh. Obshch. Khim., 2006, vol. 76, no. 11, p. 1809.

3. Semenov, S.G. and Sigolaev, Yu.F., Zh. Obshch. Khim., 2007, vol. 77, no. 5, p. 780.

..........

..........

..........

5. Wilkinson, J.H., The Algebraic Eigenvalue Problem, Oxford: Clarendon, 1965.

..........

..........

..........

Legendary intelligence officer Drozdov was nicknamed «Fabergé» owing to his unique capability to work with information, to get information, and to convert it into the most precious treasures.

-------------------------------------------------