
performance and algorithm used in DGBTRF

PostPosted: Tue Mar 29, 2005 8:47 am
by weitzel
Hi,

I am struggling with the details of the LU factorization algorithm for band matrices implemented in dgbtrf.f. According to the comments, the partial pivoting (row interchange) technique is used.

On entry, the caller has to reserve space for KL (the lower bandwidth) additional diagonals, and it's clear why... On exit the routine returns an LU factorization having THE SAME lower bandwidth KL and a new, wider upper bandwidth KL+KU.

What trick is used here? Why doesn't the lower bandwidth change? A standard LU factorization spoils the lower bandwidth - for example:

Code: Select all
4x4 matrix A, lower bandwidth 1, upper bandwidth 1

   7   7   0   0
   7   7   3   0
   0   4   8   2
   0   0   6  10

step/row 1:
 - pivot is A(1,1)=7 (no row interchange)
 - divide the first column, starting at row 2, by the pivot
 - compute in A(i=2..4,j=2..4): A(i,j) = A(i,j) - A(i,1)*A(1,j)
 - in the example only A(2,2) is affected, resulting in:

   7   7   0   0
   1   0   3   0
   0   4   8   2
   0   0   6  10

step/row 2:
 - pivot is A(3,2)=4, interchange rows 2 and 3; LOWER BANDWIDTH INCREASES
   (upper bandwidth also increases)
 - divide the second column, starting at row 3, by the pivot
 - compute in A(i=3..4,j=3..4): A(i,j) = A(i,j) - A(i,2)*A(2,j)
 - only the interchange-step has an effect:

   7   7   0   0
   0   4   8   2
   1   0   3   0
   0   0   6  10

step/row 3:
 - pivot is A(4,3)=6, interchange rows 3 and 4; LOWER BANDWIDTH INCREASES again
 - elimination results in:
 
   7   7   0    0
   0   4   8    2
   0   0   6   10
   1   0   0.5 -5
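The steps above can be reproduced mechanically. Here is a minimal Python sketch (an illustration only, not the LAPACK code): plain Gaussian elimination with partial pivoting, storing the multipliers in place of the eliminated entries exactly as in the worked example, with 0-based indices.

```python
def lu_partial_pivot(a):
    """Gaussian elimination with partial pivoting; the multipliers
    overwrite the eliminated entries, as in the worked example above."""
    n = len(a)
    piv = []
    for k in range(n - 1):
        # pivot: first entry of largest magnitude on or below the diagonal
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        piv.append(p)
        a[k], a[p] = a[p], a[k]               # row interchange
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]                # multiplier L(i,k)
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    piv.append(n - 1)                         # last pivot index: always N
    return a, piv

A = [[7, 7, 0, 0],
     [7, 7, 3, 0],
     [0, 4, 8, 2],
     [0, 0, 6, 10]]
LU, piv = lu_partial_pivot(A)
# LU == [[7, 7, 0, 0], [0, 4, 8, 2], [0, 0, 6, 10], [1, 0, 0.5, -5]]
# piv == [0, 2, 3, 3]   (0-based; 1-based: [1, 3, 4, 4])
```

Running this on the 4x4 example reproduces the final compact matrix of step 3.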


What is the worst-case running-time complexity of the factorization on an n-by-n matrix? (sorry! the index computations for band storage and the calls to subroutines make the code a bit hard to read)
- O( n**2 * bandwidth**2 ) ???
- O( n * bandwidth**2 ) ???

thanks a lot!

PostPosted: Tue Mar 29, 2005 1:37 pm
by Julien Langou
Hello,

Why doesn't the lower bandwidth change? A standard LU factorization spoils the lower bandwidth


The LU factorization you give is completely correct in the framework of general matrices, and as you have observed, the banded structure of the matrix A can be completely lost in the L factor due to the row interchanges of partial pivoting.

The answer to your question can be found, for example, in
  1. Proposition 2.4, p.80 -- Demmel (1997), Applied Numerical Linear Algebra, SIAM.
  2. Theorem 4.3.3, p.154 -- Golub and van Loan (1996). Matrix Computations. Third Edition. The Johns Hopkins University Press.

For banded matrices the storage of the LU factors is slightly different from the general case. It takes into account the properties that:
  1. U is banded with upper bandwidth KL+KU
  2. L is essentially banded with lower bandwidth KL; this means that L has at most KL+1 nonzeros in each column
So the LU factors can be stored in an array with 2*KL+KU+1 rows.
The exact positions of the entries of L are recovered with IPIV.
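These two properties can be checked numerically. The following pure-Python experiment (an illustration, not the LAPACK code) eliminates a random banded matrix with partial pivoting and then verifies the band structure of the stored factors:

```python
import random

random.seed(1)
n, kl, ku = 12, 2, 3

# random matrix with lower bandwidth kl and upper bandwidth ku
a = [[random.uniform(1.0, 2.0) if -kl <= j - i <= ku else 0.0
      for j in range(n)] for i in range(n)]

for k in range(n - 1):                       # LU with partial pivoting
    p = max(range(k, n), key=lambda i: abs(a[i][k]))
    a[k], a[p] = a[p], a[k]
    for i in range(k + 1, n):
        if a[i][k] != 0.0:                   # skip entries outside the band
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]

# property 1: U has upper bandwidth at most kl+ku
assert all(a[i][j] == 0.0 for i in range(n) for j in range(n)
           if j - i > kl + ku)
# property 2: at most kl stored multipliers below the diagonal per column
assert all(sum(1 for i in range(j + 1, n) if a[i][j] != 0.0) <= kl
           for j in range(n))
```

The interchanges may move individual multipliers below the KL-th subdiagonal, but they never increase the *count* of nonzeros in a column, which is what makes the compact storage possible.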

For example in your case, LAPACK will return the LU factors and IPIV as:
Code: Select all
 LUB = [
  *    *   0      2         second upper diagonal of U (fill-in due to row interchanges)
  *    7   8     10         first upper diagonal of U
  7    4   6     -5         diagonal of U
  1    0   0.5    * ];      first lower diagonal of L (but you need to rearrange it with IPIV to get the "exact" L)

  IPIV =[  1 3 4 4];
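To make the storage scheme concrete, here is a pure-Python sketch (0-based indices; it assumes the convention that LUB[KL+KU+i-j][j] holds entry (i,j), which is the LAPACK band layout) that unpacks the LUB array above, rearranges the multipliers with IPIV, and checks that P*A = L*U:

```python
n, kl, ku = 4, 1, 1
NA = None  # the '*' entries: never referenced

LUB = [[NA,  NA,  0.0,  2.0],
       [NA,  7.0, 8.0, 10.0],
       [7.0, 4.0, 6.0, -5.0],
       [1.0, 0.0, 0.5,  NA]]
ipiv = [0, 2, 3, 3]            # IPIV = [1 3 4 4] converted to 0-based

# unpack band storage: LUB[kl+ku+i-j][j] -> entry (i, j)
full = [[0.0] * n for _ in range(n)]
for r in range(2 * kl + ku + 1):
    for j in range(n):
        i = r + j - (kl + ku)
        if 0 <= i < n and LUB[r][j] is not None:
            full[i][j] = LUB[r][j]

U = [[full[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]

# L: identity plus the stored multipliers (at most kl per column);
# interchanges from *later* steps move them to their final rows
L = [[float(i == j) for j in range(n)] for i in range(n)]
for k in range(n - 1):
    for i in range(k + 1, min(k + kl + 1, n)):
        L[i][k] = full[i][k]
for m in range(n - 1):
    if ipiv[m] != m:
        for k in range(m):
            L[m][k], L[ipiv[m]][k] = L[ipiv[m]][k], L[m][k]

A = [[7.0, 7.0, 0.0, 0.0],
     [7.0, 7.0, 3.0, 0.0],
     [0.0, 4.0, 8.0, 2.0],
     [0.0, 0.0, 6.0, 10.0]]
PA = [row[:] for row in A]
for m in range(n):                 # apply the interchanges in order
    if ipiv[m] != m:
        PA[m], PA[ipiv[m]] = PA[ipiv[m]], PA[m]

LU_prod = [[sum(L[i][k] * U[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
assert PA == LU_prod
```

Note that after the IPIV rearrangement L is no longer banded (here L(4,1)=1), which is exactly why the factors are stored in the compact, pre-rearrangement form.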

What is the worst-case running-time complexity of the factorization on an n-by-n matrix? (sorry! the index computations for band storage and the calls to subroutines make the code a bit hard to read)
- O( n**2 * bandwidth**2 ) ???
- O( n * bandwidth**2 ) ???


O( n * bandwidth**2 ); more precisely, L and U can be computed in about 2*n*KL*KU operations.
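The linear dependence on n can be checked by counting the updates actually performed. In the sketch below (an illustration only, not the LAPACK code; 0-based indices), partial pivoting lets the column loop extend to KL+KU, so each step does at most KL*(KL+KU) multiply-subtract pairs; this gives a slightly larger constant than the no-pivoting count, but the total stays O(n * bandwidth**2):

```python
import random

def banded_lu_opcount(a, kl, ku):
    """LU with partial pivoting on a dense array holding a banded matrix;
    loop bounds exploit the band, and the multiply-subtract pairs are counted."""
    n = len(a)
    ops = 0
    for k in range(n - 1):
        # column k is nonzero only in rows k..k+kl
        p = max(range(k, min(k + kl + 1, n)), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, min(k + kl + 1, n)):
            a[i][k] /= a[k][k]
            # with pivoting, row k can have fill out to column k+kl+ku
            for j in range(k + 1, min(k + kl + ku + 1, n)):
                a[i][j] -= a[i][k] * a[k][j]
                ops += 1
    return ops

random.seed(0)
n, kl, ku = 50, 2, 3
A = [[random.uniform(1.0, 2.0) if -kl <= j - i <= ku else 0.0
      for j in range(n)] for i in range(n)]
ops = banded_lu_opcount(A, kl, ku)
# linear in n: at most kl*(kl+ku) update pairs per step
assert ops <= n * kl * (kl + ku)
```

Doubling n doubles the count, while the bandwidths enter quadratically through the kl*(kl+ku) factor.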

Julien

PostPosted: Wed Mar 30, 2005 4:18 am
by weitzel
Hi Julien,

many thanks for your answer, although it frustrated me a bit: I haven't yet studied the literature you gave (I only have access to the 2nd edition of Golub and Van Loan; Demmel should be available), but it seems I have rediscovered a very similar algorithm. After LU decomposition my matrix looks the same (the obvious remedy for the bandwidth issue with L is to prevent the row interchanges).

My forward step is different and probably avoids the swapping performed by DSWAP, but it needs some extra initialization.

btw: did you notice that the last element of IPIV is never used and IPIV(N) is always N?

btw^2: a general question: why does LAPACK use BLAS subroutines for basic operations like row interchanges? For readability and code size? Avoiding a function call can save a few clock cycles...

thanks again,
Michael