LAPACK Archives

[Lapack] LAPACK 3.5 -- xSYSV_ROOK

Hi Bobby,

When I was working on this I wrote level three versions of the solve. Well, it 
forms L, does the solve, and then restores A. Even this crude approach was many 
times faster than the level 2 I recall. Although I never understood why the 
factorization routine couldn't just return L is the first place.

Anyway the code below in case of interest.

Craig




     SUBROUTINE L3_ROOKDSYTRS( UPLO, N, NRHS, A, LDA, IPIV, B, LDB,
     $                          WORK, INFO )
*
*     Craig Lucas
*     The Numerical Algorithms Group.
*     October 2008
*
*  -- LAPACK routine (version 3.1) --
*     Univ. of Tennessee, Univ. of California Berkeley and NAG Ltd..
*     November 2006
*
*     .. Scalar Arguments ..
      INTEGER            INFO, LDA, LDB, N, NRHS
      CHARACTER          UPLO
*     ..
*     .. Array Arguments ..
      DOUBLE PRECISION   A( LDA, * ), B( LDB, * ), WORK( * )
      INTEGER            IPIV( * )
*     ..
*
*  Purpose
*  =======
*
*  rookDSYTRS solves a system of linear equations A*X = B with a real
*  symmetric matrix A using the factorization A = U*D*U**T or
*  A = L*D*L**T computed by rookDSYTRF or fastbpDSYTRF. This is a
*  Level 3 BLAS version.
*
*  Arguments
*  =========
*
*  UPLO    (input) CHARACTER*1
*          Specifies whether the details of the factorization are stored
*          as an upper or lower triangular matrix.
*          = 'U':  Upper triangular, form is A = U*D*U**T;
*          = 'L':  Lower triangular, form is A = L*D*L**T.
*
*  ** ONLY UPLO = 'L' IS IMPLEMENTED **
*
*  N       (input) INTEGER
*          The order of the matrix A.  N >= 0.
*
*  NRHS    (input) INTEGER
*          The number of right hand sides, i.e., the number of columns
*          of the matrix B.  NRHS >= 0.
*
*  A       (input) DOUBLE PRECISION array, dimension (LDA,N)
*          The block diagonal matrix D and the multipliers used to
*          obtain the factor U or L as computed by rookSSYTRF or
*          fastbpSSYTRF.
*
*  LDA     (input) INTEGER
*          The leading dimension of the array A.  LDA >= max(1,N).
*
*  IPIV    (input) INTEGER array, dimension (N)
*          Details of the interchanges and the block structure of D
*          as determined by rookDSYTRF or fastbpDSYTRF.
*
*  B       (input/output) DOUBLE PRECISION array, dimension (LDB,NRHS)
*          On entry, the right hand side matrix B.
*          On exit, the solution matrix X.
*
*  LDB     (input) INTEGER
*          The leading dimension of the array B.  LDB >= max(1,N).
*
*  WORK    (input) DOUBLE PRECISION array, dimension (N)
*          Workspace.
*
*  INFO    (output) INTEGER
*          = 0:  successful exit
*          < 0:  if INFO = -i, the i-th argument had an illegal value
*
*  =====================================================================
*
*     .. Parameters ..
      DOUBLE PRECISION   ONE
      PARAMETER          ( ONE = 1.0D+0 )
*     ..
*     .. Local Scalars ..
      DOUBLE PRECISION   AK, AKM1, AKM1K, BK, BKM1, DENOM
      INTEGER            J, K, KP
      LOGICAL            UPPER
*     ..
*     .. External Functions ..
      LOGICAL            LSAME
      EXTERNAL           LSAME
*     ..
*     .. External Subroutines ..
      EXTERNAL           DCOPY, DSCAL, DSWAP, DTRSM, XERBLA
*     ..
*     .. Intrinsic Functions ..
      INTRINSIC          MAX
*     ..
*
*     .. Executable Statements ..
      INFO = 0
      UPPER = LSAME( UPLO, 'U' )
      IF( .NOT.UPPER .AND. .NOT.LSAME( UPLO, 'L' ) ) THEN
         INFO = -1
      ELSE IF( N.LT.0 ) THEN
         INFO = -2
      ELSE IF( NRHS.LT.0 ) THEN
         INFO = -3
      ELSE IF( LDA.LT.MAX( 1, N ) ) THEN
         INFO = -5
      ELSE IF( LDB.LT.MAX( 1, N ) ) THEN
         INFO = -8
      END IF
      IF( INFO.NE.0 ) THEN
*
         CALL XERBLA( 'l3_rookDSYTRS', -INFO )
*
         RETURN
      END IF
*
*     Quick return if possible
*
      IF( N.EQ.0 .OR. NRHS.EQ.0 )
     $   RETURN
*
      IF( UPPER ) THEN
*
         RETURN
*
      ELSE
*
*        Solve A*X = B, where A = L*D*L'.
*
*        First copy out lower diagonal to workspace, and form L in
*        lower triangle of A. Zero D( 2, 1). Permute B.
*
*        K is the main loop index, increasing from 1 to N in steps of
*        1 or 2, depending on the size of the diagonal blocks.
*
         CALL DCOPY( N-1, A( 2, 1 ), LDA+1, WORK( 1 ), 1 )
         K = 1
   10    CONTINUE
*
*        If K > N, exit from loop.
*
         IF( K.GT.N )
     $      GO TO 20
*
         IF( IPIV(K).GT.0 ) THEN
*
*           1 x 1 diagonal block
*
            KP = IPIV( K )
            IF( KP.NE.K ) THEN
               CALL DSWAP( K-1, A( K, 1 ), LDA, A( KP, 1 ), LDA )
               CALL DSWAP( NRHS, B( K, 1 ), LDB, B( KP, 1 ), LDB )
            END IF
            K = K + 1
         ELSE
*
*           2 x 2 diagonal block
*
            KP = -IPIV( K )
            IF( KP.NE.K ) THEN
                CALL DSWAP( K-1, A( K, 1 ), LDA, A( KP, 1 ), LDA )
                CALL DSWAP( NRHS, B( K, 1 ), LDB, B( KP, 1 ), LDB )
            END IF
            KP = -IPIV( K+1 )
            IF( KP.NE.K+1 ) THEN
               CALL DSWAP( K-1, A( K+1, 1 ), LDA, A( KP, 1 ), LDA )
               CALL DSWAP( NRHS, B( K+1, 1 ), LDB, B( KP, 1 ), LDB )
            END IF
*
            A( K+1, K ) = 0
            K = K + 2
         END IF
*
         GO TO 10
   20    CONTINUE
*
*        Next solve L*D*X = B, overwriting B with X.
*
         CALL DTRSM( 'Left', 'Lower', 'No transpose', 'Unit', N, NRHS,
     $               ONE, A, LDA, B, LDB )
*
*        Multiply by the inverse of the diagonal blocks.
*
         K = 1
   30    CONTINUE
*
*        If K > N, exit from loop.
*
         IF( K.GT.N )
     $      GO TO 50
         IF( IPIV( K ).GT.0 ) THEN
*
*           1 x 1 diagonal block
*           Interchange rows K and IPIV(K).
*
            CALL DSCAL( NRHS, ONE / A( K, K ), B( K, 1 ), LDB )
            K = K + 1
*
         ELSE
*
*           2 x 2 diagonal block
*           Interchange rows K and -IPIV(K) THEN K-1 and -IPIV(K-1)
*
            AKM1K = WORK( K )
            AKM1 = A( K, K ) / AKM1K
            AK = A( K+1, K+1 ) / AKM1K
            DENOM = AKM1*AK - ONE
            DO 40 J = 1, NRHS
               BKM1 = B( K, J ) / AKM1K
               BK = B( K+1, J ) / AKM1K
               B( K, J ) = ( AK*BKM1-BK ) / DENOM
               B( K+1, J ) = ( AKM1*BK-BKM1 ) / DENOM
   40       CONTINUE
            K = K + 2
         END IF
*
         GO TO 30
   50    CONTINUE
*
*        Next solve L'*X = B, overwriting B with X.
*
         CALL DTRSM( 'Left', 'L', 'Transpose', 'Unit', N, NRHS, ONE,
     $               A, LDA, B, LDB )
*
*        Restore A to input state. Permute B.
*
*        K is the main loop index, decreasing from N to 1 in steps of
*        1 or 2, depending on the size of the diagonal blocks.
*
         K = N
   60    CONTINUE
*
*        If K < 1, exit from loop.
*
         IF( K.LT.1 )
     $      GO TO 70
*
         IF( IPIV( K ).GT.0 ) THEN
*
*           1 x 1 diagonal block
*           Interchange rows K and IPIV(K).
*
            KP = IPIV( K )
            IF( KP.NE.K ) THEN
               CALL DSWAP( K-1, A( K, 1 ), LDA, A( KP, 1 ), LDA )
               CALL DSWAP( NRHS, B( K, 1 ), LDB, B( KP, 1 ), LDB )
            END IF
            K = K - 1
         ELSE
*
*           2 x 2 diagonal block
*           Interchange rows K and -IPIV(K) THEN K-1 and -IPIV(K-1)
*
            KP = -IPIV( K )
            IF( KP.NE.K ) THEN
               CALL DSWAP( K-2, A( K, 1 ), LDA, A( KP, 1 ), LDA )
               CALL DSWAP( NRHS, B( K, 1 ), LDB, B( KP, 1 ), LDB )
            END IF
            KP = -IPIV( K-1 )
            IF( KP.NE.K-1 ) THEN
               CALL DSWAP( K-2, A( K-1, 1 ), LDA, A( KP, 1 ), LDA )
               CALL DSWAP( NRHS, B( K-1, 1 ), LDB, B( KP, 1 ), LDB )
            END IF
            K = K - 2
         END IF
*
         GO TO 60
   70    CONTINUE
         CALL DCOPY( N-1, WORK( 1 ), 1, A( 2, 1 ), LDA+1  )
      END IF
*
      RETURN
*
*     End of rookDSYTRS
*
      END



From: lapack-bounces@Domain.Removed [mailto:lapack-bounces@Domain.Removed] On 
Behalf Of James Demmel
Sent: 13 November 2014 15:01
To: Bobby Cheng; lapack@Domain.Removed
Cc: beboppers@Domain.Removed; Pat Quillen; Duncan Po; Penny Anderson
Subject: Re: [Lapack] LAPACK 3.5 -- xSYSV_ROOK

The fastest alternative that we have explored is based on Aasen's method,
which factors A into P*L*T*L^T*P^T, where P is a permutation, L is lower
triangular, and T is tridiagonal. There is a paper (which won a prize at
IPDPS'13) on this, see
   http://www.eecs.berkeley.edu/~odedsc/papers/block_aasen.pdf
These more recent versions modify Aasen's original factorization by going in
steps from A to narrower band and then solving the narrower band problem.
So the output factorization differs from Bunch-Kaufman or rook, but this
seems necessary to attain high performance. How would such a change
affect the Matlab interface and its users?

Jim

On 11/12/14, 11:38 AM, Bobby Cheng wrote:
Hi LAPACK team,

It is great to see that LDL^T with rook pivoting made it into LAPACK 3.5.

I observed that xSYTRI_ROOK and xSYTRS_ROOK use only level 2 BLAS similar to 
xSYTRI and xSYTRS.  However, I have not found the level three BLAS counterpart 
of xSYTRI2 and xSYTRS2. This is an issue for MATLAB, as customers, not 
surprisingly, have found the performance of xSYTRI  not acceptable for large 
matrices. This will be the same for xSYTRI_ROOK. I am wondering what your plan 
is for applying what is done in xSYTRI2 to rook pivoting.

Regards,
---Bobby Cheng






_______________________________________________

Lapack mailing list

Lapack@Domain.Removed<mailto:Lapack@Domain.Removed>

http://lists.eecs.utk.edu/mailman/listinfo/lapack


________________________________________________________________________
This e-mail has been scanned for all viruses by Star.
________________________________________________________________________
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://lists.eecs.utk.edu/mailman/private/lapack/attachments/20141113/c7cf543c/attachment-0001.html>

<Prev in Thread] Current Thread [Next in Thread>


For additional information you may use the LAPACK/ScaLAPACK Forum.
Or one of the mailing lists, or