LAPACK Archives

### [Lapack] LAPACK 3.5 -- xSYSV_ROOK

 ```Hi Bobby, When I was working on this I wrote level three versions of the solve. Well, it forms L, does the solve, and then restores A. Even this crude approach was many times faster than the level 2 I recall. Although I never understood why the factorization routine couldn't just return L is the first place. Anyway the code below in case of interest. Craig SUBROUTINE L3_ROOKDSYTRS( UPLO, N, NRHS, A, LDA, IPIV, B, LDB, \$ WORK, INFO ) * * Craig Lucas * The Numerical Algorithms Group. * October 2008 * * -- LAPACK routine (version 3.1) -- * Univ. of Tennessee, Univ. of California Berkeley and NAG Ltd.. * November 2006 * * .. Scalar Arguments .. INTEGER INFO, LDA, LDB, N, NRHS CHARACTER UPLO * .. * .. Array Arguments .. DOUBLE PRECISION A( LDA, * ), B( LDB, * ), WORK( * ) INTEGER IPIV( * ) * .. * * Purpose * ======= * * rookDSYTRS solves a system of linear equations A*X = B with a real * symmetric matrix A using the factorization A = U*D*U**T or * A = L*D*L**T computed by rookDSYTRF or fastbpDSYTRF. This is a * Level 3 BLAS version. * * Arguments * ========= * * UPLO (input) CHARACTER*1 * Specifies whether the details of the factorization are stored * as an upper or lower triangular matrix. * = 'U': Upper triangular, form is A = U*D*U**T; * = 'L': Lower triangular, form is A = L*D*L**T. * * ** ONLY UPLO = 'L' IS IMPLEMENTED ** * * N (input) INTEGER * The order of the matrix A. N >= 0. * * NRHS (input) INTEGER * The number of right hand sides, i.e., the number of columns * of the matrix B. NRHS >= 0. * * A (input) DOUBLE PRECISION array, dimension (LDA,N) * The block diagonal matrix D and the multipliers used to * obtain the factor U or L as computed by rookSSYTRF or * fastbpSSYTRF. * * LDA (input) INTEGER * The leading dimension of the array A. LDA >= max(1,N). * * IPIV (input) INTEGER array, dimension (N) * Details of the interchanges and the block structure of D * as determined by rookDSYTRF or fastbpDSYTRF. * * B (input/output) DOUBLE PRECISION array, dimension (LDB,NRHS) * On entry, the right hand side matrix B. * On exit, the solution matrix X. * * LDB (input) INTEGER * The leading dimension of the array B. LDB >= max(1,N). * * WORK (input) DOUBLE PRECISION array, dimension (N) * Workspace. * * INFO (output) INTEGER * = 0: successful exit * < 0: if INFO = -i, the i-th argument had an illegal value * * ===================================================================== * * .. Parameters .. DOUBLE PRECISION ONE PARAMETER ( ONE = 1.0D+0 ) * .. * .. Local Scalars .. DOUBLE PRECISION AK, AKM1, AKM1K, BK, BKM1, DENOM INTEGER J, K, KP LOGICAL UPPER * .. * .. External Functions .. LOGICAL LSAME EXTERNAL LSAME * .. * .. External Subroutines .. EXTERNAL DCOPY, DSCAL, DSWAP, DTRSM, XERBLA * .. * .. Intrinsic Functions .. INTRINSIC MAX * .. * * .. Executable Statements .. INFO = 0 UPPER = LSAME( UPLO, 'U' ) IF( .NOT.UPPER .AND. .NOT.LSAME( UPLO, 'L' ) ) THEN INFO = -1 ELSE IF( N.LT.0 ) THEN INFO = -2 ELSE IF( NRHS.LT.0 ) THEN INFO = -3 ELSE IF( LDA.LT.MAX( 1, N ) ) THEN INFO = -5 ELSE IF( LDB.LT.MAX( 1, N ) ) THEN INFO = -8 END IF IF( INFO.NE.0 ) THEN * CALL XERBLA( 'l3_rookDSYTRS', -INFO ) * RETURN END IF * * Quick return if possible * IF( N.EQ.0 .OR. NRHS.EQ.0 ) \$ RETURN * IF( UPPER ) THEN * RETURN * ELSE * * Solve A*X = B, where A = L*D*L'. * * First copy out lower diagonal to workspace, and form L in * lower triangle of A. Zero D( 2, 1). Permute B. * * K is the main loop index, increasing from 1 to N in steps of * 1 or 2, depending on the size of the diagonal blocks. * CALL DCOPY( N-1, A( 2, 1 ), LDA+1, WORK( 1 ), 1 ) K = 1 10 CONTINUE * * If K > N, exit from loop. * IF( K.GT.N ) \$ GO TO 20 * IF( IPIV(K).GT.0 ) THEN * * 1 x 1 diagonal block * KP = IPIV( K ) IF( KP.NE.K ) THEN CALL DSWAP( K-1, A( K, 1 ), LDA, A( KP, 1 ), LDA ) CALL DSWAP( NRHS, B( K, 1 ), LDB, B( KP, 1 ), LDB ) END IF K = K + 1 ELSE * * 2 x 2 diagonal block * KP = -IPIV( K ) IF( KP.NE.K ) THEN CALL DSWAP( K-1, A( K, 1 ), LDA, A( KP, 1 ), LDA ) CALL DSWAP( NRHS, B( K, 1 ), LDB, B( KP, 1 ), LDB ) END IF KP = -IPIV( K+1 ) IF( KP.NE.K+1 ) THEN CALL DSWAP( K-1, A( K+1, 1 ), LDA, A( KP, 1 ), LDA ) CALL DSWAP( NRHS, B( K+1, 1 ), LDB, B( KP, 1 ), LDB ) END IF * A( K+1, K ) = 0 K = K + 2 END IF * GO TO 10 20 CONTINUE * * Next solve L*D*X = B, overwriting B with X. * CALL DTRSM( 'Left', 'Lower', 'No transpose', 'Unit', N, NRHS, \$ ONE, A, LDA, B, LDB ) * * Multiply by the inverse of the diagonal blocks. * K = 1 30 CONTINUE * * If K > N, exit from loop. * IF( K.GT.N ) \$ GO TO 50 IF( IPIV( K ).GT.0 ) THEN * * 1 x 1 diagonal block * Interchange rows K and IPIV(K). * CALL DSCAL( NRHS, ONE / A( K, K ), B( K, 1 ), LDB ) K = K + 1 * ELSE * * 2 x 2 diagonal block * Interchange rows K and -IPIV(K) THEN K-1 and -IPIV(K-1) * AKM1K = WORK( K ) AKM1 = A( K, K ) / AKM1K AK = A( K+1, K+1 ) / AKM1K DENOM = AKM1*AK - ONE DO 40 J = 1, NRHS BKM1 = B( K, J ) / AKM1K BK = B( K+1, J ) / AKM1K B( K, J ) = ( AK*BKM1-BK ) / DENOM B( K+1, J ) = ( AKM1*BK-BKM1 ) / DENOM 40 CONTINUE K = K + 2 END IF * GO TO 30 50 CONTINUE * * Next solve L'*X = B, overwriting B with X. * CALL DTRSM( 'Left', 'L', 'Transpose', 'Unit', N, NRHS, ONE, \$ A, LDA, B, LDB ) * * Restore A to input state. Permute B. * * K is the main loop index, decreasing from N to 1 in steps of * 1 or 2, depending on the size of the diagonal blocks. * K = N 60 CONTINUE * * If K < 1, exit from loop. * IF( K.LT.1 ) \$ GO TO 70 * IF( IPIV( K ).GT.0 ) THEN * * 1 x 1 diagonal block * Interchange rows K and IPIV(K). * KP = IPIV( K ) IF( KP.NE.K ) THEN CALL DSWAP( K-1, A( K, 1 ), LDA, A( KP, 1 ), LDA ) CALL DSWAP( NRHS, B( K, 1 ), LDB, B( KP, 1 ), LDB ) END IF K = K - 1 ELSE * * 2 x 2 diagonal block * Interchange rows K and -IPIV(K) THEN K-1 and -IPIV(K-1) * KP = -IPIV( K ) IF( KP.NE.K ) THEN CALL DSWAP( K-2, A( K, 1 ), LDA, A( KP, 1 ), LDA ) CALL DSWAP( NRHS, B( K, 1 ), LDB, B( KP, 1 ), LDB ) END IF KP = -IPIV( K-1 ) IF( KP.NE.K-1 ) THEN CALL DSWAP( K-2, A( K-1, 1 ), LDA, A( KP, 1 ), LDA ) CALL DSWAP( NRHS, B( K-1, 1 ), LDB, B( KP, 1 ), LDB ) END IF K = K - 2 END IF * GO TO 60 70 CONTINUE CALL DCOPY( N-1, WORK( 1 ), 1, A( 2, 1 ), LDA+1 ) END IF * RETURN * * End of rookDSYTRS * END From: lapack-bounces@Domain.Removed [mailto:lapack-bounces@Domain.Removed] On Behalf Of James Demmel Sent: 13 November 2014 15:01 To: Bobby Cheng; lapack@Domain.Removed Cc: beboppers@Domain.Removed; Pat Quillen; Duncan Po; Penny Anderson Subject: Re: [Lapack] LAPACK 3.5 -- xSYSV_ROOK The fastest alternative that we have explored is based on Aasen's method, which factors A into P*L*T*L^T*P^T, where P is a permutation, L is lower triangular, and T is tridiagonal. There is a paper (which won a prize at IPDPS'13) on this, see http://www.eecs.berkeley.edu/~odedsc/papers/block_aasen.pdf These more recent versions modify Aasen's original factorization by going in steps from A to narrower band and then solving the narrower band problem. So the output factorization differs from Bunch-Kaufman or rook, but this seems necessary to attain high performance. How would such a change affect the Matlab interface and its users? Jim On 11/12/14, 11:38 AM, Bobby Cheng wrote: Hi LAPACK team, It is great to see that LDL^T with rook pivoting made it into LAPACK 3.5. I observed that xSYTRI_ROOK and xSYTRS_ROOK use only level 2 BLAS similar to xSYTRI and xSYTRS. However, I have not found the level three BLAS counterpart of xSYTRI2 and xSYTRS2. This is an issue for MATLAB, as customers, not surprisingly, have found the performance of xSYTRI not acceptable for large matrices. This will be the same for xSYTRI_ROOK. I am wondering what your plan is for applying what is done in xSYTRI2 to rook pivoting. Regards, ---Bobby Cheng _______________________________________________ Lapack mailing list Lapack@Domain.Removed http://lists.eecs.utk.edu/mailman/listinfo/lapack ________________________________________________________________________ This e-mail has been scanned for all viruses by Star. ________________________________________________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: ```
 Current Thread [Lapack] LAPACK 3.5 -- xSYSV_ROOK, Bobby Cheng [Lapack] LAPACK 3.5 -- xSYSV_ROOK, James Demmel [Lapack] LAPACK 3.5 -- xSYSV_ROOK, Craig Lucas <=

For additional information you may use the LAPACK/ScaLAPACK Forum.
Or one of the mailing lists, or