Open discussion regarding features, bugs, issues, vendors, etc.
I have an upper Hessenberg matrix i.e. upper trapezoidal with a block=1 band under the diagonal with non zero elements from some column onwards, and I need to triangularize it in the fastest possible way. This use-case is the result of computing e.g. QR and then deleting one column (yes it is always only one column) from R. I need to make R upper triangular again by annihilating all those (one per column) non-zero elements below the diagonal e.g. I need to get rid of the 2s in the fastest way:
- Code: Select all
1 1 1 1 1
0 1 1 1 1
0 2 1 1 1
0 0 2 1 1
0 0 0 2 1
I have tried:
- Using Givens rotations combining dlartg and dlasr in a loop. Issue: very bad spatial locality, keeps moving along the rows that consequences in high cache misses.
- Using householder reflectors combining dlarfg and dlarf in a double loop. Issue: very bad spatial locality, keeps moving along the rows that consequences in high cache misses.
- Using the latest 3.4.x QR decompositions e.g. dgeqrt. This is a work overkill for this simple update, algorithmically very bad even when the column deleted is among the firsts ones.
Can anyone point me to a set of lower level functions I can use to implement my use-case in the fastest way? Otherwise I might end up reinventing the wheel, studying the update pattern and doing it manually in a block of registers along the columns.
- Posts: 12
- Joined: Sat Mar 24, 2012 3:23 pm
We also need to triangularize upper Hessenberg matrices in the context of the GMRES method. This is
traditionally done with Given rotations. (Similarly to what you describe.) The main difference with your
application is that the Hessenberg matrix is known one column at a time, so for each column we first
apply the previous rotations and compute the new one, and then move to the next column. There is not
much opportunity for data reuse, etc. In your case you have the while matrix at once so you might be able
to do something smart.
This reminds me of "bulge chasing. What people have been trying to do in the Bulge chasing case is accumulate the Givens
rotations in a small matrix. Say if your matrix is 100-by-100, you would consider the first 10 rows and construct
the first nine rotations. You only work on the 10-by-10 left top submatrix for that, then you accumulate all these rotations
in a 10-by-10 matrix. (You multiply against the identity first.) And then apply this all at once on the remaining
columns using a DGEMM. There is a paper from 2002 in SIMAX from Braman, Byers and Mathias entitled: THE MULTISHIFT QR
ALGORITHM. PART I: MAINTAINING WELL-FOCUSED SHIFTS AND LEVEL 3 PERFORMANCE. What I was thinking
was the Level 3 BLAS performance. You do more flops so it is not obvious this is a winning strategy.
In any case, I do not know much about performance optimization for the operation you are performing.
- Posts: 735
- Joined: Thu Dec 09, 2004 12:32 pm
- Location: Denver, CO, USA
Return to User Discussion
Who is online
Users browsing this forum: Baidu [Spider], Bing [Bot], Yahoo [Bot] and 3 guests