Block Cholesky algorithm (xPOTRF)

Post here if you have a question about LAPACK or ScaLAPACK algorithm or data format

Block Cholesky algorithm (xPOTRF)

Postby pa33 » Mon May 27, 2019 1:17 pm

I am trying to understand the block Cholesky (Level 3 BLAS) algorithm as implemented in DPOTRF. The explanation given here: http://www.netlib.org/utk/papers/factor/node9.html is quite clear and states that there are 3 steps:

1. DPOTF2 (compute L11)
2. DTRSM (compute L22)
3. DSYRK (update A22)

That website appears to be based on the 1996 publication:

Choi, J., Dongarra, J. J., Ostrouchov, L. S., Petitet, A. P., Walker, D. W., & Whaley, R. C. (1996). Design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines. Scientific Programming, 5(3), 173-184.

However, the actual implementation of DPOTRF uses a seemingly different approach:

1. DSYRK
2. DPOTF2
3. DGEMM
4. DTRSM

I was unable to find an explanation of these steps. But I found an old LAPACK version of DPOTRF from 1993 which predates that 1996 publication and is still based on the 4 steps above. Does anyone know of a reference describing the block Cholesky algorithm in DPOTRF, and whether these 4 steps are more efficient than the 3 step algorithm in the 1996 publication?
pa33
 
Posts: 11
Joined: Fri Apr 27, 2007 5:17 pm

Re: Block Cholesky algorithm (xPOTRF)

Postby admin » Mon May 27, 2019 1:34 pm

Hi. The code in LAPACK/SRC is the left looking variant. The right looking version is at: LAPACK/SRC/VARIANTS/cholesky/RL. Cheers, Julien Langou.
admin
Site Admin
 
Posts: 616
Joined: Wed Dec 08, 2004 7:07 pm

Re: Block Cholesky algorithm (xPOTRF)

Postby pa33 » Mon May 27, 2019 11:54 pm

Thank you, this is quite helpful. I found a publication (Anderson and Dongarra, 1990) which compares the different variants of the block (Level 3) Cholesky, and they find that they all have similar performance, so it doesn't seem to make much difference which one is used.
pa33
 
Posts: 11
Joined: Fri Apr 27, 2007 5:17 pm

Re: Block Cholesky algorithm (xPOTRF)

Postby Julien Langou » Tue May 28, 2019 9:28 am

Thank you, this is quite helpful. I found a publication (Anderson and Dongarra, 1990) which compares the different variants of the block (Level 3) Cholesky, and they find that they all have similar performance, so it doesn't seem to make much difference which one is used.


The left-looking variant has less ``write`` (and more ``read``) than the right-looking variant. The left-looking variant was chosen for LAPACK. The right-looking is more parallel. The critical path length of the right looking variant is O(n) (where the input matrix is n-by-n tiles), while the critical path length for the left looking variant is O(n^2). The right-looking variant was chosen for ScaLAPACK. Using a multithreaded BLAS library on a multicore architecture, you would see a much better performance from the right-looking variant than the left-looking variant. A good (the best?) variant is the recursive variant.
Julien Langou
 
Posts: 832
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA


Return to Algorithm / Data

Who is online

Users browsing this forum: Google [Bot] and 3 guests