## Block Cholesky algorithm (xPOTRF)

Post here if you have a question about LAPACK or ScaLAPACK algorithm or data format

### Block Cholesky algorithm (xPOTRF)

I am trying to understand the block Cholesky (Level 3 BLAS) algorithm as implemented in DPOTRF. The explanation given here: http://www.netlib.org/utk/papers/factor/node9.html is quite clear and states that there are 3 steps:

1. DPOTF2 (compute L11)
2. DTRSM (compute L22)
3. DSYRK (update A22)

That website appears to be based on the 1996 publication:

Choi, J., Dongarra, J. J., Ostrouchov, L. S., Petitet, A. P., Walker, D. W., & Whaley, R. C. (1996). Design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines. Scientific Programming, 5(3), 173-184.

However, the actual implementation of DPOTRF uses a seemingly different approach:

1. DSYRK
2. DPOTF2
3. DGEMM
4. DTRSM

I was unable to find an explanation of these steps. But I found an old LAPACK version of DPOTRF from 1993 which predates that 1996 publication and is still based on the 4 steps above. Does anyone know of a reference describing the block Cholesky algorithm in DPOTRF, and whether these 4 steps are more efficient than the 3 step algorithm in the 1996 publication?
pa33

Posts: 11
Joined: Fri Apr 27, 2007 5:17 pm

### Re: Block Cholesky algorithm (xPOTRF)

Hi. The code in LAPACK/SRC is the left looking variant. The right looking version is at: LAPACK/SRC/VARIANTS/cholesky/RL. Cheers, Julien Langou.
admin
Site Admin

Posts: 616
Joined: Wed Dec 08, 2004 7:07 pm

### Re: Block Cholesky algorithm (xPOTRF)

Thank you, this is quite helpful. I found a publication (Anderson and Dongarra, 1990) which compares the different variants of the block (Level 3) Cholesky, and they find that they all have similar performance, so it doesn't seem to make much difference which one is used.
pa33

Posts: 11
Joined: Fri Apr 27, 2007 5:17 pm

### Re: Block Cholesky algorithm (xPOTRF)

Thank you, this is quite helpful. I found a publication (Anderson and Dongarra, 1990) which compares the different variants of the block (Level 3) Cholesky, and they find that they all have similar performance, so it doesn't seem to make much difference which one is used.

The left-looking variant has less ``write`` (and more ``read``) than the right-looking variant. The left-looking variant was chosen for LAPACK. The right-looking is more parallel. The critical path length of the right looking variant is O(n) (where the input matrix is n-by-n tiles), while the critical path length for the left looking variant is O(n^2). The right-looking variant was chosen for ScaLAPACK. Using a multithreaded BLAS library on a multicore architecture, you would see a much better performance from the right-looking variant than the left-looking variant. A good (the best?) variant is the recursive variant.
Julien Langou

Posts: 835
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA

Return to Algorithm / Data

### Who is online

Users browsing this forum: No registered users and 1 guest