I am trying to understand the block Cholesky (Level 3 BLAS) algorithm as implemented in DPOTRF. The explanation at http://www.netlib.org/utk/papers/factor/node9.html is quite clear and describes three steps:

1. DPOTF2 (compute L11)

2. DTRSM (compute L21)

3. DSYRK (update A22)
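
For reference, this is how I read those three steps, written as a minimal NumPy sketch rather than the actual LAPACK code. The function name `block_cholesky_right_looking` and the block size `nb` are my own choices, and `np.linalg.cholesky` and `np.linalg.solve` are only stand-ins for DPOTF2 and DTRSM.

```python
import numpy as np

def block_cholesky_right_looking(A, nb=2):
    """Lower Cholesky factor of SPD matrix A via the three-step block scheme."""
    A = np.array(A, dtype=float)  # work on a copy
    n = A.shape[0]
    for j in range(0, n, nb):
        e = min(j + nb, n)  # current diagonal block is A[j:e, j:e]
        # Step 1 (DPOTF2 stand-in): A11 = L11 L11^T
        A[j:e, j:e] = np.linalg.cholesky(A[j:e, j:e])
        if e < n:
            L11 = A[j:e, j:e]
            # Step 2 (DTRSM stand-in): L21 = A21 L11^{-T},
            # computed by solving L11 X = A21^T and transposing X
            A[e:, j:e] = np.linalg.solve(L11, A[e:, j:e].T).T
            # Step 3 (DSYRK stand-in): A22 <- A22 - L21 L21^T
            L21 = A[e:, j:e]
            A[e:, e:] -= L21 @ L21.T
    return np.tril(A)  # discard stale entries above the diagonal
```

With a small symmetric positive definite test matrix, the result should agree with `np.linalg.cholesky` up to rounding. Note that the trailing update in step 3 touches the whole remaining submatrix in every iteration.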

That website appears to be based on the 1996 publication:

Choi, J., Dongarra, J. J., Ostrouchov, L. S., Petitet, A. P., Walker, D. W., & Whaley, R. C. (1996). Design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines. Scientific Programming, 5(3), 173-184.

However, the actual implementation of DPOTRF uses a seemingly different approach:

1. DSYRK

2. DPOTF2

3. DGEMM

4. DTRSM
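
My reading of the lower-triangular branch of DPOTRF is sketched below in NumPy. This is an approximation of what the Fortran loop does, not a transcription: the function name `block_cholesky_left_looking` and the block size `nb` are my own, and `np.linalg.cholesky` and `np.linalg.solve` again stand in for DPOTF2 and DTRSM.

```python
import numpy as np

def block_cholesky_left_looking(A, nb=2):
    """Lower Cholesky factor of SPD matrix A via the four-step block scheme."""
    A = np.array(A, dtype=float)  # work on a copy
    n = A.shape[0]
    for j in range(0, n, nb):
        e = min(j + nb, n)  # current diagonal block is A[j:e, j:e]
        # Already computed block row of L to the left of the diagonal block
        L10 = A[j:e, :j]
        # Step 1 (DSYRK stand-in): A11 <- A11 - L10 L10^T
        A[j:e, j:e] -= L10 @ L10.T
        # Step 2 (DPOTF2 stand-in): A11 = L11 L11^T
        A[j:e, j:e] = np.linalg.cholesky(A[j:e, j:e])
        if e < n:
            # Step 3 (DGEMM stand-in): A21 <- A21 - L20 L10^T
            A[e:, j:e] -= A[e:, :j] @ L10.T
            # Step 4 (DTRSM stand-in): L21 = A21 L11^{-T}
            A[e:, j:e] = np.linalg.solve(A[j:e, j:e], A[e:, j:e].T).T
    return np.tril(A)  # discard stale entries above the diagonal
```

In this sketch the updates are applied lazily: each iteration only modifies the current block column, using the columns of L computed so far, instead of updating the whole trailing submatrix. If I understand correctly, this ordering is what is usually called a left-looking variant.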

I was unable to find an explanation of these steps. However, I found an old LAPACK version of DPOTRF from 1993, which predates the 1996 publication and already uses the four steps above. Does anyone know of a reference describing the block Cholesky algorithm in DPOTRF, and whether these four steps are more efficient than the three-step algorithm in the 1996 publication?