Hi,

The tiled LU and QR factorizations perform more floating-point operations than the standard LAPACK implementations, and inner blocking is used to minimise this overhead [1]. This means I cannot use the LAPACK flop counts from http://www.netlib.org/lapack/lawns/lawn41.ps for these operations. If this is correct, could anyone please tell me the total number of floating-point operations (flop count) for each of the block operations below?

DGETRF

DGESSM

DTSTRF

DSSSSM
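For context, the total for the tiled LU can be assembled from per-kernel costs by counting how often each kernel is called. The sketch below assumes a square matrix split into p x p tiles of size b, and uses idealized leading-order costs per call (2/3 b^3 for DGETRF, b^3 for DGESSM and DTSTRF, 2b^3 for DSSSSM). These per-call numbers are my own assumptions and deliberately leave out the extra work of the tiled kernels, including any inner-blocking overhead, so the cost table should be replaced with the real per-kernel counts once they are known.

```python
from fractions import Fraction

# ASSUMED idealized leading-order flop cost per call, as a multiple of b**3.
# These ignore the extra work done by the tiled kernels (the overhead that
# inner blocking is meant to reduce); replace them with the real counts.
IDEAL_KERNEL_COST = {
    "DGETRF": Fraction(2, 3),  # LU of one b x b diagonal tile
    "DGESSM": Fraction(1),     # apply L_kk^{-1} and pivots to a tile A_kj
    "DTSTRF": Fraction(1),     # LU of the stacked pair [U_kk; A_ik]
    "DSSSSM": Fraction(2),     # update the stacked pair [A_kj; A_ij]
}


def kernel_calls(p):
    """Count the kernel calls made by tiled LU on a p x p grid of tiles."""
    calls = {name: 0 for name in IDEAL_KERNEL_COST}
    for k in range(p):
        calls["DGETRF"] += 1
        rest = p - k - 1  # number of tiles right of / below the diagonal tile
        calls["DGESSM"] += rest
        calls["DTSTRF"] += rest
        calls["DSSSSM"] += rest * rest
    return calls


def tiled_lu_flops(p, b, cost=IDEAL_KERNEL_COST):
    """Total flops = sum over kernels of (number of calls * cost per call)."""
    calls = kernel_calls(p)
    return sum(calls[name] * cost[name] for name in calls) * b**3


if __name__ == "__main__":
    p, b = 4, 100
    print(kernel_calls(p))
    print(float(tiled_lu_flops(p, b)), float(Fraction(2, 3) * (p * b) ** 3))
```

With these idealized costs the sum is exactly 2n^3/3 with n = pb, the leading term of the LAPACK LU count in LAWN 41, so any overhead of the tiled kernels shows up only through the per-call costs passed in as `cost`.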

Thank you very much.

[1] A. Buttari et al., "A class of parallel tiled linear algebra algorithms for multicore architectures."