TILED LU Operation Count


Post by rational » Mon Mar 07, 2011 2:23 pm


The tiled LU and QR factorizations perform more floating-point operations than the standard LAPACK implementations, and inner blocking is used to minimise this overhead [1]. This means that I cannot use the LAPACK flop counts from http://www.netlib.org/lapack/lawns/lawn41.ps for these operations. If this is true, could anyone please tell me the total number of operations (flop count) for each of the block operations below?


Thank you very much.

[1] Buttari et al., "A class of parallel tiled linear algebra algorithms for multicore architectures".

Re: TILED LU Operation Count

Post by admin » Mon Mar 07, 2011 2:53 pm

I don't know exactly what the overhead is for the first three kernels.
However, the only extra flops you need to care about are those in the DSSSSM kernel.
I think the extra flops for DSSSSM, as a fraction of the canonical LU flop count, are:

0.5 x (IB / NB)^2

With IB = NB, you get 50% more flops than the canonical LU (1.5 times the number of flops).
With more reasonable settings, e.g. IB = 20, NB = 200 (the PLASMA defaults for LU),
you get 0.5% more flops (1.005 times the number of flops).
In other words, it is of no real concern as long as your choice of the IB, NB pair is reasonable
(say, IB is one fourth of NB or less, which by the same estimate gives about 3% extra flops at most).
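
As a rough sketch of the arithmetic above, the following Python snippet evaluates the estimate 0.5 x (IB / NB)^2 for a given IB, NB pair. The function names `dssssm_extra_flops` and `tiled_lu_flops` are made up for illustration and are not part of PLASMA. The 2/3 N^3 figure is only the leading-order term of the standard LU flop count, and the overhead formula is the estimate quoted in this post, not a verified count.

```python
def dssssm_extra_flops(ib: int, nb: int) -> float:
    """Extra flops relative to canonical LU, per the estimate 0.5 * (IB / NB)^2."""
    if not 0 < ib <= nb:
        raise ValueError("expected 0 < IB <= NB")
    return 0.5 * (ib / nb) ** 2


def tiled_lu_flops(n: int, ib: int, nb: int) -> float:
    """Approximate tiled LU flop count for an n x n matrix.

    Assumptions: canonical LU costs about 2/3 * n^3 flops (leading term only),
    and all of the inner-blocking overhead is captured by the DSSSSM estimate.
    """
    canonical = 2.0 * n ** 3 / 3.0
    return canonical * (1.0 + dssssm_extra_flops(ib, nb))


if __name__ == "__main__":
    # IB = NB, the PLASMA LU defaults mentioned above, and IB = NB / 4.
    for ib, nb in [(200, 200), (20, 200), (50, 200)]:
        extra = dssssm_extra_flops(ib, nb)
        print(f"IB={ib:3d} NB={nb:3d}: {extra:.3%} extra flops, factor {1 + extra:.5f}")
```

For instance, `dssssm_extra_flops(20, 200)` evaluates to about 0.005, which matches the 0.5% figure above.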
