LAPACK Archives

[Lapack] BLAS question

Hello Tim,

Well, I think you are stucked. This is a BLAS problem and has been 
reported several times. There is no BLAS routine to exploit the symmetry 
in the output H when H is computed as

H <- M^H * h * M

with h Hermitian.

Assume M is n-by-m and h is n-by-n. Then your optimal number of FLOPS is 
to compute H is 2mn^2. If you also want to compute Q then you end up 
with 3mn^2 operations.

You can not get neither one nor the other with BLAS right now. BLAS 
enables you to get 4mn^2.

This is a known problem and bothers lots of people.
(e.g. The folks working on LDL^T for example)

Here are two tricks.

[1] There are two conditions for the first trick.
        (a) n << m
        (b) h is Hermitian positive definite
        (i) Cholesky of h (POTRF = n^3/3): r <- chol (h), you get r such
             that h = r^H*r and r upper triangular,
        (ii) apply r to M from the left  Q <- r*M, mn^2
        (iii) apply SYRK: H <- Q^H Q -> mn^2
        (iv) apply Q <- r^H * Q (this is mn^2)

total: 3mn^2 + n^3/3 so 3mn^2 id n<<m.

Well ok condition (a) and (b) might be really too restrictive but if you 
have both conditions I encourage you to try.

In LDL^T, h is indefinite so we are stucked.

[2] If you really want your FLOPS back when you perform

        H <- M^H * Q

without losing too much efficiency then you will need to break the 
operations by yourself in blocks of size NB. (NB to be tuned). For each 
NB, you compute only a column (or row) of H. and you adjust the length to 
take into account the symmetry. This is not perfect FLOPS saving but 
roughly you get where you want. If NB = 1, you have perfect FLOPS saving 
(mn^2 for this operation) but you have bad performance. If NB = N, well 
you have no FLOPS saving (back to 2mn^2). Using a hand made block 
algorithm with NB = 64 (e.g.) enables to get the sought FLOPS saving and 

This is the way we (LAPACK) do in for example LDL^T.

Best way would have had to have the BLAS routine for this operation, sure.

Best wishes,

On Thu, 28 Jun 2007, Professor Boykin wrote:

Dear LAPACK folks,

We need to compute a matrix product of the form (hc=hermitian conjugate):

      H = hc(M)*Q, Q = h*M

where H and h are hermitian, Q is already computed (it's needed for
another stage).  As you can see we really need only the full upper
(or lower) triangle of the product matrix, H.  Looking through the
BLAS routines we have not found any special routine which would be faster
than the general matrix multiply of hc(M) and Q.  Is there some multiply
routine where we could specify only compute the full upper (or lower)
triangle of the product (so as to speed things up by a bit less than a
factor of 2)?

Tim Boykin


Timothy Boykin, Ph.D.
Professor (*)
Department of Electrical & Computer Engineering
The University of Alabama in Hunstville
Huntsville, Alabama  35899

(*) effective Aug. 2007; presently "Associate Professor"...

email:  boykin@Domain.Removed

Voice:  (256) 824-6269
FAX:    (256) 824-6803  (9AM-5PM Central Time)
      (256) 824-3482  (5PM-9AM Central Time)
"You know, Foley, I have only one eye.  I have a right to be blind
sometimes.  I really do not see the signal."

Vice-Admiral Lord Horatio Nelson (with telescope held to his bad eye) to
Captain Foley (on board Nelson's flagship HMS Elephant at the Battle of
Copenhagen, 2 April 1801) explaining why he is ignoring Admiral Sir Hyde
Parker's signal to discontinue action.  Parker assumed that Nelson would
disregard the order if he (Nelson) could continue the attack and indeed
supported Nelson's squadron when he saw that Nelson was pressing forward.

Lapack mailing list

<Prev in Thread] Current Thread [Next in Thread>

For additional information you may use the LAPACK/ScaLAPACK Forum.
Or one of the mailing lists, or