LAPACK Archives

[Lapack] Question about packaging of lapack

I always enjoy getting hectoring e-mail from random people, but who the heck
are you, and what are you using ATLAS for?

I looked into that at one point and ran away screaming.  Added
Clint to the cc list explicitly.  IIRC, the two things that would
make packaging ATLAS much easier:

 1. An easy method to compile using some moderate defaults with *no
    tuning*.  Package auto-builders should not tune.

I guess ATLAS becomes LAS, since we are going to remove the automatic tuning
part . . .  Not sure what you mean by package auto-builders, since I've
been given no context for the conversation I find myself in.

 2. A long-term method for storing and using saved tuning
    values.  Long-term meaning that new ATLAS releases can
    read relevant old values.

Nothing can ever save you from compiler changes, other than assembly code.
I believe ATLAS already possesses this to the degree that it is possible:
   
http://math-atlas.sourceforge.net/devel/atlas_devel/atlas_devel.html#SECTION00070000000000000000

It is not technically hard to extend the arch def to specify everything
and then do no timings, but it is such a practical pain in the ass that
I am probably never going to do it without some amazingly compelling case
being made.

The first would help distributions (and sites with no spare admin
time) install one baseline package across all their systems.
Perhaps it could be addressed by packaging the GEMM-based BLAS
with a moderately optimized GEMM rather than ATLAS.  I suspect
that almost all copies of ATLAS in use are *not* tuned for the
installed platform.

That would be a tragedy if true.  No generic build can get adequate performance
on a modern system.  The arch defs are supposed to handle most of what
you are talking about: speed up the install, leaving only the fast-changing
things unspecified (e.g., L2 cache sizes).  To get performance that is even
in the ballpark, you minimally need to separate into having SSE/SSE2/SSE3
or not.  Even then, you'll lose substantial perf if you use an AMD-targeted
binary on an Intel (perhaps slightly less in the opposite direction).
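
As a rough illustration of the SSE/SSE2/SSE3 split described above, a packaged
library could choose among prebuilt variants at load time.  This is only a
sketch, not something ATLAS does: `pick_variant`, `report_variant`, and the
variant names are hypothetical, and it assumes GCC or Clang on x86, whose
`__builtin_cpu_supports` builtin reports the instruction-set features of the
running CPU.  On any other compiler or architecture it falls back to "generic".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the name of the most capable prebuilt
 * variant the running CPU can execute.  The names are illustrative. */
static const char *pick_variant(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();               /* populate the CPU feature data */
    if (__builtin_cpu_supports("sse3")) return "sse3";
    if (__builtin_cpu_supports("sse2")) return "sse2";
    if (__builtin_cpu_supports("sse"))  return "sse";
#endif
    return "generic";                   /* no SSE, or not an x86 GCC/Clang build */
}

/* Hypothetical helper: report which variant would be selected. */
static void report_variant(FILE *out)
{
    const char *v = pick_variant();
    assert(v != NULL);
    fprintf(out, "would load the '%s' build\n", v);
}
```

This only covers the instruction-set split.  It does nothing about the
AMD-versus-Intel difference mentioned above, which would need separate
binaries per vendor as well.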

Sounds to me like what you want is the f77refblas: trivial to install,
not optimized at all.  You'll lose a factor of 5-20 or so on performance,
but it truly is easy to install with no tuning.  You can reduce the
performance loss to roughly a factor of 2-4 with a small-blocked GEMM-based
BLAS as you outline . . .
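
To make "small-blocked GEMM" concrete, here is a minimal sketch of the idea
next to a plain triple loop.  It is not the ATLAS kernel or the GEMM-based
BLAS code: the function names, the row-major square-matrix layout, and the
fixed `BLOCK` of 32 are all assumptions made for illustration.  An untuned
package would simply hard-code some moderate block size like this one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed block size; an untuned build just picks a moderate value. */
enum { BLOCK = 32 };

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Reference: C += A * B for row-major n x n matrices, no blocking. */
static void gemm_naive(size_t n, const double *A, const double *B, double *C)
{
    assert(A && B && C);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Blocked: same result, but works on BLOCK x BLOCK tiles so the pieces of
 * A, B and C being touched are more likely to stay in cache. */
static void gemm_blocked(size_t n, const double *A, const double *B, double *C)
{
    assert(A && B && C);
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t imax = min_sz(ii + BLOCK, n);
                size_t kmax = min_sz(kk + BLOCK, n);
                size_t jmax = min_sz(jj + BLOCK, n);
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < jmax; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Both routines compute the same update; only the loop order and tiling differ.
How much of the gap to a tuned library a fixed block size recovers depends on
the machine, which is the point being argued in this thread.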

Regards,
Clint

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************


