LAPACK Archives

[Lapack] LAPACK meeting notes, 17 March.

Many people attended, but distressingly few people at UCB wore
green.

Jason

Outline:
  UCB: DoE notes
  UCB: DARPA visit at UCB next week; HPCS pitch
  UCB: GAMM meeting in Berlin
  UTK: Remi Delmas's wrappers
  UTK: Divide and conquer failure report and performance
  UCB: Technical progress
  UTK: Multi-precision source generation

* UCB: DoE notes

** Basic grant status

Program manager just returned from vacation; Jack Dongarra will
ping him when appropriate.

** SciDAC status from UTK

230 SciDAC proposals: 120 from labs and 110 from universities.

Announcements will be in June, but there is an April meeting to
discuss the proposals.

The institute winners will be announced on Monday and must be
ready to present on Friday.

* UCB: DARPA visit at UCB next week; HPCS pitch

How do we pitch the project to the HPCS folks?  Hardware will not
exist for quite some time.

** Performance modeling

The HPCS systems are covered by NDAs, so we cannot publish models
of ScaLAPACK and PBLAS performance that use features of specific
HPCS systems.  Modeling non-disclosable systems that do not exist
with non-disclosable features is a challenge.

[Serving as a cheap labor arm of some companies sucks.]

*** Users are requesting better models.

Later in the meeting, Xiaoye Li mentioned that one user, Julian
Borrill at LBNL, wants more detailed performance models.  His
problem of interpreting CMB data involves symmetric inversion,
eigenvalue computations, and many low-level PBLAS operations.  He
needs better models to predict how his codes will perform on
larger systems and with more data.

** Asynchrony and heterogeneous systems

Some of these non-disclosable systems may or may not require
greater asynchrony to achieve full performance.  So exploiting
asynchrony may or may not be more important than it already is.

Also, following some discussion of the same old broken ideas on
the 754 mailing list, John Lewis asked in email about a possible
future Cray system that may or may not be related to certain
other topics.  The system may or may not have a vector
coprocessor with different floating-point semantics than the main
COTS processor.

Jim Demmel volunteered to relay the historical horror stories
regarding heterogeneous floating-point semantics to the DARPA
folks next week.

[Amusing (?) note: There's another company named ClearSpeed
retrying the same basic idea.  They have a limited MasPar MP-2 on
a PCI-X card (HTX in development) and yet another damned SIMD C
variant for programming the thing.  The public information is
enough to know this is the same old idea, but this company at
least claims full IEEE-754 compliance.

Also, many desktops already have heterogeneous floating-point
units in one system: the x87 and SSE FPUs.  The GNU compilers
have tried to use both units at once, but they dropped that idea
because of performance problems and not numerical ones.  The
performance killer is comparisons between values in separate
FPUs...]

** Language issues

Going back to the heady days of yore, each new, non-disclosable
system has its own new language semi-structured to expose each
new, non-disclosable feature.  These languages should make it
easier to exploit the unknown features on unknown hardware that
doesn't exist yet.

Jack Dongarra suggests we propose to rewrite the three amigos in
these HPCS languages.  Jim Demmel also wants to investigate
Hessenberg QR reductions.

[I nominate Chevy Chase to be re-written in Fortress.]

** Fault tolerance

UTK is interested in three levels of fault tolerance on massive
systems:

  1) automatic checkpointing and restarting,
  2) log-based resumption, and
  3) library-level solutions.

The first is the classical solution.  Some monitoring system
saves entire process images and restarts them when a node fails.
On parallel systems, the saved images must be consistent across
nodes.

The second replays all the messages sent to a node before its
failure to bring it back to its pre-failure state.

The third requires redundant representations of the input data
along with more algorithmic and development work.
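
As a rough illustration of the third level, here is a minimal
sketch of checksum-based redundancy: one extra row of column sums
lets a library rebuild a single lost row without a checkpoint.
The function names and data are hypothetical and not taken from
any UTK code, and exact rational arithmetic sidesteps the
rounding questions a floating-point version would have to
address.

```python
from fractions import Fraction

def add_checksum_row(rows):
    """Return a copy of rows plus one extra row holding the column sums."""
    checksum = [sum(col) for col in zip(*rows)]
    return [list(r) for r in rows] + [checksum]

def recover_row(encoded, lost):
    """Rebuild data row `lost` from the surviving rows and the checksum row."""
    checksum = encoded[-1]
    survivors = [r for i, r in enumerate(encoded[:-1]) if i != lost]
    return [c - sum(r[j] for r in survivors) for j, c in enumerate(checksum)]

# Hypothetical data: each row stands for the block owned by one node.
data = [[Fraction(1), Fraction(2)],
        [Fraction(3), Fraction(4)],
        [Fraction(5), Fraction(6)]]
encoded = add_checksum_row(data)
rebuilt = recover_row(encoded, lost=1)   # pretend node 1 failed
```

A single checksum row tolerates only one lost row; handling more
failures needs more redundancy, which is part of the extra
algorithmic work mentioned above.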

* UCB: GAMM meeting in Berlin

Send Jim your slides from SIAM PP04.

* UTK: Remi Delmas's wrappers

Slides sent to some subset of people.

Remi Delmas's script produces an intermediate form and then C and
Matlab wrappers.  These are proper low-level wrappers and not a
higher-level interface.  There is little error checking; you can
pass the wrong types in and the code will overwrite memory.

Remi and Julien Langou have a higher-level Matlab interface to
the eigenvalue routines which Julien demonstrated at Berkeley.
Julien has sent the higher-level routine to show that writing
higher-level wrappers is a non-trivial task.

* UTK: Divide and conquer failure report and performance

Julien Langou discussed a report of divide and conquer failures
with different compilers, precisions, and systems.  Jason Riedy
pointed out that we know how to convince most compilers to break
just about any code.  Julien has since mailed the report to the
UCB mailing lists.

[Note: All mailings larger than 40k sit in my mail until I
approve them.  Please send _pointers_ to information whenever
reasonable. -- ejr]

Jack Dongarra notes that Remi's Matlab interface makes testing
and timing simple.  They were surprised to see that divide and
conquer applied to a symmetrized rand(n,n) (uniformly random
entries) performs about 3x faster than MRRR.  For the Gaussian
ensemble, D&C slightly out-performs MRRR.  This is rather
surprising to the folks at UCB, who have performance data showing
the opposite situation.

* UCB: Technical progress

** Least-squares refinement

Least-squares refinement is nearing algorithmic completion.

** Interface plans

The iterative refinement routines return many error estimates and
condition numbers.  UCB proposes to return them in an array
rather than as separate parameter arrays.  The array would
contain the following information for each right-hand side:

  1: "Guaranteed" normwise error estimate
  2: "Guaranteed" componentwise error estimate
  3: Componentwise backward error
  4: RCOND = 1/kappa_inf(R*A*C) (traditional RCOND)
  5: RCOND_NRM = 1/kappa_inf(Rs*A) where Rs equilibrates
     the rows in the 1-norm.
  6: RCOND_CMP = 1/kappa_inf(Rs*A*diag(X)).
  7: 1/final pivot growth
  8: Raw normwise error estimate
  9: Raw componentwise error estimate

UCB will send a more detailed proposal.  The least-squares
refinement will return even more bounds.
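
One way to picture the proposed layout is a small set of named
indices into a per-right-hand-side array.  This is only a sketch:
the constant names, the 0-based numbering, and the nrhs-by-9
shape are assumptions, not the interface UCB will propose.

```python
from enum import IntEnum

class ErrIdx(IntEnum):
    """Hypothetical 0-based names for the nine entries listed above."""
    NORM_ERR = 0       # "guaranteed" normwise error estimate
    COMP_ERR = 1       # "guaranteed" componentwise error estimate
    COMP_BERR = 2      # componentwise backward error
    RCOND = 3          # 1/kappa_inf(R*A*C)
    RCOND_NRM = 4      # 1/kappa_inf(Rs*A)
    RCOND_CMP = 5      # 1/kappa_inf(Rs*A*diag(X))
    RPVGRW = 6         # 1/final pivot growth
    RAW_NORM_ERR = 7   # raw normwise error estimate
    RAW_COMP_ERR = 8   # raw componentwise error estimate

def new_bounds(nrhs):
    """Allocate one row of nine slots per right-hand side (assumed layout)."""
    return [[0.0] * len(ErrIdx) for _ in range(nrhs)]

bounds = new_bounds(nrhs=2)
bounds[0][ErrIdx.RCOND] = 1e-3   # placeholder value, not a real result
```

With a single array, adding a tenth quantity later changes only
the array's extent rather than every routine's argument list.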

** Testing plans

Add new routines to existing tests to ensure all input options
work.  To test new numerical functionality, we will construct
special systems (e.g. Hilbert) and test returned answers and
bounds against explicit formulas.
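
A minimal sketch of such a test, using the closed-form inverse of
the Hilbert matrix in exact rational arithmetic.  The helper
names are hypothetical, and the "computed" solution here is a
stand-in (the exact answer with a small relative perturbation)
rather than output from any LAPACK routine.

```python
from fractions import Fraction
from math import comb

def hilbert(n):
    """Hilbert matrix with exact entries H[i][j] = 1/(i+j-1), 1-based i, j."""
    return [[Fraction(1, i + j - 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]

def hilbert_inverse(n):
    """Exact integer inverse from the explicit binomial formula."""
    def entry(i, j):
        return ((-1) ** (i + j) * (i + j - 1)
                * comb(n + i - 1, n - j) * comb(n + j - 1, n - i)
                * comb(i + j - 2, i - 1) ** 2)
    return [[entry(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]

def matvec(a, x):
    """Matrix-vector product; exact when the entries are exact."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def true_normwise_error(x_computed, x_exact):
    """Infinity-norm relative error of x_computed against x_exact."""
    diff = max(abs(Fraction(xc) - xe) for xc, xe in zip(x_computed, x_exact))
    return float(diff / max(abs(xe) for xe in x_exact))

n = 5
b = [Fraction(1)] * n
x_exact = matvec(hilbert_inverse(n), b)       # exact solution of H x = b

# Stand-in for a routine's output: exact answer with a 1e-10 perturbation.
x_computed = [float(xe) * (1 + 1e-10) for xe in x_exact]
err = true_normwise_error(x_computed, x_exact)
```

A test would then check that each bound returned by the routine
is at least as large as the corresponding true error.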

* UTK: Multi-precision source generation

Yozo Hida sent around a pointer to his Perl script.  The script
transforms double and double-complex routines into single, quad,
and other precisions.
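
To give a flavor of what such a transformation involves, here is
a toy sketch in Python.  It is not Yozo's Perl script: the
rewrite tables are hypothetical and far from complete, the
routine DSCALE is made up for the example, and only the
double-to-single direction is shown.

```python
import re

# Hypothetical, deliberately tiny rewrite tables for double -> single.
# A real tool must also handle constants, intrinsics, comments, and more.
TYPE_MAP = {
    "DOUBLE PRECISION": "REAL",
    "COMPLEX*16": "COMPLEX",
}
PREFIX_MAP = {"D": "S", "Z": "C"}

def to_single(source, routine_names):
    """Rewrite type declarations and the listed routine names."""
    for old, new in TYPE_MAP.items():
        source = source.replace(old, new)
    for name in routine_names:
        new_name = PREFIX_MAP[name[0]] + name[1:]
        # \b keeps DSCAL from matching inside the longer name DSCALE.
        source = re.sub(r"\b%s\b" % re.escape(name), new_name, source)
    return source

double_src = """      SUBROUTINE DSCALE( N, ALPHA, X )
      DOUBLE PRECISION   ALPHA, X( * )
      CALL DSCAL( N, ALPHA, X, 1 )
"""
single_src = to_single(double_src, ["DSCALE", "DSCAL"])
```

Passing the routine names explicitly avoids renaming unrelated
identifiers that happen to start with D or Z.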

Julie Langou is reading Yozo's script and will compare it to
other options, including NAG's tools.
