Many people attended, but distressingly few people at UCB wore
green.
Jason
Outline:
UCB: DoE notes
UCB: DARPA visit at UCB next week; HPCS pitch
UCB: GAMM meeting in Berlin
UTK: Remi Delmas's wrappers
UTK: Divide and conquer failure report and performance
UCB: Technical progress
UTK: Multiprecision source generation
* UCB: DoE notes
** Basic grant status
Program manager just returned from vacation; Jack Dongarra will
ping him when appropriate.
** SciDAC status from UTK
230 SciDAC proposals: 120 from labs and 110 from universities.
Announcements will be in June, but there is an April meeting to
discuss the proposals.
The institute winners will be announced on Monday and must be
ready to present on Friday.
* UCB: DARPA visit at UCB next week; HPCS pitch
How do we pitch the project to the HPCS folks? Hardware will not
exist for quite some time.
** Performance modeling
The HPCS systems are covered by NDAs, so we cannot publish models
of ScaLAPACK and PBLAS performance that use features of specific
HPCS systems. Modeling nondisclosable systems that do not exist
with nondisclosable features is a challenge.
[Serving as a cheap labor arm of some companies sucks.]
*** Users are requesting better models.
Later in the meeting, Xiaoye Li mentioned that one user, Julian
Borrill at LBNL, wants more detailed performance models. His
problem of interpreting CMB data involves symmetric inversion,
eigenvalue computations, and many low-level PBLAS operations. He
needs better models to predict how his codes will perform on
larger systems and with more data.
** Asynchrony and heterogeneous systems
Some of these nondisclosable systems may or may not require
greater asynchrony to achieve full performance. So exploiting
asynchrony may or may not be more important than it already is.
Also, following some discussion of the same old broken ideas on
the 754 mailing list, John Lewis asked in email about a possible
future Cray system that may or may not be related to certain
other topics. The system may or may not have a vector
coprocessor with different floating-point semantics than the main
COTS processor.
Jim Demmel volunteered to relay the historical horror stories
regarding heterogeneous floating-point semantics to the DARPA
folks next week.
[Amusing (?) note: There's another company named ClearSpeed
retrying the same basic idea. They have a limited MasPar MP-2 on
a PCI-X card (HTX in development) and yet another damned SIMD C
variant for programming the thing. The public information is
enough to know this is the same old idea, but this company at
least claims full IEEE-754 compliance.
Also, many desktops already have heterogeneous floating-point
units in one system: the x87 and SSE FPUs. The GNU compilers
have tried to use both units at once, but they dropped that idea
because of performance problems and not numerical ones. The
performance killer is comparisons between values in separate
FPUs...]
** Language issues
Going back to the heady days of yore, each new, nondisclosable
system has its own new language semistructured to expose each
new, nondisclosable feature. These languages should make it
easier to exploit the unknown features on unknown hardware that
doesn't exist yet.
Jack Dongarra suggests we propose to rewrite the three amigos in
these HPCS languages. Jim Demmel also wants to investigate
Hessenberg QR reductions.
[I nominate Chevy Chase to be rewritten in Fortress.]
** Fault tolerance
UTK is interested in three levels of fault tolerance on massive
systems:
1) automatic checkpointing and restarting,
2) log-based resumption, and
3) library-level solutions.
The first is the classical solution. Some monitoring system
saves entire process images and restarts them when a node fails.
Parallel systems need consistency between nodes.
The second replays all the messages sent to a node before its
failure to bring it back to the state it had reached.
The third requires redundant representations of the input data
along with more algorithmic and development work.
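As a rough illustration of the third, library-level approach, the
sketch below keeps one redundant checksum block next to the data
blocks so that a single lost block can be rebuilt without a
checkpoint. This is only an assumed example of a redundant
representation, not UTK's design: the function names are
hypothetical, and each Python list stands in for data held on a
separate node.

```python
def add_checksum(blocks):
    """Return the element-wise sum of equal-length blocks.

    The result plays the role of the redundant block stored on an
    extra node.
    """
    return [sum(column) for column in zip(*blocks)]


def recover_block(blocks, checksum, lost):
    """Rebuild blocks[lost] from the checksum and the surviving blocks.

    The contents of blocks[lost] are never read, so they may be stale
    or garbage. In floating point the result matches the original
    only up to rounding error.
    """
    survivors = [b for i, b in enumerate(blocks) if i != lost]
    if not survivors:
        # Only one data block existed, so the checksum is a copy of it.
        return list(checksum)
    partial = [sum(column) for column in zip(*survivors)]
    return [c - p for c, p in zip(checksum, partial)]


# Three "nodes" each hold one block; a fourth holds the checksum.
blocks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
checksum = add_checksum(blocks)

# Pretend node 1 failed and rebuild its block from the others.
rebuilt = recover_block(blocks, checksum, lost=1)
```

The extra algorithmic work mentioned above shows up as soon as the
blocks are modified: every update to the data has to be mirrored in
the checksum for recovery to stay valid.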
* UCB: GAMM meeting in Berlin
Send Jim your slides from SIAM PP04.
* UTK: Remi Delmas's wrappers
Slides sent to some subset of people.
Remi Delmas's script produces an intermediate form and then C and
Matlab wrappers. These are proper low-level wrappers and not a
higher-level interface. There is little error checking; you can
pass the wrong types in and the code will overwrite memory.
Remi and Julien Langou have a higher-level Matlab interface to
the eigenvalue routines which Julien demonstrated at Berkeley.
Julien has sent the higher-level routine to show that writing
higher-level wrappers is a nontrivial task.
* UTK: Divide and conquer failure report and performance
Julien Langou discussed a report of divide and conquer failures
with different compilers, precisions, and systems. Jason Riedy
pointed out that we know how to convince most compilers to break
just about any code. Julien has since mailed the report to the
UCB mailing lists.
[Note: All mailings larger than 40k sit in my mail until I
approve them. Please send _pointers_ to information whenever
reasonable. -- ejr]
Jack Dongarra notes that Remi's Matlab interface makes testing
and timing simple. They were surprised to see that divide and
conquer applied to a symmetrized rand(n,n) (uniformly random
entries) performs about 3x faster than MRRR. For the Gaussian
ensemble, D&C slightly outperforms MRRR. This is rather
surprising to the folks at UCB, who have performance data showing
the opposite situation.
* UCB: Technical progress
** Least-squares refinement
Least-squares refinement is nearing algorithmic completion.
** Interface plans
The iterative refinement routines return many error estimates and
condition numbers. UCB proposes to return them in an array
rather than as separate parameter arrays. The array would
contain the following information for each right-hand side:
1: "Guaranteed" normwise error estimate
2: "Guaranteed" componentwise error estimate
3: Componentwise backward error
4: RCOND = 1/kappa_inf(R*A*C) (traditional RCOND)
5: RCOND_NRM = 1/kappa_inf(Rs*A) where Rs equilibrates
the rows in the 1-norm.
6: RCOND_CMP = 1/kappa_inf(Rs*A*diag(X)).
7: 1/final pivot growth
8: Raw normwise error estimate
9: Raw componentwise error estimate
UCB will send a more detailed proposal. The least-squares
refinement will return even more bounds.
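To make the single-array layout concrete, here is a small sketch
that names the nine slots listed above and reads one slot back for
a given right-hand side. All identifiers are hypothetical, and
storing one row of nine values per right-hand side is an
assumption; the actual names and storage order are left to the
detailed proposal.

```python
from enum import IntEnum


class ErrSlot(IntEnum):
    """1-based slot numbers, following the list in the notes."""
    NORM_ERR_GUARANTEED = 1   # "Guaranteed" normwise error estimate
    COMP_ERR_GUARANTEED = 2   # "Guaranteed" componentwise error estimate
    COMP_BACKWARD_ERR = 3     # Componentwise backward error
    RCOND = 4                 # 1/kappa_inf(R*A*C)
    RCOND_NRM = 5             # 1/kappa_inf(Rs*A)
    RCOND_CMP = 6             # 1/kappa_inf(Rs*A*diag(X))
    INV_PIVOT_GROWTH = 7      # 1/final pivot growth
    NORM_ERR_RAW = 8          # Raw normwise error estimate
    COMP_ERR_RAW = 9          # Raw componentwise error estimate


N_SLOTS = len(ErrSlot)


def new_error_array(nrhs):
    """Allocate one row of N_SLOTS zeros per right-hand side (assumed layout)."""
    return [[0.0] * N_SLOTS for _ in range(nrhs)]


def get_slot(errs, rhs, slot):
    """Read a slot for right-hand side `rhs` (0-based) using a 1-based slot."""
    return errs[rhs][slot - 1]


errs = new_error_array(nrhs=2)
errs[1][ErrSlot.RCOND - 1] = 1.0e-3
rcond_second_rhs = get_slot(errs, 1, ErrSlot.RCOND)
```

A single array keeps the argument list short and lets later
routines append slots, such as the extra least-squares bounds,
without changing the calling sequence.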
** Testing plans
Add new routines to existing tests to ensure all input options
work. To test new numerical functionality, we will construct
special systems (e.g. Hilbert) and test returned answers and
bounds against explicit formulas.
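As one possible shape for such a test, the sketch below builds a
Hilbert matrix and its inverse from the known closed-form
expression, both in exact rational arithmetic, so a solver's
answer and error bounds can be compared against an exact
reference. The function names are hypothetical and this is not
the planned test harness.

```python
from fractions import Fraction
from math import comb


def hilbert(n):
    """Exact n-by-n Hilbert matrix, H[i][j] = 1/(i + j + 1) for 0-based i, j."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]


def hilbert_inverse(n):
    """Exact inverse from the closed form (1-based i, j):

    (-1)^(i+j) (i+j-1) C(n+i-1, n-j) C(n+j-1, n-i) C(i+j-2, i-1)^2
    """
    inv = []
    for i in range(1, n + 1):
        row = []
        for j in range(1, n + 1):
            sign = -1 if (i + j) % 2 else 1
            row.append(sign * (i + j - 1)
                       * comb(n + i - 1, n - j)
                       * comb(n + j - 1, n - i)
                       * comb(i + j - 2, i - 1) ** 2)
        inv.append(row)
    return inv


def exact_solution(n, b):
    """Exact solution of H x = b, for comparison with a computed answer."""
    inv = hilbert_inverse(n)
    return [sum(Fraction(inv[i][j]) * b[j] for j in range(n)) for i in range(n)]


def relative_error(computed, exact):
    """Normwise relative error in the infinity norm, returned as a float."""
    diff = max(abs(Fraction(c) - e) for c, e in zip(computed, exact))
    return float(diff / max(abs(e) for e in exact))
```

A test would then solve the system with the routine under test,
call `relative_error` on the result, and check that the true error
does not exceed the returned "guaranteed" bound.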
* UTK: Multiprecision source generation
Yozo Hida sent around a pointer to his Perl script. The script
transforms double and double-complex routines into single, quad,
and other precisions.
Julien Langou is reading Yozo's script and will compare it to
other options, including NAG's tools.
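For readers unfamiliar with this kind of source generation, here
is a toy sketch of the idea: rewrite the type declarations,
literal exponents, and routine names of a double-precision Fortran
routine to get a single-precision one. It is not Yozo's script
(which is in Perl and covers more precisions). The function name
is hypothetical, and the sketch assumes upper-case source and an
explicit list of routine names to rename.

```python
import re


def double_to_single(source, routines):
    """Toy conversion of upper-case double-precision Fortran to single.

    `routines` lists D-prefixed routine names to rename with an S prefix.
    """
    # Type declarations: DOUBLE PRECISION -> REAL
    out = re.sub(r"\bDOUBLE\s+PRECISION\b", "REAL", source)
    # Literal exponents: 1.0D0 -> 1.0E0. The lookbehind keeps the
    # pattern from firing inside identifiers such as A1D2.
    out = re.sub(r"(?<![\w.])(\d+\.?\d*|\.\d+)D([+-]?\d+)", r"\1E\2", out)
    # Routine names: DGESV -> SGESV, only for the names given.
    for name in routines:
        out = re.sub(r"\b" + re.escape(name) + r"\b", "S" + name[1:], out)
    return out


example = (
    "      DOUBLE PRECISION A\n"
    "      A = 1.0D0\n"
    "      CALL DGESV(N)\n"
)
converted = double_to_single(example, ["DGESV"])
```

A real generator also has to handle complex types, intrinsic
function names, machine constants, and fixed-form column limits,
which is part of why comparing the existing tools is worthwhile.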
