To satisfy their increasing demand for computing power, advanced numerical simulations must harness the ever larger numbers of processors offered by modern capability computing systems, such as the IBM BlueGene/L system JUBL at Forschungszentrum Jülich. Unfortunately, satisfactory speedup on many thousands of processors is extraordinarily hard to achieve. Sustained application performance is often significantly below the achievable limit, which leaves substantial room for optimization. However, the tools that normally assist developers in the optimization process often cease to work satisfactorily at large processor counts.
Event tracing is a well-accepted technique for post-mortem performance analysis of parallel applications. Time-stamped events, such as entering a function or sending a message, are recorded at runtime and analyzed afterwards with the help of software tools. Because event traces preserve the spatial and temporal relationships of individual runtime events, they are especially well suited for detailed inter-process analysis. Automatic off-line trace analysis tools, such as KOJAK, can quickly provide the user with relevant information by automatically searching traces for complex patterns of inefficient behavior and quantifying their significance.
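To make the idea of time-stamped events concrete, the following minimal sketch records enter and exit events at runtime and sums the inclusive time per function afterwards. It is an illustration only and not the KOJAK instrumentation or trace format: the names `traced`, `TRACE`, and `time_per_function`, as well as the tuple layout of an event, are assumptions chosen for this example.

```python
import functools
import time

# Hypothetical in-memory event buffer: (timestamp, event kind, function name).
TRACE = []

def traced(func):
    """Record a time-stamped event on entering and on leaving func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append((time.perf_counter(), "enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            TRACE.append((time.perf_counter(), "exit", func.__name__))
    return wrapper

def time_per_function(trace):
    """Post-mortem analysis: inclusive time per function from enter/exit pairs."""
    stack = []   # currently open (enter timestamp, function name) pairs
    totals = {}
    for timestamp, kind, name in trace:
        if kind == "enter":
            stack.append((timestamp, name))
        else:
            start, entered = stack.pop()
            totals[entered] = totals.get(entered, 0.0) + (timestamp - start)
    return totals
```

Because every event keeps its timestamp and its position in the sequence, the analysis step can reconstruct the nesting of calls after the run has finished, which a plain per-function counter could not.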
Unfortunately, sequentially analyzing a single trace file does not scale to applications running on thousands of processors. In the SCALASCA project (SCalable performance Analysis of LArge SCale Applications), we have developed a scalable trace analysis tool based on the KOJAK approach that exploits both the distributed memory and the parallel processing capabilities available on modern large-scale systems. Instead of sequentially analyzing a single global trace file, SCALASCA analyzes separate local trace files in parallel by replaying the original communication on as many CPUs as were used to execute the target application itself. Whereas the performance tool used to be a sequential program analyzing a parallel program, SCALASCA is a parallel program in its own right.
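The replay idea can be sketched as follows, with one analysis worker per local trace. This is a simplified illustration under stated assumptions, not the SCALASCA implementation: threads and in-memory queues stand in for the analysis processes and their message passing, the event layout and the names `LOCAL_TRACES`, `analyze_rank`, and `replay` are hypothetical, and the pattern searched for (a receiver waiting for a sender that starts late) is chosen here only as an example of inefficient behavior.

```python
import queue
import threading

# Hypothetical local traces, one per rank: (timestamp, event kind, peer rank).
LOCAL_TRACES = {
    0: [(1.0, "enter_send", 1), (5.0, "enter_send", 1)],
    1: [(0.5, "enter_recv", 0), (6.0, "enter_recv", 0)],
}

def analyze_rank(rank, trace, channels, results):
    """Walk one local trace and replay its communication events."""
    waiting = 0.0
    for timestamp, kind, peer in trace:
        if kind == "enter_send":
            # Re-enact the original message; the payload is the send timestamp.
            channels[(rank, peer)].put(timestamp)
        elif kind == "enter_recv":
            # Receive the sender's timestamp and compare it with the local one.
            send_time = channels[(peer, rank)].get()
            waiting += max(0.0, send_time - timestamp)
    results[rank] = waiting

def replay(traces):
    """Analyze all local traces concurrently, one worker per original rank."""
    ranks = list(traces)
    channels = {(s, d): queue.Queue() for s in ranks for d in ranks if s != d}
    results = {}
    workers = [
        threading.Thread(target=analyze_rank, args=(r, traces[r], channels, results))
        for r in ranks
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

With the sample traces above, `replay(LOCAL_TRACES)` attributes 0.5 time units of waiting to rank 1, whose first receive was posted at 0.5 while the matching send started at 1.0. No worker ever reads another rank's trace; the information needed to relate a receive to its send travels along the same communication path the application used, which is why the analysis needs as many workers as the original run had processes.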