Overview

The first release of the benchmark comprises the following tests.

  • HPL benchmark (MPI on whole system)
  • DGEMM (single CPU)
  • *DGEMM (embarrassingly parallel)
  • STREAM (single CPU)
  • *STREAM (embarrassingly parallel)
  • PTRANS A = A + B^T (MPI on whole system)
  • RandomAccess (single CPU)
  • RandomAccess (MPI on whole system)
  • MPI-FFTE (MPI on whole system)
  • FFTE (single CPU)
  • *FFTE (embarrassingly parallel)
  • *RandomAccess (embarrassingly parallel)
  • Latency/Bandwidth (under varying conditions and between multiple pairs of nodes)
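
As an illustration of the PTRANS operation listed above, A = A + B^T, the following is a minimal single-process sketch in C. It is not the benchmark's code: the benchmark runs PTRANS over MPI on the whole system, while this sketch only shows the arithmetic on one square matrix. The function name add_transpose and the row-major layout are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the benchmark source.
 * Computes A = A + B^T in place for square n-by-n matrices
 * stored in row-major order. a and b must not overlap. */
void add_transpose(double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            /* Element (i, j) of B^T is element (j, i) of B. */
            a[i * n + j] += b[j * n + i];
        }
    }
}
```

For a 2-by-2 case with A = [[1, 2], [3, 4]] and B = [[10, 20], [30, 40]], calling add_transpose(a, b, 2) leaves A = [[11, 32], [23, 44]].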


Rules for Running the Benchmark

One baseline run must be submitted for each computer system entered in the archive. An optimized run may also be submitted for each computer system.
  1. Baseline Runs
    Optimizations as described below are allowed.
    1. Compile and load options
      Compiler or loader flags which are supported and documented by the supplier are allowed. These include flags for porting, optimization, and preprocessor invocation.
    2. Libraries
      Linking to optimized versions of the following libraries is allowed:
      • BLAS
      • FFT
      • MPI
      Acceptable use of such libraries is subject to the following rules:
      • All libraries used shall be disclosed with the results submission. Each library shall be identified by library name, revision, and institution supplying the source code.
      • Libraries which are not generally available are not permitted unless they are made available by the reporting organization within 6 months. Upon request, these libraries should be usable by others (possibly under NDA).
      • Calls to library subroutines should have the same syntax and semantics as in the released benchmark code. Code modifications to accommodate various library call formats are not allowed.
    3. Software Tools
      Any tools used to build and run the benchmark (including pre-processors, compilers, static and dynamic linkers, and operating systems) must be generally available on the tested system, or must be made available by the reporting organization within 6 months.
    4. Only complete benchmark output may be submitted; partial results will not be accepted.
  2. Optimized Runs
    1. Libraries
      Linking to optimized versions of the following libraries is allowed:
      • BLAS
      • FFT
      • MPI
      Upon request, these libraries should be usable by others (possibly under NDA).
    2. Code modification
      Provided that the input and output specification is preserved, the following routines may be substituted:
      • In HPL: HPL_pdgesv(), HPL_pdtrsv() (factorization and substitution functions)
      • In DGEMM: no changes are allowed in the testing harness; a substituted DGEMM routine (if any) should conform to the BLAS definition
      • In PTRANS: pdtrans()
      • In STREAM: tuned_STREAM_Copy(), tuned_STREAM_Scale(), tuned_STREAM_Add(), tuned_STREAM_Triad()
      • In RandomAccess: Power2NodesMPIRandomAccessUpdate(), AnyNodesMPIRandomAccessUpdate(), and RandomAccessUpdate()
      • In FFTE: fftw_malloc(), fftw_free(), fftw_create_plan(), fftw_one(), fftw_destroy_plan(), fftw_mpi_create_plan(), fftw_mpi_local_sizes(), fftw_mpi(), fftw_mpi_destroy_plan() (all these functions are compatible with FFTW 2.1.5 so the benchmark code can be directly linked against FFTW 2.1.5 by only adding proper compiler and linker flags including -DUSING_FFTW)
      • In b_eff: changes are allowed in parts of the component, but portability and conformance to the MPI standard (MPI 1.1 or later) must be preserved. A detailed list of removed and added MPI function calls must be provided upon submission. Modified source code is subject to review by the HPC Challenge Committee.
    3. Limitations of Optimization
      • Code with limited calculation accuracy
        The calculation should be carried out in full precision (64-bit or the equivalent). However, the substitution of algorithms is allowed (see "Exchange of the used mathematical algorithm").
      • Exchange of the used mathematical algorithm
        Any change of algorithms must be fully disclosed and is subject to review by the HPC Challenge Committee. Passing the verification test is a necessary condition for such approval. The substituted algorithm must be as robust as the baseline algorithm. For the matrix multiply in the HPL benchmark, the Strassen algorithm may not be used, as it changes the operation count of the algorithm.
      • Using the knowledge of the solution
        Any modification of the code or input data sets, which uses the knowledge of the solution or of the verification test, is not permitted.
      • Code to circumvent the actual computation
        Any modification of the code to circumvent the actual computation is not permitted.
    4. Only complete benchmark output may be submitted; partial results will not be accepted.
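
The rules above require a substituted DGEMM routine to conform to the BLAS definition and to compute in full 64-bit precision. The following C sketch shows one way a submitter might sanity-check a replacement routine before running the benchmark. It is an illustration only, not the benchmark's verification test: the names ref_dgemm_nn and max_abs_diff are hypothetical, and only the non-transposed, column-major case C = alpha*A*B + beta*C is covered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine, not part of the benchmark source.
 * Computes C = alpha*A*B + beta*C in 64-bit floating point.
 * A is m-by-k, B is k-by-n, C is m-by-n, all column-major with
 * leading dimensions lda, ldb, ldc. */
void ref_dgemm_nn(size_t m, size_t n, size_t k, double alpha,
                  const double *a, size_t lda,
                  const double *b, size_t ldb,
                  double beta, double *c, size_t ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++) {
            double sum = 0.0;
            for (size_t p = 0; p < k; p++)
                sum += a[i + p * lda] * b[p + j * ldb];
            /* As in BLAS, C is not read when beta is zero. */
            c[i + j * ldc] = (beta == 0.0)
                                 ? alpha * sum
                                 : alpha * sum + beta * c[i + j * ldc];
        }
    }
}

/* Largest absolute element-wise difference between two m-by-n
 * column-major matrices, e.g. a reference result and a candidate. */
double max_abs_diff(size_t m, size_t n,
                    const double *x, size_t ldx,
                    const double *y, size_t ldy)
{
    double worst = 0.0;
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++) {
            double d = x[i + j * ldx] - y[i + j * ldy];
            if (d < 0.0)
                d = -d;
            if (d > worst)
                worst = d;
        }
    }
    return worst;
}
```

A candidate routine would be run on the same inputs as ref_dgemm_nn and the two outputs compared with max_abs_diff. The acceptable tolerance is left to the reader, since differences in summation order can legitimately change the last few bits of each result.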

Jun 23 2022