Base Results
Optimized Results
Base and Optimized
Manufacturer/Processor Type, Speed, Count, Threads, Processes
Lists the manufacturer and processor type, processor speed, number of processors, number of threads, and number of processes.
Hovering over this column in any row displays additional information: manufacturer, system name, interconnect, MPI, affiliation, and submission date.

Run Type

Indicates whether the benchmark was a base run or an optimized run.

Processors

The number of processors used in the benchmark, as entered in the form by the benchmark submitter.








The values plotted for HPL, PTRANS, RandomAccess, and FFTE are per processor. The values plotted for SN-DGEMM and SN-STREAM are per thread. The value plotted for RandomRing Latency is normalized using its reciprocal, so that a larger plotted value means lower latency. Only systems that have values for all the plotted tests are available for the diagram. Up to 6 systems can be selected for plotting in the Kiviat diagram.
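The normalization described above can be sketched as follows. This is only an illustration: the page does not say how each axis is scaled, so dividing by the largest value among the selected systems is an assumption, and the metric keys and the function name `kiviat_axes` are hypothetical. The two example rows use values from the table on this page.

```python
# Hypothetical metric keys, one per plotted test.
METRICS = ["hpl", "ptrans", "random_access", "stream", "fft", "dgemm",
           "ring_bandwidth", "ring_latency"]


def kiviat_axes(systems, max_systems=6):
    """Return per-axis values for each selected system.

    Latency is replaced by its reciprocal so that, like every other axis,
    a larger value means better performance. Scaling each axis by the
    largest value among the selected systems is an assumption.
    """
    if len(systems) > max_systems:
        raise ValueError("select at most %d systems" % max_systems)
    # Only systems with a value for every plotted test can be drawn.
    complete = {name: m for name, m in systems.items()
                if all(m.get(k) is not None for k in METRICS)}
    if not complete:
        return {}
    raw = {}
    for name, m in complete.items():
        row = dict(m)
        row["ring_latency"] = 1.0 / m["ring_latency"]  # reciprocal of latency
        raw[name] = row
    scaled = {name: {} for name in raw}
    for k in METRICS:
        top = max(row[k] for row in raw.values())
        for name, row in raw.items():
            scaled[name][k] = row[k] / top if top > 0 else 0.0
    return scaled


example = {
    "Cray XT5 (196608 processors)": dict(
        hpl=0.00681, ptrans=0.00961, random_access=0.0001853, stream=2.118,
        fft=0.05442, dgemm=9.60, ring_bandwidth=0.0404, ring_latency=15.99),
    "IBM Blue Gene/L (1024 processors)": dict(
        hpl=0.00139, ptrans=0.02734, random_access=0.0001316, stream=1.410,
        fft=0.04876, dgemm=2.55, ring_bandwidth=0.0346, ring_latency=4.83),
}
axes = kiviat_axes(example)
```

With this scaling, the system with the lowest latency gets the largest value on the latency axis, which matches the intent of plotting the reciprocal.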

Systems for Kiviat Chart - Optimized Runs Only - 29 Systems - Generated on Sun Nov 22 19:04:22 2009
Manufacturer | Processor Type | Processor Speed | Processor Count | Threads | Processes | System Name | Interconnect | MPI | Affiliation | Submission Date | PP-HPL (TFlop/s) | PP-PTRANS (GB/s) | PP-RandomAccess (GUP/s) | PT-SN-STREAM Triad (GB/s) | PP-FFTE (GFlop/s) | PT-SN-DGEMM (GFlop/s) | RandomRing Bandwidth (GB/s) | RandomRing Latency (usec)
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Cray | AMD Opteron | 2.6GHz | 196608 | 3 | 65536 | XT5 | Seastar | MPT 3.4.2 | Oak Ridge National Laboratory | 11-10-09 | 0.00681 | 0.00961 | 0.0001853 | 2.118 | 0.05442 | 9.60 | 0.0404 | 15.99
Cray | AMD Opteron | 2.6GHz | 223112 | 2 | 111556 | XT5 | Seastar | MPT 3.4.2 | Oak Ridge National Laboratory | 11-10-09 | 0.00658 | 0.06151 | 0.0001689 | 5.232 | 0.01739 | 9.74 | 0.0264 | 31.09
Cray Inc. | Cray X1E | 1.13GHz | 248 | 1 | 248 | mfeg8 | Modified 2D Torus | mpt 2.4 | Cray | 06-15-05 | 0.01366 | 0.26617 | 0.0074788 | 32.821 | -0.00403 | 14.77 | 0.2989 | 14.58
Cray Inc. | AMD Opteron | 2.4GHz | 12960 | 1 | 25920 | Red Storm/XT3 | Cray custom | MPICH 2 v1.0.2 | NNSA/Sandia National Laboratories | 11-10-06 | 0.00702 | 0.18144 | 0.0023008 | 4.466 | 0.11799 | 4.41 | 0.0591 | 15.76
Cray Inc. | AMD Opteron | 2.4GHz | 12800 | 1 | 25600 | Red Storm/XT3 | Seastar | xt-mpt/1.5.39 based on MPICH 2.0 | DOE/NNSA/Sandia National Laboratories | 11-06-07 | 0.00731 | 0.39013 | 0.0026221 | 4.846 | 0.11839 | 4.41 | 0.0424 | 19.25
Cray Inc. | AMD Opteron | 2.4GHz | 12960 | 1 | 25920 | Red Storm/XT3 | Seastar | xt-mpt/1.5.39 based on MPICH 2.0 | DOE/NNSA/Sandia National Laboratories | 11-06-07 | 0.00719 | 0.18298 | 0.0022728 | 3.224 | 0.22152 | 4.41 | 0.0444 | 19.58
Cray Inc. | Cray X1E | 1.13GHz | 1008 | 1 | 1008 | X1 | Cray Modified 2D torus | MPT | DOE/Office of Science/ORNL | 11-02-05 | 0.01217 | 0.14382 | 0.0076272 | 32.849 | 0.24315 | 15.12 | 0.1532 | 16.30
Cray Inc. | AMD Opteron | 2.4GHz | 5208 | 1 | 5208 | XT3 | Cray Seastar | xt-mpt/1.3.07 | Oak Ridge National Laboratory, DOE Office of Science | 11-10-05 | 0.00392 | 0.18092 | 0.0001267 | 5.626 | 0.14966 | 4.41 | 0.2047 | 9.33
Cray Inc. | AMD Opteron | 2.4GHz | 5208 | 1 | 5208 | XT3 | Cray Seastar | xt-mpt/1.3.07 | Oak Ridge National Laboratories - DOE Office of Science | 11-12-05 | 0.00392 | 0.18092 | 0.0001267 | 5.626 | 0.14966 | 4.41 | 0.2047 | 9.33
Cray Inc. | AMD Opteron | 2.4GHz | 5208 | 1 | 5208 | XT3 | Cray Seastar | xt-mpt/1.3.07 | Oak Ridge National Lab - DOD Office of Science | 11-12-05 | 0.00390 | 0.18130 | 0.0001320 | 5.633 | 0.16422 | 4.42 | 0.1988 | 9.18
Cray Inc. | AMD Opteron | 2.6GHz | 10404 | 1 | 10404 | XT3 Dual-Core | Cray SeaStar | xt-mpt 1.5.25 | Oak Ridge National Lab | 11-06-06 | 0.00418 | 0.19597 | 0.0010257 | 5.174 | 0.10791 | 4.80 | 0.0820 | 17.04
Cray, Inc. | AMD Opteron | 2.6GHz | 98304 | 3 | 32768 | XT5 | SeaStar 2+ | MPT 3.4.2 | National Institute for Computational Sciences | 11-02-09 | 0.00669 | 0.01587 | 0.0001882 | 2.664 | 0.07659 | 9.71 | 0.0559 | 15.45
IBM | IBM PowerPC 440 | 0.7GHz | 1024 | 1 | 1024 | Blue Gene/L | Custom | MPICH 1.0 customized for Blue Gene/L | Blue Gene Computational Center at IBM T.J. Watson Research Center | 04-11-05 | 0.00139 | 0.02734 | 0.0001316 | 1.410 | 0.04876 | 2.55 | 0.0346 | 4.83
IBM | IBM PowerPC 440 | 0.7GHz | 131072 | 1 | 65536 | Blue Gene/L | Custom Torus / Tree | MPICH2 1.0.1 | National Nuclear Security Administration | 11-02-05 | 0.00192 | 0.00282 | 0.0002706 | 2.460 | 0.01763 | 2.07 | 0.0111 | 7.89
IBM | IBM PowerPC 440 | 0.7GHz | 131072 | 1 | 65536 | Blue Gene/L | Custom Torus / Tree | MPICH2 1.0.1 | National Nuclear Security Administration | 11-02-05 | 0.00198 | 0.00286 | 0.0002516 | 2.440 | 0.01700 | 2.32 | 0.0111 | 7.78
IBM | IBM PowerPC 440 | 0.7GHz | 32768 | 1 | 16384 | Blue Gene/L | Blue Gene Custom Interconnect | MPICH 1.1 | IBM T.J. Watson Research Center | 11-04-05 | 0.00205 | 0.00419 | 0.0005277 | 2.441 | 0.03016 | 2.32 | 0.0219 | 5.88
IBM | PowerPC 450 | 0.85GHz | 32768 | 4 | 32768 | Blue Gene/P | Torus | MPICH 2 | Argonne National Lab - LCF | 11-17-08 | 0.00529 | 0.01908 | 0.0031488 | 0.995 | 0.15502 | 2.42 | 0.0220 | 6.24
IBM | Power PC 450 | 0.85GHz | 131072 | 4 | 32768 | Dawn | Custom Torus + Tree + Barrier | MPICH2 1.0.7 | NNSA - Lawrence Livermore National Laboratory | 11-11-09 | 0.00281 | 0.00578 | 0.0008936 | 0.995 | 0.02442 | 2.77 | 0.0223 | 5.59
IBM | IBM Power5+ | 2.2GHz | 64 | 1 | 64 | P5 P575+ | HPS | poe 4.2.2.3 | IBM | 05-08-06 | 0.00768 | 0.69218 | 0.0041258 | 12.790 | 0.36323 | 8.38 | 0.2692 | 8.99
IBM | IBM Power5+ | 2.2GHz | 128 | 1 | 128 | P5 P575+ | HPS | poe 4.2.2.3 | IBM | 05-08-06 | 0.00774 | 0.70306 | 0.0034272 | 12.746 | 0.32407 | 8.47 | 0.2181 | 9.67
NEC | NEC SX-7 | 0.552GHz | 32 | 1 | 32 | NEC SX-7 | non | MPI/SX 7.0.6 | Tohoku University, Information Synergy Center | 03-24-06 | 0.00825 | 1.13035 | 0.0080963 | 35.340 | 2.48367 | 8.83 | 10.1288 | 14.80
NEC | NEC SX-7 | 0.552GHz | 32 | 16 | 2 | NEC SX-7 | non | MPI/SX 7.0.6 | Tohoku University, Information Synergy Center | 03-24-06 | 0.00557 | 0.68809 | 0.0045895 | 34.589 | 0.25001 | 8.83 | 15.7361 | 4.83
NEC | NEC SX-8 | 2GHz | 40 | 1 | 40 | NEC SX-7C | IXS | MPI/SX 7.1.3 | Tohoku University, Information Synergy Center | 03-24-06 | 0.01528 | 1.75225 | 0.0002130 | 63.206 | 2.32069 | 15.90 | 1.3304 | 10.33
NEC | NEC SX-8 | 2GHz | 40 | 8 | 5 | NEC SX-7C | IXS | MPI/SX 7.1.3 | Tohoku University, Information Synergy Center | 03-24-06 | 0.00756 | 0.50325 | 0.0000536 | 37.330 | 0.74057 | 15.95 | 12.3982 | 6.67
NEC | NEC SX-9 | 3.2GHz | 32 | 16 | 2 | SX-9 | IXS | MPI/SX 8.0.0/ISC | TOHOKU UNIVERSITY | 11-06-08 | 0.05704 | 4.03062 | 0.0030401 | 176.007 | 1.81197 | 82.71 | 26.0639 | 5.12
NEC | NEC SX-9 | 3.2GHz | 256 | 1 | 256 | SX-9 | IXS | MPI/SX 8.0.0/ISC | TOHOKU UNIVERSITY | 11-06-08 | 0.07886 | 3.04226 | 0.0054730 | 224.168 | 9.28637 | 86.26 | 3.6404 | 9.40
NEC | SX-9 | 3.2GHz | 960 | 1 | 960 | SX-9 | IXS | MPI/SX 8.0.10 | Japan Agency for Marine-Earth Science and Technology (JAMSTEC) | 11-11-09 | 0.08286 | 2.41363 | 0.0021560 | 230.411 | 7.23166 | 88.39 | 2.5100 | 13.84
NEC | SX-9 | 3.2GHz | 8 | 1 | 2 | SX-9 | IXS | MPI/SX 8.0.10 | Japan Agency for Marine-Earth Science and Technology (JAMSTEC) | 11-16-09 | 0.02600 | 8.75649 | 0.0196734 | 908.373 | 0.05804 | 326.96 | 72.5959 | 4.15
NEC | SX-9 | 3.2GHz | 16 | 1 | 2 | SX-9 | IXS | MPI/SX 8.0.10 | Japan Agency for Marine-Earth Science and Technology (JAMSTEC) | 11-16-09 | 0.03684 | 6.29381 | 0.0056895 | 1473.900 | 0.03009 | 720.39 | 22.6232 | 6.81

 

Column Definitions
PP-HPL ( per processor )
Solves a randomly generated dense linear system of equations in double floating-point precision (IEEE 64-bit) arithmetic using MPI. The linear system matrix is stored in a two-dimensional block-cyclic fashion and multiple variants of code are provided for computational kernels and communication patterns. The solution method is LU factorization through Gaussian elimination with partial row pivoting followed by a backward substitution. Unit: Tera Flops per Second
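The solution method named above (Gaussian elimination with partial row pivoting, then backward substitution) can be illustrated on a small serial problem. This is a minimal sketch, not the HPL code: it ignores MPI and block-cyclic storage, and the function name is hypothetical.

```python
def solve_with_partial_pivoting(a, b):
    """Solve a x = b for a dense square system given as lists of floats."""
    n = len(a)
    a = [row[:] for row in a]  # work on copies so the inputs are untouched
    b = b[:]
    for k in range(n):
        # Partial row pivoting: bring the largest entry of column k to row k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
        # Eliminate column k from all rows below the pivot row.
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Backward substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x
```

The second test case below has a zero in the top-left corner, which is the situation row pivoting exists to handle.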
PP-PTRANS (A=A+B^T, MPI) ( per processor )
Implements a parallel matrix transpose for two-dimensional block-cyclic storage. It is an important benchmark because it exercises the communications of the computer heavily on a realistic problem where pairs of processors communicate with each other simultaneously. It is a useful test of the total communications capacity of the network. Unit: Giga Bytes per Second
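To see why pairs of processors exchange data in this test, the sketch below gives a serial reference for A = A + B^T together with a simple two-dimensional block-cyclic ownership rule. It assumes a square matrix of blocks and a p x q process grid in which block (i, j) belongs to process (i mod p, j mod q); the function names are hypothetical and no MPI is involved.

```python
def add_transpose(a, b):
    """Serial reference for A = A + B^T on square lists of lists."""
    n = len(a)
    return [[a[i][j] + b[j][i] for j in range(n)] for i in range(n)]


def owner(block_row, block_col, p, q):
    """Process grid coordinates that own a block under block-cyclic storage."""
    return (block_row % p, block_col % q)


def communicating_pairs(num_blocks, p, q):
    """Return the set of (sender, receiver) process pairs for the transpose.

    Block (i, j) of A needs block (j, i) of B, so the owner of B's block
    must send it whenever it differs from the owner of A's block.
    """
    pairs = set()
    for i in range(num_blocks):
        for j in range(num_blocks):
            dest = owner(i, j, p, q)
            src = owner(j, i, p, q)
            if src != dest:
                pairs.add((src, dest))
    return pairs
```

On a 2 x 2 grid with 2 x 2 blocks, the only traffic is the mutual exchange between processes (0, 1) and (1, 0), which is the simultaneous pairwise pattern the description refers to.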
PP-RandomAccess ( per processor )
RandomAccess, also called GUPS, measures the rate at which the computer can update pseudo-random locations of its memory. The rate is expressed in billions (giga) of updates per second (GUP/s). Unit: Giga Updates per Second
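The measurement can be sketched in a few lines. The version below is an assumption-laden illustration: it uses an XOR update driven by a 64-bit shift-register sequence with polynomial 0x7 and four updates per table entry, which is how the benchmark is commonly described, but it runs serially in Python and its timings say nothing about the systems listed here. The function names are hypothetical.

```python
import time

POLY = 0x7                # feedback polynomial assumed for the sequence
MASK64 = (1 << 64) - 1    # keep values within 64 bits


def next_random(ran):
    """Advance the 64-bit shift-register sequence by one step."""
    high_bit = ran >> 63
    ran = (ran << 1) & MASK64
    return ran ^ POLY if high_bit else ran


def random_access_gups(log2_size=10, updates_per_entry=4):
    """Return (giga-updates per second, final table) for a small table."""
    size = 1 << log2_size
    table = list(range(size))
    n_updates = updates_per_entry * size
    ran = 1
    start = time.perf_counter()
    for _ in range(n_updates):
        ran = next_random(ran)
        table[ran & (size - 1)] ^= ran  # update a pseudo-random location
    elapsed = max(time.perf_counter() - start, 1e-12)
    return n_updates / elapsed / 1e9, table
```

Because consecutive updates land on unrelated table entries, the rate is limited by memory and network latency rather than by arithmetic.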
PT-SN-STREAM ( per thread )
The Single Process STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth and the corresponding computation rate for simple numerical vector kernels. It is run on a single computational process chosen at random. Unit: Giga Bytes per Second
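The results table reports the Triad kernel of STREAM, which computes a[i] = b[i] + scalar * c[i]. The sketch below times that kernel and converts it to a bandwidth, assuming the usual STREAM accounting of three 8-byte accesses per element (two reads and one write). It is a pure-Python illustration with a hypothetical function name, so its numbers are not comparable to the table.

```python
import time
from array import array


def triad_bandwidth(n=100_000, scalar=3.0):
    """Return (GB/s, result vector) for one pass of the Triad kernel."""
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    a = array("d", [0.0]) * n
    start = time.perf_counter()
    for i in range(n):
        a[i] = b[i] + scalar * c[i]
    elapsed = max(time.perf_counter() - start, 1e-12)
    bytes_moved = 3 * 8 * n  # read b, read c, write a; 8 bytes per double
    return bytes_moved / elapsed / 1e9, a
```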
PP-FFTE ( per processor )
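Measures the floating-point execution rate of a double-precision complex one-dimensional Discrete Fourier Transform (DFT). This is the same test as the single-process FFT, but run across the entire system by distributing the input vector in block fashion across all the processes. Unit: Giga Flops per Second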
PT-SN-DGEMM ( per thread )
The Single Process DGEMM benchmark measures the floating-point execution rate of double-precision real matrix-matrix multiply performed by the DGEMM subroutine from the BLAS (Basic Linear Algebra Subprograms). It is run on a single computational process chosen at random. Unit: Giga Flops per Second
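DGEMM computes C = alpha * A * B + beta * C. As a rough illustration of how a floating-point rate is derived from it, the sketch below uses a plain triple loop as a stand-in for the BLAS routine and counts 2 * n^3 operations for an n x n multiply, which is the conventional count and an assumption here. The function names are hypothetical.

```python
import time


def dgemm(alpha, a, b, beta, c):
    """Return alpha * A * B + beta * C for dense lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for p in range(k):
            aip = alpha * a[i][p]
            for j in range(n):
                out[i][j] += aip * b[p][j]
    return out


def dgemm_gflops(n=64):
    """Time one n x n multiply and convert it to GFlop/s."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    c = [[0.0] * n for _ in range(n)]
    start = time.perf_counter()
    dgemm(1.0, a, b, 0.0, c)
    elapsed = max(time.perf_counter() - start, 1e-12)
    return 2.0 * n ** 3 / elapsed / 1e9  # one multiply and one add per term
```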
Random Ring Bandwidth ( per process )
Randomly Ordered Ring Bandwidth reports the bandwidth achieved in the ring communication pattern. The communicating processes are ordered randomly in the ring (with respect to the natural ordering of the MPI default communicator). The result is averaged over various random assignments of processes in the ring. Unit: Giga Bytes per Second
Random Ring Latency ( per process )
Randomly Ordered Ring Latency reports the latency in the ring communication pattern. The communicating processes are ordered randomly in the ring (with respect to the natural ordering of the MPI default communicator). The result is averaged over various random assignments of processes in the ring. Unit: microseconds
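The random ordering and averaging used by both ring tests can be sketched without MPI. In the illustration below, `measure` is a hypothetical callback standing in for the actual bandwidth or latency measurement on one ring, and the number of trials and the seed are assumptions.

```python
import random


def random_ring_neighbours(num_procs, rng):
    """Place ranks on a ring in random order; map rank -> (left, right)."""
    order = list(range(num_procs))
    rng.shuffle(order)  # random order relative to the natural rank order
    neighbours = {}
    for pos, rank in enumerate(order):
        left = order[pos - 1]                 # wraps around at position 0
        right = order[(pos + 1) % num_procs]  # wraps around at the end
        neighbours[rank] = (left, right)
    return neighbours


def average_over_rings(num_procs, measure, trials=5, seed=0):
    """Average a per-ring measurement over several random rings."""
    rng = random.Random(seed)
    results = [measure(random_ring_neighbours(num_procs, rng))
               for _ in range(trials)]
    return sum(results) / len(results)
```

Shuffling the ranks means that ring neighbours are usually not physical neighbours, so the result reflects the interconnect as a whole rather than the best-case nearest-neighbour link.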



