Base Results
Optimized Results
Base and Optimized
Manufacturer/Processor Type - Speed - Count - Threads - Processes
Includes the manufacturer/processor type, processor speed, number of processors, threads, and number of processes.
Move the mouse over this column in any row to display additional information, including manufacturer, system name, interconnect, MPI, affiliation, and submission date.

Computer System
Name and version of Message Passing Interface (MPI) implementation.

Run Type

Run Type, indicates whether the benchmark was a base run or was optimized.

Processors

Processors, this is the number of processors used in the benchmark, entered in the form by the benchmark submitter.

G-HPL ( system performance )
HPL, solves a randomly generated dense linear system of equations in double floating-point precision (IEEE 64-bit) arithmetic using MPI. The linear system matrix is stored in a two-dimensional block-cyclic fashion and multiple variants of code are provided for computational kernels and communication patterns. The solution method is LU factorization through Gaussian elimination with partial row pivoting followed by a backward substitution. Unit: Tera Flops per Second
G-PTRANS (A=A+B^T, MPI) ( system performance )
PTRANS (A=A+B^T, MPI), implements a parallel matrix transpose for two-dimensional block-cyclic storage. It is an important benchmark because it exercises the communications of the computer heavily on a realistic problem where pairs of processors communicate with each other simultaneously. It is a useful test of the total communications capacity of the network. Unit: Giga Bytes per Second
S-DGEMM ( single MPI process )
The Single MPI process DGEMM benchmark measures the floating-point execution rate of double precision real matrix-matrix multiply performed by the DGEMM subroutine from the BLAS (Basic Linear Algebra Subprograms). It is run on a single computational process chosen at random. Unit: Giga Flops per Second
EP-DGEMM ( embarrassingly parallel )
Embarrassingly Parallel DGEMM, benchmark measures the floating-point execution rate of double precision real matrix-matrix multiply performed by the DGEMM subroutine from the BLAS (Basic Linear Algebra Subprograms). It is run in embarrassingly parallel manner - all computational processes perform the benchmark at the same time, the arithmetic average rate is reported. Unit: Giga Flops per Second
S-STREAM ( single MPI process )
The Single MPI process STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels. It is run on a single computational process chosen at random. Unit: Giga Bytes per Second
EP-STREAM ( per process )
The Embarrassingly Parallel STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels. It is run in embarrassingly parallel manner - all computational processes perform the benchmark at the same time, the average is computed. Unit: Giga Bytes per Second
S-Random Access ( single MPI process )
Single MPI process Random Access (also called GUPs), measures the rate at which the computer can update pseudo-random locations of its memory. The single CPU version runs the code locally on a randomly chosen processor. No explicit communication is performed and so the performance of the local memory subsystem is revealed. Unit: Giga Updates per Second
EP-RandomAccess ( embarrassingly parallel )
Embarrassingly Parallel Random Access (also called GUPs), measures the rate at which the computer can update pseudo-random locations of its memory. The embarrassingly parallel version runs the code locally on each processor. No explicit communication is performed (but shared-memory effects might occur). Unit: Giga Updates per Second
G-Random Access ( system performance )
Global Random Access (also called GUPs), measures the rate at which the computer can update pseudo-random locations of its memory - this rate is expressed in billions (giga) of updates per second (GUP/s). The MPI version generates the updating sequence locally and then distributes it using all-to-all collective communication. Unit: Giga Updates per Second
S-FFTE ( single MPI process )
Single MPI process FFTE measures the floating-point rate of execution of a double precision complex one-dimensional Discrete Fourier Transform (DFT). The vector size is a power of two. Unit: Giga Flops per Second
EP-FFTE ( embarrassingly parallel )
Embarrassingly Parallel FFTE, performs the same test as FFT but in embarrassingly parallel fashion - the code is run locally on each processor. No explicit communication is performed (but shared-memory effects might occur). Unit: Giga Flops per Second
G-FFTE ( system performance )
Global FFTE, performs the same test as FFT but across the entire system by distributing the input vector in block fashion across all the processes. Unit: Giga Flops per Second
Maximum Ping-Pong Latency
Maximum Ping-Pong Latency, reports the maximum latency for a number of non-simultaneous ping-pong tests. The ping-pongs are performed between as many as possible (there is an upper bound on the time it takes to complete this test) distinct pairs of processors. The test uses MPI standard send and receive routines. Unit: micro-seconds
Randomly-Ordered Ring Latency ( per process )
Randomly-Ordered Ring Latency, reports latency in the ring communication pattern. The communicating processes are ordered randomly in the ring. The result is averaged over various random assignments of processes in the ring.
Unit: micro-seconds
Minimum Ping-Pong Bandwidth
Minimum Ping-Pong Bandwidth, reports the minimum bandwidth for a number of non-simultaneous ping-pong tests. The ping-pongs are performed between as many as possible (there is an upper bound on the time it takes to complete this test) distinct pairs of processors. The test uses MPI standard send and receive routines. Unit: Giga Bytes per second
Randomly Ordered Ring Bandwidth ( per process )
Randomly Ordered Ring Bandwidth, reports bandwidth achieved in the ring communication pattern. The communicating processes are ordered randomly in the ring. The result is averaged over various random assignments of processes in the ring.
Unit: Giga Bytes per second per process
Naturally Ordered Ring Bandwidth ( per process )
Naturally Ordered Ring Bandwidth, reports bandwidth achieved in the ring communication pattern. The ring is formed with consecutive processes in MPI_COMM_WORLD.
Unit: Giga Bytes per second per process
HPCC Results - Optimized Runs Only - 33 Systems - Generated on Sun Nov 22 18:27:24 2009
System Information
System - Processor - Speed - Count - Threads - Processes
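Column groups: G-HPL, G-PTRANS, S-STREAM, EP-STREAM, Random Access, Latency, Bandwidth, DGEMM, FFTE
Sub-columns: S-STREAM and EP-STREAM (Copy, Scale, Add, Triad); Random Access (S, EP, G); Latency (PingPong Max., Random Ring); Bandwidth (PingPong Min., Random Ring, Natural Ring); DGEMM (S, EP); FFTE (S, EP, G)
System information key: MA/PT/PS/PC/TH/PR/CM/CS/IC/IA/SD
Units: G-HPL in TFlop/s; G-PTRANS, S-STREAM and EP-STREAM in GB/s; Random Access in Gup/s; Latency in usec; Bandwidth in GB/s; DGEMM and FFTE in GFlop/s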
Manufacturer: Cray
Processor Type: AMD Opteron
Processor Speed: 2.6GHz
Processor Count: 196608
Threads: 3
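Processes: 65536
System Name: XT5
Interconnect: Seastar
MPI: MPT 3.4.2
Affiliation: Oak Ridge National Laboratory
Submission Date: 11-10-09
Cray XT5 AMD Opteron   2.6GHz   196608   3   65536
G-HPL: 1338.670 TFlop/s
G-PTRANS: 1889.2 GB/s
S-STREAM (Copy, Scale, Add, Triad): 5.76, 5.75, 6.35, 6.35 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 3.300, 3.308, 3.692, 3.713 GB/s
Random Access (S, EP, G): 0.01432, 0.01124, 36.42750 Gup/s
Latency (PingPong Max., Random Ring): 9.00, 15.99 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 1.526, 0.0404, 0.0403 GB/s
DGEMM (S, EP): 28.81, 28.39 GFlop/s
FFTE (S, EP, G): 1.211, 1.243, 10698.50 GFlop/s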
Manufacturer: Cray
Processor Type: AMD Opteron
Processor Speed: 2.6GHz
Processor Count: 223112
Threads: 2
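Processes: 111556
System Name: XT5
Interconnect: Seastar
MPI: MPT 3.4.2
Affiliation: Oak Ridge National Laboratory
Submission Date: 11-10-09
Cray XT5 AMD Opteron   2.6GHz   223112   2   111556
G-HPL: 1467.660 TFlop/s
G-PTRANS: 13723.2 GB/s
S-STREAM (Copy, Scale, Add, Triad): 10.48, 10.57, 10.34, 10.46 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 3.531, 3.527, 3.498, 3.570 GB/s
Random Access (S, EP, G): 0.03324, 0.01706, 37.68960 Gup/s
Latency (PingPong Max., Random Ring): 10.55, 31.09 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 1.519, 0.0264, 0.3498 GB/s
DGEMM (S, EP): 19.49, 19.25 GFlop/s
FFTE (S, EP, G): 1.196, 0.792, 3879.21 GFlop/s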
Manufacturer: Cray Inc.
Processor Type: Cray X1E
Processor Speed: 1.13GHz
Processor Count: 248
Threads: 1
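Processes: 248
System Name: mfeg8
Interconnect: Modified 2D Torus
MPI: mpt 2.4
Affiliation: Cray
Submission Date: 06-15-05
Cray Inc. mfeg8 Cray X1E   1.13GHz   248   1   248
G-HPL: 3.389 TFlop/s
G-PTRANS: 66.0 GB/s
S-STREAM (Copy, Scale, Add, Triad): 27.48, 27.37, 32.98, 32.82 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 10.750, 10.819, 13.379, 13.229 GB/s
Random Access (S, EP, G): 0.24911, 0.13560, 1.85475 Gup/s
Latency (PingPong Max., Random Ring): 11.59, 14.58 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 8.191, 0.2989, 3.1534 GB/s
DGEMM (S, EP): 14.77, 13.56 GFlop/s
FFTE (S, EP, G): 2.452, 1.837, -1.00 GFlop/s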
Manufacturer: Cray Inc.
Processor Type: AMD Opteron
Processor Speed: 2.4GHz
Processor Count: 12960
Threads: 1
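Processes: 25920
System Name: Red Storm/XT3
Interconnect: Cray custom
MPI: MPICH 2 v1.0.2
Affiliation: NNSA/Sandia National Laboratories
Submission Date: 11-10-06
Cray Inc. Red Storm/XT3 AMD Opteron   2.4GHz   12960   1   25920
G-HPL: 90.990 TFlop/s
G-PTRANS: 2351.5 GB/s
S-STREAM (Copy, Scale, Add, Triad): 4.34, 4.84, 4.48, 4.47 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 2.072, 2.103, 2.042, 2.079 GB/s
Random Access (S, EP, G): 0.01818, 0.00911, 29.81800 Gup/s
Latency (PingPong Max., Random Ring): 9.31, 15.76 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 1.978, 0.0591, 0.1545 GB/s
DGEMM (S, EP): 4.41, 4.40 GFlop/s
FFTE (S, EP, G): 0.711, 0.624, 1529.14 GFlop/s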
Manufacturer: Cray Inc.
Processor Type: AMD Opteron
Processor Speed: 2.4GHz
Processor Count: 12800
Threads: 1
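Processes: 25600
System Name: Red Storm/XT3
Interconnect: Seastar
MPI: xt-mpt/1.5.39 based on MPICH 2.0
Affiliation: DOE/NNSA/Sandia National Laboratories
Submission Date: 11-06-07
Cray Inc. Red Storm/XT3 AMD Opteron   2.4GHz   12800   1   25600
G-HPL: 93.579 TFlop/s
G-PTRANS: 4993.6 GB/s
S-STREAM (Copy, Scale, Add, Triad): 4.78, 4.81, 4.83, 4.85 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 2.864, 2.871, 2.892, 3.013 GB/s
Random Access (S, EP, G): 0.01598, 0.00905, 33.56300 Gup/s
Latency (PingPong Max., Random Ring): 10.25, 19.25 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 1.977, 0.0424, 0.0571 GB/s
DGEMM (S, EP): 4.41, 4.40 GFlop/s
FFTE (S, EP, G): 0.722, 0.618, 1515.42 GFlop/s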
Manufacturer: Cray Inc.
Processor Type: AMD Opteron
Processor Speed: 2.4GHz
Processor Count: 12960
Threads: 1
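Processes: 25920
System Name: Red Storm/XT3
Interconnect: Seastar
MPI: xt-mpt/1.5.39 based on MPICH 2.0
Affiliation: DOE/NNSA/Sandia National Laboratories
Submission Date: 11-06-07
Cray Inc. Red Storm/XT3 AMD Opteron   2.4GHz   12960   1   25920
G-HPL: 93.243 TFlop/s
G-PTRANS: 2371.4 GB/s
S-STREAM (Copy, Scale, Add, Triad): 3.87, 3.74, 3.25, 3.22 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 2.514, 2.606, 2.534, 2.688 GB/s
Random Access (S, EP, G): 0.01830, 0.00905, 29.45560 Gup/s
Latency (PingPong Max., Random Ring): 10.37, 19.58 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 1.977, 0.0444, 0.0960 GB/s
DGEMM (S, EP): 4.41, 4.40 GFlop/s
FFTE (S, EP, G): 0.699, 0.633, 2870.88 GFlop/s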
Manufacturer: Cray Inc.
Processor Type: Cray X1 MSP
Processor Speed: 0.8GHz
Processor Count: 252
Threads: 1
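Processes: 252
System Name: X1
Interconnect: X1
MPI: MPT 2.4
Affiliation: Oak Ridge National Laboratory
Submission Date: 04-26-04
Cray Inc. X1 Cray MSP   0.8GHz   252   1   252
G-HPL: 2.368 TFlop/s
G-PTRANS: 96.1 GB/s
S-STREAM (Copy, Scale, Add, Triad): 21.02, 21.16, 23.91, 24.02 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 19.565, 18.899, 21.092, 21.741 GB/s
Random Access (S, EP, G): 0.20847, 0.20855, (blank) Gup/s
Latency (PingPong Max., Random Ring): 10.29, 22.64 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 4.891, 0.4383, 2.5966 GB/s
DGEMM (S, EP): (blank), (blank)
FFTE (S, EP, G): (blank), (blank), (blank)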
Manufacturer: Cray Inc.
Processor Type: Cray X1 MSP
Processor Speed: 0.8GHz
Processor Count: 60
Threads: 1
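Processes: 60
System Name: X1
Interconnect: Cray modified 2D torus
MPI: MPT 2.4
Affiliation: U.S. Army Engineer Research and Development Center Major Shared Resource Center
Submission Date: 04-26-04
Cray Inc. X1 Cray MSP   0.8GHz   60   1   60
G-HPL: 0.579 TFlop/s
G-PTRANS: 31.1 GB/s
S-STREAM (Copy, Scale, Add, Triad): 21.74, 21.77, 23.39, 23.94 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 19.443, 19.453, 20.581, 21.768 GB/s
Random Access (S, EP, G): 0.21195, 0.21038, (blank) Gup/s
Latency (PingPong Max., Random Ring): 9.27, 21.16 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 4.407, 1.0099, 3.4332 GB/s
DGEMM (S, EP): (blank), (blank)
FFTE (S, EP, G): (blank), (blank), (blank)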
Manufacturer: Cray Inc.
Processor Type: Cray X1 MSP
Processor Speed: 0.8GHz
Processor Count: 124
Threads: 1
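Processes: 124
System Name: X1
Interconnect: Cray modified 2D torus
MPI: MPT.2.3.0.3
Affiliation: Army High Performance Computing Research Center (AHPCRC)
Submission Date: 05-03-04
Cray Inc. X1 Cray MSP   0.8GHz   124   1   124
G-HPL: 1.182 TFlop/s
G-PTRANS: 39.4 GB/s
S-STREAM (Copy, Scale, Add, Triad): 20.49, 21.02, 23.84, 24.02 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 19.507, 19.342, 21.185, 21.752 GB/s
Random Access (S, EP, G): 0.20822, 0.20868, (blank) Gup/s
Latency (PingPong Max., Random Ring): 9.69, 20.85 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 4.998, 0.8039, 4.1185 GB/s
DGEMM (S, EP): (blank), (blank)
FFTE (S, EP, G): (blank), (blank), (blank)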
Manufacturer: Cray Inc.
Processor Type: Cray X1 MSP
Processor Speed: 0.8GHz
Processor Count: 124
Threads: 1
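Processes: 124
System Name: X1
Interconnect: Cray modified 2D torus
MPI: MPT 2.3.0.3
Affiliation: Army High Performance Computing Research Center (AHPCRC)
Submission Date: 05-05-04
Cray Inc. X1 Cray MSP   0.8GHz   124   1   124
G-HPL: 1.182 TFlop/s
G-PTRANS: 39.4 GB/s
S-STREAM (Copy, Scale, Add, Triad): 20.49, 21.02, 23.84, 24.02 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 19.507, 19.342, 21.185, 21.752 GB/s
Random Access (S, EP, G): 0.20822, 0.20868, (blank) Gup/s
Latency (PingPong Max., Random Ring): 9.69, 20.85 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 4.998, 0.8039, 4.1185 GB/s
DGEMM (S, EP): (blank), (blank)
FFTE (S, EP, G): (blank), (blank), (blank)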
Manufacturer: Cray Inc.
Processor Type: Cray X1E
Processor Speed: 1.13GHz
Processor Count: 1008
Threads: 1
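Processes: 1008
System Name: X1
Interconnect: Cray Modified 2D torus
MPI: MPT
Affiliation: DOE/Office of Science/ORNL
Submission Date: 11-02-05
Cray Inc. X1 Cray E   1.13GHz   1008   1   1008
G-HPL: 12.265 TFlop/s
G-PTRANS: 145.0 GB/s
S-STREAM (Copy, Scale, Add, Triad): 26.76, 26.76, 32.84, 32.85 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 10.948, 10.877, 13.447, 12.587 GB/s
Random Access (S, EP, G): 0.33272, 0.16474, 7.68819 Gup/s
Latency (PingPong Max., Random Ring): 9.58, 16.30 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 5.959, 0.1532, 3.0292 GB/s
DGEMM (S, EP): 15.12, 14.18 GFlop/s
FFTE (S, EP, G): 1.707, 1.471, 245.09 GFlop/s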
Manufacturer: Cray Inc.
Processor Type: AMD Opteron
Processor Speed: 2.4GHz
Processor Count: 5208
Threads: 1
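Processes: 5208
System Name: XT3
Interconnect: Cray Seastar
MPI: xt-mpt/1.3.07
Affiliation: Oak Ridge National Laboratory, DOE Office of Science
Submission Date: 11-10-05
Cray Inc. XT3 AMD Opteron   2.4GHz   5208   1   5208
G-HPL: 20.416 TFlop/s
G-PTRANS: 942.3 GB/s
S-STREAM (Copy, Scale, Add, Triad): 5.69, 4.82, 4.67, 5.63 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 5.655, 4.816, 4.724, 5.630 GB/s
Random Access (S, EP, G): 0.01961, 0.01961, 0.66005 Gup/s
Latency (PingPong Max., Random Ring): 8.51, 9.33 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 1.148, 0.2047, 0.5785 GB/s
DGEMM (S, EP): 4.41, 4.41 GFlop/s
FFTE (S, EP, G): 0.594, 0.594, 779.43 GFlop/s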
Manufacturer: Cray Inc.
Processor Type: AMD Opteron
Processor Speed: 2.4GHz
Processor Count: 5208
Threads: 1
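Processes: 5208
System Name: XT3
Interconnect: Cray Seastar
MPI: xt-mpt/1.3.07
Affiliation: Oak Ridge National Laboratories - DOE Office of Science
Submission Date: 11-12-05
Cray Inc. XT3 AMD Opteron   2.4GHz   5208   1   5208
G-HPL: 20.416 TFlop/s
G-PTRANS: 942.3 GB/s
S-STREAM (Copy, Scale, Add, Triad): 5.69, 4.82, 4.67, 5.63 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 5.655, 4.816, 4.724, 5.630 GB/s
Random Access (S, EP, G): 0.01961, 0.01961, 0.66005 Gup/s
Latency (PingPong Max., Random Ring): 8.51, 9.33 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 1.148, 0.2047, 0.5785 GB/s
DGEMM (S, EP): 4.41, 4.41 GFlop/s
FFTE (S, EP, G): 0.594, 0.594, 779.43 GFlop/s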
Manufacturer: Cray Inc.
Processor Type: AMD Opteron
Processor Speed: 2.4GHz
Processor Count: 5208
Threads: 1
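Processes: 5208
System Name: XT3
Interconnect: Cray Seastar
MPI: xt-mpt/1.3.07
Affiliation: Oak Ridge National Lab - DOD Office of Science
Submission Date: 11-12-05
Cray Inc. XT3 AMD Opteron   2.4GHz   5208   1   5208
G-HPL: 20.337 TFlop/s
G-PTRANS: 944.2 GB/s
S-STREAM (Copy, Scale, Add, Triad): 5.69, 4.82, 4.67, 5.63 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 5.662, 4.768, 4.729, 5.610 GB/s
Random Access (S, EP, G): 0.01960, 0.01961, 0.68744 Gup/s
Latency (PingPong Max., Random Ring): 8.18, 9.18 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 1.149, 0.1988, 0.6492 GB/s
DGEMM (S, EP): 4.42, 4.42 GFlop/s
FFTE (S, EP, G): 0.593, 0.605, 855.24 GFlop/s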
Manufacturer: Cray Inc.
Processor Type: AMD Opteron
Processor Speed: 2.6GHz
Processor Count: 10404
Threads: 1
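Processes: 10404
System Name: XT3 Dual-Core
Interconnect: Cray SeaStar
MPI: xt-mpt 1.5.25
Affiliation: Oak Ridge National Lab
Submission Date: 11-06-06
Cray Inc. XT3 Dual-Core AMD Opteron   2.6GHz   10404   1   10404
G-HPL: 43.506 TFlop/s
G-PTRANS: 2038.9 GB/s
S-STREAM (Copy, Scale, Add, Triad): 5.39, 4.14, 3.45, 5.17 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 2.542, 2.237, 2.054, 2.551 GB/s
Random Access (S, EP, G): 0.01811, 0.01015, 10.67110 Gup/s
Latency (PingPong Max., Random Ring): 8.69, 17.04 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 1.147, 0.0820, 0.2017 GB/s
DGEMM (S, EP): 4.80, 4.79 GFlop/s
FFTE (S, EP, G): 0.736, 0.654, 1122.70 GFlop/s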
Manufacturer: Cray, Inc.
Processor Type: AMD Opteron
Processor Speed: 2.6GHz
Processor Count: 98304
Threads: 3
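Processes: 32768
System Name: XT5
Interconnect: SeaStar 2+
MPI: MPT 3.4.2
Affiliation: National Institute for Computational Sciences
Submission Date: 11-02-09
Cray, Inc. XT5 AMD Opteron   2.6GHz   98304   3   32768
G-HPL: 657.625 TFlop/s
G-PTRANS: 1559.6 GB/s
S-STREAM (Copy, Scale, Add, Triad): 7.11, 7.11, 7.97, 7.99 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 3.381, 3.383, 3.824, 3.882 GB/s
Random Access (S, EP, G): 0.01146, 0.00888, 18.49650 Gup/s
Latency (PingPong Max., Random Ring): 8.51, 15.45 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 1.541, 0.0559, 0.0553 GB/s
DGEMM (S, EP): 29.14, 28.82 GFlop/s
FFTE (S, EP, G): 1.370, 1.220, 7529.50 GFlop/s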
Manufacturer: IBM
Processor Type: IBM PowerPC 440
Processor Speed: 0.7GHz
Processor Count: 1024
Threads: 1
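Processes: 1024
System Name: Blue Gene/L
Interconnect: Custom
MPI: MPICH 1.0 customized for Blue Gene/L
Affiliation: Blue Gene Computational Center at IBM T.J. Watson Research Center
Submission Date: 04-11-05
IBM Blue Gene/L PowerPC 440   0.7GHz   1024   1   1024
G-HPL: 1.420 TFlop/s
G-PTRANS: 28.0 GB/s
S-STREAM (Copy, Scale, Add, Triad): 1.29, 1.31, 1.42, 1.41 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 0.718, 0.722, 0.815, 0.843 GB/s
Random Access (S, EP, G): 0.00669, 0.00408, 0.13473 Gup/s
Latency (PingPong Max., Random Ring): 4.25, 4.83 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 0.159, 0.0346, 0.0940 GB/s
DGEMM (S, EP): 2.55, 2.47 GFlop/s
FFTE (S, EP, G): 0.229, 0.216, 49.93 GFlop/s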
Manufacturer: IBM
Processor Type: IBM PowerPC 440
Processor Speed: 0.7GHz
Processor Count: 131072
Threads: 1
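Processes: 65536
System Name: Blue Gene/L
Interconnect: Custom Torus / Tree
MPI: MPICH2 1.0.1
Affiliation: National Nuclear Security Administration
Submission Date: 11-02-05
IBM Blue Gene/L PowerPC 440   0.7GHz   131072   1   65536
G-HPL: 252.297 TFlop/s
G-PTRANS: 369.6 GB/s
S-STREAM (Copy, Scale, Add, Triad): 1.64, 1.31, 2.07, 2.46 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 1.644, 1.307, 2.049, 2.442 GB/s
Random Access (S, EP, G): 0.00644, 0.00644, 35.47060 Gup/s
Latency (PingPong Max., Random Ring): 8.98, 7.89 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 0.152, 0.0111, 0.1805 GB/s
DGEMM (S, EP): 2.07, 2.07 GFlop/s
FFTE (S, EP, G): 0.237, 0.237, 2311.09 GFlop/s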
Manufacturer: IBM
Processor Type: IBM PowerPC 440
Processor Speed: 0.7GHz
Processor Count: 131072
Threads: 1
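Processes: 65536
System Name: Blue Gene/L
Interconnect: Custom Torus / Tree
MPI: MPICH2 1.0.1
Affiliation: National Nuclear Security Administration
Submission Date: 11-02-05
IBM Blue Gene/L PowerPC 440   0.7GHz   131072   1   65536
G-HPL: 259.213 TFlop/s
G-PTRANS: 374.4 GB/s
S-STREAM (Copy, Scale, Add, Triad): 1.64, 1.31, 2.08, 2.44 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 1.644, 1.307, 2.054, 2.440 GB/s
Random Access (S, EP, G): 0.00652, 0.00652, 32.98340 Gup/s
Latency (PingPong Max., Random Ring): 8.81, 7.78 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 0.152, 0.0111, 0.1851 GB/s
DGEMM (S, EP): 2.32, 2.31 GFlop/s
FFTE (S, EP, G): 0.214, 0.214, 2228.39 GFlop/s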
Manufacturer: IBM
Processor Type: IBM PowerPC 440
Processor Speed: 0.7GHz
Processor Count: 32768
Threads: 1
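Processes: 16384
System Name: Blue Gene/L
Interconnect: Blue Gene Custom Interconnect
MPI: MPICH 1.1
Affiliation: IBM T.J. Watson Research Center
Submission Date: 11-04-05
IBM Blue Gene/L PowerPC 440   0.7GHz   32768   1   16384
G-HPL: 67.117 TFlop/s
G-PTRANS: 137.2 GB/s
S-STREAM (Copy, Scale, Add, Triad): 1.64, 1.31, 2.08, 2.44 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 1.644, 1.307, 2.064, 2.440 GB/s
Random Access (S, EP, G): 0.00652, 0.00652, 17.29110 Gup/s
Latency (PingPong Max., Random Ring): 6.74, 5.88 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 0.152, 0.0219, 0.1850 GB/s
DGEMM (S, EP): 2.32, 2.31 GFlop/s
FFTE (S, EP, G): 0.208, 0.209, 988.18 GFlop/s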
Manufacturer: IBM
Processor Type: PowerPC 450
Processor Speed: 0.85GHz
Processor Count: 32768
Threads: 4
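Processes: 32768
System Name: Blue Gene/P
Interconnect: Torus
MPI: MPICH 2
Affiliation: Argonne National Lab - LCF
Submission Date: 11-17-08
IBM Blue Gene/P PowerPC 450   0.85GHz   32768   4   32768
G-HPL: 173.362 TFlop/s
G-PTRANS: 625.2 GB/s
S-STREAM (Copy, Scale, Add, Triad): 5.44, 3.63, 3.98, 3.98 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 5.438, 3.626, 3.980, 3.980 GB/s
Random Access (S, EP, G): 0.00969, 0.00969, 103.18000 Gup/s
Latency (PingPong Max., Random Ring): 6.62, 6.24 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 0.380, 0.0220, 0.7436 GB/s
DGEMM (S, EP): 9.68, 9.68 GFlop/s
FFTE (S, EP, G): 1.214, 1.214, 5079.59 GFlop/s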
Manufacturer: IBM
Processor Type: Power PC 450
Processor Speed: 0.85GHz
Processor Count: 131072
Threads: 4
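Processes: 32768
System Name: Dawn
Interconnect: Custom Torus + Tree + Barrier
MPI: MPICH2 1.0.7
Affiliation: NNSA - Lawrence Livermore National Laboratory
Submission Date: 11-11-09
IBM Dawn Power PC 450   0.85GHz   131072   4   32768
G-HPL: 367.821 TFlop/s
G-PTRANS: 757.1 GB/s
S-STREAM (Copy, Scale, Add, Triad): 7.77, 3.63, 6.04, 3.98 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 7.767, 3.626, 6.042, 3.980 GB/s
Random Access (S, EP, G): 0.01018, 0.01018, 117.12700 Gup/s
Latency (PingPong Max., Random Ring): 6.64, 5.59 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 0.377, 0.0223, 0.3742 GB/s
DGEMM (S, EP): 11.07, 11.07 GFlop/s
FFTE (S, EP, G): 1.229, 1.225, 3201.20 GFlop/s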
Manufacturer: IBM
Processor Type: IBM Power5+
Processor Speed: 2.2GHz
Processor Count: 64
Threads: 1
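Processes: 64
System Name: P5 P575+
Interconnect: HPS
MPI: poe 4.2.2.3
Affiliation: IBM
Submission Date: 05-08-06
IBM P5 P575+ Power5+   2.2GHz   64   1   64
G-HPL: 0.492 TFlop/s
G-PTRANS: 44.3 GB/s
S-STREAM (Copy, Scale, Add, Triad): 9.21, 10.73, 11.06, 12.79 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 8.767, 9.350, 11.229, 11.956 GB/s
Random Access (S, EP, G): 0.02323, 0.02323, 0.26405 Gup/s
Latency (PingPong Max., Random Ring): 3.95, 8.99 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 3.557, 0.2692, 0.4575 GB/s
DGEMM (S, EP): 8.38, 8.39 GFlop/s
FFTE (S, EP, G): 0.749, 0.756, 23.25 GFlop/s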
Manufacturer: IBM
Processor Type: IBM Power5+
Processor Speed: 2.2GHz
Processor Count: 128
Threads: 1
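Processes: 128
System Name: P5 P575+
Interconnect: HPS
MPI: poe 4.2.2.3
Affiliation: IBM
Submission Date: 05-08-06
IBM P5 P575+ Power5+   2.2GHz   128   1   128
G-HPL: 0.991 TFlop/s
G-PTRANS: 90.0 GB/s
S-STREAM (Copy, Scale, Add, Triad): 11.43, 11.40, 13.31, 12.75 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 9.697, 9.353, 12.056, 11.966 GB/s
Random Access (S, EP, G): 0.02324, 0.02317, 0.43868 Gup/s
Latency (PingPong Max., Random Ring): 3.95, 9.67 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 3.545, 0.2181, 0.3259 GB/s
DGEMM (S, EP): 8.47, 8.46 GFlop/s
FFTE (S, EP, G): 0.749, 0.758, 41.48 GFlop/s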
Manufacturer: NEC
Processor Type: NEC SX-7
Processor Speed: 0.552GHz
Processor Count: 32
Threads: 1
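Processes: 32
System Name: NEC SX-7
Interconnect: non
MPI: MPI/SX 7.0.6
Affiliation: Tohoku University, Information Synergy Center
Submission Date: 03-24-06
NEC SX-7   0.552GHz   32   1   32
G-HPL: 0.264 TFlop/s
G-PTRANS: 36.2 GB/s
S-STREAM (Copy, Scale, Add, Triad): 35.10, 34.82, 35.34, 35.34 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 27.286, 27.267, 27.912, 27.644 GB/s
Random Access (S, EP, G): 0.37741, 0.32049, 0.25908 Gup/s
Latency (PingPong Max., Random Ring): 9.40, 14.80 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 9.751, 10.1288, 11.1196 GB/s
DGEMM (S, EP): 8.83, 8.62 GFlop/s
FFTE (S, EP, G): 5.434, 1.490, 79.48 GFlop/s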
Manufacturer: NEC
Processor Type: NEC SX-7
Processor Speed: 0.552GHz
Processor Count: 32
Threads: 16
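Processes: 2
System Name: NEC SX-7
Interconnect: non
MPI: MPI/SX 7.0.6
Affiliation: Tohoku University, Information Synergy Center
Submission Date: 03-24-06
NEC SX-7   0.552GHz   32   16   2
G-HPL: 0.178 TFlop/s
G-PTRANS: 22.0 GB/s
S-STREAM (Copy, Scale, Add, Triad): 552.23, 546.63, 554.00, 553.43 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 424.261, 422.575, 448.820, 452.361 GB/s
Random Access (S, EP, G): 0.38826, 0.38606, 0.14686 Gup/s
Latency (PingPong Max., Random Ring): 4.20, 4.83 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 16.208, 15.7361, 15.7410 GB/s
DGEMM (S, EP): 141.28, 140.94 GFlop/s
FFTE (S, EP, G): 32.489, 32.470, 8.00 GFlop/s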
Manufacturer: NEC
Processor Type: NEC SX-8
Processor Speed: 2GHz
Processor Count: 40
Threads: 1
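Processes: 40
System Name: NEC SX-7C
Interconnect: IXS
MPI: MPI/SX 7.1.3
Affiliation: Tohoku University, Information Synergy Center
Submission Date: 03-24-06
NEC SX-7C SX-8   2GHz   40   1   40
G-HPL: 0.611 TFlop/s
G-PTRANS: 70.1 GB/s
S-STREAM (Copy, Scale, Add, Triad): 63.15, 63.08, 63.20, 63.21 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 39.617, 36.464, 37.313, 35.999 GB/s
Random Access (S, EP, G): 0.58802, 0.49797, 0.00852 Gup/s
Latency (PingPong Max., Random Ring): 5.06, 10.33 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 12.985, 1.3304, 12.2699 GB/s
DGEMM (S, EP): 15.90, 15.95 GFlop/s
FFTE (S, EP, G): 11.338, 7.907, 92.83 GFlop/s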
Manufacturer: NEC
Processor Type: NEC SX-8
Processor Speed: 2GHz
Processor Count: 40
Threads: 8
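Processes: 5
System Name: NEC SX-7C
Interconnect: IXS
MPI: MPI/SX 7.1.3
Affiliation: Tohoku University, Information Synergy Center
Submission Date: 03-24-06
NEC SX-7C SX-8   2GHz   40   8   5
G-HPL: 0.302 TFlop/s
G-PTRANS: 20.1 GB/s
S-STREAM (Copy, Scale, Add, Triad): 337.79, 296.78, 271.46, 298.64 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 319.800, 283.920, 266.471, 288.597 GB/s
Random Access (S, EP, G): 0.59430, 0.59469, 0.00214 Gup/s
Latency (PingPong Max., Random Ring): 5.00, 6.67 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 12.982, 12.3982, 12.3867 GB/s
DGEMM (S, EP): 127.59, 114.67 GFlop/s
FFTE (S, EP, G): 11.435, 11.430, 29.62 GFlop/s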
Manufacturer: NEC
Processor Type: NEC SX-9
Processor Speed: 3.2GHz
Processor Count: 32
Threads: 16
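Processes: 2
System Name: SX-9
Interconnect: IXS
MPI: MPI/SX 8.0.0/ISC
Affiliation: TOHOKU UNIVERSITY
Submission Date: 11-06-08
NEC SX-9   3.2GHz   32   16   2
G-HPL: 1.825 TFlop/s
G-PTRANS: 129.0 GB/s
S-STREAM (Copy, Scale, Add, Triad): 2746.17, 2729.91, 2760.59, 2816.12 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 2735.730, 2696.180, 2649.600, 2771.820 GB/s
Random Access (S, EP, G): 0.78617, 0.78607, 0.09728 Gup/s
Latency (PingPong Max., Random Ring): 3.19, 5.12 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 30.519, 26.0639, 26.2547 GB/s
DGEMM (S, EP): 1323.38, 1304.81 GFlop/s
FFTE (S, EP, G): 429.634, 435.474, 57.98 GFlop/s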
Manufacturer: NEC
Processor Type: NEC SX-9
Processor Speed: 3.2GHz
Processor Count: 256
Threads: 1
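Processes: 256
System Name: SX-9
Interconnect: IXS
MPI: MPI/SX 8.0.0/ISC
Affiliation: TOHOKU UNIVERSITY
Submission Date: 11-06-08
NEC SX-9   3.2GHz   256   1   256
G-HPL: 20.188 TFlop/s
G-PTRANS: 778.8 GB/s
S-STREAM (Copy, Scale, Add, Triad): 201.79, 201.61, 223.93, 224.17 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 169.543, 170.782, 166.082, 169.656 GB/s
Random Access (S, EP, G): 0.78534, 0.78423, 1.40110 Gup/s
Latency (PingPong Max., Random Ring): 5.36, 9.40 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 27.923, 3.6404, 24.9582 GB/s
DGEMM (S, EP): 86.26, 86.50 GFlop/s
FFTE (S, EP, G): 39.901, 24.780, 2377.31 GFlop/s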
Manufacturer: NEC
Processor Type: SX-9
Processor Speed: 3.2GHz
Processor Count: 960
Threads: 1
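Processes: 960
System Name: SX-9
Interconnect: IXS
MPI: MPI/SX 8.0.10
Affiliation: Japan Agency for Marine-Earth Science and Technology (JAMSTEC)
Submission Date: 11-11-09
NEC SX-9   3.2GHz   960   1   960
G-HPL: 79.546 TFlop/s
G-PTRANS: 2317.1 GB/s
S-STREAM (Copy, Scale, Add, Triad): 201.91, 201.72, 214.25, 230.41 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 171.242, 169.584, 173.492, 180.188 GB/s
Random Access (S, EP, G): 0.84649, 0.84996, 2.06978 Gup/s
Latency (PingPong Max., Random Ring): 8.72, 13.84 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 19.180, 2.5100, 16.5676 GB/s
DGEMM (S, EP): 88.39, 89.18 GFlop/s
FFTE (S, EP, G): 9.779, 9.472, 6942.39 GFlop/s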
Manufacturer: NEC
Processor Type: SX-9
Processor Speed: 3.2GHz
Processor Count: 8
Threads: 1
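Processes: 2
System Name: SX-9
Interconnect: IXS
MPI: MPI/SX 8.0.10
Affiliation: Japan Agency for Marine-Earth Science and Technology (JAMSTEC)
Submission Date: 11-16-09
NEC SX-9   3.2GHz   8   1   2
G-HPL: 0.208 TFlop/s
G-PTRANS: 70.1 GB/s
S-STREAM (Copy, Scale, Add, Triad): 786.86, 786.71, 916.40, 908.37 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 679.260, 672.788, 737.967, 760.720 GB/s
Random Access (S, EP, G): 0.77668, 0.77880, 0.15739 Gup/s
Latency (PingPong Max., Random Ring): 2.68, 4.15 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 84.319, 72.5959, 72.7310 GB/s
DGEMM (S, EP): 326.96, 330.67 GFlop/s
FFTE (S, EP, G): 130.155, 94.932, 0.46 GFlop/s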
Manufacturer: NEC
Processor Type: SX-9
Processor Speed: 3.2GHz
Processor Count: 16
Threads: 1
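Processes: 2
System Name: SX-9
Interconnect: IXS
MPI: MPI/SX 8.0.10
Affiliation: Japan Agency for Marine-Earth Science and Technology (JAMSTEC)
Submission Date: 11-16-09
NEC SX-9   3.2GHz   16   1   2
G-HPL: 0.589 TFlop/s
G-PTRANS: 100.7 GB/s
S-STREAM (Copy, Scale, Add, Triad): 1382.16, 1387.86, 1406.16, 1473.90 GB/s
EP-STREAM (Copy, Scale, Add, Triad): 1382.630, 1376.140, 1400.110, 1464.400 GB/s
Random Access (S, EP, G): 0.72965, 0.72966, 0.09103 Gup/s
Latency (PingPong Max., Random Ring): 4.09, 6.81 usec
Bandwidth (PingPong Min., Random Ring, Natural Ring): 25.073, 22.6232, 22.6092 GB/s
DGEMM (S, EP): 720.39, 684.94 GFlop/s
FFTE (S, EP, G): 224.187, 223.799, 0.48 GFlop/s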



 

Note:
Blank fields in the table above are from early benchmark runs that did not include that individual benchmark,
in particular G-RandomAccess, FFTE and DGEMM.



Column Definitions
G-HPL ( system performance )
Solves a randomly generated dense linear system of equations in double floating-point precision (IEEE 64-bit) arithmetic using MPI. The linear system matrix is stored in a two-dimensional block-cyclic fashion and multiple variants of code are provided for computational kernels and communication patterns. The solution method is LU factorization through Gaussian elimination with partial row pivoting followed by a backward substitution. Unit: Tera Flops per Second
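The solution method described above can be sketched serially as follows. This is a minimal illustration only, assuming a small dense matrix held as Python lists; it has no MPI, no block-cyclic distribution and no tuned kernels, and the function name lu_solve is hypothetical rather than part of HPL.

```python
def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial row pivoting.

    Hypothetical serial sketch; HPL itself distributes the matrix
    block-cyclically across MPI processes.
    """
    n = len(a)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial row pivoting: pick the row with the largest |a[i][k]|.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k below the pivot.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Backward substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x
```

For example, lu_solve([[0.0, 1.0], [1.0, 0.0]], [2.0, 3.0]) only succeeds because of the row swap, which is the case pivoting exists to handle.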
G-PTRANS (A=A+B^T, MPI) ( system performance )
Implements a parallel matrix transpose for two-dimensional block-cyclic storage. It is an important benchmark because it exercises the communications of the computer heavily on a realistic problem where pairs of processors communicate with each other simultaneously. It is a useful test of the total communications capacity of the network. Unit: Giga Bytes per Second
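The operation A = A + B^T can be written serially in a few lines. The sketch below assumes square matrices stored as lists of rows, and the name add_transpose is hypothetical. In the parallel benchmark the element B[j][i] needed for position (i, j) generally lives on a different process, which is where the communication load comes from.

```python
def add_transpose(a, b):
    """Return A + B^T for square matrices given as lists of rows."""
    n = len(a)
    # Entry (i, j) of the result combines A[i][j] with B[j][i].
    return [[a[i][j] + b[j][i] for j in range(n)] for i in range(n)]
```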
S-STREAM ( single MPI process )
The Single MPI process STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth and the corresponding computation rate for simple numerical vector kernels. It is run on a single computational process chosen at random. Unit: Giga Bytes per Second
EP-STREAM ( per process )
The Embarrassingly Parallel STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth and the corresponding computation rate for simple numerical vector kernels. It is run in an embarrassingly parallel manner: all computational processes perform the benchmark at the same time, and the arithmetic average rate is reported. Unit: Giga Bytes per Second
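The four vector kernels reported in the table (Copy, Scale, Add, Triad) can be sketched as below. The kernel definitions follow the usual STREAM formulation; the byte counts per element assume 8-byte words with two arrays touched by Copy and Scale and three by Add and Triad, which is the conventional accounting and is an assumption here, as are the function names.

```python
# Bytes moved per vector element, assuming 8-byte double precision words.
BYTES_PER_ELEMENT = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}


def stream_kernels(a, b, c, q):
    """Run the four kernels once, in order, updating the lists in place."""
    n = len(a)
    for i in range(n):
        c[i] = a[i]                # Copy:  c = a
    for i in range(n):
        b[i] = q * c[i]            # Scale: b = q * c
    for i in range(n):
        c[i] = a[i] + b[i]         # Add:   c = a + b
    for i in range(n):
        a[i] = b[i] + q * c[i]     # Triad: a = b + q * c


def bandwidth_gb_s(kernel, n, seconds):
    """Convert a timing for one kernel over n elements into GB/s."""
    return BYTES_PER_ELEMENT[kernel] * n / seconds / 1e9
```

Under these assumptions a Triad pass over 1,000,000 elements that takes 0.012 seconds corresponds to about 2 GB/s.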
S-RandomAccess ( single MPI process )
Single MPI process RandomAccess, also called GUPs, measures the rate at which the computer can update pseudo-random locations of its memory - this rate is expressed in billions (giga) of updates per second (GUP/s). The single process version runs the code locally on a randomly chosen processor. No explicit communication is performed and so the performance of the local memory subsystem is revealed. Unit: Giga Updates per Second
EP-RandomAccess ( per process )
Embarrassingly Parallel RandomAccess, also called GUPs, measures the rate at which the computer can update pseudo-random locations of its memory - this rate is expressed in billions (giga) of updates per second (GUP/s). The embarrassingly parallel version runs the code locally on each process. No explicit communication is performed (but shared-memory effects might occur). Unit: Giga Updates per Second
G-RandomAccess ( system performance )
Also called GUPs, measures the rate at which the computer can update pseudo-random locations of its memory - this rate is expressed in billions (giga) of updates per second (GUP/s). The MPI version generates the updating sequence locally and then distributes it using all-to-all collective communication. Unit: Giga Updates per Second
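A local update loop of the kind the RandomAccess definitions describe might look like the sketch below. The shift-register generator, the constant POLY and the XOR update are modeled on the commonly described form of the benchmark and should be treated as assumptions, as should the function names; the table size is taken to be a power of two so that masking selects an index.

```python
POLY = 0x7                 # assumed feedback polynomial for the generator
MASK64 = (1 << 64) - 1     # keep values within 64 bits


def next_random(ran):
    """Advance the 64-bit shift-register pseudo-random sequence by one step."""
    top_bit_set = ran >> 63
    return ((ran << 1) & MASK64) ^ (POLY if top_bit_set else 0)


def apply_updates(table, n_updates, seed=1):
    """XOR pseudo-random values into pseudo-random table locations in place."""
    size = len(table)      # assumed to be a power of two
    ran = seed
    for _ in range(n_updates):
        ran = next_random(ran)
        table[ran & (size - 1)] ^= ran


def gups(n_updates, seconds):
    """Convert an update count and elapsed time into giga-updates per second."""
    return n_updates / seconds / 1e9
```

Because each update is an XOR, applying the same update sequence a second time restores the original table, which gives a simple correctness check for a sketch like this one.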
PingPong Max. Latency
Maximum Ping-Pong Latency, reports the maximum latency for a number of non-simultaneous ping-pong tests. The ping-pongs are performed between as many as possible (there is an upper bound on the time it takes to complete this test) distinct pairs of processors. The test uses MPI standard send and receive routines. Unit: micro-seconds
RandomRing Latency
Randomly Ordered Ring Latency, reports latency in the ring communication pattern. The communicating processes are ordered randomly in the ring (with respect to the natural ordering of the MPI default communicator). The result is averaged over various random assignments of processes in the ring. Unit: micro-seconds
PingPong Min. Bandwidth
Minimum Ping-Pong Bandwidth, reports the minimum bandwidth for a number of non-simultaneous ping-pong tests. The ping-pongs are performed between as many as possible (there is an upper bound on the time it takes to complete this test) distinct pairs of processors. The test uses MPI standard send and receive routines. Unit: Giga Bytes per second
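The two ping-pong figures can be derived from per-pair round-trip timings roughly as follows. This is a sketch of the arithmetic only, with no MPI calls; it assumes the one-way time is half the measured round trip, and the function names and the message size passed in are hypothetical.

```python
def one_way_time(round_trip_seconds):
    """Assume a message takes half the round-trip time in each direction."""
    return round_trip_seconds / 2.0


def max_latency_usec(round_trips):
    """Worst one-way latency over all tested pairs, in microseconds."""
    return max(one_way_time(t) for t in round_trips) * 1e6


def min_bandwidth_gb_s(message_bytes, round_trips):
    """Worst one-way bandwidth over all tested pairs, in GB/s."""
    return min(message_bytes / one_way_time(t) for t in round_trips) / 1e9
```

Reporting the maximum latency and the minimum bandwidth means both columns reflect the slowest pair that was tested rather than a typical one.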
RandomRing Bandwidth
Randomly Ordered Ring Bandwidth, reports bandwidth achieved in the ring communication pattern. The communicating processes are ordered randomly in the ring (with respect to the natural ordering of the MPI default communicator). The result is averaged over various random assignments of processes in the ring. Unit: Giga Bytes per second per process
Natural Ring Bandwidth
Naturally Ordered Ring Bandwidth, reports bandwidth achieved in the ring communication pattern. The ring is formed with consecutive processes in MPI_COMM_WORLD. Unit: Giga Bytes per second per process
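The difference between the naturally ordered and randomly ordered rings is only in how ranks are placed around the ring. The sketch below builds the neighbor map for each case in plain Python; the function names and the seed parameter are hypothetical, and no communication is performed.

```python
import random


def ring_neighbors(order):
    """Map each rank to its (left, right) neighbors for a given ring order."""
    n = len(order)
    return {order[i]: (order[i - 1], order[(i + 1) % n]) for i in range(n)}


def natural_ring(n):
    """Ring formed from consecutive ranks 0, 1, ..., n-1."""
    return ring_neighbors(list(range(n)))


def random_ring(n, seed):
    """Ring formed from a random permutation of the ranks."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return ring_neighbors(order)
```

In a natural ring, neighbors are often physically close, while a random ring tends to pair distant processes, which is one plausible reason the Random Ring and Natural Ring bandwidth columns can differ for the same system.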
S-DGEMM ( single MPI process )
The Single MPI process DGEMM benchmark measures the floating-point execution rate of double precision real matrix-matrix multiply performed by the DGEMM subroutine from the BLAS (Basic Linear Algebra Subprograms). It is run on a single computational process chosen at random. Unit: Giga Flops per Second
EP-DGEMM ( per process )
The Embarrassingly Parallel DGEMM benchmark measures the floating-point execution rate of double precision real matrix-matrix multiply performed by the DGEMM subroutine from the BLAS (Basic Linear Algebra Subprograms). It is run in an embarrassingly parallel manner: all computational processes perform the benchmark at the same time, and the arithmetic average rate is reported. Unit: Giga Flops per Second
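DGEMM computes C = alpha*A*B + beta*C. The sketch below is a plain-Python stand-in for the BLAS routine, not the routine itself; the rate conversion assumes the conventional count of 2*n^3 floating-point operations for square matrices of order n, and both function names are hypothetical.

```python
def dgemm(alpha, a, b, beta, c):
    """Return alpha * A * B + beta * C for matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(inner)) + beta * c[i][j]
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def dgemm_gflops(n, seconds):
    """Rate in GFlop/s for an n x n multiply, assuming 2 * n**3 operations."""
    return 2.0 * n ** 3 / seconds / 1e9
```

With that operation count, a multiply of order 1000 finishing in 0.1 seconds corresponds to about 20 GFlop/s.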
S-FFTE ( single MPI process )
Single MPI process FFTE measures the floating-point rate of execution of a double precision complex one-dimensional Discrete Fourier Transform (DFT). The vector size is a power of two. Unit: Giga Flops per Second
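A one-dimensional complex DFT on a power-of-two vector can be computed with a radix-2 recursion, sketched below. This is an illustrative implementation and not the FFTE library; the rate conversion assumes the customary 5*n*log2(n) operation count for a complex FFT of length n, which the text above does not state, and the function names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-size transforms.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_gflops(n, seconds):
    """Rate in GFlop/s, assuming 5 * n * log2(n) operations per transform."""
    return 5.0 * n * math.log2(n) / seconds / 1e9
```

As a quick sanity check, the transform of a unit impulse is a vector of ones, and the transform of a constant vector concentrates everything in the first element.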
EP-FFTE ( per process )
Embarrassingly Parallel FFTE performs the same test as FFTE but in embarrassingly parallel fashion - the code is run locally on each processor. No explicit communication is performed (but shared-memory effects might occur). Unit: Giga Flops per Second
G-FFTE ( system performance )
Global FFTE performs the same test as FFTE but across the entire system by distributing the input vector in block fashion across all the processes. Unit: Giga Flops per Second



