Parallel benchmark on multi-core CPUs
From Gerris
Revision as of 21:42, 13 October 2011 by Popinet (Updated with more recent openmpi and gerris versions). Previous revision as of 22:58, 4 February 2010 by Popinet.
This benchmark uses the parallel Bénard–von Kármán Vortex Street example (http://gfs.sourceforge.net/examples/examples/cylinder.html#htoc5). The problem size is small, which makes good parallel performance difficult to reach.
popinet-new: Intel(R) Core(TM)2 Quad CPU Q9400 @2.66GHz, 64-bit
- Ubuntu 10.04 LTS 64-bit
- Linux popinet-new 2.6.32-34-generic #77-Ubuntu SMP Tue Sep 13 19:39:17 UTC 2011 x86_64 GNU/Linux
- Gerris2D version 2011-10-13
MPI versions:
- Open MPI: 1.4.1-2
Open MPI
| #CPUs | Relative speedup |
|---|---|
| 1 | 1 |
| 2 (load-balanced) | 2.3 |
| 4 (load-balanced) | 3.27 |
fitzroy: IBM Power 575 4.7 GHz
| #CPUs | Relative speedup |
|---|---|
| 1 | 1 |
| 2 (load-balanced) | 1.92 |
| 4 (load-balanced) | 2.32 |
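
The relative speedups in the two tables can be turned into a parallel efficiency (speedup divided by the number of CPUs), which makes the two systems easier to compare. The following is a minimal sketch; the function and variable names are illustrative and not part of Gerris, and the numbers are simply copied from the tables.

```python
# Relative speedups copied from the two tables (1 CPU is the baseline).
speedups = {
    "popinet-new": {1: 1.0, 2: 2.3, 4: 3.27},
    "fitzroy": {1: 1.0, 2: 1.92, 4: 2.32},
}


def parallel_efficiency(speedup, ncpus):
    """Efficiency = speedup / number of CPUs (1.0 means ideal scaling)."""
    return speedup / ncpus


# Efficiency per system and per CPU count.
efficiency = {
    system: {n: parallel_efficiency(s, n) for n, s in table.items()}
    for system, table in speedups.items()
}

for system, table in efficiency.items():
    for n, e in table.items():
        print(f"{system}: {n} CPUs -> efficiency {e:.2f}")
```

With these figures the 4-CPU efficiency is about 0.82 on popinet-new and 0.58 on fitzroy. The 2-CPU value on popinet-new comes out above 1, which follows directly from the measured speedup of 2.3.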
Parameter file
Large outputs and movie generation were turned off. The single-CPU parameter file is:
    8 7 GfsSimulation GfsBox GfsGEdge {} {
      Time { end = 15 }
      Solid (x*x + y*y - 0.0625*0.0625)
      RefineSolid 6
      VariableTracer {} T
      Init {} { U = 1 }
      AdaptVorticity { istep = 1 } { maxlevel = 6 cmax = 1e-2 }
      AdaptGradient { istep = 1 } { maxlevel = 6 cmax = 1e-2 } T
      SourceViscosity 0.00078125
      EventBalance { istep = 1 } 0.1
      OutputTime { istep = 10 } stderr
      OutputTime { istep = 1 } balance
      OutputBalance { istep = 1 } balance
      OutputProjectionStats { istep = 10 } stderr
      OutputTiming { start = end } stderr
      OutputSimulation { start = end } end.gfs
    }
    GfsBox {
      left = Boundary {
        BcDirichlet U 1
        BcDirichlet T { return y < 0. ? 1. : 0.; }
      }
    }
    GfsBox {}
    GfsBox {}
    GfsBox {}
    GfsBox {}
    GfsBox {}
    GfsBox {}
    GfsBox { right = BoundaryOutflow }
    1 2 right
    2 3 right
    3 4 right
    4 5 right
    5 6 right
    6 7 right
    7 8 right
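
To give a sense of how small the problem is, the sketch below derives a few quantities from the parameter file: the Reynolds number based on the cylinder diameter, and an upper bound on the number of cells if every box were refined uniformly to the maximum level. It assumes unit density (so that the SourceViscosity value acts as a kinematic viscosity) and unit-sized boxes; the variable names are illustrative only. The adaptive mesh will normally contain fewer cells than this bound.

```python
# Values copied from the parameter file.
radius = 0.0625          # Solid (x*x + y*y - 0.0625*0.0625)
inflow_velocity = 1.0    # Init {} { U = 1 } and BcDirichlet U 1
viscosity = 0.00078125   # SourceViscosity
max_level = 6            # maxlevel in AdaptVorticity / AdaptGradient
n_boxes = 8              # first number on the GfsSimulation line

# Assumption: unit density, so the viscosity is a kinematic viscosity.
diameter = 2 * radius
reynolds = inflow_velocity * diameter / viscosity

# Assumption: each GfsBox is a unit square, so level 6 gives 2**6 cells per side.
cells_per_side = 2 ** max_level
max_cells = n_boxes * cells_per_side ** 2

# Number of finest-level cells spanning the cylinder diameter.
cells_across_cylinder = diameter * cells_per_side

print(f"Reynolds number: {reynolds:.0f}")
print(f"Upper bound on cell count: {max_cells}")
print(f"Finest cells across the cylinder: {cells_across_cylinder:.0f}")
```

Under these assumptions the uniform-refinement bound is 32768 cells in total, which leaves at most about 8000 cells per process on 4 CPUs.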



