Parallel benchmark on multi-core CPUs
From Gerris
(Difference between revisions)
| Revision as of 23:45, 6 February 2010 Popinet (Talk | contribs) ← Previous diff |
Revision as of 21:42, 13 October 2011 Popinet (Talk | contribs) (Updated with more recent openmpi and gerris versions) Next diff → |
||
| Line 1: | Line 1: | ||
| - | This benchmark uses the the [http://gfs.sourceforge.net/examples/examples/cylinder.html#htoc5 parallel Bénard–von Kármán Vortex Street] example. Various implementations of MPI were tested, with and without load-balancing, on the following system: | + | This benchmark uses the the [http://gfs.sourceforge.net/examples/examples/cylinder.html#htoc5 parallel Bénard–von Kármán Vortex Street] example. The problem size is small which makes good parallel performance difficult to reach. |
| - | * Intel(R) Core(TM)2 Quad CPU Q9400 @2.66GHz, 64-bits | + | = popinet-new: Intel(R) Core(TM)2 Quad CPU Q9400 @2.66GHz, 64-bits = |
| - | * Ubuntu 9.10 64-bits | + | |
| - | * Linux popinet 2.6.31-17-generic #54-Ubuntu SMP Thu Dec 10 17:01:44 UTC 2009 x86_64 GNU/Linux | + | * Ubuntu 10.04 LTS 64-bits |
| - | * Gerris2D version 2010-01-29 | + | * Linux popinet-new 2.6.32-34-generic #77-Ubuntu SMP Tue Sep 13 19:39:17 UTC 2011 x86_64 GNU/Linux |
| + | * Gerris2D version 2011-10-13 | ||
| MPI versions: | MPI versions: | ||
| - | ; MPICH1 : 1.2.7-9.1ubuntu1 (packages mpich-shmem-bin, libmpich-shmem1.0-dev), | + | ; Open MPI : 1.4.1-2 |
| - | ; MPICH2 : 1.2-1ubuntu1.1 (packages mpich2, libmpich2-dev, libmpich2-1.2), | + | |
| - | ; Open MPI : 1.3.2-3ubuntu1.1 (packages libopenmpi1.3, openmpi-common, libopenmpi-dev, openmpi-bin) | + | |
| - | == MPICH1 == | + | == Open MPI == |
| - | [[Image:balance-mpich1.png]] | + | [[Image:balance-openmpi-new.png]] |
| {| border="1" | {| border="1" | ||
| Line 21: | Line 20: | ||
| ! Relative speedup | ! Relative speedup | ||
| |- | |- | ||
| - | | 1 | + | | 1 |
| - | | 1 | + | | 1 |
| |- | |- | ||
| - | | 2 (load-balanced) | + | | 2 (load-balanced) |
| - | | 1.33 | + | | 2.3 |
| |- | |- | ||
| | 4 (load-balanced) | | 4 (load-balanced) | ||
| - | | 1.97 | + | | 3.27 |
| |} | |} | ||
| - | == MPICH2 == | + | = fitzroy: IBM Power 575 4.7 GHz = |
| - | [[Image:balance-mpich2.png]] | + | [[Image:balance-fitzroy.png]] |
| {| border="1" | {| border="1" | ||
| Line 42: | Line 41: | ||
| | 1 | | 1 | ||
| | 1 | | 1 | ||
| - | |- | ||
| - | | 2 (not load-balanced) | ||
| - | | 1.46 | ||
| - | |- | ||
| - | | 4 (not load-balanced) | ||
| - | | 2.44 | ||
| |- | |- | ||
| | 2 (load-balanced) | | 2 (load-balanced) | ||
| - | | 2.1 | + | | 1.92 |
| |- | |- | ||
| | 4 (load-balanced) | | 4 (load-balanced) | ||
| - | | 3.5 | + | | 2.32 |
| |} | |} | ||
| - | == Open MPI == | + | = Parameter file = |
| - | + | ||
| - | [[Image:balance-openmpi.png]] | + | |
| - | + | ||
| - | {| border="1" | + | |
| - | |- | + | |
| - | ! #CPUs | + | |
| - | ! Relative speedup | + | |
| - | |- | + | |
| - | | 1 | + | |
| - | | 1 | + | |
| - | |- | + | |
| - | | 2 (not load-balanced) | + | |
| - | | 1.46 | + | |
| - | |- | + | |
| - | | 4 (not load-balanced) | + | |
| - | | hanged at t = 12 (but this varies) | + | |
| - | |- | + | |
| - | | 2 (load-balanced) | + | |
| - | | 2.1 | + | |
| - | |- | + | |
| - | | 4 (load-balanced) | + | |
| - | | hanged at startup | + | |
| - | |} | + | |
| - | + | ||
| - | == Conclusions == | + | |
| - | + | ||
| - | Either Open MPI triggers a bug in Gerris which the other two libraries do not, or Open MPI (or its Ubuntu packaging) have serious problems. Rumours and various posts on Ubuntu Launchpad and other sites suggest that this may be the case. Note also that each Open MPI instance of gerris2D takes about 150 MB of virtual memory in contrast to ~70 MB for MPICH2 and 5 MB for the serial version. | + | |
| - | + | ||
| - | The performance of Gerris/MPICH2 is very satisfactory taking into account the small problem size (~5000 elements/CPU). | + | |
| - | + | ||
| - | I have switched to MPICH2 on my development system. | + | |
| - | + | ||
| - | == Parameter file == | + | |
| Large outputs and movie generation were turned off, the single-CPU parameter file is: | Large outputs and movie generation were turned off, the single-CPU parameter file is: | ||
| Line 133: | Line 93: | ||
| </pre> | </pre> | ||
| - | == See also == | + | = See also = |
| * [[Parallel benchmark]] | * [[Parallel benchmark]] | ||
| + | * [[Parallel benchmark on other systems]] | ||
Revision as of 21:42, 13 October 2011
This benchmark uses the parallel Bénard–von Kármán Vortex Street example. The problem size is small, which makes good parallel performance difficult to reach.
popinet-new: Intel(R) Core(TM)2 Quad CPU Q9400 @2.66GHz, 64-bits
- Ubuntu 10.04 LTS 64-bits
- Linux popinet-new 2.6.32-34-generic #77-Ubuntu SMP Tue Sep 13 19:39:17 UTC 2011 x86_64 GNU/Linux
- Gerris2D version 2011-10-13
MPI versions:
- Open MPI: 1.4.1-2
Open MPI
| #CPUs | Relative speedup |
|---|---|
| 1 | 1 |
| 2 (load-balanced) | 2.3 |
| 4 (load-balanced) | 3.27 |
fitzroy: IBM Power 575 4.7 GHz
| #CPUs | Relative speedup |
|---|---|
| 1 | 1 |
| 2 (load-balanced) | 1.92 |
| 4 (load-balanced) | 2.32 |
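
The relative speedups in the two tables above can be turned into parallel efficiencies (speedup divided by the number of CPUs). The following is a minimal sketch of that arithmetic in Python. The dictionary layout and the function name `parallel_efficiency` are illustrative choices, not part of Gerris; the numbers are copied from the tables.

```python
# Relative speedups copied from the two tables above
# (keys are the number of CPUs; all multi-CPU runs are load-balanced).
speedups = {
    "popinet-new": {1: 1.0, 2: 2.3, 4: 3.27},
    "fitzroy": {1: 1.0, 2: 1.92, 4: 2.32},
}


def parallel_efficiency(speedup, ncpus):
    """Return speedup per CPU; 1.0 would be ideal linear scaling."""
    return speedup / ncpus


# Efficiency for every (system, #CPUs) pair.
efficiency = {
    system: {n: parallel_efficiency(s, n) for n, s in table.items()}
    for system, table in speedups.items()
}

for system, table in efficiency.items():
    for ncpus, eff in sorted(table.items()):
        print(f"{system:12s} {ncpus} CPUs: efficiency = {eff:.2f}")
```

With these numbers, the 4-CPU efficiency is roughly 0.82 on popinet-new and 0.58 on fitzroy. The 2-CPU result on popinet-new comes out above 1; the page gives no explanation for this.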
Parameter file
Large outputs and movie generation were turned off. The single-CPU parameter file is:
8 7 GfsSimulation GfsBox GfsGEdge {} {
  Time { end = 15 }
  Solid (x*x + y*y - 0.0625*0.0625)
  RefineSolid 6
  VariableTracer {} T
  Init {} { U = 1 }
  AdaptVorticity { istep = 1 } { maxlevel = 6 cmax = 1e-2 }
  AdaptGradient { istep = 1 } { maxlevel = 6 cmax = 1e-2 } T
  SourceViscosity 0.00078125
  EventBalance { istep = 1 } 0.1
  OutputTime { istep = 10 } stderr
  OutputTime { istep = 1 } balance
  OutputBalance { istep = 1 } balance
  OutputProjectionStats { istep = 10 } stderr
  OutputTiming { start = end } stderr
  OutputSimulation { start = end } end.gfs
}
GfsBox {
  left = Boundary {
    BcDirichlet U 1
    BcDirichlet T { return y < 0. ? 1. : 0.; }
  }
}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox { right = BoundaryOutflow }
1 2 right
2 3 right
3 4 right
4 5 right
5 6 right
6 7 right
7 8 right
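
As a rough sanity check on the parameters above, the Reynolds number of the flow can be estimated from the values in the file. This sketch assumes unit density, takes the inflow velocity `U = 1` as the reference velocity, and reads the cylinder radius (0.0625) from the `Solid` expression. The variable names are illustrative only.

```python
# Values read from the parameter file above.
radius = 0.0625          # Solid (x*x + y*y - 0.0625*0.0625)
inflow_velocity = 1.0    # Init {} { U = 1 } and BcDirichlet U 1
viscosity = 0.00078125   # SourceViscosity (unit density assumed)

# Reynolds number based on the cylinder diameter.
diameter = 2.0 * radius
reynolds = inflow_velocity * diameter / viscosity

print(f"Re is approximately {reynolds:.0f}")
```

Under these assumptions the Reynolds number is about 160.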



