Parallel benchmark on multi-core CPUs

From Gerris

Revision as of 04:22, 10 October 2011
Popinet (Talk | contribs)


This benchmark uses the parallel Bénard–von Kármán Vortex Street example (http://gfs.sourceforge.net/examples/examples/cylinder.html#htoc5). Various implementations of MPI were tested, with and without load-balancing, on the following systems:

popinet-new: Intel(R) Core(TM)2 Quad CPU Q9400 @2.66GHz, 64-bits

  • Ubuntu 9.10 64-bits
  • Linux popinet 2.6.31-17-generic #54-Ubuntu SMP Thu Dec 10 17:01:44 UTC 2009 x86_64 GNU/Linux
  • Gerris2D version 2010-01-29

MPI versions:

  • MPICH1: 1.2.7-9.1ubuntu1 (packages mpich-shmem-bin, libmpich-shmem1.0-dev)
  • MPICH2: 1.2-1ubuntu1.1 (packages mpich2, libmpich2-dev, libmpich2-1.2)
  • Open MPI: 1.3.2-3ubuntu1.1 (packages libopenmpi1.3, openmpi-common, libopenmpi-dev, openmpi-bin)

MPICH1

Image:balance-mpich1.png

#CPUs                    Relative speedup
1                        1
2 (load-balanced)        1.33
4 (load-balanced)        1.97

MPICH2

Image:balance-mpich2.png

#CPUs                    Relative speedup
1                        1
2 (not load-balanced)    1.46
4 (not load-balanced)    2.44
2 (load-balanced)        2.1
4 (load-balanced)        3.5

Open MPI

Image:balance-openmpi.png

#CPUs                    Relative speedup
1                        1
2 (not load-balanced)    1.46
4 (not load-balanced)    hung at t = 12 (but this varies)
2 (load-balanced)        2.1
4 (load-balanced)        hung at startup

Conclusions

Either Open MPI triggers a bug in Gerris which the other two libraries do not, or Open MPI (or its Ubuntu packaging) has serious problems. Rumours and various posts on Ubuntu Launchpad and other sites suggest that the latter may be the case. Note also that each Open MPI instance of gerris2D takes about 150 MB of virtual memory, in contrast to ~70 MB for MPICH2 and 5 MB for the serial version.

The performance of Gerris with MPICH2 is very satisfactory (a relative speedup of 3.5 on 4 CPUs with load-balancing), given the small problem size (~5000 elements/CPU).
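
One way to compare the runs in the tables above is parallel efficiency, i.e. relative speedup divided by the number of CPUs. The following is only an illustrative sketch: the speedups are copied from the popinet-new tables, the name `parallel_efficiency` is hypothetical and not part of Gerris, and the Open MPI runs are left out because the 4-CPU cases did not complete.

```python
# Relative speedups copied from the popinet-new tables above.
# Keys are (MPI library, balancing mode); values map #CPUs to speedup.
speedups = {
    ("MPICH1", "load-balanced"): {1: 1.0, 2: 1.33, 4: 1.97},
    ("MPICH2", "not load-balanced"): {1: 1.0, 2: 1.46, 4: 2.44},
    ("MPICH2", "load-balanced"): {1: 1.0, 2: 2.1, 4: 3.5},
}


def parallel_efficiency(speedup, ncpus):
    """Return speedup per CPU (1.0 corresponds to ideal linear scaling)."""
    return speedup / ncpus


# Efficiency for every measured configuration.
efficiencies = {
    key: {n: parallel_efficiency(s, n) for n, s in runs.items()}
    for key, runs in speedups.items()
}

for (library, mode), runs in efficiencies.items():
    for ncpus, eff in sorted(runs.items()):
        print(f"{library:7s} {mode:18s} {ncpus} CPU(s): efficiency {eff:.2f}")
```

An efficiency above 1 (as for the 2-CPU load-balanced MPICH2 run) simply means that the measured speedup exceeded the CPU count for that run.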

I have switched to MPICH2 on my development system.

fitzroy: IBM Power 575 4.7 GHz

Image:balance-fitzroy.png

#CPUs                    Relative speedup
1                        1
2 (load-balanced)        1.92
4 (load-balanced)        2.32

Parameter file

Large outputs and movie generation were turned off. The single-CPU parameter file is:

8 7 GfsSimulation GfsBox GfsGEdge {} {
  Time { end = 15 }
  Solid (x*x + y*y - 0.0625*0.0625)
  RefineSolid 6
  VariableTracer {} T
  Init {} { U = 1 }
  AdaptVorticity { istep = 1 } { maxlevel = 6 cmax = 1e-2 }
  AdaptGradient { istep = 1 } { maxlevel = 6 cmax = 1e-2 } T
  SourceViscosity 0.00078125
  EventBalance { istep = 1 } 0.1
  OutputTime { istep = 10 } stderr
  OutputTime { istep = 1 } balance
  OutputBalance { istep = 1 } balance
  OutputProjectionStats { istep = 10 } stderr
  OutputTiming { start = end } stderr
  OutputSimulation { start = end } end.gfs
}
GfsBox {
  left = Boundary {
    BcDirichlet U 1
    BcDirichlet T { return y < 0. ? 1. : 0.; }
  }
}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox { right = BoundaryOutflow }
1 2 right
2 3 right
3 4 right
4 5 right
5 6 right
6 7 right
7 8 right
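
As a rough sanity check on the parameter file above, the physical setup can be derived from its constants. This sketch assumes a unit box size and unit fluid density (believed to be the Gerris defaults, but not stated on this page), so that the SourceViscosity value can be read as a kinematic viscosity; the variable names are illustrative only.

```python
# Values read from the parameter file above.
radius = 0.0625          # Solid (x*x + y*y - 0.0625*0.0625)
inflow_velocity = 1.0    # Init {} { U = 1 } and BcDirichlet U 1
viscosity = 0.00078125   # SourceViscosity 0.00078125
max_level = 6            # RefineSolid 6, maxlevel = 6

# Assumptions (not stated in the source): unit box size, unit density.
box_size = 1.0

diameter = 2.0 * radius
reynolds = inflow_velocity * diameter / viscosity

# Finest cell size and the resulting resolution across the cylinder.
finest_cell = box_size / 2**max_level
cells_per_diameter = diameter / finest_cell

print(f"Reynolds number based on diameter: {reynolds:.1f}")
print(f"Finest cell size: {finest_cell}")
print(f"Cells across the cylinder diameter: {cells_per_diameter:.0f}")
```

Under these assumptions the Reynolds number based on the cylinder diameter is about 160, and the cylinder is resolved by 8 cells across its diameter at the finest level.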

See also

  • Parallel benchmark
  • Parallel benchmark on other systems
