Parallel benchmark on multi-core CPUs

This benchmark uses the parallel Bénard–von Kármán Vortex Street example. Several MPI implementations were tested, with and without dynamic load-balancing, on the following system:

  • Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66 GHz, 64-bit
  • Ubuntu 9.10, 64-bit
  • Linux popinet 2.6.31-17-generic #54-Ubuntu SMP Thu Dec 10 17:01:44 UTC 2009 x86_64 GNU/Linux
  • Gerris2D version 2010-01-29

MPI versions:

  • MPICH1: 1.2.7-9.1ubuntu1 (packages mpich-shmem-bin, libmpich-shmem1.0-dev)
  • MPICH2: 1.2-1ubuntu1.1 (packages mpich2, libmpich2-dev, libmpich2-1.2)
  • OpenMPI
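
For reference, a benchmark of this kind is typically set up by partitioning the single-CPU parameter file (listed below) and handing the result to the MPI launcher. A minimal sketch, assuming gerris2D's -p/--partition option splits the domain into 2^LEVEL subdomains (the file name cylinder.gfs is hypothetical; MPICH2 installations use mpiexec rather than mpirun):

 # Partition into 2^2 = 4 subdomains, one per core
 gerris2D -p 2 cylinder.gfs > parallel.gfs
 # Run the partitioned simulation on 4 processes
 mpirun -np 4 gerris2D parallel.gfs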

MPICH1

Image:balance-mpich1.png

  #CPUs               Relative speedup
  1                   1
  2 (load-balanced)   1.33
  4 (load-balanced)   1.97
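
Relative to ideal scaling, this corresponds to a parallel efficiency of 1.33/2 ≈ 67% on two cores and 1.97/4 ≈ 49% on four.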

MPICH2

Image:balance-mpich2.png

Parameter file

Large outputs and movie generation were turned off; the single-CPU parameter file is:

8 7 GfsSimulation GfsBox GfsGEdge {} {
  # Run until t = 15
  Time { end = 15 }
  # Solid cylinder of radius 0.0625 centred at the origin
  Solid (x*x + y*y - 0.0625*0.0625)
  # Refine cells cut by the solid down to level 6
  RefineSolid 6
  # Passive tracer advected by the flow
  VariableTracer {} T
  # Uniform unit horizontal velocity as initial condition
  Init {} { U = 1 }
  # Adapt the mesh every timestep on vorticity and tracer gradient
  AdaptVorticity { istep = 1 } { maxlevel = 6 cmax = 1e-2 }
  AdaptGradient { istep = 1 } { maxlevel = 6 cmax = 1e-2 } T
  # Constant viscosity: Re = U*D/nu = 1*0.125/0.00078125 = 160
  SourceViscosity 0.00078125
  # Dynamic load-balancing every timestep, 10% imbalance tolerance
  EventBalance { istep = 1 } 0.1
  OutputTime { istep = 10 } stderr
  OutputTime { istep = 1 } balance
  OutputBalance { istep = 1 } balance
  OutputProjectionStats { istep = 10 } stderr
  OutputTiming { start = end } stderr
  OutputSimulation { start = end } end.gfs
}
GfsBox {
  left = Boundary {
    # Inflow: unit velocity, tracer injected in the lower half
    BcDirichlet U 1
    BcDirichlet T { return y < 0. ? 1. : 0.; }
  }
}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox { right = BoundaryOutflow }
1 2 right
2 3 right
3 4 right
4 5 right
5 6 right
6 7 right
7 8 right
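
The eight GfsBox entries and the trailing connection list define eight boxes joined left to right into an 8×1 channel, with inflow on the left face of the first box and outflow on the right face of the last. The load-balanced runs rely on the EventBalance line above; the runs without load-balancing presumably use the same file with that line removed.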