Parallel benchmark on multi-core CPUs

This benchmark uses the parallel Bénard–von Kármán Vortex Street example. The problem size is small, which makes good parallel performance difficult to reach.

popinet-new: Intel(R) Core(TM)2 Quad CPU Q9400 @2.66GHz, 64-bits

  • Ubuntu 10.04 LTS 64-bits
  • Linux popinet-new 2.6.32-34-generic #77-Ubuntu SMP Tue Sep 13 19:39:17 UTC 2011 x86_64 GNU/Linux
  • Gerris2D version 2012-11-29

MPI versions:

  • Open MPI 1.4.1-2

Open MPI

Image:balanced-20121129.png

#CPUs              Relative speedup
1                  1
2 (load-balanced)  2.32
4 (load-balanced)  3.60
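
The speedups above can also be read as parallel efficiency, that is, speedup divided by the number of CPUs. The following is a minimal sketch using only the figures from the table; the function and variable names are illustrative and not part of Gerris.

```python
def parallel_efficiency(speedup: float, n_cpus: int) -> float:
    """Return speedup per CPU (1.0 means ideal linear scaling)."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return speedup / n_cpus


# Relative speedups reported above for popinet-new (Open MPI).
popinet_new = {1: 1.0, 2: 2.32, 4: 3.60}

for n_cpus, speedup in popinet_new.items():
    eff = parallel_efficiency(speedup, n_cpus)
    print(f"{n_cpus} CPU(s): speedup {speedup:.2f}, efficiency {eff:.2f}")
```

An efficiency above 1 (as for the 2-CPU run) indicates superlinear speedup; the page does not say what causes it here.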

fitzroy: IBM Power 575 4.7 GHz

Image:balance-20121130.png

#CPUs              Relative speedup
1                  1
2 (load-balanced)  1.98
4 (load-balanced)  3.46

LoadLeveler script

For 4 tasks:

#!/bin/bash
 
#@ job_name = gerris
#@ class = General
#@ job_type = parallel
#@ node = 1
#@ tasks_per_node = 4
#@ task_affinity = core(1)
#@ output = gerris4.out
#@ error = log-4
#@ queue
 
. /home/popinet/.bashrc
 
export PATH=$PATH:/opt/xlcpp/v10.1.0.4/usr/vacpp/bin
poe gerris2D parallel-p2.gfs

Parameter file

Large outputs and movie generation were turned off. The single-CPU parameter file is:

8 7 GfsSimulation GfsBox GfsGEdge {} {
Time { end = 15 }
Solid (x*x + y*y - 0.0625*0.0625)
RefineSolid 6
VariableTracer {} T
Init {} { U = 1 }
AdaptVorticity { istep = 1 } { maxlevel = 6 cmax = 1e-2 }
AdaptGradient { istep = 1 } { maxlevel = 6 cmax = 1e-2 } T
SourceViscosity 0.00078125
EventBalance { istep = 1 } 0.1
OutputTime { istep = 10 } stderr
OutputTime { istep = 1 } balance
OutputBalance { istep = 1 } balance
OutputProjectionStats { istep = 10 } stderr
OutputTiming { start = end } stderr
OutputSimulation { start = end } end.gfs
}
GfsBox {
left = Boundary {
BcDirichlet U 1
BcDirichlet T { return y < 0. ? 1. : 0.; }
}
}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox {}
GfsBox { right = BoundaryOutflow }
1 2 right
2 3 right
3 4 right
4 5 right
5 6 right
6 7 right
7 8 right
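
As a sanity check on the parameters above, the Reynolds number of the flow can be worked out from the cylinder radius, the inflow velocity and the viscosity. This sketch assumes unit density (no density is set in the parameter file), so that the SourceViscosity value acts as a kinematic viscosity. The names are illustrative only.

```python
# Values copied from the parameter file above.
radius = 0.0625           # Solid (x*x + y*y - 0.0625*0.0625)
inflow_velocity = 1.0     # Init {} { U = 1 } and BcDirichlet U 1
viscosity = 0.00078125    # SourceViscosity (assumed kinematic, unit density)


def reynolds_number(velocity: float, diameter: float, kinematic_viscosity: float) -> float:
    """Re = U * D / nu for flow past a cylinder of diameter D."""
    return velocity * diameter / kinematic_viscosity


re = reynolds_number(inflow_velocity, 2 * radius, viscosity)
print(f"Re = {re:.0f}")  # prints Re = 160
```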

Gnuplot file

set term pngcairo
set output 'balanced.png'
set xlabel 'Simulation time'
set ylabel 'Wall-clock time (s)'
set grid
set key top left
plot [0:15]'< grep step: log-1' u 4:10 w l t '1 CPU', \
           '< grep step: log-2' u 4:10 w l t '2 CPUs (load-balanced)', \
           '< grep step: log-4' u 4:10 w l t '4 CPUs (load-balanced)'
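
The gnuplot script reads column 4 (simulation time) and column 10 (wall-clock time) of the `step:` lines in each log. Below is a rough sketch of how a relative speedup could be computed from the same columns, assuming the speedup is the ratio of final wall-clock times. The page does not state how its speedup figures were obtained, and the sample log lines are made-up placeholders, not benchmark output.

```python
def final_wall_clock(lines):
    """Return (simulation time, wall-clock time) from the last 'step:' line.

    Uses whitespace-separated columns 4 and 10, as the gnuplot script does.
    """
    last = None
    for line in lines:
        fields = line.split()
        if "step:" not in line or len(fields) < 10:
            continue
        last = (float(fields[3]), float(fields[9]))
    if last is None:
        raise ValueError("no 'step:' line found")
    return last


def relative_speedup(serial_lines, parallel_lines):
    """Ratio of final wall-clock times: serial run over parallel run."""
    _, t_serial = final_wall_clock(serial_lines)
    _, t_parallel = final_wall_clock(parallel_lines)
    return t_serial / t_parallel


# Hypothetical log lines for illustration only (not measured values).
log_1 = ["step: 100 t: 15.00000000 dt: 1.000000e-02 cpu: 200.0 real: 200.0"]
log_4 = ["step: 100 t: 15.00000000 dt: 1.000000e-02 cpu: 50.0 real: 50.0"]

print(relative_speedup(log_1, log_4))
```

With real runs, the lists would be replaced by the contents of log-1, log-2 and log-4, for example `open("log-4").readlines()`.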
