Parallel benchmark

From Gerris

Revision as of 11:25, 17 November 2010

This benchmark tests the scalability of Gerris by solving a 3D lid-driven cavity problem with different levels of refinement and different numbers of processors.

The scripts used to run these tests are available in this darcs repository. The repository contains three main scripts:

  • install.sh installs Gerris in a local directory. pkg-config, Glib and GTS are installed as well, which means that the only requirements should be a working Bourne shell, GNU make, mpirun, sed, awk and other standard Unix utilities (tar, etc.).
  • stats.sh contains the test routines. The minimum and maximum number of processors should be set in the parallelstats.par file.
  • post.sh does the post-processing. It generates several graphs, which can be found in the cartesian/ and adaptive/ directories.

These scripts will very likely need to be modified to accommodate the queueing system of your cluster.
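Putting the three scripts together, a typical driver might look as follows. This is a hypothetical sketch run from a checkout of the repository; the NPMIN/NPMAX names written into parallelstats.par are assumptions, so check the file shipped with the repository for the actual parameter names.

```shell
#!/bin/sh
# Hypothetical driver for the benchmark scripts, run from a checkout of the
# darcs repository. The NPMIN/NPMAX names are illustrative assumptions:
# consult parallelstats.par in the repository for the real parameter names.

run_benchmark() {
    sh install.sh || return 1   # local build of pkg-config, Glib, GTS and Gerris
    # minimum and maximum number of processors to test (assumed names)
    printf 'NPMIN=1\nNPMAX=64\n' > parallelstats.par
    sh stats.sh || return 1     # run the 3D lid-driven cavity cases
    sh post.sh                  # graphs end up in cartesian/ and adaptive/
}
```

On a managed cluster you would normally wrap the call to run_benchmark in a submission script for your queueing system (PBS, SGE, SLURM, ...) rather than running it interactively on the login node.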


Results for the NIWA Linux cluster "turbine" (new system)

Turbine is a Linux cluster made of 26 quad-core Intel Xeon 2.66 GHz nodes connected by a Gigabit Ethernet network.
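The figures in the results sections plot wall-clock time and CPU usage against problem size and processor count. Speedup and parallel efficiency follow directly from the wall-clock times; a minimal sketch using illustrative timings (not measured turbine results):

```shell
#!/bin/sh
# Speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p computed
# from wall-clock times. The "procs seconds" pairs below are illustrative
# placeholders, not measured results.
awk 'NR == 1 { t1 = $2 }   # first line gives the serial reference time
     { printf "%4d procs  speedup %5.2f  efficiency %4.2f\n",
              $1, t1 / $2, t1 / ($2 * $1) }' <<EOF
1 1000
2 520
4 270
8 145
EOF
```

With these placeholder timings the efficiency is 1.00 on one processor and drops to about 0.86 on eight, the kind of curve the plots below make visible at a glance.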

[Figure: Wall clock time as a function of problem size and number of processors. Regular Cartesian grid.]
[Figure: CPU usage as a function of problem size and number of processors. Regular Cartesian grid.]

Results for the NIWA Linux cluster "turbine" (older system with virtualization)

[Figure: Wall clock time as a function of problem size and number of processors. Regular Cartesian grid.]
[Figure: CPU usage as a function of problem size and number of processors. Regular Cartesian grid.]

Results for an IBM p575/p6 system with 4.7 GHz processors (up to 1024 PEs)

[Figure: Wall clock time as a function of problem size and number of processors. Regular Cartesian grid.]
[Figure: Wall clock time as a function of problem size and number of processors. Adaptive grid (without load-balancing).]

Results for the Babbage cluster of the d'Alembert institute: 16 nodes × 4 six-core AMD Opteron 8431 processors each

[Figure: Wall clock time as a function of problem size and number of processors. Regular Cartesian grid.]