# Gerris in parallel

The principle is relatively simple: each GfsBox can take a `pid` argument which defines the rank of the process on which the solution for this GfsBox will be computed. If you take the "half cylinder" example and do something like:

    4 3 GfsSimulation GfsBox GfsGEdge {} {
      Time { end = 10 }
      Refine 6
      GtsSurfaceFile half-cylinder.gts
      Init {} { U = 1 }
      OutputProjectionStats { step = 0.02 } stderr
      OutputSimulation { step = 1 } simulation-%3.1f
      OutputTiming { start = end } stderr
    }
    GfsBox { pid = 0 left = BoundaryInflowConstant 1 }
    GfsBox { pid = 1 }
    GfsBox { pid = 2 }
    GfsBox { pid = 3 right = BoundaryOutflow }
    1 2 right
    2 3 right
    3 4 right

and run it using

% gerris2D half-cylinder.gfs

it will run on one processor. If you now do

% mpirun -np 4 gerris2D half-cylinder.gfs

it will run on 4 processors with each of the `GfsBoxes` assigned to a different processor. Gerris takes care of the communications necessary at the boundaries between `GfsBoxes` on different processors.
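Before launching `mpirun`, it can be useful to check how the boxes are distributed over the processes. The following is a minimal sketch, not part of Gerris: the helper name `boxes_per_pid` is hypothetical, and it assumes that each `GfsBox` block contains no nested braces, which holds for the example above. Boxes written without a `pid` are reported under `None`.

```python
import re
from collections import Counter

# Matches "GfsBox { ... }" where the braces contain no nested braces
# (an assumption of this sketch; it holds for the example file).
BOX_RE = re.compile(r"GfsBox\s*\{([^{}]*)\}")
PID_RE = re.compile(r"\bpid\s*=\s*(-?\d+)")

def boxes_per_pid(text):
    """Count GfsBox entries per pid; boxes without a pid are keyed by None."""
    counts = Counter()
    for body in BOX_RE.findall(text):
        match = PID_RE.search(body)
        counts[int(match.group(1)) if match else None] += 1
    return dict(counts)

example = """
4 3 GfsSimulation GfsBox GfsGEdge {} {
  Time { end = 10 }
}
GfsBox { pid = 0 left = BoundaryInflowConstant 1 }
GfsBox { pid = 1 }
GfsBox { pid = 2 }
GfsBox { pid = 3 right = BoundaryOutflow }
"""

# One box per process for the half-cylinder example.
print(boxes_per_pid(example))
```

If the number of distinct pids differs from the `-np` value passed to `mpirun`, the file and the command line probably do not match.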

Any Gerris parameter file can be manually "parallelised" as explained above. Gerris also includes command-line options designed to create "parallelised" simulation files. A short description of these options is printed by:

% gerris2D -h

    -s N   --split=N      splits the domain N times and returns the corresponding simulation
    -i     --pid          keep box pids when splitting
    -p N   --partition=N  partition the domain in 2^N subdomains and returns the corresponding simulation
    -b N   --bubble=N     partition the domain in N subdomains and returns the corresponding simulation

The `-s` option splits the domain, which creates more `GfsBoxes`. It takes the existing domain, splits it N times and assigns a new pid to each `GfsBox` (unless the `-i` option is specified).

The `-p` and `-b` options are used to "parallelise" a Gerris file that already contains enough `GfsBoxes` (at least 2^N for the `-p` option and N for the `-b` option). They group the `GfsBoxes` together in order to get 2^N (or N) subdomains. Each of these subdomains is assigned a different pid. The difference between the `-p` and `-b` options is the algorithm used to perform the graph partitioning. The `-b` option uses a simple and fast *bubble partitioning* algorithm which will not necessarily yield well-balanced subdomains. The `-p` option uses a more complex and slower *recursive bisection* algorithm which is optimised to yield well-balanced subdomains.
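The box-count arithmetic behind these options can be sketched as follows. This is an illustrative calculation only, not Gerris code: the function names are hypothetical, and it assumes that one split replaces each box with 2^d boxes in d dimensions (4 in 2D; 8 in 3D follows from the same reasoning).

```python
def boxes_after_split(n_boxes, n_splits, dim=2):
    """Number of boxes after `-s n_splits`, assuming each split turns
    one box into 2**dim boxes."""
    return n_boxes * (2 ** dim) ** n_splits

def subdomains(option, n):
    """Number of subdomains requested by `-p n` (2**n) or `-b n` (n)."""
    if option == "p":
        return 2 ** n
    if option == "b":
        return n
    raise ValueError("option must be 'p' or 'b'")

def has_enough_boxes(n_boxes, option, n):
    """True if the file holds at least one box per requested subdomain."""
    return n_boxes >= subdomains(option, n)

print(boxes_after_split(3, 1))      # 3 boxes split once in 2D -> 12
print(has_enough_boxes(3, "b", 2))  # 3 boxes, 2 subdomains -> True
print(has_enough_boxes(3, "p", 2))  # 3 boxes, 2**2 = 4 subdomains -> False
```

When the check fails, the domain has to be split with `-s` before it can be partitioned.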

### Example: the GfsSimulation is 2D and made of 3 `GfsBoxes`

1- If only one processor is to be used no parallelisation is required. We use the usual command line:

% gerris2D simulation.gfs

2- If 2 processors are to be used, the simulation being made of 3 `GfsBoxes`, 2 solutions can be considered:

- Either the 3 `GfsBoxes` are redistributed into 2 subdomains, necessarily one subdomain of 1 `GfsBox` and one of 2 `GfsBoxes`. This can be done by:

% gerris2D -b 2 simulation.gfs > parallelsimulation.gfs

then the simulation can be started using:

% mpirun -np 2 gerris2D parallelsimulation.gfs

- If we want to get a better balance between the size of the 2 subdomains, it is possible to split the simulation once and then reassemble it.

The 3 `GfsBoxes` can be split once, which creates 3*4 = 12 `GfsBoxes`:

% gerris2D -s 1 simulation.gfs > splitsimulation.gfs

then the 12 `GfsBoxes` simulation can be partitioned into 2 groups of 6 `GfsBoxes`, where the same pid is given to the `GfsBoxes` of the same subdomain:

% gerris2D -b 2 splitsimulation.gfs > parallelsimulation.gfs

The simulation is still started in the same way:

% mpirun -np 2 gerris2D parallelsimulation.gfs

3- If 4 processors are to be used, then the domain has to be split anyway.

The 3 `GfsBoxes` can be split once, which creates 3*4 = 12 `GfsBoxes`:

% gerris2D -s 1 simulation.gfs > splitsimulation.gfs

then the 12 `GfsBoxes` simulation can be partitioned into 4 groups of 3 `GfsBoxes`, where the same pid is given to the `GfsBoxes` of the same subdomain:

% gerris2D -b 4 splitsimulation.gfs > parallelsimulation.gfs

The simulation is still started in the same way:

% mpirun -np 4 gerris2D parallelsimulation.gfs
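The reasoning used in the three cases above (split until the boxes can be shared out evenly) can be sketched as a small calculation. This is not a Gerris function: the name `splits_for_even_partition` and the `max_splits` cut-off are hypothetical, and the sketch only counts boxes, so it assumes all boxes carry a similar number of cells.

```python
def splits_for_even_partition(n_boxes, n_procs, dim=2, max_splits=10):
    """Return (splits, total_boxes) for the smallest number of splits
    after which the box count is a multiple of n_procs, or None if no
    such split level exists up to max_splits."""
    per_split = 2 ** dim  # boxes created from one box by a single split
    for splits in range(max_splits + 1):
        total = n_boxes * per_split ** splits
        if total >= n_procs and total % n_procs == 0:
            return splits, total
    return None

# The 3-box 2D simulation of this example on 1, 2 and 4 processors.
for procs in (1, 2, 4):
    print(procs, splits_for_even_partition(3, procs))
```

For 1 processor no split is needed; for 2 and 4 processors one split (12 boxes) is enough, which matches the `-s 1` used above. A result of `None` (for instance 3 boxes on 5 processors) means that no split level gives equal groups, so some imbalance is unavoidable.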

### Dynamic load-balancing

When adaptive mesh refinement is used, the number of cells of each subdomain will change during the course of the simulation. If the size of the subdomains is not changed, some processors will end up working much harder than others, which will lead to inefficient parallelisation. It is then necessary to "rebalance" the simulation. This is done using the GfsEventBalance object. Note that in this case the quality of the initial partition does not matter much as it will be rebalanced regularly anyway. In this case using the simpler and faster `-b` option to create the initial partition is adequate.
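To see why adaptive refinement degrades a partition that started out balanced, the relative gap between the largest and smallest subdomains can be computed. The sketch below is purely illustrative: the `imbalance` function is hypothetical, the cell counts are made-up numbers, and the metric is not claimed to be the criterion used by GfsEventBalance.

```python
def imbalance(cells_per_pid):
    """Relative gap between the busiest and the least busy process:
    (max - min) / max. 0 means perfectly balanced."""
    largest = max(cells_per_pid)
    smallest = min(cells_per_pid)
    return (largest - smallest) / largest if largest else 0.0

# Hypothetical cell counts for 4 processes. 4096 = 64*64 corresponds to
# one 2D box uniformly refined to level 6; 16384 to level 7.
initial = [4096, 4096, 4096, 4096]
refined = [4096, 16384, 4096, 4096]  # one subdomain refined further

print(imbalance(initial))  # 0.0
print(imbalance(refined))  # 0.75
```

In the second case three of the four processes would be idle for most of each timestep, which is the situation rebalancing is meant to avoid.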