# Many lazy walkers at the beach

You can think at least of two ways to simulate the expansion of populations of individuals in a discrete landscape:

- dispersing each individual separately according to a user-defined probability distribution.
- splitting each population across space according to user-defined migration probabilities.

These two strategies can seem quite similar in theory, but their computational border-effects are quite distinct and are detailed below.

## Individual-based dispersal in small populations

For the biologist, the most intuitive simulation behavior would be that each individual disperse independently. Accordingly, we defined a demographic expansion algorithm where the location of each individual after the dispersal event is sampled in a dispersal location kernel.

### Pros

This is a comfortable simulation model, as it guarantees without computational tricks that the number of individual in a deme is always an integer: simulating the coalescence process is then straightforward.

### Cons

This solution is **computationally costly** whenever the number of individuals
in the landscape is too high. Unfortunately, we typically have little control over this constraint, due to the
ABC random sampling of parameters value.

### Mathematical description

After the reproduction, the children dispersal is done by sampling their destination in a multinomial law, that defines $ \Phi_{x,y}^t $, the number of individuals going from $x$ to $y$ at time $t$:

\[ ( \Phi_{x,y}^{t} )_{y \in X} \sim M(\tilde{N}_{x}^{t},(m_{xy})_y) \]

The term $ (m_{xy})_y $ denotes the parameters of the multinomial law giving for an individual in $x$ its proability to go to $y$. These probabilities are given by the dispersal law with parameter $\theta$:

\[
\begin{array}{cclcl}
m & : & X^2 & \mapsto & R_{+}

& & (x,y) & \mapsto & m^{\theta}(x,y) ~.

\end{array}
\]

After migration, the number of individuals in deme $x$ is defined by the total number of individuals converging to $x$:

\[ N(x,t+1) = \displaystyle \sum_{i\in X} \Phi_{i,x}^{t}~. \]

### Step-by-step implementation

First create a new file `demo.cpp`

to write the simulation code in.

#### Including the required features

First we need to include the files allowing to build a demographic expansion.
This can be for example the `History`

class, that is part of the `demography`

module. So first make sure to include the demography module:

```
#include "quetzal/demography.h"
```

We will also need a couple of things:

- print objects in the terminal
- a random number generator for the simulation
- a simple, standard bernoulli random distribution to representation migration between two demes

To get these features we include some STL features:

```
#include <iostream>
#include <random>
```

So now that we included the required pre-existing code, we can begin to write our own code in the main function of the program:

```
#include "quetzal/demography.h"
#include <iostream>
#include <random>
int main
{
// we will write things here
return 0;
}
```

#### Expliciting the simulation model framework and hypothesis

A feature of Quetzal is that it makes no hypothesis about the space or the time representation. It is quite important to allow users to consider any way to represent space and time dimensions, using for example a geographic coordinates system.

Here we keep things simple so we consider the time to be represented by integer years, and the space to be represented also by integers:

- the number 1 representing the first deme
- the number -1 representing the other deme.

We specify these choices by defining some **type aliases** to keep things clear and
to be able to later change more easily the types if needed:

```
using coord_type = int;
using time_type = unsigned int;
```

We also want to use a standard **random number generator**, the mersenne twister engine:

```
using generator_type = std::mt19937;
```

We choose the individual-base strategy. Again, a **type alias** is welcome here:

```
using quetzal::demography::strategy::individual_based;
```

#### Demographic history initialization

We want to initialize the demographic history with some individuals in deme 1 in year 2018.
We use the `History`

class, that is an implementation of a forward-in-time
demographic simulator defined in the `demography`

module.

Using **template arguments**, we declare that this history has to be
recorded using the `coord_type`

coordinate system and `time_type`

temporal points.
We also need to specify that we need the special simulator version implemented with the `individual_based`

strategy.

```
quetzal::demography::History<coord_type, time_type, individual_based> history(1, 2018, 10);
```

#### Implementing a local growth process

Quetzal allow to represent the number of individuals in deme $x$ at time $t$ by any function of space and time: constant functions, uniform, stochastic or deterministic forms are all possible.

However, most of the time, we would like some kind of temporal dependency. That is, we need to know the number of individuals at the previous generation to be able to define the number of children.

We can retrieve this information using the following code line:

```
auto N = std::cref(history.pop_sizes());
```

The use to `std::cref`

is motivated by the fact we need a **copiable reference**
to the population size history. This little trick allows to capture the history
in a **lambda expression** encoding the growth process without having to copy the whole
historical database.

We can define the number of children as being deterministically twice the number of parents:

```
auto growth = [N](auto& gen, coord_type x, time_type t){ return 2*N(x,t) ; };
```

This little **functor** is very cheap to copy and can be freely passed around the
simulation context.

We could as well have used the random generator argument to define it as a stochastic process modeled by a Poisson distribution with mean parameter $\lambda = 2N(x,t)$:

```
auto growth = [N](auto& gen, coord_type x, time_type t){
std::poisson_distribution<unsigned int> dist(2*N(x,t));
return dist(gen);
};
```

#### Implementing a dispersal location kernel

The `individual_based`

strategy requires a specific type of dispersal function.
Remember that for each individual, a post-dispersal location is sampled in a distribution.

Complex distance-based location dispersal distribution can be constructed from a geographic support using Quetzal helper functions, but for now a very simple dispersal process will be sufficient to highlight the simulator behavior.

Basically we need to construct a function that can be called with three arguments:

- the random number generator for the random part of the process
- a
`coord_type`

argument that gives the place where the individual is*before dispersal* - a
`time_type`

argument giving the time at which the dispersal occurs.

The temporal argument is generally useless for pure distance-based dispersal that are constant in time. However, this argument allows important extensions:

- if the temporal scale is large enough, the dispersal parameters can vary through time.
- if the landscape heterogeneity is assumed to impact dispersal, then one may have to account for temporal variations of the landscape.

Here we define a very simple stochastic dispersal distribution, where the geographic sampling space is only the two considered demes ${-1 , 1}$ and with a 50% chance to disperse to the other deme:

```
auto sample_location = [](auto& gen, coord_type x, time_type t){
std::bernoulli_distribution d(0.5);
if(d(gen)){ x = -x; }
return x;
};
```

#### Expanding the history

We first define the number of non-overlapping generations to simulate, and we also initialize the random number generator:

```
unsigned int nb_generations = 3;
generator_type gen;
```

We can finally expand the history through space and time, and print out the flows historical database for visual check:

```
history.expand(nb_generations, growth, sample_location, gen);
std::cout << "Population flows from x to y at time t:\n\n"
<< history.flows()
<< std::endl;
```

### Complete script and compilation options

#### Compilation options

The complete `demo.cpp`

script is given below.

It compiles with the following terminal command:

```
g++ -Wall -std=c++14 demo.cpp
```

- the option
`-Wall`

enables all warnings - the option
`-std=c++14`

enables the C++14 standard compiler support

Then you can run the program with the following command:

```
./a.out
```

#### Complete script

```
#include "quetzal/demography.h"
#include <iostream>
#include <random>
int main(){
// Here we simulate a population expansion through a 2 demes landscape.
// Use type aliasing for readability
using coord_type = int;
using time_type = unsigned int;
using generator_type = std::mt19937;
// choose the strategy to be used
using quetzal::demography::strategy::individual_based;
// Initialize the history with 10 individuals introduced in deme x=1 at time t=2018
quetzal::demography::History<coord_type, time_type, individual_based> history(1, 2018, 10);
// Get a reference on the population sizes database
auto N = std::cref(history.pop_sizes());
// Capture it with a lambda expression to build a growth function
auto growth = [N](auto& gen, coord_type x, time_type t){ return 2*N(x,t) ; };
// Number of non-overlapping generations for the demographic simulation
unsigned int nb_generations = 3;
// Random number generation
generator_type gen;
// Stochastic dispersal kernel, purposely very simple
// The geographic sampling space is {-1 , 1}, there is 50% chance to migrate
auto sample_location = [](auto& gen, coord_type x, time_type t){
std::bernoulli_distribution d(0.5);
if(d(gen)){ x = -x; }
return x;
};
history.expand(nb_generations, growth, sample_location, gen);
std::cout << "Population flows from x to y at time t:\n\n"
<< history.flows()
<< std::endl;
}
```

The output would be:

```
Population flows from x to y at time t:
time from to flow
2020 1 -1 17
2020 1 1 25
2020 -1 -1 21
2018 1 1 12
2020 -1 1 17
2019 1 -1 12
2019 -1 -1 7
2018 1 -1 8
2019 -1 1 9
2019 1 1 12
```

## Mass-based dispersal for big populations

We can avoid the cost of having to disperse each individual of the landscape by considering a model where population masses would be split through the landscape according to migration probabilities: the model is deterministic in the sense that no random number is generated.

### Pros

This algorithm is more efficient as the computation time is now only related to the number of demes in the landscape. Consequently it allows to perform simulations with high number of individuals in a very reasonable amount of time

### Cons

Defining a strategy that is more efficient for high populations levels comes with the major drawback that it is unsuited whenever populations are too small:

What if 10 individuals are split across 1000 demes ?

What does 0.001 individual mean in terms of coalescence probability ?

It could seem easy to ensure it never happens. Unfortunately, population levels are actually hard to control in an ABC simulation context where parameters are randomly sampled, so we can hardly guarantee that this situation will not arise.

We avoid the problem with a computational little trick: when summing the population flows arriving to a same deme, the least integer greater or equal to the actual flow value is used.

So when dispersing 2 individuals in a 4-demes landscape with equal migration probabilities, the simulation will end up with $\lceil 2 \times 0.25 \rceil = 1$ individual in each demeā¦

That is, dispersing 2 individuals leads to having 4 individuals in the lanscape.

Obviously this behavior is not satisfying if the high-population hypothesis is violated, but when the population size in each deme is of the order of the number of demes, this little excess is expected to have little impact.

### Mathematical description

The children dispersal is done by splitting the population masses according to their migration probabilities, defining $ \Phi_{x,y}^t $, the population flow going from $x$ to $y$ at time $t$:

\(
(\Phi_{x,y}^{t})*{y\in X} = ( \lceil \tilde{N}*{x}^{t} \times m_{xy} \rceil )_{y\in X} ~.
\)

The term $ m_{xy} $ denotes the parameters of the transition kernel, giving for an individual in $x$ its probability to go to $y$. These probabilities are given by the dispersal law with parameter $\theta$:

\(
\begin{array}{cclcl}
m & : & X^2 & \mapsto & R_{+}

& & (x,y) & \mapsto & m^{\theta}(x,y) ~.

\end{array}
\)

After migration, the population size in deme $x$ is defined by the sum of population flows converging to $x$:

\( N(x,t+1) = \displaystyle \sum_{i\in X} \Phi_{i,x}^{t}~. \)

### Implementation

Most of the features are the same than for the individual_based strategy.

The only thing changing is the way to represent a dispersal kernel. For the individual-based strategy, we needed to sample a new location conditionally to the present location.

Now we just need a function that returns $m_{xy}$, the probability to move from $x$ to $y$ at time $t$. Basically, we need a transition matrix, not a random sampler.

Note that we are not forced to use a matrix data structure: it is enough to implement
a **functor** returning the migration rates, that is to define the member function
`operator()(coord_type x, coord_type y, time_type t)`

.

Another important constraint is that we need this transition matrix to give the set of inhabitable demes representing the landscape.

```
struct transition_matrix {
using coord_type = int;
using time_type = unsigned int;
std::vector<coord_type> state_space(time_type t)
{
return {-1, 1};
}
// 1/2 probability to change of location
double operator()(coord_type x, coord_type y, time_type t)
{
return 0.5;
}
};
```

Here we assume that the spatial representation
is constant through time so the temporal argument `time_type t`

is not used in the
body of the `state_space`

function.

It is however an important extension point. Indeed at large temporal scales we expect the landscape to go through some form of modification, for example contraction or expansion of the inhabitable areas due to sea-level rises or glaciation.

In this case we would be interested to use this time argument in the function implementation in order to implement this dynamic.

For the rest it is totally similar with the individual based strategy.

### Complete script and compilation options

#### Compilation options

The complete `demo.cpp`

script is given below.

It compiles with the following terminal command:

```
g++ -Wall -std=c++14 demo.cpp
```

- the option
`-Wall`

enables all warnings - the option
`-std=c++14`

enables the C++14 standard compiler support

Then you can run the program with the following command:

```
./a.out
```

#### Complete script

The complete code is given below. We changed the type of the demes identifiers - that have now names of cities, just to demonstrate that Quetzal functions are not linked to any precise geographical coordinate system.

```
#include "quetzal/demography.h"
#include <iostream>
#include <random>
#include <map>
struct transition_matrix {
using coord_type = std::string;
using time_type = unsigned int;
std::vector<coord_type> state_space(time_type t)
{
return {"Paris", "Ann Arbor"};
}
double operator()(coord_type x, coord_type y, time_type t)
{
return 0.5; // 1/2 probability to change of location
}
};
int main(){
using coord_type = std::string;
using time_type = unsigned int;
using generator_type = std::mt19937;
// Initialize history: 100 individuals introduced at x=1, t=2018
using quetzal::demography::strategy::mass_based;
quetzal::demography::History<coord_type, time_type, mass_based> history("Paris", 2018, 100);
// Growth function
auto N = std::cref(history.pop_sizes());
auto growth = [N](auto& gen, coord_type x, time_type t){ return 2*N(x,t) ; };
// Number of non-overlapping generations for the demographic simulation
unsigned int nb_generations = 3;
// Random number generation
generator_type gen;
transition_matrix M;
history.expand(nb_generations, growth, M, gen);
std::cout << "Population flows from x to y at time t:\n\n" << history.flows() << std::endl;
return 0;
}
```

The expected output is:

```
Population flows from x to y at time t:
time from to flow
2020 Ann Arbor Ann Arbor 20
2020 Paris Paris 20
2018 Paris Ann Arbor 10
2020 Ann Arbor Paris 20
2019 Ann Arbor Ann Arbor 10
2020 Paris Ann Arbor 20
2019 Paris Paris 10
2018 Paris Paris 10
2019 Paris Ann Arbor 10
2019 Ann Arbor Paris 10
```

# Conclusion

You now know how to implement your own little demographic simulator.

If you have very peculiar growth or dispersal model, this tutorial demonstrated that you can inject user-defined implementations into Quetzal components.

However, you will surely not manually hard-code the migration rates between hundreds of demes in a landscape.

You may find in Quetzal a way to automate the process of constructing spatially explicit, distance-based dispersal kernels.

But before to compute probabilities across space, you first need to know more about how to represent a spatially explicit landscape!