
# Sampling scheme under Decrypt

In the current version, sampling schemes are quite simplistic. Each scheme is defined by two clusters of individuals, defining two sampling populations $P_1$ and $P_2$.

• $P_1$, the red circle in the following picture, is fixed for all simulations at a user-defined location $(lon_1, lat_1)$. At sampling time, $n_1$ gene copies are sampled uniformly in a radius $r_1$ around this coordinate.
• $P_2$, the white circle, is a population whose location varies across simulations: it is sampled uniformly in the distribution area. At sampling time, $n_2$ gene copies are sampled uniformly in a radius $r_1$ around this location.
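The uniform sampling around a cluster center described above can be sketched as follows. This is only an illustration, not decrypt's implementation: the function name sample_in_disk is hypothetical, and coordinates are assumed to be planar (for example in kilometres) rather than longitude/latitude.

```python
import math
import random

def sample_in_disk(center_x, center_y, radius, n, rng):
    """Draw n points uniformly from the disk of the given radius around a center.

    Assumption: coordinates are planar (e.g. kilometres), not lon/lat.
    """
    points = []
    for _ in range(n):
        # Taking the square root keeps the density uniform over the disk's
        # area instead of concentrating points near the center.
        r = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        points.append((center_x + r * math.cos(theta),
                       center_y + r * math.sin(theta)))
    return points

rng = random.Random(42)
# 30 sampling locations within 30 units of a cluster center placed at the origin.
p1_sample = sample_in_disk(0.0, 0.0, 30.0, 30, rng)
```

Drawing the radius directly from a uniform distribution would oversample the center, which is why the square root is used.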

# Folder structure

This directory contains various folders:

• decrypt the folder where the project has been installed, along with several example files
• decrypt/output a temporary output folder to store simulation results
• decrypt/examples a folder containing the example files described below

## Example configuration files

The sandbox/decrypt folder contains:

• an examples directory where you can find:
  • australia_precipitation_6032.tif a raster representing the rainfall in North Australia
  • config_1.ctl a configuration file for the model_1 program
  • config_2.ctl a configuration file for the model_2 program
  • config_3.ctl a configuration file for the model_3 program
  • bpp.ctl a configuration file for BPP, not used in this demo
  • data_extract pre-computed output generated by decrypt
  • last_N.tif a raster giving the spatial distribution of population sizes at sampling time when data_extract was generated
• an animate.R Rscript that you can call to generate animations of the simulations.

## Running the spatial process

Run the following in a terminal:

model_1 --config examples/config_1.ctl --landscape examples/australia_precipitation_6032.tif


You should see in the terminal that the demographic history has been simulated, followed by a progress bar that is no longer functional:

--- Expanding demography
--- Simulating coalescents

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
--- Genealogies in Newick format:

((18:296953680,(((19:8,16:8):475,(7:4,8:4):479):156533027,((23:27,25:27):354,(11:268,(27:134,4:134):134):113):156533129):140420170):542315996,((((29:17,2:17):57,(5:9,3:9):65):1407153652,((6:1,9:1,(14:0,15:0):1):834405852,(22:497,((((1:4,0:4):54,10:58):91,(6:1,13:1):148):134,(21:0,17:0):283):214):834405356):572747873):995549076,((((56:53,(48:6,37:6):47):285,(45:2,(41:1,40:1,39:1):1):336):22,(43:240,(58:110,((34:11,36:11):4,(30:14,47:14):1):95):130):120):121,(((26:72,(24:68,28:68):4):377,(((59:2,57:2):309,42:311):66,(31:333,(((46:1,51:1):304,((((41:1,44:1):49,(49:0,50:0):50):3,32:53):110,((38:84,52:84):4,55:88):75):142):9,((53:0,54:0):7,(33:3,35:3):4):307):19):44):72):12,(20:31,12:31):430):20):1892264975):1646902178);



The program writes to a database as many gene trees in Newick format as there are loci, with each node represented by a gene copy ID and each branch length given in generations.

As many Imap files as there are simulations are also written, mapping each gene copy to a sampling cluster, that is, a putative population/species for BPP.
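To make the relation between a gene tree and its Imap file concrete, here is a minimal sketch that extracts the gene copy IDs from a Newick string and pairs each one with a cluster label. The function names, the cluster labels P1 and P2, the assignment of IDs to clusters, and the two-column tab-separated layout are all assumptions for illustration; the files written by the program may differ.

```python
import re

def newick_leaf_ids(newick):
    """Return the leaf labels of a Newick string whose leaves look like 'ID:length'."""
    # A leaf label directly follows '(' or ','; the branch length of an
    # internal node follows ')' and is therefore not captured.
    return re.findall(r"[(,]([^(),:;]+):", newick)

def imap_lines(leaf_ids, cluster_of):
    """Format one 'gene_copy<TAB>cluster' line per leaf (assumed layout)."""
    return [f"{leaf}\t{cluster_of[leaf]}" for leaf in leaf_ids]

tree = "((1:4,0:4):54,10:58);"
leaves = newick_leaf_ids(tree)

# Hypothetical assignment of gene copies to sampling clusters.
cluster_of = {"1": "P1", "0": "P1", "10": "P2"}
lines = imap_lines(leaves, cluster_of)
```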

The program also generates several files in the output directory that give access to various aspects of the demographic process. We will look at the demographic process in more detail in the second part.

## Demographic process

Run the following:

model_1 --config examples/config_1.ctl --landscape examples/australia_precipitation_6032.tif
./animate.R output/N.tif 100


This time we give a different configuration file to spatial_process. Its content should be:

landscape=../decrypt/example/australia_precipitation_6032.tif
n_sim_gen=5
n_loci=5
lat_0=-20.0
lon_0=125.0
N_0=1000
duration=500
lat_1=-20.0
lon_1=125.0
n_sample_1=30
n_sample_2=30
sampling_threshold=30
suitability_threshold=26.4
K_max=50
p=0.2
K_min_a=15
K_min_b=50
r=1
emigrant_rate=0.5
friction_suitable=0.4
friction_unsuitable=0.6
demography_out=output/N.tif
last_layer_out=output/last_N.tif
distribution_area_out=output/distribution_area.shp
sample_out=output/sample.shp
database=output/test.db


At the bottom of this page, there is an equivalent file with comments documenting the meaning of the parameters. Here we are mostly interested in changing the options from suitability_threshold to friction_unsuitable, which configure the demographic process.
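If you want to inspect or modify these options from a script, the key=value layout is easy to read. The snippet below is a minimal sketch and not part of decrypt: the function name parse_ctl is hypothetical, and treating # as a comment marker is an assumption about the commented version of the file.

```python
def parse_ctl(text):
    """Parse 'key=value' lines into a dict of strings.

    Assumption: blank lines are ignored and '#' starts a comment.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        options[key.strip()] = value.strip()
    return options

example = """
suitability_threshold=26.4
K_max=50
p=0.2
"""
options = parse_ctl(example)
# Values stay strings until converted; these three options are numeric.
demographic = {key: float(value) for key, value in options.items()}
```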

### Demographic model

In its present version, the demographic model considers that the landscape is divided into suitable and unsuitable areas. Suitable areas are locations where the value of the landscape is greater than a threshold. Suitable areas are characterized by a higher carrying capacity $K_{max}$ and facilitated migration. Unsuitable areas have a low carrying capacity $K_{min}^a$ most of the time, but with probability $p$ a location can switch to a higher $K_{min}^b$.

We could surely develop a simpler alternative model, but in its current state it can simulate interesting patterns of population persistence in unsuitable areas.

The growth rate is assumed constant across the landscape, as well as the emigrant rate.
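The carrying capacity rule described in this section can be sketched as follows. This is an illustration under stated assumptions, not the program's code: the function name is hypothetical, and the draw for unsuitable locations is assumed to be independent for each location each time the function is called.

```python
import random

def carrying_capacity(value, threshold, k_max, k_min_a, k_min_b, p, rng):
    """Carrying capacity of one location given its landscape value."""
    if value > threshold:
        # Suitable location: high carrying capacity.
        return k_max
    # Unsuitable location: K_min^b with probability p, otherwise K_min^a.
    return k_min_b if rng.random() < p else k_min_a

rng = random.Random(1)
# Values taken from the configuration file shown earlier.
params = dict(threshold=26.4, k_max=50, k_min_a=15, k_min_b=50, p=0.2)
k_wet = carrying_capacity(40.0, rng=rng, **params)
k_dry = [carrying_capacity(10.0, rng=rng, **params) for _ in range(1000)]
```

With these values, roughly a fifth of the draws for an unsuitable location should return 50 and the rest 15.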

This configuration generates the following demographic history.

You may need Google Chrome to be able to see this movie.

Change the option values to generate different histories. The lower $p$ is, the more the demographic expansion is constrained to suitable areas.

## Visualize pre-computed results

### Sampling scheme

In the spatial process configuration file, we limited the number of simulations to 5 sampling schemes, each one composed of:

• 1 sampling cluster fixed on a given coordinate
• 1 sampling cluster that varies uniformly across the distribution area

Within a radius of 30 km of each of these coordinates, 30 individuals are sampled uniformly. These parameters can be changed in the spatial_process.ctl configuration file.

First, the R script generates a plot, sampling_scheme.png, representing the fixed sampling cluster in red and the 5 varying clusters with their respective radii in black, on top of the spatial distribution of population sizes at sampling time, in color.

We find this kind of plot useful for configuring the sampling scheme properties. The R script also generates visualizations of the BPP robustness analysis.

### Posterior probability

To visualize the combined effects of departures from the MSC model hypothesis and sampling scheme, you can either look at the raw posterior probabilities, or perform a spatial interpolation of this probability.

The script generates a plot, raw_posterior_probability.png, showing at the location of each population $P_2$ the posterior probability that BPP detects more than 1 species.

A spatial interpolation of these probabilities is also generated by the R script and saved as interpolation.png.

This plot gives an interesting overview of what we should expect BPP to infer under a spatial history.
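The text does not say which interpolation method the R script uses. For readers who want to experiment with the raw probabilities, the sketch below uses inverse distance weighting, chosen here only because it is simple; it is not necessarily what produces interpolation.png. The function name, the planar coordinates, and the two observations are made up for illustration.

```python
import math

def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (px, py, value) triples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            # Exactly on an observation: return its value unchanged.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Made-up posterior probabilities at two P_2 locations (planar coordinates).
observations = [(0.0, 0.0, 0.9), (10.0, 0.0, 0.1)]
midpoint = idw(observations, 5.0, 0.0)
```

At a point equidistant from both observations the estimate is their plain average, and it moves toward the nearer observation elsewhere.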