# Demographic models¶

By default, msprime assumes a single randomly mating population of a fixed size for ancestry simulations, which is unrealistic for most purposes. To enable more realistic and complex simulations, msprime models population structure by defining a set of discrete Populations, with Migration between these populations occurring at different rates. Populations and their parameters can change over time, via demographic events. Please see the Definitions section for mathematical details.

The information required to define a demographic model is encapsulated by the Demography object. To run ancestry simulations for a given demography we use the demography parameter to sim_ancestry() and define what populations our samples are drawn from. For example, here we create a three-population model in which populations A and B split from C 1000 generations ago. We then run an ancestry simulation based on this demography; here we draw one diploid sample each from A and B at time zero:

demography = msprime.Demography()
ts = msprime.sim_ancestry(samples={"A": 1, "B": 1}, demography=demography, random_seed=12)
ts

Tree Sequence
Trees1
Sequence Length1.0
Sample Nodes4
Total Size2.3 KiB
Edges 6 172 Bytes
Individuals 2 44 Bytes
Migrations 0 4 Bytes
Mutations 0 8 Bytes
Nodes 7 172 Bytes
Populations 3 278 Bytes
Provenances 1 1.6 KiB
Sites 0 8 Bytes

Please see the Example models section for more detailed examples of models from the literature.

## Quick reference¶

Demography

Description of a demographic model

Building models

Demography.add_population()

Add a population to the model

Demography.set_symmetric_migration_rate()

Sets the symmetric migration rates among a set of populations

Demography.set_migration_rate()

Sets the migration rate from one population to another

Demography.add_population_parameters_change()

Change population size and growth rate values at some time

Demography.add_migration_rate_change()

Change the migration rate from one population to another at some time

Demography.add_symmetric_migration_rate_change()

Change the migration rate among a set of populations at some time

Demography.add_population_split()

One or more derived populations split from an ancestral population at a given time

Demography.add_admixture()

A derived population where the given fractions of the individuals are migrants from the corresponding ancestral populations.

Predefined theoretical models

Demography.isolated_model()

A model of isolated populations

Demography.island_model()

An interconnected web of populations

Demography.stepping_stone_model()

Simple spatial model

Constructing models from existing definitions

Demography.from_species_tree()

Parse a newick species tree

Demography.from_starbeast()

Parse StarBeast output

Demography.from_old_style()

Demography from msprime legacy 0.x inputs

Demography.from_demes()

Convert a Demes demographic model to an msprime Demography

Demography.to_demes()

Convert an msprime Demography to a Demes demographic model

Debugging

Demography.debug()

Get a Debugger for a demography

DemographyDebugger

Debugger for demographic models

Numerical methods

DemographyDebugger.coalescence_rate_trajectory()

Compute mean coalescence rate and fraction of uncoalesced lineages

DemographyDebugger.mean_coalescence_time()

Compute the mean time until coalescence

DemographyDebugger.lineage_probabilities()

Probability of the location of lineages over time

## Model¶

In this section we define the model used to describe demography in msprime, as encapsulated by a Demography object.

### Direction of time¶

Msprime simulates retrospective population models like the coalescent in which we begin with a sample of lineages at the present time and then simulate the historical events that affect these lineages. The process moves backwards in time, and time is measured in units of generations ago. Everything in msprime is considered from this backwards in time perspective of the simulation. So here, and throughout the documentation, when we say something happens “before” or “after” a particular event this is in terms of the simulation itself, and therefore usually the opposite of what you would mean if thinking about things forwards in time.

This is initially very confusing, but getting used to thinking “backwards in time” is essential for coalescent modelling and working with ancestral histories in general, so it’s worth the effort of getting used to it. For example, understanding how different patterns of lineages moving from one population to another backwards in time corresponds to population splits and admixture events is a great way to deepen your understanding of the coalescent in general.

Mostly it doesn’t matter, and it’s just a point of clarification for reading this documentation and understanding why the models we define are usually expressed backwards in time. However, it is essential to be remember that migration is described from the perspective of the coalescent, so that when describing migrations in msprime, the following reversed terminology holds:

destination (or dest)

The population containing the immediate ancestors of the migrant(s). Forwards in time, this is where migrants come from. In reverse time, a lineage that is traced backwards in time (i.e. from the current day into the past) will appear to jump into this ancestral destination at the precise time of the migration event.

source

The population containing the immediate descendants of the migrants. Forwards in time, this is where migrants end up. In reverse time, a lineage traced backwards from the current day will appear to jump from this population into the ancestral (“destination”) population.

We try to flag up the meaning of these definitions in the documentation, where it’s important.

### Definitions¶

Formally, population structure in msprime is modelled by specifying a fixed number of subpopulations $$d$$. Each population has an initial absolute population size $$s$$ and a per generation exponential growth rate $$\alpha$$. The size of a given population at time $$t$$ in the past (measured in generations) is therefore given by $$s e^{-\alpha t}$$. Demographic events that occur in the history of the simulated population alter some aspect of this population configuration at a particular time in the past.

Continuous migration between populations is modelled by a $$d \times d$$ matrix $$M$$ of per-generation migration rates. The $$(j,k)^{th}$$ entry of $$M$$ is the expected number of migrants moving from population $$k$$ to population $$j$$ per generation, divided by the size of population $$j$$. In terms of the coalescent process, $$M_{j,k}$$ gives the rate at which an ancestral lineage moves from population $$j$$ to population $$k$$, as one follows it back through time. In continuous-time models, when $$M_{j,k}$$ is close to zero, this rate is approximately equivalent to the fraction of population $$j$$ that is replaced each generation by migrants from population $$k$$. In discrete-time models, the equivalence is exact and each row of $$M$$ has the constraint $$\sum_{k \neq j} M_{j,k} \leq 1$$. This differs from the migration matrix one usually uses in population demography: if $$m_{k,j}$$ is the proportion of individuals (in the usual sense; not lineages) in population $$k$$ that move to population $$j$$ per generation, then translating this proportion of population $$k$$ to a proportion of population $$j$$, we have $$M_{j,k} = m_{k,j} \times N_k / N_j$$.

The details of population structure in msprime closely follow the model used in the classical ms program.

### Populations¶

A Demography contains a list of Population objects. A population represents some randomly mating set of individuals that optionally exchange migrants with other populations at some rate.

Note

Msprime version 0.x performed poorly when there were large numbers of populations, even if most of those populations had no lineages present for much of the simulation. Version 1 has a much better support for large numbers of populations, and there is now no performance cost for having inactive populations.

Populations essentially have two purposes:

1. To define the state of the population at the start of the simulation (i.e., the Initial size and the Growth rate)

2. To define Metadata which is associated with the corresponding tskit.Population objects in simulated tree sequences.

Populations additionally have a “state” which determines whether lineages can be present within them or not at a given time, as described in the Life cycle section.

#### Identifying populations¶

When used as part of the Demography, each population has an integer ID (its zero-based index in the populations list) and a name attribute. By default, msprime assigns the name pop_j to the population at index j in a demographic model. These default names can be overridden, and users should give populations more descriptive names when building complex models.

Attention

Population names must be unique within a single Demography and be valid Python identifiers. For example, this means that names like “my_pop_1” is valid, but “my-pop-1” and “my pop 1” are not.

When defining empirical models (such as those in the Example models section) names should be chosen for clarity. We also encourage the use of population split events to explicitly name and describe the different phases of a demographic history rather than use anonymous population size changes (as encouraged by the legacy 0.x API). This makes it easier to interpret the resulting tree sequence files (since detailed metadata about the populations in which coalescences occurred is maintained) and makes it easier to avoid modelling errors.

#### Initial size¶

The initial_size of a population is its size, as a number of individuals, at the start of a simulation (looking backwards in time). In other words, an initial size of n means that the population had n * ploidy chromosomes at the most recent time. If the population’s Growth rate is zero, then the population will have the same size for all time (unless there are some Events that change these parameters).

Note

The initial_size of a population refers to its size at time zero, and is not affected by the simulation start_time.

When using methods like Demography.island_model() to describe simple multipopulation models, the initial_size parameter takes a list of size values which defines both the number of populations and their sizes. For example, here we define two isolated populations (i.e., which have no migration) with sizes 100 and 200:

demography = msprime.Demography.isolated_model([100, 200])
demography

Populations (2)
0pop_0100.000{}
1pop_1200.000{}
Migration matrix (all zero)
Events (0)

When we have multiple populations that are the same size we can use some Python tricks to avoid code duplication. For example, here we create three populations with size 100:

demography = msprime.Demography.isolated_model([100] * 3)
demography

Populations (3)
0pop_0100.000{}
1pop_1100.000{}
2pop_2100.000{}
Migration matrix (all zero)
Events (0)

Warning

We use the Demography.isolated_model() function here as a way to create example demographies. However, it’s important to note that simulations in which we sample from multiple isolated populations will fail with an error because the lineages in question can never coalesce. See the Infinite waiting time section for more details.

#### Growth rate¶

Each population has an exponential growth rate so that the size of a population with initial size $$s$$ and growth rate $$\alpha$$ is $$s e^{-\alpha t}$$ at time $$t$$ generations in the past (see the Definitions section for more details).

Growth rates for functions like Demography.island_model() that construct a Demography are specified in a similar way to Initial size: we provide a list of sizes equal to the number of populations. For example, here we define 2 populations with different population sizes and growth rates:

demography = msprime.Demography.isolated_model([100, 200], growth_rate=[0.01, 0.02])
demography

Populations (2)
0pop_0100.00.010{}
1pop_1200.00.020{}
Migration matrix (all zero)
Events (0)

Note

The DemographyDebugger is a useful tool to help understand how population sizes change over time. See the Debugging tools section for details.

In tskit the Population class largely exists as a container for Metadata. Metadata is important and useful: it lets us associate information about our simulated populations with the output tree sequence, which we can then use when we are analysing the data.

Msprime associates two pieces of mandatory metadata with every population: their name and description. For example,

demography = msprime.Demography.stepping_stone_model([100, 100], migration_rate=0.01)
ts = msprime.sim_ancestry({0: 1, 1: 1}, demography=demography)

[{'description': '', 'name': 'pop_0'}, {'description': '', 'name': 'pop_1'}]


Here we have two populations in the output tree sequence, and the metadata for each population is a dictionary containing the keys name and description. These correspond to the same attributes on the msprime Population class. We don’t have to just use the defaults for these values: we can set them to (more or less) anything we like. For example,

demography = msprime.Demography.stepping_stone_model([100, 100], migration_rate=0.01)
demography[0].name = "awesome_pop"
demography[0].description = "This population is totally awesome"
demography[1].name = "more_awesome_pop"
demography[1].description = "This population is even more awesome"
ts = msprime.sim_ancestry({0: 1, 1: 1}, demography=demography)
for population in ts.populations():

id=0: {'description': 'This population is totally awesome', 'name': 'awesome_pop'}
id=1: {'description': 'This population is even more awesome', 'name': 'more_awesome_pop'}


As well as the default name and description metadata keys we can also associate additional metadata with population objects using the extra_metadata attribute of the msprime Population object. For example,

demography = msprime.Demography.stepping_stone_model([100, 100], migration_rate=0.01)
demography[0].name = "awesome_pop"
demography[1].name = "more_awesome_pop"
ts = msprime.sim_ancestry({0: 1, 1: 1}, demography=demography)
for pop in ts.populations():

0 	 👍 awesome_pop
1 	 🤘 more_awesome_pop


#### Life cycle¶

Each population has a state associated with it during the simulation, which can be one of INACTIVE, ACTIVE or PREVIOUSLY_ACTIVE. Populations can follow one of two state progressions:

INACTIVE → ACTIVE


and

ACTIVE → PREVIOUSLY_ACTIVE


All populations are ACTIVE by default at the start of the simulation (looking backwards in time) except if they are ancestral in a Population split. In this case populations are initially INACTIVE, by default. An ancestral population then transitions from INACTIVEACTIVE when the corresponding population split event occurs.

It is possible to override the default initial state of the population that is ancestral in a population split event using the initially_active boolean flag in the Population class. This is to facilitate “trunk population” models, as shown in this example.

Populations transition from ACTIVEPREVIOUSLY_ACTIVE when they are derived in either population split or admixture events.

Any attempts to move lineages into a population that is not ACTIVE via sampling, continuous or pulse migration will result in an error during the simulation.

#### Default sampling time¶

The default_sampling_time property of a Population defines the default time at which samples will be drawn from a population in sim_ancestry(). See the Specifying samples section for details on how this time is used when running simulations, and how sampling time can also be specified directly, overriding the population default.

The default sampling time can be specified directly using either the Demography.add_population() method, or by setting the default_sampling_time attribute of the Population later. The default value is 0, unless the population is ancestral in a Population split event added using the Demography.add_population_split() method. In this case the default sampling time is set to the time of the event (i.e., when the population first becomes active), unless the initially_active flag has been set.

### Migration¶

Migration is the process of lineages moving from one population to another during the course of the simulation. This either happens through continuous migration, where a rate of migration between each pair of populations is defined, or through Pulse (mass) migration events. In this section we are concerned with continuous migration.

As described in the Definitions section, continuous migration between populations is modelled by a matrix of rates, so that M[j, k] is the rate at which lineages move from population j to population k in the coalescent process, that is, backwards in time (see the Direction of time section for more details). Lineages that move from population j to k backwards in time actually correspond to individuals migrating from population k to j forwards in time.

Note

If you’re confused at this point, don’t worry. Everyone is confused by this.

Let’s take an example to clarify. Suppose we have a two population model in which we have migration at rate 0.1 from source population 0 to dest population 1 and no migration in the other direction. We’ll then take one haploid sample from each population:

demography = msprime.Demography.isolated_model([100, 100])
demography.set_migration_rate(source=0, dest=1, rate=0.1)
ts = msprime.sim_ancestry(
samples={0: 1, 1: 1},
demography=demography,
ploidy=1,
random_seed=12345)
ts.tables.nodes

01000.00000000b''
11110.00000000b''
201-164.57256316b''

Because we have only two samples and no recombination we have only one coalescent event in the simulation, and therefore only one ancestral node. We are interested in what populations nodes are associated with, which is shown in the population column of the node table. We can see that our samples, nodes 0 and 1, are associated with populations 0 and 1, respectively. Node 2, which is the ancestor of nodes 0 and 1, is from population 1 which means that the lineage for node 0 moved from (source) population 0 to (dest) population 1 as we traced its history back through time, so that it could ultimately coalesce with lineage 1 in population 1. However, forwards in time, this must mean that one of the individuals along node 0’s lineage (each lineage is a succession of individuals, passing on genetic information through the generations) must have migrated from population 1 to population 0.

Note

If you’re still confused, don’t worry, it’s still OK. Just remember that migration rates are confusing and come back to check the documentation whenever you need to work with them.

Fortunately, asymmetric migration rates are rare in practice and we can instead use the Demography.set_symmetric_migration_rate() and Demography.add_symmetric_migration_rate_change() methods to set the migration rate in both directions. This has the advantage of reducing duplication in our code, as well as meaning that we don’t need to remember what source and dest mean. See the Example models section for examples of setting symmetric migration rates.

Todo

Clarify that demes does things the other way around.

### Events¶

Setting the population parameters and migration matrix in the Demography object define the state of demographic model at the start of the simulation (i.e., the present). We are often interested in population models in which these parameters change over time; these are implemented through a set of demographic events, which make some instantaneous changes to the state of the simulation.

The DemographyDebugger is very useful for inspecting the state of the simulation at each of the epochs defined by these events—see the Debugging tools for examples.

#### Population parameters change¶

The Demography.add_population_parameters_change() method adds an event to change the initial_size or growth_rate of a population (or populations) at a given time. For example, here we create a two-population model and add two events that change the parameters of these populations over time. The first event changes the size of population A to 200, 10 generations in the past. The second event changes the size of both populations to 10 after 20 generations using the population=None shorthand.

demography = msprime.Demography()
demography.debug()

Epoch[0]: [0, 10) generations
Populations (total=2 active=2)
startendgrowth_rateAB
A 100.0 100.0 000
B 100.0 100.0 000
Events @ generation 10
timetypeparameterseffect
10Population parameter changepopulation=A, initial_size=200initial_size → 2e+02 for population A
Epoch[1]: [10, 20) generations
Populations (total=2 active=2)
startendgrowth_rateAB
A 200.0 200.0 000
B 100.0 100.0 000
Events @ generation 20
timetypeparameterseffect
20Population parameter changepopulation=-1, initial_size=10initial_size → 10 for all populations
Epoch[2]: [20, inf) generations
Populations (total=2 active=2)
startendgrowth_rateAB
A 10.0 10.0 000
B 10.0 10.0 000

#### Migration rate change¶

The Demography.add_symmetric_migration_rate_change() and Demography.add_migration_rate_change() methods are used to change the state of the migration matrix at a given time. Here we create a two population model with no migration initially, and then add symmetric migration between the two populations at rate 0.1 from generation 10.

demography = msprime.Demography()
demography.debug()

Epoch[0]: [0, 10) generations
Populations (total=2 active=2)
startendgrowth_rateAB
A 100.0 100.0 000
B 100.0 100.0 000
Events @ generation 10
timetypeparameterseffect
10Symmetric migration rate changepopulations=[A, B], rate=0.1Sets the symmetric migration rate between A and B to 0.1 per generation
Epoch[1]: [10, inf) generations
Populations (total=2 active=2)
startendgrowth_rateAB
A 100.0 100.0 000.1
B 100.0 100.0 00.10

#### Population split¶

The Demography.add_population_split() method is used to declare that one or more populations derive from a given ancestral population at some time. Consider the following population tree (much like a species tree) in which we have three present-day populations A, B and C, which merge into ancestral populations AB and ABC:

import tskit

tables = tskit.TableCollection(1)
for _ in range(3):
tables.sort()

tree = tables.tree_sequence().first()
SVG(tree.draw_svg(node_labels={0:"A", 1:"B", 2:"C", 3:"AB", 4:"ABC"}))


We can describe this population topology as follows:

demography = msprime.Demography()
demography

Populations (5)
0A100.000{}
1B100.000{}
2C100.000{}
3AB100.0010{}
4ABC100.0020{}
Migration matrix (all zero)
Events (2)
timetypeparameterseffect
10Population Splitderived=[A, B], ancestral=ABMoves all lineages from derived populations 'A' and 'B' to the ancestral 'AB' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
20Population Splitderived=[AB, C], ancestral=ABCMoves all lineages from derived populations 'AB' and 'C' to the ancestral 'ABC' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.

Note

Note that the default_sampling_time value for the ancestral populations is set to the time of the split event.

The debug output for this demography shows how the various populations become active and inactive over time:

demography.debug()

Epoch[0]: [0, 10) generations
Populations (total=5 active=3)
startendgrowth_rateABC
A 100.0 100.0 0000
B 100.0 100.0 0000
C 100.0 100.0 0000
Events @ generation 10
timetypeparameterseffect
10Population Splitderived=[A, B], ancestral=ABMoves all lineages from derived populations 'A' and 'B' to the ancestral 'AB' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
Epoch[1]: [10, 20) generations
Populations (total=5 active=2)
startendgrowth_rateCAB
C 100.0 100.0 000
AB 100.0 100.0 000
Events @ generation 20
timetypeparameterseffect
20Population Splitderived=[AB, C], ancestral=ABCMoves all lineages from derived populations 'AB' and 'C' to the ancestral 'ABC' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
Epoch[2]: [20, inf) generations
Populations (total=5 active=1)
startendgrowth_rate
ABC 100.0 100.0 0

See the population tree example for a Human out-of-Africa model from the literature.

It is also possible to have less tree-like population topologies, in which several populations successively merge into one “trunk” population. For example, here we define three populations, where B and C merge into A 10 and 20 generations ago, respectively:

demography = msprime.Demography()
demography.debug()

Epoch[0]: [0, 10) generations
Populations (total=3 active=3)
startendgrowth_rateABC
A 100.0 100.0 0000
B 100.0 100.0 0000
C 100.0 100.0 0000
Events @ generation 10
timetypeparameterseffect
10Population Splitderived=[B], ancestral=AMoves all lineages from the 'B' derived population to the ancestral 'A' population. Also set 'B' to inactive, and all migration rates to and from the derived population to zero.
Epoch[1]: [10, 20) generations
Populations (total=3 active=2)
startendgrowth_rateAC
A 100.0 100.0 000
C 100.0 100.0 000
Events @ generation 20
timetypeparameterseffect
20Population Splitderived=[C], ancestral=AMoves all lineages from the 'C' derived population to the ancestral 'A' population. Also set 'C' to inactive, and all migration rates to and from the derived population to zero.
Epoch[2]: [20, inf) generations
Populations (total=3 active=1)
startendgrowth_rate
A 100.0 100.0 0

We can see that the B and C populations successively become inactive as lineages move into A.

Important

For trunk populations like A here we must set the initially_active flag if we want to draw samples from it at the present day.

See the trunk population example of this style of model from the literature.

The Demography.add_admixture() method is used to declare that a derived population is admixed from a set of ancestral populations. In the following example we create a demography with 4 populations: A, B and ADMIX are contemporary populations, and ANC is an ancestral root population. We add an admixture event in which ADMIX is derived from A and B 10 generations ago, and A and B then merge into ANC 20 generations ago. The debug output can help us understand what this means:

demography = msprime.Demography()
time=10, derived="ADMIX", ancestral=["A", "B"], proportions=[0.25, 0.75])
demography.debug()

Epoch[0]: [0, 10) generations
Populations (total=4 active=3)
A 100.0 100.0 0000
B 100.0 100.0 0000
Events @ generation 10
timetypeparameterseffect
10Admixturederived=ADMIX ancestral=[A, B] proportions=[0.25, 0.75]Moves all lineages from admixed population 'ADMIX' to ancestral populations. Lineages move to 'A' with proba 0.25; 'B' with proba 0.75. Set 'ADMIX' to inactive, and all migration rates to and from 'ADMIX' to zero.
Epoch[1]: [10, 20) generations
Populations (total=4 active=2)
startendgrowth_rateAB
A 100.0 100.0 000
B 100.0 100.0 000
Events @ generation 20
timetypeparameterseffect
20Population Splitderived=[A, B], ancestral=ANCMoves all lineages from derived populations 'A' and 'B' to the ancestral 'ANC' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
Epoch[2]: [20, inf) generations
Populations (total=4 active=1)
startendgrowth_rate
ANC 100.0 100.0 0

In the first epoch (from 0 to 10 generations ago) we have three active populations: A, B and ADMIX. Then the admixture event happens: backwards in time we take all the lineages that are in ADMIX, and for each one we move it to A with probability 1/4 and to B with probability 3/4. In the next epoch (10 to 20 generations ago), we just have populations A and B active because the derived ADMIX population is now inactive.

What does this mean forwards in time? In this case, rather than thinking about sampled lineages moving from population to population we must think about whole populations and individuals migrating. So forwards in time, 10 generations ago the ADMIX population suddenly appeared and it contained 100 individuals (note the initial size): 25 of these individuals were migrants from A and the other 75 were migrants from B.

Note

The ANC population in this model isn’t important, it’s there to ensure that it’s possible for lineages to coalesce and we don’t get an Infinite waiting time.

See the American Admixture example below.

#### Pulse (mass) migration¶

Important

We do not recommend using mass migration events unless it is really necessary; most of the time, the desired effect can be achieved using Population split and Admixture events. If you are msprime 0.x user who is accustomed to working with mass migrations or updating legacy code, please consider “migrating” to using these new, more specific, events.

The Demography.add_mass_migration() method is used to move a proportion of the lineages currently in one population (the source) to another (the dest).

Warning

There are two things it’s vitally important to realise about mass migrations:

1. Like continuous migration the source and destination populations are from the perspective of the coalescent process, that is backwards in time. So, if a lineage is moved from source population A to dest population B by msprime, this corresponds to an individual moving from B to A, forwards in time. (Yes, this is confusing!)

2. Mass migration events do not alter the migration rates between populations. So, even if all lineages are moved out of a particular population (i.e., if proportion=1) that population may still have lineages migrating into it after the event. This can easily lead to errors when modelling population splits using MassMigration events. This error was present in the documentation for an older version of msprime, which lead to an incorrect version of an example demographic model being used in a number of publications. See this paper for more details.

The effect of mass migration events are summarised in the DemographyDebugger output:

demography = msprime.Demography.island_model([100, 100], migration_rate=0.1)
demography.debug()

Epoch[0]: [0, 10) generations
Populations (total=2 active=2)
startendgrowth_ratepop_0pop_1
pop_0 100.0 100.0 000.1
pop_1 100.0 100.0 00.10
Events @ generation 10
timetypeparameterseffect
10Mass Migrationsource=0, dest=1, proportion=0.5Lineages currently in population 0 move to 1 with probability 0.5 (equivalent to individuals migrating from 1 to 0 forwards in time)
Epoch[1]: [10, inf) generations
Populations (total=2 active=2)
startendgrowth_ratepop_0pop_1
pop_0 100.0 100.0 000.1
pop_1 100.0 100.0 00.10

#### Instantaneous bottleneck¶

A common approach to modelling the effect of demographic history on genealogies is to assume that effective population size ($$N_e$$) changes in discrete steps which define a series of epochs. In this setting of piece-wise constant $$N_e$$, capturing a population bottleneck requires three epochs: $$N_e$$ is reduced by some fraction $$b$$ at the start of the bottleneck, $$T_{start}$$, and recovers to its initial value at time $$T_{end}$$ (Marth et al 2004). If bottlenecks are short both on the timescale of coalescence and mutations, one may expect little information about the duration of a bottleneck $$(T_{end}-T_{start})$$ in sequence data. Thus a simpler, alternative model is to assume that bottlenecks are instantaneous ($$T_{end}-T_{start} \rightarrow 0$$) and generate a sudden burst of coalescence events in the genealogy.

The strength of the bottleneck $$B$$ can be thought of as an (imaginary) time period during which coalescence events are collapsed, i.e. there is no growth in genealogical branches during $$B$$ and the probability that a single pair of lineages entering the bottleneck coalesce during the bottleneck is $$1-e^{-B}$$.

In the following example we create a population of size 100, and then 25 generations ago create a bottleneck equivalent to 200 generations of the coalescent using the add_instantaneous_bottleneck() method:

demography = msprime.Demography()

ts = msprime.sim_ancestry(5, demography=demography, random_seed=1234)
SVG(ts.draw_svg(y_axis=True, size=(300, 200)))


We can see that there is a strong burst of coalescence 25 generations ago.

Important

Note that there is no possibility of recombination, migration or any other events occuring during the bottleneck: it is strictly instantaneous and can only result in coalescences within the trees.

See Bunnefeld et al. (2015) and Galtier et al. (2000) for more details on this model of population bottlenecks.

#### Simple bottleneck¶

In a simple bottleneck each lineage in a given population has a specified probability of coalescing into a single ancestral lineage and a given time. For example, here we create a bottleneck in which lineages have a probability of 0.8 of coalescing at time 1:

demography = msprime.Demography()

ts = msprime.sim_ancestry(4, demography=demography, random_seed=243)
SVG(ts.draw_svg(y_axis=True))


We can see that at time 1 we had 7 extant lineages, and all of these happened to coalesce, leading to a large polytomy at the root.

Note

The Demography.add_simple_bottleneck() method can be useful for creating trees with particular properties, but it is not based on any particular theoretical model and is unlikely to be useful in modelling real populations.

## Demography objects¶

Demography objects provide a number of convenient ways of accessing individual population data. Consider the following three population example:

demography = msprime.Demography()
demography

Populations (3)
0A1.000{}
1B2.000{}
2C3.000{}
Migration matrix (all zero)
Events (0)

Demography objects are essentially a mapping from population identifiers to the corresponding population objects, and behave like a dict with some extra methods. The additional feature provided by the Demography class is that populations can be found either by their integer ID their string names (see the Identifying populations section for more details):

print(demography["B"])
print(demography[1])

Population(initial_size=2, growth_rate=0, name='B', description='', extra_metadata={}, default_sampling_time=None, initially_active=None, id=1)
Population(initial_size=2, growth_rate=0, name='B', description='', extra_metadata={}, default_sampling_time=None, initially_active=None, id=1)


Note that the keys for a Demography are the names:

list(demography.keys())

['A', 'B', 'C']


The values for population objects can be updated in-place. For example, suppose we wanted to change the Growth rate for A to 0.001:

demography["A"].growth_rate = 0.001
demography

Populations (3)
0A1.00.0010{}
1B2.000{}
2C3.000{}
Migration matrix (all zero)
Events (0)

We can also loop over all populations using the standard dict protocols. For example, if we wanted to multiply all population sizes by 100 we could do:

for pop in demography.values():
pop.initial_size *= 100
demography

Populations (3)
0A100.00.0010{}
1B200.000{}
2C300.000{}
Migration matrix (all zero)
Events (0)

## Debugging tools¶

Creating demographic models is a difficult and error prone process and we need all the help we can get to avoid making mistakes. Msprime provides some basic debugging utilities to help understand and inspect the contents of Demography objects, as well as understand how the model parameters change over time.

### Demography objects¶

When working in a Jupyter notebook environment it is very easy to get a summary of the state of a Demography object: using Jupyter magic, we write the variable at the end of a cell and it gets summarised automatically:

# Define a demography
demography = msprime.Demography.island_model([100, 100], migration_rate=0.1)
# Ask Jupyter to print out a summary of it.
demography

Populations (2)
0pop_0100.000{}
1pop_1100.000{}
Migration matrix
pop_0pop_1
pop_000.1
pop_10.10
Events (0)

If you prefer to work in a terminal or other non-notebook environment the same information can be accessed by converting the demography object to a string, e.g., by calling print:

print(demography)

Demography
╟  Populations
║  ┌────────────────────────────────────────────────────────────────────────────────────────────────┐
║  │ id │name   │description  │initial_size  │ growth_rate │  default_sampling_time│extra_metadata  │
║  ├────────────────────────────────────────────────────────────────────────────────────────────────┤
║  │ 0  │pop_0  │             │100.0         │      0      │                      0│{}              │
║  │ 1  │pop_1  │             │100.0         │      0      │                      0│{}              │
║  └────────────────────────────────────────────────────────────────────────────────────────────────┘
╟  Migration Matrix
║  ┌───────────────────────┐
║  │       │ pop_0 │ pop_1 │
║  ├───────────────────────┤
║  │  pop_0│   0   │  0.1  │
║  │  pop_1│  0.1  │   0   │
║  └───────────────────────┘
╟  Events
║  ┌───────────────────────────────────┐
║  │  time│type  │parameters  │effect  │
║  ├───────────────────────────────────┤
║  └───────────────────────────────────┘


Note

This may not render correctly in all browsers because of the lack of support for fixed width fonts.

### DemographyDebugger¶

The DemographyDebugger is a very useful tool for understanding how a model changes over time. For example, let’s consider a simple population tree:

demography = msprime.Demography()
demography.debug()

Epoch[0]: [0, 1e+03) generations
Populations (total=3 active=2)
startendgrowth_rateAB
A 10000.0 10000.0 000
B 5000.0 5000.0 000
Events @ generation 1e+03
timetypeparameterseffect
1000Population Splitderived=[A, B], ancestral=CMoves all lineages from derived populations 'A' and 'B' to the ancestral 'C' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
Epoch[1]: [1e+03, inf) generations
Populations (total=3 active=1)
startendgrowth_rate
C 1000.0 1000.0 0

Here we define a Demography object and then get a visual summary of the evolution of the model over time by calling the Demography.debug() method (which returns a DemographyDebugger instance).

The debugger splits time up in to simulation “epochs” during which all the parameters are the same, and no events happen (as everywhere in msprime, we work backwards in time). In each epoch we give the state of the population parameters and migration rates for all active populations. We also show the events that signal the end of the epoch, which create the next epoch.

Again, the same information is available for non-notebook users by converting the debugger object to string:

print(demography.debug())

DemographyDebugger
╠══════════════════════════════════╗
║ Epoch[0]: [0, 1e+03) generations ║
╠══════════════════════════════════╝
╟    Populations (total=3 active=2)
║    ┌───────────────────────────────────────────────┐
║    │   │     start│       end│growth_rate  │ A │ B │
║    ├───────────────────────────────────────────────┤
║    │  A│   10000.0│   10000.0│ 0           │ 0 │ 0 │
║    │  B│    5000.0│    5000.0│ 0           │ 0 │ 0 │
║    └───────────────────────────────────────────────┘
╟    Events @ generation 1e+03
║    ┌──────────────────────────────────────────────────────────────────────────────┐
║    │  time│type        │parameters       │effect                                  │
║    ├──────────────────────────────────────────────────────────────────────────────┤
║    │  1000│Population  │derived=[A, B],  │Moves all lineages from derived         │
║    │      │Split       │ancestral=C      │populations 'A' and 'B' to the          │
║    │      │            │                 │ancestral 'C' population. Also set the  │
║    │      │            │                 │derived populations to inactive, and    │
║    │      │            │                 │all migration rates to and from the     │
║    │      │            │                 │derived populations to zero.            │
║    └──────────────────────────────────────────────────────────────────────────────┘
╠════════════════════════════════════╗
║ Epoch[1]: [1e+03, inf) generations ║
╠════════════════════════════════════╝
╟    Populations (total=3 active=1)
║    ┌─────────────────────────────────────┐
║    │   │    start│      end│growth_rate  │
║    ├─────────────────────────────────────┤
║    │  C│   1000.0│   1000.0│ 0           │
║    └─────────────────────────────────────┘


The DemographyDebugger has a number of methods to perform calculations based on a given demography. See the Numerical utilities section for details and examples.

## Example models¶

In this section we show how to build demographic models by working through some example models from the literature.

### Population tree¶

The first example we examine is the Gutenkunst et al. out-of-Africa model, in which the history of three present-day populations is modelled by a tree of ancestral populations.

Important

If you want to use this precise model in your analyses we strongly recommend using stdpopsim, which provides a community maintained catalog of species information and demographic models for simulation. The model given here is equivalent to the HomSap/OutOfAfrica_3G09 model.

First we declare some variables and compute some quantities that will come in useful:

import math

# Times are provided in years, so we convert into generations.
generation_time = 25
T_OOA = 21.2e3 / generation_time
T_AMH = 140e3 / generation_time
T_ANC = 220e3 / generation_time
# We need to work out the starting population sizes based on
# the growth rates provided for these two populations
r_CEU = 0.004
r_CHB = 0.0055
N_CEU = 1000 / math.exp(-r_CEU * T_OOA)
N_CHB = 510 / math.exp(-r_CHB * T_OOA)
N_AFR = 12300


The time and other variables are declared here following the DRY principle: when we have to use the same numerical value more than once it’s a very good idea to use a variable to avoid errors. The time conversions are needed because time in msprime is always measured in generations, and the paper reported times in units of years. The size computations are required because the paper reported the European and Asian population sizes at the time they split from the Out-of-Africa population, but we need the present-day sizes.

Once we have computed these quantities we can declare the Demography object representing the model we are building and add some Populations:

demography = msprime.Demography()
name="YRI",
initial_size=N_AFR,
)
name="CEU",
description=(
"Utah Residents (CEPH) with Northern and Western European Ancestry"
),
initial_size=N_CEU,
growth_rate=r_CEU,
)
name="CHB",
description="Han Chinese in Beijing, China",
initial_size=N_CHB,
growth_rate=r_CHB,
)
name="OOA",
description="Bottleneck out-of-Africa population",
initial_size=2100,
)
name="AMH", description="Anatomically modern humans", initial_size=N_AFR
)
name="ANC",
description="Ancestral equilibrium population",
initial_size=7300,
)
demography

Populations (6)
1CEUUtah Residents (CEPH) with Northern and Western European Ancestry29725.30.0040{}
2CHBHan Chinese in Beijing, China54090.30.00550{}
3OOABottleneck out-of-Africa population2100.000{}
4AMHAnatomically modern humans12300.000{}
5ANCAncestral equilibrium population7300.000{}
Migration matrix (all zero)
Events (0)

Here we create a new Demography object and add six populations using the Demography.add_population() method. The first three populations represent present-day populations and the others ancestral populations. For convenience, we define the Initial size of the ancestral populations here, but technically those populations won’t be active until later in the simulation (i.e., further back in time). To check that we have correctly set up the starting states of the populations, we can look at the HTML representation using Jupyter magic.

This sets up our populations, but there is no Migration between any of them, as we can see in the output migration matrix. We can add migration to the model using the Demography.set_symmetric_migration_rate() method (we can also set asymmetric migration rates using Demography.set_migration_rate(), but beware of the confusion around the meaning of source and dest).

# Set the migration rates between extant populations
demography.set_symmetric_migration_rate(["CEU", "CHB"], 9.6e-5)
demography.set_symmetric_migration_rate(["YRI", "CHB"], 1.9e-5)
demography.set_symmetric_migration_rate(["YRI", "CEU"], 3e-5)
demography

Populations (6)
1CEUUtah Residents (CEPH) with Northern and Western European Ancestry29725.30.0040{}
2CHBHan Chinese in Beijing, China54090.30.00550{}
3OOABottleneck out-of-Africa population2100.000{}
4AMHAnatomically modern humans12300.000{}
5ANCAncestral equilibrium population7300.000{}
Migration matrix
YRICEUCHBOOAAMHANC
YRI03e-051.9e-05000
CEU3e-0509.6e-05000
CHB1.9e-059.6e-050000
OOA000000
AMH000000
ANC000000
Events (0)

Now we can see that the migration matrix shows the appropriate rates between the extant populations.

Tip

If you hover your mouse over an entry in the migration matrix it will tell which populations are involved, and how the rate can be interpreted.

The model still does not have any sense of how the modern and ancient populations relate to each other, or how parameters change over time. To do this, we need to declare some Events:

demography.add_population_split(time=T_OOA, derived=["CEU", "CHB"], ancestral="OOA")
time=T_OOA, populations=["YRI", "OOA"], rate=25e-5
)
demography

Populations (6)
1CEUUtah Residents (CEPH) with Northern and Western European Ancestry29725.30.0040{}
2CHBHan Chinese in Beijing, China54090.30.00550{}
3OOABottleneck out-of-Africa population2100.008.5e+02{}
4AMHAnatomically modern humans12300.005.6e+03{}
5ANCAncestral equilibrium population7300.008.8e+03{}
Migration matrix
YRICEUCHBOOAAMHANC
YRI03e-051.9e-05000
CEU3e-0509.6e-05000
CHB1.9e-059.6e-050000
OOA000000
AMH000000
ANC000000
Events (4)
timetypeparameterseffect
848Population Splitderived=[CEU, CHB], ancestral=OOAMoves all lineages from derived populations 'CEU' and 'CHB' to the ancestral 'OOA' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
848Symmetric migration rate changepopulations=[YRI, OOA], rate=0.00025Sets the symmetric migration rate between YRI and OOA to 0.00025 per generation
5600Population Splitderived=[YRI, OOA], ancestral=AMHMoves all lineages from derived populations 'YRI' and 'OOA' to the ancestral 'AMH' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
8800Population Splitderived=[AMH], ancestral=ANCMoves all lineages from the 'AMH' derived population to the ancestral 'ANC' population. Also set 'AMH' to inactive, and all migration rates to and from the derived population to zero.

The first event we added was a Population split using the Demography.add_population_split() method. This has three effects:

1. At the time of the split, all lineages currently in the CEU and CHB populations are moved into ancestral OOA population (remember, this is backwards in time).

2. It marks the OOA population as inactive “before” the event, and sets its sampling time to the time of the population split (as can be seen in the Populations table above).

3. It marks the CEU and CHB populations as previously active “after” the event. Any subsequent attempts to move lineages into these populations will result in an error.

Tip

The effect column in the events output summarises the effect of each event.

The next event is a Migration rate change using the Demography.add_symmetric_migration_rate_change() method, that also happens at the Out-of-Africa split time. This sets the migration rate between the OOA and YRI populations.

Note

Any number of events can occur at the same time in msprime, and they are guaranteed to be executed in the order that they are specified.

The remaining events are then both population splits: 5600 generations ago, the OOA and YRI split from the AMH population, and finally 8800 generation ago the AMH population “split” from the ANC population. This last population split may seem unnecessary, since we could have achieved the same effect by a population size change, but from an interpretation perspective it is much better to explicitly model distinct phases as separate populations.

We can now see the model as a set of populations, their initial state and migration rates and a set of events. To help understand how this model evolves over time the DemographyDebugger is extremely useful. To get a debugger instance for a given demography use the Demography.debug() method. This will then display the state of the demography as a set of “epochs”, where we see the state of the populations and the migration matrix during different time periods:

Note

Only populations that are active during a particular epoch are displayed in the debugger output.

demography.debug()

Epoch[0]: [0, 848) generations
Populations (total=6 active=3)
startendgrowth_rateYRICEUCHB
YRI 12300.0 12300.0 003e-051.9e-05
CEU 29725.3 1000.0 0.0043e-0509.6e-05
CHB 54090.3 510.0 0.00551.9e-059.6e-050
Events @ generation 848
timetypeparameterseffect
848Population Splitderived=[CEU, CHB], ancestral=OOAMoves all lineages from derived populations 'CEU' and 'CHB' to the ancestral 'OOA' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
848Symmetric migration rate changepopulations=[YRI, OOA], rate=0.00025Sets the symmetric migration rate between YRI and OOA to 0.00025 per generation
Epoch[1]: [848, 5.6e+03) generations
Populations (total=6 active=2)
startendgrowth_rateYRIOOA
YRI 12300.0 12300.0 000.00025
OOA 2100.0 2100.0 00.000250
Events @ generation 5.6e+03
timetypeparameterseffect
5600Population Splitderived=[YRI, OOA], ancestral=AMHMoves all lineages from derived populations 'YRI' and 'OOA' to the ancestral 'AMH' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
Epoch[2]: [5.6e+03, 8.8e+03) generations
Populations (total=6 active=1)
startendgrowth_rate
AMH 12300.0 12300.0 0
Events @ generation 8.8e+03
timetypeparameterseffect
8800Population Splitderived=[AMH], ancestral=ANCMoves all lineages from the 'AMH' derived population to the ancestral 'ANC' population. Also set 'AMH' to inactive, and all migration rates to and from the derived population to zero.
Epoch[3]: [8.8e+03, inf) generations
Populations (total=6 active=1)
startendgrowth_rate
ANC 7300.0 7300.0 0

Many models of populations over time cannot be described as a simple tree. In this example we implement the American Admixture model from Browning et al. 2011.

Important

If you want to use this precise model in your analyses we strongly recommend using the stdpopsim HomSap/AmericanAdmixture_4B11 model.

We start again by assigning some values that we use repeatedly to variables, declaring some populations, and setting up the initial migration rates between the extant populations:

T_OOA = 920

demography = msprime.Demography()
name="EUR",
description="European",
initial_size=34039,
growth_rate=0.0038,
)
name="EAS",
description="East Asian",
initial_size=45852,
growth_rate=0.0048,
)
initial_size=54664,
growth_rate=0.05,
)
name="OOA",
description="Bottleneck out-of-Africa",
initial_size=1861,
)
name="AMH", description="Anatomically modern humans", initial_size=14474
)
name="ANC",
description="Ancestral equilibrium",
initial_size=7310,
)
demography.set_symmetric_migration_rate(["AFR", "EUR"], 2.5e-5)
demography.set_symmetric_migration_rate(["AFR", "EAS"], 0.78e-5)
demography.set_symmetric_migration_rate(["EUR", "EAS"], 3.11e-5)

demography

Populations (7)
0AFRAfrican14474.000{}
1EUREuropean34039.00.00380{}
2EASEast Asian45852.00.00480{}
4OOABottleneck out-of-Africa1861.000{}
5AMHAnatomically modern humans14474.000{}
6ANCAncestral equilibrium7310.000{}
Migration matrix
AFR02.5e-057.8e-060000
EUR2.5e-0503.11e-050000
EAS7.8e-063.11e-0500000
OOA0000000
AMH0000000
ANC0000000
Events (0)

Here we set up seven populations, their initial states and the migration rates between them. The next thing we do is add an admixture event 12 generations in the past:

demography.add_admixture(
12,
ancestral=["AFR", "EUR", "EAS"],
proportions=[1 / 6, 2 / 6, 3 / 6],
);


During a simulation, in an admixture event all of the lineages present in the derived population move to one of the ancestral populations with the appropriate probability. Forwards in time, this corresponds to the initial state of the admixed population being composed of these fractions of individuals from the ancestral populations.

We can then add population split events, similar to the out of Africa model. The DemographyDebugger output then shows how the populations change over time:

demography.add_population_split(T_OOA, derived=["EUR", "EAS"], ancestral="OOA")
time=T_OOA, populations=["AFR", "OOA"], rate=15e-5
)
demography.debug()

Epoch[0]: [0, 12) generations
Populations (total=7 active=4)
AFR 14474.0 14474.0 002.5e-057.8e-060
EUR 34039.0 32521.7 0.00382.5e-0503.11e-050
EAS 45852.0 43285.5 0.00487.8e-063.11e-0500
Events @ generation 12
timetypeparameterseffect
12Admixturederived=ADMIX ancestral=[AFR, EUR, EAS] proportions=[0.17, 0.33, 0.50]Moves all lineages from admixed population 'ADMIX' to ancestral populations. Lineages move to 'AFR' with proba 0.167; 'EUR' with proba 0.333; 'EAS' with proba 0.5. Set 'ADMIX' to inactive, and all migration rates to and from 'ADMIX' to zero.
Epoch[1]: [12, 920) generations
Populations (total=7 active=3)
startendgrowth_rateAFREUREAS
AFR 14474.0 14474.0 002.5e-057.8e-06
EUR 32521.7 1032.0 0.00382.5e-0503.11e-05
EAS 43285.5 554.0 0.00487.8e-063.11e-050
Events @ generation 920
timetypeparameterseffect
920Population Splitderived=[EUR, EAS], ancestral=OOAMoves all lineages from derived populations 'EUR' and 'EAS' to the ancestral 'OOA' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
920Symmetric migration rate changepopulations=[AFR, OOA], rate=0.00015Sets the symmetric migration rate between AFR and OOA to 0.00015 per generation
Epoch[2]: [920, 2.04e+03) generations
Populations (total=7 active=2)
startendgrowth_rateAFROOA
AFR 14474.0 14474.0 000.00015
OOA 1861.0 1861.0 00.000150
Events @ generation 2.04e+03
timetypeparameterseffect
2040Population Splitderived=[OOA, AFR], ancestral=AMHMoves all lineages from derived populations 'OOA' and 'AFR' to the ancestral 'AMH' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
Epoch[3]: [2.04e+03, 5.92e+03) generations
Populations (total=7 active=1)
startendgrowth_rate
AMH 14474.0 14474.0 0
Events @ generation 5.92e+03
timetypeparameterseffect
5920Population Splitderived=[AMH], ancestral=ANCMoves all lineages from the 'AMH' derived population to the ancestral 'ANC' population. Also set 'AMH' to inactive, and all migration rates to and from the derived population to zero.
Epoch[4]: [5.92e+03, inf) generations
Populations (total=7 active=1)
startendgrowth_rate
ANC 7310.0 7310.0 0

### Trunk population models¶

For many empirical models we want to sequentially merge derived populations into a “trunk” population. We illustrate this with a slightly different version of the example above, this time defining only four populations, as in this illustration of the model using the demography package (see also Figure 2B of the Gutenkunst et. al paper):

Warning

The version of this model in this documentation from 31 May 2016 to 29 May 2020 (on the stable branch) was incorrect. Specifically, it mistakenly allowed for migration to continue beyond the merger of the African and Eurasian bottleneck populations. This has now been fixed, but if you had copied this model from the tutorial for your own analyses, you should update your model code or use the implementation that has been verified in stdpopsim project. See here for more details on the faulty model and its likely effects on downstream analyses.

We proceed as before, calculating some quantities that we need for the model, and creating a Demography:

# Times are provided in years, so we convert into generations.
generation_time = 25
T_OOA = 21.2e3 / generation_time
T_AMH = 140e3 / generation_time
T_ANC = 220e3 / generation_time
# We need to work out the starting population sizes based on
# the growth rates provided for these two populations
r_CEU = 0.004
r_CHB = 0.0055
N_CEU = 1000 / math.exp(-r_CEU * T_OOA)
N_CHB = 510 / math.exp(-r_CHB * T_OOA)

demography = msprime.Demography()
# This is the "trunk" population that we merge other populations into
name="YRI",
description="Africa",
initial_size=12300,
# NOTE: we have to set this flag if we have a population that's
# ancestral in a population split *and* is contemporary population.
initially_active=True,
);
name="CEU",
description="European",
initial_size=N_CEU,
growth_rate=r_CEU,
)
name="CHB",
description="East Asian",
initial_size=N_CHB,
growth_rate=r_CHB,
)
name="OOA",
description="Bottleneck out-of-Africa population",
initial_size=2100,
)

# Set the migration rates between extant populations
demography.set_symmetric_migration_rate(["CEU", "CHB"], 9.6e-5)
demography.set_symmetric_migration_rate(["YRI", "CHB"], 1.9e-5)
demography.set_symmetric_migration_rate(["YRI", "CEU"], 3e-5)

time=T_OOA, derived=["CEU", "CHB"], ancestral="OOA"
)
time=T_OOA, populations=["YRI", "OOA"], rate=25e-5
)
time=T_AMH, derived=["OOA"], ancestral="YRI"
)
time=T_ANC, population="YRI", initial_size=7300
);


The first population we add is YRI, which represents both contemporary African population as well as ancient populations (which are highly misleading to label as YRI — see the note below). This is the “trunk” population that we merge other populations into over time, which we do via Population split events. Because these events are intended to conveniently model “population trees”, we assume that the ancestral population will be inactive before the split (looking backwards). However, in this case, we will be merging the OOA population in to the YRI, which we wish to also be active at the start of the simulation. To do this, we use the initially_active=True flag to make sure that the YRI population is active, and can therefore represent contemporary Africans as well as ancient populations.

We can see how the YRI population then remains active through all epochs from the debug output:

demography.debug()

Epoch[0]: [0, 848) generations
Populations (total=4 active=3)
startendgrowth_rateYRICEUCHB
YRI 12300.0 12300.0 003e-051.9e-05
CEU 29725.3 1000.0 0.0043e-0509.6e-05
CHB 54090.3 510.0 0.00551.9e-059.6e-050
Events @ generation 848
timetypeparameterseffect
848Population Splitderived=[CEU, CHB], ancestral=OOAMoves all lineages from derived populations 'CEU' and 'CHB' to the ancestral 'OOA' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
848Symmetric migration rate changepopulations=[YRI, OOA], rate=0.00025Sets the symmetric migration rate between YRI and OOA to 0.00025 per generation
Epoch[1]: [848, 5.6e+03) generations
Populations (total=4 active=2)
startendgrowth_rateYRIOOA
YRI 12300.0 12300.0 000.00025
OOA 2100.0 2100.0 00.000250
Events @ generation 5.6e+03
timetypeparameterseffect
5600Population Splitderived=[OOA], ancestral=YRIMoves all lineages from the 'OOA' derived population to the ancestral 'YRI' population. Also set 'OOA' to inactive, and all migration rates to and from the derived population to zero.
Epoch[2]: [5.6e+03, 8.8e+03) generations
Populations (total=4 active=1)
startendgrowth_rate
YRI 12300.0 12300.0 0
Events @ generation 8.8e+03
timetypeparameterseffect
8800Population parameter changepopulation=YRI, initial_size=7300initial_size → 7.3e+03 for population YRI
Epoch[3]: [8.8e+03, inf) generations
Populations (total=4 active=1)
startendgrowth_rate
YRI 7300.0 7300.0 0

Note

We do not recommend using “trunk” populations in this way as it is misleading to think of contemporary populations like the Yoruba being in any way representative of the very ancient populations being modelled in the oldest epochs.

## Importing/exporting model definitions¶

### Demes¶

The Demes format is intended to become a standard for writing demographic models, to facilitate interchange between simulators, as well as between demographic inference software and simulators. Demes models are typically written as a YAML file (see the Demes tutorial for details), and loaded into Python using the demes library. The Demography.from_demes() method converts a demes.Graph object into a Demography object, and the Demography.to_demes() method does the opposite conversion.

Warning

Demes uses similar terminology to msprime, but uses different conventions. In particular, Demes uses forwards-time conventions as opposed to the backwards-time conventions used in msprime

Demes models have time in units of “time ago”, and when using generations as the time units, the times will directly correspond to msprime’s notion of time: time 0 is “now” and time values increase towards the past. However, in contrast to msprime, Demes uses forwards-time conventions. Noteworthy differences between Demes and msprime include:

• Each deme has its own list of epochs, whereas in msprime’s DemographyDebugger, epochs are time intervals that apply across the entire demography.

• Demes uses start_time and end_time to indicate time intervals, but the start_time has a value which is larger (more ancient) than the end_time.

• For migrations and pulses, source indicates the deme in which a migrant is born and the dest indicates the deme in which a migrant has offspring. This is opposite to the convention used in msprime.

Consider the following Demes model that describes two extant demes A and B, which have a common ancestor X.

# Comments start with a hash.
description:
Asymmetric migration between two extant demes.
time_units: generations
defaults:
epoch:
start_size: 5000
demes:
- name: X
epochs:
- end_time: 1000
- name: A
ancestors: [X]
- name: B
ancestors: [X]
epochs:
- start_size: 2000
end_time: 500
- start_size: 400
end_size: 10000
end_time: 0
migrations:
- source: A
dest: B
rate: 1e-4


When visualised using demesdraw, the demographic model looks like this (where arrows indicate movement of individuals as time proceeds from the past to the present):

Let’s load this model using demes.load(), convert it to a Demography object with Demography.from_demes() and then look at the DemographyDebugger output.

import demes

demography = msprime.Demography.from_demes(graph)
demography.debug()

Epoch[0]: [0, 500) generations
Populations (total=3 active=2)
startendgrowth_rateAB
A 5000.0 5000.0 000
B 10000.0 400.0 0.006440.00010
Events @ generation 500
timetypeparameterseffect
500Population parameter changepopulation=B, initial_size=2000, growth_rate=0initial_size → 2e+03 and growth_rate → 0 for population B
Epoch[1]: [500, 1e+03) generations
Populations (total=3 active=2)
startendgrowth_rateAB
A 5000.0 5000.0 000
B 2000.0 2000.0 00.00010
Events @ generation 1e+03
timetypeparameterseffect
1000Population parameter changepopulation=X, initial_size=5000initial_size → 5e+03 for population X
1000Population Splitderived=[A, B], ancestral=XMoves all lineages from derived populations 'A' and 'B' to the ancestral 'X' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
1000Migration rate changesource=B, dest=A, rate=0Backwards-time migration rate from B to A → 0
Epoch[2]: [1e+03, inf) generations
Populations (total=3 active=1)
startendgrowth_rate
X 5000.0 5000.0 0

The demography debugger output shows three epochs. Epoch[0] corresponds to the time interval [0, 500) generations ago, in which deme B has an initial size of 10000 at time 0 and backwards in time, shrinks in size to 400 individuals at time 500. Lineages sampled from deme B migrate into deme A at a rate of 0.0001 per generation. In Epoch[1], [500, 1000) generations ago, both A and B have constant sizes, and migration is the same as in Epoch[0]. At the end of Epoch[1], at generation 1000, lineages in demes A and B are moved into deme X and migrations are turned off. Finally, Epoch[2] shows that only deme X remains.

A Demography object can also be exported to a demes.Graph, using the Demography.to_demes() method. We’ll use this to make a visual representation of a modified Demography.island_model() using demesdraw.

import demesdraw
import matplotlib.pyplot as plt

demography = msprime.Demography.island_model([300] * 3, 1e-5)
graph = msprime.Demography.to_demes(demography)
fig, ax = plt.subplots()  # use plt.rcParams["figure.figsize"]
demesdraw.tubes(graph, ax=ax, seed=1)
plt.show()


The demes.Graph can also be written to a YAML file, e.g. for use with other simulation software.

demes.dump(graph, "/tmp/my-island-model.yaml")


### Species trees¶

Species trees hold information about the sequence and the times at which species diverged from each other, which we model via a series of Population split. The Demography.from_species_tree() method parses a species tree in Newick format and returns a Demography object. These species trees do not contain any information on the sizes of the relevant populations, however, and so these must be specified separately using the initial_size argument (see Initial size). When species trees are estimated with a program like StarBEAST they can contain estimates on the population sizes of extant and ancestral species. The Demography.from_starbeast() method parses species trees estimated with StarBEAST and sets the population sizes accordingly.

Important

When a species tree has branch lengths in units of years or millions of years rather than generations (which is common), a generation time in years must be specified.

Species trees must be encoded in Newick format, with named leaves and branch lengths. Consider the following species tree of four primates, for example:

(((human:5.6,chimpanzee:5.6):3.0,gorilla:8.6):9.4,orangutan:18.0)


When visualized in software like FigTree, the tree looks like this:

The numbers written on branches indicate the lengths of these branches, which in this case is expressed in millions of years. We can then convert this species tree into a Demography using the Demography.from_species_tree():

import msprime

demography = msprime.Demography.from_species_tree(
"(((human:5.6,chimpanzee:5.6):3.0,gorilla:8.6):9.4,orangutan:18.0)",
initial_size=10_000,
time_units="myr",
generation_time=28)
demography.debug()

Epoch[0]: [0, 2e+05) generations
Populations (total=7 active=4)
startendgrowth_ratehumanchimpanzeegorillaorangutan
human 10000.0 10000.0 00000
chimpanzee 10000.0 10000.0 00000
gorilla 10000.0 10000.0 00000
orangutan 10000.0 10000.0 00000
Events @ generation 2e+05
timetypeparameterseffect
2e+05Population Splitderived=[human, chimpanzee], ancestral=pop_4Moves all lineages from derived populations 'human' and 'chimpanzee' to the ancestral 'pop_4' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
Epoch[1]: [2e+05, 3.07e+05) generations
Populations (total=7 active=3)
startendgrowth_rategorillaorangutanpop_4
gorilla 10000.0 10000.0 0000
orangutan 10000.0 10000.0 0000
pop_4 10000.0 10000.0 0000
Events @ generation 3.07e+05
timetypeparameterseffect
3.071e+05Population Splitderived=[pop_4, gorilla], ancestral=pop_5Moves all lineages from derived populations 'pop_4' and 'gorilla' to the ancestral 'pop_5' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
Epoch[2]: [3.07e+05, 6.43e+05) generations
Populations (total=7 active=2)
startendgrowth_rateorangutanpop_5
orangutan 10000.0 10000.0 000
pop_5 10000.0 10000.0 000
Events @ generation 6.43e+05
timetypeparameterseffect
6.429e+05Population Splitderived=[pop_5, orangutan], ancestral=pop_6Moves all lineages from derived populations 'pop_5' and 'orangutan' to the ancestral 'pop_6' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.
Epoch[3]: [6.43e+05, inf) generations
Populations (total=7 active=1)
startendgrowth_rate
pop_6 10000.0 10000.0 0

Because the species tree does not contain complete information about the demographic model, we must provide some extra information. The initial_size parameter lets us specify the size of each of the populations; here we give all the populations a fixed size of 10,000 (there is much more flexibility here, however). Because the branch lengths in the species tree are given in millions of years, we also need provide a time_units parameter and a generation_time.

The epoch boundaries 200000, 307142.9, and 642857.1 correspond to the species divergence times 5.6, 8.6, and 18.0 after converting the branch length units of the species tree from millions of years to generations with the specified generation time of 28 years.

Running the simulation is then straightforward:

ts = msprime.sim_ancestry(
{"human": 2, "orangutan": 2}, demography=demography, random_seed=2)
ts

Tree Sequence
Trees1
Sequence Length1.0
Sample Nodes8
Total Size4.1 KiB
Edges 14 396 Bytes
Individuals 4 76 Bytes
Migrations 0 4 Bytes
Mutations 0 8 Bytes
Nodes 15 364 Bytes
Populations 7 449 Bytes
Provenances 1 2.8 KiB
Sites 0 8 Bytes

Note that the names of the populations are recorded in the population metadata:

for population in ts.populations():

{'description': '', 'name': 'human'}
{'description': '', 'name': 'chimpanzee'}
{'description': '', 'name': 'gorilla'}
{'description': '', 'name': 'orangutan'}
{'description': '', 'name': 'pop_4'}
{'description': '', 'name': 'pop_5'}
{'description': '', 'name': 'pop_6'}


## Numerical utilities¶

### Population sizes over time¶

When working with exponential growth models it is useful to be able to compute the sizes of the populations at given times. The DemographyDebugger.population_size_trajectory() gives a straightforward way of doing this. For example, here we create two exponentially growing populations and show how their sizes change over time:

demography = msprime.Demography()

debug = demography.debug()

t = np.linspace(0, 100, num=20)
S = debug.population_size_trajectory(t)
plt.plot(t, S, label=["A", "B"])
plt.xlabel("Time ago")
plt.ylabel("Population size")
plt.legend();


### Possible locations of lineages¶

When debugging a demographic model it can be helpful to track the possible location of ancestral lineages over time, as determined by the model structure and the initial composition of the sample. For example, here we create a simple model involving a population split, and then use the possible_lineage_locations() method:

demography = msprime.Demography()

debug = demography.debug()
debug.possible_lineage_locations()

{(0.0, 500.0): array([ True,  True, False]),
(500.0, inf): array([False, False,  True])}


The method returns a mapping of time intervals to numpy boolean arrays. The keys are (start, end] intervals during which the possible location of samples are the same, and the values are arrays telling us whether a given population can have lineages present during this epoch or not. By default, possible_lineage_locations() assumes that we draw samples from all populations that have a default sampling time of zero, and so for the initial period between time 0 and 500 we can have samples in population “A” and “B”, but not in “C”. The next time slice, from 500 generations ago onwards, we can only have lineages in population “C” (because after the population split all lineages will be in the ancestral population).

We can also specify that we are interested in sampling from specific populations at particular times. Suppose we take samples from “A” at time 0 and from “B” at time 20:

debug.possible_lineage_locations([
msprime.SampleSet(1, population="A", time=0),
msprime.SampleSet(1, population="B", time=20),
])

{(0.0, 20): array([ True, False, False]),
(20, 500.0): array([ True,  True, False]),
(500.0, inf): array([False, False,  True])}


Now, we can have lineages in “A” from 0 to 20 generations ago, but not in “B”.

Note

The number of samples we supply per population isn’t important (once it’s greater than 0).

The effects of migration are also taken into account by this method. For example, consider a different model in which we have two initially isolated populations, with migration between then introduced 50 generations ago:

demography = msprime.Demography()
time=50, populations=["A", "B"], rate=0.01)

debug = demography.debug()
debug.possible_lineage_locations(["A"])

{(0.0, 50.0): array([ True, False]), (50.0, inf): array([ True,  True])}


Because we only sampled lineages in “A”, we cannot have any lineages in “B” until after the migration between the the two after 50 generations ago.

### Lineage probabilities¶

The lineage_probabilities() method computes the probability that a lineage sampled at a given time in a each population is present in any other population at a set of times in the past. Consider a simple population split model:

demography = msprime.Demography()

debug = demography.debug()
debug.lineage_probabilities([250, 750])

array([[[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]],

[[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.]]])


Here we ask to compute the matrix of lineage probabilities 250 and 750 generations ago, before and after the population split. Before the split, any lineages sampled in a population can only still be in that population so the diagonal of the matrix is 1. After the split, all lineages must be in population “C”, and so the corresponding column is 1.

Migration is taken into account in these calculations. If we add some migration between “A” and “B”, we can see how lineages equilibrate between the two:

demography = msprime.Demography()
demography.set_migration_rate(source="A", dest="B", rate=0.005)
demography.set_migration_rate(source="B", dest="A", rate=0.02)

debug = demography.debug()
T = np.linspace(0, 500, 51)
X = debug.lineage_probabilities(T)

for j, pop in enumerate(demography.populations):
plt.subplot(1, 2, j + 1)
for k, dest_pop in enumerate(demography.populations):
plt.plot(T, X[:, j, k], label=dest_pop.name)
plt.legend()
plt.xlabel("Time ago")
plt.title(f"Sample from {pop.name}")
if j == 0:
plt.ylabel("Probability of being in population");


Since migration is asymmetric, lineages are more likely to be found in population “A”. We’ve set the migration rate for lineages to be higher from “B” to “A”, so lineages tend to spend more time in “A”. Since the two populations have the same size, this corresponds (in forwards time) to a higher flux of migrants from “A” to “B”, as would happen if mean fecundity in “A” is higher than in “B”.

We can also use this to see how lineages spread out over discrete space in a 1D stepping_stone_model():

N = 32
demography = msprime.Demography().stepping_stone_model(
initial_size=[100] * N, migration_rate=0.01)

debug = demography.debug()
T = [10, 100, 1000]
X = debug.lineage_probabilities(T)

for t, x in zip(T, X):
plt.plot(x[N // 2], label=f"{t} generations ago")
plt.legend()
plt.xlabel("Population ID");
plt.ylabel("Probability of being in population");


Here we create a 32 population stepping stone model, and compute the matrix of lineage probabilities at 10, 100 and 1000 generations ago. We then plot the probability of lineages sampled in the middle population being present in the linearly-connected populations. As we can see, 10 generations ago, the probability of lineages being present falls off quickly with distance, but as we go further back into the past the possible location of lineages becomes more evenly distributed in space.

### Coalescence rates and mean times¶

Lineage probabilities tell us where lineages might possibly be, which uses only information about migration rates and population splits. Using information about population sizes, we can also compute the coalescence rates, using DemographyDebugger.coalescence_rate_trajectory(). For instance, here are the three pairwise coalescence rate trajectories in the two-population model with migration above:

demography = msprime.Demography()
demography.set_migration_rate(source="A", dest="B", rate=0.005)
demography.set_migration_rate(source="B", dest="A", rate=0.02)

debug = demography.debug()
T = np.linspace(0, 500, 51)
RAA, _ = debug.coalescence_rate_trajectory(T, {"A": 2})
RBB, _ = debug.coalescence_rate_trajectory(T, {"B": 2})
RAB, _ = debug.coalescence_rate_trajectory(T, {"A": 1, "B": 1})

plt.plot(T, RAA, label="two A samples")
plt.plot(T, RBB, label="two B samples")
plt.plot(T, RAB, label="one of each")
plt.legend()
plt.xlabel("time ago")
plt.ylabel("coalecence rate");


The coalescence rate of lineages sampled from different populations begins at zero, and increases as the chance of them migrating to the same population increases. Interestingly, the coalescence rate of two lineages in population “B” is nonmonotonic, due to the asymmetric migration rates.

Many properties of genomes (e.g., mean genetic diversity) are determined by mean coalescence times, which we can compute with DemographyDebugger.mean_coalescence_time(). For instance, we might want to know how the three pairwise mean coalescence times change as we make the migration rate less asymmetric:

def get_mean_coaltimes(m):
demography.set_migration_rate(source="A", dest="B", rate=m)
debug = demography.debug()
TAA = debug.mean_coalescence_time({"A": 2})
TBB = debug.mean_coalescence_time({"B": 2})
TAB = debug.mean_coalescence_time({"A": 1, "B": 1})
return [TAA, TBB, TAB]

M = np.linspace(0.001, 0.02, 11)
T = np.zeros((len(M), 3))
for j, m in enumerate(M):
T[j, :] = get_mean_coaltimes(m)

plt.plot(M, T[:,0], label="two A samples")
plt.plot(M, T[:,1], label="two B samples")
plt.plot(M, T[:,2], label="one of each")
plt.legend()
plt.xlabel("A→B migration rate")
plt.ylabel("coalecence rate");


From this, we can see that genetic diversity increases as the migration rate increases, that divergence between the populations is always greater than diversity within either, and that diversity within “B” goes from a level close to that found between populations to that within “A”, as the “A→B” migration rate approaches the “B→A” rate.

### Inverse instantaneous coalescence rates¶

Coalescence rates are analogous to effective population sizes, in that if $$r(t)$$ is the coalescence rate at time $$t$$ ago between two lineages from a population in any population model, then a single population of changing size, with $$2 N(t) = 1/r(t)$$ genomes at time $$t$$ ago, has exactly the same distribution of coalescence times. (The factor of two assumes the organisms are diploid, and this assumes a continuous-time model.) One reason this is helpful is because we often fit simplified demographic models to more complex reality, and we might expect models that fit a single population-size-over-time to infer something closer to the inverse coalescence rate. For more discussion, see Chikhi et al., 2018 and Adrion et al., 2020.

For instance, let’s take the Gutenkunst et al 2009 “Out of Africa” model we used above in the Population tree section, and compare inverse coalescence rates to actual population sizes. To follow along, first you should define demography as in the section above. Next we make a grid of times, T, spanning the last 10,000 generations, and calculate coalescence rates at these times:

# first define Gutenkunst et al model as above
debug = demography.debug()

T = np.concatenate([
np.linspace(0, 1000, 2001),
np.linspace(1000, 1e4, 401)[1:]
])

R = np.zeros((len(T), 3))
R[:,0], _ = debug.coalescence_rate_trajectory(T, {"YRI": 2})
R[:,1], _ = debug.coalescence_rate_trajectory(T, {"CEU": 2})
R[:,2], _ = debug.coalescence_rate_trajectory(T, {"CHB": 2})


The time values look a bit weird because when at first we set them to have equal spacing across the whole time period we got a warning about possible numerical inaccuracy (and inaccurate trajectories). This is because DemographyDebugger.coalescence_rate_trajectory() uses the provided time steps to calculate a discrete approximation to the trajectory, and for this to be a good approximation we need to take small time steps when the population is changing in size rapidly. In this model, the rapid growth occurs recently, so we need a finer grid of time steps in the last 1,000 generations.

Next we compute the “census sizes”, and plot these along with the inverse instantaneous coalescence rates. Since DemographyDebugger.population_size_trajectory() gives us population sizes for each of the six populations (three modern and three ancient), we need to do some work to copy the right population sizes over (e.g., prior to the most recent split we should use sizes for the shared “OOA” population for both the non-African populations).

N = debug.population_size_trajectory(T)
# CEU-CHB merge
which_T = (T >= debug.epoch_start_time[1])
for j in (1, 2):
N[which_T, j] = N[which_T, 3]
# OOA-Afr merge
which_T = (T >= debug.epoch_start_time[2])
for j in (0, 1, 2):
N[which_T, j] = N[which_T, 4]
which_T = (T >= debug.epoch_start_time[3])
for j in (0, 1, 2):
N[which_T, j] = N[which_T, 5]

plt.plot(T, 1/(2*R), label=[demography.populations[j].name for j in range(3)])
plt.plot(T, N[:,:3], ":", label=["actual population sizes", None, None])
plt.legend()
plt.xlabel("time ago (generations)")
plt.ylabel("population size");


We can see that the inverse instantaneous coalescence rates do not respond instantly to population size changes, when there’s more than one population with migration between them. (If we had a single population, the inverse coalescence rates would exactly match the population size.) So, if the true history matched this model and we inferred the population-size-through-time using only “CEU” samples, we might expect the inferred history to have a gradual increase back towards the out-of-Africa bottleneck, instead of the actual sudden change at the bottleneck, which is due to subsequent population structure.