Demography

Georgia Tsambos

By default, msprime.sim_ancestry() simulates samples from a single population of a constant size, which isn’t particularly exciting! One of the strengths of msprime is that it can be used to specify quite complicated models of demography and population history with a simple Python API.

import msprime
import numpy as np
from IPython.display import SVG

Population structure

msprime supports simulation from multiple discrete populations, each of which is initialized in msprime.Demography via the msprime.Demography.add_population() method. For each population, you can specify a sample size, an effective population size at time = 0, an exponential growth rate and a name.

Suppose we wanted to simulate three sequences each from two populations with a constant effective population size of 500.

dem = msprime.Demography()
dem.add_population(name="A", description="Plotted in red.", initial_size=500)
dem.add_population(name="B", description="Plotted in blue.",initial_size=500)
dem  
Populations (2)
idnamedescriptioninitial_sizegrowth_ratedefault_sampling_timeextra_metadata
0APlotted in red.500.000{}
1BPlotted in blue.500.000{}
Migration matrix (all zero)
Events (0)

The msprime.Demography object that we’ve just created can be passed to msprime.sim_ancestry() via the demography argument. We’ll also use the samples argument to specify how many sample inividuals we wish to draw from each of these populations. In this case, we’ll simulate three from each. Also, note that we no longer need to specify population_size in msprime.sim_ancestry() as we have provided a separate size for each population.

# ts = msprime.sim_ancestry(
#   samples={"A" : 3, "B" : 3}, 
#   demography=dem,
#   random_seed=12,
#   sequence_length=1000, 
#   recombination_rate=1e-4
#   )

However, this simulation will run forever unless we also specify some migration between the groups! To understand why, recall that msprime is a coalescent-based simulator. The simulation will run backwards-in-time, simulating until all samples have coalesced to a single common ancestor at each genomic location. However, with no migration between our two populations, samples in one population will never coalesce with samples in another population. To fix this, let’s add some migration events to the specific demographic history.

Note

In this default setup, the samples will be drawn from time = 0, but see here if you wish to change this.

Note

Population structure in msprime closely follows the model used in the ms simulator. Unlike ms however, all times and rates are specified in generations and all population sizes are absolute (that is, not multiples of population_size).

Migrations

With msprime, you can specify continual rates of migrations between populations, as well as admixture events, divergences and one-off mass migrations.

Constant migration

constant-migration-diagram

Migration rates between the populations can be thought as the elements of an N by N numpy array, and are passed to our msprime.Demography object individually via the msprime.Demography.set_migration_rate() method. This allows us to specify the expected number of migrants moving from population dest to population source per generation, divided by the size of population source. When this rate is small (close to 0), it is approximately equal to the fraction of population source that consists of new migrants from population dest in each generation. And if both of your populations are exchanging migrants at the same rate, you can save yourself some typing by specifying them with a single msprime.Demography.set_symmetric_migration_rate() call.

Note

The reason for this (perhaps) counter-intuitive specification of source and dest is that msprime simulates backwards-in-time. See this for further explanation.

For instance, the following migration matrix specifies that in each generation, approximately 5% of population 0 consists of migrants from population 1, and approximately 2% of population 1 consists of migrants from population 0.

dem = msprime.Demography()
dem.add_population(name="A", description="Plotted in red.", initial_size=500)
dem.add_population(name="B", description="Plotted in blue.",initial_size=500)

# Set migration rates.
dem.set_migration_rate(source=0, dest=1, rate=0.05)
dem.set_migration_rate(source=1, dest=0, rate=0.02)

# Simulate.
ts = msprime.sim_ancestry(
  samples={"A" : 1, "B" : 1},
  demography=dem,
  sequence_length=1000,
  random_seed=141,
  recombination_rate=1e-7)
ts
Tree Sequence
Trees1
Sequence Length1000.0
Sample Nodes4
Total Size2.1 KiB
MetadataNo Metadata
Table Rows Size Has Metadata
Edges 6 172 Bytes
Individuals 2 44 Bytes
Migrations 0 4 Bytes
Mutations 0 8 Bytes
Nodes 7 172 Bytes
Populations 2 276 Bytes
Provenances 1 1.4 KiB
Sites 0 8 Bytes

One consequence of specifying msprime.Population objects is that each of the simulated nodes will now belong to one of our specified populations:

ts.tables.nodes
idflagspopulationindividualtimemetadata
01000.00000000b''
11000.00000000b''
21110.00000000b''
31110.00000000b''
400-1161.20712736b''
501-1336.10713452b''
601-1590.71203375b''

Notice that the population column of the node table now contains values of 0 and 1. If you are working in a Jupyter notebook, you can draw the tree sequence with nodes coloured by population label using SVG:

colour_map = {0:"red", 1:"blue"}
node_colours = {u.id: colour_map[u.population] for u in ts.nodes()}
for tree in ts.trees():
    print("Tree on interval:", tree.interval)
    # The code below will only work in a Jupyter notebook with SVG output enabled.
    display(SVG(tree.draw(node_colours=node_colours)))
Tree on interval: Interval(left=0.0, right=1000.0)
_images/demography_11_1.svg

More coalescences are happening in population 1 than population 0. This makes sense given that population 1 is specifying more migrants to population 0 than vice versa.

Changing migration rates

We can change any of the migration rates at any time in the simulation. To do this, we just need to use the msprime.Demography.add_migration_rate_change() method on our msprime.Demography object, specifying the populations whose migration rates are to be changed, the time of the change and the new migration rate.

For instance, say we wanted to specify that in each generation prior to time = 100, 1% of population 0 consisted of migrants from population 1.

dem.add_migration_rate_change(time=100, rate=0.01, source=0, dest=1)
dem
Populations (2)
idnamedescriptioninitial_sizegrowth_ratedefault_sampling_timeextra_metadata
0APlotted in red.500.000{}
1BPlotted in blue.500.000{}
Migration matrix
AB
A00.05
B0.020
Events (1)
timetypeparameterseffect
100Migration rate changesource=0, dest=1, rate=0.01Backwards-time migration rate from 0 to 1 → 0.01

The output above shows that we have successfully added our first demographic event to our msprime.Demography object, a migration rate change. We are now ready to simulate:

ts = msprime.sim_ancestry(
  samples={"A" : 1, "B" : 1},
  demography=dem,
  sequence_length=1000,
  random_seed=63461,
  recombination_rate=1e-7)
ts
Tree Sequence
Trees1
Sequence Length1000.0
Sample Nodes4
Total Size2.2 KiB
MetadataNo Metadata
Table Rows Size Has Metadata
Edges 6 172 Bytes
Individuals 2 44 Bytes
Migrations 0 4 Bytes
Mutations 0 8 Bytes
Nodes 7 172 Bytes
Populations 2 276 Bytes
Provenances 1 1.5 KiB
Sites 0 8 Bytes

Admixture and population splits

Admixture

It is also easy to specify admixture and divergence events with msprime. Suppose we wanted to specify our demography so that 50 generations ago, 30% of population 0 was a migrant from population 1:

Todo

make a diagram showing this

We can do this by using the msprime.Demography.add_admixture() method on our demography object. We must supply a list of ancestral populations participating in the admixture, and a list of the same size specifying the proportions of migrants from each of these populations.

dem = msprime.Demography()
dem.add_population(name="AncestralPop0", description="Plotted in red.", initial_size=500, growth_rate=0)
dem.add_population(name="AncestralPop1", description="Plotted in blue.", initial_size=500, growth_rate=0)
dem.add_population(name="AdmixedPop", description="Plotted in green.", initial_size=500)
dem.set_migration_rate(source=0, dest=1, rate=0.05)
dem.set_migration_rate(source=1, dest=0, rate=0.02)

# Specify admixture event.
dem.add_admixture(time=50, derived="AdmixedPop", ancestral=["AncestralPop0", "AncestralPop1"], proportions=[0.3, 0.7]);

This simulates a sample where all nodes (ancestral haplotypes) are from the admixed population up until the time of the admixture event, and before this, all nodes are from one of the ancestral populations.

ts = msprime.sim_ancestry(
  samples={"AncestralPop0" : 0, "AncestralPop1" : 0, "AdmixedPop" : 6},
  demography=dem,
  sequence_length=1000,
  random_seed=63,
  recombination_rate=1e-7)

print("Populations of nodes from time < 50:")
print([u.population for u in ts.nodes() if u.time < 50])
print("Populations of nodes from time >= 50:")
print([u.population for u in ts.nodes() if u.time >= 50])
Populations of nodes from time < 50:
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
Populations of nodes from time >= 50:
[1, 1, 1, 1, 0, 1, 1, 1, 0, 1]

Admixtures and population splits are special types of demographic events that affect the state of some of the defined populations, in addition to moving lineages between populations. The output below shows that by adding the admixture event, we are triggering a change in the state of AdmixedPop at time = 50; the population is active at the start of the simulation, but becomes inactive for all steps of the simulation beyond time 50.

dem
Populations (3)
idnamedescriptioninitial_sizegrowth_ratedefault_sampling_timeextra_metadata
0AncestralPop0Plotted in red.500.000{}
1AncestralPop1Plotted in blue.500.000{}
2AdmixedPopPlotted in green.500.000{}
Migration matrix
AncestralPop0AncestralPop1AdmixedPop
AncestralPop000.050
AncestralPop10.0200
AdmixedPop000
Events (1)
timetypeparameterseffect
50Admixturederived=AdmixedPop ancestral=[AncestralPop0, AncestralPop1] proportions=[0.30, 0.70]Moves all lineages from admixed population 'AdmixedPop' to ancestral populations. Lineages move to 'AncestralPop0' with proba 0.3; 'AncestralPop1' with proba 0.7. Set 'AdmixedPop' to inactive, and all migration rates to and from 'AdmixedPop' to zero.

This means that, for example, adding any demographic events that affect AdmixedPop beyond this time will produce an error:

dem.add_migration_rate_change(time=80, rate=0.01, source="AncestralPop0", dest="AdmixedPop");
# ts = msprime.sim_ancestry(
#   samples={"AncestralPop0" : 0, "AncestralPop1" : 0, "AdmixedPop" : 6},
#   demography=dem,
#   sequence_length=1000,
#   random_seed=63,
#   recombination_rate=1e-7)

To read more about the way that msprime treats the life cycle of populations, see here.

Population splits

We can also simulate population divergences with msprime. Suppose we want to model a situation where all lineages from multiple populations are migrants from a single ancestral population at a single point in time.

Todo

make a diagram showing this

We’ll specify this with the msprime.Demography.add_population_split() method. We need to know the time of the event, and the IDs or labels of the derived and ancestral populations participating in the divergence event. Notice that in this case, we do not need to provide proportions as we did in the case of admixture. This makes sense when you think about the fact that msprime simulates backwards-in-time: all lineages in all of the derived populations originate from the ancestral population in a split event. Any differences in ‘quantities’ of migrants must be modelled by sizes of the derived populations at the time of the split.

dem = msprime.Demography()
dem.add_population(name="A", description="Plotted in red.", initial_size=500)
dem.add_population(name="B", description="Plotted in blue.",initial_size=500)
dem.add_population(name="AncestralPopulation", description="Plotted in green.", initial_size=500)

# Add the population split.
dem.add_population_split(time=100, derived=["A","B"], ancestral="AncestralPopulation")
dem
Populations (3)
idnamedescriptioninitial_sizegrowth_ratedefault_sampling_timeextra_metadata
0APlotted in red.500.000{}
1BPlotted in blue.500.000{}
2AncestralPopulationPlotted in green.500.001e+02{}
Migration matrix (all zero)
Events (1)
timetypeparameterseffect
100Population Splitderived=[A, B], ancestral=AncestralPopulationMoves all lineages from derived populations 'A' and 'B' to the ancestral 'AncestralPopulation' population. Also set the derived populations to inactive, and all migration rates to and from the derived populations to zero.

Population splits will also modify the state of each of the derived populations, changing them from active to inactive at the time of the split.

ts = msprime.sim_ancestry(
  samples={"A" : 3, "B" : 3, "AncestralPopulation" : 0},
  demography=dem,
  sequence_length=1000,
  random_seed=63,
  recombination_rate=1e-7)

print("Populations of nodes from time < 100:")
print([u.population for u in ts.nodes() if u.time < 100])
print("Populations of nodes from time >= 100:")
print([u.population for u in ts.nodes() if u.time >= 100])
Populations of nodes from time < 100:
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0]
Populations of nodes from time >= 100:
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

Mass migrations

If you used any of versions of msprime prior to 1.0, you may be familiar with mass migration events. These are used to specify one-off events in which some fraction of a population moves into another population, and can be supplied to a msprime.Demography object using the msprime.Demography.add_mass_migration() method.

Warning

Unlike msprime.Demography.add_admixture() and msprime.Demography.add_population_split() mentioned above, msprime.Demography.add_mass_migration() does not change any population states or migration rates. We strongly recommend that you use the more specialised methods to model admixture and divergence events – it is both simpler and safer!

You’ll need to provide the time of the mass migration in generations, the ID of the source and destination populations, and a migration proportion.

For example, the following specifies that 50 generations ago, 30% of population 0 was a migrant from population 1.

dem = msprime.Demography()
dem.add_population(name="A", description="Plotted in red.", initial_size=500)
dem.add_population(name="B", description="Plotted in blue.",initial_size=500)
dem.set_migration_rate(source=0, dest=1, rate=0.05)
dem.set_migration_rate(source=1, dest=0, rate=0.02)

# Add the mass migration.
dem.add_mass_migration(time=50, source=0, dest=1, proportion=0.3)
dem
Populations (2)
idnamedescriptioninitial_sizegrowth_ratedefault_sampling_timeextra_metadata
0APlotted in red.500.000{}
1BPlotted in blue.500.000{}
Migration matrix
AB
A00.05
B0.020
Events (1)
timetypeparameterseffect
50Mass Migrationsource=0, dest=1, proportion=0.3Lineages currently in population 0 move to 1 with probability 0.3 (equivalent to individuals migrating from 1 to 0 forwards in time)

Note that these are viewed as backwards-in-time events, so source is the population that receives migrants from dest.

ts = msprime.sim_ancestry(
  samples={"A" : 3, "B" : 3},
  demography=dem,
  sequence_length=1000,
  random_seed=63461,
  recombination_rate=1e-7)

Changing population sizes or growth rates

We may wish to specify changes to rates of population growth, or sudden changes in population size at a particular time. Both of these can be specified by applying the msprime.Demography.add_population_parameters_change() method to our msprime.Demography object.

dem = msprime.Demography()
dem.add_population(name="A", description="Plotted in red.", initial_size=500)
dem.add_population(name="B", description="Plotted in blue.",initial_size=500)
dem.set_migration_rate(source=0, dest=1, rate=0.05)
dem.set_migration_rate(source=1, dest=0, rate=0.02)

# Bottleneck in Population 0 between 50 - 150 generations ago.
dem.add_population_parameters_change(time=50, initial_size=250, population=0)
dem.add_population_parameters_change(time=150, initial_size=500, population=0)

# Exponential growth in Population 1 starting 50 generations ago.
dem.add_population_parameters_change(time=100, growth_rate=0.01, population=1)

# Sort events, since we've added some out of time order.
dem.sort_events()

# Simulate.
ts = msprime.sim_ancestry(samples={"A" : 3, "B" : 3}, demography=dem, sequence_length=1000, random_seed=63461, recombination_rate=1e-7)

Note that because msprime simulates backwards-in-time, parameter changes must be interpreted backwards-in-time as well. For instance, the pop1_growth event in the example above specifies continual growth in the early history of population 1 up until 100 generations in the past.

Census events

There may be situations where you are particularly interested in the chromosomes that are ancestral to your simulated sample at a particular time. For instance, you might want to know how many different lineages are ancestral to your sample at some past time. Perhaps you are interested to know which populations these ancestors belonged to (ie. local ancestry). In both cases, msprime can help you by allowing you to add a census to your simulation at a particular time. This is done with the msprime.Demography.add_census() method:

dem = msprime.Demography()
dem.add_population(name="A", initial_size=500)

# Add a census at time 350.
dem.add_census(time=350)

# Simulate.
ts = msprime.sim_ancestry(
  samples={"A" : 2},
  demography=dem,
  sequence_length=1000,
  random_seed=112,
  recombination_rate=1e-7)

The effect of the census is to add nodes onto each branch of the tree sequence at the census time.

print("IDs of census nodes:")
print([u.id for u in ts.nodes() if u.flags==msprime.NODE_IS_CEN_EVENT])
SVG(ts.draw_svg())
IDs of census nodes:
[5, 6, 7]
_images/demography_37_1.svg

By extracting these node IDs, you can perform further analyses using the ancestral haplotypes. See here for a slightly more involved example of this.

Debugging demography

As we’ve seen, it’s pretty easy to make mistakes when specifying demography!

To help you spot these, msprime provides a debugger that prints out your population history in a more human-readable form. It’s good to get into the habit of running the msprime.DemographyDebugger before running your simulations.

my_history = msprime.DemographyDebugger(demography=dem)
my_history
Epoch[0]: [0, 350) generations
Populations (total=1 active=1)
startendgrowth_rate
A 500.0 500.0 0
Events @ generation 350
timetypeparameterseffect
350CensusInsert census nodes to record the location of all lineages
Epoch[1]: [350, inf) generations
Populations (total=1 active=1)
startendgrowth_rate
A 500.0 500.0 0