Msprime 1.3.0!
Msprime 1.3.0
Msprime 1.3.0 introduces a new flexible way to store information on the ancestors associated with a custom subset of
all possible events that might happen in the history of a sample. This means we can now simulate tree sequences
with varying degrees of detail ranging from the default fully simplified series of local trees
all the way to recording a fully detailed history as currently enabled by the record_full_arg
flag.
This new feature has been implemented for all ancestry models implemented in msprime.
Lastly, this new release enables the simulation of microsatellite repeats. This last feature builds on the existing
general MatrixMutationModel
class to represent the evolution of alleles containing a certain number of repeats.
Brief summary of changes
1. Additional nodes
In msprime we usually want to simulate the coalescent with recombination and represent the output as simply as possible. As a result, we don’t store individual recombination events, but rather their effects on the output tree sequence. We also do not explicitly store common ancestor events that do not result in marginal coalescences. For some purposes, however, we want record information on other events of interest, not just the mimimal representation of its outcome.
The newly implemented additional_nodes
API serves this exact purpose. This allows us to record the nodes
associated with a custom subset of all events we might observe in the history of a sample. Besides sampled
genomes and coalescence events, nodes can now also represent common ancestors, as well as ancestors associated
with recombination, gene conversion, migration, and pass through events.
The code below contains a small working example on how to specify this custom subset of node types. Because a single node can be involved in multiple types of events bitwise operations on the node flags are used to encode the various possibilities.
import msprime
import numpy as np
ts = msprime.sim_ancestry(
samples=10,
recombination_rate=1e-8,
population_size=1e4,
sequence_length=1e5,
random_seed=45,
additional_nodes=(
msprime.NodeType.RECOMBINANT |
msprime.NodeType.COMMON_ANCESTOR
),
coalescing_segments_only=False
)
def count_flags(ts):
counter = dict()
for nodetype in msprime.NodeType:
flags = np.bitwise_and(ts.nodes_flags, nodetype.value)
counter[nodetype.name] = np.sum(flags > 0)
return counter
print(count_flags(ts))
{'RECOMBINANT': 388, 'COMMON_ANCESTOR': 107, 'MIGRANT': 0, 'CENSUS': 0, 'GENE_CONVERSION': 0, 'PASS_THROUGH': 0}
2. Microsat mutation model
We have implemented a number of mutational models that are commonly used to model the evolution of microsatellite repeats. The basic idea here is that the number of copies of a given repeat is tracked, and its evolution over time subject to one of a number of potential biases. The alleles in MicrosatMutationModels represent the copy number of the repeat, not the actual sequences observed.
The example below shows how to simulate microsat repeats under the Stepwise Mutation Model (Ohta and Kimura, 1978).
ts = msprime.sim_ancestry(
5,
random_seed=2,
sequence_length=20,
recombination_rate=0.5,
population_size=100_000
)
model = msprime.SMM(lo=1, hi=10)
mts = msprime.sim_mutations(ts, rate=5e-7, model=model, random_seed=1)
for var in mts.variants():
print("Site at:", var.site.position)
print(" Alleles:", var.alleles)
print(" Genotypes:", var.genotypes)
Site at: 4.0
Alleles: ('4', '5', '6')
Genotypes: [1 1 1 1 1 2 2 1 1 0]
Site at: 8.0
Alleles: ('6', '5')
Genotypes: [0 1 1 1 1 1 1 1 1 0]
Site at: 9.0
Alleles: ('10', '9')
Genotypes: [0 1 0 0 1 0 0 0 0 0]
Site at: 11.0
Alleles: ('7', '8')
Genotypes: [1 1 0 0 0 0 0 0 0 1]
Site at: 13.0
Alleles: ('4', '3')
Genotypes: [1 0 0 0 0 0 1 1 1 0]
Site at: 15.0
Alleles: ('7', '6')
Genotypes: [0 1 1 0 1 0 1 0 1 0]
Site at: 17.0
Alleles: ('6', '5', '7', '8')
Genotypes: [3 3 1 3 1 3 1 3 3 1]
Site at: 19.0
Alleles: ('10', '9')
Genotypes: [0 1 0 0 1 0 1 0 0 0]
Feedback etc
Please post a discussion or an issue on the msprime
GitHub repository
if you have any problems. We hope the update to msprime
version 1.3 is useful in your research.