Msprime 1.3.0!

Msprime 1.3.0

Msprime 1.3.0 introduces a new flexible way to store information on the ancestors associated with a custom subset of all possible events that might happen in the history of a sample. This means we can now simulate tree sequences with varying degrees of detail ranging from the default fully simplified series of local trees all the way to recording a fully detailed history as currently enabled by the record_full_arg flag. This new feature has been implemented for all ancestry models implemented in msprime. Lastly, this new release enables the simulation of microsatellite repeats. This last feature builds on the existing
general MatrixMutationModel class to represent the evolution of alleles containing a certain number of repeats.

Brief summary of changes

1. Additional nodes

In msprime we usually want to simulate the coalescent with recombination and represent the output as simply as possible. As a result, we don’t store individual recombination events, but rather their effects on the output tree sequence. We also do not explicitly store common ancestor events that do not result in marginal coalescences. For some purposes, however, we want record information on other events of interest, not just the mimimal representation of its outcome.

The newly implemented additional_nodes API serves this exact purpose. This allows us to record the nodes associated with a custom subset of all events we might observe in the history of a sample. Besides sampled genomes and coalescence events, nodes can now also represent common ancestors, as well as ancestors associated with recombination, gene conversion, migration, and pass through events.

The code below contains a small working example on how to specify this custom subset of node types. Because a single node can be involved in multiple types of events bitwise operations on the node flags are used to encode the various possibilities.

import msprime
import numpy as np

ts = msprime.sim_ancestry(
    samples=10,
    recombination_rate=1e-8,
    population_size=1e4,
    sequence_length=1e5,
    random_seed=45,
    additional_nodes=(
         msprime.NodeType.RECOMBINANT |
         msprime.NodeType.COMMON_ANCESTOR
         ),
    coalescing_segments_only=False
)

def count_flags(ts):
    counter = dict()
    for nodetype in msprime.NodeType:
        flags = np.bitwise_and(ts.nodes_flags, nodetype.value)
        counter[nodetype.name] = np.sum(flags > 0)

    return counter

print(count_flags(ts))
{'RECOMBINANT': 388, 'COMMON_ANCESTOR': 107, 'MIGRANT': 0, 'CENSUS': 0, 'GENE_CONVERSION': 0, 'PASS_THROUGH': 0}    

2. Microsat mutation model

We have implemented a number of mutational models that are commonly used to model the evolution of microsatellite repeats. The basic idea here is that the number of copies of a given repeat is tracked, and its evolution over time subject to one of a number of potential biases. The alleles in MicrosatMutationModels represent the copy number of the repeat, not the actual sequences observed.

The example below shows how to simulate microsat repeats under the Stepwise Mutation Model (Ohta and Kimura, 1978).

ts = msprime.sim_ancestry(
    5, 
    random_seed=2, 
    sequence_length=20,
    recombination_rate=0.5, 
    population_size=100_000
)
model = msprime.SMM(lo=1, hi=10)
mts = msprime.sim_mutations(ts, rate=5e-7, model=model, random_seed=1)
for var in mts.variants():
    print("Site at:", var.site.position)
    print("    Alleles:", var.alleles)
    print("    Genotypes:", var.genotypes)
Site at: 4.0
    Alleles: ('4', '5', '6')
    Genotypes: [1 1 1 1 1 2 2 1 1 0]
Site at: 8.0
    Alleles: ('6', '5')
    Genotypes: [0 1 1 1 1 1 1 1 1 0]
Site at: 9.0
    Alleles: ('10', '9')
    Genotypes: [0 1 0 0 1 0 0 0 0 0]
Site at: 11.0
    Alleles: ('7', '8')
    Genotypes: [1 1 0 0 0 0 0 0 0 1]
Site at: 13.0
    Alleles: ('4', '3')
    Genotypes: [1 0 0 0 0 0 1 1 1 0]
Site at: 15.0
    Alleles: ('7', '6')
    Genotypes: [0 1 1 0 1 0 1 0 1 0]
Site at: 17.0
    Alleles: ('6', '5', '7', '8')
    Genotypes: [3 3 1 3 1 3 1 3 3 1]
Site at: 19.0
    Alleles: ('10', '9')
    Genotypes: [0 1 0 0 1 0 1 0 0 0]

Feedback etc

Please post a discussion or an issue on the msprime GitHub repository if you have any problems. We hope the update to msprime version 1.3 is useful in your research.