What is a tree sequence?
A succinct tree sequence, or tree sequence for short, represents the relationships between a set of DNA sequences. Tree sequences can be used to store genetic data efficiently, and enable powerful analysis of millions of whole genomes at a time. They can be created by simulation or by inferring relationships from genetic variation.
Tree sequences provide:
A full record of genetic ancestry
A tree sequence concisely captures the full history of a set of genomes by sharing common branches between adjacent genetic trees.
An encoding of DNA data
Placing mutations on a tree sequence allows lossless representation and compression of DNA datasets.
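As a rough illustration of both points, the sketch below stores two adjacent trees over three samples as a single list of edges, each valid over a genome interval, and then recovers the sample alleles at one site from a single mutation. It is plain Python rather than the tskit API; the tuple layout, the node numbers, and the helper names tree_at and allele are assumptions made for this example.

```python
# Each edge says: over the genome interval [left, right), `child` inherits
# from `parent`. Nodes 0-2 are samples; nodes 3-5 are ancestors.
# (Hypothetical toy data, not taken from any real dataset.)
EDGES = [
    (0, 10, 3, 0),  # shared by both trees
    (0, 10, 3, 1),  # shared by both trees
    (0, 5, 4, 3),   # left tree only
    (0, 5, 4, 2),   # left tree only
    (5, 10, 5, 3),  # right tree only
    (5, 10, 5, 2),  # right tree only
]
SAMPLES = [0, 1, 2]


def tree_at(position):
    """Return the child -> parent map of the local tree covering `position`."""
    return {
        child: parent
        for left, right, parent, child in EDGES
        if left <= position < right
    }


# One site at position 2: ancestral state "A", one mutation to "T" above node 3.
SITE = {"position": 2, "ancestral_state": "A", "mutations": {3: "T"}}


def allele(sample, site):
    """Walk from `sample` towards the root; the first mutation met sets the allele."""
    parents = tree_at(site["position"])
    node = sample
    while node is not None:
        if node in site["mutations"]:
            return site["mutations"][node]
        node = parents.get(node)
    return site["ancestral_state"]


left_tree = tree_at(2)
right_tree = tree_at(7)
# Edges present in both local trees are stored once in EDGES.
shared = sorted(set(left_tree.items()) & set(right_tree.items()))
genotypes = [allele(s, SITE) for s in SAMPLES]

print(shared)
print(genotypes)
```

Stored separately, the two trees would need eight edges; sharing the edges above samples 0 and 1 brings this down to six, and the three sample alleles are recovered from one ancestral state plus one mutation.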
A unified genealogy of modern and ancient genomes
Science (2022) Wohns et al
25 February, 2022
This paper describes using tsdate to create a unified tree sequence of 3601 modern and 8 ancient human genome sequences compiled from eight datasets. It then introduces estimates of ancestor geographic location that recapitulate key features of human history.
Efficient ancestry and mutation simulation with msprime 1.0
Genetics (2021) Baumdicker et al
13 December, 2021
The accompanying paper to the msprime 1.0 release, summarising its features and performance and discussing its development model.
Do you really need mutations?
06 June, 2021
In tree sequences, the genetic genealogy exists independently of the mutations that generate genetic variation, and often we are primarily interested in genetic variation because of what it can tell us about those genealogies. This tutorial aims to illustrate when we can leave mutations and genetic variation aside and study the genealogies directly.
29 May, 2021
It is often helpful to visualize a single tree, or multiple trees along a tree sequence, together with sites and mutations. tskit provides functions to do this, outputting either plain ASCII or Unicode text, or the more flexible Scalable Vector Graphics (SVG) format. This tutorial illustrates various examples.
Inferring the ancestry of everyone
02 December, 2020
Jerome Kelleher at PopGen Vienna
Efficiently Summarizing Relationships in Large Samples: A General Duality Between Statistics of Genealogies and Genomes
Genetics (2020) Ralph et al
01 July, 2020
This paper shows that we can think about any statistic that works on sequence data in an equivalent (and more powerful) way in terms of the underlying trees, and that we can compute these statistics very efficiently. Read this paper if you would like more technical details on how the underlying data structures work and an introduction to incremental tree sequence algorithms.
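To give a feel for the idea, the sketch below computes pairwise diversity on one small tree in two ways: from branch lengths, and from a mutation placed on the tree. In both cases each branch is weighted by the number of sample pairs it separates. This is plain Python on a hypothetical three-sample tree, not the tskit API or the paper's incremental algorithm, and all names and node times are assumptions made for the example.

```python
from itertools import combinations

# A single hypothetical tree: child -> parent, with node times.
PARENT = {0: 3, 1: 3, 3: 4, 2: 4}
TIME = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 3.0}
SAMPLES = [0, 1, 2]


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def samples_below(node):
    """Count the samples that descend from (or are) `node`."""
    return sum(1 for s in SAMPLES if node in path_to_root(s))


def pair_weight(node):
    """Number of sample pairs separated by the branch above `node`."""
    k = samples_below(node)
    return k * (len(SAMPLES) - k)


n_pairs = len(list(combinations(SAMPLES, 2)))

# Branch version: sum branch length times the pairs each branch separates.
branch_diversity = sum(
    (TIME[PARENT[child]] - TIME[child]) * pair_weight(child) for child in PARENT
) / n_pairs

# Site version: the same weights, but summed over branches carrying a mutation.
MUTATED_NODES = [3]  # one mutation above node 3
site_diversity = sum(pair_weight(node) for node in MUTATED_NODES) / n_pairs

print(branch_diversity, site_diversity)
```

The two quantities share the same summary function and differ only in what is summed: branch lengths in one case, observed mutations in the other. Neither value here is normalised by sequence length.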
Tree sequences and inference
22 May, 2020
Yan Wong at Phyloseminar
Tree sequence fundamentals
08 April, 2020
Wilder Wohns at Phyloseminar
Inferring whole-genome histories in large population datasets
Nature Genetics (2019) Kelleher et al
02 September, 2019
Start here if you’re new to tree sequences. This paper introduces tsinfer, the method to infer tree sequence topologies from genetic variation data. Please see the preprint if you cannot access the Nature Genetics paper.
Succinct tree sequences for megasample genomics (47:03)
26 April, 2019
Jerome Kelleher at MIA
Introduction to the tree sequence toolchain
25 April, 2019
Wilder Wohns at MIA Primer
Tree‐sequence recording in SLiM opens new horizons for forward‐time simulation of whole genomes
Molecular Ecology Resources (2019) Haller et al
22 November, 2018
Continuing on from the 2018 PLOS Computational Biology paper, we discuss here how the tree sequence recording method was implemented in the powerful SLiM simulator. We show how some simulations become orders of magnitude more efficient and give examples of the new possibilities opened up by keeping a full record of the genetic ancestry.
Efficient pedigree recording for fast population genetics simulation
PLOS Computational Biology (2018) Kelleher et al
01 November, 2018
Forwards-in-time simulations are very flexible but also usually very CPU intensive. This paper shows how we used tree sequences to make forwards-in-time simulations both more efficient and even more flexible.
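The core idea of recording ancestry during a forwards-in-time simulation can be sketched as follows. This is a toy haploid Wright-Fisher model in plain Python, written under simplifying assumptions (constant population size, exactly one crossover per offspring, no selection, and no simplification of the recorded history); it is not the method from the paper, and the function name record_pedigree and its parameters are hypothetical.

```python
import random


def record_pedigree(num_individuals, num_generations, sequence_length, seed):
    """Simulate forwards in time, recording who inherited which interval from whom.

    Returns (node_time, edges, final_generation), where each edge is a tuple
    (left, right, parent, child). `sequence_length` must be an integer >= 2.
    """
    rng = random.Random(seed)
    node_time = []  # node_time[i] is how many generations ago node i lived
    edges = []

    # Founders live `num_generations` generations before the present.
    current = []
    for _ in range(num_individuals):
        current.append(len(node_time))
        node_time.append(num_generations)

    for generation in range(num_generations - 1, -1, -1):
        offspring = []
        for _ in range(num_individuals):
            child = len(node_time)
            node_time.append(generation)
            # Pick two parents and an integer crossover point strictly inside
            # the sequence, so both inherited intervals are non-empty.
            left_parent = rng.choice(current)
            right_parent = rng.choice(current)
            crossover = rng.randrange(1, sequence_length)
            edges.append((0, crossover, left_parent, child))
            edges.append((crossover, sequence_length, right_parent, child))
            offspring.append(child)
        current = offspring

    return node_time, edges, current


node_time, edges, final_generation = record_pedigree(
    num_individuals=5, num_generations=4, sequence_length=100, seed=1
)
print(len(node_time), "nodes,", len(edges), "edges")
```

Because the edges alone describe the genealogy, neutral mutations do not need to be tracked during the simulation in this sketch. A real implementation would also periodically discard lineages that leave no descendants, which this example omits.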
Simulating, storing & processing genetic variation data for millions of samples
26 April, 2017
Jerome Kelleher at MIA
Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes
PLOS Computational Biology (2016) Kelleher et al
04 April, 2016
This is where it all started. Here we introduce the msprime coalescent simulator and the core algorithms and data structures that would later be separated out into tskit. Read this paper if you would like to find out more about coalescent simulation, or to understand the core tree sequence algorithms and theoretical results. Note: much of the terminology has been updated since this original publication as the models were generalised.