Learn

What is a tree sequence?

A succinct tree sequence, or tree sequence for short, represents the relationships between a set of DNA sequences. Tree sequences can be used to store genetic data efficiently, and enable powerful analysis of millions of whole genomes at a time. They can be created by simulation or by inferring relationships from genetic variation.
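To make the idea concrete, here is a minimal plain-Python sketch. It is not tskit's API: the Edge class and the breakpoints and local_tree helpers are hypothetical names chosen for illustration. Each edge records that one node is the parent of another over a half-open genomic interval, so an edge that is unchanged between neighbouring trees is stored only once.

```python
from dataclasses import dataclass

# Hypothetical minimal model (not tskit's API): `parent` is the parent of
# `child` over the half-open genomic interval [left, right).
@dataclass(frozen=True)
class Edge:
    left: float
    right: float
    parent: int
    child: int

def breakpoints(edges, sequence_length):
    """Sorted positions at which the local tree may change."""
    points = {0, sequence_length}
    for e in edges:
        points.add(e.left)
        points.add(e.right)
    return sorted(points)

def local_tree(edges, position):
    """Child -> parent map of the tree covering `position`."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}

# Samples 0, 1, 2 on a genome of length 10.
# On [0, 5) the tree is ((0, 1)3, 2)4; on [5, 10) it is ((0, 2)3, 1)4.
edges = [
    Edge(0, 10, 3, 0),  # shared by both trees
    Edge(0, 5, 3, 1),
    Edge(5, 10, 3, 2),
    Edge(0, 10, 4, 3),  # shared by both trees
    Edge(0, 5, 4, 2),
    Edge(5, 10, 4, 1),
]

points = breakpoints(edges, 10)
for left, right in zip(points, points[1:]):
    print(left, right, local_tree(edges, left))
```

Here two of the parent-child links (node 0 under node 3, and node 3 under node 4) span the whole genome. Six edges therefore encode two trees that would otherwise need eight separate links, and that saving grows quickly when many adjacent trees differ only slightly.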

Browse tutorials, publications and videos:

A unified genealogy of modern and ancient genomes

Science (2022) Wohns et al

doi: 10.1126/science.abi8264

25 February, 2022

This paper describes using tskit, tsinfer and tsdate to create a unified tree sequence of 3601 modern and 8 ancient human genome sequences compiled from eight datasets. It then introduces estimates of ancestral geographic locations that recapitulate key features of human history.

Read the article
Efficient ancestry and mutation simulation with msprime 1.0

Genetics (2021) Baumdicker et al

doi: 10.1093/genetics/iyab229

13 December, 2021

The paper accompanying the msprime 1.0 release, summarising its features and performance and discussing its development model.

Read the article
Tskit Terminology and Concepts

21 June, 2021

This tutorial serves as an introduction to the terminology and concepts in tskit, and its underlying data structures.

See the tutorial
Getting started with tskit

21 June, 2021

You’ve run some simulations or inference methods, and you now have a TreeSequence object; what now? This tutorial is aimed at users who are new to tskit and would like to get some basic tasks completed.

See the tutorial
Analysing Tree Sequences

21 June, 2021

This tutorial aims to give a quick overview of how the tskit statistics APIs work and how to use them effectively.

See the tutorial
Tables and Editing

19 June, 2021

The underlying representation of a tree sequence in tskit is a set of tables. This tutorial shows how to access and manipulate these tables.

See the tutorial
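The table representation described in the Tables and Editing entry can be illustrated with a plain-Python sketch. It uses parallel column lists loosely modelled on tskit's node and edge tables; these are ordinary lists, not tskit objects, and sort_edges is a hypothetical helper. tskit requires edges in a particular order, which as we understand its documentation is by parent time, then parent ID, then child ID, then left coordinate. The sketch reorders the edge columns accordingly.

```python
# Hypothetical column-oriented tables loosely modelled on tskit's node and
# edge tables; plain lists, not tskit objects.
node_time = [0.0, 0.0, 0.0, 1.0, 2.5]  # nodes 0-2 are samples

edge_left = [5.0, 0.0, 0.0, 0.0, 5.0, 0.0]
edge_right = [10.0, 10.0, 5.0, 10.0, 10.0, 5.0]
edge_parent = [4, 3, 3, 4, 3, 4]
edge_child = [1, 0, 1, 3, 2, 2]

def sort_edges(left, right, parent, child, time):
    """Reorder edge columns by (parent time, parent ID, child ID, left)."""
    order = sorted(
        range(len(parent)),
        key=lambda j: (time[parent[j]], parent[j], child[j], left[j]),
    )
    # Apply the same row permutation to every column.
    return tuple([col[j] for j in order] for col in (left, right, parent, child))

left, right, parent, child = sort_edges(
    edge_left, edge_right, edge_parent, edge_child, node_time
)
for row in zip(left, right, parent, child):
    print(row)
```

Storing each column as its own array keeps the data compact and easy to pass to numerical code. In tskit itself, TableCollection.sort() performs this kind of reordering on the real tables.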
Analysing Trees

19 June, 2021

This tutorial gives an overview of the single-tree traversals, algorithms and phylogenetic statistics that tskit provides.

See the tutorial
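Many single-tree algorithms are built on a traversal that visits children before their parents. The plain-Python sketch below shows this for a tree given as a child-to-parent map. It uses a postorder traversal to count the samples below each node. The helper names postorder and num_samples are hypothetical and are not tskit's implementation.

```python
from collections import defaultdict

def postorder(parent):
    """Yield the nodes of a tree (child -> parent map), children before parents."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    for kids in children.values():
        kids.sort()
    roots = sorted(set(parent.values()) - set(parent))
    # Iterative traversal: a node is emitted on its second visit,
    # after all of its children have been emitted.
    stack = [(root, False) for root in reversed(roots)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            yield node
        else:
            stack.append((node, True))
            for child in reversed(children.get(node, [])):
                stack.append((child, False))

def num_samples(parent, samples):
    """Number of samples in the subtree below (and including) each node."""
    counts = defaultdict(int)
    for node in postorder(parent):
        if node in samples:
            counts[node] += 1
        if node in parent:
            counts[parent[node]] += counts[node]
    return dict(counts)

# The tree ((0, 1)3, 2)4 as a child -> parent map.
parent = {0: 3, 1: 3, 2: 4, 3: 4}
print(list(postorder(parent)))
print(num_samples(parent, {0, 1, 2}))
```

The iterative stack avoids Python's recursion limit on very deep trees. tskit's Tree objects provide the same building blocks directly, for example nodes(order="postorder") and num_samples(u).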
Working with Metadata

16 June, 2021

This tutorial gives an overview of tskit’s metadata system, which allows arbitrary, documented metadata to be attached to entities in tree sequences.

See the tutorial
Do you really need mutations?

06 June, 2021

In tree sequences, the genetic genealogy exists independently of the mutations that generate genetic variation, and often we are primarily interested in genetic variation because of what it can tell us about those genealogies. This tutorial aims to illustrate when we can leave mutations and genetic variation aside and study the genealogies directly.

See the tutorial
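One reason the genealogy can stand in for the mutations is a standard infinite-sites result. If mutations fall as a Poisson process at rate mu per unit of sequence length per unit time, the expected number of mutations is mu times the total branch length summed along the genome, with each local tree weighted by its span. The plain-Python sketch below computes that quantity for two local trees. The helper names and the rate value are illustrative assumptions.

```python
# Node times, e.g. in generations; nodes 0-2 are samples.
time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 2.5}

# Local trees as (left, right, child -> parent map).
trees = [
    (0.0, 5.0, {0: 3, 1: 3, 3: 4, 2: 4}),
    (5.0, 10.0, {0: 3, 2: 3, 1: 4, 3: 4}),
]

def total_branch_length(parent, time):
    """Sum of branch lengths in one local tree."""
    return sum(time[p] - time[c] for c, p in parent.items())

def expected_mutations(trees, time, mu):
    """mu: hypothetical rate per unit sequence length per unit time."""
    return mu * sum(
        (right - left) * total_branch_length(parent, time)
        for left, right, parent in trees
    )

print(expected_mutations(trees, time, mu=1e-3))  # approximately 0.06
```

The branch-length sum is deterministic given the genealogy, whereas the number of mutations actually observed is a noisy draw around it. This is why working with the genealogy directly can be more informative.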
Visualization

29 May, 2021

It is often helpful to visualize a single tree (or multiple trees along a tree sequence) together with sites and mutations. tskit provides functions to do this, outputting either plain ASCII or Unicode text, or the more flexible Scalable Vector Graphics (SVG) format. This tutorial illustrates various examples.

See the tutorial
Tskit and R

11 May, 2021

To interface with tskit in R, we can use the reticulate R package, which lets you call Python functions within an R session. In this short tutorial, we’ll go through a couple of examples to show you how to get started.

See the tutorial
Completing forwards simulations

20 January, 2021

In this tutorial we show how to combine the best of both forwards and backwards simulation approaches by simulating the recent past using a forwards-time simulator and then completing the simulation of the ancient past using msprime.

See the tutorial
msprime tutorials

19 January, 2021

A set of tutorials for msprime, covering demography, bottlenecks and introgression.

See the tutorial
Inferring the ancestry of everyone

02 December, 2020

Jerome Kelleher at PopGen Vienna

Play video
Efficiently Summarizing Relationships in Large Samples: A General Duality Between Statistics of Genealogies and Genomes

Genetics (2020) Ralph et al

doi: 10.1534/genetics.120.303253

01 July, 2020

This paper shows that we can think about any statistic that works on sequence data in an equivalent (and more powerful) way in terms of the underlying trees, and that we can compute these statistics very efficiently. Read this paper if you would like more technical details on how the underlying data structures work and an introduction to incremental tree sequence algorithms.

Read the article
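The site/branch duality the paper describes can be sketched for pairwise divergence on a single tree. A sample carries the derived allele of a mutation exactly when that mutation's branch lies on the sample's lineage. Two samples therefore differ at precisely the mutations on the branches between them, and those are the branches the branch statistic sums over. With mutations falling at rate mu, the expected site value is mu times the branch value, for a tree spanning one unit of sequence length. The tree, the mutation placements and the helper names below are hypothetical and are not tskit code.

```python
from itertools import combinations

# The tree ((0, 1)3, 2)4 with node times; nodes 0-2 are samples.
parent = {0: 3, 1: 3, 3: 4, 2: 4}
time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 2.5}
samples = [0, 1, 2]
# Hypothetical infinite-sites mutations, each identified by the node
# directly below the branch it falls on.
mutations = [0, 3, 3, 2]

def lineage(node):
    """The node plus all of its ancestors up to the root."""
    nodes = {node}
    while node in parent:
        node = parent[node]
        nodes.add(node)
    return nodes

def genotype(sample, mut_node):
    """1 if the sample inherits the mutation above mut_node, else 0."""
    return int(mut_node in lineage(sample))

def site_divergence(i, j):
    """Site mode: number of mutations at which samples i and j differ."""
    return sum(genotype(i, m) != genotype(j, m) for m in mutations)

def branch_divergence(i, j):
    """Branch mode: total length of the branches between samples i and j."""
    path = lineage(i) ^ lineage(j)  # shared ancestors cancel out
    return sum(time[parent[u]] - time[u] for u in path)

pairs = list(combinations(samples, 2))
site_pi = sum(site_divergence(i, j) for i, j in pairs) / len(pairs)
branch_pi = sum(branch_divergence(i, j) for i, j in pairs) / len(pairs)
print(site_pi, branch_pi)
```

The branch value does not depend on where mutations happened to land, which is one sense in which the tree-based view is more powerful. tskit exposes both views through a mode argument on its statistics methods, for example diversity(mode="site") versus diversity(mode="branch").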
Tree sequences and inference

22 May, 2020

Yan Wong at Phyloseminar

Play video
Tree sequence fundamentals

08 April, 2020

Wilder Wohns at Phyloseminar

Play video
Inferring whole-genome histories in large population datasets

Nature Genetics (2019) Kelleher et al

doi: 10.1038/s41588-019-0483-y

02 September, 2019

Start here if you’re new to tree sequences. This paper introduces tsinfer, the method to infer tree sequence topologies from genetic variation data. Please see the preprint if you cannot access the Nature Genetics paper.

Read the article
Succinct tree sequences for megasample genomics (47:03)

26 April, 2019

Jerome Kelleher at MIA

Play video
Introduction to the tree sequence toolchain

25 April, 2019

Wilder Wohns at MIA Primer

Play video
Tree‐sequence recording in SLiM opens new horizons for forward‐time simulation of whole genomes

Molecular Ecology Resources (2019) Haller et al

doi: 10.1111/1755-0998.12968

22 November, 2018

Continuing on from the 2018 PLOS Computational Biology paper, we discuss here how the tree sequence recording method was implemented in the powerful SLiM simulator. We show that some simulations become orders of magnitude more efficient, and give examples of the new possibilities that keeping a full record of the genetic ancestry makes available.

Read the article
Efficient pedigree recording for fast population genetics simulation

PLOS Computational Biology (2018) Kelleher et al

doi: 10.1371/journal.pcbi.1006581

01 November, 2018

Forwards-in-time simulations are very flexible but also usually very CPU intensive. This paper shows how we used tree sequences to make forwards-in-time simulations both more efficient and even more flexible.

Read the article
Simulating, storing & processing genetic variation data for millions of samples

26 April, 2017

Jerome Kelleher at MIA

Play video
Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes

PLOS Computational Biology (2016) Kelleher et al

doi: 10.1371/journal.pcbi.1004842

04 April, 2016

This is where it all started. Here we introduce the msprime coalescent simulator and the core algorithms and data structures that would later be separated out into tskit. Read this paper if you would like to find out more about coalescent simulation, or to understand the core tree sequence algorithms and theoretical results. Note: much of the terminology has been updated since this original publication as the models were generalised.

Read the article
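As a very loose illustration of coalescent simulation, the following is a bare-boned Kingman coalescent for a single non-recombining locus. It is not msprime's implementation, which also handles recombination, demography and much more, and simulate_coalescent is a hypothetical name. While k lineages remain, the waiting time to the next coalescence is exponential with rate k(k - 1)/2 in coalescent time units, and a uniformly random pair of lineages merges into a new ancestral node.

```python
import random

def simulate_coalescent(n, rng):
    """Return (parent, time) maps for one Kingman coalescent genealogy of n samples."""
    time = {u: 0.0 for u in range(n)}
    parent = {}
    lineages = list(range(n))
    t = 0.0
    next_node = n
    while len(lineages) > 1:
        k = len(lineages)
        # k*(k-1)/2 pairs, each coalescing at rate 1 in coalescent units.
        t += rng.expovariate(k * (k - 1) / 2)
        a, b = rng.sample(lineages, 2)
        lineages.remove(a)
        lineages.remove(b)
        parent[a] = parent[b] = next_node
        time[next_node] = t
        lineages.append(next_node)
        next_node += 1
    return parent, time

rng = random.Random(42)
n = 10
reps = 20000
# The root is the oldest node, so its time is the TMRCA.
mean_tmrca = sum(max(simulate_coalescent(n, rng)[1].values()) for _ in range(reps)) / reps
expected = 2 * (1 - 1 / n)
print(mean_tmrca, expected)
```

Averaged over many replicates, the root time should be close to the classical expectation 2(1 - 1/n), which is 1.8 for n = 10. The returned parent and time maps have the same shape as a single local tree in a tree sequence.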