This site contains a number of tutorials to develop your understanding of genetic genealogies, ancestral recombination graphs, and the succinct tree sequence storage format, as implemented in tskit: the tree sequence toolkit. Also included are a number of tutorials showing advanced use of software programs, such as msprime, that form part of the tskit ecosystem.


If you are new to the world of tree sequences, we suggest you start with the first tutorial: What is a tree sequence?


Tutorials are under constant development. Those that are still a work in progress and not yet ready for use are shown in italics in the list of tutorials.

We very much welcome help developing existing tutorials or writing new ones. Please open or contribute to a GitHub issue if you would like to help out.

Other sources of help

In addition to these tutorials, our Learn page lists selected videos and publications to help you learn about tree sequences.

We aim to be a friendly, welcoming open source community. Questions and discussion about using tskit, the tree sequence toolkit, should be directed to the GitHub discussion forum. There are similar forums for other software in the tree sequence development community, such as msprime and tsinfer.

Running tutorial code

It is possible to run the tutorial code on your own computer, if you wish. This will allow you to experiment with the examples provided. The recommended way to do this is from within a Jupyter notebook. As well as installing Jupyter, you will also need to install the various Python libraries, most importantly tskit, msprime, numpy, and matplotlib. These and other packages are listed in the requirements.txt file; a shortcut to installing the necessary software is therefore:

python3 -m pip install -r https://tskit.dev/tutorials/requirements.txt
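After installation, a quick check along the following lines can confirm that the main libraries are visible to Python. This is only a minimal sketch and not part of the tutorials themselves: the helper name `missing_packages` is hypothetical, and the list covers only the four libraries named above, not everything in requirements.txt.

```python
import importlib.util

# The four libraries named in the text; requirements.txt lists more.
REQUIRED = ["tskit", "msprime", "numpy", "matplotlib"]


def missing_packages(names):
    """Return the top-level package names that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed packages can be found")
```

If any names are reported as not installed, rerunning the pip command above in the same Python environment that Jupyter uses is a reasonable first step.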

In addition, to run the R tutorial you will need to install the R reticulate library and, if running it in a Jupyter notebook, the IRkernel library. Both can be installed by running the following command within R:

install.packages(c("reticulate", "IRkernel")); IRkernel::installspec()

Downloading tutorial datafiles

Many of the tutorials use pre-existing tree sequences stored in the data directory. These can be downloaded individually from that link, or you can download them all at once by running the script stored in https://tskit.dev/tutorials/examples/download.py. If you are running the code in the tutorials from within a Jupyter notebook then you can simply load this code into a new cell by using the %load cell magic. Just run the following in a Jupyter code cell:

%load https://tskit.dev/tutorials/examples/download.py

Running the resulting Python code should download the data files, then print "finished downloading" once all files have been downloaded. You should then be able to run code such as the following:

import tskit
ts = tskit.load("data/basics.trees")
print(f"The file 'data/basics.trees' exists, and contains {ts.num_trees} trees")
The file 'data/basics.trees' exists, and contains 3 trees
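If loading fails instead, it can help to check what actually ended up in the data directory. The following is a rough sketch using only the standard library. The helper name `list_tree_files` is hypothetical, and it assumes the downloaded tree sequences use the `.trees` extension, as `data/basics.trees` does; other file types in the directory are ignored.

```python
from pathlib import Path


def list_tree_files(data_dir="data"):
    """Return sorted names of .trees files in data_dir (hypothetical helper).

    Returns an empty list if the directory does not exist.
    """
    folder = Path(data_dir)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.trees"))


if __name__ == "__main__":
    found = list_tree_files()
    if "basics.trees" in found:
        print(f"Found {len(found)} .trees file(s) in 'data'")
    else:
        print("data/basics.trees is missing; try running the download script again")
```

Note that the relative path `data` is resolved against the current working directory, so the check should be run from the same directory as the notebook.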