Introduction#
The goal of tsinfer is to infer succinct tree sequences from observed
genetic variation data. A succinct tree sequence (or
tree sequence, for short)
is an efficient way of representing the correlated genealogies that
describe the ancestry of many species. By inferring these tree sequences, we
make two very important gains:
We obtain an approximation of the true history of our sampled data, which may be useful for other inferential tasks.
The data structure itself is an extremely concise and efficient means of storing and processing the data that we have.
The output of tsinfer is a tskit.TreeSequence and so the
full tskit API can be used to
analyse real data, in precisely the same way that it is commonly used
to analyse simulation data, for example, from msprime.
Note
Tsinfer infers the genetic relationships between sampled genomes, but does not
attempt to infer the times of most recent common ancestors (tMRCAs) in the genealogy.
If you are using the output of tsinfer in downstream analysis that relies on
node times, you are advised not to use the inferred tree sequences directly; instead,
you should post-process the tsinfer output using software such as
tsdate that attempts to assign calendar or
generation times to the tree sequence nodes.