Introduction
The goal of tsinfer is to infer succinct tree sequences from observed genetic variation data. A succinct tree sequence (or tree sequence, for short) is an efficient way of representing the correlated genealogies, one local tree per genomic region, that describe the ancestry of sampled genomes in many species. By inferring these tree sequences, we gain two important things:
We obtain an approximation of the true history of our sampled data, which may be useful for other inferential tasks.
The data structure itself is an extremely concise and efficient means of storing and processing the data that we have.
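To make the compactness claim concrete, here is a toy sketch in plain Python. It is not tskit's implementation. It only borrows the idea of an edge record with `left`, `right`, `parent` and `child` fields, as found in a tskit edge table. The node IDs, positions and the helper functions `tree_at` and `breakpoints` are hypothetical. Two local trees over a sequence of length 100 share the edges joining samples 0 and 1 to their parent, so that relationship is stored once rather than once per tree.

```python
from collections import namedtuple

# Toy edge record: `parent` is the parent of `child` over the
# half-open genomic interval [left, right). Loosely modelled on a
# tskit edge table; this is an illustration, not the tskit API.
Edge = namedtuple("Edge", ["left", "right", "parent", "child"])

# Hypothetical example: samples 0-3 on a sequence of length 100.
# A recombination breakpoint at position 50 changes the upper part
# of the tree, but node 4 (parent of samples 0 and 1) spans both trees.
edges = [
    Edge(0, 100, 4, 0),  # shared by both local trees
    Edge(0, 100, 4, 1),  # shared by both local trees
    Edge(0, 50, 5, 2),
    Edge(0, 50, 5, 3),
    Edge(0, 50, 6, 4),
    Edge(0, 50, 6, 5),
    Edge(50, 100, 7, 4),
    Edge(50, 100, 7, 2),
    Edge(50, 100, 8, 7),
    Edge(50, 100, 8, 3),
]


def tree_at(edges, position):
    """Return the {child: parent} map of the local tree covering `position`."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def breakpoints(edges):
    """Return the sorted genomic positions where the local tree may change."""
    return sorted({e.left for e in edges} | {e.right for e in edges})


bps = breakpoints(edges)
intervals = list(zip(bps[:-1], bps[1:]))

# Number of parent-child records needed if each local tree were stored separately.
separate = sum(len(tree_at(edges, left)) for left, _ in intervals)

print(len(edges), separate)  # prints "10 12"
```

With only two trees the saving is small (10 shared edges versus 12 separate records), but along a real chromosome with many thousands of highly similar adjacent trees, storing each relationship once per interval rather than once per tree is what keeps the representation concise.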
The output of tsinfer is a tskit.TreeSequence, so the full tskit API can be used to analyse real data in precisely the same way that it is commonly used to analyse simulated data, for example from msprime.
Note
Tsinfer infers the genetic relationships between sampled genomes, but does not attempt to infer the times of most recent common ancestors (tMRCAs) in the genealogy. If you are using the output of tsinfer in downstream analysis that relies on node times, you are advised not to use the inferred tree sequences directly. Instead, post-process the tsinfer output using software such as tsdate, which attempts to assign calendar or generation times to the tree sequence nodes.