File formats
tsinfer uses the zarr library to encode data in a form that is both compact and efficient to process. See the API documentation for details on how to construct and manipulate these files using Python. The tsinfer list command prints a summary of these files.
Samples File
The samples file is tsinfer's input format. Data must be converted into this format, using the SampleData class, before it can be processed.
Todo
Document the structure of the samples file.
Ancestors File
The ancestors file contains the ancestral haplotype data inferred from the sample data in the Generate ancestors step.
Todo
Document the structure of the ancestors file.
Tree sequences
The goal of tsinfer is to infer correlated genealogies from variation data, and it uses the very efficient succinct tree sequence data structure to encode this output. Please see the tskit documentation for details on how to process and manipulate such tree sequences.
The intermediate .ancestors.trees file produced by the Match ancestors step is also a tree sequence and can be loaded and analysed using the tskit API.