File formats

tsinfer uses the excellent zarr library to encode data in a form that is both compact and efficient to process. See the API documentation for details on how to construct and manipulate these files using Python. The tsinfer list command prints a summary of the contents of these files.

Ancestors File

The ancestors file contains the ancestral haplotype data inferred from the sample data in the Generate ancestors step.

Todo

Document the structure of the ancestors file.

Tree sequences

The goal of tsinfer is to infer correlated genealogies from variation data, and it uses the very efficient succinct tree sequence data structure to encode this output. Please see the tskit documentation for details on how to process and manipulate such tree sequences.

The intermediate .ancestors.trees file produced by the Match ancestors step is also a tree sequence and can be loaded and analysed using the tskit API.