Python API

This page provides detailed documentation for the tszip Python API.

Usage example

The tszip module can be used directly in Python to compress and decompress tree sequence files. Here, we run an msprime simulation and write the output to a .trees.tsz file:

import msprime
import tszip

ts = msprime.simulate(10, random_seed=1)
tszip.compress(ts, "simulation.trees.tsz")

# Later, we load the same tree sequence from the compressed file.
ts = tszip.decompress("simulation.trees.tsz")

# Or use load, which works for both compressed and uncompressed files.
ts = tszip.load("simulation.trees.tsz")

Note

For very small simulations like this example, the tszip file may be larger than the original uncompressed file.

API

tszip.load(path)

Opens a tszip compressed file or a standard tskit file. This convenience function determines whether the file needs to be decompressed and returns the tree sequence instance in either case.

Parameters:

path (str) – The location of the tszip compressed file or standard tskit file to load.

Return type:

tskit.TreeSequence

Returns:

A tskit.TreeSequence instance corresponding to the specified file.
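Because tszip.load accepts both formats, a directory holding a mix of .trees and .trees.tsz files can be read with a single code path. The following is a minimal sketch of that idea; find_tree_files and load_directory are hypothetical helper names introduced here for illustration and are not part of tszip, and the file suffixes are assumed to follow the convention used in the usage example.

```python
from pathlib import Path


def find_tree_files(directory):
    """Return sorted paths in directory ending in .trees or .trees.tsz."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.name.endswith((".trees", ".trees.tsz"))
    )


def load_directory(directory):
    """Map each file name to its loaded tree sequence (hypothetical helper)."""
    # Imported lazily so that find_tree_files works without tszip installed.
    import tszip

    # tszip.load decides per file whether decompression is needed.
    return {p.name: tszip.load(str(p)) for p in find_tree_files(directory)}
```

Paths are converted with str before being passed to tszip.load, since the documented parameter type is str.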

tszip.compress(ts, destination, variants_only=False)

Compresses the specified tree sequence and writes it to the specified path or file-like object. By default, fully lossless compression is used so that tree sequences are identical before and after compression. By specifying the variants_only option, a lossy compression can be used, which discards any information that is not needed to represent the variants (which are stored losslessly).

Parameters:
  • ts (tskit.TreeSequence) – The input tree sequence.

  • destination (str, pathlib.Path or file-like object) – The path or file-like object to write the compressed file to.

  • variants_only (bool) – If True, discard all information not necessary to represent the variants in the input file.
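One way to see what compression gains for a particular tree sequence, with or without variants_only, is to write it in both formats and compare the file sizes. The sketch below assumes the tree sequence offers tskit's TreeSequence.dump method for writing the standard format; size_ratio and compare_sizes are hypothetical helper names, not part of tszip.

```python
import os


def size_ratio(original_path, compressed_path):
    """Return compressed size / original size; below 1.0 means a smaller file."""
    return os.path.getsize(compressed_path) / os.path.getsize(original_path)


def compare_sizes(ts, basename, variants_only=False):
    """Write ts in both formats and return the size ratio (hypothetical helper)."""
    # Imported lazily so that size_ratio can be used without tszip installed.
    import tszip

    original_path = basename + ".trees"
    compressed_path = basename + ".trees.tsz"
    ts.dump(original_path)  # standard, uncompressed tskit file
    tszip.compress(ts, compressed_path, variants_only=variants_only)
    return size_ratio(original_path, compressed_path)
```

A ratio above 1.0 means the compressed file came out larger than the original, which can happen for very small tree sequences.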

tszip.decompress(path)

Decompresses the tszip compressed file and returns a tskit tree sequence instance.

Parameters:

path (str) – The location of the tszip compressed file to load.

Return type:

tskit.TreeSequence

Returns:

A tskit.TreeSequence instance corresponding to the specified file.
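Since tszip.decompress returns an in-memory tree sequence, writing an uncompressed copy back to disk takes one extra step. The sketch below is one possible approach, assuming the compressed file name ends in .tsz and that the returned object offers tskit's TreeSequence.dump method; output_path_for and decompress_to_file are hypothetical helper names, not part of tszip.

```python
from pathlib import Path


def output_path_for(compressed_path, suffix=".tsz"):
    """Return the path with the trailing suffix removed, e.g. x.trees.tsz -> x.trees."""
    p = Path(compressed_path)
    if p.suffix != suffix:
        raise ValueError(f"{p} does not end in {suffix}")
    return p.with_suffix("")


def decompress_to_file(compressed_path):
    """Decompress a .tsz file and write a standard tskit file next to it."""
    # Imported lazily so that output_path_for works without tszip installed.
    import tszip

    out = output_path_for(compressed_path)
    ts = tszip.decompress(str(compressed_path))
    ts.dump(str(out))  # write the standard, uncompressed format
    return out
```

The suffix check is a deliberate guard: without it, a file that is already uncompressed would have its .trees extension stripped and be written under an unexpected name.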