Welcome to tszip

tszip is a command line interface and Python API for compressing tskit tree sequence files, the format produced and read by software in the tskit ecosystem such as msprime, SLiM, fwdpy11 and tsinfer. By building on the zarr and numcodecs packages, tszip achieves much better compression than generic compression utilities can.

The command line interface closely follows the design of gzip, so it should be immediately familiar. Here we use tszip to compress a large tree sequence representing chromosome 20 from the 1000 Genomes Project, then decompress it with tsunzip:

$ ls -lh
total 297M
-rw-r--r-- 1 jk jk 297M May 10 14:49 1kg_chr20.trees
$ tszip 1kg_chr20.trees
$ ls -lh
total 46M
-rw-r--r-- 1 jk jk 46M May 10 14:51 1kg_chr20.trees.tsz
$ tsunzip 1kg_chr20.trees.tsz
$ ls -lh
total 297M
-rw-r--r-- 1 jk jk 297M May 10 14:52 1kg_chr20.trees