Welcome to tszip
tszip is a command line interface and Python API for compressing tskit tree
sequence files produced and read by software projects in the tskit ecosystem,
such as msprime, SLiM, fwdpy11 and tsinfer. Tszip achieves much better
compression than is possible using generic compression utilities by building on
the zarr and numcodecs packages.
The command line interface follows the design of gzip closely, so it should be
immediately familiar. Here we compress a large tree sequence representing
1000 Genomes chromosome 20 using tszip and decompress it using tsunzip:
$ ls -lh
total 297M
-rw-r--r-- 1 jk jk 297M May 10 14:49 1kg_chr20.trees
$ tszip 1kg_chr20.trees
$ ls -lh
total 46M
-rw-r--r-- 1 jk jk 46M May 10 14:51 1kg_chr20.trees.tsz
$ tsunzip 1kg_chr20.trees.tsz
$ ls -lh
total 297M
-rw-r--r-- 1 jk jk 297M May 10 14:52 1kg_chr20.trees