Configuration reference

Tsinfer is configured via a TOML file passed to the CLI. Paths in the config are resolved relative to the config file’s directory.

A complete annotated example is in example_config.toml.

[[source]]

Each [[source]] block defines a named view over a VCZ store. The same store can appear multiple times with different filters.

| Field | Type | Default | Description |
|---|---|---|---|
| name | string | (required) | Unique name for this source |
| path | string | (required) | Path to VCZ store |
| include | string | | bcftools include expression (e.g. "TYPE='snp'") |
| exclude | string | | bcftools exclude expression |
| samples | string | | Sample filter (comma-separated; prefix ^ to exclude) |
| regions | string | | Genomic region, half-open (e.g. "chr20:1000-50000") |
| targets | string | | Exact target positions |
| sample_time | various | | Per-sample times: constant, field name, or {path, field} dict |
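As a minimal sketch, two views over the same store might be declared as follows. The store path, source names, sample IDs, and the array name used for sample_time are hypothetical; only the field names come from this section, and the exact value forms accepted by sample_time are an assumption based on its description.

```toml
# Hypothetical example: two named views over one VCZ store.
[[source]]
name = "snps_chr20"            # unique name for this view
path = "data/cohort.vcz"       # resolved relative to the config file's directory
include = "TYPE='snp'"         # bcftools-style include expression
regions = "chr20:1000-50000"
sample_time = 0                # assumed form: one constant time for every sample

[[source]]
name = "subset"
path = "data/cohort.vcz"       # same store, different filter
samples = "^NA12878,NA12891"   # leading ^ excludes the listed samples
sample_time = "sample_time"    # assumed form: name of an array holding per-sample times
# Assumed dict form, reading times from another store:
# sample_time = { path = "data/times.vcz", field = "sample_time" }
```

Each `[[source]]` header appends one entry to a TOML array of tables, which is why the block can be repeated.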

[ancestral_state]

Specifies where to read the ancestral allele for each variant position.

| Field | Type | Default | Description |
|---|---|---|---|
| path | string | (required) | Path to VCZ containing ancestral alleles |
| field | string | | Array name in the store (e.g. "variant_AA"). Required unless is_reference is set. |
| is_reference | bool | false | Use the REF allele (variant_allele[:, 0]) as the ancestral state. Useful for simulations. field must not be set when this is true. |
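The two mutually exclusive ways of supplying the ancestral state could look like the sketch below. The paths are hypothetical, and the second variant is commented out because a TOML file can define `[ancestral_state]` only once.

```toml
# Variant 1: read ancestral alleles from a named array in a VCZ store.
[ancestral_state]
path = "data/ancestral.vcz"    # hypothetical path
field = "variant_AA"

# Variant 2 (use instead of variant 1): treat the REF allele as ancestral.
# field is left out because it must not be set together with is_reference.
# [ancestral_state]
# path = "data/cohort.vcz"     # hypothetical path
# is_reference = true
```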

[[ancestors]]

Controls the ancestor-generation step (infer-ancestors). At least one [[ancestors]] block is required unless [match] specifies a reference_ts.
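A sketch of a single `[[ancestors]]` block, using the fields documented in this section. The set name, output path, and source name are hypothetical. The numeric values repeat the documented defaults, and only genotype_encoding is changed from its default.

```toml
[[ancestors]]
name = "main"                    # hypothetical ancestor set name
path = "out/ancestors.vcz"       # hypothetical output path
sources = ["snps_chr20"]         # must name [[source]] blocks defined in the same file
max_gap_length = 500_000         # documented default, in bp
samples_chunk_size = 100         # documented default
variants_chunk_size = 50_000     # documented default
compressor = "zstd"              # documented default
compression_level = 7            # documented default
genotype_encoding = "one_bit"    # non-default; biallelic data only
```

TOML allows underscores between digits, so `500_000` and `500000` are the same integer.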

| Field | Type | Default | Description |
|---|---|---|---|
| name | string | (required) | Unique ancestor set name |
| path | string | (required) | Output VCZ path |
| sources | list[str] | (required) | Source names to build ancestors from |
| max_gap_length | int | 500,000 | Split intervals at gaps wider than this (bp) |
| samples_chunk_size | int | 100 | Zarr chunk size (ancestor dimension) |
| variants_chunk_size | int | 50,000 | Zarr chunk size (site dimension) |
| compressor | string | "zstd" | Blosc compressor name |
| compression_level | int | 7 | Compression level (0–9) |
| genotype_encoding | string | "eight_bit" | "one_bit" uses ~8x less memory (biallelic only) |

[match]

Controls the HMM matching step.

| Field | Type | Default | Description |
|---|---|---|---|
| output | string | (required) | Output .trees file path |
| path_compression | bool | true | Enable Viterbi path compression |
| reference_ts | string | | Reference tree sequence (skip ancestor generation) |
| workdir | string | | Checkpoint directory (enables resume) |
| keep_intermediates | bool | false | Keep per-group checkpoint files |
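A `[match]` table with checkpointing enabled might be written like this. All paths are hypothetical, and reference_ts is shown commented out as the alternative to defining `[[ancestors]]` blocks.

```toml
[match]
output = "out/result.trees"          # hypothetical output path
path_compression = true              # documented default
workdir = "out/match_checkpoints"    # hypothetical directory; setting it enables resume
keep_intermediates = false           # documented default
# reference_ts = "ref/reference.trees"   # hypothetical; skips ancestor generation
```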

[match.sources.<name>]

Per-source parameters. Every source that should appear in the output tree sequence needs an entry here.

| Field | Type | Default | Description |
|---|---|---|---|
| node_flags | int | 1 | tskit node flags (1 = NODE_IS_SAMPLE, 0 for ancestors) |
| create_individuals | bool | true | Group sample nodes into tskit individuals |
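Per-source entries could be sketched as below, with `<name>` replaced by a source name. The names `snps_chr20` and `subset` are hypothetical, and the second entry only illustrates non-default values.

```toml
[match.sources.snps_chr20]
node_flags = 1                # 1 = NODE_IS_SAMPLE (documented default)
create_individuals = true     # documented default

[match.sources.subset]
node_flags = 0                # nodes not flagged as samples
create_individuals = false    # do not group nodes into individuals
```

In TOML, a name containing dots or spaces must be quoted in the table header, for example `[match.sources."chr20.snps"]`.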

[post_process]

Optional cleanup applied after matching.

| Field | Type | Default | Description |
|---|---|---|---|
| split_ultimate | bool | true | Split virtual root into per-tree roots |
| erase_flanks | bool | true | Erase ancestry outside informative sites |
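For illustration, a table that keeps root splitting but turns off flank erasure might read:

```toml
[post_process]
split_ultimate = true     # documented default
erase_flanks = false      # non-default: keep ancestry outside informative sites
```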

[augment_sites]

Place non-inference sites via parsimony.

| Field | Type | Default | Description |
|---|---|---|---|
| sources | list[str] | (required) | Source names for parsimony placement |
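A minimal sketch, assuming a hypothetical source named `all_sites` is defined in a `[[source]]` block elsewhere in the same file:

```toml
[augment_sites]
sources = ["all_sites"]   # hypothetical source name
```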

[individual_metadata]

Map VCZ sample-dimensioned arrays into tskit individual metadata.

| Field | Type | Default | Description |
|---|---|---|---|
| population | string | | VCZ array whose unique values become tskit populations |
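As a sketch, assuming the store holds a sample-dimensioned array called `sample_population` (a hypothetical name):

```toml
[individual_metadata]
population = "sample_population"   # hypothetical array; each unique value becomes a population
```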

[individual_metadata.fields]

Each key becomes a tskit metadata field; the value names the VCZ array.

[individual_metadata.fields]
name = "sample_id"
sex = "sample_sex"