Configuration reference
Tsinfer is configured via a TOML file passed to the CLI. Paths in the config are resolved relative to the config file’s directory.
A complete annotated example is in example_config.toml.
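The path-resolution rule above can be sketched in a few lines of Python. This is only an illustration, not tsinfer's own code: the helper name `resolve_config_path` and the example paths are hypothetical, and the behaviour for absolute paths (left unchanged) is an assumption based on how `pathlib` joins paths.

```python
from pathlib import PurePosixPath


def resolve_config_path(config_file: str, value: str) -> PurePosixPath:
    """Resolve a path from the config relative to the config file's directory.

    Hypothetical helper for illustration only.
    """
    base = PurePosixPath(config_file).parent
    # pathlib returns the right-hand side unchanged when it is absolute,
    # so absolute paths pass through (assumed, not confirmed, for tsinfer).
    return base / value


# Example paths are made up for this sketch.
print(resolve_config_path("/project/run1/config.toml", "data/samples.vcz"))
```

The practical consequence is that a config file and its data can be moved together without editing the paths, and that the working directory of the CLI invocation does not affect which files are read.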
[[source]]
Each [[source]] block defines a named view over a VCZ store. The same store
can appear multiple times with different filters.
| Field | Type | Default | Description |
|---|---|---|---|
|  | string | (required) | Unique name for this source |
|  | string | (required) | Path to VCZ store |
|  | string | — | bcftools include expression (e.g. |
|  | string | — | bcftools exclude expression |
|  | string | — | Sample filter (comma-separated; prefix |
|  | string | — | Genomic region, half-open (e.g. |
|  | string | — | Exact target positions |
|  | various | — | Per-sample times: constant, field name, or |
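The region filter is described as half-open, meaning the start coordinate is included and the end coordinate is excluded. A minimal sketch of that convention follows. The function name and the coordinates are hypothetical, and the sketch does not parse the region string itself, since its exact syntax is not shown here.

```python
def in_half_open_region(position: int, start: int, end: int) -> bool:
    """Return True if position lies in the half-open interval [start, end)."""
    return start <= position < end


# Toy positions; 200 is excluded because the end of the interval is open.
positions = [100, 150, 200, 250]
kept = [p for p in positions if in_half_open_region(p, 100, 200)]
print(kept)
```

One benefit of half-open intervals is that adjacent regions such as [100, 200) and [200, 300) cover every position exactly once.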
[ancestral_state]
Specifies where to read the ancestral allele for each variant position.
| Field | Type | Default | Description |
|---|---|---|---|
|  | string | (required) | Path to VCZ containing ancestral alleles |
|  | string | — | Array name in the store (e.g. |
|  | bool |  | Use the REF allele ( |
[[ancestors]]
Controls the ancestor-generation step (infer-ancestors). At least one
[[ancestors]] block is required unless [match] specifies a reference_ts.
| Field | Type | Default | Description |
|---|---|---|---|
|  | string | (required) | Unique ancestor set name |
|  | string | (required) | Output VCZ path |
|  | list[str] | (required) | Source names to build ancestors from |
|  | int | 500,000 | Split intervals at gaps wider than this (bp) |
|  | int | 100 | Zarr chunk size (ancestor dimension) |
|  | int | 50,000 | Zarr chunk size (site dimension) |
|  | string |  | Blosc compressor name |
|  | int | 7 | Compression level (0–9) |
|  | string |  |  |
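To illustrate what splitting intervals at wide gaps means, the sketch below groups sorted site positions and starts a new group whenever the distance to the previous site exceeds a threshold. It is not tsinfer's implementation: the function name, the `max_gap` parameter name, and the toy positions are assumptions made for this example, with the default taken from the 500,000 bp value listed above.

```python
def split_at_gaps(positions, max_gap=500_000):
    """Group sorted positions, starting a new group at any gap wider than max_gap."""
    groups = []
    for pos in positions:
        if groups and pos - groups[-1][-1] <= max_gap:
            # Gap to the previous site is within the threshold: same interval.
            groups[-1].append(pos)
        else:
            # First site, or a gap wider than max_gap: start a new interval.
            groups.append([pos])
    return groups


# Toy site positions in base pairs; the 800,000 bp gap forces a split.
sites = [1_000, 2_000, 400_000, 1_200_000, 1_250_000]
print(split_at_gaps(sites))
```

With the toy data, the first three sites form one interval and the last two form another. A smaller threshold produces more, shorter intervals.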
[match]
Controls the HMM matching step.
| Field | Type | Default | Description |
|---|---|---|---|
|  | string | (required) | Output |
|  | bool |  | Enable Viterbi path compression |
|  | string | — | Reference tree sequence (skip ancestor generation) |
|  | string | — | Checkpoint directory (enables resume) |
|  | bool |  | Keep per-group checkpoint files |
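The interaction between `[[ancestors]]` and `reference_ts` (at least one `[[ancestors]]` block unless `[match]` gives a reference tree sequence) can be expressed as a small pre-flight check. This is a sketch under stated assumptions: the config is assumed to have been parsed into a plain dictionary, with `[[ancestors]]` as a list under `"ancestors"` and `[match]` as a dictionary under `"match"`. The function name and the example file name are hypothetical, and tsinfer's own validation may differ.

```python
def check_ancestors(config: dict) -> None:
    """Raise ValueError if neither [[ancestors]] nor match.reference_ts is given."""
    has_reference = "reference_ts" in config.get("match", {})
    has_ancestors = bool(config.get("ancestors"))
    if not (has_reference or has_ancestors):
        raise ValueError(
            "config needs at least one [[ancestors]] block "
            "or a reference_ts in [match]"
        )


# A reference tree sequence makes the [[ancestors]] block optional.
# "ref.trees" is a made-up file name.
check_ancestors({"match": {"reference_ts": "ref.trees"}})

# Neither is present, so the check fails.
try:
    check_ancestors({"match": {}})
except ValueError as err:
    print(err)
```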
[match.sources.<name>]
Per-source parameters. Every source that should appear in the output tree sequence needs an entry here.
| Field | Type | Default | Description |
|---|---|---|---|
|  | int | 1 | tskit node flags ( |
|  | bool |  | Group sample nodes into tskit individuals |
[post_process]
Optional cleanup applied after matching.
| Field | Type | Default | Description |
|---|---|---|---|
|  | bool |  | Split virtual root into per-tree roots |
|  | bool |  | Erase ancestry outside informative sites |
[augment_sites]
Place non-inference sites via parsimony.
| Field | Type | Default | Description |
|---|---|---|---|
|  | list[str] | (required) | Source names for parsimony placement |
[individual_metadata]
Map VCZ sample-dimensioned arrays into tskit individual metadata.
| Field | Type | Default | Description |
|---|---|---|---|
|  | string | — | VCZ array whose unique values become tskit populations |
[individual_metadata.fields]
Each key becomes a tskit metadata field; the value names the VCZ array.
```toml
[individual_metadata.fields]
name = "sample_id"
sex = "sample_sex"
```
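To make the key-to-array mapping concrete, the sketch below applies the `name` and `sex` mapping from the example to two toy sample-dimensioned arrays and builds one metadata dictionary per individual. The array contents and the function name are invented for illustration. The real arrays live in the VCZ store, and how tsinfer encodes the result into tskit metadata is not shown here.

```python
def build_individual_metadata(fields: dict, arrays: dict) -> list:
    """Build one metadata dict per sample from a {metadata_key: array_name} mapping."""
    if not fields:
        return []
    # All sample-dimensioned arrays are assumed to have the same length.
    num_samples = len(arrays[next(iter(fields.values()))])
    return [
        {key: arrays[array_name][i] for key, array_name in fields.items()}
        for i in range(num_samples)
    ]


# Mapping taken from the [individual_metadata.fields] example.
fields = {"name": "sample_id", "sex": "sample_sex"}
# Toy stand-ins for VCZ arrays (hypothetical values).
arrays = {"sample_id": ["A", "B"], "sample_sex": ["F", "M"]}

metadata = build_individual_metadata(fields, arrays)
print(metadata)
```

Each resulting dictionary uses the config keys (`name`, `sex`) rather than the VCZ array names, which is the renaming that the mapping expresses.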