Command line interface#

The command line interface in tsinfer is intended to provide a convenient interface to the high-level API functionality. There are two equivalent ways to invoke this program:

$ tsinfer

or

$ python3 -m tsinfer

The first form is more intuitive and works well most of the time. The second form is useful when multiple versions of Python are installed or if the tsinfer executable is not installed on your path.
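The choice between the two forms can also be made from a script. The following is a minimal Python sketch using only the standard library; the helper name `tsinfer_argv` is hypothetical and not part of tsinfer. It prefers the tsinfer executable when one is found on the path and otherwise falls back to running the module with the current interpreter.

```python
import shlex
import shutil
import sys


def tsinfer_argv(*args):
    """Return an argument list that invokes tsinfer with the given arguments.

    Hypothetical helper, not part of tsinfer: uses the tsinfer executable if
    it is on the path, otherwise falls back to `python -m tsinfer` with the
    interpreter that is running this script.
    """
    executable = shutil.which("tsinfer")
    if executable is not None:
        return [executable, *args]
    return [sys.executable, "-m", "tsinfer", *args]


# Print the command instead of running it; pass the list to
# subprocess.run(..., check=True) to execute it.
print(shlex.join(tsinfer_argv("--version")))
```

Using `sys.executable` in the fallback ties the call to one specific Python installation, which is the situation the second form is meant for.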

The tsinfer program has five main subcommands: list prints a summary of the data held in one of tsinfer’s file formats; infer runs the complete inference process for a given input samples file; and generate-ancestors, match-ancestors and match-samples run the three parts of this inference process as separate steps. Running the inference as separate steps is recommended for large inferences because it allows greater control over the inference process. Two further subcommands, augment-ancestors and verify, are documented with the others under Sub-commands.
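As an illustration of the stepwise workflow, the sketch below builds the three commands that together do the work of a single infer run. The function name `stepwise_commands` is hypothetical, and only the required samples argument is passed, so every intermediate file is assumed to use its default location.

```python
import shlex


def stepwise_commands(samples):
    """Build the three commands that run the inference as separate steps.

    Hypothetical helper: only the positional samples argument is given, so
    each step reads and writes its files at the default locations.
    """
    steps = ["generate-ancestors", "match-ancestors", "match-samples"]
    return [["tsinfer", step, samples] for step in steps]


for command in stepwise_commands("1kg-chr1.samples"):
    # Printed for illustration. To execute, run each command in this order
    # with subprocess.run(command, check=True): each step uses the output
    # of the step before it.
    print(shlex.join(command))
```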

Argument details#

Command line interface for tsinfer.

usage: tsinfer [-h] [-V]
               {generate-ancestors,ga,match-ancestors,ma,augment-ancestors,aa,match-samples,ms,infer,list,ls,verify}
               ...

Positional Arguments#

subcommand

Possible choices: generate-ancestors, ga, match-ancestors, ma, augment-ancestors, aa, match-samples, ms, infer, list, ls, verify

Named Arguments#

-V, --version

show program’s version number and exit

Sub-commands#

generate-ancestors (ga)#

Generates a set of ancestors from the input sample data and stores the results in a tsinfer ancestors file.

tsinfer generate-ancestors [-h] [-a ANCESTORS] [--num-threads NUM_THREADS]
                           [--num-flush-threads NUM_FLUSH_THREADS]
                           [--progress] [-v]
                           [--log-section {tsinfer.inference,tsinfer.formats,tsinfer.threads}]
                           samples
Positional Arguments#
samples

The input sample data in tsinfer ‘samples’ format. Please see the documentation at https://tskit.dev/tsinfer/docs/ for information on how to import data into this format.

Named Arguments#
-a, --ancestors

The path to the ancestor data file in tsinfer ‘ancestors’ format. If not specified, this defaults to the input samples file stem with the extension ‘.ancestors’. For example, if ‘1kg-chr1.samples’ is the input file then the default ancestors file would be ‘1kg-chr1.ancestors’.

--num-threads, -t

The number of worker threads to use. If < 1, use a simpler unthreaded algorithm (default).

--num-flush-threads, -F

The number of data flush threads to use. If < 1, all data is flushed synchronously in the main thread (default=2)

--progress, -p

Show a progress monitor.

-v, --verbosity

Increase the verbosity

--log-section, -L

Possible choices: tsinfer.inference, tsinfer.formats, tsinfer.threads

Log messages only for the specified module
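The options above can be combined as in the following sketch, which assembles a generate-ancestors command from keyword arguments. The function `generate_ancestors_command` and its parameter names are hypothetical; only the flags themselves come from the usage shown above. Options are placed before the samples argument to match the usage line.

```python
def generate_ancestors_command(
    samples,
    ancestors=None,
    num_threads=0,
    num_flush_threads=None,
    progress=False,
):
    """Build a `tsinfer generate-ancestors` argument list (hypothetical helper)."""
    command = ["tsinfer", "generate-ancestors"]
    if ancestors is not None:
        # Override the default '<stem>.ancestors' output location.
        command += ["--ancestors", ancestors]
    if num_threads > 0:
        # Values below 1 select the unthreaded algorithm, which is the
        # default, so the flag is simply omitted in that case.
        command += ["--num-threads", str(num_threads)]
    if num_flush_threads is not None:
        command += ["--num-flush-threads", str(num_flush_threads)]
    if progress:
        command.append("--progress")
    command.append(samples)
    return command


print(generate_ancestors_command("1kg-chr1.samples", num_threads=4, progress=True))
```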

match-ancestors (ma)#

Matches the ancestors built by the ‘generate-ancestors’ command against each other using the model information specified in the input file and writes the output to a tskit .trees file.

tsinfer match-ancestors [-h] [-v]
                        [--log-section {tsinfer.inference,tsinfer.formats,tsinfer.threads}]
                        [-a ANCESTORS] [-A ANCESTORS_TREES]
                        [--num-threads NUM_THREADS] [--progress]
                        [--no-path-compression]
                        [--recombination-rate RECOMBINATION_RATE]
                        [--mismatch-ratio MISMATCH_RATIO]
                        samples
Positional Arguments#
samples

The input sample data in tsinfer ‘samples’ format. Please see the documentation at https://tskit.dev/tsinfer/docs/ for information on how to import data into this format.

Named Arguments#
-v, --verbosity

Increase the verbosity

--log-section, -L

Possible choices: tsinfer.inference, tsinfer.formats, tsinfer.threads

Log messages only for the specified module

-a, --ancestors

The path to the ancestor data file in tsinfer ‘ancestors’ format. If not specified, this defaults to the input samples file stem with the extension ‘.ancestors’. For example, if ‘1kg-chr1.samples’ is the input file then the default ancestors file would be ‘1kg-chr1.ancestors’.

-A, --ancestors-trees

The path to the ancestor trees file in tskit ‘.trees’ format. If not specified, this defaults to the input samples file stem with the extension ‘.ancestors.trees’. For example, if ‘1kg-chr1.samples’ is the input file then the default ancestor trees file would be ‘1kg-chr1.ancestors.trees’.

--num-threads, -t

The number of worker threads to use. If < 1, use a simpler unthreaded algorithm (default).

--progress, -p

Show a progress monitor.

--no-path-compression

Disable path compression

--recombination-rate

The recombination rate per unit genome

--mismatch-ratio

The mismatch ratio: measures the relative importance of multiple mutation/error versus recombination during inference. This defaults to unity if a recombination rate or map is specified.
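The --no-path-compression, --recombination-rate and --mismatch-ratio options are also accepted by the augment-ancestors, match-samples and infer subcommands, so a script can build them once and reuse them. The sketch below is one way to do that; `matching_options` is a hypothetical helper, and the numeric values are placeholders rather than recommended settings.

```python
def matching_options(recombination_rate=None, mismatch_ratio=None,
                     path_compression=True):
    """Return the matching-related flags as a list (hypothetical helper).

    Options left at None are omitted so that tsinfer applies its own defaults.
    """
    options = []
    if not path_compression:
        options.append("--no-path-compression")
    if recombination_rate is not None:
        options += ["--recombination-rate", str(recombination_rate)]
    if mismatch_ratio is not None:
        options += ["--mismatch-ratio", str(mismatch_ratio)]
    return options


# Placeholder value for illustration only.
shared = matching_options(recombination_rate=1e-8)
command = ["tsinfer", "match-ancestors", *shared, "1kg-chr1.samples"]
print(command)
```

Passing only a recombination rate, as here, leaves the mismatch ratio at its default of unity.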

augment-ancestors (aa)#

Augments the ancestors tree sequence by adding a subset of the samples

tsinfer augment-ancestors [-h] [-n NUM_SAMPLES] [-A ANCESTORS_TREES] [-v]
                          [--log-section {tsinfer.inference,tsinfer.formats,tsinfer.threads}]
                          [--no-path-compression] [--num-threads NUM_THREADS]
                          [--progress]
                          [--recombination-rate RECOMBINATION_RATE]
                          [--mismatch-ratio MISMATCH_RATIO]
                          samples augmented_ancestors
Positional Arguments#
samples

The input sample data in tsinfer ‘samples’ format. Please see the documentation at https://tskit.dev/tsinfer/docs/ for information on how to import data into this format.

augmented_ancestors

The path to write the augmented ancestors tree sequence to

Named Arguments#
-n, --num-samples

The number of samples to use. Defaults to 10% of the total.

-A, --ancestors-trees

The path to the ancestor trees file in tskit ‘.trees’ format. If not specified, this defaults to the input samples file stem with the extension ‘.ancestors.trees’. For example, if ‘1kg-chr1.samples’ is the input file then the default ancestor trees file would be ‘1kg-chr1.ancestors.trees’.

-v, --verbosity

Increase the verbosity

--log-section, -L

Possible choices: tsinfer.inference, tsinfer.formats, tsinfer.threads

Log messages only for the specified module

--no-path-compression

Disable path compression

--num-threads, -t

The number of worker threads to use. If < 1, use a simpler unthreaded algorithm (default).

--progress, -p

Show a progress monitor.

--recombination-rate

The recombination rate per unit genome

--mismatch-ratio

The mismatch ratio: measures the relative importance of multiple mutation/error versus recombination during inference. This defaults to unity if a recombination rate or map is specified.

match-samples (ms)#

Matches the samples against the tree sequence structure built by the match-ancestors command

tsinfer match-samples [-h] [-v]
                      [--log-section {tsinfer.inference,tsinfer.formats,tsinfer.threads}]
                      [-A ANCESTORS_TREES] [--no-path-compression]
                      [--no-post-process] [-O OUTPUT_TREES]
                      [--num-threads NUM_THREADS] [--progress]
                      [--recombination-rate RECOMBINATION_RATE]
                      [--mismatch-ratio MISMATCH_RATIO]
                      samples
Positional Arguments#
samples

The input sample data in tsinfer ‘samples’ format. Please see the documentation at https://tskit.dev/tsinfer/docs/ for information on how to import data into this format.

Named Arguments#
-v, --verbosity

Increase the verbosity

--log-section, -L

Possible choices: tsinfer.inference, tsinfer.formats, tsinfer.threads

Log messages only for the specified module

-A, --ancestors-trees

The path to the ancestor trees file in tskit ‘.trees’ format. If not specified, this defaults to the input samples file stem with the extension ‘.ancestors.trees’. For example, if ‘1kg-chr1.samples’ is the input file then the default ancestor trees file would be ‘1kg-chr1.ancestors.trees’.

--no-path-compression

Disable path compression

--no-post-process, --no-simplify

Do not post process the output tree sequence

-O, --output-trees

The path to the output trees file in tskit ‘.trees’ format. If not specified, this defaults to the input samples file stem with the extension ‘.trees’. For example, if ‘1kg-chr1.samples’ is the input file then the default output file would be ‘1kg-chr1.trees’.

--num-threads, -t

The number of worker threads to use. If < 1, use a simpler unthreaded algorithm (default).

--progress, -p

Show a progress monitor.

--recombination-rate

The recombination rate per unit genome

--mismatch-ratio

The mismatch ratio: measures the relative importance of multiple mutation/error versus recombination during inference. This defaults to unity if a recombination rate or map is specified.
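The default locations described for --ancestors, --ancestors-trees and --output-trees all follow the same pattern: the stem of the samples path plus a fixed extension. The sketch below reproduces that pattern so a script can predict where files will appear. It assumes that "stem" means the samples path with its final extension removed, which matches the ‘1kg-chr1.samples’ examples but is not confirmed for other file names; `default_paths` is a hypothetical helper.

```python
import os.path


def default_paths(samples):
    """Derive the documented default file locations from a samples path.

    Assumption: the stem is the path with its final extension removed.
    """
    stem, _extension = os.path.splitext(samples)
    return {
        "ancestors": stem + ".ancestors",
        "ancestors_trees": stem + ".ancestors.trees",
        "output_trees": stem + ".trees",
    }


paths = default_paths("1kg-chr1.samples")
for option, path in paths.items():
    print(option, path)
```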

infer#

Runs the generate-ancestors, match-ancestors and match-samples steps in one go. Not recommended for large inferences.

tsinfer infer [-h] [-v]
              [--log-section {tsinfer.inference,tsinfer.formats,tsinfer.threads}]
              [-O OUTPUT_TREES] [--no-path-compression]
              [--num-threads NUM_THREADS] [--progress] [--no-post-process]
              [--recombination-rate RECOMBINATION_RATE]
              [--mismatch-ratio MISMATCH_RATIO] [--keep-intermediates]
              [-a ANCESTORS] [-A ANCESTORS_TREES]
              samples
Positional Arguments#
samples

The input sample data in tsinfer ‘samples’ format. Please see the documentation at https://tskit.dev/tsinfer/docs/ for information on how to import data into this format.

Named Arguments#
-v, --verbosity

Increase the verbosity

--log-section, -L

Possible choices: tsinfer.inference, tsinfer.formats, tsinfer.threads

Log messages only for the specified module

-O, --output-trees

The path to the output trees file in tskit ‘.trees’ format. If not specified, this defaults to the input samples file stem with the extension ‘.trees’. For example, if ‘1kg-chr1.samples’ is the input file then the default output file would be ‘1kg-chr1.trees’.

--no-path-compression

Disable path compression

--num-threads, -t

The number of worker threads to use. If < 1, use a simpler unthreaded algorithm (default).

--progress, -p

Show a progress monitor.

--no-post-process, --no-simplify

Do not post process the output tree sequence

--recombination-rate

The recombination rate per unit genome

--mismatch-ratio

The mismatch ratio: measures the relative importance of multiple mutation/error versus recombination during inference. This defaults to unity if a recombination rate or map is specified.

--keep-intermediates, -k

Keep the intermediate ancestors and ancestors-tree-sequence files. To override the default locations where these files are saved, use the --ancestors and --ancestors-trees options.

-a, --ancestors

The path to the ancestor data file in tsinfer ‘ancestors’ format. If not specified, this defaults to the input samples file stem with the extension ‘.ancestors’. For example, if ‘1kg-chr1.samples’ is the input file then the default ancestors file would be ‘1kg-chr1.ancestors’.

-A, --ancestors-trees

The path to the ancestor trees file in tskit ‘.trees’ format. If not specified, this defaults to the input samples file stem with the extension ‘.ancestors.trees’. For example, if ‘1kg-chr1.samples’ is the input file then the default ancestor trees file would be ‘1kg-chr1.ancestors.trees’.
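For a one-shot run that still keeps its intermediate files, the options above might be combined as in this sketch. The helper `infer_command`, its parameter names and the example paths are all hypothetical; only the flags come from the usage shown for infer.

```python
def infer_command(samples, output_trees=None, keep_intermediates=False,
                  ancestors=None, ancestors_trees=None):
    """Build a `tsinfer infer` argument list (hypothetical helper)."""
    command = ["tsinfer", "infer"]
    if output_trees is not None:
        command += ["--output-trees", output_trees]
    if keep_intermediates:
        command.append("--keep-intermediates")
    if ancestors is not None:
        # Where the intermediate ancestors file is saved.
        command += ["--ancestors", ancestors]
    if ancestors_trees is not None:
        # Where the intermediate ancestors tree sequence is saved.
        command += ["--ancestors-trees", ancestors_trees]
    command.append(samples)
    return command


print(infer_command(
    "1kg-chr1.samples",
    keep_intermediates=True,
    ancestors="scratch/run1.ancestors",
    ancestors_trees="scratch/run1.ancestors.trees",
))
```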

list (ls)#

Show a summary of the specified tsinfer-related file.

tsinfer list [-h] [-v]
             [--log-section {tsinfer.inference,tsinfer.formats,tsinfer.threads}]
             [--storage]
             path
Positional Arguments#
path

The tsinfer file to show information about.

Named Arguments#
-v, --verbosity

Increase the verbosity

--log-section, -L

Possible choices: tsinfer.inference, tsinfer.formats, tsinfer.threads

Log messages only for the specified module

--storage, -s

Show detailed information about data storage.

verify#

Verify that the specified tree sequence and samples files represent the same data

tsinfer verify [-h] [-v]
               [--log-section {tsinfer.inference,tsinfer.formats,tsinfer.threads}]
               [--progress]
               samples tree_sequence
Positional Arguments#
samples

The input sample data in tsinfer ‘samples’ format. Please see the documentation at https://tskit.dev/tsinfer/docs/ for information on how to import data into this format.

tree_sequence

The tree sequence to compare against, in tskit ‘.trees’ format.

Named Arguments#
-v, --verbosity

Increase the verbosity

--log-section, -L

Possible choices: tsinfer.inference, tsinfer.formats, tsinfer.threads

Log messages only for the specified module

--progress, -p

Show a progress monitor.
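A natural use of verify is as a check straight after an inference. The sketch below builds an infer command with an explicit output path, followed by a verify command on the same pair of files. The helper name `infer_and_verify_commands` is hypothetical, and because this page does not describe how verify reports a mismatch, the sketch only builds the commands and leaves interpreting the result to the caller.

```python
import shlex


def infer_and_verify_commands(samples, trees):
    """Build an infer command and a matching verify command (hypothetical helper).

    The output path is passed explicitly so that both commands refer to the
    same tree sequence file.
    """
    infer = ["tsinfer", "infer", "--output-trees", trees, samples]
    verify = ["tsinfer", "verify", "--progress", samples, trees]
    return [infer, verify]


for command in infer_and_verify_commands("1kg-chr1.samples", "1kg-chr1.trees"):
    # Printed for illustration; run in order with
    # subprocess.run(command, check=True) to execute.
    print(shlex.join(command))
```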