# Command line interface

Two command line applications are provided with `msprime`: **msp** and
**mspms**. The **msp** program is a POSIX-compliant command line
interface to the library. The **mspms** program is a fully **ms**-compatible
interface. This is useful for those who wish to get started quickly
with the library, and also as a means of plugging `msprime` into existing
workflows. However, there is a substantial overhead involved in translating data
from `msprime`’s native history file into legacy formats like `ms`, and so new code
should use the Python API where possible.

## msp

The **msp** program provides a convenient interface to the `msprime` API.
It is based on subcommands that either generate or consume a
tree sequence file. The `ancestry` sub-command simulates tree sequences from the
coalescent with recombination. The `mutations` sub-command places
mutations onto an existing tree sequence. Several
mutation models are available.
The deprecated `simulate` sub-command simulates tree sequences from the
coalescent with recombination and mutations.

Important

The **msp ancestry** and **msp mutations** commands write
their output to stdout by default, so make sure you redirect this to a
file or use the `--output` option.

### Examples

Simulate 10 diploid samples from a population of size 1000
with a genome of 100 base pairs, and write the output to `ancestry.trees`:

```
$ msp ancestry 10 -N 1000 -L 100 > ancestry.trees
```

Simulate mutations under the `Jukes-Cantor` mutation model
at a rate of 0.01 per base per generation, and write the result to `mutations.trees`:

```
$ msp mutations 0.01 ancestry.trees > mutations.trees
```

Do the same simulations, but pipe the output of the ancestry simulation directly to the mutations simulation:

```
$ msp ancestry 10 -N 1000 -L 100 | msp mutations 0.01 > combined.trees
```

Show a summary of the properties of the trees file `combined.trees`:

```
$ tskit info combined.trees
sequence_length: 100.0
trees: 1
samples: 20
individuals: 10
nodes: 39
edges: 38
sites: 100
mutations: 16997
migrations: 0
populations: 1
provenances: 2
```

See the tskit documentation for more details on the
**tskit** command line interface.
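For comparison, the ancestry-plus-mutations pipeline from these examples might be written against the Python API roughly as follows. This is a minimal sketch, assuming `msprime` is installed in the current environment. `simulate_combined` is a hypothetical helper name and the seeds are arbitrary, so the number of mutations will not match the `tskit info` output shown above.

```python
import importlib.util


def simulate_combined(ancestry_seed=1, mutation_seed=2):
    """Hypothetical helper mirroring `msp ancestry ... | msp mutations ...`."""
    import msprime  # third-party; imported lazily so this file loads without it

    # Counterpart of: msp ancestry 10 -N 1000 -L 100
    ts = msprime.sim_ancestry(
        samples=10,
        population_size=1000,
        sequence_length=100,
        random_seed=ancestry_seed,
    )
    # Counterpart of: msp mutations 0.01 (JC69 is the default model)
    return msprime.sim_mutations(ts, rate=0.01, random_seed=mutation_seed)


# Only run the simulation when msprime is actually available.
HAVE_MSPRIME = importlib.util.find_spec("msprime") is not None
if HAVE_MSPRIME:
    mts = simulate_combined()
    print(mts.num_samples, mts.sequence_length)
    # mts.dump("combined.trees") would write a tree sequence file to disk.
```

Working with the returned tree sequence object directly avoids the round trip through files or pipes that the command line examples require.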

### msp ancestry

**msp ancestry** generates coalescent simulations with recombination,
either from a constant population size or from a demographic model given
with `--demography`, and stores the result as a tree sequence in
an output file. This sub-command is an interface to the
`msprime.sim_ancestry()` API function.

```
usage: msp ancestry [-h] [-v] [-o OUTPUT] [--random-seed RANDOM_SEED]
                    [--length LENGTH]
                    [--recombination-rate RECOMBINATION_RATE]
                    [--ploidy PLOIDY]
                    [--population-size POPULATION_SIZE | --demography DEMOGRAPHY]
                    samples [samples ...]
```

#### Positional Arguments

`samples`: The sample specification. If a demography is not specified using the `-d` option, this must be a single integer denoting the number of k-ploid individuals (see the `--ploidy` argument) to sample. If a demography is specified, the samples must be provided as `<population identifier>:<num samples>` pairs. Samples from multiple populations can be specified; for example, if we have a demography with two populations named A and B, the sample specification `A:5 B:6` will sample 5 individuals from A and 6 from B. Samples are always taken at the corresponding population’s default sampling time.
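The two forms of the sample specification can be illustrated with a small parser. This is an illustrative sketch only, not the code **msp** uses internally. `parse_samples` is a hypothetical name, and returning an integer for the single-value form and a dictionary for the `<population identifier>:<num samples>` form is an assumption made for the example.

```python
def parse_samples(tokens):
    """Parse a sample specification given as a list of command line tokens.

    ["10"]         -> 10                (no demography: number of individuals)
    ["A:5", "B:6"] -> {"A": 5, "B": 6}  (demography: per-population counts)
    """
    if len(tokens) == 1 and ":" not in tokens[0]:
        return int(tokens[0])
    samples = {}
    for token in tokens:
        # Split on the last colon so the count is always the final field.
        name, sep, count = token.rpartition(":")
        if not sep or not name:
            raise ValueError(f"expected <population>:<num samples>, got {token!r}")
        samples[name] = int(count)
    return samples


print(parse_samples(["10"]))
print(parse_samples("A:5 B:6".split()))
```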

#### Named Arguments

`-v, --verbosity`: Increase the verbosity. Use `-v` for INFO output and `-vv` for DEBUG.

`-o, --output`: Where to write the output tree sequence file. If omitted or `-`, write to standard output.

`--random-seed, -s`: The random seed. If not specified, one is chosen randomly.

`--length, -L`: The length of the genome sequence to simulate.

`--recombination-rate, -r`: The recombination rate per base per generation.

`--ploidy, -k`: The number of monoploid genomes per sample individual.

`--population-size, -N`: The number of individuals in the population.

`--demography, -d`: The path to a Demes YAML file describing the demographic model.

### msp mutations

**msp mutations** can be used to add mutations to a tree sequence and store
a copy of the resulting tree sequence in a second file. This
sub-command is an interface to the
`msprime.sim_mutations()` API function.

```
usage: msp mutations [-h] [-o OUTPUT] [--random-seed RANDOM_SEED]
                     [--start-time START_TIME] [--end-time END_TIME]
                     [--model {binary,blosum62,infinite_alleles,jc69,pam}]
                     mutation_rate [input]
```

#### Positional Arguments

`mutation_rate`: The mutation rate per base.

`input`: The input tree sequence. If omitted or `-`, read from standard input.

#### Named Arguments

`-o, --output`: Where to write the output tree sequence file. If omitted or `-`, write to standard output.

`--random-seed, -s`: The random seed. If not specified, one is chosen randomly.

`--start-time`: The minimum time ago at which a mutation can occur.

`--end-time`: The maximum time ago at which a mutation can occur.

`--model, -m`: The mutation model to use (default: JC69). Possible choices: `binary`, `blosum62`, `infinite_alleles`, `jc69`, `pam`.

### msp simulate

Warning

The **msp simulate** command is deprecated.

**msp simulate** generates coalescent simulations with recombination
from a constant population size and stores the result as a tree sequence in
an output file.
**msp simulate** is deprecated, but will be supported indefinitely.
Use **msp mutations** for further mutation models and
**msp ancestry** for further demographic scenarios from
which to simulate tree sequences. **msp simulate** is an
interface to the deprecated `msprime.simulate()` API function.

```
usage: msp simulate [-h] [--length LENGTH]
                    [--recombination-rate RECOMBINATION_RATE]
                    [--mutation-rate MUTATION_RATE]
                    [--effective-population-size EFFECTIVE_POPULATION_SIZE]
                    [--random-seed RANDOM_SEED] [--compress]
                    sample_size tree_sequence
```

#### Positional Arguments

`sample_size`: The number of genomes in the sample.

`tree_sequence`: The output tree sequence file.

#### Named Arguments

`--length, -L`: The length of the simulated region in base pairs.

`--recombination-rate, -r`: The recombination rate per base per generation.

`--mutation-rate, -u`: The mutation rate per base per generation.

`--effective-population-size, -N`: The diploid effective population size Ne.

`--random-seed, -s`: The random seed. If not specified, one is chosen randomly.

`--compress, -z`: Deprecated option with no effect; please use the tszip utility instead.

## mspms

The **mspms** program is an **ms**-compatible
command line interface to the `msprime` library. This interface should
be useful for legacy applications, where it can be used as a drop-in
replacement for **ms**. This interface is not recommended for new applications,
particularly if the simulated trees are required as part of the output,
as Newick is very inefficient. The Python API is the recommended interface,
providing direct access to the structures used within `msprime`.

### Supported Features

**mspms** supports a subset of **ms**’s functionality. Please
open an issue on
GitHub if there is a feature of **ms** that you would like to see
added. We currently support:

- Basic functionality (sample size, replicates, tree and haplotype output);
- Recombination (via the `-r` option);
- Gene conversion (via the `-c` option);
- Spatial structure with arbitrary migration matrices;
- Support for **ms** demographic events. (The implementation of the `-es` option is limited, and has restrictions on how it may be combined with other options.)

### Argument details

This section provides the detailed listing of the arguments to
**mspms** (also available via `mspms --help`). See
the documentation for ms
for details on how these values should be interpreted.

Warning

Due to quirks in Python’s argparse module, negative growth rates
written in exponential form (e.g. `-eG 1.0 -1e-5`) are not recognised as an
option argument. To work around this, specify the argument using quotes
and a leading space, e.g. `-eG 1.0 ' -1e-5'`, or avoid scientific notation,
e.g. `-eG 1.0 -0.00001`.
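The two workarounds can be tried out on a toy parser. This is a minimal sketch using only the standard library. The parser below is a stand-in with a single `-eG`-style option and is not the **mspms** parser. The unquoted exponential form is deliberately left out so that the snippet runs cleanly.

```python
import argparse

# A toy parser with one ms-style option that takes two numeric values.
# It is NOT the mspms parser, only the smallest setup that shows the workarounds.
parser = argparse.ArgumentParser(prog="toy")
parser.add_argument("-eG", nargs=2, type=float, metavar=("t", "alpha"))

# Workaround 1: a quoted value with a leading space no longer starts with "-",
# so argparse treats it as an argument value; float() ignores the whitespace.
quoted = parser.parse_args(["-eG", "1.0", " -1e-5"])

# Workaround 2: plain decimal notation is recognised as a negative number.
plain = parser.parse_args(["-eG", "1.0", "-0.00001"])

print(quoted.eG, plain.eG)
```

Both calls yield the same pair of values, so either spelling can be used in scripts that generate **mspms** command lines.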

mspms is an ms-compatible interface to the msprime library. It simulates the coalescent with recombination for a variety of demographic models and outputs the results in a text-based format. It supports a subset of the functionality available in ms and aims for full compatibility.

```
usage: mspms [-h] [--mutation-rate theta] [--trees]
             [--recombination rho num_loci]
             [--gene-conversion gc_recomb_ratio tract_length]
             [--hotspots HOTSPOTS [HOTSPOTS ...]]
             [--structure value [value ...]]
             [--migration-matrix-entry i j rate]
             [--migration-matrix entry [entry ...]]
             [--migration-rate-change t x]
             [--migration-matrix-entry-change time i j rate]
             [--migration-matrix-change entry [entry ...]]
             [--growth-rate alpha]
             [--population-growth-rate population_id alpha]
             [--population-size population_id size]
             [--growth-rate-change t alpha]
             [--population-growth-rate-change t population_id alpha]
             [--size-change t x] [--population-size-change t population_id x]
             [--population-split t i j]
             [--admixture t population_id proportion]
             [--random-seeds x1 x2 x3] [--precision PRECISION] [-V]
             [-f FILENAME]
             sample_size num_replicates
```

#### Positional Arguments

`sample_size`: The number of genomes in the sample.

`num_replicates`: Number of independent replicates.

#### Named Arguments

`-V, --version`: Show program’s version number and exit.

`-f, --filename`: Insert commands from a file at this point in the command line.

#### Behaviour

`--mutation-rate, -t`: Mutation rate theta=4*N0*mu.

`--trees, -T`: Print out trees in Newick format.

`--recombination, -r`: Recombination at rate rho=4*N0*r, where r is the rate of recombination between the ends of the region being simulated; num_loci is the number of sites between which recombination can occur.

`--gene-conversion, -c`: Gene conversion at rate gamma, where gamma depends on the defined recombination rate rho=4*N0*r. If rho > 0, gc_recomb_ratio defines the ratio g/r, where r is the probability per generation of crossing-over and g the corresponding gene conversion probability. Gene conversions are initiated at rate gamma=rho*gc_recomb_ratio=4*N0*r*gc_recomb_ratio. If rho = 0, the gene conversion rate is given by gamma=gc_recomb_ratio=4*N0*c, where c is the rate of gene conversion initiation between the ends of the simulated region of length num_loci. If the recombination rate is not specified, standard parameters are used, i.e. rho = 0 and num_loci = 1. The length of the gene conversion tracts is geometrically distributed with mean tract_length. The mean tract_length needs to be larger than or equal to 1 for discrete genomes and larger than 0 for continuous genomes.

`--hotspots, -v`: Recombination hotspots defined according to the msHOT format. This is defined as a sequence `n (start stop scale)+`, where n is the number of hotspots and each hotspot spans [start, stop), where the recombination rate is the background recombination rate times scale. Adjacent hotspots may stop and start at the same position but must otherwise be non-overlapping and specified in ascending order.
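The scaled rates above can be worked through numerically. The sketch below is an illustration under stated assumptions, not part of **mspms**. `mspms_rates` is a hypothetical helper, and it assumes that `mu` and `r` are supplied as per-base, per-generation rates, so that the mutation rate for the whole region is `mu * length` and the recombination rate between the ends of the region is `r * (length - 1)`.

```python
import math


def mspms_rates(n0, length, mu=0.0, r=0.0, gc_recomb_ratio=0.0):
    """Convert per-base, per-generation rates into scaled values for mspms.

    Assumption (not stated in the option help): mu and r are per-base rates,
    so the region-wide rates are mu * length and r * (length - 1).
    """
    theta = 4 * n0 * mu * length      # value for -t
    rho = 4 * n0 * r * (length - 1)   # first value for -r
    gamma = rho * gc_recomb_ratio     # gene conversion initiation rate when rho > 0
    return theta, rho, gamma


theta, rho, gamma = mspms_rates(
    n0=10_000, length=1001, mu=1e-8, r=1e-8, gc_recomb_ratio=2.0
)
print(f"theta={theta:.4f} rho={rho:.4f} gamma={gamma:.4f}")
```

With these inputs, `theta` and `rho` would be passed to `-t` and `-r` (together with `num_loci`), while `-c` takes the ratio `gc_recomb_ratio` itself rather than `gamma`.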

#### Structure and migration

`--structure, -I`: Sample from populations with the specified deme structure. The arguments are of the form `num_populations n1 n2 … [4N0m]`, specifying the number of populations, the sample configuration, and optionally, the migration rate for a symmetric island model.

`--migration-matrix-entry, -m`: Sets an entry M[i, j] in the migration matrix to the specified rate. i and j are (1-indexed) population IDs. Multiple options can be specified.

`--migration-matrix, -ma`: Sets the migration matrix to the specified value. The entries are in the order M[1, 1], M[1, 2], …, M[2, 1], M[2, 2], …, M[N, N], where N is the number of populations.

`--migration-rate-change, -eM`: Set the symmetric island model migration rate to x / (npop - 1) at time t.

`--migration-matrix-entry-change, -em`: Sets an entry M[i, j] in the migration matrix to the specified rate at the specified time. i and j are (1-indexed) population IDs.

`--migration-matrix-change, -ema`: Sets the migration matrix to the specified value at time t. The entries are in the order M[1, 1], M[1, 2], …, M[2, 1], M[2, 2], …, M[N, N], where N is the number of populations.
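The entry ordering and the symmetric island model rate described above can be sketched in a few lines. Both functions are hypothetical helpers written for illustration and are not taken from **mspms**. Setting the diagonal of the island model matrix to zero is an assumption made here for the example.

```python
def reshape_migration_matrix(num_populations, entries):
    """Arrange a flat list in row-major order: M[1,1], M[1,2], ..., M[N,N]."""
    n = num_populations
    if len(entries) != n * n:
        raise ValueError(f"expected {n * n} entries, got {len(entries)}")
    return [list(entries[row * n:(row + 1) * n]) for row in range(n)]


def island_model_matrix(num_populations, x):
    """Symmetric island model: every off-diagonal entry is x / (npop - 1)."""
    rate = x / (num_populations - 1)
    return [
        [0.0 if i == j else rate for j in range(num_populations)]
        for i in range(num_populations)
    ]


# Python lists are 0-indexed, so the option's M[i, j] is matrix[i - 1][j - 1].
matrix = reshape_migration_matrix(2, [0.0, 1.5, 2.5, 0.0])
print(matrix[0][1], matrix[1][0])
print(island_model_matrix(3, 4.0))
```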

#### Demography

`--growth-rate, -G`: Set the growth rate to alpha for all populations. See warning above about negative growth rates.

`--population-growth-rate, -g`: Set the growth rate to alpha for a specific population. See warning above about negative growth rates.

`--population-size, -n`: Set the size of a specific population to size * N0.

`--growth-rate-change, -eG`: Set the growth rate for all populations to alpha at time t. See warning above about negative growth rates.

`--population-growth-rate-change, -eg`: Set the growth rate for a specific population to alpha at time t. See warning above about negative growth rates.

`--size-change, -eN`: Set the population size for all populations to x * N0 at time t.

`--population-size-change, -en`: Set the population size for a specific population to x * N0 at time t.

`--population-split, -ej`: Move all lineages in population i to j at time t. Forwards in time, this corresponds to a population split in which lineages in j split into i. All migration rates for population i are set to zero.

`--admixture, -es`: Split the specified population into a new population, such that the specified proportion of lineages remains in the population population_id. Forwards in time, this corresponds to an admixture event. The new population has ID num_populations + 1. Migration rates to and from the new population are set to 0, its growth rate is 0, and its population size is N0.

#### Miscellaneous

`--random-seeds, -seeds`: Random seeds (must be three integers).

`--precision, -p`: Number of values after decimal place to print.

If you use msprime in your work, please cite the following paper: Jerome Kelleher, Alison M Etheridge and Gilean McVean (2016), “Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes”, PLoS Comput Biol 12(5): e1004842. doi: 10.1371/journal.pcbi.1004842