Python API

Python API#

This page provides detailed documentation for the methods and classes available in pyslim. Here is a quick reference to some of the methods:

`recapitate`(ts[, ancestral_Ne, keep_vacant])	Returns a "recapitated" tree sequence, by using msprime to run a coalescent simulation from the "top" of this tree sequence, i.e., allowing any uncoalesced lineages to coalesce.
`annotate`(ts, **kwargs)	Takes a tree sequence (as produced by msprime, for instance), and adds in the information necessary for SLiM to use it as an initial state, filling in mostly default values.
`individuals_alive_at`(ts, time[, stage, ...])	Returns an array giving the IDs of all individuals that are known to be alive at the given time ago.
`individual_ages`(ts)	Returns the ages of all individuals in the tree sequence, extracted from metadata.
`individual_ages_at`(ts, time[, stage, ...])	Returns the ages of each individual at the corresponding time ago, which will be `nan` if the individual is either not born yet or dead.
`individual_parents`(ts)	DEPRECATED: now SLiM records parents directly in the individual table (see for instance ind.parents).
`has_individual_parents`(ts)	DEPRECATED: now SLiM records parents directly in the individual table (see for instance ind.parents).
`slim_time`(ts, time[, stage])	Converts the given "tskit times" (i.e., in units of time before the end of the simulation) to SLiM times (those recorded by SLiM, usually in units of ticks since the start of the simulation).
`convert_alleles`(ts)	Returns a modified tree sequence in which alleles have been replaced by their corresponding nucleotides.
`generate_nucleotides`(ts[, ...])	Returns a modified tree sequence in which mutations have been randomly assigned nucleotides and (optionally) a reference sequence has been randomly generated.
`population_size`(ts, x_bins, y_bins, time_bins)	Calculates the population size in each of the spatial bins defined by grid lines at `x_bins` and `y_bins`, averaged over each of the time intervals separated by `time_bins`.
`default_slim_metadata`(name[, num_chromosomes])	Returns default metadata of type `name`, where `name` is one of "tree_sequence", "edge", "site", "mutation", "mutation_list_entry", "node", "individual", or "population".
`update`(ts)	Update a tree sequence produced by a previous verion of SLiM to the current file version.

Editing or adding to tree sequences#

pyslim provides tools for transforming tree sequences:

pyslim.recapitate(ts, ancestral_Ne=None, *, keep_vacant=False, **kwargs)[source]#

Returns a “recapitated” tree sequence, by using msprime to run a coalescent simulation from the “top” of this tree sequence, i.e., allowing any uncoalesced lineages to coalesce.

To allow recapitation to be done correctly, the nodes of the first generation of the SLiM simulation from whom all samples inherit are still present in the tree sequence, but are not marked as samples. If you simplify the tree sequence before recapitating you must ensure these are not removed, which you do by passing the argument keep_input_roots=True to simplify.

If you specify an ancestral_Ne, then the recapitated portion of the tree sequence will be simulated in a single population with this (diploid) size. In other words, all lineages are moved to a single population of this size (named “ancestral” if this name is not already taken), and coalescence is allowed to happen.

You may control the ancestral demography by passing in a demography argument: see msprime.sim_ancestry().

In general, all defaults are whatever the defaults of msprime.sim_ancestry() are; this includes recombination rate, so that if neither recombination_rate or a recombination_map are provided, there will be no recombination.

Sample flags from vacant nodes will be removed before recapitating: see {meth}`.remove_vacant`. To restore these, use keep_vacant=True. You only need to do this if some individuals are not diploid and you will be loading the tree sequence back into SLiM.

Parameters:

ts (tskit.TreeSequence) – The tree sequence to transform.
ancestral_Ne (float) – If specified, then will simulate from a single ancestral population of this size. It is an error to specify this as well as demography.
keep_vacant (bool) – Whether to restore the sample flags on any vacant sample nodes. Default: False.
kwargs (dict) – Any other arguments to msprime.sim_ancestry().

pyslim.convert_alleles(ts)[source]#

Returns a modified tree sequence in which alleles have been replaced by their corresponding nucleotides. For sites, SLiM-produced tree sequences have “” (the empty string) for the ancestral state at each site; this method will replace this with the corresponding nucleotide from the reference sequence. For mutations, SLiM records the ‘derived state’ as a SLiM mutation ID; this method will this with the nucleotide from the mutation’s metadata.

This operation is not reversible: since SLiM mutation IDs are lost, the tree sequence will not be able to be read back into SLiM.

The main purpose of this method is for output: for instance, this code will produce a VCF file with nucleotide alleles:

nts = pyslim.convert_alleles(ts)
with open('nucs.vcf', 'w') as f:
    nts.write_vcf(f)

This method will produce an error if the tree sequence does not have a valid reference sequence or if any mutations do not have nucleotides: to first generate these, see generate_nucleotides().

Parameters:: ts (tskit.TreeSequence) – The tree sequence to transform.

pyslim.generate_nucleotides(ts, reference_sequence=None, keep=True, seed=None)[source]#

Returns a modified tree sequence in which mutations have been randomly assigned nucleotides and (optionally) a reference sequence has been randomly generated.

If reference_sequence is a string of nucleotides (A, C, G, and T) of length equal to the sequence length, this is used for the reference sequence. If no reference sequence is given, the reference_sequence property of ts is used if present; if not then a sequence of independent and uniformly random nucleotides is generated.

SLiM stores the nucleotide as an integer in the mutation metadata, with -1 meaning “not a nucleotide mutation”. This method assigns nucleotides by stepping through each mutation and picking a random nucleotide uniformly out of the three possible nucleotides that differ from the parental state (i.e., the derived state of the parental mutation, or the ancestral state if the mutation has no parent). If keep=True (the default), the mutations that already have a nucleotide (i.e., an integer 0-3 in metadata) will not be modified.

Technical note: in the case of stacked mutations, the SLiM mutation that determines the nucleotide state of the (tskit) mutation is the one with the largest slim_time attribute. This method tries to assign nucleotides so that each mutation differs from the previous state, but this is not always possible in certain unlikely cases.

Parameters:

ts (tskit.TreeSequence) – The tree sequence to transform.
reference_sequence (bool) – A reference sequence, or None to use an existing reference, or to randomly generate one.
keep (bool) – Whether to leave existing nucleotides in mutations that already have one.
seed (int) – The random seed for generating new alleles.

pyslim.update(ts)[source]#

Update a tree sequence produced by a previous verion of SLiM to the current file version.

Return TreeSequence:: The updated tree sequence.

Summarizing tree sequences#

Additionally, pyslim contains the following methods:

pyslim.individuals_alive_at(ts, time, stage='late', remembered_stage=None, population=None, samples_only=False)[source]#

Returns an array giving the IDs of all individuals that are known to be alive at the given time ago. This is determined using their birth time ago (given by their time attribute) and, for nonWF models, their age attribute (which is equal to their age at the last time they were Remembered). See also {func}`.individual_ages_at`.

In WF models, birth occurs after “early()”, so that individuals are only alive during “late()” for the time step when they have age zero, while in nonWF models, birth occurs before “early()”, so they are alive for both stages.

In both WF and nonWF models, mortality occurs between “early()” and “late()”, so that individuals are last alive during the “early()” stage of the time step of their final age, and if individuals are alive during “late()” they will also be alive during “first()” and “early()” of the next time step. This means it is important to know during which stage individuals were Remembered - for instance, if the call to sim.treeSeqRememberIndividuals() was made during “early()” of a given time step, then those individuals might not have survived until “late()” of that time step. Since SLiM does not record the stage at which individuals were Remembered, you can specify this by setting remembered_stages: it should be the stage during which all calls to sim.treeSeqRememberIndividuals(), as well as to sim.treeSeqOutput(), were made.

Note also that in nonWF models, birth occurs between “first()” and “early()”, so the possible parents in a given time step are those that are alive in “early()” and have age greater than zero, or, equivalently, are alive in “late()” during the previous time step. In WF models, birth occurs after “early()”, so possible parents in a given time step are those that are alive during “early()” of that time step or are alive during “late()” of the previous time step.

Since individuals may be created not during the usual ‘birth’ stage by addSubPop( ), and the stage at which they are created is not stored, results may be unreliable for the first-generation individuals.

Parameters:

ts (tskit.TreeSequence) – A tree sequence.
time (float) – The number of ticks (i.e., time steps) ago.
stage (str) – The stage in the SLiM life cycle that we are inquiring about (either “first”, “early” or “late”; defaults to “late”).
remembered_stage (str) – The stage in the SLiM life cycle during which individuals were Remembered (defaults to the stage the tree sequence was recorded at, stored in metadata).
population (int) – If given, return only individuals in the population(s) with these population ID(s).
samples_only (bool) – Whether to return only individuals who have at least one node marked as samples.

pyslim.individual_ages(ts)[source]#

Returns the ages of all individuals in the tree sequence, extracted from metadata. The result is a array of length equal to the number of individuals, with k-th entry equal to ts.individual(k).metadata["age"].

Returns:: An array of ages of individuals.

pyslim.individual_ages_at(ts, time, stage='late', remembered_stage='late')[source]#

Returns the ages of each individual at the corresponding time ago, which will be nan if the individual is either not born yet or dead. This is computed as the time ago the individual was born (found by the time associated with the the individual’s nodes) minus the time argument; while “death” is inferred from the individual’s age, recorded in metadata. These values are the same as what would be shown in SLiM during the corresponding time step and stage.

Since age increments at the end of each time step, the age is the number of time steps ends the individual has lived through, so if they were born in time step time, then their age will be zero.

In a WF model, this method does not provide any more information than does {func}`.individuals_alive_at`, but for consistency, non-nan ages will be 0 in “late” and 1 in “first” and “early”. See {func}`.individuals_alive_at` for further discussion.

Parameters:

ts (tskit.TreeSequence) – A tree sequence.
time (float) – The reference time ago.
stage (str) – The stage in the SLiM life cycle used to determine who is alive (either “early” or “late”; defaults to “late”).
remembered_stage (str) – The stage in the SLiM life cycle during which individuals were Remembered.

pyslim.individual_parents(ts)[source]#

DEPRECATED: now SLiM records parents directly in the individual table (see for instance ind.parents).

Finds all parent-child relationships in the tree sequence (as far as we can tell). The output will be a two-column array with row [i,j] indicating that individual i is a parent of individual j. See {func}`.has_individual_parents` for exactly which parents are returned.

See {func}`.individuals_alive_at` for further discussion about how this is determined based on when the individuals were Remembered.

Parameters:: ts (tskit.TreeSequence) – A tskit.TreeSequence.
Returns:: An array of individual IDs, with row [i, j] if individual i is a parent of individual j.

pyslim.has_individual_parents(ts)[source]#

DEPRECATED: now SLiM records parents directly in the individual table (see for instance ind.parents).

Finds which individuals have both their parent individuals also present in the tree sequence, as far as we can tell. To do this, we return a boolean array with True for those individuals for which:

all edges terminating in that individual’s nodes are in individuals,
each of the individual’s nodes inherit from a single individual only,
those parental individuals were alive when the individual was born,
the parental individuals account for two whole genomes.

This returns a boolean array indicating for each individual whether all these are true. Note in particular that individuals with only one recorded parent are not counted as “having parents”.

See {func}`.individuals_alive_at` for further discussion about how this is determined based on when the individuals were Remembered.

Parameters:: ts (tskit.TreeSequence) – A tskit.TreeSequence.
Returns:: A boolean array of length equal to targets.

pyslim.population_size(ts, x_bins, y_bins, time_bins, stage='late', remembered_stage=None)[source]#

Calculates the population size in each of the spatial bins defined by grid lines at x_bins and y_bins, averaged over each of the time intervals separated by time_bins. To obtain actual (census) sizes, the tree sequence must contain all individuals alive, e.g., from a SLiM simulation with all individuals permanently remembered.

With nx, ny and nt the number of bins in the x, y and time directions (so, nx is one less than the length of x_bins), returns a 3-d array with dimensions (nx, ny, nt). The (i,j,k)``th element of the array is the average number of individuals with ``x coordinate in the half-open interval [x_bins[i], x_bins[i + 1]) and y coordinate in the half-open interval [y_bins[i], y_bins[i + 1]), averaged across all times in the half-open interval [time_bins[k], time_bins[k + 1]).

For integer endpoints of the time bins, this average is equivalent to recording the number of indivduals that are alive at each time in the time interval and have location in the relevant location bin, then taking the mean of these recorded population sizes.

Parameters:

ts (tskit.TreeSequence) – The tree sequence to calculate population size from.
x_bins (numpy.ndarray) – The x-coordinates of the boundaries of the location bins.
y_bins (numpy.ndarray) – The y-coordinates of the boundaries of the location bins.
time_bins (numpy.ndarray) – The endpoints of the time bins.
stage (str) – The stage in the SLiM life cycle that the endpoints of the time bins refer to (either “early” or “late”; defaults to “late”).
remembered_stage (str) – The stage in the SLiM life cycle during which individuals were Remembered (defaults to the stage the tree sequence was recorded at, stored in metadata).

Utilities#

pyslim.slim_time(ts, time, stage='late')[source]#

Converts the given “tskit times” (i.e., in units of time before the end of the simulation) to SLiM times (those recorded by SLiM, usually in units of ticks since the start of the simulation). Although the latter are always integers, these will not be if the provided times are not integers.

When the tree sequence is written out, SLiM records the current current tick in the metadata: ts.metadata['SLiM']['tick']. In most cases, the “SLiM time” referred to by a time ago in the tree sequence (i.e., the value that would be reported by community.tick within SLiM at the point in time thus referenced) can be obtained by subtracting that time ago from ts.metadata['SLiM']['tick']. However, in WF models, birth happens between the “early()” and “late()” stages, so if the tree sequence was written out using sim.treeSeqOutput() during “early()” in a WF model, the tree sequence’s times measure time before the last set of individuals are born, i.e., before SLiM time step ts.metadata['SLiM']['tick'] - 1. The same thing applies to the “first” stage for both WF and nonWF models.

In some situations (e.g., mutations added during early() in WF models) this may not return what you expect. See Converting from SLiM time to tskit time and back for more discussion.

Parameters:

ts (tskit.TreeSequence) – A SLiM-compatible TreeSequence.
time (numpy.ndarray) – An array of times to be converted.
stage (str) – The stage of the SLiM life cycle that the SLiM time should be computed for.

pyslim.next_slim_mutation_id(ts)[source]#: Returns the next SLiM mutation ID for this tree sequence. This is useful because if you want to add more mutations to your SLiM tree sequence using msprime.sim_mutations(), you may need to specify the parameter next_id in your msprime.SLiMMutationModel to be larger than any existing mutation IDs. Setting next_id equal to the output of this function will allow the mutated tree sequence to be read in by SLiM. To do this, recall that the “derived state” of SLiM’s mutations are comma-separated strings of mutation IDs; this function just parses all derived states and returns one larger than the largest integer found. It will return an error if it encounters derived states that are not comma-separated strings of integers.

Metadata#

SLiM-specific metadata is made visible to the user by .metadata properties. For instance:

ts.node(4).metadata

{'slim_id': 30, 'is_vacant': [0]}

shows that the fifth node in the tree sequence was given pedigree ID 982740 by SLiM, is not a null genome, and has genome_type zero, which corresponds to an autosome (see below).

Annotation#

These two functions will add default SLiM metadata to a tree sequence (or the underlying tables), which can then be modified and loaded into SLiM.

pyslim.annotate(ts, **kwargs)[source]#

Takes a tree sequence (as produced by msprime, for instance), and adds in the information necessary for SLiM to use it as an initial state, filling in mostly default values. Returns a tskit.TreeSequence.

Parameters:

ts (tskit.TreeSequence) – A tskit.TreeSequence.
model_type (str) – SLiM model type: either “WF” or “nonWF”.
tick (int) – What tick number in SLiM correponds to time=0 in the tree sequence.
cycle (int) – What cycle number in SLiM correponds to time=0 in the tree sequence (default: equal to tick).
stage (int) – What stage in SLiM’s cycle has the tree sequence been written out in (defaults to “early”; must be “early” or “late”).
reference_sequence (str) – A reference sequence of length equal to ts.sequence_length.
annotate_mutations (bool) – Whether to replace mutation metadata with defaults. (If False, the mutation table is unchanged.)

pyslim.annotate_tables(tables, model_type, tick, cycle=None, stage='early', reference_sequence=None, annotate_mutations=True)[source]#: Does the work of annotate(), but modifies the tables in place: so, takes tables as produced by msprime, and makes them look like the tables as output by SLiM. See annotate() for details.

Constants and flags#

pyslim.NUCLEOTIDES = ['A', 'C', 'G', 'T']#

Built-in mutable sequence.

If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified.

These flags are the possible values for node.metadata["genome_type"]:

pyslim.GENOME_TYPE_AUTOSOME = 0#