Python API#

This page provides detailed documentation for the methods and classes available in pyslim. Here is a quick reference to some of the methods:

`recapitate`(ts[, ancestral_Ne])	Returns a "recapitated" tree sequence, by using msprime to run a coalescent simulation from the "top" of this tree sequence, i.e., allowing any uncoalesced lineages to coalesce.
`annotate`(ts, **kwargs)	Takes a tree sequence (as produced by msprime, for instance), and adds in the information necessary for SLiM to use it as an initial state, filling in mostly default values.
`individuals_alive_at`(ts, time[, stage, ...])	Returns an array giving the IDs of all individuals that are known to be alive at the given time ago.
`individual_ages_at`(ts, time[, stage, ...])	Returns the ages of each individual at the corresponding time ago, which will be `nan` if the individual is either not born yet or dead.
`slim_time`(ts, time[, stage])	Converts the given "tskit times" (i.e., in units of time before the end of the simulation) to SLiM times (those recorded by SLiM, usually in units of ticks since the start of the simulation).
`convert_alleles`(ts)	Returns a modified tree sequence in which alleles have been replaced by their corresponding nucleotides.
`generate_nucleotides`(ts[, ...])	Returns a modified tree sequence in which mutations have been randomly assigned nucleotides and (optionally) a reference sequence has been randomly generated.
`population_size`(ts, x_bins, y_bins, time_bins)	Calculates the population size in each of the spatial bins defined by grid lines at `x_bins` and `y_bins`, averaged over each of the time intervals separated by `time_bins`.
`default_slim_metadata`(name)	Returns default metadata of type `name`, where `name` is one of "tree_sequence", "edge", "site", "mutation", "mutation_list_entry", "node", "individual", or "population".
`update`(ts)	Update a tree sequence produced by a previous version of SLiM to the current file version.

Editing or adding to tree sequences#

pyslim provides tools for transforming tree sequences:

pyslim.recapitate(ts, ancestral_Ne=None, **kwargs)#

Returns a “recapitated” tree sequence, by using msprime to run a coalescent simulation from the “top” of this tree sequence, i.e., allowing any uncoalesced lineages to coalesce.

To allow recapitation to be done correctly, the nodes of the first generation of the SLiM simulation from whom all samples inherit are still present in the tree sequence, but are not marked as samples. If you simplify the tree sequence before recapitating you must ensure these are not removed, which you do by passing the argument keep_input_roots=True to simplify.

If you specify an ancestral_Ne, then the recapitated portion of the tree sequence will be simulated in a single population with this (diploid) size. In other words, all lineages are moved to a single population of this size (named “ancestral” if this name is not already taken), and coalescence is allowed to happen.

You may control the ancestral demography by passing in a demography argument: see msprime.sim_ancestry().

In general, all defaults are whatever the defaults of msprime.sim_ancestry() are; this includes recombination rate, so that if neither recombination_rate nor a recombination_map is provided, there will be no recombination.

Parameters:

ts (tskit.TreeSequence) – The tree sequence to transform.
ancestral_Ne (float) – If specified, then will simulate from a single ancestral population of this size. It is an error to specify this as well as demography.
kwargs (dict) – Any other arguments to msprime.sim_ancestry().
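
As a rough illustration of the workflow described above (simplify while keeping the input roots, then recapitate), here is a minimal sketch. The wrapper name `simplify_then_recapitate` and its parameters are hypothetical and not part of pyslim. `recombination_rate` and `random_seed` are passed through to msprime.sim_ancestry() as keyword arguments.

```python
def simplify_then_recapitate(ts, sample_nodes, ancestral_Ne, recombination_rate, seed):
    """Hypothetical wrapper: simplify a SLiM tree sequence, then recapitate it."""
    # Imported lazily so that this sketch can be defined without pyslim installed.
    import pyslim

    # keep_input_roots=True retains the first-generation nodes that
    # recapitation needs in order to extend uncoalesced lineages.
    simplified = ts.simplify(sample_nodes, keep_input_roots=True)

    # Extra keyword arguments are handed on to msprime.sim_ancestry();
    # without a recombination rate there would be no recombination.
    return pyslim.recapitate(
        simplified,
        ancestral_Ne=ancestral_Ne,
        recombination_rate=recombination_rate,
        random_seed=seed,
    )
```

A call might look like `simplify_then_recapitate(ts, ts.samples(), 1000, 1e-8, 1)`, where `ts` is a tree sequence written by SLiM and the numeric values are placeholders.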

pyslim.convert_alleles(ts)#

Returns a modified tree sequence in which alleles have been replaced by their corresponding nucleotides. For sites, SLiM-produced tree sequences have “” (the empty string) for the ancestral state at each site; this method will replace this with the corresponding nucleotide from the reference sequence. For mutations, SLiM records the ‘derived state’ as a SLiM mutation ID; this method will replace this with the nucleotide from the mutation’s metadata.

This operation is not reversible: since SLiM mutation IDs are lost, the tree sequence will not be able to be read back into SLiM.

The main purpose of this method is for output: for instance, this code will produce a VCF file with nucleotide alleles:

import pyslim

# ts is assumed to be a SLiM tree sequence with nucleotides, loaded beforehand
nts = pyslim.convert_alleles(ts)
with open('nucs.vcf', 'w') as f:
    nts.write_vcf(f)

This method will produce an error if the tree sequence does not have a valid reference sequence or if any mutations do not have nucleotides: to first generate these, see generate_nucleotides().

Parameters:

ts (tskit.TreeSequence) – The tree sequence to transform.

pyslim.generate_nucleotides(ts, reference_sequence=None, keep=True, seed=None)#

Returns a modified tree sequence in which mutations have been randomly assigned nucleotides and (optionally) a reference sequence has been randomly generated.

If reference_sequence is a string of nucleotides (A, C, G, and T) of length equal to the sequence length, this is used for the reference sequence. If no reference sequence is given, the reference_sequence property of ts is used if present; if not then a sequence of independent and uniformly random nucleotides is generated.

SLiM stores the nucleotide as an integer in the mutation metadata, with -1 meaning “not a nucleotide mutation”. This method assigns nucleotides by stepping through each mutation and picking a random nucleotide uniformly out of the three possible nucleotides that differ from the parental state (i.e., the derived state of the parental mutation, or the ancestral state if the mutation has no parent). If keep=True (the default), the mutations that already have a nucleotide (i.e., an integer 0-3 in metadata) will not be modified.

Technical note: in the case of stacked mutations, the SLiM mutation that determines the nucleotide state of the (tskit) mutation is the one with the largest slim_time attribute. This method tries to assign nucleotides so that each mutation differs from the previous state, but this is not always possible in certain unlikely cases.
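
The assignment rule described above (pick uniformly among the three nucleotides that differ from the parental state) can be sketched in plain Python. This is an illustration of the rule only, not pyslim's implementation; `pick_derived_nucleotide` is a hypothetical helper, and the ordering A, C, G, T for the integers 0 to 3 follows SLiM's nucleotide encoding.

```python
import random

NUCLEOTIDES = "ACGT"  # SLiM encodes these as the integers 0, 1, 2, 3


def pick_derived_nucleotide(parental_state, rng):
    """Choose uniformly among the three nucleotides differing from parental_state."""
    choices = [n for n in NUCLEOTIDES if n != parental_state]
    return rng.choice(choices)


rng = random.Random(42)

# A chain of three mutations at one site: each new mutation's parental state
# is the derived state of the previous one (or the ancestral state at first).
state = "A"  # ancestral state, taken from the reference sequence
chain = []
for _ in range(3):
    state = pick_derived_nucleotide(state, rng)
    chain.append(state)
```

Because each draw excludes the parental state, no mutation in `chain` repeats the state it replaced.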

Parameters:

ts (tskit.TreeSequence) – The tree sequence to transform.
reference_sequence (str) – A reference sequence, or None to use the existing reference sequence if there is one and otherwise to randomly generate one.
keep (bool) – Whether to leave existing nucleotides in mutations that already have one.
seed (int) – The random seed for generating new alleles.

pyslim.update(ts)#

Update a tree sequence produced by a previous version of SLiM to the current file version.

Returns:

tskit.TreeSequence – The updated tree sequence.

Summarizing tree sequences#

Additionally, pyslim contains the following methods:

pyslim.individuals_alive_at(ts, time, stage='late', remembered_stage=None, population=None, samples_only=False)#

Returns an array giving the IDs of all individuals that are known to be alive at the given time ago. This is determined using their birth time ago (given by their time attribute) and, for nonWF models, their age attribute (which is equal to their age at the last time they were Remembered). See also individual_ages_at().

In WF models, birth occurs after “early()”, so that individuals are only alive during “late()” for the time step when they have age zero, while in nonWF models, birth occurs before “early()”, so they are alive for both stages.

In both WF and nonWF models, mortality occurs between “early()” and “late()”, so that individuals are last alive during the “early()” stage of the time step of their final age, and if individuals are alive during “late()” they will also be alive during “first()” and “early()” of the next time step. This means it is important to know during which stage individuals were Remembered: for instance, if the call to sim.treeSeqRememberIndividuals() was made during “early()” of a given time step, then those individuals might not have survived until “late()” of that time step. Since SLiM does not record the stage at which individuals were Remembered, you can specify this by setting remembered_stage: it should be the stage during which all calls to sim.treeSeqRememberIndividuals(), as well as to sim.treeSeqOutput(), were made.

Note also that in nonWF models, birth occurs between “first()” and “early()”, so the possible parents in a given time step are those that are alive in “early()” and have age greater than zero, or, equivalently, are alive in “late()” during the previous time step. In WF models, birth occurs after “early()”, so possible parents in a given time step are those that are alive during “early()” of that time step or are alive during “late()” of the previous time step.

Since individuals may be created outside the usual ‘birth’ stage by addSubpop(), and the stage at which they are created is not stored, results may be unreliable for first-generation individuals.
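
A short usage sketch, assuming `ts` is a tree sequence written by SLiM: it collects the node IDs of every individual known to be alive at a given time ago. The function name `nodes_of_individuals_alive_at` is hypothetical; the call to individuals_alive_at() uses only the arguments listed in the signature above.

```python
def nodes_of_individuals_alive_at(ts, time, remembered_stage="late"):
    """Hypothetical helper: node IDs of individuals alive at `time` ago in late()."""
    # Imported lazily so that this sketch can be defined without pyslim installed.
    import pyslim

    # remembered_stage should name the stage in which all Remembering
    # (and the final treeSeqOutput) took place in the SLiM script.
    alive = pyslim.individuals_alive_at(
        ts, time, stage="late", remembered_stage=remembered_stage
    )
    nodes = []
    for ind_id in alive:
        # Each individual carries the IDs of its nodes (genomes).
        nodes.extend(ts.individual(ind_id).nodes)
    return nodes
```

The resulting list could, for example, be passed as the samples argument of `ts.simplify(..., keep_input_roots=True)`.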

Parameters:

ts (tskit.TreeSequence) – A tree sequence.
time (float) – The number of ticks (i.e., time steps) ago.
stage (str) – The stage in the SLiM life cycle that we are inquiring about (either “first”, “early” or “late”; defaults to “late”).
remembered_stage (str) – The stage in the SLiM life cycle during which individuals were Remembered (defaults to the stage the tree sequence was recorded at, stored in metadata).
population (int) – If given, return only individuals in the population(s) with these population ID(s).
samples_only (bool) – Whether to return only individuals who have at least one node marked as samples.

pyslim.individual_ages_at(ts, time, stage='late', remembered_stage='late')#

Returns the ages of each individual at the corresponding time ago, which will be nan if the individual is either not born yet or dead. This is computed as the time ago the individual was born (found by the time associated with the individual’s nodes) minus the time argument; while “death” is inferred from the individual’s age, recorded in metadata. These values are the same as what would be shown in SLiM during the corresponding time step and stage.

Since age increments at the end of each time step, the age is the number of time-step ends the individual has lived through, so if they were born in time step time, then their age will be zero.

In a WF model, this method does not provide any more information than does individuals_alive_at(), but for consistency, non-nan ages will be 0 in “late” and 1 in “first” and “early”. See individuals_alive_at() for further discussion.
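
The age calculation described above can be sketched in plain Python for the simplest case: a nonWF model, queried and Remembered in “late”, so that no stage offsets apply. The helper `sketch_ages_at` and its list inputs are hypothetical stand-ins for the birth times and recorded ages stored in the tree sequence, and this is not pyslim's implementation.

```python
import math


def sketch_ages_at(birth_times_ago, last_known_ages, time):
    """Ages at `time` ago, or nan where the individual is not known to be alive.

    Simplified sketch: ignores the stage-dependent offsets discussed above.
    """
    ages = []
    for born, last_age in zip(birth_times_ago, last_known_ages):
        not_yet_born = born < time
        # Time ago at which the individual was last known to be alive.
        last_seen = born - last_age
        presumed_dead = last_seen > time
        if not_yet_born or presumed_dead:
            ages.append(math.nan)
        else:
            ages.append(born - time)
    return ages


# Three individuals born 5, 2 and 8 ticks ago, last recorded at ages 5, 2 and 3.
ages = sketch_ages_at([5, 2, 8], [5, 2, 3], time=3)
```

Here the first individual has age 2 at three ticks ago, the second was not yet born, and the third was last known alive five ticks ago, so the last two get nan.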

Parameters:

ts (tskit.TreeSequence) – A tree sequence.
time (float) – The reference time ago.
stage (str) – The stage in the SLiM life cycle used to determine who is alive (either “early” or “late”; defaults to “late”).
remembered_stage (str) – The stage in the SLiM life cycle during which individuals were Remembered.

pyslim.slim_time(ts, time, stage='late')#

Converts the given “tskit times” (i.e., in units of time before the end of the simulation) to SLiM times (those recorded by SLiM, usually in units of ticks since the start of the simulation). Although the latter are always integers, these will not be if the provided times are not integers.

When the tree sequence is written out, SLiM records the current tick in the metadata: ts.metadata['SLiM']['tick']. In most cases, the “SLiM time” referred to by a time ago in the tree sequence (i.e., the value that would be reported by community.tick within SLiM at the point in time thus referenced) can be obtained by subtracting that time ago from ts.metadata['SLiM']['tick']. However, in WF models, birth happens between the “early()” and “late()” stages, so if the tree sequence was written out using sim.treeSeqOutput() during “early()” in a WF model, the tree sequence’s times measure time before the last set of individuals are born, i.e., before SLiM time step ts.metadata['SLiM']['tick'] - 1. The same thing applies to the “first” stage for both WF and nonWF models.

In some situations (e.g., mutations added during early() in WF models) this may not return what you expect. See Converting from SLiM time to tskit time and back for more discussion.
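
The rule described above can be restated as a small function. This is a sketch of the arithmetic only, for times referring to the “late” stage; it is not pyslim's implementation, and `sketch_slim_time` with its explicit `tick`, `model_type` and `written_stage` arguments is hypothetical (in pyslim these values come from the tree sequence's metadata).

```python
def sketch_slim_time(time_ago, tick, model_type, written_stage):
    """Convert a tskit time ago to a SLiM tick, for the "late" stage only.

    tick          -- the tick recorded when the tree sequence was written
    model_type    -- "WF" or "nonWF"
    written_stage -- stage in which treeSeqOutput() was called
    """
    slim = tick - time_ago
    # If the file was written before that tick's births, times are measured
    # from the previous tick: "early" in WF models, "first" in either model.
    written_before_birth = written_stage == "first" or (
        written_stage == "early" and model_type == "WF"
    )
    if written_before_birth:
        slim -= 1
    return slim
```

For instance, with the tree sequence written at tick 100, a time ago of 10 maps to tick 90 if it was written in “late”, but to tick 89 if it was written in “early” of a WF model.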

Parameters:

ts (tskit.TreeSequence) – A SLiM-compatible TreeSequence.
time (numpy.ndarray) – An array of times to be converted.
stage (str) – The stage of the SLiM life cycle that the SLiM time should be computed for.

pyslim.population_size(ts, x_bins, y_bins, time_bins, stage='late', remembered_stage=None)#

Calculates the population size in each of the spatial bins defined by grid lines at x_bins and y_bins, averaged over each of the time intervals separated by time_bins. To obtain actual (census) sizes, the tree sequence must contain all individuals alive, e.g., from a SLiM simulation with all individuals permanently remembered.

With nx, ny and nt the number of bins in the x, y and time directions (so, nx is one less than the length of x_bins), returns a 3-d array with dimensions (nx, ny, nt). The (i,j,k)th element of the array is the average number of individuals with x coordinate in the half-open interval [x_bins[i], x_bins[i + 1]) and y coordinate in the half-open interval [y_bins[j], y_bins[j + 1]), averaged across all times in the half-open interval [time_bins[k], time_bins[k + 1]).

For integer endpoints of the time bins, this average is equivalent to recording the number of individuals that are alive at each time in the time interval and have location in the relevant location bin, then taking the mean of these recorded population sizes.
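
The equivalence just described can be made concrete with a brute-force sketch in plain Python. It assumes integer, strictly increasing time bin endpoints and a hypothetical input format (a list of `(x, y, alive_times)` tuples); it illustrates the definition and is not how pyslim computes the result.

```python
import bisect


def find_bin(value, edges):
    """Index i with edges[i] <= value < edges[i + 1], or None if outside."""
    i = bisect.bisect_right(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None


def sketch_population_size(individuals, x_bins, y_bins, time_bins):
    """Mean number of individuals per spatial bin over each time interval."""
    nx, ny, nt = len(x_bins) - 1, len(y_bins) - 1, len(time_bins) - 1
    counts = [[[0.0] * nt for _ in range(ny)] for _ in range(nx)]
    for x, y, alive_times in individuals:
        i, j = find_bin(x, x_bins), find_bin(y, y_bins)
        if i is None or j is None:
            continue  # located outside the grid
        for k in range(nt):
            # Integer times in the half-open interval [time_bins[k], time_bins[k + 1]).
            times = range(time_bins[k], time_bins[k + 1])
            n_alive = sum(1 for t in times if t in alive_times)
            counts[i][j][k] += n_alive / len(times)
    return counts


individuals = [
    (0.5, 0.5, {0, 1}),        # alive at times 0 and 1
    (0.5, 1.5, {1}),           # alive at time 1 only
    (1.5, 0.5, {0, 1, 2, 3}),  # alive throughout
]
sizes = sketch_population_size(individuals, [0, 1, 2], [0, 1, 2], [0, 2, 4])
```

In this example `sizes[0][1][0]` is 0.5, since the second individual is alive at one of the two times in the first interval.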

Parameters:

ts (tskit.TreeSequence) – The tree sequence to calculate population size from.
x_bins (numpy.ndarray) – The x-coordinates of the boundaries of the location bins.
y_bins (numpy.ndarray) – The y-coordinates of the boundaries of the location bins.
time_bins (numpy.ndarray) – The endpoints of the time bins.
stage (str) – The stage in the SLiM life cycle that the endpoints of the time bins refer to (either “early” or “late”; defaults to “late”).
remembered_stage (str) – The stage in the SLiM life cycle during which individuals were Remembered (defaults to the stage the tree sequence was recorded at, stored in metadata).

Metadata#

SLiM-specific metadata is made visible to the user by .metadata properties. For instance:

ts.node(4).metadata

{'slim_id': 30, 'is_vacant': [0]}

shows that the fifth node in the tree sequence was given ID 30 by SLiM (the slim_id field) and has an is_vacant field equal to [0].

Annotation#

These two functions will add default SLiM metadata to a tree sequence (or the underlying tables), which can then be modified and loaded into SLiM.

Constants and flags#

These flags are the possible values for node.metadata["genome_type"]:

These flags are the possible values for individual.metadata["sex"]:

This is a flag used in individual.metadata["flags"]:

Finally, these are used in individual.flags:
