Python API#

This page provides detailed documentation for the methods and classes available in pyslim. Here is a quick reference to some of the methods:

recapitate(ts[, ancestral_Ne, keep_vacant])

Returns a "recapitated" tree sequence, by using msprime to run a coalescent simulation from the "top" of this tree sequence, i.e., allowing any uncoalesced lineages to coalesce.

annotate(ts, **kwargs)

Takes a tree sequence (as produced by msprime, for instance), and adds in the information necessary for SLiM to use it as an initial state, filling in mostly default values.

individuals_alive_at(ts, time[, stage, ...])

Returns an array giving the IDs of all individuals that are known to be alive at the given time ago.

individual_ages(ts)

Returns the ages of all individuals in the tree sequence, extracted from metadata.

individual_ages_at(ts, time[, stage, ...])

Returns the ages of each individual at the corresponding time ago, which will be nan if the individual is either not born yet or dead.

remove_vacant(ts)

Remove sample flags from all vacant nodes.

restore_vacant(ts)

The inverse of remove_vacant().

has_vacant_samples(ts)

Returns whether the tree sequence has vacant sample nodes.

node_is_vacant(ts, node)

Returns True if the node is labelled as vacant in the node's metadata recorded by SLiM.

slim_time(ts, time[, stage])

Converts the given "tskit times" (i.e., in units of time before the end of the simulation) to SLiM times (those recorded by SLiM, usually in units of ticks since the start of the simulation).

convert_alleles(ts)

Returns a modified tree sequence in which alleles have been replaced by their corresponding nucleotides.

generate_nucleotides(ts[, ...])

Returns a modified tree sequence in which mutations have been randomly assigned nucleotides and (optionally) a reference sequence has been randomly generated.

population_size(ts, x_bins, y_bins, time_bins)

Calculates the population size in each of the spatial bins defined by grid lines at x_bins and y_bins, averaged over each of the time intervals separated by time_bins.

set_slim_state(ts[, time, individuals])

Changes the information stored in metadata so that when loaded into SLiM, the current time will be time units ago (i.e., at tskit time time) and the alive individuals will be individuals.

default_slim_metadata(name[, num_chromosomes])

Returns default metadata of type name, where name is one of "tree_sequence", "edge", "site", "mutation", "mutation_list_entry", "node", "individual", or "population".

update(ts)

Update a tree sequence produced by a previous version of SLiM to the current file version.

Editing or adding to tree sequences#

pyslim provides tools for transforming tree sequences:

pyslim.recapitate(ts, ancestral_Ne=None, *, keep_vacant=False, **kwargs)[source]#

Returns a “recapitated” tree sequence, by using msprime to run a coalescent simulation from the “top” of this tree sequence, i.e., allowing any uncoalesced lineages to coalesce.

To allow recapitation to be done correctly, the nodes of the first generation of the SLiM simulation from whom all samples inherit are still present in the tree sequence, but are not marked as samples. If you simplify the tree sequence before recapitating you must ensure these are not removed, which you do by passing the argument keep_input_roots=True to simplify.

If you specify an ancestral_Ne, then the recapitated portion of the tree sequence will be simulated in a single population with this (diploid) size. In other words, all lineages are moved to a single population of this size (named “ancestral” if this name is not already taken), and coalescence is allowed to happen.

You may control the ancestral demography by passing in a demography argument: see msprime.sim_ancestry().

In general, all defaults are whatever the defaults of msprime.sim_ancestry() are; this includes recombination rate, so that if neither recombination_rate or a recombination_map are provided, there will be no recombination.

Sample flags from vacant nodes will be removed before recapitating: see remove_vacant(). To restore these, use keep_vacant=True. You only need to do this if some individuals are not diploid and you will be loading the tree sequence back into SLiM.

Parameters:
  • ts (tskit.TreeSequence) – The tree sequence to transform.

  • ancestral_Ne (float) – If specified, then will simulate from a single ancestral population of this size. It is an error to specify this as well as demography.

  • keep_vacant (bool) – Whether to restore the sample flags on any vacant sample nodes. Default: False.

  • kwargs (dict) – Any other arguments to msprime.sim_ancestry().

pyslim.convert_alleles(ts)[source]#

Returns a modified tree sequence in which alleles have been replaced by their corresponding nucleotides. For sites, SLiM-produced tree sequences have “” (the empty string) for the ancestral state at each site; this method will replace this with the corresponding nucleotide from the reference sequence. For mutations, SLiM records the ‘derived state’ as a SLiM mutation ID; this method will this with the nucleotide from the mutation’s metadata.

This operation is not reversible: since SLiM mutation IDs are lost, the tree sequence will not be able to be read back into SLiM.

The main purpose of this method is for output: for instance, this code will produce a VCF file with nucleotide alleles:

nts = pyslim.convert_alleles(ts)
with open('nucs.vcf', 'w') as f:
    nts.write_vcf(f)

This method will produce an error if the tree sequence does not have a valid reference sequence or if any mutations do not have nucleotides: to first generate these, see generate_nucleotides().

Parameters:

ts (tskit.TreeSequence) – The tree sequence to transform.

pyslim.generate_nucleotides(ts, reference_sequence=None, keep=True, seed=None)[source]#

Returns a modified tree sequence in which mutations have been randomly assigned nucleotides and (optionally) a reference sequence has been randomly generated.

If reference_sequence is a string of nucleotides (A, C, G, and T) of length equal to the sequence length, this is used for the reference sequence. If no reference sequence is given, the reference_sequence property of ts is used if present; if not then a sequence of independent and uniformly random nucleotides is generated.

SLiM stores the nucleotide as an integer in the mutation metadata, with -1 meaning “not a nucleotide mutation”. This method assigns nucleotides by stepping through each mutation and picking a random nucleotide uniformly out of the three possible nucleotides that differ from the parental state (i.e., the derived state of the parental mutation, or the ancestral state if the mutation has no parent). If keep=True (the default), the mutations that already have a nucleotide (i.e., an integer 0-3 in metadata) will not be modified.

Technical note: in the case of stacked mutations, the SLiM mutation that determines the nucleotide state of the (tskit) mutation is the one with the largest slim_time attribute. This method tries to assign nucleotides so that each mutation differs from the previous state, but this is not always possible in certain unlikely cases.

Parameters:
  • ts (tskit.TreeSequence) – The tree sequence to transform.

  • reference_sequence (bool) – A reference sequence, or None to use an existing reference, or to randomly generate one.

  • keep (bool) – Whether to leave existing nucleotides in mutations that already have one.

  • seed (int) – The random seed for generating new alleles.

pyslim.update(ts)[source]#

Update a tree sequence produced by a previous version of SLiM to the current file version.

Return TreeSequence:

The updated tree sequence.

pyslim.remove_vacant(ts)[source]#

Remove sample flags from all vacant nodes.

In SLiM’s internal state, there are two nodes per individual, even on chromosomes for which the individual is not diploid. Thus, some sample nodes may be placeholders, not actually representing physical haplosomes, and are called “vacant” nodes. In the tree sequence these nodes have no ancestry, and thus have ‘missing’ data; however, their presence can cause problems with methods not designed for missing data.

This method returns a copy of the tree sequence for which all vacant nodes have the sample flag removed; these nodes will thus not affect recapitate(), tree sequence statistics, etc. This also sets the NODE_IS_VACANT_SAMPLE flag on these nodes, so they can be restored with restore_vacant(). You probably don’t want to run this method again on the output, since these flags will be overwritten and restore_vacant() will no longer work as expected.

Parameters:

ts (tskit.TreeSequence) – The tree sequence.

pyslim.restore_vacant(ts)[source]#

The inverse of remove_vacant().

This method returns a copy of the tree sequence for which all nodes with the NODE_IS_VACANT_SAMPLE flag set have their sample flags set also. If these nodes are not vacant, an error will be raised. This is intended to be an inverse of remove_vacant().

Parameters:

ts (tskit.TreeSequence) – The tree sequence.

pyslim.set_slim_state(ts, time=0, individuals=None)[source]#

Changes the information stored in metadata so that when loaded into SLiM, the current time will be time units ago (i.e., at tskit time time) and the alive individuals will be individuals. The time in SLiM (i.e., the value of the tick counter) will also be time units earlier.

Appropriate individuals might be found, for instance, with pyslim.individuals_alive_at(ts, time).

To do this, the “tick” in top-level metadata is changed; individual flags are reset so that only individuals have the INDIVIDUAL_ALIVE flag set; and tskit times have time subtracted from them (so they measure time ago relative to the new tick).

As a result, some individuals in the tree sequence may have negative times, i.e., have lived “in the future”. Since these individuals will not be “alive”, they will be ignored by SLiM, even when their birth times arrive, so that any future history will also be present, unchanged, in the tree sequence that results from additional simulation. If additional simulation is performed then contradictions may arise: for instance, if one of the individuals had offspring for an additional ten ticks past time in the original simulation, but in the new simulation they die immediately, then the resulting tree sequence will still be valid but the history as recorded by SLiM in metadata will not make sense. To avoid such issues, one can Remember and then kill the individuals to be later used in this way.

This also subtracts time from the value of “cycle” in top-level metadata; if this is not desired (or the value in “tick” needs to be adjusted), change the metadata directly.

Parameters:
  • ts (tskit.TreeSequence) – A SLiM-compatible TreeSequence.

  • time (int) – The number of time units (ticks) into the past to shift times. (Default: zero; can be positive or negative.)

  • individuals (np.ndarray) – An array of the tskit IDs of the individuals that should be marked as alive (all others will be not alive). (Default: leave unchanged.)

Summarizing tree sequences#

Additionally, pyslim contains the following methods:

pyslim.individuals_alive_at(ts, time, stage='late', remembered_stage=None, population=None, samples_only=False)[source]#

Returns an array giving the IDs of all individuals that are known to be alive at the given time ago. This is determined using their birth time ago (given by their time attribute) and, for nonWF models, their age attribute (which is equal to their age at the last time they were Remembered). See also individual_ages_at().

In WF models, birth occurs after “early()”, so that individuals are only alive during “late()” for the time step when they have age zero, while in nonWF models, birth occurs before “early()”, so they are alive for both stages.

In both WF and nonWF models, mortality occurs between “early()” and “late()”, so that individuals are last alive during the “early()” stage of the time step of their final age, and if individuals are alive during “late()” they will also be alive during “first()” and “early()” of the next time step. This means it is important to know during which stage individuals were Remembered - for instance, if the call to sim.treeSeqRememberIndividuals() was made during “early()” of a given time step, then those individuals might not have survived until “late()” of that time step. Since SLiM does not record the stage at which individuals were Remembered, you can specify this by setting remembered_stages: it should be the stage during which all calls to sim.treeSeqRememberIndividuals(), as well as to sim.treeSeqOutput(), were made.

Note also that in nonWF models, birth occurs between “first()” and “early()”, so the possible parents in a given time step are those that are alive in “early()” and have age greater than zero, or, equivalently, are alive in “late()” during the previous time step. In WF models, birth occurs after “early()”, so possible parents in a given time step are those that are alive during “early()” of that time step or are alive during “late()” of the previous time step.

Since individuals may be created not during the usual ‘birth’ stage by addSubPop( ), and the stage at which they are created is not stored, results may be unreliable for the first-generation individuals.

Parameters:
  • ts (tskit.TreeSequence) – A tree sequence.

  • time (float) – The number of ticks (i.e., time steps) ago.

  • stage (str) – The stage in the SLiM life cycle that we are inquiring about (either “first”, “early” or “late”; defaults to “late”).

  • remembered_stage (str) – The stage in the SLiM life cycle during which individuals were Remembered (defaults to the stage the tree sequence was recorded at, stored in metadata).

  • population (int) – If given, return only individuals in the population(s) with these population ID(s).

  • samples_only (bool) – Whether to return only individuals who have at least one node marked as samples.

pyslim.individual_ages(ts)[source]#

Returns the ages of all individuals in the tree sequence, extracted from metadata. The result is a array of length equal to the number of individuals, with k-th entry equal to ts.individual(k).metadata["age"].

Returns:

An array of ages of individuals.

pyslim.individual_ages_at(ts, time, stage='late', remembered_stage='late')[source]#

Returns the ages of each individual at the corresponding time ago, which will be nan if the individual is either not born yet or dead. This is computed as the time ago the individual was born (found by the time associated with the the individual’s nodes) minus the time argument; while “death” is inferred from the individual’s age, recorded in metadata. These values are the same as what would be shown in SLiM during the corresponding time step and stage.

Since age increments at the end of each time step, the age is the number of time steps ends the individual has lived through, so if they were born in time step time, then their age will be zero.

In a WF model, this method does not provide any more information than does individuals_alive_at(), but for consistency, non-nan ages will be 0 in “late” and 1 in “first” and “early”. See individuals_alive_at() for further discussion.

Parameters:
  • ts (tskit.TreeSequence) – A tree sequence.

  • time (float) – The reference time ago.

  • stage (str) – The stage in the SLiM life cycle used to determine who is alive (either “early” or “late”; defaults to “late”).

  • remembered_stage (str) – The stage in the SLiM life cycle during which individuals were Remembered.

pyslim.individual_parents(ts)[source]#

DEPRECATED: now SLiM records parents directly in the individual table (see for instance ind.parents).

Finds all parent-child relationships in the tree sequence (as far as we can tell). The output will be a two-column array with row [i,j] indicating that individual i is a parent of individual j. See has_individual_parents() for exactly which parents are returned.

See individuals_alive_at() for further discussion about how this is determined based on when the individuals were Remembered.

Parameters:

ts (tskit.TreeSequence) – A tskit.TreeSequence.

Returns:

An array of individual IDs, with row [i, j] if individual i is a parent of individual j.

pyslim.has_individual_parents(ts)[source]#

DEPRECATED: now SLiM records parents directly in the individual table (see for instance ind.parents).

Finds which individuals have both their parent individuals also present in the tree sequence, as far as we can tell. To do this, we return a boolean array with True for those individuals for which:

  • all edges terminating in that individual’s nodes are in individuals,

  • each of the individual’s nodes inherit from a single individual only,

  • those parental individuals were alive when the individual was born,

  • the parental individuals account for two whole genomes.

This returns a boolean array indicating for each individual whether all these are true. Note in particular that individuals with only one recorded parent are not counted as “having parents”.

See individuals_alive_at() for further discussion about how this is determined based on when the individuals were Remembered.

Parameters:

ts (tskit.TreeSequence) – A tskit.TreeSequence.

Returns:

A boolean array of length equal to targets.

pyslim.population_size(ts, x_bins, y_bins, time_bins, stage='late', remembered_stage=None)[source]#

Calculates the population size in each of the spatial bins defined by grid lines at x_bins and y_bins, averaged over each of the time intervals separated by time_bins. To obtain actual (census) sizes, the tree sequence must contain all individuals alive, e.g., from a SLiM simulation with all individuals permanently remembered.

With nx, ny and nt the number of bins in the x, y and time directions (so, nx is one less than the length of x_bins), returns a 3-d array with dimensions (nx, ny, nt). The (i,j,k)``th element of the array is the average number of individuals with ``x coordinate in the half-open interval [x_bins[i], x_bins[i + 1]) and y coordinate in the half-open interval [y_bins[i], y_bins[i + 1]), averaged across all times in the half-open interval [time_bins[k], time_bins[k + 1]).

For integer endpoints of the time bins, this average is equivalent to recording the number of indivduals that are alive at each time in the time interval and have location in the relevant location bin, then taking the mean of these recorded population sizes.

Parameters:
  • ts (tskit.TreeSequence) – The tree sequence to calculate population size from.

  • x_bins (numpy.ndarray) – The x-coordinates of the boundaries of the location bins.

  • y_bins (numpy.ndarray) – The y-coordinates of the boundaries of the location bins.

  • time_bins (numpy.ndarray) – The endpoints of the time bins.

  • stage (str) – The stage in the SLiM life cycle that the endpoints of the time bins refer to (either “early” or “late”; defaults to “late”).

  • remembered_stage (str) – The stage in the SLiM life cycle during which individuals were Remembered (defaults to the stage the tree sequence was recorded at, stored in metadata).

Utilities#

pyslim.slim_time(ts, time, stage='late')[source]#

Converts the given “tskit times” (i.e., in units of time before the end of the simulation) to SLiM times (those recorded by SLiM, usually in units of ticks since the start of the simulation). Although the latter are always integers, these will not be if the provided times are not integers.

When the tree sequence is written out, SLiM records the current current tick in the metadata: ts.metadata['SLiM']['tick']. In most cases, the “SLiM time” referred to by a time ago in the tree sequence (i.e., the value that would be reported by community.tick within SLiM at the point in time thus referenced) can be obtained by subtracting that time ago from ts.metadata['SLiM']['tick']. However, in WF models, birth happens between the “early()” and “late()” stages, so if the tree sequence was written out using sim.treeSeqOutput() during “early()” in a WF model, the tree sequence’s times measure time before the last set of individuals are born, i.e., before SLiM time step ts.metadata['SLiM']['tick'] - 1. The same thing applies to the “first” stage for both WF and nonWF models.

In some situations (e.g., mutations added during early() in WF models) this may not return what you expect. See Converting from SLiM time to tskit time and back for more discussion.

Parameters:
  • ts (tskit.TreeSequence) – A SLiM-compatible TreeSequence.

  • time (numpy.ndarray) – An array of times to be converted.

  • stage (str) – The stage of the SLiM life cycle that the SLiM time should be computed for.

pyslim.next_slim_mutation_id(ts)[source]#

Returns the next SLiM mutation ID for this tree sequence. This is useful because if you want to add more mutations to your SLiM tree sequence using msprime.sim_mutations(), you may need to specify the parameter next_id in your msprime.SLiMMutationModel to be larger than any existing mutation IDs. Setting next_id equal to the output of this function will allow the mutated tree sequence to be read in by SLiM. To do this, recall that the “derived state” of SLiM’s mutations are comma-separated strings of mutation IDs; this function just parses all derived states and returns one larger than the largest integer found. It will return an error if it encounters derived states that are not comma-separated strings of integers.

pyslim.has_vacant_samples(ts)[source]#

Returns whether the tree sequence has vacant sample nodes. See remove_vacant().

Parameters:

ts (tskit.TreeSequence) – The tree sequence.

pyslim.node_is_vacant(ts, node)[source]#

Returns True if the node is labelled as vacant in the node’s metadata recorded by SLiM. A vacant node represents a blank placeholder in SLiM: either a “null haplosome” (used as placeholders for sex chromosomes and other chromosome types not of consistent ploidy in all individuals) or simply an unused node for haploid chromosome types. See remove_vacant().

Parameters:

Metadata#

SLiM-specific metadata is made visible to the user by .metadata properties. For instance:

ts.individual(4).metadata
{'pedigree_id': 495999,
 'pedigree_p1': 493739,
 'pedigree_p2': 494784,
 'age': 10,
 'subpopulation': 1,
 'sex': 0,
 'flags': 0}

shows that the fifth individual in the tree sequence was given pedigree ID 495999 by SLiM, had parents with pedigree IDs 493739 and 494784, was age 10 at the time that they died (or the simulation ended), lived in subpopulation 1, was female (because sex matches pyslim.INDIVIDUAL_TYPE_FEMALE, below), and has no additional metadata flags.

Annotation#

These two functions will add default SLiM metadata to a tree sequence (or the underlying tables), which can then be modified and loaded into SLiM.

pyslim.annotate(ts, **kwargs)[source]#

Takes a tree sequence (as produced by msprime, for instance), and adds in the information necessary for SLiM to use it as an initial state, filling in mostly default values. Returns a tskit.TreeSequence.

Parameters:
  • ts (tskit.TreeSequence) – A tskit.TreeSequence.

  • model_type (str) – SLiM model type: either “WF” or “nonWF”.

  • tick (int) – What tick number in SLiM correponds to time=0 in the tree sequence.

  • cycle (int) – What cycle number in SLiM correponds to time=0 in the tree sequence (default: equal to tick).

  • stage (int) – What stage in SLiM’s cycle has the tree sequence been written out in (defaults to “early”; must be “early” or “late”).

  • reference_sequence (str) – A reference sequence of length equal to ts.sequence_length.

  • annotate_mutations (bool) – Whether to replace mutation metadata with defaults. (If False, the mutation table is unchanged.)

pyslim.annotate_tables(tables, model_type, tick, cycle=None, stage='early', reference_sequence=None, annotate_mutations=True)[source]#

Does the work of annotate(), but modifies the tables in place: so, takes tables as produced by msprime, and makes them look like the tables as output by SLiM. See annotate() for details.

Constants and flags#

pyslim.NUCLEOTIDES = ['A', 'C', 'G', 'T']#

Mutation metadata records the nucleotide as an integer, translated to ACGT by indexing this array, so a nucleotide value of k actually means NUCLEOTIDES[k].

This is a flag used in node.flags (see remove_vacant()):

pyslim.NODE_IS_VACANT_SAMPLE = np.uint32(65536)#

This flag exists because SLiM expects certain vacant nodes (=haplosomes) to be marked as samples (those vacant nodes corresponding to alive individuals), but including these as samples causes problems for certain operations in tskit. So, if {meth}`.remove_vacant` is used to remove the ‘sample’ flags from those vacant nodes, this flag is applied so that the sample flag can be easily put back (by {meth}`.restore_vacant`). The flag does not mean simply that this is a vacant sample node (indeed, if this flag is set then the tskit.NODE_IS_SAMPLE flag is not expected to be set).

These are the possible values for individual.metadata["sex"]:

pyslim.INDIVIDUAL_TYPE_HERMAPHRODITE = -1#

A value used in individual metadata (“sex”) to indicate the individual is a hermaphrodite.

pyslim.INDIVIDUAL_TYPE_FEMALE = 0#

A value used in individual metadata (“sex”) to indicate the individual is a male.

pyslim.INDIVIDUAL_TYPE_MALE = 1#

A value used in individual metadata (“sex”) to indicate the individual is a female.

This is a flag used in individual.metadata["flags"] (note: this is not used in individual.flags, which is different!):

pyslim.INDIVIDUAL_FLAG_MIGRATED = np.uint32(2)#

An individual flag indicating the individual is a migrant.

Finally, these are used in individual.flags:

pyslim.INDIVIDUAL_ALIVE = np.uint32(65536)#

Used in individual.flags to denote the individual is alive when the tree sequence was written out.

pyslim.INDIVIDUAL_REMEMBERED = np.uint32(131072)#

Used in individual.flags to denote the individual was marked as “remembered”.

pyslim.INDIVIDUAL_RETAINED = np.uint32(262144)#

Used in individual.flags to denote the individual was marked as “retained”.