Python API

This page provides detailed documentation for the methods and classes available in pyslim.

Methods

pyslim provides tools for transforming tree sequences:

recapitate(ts[, ancestral_Ne])

Returns a “recapitated” tree sequence, by using msprime to run a coalescent simulation from the “top” of this tree sequence, i.e., allowing any uncoalesced lineages to coalesce.

convert_alleles(ts)

Returns a modified tree sequence in which alleles have been replaced by their corresponding nucleotides.

generate_nucleotides(ts[, …])

Returns a modified tree sequence in which mutations have been randomly assigned nucleotides and (optionally) a reference sequence has been randomly generated.

Additions to the tree sequence

The tskit.TreeSequence class represents a sequence of trees that describes the genealogical history of a population. Here, we describe pyslim’s additions to this class.

The SLiM Tree Sequence

class pyslim.SlimTreeSequence[source]

This is just like a tskit.TreeSequence, with a few more properties and methods, including:

  • slim_generation - the SLiM “generation” at the end of the simulation

  • reference_sequence - if using a nucleotide model, the reference sequence

  • individual_locations - numpy array of individual locations

  • individual_ages - numpy array of individiual ages

  • individual_times - numpy array of how long ago each individual was born

  • individual_populations - numpy array of individual’s populations

All mutable properties of individuals (e.g., age) are as they were last recorded in the tree sequence: either the last time they were Remembered, or at the end of the simulation, if they are still alive then.

You can create a SlimTreeSequence using one of

Variables
  • reference_sequence (string) – None, or an string of length equal to the sequence length that gives the entire reference sequence for nucleotide models.

  • legacy_metadata – Whether this tree sequence returns metadata in objects (as in older versions of pyslim) rather than dicts: see the documentation. This option is deprecated and will disappear at some point.

dump(path, **kwargs)[source]

Dumps the tree sequence to the path specified. This is mostly just a wrapper for tskit.TreeSequence.dump(), but also writes out the reference sequence.

Parameters
  • path (str) – The file path to write the TreeSequence to.

  • kwargs – Additional keyword args to pass to tskit.TreeSequence.dump

first_generation_individuals()[source]

Returns the IDs of the individuals remembered as part of the first SLiM generation, as determined by their flags.

Warning

This method is deprecated, because from SLiM version 3.5 the first generation individuals are no longer marked as such: only tree sequences from older versions of SLiM will have these individuals.

has_individual_parents()[source]

Finds which individuals have both their parent individuals also present in the tree sequence, as far as we can tell. To do this, we return a boolean array with True for those individuals for which:

  • all edges terminating in that individual’s nodes are in individuals,

  • each of the individual’s nodes inherit from a single individual only,

  • those parental individuals were alive when the individual was born,

  • the parental individuals account for two whole genomes.

This returns a boolean array indicating for each individual whether all these are true. Note in particular that individuals with only one recorded parent are not counted as “having parents”.

See individuals_alive_at() for further discussion about how this is determined based on when the individuals were Remembered.

Returns

A boolean array of length equal to targets.

individual(id_)[source]

Returns the individual whose ID is given by id_, as documented in tskit.TreeSequence.individual(), but with additional attributes: time, pedigree_id, age, slim_population, sex, and slim_flags. The time and population properties are extracted from the nodes, and an error will be thrown if the individual’s nodes derive from more than one population or more than one time.

Parameters

id (int) – The ID of the individual (i.e., its index).

individual_ages_at(time, stage='late', remembered_stage='late')[source]

Returns the ages of each individual at the corresponding time ago, which will be nan if the individual is either not born yet or dead. This is computed as the time ago the individual was born (found by the time associated with the the individual’s nodes) minus the time argument; while “death” is inferred from the individual’s age, recorded in metadata. These values are the same as what would be shown in SLiM during the corresponding time step and stage.

Since age increments at the end of each time step, the age is the number of time steps ends the individual has lived through, so if they were born in time step time, then their age will be zero.

In a WF model, this method does not provide any more information than does individuals_alive_at(), but for consistency, non-nan ages will be 0 in “late” and 1 in “early”. See individuals_alive_at() for further discussion.

Parameters
  • time (float) – The reference time ago.

  • stage (str) – The stage in the SLiM life cycle used to determine who is alive (either “early” or “late”; defaults to “late”).

  • remembered_stage (str) – The stage in the SLiM life cycle during which individuals were Remembered.

individual_parents()[source]

Finds all parent-child relationships in the tree sequence (as far as we can tell). The output will be a two-column array with row [i,j] indicating that individual i is a parent of individual j. See has_individual_parents() for exactly which parents are returned.

See individuals_alive_at() for further discussion about how this is determined based on when the individuals were Remembered.

Parameters

individuals (array) – The IDs of individuals that parents should be found for (defaults to all individuals).

Returns

An array of individual IDs, with row [i, j] if individual i is a parent of individual j.

individuals_alive_at(time, stage='late', remembered_stage=None, population=None, samples_only=False)[source]

Returns an array giving the IDs of all individuals that are known to be alive at the given time ago. This is determined using their birth time ago (given by their time attribute) and, for nonWF models, their age attribute (which is equal to their age at the last time they were Remembered). See also individual_ages_at().

In WF models, birth occurs after “early()”, so that individuals are only alive during “late()” for the time step when they have age zero, while in nonWF models, birth occurs before “early()”, so they are alive for both stages.

In both WF and nonWF models, mortality occurs between “early()” and “late()”, so that individuals are last alive during the “early()” stage of the time step of their final age, and if individuals are alive during “late()” they will also be alive during “early()” of the next time step. This means it is important to know during which stage individuals were Remembered - for instance, if the call to sim.treeSeqRememberIndividuals() was made during “early()” of a given time step, then those individuals might not have survived until “late()” of that time step. Since SLiM does not record the stage at which individuals were Remembered, you can specify this by setting remembered_stages: it should be the stage during which all calls to sim.treeSeqRememberIndividuals(), as well as to sim.treeSeqOutput(), were made.

Note also that in nonWF models, birth occurs before “early()”, so the possible parents in a given time step are those that are alive in “early()” and have age greater than zero, or, equivalently, are alive in “late()” during the previous time step. In WF models, birth occurs after “early()”, so possible parents in a given time step are those that are alive during “early()” of that time step or are alive during “late()” of the previous time step.

Parameters
  • time (float) – The number of time steps ago.

  • stage (str) – The stage in the SLiM life cycle that we are inquiring about (either “early” or “late”; defaults to “late”).

  • remembered_stage (str) – The stage in the SLiM life cycle during which individuals were Remembered (defaults to the stage the tree sequence was recorded at, stored in metadata).

  • population (int) – If given, return only individuals in the population(s) with these population ID(s).

  • samples_only (bool) – Whether to return only individuals who have at least one node marked as samples.

classmethod load(path, legacy_metadata=False)[source]

Load a SlimTreeSequence from a .trees file on disk.

Parameters

path (string) – The path to a .trees file.

Rtype SlimTreeSequence

classmethod load_tables(tables, **kwargs)[source]

Creates the SlimTreeSequence defined by the tables.

Parameters
  • tables (TableCollection) – A set of tables, as produced by SLiM or by annotate_defaults().

  • reference_sequence (TableCollection) – An optional string of ACGT giving the reference sequence.

Rtype SlimTreeSequence

mutation(id_)[source]

Returns the mutation whose ID is given by id_, as documented in tskit.TreeSequence.mutation(), but with additional attributes: mutation_type, selection_coeff, population, slim_time, and nucleotide. These are all recorded by SLiM in the metadata.

Parameters

id (int) – The ID of the mutation (i.e., its index).

mutation_at(node, position, time=None)[source]

Finds the mutation present in the genome of node at position, returning -1 if there is no such mutation recorded in the tree sequence. Warning: if node is not actually in the tree sequence (e.g., not ancestral to any samples) at position, then this function will return -1, possibly erroneously. If time is provided, returns the last mutation at position inherited by node that occurred at or before time ago.

Parameters
  • node (int) – The index of a node in the tree sequence.

  • position (float) – A position along the genome.

  • time (int) – The time ago that we want the nucleotide, or None, in which case the time of node is used.

Returns

Index of the mutation in question, or -1 if none.

node(id_)[source]

Returns the node whose ID is given by id_, as documented in tskit.TreeSequence.node(), but with additional attributes: slim_id, is_null, and genome_type. These are all recorded by SLiM in the metadata.

Parameters

id (int) – The ID of the node (i.e., its index).

nucleotide_at(node, position, time=None)[source]

Finds the nucleotide present in the genome of node at position. Warning: if node is not actually in the tree sequence (e.g., not ancestral to any samples) at position, then this function will return the reference sequence nucleotide, possibly erroneously. If time is provided, returns the last nucletide produced by a mutation at position inherited by node that occurred at or before time ago.

Parameters
  • node (int) – The index of a node in the tree sequence.

  • position (float) – A position along the genome.

  • time (int) – The time ago that we want the nucleotide, or None, in which case the time of node is used.

Returns

Index of the nucleotide in NUCLEOTIDES (0=A, 1=C, 2=G, 3=T).

population(id_)[source]

Returns the population whose ID is given by id_, as documented in tskit.TreeSequence.population(), but with additional attributes: slim_id, selfing_fraction, female_cloning_fraction, male_cloning_fraction, sex_ratio, bounds_x0, bounds_x1, bounds_y0, bounds_y1, bounds_z0, bounds_z1, and migration_records. These are all recorded by SLiM in the metadata.

Note that SLiM populations are usually indexed starting from 1, but in tskit from zero, so there may be populations (e.g., with id_=0) that have no metadata and were not created by SLiM.

Parameters

id (int) – The ID of the population (i.e., its index).

recapitate(recombination_rate=None, population_configurations=None, recombination_map=None, **kwargs)[source]

Returns a “recapitated” tree sequence, by using msprime to run a coalescent simulation from the “top” of this tree sequence, i.e., allowing any uncoalesced lineages to coalesce.

To allow recapitation to be done correctly, the nodes of the first generation of the SLiM simulation from whom all samples inherit are still present in the tree sequence, but are not marked as samples. If you simplify the tree sequence before recapitating you must ensure these are not removed, which you do by passing the argument keep_input_roots=True to simplify().

TODO: UPDATE THIS:

Note that Ne is not set automatically, so defaults to 1.0; you probably want to set it explicitly. Similarly, migration is not set up automatically, so that if there are uncoalesced lineages in more than one population, you will need to pass in a migration matrix to allow coalescence. In both cases, remember that population IDs in tskit begin with 0, so that if your SLiM simulation has populations p1 and p2, then the tree sequence will have three populations (but with no nodes assigned to population 0), so that a migration rate of 1.0 between p1 and p2 needs a migration matrix of:

[[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

In general, all defaults are whatever the defaults of {ref}`msprime.simulate` are; this includes recombination rate, so that if neither recombination_rate or a recombination_map are provided, there will be no recombination.

Parameters
  • recombination_rate (float) – A (constant) recombination rate, in units of crossovers per nucleotide per unit of time.

  • population_configurations (list) – See msprime.simulate() for this argument; if not provided, each population will have zero growth rate and the same effective population size.

  • recombination_map (:class`msprime.RecombinationMap`) – The recombination map, or None, if recombination_rate is specified.

  • kwargs (dict) – Any other arguments to msprime.simulate().

simplify(*args, **kwargs)[source]

This is a wrapper for tskit.TreeSequence.simplify(). The only difference is that this method returns the derived class SlimTreeSequence.

If you have not yet recapitated your SlimTreeSequence, you probably want to pass keep_input_roots=True, so that recapitation is possible in the future.

Rtype SlimTreeSequence

property slim_provenance

Returns model type, slim generation, and remembered node count from the last entry in the provenance table that is tagged with “program”=”SLiM”.

NOTE: you probably want to use the .metadata property instead.

Rtype ProvenanceMetadata

property slim_provenances

Returns model type, slim generation, and remembered node count from all entries in the provenance table that are tagged with “program”=”SLiM”.

Rtype ProvenanceMetadata

slim_time(time, stage='late')[source]

Converts the given “tskit times” (i.e., in units of time before the end of the simulation) to SLiM times (those recorded by SLiM, usually in units of generations since the start of the simulation). Although the latter are always integers, these will not be if the provided times are not integers.

When the tree sequence is written out, SLiM records the value of its current generation, which can be found in the metadata: ts.metadata['SLiM']['generation']. In most cases, the “SLiM time” referred to by a time ago in the tree sequence (i.e., the value that would be reported by sim.generation within SLiM at the point in time thus referenced) can be obtained by subtracting that time ago from ts.metadata['SLiM']['generation']. However, in WF models, birth happens between the “early()” and “late()” stages, so if the tree sequence was written out using sim.treeSeqOutput() during “early()” in a WF model, the tree sequence’s times measure time before the last set of individuals are born, i.e., before SLiM time step ts.slim_generation - 1.

In some situations (e.g., mutations added during early() in WF models) this may not return what you expect. See Converting from SLiM time to tskit time for more discussion.

Parameters
  • time (array) – An array of times to be converted.

  • stage (string) – The stage of the SLiM life cycle that the SLiM time should be computed for.

Metadata

SLiM-specific metadata is made visible to the user by .metadata properties. For instance:

ts.node(4).metadata
{'slim_id': 982740, 'is_null': False, 'genome_type': 0}

shows that the fifth node in the tree sequence was given pedigree ID 982740 by SLiM, is not a null genome, and has genome_type zero, which corresponds to an autosome (see below).

Annotation

These two functions will add default SLiM metadata to a tree sequence (or the underlying tables), which can then be modified and loaded into SLiM.

pyslim.annotate_defaults(ts, model_type, slim_generation, reference_sequence=None, annotate_mutations=True)[source]

Takes a tree sequence (as produced by msprime, for instance), and adds in the information necessary for SLiM to use it as an initial state, filling in mostly default values. Returns a SlimTreeSequence.

Parameters
  • ts (TreeSequence) – A TreeSequence.

  • model_type (string) – SLiM model type: either “WF” or “nonWF”.

  • slim_generation (int) – What generation number in SLiM correponds to time=0 in the tree sequence.

  • annotate_mutations (bool) – Whether to replace mutation metadata with defaults. (If False, the mutation table is unchanged.)

pyslim.annotate_defaults_tables(tables, model_type, slim_generation, annotate_mutations=True)[source]

Does the work of annotate_defaults(), but modifies the tables in place: so, takes tables as produced by msprime, and makes them look like the tables as output by SLiM. See annotate_defaults() for details.

Provenances

class pyslim.ProvenanceMetadata(model_type, slim_generation, file_version)[source]

DEPRECATED: use the tree sequence metadata (under the ‘SLiM’ key) instead.

pyslim.get_provenance(ts, only_last=True)[source]

Extracts model type and slim generation from either the last entry in the provenance table that is tagged with “program”=”SLiM” (if only_last=True) or a list of all of them (otherwise).

Parameters
  • ts (SlimTreeSequence) – The tree sequence or table collection.

  • only_last (bool) – Whether to return only the last SLiM provenance entry, (otherwise, returns a list of all SLiM entries).

Rtype ProvenanceMetadata

Constants and flags

NUCLEOTIDES == ['A', 'C', 'G', 'T']

Nucleotide states in nucleotide models are encoded as integers (0, 1, 2, 3), so a nucleotide encoded as k refers to nucleotide pyslim.NUCLEOTIDES[k].

These flags are the possible values for node.metadata["genome_type"]:

GENOME_TYPE_AUTOSOME == 0
GENOME_TYPE_X == 1
GENOME_TYPE_Y == 2

These flags are the possible values for individual.metadata["sex"]:

INDIVIDUAL_TYPE_HERMAPHRODITE == -1
INDIVIDUAL_TYPE_FEMALE == 0
INDIVIDUAL_TYPE_MALE == 1

This is a flag used in individual.metadata["flags"]:

INDIVIDUAL_FLAG_MIGRATED == 0x01

Finally, these are used in individual.flags:

INDIVIDUAL_ALIVE == 2**16
INDIVIDUAL_REMEMBERED == 2**17
INDIVIDUAL_RETAINED == 2**18