Time units

Time units#

There are a number of subtle ways that time units can trip up the user; below are explanations of how to deal with most of these topics. At issue is the fact that tree sequences produced by SLiM have times in units of “ticks”, rather than generations. If we query such a tree sequence for its time units we see its return value

ts.time_units

'ticks'

In general this mean it can be conceptually tricky to make sure that time units are consistent across different stages of simulation.

That warning from recapitate about time units#

If you recapitate a tree sequence then you’ve probably seen the warning

TimeUnitsMismatchWarning: The initial_state has time_units=ticks but time is measured
in generations in msprime. This may lead to significant discrepancies between the timescales. 

This is a handy reminder to think about the issue, because it is important. However, if you’re running a single-species WF simulation, then the time units are in generations (unless you’re doing something very creative). If so, you can avoid this warning by setting the time units to “generations”, using the timeUnit parameter to initializeTreeSeq( ). However, this only affects the label in metadata (i.e., the output of ts.time_units, and does not actually change how times are recorded in the tree sequence, so please only do this if you are running a WF simulation (and, keep reading).

Mutation rates with msprime#

Only you really know how much time in the real world corresponds to one spin around SLiM’s life cycle. So, SLiM does not specify what its time units actually are, and refers to them as “ticks”. Previous versions of SLiM called these “generations”, but this is only correct for simulations in which all organisms die every tick, e.g., a single-species WF model. Converting these times into real-world times is usually not too hard: maybe in your simulation one tick = one year, or maybe you need to multiply by a (mean) generation time known from empirical data.

However, it becomes important to know what the time units are when interfacing with msprime. For instance, suppose you’ve done a WF simulation for an organism whose mean generation time in the real world is 30 years, and you want to overlay neutral mutations using msprime. The mutation rate passed to msprime is in units of mutations per unit-of-time-in-the-tree-sequence, so since you did a WF simulation, the units of time in the tree sequence are in generations, and you should give it mutation rate in mutations per generation. As long as your mutation rate is in units of mutations per bp per generation, as is typical, no adjustment is needed, but if it were in different units, such as per year, it would need to be changed. (The rate in units of mutations per year would be 30 times smaller, and so would give you 30 times fewer neutral mutations than you want.)

Coalescence, recombination, and migration rates in recapitation#

Note: this section is only relevant for nonWF SLiM simulations: in a single-species WF simulation, one tick = one generation, so no adjustment is needed.

If you recapitate with msprime, then the same issue occurs for coalescence rates. This is less obvious, because “coalescence rate” is not an argument to msprime; rather, the coalescence rate in a given population is \(1/2N\), where \(N\) is the (effective) size of that population. (The 2 in the denominator is actually the ‘ploidy’ argument to msprime, which defaults to 2.) This is the correct rate, in units of coalescences per generation. Since it’s “per generation”, this means that if you want to recapitate a tree sequence generated by SLiM in which one tick is not one generation, then you need to adjust the effective population sizes accordingly. For instance, suppose as above that we did a nonWF simulation in which one tick = one year, for a species with generation time 30 years, and we want to recapitate with an effective size of 10,000 diploids. An effective size of 10,000 diploids means one coalescence per pair of lineages every 20,000 generations; since one unit of time in the tree sequence equals 1/30th of a generation, this is one coalescence per pair of lineages every 600,000 time units, which we can get msprime to do by asking for an effective population size of 300,000. In other words, if a generation in our tree sequence is \(T\) units of time, then we need to divide the coalescence rate by \(T,\) which we do by multiplying effective population sizes by \(T\).

(A better way of thinking about this could be to say that since the mean time to coalescence is twice the effective population size, actually effective population size is in units of time, rather than individuals.)

To recapitate we need to also pick recombination and (possibly) migration rates. Since these are rates, they are also in units of something “per time”, and so just as for mutation rates, we need to ensure they are in the correct units. Recombination rates in particular come in units of “per generation” (since a Morgan is in units of recombinations per generation), and so a rate of \(r\) Morgans would need to be divided by \(T\) before it’s passed to msprime for recapitation. Since msprime (by default) runs a continuous-time simulation, there is no harm in adjusting time units in msprime this way, but it is confusing.

`TimeUnitsMismatchWarning`#

Above we saw that if we want msprime to simulate coalescence in a population of size 10,000 to recapitate our tree sequence whose times are given in units of 1/30th of a generation, then we need to ask it for a population of (effective) size 300,000. In other words, the “population sizes” of msprime are only in units of individuals (as one would expect a population size to be) if times are being measured in units of generations. For this reason, although it’s not strictly true, sometimes we say that “msprime times are in units of generations”. Also, msprime will produce a warning if it is handed a tree sequence whose time units are not “generations”:

rts = pyslim.recapitate(ts,
            recombination_rate=1e-8,
            ancestral_Ne=200, random_seed=5)

TimeUnitsMismatchWarning: The initial_state has time_units=ticks but time is measured in
generations in msprime. This may lead to significant discrepancies between the timescales.
If you wish to suppress this warning, you can use, e.g., warnings.simplefilter('ignore',
msprime.TimeUnitsMismatchWarning)

Think of this warning as a friendly reminder to pay attention to your time units. To make this warning disappear, the easiest option is to do as the warning suggests and just ask python to ignore it:

import warnings
warnings.simplefilter('ignore', msprime.TimeUnitsMismatchWarning)
rts = pyslim.recapitate(ts,
            recombination_rate=1e-8,
            ancestral_Ne=200, random_seed=5)

To re-label the time units on your tree sequence as being in “generations” (but! only do this if it’s true!), you must convert to a TableCollection and back (since the tables are mutable but the tree sequence is not):

print(f"Time units before: {ts.time_units}")
t = ts.dump_tables()
t.time_units = 'generations'
ts = t.tree_sequence()
print(f"Time units after: {ts.time_units}")

Time units before: ticks
Time units after: generations

Example: calculating generation times in a SLiM simulation#

In a nonWF simulation, the generation time is not something that you get to explicitly set: rather, like population size it is an emergent property of the rules of the simulation. So, it’s important to be able to calculate what the (average) generation time actually is in a given simulation (keeping in mind that the answer may vary across time and space). This is not really a pyslim operation at all, but since it’s important to the considerations above, here’s an easy way to do this. The definition of generation time appropriate for comparison to a coalescent simulation is “average amount of time between birth of parent and birth of child”. (Note that this is the birth of the average child in some sense, not the average time to each parent’s first child!) Therefore, an easy and convenient way to measure this is to record for each individual their parents’ ages at the times of their births, and then take the average across all individuals alive at a given time. (This is only one possible choice, and other choices are usually but not always equivalent, but a discussion of the options and distinctions would make this tutorial lengthy and confusing.

Here is a script that computes this. In the simulation, females’ fecundity increases with their age, and so we expect “generation time” to be larger than the mean age of individuals alive at a given time. In the script, we store the average of the mean parent ages across all extant individuals every tick using SLiM Individuals’ meanParentAge property, and store the vector in top-level metadata when we write out the tree sequence.

initialize() {
	initializeSLiMModelType("nonWF");
	initializeSex("A");
	initializeTreeSeq();
	initializeMutationRate(2e-8);
	initializeMutationType("m1", 0.5, "f", 0.0);
	initializeGenomicElementType("g1", m1, 1.0);
	initializeGenomicElement(g1, 0, 1e8-1);
	initializeRecombinationRate(1e-8);
	defineConstant("K", 1000);
	defineGlobal("GENTIME", rep(NAN, 100));
}

reproduction(p1, "F") {
	for (k in seqLen(rpois(1, individual.age))) {
		mate = subpop.sampleIndividuals(1, sex="M");
		offspring = subpop.addCrossed(individual, mate);
	}
}

1 early() {
	sim.addSubpop("p1", K);
}

early() {
	p1.fitnessScaling = K / p1.individualCount;
}

1: late() {
	// this should happen *after* mortality (so, in late())
	GENTIME[community.tick - 1] = mean(p1.individuals.meanParentAge);
}

100 late() {
	sim.treeSeqOutput(
		"generation_time.trees",
		metadata=Dictionary("generation_times", GENTIME)
		);
	catn("Done! Mean generation time " + mean(GENTIME[50:99]));
}

%%bash
slim -s 23 generation_time.slim | tail -n1

#WARNING (Species::RunInitializeCallbacks): with tree-sequence recording enabled and a non-zero muta

tion rate, a neutral mutation type was defined and used; this is legal, but usually undesirable, sin

ce neutral mutations can be overlaid later using the tree-sequence information.

Done! Mean generation time 2.49543

(About the warning: we turned on mutations for a good reason; see below.) Now we can look at our empirical calculation of generation time. Note that the first generation individuals have a mean parent age of 0.0, so we expect generation time to go up at first.

gts = tskit.load("generation_time.trees")
gentimes = gts.metadata["SLiM"]["user_metadata"]["generation_times"]

fig, ax = plt.subplots(figsize=(12, 6), dpi=300)
ax.set_xlabel("tick")
ax.set_ylabel("generation time")
ax.plot(np.arange(1, len(gentimes)+1), gentimes);

_images/46c9fa722932f6aad475c529d6d374c99a9e4be4af028e6e2e5efa974c1d7029.png

In the script above we added mutations during the SLiM simulation to do a simple check to verify the meanParentAge property correctly estimates mean generation time. If it does, then the mean sequence divergence calculated from the mutations introduced by SLiM should (approximately) match the mean length of branches in the tree sequence separating two samples multiplied by the expected number of mutations per unit time. (We get the mean branch length using the ts.diversity method with mode="branch"; see the tskit documentation for details.) This involves the generation time because “mutations per unit time” is equal to “mutations per generation” (here, the mutation rate per bp, \(2 \times 10^{-8}\)) divided by “time per generation” (i.e., mean generation time).

slim_diversity = gts.diversity(mode = 'site')
ts_diversity = gts.diversity(mode='branch') * 2e-8 / np.mean(gentimes)

print(f"slim diversity: {slim_diversity}\n"
      f"scaled tree sequence diversity: {ts_diversity}\n"
      f"error: {100 * np.abs(slim_diversity - ts_diversity)/slim_diversity:0.3f}%"
)

slim diversity: 1.5383378037734143e-06
scaled tree sequence diversity: 1.593246393029409e-06
error: 3.569%

Now, to recapitate this simulation in a population with effective size 1000, and using the recombination rate from the SLiM script of \(10^{-8}\) crossovers per generation and \(2 \times 10^{-8}\) mutations per generation, we need to scale \(N_e\) and both rates by the generation time. Furthermore, since we already have mutations up until 100 time units ago, we need to put mutations on only previous to that time.

gentimes = gts.metadata["SLiM"]["user_metadata"]["generation_times"]
gt = np.mean(gentimes[-50:])
recomb_rate = 1e-8 # per generation
Ne = 1000 # generations
mut_rate = 2e-8 # per generation
with warnings.catch_warnings(record=True) as w:
    rts = pyslim.recapitate(gts,
                recombination_rate=recomb_rate / gt,
                ancestral_Ne=Ne * gt,
                random_seed=5)
    mts = msprime.sim_mutations(
               rts,
               rate=mut_rate / gt,
               model=msprime.SLiMMutationModel(type=0),
               keep=True,
               start_time=100,
    )

print(f"Mean genetic diversity is now {mts.diversity():.2e},\n"
      f"  and the expected value under a WF model is "
      f"4 * N * mu = {4 * Ne * mut_rate:.2e}")

Mean genetic diversity is now 7.87e-05,
  and the expected value under a WF model is 4 * N * mu = 8.00e-05

We get a lower value of genetic diversity than expected under a WF model, but that’s expected because both fluctuating population sizes and age-associated variability in fecundity will decrease the effective population size.

Converting from SLiM time to tskit time and back#

Note

This is a nitpicky, document-the-details section. Hopefully, you don’t have to deal with the specifics of converting between tskit and SLiM time, but this page is here for you if you do.

SLiM is a forwards simulator, while the tree sequence format thinks about things retrospectively, and so works with times in units of time ago. Mostly, you don’t have to convert between the two, unless you want to match up information in a tree sequence with information written out by SLiM itself. In other words, SLiM’s time counter measures the number of time steps (“ticks”) since the start of the simulation, and times in the tree sequence record how long before the end of the simulation. However, there are quite a few details, and off-by-one errors are easy to make, so we’ll spell it out here. Hopefully, you can just use pyslim’s methods slim_time(), individuals_alive_at(), and individual_ages_at() without needing to actually do any conversions yourself.

SLiM’s time counter is called the “tick”. The SLiM tick starts at 1, and records which round of the life cycle the simulation is in. However, the order of the life cycle differs between WF and nonWF models: in a WF model, it is first \(\to\) early \(\to\) birth \(\to\) late”, while in a nonWF model, it is first \(\to\) birth \(\to\) early \(\to\) late. Often, the first set of individuals are created in the early() phase of tick 1, and so in a WF model reproduce immediately, in the same tick they were “born”. Parents and offspring cannot have the same birth time in the tree sequence, and so some clever bookkeeping was required. You’ll want to refer to the tables below to see what’s going on. “Time” in a tree sequence is actually time ago, or time before the tree sequence was recorded. To obtain this number, and ensure that offspring cannot have the same birth time-ago in the tree sequence as their parents, SLiM also keeps track of “how many birth phases of the life cycle have happened so far” (the column “# births” in the tables). As the simulation goes along, tskit time ago is recorded as minus one times the number of birth phases so far. (We multiply by -1 because as we move forward in time, we accumulate a negative number of “time-ago” units.) When the tree sequence is output, the current cumulative number of birth phases is added to this, so “tskit time ago” is, equivalently, “how many birth phases happened since this time”. In a nonWF model, the two counters (“tick” and “number of birth phases”) are always in sync; but in a WF model they are not (during first and early). The extra wrinkle this introduces is that the correspondence between “tskit time ago” and “SLiM time” depends on which phase the tree sequence was recorded in.

When the tree sequence is written out, SLiM records the value of its current tick, cycle, and stage, which can be found in the metadata: ts.metadata['SLiM']['tick'], ts.metadata['SLiM']['cycle'], and ts.metadata['SLiM']['stage']. The “SLiM time” referred to by a time in the tree sequence (i.e., the value that would be reported by community.tick within SLiM at the point in time thus referenced) can be obtained by subtracting time from ts.metadata['SLiM']['tick'] and possibly adding or subtracting one depending on the current stage. The fiddly details are worked out by the slim_time() method, which takes a stage argument.

Some of the other methods in pyslim – those that depend on individuals_alive_at() – need you to tell them during which stage the tree sequence was saved with sim.treeSeqOutput, and need this to be the same as the stage that any individuals were saved with sim.treeSeqRememberIndividuals. (This can add another factor of plus or minus one to the calculations.) This argument, remembered_stage, defaults to late(); we recommend that you also default to always Remembering individuals, and saving out the tree sequence, during late() as well, unless you have good reason not to.

Detailed description of conversion#

To help keep all this straight, here are schematics for WF and nonWF models. (To see the nonWF model schematic, click on the “nonWF” tab.)

WF model

For a WF model, the SLiM tick (first column) can be obtained by subtracting the tskit time ago from the SLiM tick at time of output only during the same stage that output occured in. Suppose that we want to know what tick counter in SLiM corresponds to a given tskit time ago. The answer to this depends on what stage, unfortunately. Let \(k\) be the value of the tick counter, \(n\) be the total number of ticks, i.e., the value of ts.metadata["SLiM"]["tick"], \(t\) be the tskit time, \(x=1\) if the stage is first() or early() and 0 otherwise, and \(y=1\) if the tree sequence was output in late(). Then, as can be verified from the table below, \(k + t = n + x + y - 1\). The right-hand columns show \(t\), i.e., what the tskit time ago is at the corresponding point in the SLiM simulation.

tick	stage	# births		`first` output	`early` output	`late` output
1	first	0	\(\leftarrow\) add subpops	n-1	n-1	n
1	early	0		n-1	n-1	n
1	birth	1		n-2	n-2	n-1
1	late	1		n-2	n-2	n-1
2	first	1		n-2	n-2	n-1
2	early	1		n-2	n-2	n-1
2	birth	2		n-3	n-3	n-2
2	late	2		n-3	n-3	n-2
3	first	2		n-3	n-3	n-2
3	early	2		n-3	n-3	n-2
3	birth	3		n-4	n-4	n-3
3	late	3		n-4	n-4	n-3
\(\downarrow\)	\(\cdots\)	\(\downarrow\)		\(\uparrow\)	\(\uparrow\)	\(\uparrow\)
n-2	first	n-3		2	2	2
n-2	early	n-3		2	2	2
n-2	birth	n-2		1	1	2
n-2	late	n-2		1	1	2
n-1	first	n-2		1	1	2
n-1	early	n-2		1	1	2
n-1	birth	n-1		0	0	1
n-1	late	n-1		0	0	1
n	first	n-1	treeSeqOutput \(\to\)	0	0	1
n	early	n-1		treeSeqOutput \(\to\)	0	1
n	birth	n				0
n	late	n			treeSeqOutput \(\to\)	0

nonWF model

Note that for nonWF models the SLiM time (first column) can always be obtained by subtracting the tskit time ago from the SLiM time at time of output. Suppose that we want to know what tick counter in SLiM corresponds to a given tskit time ago. The answer to this depends on what stage, unfortunately. Let \(k\) be the value of the tick counter, \(n\) be the total number of ticks, i.e., the value of ts.metadata["SLiM"]["tick"], let \(t\) be the tskit time, \(x=1\) if the stage is “first” and 0 otherwise, and \(y=1\) if the tree sequence was output in early() or late(). Then, as can be verified from the table below, \(k + t = n + x + y - 1\). The right-hand columns show \(t\), i.e., what the tskit time ago is at the corresponding point in the SLiM simulation.

tick	stage	# births		first output	early output	late output
1	first	0	\(\leftarrow\) add subpops	n-1	n	n
1	birth	1		n-2	n-1	n-1
1	early	1	\(\leftarrow\) add subpops	n-2	n-1	n-1
1	late	1		n-2	n-1	n-1
2	first	1		n-2	n-1	n-1
2	birth	2		n-3	n-2	n-2
2	early	2		n-3	n-2	n-2
2	late	2		n-3	n-2	n-2
3	first	2		n-3	n-2	n-2
3	birth	3		n-4	n-3	n-3
3	early	3		n-4	n-3	n-3
3	late	3		n-4	n-3	n-3
\(\downarrow\)	\(\cdots\)	\(\downarrow\)		\(\uparrow\)	\(\uparrow\)	\(\uparrow\)
n-2	first	n-3		2	3	3
n-2	birth	n-2		1	2	2
n-2	early	n-2		1	2	2
n-2	late	n-2		1	2	2
n-1	first	n-2		1	2	2
n-1	birth	n-1		0	1	1
n-1	early	n-1		0	1	1
n-1	late	n-1		0	1	1
n	first	n-1	treeSeqOutput \(\to\)	0	1	1
n	birth	n			0	0
n	early	n		treeSeqOutput \(\to\)	0	0
n	late	n			treeSeqOutput \(\to\)	0