Usage
Quickstart
To set up an example, let’s (1) simulate a tree sequence with msprime and (2) infer a tree sequence from the resulting genetic variation data with tsinfer.
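import msprime
import tsinfer
# Simulate 100 diploid individuals over 1 Mb; record_full_arg=True keeps
# every recombination and common-ancestor node, not just coalescences.
orig_ts = msprime.sim_ancestry(
    100, recombination_rate=1e-8, population_size=1e3,
    sequence_length=1e6, record_full_arg=True,
    random_seed=123)
# Add mutations so there is variation data to infer from.
orig_ts = msprime.sim_mutations(orig_ts, rate=1e-8, random_seed=456)
# Convert the variant sites to tsinfer's input format and infer a tree sequence.
vdata = tsinfer.SampleData.from_tree_sequence(orig_ts)
inferred_ts = tsinfer.infer(vdata)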
Now we can ask: how much of the inferred tree sequence is not “correct”?
In other words, how much of it is not represented in the true tree sequence?
(Here, part of an ancestral node’s span is “correct” if, over that stretch
of the genome, some node in the other tree sequence is ancestral to the same set of samples.)
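To make the matching criterion concrete, here is a toy sketch in plain Python. It is not how tscompare is implemented: each hypothetical "node" is just a list of (left, right, samples) pieces giving the set of samples it is ancestral to on each interval, and a piece counts as matched wherever some node in the other collection is ancestral to exactly the same samples at the same positions. The names `union_length` and `matched_span` and the data layout are illustrative assumptions; tscompare works on real tree sequences and may pair nodes differently.

```python
# Toy illustration only (hypothetical data layout, not tscompare's code):
# a "node" is a list of (left, right, samples) pieces, where `samples` is the
# frozenset of sample IDs the node is ancestral to on [left, right).

def union_length(intervals):
    """Total length covered by a list of (left, right) intervals."""
    total = 0.0
    current_left = current_right = None
    for left, right in sorted(intervals):
        if current_right is None or left > current_right:
            # Disjoint from the running interval: bank it and start a new one.
            if current_right is not None:
                total += current_right - current_left
            current_left, current_right = left, right
        else:
            # Overlapping or touching: extend the running interval.
            current_right = max(current_right, right)
    if current_right is not None:
        total += current_right - current_left
    return total


def matched_span(nodes, other_nodes):
    """Return (total span, matched span) of `nodes`.

    A piece counts as matched wherever some node in `other_nodes` is
    ancestral to exactly the same sample set at the same positions.
    """
    total = 0.0
    matched = 0.0
    for node in nodes:
        for left, right, samples in node:
            total += right - left
            # Clip every same-sample-set piece of the other nodes to [left, right).
            hits = [
                (max(left, o_left), min(right, o_right))
                for other in other_nodes
                for o_left, o_right, o_samples in other
                if o_samples == samples and o_left < right and left < o_right
            ]
            matched += union_length(hits)
    return total, matched


true_nodes = [
    [(0, 100, frozenset({0, 1}))],
    [(0, 40, frozenset({2, 3})), (40, 100, frozenset({1, 2, 3}))],
]
inferred_nodes = [
    [(0, 60, frozenset({0, 1}))],
    [(0, 100, frozenset({2, 3}))],
]

total, matched = matched_span(inferred_nodes, true_nodes)
print(f"inferred span not matched: {1 - matched / total:.1%}")  # 37.5%
```

Taking the union of the clipped intervals avoids counting a position twice when several nodes carry the same sample set there.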
We compute this for the two tree sequences above with tscompare.haplotype_arf():
import tscompare
# Compare the simulated (true) tree sequence with the inferred one.
dis = tscompare.haplotype_arf(orig_ts, inferred_ts)
print(dis)
Tree sequence comparison:
ARF: 45.31%
TPR: 78.40%
matched_span: (np.float64(411624981.0), np.float64(254642498.0))
total span (ts, other): 752682616.0, 324801639.0
time RMSE: 4.216214207458347
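The two percentages in this output can be reproduced from the span lines. The sketch below copies the printed numbers and checks that ARF equals one minus the matched fraction of the first tree sequence's span (here orig_ts), and that TPR equals the matched fraction of the second's (here inferred_ts). This reading of the fields is inferred from these particular numbers, not taken from the tscompare API reference, so treat it as an assumption.

```python
# Numbers copied from the printed comparison above: (ts, other) = (orig_ts, inferred_ts).
matched_span = (411624981.0, 254642498.0)
total_span = (752682616.0, 324801639.0)

# Assumed relationships, inferred from the printed output (not from tscompare docs):
# ARF = share of the first tree sequence's span left unmatched,
# TPR = share of the second tree sequence's span that is matched.
arf = 1 - matched_span[0] / total_span[0]
tpr = matched_span[1] / total_span[1]

print(f"ARF: {arf:.2%}")  # ARF: 45.31%
print(f"TPR: {tpr:.2%}")  # TPR: 78.40%
```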