API Reference
This page lists the detailed reference documentation for the tscompare Python API. The reference documentation aims to be concise, precise and exhaustive.
Summary
| node_spans() | Returns the array of "node spans", i.e., the j-th entry gives the total span over which node j is in the tree sequence. |
| shared_node_spans() | Calculate the spans over which pairs of nodes in two tree sequences are ancestral to identical sets of samples. |
| match_node_ages() | For each node in ts, return the age of a matched node from other. |
| haplotype_arf() | For two tree sequences ts and other, this method returns an object of type ARFResult. |
- tscompare.node_spans(ts, include_missing=False)
Returns the array of “node spans”, i.e., the j-th entry gives the total span over which node j is in the tree sequence. Sample nodes that are isolated are “missing data”; inclusion of these spans is controlled by include_missing. (If include_missing is True then the span of each sample is always equal to the sequence length.)
- Parameters:
include_missing (bool) – Whether to include the spans over which sample nodes have missing data.
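To make the definition of a node span concrete, here is a minimal pure-Python sketch. It does not call tscompare or tskit: the toy representation of a tree sequence as a list of `(left, right, nodes_in_tree)` tuples, the function name `node_spans_sketch`, and its `samples` and `sequence_length` arguments are all assumptions made for illustration.

```python
def node_spans_sketch(trees, num_nodes, samples, sequence_length,
                      include_missing=False):
    """Toy illustration of node spans (not the tscompare implementation).

    trees: list of (left, right, nodes) tuples, where `nodes` is the set of
    node IDs present in the tree covering [left, right). An isolated
    (missing) sample is simply absent from `nodes` on that interval.
    """
    spans = [0.0] * num_nodes
    for left, right, nodes in trees:
        for j in nodes:
            spans[j] += right - left
    if include_missing:
        # As described above: each sample then spans the whole sequence.
        for s in samples:
            spans[s] = sequence_length
    return spans


# Samples are 0, 1, 2; sample 2 is isolated (missing) on [4, 10).
toy_trees = [
    (0.0, 4.0, {0, 1, 2, 3}),
    (4.0, 10.0, {0, 1, 4}),
]
spans_default = node_spans_sketch(toy_trees, 5, [0, 1, 2], 10.0)
spans_with_missing = node_spans_sketch(
    toy_trees, 5, [0, 1, 2], 10.0, include_missing=True
)
print(spans_default)       # [10.0, 10.0, 4.0, 4.0, 6.0]
print(spans_with_missing)  # [10.0, 10.0, 10.0, 4.0, 6.0]
```

With real data, the documented call is `tscompare.node_spans(ts, include_missing=False)` on a tree sequence `ts`.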
- tscompare.shared_node_spans(ts, other)
Calculate the spans over which pairs of nodes in two tree sequences are ancestral to identical sets of samples.
Returns a sparse matrix where rows correspond to nodes in ts and columns correspond to nodes in other, and in which entry (i, j) is the total span over which the sets of samples inheriting from node i in ts and from node j in other are identical.
The shared span of a sample node with itself includes spans over which it has missing data.
- Returns:
A sparse matrix of class scipy.sparse.csr_matrix.
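The shared-span definition above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's algorithm: each tree is represented by a hypothetical `(left, right, {node: frozenset_of_descendant_samples})` tuple, the result is a dictionary keyed by `(i, j)` rather than a sparse matrix, and the special handling of missing data for sample nodes is ignored.

```python
from collections import defaultdict


def shared_spans_sketch(trees_a, trees_b):
    """Toy shared-span computation (quadratic in the number of trees)."""
    shared = defaultdict(float)
    for left_a, right_a, desc_a in trees_a:
        for left_b, right_b, desc_b in trees_b:
            overlap = min(right_a, right_b) - max(left_a, left_b)
            if overlap <= 0:
                continue
            # Index the nodes of the second tree by their sample sets.
            by_set = defaultdict(list)
            for j, sample_set in desc_b.items():
                by_set[sample_set].append(j)
            # Credit the overlap to every pair with identical sample sets.
            for i, sample_set in desc_a.items():
                for j in by_set.get(sample_set, ()):
                    shared[(i, j)] += overlap
    return dict(shared)


s = frozenset
leaves = {0: s({0}), 1: s({1}), 2: s({2})}
toy_ts = [(0.0, 10.0, {**leaves, 3: s({0, 1}), 4: s({0, 1, 2})})]
toy_other = [
    (0.0, 6.0, {**leaves, 3: s({0, 1}), 4: s({0, 1, 2})}),
    (6.0, 10.0, {**leaves, 5: s({1, 2}), 4: s({0, 1, 2})}),
]
shared = shared_spans_sketch(toy_ts, toy_other)
print(shared[(3, 3)], shared[(4, 4)])  # 6.0 10.0
```

Here node 3 of the first toy sequence shares its sample set `{0, 1}` with node 3 of the second only on `[0, 6)`, while the root's sample set matches along the whole sequence.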
- tscompare.match_node_ages(ts, other)
For each node in ts, return the age of a matched node from other. Node matching is accomplished as described in haplotype_arf().
Returns a tuple of three vectors of length ts.num_nodes, in this order: the age of the best matching node in other; the proportion of the node span in ts that is covered by the best match; and the node id of the best match in other.
- Returns:
A tuple of arrays of length ts.num_nodes containing (time of matching node, proportion overlap, and node ID of match).
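The following sketch shows how the three returned vectors relate to a table of shared spans, using the matching rule described under haplotype_arf() (longest shared span, ties broken by closeness in time). It is a pure-Python illustration: the function name `match_nodes_sketch`, the dictionary input, and the use of `None` and `-1` for unmatched nodes are assumptions of this sketch, not documented behavior of tscompare.

```python
def match_nodes_sketch(shared, spans_ts, times_ts, times_other):
    """Toy best-match selection.

    shared: dict mapping (i, j) -> shared span between node i of ts and
    node j of other. Returns (match_time, proportion, match_id).
    """
    n = len(spans_ts)
    match_time = [None] * n   # assumption: None when there is no match
    proportion = [0.0] * n
    match_id = [-1] * n       # assumption: -1 when there is no match
    for i in range(n):
        candidates = [(j, span) for (a, j), span in shared.items() if a == i]
        if not candidates:
            continue
        # Longest shared span first; among ties, the smallest time gap.
        j, span = max(
            candidates,
            key=lambda c: (c[1], -abs(times_other[c[0]] - times_ts[i])),
        )
        match_id[i] = j
        match_time[i] = times_other[j]
        if spans_ts[i] > 0:
            proportion[i] = span / spans_ts[i]
    return match_time, proportion, match_id


# Node 2 of ts ties between nodes 2 and 3 of other; node 2 is closer in time.
toy_shared = {(0, 0): 10.0, (1, 1): 10.0, (2, 2): 4.0, (2, 3): 4.0}
match_time, proportion, match_id = match_nodes_sketch(
    toy_shared,
    spans_ts=[10.0, 10.0, 10.0],
    times_ts=[0.0, 0.0, 5.0],
    times_other=[0.0, 0.0, 4.0, 9.0],
)
print(match_time, proportion, match_id)
```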
- tscompare.haplotype_arf(ts, other, transform=None)
For two tree sequences ts and other, this method returns an object of type ARFResult. The values reported summarize the degree to which nodes in ts “match” corresponding nodes in other.
To match nodes, the best matching node(s) in other for each node in ts are those with the longest shared span, as computed by shared_node_spans(). If there are multiple matches with the same longest shared span for a single node, the best match is the match that is closest in time. This requires that the samples are the same in both tree sequences: in other words, if node i is a sample node in ts, then node i is also a sample node in other (and vice-versa).
For each node in other we compute the inverse matched span as the maximum shared span amongst all nodes in ts for which that node is their best match.
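The inverse matched span can be sketched as follows, assuming the best matches have already been chosen. The function name, the dictionary of shared spans keyed by `(i, j)`, and the `-1` marker for an unmatched node are hypothetical conventions used only for this illustration.

```python
def inverse_matched_span_sketch(shared, best_match, num_other_nodes):
    """For each node j in other, take the largest shared span among the
    nodes i in ts whose best match is j (0.0 if j is nobody's best match)."""
    inverse = [0.0] * num_other_nodes
    for i, j in enumerate(best_match):
        if j < 0:  # assumption: -1 marks a node with no match
            continue
        inverse[j] = max(inverse[j], shared.get((i, j), 0.0))
    return inverse


# Nodes 2 and 3 of ts both have node 2 of other as their best match.
toy_shared = {(0, 0): 10.0, (1, 1): 10.0, (2, 2): 4.0, (3, 2): 7.0}
toy_best_match = [0, 1, 2, 2]
inverse = inverse_matched_span_sketch(toy_shared, toy_best_match, 4)
print(inverse)       # [10.0, 10.0, 7.0, 0.0]
print(sum(inverse))  # 27.0
```

Summing the per-node values gives the total inverse matching span in this toy setting.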
Then, ARFResult contains:
- (arf)
The fraction of the total span of ts over which each node’s descendant sample set does not match its best match’s descendant sample set (i.e., the total un-matched span divided by the total span of ts).
- (tpr)
The proportion of the span in other that is correctly represented in ts (i.e., the total inverse matching span divided by the total span of other).
- (matched_span)
The total “matching” and “inverse matching” spans between ts and other. The “matching span” is the total span of all nodes in ts over which each node is ancestral to the same set of samples as its best match in other. The “inverse matching span” is the total span of all nodes in other over which each node is ancestral to the same set of samples as its best match in ts.
- (total_span)
The total node spans of ts and other.
- (rmse)
The root mean squared difference between the transformed times of the nodes in ts and transformed times of their best matching nodes in other, with the average weighted by the nodes’ spans in ts.
The callable transform is used to transform times before computing root-mean-squared error (see ARFResult); the default is log(1 + t).
- Parameters:
ts – The focal tree sequence.
other – The tree sequence we compare to.
transform – A callable that can take an array of times and return another array of numbers.
- Returns:
An ARFResult containing the quantities above.
- Return type:
ARFResult
- class tscompare.ARFResult(arf, tpr, matched_span, total_span, rmse, transform)
The result of a call to tscompare.haplotype_arf(ts, other), containing metrics associated with the ARG Robinson-Foulds measures of similarity and dissimilarity. This contains:
- arf:
The ARG Robinson-Foulds relative dissimilarity: the proportion of the total span of ts that is not represented in other. This is: 1 - matched_span[0] / total_span[0]
- tpr:
The “true proportion represented”: the proportion of the total span of other that is represented in ts. This is: matched_span[1] / total_span[1]
- matched_span:
The total matched node spans between ts and other, in order (match, inverse_match), where match is the total span of ts that is represented in other, and inverse_match is the total span of other that is represented in ts.
- total_span:
The total of all node spans of the two tree sequences, in order (ts, other).
- rmse:
The root-mean-squared error between the transformed times of the nodes in ts and the transformed times of their best-matching nodes in other, with the average weighted by span in ts.
- transform:
The transformation function used to transform times for computing rmse.
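The arithmetic relating these fields can be sketched in a few lines. The formulas for arf and tpr are the ones stated above; the helper names are hypothetical, and the exact normalization of the weighted average in `weighted_rmse` (dividing by the sum of the weights) is an assumption of this sketch rather than a documented detail.

```python
import math


def arf_tpr(matched_span, total_span):
    """matched_span = (match, inverse_match); total_span = (ts, other)."""
    arf = 1 - matched_span[0] / total_span[0]
    tpr = matched_span[1] / total_span[1]
    return arf, tpr


def weighted_rmse(times_ts, matched_times, spans_ts, transform=math.log1p):
    """Span-weighted RMSE of transformed times; default is log(1 + t)."""
    total_weight = sum(spans_ts)
    squared_error = sum(
        w * (transform(a) - transform(b)) ** 2
        for a, b, w in zip(times_ts, matched_times, spans_ts)
    )
    return math.sqrt(squared_error / total_weight)


arf, tpr = arf_tpr(matched_span=(30.0, 24.0), total_span=(40.0, 48.0))
print(arf, tpr)  # 0.25 0.5

# Identical times give zero error whatever the weights are.
rmse_zero = weighted_rmse([1.0, 3.0], [1.0, 3.0], [2.0, 5.0])
print(rmse_zero)  # 0.0
```

Passing a different callable as `transform`, for example `lambda t: t`, compares untransformed times instead.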