Identity by descent
Identity by descent#
TreeSequence.ibd_segments() method allows us to compute
segments of identity by descent.
This documentation page is preliminary
Relate the concept of identity by descent to the MRCA spans in the tree sequence.
import tskit import io from IPython.display import SVG nodes = io.StringIO( """\ id is_sample time 0 1 0 1 1 0 2 1 0 3 0 1 4 0 2 5 0 3 """ ) edges = io.StringIO( """\ left right parent child 2 10 3 0 2 10 3 2 0 10 4 1 0 2 4 2 2 10 4 3 0 2 5 0 0 2 5 4 """ ) ts = tskit.load_text(nodes=nodes, edges=edges, strict=False) SVG(ts.draw_svg())
A pair of nodes
(u, v) has an IBD segment with a left and right
[left, right) and ancestral node
a iff the most
recent common ancestor of the segment
[left, right) in nodes
a, and the segment has been inherited along the same
genealogical path (ie. it has not been broken by recombination). The
segments returned are the longest possible ones.
Consider the IBD segments that we get from our example tree sequence:
segments = ts.ibd_segments(store_segments=True) for pair, segment_list in segments.items(): print(pair, list(segment_list))
(0, 1) [IdentitySegment(left=2.0, right=10.0, node=4), IdentitySegment(left=0.0, right=2.0, node=5)] (0, 2) [IdentitySegment(left=2.0, right=10.0, node=3), IdentitySegment(left=0.0, right=2.0, node=5)] (1, 2) [IdentitySegment(left=0.0, right=2.0, node=4), IdentitySegment(left=2.0, right=10.0, node=4)]
Each of the sample pairs (0, 1), (0, 2) and (1, 2) is associated with two IBD segments, representing the different paths from these sample pairs to their common ancestor. Note in particular that (1, 2) has two IBD segments rather than one: even though the MRCA is 4 in both cases, the paths from the samples to the MRCA are different in the left and right trees.
segments = ts.ibd_segments() print(segments)
╔════════════════════╗ ║IdentitySegments ║ ╠══════════════╤═════╣ ║Parameters: │ ║ ║max_time │ inf║ ║min_span │ 0║ ║store_pairs │False║ ║store_segments│False║ ║Results: │ ║ ║num_segments │ 6║ ║total_span │ 30.0║ ╚══════════════╧═════╝
By default this class only stores the high-level summaries of the IBD segments discovered. As we can see in this example, we have a total of six segments and the total span (i.e., the sum lengths of the genomic intervals spanned by IBD segments) is 30.
If required, we can get more detailed information about particular
segment pairs and the actual segments using the
Only use the
store_segments arguments if you
really need this information! The number of IBD segments can be
very large and storing them all requires a lot of memory. It is
also much faster to just compute the overall summaries, without
needing to store the actual lists.
segments = ts.ibd_segments(store_pairs=True) for pair, value in segments.items(): print(pair, "::", value)
(0, 1) :: IdentitySegmentList(num_segments=2, total_span=10.0) (0, 2) :: IdentitySegmentList(num_segments=2, total_span=10.0) (1, 2) :: IdentitySegmentList(num_segments=2, total_span=10.0)
Now we can see the more detailed breakdown of how the identity segments
are distributed among the sample pairs. The
class behaves like a dictionary, such that
segments[(a, b)] will return
IdentitySegmentList instance for that pair of samples:
seglist = segments[(0, 1)] print(seglist)
If we want to access the detailed information about the actual
identity segments, we must use the
segments = ts.ibd_segments(store_pairs=True, store_segments=True) segments[(0, 1)]
IdentitySegmentList([IdentitySegment(left=2.0, right=10.0, node=4), IdentitySegment(left=0.0, right=2.0, node=5)])
The order of segments in an
is arbitrary, and may change in future versions.
More examples using the other bits of the IdentitySegments API here
Controlling the sample sets#
By default we get the IBD segments between all pairs of sample nodes.
IBD within a sample set#
We can reduce this to pairs within a specific set using the
More detail and better examples here.
segments = ts.ibd_segments(within=[0, 2], store_pairs=True) print(list(segments.keys()))
IBD between sample sets#
We can also compute IBD between sample sets:
segments = ts.ibd_segments(between=[[0,1], ], store_pairs=True) print(list(segments.keys()))
[(0, 2), (1, 2)]
TreeSequence.ibd_segments() documentation for
Constraints on the segments#
min_span arguments allow us to constrain the
segments that we consider.
Add examples for these arguments.