Identity by descent#
The TreeSequence.ibd_segments()
method allows us to compute
segments of identity by descent.
Note
This documentation page is preliminary
Todo
Relate the concept of identity by descent to the MRCA spans in the tree sequence.
Examples#
Let’s take a simple tree sequence to illustrate the TreeSequence.ibd_segments()
method and associated Identity classes:
Show code cell source
import tskit
import io
from IPython.display import SVG
nodes = io.StringIO(
"""\
id is_sample time
0 1 0
1 1 0
2 1 0
3 0 1
4 0 2
5 0 3
"""
)
edges = io.StringIO(
"""\
left right parent child
2 10 3 0
2 10 3 2
0 10 4 1
0 2 4 2
2 10 4 3
0 2 5 0
0 2 5 4
"""
)
ts = tskit.load_text(nodes=nodes, edges=edges, strict=False)
SVG(ts.draw_svg())
Definition#
A pair of nodes (u, v)
has an IBD segment with a left and right
coordinate [left, right)
and ancestral node a
iff the most
recent common ancestor of the segment [left, right)
in nodes u
and v
is a
, and the segment has been inherited along the same
genealogical path (ie. it has not been broken by recombination). The
segments returned are the longest possible ones.
Consider the IBD segments that we get from our example tree sequence:
segments = ts.ibd_segments(store_segments=True)
for pair, segment_list in segments.items():
print(pair, list(segment_list))
(np.int32(0), np.int32(1)) [IdentitySegment(left=2.0, right=10.0, node=4), IdentitySegment(left=0.0, right=2.0, node=5)]
(np.int32(0), np.int32(2)) [IdentitySegment(left=2.0, right=10.0, node=3), IdentitySegment(left=0.0, right=2.0, node=5)]
(np.int32(1), np.int32(2)) [IdentitySegment(left=0.0, right=2.0, node=4), IdentitySegment(left=2.0, right=10.0, node=4)]
Each of the sample pairs (0, 1), (0, 2) and (1, 2) is associated with two IBD segments, representing the different paths from these sample pairs to their common ancestor. Note in particular that (1, 2) has two IBD segments rather than one: even though the MRCA is 4 in both cases, the paths from the samples to the MRCA are different in the left and right trees.
Data structures#
The result of calling TreeSequence.ibd_segments()
is an
IdentitySegments
class:
segments = ts.ibd_segments()
print(segments)
╔════════════════════╗
║IdentitySegments ║
╠══════════════╤═════╣
║Parameters: │ ║
║max_time │ inf║
║min_span │ 0║
║store_pairs │False║
║store_segments│False║
║Results: │ ║
║num_segments │ 6║
║total_span │ 30.0║
╚══════════════╧═════╝
By default this class only stores the high-level summaries of the IBD segments discovered. As we can see in this example, we have a total of six segments and the total span (i.e., the sum lengths of the genomic intervals spanned by IBD segments) is 30.
If required, we can get more detailed information about particular
segment pairs and the actual segments using the store_pairs
and store_segments
arguments.
Warning
Only use the store_pairs
and store_segments
arguments if you
really need this information! The number of IBD segments can be
very large and storing them all requires a lot of memory. It is
also much faster to just compute the overall summaries, without
needing to store the actual lists.
segments = ts.ibd_segments(store_pairs=True)
for pair, value in segments.items():
print(pair, "::", value)
(np.int32(0), np.int32(1)) :: IdentitySegmentList(num_segments=2, total_span=10.0)
(np.int32(0), np.int32(2)) :: IdentitySegmentList(num_segments=2, total_span=10.0)
(np.int32(1), np.int32(2)) :: IdentitySegmentList(num_segments=2, total_span=10.0)
Now we can see the more detailed breakdown of how the identity segments
are distributed among the sample pairs. The IdentitySegments
class behaves like a dictionary, such that segments[(a, b)]
will return
the IdentitySegmentList
instance for that pair of samples:
seglist = segments[(0, 1)]
print(seglist)
IdentitySegmentList(num_segments=2, total_span=10.0)
If we want to access the detailed information about the actual
identity segments, we must use the store_segments
argument:
segments = ts.ibd_segments(store_pairs=True, store_segments=True)
segments[(0, 1)]
IdentitySegmentList([IdentitySegment(left=2.0, right=10.0, node=4), IdentitySegment(left=0.0, right=2.0, node=5)])
The IdentitySegmentList
behaves like a Python list,
where each element is an instance of IdentitySegment
.
Warning
The order of segments in an IdentitySegmentList
is arbitrary, and may change in future versions.
Todo
More examples using the other bits of the IdentitySegments API here
Controlling the sample sets#
By default we get the IBD segments between all pairs of sample nodes.
IBD within a sample set#
We can reduce this to pairs within a specific set using the
within
argument:
Todo
More detail and better examples here.
segments = ts.ibd_segments(within=[0, 2], store_pairs=True)
print(list(segments.keys()))
[(np.int32(0), np.int32(2))]
IBD between sample sets#
We can also compute IBD between sample sets:
segments = ts.ibd_segments(between=[[0,1], [2]], store_pairs=True)
print(list(segments.keys()))
[(np.int32(0), np.int32(2)), (np.int32(1), np.int32(2))]
See also
See the TreeSequence.ibd_segments()
documentation for
more details.
Constraints on the segments#
The max_time
and min_span
arguments allow us to constrain the
segments that we consider.
Todo
Add examples for these arguments.