# Identity by descent#

The `TreeSequence.ibd_segments()`

method allows us to compute
segments of identity by descent.

Note

This documentation page is preliminary

Todo

Relate the concept of identity by descent to the MRCA spans in the tree sequence.

## Examples#

Let’s take a simple tree sequence to illustrate the `TreeSequence.ibd_segments()`

method and associated Identity classes:

## Show code cell source

```
import tskit
import io
from IPython.display import SVG
nodes = io.StringIO(
"""\
id is_sample time
0 1 0
1 1 0
2 1 0
3 0 1
4 0 2
5 0 3
"""
)
edges = io.StringIO(
"""\
left right parent child
2 10 3 0
2 10 3 2
0 10 4 1
0 2 4 2
2 10 4 3
0 2 5 0
0 2 5 4
"""
)
ts = tskit.load_text(nodes=nodes, edges=edges, strict=False)
SVG(ts.draw_svg())
```

### Definition#

A pair of nodes `(u, v)`

has an IBD segment with a left and right
coordinate `[left, right)`

and ancestral node `a`

iff the most
recent common ancestor of the segment `[left, right)`

in nodes `u`

and `v`

is `a`

, and the segment has been inherited along the same
genealogical path (ie. it has not been broken by recombination). The
segments returned are the longest possible ones.

Consider the IBD segments that we get from our example tree sequence:

```
segments = ts.ibd_segments(store_segments=True)
for pair, segment_list in segments.items():
print(pair, list(segment_list))
```

```
(np.int32(0), np.int32(1)) [IdentitySegment(left=2.0, right=10.0, node=4), IdentitySegment(left=0.0, right=2.0, node=5)]
(np.int32(0), np.int32(2)) [IdentitySegment(left=2.0, right=10.0, node=3), IdentitySegment(left=0.0, right=2.0, node=5)]
(np.int32(1), np.int32(2)) [IdentitySegment(left=0.0, right=2.0, node=4), IdentitySegment(left=2.0, right=10.0, node=4)]
```

Each of the sample pairs (0, 1), (0, 2) and (1, 2) is associated with
two IBD segments, representing the different paths from these sample
pairs to their common ancestor. Note in particular that (1, 2) has
**two** IBD segments rather than one: even though the MRCA is
4 in both cases, the paths from the samples to the MRCA are different
in the left and right trees.

### Data structures#

The result of calling `TreeSequence.ibd_segments()`

is an
`IdentitySegments`

class:

```
segments = ts.ibd_segments()
print(segments)
```

```
╔════════════════════╗
║IdentitySegments ║
╠══════════════╤═════╣
║Parameters: │ ║
║max_time │ inf║
║min_span │ 0║
║store_pairs │False║
║store_segments│False║
║Results: │ ║
║num_segments │ 6║
║total_span │ 30.0║
╚══════════════╧═════╝
```

By default this class only stores the high-level summaries of the IBD segments discovered. As we can see in this example, we have a total of six segments and the total span (i.e., the sum lengths of the genomic intervals spanned by IBD segments) is 30.

If required, we can get more detailed information about particular
segment pairs and the actual segments using the `store_pairs`

and `store_segments`

arguments.

Warning

Only use the `store_pairs`

and `store_segments`

arguments if you
really need this information! The number of IBD segments can be
very large and storing them all requires a lot of memory. It is
also much faster to just compute the overall summaries, without
needing to store the actual lists.

```
segments = ts.ibd_segments(store_pairs=True)
for pair, value in segments.items():
print(pair, "::", value)
```

```
(np.int32(0), np.int32(1)) :: IdentitySegmentList(num_segments=2, total_span=10.0)
(np.int32(0), np.int32(2)) :: IdentitySegmentList(num_segments=2, total_span=10.0)
(np.int32(1), np.int32(2)) :: IdentitySegmentList(num_segments=2, total_span=10.0)
```

Now we can see the more detailed breakdown of how the identity segments
are distributed among the sample pairs. The `IdentitySegments`

class behaves like a dictionary, such that `segments[(a, b)]`

will return
the `IdentitySegmentList`

instance for that pair of samples:

```
seglist = segments[(0, 1)]
print(seglist)
```

```
IdentitySegmentList(num_segments=2, total_span=10.0)
```

If we want to access the detailed information about the actual
identity segments, we must use the `store_segments`

argument:

```
segments = ts.ibd_segments(store_pairs=True, store_segments=True)
segments[(0, 1)]
```

```
IdentitySegmentList([IdentitySegment(left=2.0, right=10.0, node=4), IdentitySegment(left=0.0, right=2.0, node=5)])
```

The `IdentitySegmentList`

behaves like a Python list,
where each element is an instance of `IdentitySegment`

.

Warning

The order of segments in an `IdentitySegmentList`

is arbitrary, and may change in future versions.

Todo

More examples using the other bits of the IdentitySegments API here

### Controlling the sample sets#

By default we get the IBD segments between all pairs of sample nodes.

#### IBD within a sample set#

We can reduce this to pairs within a specific set using the
`within`

argument:

Todo

More detail and better examples here.

```
segments = ts.ibd_segments(within=[0, 2], store_pairs=True)
print(list(segments.keys()))
```

```
[(np.int32(0), np.int32(2))]
```

#### IBD between sample sets#

We can also compute IBD **between** sample sets:

```
segments = ts.ibd_segments(between=[[0,1], [2]], store_pairs=True)
print(list(segments.keys()))
```

```
[(np.int32(0), np.int32(2)), (np.int32(1), np.int32(2))]
```

See also

See the `TreeSequence.ibd_segments()`

documentation for
more details.

### Constraints on the segments#

The `max_time`

and `min_span`

arguments allow us to constrain the
segments that we consider.

Todo

Add examples for these arguments.