Changelogs


Python

[0.6.5] - 2025-0X-XX

Features

TreeSequence.map_to_vcf_model now also returns the transformed positions and contig length. (@benjeffery, #3174, #3173)
draw_svg() methods now associate tree branches with edge IDs (@hyanwong, #3193, #557)
draw_svg() methods now allow the y-axis to be placed on the right-hand side using y_axis="right" (@hyanwong, #3201)
Add contig_id and isolated_as_missing to VcfModelMapping (@benjeffery, #3219, #3177)
Add TreeSequence.mutations_edge which returns the edge ID for each mutation’s edge. (@benjeffery, #3226, #3189)

Bugfixes

Fix bug in TreeSequence.pair_coalescence_counts when span_normalise=True and a window breakpoint falls within an internal missing interval. (@nspope, #3176, #3175)
Fix metadata schemas that are equal but have different byte representations being reported as unequal by TableCollection.assert_equals and Table.assert_equals. (@benjeffery, #3246, #3244)

Breaking changes

ltrim, rtrim, trim and shift raise an error if used on a tree sequence containing a reference sequence (@hyanwong, #3210, #2091)
Add TreeSequence.sites_ancestral_state and TreeSequence.mutations_derived_state properties, which return the ancestral states of sites and the derived states of mutations as NumPy arrays using the new NumPy 2.0 StringDType. Because this requires numpy 2 or greater, numpy 2 is now the minimum version stated in tskit’s dependencies. If you use another Python module that was compiled against numpy 1.x, you may see the error “A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash.” If no newer version of that module is available, you can still use it with tskit and numpy 1.x by building tskit from source against numpy 1.x with pip install tskit --no-binary tskit. However, any use of the new properties will then raise a RuntimeError. (@benjeffery, #3228, #2632)

[0.6.4] - 2025-05-21

Features

Add TreeSequence.sample_nodes_by_ploidy method to return the sample nodes in a tree sequence, grouped by a ploidy value. (@benjeffery, #3157)
Add TreeSequence.individuals_nodes attribute to return the nodes associated with each individual as a numpy array. (@benjeffery, #3153)
Add a shift method to both the TableCollection and TreeSequence classes, allowing the coordinate system to be shifted, and TreeSequence.concatenate, so that a set of tree sequences can be added to the right of an existing one. (@hyanwong, #3165, #3164)
Add TreeSequence.map_to_vcf_model method to return a mapping of the tree sequence to the VCF model. (@benjeffery, #3163)
Use a thin space as the thousands separator in HTML output, and a comma in CLI output. (@hossam26644, #3167, #2951)
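The two separator rules can be sketched in plain Python. format_count is a hypothetical helper, not tskit's internal code, and the sketch assumes the HTML separator is U+2009 THIN SPACE.

```python
THIN_SPACE = "\u2009"  # assumed: Unicode THIN SPACE used for HTML output


def format_count(n: int, html: bool = False) -> str:
    """Group an integer's digits in threes (hypothetical helper).

    Plain-text (CLI) output keeps commas; HTML output swaps them for thin spaces.
    """
    grouped = f"{n:,}"  # Python's format spec inserts a comma every three digits
    return grouped.replace(",", THIN_SPACE) if html else grouped


print(format_count(1234567))  # 1,234,567
```

Building the comma form first and substituting afterwards keeps the grouping logic in one place.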

Fixes

Correct assertion message when tables are compared with metadata ignored. (@benjeffery, #3162, #3161)

Breaking changes

TreeSequence.write_vcf now filters non-sample nodes from individuals by default, instead of raising an error. These nodes can be included using the new include_non_sample_nodes argument. By default, individual names (sample IDs) in VCF output are now of the form tsk_{individual.id}. Previously these were always "tsk_{j}" for j in range(num_individuals). This may break downstream code if individuals are specified. To fix this, set individual_names manually to the required pattern. (@benjeffery, #3163)

[0.6.3] - 2025-04-28

Bugfixes

TreeSequence.draw_svg(path=...) was failing due to a missing import xml.dom.minidom (@petrelharp, #3144, #3145)

[0.6.2] - 2025-04-01

Bugfixes

Metadata.schema was returning a modified schema; it now returns a copy of the original schema instead (@benjeffery, #3129, #3130)

Breaking Changes

Support for the legacy HDF5 file formats from msprime < 0.6 has been dropped. This includes removing support for tskit upgrade. (@hossam26644, #2812, #3138)

[0.6.1] - 2025-03-31

Bugfixes

Fix to TreeSequence.pair_coalescence_counts output dimension when provided with time windows containing no nodes (@nspope, #3046, #3058)
Fix to TreeSequence.pair_coalescence_counts to normalise by non-missing span if span_normalise=True. This resolves a bug where TreeSequence.pair_coalescence_rates would return incorrect values for intervals with missing trees. (@natep, #3053, #3059)
Fix to TreeSequence.pair_coalescence_rates causing an assertion to be triggered by floating point error, when all coalescence events are inside a single time window (@natep, #3035, #3038)
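To see why normalising by non-missing span matters, consider a toy model in which each tree has a value, a span and a missing flag. The span_normalised_mean helper and its arguments are hypothetical; this is a sketch of the normalisation idea, not tskit's implementation of pair_coalescence_counts.

```python
import numpy as np


def span_normalised_mean(values, spans, missing):
    """Span-weighted average of per-tree values, skipping missing trees.

    values:  a per-tree statistic (e.g. a coalescence count)
    spans:   the genomic span of each tree
    missing: True where a tree covers only missing data
    """
    values = np.asarray(values, dtype=float)
    spans = np.asarray(spans, dtype=float)
    keep = ~np.asarray(missing, dtype=bool)
    weighted_sum = np.sum(values[keep] * spans[keep])
    # Divide by the non-missing span, not the full window span; the latter
    # would pull the result towards zero whenever part of the window is missing.
    return weighted_sum / np.sum(spans[keep])


# Trees spanning 10, 30 and 10 units; the middle one is missing.
print(span_normalised_mean([2, 0, 4], [10, 30, 10], [False, True, False]))  # 3.0
```

Dividing the same weighted sum (60) by the full span of 50 would instead give 1.2, understating the rate in the non-missing regions.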

Features

Add support for fixed-length arrays in metadata struct codec using the length property. (@benjeffery, #3088, #3090)
Add a new TreeSequence.pca method that uses randomized linear algebra to find the top eigenvectors/values of the genetic relatedness matrix (@hanbin973, @petrelharp, #3008)
Add methods on TreeSequence to efficiently get table metadata as a numpy structured array. (@benjeffery, #3098)
Add Python 3.13 support (@benjeffery, #3107)
Add a preamble argument to draw_svg() methods to allow adding arbitrary extra graphics (e.g. legends) to SVG plots (@hyanwong, #3086, #3121)
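The idea behind randomized linear algebra for finding top eigenvectors can be sketched with NumPy. This is a generic randomized range finder with power iterations, not tskit's pca code; the function name and parameters are hypothetical, and the matrix is accessed only through a matvec callback (standing in for a matrix-vector product such as the one genetic_relatedness_vector computes).

```python
import numpy as np


def randomized_top_eigen(matvec, n, k, oversample=10, n_iter=3, seed=0):
    """Approximate the top-k eigenpairs of a symmetric PSD n x n matrix.

    The matrix is only touched through `matvec`, which multiplies it by an
    (n x m) block of column vectors, so it never has to be formed in full.
    """
    rng = np.random.default_rng(seed)
    # A random test matrix whose image captures the dominant eigenspace.
    omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(matvec(omega))
    for _ in range(n_iter):
        # Power iterations sharpen the separation between large and small
        # eigenvalues; re-orthonormalise each time for numerical stability.
        Q, _ = np.linalg.qr(matvec(Q))
    # Project onto the small subspace and solve a (k + oversample)-sized problem.
    B = Q.T @ matvec(Q)
    evals, evecs = np.linalg.eigh((B + B.T) / 2)
    order = np.argsort(evals)[::-1][:k]
    return evals[order], Q @ evecs[:, order]


# Usage: a low-rank PSD matrix stands in for a genetic relatedness matrix.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
G = X @ X.T
vals, vecs = randomized_top_eigen(lambda V: G @ V, n=50, k=3)
```

Only a small block of vectors is ever multiplied by the matrix, which is what makes this approach attractive when the full matrix is expensive to build.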

[0.6.0] - 2024-10-16

Breaking Changes

The definitions of TreeSequence.genetic_relatedness and TreeSequence.genetic_relatedness_weighted have changed to average over sample sets, rather than summing over them. For computation with diploid sample sets, this will change the result by a factor of four; for larger sample sets it will now produce sensible values that are comparable between sample sets of different sizes. The default for these methods has also changed to polarised=True, but the output is unchanged for centre=True (the default). See the documentation for these methods for more discussion. (@petrelharp, @mmosmond, #1623)
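Where the factor of four for diploid sample sets comes from can be seen in a toy computation: each diploid set contributes two nodes, so there are 2 x 2 = 4 node pairs, and averaging divides the old summed value by 4. The node-level matrix R and the set_relatedness helper are hypothetical; tskit's real computation also involves centring and polarisation, which this sketch ignores.

```python
import numpy as np


def set_relatedness(node_rel, set_a, set_b, average=True):
    """Combine node-level relatedness values into one value for two sets.

    Summing over all |A| * |B| node pairs makes the result grow with set
    size; averaging divides by that number of pairs instead.
    """
    block = node_rel[np.ix_(set_a, set_b)]
    return block.mean() if average else block.sum()


rng = np.random.default_rng(0)
R = rng.random((4, 4))
R = (R + R.T) / 2          # hypothetical symmetric node-by-node values
a, b = [0, 1], [2, 3]      # two diploid individuals (two nodes each)
summed = set_relatedness(R, a, b, average=False)
averaged = set_relatedness(R, a, b)
```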

Bugfixes

Fix to TreeSequence.genetic_relatedness with indexes=None and proportion=True. (@petrelharp, #2984, #1623)
Fix to TreeSequence.general_stat when using non-strict summary functions in the presence of non-ancestral material (very rare). (@petrelharp, #2983, #1623)
Printing tskit.MetadataSchema(schema=None) now shows "Null_schema" rather than None, to avoid confusion (@hyanwong, #2720)
Limit output HTML when a tree sequence is displayed that has a large amount of metadata. (@benjeffery, #2999)
Fix warning in draw_svg to use correct warnings module. (@duncanMR, #2870, #2871)

Features

Add the centre option to TreeSequence.genetic_relatedness and TreeSequence.genetic_relatedness_weighted. (@petrelharp, @mmosmond, #1623)
Edges now have an .interval attribute returning a tskit.Interval object. (@hyanwong, #2531)
Variants now have a states() method that returns the genotypes as an (inefficient) array of strings, rather than integer indexes, to aid comparison of genetic variation (@hyanwong, #2617)
Added distance_between that calculates the total distance between two nodes in a tree. (@Billyzhang1229, #2771)
Added genetic_relatedness_matrix method to compute pairwise genetic relatedness between sample sets. (@jeromekelleher, @petrelharp, #2823)
Add TreeSequence.extend_haplotypes method that extends ancestral haplotypes using recombination information, leading to unary nodes in many trees and fewer edges. (@petrelharp, @hfr1tz3, @nspope, @avabamf, #2651, #2938)
Add Table.drop_metadata to make clearing metadata from tables easy. (@jeromekelleher, #2944)
Add Interval.mid and Tree.mid properties to return the midpoint of the interval. (@currocam, #2960)
Added genetic_relatedness_vector method to compute product of genetic relatedness matrix and weight vector. (@petrelharp, #2980)
Added pair_coalescence_counts method to calculate coalescence events per node or time interval, pair_coalescence_quantiles method to estimate quantiles of pair coalescence times using empirical CDF inversion, and pair_coalescence_rates method to estimate instantaneous rates of pair coalescence within time intervals from the empirical CDF. (@nspope, #2915, #2976, #2985)
Add provenance information to the HTML notebook representation of a tree sequence. (@benjeffery, #3001)
The .draw_svg() methods can add annotated genomic regions (e.g. genes) to the x-axis. (@hyanwong, #3002)
Added a node_titles and a mutation_titles parameter to .draw_svg() methods which assigns a string to node and mutation symbols, commonly shown on mouseover. This can reduce label clutter while retaining useful info (@hyanwong, #3007)
Added (currently undocumented) use of the order parameter in Tree.draw_svg() to pass a subset of nodes, so subtrees can be visually collapsed. Additionally, an option pack_untracked_polytomies allows large polytomies involving untracked samples to be summarised as a dotted line (@hyanwong, #3011, #3010, #3012)
Added a title parameter to .draw_svg() methods (@hyanwong, #3015)
Add comma separation to all display numbers. (@benjeffery, #3017, #3018)
Added Tree.ancestors(u) method. (@hyanwong, #2706, #3021)
Add resources section to provenance schema. (@benjeffery, #3016)
Add Tree.rf_distance method to calculate the unweighted Robinson-Foulds distance between two trees. (@Billyzhang1229, #995, #2643, #3032)
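The unweighted Robinson-Foulds distance for rooted trees is commonly defined as the number of clades (leaf sets below internal nodes) found in one tree but not the other. The sketch below applies that textbook definition to a toy dict-of-children representation; the helper names are hypothetical, and edge-case conventions may differ from Tree.rf_distance.

```python
def clades(children, root):
    """Return the set of leaf sets (frozensets) below each internal node."""
    out = set()

    def leaves(u):
        kids = children.get(u, [])
        if not kids:
            return frozenset([u])
        s = frozenset().union(*(leaves(c) for c in kids))
        out.add(s)
        return s

    leaves(root)
    return out


def rf_distance(children1, root1, children2, root2):
    """Size of the symmetric difference of the two trees' clade sets."""
    return len(clades(children1, root1) ^ clades(children2, root2))


t1 = {4: [0, 1], 5: [2, 3], 6: [4, 5]}  # ((0,1),(2,3))
t2 = {4: [0, 2], 5: [1, 3], 6: [4, 5]}  # ((0,2),(1,3))
print(rf_distance(t1, 6, t2, 6))  # 4
```

Leaves must carry the same IDs in both trees; internal node IDs are irrelevant because only leaf sets are compared.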

[0.5.8] - 2024-06-27

Add support for numpy 2 (@jeromekelleher, @benjeffery, #2964)

[0.5.7] - 2024-06-17

Breaking Changes

The VCF writing methods (ts.write_vcf, ts.as_vcf) now error if a site with position zero is encountered. The VCF spec does not allow zero position sites. Suppress this error with the allow_position_zero argument. (@benjeffery, #2901, #2838)

Bugfixes

Fix to the folded, expected allele frequency spectrum (i.e., TreeSequence.allele_frequency_spectrum(mode="branch", polarised=False)), which was half as large as it should have been. (@petrelharp, @nspope, #2933)

[0.5.6] - 2023-10-10

Breaking Changes

tskit now requires Python 3.8, as Python 3.7 became end-of-life on 2023-06-27.

Features

Tree.tmrca now accepts more than two nodes and returns nicer errors (@hyanwong, #2808, #2801, #2070, #2611)
Add TreeSequence.genetic_relatedness_weighted stats method. (@petrelharp, @brieuclehmann, @jeromekelleher, #2785, #1246)
Add TreeSequence.impute_unknown_mutations_time method to return an array of mutation times based on the times of associated nodes (@duncanMR, #2760, #2758)
Add asdict to all dataclasses. These are returned when you access a row or other tree sequence object. (@benjeffery, #2759, #2719)
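Finding the most recent common ancestor of more than two nodes can be sketched by intersecting their paths to the root and taking the lowest shared ancestor. The parent dict, time dict and the mrca and tmrca helpers here are hypothetical stand-ins, not tskit's Tree API.

```python
def mrca(parent, nodes):
    """Most recent common ancestor of one or more nodes.

    parent maps each node to its parent (roots map to None).
    Returns None if the nodes share no ancestor (e.g. different roots).
    """
    def path_to_root(u):
        path = []
        while u is not None:
            path.append(u)
            u = parent[u]
        return path

    first = path_to_root(nodes[0])  # ordered from the node upwards
    common = set(first)
    for u in nodes[1:]:
        common &= set(path_to_root(u))
    # The first shared ancestor on the way up from nodes[0] is the MRCA.
    return next((a for a in first if a in common), None)


def tmrca(parent, time, nodes):
    m = mrca(parent, nodes)
    if m is None:
        raise ValueError("nodes do not share a common ancestor")
    return time[m]


parent = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6, 6: None}
time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 2.0, 6: 3.0}
print(tmrca(parent, time, [0, 1, 2]))  # 3.0
```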

Bugfixes

Fix incompatibility with jsonschema>4.18.6 which caused AttributeError: module jsonschema has no attribute _validators (@benjeffery, #2844, #2840)

[0.5.5] - 2023-05-17

Performance improvements

Methods like ts.at() which seek to a specified position on the sequence from a new Tree instance are now much faster (@molpopgen, #2661).

Features

Add __repr__ for variants to return a string representation of the raw data without spewing megabytes of text (@chriscrsmith, #2695, #2694)

Breaking Changes

Bugfixes

Fix UnicodeDecodeError when calling Variant.alleles on the emscripten platform. (@benjeffery, #2754, #2737)

[0.5.4] - 2023-01-13

Features

A new Tree.is_root method avoids the need to search the potentially large list of Tree.roots (@hyanwong, #2669, #2620)
The TreeSequence object now has the attributes min_time and max_time, which are, respectively, the minimum and maximum among the node times and mutation times. (@szhan, #2612, #2271)
The draw_svg methods now have a max_num_trees parameter to truncate the total number of trees shown, giving a readable display for tree sequences with many trees (@hyanwong, #2652)
The draw_svg methods now accept a canvas_size parameter to allow extra room on the canvas e.g. for long labels or repositioned graphical elements (@hyanwong, #2646, #2645)
The Tree object now has the method siblings to get the siblings of a node. It returns an empty tuple if the node has no siblings, is not a node in the tree, is the virtual root, or is an isolated non-sample node. (@szhan, #2618, #2616)
The msprime.RateMap class has been ported into tskit: functionality should be identical to the version in msprime, apart from minor changes in the formatting of tabular text output (@hyanwong, @jeromekelleher, #2678)
Tskit now supports and has wheels for Python 3.11. This Python version has a significant performance boost (@benjeffery, #2624, #2248)
Add the update_sample_flags option to simplify. Setting it to False ensures that no node sample flags are changed, allowing calling code to manage sample status. (@jeromekelleher, #2662, #2663).
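A rate map of the kind ported from msprime models a piecewise-constant rate along the genome. As a sketch of that model, the hypothetical cumulative_mass function below integrates the rate from the start of the map up to a position; it is a standalone illustration, not RateMap's implementation, and it ignores missing-data handling.

```python
import numpy as np


def cumulative_mass(position, rate, x):
    """Integral of a piecewise-constant rate from position[0] up to x.

    position: breakpoints p_0 < p_1 < ... < p_n (length n + 1)
    rate:     rate on each interval [p_i, p_{i+1}) (length n)
    """
    position = np.asarray(position, dtype=float)
    rate = np.asarray(rate, dtype=float)
    # Mass accumulated at each breakpoint: 0, r_0*w_0, r_0*w_0 + r_1*w_1, ...
    cum = np.concatenate([[0.0], np.cumsum(rate * np.diff(position))])
    # Index of the interval containing x (clipped so x == p_n is allowed).
    i = np.clip(np.searchsorted(position, x, side="right") - 1, 0, len(rate) - 1)
    return cum[i] + rate[i] * (x - position[i])


position = [0, 10, 20, 30]
rate = [0.1, 0.0, 0.2]
print(cumulative_mass(position, rate, 25))  # 2.0
```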

Breaking Changes

The filter_populations, filter_individuals, and filter_sites parameters to simplify previously defaulted to True but now default to None, which is treated as True. Previously, passing None would result in an error. (@hyanwong, #2609, #2608)

[0.5.3] - 2022-10-03

Fixes

The Variant object can now be initialized with 64 bit numpy ints as returned e.g. from np.where (@hyanwong, #2518, #2514)

Fix tree.mrca for the case of a tree with multiple roots. (@benjeffery, #2533, #2521)

Features

The ts.nodes method now takes an order parameter so that nodes can be visited in time order (@hyanwong, #2471, #2370)

Add samples argument to TreeSequence.genotype_matrix. Default is None, where all the sample nodes are selected. (@szhan, #2493, #678)

ts.draw and the draw_svg methods now have an optional omit_sites parameter, aiding drawing large trees with many sites and mutations (@hyanwong, #2519, #2516)

Breaking Changes

Single statistics computed with TreeSequence.general_stat are now returned as numpy scalars if windows=None and either samples is a single list or None (for a 1-way stat), or indexes is None or a single list of length k (instead of a list of length-k lists). (@gtsambos, #2417, #2308)

Accessor methods such as ts.edge(n) and ts.node(n) now allow negative indexes (@hyanwong, #2478, #1008)

ts.subset() produces valid tree sequences even if nodes are shuffled out of time order (@hyanwong, #2479, #2473), and the same for tables.subset() (@hyanwong, #2489). This involves sorting the returned tables, potentially changing the returned edge order.

Performance improvements

TreeSequence.link_ancestors no longer continues to process edges once all of the sample and ancestral nodes have been accounted for, improving memory overhead and overall performance (@gtsambos, #2456, #2442)

[0.5.2] - 2022-07-29

Fixes

Iterating over ts.variants() could cause a segfault in tree sequences with large numbers of alleles or very long alleles (@jeromekelleher, #2437, #2429).
Various circular references fixed, lowering peak memory usage (@jeromekelleher, #2424, #2423, #2427).
Fix bugs in VCF output when there isn’t a 1-1 mapping between individuals and sample nodes (@jeromekelleher, #2442, #2257, #2446, #2448).

Performance improvements

TreeSequence.site position search performance greatly improved, with much lower memory overhead (@jeromekelleher, #2424).
TreeSequence.samples time/population search performance greatly improved, with much lower memory overhead (@jeromekelleher, #2424, #1916).
The timeasc and timedesc orders for Tree.nodes have much improved performance and lower memory overhead (@jeromekelleher, #2424, #2423).

Features

Variant objects now have a .num_missing attribute and .counts() and .frequencies() methods (@hyanwong, #2390, #2393).
Add the Tree.num_lineages(t) method to return the number of lineages present at time t in the tree (@jeromekelleher, #386, #2422)
Efficient array access to table data now provided via attributes like TreeSequence.nodes_time, etc (@jeromekelleher, #2424).
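Counting the lineages present at time t amounts to counting the branches that span t. The sketch below assumes a branch from child c to parent p is counted when time[c] <= t < time[p]; that boundary convention, the parent/time dicts and the helper name are assumptions for illustration, not a statement of Tree.num_lineages' exact semantics.

```python
def num_lineages(parent, time, t):
    """Count branches (child -> parent) that intersect time t.

    Assumed convention: a branch is counted when time[child] <= t < time[parent].
    Root nodes (parent None) contribute no branch.
    """
    return sum(
        1
        for c, p in parent.items()
        if p is not None and time[c] <= t < time[p]
    )


parent = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6, 6: None}
time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 2.0, 6: 3.0}
print(num_lineages(parent, time, 1.0))  # 3
```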

Breaking Changes

Previously, accessing (e.g.) tables.edges returned a different instance of EdgeTable each time. This has been changed to return the same instance for the lifetime of a given TableCollection instance. This is technically a breaking change, although it’s difficult to see how code would depend on the property that (e.g.) tables.edges is not tables.edges. (@jeromekelleher, #2441, #2080).

[0.5.1] - 2022-07-14

Fixes

Copies of a Variant object would cause a segfault when .samples was accessed. (@benjeffery, #2400, #2401)

Changes

Tables in a table collection can be replaced using the replace_with method (@hyanwong, #1489, #2389)
SVG drawing routines now return a special string object that is automatically rendered in a Jupyter notebook (@hyanwong, #2377)

Features

New Site.alleles() method (@hyanwong, #2380, #2385)
The variants(), haplotypes() and alignments() methods can now take a list of sample ids and a left and right position, to restrict the size of the output (@hyanwong, #2092, #2397)

[0.5.0] - 2022-06-22

Changes

A min_time parameter in draw_svg allows the time of the youngest node to be used as the minimum of the y-axis, so that negative times can be shown. (@hyanwong, #2197, #2215)
VcfWriter.write now prints the site ID of variants in the ID field of the output VCF files. (@roohy, #2103, #2107)
Make dumping of tables and tree sequences to disk a zero-copy operation. (@benjeffery, #2111, #2124)
Add copy argument to TreeSequence.variants which if False reuses the returned Variant object for improved performance. Defaults to True. (@benjeffery, #605, #2172)
tree.mrca now takes 2 or more arguments and gives the common ancestor of them all. (@savitakartik, #1340, #2121)
Add an edge attribute to the Mutation class that gives the ID of the edge that the mutation falls on. (@jeromekelleher, #685, #2279).
Add the TreeSequence.split_edges operation which inserts nodes into edges at a specific time. (@jeromekelleher, #2276, #2296).
Add the TreeSequence.decapitate (and closely related TableCollection.delete_older) operation to remove topology and mutations older than a given time. (@jeromekelleher, #2236, #2302, #2331).
Add the TreeSequence.individuals_time and TreeSequence.individuals_population methods to return arrays of per-individual times and populations, respectively. (@petrelharp, #1481, #2298).
Add the sample_mask and site_mask to write_vcf to allow parts of an output VCF to be omitted or marked as missing data. Also add the as_vcf convenience function, to return VCF as a string. (@jeromekelleher, #2300).
Add support for missing data to write_vcf, and add the isolated_as_missing argument. (@jeromekelleher, #2329, #447).
Add Tree.num_children_array and Tree.num_children, which return the number of child nodes of every node in the tree and of a single node, respectively. (@GertjanBisschop, #2318, #2319, #2332)
Add Tree.path_length. (@jeremyguez, #2249, #2259).
Add B1 tree balance index. (@jeremyguez, @jeromekelleher, #2251, #2281, #2346).
Add B2 tree balance index. (@jeremyguez, @jeromekelleher, #2252, #2353, #2354).
Add Sackin tree imbalance index. (@jeremyguez, @jeromekelleher, #2246, #2258).
Add Colless tree imbalance index. (@jeremyguez, @jeromekelleher, #2250, #2266, #2344).
Add direction argument to TreeSequence.edge_diffs, allowing iteration over diffs in the reverse direction. NOTE: this comes with a ~10% performance regression as the implementation was moved from C to Python for simplicity and maintainability. Please open an issue if this affects your application. (@jeromekelleher, @benjeffery, #2120).
Add Tree.edge_array and Tree.edge. Returns the edge id of the edge encoding the relationship of each node with its parent. (@GertjanBisschop, #2361, #2357)
Add position argument to TreeSequence.site. Returns a Site object if there is one at the specified position. If not, it raises ValueError. (@szhan, #2234, #2235)
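Two of the tree shape indices added here have short textbook definitions: the Sackin index sums the depths of all leaves, and the Colless index sums, over internal nodes of a binary tree, the absolute difference in leaf counts between the two child subtrees. The sketch below applies those definitions to a toy dict-of-children tree; the helpers are hypothetical and not the Tree methods themselves.

```python
def sackin(children, root):
    """Sum over leaves of the number of edges between the leaf and the root."""
    total = 0
    stack = [(root, 0)]
    while stack:
        u, depth = stack.pop()
        kids = children.get(u, [])
        if kids:
            stack.extend((c, depth + 1) for c in kids)
        else:
            total += depth
    return total


def colless(children, root):
    """Sum over internal nodes of |#leaves(left) - #leaves(right)| (binary only)."""
    total = 0

    def n_leaves(u):
        nonlocal total
        kids = children.get(u, [])
        if not kids:
            return 1
        if len(kids) != 2:
            raise ValueError("Colless index requires a binary tree")
        left, right = (n_leaves(c) for c in kids)
        total += abs(left - right)
        return left + right

    n_leaves(root)
    return total


balanced = {4: [0, 1], 5: [2, 3], 6: [4, 5]}  # ((0,1),(2,3))
comb = {4: [0, 1], 5: [4, 2], 6: [5, 3]}      # (((0,1),2),3)
print(sackin(comb, 6), colless(comb, 6))  # 9 3
```

The comb is the most imbalanced 4-leaf shape, so both indices are larger for it than for the balanced tree.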

Breaking Changes

The JSON metadata codec now interprets the empty string as an empty object. This means that applying a schema to an existing table will no longer necessitate modifying the existing rows. (@benjeffery, #2064, #2104)
Remove the previously deprecated as_bytes argument to TreeSequence.variants. If you need genotypes in byte form this can be done following the code in the to_macs method on line 5573 of trees.py. This argument was initially deprecated more than 3 years ago when the code was part of msprime. (@benjeffery, #605, #2172)
Arguments after ploidy in write_vcf marked as keyword only (@jeromekelleher, #2329, #2315).
When metadata equal to b'' is printed to text or HTML tables it will render as an empty string rather than "b''". (@hyanwong, #2349, #2351)

[0.4.1] - 2022-01-11

Changes

TableCollection.name_map has been deprecated in favour of table_name_map. (@benjeffery, #1981, #2086)

Fixes

TreeSequence.dump_text now prints decoded metadata if there is a schema. (@benjeffery, #1860, #1527)
Add missing ReferenceSequence.__eq__ method. (@benjeffery, #2063, #2085)

[0.4.0] - 2021-12-10

Breaking changes

The Tree.num_nodes method is now deprecated with a warning, because it confusingly returns the number of nodes in the entire tree sequence, rather than in the tree. Text summaries of trees (e.g. str(tree)) now return the number of nodes in the tree, not in the entire tree sequence (@hyanwong, #1966, #1968)
The CLI info command now gives more detailed information on the tree sequence (@benjeffery, #1611)
64 bits are now used to store the sizes of ragged table columns such as metadata, allowing them to hold more data. This change is fully backwards and forwards compatible for all tree-sequences whose ragged column sizes fit into 32 bits. New tree-sequences with large offset arrays that require 64 bits will fail to load in previous versions with error _tskit.FileFormatError: An incompatible type for a column was found in the file. (@jeromekelleher, #343, #1527, #1528, #1530, #1554, #1573, #1589, #1598, #1628, #1571, #1579, #1585, #1590, #1602, #1618, #1620, #1652).
The Tree class now conceptually has an extra node, the “virtual root” whose children are the roots of the tree. The quintuply linked tree arrays (parent_array, left_child_array, right_child_array, left_sib_array and right_sib_array) all have one extra element. (@jeromekelleher, #1691, #1704).
Tree traversal orders returned by the nodes method have changed when there are multiple roots. Previously orders were defined locally for each root, but are now globally across all roots. (@jeromekelleher, #1704).
Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence. TableCollection.sort no longer sorts individuals. (@benjeffery, #1774, #1789)
Metadata encoding errors now raise MetadataEncodingError (@benjeffery, #1505, #1827).
For TreeSequence.samples all arguments after population are now keyword only (@benjeffery, #1715, #1831).
Remove the method TreeSequence.to_nexus and replace with TreeSequence.as_nexus. As the old method was not generating standards-compliant output, it seems unlikely that it was used by anyone. Calls to to_nexus will result in a NotImplementedError, informing users of the change. See below for details on as_nexus.
Change default value for missing_data_char in the TreeSequence.haplotypes method from “-” to “N”. This is a more idiomatic usage to indicate missing data rather than a gap in an alignment. (@jeromekelleher, #1893, #1894)
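A ragged column stores every row's bytes back to back in one flat array, plus an offsets array of length num_rows + 1, so that row j is data[offsets[j]:offsets[j + 1]]. The sketch below shows that layout with NumPy and 64-bit offsets; the variable names, the int8 data type and the get_row helper are illustrative assumptions. A 32-bit unsigned offset can address at most 2**32 - 1 bytes, which is the limit the switch to 64 bits removes.

```python
import numpy as np

# Three rows of (already encoded) metadata of different lengths.
rows = [b"{}", b'{"name": "A"}', b""]

# All bytes stored back to back in one flat column...
data = np.frombuffer(b"".join(rows), dtype=np.int8)
# ...plus num_rows + 1 offsets, stored here as 64-bit unsigned integers.
offsets = np.zeros(len(rows) + 1, dtype=np.uint64)
offsets[1:] = np.cumsum([len(r) for r in rows])


def get_row(data, offsets, j):
    """Return row j as bytes: data[offsets[j]:offsets[j + 1]]."""
    start, stop = int(offsets[j]), int(offsets[j + 1])
    return data[start:stop].tobytes()


print(get_row(data, offsets, 1))  # b'{"name": "A"}'
```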

Features

Add the ibd_segments method and associated classes to compute, summarise and store segments of identity by descent from a tree sequence (@gtsambos, @jeromekelleher).
Allow skipping of site and mutation tables in TableCollection.sort (@benjeffery, #1475, #1826).
Add TableCollection.sort_individuals to sort the individuals as this is no longer done by the default sort (@benjeffery, #1774, #1789).
Add __setitem__ to all tables allowing single rows to be updated. For example tables.nodes[0] = tables.nodes[0].replace(flags=tskit.NODE_IS_SAMPLE) (@jeromekelleher, @benjeffery, #1545, #1600).
Added a new parameter time to TreeSequence.samples(), allowing samples to be selected at a specific time point or within a time interval. (@mufernando, @petrelharp, #1692, #1700)
Add table.metadata_vector to all table classes to allow easy extraction of a single metadata key into an array (@petrelharp, #1676, #1690).
Add time_units to TreeSequence to describe the units of the time dimension of the tree sequence. This is then used to generate an error if time_units is uncalibrated when using the branch lengths in statistics. (@benjeffery, #1644, #1760, #1832)
Add the virtual_root property to the Tree class (@jeromekelleher, #1704).
Add the num_edges property to the Tree class (@jeromekelleher, #1704).
Improved performance for tree traversal methods in the nodes iterator. Roughly a 10X performance increase for “preorder”, “postorder”, “timeasc” and “timedesc” (@jeromekelleher, #1704).
Substantial performance improvement for Tree.total_branch_length (@jeromekelleher, #1794, #1799)
Add the discrete_genome property to the TreeSequence class which is true if all coordinates are discrete (@jeromekelleher, #1144, #1819)
Add a random_nucleotides function. (@jeromekelleher, #1825)
Add the TreeSequence.alignments method. (@jeromekelleher, #1825)
Add alignment export in the nexus and FASTA formats using the TreeSequence.write_nexus and TreeSequence.write_fasta methods. (@jeromekelleher, @hyanwong, #1894)
Add the discrete_time property to the TreeSequence class which is true if all time coordinates are discrete or unknown (@benjeffery, #1839, #1890)
Add the skip_tables option to load to support only loading top-level information from a file. Also add the ignore_tables option to TableCollection.equals and TableCollection.assert_equals to compare only top-level information. (@clwgg, #1882, #1854).
Add the skip_reference_sequence option to load. Also add the ignore_reference_sequence option to equals, to compare two table collections without comparing their reference sequence. (@clwgg, #2019, #1971).
tskit now supports python 3.10 (@benjeffery, #1895, #1949)
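The virtual root and the quintuply linked arrays can be illustrated with a simplified model. The sketch below assumes that parent_of lists every node's parent, with NULL (-1) for roots, and that every parentless node is a root; the extra array element at index n plays the virtual root, whose children are the roots. The helper names are hypothetical and this is not tskit's internal construction.

```python
import numpy as np

NULL = -1


def quintuply_linked(parent_of):
    """Build parent/left_child/right_child/left_sib/right_sib arrays.

    Each array has one extra element, index n, for a virtual root whose
    children are the roots. parent[root] itself stays NULL.
    """
    n = len(parent_of)
    virtual_root = n
    parent, left_child, right_child, left_sib, right_sib = (
        np.full(n + 1, NULL) for _ in range(5)
    )
    for u, p in enumerate(parent_of):
        parent[u] = p
        q = virtual_root if p == NULL else p  # roots hang off the virtual root
        # Append u as the rightmost child of q.
        if right_child[q] == NULL:
            left_child[q] = u
        else:
            right_sib[right_child[q]] = u
            left_sib[u] = right_child[q]
        right_child[q] = u
    return parent, left_child, right_child, left_sib, right_sib


def preorder(right_child, left_sib, start):
    """Iterative preorder traversal from `start`."""
    out, stack = [], [start]
    while stack:
        u = stack.pop()
        out.append(int(u))
        v = right_child[u]
        while v != NULL:  # push children right-to-left so leftmost pops first
            stack.append(v)
            v = left_sib[v]
    return out


# Two roots (4 and 5); index 6 is the virtual root.
parent, left_child, right_child, left_sib, right_sib = quintuply_linked(
    [4, 4, 5, 5, NULL, NULL]
)
print(preorder(right_child, left_sib, 6))  # [6, 4, 0, 1, 5, 2, 3]
```

Starting at the virtual root yields a single preorder across all roots, which mirrors the global traversal order for multi-root trees described under the breaking changes for this release.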

Fixes

dump_tables omitted individual parents. (@benjeffery, #1828, #1884)
Add the Tree.as_newick method and deprecate Tree.newick. The as_newick method by default labels samples with the pattern "n{node_id}", which is much more useful than the behaviour of Tree.newick (which mimics ms output). (@jeromekelleher, #1671, #1838.)
Add the as_nexus and write_nexus methods to the TreeSequence class, replacing the broken to_nexus method (see above). This uses the same sample labelling pattern as as_newick. (@jeetsukumaran, @jeromekelleher, #1785, #1835, #1836, #1838)
load_text created additional populations even if the population table was specified, and didn’t strip newlines from input text (@hyanwong, #1909, #1910)

[0.3.7] - 2021-07-08

Features

map_mutations now allows the ancestral state to be specified (@hyanwong, @jeromekelleher, #1542, #1550)

[0.3.6] - 2021-05-14

Breaking changes

Mutation.position and Mutation.index which were deprecated in 0.2.2 (Sep ‘19) have been removed.

Features

Add direct, copy-free access to the arrays representing the quintuply-linked structure of Tree (e.g. left_child_array). Allows performant algorithms over the tree structure using, for example, numba (@jeromekelleher, #1299, #1320).
Add fancy indexing to tables. E.g. table[6:86] returns a new table with the specified rows. Supports slices, index arrays and boolean masks (@benjeffery, #1221, #1348, #1342).
Add Table.append method for adding rows from classes such as SiteTableRow and Site (@benjeffery, #1111, #1254).
SVG visualization of a tree sequence can be restricted to displaying between left and right genomic coordinates using the x_lim parameter. The default settings now mean that if the left or right flanks of a tree sequence are entirely empty, these regions will not be plotted in the SVG (@hyanwong, #1288).
SVG visualization of a single tree allows all mutations on an edge to be plotted via the all_edge_mutations param (@hyanwong, #1253, #1258).
Entity classes such as Mutation, Node are now python dataclasses (@benjeffery, #1261).
Metadata decoding for table row access is now lazy (@benjeffery, #1261).
Add html notebook representation for Tree and change Tree.__str__ from dict representation to info table. (@benjeffery, #1269, #1304).
Improve display of tables when printed, with the number of lines limited via tskit.set_print_options (@benjeffery, #1270, #1300).
Add Table.assert_equals and TableCollection.assert_equals, which give an exact report of any differences. (@benjeffery, #1076, #1328)

Changes

In drawing methods max_tree_height and tree_height_scale have been deprecated in favour of max_time and time_scale (@benjeffery, #1262, #1331).

Fixes

Tree sequences were not properly init’d after unpickling (@benjeffery, #1297, #1298)

[0.3.5] - 2021-03-16

Features

SVG visualization now plots mutations at the correct time, if one exists, and a y-axis with a label can be drawn. Both x- and y-axes can be plotted on trees as well as tree sequences (@hyanwong, #840, #580, #1236)
SVG visualization now uses squares for sample nodes and red crosses for mutations, with the site/mutation positions marked on the x-axis. Additionally, an x-axis label can be set (@hyanwong, #1155, #1194, #1182, #1213)
Add parents column to the individual table to allow recording of pedigrees (@ivan-krukov, @benjeffery, #852, #1125, #866, #1153, #1177, #1192, #1199).
Added Tree.generate_random_binary static method to create random binary trees (@hyanwong, @jeromekelleher, #1037).
Change the default behaviour of Tree.split_polytomies to generate the shortest possible branch lengths instead of a fixed epsilon of 1e-10. (@jeromekelleher, #1089, #1090)
The default value of metadata in add_row functions is now schema-dependent, so that metadata={} is no longer needed as an argument when a schema is present (@benjeffery, #1084).
default in metadata schemas is used to fill in missing values when encoding for the struct codec. (@benjeffery, #1073, #1116).
Added canonical option to table collection sorting (@mufernando, @petrelharp, #705)
Added various arguments to TreeSequence.subset, to allow for stable population indexing and lossless node reordering with subset. (@petrelharp, #1097)

Changes

Allow mutations that have the same derived state as their parent mutation. (@benjeffery, #1180, #1233)
File minor version change to support individual parents

Breaking changes

tskit now requires Python 3.7 (@benjeffery, #1235)

[0.3.4] - 2020-12-02

Minor bugfix release.

Bugfixes

Reinstate the unused zlib_compression option to tskit.dump, as msprime < 1.0 still uses it (@jeromekelleher, #1067).

[0.3.3] - 2020-11-27

Features

Add TreeSequence.genetic_relatedness for calculating genetic relatedness between pairs of sets of nodes (@brieuclehmann, #1021, #1023, #974, #973, #898).
Expose TreeSequence.coiterate() method to allow iteration over 2 sequences simultaneously, aiding comparison of trees from two sequences (@jeromekelleher, @hyanwong, #1021, #1022).
tskit is now supported on, and has wheels for, python3.9 (@benjeffery, #982, #907).
Tree.newick() now has extra option include_branch_lengths to allow branch lengths to be omitted (@hyanwong, #931).
Added Tree.generate_star static method to create star-topologies (@hyanwong, #934).
Added Tree.generate_comb and Tree.generate_balanced methods to create example trees. (@jeromekelleher, #1026).
Added equals method to TreeSequence, TableCollection and each of the tables which provides more flexible equality comparisons, for example, allowing users to ignore metadata or provenance in the comparison (@mufernando, @jeromekelleher, #896, #897, #913, #917).
Added __eq__ to TreeSequence (@benjeffery, #1011, #1020).
ts.dump and tskit.load now support reading and writing file objects such as FIFOs and sockets (@benjeffery, #657, #909).
Added tskit.write_ms for writing to MS format (@saurabhbelsare, #727, #854).
Added TableCollection.indexes for access to the edge insertion/removal order indexes (@benjeffery, #4, #916).
The dictionary representation of a TableCollection now contains its index (@benjeffery, #870, #921).
Added TreeSequence._repr_html_ for use in jupyter notebooks (@benjeffery, #872, #923).
Added TreeSequence.__str__ to display a summary for terminal usage (@benjeffery, #938, #985).
Added TableCollection.dump and TableCollection.load. This allows table collections that are not valid tree sequences to be manipulated (@benjeffery, #14, #986).
Added nbytes method to tables, TableCollection and TreeSequence which reports the size in bytes of those objects (@jeromekelleher, @benjeffery, #54, #871).
Added TableCollection.clear to clear data table rows and optionally provenances, table schemas and tree-sequence level metadata and schema (@benjeffery, #929, #1001).

Bugfixes

LightWeightTableCollection.asdict and TableCollection.asdict now return copies of arrays (@benjeffery, #1025, #1029).
The map_mutations method previously used the Fitch parsimony algorithm, but this does not produce parsimonious results on non-binary trees. We now use the Hartigan parsimony algorithm, which does (@jeromekelleher, #987, #1030).
The flag argument to tables’ add_row was treating the value as signed (@benjeffery, #1027, #1031).
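To illustrate the map_mutations fix above, here is a minimal, self-contained sketch (not tskit's implementation) that scores a single site on a small tree. The dict-of-lists tree format and the function names are hypothetical. A naive multi-child extension of Fitch's rule takes the intersection of the children's state sets, falling back to the union at a cost of one mutation, and so undercounts at polytomies. Hartigan's bottom-up pass instead charges (number of children minus K) at each node, where K is the largest number of children sharing a state.

```python
from collections import Counter

def fitch_score(children, states, node):
    """Naive Fitch extension: returns (state_set, cost) for the subtree at node."""
    kids = children.get(node, [])
    if not kids:
        return {states[node]}, 0
    results = [fitch_score(children, states, c) for c in kids]
    cost = sum(c for _, c in results)
    common = set.intersection(*(s for s, _ in results))
    if common:
        return common, cost
    # No shared state: take the union and charge a single mutation.
    return set.union(*(s for s, _ in results)), cost + 1

def hartigan_score(children, states, node):
    """Hartigan (1973) bottom-up pass: returns (primary_set, cost)."""
    kids = children.get(node, [])
    if not kids:
        return {states[node]}, 0
    results = [hartigan_score(children, states, c) for c in kids]
    cost = sum(c for _, c in results)
    # Count how many children carry each state in their primary set.
    counts = Counter(s for primary, _ in results for s in primary)
    k = max(counts.values())
    primary = {s for s, n in counts.items() if n == k}
    return primary, cost + len(kids) - k

# A single polytomy: root 4 has leaves 0..3 with states A, A, B, C.
children = {4: [0, 1, 2, 3]}
states = {0: "A", 1: "A", 2: "B", 3: "C"}
print(fitch_score(children, states, 4)[1])     # 1 (too few mutations)
print(hartigan_score(children, states, 4)[1])  # 2 (the true minimum)
```

On binary trees the two scores agree; they diverge only at nodes with three or more children.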

Breaking changes

The argument to ts.dump and tskit.load has been renamed from path to file.
All arguments to Tree.newick() except precision are now keyword-only.
Renamed ts.trait_regression to ts.trait_linear_model.

[0.3.2] - 2020-09-29#

Breaking changes

The argument order of Tree.unrank and combinatorics.num_labellings now positions the number of leaves before the tree rank (@daniel-goldstein, #950, #978)
Change several methods (simplify(), trees(), Tree()) so most parameters are keyword only, not positional. This allows reordering of parameters, so that deprecated parameters can be moved, and the parameter order in similar functions, e.g. TableCollection.simplify and TreeSequence.simplify() can be made consistent (@hyanwong, #374, #846, #851)
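The keyword-only change relies on Python's bare `*` marker in a function signature. The sketch below uses a hypothetical, trimmed-down signature (not tskit's actual one) to show why existing positional calls now fail with a TypeError, and why parameters after the `*` can later be reordered or deprecated without silently changing the meaning of callers' code.

```python
def simplify(samples=None, *, filter_sites=True, keep_input_roots=False):
    """Hypothetical signature: only `samples` may be passed positionally."""
    return samples, filter_sites, keep_input_roots

simplify([0, 1], keep_input_roots=True)  # OK: options are named explicitly

try:
    simplify([0, 1], False)  # positional option: rejected
except TypeError as err:
    print("TypeError:", err)
```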

Features

Add split_polytomies method to the Tree class (@hyanwong, @jeromekelleher, #809, #815)
Tree accessor functions (e.g. ts.first(), ts.at()) pass extra parameters such as sample_indexes to the underlying Tree constructor; also, root_threshold can be specified when calling ts.trees() (@hyanwong, #847, #848)
Genomic intervals returned by Python functions are now namedtuples, allowing .left, .right and .span usage (@hyanwong, #784, #786, #811)
Added include_terminal parameter to edge diffs iterator, to output the last edges at the end of a tree sequence (@hyanwong, #783, #787)
#832 - Add metadata_bytes method to allow access to raw TableCollection metadata (@benjeffery, #842)
New tree.is_isolated(u) method (@hyanwong, #443).
tskit.is_unknown_time can now check arrays. (@benjeffery, #857).
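Returning namedtuples keeps existing tuple-unpacking code working while adding named access. A minimal sketch of such an interval type follows; the class is a hypothetical stand-in, not tskit's own definition.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    """Hypothetical stand-in for a genomic interval returned by tskit."""
    left: float
    right: float

    @property
    def span(self) -> float:
        # Length of the half-open interval [left, right).
        return self.right - self.left

iv = Interval(10.0, 25.0)
left, right = iv                    # still unpacks like a plain tuple
print(iv.left, iv.right, iv.span)   # 10.0 25.0 15.0
```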

[0.3.1] - 2020-09-04#

Bugfixes

#823 - Fix mutation time error when using simplify(keep_input_roots=True) (@petrelharp, #823).
#821 - Fix mutation rows with unknown time never being equal (@petrelharp, #822).

[0.3.0] - 2020-08-27#

Major feature release for metadata schemas, set-like operations, mutation times, SVG drawing improvements and many others.

Breaking changes

The default display order for tree visualisations has been changed to minlex (see below) to stabilise the node ordering and to make trees more readily comparable. The old behaviour is still available with order="tree".
File system operations such as dump/load now raise an appropriate OSError instead of tskit.FileFormatError. Loading from an empty file now raises an EOFError.
Bad tree topologies are detected earlier, so that it is no longer possible to create a TreeSequence object which contains a parent with contradictory children on an interval. Previously an error was thrown when some operation building the trees was attempted (@jeromekelleher, #709).
The TableCollection object no longer implements the iterator protocol. Previously list(tables) returned a sequence of (table_name, table_instance) tuples. This has been replaced with the more intuitive and future-proof TableCollection.name_map and TreeSequence.tables_dict attributes, which perform the same function (@jeromekelleher, #500, #694).
The arguments to TreeSequence.genotype_matrix, TreeSequence.haplotypes and TreeSequence.variants must now be keyword arguments, not positional. This is to support the change from impute_missing_data to isolated_as_missing in the arguments to these methods. (@benjeffery, #716, #794)

New features

New methods to perform set operations on TableCollections and TreeSequences. TableCollection.subset subsets and reorders table collections by nodes (@mufernando, @petrelharp, #663, #690). TableCollection.union forms the node-wise union of two table collections (@mufernando, @petrelharp, #381, #623).
Mutations now have an optional double-precision floating-point time column. If not specified, this defaults to a particular NaN value (tskit.UNKNOWN_TIME) indicating that the time is unknown. For a tree sequence to be considered valid it must meet new criteria for mutation times, see Mutation requirements. Also added function TableCollection.compute_mutation_times. Table sorting orders mutations by non-increasing time per-site, which is also a requirement for a valid tree sequence (@benjeffery, #672).
Add support for trees with internal samples for the Kendall-Colijn tree distance metric. (@daniel-goldstein, #610)
Add background shading to SVG tree sequences to reflect tree position along the sequence (@hyanwong, #563).
Tables with a metadata column now have a metadata_schema that is used to validate and encode metadata that is passed to add_row and to decode metadata on calls to table[j] and e.g. tree_sequence.node(j). See Metadata (@benjeffery, #491, #542, #543, #601).
The tree-sequence now has top-level metadata with a schema (@benjeffery, #666, #644, #642).
Add classes to SVG drawings to allow easy adjustment and styling, and document the new tskit.Tree.draw_svg() and tskit.TreeSequence.draw_svg() methods. This also fixes #467 for duplicate SVG entity ids in Jupyter notebooks (@hyanwong, #555).
Add a to_nexus function that outputs a tree sequence in Nexus format (@saunack, #550).
Add extension of Kendall-Colijn tree distance metric for tree sequences computed by TreeSequence.kc_distance (@daniel-goldstein, #548).
Add an optional node traversal order in tskit.Tree that uses the minimum lexicographic order of leaf nodes visited. This ordering ("minlex_postorder") adds more determinism because it constrains the order in which children of a node are visited (@brianzhang01, #411).
Add an order argument to the tree visualisation functions which supports two node orderings: "tree" (the previous default) and "minlex" which stabilises the node ordering (making it easier to compare trees). The default node ordering is changed to "minlex" (@brianzhang01, @jeromekelleher, #389, #566).
Add _repr_html_ to tables, so that Jupyter notebooks render them as HTML tables (@benjeffery, #514).
Remove support for kc_distance on trees with unary nodes (@daniel-goldstein, #508).
Improve Kendall-Colijn tree distance algorithm to operate in O(n^2) time instead of O(n^2 * log(n)) where n is the number of samples (@daniel-goldstein, #490).
Add a metadata column to the migrations table. Works similarly to existing metadata columns on other tables (@benjeffery, #505).
Add a metadata column to the edges table. Works similarly to existing metadata columns on other tables (@benjeffery, #496).
Allow sites with missing data to be output by the haplotypes method, by default replacing with -. Errors are no longer raised for missing data with isolated_as_missing=True; the error types returned for bad alleles (e.g. multiletter or non-ascii) have also changed from _tskit.LibraryError to TypeError, or ValueError if the missing data character clashes (@hyanwong, #426).
Access the number of children of a node in a tree directly using tree.num_children(u) (@hyanwong, #436).
User specified allele mapping for genotypes in variants and genotype_matrix (@jeromekelleher, #430).
New root_threshold option for the Tree class, which allows us to efficiently iterate over ‘real’ roots when we have missing data (@jeromekelleher, #462).
Add pickle support for TreeSequence (@terhorst, #473).
Add tree.as_dict_of_dicts() function to enable use with networkx. See Networkx (@winni2k, #457).
Add tree_sequence.to_macs() function to convert tree sequence to MACS format (@winni2k, #727)
Add a keep_input_roots option to simplify which, if enabled, adds edges from the MRCAs of samples in the simplified tree sequence back to the roots in the input tree sequence (@jeromekelleher, #775, #782).
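A sketch of the idea behind the "minlex_postorder" traversal: children are visited in increasing order of the smallest leaf ID in their subtree, so the output no longer depends on the order in which children happen to be stored. The dict-of-lists tree and the function name are hypothetical, and tskit's implementation may differ in details.

```python
def minlex_postorder(children, root):
    """Postorder that visits children sorted by the minimum leaf ID below them."""
    min_leaf = {}

    def compute(u):
        kids = children.get(u, [])
        min_leaf[u] = u if not kids else min(compute(c) for c in kids)
        return min_leaf[u]

    compute(root)
    order = []

    def visit(u):
        for c in sorted(children.get(u, []), key=min_leaf.__getitem__):
            visit(c)
        order.append(u)  # postorder: a node comes after all its children

    visit(root)
    return order

# The same topology stored with two different child orders.
a = {6: [5, 2], 5: [4, 0], 4: [3, 1]}
b = {6: [2, 5], 5: [0, 4], 4: [1, 3]}
print(minlex_postorder(a, 6))  # [0, 1, 3, 4, 5, 2, 6]
print(minlex_postorder(b, 6))  # [0, 1, 3, 4, 5, 2, 6]
```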

Bugfixes

#453 - Fix LibraryError when tree.newick() is called with large node time values (@jeromekelleher, #637).
#777 - Mutations over isolated samples were incorrectly decoded as missing data. (@jeromekelleher, #778)
#776 - Fix a segfault when a partial list of samples was provided to the variants iterator. (@jeromekelleher, #778)

Deprecated

The sample_counts feature has been deprecated and is now ignored. Sample counts are now always computed.
For TreeSequence.genotype_matrix, TreeSequence.haplotypes and TreeSequence.variants the impute_missing_data argument is deprecated and replaced with isolated_as_missing. Note that to get the same behaviour impute_missing_data=True should be replaced with isolated_as_missing=False. (@benjeffery, #716, #794)
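Because the old and new flags have opposite meanings, a deprecation shim must invert the value. The helper below is a hypothetical sketch of such a mapping (not tskit's code); it assumes isolated_as_missing defaults to True, matching the old default of impute_missing_data=False.

```python
import warnings

def resolve_isolated_as_missing(isolated_as_missing=None, impute_missing_data=None):
    """Hypothetical helper mapping the deprecated flag onto its replacement."""
    if impute_missing_data is not None:
        warnings.warn(
            "impute_missing_data is deprecated; use isolated_as_missing",
            FutureWarning,
            stacklevel=2,
        )
        if isolated_as_missing is not None:
            raise ValueError("Specify only one of the two arguments")
        # The meanings are inverted: imputing means *not* treating as missing.
        return not impute_missing_data
    return True if isolated_as_missing is None else isolated_as_missing
```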

[0.2.3] - 2019-11-22#

Minor feature release, providing a tree distance metric and various methods to manipulate tree sequence data.

New features

Kendall-Colijn tree distance metric computed by Tree.kc_distance (@awohns, #172).
New “timeasc” and “timedesc” orders for tree traversals (@benjeffery, #246, #399).
Up to 2X performance improvements to tree traversals (@benjeffery, #400).
Add trim, delete_sites, keep_intervals and delete_intervals methods to edit tree sequence data. (@hyanwong, #364, #372, #377, #390).
Initial online documentation for CLI (@hyanwong, #414).
Various documentation improvements (@hyanwong, @jeromekelleher, @petrelharp).
Rename the map_ancestors function to link_ancestors (@hyanwong, @gtsambos; #406, #262). The original function is retained as a deprecated alias.
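For readers new to the Kendall-Colijn metric, here is a naive, topology-only sketch (lambda = 0) on trees given as hypothetical child-to-parent dicts. Each tree maps to a vector holding, for every pair of leaves, the number of edges from the root to their MRCA, followed by a 1 for each leaf's pendant edge; the distance is the Euclidean distance between the two vectors. This is an illustration of the definition, roughly O(n^2 * h) for tree height h, not tskit's optimised kc_distance.

```python
import itertools
import math

def kc_vector(parent, leaves):
    """Topology-only KC vector; parent maps each non-root node to its parent."""
    def ancestors(u):
        path = [u]
        while u in parent:
            u = parent[u]
            path.append(u)
        return path  # from u up to the root

    def depth(u):
        return len(ancestors(u)) - 1  # number of edges from the root

    vec = []
    for a, b in itertools.combinations(leaves, 2):
        anc_a = set(ancestors(a))
        mrca = next(x for x in ancestors(b) if x in anc_a)
        vec.append(depth(mrca))
    vec.extend(1 for _ in leaves)  # pendant edges each count as 1
    return vec

def kc_distance(parent1, parent2, leaves):
    v1, v2 = kc_vector(parent1, leaves), kc_vector(parent2, leaves)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(v1, v2)))

leaves = [0, 1, 2]
t1 = {0: 3, 1: 3, 3: 4, 2: 4}  # ((0,1),2)
t2 = {0: 3, 2: 3, 3: 4, 1: 4}  # ((0,2),1)
print(kc_distance(t1, t2, leaves))  # sqrt(2), about 1.414
```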

Bugfixes

Fix height scaling issues with SVG tree drawing (@jeromekelleher, #407, #383, #378).
Do not reuse buffers in LdCalculator (@jeromekelleher). See #397 and #396.

[0.2.2] - 2019-09-01#

Minor bugfix release.

Relaxes overly-strict input requirements on individual location data that caused some SLiM tree sequences to fail loading in version 0.2.1 (see #351).

New features

Add log_time height scaling option for drawing SVG trees (@marianne-aspbury). See #324 and #303.

Bugfixes

Allow 4G metadata columns (@jeromekelleher). See #342 and #341.

[0.2.1] - 2019-08-23#

Major feature release, adding support for population genetic statistics, improved VCF output and many other features.

Note: Version 0.2.0 was skipped because of an error uploading to PyPI which could not be undone.

Breaking changes

Genotype arrays returned by TreeSequence.variants and TreeSequence.genotype_matrix have changed from unsigned 8-bit values to signed 8-bit values to accommodate missing data (see #144 for discussion). Specifically, the dtype of the genotypes arrays has changed from numpy uint8 to int8. This should not affect client code in any way unless it specifically depends on the type of the returned numpy array.
The VCF written by write_vcf is no longer compatible with previous versions, which had significant shortcomings. Position values are now rounded to the nearest integer by default, and REF and ALT values are derived from the actual allelic states (rather than always being A and T). Sample names are now of the form tsk_j for sample ID j. Most of the legacy behaviour can, however, be recovered with new options.
The positional parameter reference_sets in genealogical_nearest_neighbours and mean_descendants TreeSequence methods has been renamed to sample_sets.
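Why the switch to a signed type matters: tskit encodes a missing genotype as -1 (tskit.MISSING_DATA), which an unsigned byte cannot hold. The sketch below uses the standard library's array module (typecodes "b" and "B") rather than numpy so that it runs without dependencies.

```python
import array

MISSING_DATA = -1  # value used for missing genotypes (tskit.MISSING_DATA)

# Signed 8-bit storage can hold -1 alongside allele indexes 0..127.
genotypes = array.array("b", [0, 1, MISSING_DATA, 127])

# The same bytes read as unsigned 8-bit values turn -1 into 255, so an
# unsigned dtype cannot represent missing data unambiguously.
as_unsigned = array.array("B", genotypes.tobytes())
print(list(as_unsigned))  # [0, 1, 255, 127]

try:
    array.array("b", [128])  # allele index 128 does not fit in a signed byte
except OverflowError as err:
    print("overflow:", err)
```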

New features

Support for general windowed statistics. Implementations of diversity, divergence, segregating sites, Tajima’s D, Fst, Patterson’s F statistics, Y statistics, trait correlations and covariance, and k-dimensional allele frequency spectra (@petrelharp, @jeromekelleher, @molpopgen).
Add the keep_unary option to simplify (@gtsambos). See #1 and #143.
Add the map_ancestors method to TableCollection (@gtsambos). See #175.
Add the squash method to EdgeTable (@gtsambos). See #59 and #285.
Add support for individuals to VCF output, and fix major issues with output format (@jeromekelleher). Position values are transformed in a much more straightforward manner and output has been generalised substantially. Adds individual_names and position_transform arguments. See #286, and issues #2, #30 and #73.
Control height scale in SVG trees using ‘tree_height_scale’ and ‘max_tree_height’ (@hyanwong, @jeromekelleher). See #167, #168. Various other improvements to tree drawing (#235, #241, #242, #252, #259).
Add Tree.max_root_time property (@hyanwong, @jeromekelleher). See #170.
Improved input checking on various methods taking numpy arrays as parameters (@hyanwong). See #8 and #185.
Define the branch length over roots in trees to be zero (previously this raised an error; @jeromekelleher). See #188 and #191.
Implementation of the genealogical nearest neighbours statistic (@hyanwong, @jeromekelleher).
New delete_intervals and keep_intervals method for the TableCollection to allow slicing out of topology from specific intervals (@hyanwong, @andrewkern, @petrelharp, @jeromekelleher). See #225 and #261.
Support for missing data via a topological definition (@jeromekelleher). See #270 and #272.
Add ability to set columns directly in the Tables API (@jeromekelleher). See #12 and #307.
Various documentation improvements from @brianzhang01, @hyanwong, @petrelharp and @jeromekelleher.

Deprecated

Deprecate Tree.length in favour of Tree.span (@hyanwong). See #169.
Deprecate TreeSequence.pairwise_diversity in favour of the new diversity method. See #215, #312.

Bugfixes

Catch NaN and infinity values within tables (@hyanwong). See #293 and #294.

[0.1.5] - 2019-03-27#

This release removes support for Python 2, adds more flexible tree access and a new tskit command line interface.

New features

Remove support for Python 2 (@hugovk). See #137 and #140.
More flexible tree API (#121). Adds TreeSequence.at and TreeSequence.at_index methods to find specific trees, and efficient support for backwards traversal using reversed(ts.trees()).
Add initial tskit CLI (#80)
Add tskit info CLI command (#66)
Enable drawing SVG trees with coloured edges (@hyanwong; #149).
Add Tree.is_descendant method (#120)
Add Tree.copy method (#122)

Bugfixes

Fixes to the low-level C API (#132 and #157)

[0.1.4] - 2019-02-01#

Minor feature update. Using the C API 0.99.1.

New features

Add interface for setting TableCollection.sequence_length: tskit-dev/tskit#107
Add support for building and dropping TableCollection indexes: tskit-dev/tskit#108

[0.1.3] - 2019-01-14#

Bugfix release.

Bugfixes

Fix missing provenance schema: tskit-dev/tskit#81

[0.1.2] - 2019-01-14#

Bugfix release.

Bugfixes

Fix memory leak in table collection. tskit-dev/tskit#76

[0.1.1] - 2019-01-11#

Fixes broken distribution tarball for 0.1.0.

[0.1.0] - 2019-01-11#

Initial release after separation from msprime 0.6.2. Code that reads tree sequence files and processes them should be able to work without changes.

Breaking changes

Removal of the previously deprecated sort_tables, simplify_tables and load_tables functions. All code should change to using corresponding TableCollection methods.
Rename SparseTree class to Tree.

[1.1.0a1] - 2019-01-10#

Initial alpha version posted to PyPI for bootstrapping.

[0.0.0] - 2019-01-10#

Initial extraction of tskit code from msprime. Relicense to MIT.

Code copied at hash 29921408661d5fe0b1a82b1ca302a8b87510fd23

C API#

[1.2.0] - 2025-XX-XX#

Breaking changes

Remove tsk_diff_iter_t and associated functions. (@benjeffery, #3221, #2797).

[1.1.4] - 2025-03-31#

Changes

Added the TSK_TRACE_ERRORS macro to enable tracing of errors in the C library. This is useful for debugging as errors will print to stderr when set. (@jeromekelleher, #3095).

[1.1.3] - 2024-10-16#

Features

Add the tsk_treeseq_extend_haplotypes method that can compress a tree sequence by extending edges into adjacent trees and thus creating unary nodes in those trees (@petrelharp, @hfr1tze, @avabamf, #2651, #2938).

[1.1.2] - 2023-05-17#

Performance improvements

tsk_tree_seek is now much faster at seeking to arbitrary points along the sequence from the null tree (@molpopgen, #2661).

Features

The struct tsk_treeseq_t now has the variables min_time and max_time, which are respectively the minimum and maximum among all node times and mutation times. They can be accessed using the functions tsk_treeseq_get_min_time and tsk_treeseq_get_max_time. (@szhan, #2612, #2271)
Add the TSK_SIMPLIFY_NO_FILTER_NODES option to simplify to allow unreferenced nodes to be kept in the output (@jeromekelleher, @hyanwong, #2606, #2619).
Add the TSK_SIMPLIFY_NO_UPDATE_SAMPLE_FLAGS option to simplify which ensures no node sample flags are changed to allow calling code to manage sample status. (@jeromekelleher, #2662, #2663).
Guarantee that unfiltered tables are not written to unnecessarily during simplify (@jeromekelleher, #2619).
Add x_table_keep_rows methods to provide efficient in-place table subsetting (@jeromekelleher, #2700).
Add tsk_tree_seek_index function

[1.1.1] - 2022-07-29#

Bug fixes

Fix segfault in tsk_variant_restricted_copy in tree sequences with large numbers of alleles or very long alleles (@jeromekelleher, #2437, #2429).

[1.1.0] - 2022-07-14#

Features

Add num_children to tsk_tree_t, an array containing the number of child nodes of each node in the tree. (@GertjanBisschop, #2274, #2316)
Add edge to tsk_tree_t, an array which, for each (child) node in the tree, contains the ID of the edge encoding the relationship between that node and its parent. (@GertjanBisschop, #2304, #2340)

Changes

Reduce the maximum number of rows in a table by 1. This removes edge cases so that a tsk_id_t can be used to count the number of rows. (@benjeffery, #2336, #2337)
Samples are now copied by tsk_variant_restricted_copy. (@benjeffery, #2400, #2401)

[1.0.0] - 2022-05-24#

This major release marks the point at which the documented API becomes stable and supported.

Breaking changes

Change the type of genotypes to int32_t, removing the TSK_16_BIT_GENOTYPES flag option. (@benjeffery, #463, #2108)
tsk_variant_t now includes its tsk_site_t rather than pointing to it. (@benjeffery, #2161, #2162)
Rename TSK_TAKE_TABLES to TSK_TAKE_OWNERSHIP. (@benjeffery, #2221, #2222)
TSK_DEBUG, TSK_NO_INIT, TSK_NO_CHECK_INTEGRITY and TSK_TAKE_OWNERSHIP have moved to core.h (@benjeffery, #2218, #2230)
Rename several flags:
- All flags to simplify, e.g. TSK_KEEP_INPUT_ROOTS becomes TSK_SIMPLIFY_KEEP_INPUT_ROOTS.
- All flags to subset, e.g. TSK_KEEP_UNREFERENCED becomes TSK_SUBSET_KEEP_UNREFERENCED.
- TSK_BUILD_INDEXES -> TSK_TS_INIT_BUILD_INDEXES
- TSK_NO_METADATA -> TSK_TABLE_NO_METADATA
- TSK_NO_EDGE_METADATA -> TSK_TC_NO_EDGE_METADATA
(@benjeffery, #1720, #2226, #2229, #2224)
Remove the generic TSK_ERR_OUT_OF_BOUNDS - replacing with specific errors. Remove TSK_ERR_NON_SINGLE_CHAR_MUTATION which was unused. (@benjeffery, #2260)
Reorder stats API methods to place result as the last argument. (@benjeffery, #2292, #2285)

Features

Make dumping of tables and tree sequences to disk a zero-copy operation. (@benjeffery, #2111, #2124)
Add edge attribute to mutation_t struct and make available in tree sequence. (@jeromekelleher, #685, #2279)
Reduce peak memory usage in tsk_treeseq_simplify. (@jeromekelleher, #2287, #2288)

[0.99.15] - 2021-12-07#

Breaking changes

The tables argument to tsk_treeseq_init is no longer const, to allow for future no-copy tree sequence creation. (@benjeffery, #1718, #1719)
Additional consistency checks for mutation tables are now run by tsk_table_collection_check_integrity even when TSK_CHECK_MUTATION_ORDERING is not passed in. (@petrelharp, #1713, #1722)
num_tracked_samples and num_samples in tsk_tree_t are now typed as tsk_size_t (@benjeffery, #1723, #1727)
The previously deprecated option TSK_SAMPLE_COUNTS has been removed. (@benjeffery, #1744, #1761).
Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence. tsk_table_collection_sort no longer sorts individuals. (@benjeffery, #1774, #1789)
The tsk_tree_t.left_root member has been removed. Client code can be updated most easily by using the equivalent tsk_tree_get_left_root function. However, it may be worth considering updating code to use either the standard traversal functions (which automatically iterate over roots) or to use the virtual_root member (which may lead to more concise code). (@jeromekelleher, #1796, #1862)
Rename tsk_tree_t.left and tsk_tree_t.right members to tsk_tree_t.interval.left and tsk_tree_t.interval.right respectively. (@jeromekelleher, #1686, #1913)
kastore is now vendored into this repo instead of being a git submodule. Developers need to run git submodule update. (@jeromekelleher, #1687, #1973)
Tree arrays such as left_sib, right_child etc. now have an additional “virtual root” node at the end. (@jeromekelleher, #1691, #1704)
marked and mark have been removed from tsk_tree_t. (@jeromekelleher, #1936)
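A sketch, using plain Python lists rather than tskit's C arrays, of why the extra "virtual root" slot helps: a single traversal starting at index n visits every tree in a forest, with no separate loop over roots. The helper names are hypothetical and -1 stands in for TSK_NULL; as described in the 0.99.15 notes, the virtual root appears only in the child/sibling arrays and is never recorded as anyone's parent.

```python
def build_child_arrays(parent):
    """Build left_child/right_sib arrays with a virtual root at index n."""
    n = len(parent)
    virtual_root = n
    left_child = [-1] * (n + 1)
    right_sib = [-1] * (n + 1)
    last_child = [-1] * (n + 1)
    for u in range(n):
        # Real roots (parent == -1) are attached as children of the virtual root.
        p = parent[u] if parent[u] != -1 else virtual_root
        if left_child[p] == -1:
            left_child[p] = u
        else:
            right_sib[last_child[p]] = u
        last_child[p] = u
    return left_child, right_sib, virtual_root

def preorder(left_child, right_sib, u, out=None):
    """Recursive preorder of the nodes below u (u itself is not listed)."""
    if out is None:
        out = []
    c = left_child[u]
    while c != -1:
        out.append(c)
        preorder(left_child, right_sib, c, out)
        c = right_sib[c]
    return out

# A forest of two trees: 4 -> (0, 1) and 5 -> (2, 3).
parent = [4, 4, 5, 5, -1, -1]
left_child, right_sib, vr = build_child_arrays(parent)
print(preorder(left_child, right_sib, vr))  # [4, 0, 1, 5, 2, 3]
```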

Features

Add tsk_table_collection_individual_topological_sort to sort the individuals as this is no longer done by the default sort. (@benjeffery, #1774, #1789)
The default behaviour for table size growth is now to double the current size of the table, up to a threshold. To keep the previous behaviour, use (e.g.) tsk_edge_table_set_max_rows_increment(tables->edges, 1024), which results in adding space for 1024 additional rows each time we run out of space in the edge table. (@benjeffery, #5, #1683)
tsk_table_collection_check_integrity now has a TSK_CHECK_MIGRATION_ORDERING flag. (@petrelharp, #1722)
The default behaviour for ragged column growth is now to double the current size of the column, up to a threshold. To keep the previous behaviour, use (e.g.) tsk_node_table_set_max_metadata_length_increment(tables->nodes, 1024), which results in adding space for 1024 additional entries each time we run out of space in the ragged column. (@benjeffery, #1703, #1709)
Support for compiling the C library on Windows using msys2 (@jeromekelleher, #1742).
Add time_units to tsk_table_collection_t to describe the units of the time dimension of the tree sequence. This is then used to generate an error if time_units is uncalibrated when using the branch lengths in statistics. (@benjeffery, #1644, #1760)
Add the TSK_LOAD_SKIP_TABLES option to load just the top-level information from a file. Also add the TSK_CMP_IGNORE_TABLES option to compare only the top-level information in two table collections. (@clwgg, #1882, #1854).
Add reference sequence. (@jeromekelleher, @benjeffery, #146, #1911, #1944)
Add the TSK_LOAD_SKIP_REFERENCE_SEQUENCE option to load a table collection without the reference sequence. Also add the TSK_CMP_IGNORE_REFERENCE_SEQUENCE option to compare two table collections without comparing their reference sequence. (@clwgg, #2019, #1971).
Add a “virtual root” to Tree arrays such as left_sib, right_child etc. The virtual root is appended to each array, has all real roots as its children, but is not the parent of any node. Simplifies traversal algorithms. (@jeromekelleher, #1691, #1704)
Add num_edges to tsk_tree_t to count the edges that define the topology of the tree. (@jeromekelleher, #1704)
Add the tsk_tree_get_size_bound function which returns an upper bound on the number of nodes reachable from the roots of a tree. Useful for tree stack allocations (@jeromekelleher, #1704).
Add MetadataSchema.permissive_json for an easy way to get the simplest schema.
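The growth-policy changes above trade a fixed increment for capacity doubling, which keeps the number of reallocations logarithmic in the number of rows. The sketch below models both policies in Python; the doubling cap is a hypothetical placeholder, not the threshold tskit actually uses.

```python
HYPOTHETICAL_MAX_DOUBLING_STEP = 2**21  # illustrative cap, not tskit's value

def next_capacity(current, required, increment=0):
    """Grow `current` until it is at least `required`."""
    new = current
    while new < required:
        if increment > 0:
            step = increment  # previous behaviour: fixed-size increments
        else:
            # New default: double the current size, up to a threshold.
            step = min(max(new, 1), HYPOTHETICAL_MAX_DOUBLING_STEP)
        new += step
    return new

def count_reallocations(num_rows, increment=0):
    """Number of times storage must grow while appending num_rows rows."""
    capacity, reallocs = 0, 0
    for rows in range(1, num_rows + 1):
        if rows > capacity:
            capacity = next_capacity(capacity, rows, increment)
            reallocs += 1
    return reallocs

print(count_reallocations(100_000))                  # 18
print(count_reallocations(100_000, increment=1024))  # 98
```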

[0.99.14] - 2021-09-03#

Breaking changes

64 bits are now used to store the sizes of ragged table columns such as metadata, allowing them to hold more data. As such tsk_size_t is now 64 bits wide. This change is fully backwards and forwards compatible for all tree-sequences whose ragged column sizes fit into 32 bits. New tree-sequences with large offset arrays that require 64 bits will fail to load in previous versions with error TSK_ERR_BAD_COLUMN_TYPE. (@jeromekelleher, #343, #1527, #1528, #1530, #1554, #1573, #1589, #1598, #1628, #1571, #1579, #1585, #1590, #1602, #1618, #1620, #1652).
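One way to obtain the compatibility described above is to write an offset column with 32-bit width whenever its final (largest) offset fits, and fall back to 64 bits only when it does not. This is a hedged sketch of that idea using the struct module; the function name is hypothetical and tskit's file writer may differ.

```python
import struct

UINT32_MAX = 2**32 - 1

def pack_offsets(offsets):
    """Pack a non-decreasing offset array; returns (bit_width, packed_bytes)."""
    if offsets and offsets[-1] > UINT32_MAX:
        # Only needed when the ragged column holds more than 4 GiB of data.
        return 64, struct.pack(f"<{len(offsets)}Q", *offsets)
    # Readers that only understand 32-bit offsets can still load this.
    return 32, struct.pack(f"<{len(offsets)}I", *offsets)

width, data = pack_offsets([0, 3, 7])
print(width, len(data))  # 32 12
```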

Features

Add tsk_X_table_update_row methods which allow modifying single rows of tables (@jeromekelleher, #1545, #1552).

[0.99.13] - 2021-07-08#

Fixes

Fix segfault when very large columns overflow (@bhaller, @benjeffery, #1509, #1511).

[0.99.12] - 2021-05-14#

Breaking changes

Removed TSK_NO_BUILD_INDEXES. Not building indexes is now the default behaviour of tsk_table_collection_dump and related functions. (@molpopgen, #1327, #1337).

Features

Add tsk_*_table_extend methods to append to a table from another (@benjeffery, #1271, #1287).

[0.99.11] - 2021-03-16#

Features

Add parents to the individual table to enable recording of pedigrees (@ivan-krukov, @benjeffery, #852, #1125, #866, #1153, #1177, #1199).
Added a tsk_table_collection_canonicalise method, that allows checking for equality between tables that are equivalent up to reordering (@petrelharp, @mufernando, #1108).
Removed a previous requirement on tsk_table_collection_union, allowing for unioning of new information both above and below shared history (@petrelharp, @mufernando, #1108).
Support migrations in tsk_table_collection_sort. (@jeromekelleher, #22, #117, #1131).

Breaking changes

Method tsk_individual_table_add_row has extra arguments parents and parents_length.
Add an options argument to tsk_table_collection_subset (@petrelharp, #1108), to allow for retaining the order of populations.
Mutation error codes have changed.

Changes

Allow mutations that have the same derived state as their parent mutation. (@benjeffery, #1180, #1233)
The file format minor version has changed to support individual parents.

[0.99.10] - 2021-01-25#

Minor bugfix on internal APIs

[0.99.9] - 2021-01-22#

Features

Add TSK_SIMPLIFY_KEEP_UNARY_IN_INDIVIDUALS flag to simplify, which allows the user to keep unary nodes only if they belong to a tabled individual. This is useful for simplification in forwards simulations (@hyanwong, #1113, #1119).

[0.99.8] - 2020-11-27#

Features

Add tsk_treeseq_genetic_relatedness for calculating genetic relatedness between pairs of sets of nodes (@brieuclehmann, #1021, #1023, #974, #973, #898).
Exposed tsk_table_collection_set_indexes to the API (@benjeffery, #870, #921).

Breaking changes

Added an options argument to tsk_table_collection_equals and table equality methods to allow for more flexible equality criteria (e.g., ignore top-level metadata and schema or provenance tables). Existing code should add an extra final parameter 0 to retain the current behaviour (@mufernando, @jeromekelleher, #896, #897, #913, #917).
Changed default behaviour of tsk_table_collection_clear to not clear provenances and added options argument to optionally clear provenances and schemas (@benjeffery, #929, #1001).
Renamed ts.trait_regression to ts.trait_linear_model.

[0.99.7] - 2020-09-29#

Added TSK_INCLUDE_TERMINAL option to tsk_diff_iter_init to output the last edges at the end of a tree sequence (@hyanwong, #783, #787).
Added tsk_bug_assert for assertions that should be compiled into release binaries (@benjeffery, #860).

[0.99.6] - 2020-09-04#

Bugfixes

#823 - Fix mutation time error when using tsk_table_collection_simplify with TSK_SIMPLIFY_KEEP_INPUT_ROOTS (@petrelharp, #823).

[0.99.5] - 2020-08-27#

Breaking changes

The macro TSK_IMPUTE_MISSING_DATA is renamed to TSK_ISOLATED_NOT_MISSING (@benjeffery, #716, #794)

New features

Add a TSK_SIMPLIFY_KEEP_INPUT_ROOTS option to simplify which, if enabled, adds edges from the MRCAs of samples in the simplified tree sequence back to the roots in the input tree sequence (@jeromekelleher, #775, #782).

Bugfixes

#777 - Mutations over isolated samples were incorrectly decoded as missing data. (@jeromekelleher, #778)
#776 - Fix a segfault when a partial list of samples was provided to the variants iterator. (@jeromekelleher, #778)

[0.99.4] - 2020-08-12#

Note

The TSK_VERSION_PATCH macro was incorrectly set to 4 for 0.99.3, so both 0.99.4 and 0.99.3 have the same value.

Changes

Mutation times can be a mixture of known and unknown as long as for each individual site they are either all known or all unknown (@benjeffery, #761).
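Because NaN compares unequal to everything, including itself, "unknown time" has to be detected by its exact bit pattern rather than with ==. The sketch below checks one site's mutation times against the two rules stated in these notes only (all known or all unknown, and known times non-increasing in table order); tskit's full validity checks are more extensive. The bit pattern is a hypothetical stand-in for tskit's UNKNOWN_TIME / TSK_UNKNOWN_TIME constant.

```python
import math
import struct

# Hypothetical NaN bit pattern; tskit defines its own constant.
UNKNOWN_TIME_BITS = 0x7FF80000000001AD
UNKNOWN_TIME = struct.unpack("<d", struct.pack("<Q", UNKNOWN_TIME_BITS))[0]

def is_unknown_time(t):
    # NaN != NaN, so compare the raw 64-bit patterns instead of the values.
    return struct.unpack("<Q", struct.pack("<d", t))[0] == UNKNOWN_TIME_BITS

def site_times_valid(times):
    """Times for one site, in table order: all unknown, or known and non-increasing."""
    unknown = [is_unknown_time(t) for t in times]
    if any(unknown):
        return all(unknown)
    return all(a >= b for a, b in zip(times, times[1:]))

print(site_times_valid([5.0, 3.0, 3.0]))             # True
print(site_times_valid([UNKNOWN_TIME, 2.0]))         # False (mixed)
print(UNKNOWN_TIME == UNKNOWN_TIME)                  # False: NaN semantics
```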

Bugfixes

Fix for including core.h under C++ (@petrelharp, #755).

[0.99.3] - 2020-07-27#

Breaking changes

tsk_mutation_table_add_row has an extra time argument. If the time is unknown TSK_UNKNOWN_TIME should be passed. (@benjeffery, #672)
Change genotypes from unsigned to signed to accommodate missing data (see #144 for discussion). This only affects users of the tsk_vargen_t class. Genotypes are now stored as int8_t and int16_t types rather than the former unsigned types. The field names in the genotypes union of the tsk_variant_t struct returned by tsk_vargen_next have been renamed to i8 and i16 accordingly; care should be taken when updating client code to ensure that types are correct. The number of distinct alleles supported by 8 bit genotypes has therefore dropped from 255 to 127, with a similar reduction for 16 bit genotypes.
Change the tsk_vargen_init method to take an extra parameter alleles. To keep the current behaviour, set this parameter to NULL.
Edges can now have metadata. Hence edge methods now take two extra arguments: metadata and metadata length. The file format has also changed to accommodate this, but is backwards compatible. Edge metadata can be disabled for a table collection with the TSK_NO_EDGE_METADATA flag. (@benjeffery, #496, #712)
Migrations can now have metadata. Hence migration methods now take two extra arguments: metadata and metadata length. The file format has also changed to accommodate this, but is backwards compatible. (@benjeffery, #505)
The text dump of tables with metadata now includes the metadata schema as a header. (@benjeffery, #493)
Bad tree topologies are detected earlier, so that it is no longer possible to create a tsk_treeseq_t object which contains a parent with contradictory children on an interval. Previously an error occurred when some operation building the trees was attempted (@jeromekelleher, #709).

New features

New methods to perform set operations on table collections. tsk_table_collection_subset subsets and reorders table collections by nodes (@mufernando, @petrelharp, #663, #690). tsk_table_collection_union forms the node-wise union of two table collections (@mufernando, @petrelharp, #381, #623).
Mutations now have an optional double-precision floating-point time column. If not specified, this defaults to a particular NaN value (TSK_UNKNOWN_TIME) indicating that the time is unknown. For a tree sequence to be considered valid it must meet new criteria for mutation times, see Mutation requirements. Add tsk_table_collection_compute_mutation_times and a new flag, TSK_CHECK_MUTATION_TIME, to tsk_table_collection_check_integrity. Table sorting orders mutations by non-increasing time per-site, which is also a requirement for a valid tree sequence. (@benjeffery, #672)
Add metadata and metadata_schema fields to table collection, with accessors on tree sequence. These store arbitrary bytes and are optional in the file format. (@benjeffery, #641)
Add the TSK_SIMPLIFY_KEEP_UNARY option to simplify (@gtsambos). See #1 and #143.
Add a set_root_threshold option to tsk_tree_t which allows us to set the number of samples a node must be an ancestor of to be considered a root (#462).
Change the semantics of tsk_tree_t so that sample counts are always computed, and add a new TSK_NO_SAMPLE_COUNTS option to turn this off (#462).
Tables with metadata now have an optional metadata_schema field that can contain arbitrary bytes. (@benjeffery, #493)
Tables loaded from a file can now be edited in the same way as any other table collection (@jeromekelleher, #536, #530).
Support for reading/writing to arbitrary file streams with the loadf/dumpf variants for tree sequence and table collection load/dump (@jeromekelleher, @grahamgower, #565, #599).
Add low-level sorting API and TSK_NO_CHECK_INTEGRITY flag (@jeromekelleher, #627, #626).
Add an extension of the Kendall-Colijn tree distance metric to tree sequences, computed by tsk_treeseq_kc_distance (@daniel-goldstein, #548).

Deprecated

The TSK_SAMPLE_COUNTS option is now ignored and prints a warning if used (#462).

[0.99.2] - 2019-03-27#

Bugfix release. Changes:

Fix incorrect errors on tbl_collection_dump (#132)
Catch table overflows (#157)

[0.99.1] - 2019-01-24#

Refinements to the C API as we move towards 1.0.0. Changes:

Change the _tbl_ abbreviation to _table_ to improve readability. Hence, we now have, e.g., tsk_node_table_t etc.
Change tsk_tbl_size_t to tsk_size_t.
Standardise public API to use tsk_size_t and tsk_id_t as appropriate.
Add tsk_flags_t typedef and consistently use this as the type used to encode bitwise flags. To avoid confusion, functions now have an options parameter.
Rename tsk_table_collection_position_t to tsk_bookmark_t.
Rename tsk_table_collection_reset_position to tsk_table_collection_truncate and tsk_table_collection_record_position to tsk_table_collection_record_num_rows.
Generalise tsk_table_collection_sort to take a bookmark as start argument.
Relax the restriction that nodes in the samples argument to simplify must be marked as samples. (tskit-dev/tskit#72)
Allow tsk_table_collection_simplify to take a NULL samples argument to specify “all samples in the current tables”.
Add support for building as a meson subproject.

[0.99.0] - 2019-01-14#

Initial alpha version of the tskit C API tagged. Version 0.99.x represents the series of releases leading to version 1.0.0 which will be the first stable release. After 1.0.0, semver rules regarding API/ABI breakage will apply; however, in the 0.99.x series arbitrary changes may happen.

[0.0.0] - 2019-01-10#

Initial extraction of tskit code from msprime. Relicense to MIT. Code copied at hash 29921408661d5fe0b1a82b1ca302a8b87510fd23
