Changelogs
Python
[0.6.1] - 2024-XX-XX
Bugfixes
Fix to TreeSequence.pair_coalescence_counts output dimension when provided with time windows containing no nodes (@nspope, #3046, #3058)
Fix to TreeSequence.pair_coalescence_counts to normalise by non-missing span if span_normalise=True. This resolves a bug where TreeSequence.pair_coalescence_rates would return incorrect values for intervals with missing trees. (@natep, #3053, #3059)
Fix to TreeSequence.pair_coalescence_rates causing an assertion to be triggered by floating point error, when all coalescence events are inside a single time window (@natep, #3035, #3038)
[0.6.0] - 2024-10-16
Breaking Changes
The definitions of TreeSequence.genetic_relatedness and TreeSequence.genetic_relatedness_weighted have changed to average over sample sets, rather than summing over them. For computation with diploid sample sets, this will change the result by a factor of four; for larger sample sets it will now produce sensible values that are comparable between sample sets of different sizes. The default for these methods is also changed to polarised=True, but the output is unchanged for centre=True (the default). See the documentation for these methods for more discussion. (@petrelharp, @mmosmond, #1623)
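The factor of four for diploid sample sets comes from the number of node pairs being averaged over (2 x 2). A minimal sketch in plain Python, assuming a hypothetical table of pairwise values between nodes; the function names and the numbers are illustrative only and are not part of the tskit API:

```python
from itertools import product

def summed(pairwise, set_a, set_b):
    """Old-style combination: sum the pairwise values over all node pairs."""
    return sum(pairwise[a][b] for a, b in product(set_a, set_b))

def averaged(pairwise, set_a, set_b):
    """New-style combination: average the pairwise values over all node pairs."""
    return summed(pairwise, set_a, set_b) / (len(set_a) * len(set_b))

# Hypothetical pairwise values between nodes 0, 1 (one diploid) and 2, 3 (another).
pairwise = {
    0: {2: 0.5, 3: 0.25},
    1: {2: 1.0, 3: 0.25},
}
diploid_a, diploid_b = [0, 1], [2, 3]

total = summed(pairwise, diploid_a, diploid_b)
mean = averaged(pairwise, diploid_a, diploid_b)
print(total, mean, total / mean)
```

For two sample sets of sizes n and m the two conventions differ by a factor of n * m, which is why averaging keeps values comparable across sample sets of different sizes.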
Bugfixes
Fix to TreeSequence.genetic_relatedness with indexes=None and proportion=True. (@petrelharp, #2984, #1623)
Fix to TreeSequence.general_stat when using non-strict summary functions in the presence of non-ancestral material (very rare). (@petrelharp, #2983, #1623)
Printing tskit.MetadataSchema(schema=None) now shows "Null_schema" rather than None, to avoid confusion (@hyanwong, #2720)
Limit output HTML when a tree sequence is displayed that has a large amount of metadata. (@benjeffery, #2999)
Fix warning in draw_svg to use correct warnings module. (@duncanMR, #2870, #2871)
Features
Add the centre option to TreeSequence.genetic_relatedness and TreeSequence.genetic_relatedness_weighted. (@petrelharp, @mmosmond, #1623)
Edges now have an .interval attribute returning a tskit.Interval object. (@hyanwong, #2531)
Variants now have a states() method that returns the genotypes as an (inefficient) array of strings, rather than integer indexes, to aid comparison of genetic variation (@hyanwong, #2617)
Added distance_between that calculates the total distance between two nodes in a tree. (@Billyzhang1229, #2771)
Added genetic_relatedness_matrix method to compute pairwise genetic relatedness between sample sets. (@jeromekelleher, @petrelharp, #2823)
Add TreeSequence.extend_haplotypes method that extends ancestral haplotypes using recombination information, leading to unary nodes in many trees and fewer edges. (@petrelharp, @hfr1tz3, @nspope, @avabamf, #2651, #2938)
Add Table.drop_metadata to make clearing metadata from tables easy. (@jeromekelleher, #2944)
Add Interval.mid and Tree.mid properties to return the midpoint of the interval. (@currocam, #2960)
Added genetic_relatedness_vector method to compute product of genetic relatedness matrix and weight vector. (@petrelharp, #2980)
Added pair_coalescence_counts method to calculate coalescence events per node or time interval, pair_coalescence_quantiles method to estimate quantiles of pair coalescence times using empirical CDF inversion, and pair_coalescence_rates method to estimate instantaneous rates of pair coalescence within time intervals from the empirical CDF. (@nspope, #2915, #2976, #2985)
Add provenance information to the HTML notebook representation of a tree sequence. (@benjeffery, #3001)
The .draw_svg() methods can add annotated genomic regions (e.g. genes) to the x-axis. (@hyanwong, #3002)
Added a node_titles and a mutation_titles parameter to .draw_svg() methods which assigns a string to node and mutation symbols, commonly shown on mouseover. This can reduce label clutter while retaining useful info (@hyanwong, #3007)
Added (currently undocumented) use of the order parameter in Tree.draw_svg() to pass a subset of nodes, so subtrees can be visually collapsed. Additionally, an option pack_untracked_polytomies allows large polytomies involving untracked samples to be summarised as a dotted line (@hyanwong, #3011, #3010, #3012)
Added a title parameter to .draw_svg() methods (@hyanwong, #3015)
Add comma separation to all display numbers. (@benjeffery, #3017, #3018)
Add resources section to provenance schema. (@benjeffery, #3016)
Add Tree.rf_distance method to calculate the unweighted Robinson-Foulds distance between two trees. (@Billyzhang1229, #995, #2643, #3032)
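As a rough illustration of what an unweighted Robinson-Foulds distance measures, the sketch below counts the clades (sets of leaves below an internal node) that appear in one rooted tree but not the other. It works on nested tuples rather than tskit trees, and the helper names are hypothetical; it is not the library's implementation.

```python
def clades(tree):
    """Return the leaf sets found below each internal node of a nested-tuple tree."""
    found = set()

    def leaves_below(node):
        if not isinstance(node, tuple):
            return frozenset([node])  # a leaf is represented by its label
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)
        return below

    leaves_below(tree)
    return found

def rf_distance(tree_a, tree_b):
    """Size of the symmetric difference between the two clade sets."""
    return len(clades(tree_a) ^ clades(tree_b))

tree_1 = ((0, 1), (2, 3))
tree_2 = ((0, 2), (1, 3))
print(rf_distance(tree_1, tree_1), rf_distance(tree_1, tree_2))
```

Here the two trees share only the root clade, so each contributes two clades that the other lacks.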
[0.5.8] - 2024-06-27
Add support for numpy 2 (@jeromekelleher, @benjeffery, #2964)
[0.5.7] - 2024-06-17
Breaking Changes
The VCF writing methods (ts.write_vcf, ts.as_vcf) now error if a site with position zero is encountered. The VCF spec does not allow zero position sites. Suppress this error with the allow_position_zero argument. (@benjeffery, #2901, #2838)
Bugfixes
Fix to the folded, expected allele frequency spectrum (i.e., TreeSequence.allele_frequency_spectrum(mode="branch", polarised=False)), which was half as big as it should have been. (@petrelharp, @nspope, #2933)
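For readers unfamiliar with folding, the generic idea is that the entries for allele counts k and n - k are merged, since the ancestral allele is not distinguished. The sketch below is a plain-Python illustration of that idea only; the function name and the truncated output layout are assumptions, and it neither reproduces tskit's implementation nor the bug fixed above.

```python
def fold_spectrum(afs):
    """Fold a one-dimensional spectrum indexed by allele count 0..n."""
    n = len(afs) - 1
    folded = [0.0] * (n // 2 + 1)
    for k, value in enumerate(afs):
        # Counts k and n - k are indistinguishable once the spectrum is folded.
        folded[min(k, n - k)] += value
    return folded

unfolded = [0.0, 5.0, 3.0, 2.0, 0.0]  # hypothetical spectrum for n = 4 samples
folded = fold_spectrum(unfolded)
print(folded)
```

Folding in this way preserves the total of the spectrum, which is a convenient sanity check.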
[0.5.6] - 2023-10-10
Breaking Changes
tskit now requires Python 3.8, as Python 3.7 became end-of-life on 2023-06-27
Features
Tree.tmrca now accepts >2 nodes and returns nicer errors (@hyanwong, #2808, #2801, #2070, #2611)
Add TreeSequence.genetic_relatedness_weighted stats method. (@petrelharp, @brieuclehmann, @jeromekelleher, #2785, #1246)
Add TreeSequence.impute_unknown_mutations_time method to return an array of mutation times based on the times of associated nodes (@duncanMR, #2760, #2758)
Add asdict to all dataclasses. These are returned when you access a row or other tree sequence object. (@benjeffery, #2759, #2719)
Bugfixes
Fix incompatibility with jsonschema>4.18.6 which caused AttributeError: module jsonschema has no attribute _validators (@benjeffery, #2844, #2840)
[0.5.5] - 2023-05-17
Performance improvements
Methods like ts.at() which seek to a specified position on the sequence from a new Tree instance are now much faster (@molpopgen, #2661).
Features
Add __repr__ for variants to return a string representation of the raw data without spewing megabytes of text (@chriscrsmith, #2695, #2694)
Breaking Changes
Bugfixes
Fix UnicodeDecodeError when calling Variant.alleles on the emscripten platform. (@benjeffery, #2754, #2737)
[0.5.4] - 2023-01-13
Features
A new Tree.is_root method avoids the need to search the potentially large list of Tree.roots (@hyanwong, #2669, #2620)
The TreeSequence object now has the attributes min_time and max_time, which are the minimum and maximum among the node times and mutation times, respectively. (@szhan, #2612, #2271)
The draw_svg methods now have a max_num_trees parameter to truncate the total number of trees shown, giving a readable display for tree sequences with many trees (@hyanwong, #2652)
The draw_svg methods now accept a canvas_size parameter to allow extra room on the canvas e.g. for long labels or repositioned graphical elements (@hyanwong, #2646, #2645)
The msprime.RateMap class has been ported into tskit: functionality should be identical to the version in msprime, apart from minor changes in the formatting of tabular text output (@hyanwong, @jeromekelleher, #2678)
Tskit now supports and has wheels for Python 3.11. This Python version has a significant performance boost (@benjeffery, #2624, #2248)
Add the update_sample_flags option to simplify which ensures no node sample flags are changed to allow calling code to manage sample status. (@jeromekelleher, #2662, #2663).
Breaking Changes
[0.5.3] - 2022-10-03
Fixes
Features
The ts.nodes method now takes an order parameter so that nodes can be visited in time order (@hyanwong, #2471, #2370)
Add samples argument to TreeSequence.genotype_matrix. Default is None, where all the sample nodes are selected. (@szhan, #2493, #678)
ts.draw and the draw_svg methods now have an optional omit_sites parameter, aiding drawing large trees with many sites and mutations (@hyanwong, #2519, #2516)
Breaking Changes
Single statistics computed with TreeSequence.general_stat are now returned as numpy scalars if windows=None and either samples is a single list or None (for a 1-way stat), or indexes is None or a single list of length k (instead of a list of length-k lists). (@gtsambos, #2417, #2308)
Accessor methods such as ts.edge(n) and ts.node(n) now allow negative indexes (@hyanwong, #2478, #1008)
ts.subset() produces valid tree sequences even if nodes are shuffled out of time order (@hyanwong, #2479, #2473), and the same for tables.subset() (@hyanwong, #2489). This involves sorting the returned tables, potentially changing the returned edge order.
Performance improvements
[0.5.2] - 2022-07-29
Fixes
Iterating over ts.variants() could cause a segfault in tree sequences with large numbers of alleles or very long alleles (@jeromekelleher, #2437, #2429).
Various circular references fixed, lowering peak memory usage (@jeromekelleher, #2424, #2423, #2427).
Fix bugs in VCF output when there isn’t a 1-1 mapping between individuals and sample nodes (@jeromekelleher, #2442, #2257, #2446, #2448).
Performance improvements
TreeSequence.site position search performance greatly improved, with much lower memory overhead (@jeromekelleher, #2424).
TreeSequence.samples time/population search performance greatly improved, with much lower memory overhead (@jeromekelleher, #2424, #1916).
The timeasc and timedesc orders for Tree.nodes have much improved performance and lower memory overhead (@jeromekelleher, #2424, #2423).
Features
Variant objects now have a .num_missing attribute and .counts() and .frequencies methods (@hyanwong, #2390, #2393).
Add the Tree.num_lineages(t) method to return the number of lineages present at time t in the tree (@jeromekelleher, #386, #2422)
Efficient array access to table data now provided via attributes like TreeSequence.nodes_time, etc (@jeromekelleher, #2424).
Breaking Changes
Previously, accessing (e.g.) tables.edges returned a different instance of EdgeTable each time. This has been changed to return the same instance for the lifetime of a given TableCollection instance. This is technically a breaking change, although it’s difficult to see how code would depend on the property that (e.g.) tables.edges is not tables.edges. (@jeromekelleher, #2441, #2080).
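The identity change can be pictured with two toy classes, one building a fresh wrapper on every attribute access and one handing back a single stored instance. The class names below are hypothetical stand-ins and say nothing about how tskit implements its tables.

```python
class EdgeTableSketch:
    """Stand-in for a table object; holds no data in this sketch."""

class FreshEachAccess:
    @property
    def edges(self):
        return EdgeTableSketch()  # a new wrapper object on every access

class SameInstance:
    def __init__(self):
        self._edges = EdgeTableSketch()

    @property
    def edges(self):
        return self._edges  # the same object for the lifetime of the collection

old_style = FreshEachAccess()
new_style = SameInstance()
print(old_style.edges is old_style.edges, new_style.edges is new_style.edges)
```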
[0.5.1] - 2022-07-14
Fixes
Copies of a Variant object would cause a segfault when .samples was accessed. (@benjeffery, #2400, #2401)
Changes
Tables in a table collection can be replaced using the replace_with method (@hyanwong, #1489, #2389)
SVG drawing routines now return a special string object that is automatically rendered in a Jupyter notebook (@hyanwong, #2377)
Features
[0.5.0] - 2022-06-22
Changes
A min_time parameter in draw_svg enables the youngest node as the y axis min value, allowing negative times. (@hyanwong, #2197, #2215)
VcfWriter.write now prints the site ID of variants in the ID field of the output VCF files. (@roohy, #2103, #2107)
Make dumping of tables and tree sequences to disk a zero-copy operation. (@benjeffery, #2111, #2124)
Add copy argument to TreeSequence.variants which if False reuses the returned Variant object for improved performance. Defaults to True. (@benjeffery, #605, #2172)
tree.mrca now takes 2 or more arguments and gives the common ancestor of them all. (@savitakartik, #1340, #2121)
Add an edge attribute to the Mutation class that gives the ID of the edge that the mutation falls on. (@jeromekelleher, #685, #2279).
Add the TreeSequence.split_edges operation which inserts nodes into edges at a specific time. (@jeromekelleher, #2276, #2296).
Add the TreeSequence.decapitate (and closely related TableCollection.delete_older) operation to remove topology and mutations older than a given time. (@jeromekelleher, #2236, #2302, #2331).
Add the TreeSequence.individuals_time and TreeSequence.individuals_population methods to return arrays of per-individual times and populations, respectively. (@petrelharp, #1481, #2298).
Add the sample_mask and site_mask to write_vcf to allow parts of an output VCF to be omitted or marked as missing data. Also add the as_vcf convenience function, to return VCF as a string. (@jeromekelleher, #2300).
Add support for missing data to write_vcf, and add the isolated_as_missing argument. (@jeromekelleher, #2329, #447).
Add Tree.num_children_array and Tree.num_children. Returns the counts of the number of child nodes for each or a single node in the tree respectively. (@GertjanBisschop, #2318, #2319, #2332)
Add Tree.path_length. (@jeremyguez, #2249, #2259).
Add B1 tree balance index. (@jeremyguez, @jeromekelleher, #2251, #2281, #2346).
Add B2 tree balance index. (@jeremyguez, @jeromekelleher, #2252, #2353, #2354).
Add Sackin tree imbalance index. (@jeremyguez, @jeromekelleher, #2246, #2258).
Add Colless tree imbalance index. (@jeremyguez, @jeromekelleher, #2250, #2266, #2344).
Add direction argument to TreeSequence.edge_diffs, allowing iteration over diffs in the reverse direction. NOTE: this comes with a ~10% performance regression as the implementation was moved from C to Python for simplicity and maintainability. Please open an issue if this affects your application. (@jeromekelleher, @benjeffery, #2120).
Add Tree.edge_array and Tree.edge. Returns the edge id of the edge encoding the relationship of each node with its parent. (@GertjanBisschop, #2361, #2357)
Add position argument to TreeSequence.site. Returns a Site object if there is one at the specified position. If not, it raises ValueError. (@szhan, #2234, #2235)
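To give a feel for the Sackin and Colless imbalance indices listed above, here is a small sketch using their textbook definitions on nested-tuple trees: Sackin sums the depth of every leaf, and Colless sums, over internal nodes of a binary tree, the absolute difference in leaf counts between the two children. The representation and function names are illustrative assumptions, not the tskit API.

```python
def sackin_index(tree, depth=0):
    """Sum of leaf depths (number of edges from the root to each leaf)."""
    if not isinstance(tree, tuple):
        return depth
    return sum(sackin_index(child, depth + 1) for child in tree)

def num_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return sum(num_leaves(child) for child in tree)

def colless_index(tree):
    """Sum over internal nodes of |leaves on left - leaves on right| (binary trees only)."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree  # raises ValueError if the node is not binary
    imbalance = abs(num_leaves(left) - num_leaves(right))
    return imbalance + colless_index(left) + colless_index(right)

balanced = ((0, 1), (2, 3))
comb = (((0, 1), 2), 3)
print(sackin_index(balanced), sackin_index(comb))
print(colless_index(balanced), colless_index(comb))
```

Both indices are larger for the comb-shaped tree than for the balanced one, which is the behaviour an imbalance index is meant to capture.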
Breaking Changes
The JSON metadata codec now interprets the empty string as an empty object. This means that applying a schema to an existing table will no longer necessitate modifying the existing rows. (@benjeffery, #2064, #2104)
Remove the previously deprecated as_bytes argument to TreeSequence.variants. If you need genotypes in byte form this can be done following the code in the to_macs method on line 5573 of trees.py. This argument was initially deprecated more than 3 years ago when the code was part of msprime. (@benjeffery, #605, #2172)
Arguments after ploidy in write_vcf marked as keyword only (@jeromekelleher, #2329, #2315).
When metadata equal to b'' is printed to text or HTML tables it will render as an empty string rather than "b''". (@hyanwong, #2349, #2351)
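The JSON codec change described in this section (an empty string decoding to an empty object) can be sketched as follows. The helper name is hypothetical and this is only an illustration of the stated behaviour, not the codec's actual code.

```python
import json

def decode_json_metadata(encoded: bytes):
    """Decode JSON metadata bytes, treating empty bytes as an empty object."""
    if len(encoded) == 0:
        return {}  # rows written before a schema was applied carry no bytes
    return json.loads(encoded.decode("utf-8"))

empty_row = decode_json_metadata(b"")
filled_row = decode_json_metadata(b'{"name": "sample_0"}')
print(empty_row, filled_row)
```

Without the special case, json.loads would raise an error on the empty input, which is why existing rows previously had to be rewritten before a schema could be applied.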
[0.4.1] - 2022-01-11
Changes
TableCollection.name_map has been deprecated in favour of table_name_map. (@benjeffery, #1981, #2086)
Fixes
TreeSequence.dump_text now prints decoded metadata if there is a schema. (@benjeffery, #1860, #1527)
Add missing ReferenceSequence.__eq__ method. (@benjeffery, #2063, #2085)
[0.4.0] - 2021-12-10
Breaking changes
The Tree.num_nodes method is now deprecated with a warning, because it confusingly returns the number of nodes in the entire tree sequence, rather than in the tree. Text summaries of trees (e.g. str(tree)) now return the number of nodes in the tree, not in the entire tree sequence (@hyanwong, #1966, #1968)
The CLI info command now gives more detailed information on the tree sequence (@benjeffery, #1611)
64 bits are now used to store the sizes of ragged table columns such as metadata, allowing them to hold more data. This change is fully backwards and forwards compatible for all tree-sequences whose ragged column sizes fit into 32 bits. New tree-sequences with large offset arrays that require 64 bits will fail to load in previous versions with error _tskit.FileFormatError: An incompatible type for a column was found in the file. (@jeromekelleher, #343, #1527, #1528, #1530, #1554, #1573, #1589, #1598, #1628, #1571, #1579, #1585, #1590, #1602, #1618, #1620, #1652).
The Tree class now conceptually has an extra node, the “virtual root” whose children are the roots of the tree. The quintuply linked tree arrays (parent_array, left_child_array, right_child_array, left_sib_array and right_sib_array) all have one extra element. (@jeromekelleher, #1691, #1704).
Tree traversal orders returned by the nodes method have changed when there are multiple roots. Previously orders were defined locally for each root, but are now defined globally across all roots. (@jeromekelleher, #1704).
Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence. TableCollection.sort no longer sorts individuals. (@benjeffery, #1774, #1789)
Metadata encoding errors now raise MetadataEncodingError (@benjeffery, #1505, #1827).
For TreeSequence.samples all arguments after population are now keyword only (@benjeffery, #1715, #1831).
Remove the method TreeSequence.to_nexus and replace with TreeSequence.as_nexus. As the old method was not generating standards-compliant output, it seems unlikely that it was used by anyone. Calls to to_nexus will result in a NotImplementedError, informing users of the change. See below for details on as_nexus.
Change default value for missing_data_char in the TreeSequence.haplotypes method from “-” to “N”. This is a more idiomatic usage to indicate missing data rather than a gap in an alignment. (@jeromekelleher, #1893, #1894)
Features
Add the ibd_segments method and associated classes to compute, summarise and store segments of identity by descent from a tree sequence (@gtsambos, @jeromekelleher).
Allow skipping of site and mutation tables in TableCollection.sort (@benjeffery, #1475, #1826).
Add TableCollection.sort_individuals to sort the individuals as this is no longer done by the default sort (@benjeffery, #1774, #1789).
Add __setitem__ to all tables allowing single rows to be updated. For example tables.nodes[0] = tables.nodes[0].replace(flags=tskit.NODE_IS_SAMPLE) (@jeromekelleher, @benjeffery, #1545, #1600).
Added a new parameter time to TreeSequence.samples() allowing samples to be selected at a specific time point or time interval. (@mufernando, @petrelharp, #1692, #1700)
Add table.metadata_vector to all table classes to allow easy extraction of a single metadata key into an array (@petrelharp, #1676, #1690).
Add time_units to TreeSequence to describe the units of the time dimension of the tree sequence. This is then used to generate an error if time_units is uncalibrated when using the branch lengths in statistics. (@benjeffery, #1644, #1760, #1832)
Add the virtual_root property to the Tree class (@jeromekelleher, #1704).
Add the num_edges property to the Tree class (@jeromekelleher, #1704).
Improved performance for tree traversal methods in the nodes iterator. Roughly a 10X performance increase for “preorder”, “postorder”, “timeasc” and “timedesc” (@jeromekelleher, #1704).
Substantial performance improvement for Tree.total_branch_length (@jeromekelleher, #1794, #1799)
Add the discrete_genome property to the TreeSequence class which is true if all coordinates are discrete (@jeromekelleher, #1144, #1819)
Add a random_nucleotides function. (@jeromekelleher, #1825)
Add the TreeSequence.alignments method. (@jeromekelleher, #1825)
Add alignment export in the FASTA and nexus formats using the TreeSequence.write_nexus and TreeSequence.write_fasta methods. (@jeromekelleher, @hyanwong, #1894)
Add the discrete_time property to the TreeSequence class which is true if all time coordinates are discrete or unknown (@benjeffery, #1839, #1890)
Add the skip_tables option to load to support only loading top-level information from a file. Also add the ignore_tables option to TableCollection.equals and TableCollection.assert_equals to compare only top-level information. (@clwgg, #1882, #1854).
Add the skip_reference_sequence option to load. Also add the ignore_reference_sequence option to equals to compare two table collections without comparing their reference sequence. (@clwgg, #2019, #1971).
tskit now supports python 3.10 (@benjeffery, #1895, #1949)
Fixes
dump_tables omitted individual parents. (@benjeffery, #1828, #1884)
Add the Tree.as_newick method and deprecate Tree.newick. The as_newick method by default labels samples with the pattern "n{node_id}" which is much more useful than the behaviour of Tree.newick (which mimics ms output). (@jeromekelleher, #1671, #1838)
Add the as_nexus and write_nexus methods to the TreeSequence class, replacing the broken to_nexus method (see above). This uses the same sample labelling pattern as as_newick. (@jeetsukumaran, @jeromekelleher, #1785, #1835, #1836, #1838)
load_text created additional populations even if the population table was specified, and didn’t strip newlines from input text (@hyanwong, #1909, #1910)
[0.3.7] - 2021-07-08
Features
map_mutations now allows the ancestral state to be specified (@hyanwong, @jeromekelleher, #1542, #1550)
[0.3.6] - 2021-05-14
Breaking changes
Mutation.position and Mutation.index which were deprecated in 0.2.2 (Sep ‘19) have been removed.
Features
Add direct, copy-free access to the arrays representing the quintuply-linked structure of Tree (e.g. left_child_array). Allows performant algorithms over the tree structure using, for example, numba (@jeromekelleher, #1299, #1320).
Add fancy indexing to tables. E.g. table[6:86] returns a new table with the specified rows. Supports slices, index arrays and boolean masks (@benjeffery, #1221, #1348, #1342).
Add Table.append method for adding rows from classes such as SiteTableRow and Site (@benjeffery, #1111, #1254).
SVG visualization of a tree sequence can be restricted to displaying between left and right genomic coordinates using the x_lim parameter. The default settings now mean that if the left or right flanks of a tree sequence are entirely empty, these regions will not be plotted in the SVG (@hyanwong, #1288).
SVG visualization of a single tree allows all mutations on an edge to be plotted via the all_edge_mutations param (@hyanwong, #1253, #1258).
Entity classes such as Mutation, Node are now python dataclasses (@benjeffery, #1261).
Metadata decoding for table row access is now lazy (@benjeffery, #1261).
Add html notebook representation for Tree and change Tree.__str__ from dict representation to info table. (@benjeffery, #1269, #1304).
Improve display of tables when printed, limiting lines set via tskit.set_print_options (@benjeffery, #1270, #1300).
Add Table.assert_equals and TableCollection.assert_equals which give an exact report of any differences. (@benjeffery, #1076, #1328)
Changes
In drawing methods max_tree_height and tree_height_scale have been deprecated in favour of max_time and time_scale (@benjeffery, #1262, #1331).
Fixes
Tree sequences were not properly init’d after unpickling (@benjeffery, #1297, #1298)
[0.3.5] - 2021-03-16
Features
SVG visualization plots mutations at the correct time, if it exists, and a y-axis with label can be drawn. Both x- and y-axes can be plotted on trees as well as tree sequences (@hyanwong, #840, #580, #1236)
SVG visualization now uses squares for sample nodes and red crosses for mutations, with the site/mutation positions marked on the x-axis. Additionally, an x-axis label can be set (@hyanwong, #1155, #1194, #1182, #1213)
Add parents column to the individual table to allow recording of pedigrees (@ivan-krukov, @benjeffery, #852, #1125, #866, #1153, #1177, #1192, #1199).
Added Tree.generate_random_binary static method to create random binary trees (@hyanwong, @jeromekelleher, #1037).
Change the default behaviour of Tree.split_polytomies to generate the shortest possible branch lengths instead of a fixed epsilon of 1e-10. (@jeromekelleher, #1089, #1090)
Default value metadata in add_row functions is now schema-dependent, so that metadata={} is no longer needed as an argument when a schema is present (@benjeffery, #1084).
default in metadata schemas is used to fill in missing values when encoding for the struct codec. (@benjeffery, #1073, #1116).
Added canonical option to table collection sorting (@mufernando, @petrelharp, #705)
Added various arguments to TreeSequence.subset, to allow for stable population indexing and lossless node reordering with subset. (@petrelharp, #1097)
Changes
Allow mutations that have the same derived state as their parent mutation. (@benjeffery, #1180, #1233)
File minor version change to support individual parents
Breaking changes
tskit now requires Python 3.7 (@benjeffery, #1235)
[0.3.4] - 2020-12-02
Minor bugfix release.
Bugfixes
Reinstate the unused zlib_compression option to tskit.dump, as msprime < 1.0 still uses it (@jeromekelleher, #1067).
[0.3.3] - 2020-11-27
Features
Add TreeSequence.genetic_relatedness for calculating genetic relatedness between pairs of sets of nodes (@brieuclehmann, #1021, #1023, #974, #973, #898).
Expose TreeSequence.coiterate() method to allow iteration over 2 sequences simultaneously, aiding comparison of trees from two sequences (@jeromekelleher, @hyanwong, #1021, #1022).
tskit is now supported on, and has wheels for, python3.9 (@benjeffery, #982, #907).
Tree.newick() now has extra option include_branch_lengths to allow branch lengths to be omitted (@hyanwong, #931).
Added Tree.generate_star static method to create star-topologies (@hyanwong, #934).
Added Tree.generate_comb and Tree.generate_balanced methods to create example trees. (@jeromekelleher, #1026).
Added equals method to TreeSequence, TableCollection and each of the tables which provides more flexible equality comparisons, for example, allowing users to ignore metadata or provenance in the comparison (@mufernando, @jeromekelleher, #896, #897, #913, #917).
Added __eq__ to TreeSequence (@benjeffery, #1011, #1020).
ts.dump and tskit.load now support reading and writing file objects such as FIFOs and sockets (@benjeffery, #657, #909).
Added tskit.write_ms for writing to MS format (@saurabhbelsare, #727, #854).
Added TableCollection.indexes for access to the edge insertion/removal order indexes (@benjeffery, #4, #916).
The dictionary representation of a TableCollection now contains its index (@benjeffery, #870, #921).
Added TreeSequence._repr_html_ for use in jupyter notebooks (@benjeffery, #872, #923).
Added TreeSequence.__str__ to display a summary for terminal usage (@benjeffery, #938, #985).
Added TableCollection.dump and TableCollection.load. This allows table collections that are not valid tree sequences to be manipulated (@benjeffery, #14, #986).
Added nbytes method to tables, TableCollection and TreeSequence which reports the size in bytes of those objects (@jeromekelleher, @benjeffery, #54, #871).
Added TableCollection.clear to clear data table rows and optionally provenances, table schemas and tree-sequence level metadata and schema (@benjeffery, #929, #1001).
Bugfixes
LightWeightTableCollection.asdict and TableCollection.asdict now return copies of arrays (@benjeffery, #1025, #1029).
The map_mutations method previously used the Fitch parsimony method, but this does not produce parsimonious results on non-binary trees. We now use the Hartigan parsimony algorithm, which does (@jeromekelleher, #987, #1030).
The flag argument to tables’ add_row was treating the value as signed (@benjeffery, #1027, #1031).
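The reason a Hartigan-style approach copes with non-binary trees is that, at each internal node, it counts how many children favour each state and keeps the most common ones, rather than relying on a two-way intersection or union. The sketch below is a simplified illustration on nested tuples with string states at the leaves; it computes only the parsimony score and the set of optimal root states, does not place mutations, and is not the code used by map_mutations.

```python
from collections import Counter

def hartigan(tree):
    """Return (optimal_states, cost) for a nested-tuple tree with string leaf states."""
    if not isinstance(tree, tuple):
        return {tree}, 0  # a leaf: its observed state, no changes needed
    counts = Counter()
    cost = 0
    for child in tree:
        child_states, child_cost = hartigan(child)
        counts.update(child_states)  # one vote per child for each of its optimal states
        cost += child_cost
    best = max(counts.values())
    states = {state for state, count in counts.items() if count == best}
    # Every child that does not support the chosen state costs one change.
    return states, cost + len(tree) - best

star_states, star_cost = hartigan(("A", "A", "B"))
nested_states, nested_cost = hartigan((("A", "B"), "A", "A"))
print(star_states, star_cost, nested_states, nested_cost)
```

On the three-way polytomy a single change (to B) is enough, which the count-based rule finds directly.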
Breaking changes
The argument to ts.dump and tskit.load has been renamed from path to file.
All arguments to Tree.newick() except precision are now keyword-only.
Renamed ts.trait_regression to ts.trait_linear_model.
[0.3.2] - 2020-09-29
Breaking changes
The argument order of Tree.unrank and combinatorics.num_labellings now positions the number of leaves before the tree rank (@daniel-goldstein, #950, #978)
Change several methods (simplify(), trees(), Tree()) so most parameters are keyword only, not positional. This allows reordering of parameters, so that deprecated parameters can be moved, and the parameter order in similar functions, e.g. TableCollection.simplify and TreeSequence.simplify() can be made consistent (@hyanwong, #374, #846, #851)
Features
Add split_polytomies method to the Tree class (@hyanwong, @jeromekelleher, #809, #815)
Tree accessor functions (e.g. ts.first(), ts.at()) pass extra parameters such as sample_indexes to the underlying Tree constructor; also root_threshold can be specified when calling ts.trees() (@hyanwong, #847, #848)
Genomic intervals returned by python functions are now namedtuples, allowing .left, .right and .span usage (@hyanwong, #784, #786, #811)
Added include_terminal parameter to edge diffs iterator, to output the last edges at the end of a tree sequence (@hyanwong, #783, #787)
#832 - Add metadata_bytes method to allow access to raw TableCollection metadata (@benjeffery, #842)
tskit.is_unknown_time can now check arrays. (@benjeffery, #857).
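The namedtuple-style interval mentioned in this release can be pictured with a small stand-in class. This is a sketch of the idea (tuple unpacking plus named access and a derived span), written with typing.NamedTuple; it is not tskit's own Interval definition.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    """A half-open genomic interval [left, right)."""
    left: float
    right: float

    @property
    def span(self) -> float:
        return self.right - self.left

interval = Interval(2.0, 5.0)
left, right = interval  # still unpacks like a plain tuple
print(interval.left, interval.right, interval.span)
```

Because the object remains a tuple, existing code that unpacks (left, right) pairs keeps working while new code can use the named fields.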
[0.3.1] - 2020-09-04
Bugfixes
#823 - Fix mutation time error when using simplify(keep_input_roots=True) (@petrelharp, #823).
#821 - Fix mutation rows with unknown time never being equal (@petrelharp, #822).
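The "never being equal" problem is a general property of NaN values: a NaN compares unequal to everything, including itself. A sketch of how a sentinel NaN can be recognised by its bit pattern instead is shown below. The specific bit pattern and the helper names are assumptions made for illustration and should not be read as tskit's definitions.

```python
import math
import struct

# Assumed sentinel: a quiet NaN carrying a distinguishing payload bit.
UNKNOWN_TIME_BITS = 0x7FF8000000000001
UNKNOWN_TIME = struct.unpack("<d", struct.pack("<Q", UNKNOWN_TIME_BITS))[0]

def is_unknown_time(value):
    """Compare the raw 64-bit pattern, since NaN == NaN is always False."""
    return struct.unpack("<Q", struct.pack("<d", value))[0] == UNKNOWN_TIME_BITS

def times_equal(a, b):
    """Equality that treats two unknown times as equal to each other."""
    if is_unknown_time(a) and is_unknown_time(b):
        return True
    return a == b

print(UNKNOWN_TIME == UNKNOWN_TIME, times_equal(UNKNOWN_TIME, UNKNOWN_TIME))
```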
[0.3.0] - 2020-08-27
Major feature release for metadata schemas, set-like operations, mutation times, SVG drawing improvements and many others.
Breaking changes
The default display order for tree visualisations has been changed to minlex (see below) to stabilise the node ordering and to make trees more readily comparable. The old behaviour is still available with order="tree".
File system operations such as dump/load now raise an appropriate OSError instead of tskit.FileFormatError. Loading from an empty file now raises an EOFError.
Bad tree topologies are detected earlier, so that it is no longer possible to create a TreeSequence object which contains a parent with contradictory children on an interval. Previously an error was thrown when some operation building the trees was attempted (@jeromekelleher, #709).
The TableCollection object no longer implements the iterator protocol. Previously list(tables) returned a sequence of (table_name, table_instance) tuples. This has been replaced with the more intuitive and future-proof TableCollection.name_map and TreeSequence.tables_dict attributes, which perform the same function (@jeromekelleher, #500, #694).
The arguments to TreeSequence.genotype_matrix, TreeSequence.haplotypes and TreeSequence.variants must now be keyword arguments, not positional. This is to support the change from impute_missing_data to isolated_as_missing in the arguments to these methods. (@benjeffery, #716, #794)
New features
New methods to perform set operations on TableCollections and TreeSequences.
TableCollection.subset
subsets and reorders table collections by nodes (@mufernando, @petrelharp, #663, #690).TableCollection.union
forms the node-wise union of two table collections (@mufernando, @petrelharp, #381 #623).Mutations now have an optional double-precision floating-point
time
column. If not specified, this defaults to a particularNaN
value (tskit.UNKNOWN_TIME
) indicating that the time is unknown. For a tree sequence to be considered valid it must meet new criteria for mutation times, see Mutation requirements. Also added functionTableCollection.compute_mutation_times
. Table sorting orders mutations by non-increasing time per-site, which is also a requirement for a valid tree sequence (@benjeffery, #672).Add support for trees with internal samples for the Kendall-Colijn tree distance metric. (@daniel-goldstein, #610)
Add background shading to SVG tree sequences to reflect tree position along the sequence (@hyanwong, #563).
Tables with a metadata column now have a
metadata_schema
that is used to validate and encode metadata that is passed toadd_row
and decode metadata on calls totable[j]
and e.g.tree_sequence.node(j)
See Metadata (@benjeffery, #491, #542, #543, #601).The tree-sequence now has top-level metadata with a schema (@benjeffery, #666, #644, #642).
Add classes to SVG drawings to allow easy adjustment and styling, and document the new
tskit.Tree.draw_svg()
andtskit.TreeSequence.draw_svg()
methods. This also fixes #467 for duplicate SVG entityid
s in Jupyter notebooks (@hyanwong, #555).Add a
to_nexus
function that outputs a tree sequence in Nexus format (@saunack, #550).Add extension of Kendall-Colijn tree distance metric for tree sequences computed by
TreeSequence.kc_distance
(@daniel-goldstein, #548).Add an optional node traversal order in
tskit.Tree
that uses the minimum lexicographic order of leaf nodes visited. This ordering ("minlex_postorder"
) adds more determinism because it constraints the order in which children of a node are visited (@brianzhang01, #411).Add an
order
argument to the tree visualisation functions which supports two node orderings:"tree"
(the previous default) and"minlex"
which stabilises the node ordering (making it easier to compare trees). The default node ordering is changed to"minlex"
(@brianzhang01, @jeromekelleher, #389, #566).Add
_repr_html_
to tables, so that jupyter notebooks render them as html tables (@benjeffery, #514).Remove support for
kc_distance
on trees with unary nodes (@daniel-goldstein, #508).Improve Kendall-Colijn tree distance algorithm to operate in O(n^2) time instead of O(n^2 * log(n)) where n is the number of samples (@daniel-goldstein, #490).
Add a metadata column to the migrations table. Works similarly to existing metadata columns on other tables (@benjeffery, #505).
Add a metadata column to the edges table. Works similarly to existing metadata columns on other tables (@benjeffery, #496).
Allow sites with missing data to be output by the
haplotypes
method, by default replacing with -
. Errors are no longer raised for missing data with isolated_as_missing=True
; the error types returned for bad alleles (e.g. multiletter or non-ascii) have also changed from _tskit.LibraryError
to TypeError, or ValueError if the missing data character clashes (@hyanwong, #426).
Access the number of children of a node in a tree directly using
tree.num_children(u)
(@hyanwong, #436).User specified allele mapping for genotypes in
variants
andgenotype_matrix
(@jeromekelleher, #430).New
root_threshold
option for the Tree class, which allows us to efficiently iterate over ‘real’ roots when we have missing data (@jeromekelleher, #462).Add
tree.as_dict_of_dicts()
function to enable use with networkx. See Networkx (@winni2k, #457).Add
tree_sequence.to_macs()
function to convert a tree sequence to MACS format (@winni2k, #727).
Add a
keep_input_roots
option to simplify which, if enabled, adds edges from the MRCAs of samples in the simplified tree sequence back to the roots in the input tree sequence (@jeromekelleher, #775, #782).
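The "minlex" ordering described above can be pictured with a small pure-Python sketch. It is not tskit's implementation: the `children` dictionary and the `minlex_postorder` helper are hypothetical names used only for illustration, and the sketch assumes that children are visited in order of the smallest leaf ID found beneath them.

```python
def minlex_postorder(children, root):
    """Return nodes in postorder, visiting children by smallest leaf ID first."""
    min_leaf = {}

    def compute_min_leaf(u):
        # A leaf is its own minimum; an internal node takes the minimum of its subtrees.
        kids = children.get(u, [])
        min_leaf[u] = u if not kids else min(compute_min_leaf(v) for v in kids)
        return min_leaf[u]

    compute_min_leaf(root)
    order = []

    def visit(u):
        # Sorting by min_leaf makes the visit order independent of input order.
        for v in sorted(children.get(u, []), key=min_leaf.__getitem__):
            visit(v)
        order.append(u)

    visit(root)
    return order


# The same topology written with two different child orders gives one traversal.
tree_a = {6: [5, 4], 5: [3, 2], 4: [1, 0]}
tree_b = {6: [4, 5], 5: [2, 3], 4: [0, 1]}
print(minlex_postorder(tree_a, 6))  # [0, 1, 4, 2, 3, 5, 6]
print(minlex_postorder(tree_b, 6))  # [0, 1, 4, 2, 3, 5, 6]
```

Because the result no longer depends on how the children happen to be stored, two drawings of the same topology line up node for node.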
Bugfixes
#453 - Fix LibraryError when
tree.newick()
is called with large node time values (@jeromekelleher, #637).
#777 - Mutations over isolated samples were incorrectly decoded as missing data. (@jeromekelleher, #778)
#776 - Fix a segfault when a partial list of samples was provided to the
variants
iterator. (@jeromekelleher, #778)
Deprecated
The
sample_counts
feature has been deprecated and is now ignored. Sample counts are now always computed.
For
TreeSequence.genotype_matrix
, TreeSequence.haplotypes
and TreeSequence.variants
the impute_missing_data
argument is deprecated and replaced with isolated_as_missing
. Note that to get the same behaviour impute_missing_data=True
should be replaced with isolated_as_missing=False
. (@benjeffery, #716, #794)
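The inversion between the two arguments is easy to get wrong when updating calling code. The following is a minimal sketch of the mapping, not tskit's own code: `resolve_isolated_as_missing` is a hypothetical helper, and the default of `True` is an assumption made for this illustration.

```python
import warnings


def resolve_isolated_as_missing(isolated_as_missing=None, impute_missing_data=None):
    """Hypothetical helper: translate the deprecated flag into the new one."""
    if impute_missing_data is not None:
        if isolated_as_missing is not None:
            raise ValueError(
                "Specify only one of isolated_as_missing and impute_missing_data"
            )
        warnings.warn(
            "impute_missing_data is deprecated; use isolated_as_missing",
            FutureWarning,
            stacklevel=2,
        )
        # The two flags have opposite meanings, so the value is negated.
        return not impute_missing_data
    # Assumed default for this sketch: isolated samples count as missing.
    return True if isolated_as_missing is None else isolated_as_missing


with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    print(resolve_isolated_as_missing(impute_missing_data=True))  # False
print(resolve_isolated_as_missing(isolated_as_missing=False))  # False
```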
[0.2.3] - 2019-11-22#
Minor feature release, providing a tree distance metric and various methods to manipulate tree sequence data.
New features
Kendall-Colijn tree distance metric computed by
Tree.kc_distance
(@awohns, #172).
New “timeasc” and “timedesc” orders for tree traversals (@benjeffery, #246, #399).
Up to 2X performance improvements to tree traversals (@benjeffery, #400).
Add
trim
, delete_sites
, keep_intervals
and delete_intervals
methods to edit tree sequence data. (@hyanwong, #364, #372, #377, #390).
Various documentation improvements (@hyanwong, @jeromekelleher, @petrelharp).
Rename the
map_ancestors
function to link_ancestors
(@hyanwong, @gtsambos; #406, #262). The original function is retained as a deprecated alias.
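Keeping and deleting intervals are complementary operations: deleting a set of intervals amounts to keeping everything else. The sketch below shows that relationship on plain tuples. `complement_intervals` is a hypothetical helper, not part of the tskit API, and it assumes sorted, non-overlapping, half-open intervals that lie within `[start, end)`.

```python
def complement_intervals(intervals, start, end):
    """Return the gaps left in [start, end) after removing the given intervals."""
    result = []
    pos = start
    for left, right in intervals:
        if left > pos:
            # There is an uncovered stretch before this interval begins.
            result.append((pos, left))
        pos = right
    if pos < end:
        result.append((pos, end))
    return result


# Deleting [2, 4) and [6, 10) from a sequence of length 10 keeps [0, 2) and [4, 6).
print(complement_intervals([(2, 4), (6, 10)], 0, 10))  # [(0, 2), (4, 6)]
```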
Bugfixes
Fix height scaling issues with SVG tree drawing (@jeromekelleher, #407, #383, #378).
Do not reuse buffers in
LdCalculator
(@jeromekelleher). See #397 and #396.
[0.2.2] - 2019-09-01#
Minor bugfix release.
Relaxes overly-strict input requirements on individual location data that caused some SLiM tree sequences to fail loading in version 0.2.1 (see #351).
New features
Add log_time height scaling option for drawing SVG trees (@marianne-aspbury). See #324 and #303.
Bugfixes
Allow 4G metadata columns (@jeromekelleher). See #342 and #341.
[0.2.1] - 2019-08-23#
Major feature release, adding support for population genetic statistics, improved VCF output and many other features.
Note: Version 0.2.0 was skipped because of an error uploading to PyPI which could not be undone.
Breaking changes
Genotype arrays returned by
TreeSequence.variants
and TreeSequence.genotype_matrix
have changed from unsigned 8 bit values to signed 8 bit values to accommodate missing data (see #144 for discussion). Specifically, the dtype of the genotypes arrays has changed from numpy uint8 (“u1”) to int8 (“i1”). This should not affect client code in any way unless it specifically depends on the type of the returned numpy array.
The VCF written by
write_vcf
is no longer compatible with previous versions, which had significant shortcomings. Position values are now rounded to the nearest integer by default, and REF and ALT values are derived from the actual allelic states (rather than always being A and T). Sample names are now of the form tsk_j
for sample ID j. Most of the legacy behaviour can be recovered with new options, however.
The positional parameter
reference_sets
in the genealogical_nearest_neighbours
and mean_descendants
TreeSequence methods has been renamed to sample_sets
.
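The move to signed genotypes matters because a negative value can then mark missing data. The sketch below uses only the standard library `array` module rather than numpy, and the `MISSING` constant is an illustrative stand-in for the library's missing-data marker.

```python
from array import array

MISSING = -1  # illustrative marker for missing data in this sketch

# Signed 8 bit storage ("b") can hold the negative marker directly.
genotypes = array("b", [0, 1, MISSING, 1])

# Code that reinterprets the same bytes as unsigned 8 bit ("B") sees 255 instead.
as_unsigned = array("B", genotypes.tobytes())

print(list(genotypes))    # [0, 1, -1, 1]
print(list(as_unsigned))  # [0, 1, 255, 1]
```

Client code that compares genotypes against a fixed unsigned type is the kind most likely to need updating.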
New features
Support for general windowed statistics. Implementations of diversity, divergence, segregating sites, Tajima’s D, Fst, Patterson’s F statistics, Y statistics, trait correlations and covariance, and k-dimensional allele frequency spectra (@petrelharp, @jeromekelleher, @molpopgen).
Add the
keep_unary
option to simplify (@gtsambos). See #1 and #143.
Add the
map_ancestors
method to TableCollection (@gtsambos). See #175.
Add the
squash
method to EdgeTable (@gtsambos). See #59 and #285.
Add support for individuals to VCF output, and fix major issues with output format (@jeromekelleher). Position values are transformed in a much more straightforward manner and output has been generalised substantially. Adds
individual_names
and position_transform
arguments. See #286, and issues #2, #30 and #73.
Control height scale in SVG trees using ‘tree_height_scale’ and ‘max_tree_height’ (@hyanwong, @jeromekelleher). See #167, #168. Various other improvements to tree drawing (#235, #241, #242, #252, #259).
Add
Tree.max_root_time
property (@hyanwong, @jeromekelleher). See #170.
Improved input checking on various methods taking numpy arrays as parameters (@hyanwong). See #8 and #185.
Define the branch length over roots in trees to be zero (previously raised an error; @jeromekelleher). See #188 and #191.
Implementation of the genealogical nearest neighbours statistic (@hyanwong, @jeromekelleher).
New
delete_intervals
and keep_intervals
methods for the TableCollection to allow slicing out of topology from specific intervals (@hyanwong, @andrewkern, @petrelharp, @jeromekelleher). See #225 and #261.
Support for missing data via a topological definition (@jeromekelleher). See #270 and #272.
Add ability to set columns directly in the Tables API (@jeromekelleher). See #12 and #307.
Various documentation improvements from @brianzhang01, @hyanwong, @petrelharp and @jeromekelleher.
Deprecated
Deprecate
Tree.length
in favour of Tree.span
(@hyanwong). See #169.
Deprecate
TreeSequence.pairwise_diversity
in favour of the new diversity
method. See #215, #312.
Bugfixes
[0.1.5] - 2019-03-27#
This release removes support for Python 2, adds more flexible tree access and a
new tskit
command line interface.
New features
More flexible tree API (#121). Adds
TreeSequence.at
and TreeSequence.at_index
methods to find specific trees, and efficient support for backwards traversal using reversed(ts.trees())
.
Add initial
tskit
CLI (#80)
Add
tskit info
CLI command (#66)
Enable drawing SVG trees with coloured edges (@hyanwong; #149).
Add
Tree.is_descendant
method (#120)
Add
Tree.copy
method (#122)
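Finding the tree that covers a given position is essentially a search over tree breakpoints. The following sketch illustrates the idea with the standard library only. `tree_index_at` is a hypothetical function, not the library's implementation, and it assumes each tree covers a half-open interval `[left, right)`.

```python
import bisect


def tree_index_at(breakpoints, position):
    """Return the index of the tree whose interval [left, right) holds position."""
    if not breakpoints[0] <= position < breakpoints[-1]:
        raise ValueError("position is outside the sequence")
    # bisect_right returns the first breakpoint strictly greater than position,
    # so the tree starting just before it is the one we want.
    return bisect.bisect_right(breakpoints, position) - 1


breakpoints = [0, 3.5, 7, 10]  # three trees: [0, 3.5), [3.5, 7), [7, 10)
print(tree_index_at(breakpoints, 0))    # 0
print(tree_index_at(breakpoints, 3.5))  # 1
print(tree_index_at(breakpoints, 9.9))  # 2
```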
Bugfixes
[0.1.4] - 2019-02-01#
Minor feature update. Using the C API 0.99.1.
New features
Add interface for setting TableCollection.sequence_length: tskit-dev/tskit#107
Add support for building and dropping TableCollection indexes: tskit-dev/tskit#108
[0.1.3] - 2019-01-14#
Bugfix release.
Bugfixes
Fix missing provenance schema: tskit-dev/tskit#81
[0.1.2] - 2019-01-14#
Bugfix release.
Bugfixes
Fix memory leak in table collection. tskit-dev/tskit#76
[0.1.1] - 2019-01-11#
Fixes broken distribution tarball for 0.1.0.
[0.1.0] - 2019-01-11#
Initial release after separation from msprime 0.6.2. Code that reads tree sequence files and processes them should be able to work without changes.
Breaking changes
Removal of the previously deprecated
sort_tables
, simplify_tables
and load_tables
functions. All code should change to using corresponding TableCollection methods.
Rename
SparseTree
class to Tree
.
[1.1.0a1] - 2019-01-10#
Initial alpha version posted to PyPI for bootstrapping.
[0.0.0] - 2019-01-10#
Initial extraction of tskit code from msprime. Relicense to MIT.
Code copied at hash 29921408661d5fe0b1a82b1ca302a8b87510fd23
C API#
[1.1.4] - 2024-XX-XX#
[1.1.3] - 2024-10-16#
Features
Add the tsk_treeseq_extend_haplotypes method that can compress a tree sequence by extending edges into adjacent trees and thus creating unary nodes in those trees (@petrelharp, @hfr1tze, @avabamf, #2651, #2938).
[1.1.2] - 2023-05-17#
Performance improvements
tsk_tree_seek is now much faster at seeking to arbitrary points along the sequence from the null tree (@molpopgen, #2661).
Features
The struct
tsk_treeseq_t
now has the variables min_time
and max_time
, which are the minimum and maximum among the node times and mutation times, respectively. min_time
and max_time
can be accessed using the functions tsk_treeseq_get_min_time
and tsk_treeseq_get_max_time
, respectively. (@szhan, #2612, #2271)
Add the TSK_SIMPLIFY_NO_FILTER_NODES option to simplify to allow unreferenced nodes to be kept in the output (@jeromekelleher, @hyanwong, #2606, #2619).
Add the TSK_SIMPLIFY_NO_UPDATE_SAMPLE_FLAGS option to simplify which ensures no node sample flags are changed to allow calling code to manage sample status. (@jeromekelleher, #2662, #2663).
Guarantee that unfiltered tables are not written to unnecessarily during simplify (@jeromekelleher, #2619).
Add x_table_keep_rows methods to provide efficient in-place table subsetting (@jeromekelleher, #2700).
Add tsk_tree_seek_index function
[1.1.1] - 2022-07-29#
Bug fixes
Fix segfault in tsk_variant_restricted_copy in tree sequences with large numbers of alleles or very long alleles (@jeromekelleher, #2437, #2429).
[1.1.0] - 2022-07-14#
Features
Add
num_children
to tsk_tree_t
, an array which contains counts of the number of child nodes of each node in the tree. (@GertjanBisschop, #2274, #2316)
Add
edge
to tsk_tree_t
, an array which contains the edge_id
of the edge encoding the relationship between the child node and its parent for each (child) node in the tree. (@GertjanBisschop, #2304, #2340)
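As an illustration of what a per-node child count array holds, the sketch below derives one from a parent array in plain Python. It is only a model of the data layout: the C library maintains its arrays incrementally as trees change, and `count_children` is a hypothetical helper.

```python
NULL = -1  # marks "no parent" in this sketch


def count_children(parent):
    """Count the children of every node from a parent array indexed by node ID."""
    num_children = [0] * len(parent)
    for p in parent:
        if p != NULL:
            num_children[p] += 1
    return num_children


# Nodes 0 to 3 are leaves, 4 and 5 are internal nodes, 6 is the root.
parent = [4, 4, 5, 5, 6, 6, NULL]
print(count_children(parent))  # [0, 0, 0, 0, 2, 2, 2]
```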
Changes
Reduce the maximum number of rows in a table by 1. This removes edge cases so that a
tsk_id_t
can be used to count the number of rows. (@benjeffery, #2336, #2337)
Samples are now copied by
tsk_variant_restricted_copy
. (@benjeffery, #2400, #2401)
[1.0.0] - 2022-05-24#
This major release marks the point at which the documented API becomes stable and supported.
Breaking changes
Change the type of genotypes to
int32_t
, removing the TSK_16_BIT_GENOTYPES flag option. (@benjeffery, #463, #2108)
tsk_variant_t
now includes its tsk_site_t
rather than pointing to it. (@benjeffery, #2161, #2162)
Rename
TSK_TAKE_TABLES
to TSK_TAKE_OWNERSHIP
. (@benjeffery, #2221, #2222)
TSK_DEBUG
, TSK_NO_INIT
, TSK_NO_CHECK_INTEGRITY
and TSK_TAKE_OWNERSHIP
have moved to core.h
(@benjeffery, #2218, #2230).
Rename several flags:
All flags to
simplify
, for example TSK_KEEP_INPUT_ROOTS
becomes TSK_SIMPLIFY_KEEP_INPUT_ROOTS.
All flags to
subset
, for example TSK_KEEP_UNREFERENCED
becomes TSK_SUBSET_KEEP_UNREFERENCED.
TSK_BUILD_INDEXES
-> TSK_TS_INIT_BUILD_INDEXES
TSK_NO_METADATA
-> TSK_TABLE_NO_METADATA
TSK_NO_EDGE_METADATA
-> TSK_TC_NO_EDGE_METADATA
(@benjeffery, #1720, #2226, #2229, #2224)
Remove the generic
TSK_ERR_OUT_OF_BOUNDS
- replacing with specific errors. Remove TSK_ERR_NON_SINGLE_CHAR_MUTATION
which was unused. (@benjeffery, #2260)
Reorder stats API methods to place
result
as the last argument. (@benjeffery, #2292, #2285)
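Client code affected by the flag renames could be updated with a small script along the following lines. This is only a sketch: `rename_flags` is a hypothetical helper, and the table covers just the flag names spelled out above, so other simplify and subset flags would need to be added by hand.

```python
import re

# Old name -> new name, limited to the renames listed explicitly above.
RENAMED_FLAGS = {
    "TSK_TAKE_TABLES": "TSK_TAKE_OWNERSHIP",
    "TSK_KEEP_INPUT_ROOTS": "TSK_SIMPLIFY_KEEP_INPUT_ROOTS",
    "TSK_KEEP_UNREFERENCED": "TSK_SUBSET_KEEP_UNREFERENCED",
    "TSK_BUILD_INDEXES": "TSK_TS_INIT_BUILD_INDEXES",
    "TSK_NO_METADATA": "TSK_TABLE_NO_METADATA",
    "TSK_NO_EDGE_METADATA": "TSK_TC_NO_EDGE_METADATA",
}

# Word boundaries stop a rename from matching inside a longer identifier.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMED_FLAGS)) + r")\b")


def rename_flags(source):
    """Replace each old flag name in a string of C source with its new name."""
    return _PATTERN.sub(lambda m: RENAMED_FLAGS[m.group(1)], source)


line = "ret = tsk_treeseq_init(&ts, &tables, TSK_BUILD_INDEXES);"
print(rename_flags(line))
```

Running the helper twice gives the same result as running it once, since none of the new names contains an old name as a whole word.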
Features
Make dumping of tables and tree sequences to disk a zero-copy operation. (@benjeffery, #2111, #2124)
Add
edge
attribute tomutation_t
struct and make available in tree sequence. (@jeromekelleher, #685, #2279)Reduce peak memory usage in
tsk_treeseq_simplify
. (@jeromekelleher, #2287, #2288)
[0.99.15] - 2021-12-07#
Breaking changes
The
tables
argument totsk_treeseq_init
is no longerconst
, to allow for future no-copy tree sequence creation. (@benjeffery, #1718, #1719)Additional consistency checks for mutation tables are now run by
tsk_table_collection_check_integrity
even whenTSK_CHECK_MUTATION_ORDERING
is not passed in. (@petrelharp, #1713, #1722)num_tracked_samples
andnum_samples
intsk_tree_t
are now typed astsk_size_t
(@benjeffery, #1723, #1727)The previously deprecated option
TSK_SAMPLE_COUNTS
has been removed. (@benjeffery, #1744, #1761).Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence.
tsk_table_collection_sort
no longer sorts individuals. (@benjeffery, #1774, #1789)The
tsk_tree_t.left_root
member has been removed. Client code can be updated most easily by using the equivalenttsk_tree_get_left_root
function. However, it may be worth considering updating code to use either the standard traversal functions (which automatically iterate over roots) or to use thevirtual_root
member (which may lead to more concise code). (@jeromekelleher, #1796, #1862)Rename
tsk_tree_t.left
andtsk_tree_t.right
members totsk_tree_t.interval.left
andtsk_tree_t.interval.right
respectively. (@jeromekelleher, #1686, #1913)kastore
is now vendored into this repo instead of being a git submodule. Developers need to run git submodule update
. (@jeromekelleher, #1687, #1973)Tree
arrays such asleft_sib
,right_child
etc. now have an additional “virtual root” node at the end. (@jeromekelleher, #1691, #1704)marked
andmark
have been removed fromtsk_tree_t
. (@jeromekelleher, #1936)
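The effect of the virtual root on traversal code can be sketched in Python with plain lists standing in for the tree arrays. The array contents and the `roots` and `preorder` helpers are illustrative assumptions. The only property taken from the notes above is that the virtual root occupies the extra final slot and has the real roots as its children.

```python
NULL = -1

# Four real nodes: 0 and 1 are children of 3, and 2 is an isolated root.
# Index 4 is the extra slot for the virtual root, whose children are the roots.
left_child = [NULL, NULL, NULL, 0, 2]
right_sib = [1, NULL, 3, NULL, NULL]


def roots(left_child, right_sib):
    """List the real roots by walking the children of the virtual root."""
    virtual_root = len(left_child) - 1
    found = []
    u = left_child[virtual_root]
    while u != NULL:
        found.append(u)
        u = right_sib[u]
    return found


def preorder(left_child, right_sib):
    """Visit all real nodes in preorder, starting from the virtual root."""
    virtual_root = len(left_child) - 1
    stack = [virtual_root]
    order = []
    while stack:
        u = stack.pop()
        if u != virtual_root:
            order.append(u)
        kids = []
        v = left_child[u]
        while v != NULL:
            kids.append(v)
            v = right_sib[v]
        # Push in reverse so that the leftmost child is popped first.
        stack.extend(reversed(kids))
    return order


print(roots(left_child, right_sib))     # [2, 3]
print(preorder(left_child, right_sib))  # [2, 3, 0, 1]
```

A single starting point replaces the separate loop over roots that multi-root traversals otherwise need.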
Features
Add
tsk_table_collection_individual_topological_sort
to sort the individuals as this is no longer done by the default sort. (@benjeffery, #1774, #1789)The default behaviour for table size growth is now to double the current size of the table, up to a threshold. To keep the previous behaviour, use (e.g.)
tsk_edge_table_set_max_rows_increment(tables->edges, 1024)
, which results in adding space for 1024 additional rows each time we run out of space in the edge table. (@benjeffery, #5, #1683)tsk_table_collection_check_integrity
now has aTSK_CHECK_MIGRATION_ORDERING
flag. (@petrelharp, #1722)The default behaviour for ragged column growth is now to double the current size of the column, up to a threshold. To keep the previous behaviour, use (e.g.)
tsk_node_table_set_max_metadata_length_increment(tables->nodes, 1024)
, which results in adding space for 1024 additional entries each time we run out of space in the ragged column. (@benjeffery, #1703, #1709)Support for compiling the C library on Windows using msys2 (@jeromekelleher, #1742).
Add
time_units
to tsk_table_collection_t
to describe the units of the time dimension of the tree sequence. This is then used to generate an error if time_units
is uncalibrated
when using the branch lengths in statistics. (@benjeffery, #1644, #1760)
Add the
TSK_LOAD_SKIP_TABLES
option to load just the top-level information from a file. Also add theTSK_CMP_IGNORE_TABLES
option to compare only the top-level information in two table collections. (@clwgg, #1882, #1854).Add reference sequence. (@jeromekelleher, @benjeffery, #146, #1911, #1944, #1911)
Add the
TSK_LOAD_SKIP_REFERENCE_SEQUENCE
option to load a table collection without the reference sequence. Also add the TSK_CMP_IGNORE_REFERENCE_SEQUENCE option to compare two table collections without comparing their reference sequence. (@clwgg, #2019, #1971).Add a “virtual root” to
Tree
arrays such asleft_sib
,right_child
etc. The virtual root is appended to each array, has all real roots as its children, but is not the parent of any node. Simplifies traversal algorithms. (@jeromekelleher, #1691, #1704)Add
num_edges
totsk_tree_t
to count the edges that define the topology of the tree. (@jeromekelleher, #1704)Add the
tsk_tree_get_size_bound
function which returns an upper bound on the number of nodes reachable from the roots of a tree. Useful for tree stack allocations (@jeromekelleher, #1704).Add
MetadataSchema.permissive_json
for an easy way to get the simplest schema.
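The difference between the doubling strategy and a fixed increment can be shown with a short Python model. It is not the library's allocation code: `next_capacity` is hypothetical, the threshold value is made up, and linear growth above the threshold is an assumption made for this sketch.

```python
def next_capacity(current, needed, threshold=1024, fixed_increment=None):
    """Model how many rows are allocated once `needed` rows must fit."""
    capacity = max(current, 1)
    while capacity < needed:
        if fixed_increment is not None:
            # Previous behaviour: grow by a constant number of rows.
            capacity += fixed_increment
        elif capacity < threshold:
            # New default: double while below the threshold.
            capacity *= 2
        else:
            # Assumed for this sketch: grow linearly above the threshold.
            capacity += threshold
    return capacity


print(next_capacity(16, 100))                        # 128
print(next_capacity(16, 100, fixed_increment=1024))  # 1040
print(next_capacity(1024, 3000))                     # 3072
```

Doubling keeps the number of reallocations logarithmic in the table size, while the cap limits how much unused space a very large table can carry.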
[0.99.14] - 2021-09-03#
Breaking changes
64 bits are now used to store the sizes of ragged table columns such as metadata, allowing them to hold more data. As such
tsk_size_t
is now 64 bits wide. This change is fully backwards and forwards compatible for all tree-sequences whose ragged column sizes fit into 32 bits. New tree-sequences with large offset arrays that require 64 bits will fail to load in previous versions with error TSK_ERR_BAD_COLUMN_TYPE
. (@jeromekelleher, #343, #1527, #1528, #1530, #1554, #1573, #1589, #1598, #1628, #1571, #1579, #1585, #1590, #1602, #1618, #1620, #1652).
Features
Add tsk_X_table_update_row methods which allow modifying single rows of tables (@jeromekelleher, #1545, #1552).
[0.99.13] - 2021-07-08#
Fixes
Fix segfault when very large columns overflow (@bhaller, @benjeffery, #1509, #1511).
[0.99.12] - 2021-05-14#
Breaking changes
Removed
TSK_NO_BUILD_INDEXES
. Not building indexes is now the default behaviour of tsk_table_collection_dump and related functions. (@molpopgen, #1327, #1337).
Features
Add
tsk_*_table_extend
methods to append to a table from another (@benjeffery, #1271, #1287).
Fixes
[0.99.11] - 2021-03-16#
Features
Add
parents
to the individual table to enable recording of pedigrees (@ivan-krukov, @benjeffery, #852, #1125, #866, #1153, #1177, #1199).Added a
tsk_table_collection_canonicalise
method, that allows checking for equality between tables that are equivalent up to reordering (@petrelharp, @mufernando, #1108).Removed a previous requirement on
tsk_table_collection_union
, allowing for unioning of new information both above and below shared history (@petrelharp, @mufernando, #1108).Support migrations in tsk_table_collection_sort. (@jeromekelleher, #22, #117, #1131).
Breaking changes
Method
tsk_individual_table_add_row
has extra arguments parents
and parents_length
.
Add an
options
argument to tsk_table_collection_subset
(@petrelharp, #1108), to allow for retaining the order of populations.
Mutation error codes have changed
Changes
Allow mutations that have the same derived state as their parent mutation. (@benjeffery, #1180, #1233)
File minor version change to support individual parents
[0.99.10] - 2021-01-25#
Minor bugfix on internal APIs
[0.99.9] - 2021-01-22#
Features
[0.99.8] - 2020-11-27#
Features
Add
tsk_treeseq_genetic_relatedness
for calculating genetic relatedness between pairs of sets of nodes (@brieuclehmann, #1021, #1023, #974, #973, #898).Exposed
tsk_table_collection_set_indexes
to the API (@benjeffery, #870, #921).
Breaking changes
Added an
options
argument totsk_table_collection_equals
and table equality methods to allow for more flexible equality criteria (e.g., ignore top-level metadata and schema or provenance tables). Existing code should add an extra final parameter0
to retain the current behaviour (@mufernando, @jeromekelleher, #896, #897, #913, #917).Changed default behaviour of
tsk_table_collection_clear
to not clear provenances and addedoptions
argument to optionally clear provenances and schemas (@benjeffery, #929, #1001).Renamed
ts.trait_regression
tots.trait_linear_model
.
[0.99.7] - 2020-09-29#
Added
TSK_INCLUDE_TERMINAL
option totsk_diff_iter_init
to output the last edges at the end of a tree sequence (@hyanwong, #783, #787).Added
tsk_bug_assert
for assertions that should be compiled into release binaries (@benjeffery, #860).
[0.99.6] - 2020-09-04#
Bugfixes
#823 - Fix mutation time error when using
tsk_table_collection_simplify
withTSK_SIMPLIFY_KEEP_INPUT_ROOTS
(@petrelharp, #823).
[0.99.5] - 2020-08-27#
Breaking changes
The macro
TSK_IMPUTE_MISSING_DATA
is renamed toTSK_ISOLATED_NOT_MISSING
(@benjeffery, #716, #794)
New features
Add a
TSK_SIMPLIFY_KEEP_INPUT_ROOTS
option to simplify which, if enabled, adds edges from the MRCAs of samples in the simplified tree sequence back to the roots in the input tree sequence (@jeromekelleher, #775, #782).
Bugfixes
#777 - Mutations over isolated samples were incorrectly decoded as missing data. (@jeromekelleher, #778)
#776 - Fix a segfault when a partial list of samples was provided to the
variants
iterator. (@jeromekelleher, #778)
[0.99.4] - 2020-08-12#
Note
The
TSK_VERSION_PATCH
macro was incorrectly set to 4
for 0.99.3, so both 0.99.4 and 0.99.3 have the same value.
Changes
Mutation times can be a mixture of known and unknown as long as for each individual site they are either all known or all unknown (@benjeffery, #761).
Bugfixes
Fix for including core.h under C++ (@petrelharp, #755).
[0.99.3] - 2020-07-27#
Breaking changes
tsk_mutation_table_add_row
has an extra time
argument. If the time is unknown TSK_UNKNOWN_TIME
should be passed. (@benjeffery, #672)
Change genotypes from unsigned to signed to accommodate missing data (see #144 for discussion). This only affects users of the
tsk_vargen_t
class. Genotypes are now stored as int8_t and int16_t types rather than the former unsigned types. The field names in the genotypes union of thetsk_variant_t
struct returned bytsk_vargen_next
have been renamed toi8
andi16
accordingly; care should be taken when updating client code to ensure that types are correct. The number of distinct alleles supported by 8 bit genotypes has therefore dropped from 255 to 127, with a similar reduction for 16 bit genotypes.Change the
tsk_vargen_init
method to take an extra parameteralleles
. To keep the current behaviour, set this parameter to NULL.Edges can now have metadata. Hence edge methods now take two extra arguments: metadata and metadata length. The file format has also changed to accommodate this, but is backwards compatible. Edge metadata can be disabled for a table collection with the TSK_NO_EDGE_METADATA flag. (@benjeffery, #496, #712)
Migrations can now have metadata. Hence migration methods now take two extra arguments: metadata and metadata length. The file format has also changed to accommodate this, but is backwards compatible. (@benjeffery, #505)
The text dump of tables with metadata now includes the metadata schema as a header. (@benjeffery, #493)
Bad tree topologies are detected earlier, so that it is no longer possible to create a tsk_treeseq_t object which contains a parent with contradictory children on an interval. Previously an error occurred when some operation building the trees was attempted (@jeromekelleher, #709).
New features
New methods to perform set operations on table collections.
tsk_table_collection_subset
subsets and reorders table collections by nodes (@mufernando, @petrelharp, #663, #690).tsk_table_collection_union
forms the node-wise union of two table collections (@mufernando, @petrelharp, #381, #623).Mutations now have an optional double-precision floating-point
time
column. If not specified, this defaults to a particular NaN value (TSK_UNKNOWN_TIME
) indicating that the time is unknown. For a tree sequence to be considered valid it must meet new criteria for mutation times, see Mutation requirements. Addtsk_table_collection_compute_mutation_times
and new flag totsk_table_collection_check_integrity
:TSK_CHECK_MUTATION_TIME
. Table sorting orders mutations by non-increasing time per-site, which is also a requirement for a valid tree sequence. (@benjeffery, #672)Add
metadata
andmetadata_schema
fields to table collection, with accessors on tree sequence. These store arbitrary bytes and are optional in the file format. (@benjeffery, #641)
Add the
TSK_SIMPLIFY_KEEP_UNARY
option to simplify (@gtsambos). See #1 and #143.Add a
set_root_threshold
option to tsk_tree_t which allows us to set the number of samples a node must be an ancestor of to be considered a root (#462).Change the semantics of tsk_tree_t so that sample counts are always computed, and add a new
TSK_NO_SAMPLE_COUNTS
option to turn this off (#462).Tables with metadata now have an optional metadata_schema field that can contain arbitrary bytes. (@benjeffery, #493)
Tables loaded from a file can now be edited in the same way as any other table collection (@jeromekelleher, #536, #530).
Support for reading/writing to arbitrary file streams with the loadf/dumpf variants for tree sequence and table collection load/dump (@jeromekelleher, @grahamgower, #565, #599).
Add low-level sorting API and
TSK_NO_CHECK_INTEGRITY
flag (@jeromekelleher, #627, #626).Add extension of Kendall-Colijn tree distance metric for tree sequences computed by
tsk_treeseq_kc_distance
(@daniel-goldstein, #548)
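Two points about mutation times above benefit from a concrete picture: the unknown time is one specific NaN rather than any NaN, and mutations are sorted by non-increasing time within each site. The Python sketch below models both with the standard library. The NaN payload is an assumption for illustration and should be checked against the library headers, and `is_unknown_time` and `sort_mutations` are hypothetical helpers written for this example.

```python
import math
import struct

# Assumed NaN bit pattern, used here for illustration only.
UNKNOWN_TIME = struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000001))[0]


def is_unknown_time(t):
    """True only for the specific NaN bit pattern, not for every NaN."""
    # NaN never compares equal to itself, so compare the raw bytes instead.
    return struct.pack("<d", t) == struct.pack("<d", UNKNOWN_TIME)


def sort_mutations(mutations):
    """Sort (site, time) pairs by site, then by non-increasing known time."""
    return sorted(mutations, key=lambda m: (m[0], -m[1]))


print(math.isnan(UNKNOWN_TIME))       # True
print(UNKNOWN_TIME == UNKNOWN_TIME)   # False
print(is_unknown_time(UNKNOWN_TIME))  # True
print(is_unknown_time(math.nan))      # False

mutations = [(1, 0.5), (0, 1.0), (0, 3.0), (1, 2.0)]
print(sort_mutations(mutations))  # [(0, 3.0), (0, 1.0), (1, 2.0), (1, 0.5)]
```

The sort helper assumes that all times are known. Mixing known and unknown times within a site is a separate validity question that this sketch does not address.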
Deprecated
The
TSK_SAMPLE_COUNTS
option is now ignored and will print out a warning if used (#462).
[0.99.2] - 2019-03-27#
Bugfix release. Changes:
Fix incorrect errors on tbl_collection_dump (#132)
Catch table overflows (#157)
[0.99.1] - 2019-01-24#
Refinements to the C API as we move towards 1.0.0. Changes:
Change the
_tbl_
abbreviation to_table_
to improve readability. Hence, we now have, e.g.,tsk_node_table_t
etc.Change
tsk_tbl_size_t
totsk_size_t
.Standardise public API to use
tsk_size_t
andtsk_id_t
as appropriate.Add
tsk_flags_t
typedef and consistently use this as the type used to encode bitwise flags. To avoid confusion, functions now have anoptions
parameter.Rename
tsk_table_collection_position_t
totsk_bookmark_t
.Rename
tsk_table_collection_reset_position
totsk_table_collection_truncate
andtsk_table_collection_record_position
totsk_table_collection_record_num_rows
.Generalise
tsk_table_collection_sort
to take a bookmark as start argument.Relax restriction that nodes in the
samples
argument to simplify must currently be marked as samples. (tskit-dev/tskit#72)Allow
tsk_table_collection_simplify
to take a NULL samples argument to specify “all samples in the current tables”.Add support for building as a meson subproject.
[0.99.0] - 2019-01-14#
Initial alpha version of the tskit C API tagged. Version 0.99.x represents the series of releases leading to version 1.0.0 which will be the first stable release. After 1.0.0, semver rules regarding API/ABI breakage will apply; however, in the 0.99.x series arbitrary changes may happen.
[0.0.0] - 2019-01-10#
Initial extraction of tskit code from msprime. Relicense to MIT. Code copied at hash 29921408661d5fe0b1a82b1ca302a8b87510fd23