C API

This is the documentation for the tskit C API, a low-level library for manipulating and processing tree sequence data. The library is written using the C99 standard and is fully thread safe. Tskit uses kastore to define a simple storage format for the tree sequence data.

To see the API in action, please see the Examples section.

Overview

Do I need the C API?

The tskit C API is generally useful in the following situations:

  • You want to use the tskit API in a larger C/C++ application (e.g., in order to output data in the .trees format);

  • You need to perform lots of tree traversals/loops etc. to analyse some data that is in tree sequence form.

For high-level operations that are not performance sensitive, the Python API is generally more useful. Python is much more convenient than C, and since the tskit Python module is essentially a wrapper for this C library, there’s often no real performance penalty for using it.

Differences with the Python API

Much of the explanatory material (for example, the tutorials) about the Python API also applies to the equivalent C methods, as the Python API wraps this API.

The main difference is that, unlike the Python API, the C API doesn’t do any decoding, encoding or schema validation of Metadata fields; it handles only the byte string representation of the metadata. Metadata is therefore never used directly by any tskit C API method, just stored.

API stability contract

Since the C API 1.0 release we pledge to make no breaking changes to the documented API in subsequent releases in the 1.0 series. What this means is that any code that compiles under the 1.0 release should also compile without changes in subsequent 1.x releases. We will not change the semantics of documented functions, unless it is to fix clearly buggy behaviour. We will not change the values of macro constants.

Undocumented functions do not have this guarantee, and may be changed arbitrarily between releases.

Note

We do not currently make any guarantees about ABI stability, since the primary use-case is for tskit to be embedded within another application rather than used as a shared library. If you do intend to use tskit as a shared library and ABI stability is therefore important to you, please let us know and we can plan accordingly.

API structure

Tskit uses a set of conventions to provide a pseudo object-oriented API. Each ‘object’ is represented by a C struct and has a set of ‘methods’. This is most easily explained by an example:

#include <stdio.h>
#include <stdlib.h>
#include <tskit/tables.h>

/* Wrapped in do/while (0) so the macro behaves as a single statement. */
#define check_tsk_error(val)                                                            \
    do {                                                                                \
        if ((val) < 0) {                                                                \
            fprintf(stderr, "line %d: %s\n", __LINE__, tsk_strerror(val));              \
            exit(EXIT_FAILURE);                                                         \
        }                                                                               \
    } while (0)

int
main(int argc, char **argv)
{
    int j, ret;
    tsk_edge_table_t edges;

    ret = tsk_edge_table_init(&edges, 0);
    check_tsk_error(ret);
    for (j = 0; j < 5; j++) {
        ret = tsk_edge_table_add_row(&edges, 0, 1, j + 1, j, NULL, 0);
        check_tsk_error(ret);
    }
    tsk_edge_table_print_state(&edges, stdout);
    tsk_edge_table_free(&edges);

    return EXIT_SUCCESS;
}

In this program we create a tsk_edge_table_t instance, add five rows using tsk_edge_table_add_row(), print out its contents using the tsk_edge_table_print_state() debugging method, and finally free the memory used by the edge table object. We define this edge table ‘class’ by using some simple naming conventions which are adhered to throughout tskit. This is simply a naming convention that helps to keep code written in plain C logically structured; there are no extra C++ style features. We use object oriented terminology freely throughout this documentation with this understanding.

In this convention, a class is defined by a struct tsk_class_name_t (e.g. tsk_edge_table_t) and its methods all have the form tsk_class_name_method_name whose first argument is always a pointer to an instance of the class (e.g., tsk_edge_table_add_row above). Each class has an initialise and free method, called tsk_class_name_init and tsk_class_name_free, respectively. The init method must be called to ensure that the object is correctly initialised (except for functions such as tsk_table_collection_load() and tsk_table_collection_copy(), which automatically initialise the object by default for convenience). The free method must always be called to avoid leaking memory, even in the case of an error occurring during initialisation. If tsk_class_name_init has been called successfully, we say the object has been “initialised”; if not, it is “uninitialised”. After tsk_class_name_free has been called, the object is again uninitialised.

It is important to note that the init methods only allocate internal memory; the memory for the instance itself must be allocated either on the heap or the stack:

// Instance allocated on the stack
tsk_node_table_t nodes;
tsk_node_table_init(&nodes, 0);
tsk_node_table_free(&nodes);

// Instance allocated on the heap
tsk_edge_table_t *edges = malloc(sizeof(tsk_edge_table_t));
tsk_edge_table_init(edges, 0);
tsk_edge_table_free(edges);
free(edges);
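
The same pattern can be mimicked in a few lines of plain C. The following is a toy sketch rather than tskit code: demo_int_list_t and its methods are hypothetical names, chosen only to illustrate the init/method/free naming convention and the rule that init allocates internal memory while the caller owns the struct itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy "class" that follows the tsk_class_name_t naming pattern. */
typedef struct {
    int *values;
    size_t num_values;
    size_t max_values;
} demo_int_list_t;

/* Init allocates internal memory only; the struct itself belongs to the caller. */
int
demo_int_list_init(demo_int_list_t *self, size_t max_values)
{
    assert(self != NULL);
    if (max_values == 0) {
        max_values = 1;
    }
    self->num_values = 0;
    self->max_values = max_values;
    self->values = malloc(max_values * sizeof(*self->values));
    return self->values == NULL ? -1 : 0;
}

/* Methods take the instance as first argument; returns the new index or < 0. */
int
demo_int_list_append(demo_int_list_t *self, int value)
{
    if (self->num_values == self->max_values) {
        return -2; /* full: a real table would grow instead */
    }
    self->values[self->num_values] = value;
    self->num_values++;
    return (int) (self->num_values - 1);
}

/* Free releases internal memory; it is also safe to call after a failed init. */
int
demo_int_list_free(demo_int_list_t *self)
{
    free(self->values);
    self->values = NULL;
    self->num_values = 0;
    self->max_values = 0;
    return 0;
}
```

A caller would declare a demo_int_list_t on the stack (or malloc one), call demo_int_list_init, use demo_int_list_append, and always finish with demo_int_list_free.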

Error handling

C does not have a mechanism for propagating exceptions, and great care must be taken to ensure that errors are correctly and safely handled. The convention adopted in tskit is that every function (except for trivial accessor methods) returns an integer. If this return value is negative, an error has occurred which must be handled. A description of the error that occurred can be obtained using the tsk_strerror() function. The following example illustrates the key conventions around error handling in tskit:

#include <stdio.h>
#include <stdlib.h>
#include <tskit.h>

int
main(int argc, char **argv)
{
    int ret;
    tsk_treeseq_t ts;

    if (argc != 2) {
        fprintf(stderr, "usage: <tree sequence file>");
        exit(EXIT_FAILURE);
    }
    ret = tsk_treeseq_load(&ts, argv[1], 0);
    if (ret < 0) {
        /* Error condition. Free and exit */
        tsk_treeseq_free(&ts);
        fprintf(stderr, "%s", tsk_strerror(ret));
        exit(EXIT_FAILURE);
    }
    printf("Loaded tree sequence with %lld nodes and %lld edges from %s\n",
        (long long) tsk_treeseq_get_num_nodes(&ts),
        (long long) tsk_treeseq_get_num_edges(&ts),
        argv[1]);
    tsk_treeseq_free(&ts);

    return EXIT_SUCCESS;
}

In this example we load a tree sequence from file and print out a summary of the number of nodes and edges it contains. After calling tsk_treeseq_load() we check the return value ret to see if an error occurred. If an error has occurred we exit with an error message produced by tsk_strerror(). Note that in this example we call tsk_treeseq_free() whether or not an error occurs: in general, once a function that initialises an object (e.g., X_init, X_copy or X_load) is called, then X_free must be called to ensure that memory is not leaked.

Most functions in tskit return an error status; we recommend that every return value is checked.
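
One common way to make sure cleanup runs whether or not an error occurs is to route every exit through a single label. The sketch below is a self-contained toy that uses this idiom; demo_strerror, demo_sum_range and the DEMO_ERR_* codes are hypothetical stand-ins and are not part of tskit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ERR_NO_MEMORY (-1)
#define DEMO_ERR_BAD_PARAM (-2)

/* Hypothetical stand-in for tsk_strerror(): maps an error code to a message. */
const char *
demo_strerror(int err)
{
    switch (err) {
        case 0:
            return "Normal exit condition";
        case DEMO_ERR_NO_MEMORY:
            return "Out of memory";
        case DEMO_ERR_BAD_PARAM:
            return "Bad parameter value";
        default:
            return "Unknown error";
    }
}

/* Sums the integers 0..n-1 via a scratch buffer; returns 0 or a negative code. */
int
demo_sum_range(int n, long *result)
{
    int ret = 0;
    int j;
    long total = 0;
    int *buffer = NULL;

    if (n <= 0) {
        ret = DEMO_ERR_BAD_PARAM;
        goto out;
    }
    buffer = malloc((size_t) n * sizeof(*buffer));
    if (buffer == NULL) {
        ret = DEMO_ERR_NO_MEMORY;
        goto out;
    }
    for (j = 0; j < n; j++) {
        buffer[j] = j;
        total += buffer[j];
    }
    *result = total;
out:
    /* Single exit path: cleanup runs on success and on every error. */
    free(buffer);
    return ret;
}
```

Because the result is written only on success, a caller that checks the return value never observes a partially computed output.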

Memory allocation strategy

To reduce the frequency of memory allocations tskit pre-allocates space for additional table rows in each table, along with space for the contents of ragged columns. The default behaviour is to start with space for 1,024 rows in each table and 65,536 bytes in each ragged column. The table then grows as needed by doubling, until a maximum pre-allocation of 2,097,152 rows for a table or 104,857,600 bytes for a ragged column. This behaviour can be disabled and a fixed increment used, on a per-table and per-ragged-column basis using the tsk_X_table_set_max_rows_increment and tsk_provenance_table_set_max_X_length_increment methods where X is the name of the table or column.
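
As a rough illustration of the default growth policy for rows, the sketch below computes the next capacity from the current one. It assumes that the "maximum pre-allocation" figure is a cap on how many rows are added in a single growth step; this is one reading of the paragraph above, and demo_next_max_rows is a hypothetical helper, not the tskit implementation.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DEFAULT_MAX_ROWS 1024u
#define DEMO_MAX_ROWS_INCREMENT 2097152u

/* Returns the capacity after one growth step under the assumed policy. */
uint64_t
demo_next_max_rows(uint64_t max_rows)
{
    uint64_t increment = max_rows; /* doubling adds the current capacity again */

    if (max_rows < DEMO_DEFAULT_MAX_ROWS) {
        /* Start with the default initial allocation. */
        return DEMO_DEFAULT_MAX_ROWS;
    }
    if (increment > DEMO_MAX_ROWS_INCREMENT) {
        /* Assumed cap: never add more than this many rows in one step. */
        increment = DEMO_MAX_ROWS_INCREMENT;
    }
    return max_rows + increment;
}
```

Under this reading, capacity doubles up to 2,097,152 rows and then grows linearly in steps of 2,097,152.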

Using tskit in your project

Tskit is built as a standard C library and so there are many different ways in which it can be included in downstream projects. It is possible to install tskit onto a system (i.e., installing a shared library and header files to standard locations on Unix) and link against it, but there are many different ways in which this can go wrong. In the interest of simplicity and improving the end-user experience we recommend embedding tskit directly into your applications.

There are many different build systems and approaches to compiling code, and so it’s not possible to give definitive documentation on how tskit should be included in downstream projects. Please see the build examples repo for some examples of how to incorporate tskit into different project structures and build systems.

Tskit uses the meson build system internally, and supports being used as a meson subproject. We show an example in which this is combined with the tskit distribution tarball to neatly abstract many details of cross-platform C development.

Some users may choose to check the source for tskit directly into their source control repositories. If you wish to do this, the code is in the c subdirectory of the tskit repo. The following header files should be placed in the search path: subprojects/kastore/kastore.h, tskit.h, and tskit/*.h. The C files subprojects/kastore/kastore.c and tskit/*.c should be compiled. For those who wish to minimise the size of their compiled binaries, tskit is quite modular, and C files can be omitted if not needed. For example, if you are just using the Tables API then only the files tskit/core.[c,h] and tskit/tables.[c,h] are needed.

However you include tskit in your project, please ensure that it is a released version. Released versions are tagged on GitHub using the convention C_{VERSION}. The code can either be downloaded from GitHub on the releases page, where each release has a distribution tarball (for example, https://github.com/tskit-dev/tskit/releases/download/C_1.0.0/tskit-1.0.0.tar.xz), or checked out using git. For example, to check out the C_1.0.0 release:

$ git clone https://github.com/tskit-dev/tskit.git
$ cd tskit
$ git checkout C_1.0.0

Basic Types

typedef int32_t tsk_id_t

Tskit Object IDs.

All objects in tskit are referred to by integer IDs corresponding to the row they occupy in the relevant table. The tsk_id_t type should be used when manipulating these ID values. The reserved value TSK_NULL (-1) defines missing data.

typedef uint64_t tsk_size_t

Tskit sizes.

The tsk_size_t type is an unsigned integer used for any size or count value.

typedef uint32_t tsk_flags_t

Container for bitwise flags.

Bitwise flags are used in tskit as a column type and also as a way to specify options to API functions.
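
To show how an ID type with a reserved null value is typically used, the sketch below walks up a parent array until it reaches the null sentinel. The demo_ typedefs mirror the definitions above under local names, and the parent array is an illustrative assumption rather than a tskit structure.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins mirroring the typedefs described above. */
typedef int32_t demo_id_t;
typedef uint64_t demo_size_t;
#define DEMO_NULL ((demo_id_t) -1)

/* Counts the edges on the path from node u to its root in a parent array. */
demo_size_t
demo_depth(const demo_id_t *parent, demo_id_t u)
{
    demo_size_t depth = 0;

    while (parent[u] != DEMO_NULL) {
        u = parent[u];
        depth++;
    }
    return depth;
}
```

Using a signed ID type is what allows -1 to act as the missing-data marker, while counts use the unsigned size type.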

Common options

TSK_DEBUG (1u << 31)

Turn on debugging output. Not supported by all functions.

TSK_NO_INIT (1u << 30)

Do not initialise the parameter object.

TSK_NO_CHECK_INTEGRITY (1u << 29)

Do not run integrity checks before performing an operation. This performance optimisation should not be used unless the calling code can guarantee reference integrity within the table collection. References to rows not in the table or bad offsets will result in undefined behaviour.

TSK_TAKE_OWNERSHIP (1u << 28)

Instead of taking a copy of input objects, the function should take ownership of them and manage their lifecycle. The caller specifying this flag should no longer modify or free the object or objects passed. See individual functions using this flag for what object it applies to.
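
Option flags of this kind are combined with bitwise OR and tested with bitwise AND. The sketch below uses the same bit positions as the common options above but under local DEMO_ names, so that it compiles without the tskit headers; the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t demo_flags_t;

/* Same bit positions as the common options listed above, under local names. */
#define DEMO_DEBUG (1u << 31)
#define DEMO_NO_INIT (1u << 30)
#define DEMO_NO_CHECK_INTEGRITY (1u << 29)
#define DEMO_TAKE_OWNERSHIP (1u << 28)

/* Returns true if every bit in flag is set in options. */
bool
demo_has_flag(demo_flags_t options, demo_flags_t flag)
{
    return (options & flag) == flag;
}

/* Returns options with the bits in flag cleared. */
demo_flags_t
demo_clear_flag(demo_flags_t options, demo_flags_t flag)
{
    return options & ~flag;
}
```

Passing 0 selects the default behaviour, since no bits are set.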

Tables API

The tables API section of tskit is defined in the tskit/tables.h header.

Table collections

struct tsk_table_collection_t

A collection of tables defining the data for a tree sequence.

Public Members

double sequence_length

The sequence length defining the tree sequence’s coordinate space.

char *time_units

The units of the time dimension.

char *metadata

The tree-sequence metadata.

char *metadata_schema

The metadata schema.

tsk_individual_table_t individuals

The individual table.

tsk_node_table_t nodes

The node table.

tsk_edge_table_t edges

The edge table.

tsk_migration_table_t migrations

The migration table.

tsk_site_table_t sites

The site table.

tsk_mutation_table_t mutations

The mutation table.

tsk_population_table_t populations

The population table.

tsk_provenance_table_t provenances

The provenance table.

struct tsk_bookmark_t

A bookmark recording the position of all the tables in a table collection.

Public Members

tsk_size_t individuals

The position in the individual table.

tsk_size_t nodes

The position in the node table.

tsk_size_t edges

The position in the edge table.

tsk_size_t migrations

The position in the migration table.

tsk_size_t sites

The position in the site table.

tsk_size_t mutations

The position in the mutation table.

tsk_size_t populations

The position in the population table.

tsk_size_t provenances

The position in the provenance table.

int tsk_table_collection_init(tsk_table_collection_t *self, tsk_flags_t options)

Initialises the table collection by allocating the internal memory and initialising all the constituent tables.

This must be called before any operations are performed on the table collection. See the API structure for details on how objects are initialised and freed.

Options

Options can be specified by providing bitwise flags:

Parameters
  • self – A pointer to an uninitialised tsk_table_collection_t object.

  • options – Allocation time options as above.

Returns

Return 0 on success or a negative value on failure.

int tsk_table_collection_free(tsk_table_collection_t *self)

Free the internal memory for the specified table collection.

Parameters
  • self – A pointer to a tsk_table_collection_t object.

Returns

Always returns 0.

int tsk_table_collection_clear(tsk_table_collection_t *self, tsk_flags_t options)

Clears data tables (and optionally provenances and metadata) in this table collection.

By default this operation clears all tables except the provenance table, retaining table metadata schemas and the tree-sequence level metadata and schema.

No memory is freed as a result of this operation; please use tsk_table_collection_free() to free internal resources.

Options

Options can be specified by providing one or more of the following bitwise flags:

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • options – Bitwise clearing options.

Returns

Return 0 on success or a negative value on failure.

bool tsk_table_collection_equals(const tsk_table_collection_t *self, const tsk_table_collection_t *other, tsk_flags_t options)

Returns true if the data in the specified table collection is equal to the data in this table collection.

Returns true if the two table collections are equal. The indexes are not considered as these are derived from the tables. We also do not consider the file_uuid, since it is a property of the file that the set of tables is stored in.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) two table collections are considered equal if all of the tables are byte-wise identical, and the sequence lengths, metadata and metadata schemas of the two table collections are identical.

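Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • other – A pointer to a tsk_table_collection_t object.

  • options – Bitwise comparison options.

Returns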

Return true if the specified table collection is equal to this table collection.

int tsk_table_collection_copy(const tsk_table_collection_t *self, tsk_table_collection_t *dest, tsk_flags_t options)

Copies the state of this table collection into the specified destination.

By default the method initialises the specified destination table collection. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Options

Options can be specified by providing bitwise flags:

TSK_COPY_FILE_UUID

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • dest – A pointer to a tsk_table_collection_t object. If the TSK_NO_INIT option is specified, this must be an initialised table collection. If not, it must be an uninitialised table collection.

  • options – Bitwise option flags.

Returns

Return 0 on success or a negative value on failure.

void tsk_table_collection_print_state(const tsk_table_collection_t *self, FILE *out)

Print out the state of this table collection to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • out – The stream to write the state to.

int tsk_table_collection_load(tsk_table_collection_t *self, const char *filename, tsk_flags_t options)

Load a table collection from a file path.

Loads the data from the specified file into this table collection. By default, the table collection is also initialised. The resources allocated must be freed using tsk_table_collection_free() even in error conditions.

If the TSK_NO_INIT option is set, the table collection is not initialised, allowing an already initialised table collection to be overwritten with the data from a file.

If the file contains multiple table collections, this function will load the first. Please see tsk_table_collection_loadf() for details on how to sequentially load table collections from a stream.

If the TSK_LOAD_SKIP_TABLES option is set, only the non-table information from the table collection will be read, leaving all tables with zero rows and no metadata or schema. If the TSK_LOAD_SKIP_REFERENCE_SEQUENCE option is set, the table collection is read without loading the reference sequence.

Options

Options can be specified by providing one or more of the following bitwise flags:

Examples

int ret;
tsk_table_collection_t tables;
ret = tsk_table_collection_load(&tables, "data.trees", 0);
if (ret != 0) {
    fprintf(stderr, "Load error:%s\n", tsk_strerror(ret));
    exit(EXIT_FAILURE);
}

Parameters
  • self – A pointer to an uninitialised tsk_table_collection_t object if the TSK_NO_INIT option is not set (default), or an initialised tsk_table_collection_t otherwise.

  • filename – A NULL terminated string containing the filename.

  • options – Bitwise options. See above for details.

Returns

Return 0 on success or a negative value on failure.

int tsk_table_collection_loadf(tsk_table_collection_t *self, FILE *file, tsk_flags_t options)

Load a table collection from a stream.

Loads a tables definition from the specified file stream to this table collection. By default, the table collection is also initialised. The resources allocated must be freed using tsk_table_collection_free() even in error conditions.

If the TSK_NO_INIT option is set, the table collection is not initialised, allowing an already initialised table collection to be overwritten with the data from a file.

The stream can be an arbitrary file descriptor, for example a network socket. No seek operations are performed.

If the stream contains multiple table collection definitions, this function will load the next table collection from the stream. If the stream contains no more table collection definitions the error value TSK_ERR_EOF will be returned. Note that EOF is only returned in the case where zero bytes are read from the stream — malformed files or other errors will result in different error conditions. Please see the File streaming section for an example of how to sequentially load tree sequences from a stream.

Please note that this streaming behaviour is not supported if the TSK_LOAD_SKIP_TABLES or TSK_LOAD_SKIP_REFERENCE_SEQUENCE option is set. If the TSK_LOAD_SKIP_TABLES option is set, only the non-table information from the table collection will be read, leaving all tables with zero rows and no metadata or schema. If the TSK_LOAD_SKIP_REFERENCE_SEQUENCE option is set, the table collection is read without loading the reference sequence. When attempting to read from a stream with multiple table collection definitions and either of these two options set, the requested information from the first table collection will be read on the first call to tsk_table_collection_loadf(), with subsequent calls leading to errors.

Options

Options can be specified by providing one or more of the following bitwise flags:

Parameters
  • self – A pointer to an uninitialised tsk_table_collection_t object if the TSK_NO_INIT option is not set (default), or an initialised tsk_table_collection_t otherwise.

  • file – A FILE stream opened in an appropriate mode for reading (e.g. “r”, “r+” or “w+”) positioned at the beginning of a table collection definition.

  • options – Bitwise options. See above for details.

Returns

Return 0 on success or a negative value on failure.
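
The end-of-stream convention described above (EOF only when zero bytes are read, a different error otherwise) leads to a simple read loop. The sketch below is a toy with fixed-size integer records; demo_read_record, demo_count_records and the DEMO_ERR_* codes are hypothetical, and the record format has nothing to do with the tskit file format.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_ERR_EOF (-1)
#define DEMO_ERR_IO (-2)

/* Reads one fixed-size record. EOF is reported only when zero bytes were read;
 * a partial record is a different error, mirroring the distinction made above. */
int
demo_read_record(FILE *file, int *record)
{
    size_t n = fread(record, 1, sizeof(*record), file);

    if (n == 0 && feof(file)) {
        return DEMO_ERR_EOF;
    }
    if (n != sizeof(*record)) {
        return DEMO_ERR_IO;
    }
    return 0;
}

/* Reads records until EOF; returns the number read, or a negative error code. */
int
demo_count_records(FILE *file)
{
    int ret, record;
    int count = 0;

    while ((ret = demo_read_record(file, &record)) == 0) {
        count++;
    }
    return ret == DEMO_ERR_EOF ? count : ret;
}
```

The loop treats the EOF code as normal termination and propagates every other negative value to the caller.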

int tsk_table_collection_dump(const tsk_table_collection_t *self, const char *filename, tsk_flags_t options)

Write a table collection to file.

Writes the data from this table collection to the specified file.

If an error occurs the file path is deleted, ensuring that only complete and well formed files will be written.

Examples

int ret;
tsk_table_collection_t tables;

ret = tsk_table_collection_init(&tables, 0);
error_check(ret);
tables.sequence_length = 1.0;
// Write out the empty tree sequence
ret = tsk_table_collection_dump(&tables, "empty.trees", 0);
error_check(ret);

Parameters
  • self – A pointer to an initialised tsk_table_collection_t object.

  • filename – A NULL terminated string containing the filename.

  • options – Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

int tsk_table_collection_dumpf(const tsk_table_collection_t *self, FILE *file, tsk_flags_t options)

Write a table collection to a stream.

Writes the data from this table collection to the specified FILE stream. Semantics are identical to tsk_table_collection_dump().

Please see the File streaming section for an example of how to sequentially dump and load tree sequences from a stream.

Parameters
  • self – A pointer to an initialised tsk_table_collection_t object.

  • file – A FILE stream opened in an appropriate mode for writing (e.g. “w”, “a”, “r+” or “w+”).

  • options – Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

int tsk_table_collection_record_num_rows(const tsk_table_collection_t *self, tsk_bookmark_t *bookmark)

Record the number of rows in each table in the specified tsk_bookmark_t object.

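Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • bookmark – A pointer to a tsk_bookmark_t object in which the number of rows in each table is recorded.

Returns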

Return 0 on success or a negative value on failure.

int tsk_table_collection_truncate(tsk_table_collection_t *self, tsk_bookmark_t *bookmark)

Truncates the tables in this table collection according to the specified bookmark.

Truncate the tables in this collection so that each one has the number of rows specified in the parameter tsk_bookmark_t. Use the tsk_table_collection_record_num_rows() function to record the number of rows for each table in a table collection at a particular time.

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • bookmark – The number of rows to retain in each table.

Returns

Return 0 on success or a negative value on failure.
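
Recording a bookmark and later truncating to it gives a simple checkpoint-and-rollback pattern. The sketch below models this with a hypothetical two-table collection that tracks only row counts; the demo_ types and functions are illustrative, and rejecting a bookmark that lies beyond the current table end is an assumption of this toy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-table collection that tracks only row counts. */
typedef struct {
    size_t num_nodes;
    size_t num_edges;
} demo_tables_t;

typedef struct {
    size_t nodes;
    size_t edges;
} demo_bookmark_t;

/* Records the current number of rows in each table. */
int
demo_tables_record_num_rows(const demo_tables_t *self, demo_bookmark_t *bookmark)
{
    bookmark->nodes = self->num_nodes;
    bookmark->edges = self->num_edges;
    return 0;
}

/* Drops rows added after the bookmark; a bookmark beyond the end is an error. */
int
demo_tables_truncate(demo_tables_t *self, const demo_bookmark_t *bookmark)
{
    if (bookmark->nodes > self->num_nodes || bookmark->edges > self->num_edges) {
        return -1;
    }
    self->num_nodes = bookmark->nodes;
    self->num_edges = bookmark->edges;
    return 0;
}
```

A typical use is to record a bookmark, append rows speculatively, and truncate back to the bookmark if the work has to be abandoned.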

int tsk_table_collection_sort(tsk_table_collection_t *self, const tsk_bookmark_t *start, tsk_flags_t options)

Sorts the tables in this collection.

Some of the tables in a table collection must satisfy specific sortedness requirements in order to define a valid tree sequence. This method sorts the edge, site, mutation and individual tables such that these requirements are guaranteed to be fulfilled. The node, population and provenance tables do not have any sortedness requirements, and are therefore ignored by this method.

The specified tsk_bookmark_t allows us to specify a start position for sorting in each of the tables; rows before this value are assumed to already be in sorted order and this information is used to make sorting more efficient. Positions in tables that are not sorted (node, population and provenance) are ignored and can be set to arbitrary values.

The table collection will always be unindexed after sort successfully completes.

For more control over the sorting process, see the Low-level sorting section.

Options

Options can be specified by providing one or more of the following bitwise flags:

TSK_NO_CHECK_INTEGRITY

Do not run integrity checks using tsk_table_collection_check_integrity() before sorting, potentially leading to a small reduction in execution time. This performance optimisation should not be used unless the calling code can guarantee reference integrity within the table collection. References to rows not in the table or bad offsets will result in undefined behaviour.

Note

The current implementation may sort in a way that exceeds these requirements, but this behaviour should not be relied upon and later versions may weaken the level of sortedness. However, the method does guarantee that the resulting tables describe a valid tree sequence.

Warning

Sorting migrations is currently not supported and an error will be raised if a table collection containing a non-empty migration table is specified.

Warning

The current implementation only supports specifying a start position for the edge table and in a limited form for the site, mutation and individual tables. Specifying a non-zero migration start position results in an error. The start positions for the site, mutation and individual tables can either be 0 or the length of the respective tables, allowing these tables to either be fully sorted, or not sorted at all.

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • start – The position to begin sorting in each table; all rows less than this position must fulfill the tree sequence sortedness requirements. If this is NULL, sort all rows.

  • options – Sort options.

Returns

Return 0 on success or a negative value on failure.

int tsk_table_collection_individual_topological_sort(tsk_table_collection_t *self, tsk_flags_t options)

Sorts the individual table in this collection.

Sorts the individual table in place, so that parents come before children, and the parent column is remapped as required. Node references to individuals are also updated.

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • options – Sort options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

int tsk_table_collection_canonicalise(tsk_table_collection_t *self, tsk_flags_t options)

Puts the tables into canonical form.

Put tables into canonical form such that randomly reshuffled tables are guaranteed to always be sorted in the same order, and redundant information is removed. The canonical sorting exceeds the usual tree sequence sortedness requirements.

Options:

Options can be specified by providing one or more of the following bitwise flags:

Returns

Return 0 on success or a negative value on failure.

int tsk_table_collection_simplify(tsk_table_collection_t *self, const tsk_id_t *samples, tsk_size_t num_samples, tsk_flags_t options, tsk_id_t *node_map)

Simplify the tables to remove redundant information.

Simplification transforms the tables to remove redundancy and canonicalise tree sequence data. See the simplification tutorial for more details.

A mapping from the node IDs in the table before simplification to their equivalent values after simplification can be obtained via the node_map argument. If this is non NULL, node_map[u] will contain the new ID for node u after simplification, or TSK_NULL if the node has been removed. Thus, node_map must be an array of at least self->nodes.num_rows tsk_id_t values. The table collection will always be unindexed after simplify successfully completes.

Options:

Options can be specified by providing one or more of the following bitwise flags:

Note

Migrations are currently not supported by simplify, and an error will be raised if we attempt to call simplify on a table collection with greater than zero migrations. See https://github.com/tskit-dev/tskit/issues/20

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • samples – Either NULL or an array of num_samples distinct and valid node IDs. If non-null the nodes in this array will be marked as samples in the output. If NULL, the num_samples parameter is ignored and the samples in the output will be the same as the samples in the input. This is equivalent to populating the samples array with all of the sample nodes in the input in increasing order of ID.

  • num_samples – The number of node IDs in the input samples array. Ignored if the samples array is NULL.

  • options – Simplify options; see above for the available bitwise flags. For the default behaviour, a value of 0 should be provided.

  • node_map – If not NULL, this array will be filled to define the mapping between node IDs in the table collection before and after simplification.

Returns

Return 0 on success or a negative value on failure.
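
A node_map of the kind described above is usually applied to any node IDs that the caller stored before simplification. The sketch below shows one way to do this on a plain array; demo_remap_ids and the demo_ typedefs are hypothetical, and the example map is made up for illustration rather than produced by tskit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t demo_id_t;
#define DEMO_NULL ((demo_id_t) -1)

/* Rewrites ids[] in place from pre-simplify to post-simplify node IDs using
 * node_map. Returns the number of entries that are DEMO_NULL afterwards. */
size_t
demo_remap_ids(const demo_id_t *node_map, demo_id_t *ids, size_t num_ids)
{
    size_t j;
    size_t num_null = 0;

    for (j = 0; j < num_ids; j++) {
        if (ids[j] != DEMO_NULL) {
            /* Removed nodes map to DEMO_NULL, retained nodes to their new ID. */
            ids[j] = node_map[ids[j]];
        }
        if (ids[j] == DEMO_NULL) {
            num_null++;
        }
    }
    return num_null;
}
```

Entries that were already null are left untouched, so the function can be applied to columns that use the null ID for missing references.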

int tsk_table_collection_subset(tsk_table_collection_t *self, const tsk_id_t *nodes, tsk_size_t num_nodes, tsk_flags_t options)

Subsets and reorders a table collection according to an array of nodes.

Reduces the table collection to contain only the entries referring to the provided list of nodes, with nodes reordered according to the order they appear in the nodes argument. Specifically, this subsets and reorders each of the tables as follows (but see options, below):

  1. Nodes: if in the list of nodes, and in the order provided.

  2. Individuals: if referred to by a retained node.

  3. Populations: if referred to by a retained node, and in the order first seen when traversing the list of retained nodes.

  4. Edges: if both parent and child are retained nodes.

  5. Mutations: if the mutation’s node is a retained node.

  6. Sites: if any mutations remain at the site after removing mutations.

Retained individuals, edges, mutations, and sites appear in the same order as in the original tables. Note that only the information directly associated with the provided nodes is retained - for instance, subsetting to nodes=[A, B] does not retain nodes ancestral to A and B, and only retains the individuals A and B are in, and not their parents.

This function does not require the tables to be sorted.

Options:

Options can be specified by providing one or more of the following bitwise flags:

Note

Migrations are currently not supported by subset, and an error will be raised if we attempt to call subset on a table collection with greater than zero migrations.

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • nodes – An array of num_nodes valid node IDs.

  • num_nodes – The number of node IDs in the input nodes array.

  • options – Bitwise option flags.

Returns

Return 0 on success or a negative value on failure.
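
The node and edge rules listed above can be modelled on plain arrays. The sketch below is a toy, not the tskit implementation: it covers only nodes and edges, and demo_edge_t, demo_subset_node_map and demo_subset_edges are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t demo_id_t;
#define DEMO_NULL ((demo_id_t) -1)

typedef struct {
    demo_id_t parent;
    demo_id_t child;
} demo_edge_t;

/* Fills node_map (length num_old_nodes) so that node_map[nodes[k]] == k and
 * every node that is not listed maps to DEMO_NULL. */
void
demo_subset_node_map(const demo_id_t *nodes, size_t num_nodes,
    demo_id_t *node_map, size_t num_old_nodes)
{
    size_t j;

    for (j = 0; j < num_old_nodes; j++) {
        node_map[j] = DEMO_NULL;
    }
    for (j = 0; j < num_nodes; j++) {
        node_map[nodes[j]] = (demo_id_t) j;
    }
}

/* Keeps, in their original order, the edges whose parent and child are both
 * retained, rewriting them to the new IDs. Returns the number kept. */
size_t
demo_subset_edges(demo_edge_t *edges, size_t num_edges, const demo_id_t *node_map)
{
    size_t j;
    size_t num_kept = 0;

    for (j = 0; j < num_edges; j++) {
        demo_id_t parent = node_map[edges[j].parent];
        demo_id_t child = node_map[edges[j].child];

        if (parent != DEMO_NULL && child != DEMO_NULL) {
            edges[num_kept].parent = parent;
            edges[num_kept].child = child;
            num_kept++;
        }
    }
    return num_kept;
}
```

Nodes take the order given in the nodes argument, while retained edges keep their original relative order, matching the description above.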

int tsk_table_collection_union(tsk_table_collection_t *self, const tsk_table_collection_t *other, const tsk_id_t *other_node_mapping, tsk_flags_t options)

Forms the node-wise union of two table collections.

Expands this table collection by adding the non-shared portions of another table collection to itself. The other_node_mapping encodes which nodes in other are equivalent to a node in self. The positions in the other_node_mapping array correspond to node ids in other, and the elements encode the equivalent node id in self or TSK_NULL if the node is exclusive to other. Nodes that are exclusive to other are added to self, along with:

  1. Individuals which are new to self.

  2. Edges whose parent or child are new to self.

  3. Sites which were not present in self.

  4. Mutations whose nodes are new to self.

By default, populations of newly added nodes are assumed to be new populations, and added to the population table as well.

This operation will also sort the resulting tables, so the tables may change even if nothing new is added, if the original tables were not sorted.

Options:

Options can be specified by providing one or more of the following bitwise flags:

Note

Migrations are currently not supported by union, and an error will be raised if union is called on a table collection with migrations.

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • other – A pointer to a tsk_table_collection_t object.

  • other_node_mapping – An array of node IDs that relate nodes in other to nodes in self: the k-th element of other_node_mapping should be the index of the equivalent node in self, or TSK_NULL if the node is not present in self (in which case it will be added to self).

  • options – Union options; see above for the available bitwise flags. For the default behaviour, a value of 0 should be provided.

Returns

Return 0 on success or a negative value on failure.
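The layout of other_node_mapping described above can be sketched as follows. This is a minimal illustration that does not call tskit: sketch_id_t and SKETCH_NULL are stand-ins for tsk_id_t and TSK_NULL, and fill_shared_prefix_mapping is a hypothetical helper that assumes the first num_shared nodes of other are the same nodes, in the same order, as the first num_shared nodes of self.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles without tskit; real code would include
 * <tskit.h> and use tsk_id_t and TSK_NULL instead. */
typedef int32_t sketch_id_t;
#define SKETCH_NULL ((sketch_id_t) -1)

/* Hypothetical helper. Position k of the mapping refers to node k in `other`;
 * the stored value is the equivalent node ID in `self`, or SKETCH_NULL when
 * the node exists only in `other`. Assumes the shared nodes are the first
 * num_shared nodes of both collections. */
static void
fill_shared_prefix_mapping(
    sketch_id_t *mapping, size_t num_other_nodes, size_t num_shared)
{
    size_t k;

    assert(mapping != NULL || num_other_nodes == 0);
    assert(num_shared <= num_other_nodes);
    for (k = 0; k < num_other_nodes; k++) {
        mapping[k] = k < num_shared ? (sketch_id_t) k : SKETCH_NULL;
    }
}
```

An array filled this way has one entry per node in other, which is the shape that the other_node_mapping parameter expects; real mappings need not follow the shared-prefix assumption.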

int tsk_table_collection_set_time_units(tsk_table_collection_t *self, const char *time_units, tsk_size_t time_units_length)

Set the time_units.

Copies the time_units string to this table collection, replacing any existing value.

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • time_units – A pointer to a char array.

  • time_units_length – The size of the time units string in bytes.

Returns

Return 0 on success or a negative value on failure.

int tsk_table_collection_set_metadata(tsk_table_collection_t *self, const char *metadata, tsk_size_t metadata_length)

Set the metadata.

Copies the metadata string to this table collection, replacing any existing metadata.

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • metadata – A pointer to a char array.

  • metadata_length – The size of the metadata in bytes.

Returns

Return 0 on success or a negative value on failure.

int tsk_table_collection_set_metadata_schema(tsk_table_collection_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table collection, replacing any existing schema.

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns

Return 0 on success or a negative value on failure.

bool tsk_table_collection_has_index(const tsk_table_collection_t *self, tsk_flags_t options)

Returns true if this table collection is indexed.

This method returns true if the table collection has an index for the edge table. It guarantees that the index exists and that it covers the same number of edges as are in the edge table. It does not guarantee that the index is valid; for example, the index will be stale if the rows in the edge table have been permuted in some way since the index was built.

See the Table indexes section for details on the index life-cycle.

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • options – Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return true if there is an index present for this table collection.

int tsk_table_collection_drop_index(tsk_table_collection_t *self, tsk_flags_t options)

Deletes the indexes for this table collection.

Unconditionally drop the indexes that may be present for this table collection. It is not an error to call this method on an unindexed table collection. See the Table indexes section for details on the index life-cycle.

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • options – Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Always returns 0.

int tsk_table_collection_build_index(tsk_table_collection_t *self, tsk_flags_t options)

Builds indexes for this table collection.

Builds the tree traversal indexes for this table collection. Any existing index is first dropped using tsk_table_collection_drop_index(). See the Table indexes section for details on the index life-cycle.

Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • options – Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

tsk_id_t tsk_table_collection_check_integrity(const tsk_table_collection_t *self, tsk_flags_t options)

Runs integrity checks on this table collection.

Checks the integrity of this table collection. The default checks (i.e., with options = 0) guarantee the integrity of memory and entity references within the table collection. All positions along the genome are checked to see if they are finite values and within the required bounds. Time values are checked to see if they are finite or marked as unknown. Consistency of the direction of inheritance is also checked: that parents are older than their children, that mutations are not more recent than their nodes or their mutation parents, and so on.

To check if a set of tables fulfills the requirements needed for a valid tree sequence, use the TSK_CHECK_TREES option. When this method is called with TSK_CHECK_TREES, the number of trees in the tree sequence is returned. Thus, to check for errors client code should verify that the return value is less than zero. All other options will return zero on success and a negative value on failure.

More fine-grained checks can be achieved using bitwise combinations of the other options.

Options:

Options can be specified by providing one or more of the following bitwise flags:

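Parameters
  • self – A pointer to a tsk_table_collection_t object.

  • options – Bitwise option flags.

Returns

Return a negative error value if any problems are detected in the tree sequence. If the TSK_CHECK_TREES option is provided, the number of trees in the tree sequence is returned on success; otherwise zero is returned on success.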

Individuals

struct tsk_individual_t

A single individual defined by a row in the individual table.

See the data model section for the definition of an individual and its properties.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

tsk_flags_t flags

Bitwise flags.

const double *location

Spatial location. The number of dimensions is defined by location_length.

tsk_size_t location_length

Number of spatial dimensions.

tsk_id_t *parents

IDs of the parents. The number of parents is given by parents_length.

tsk_size_t parents_length

Number of parents.

const char *metadata

Metadata.

tsk_size_t metadata_length

Size of the metadata in bytes.

const tsk_id_t *nodes

An array of the nodes associated with this individual.

tsk_size_t nodes_length

The number of nodes associated with this individual.

struct tsk_individual_table_t

The individual table.

See the individual table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t location_length

The total length of the location column.

tsk_size_t parents_length

The total length of the parent column.

tsk_size_t metadata_length

The total length of the metadata column.

tsk_flags_t *flags

The flags column.

double *location

The location column.

tsk_size_t *location_offset

The location_offset column.

tsk_id_t *parents

The parents column.

tsk_size_t *parents_offset

The parents_offset column.

char *metadata

The metadata column.

tsk_size_t *metadata_offset

The metadata_offset column.

char *metadata_schema

The metadata schema.

int tsk_individual_table_init(tsk_individual_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Parameters
  • self – A pointer to an uninitialised tsk_individual_table_t object.

  • options – Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

int tsk_individual_table_free(tsk_individual_table_t *self)

Free the internal memory for the specified table.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

Returns

Always returns 0.

tsk_id_t tsk_individual_table_add_row(tsk_individual_table_t *self, tsk_flags_t flags, const double *location, tsk_size_t location_length, const tsk_id_t *parents, tsk_size_t parents_length, const char *metadata, tsk_size_t metadata_length)

Adds a row to this individual table.

Add a new individual with the specified flags, location, parents and metadata to the table. Copies of the location, parents and metadata parameters are taken immediately. See the table definition for details of the columns in this table.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

  • flags – The bitwise flags for the new individual.

  • location – A pointer to a double array representing the spatial location of the new individual. Can be NULL if location_length is 0.

  • location_length – The number of dimensions in the location. Note that this is the number of elements in the corresponding double array, not the number of bytes.

  • parents – A pointer to a tsk_id_t array representing the parents of the new individual. Can be NULL if parents_length is 0.

  • parents_length – The number of parents. Note that this is the number of elements in the corresponding tsk_id_t array, not the number of bytes.

  • metadata – The metadata to be associated with the new individual. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns

Return the ID of the newly added individual on success, or a negative value on failure.

int tsk_individual_table_update_row(tsk_individual_table_t *self, tsk_id_t index, tsk_flags_t flags, const double *location, tsk_size_t location_length, const tsk_id_t *parents, tsk_size_t parents_length, const char *metadata, tsk_size_t metadata_length)

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. Copies of the location, parents and metadata parameters are taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and would therefore be inefficient for bulk updates of such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to only update the memory for the row in question.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

  • index – The row to update.

  • flags – The bitwise flags for the individual.

  • location – A pointer to a double array representing the spatial location of the individual. Can be NULL if location_length is 0.

  • location_length – The number of dimensions in the location. Note that this is the number of elements in the corresponding double array, not the number of bytes.

  • parents – A pointer to a tsk_id_t array representing the parents of the individual. Can be NULL if parents_length is 0.

  • parents_length – The number of parents. Note that this is the number of elements in the corresponding tsk_id_t array, not the number of bytes.

  • metadata – The metadata to be associated with the individual. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns

Return 0 on success or a negative value on failure.

int tsk_individual_table_clear(tsk_individual_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_individual_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

Returns

Return 0 on success or a negative value on failure.

int tsk_individual_table_truncate(tsk_individual_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns

Return 0 on success or a negative value on failure.

int tsk_individual_table_extend(tsk_individual_table_t *self, const tsk_individual_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table. Note that metadata is copied as-is and is not checked for compatibility with any existing schema on this table.

Parameters
  • self – A pointer to a tsk_individual_table_t object where rows are to be added.

  • other – A pointer to a tsk_individual_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.
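The role of row_indexes in an extend operation can be sketched with a one-column stand-in for a table. This does not call tskit: extend_column is a hypothetical helper, a plain array of double values stands in for the table, and size_t stands in for the tskit ID and size types. It only illustrates the documented rule that a NULL row_indexes means "take the first num_rows rows", while a non-NULL array may repeat rows and list them in any order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: appends num_rows values from `other` to `dest`, which
 * currently holds dest_num_rows values and must have room for num_rows more.
 * Returns the new number of rows in dest. */
static size_t
extend_column(double *dest, size_t dest_num_rows, const double *other,
    size_t other_num_rows, size_t num_rows, const size_t *row_indexes)
{
    size_t j, source_row;

    for (j = 0; j < num_rows; j++) {
        /* With no index array, row j of the source is copied in order. */
        source_row = row_indexes == NULL ? j : row_indexes[j];
        assert(source_row < other_num_rows);
        dest[dest_num_rows + j] = other[source_row];
    }
    return dest_num_rows + num_rows;
}
```

For example, with a source column { 20.0, 21.0, 22.0 } and row_indexes { 2, 0, 2 }, the values appended are 22.0, 20.0 and 22.0, in that order.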

bool tsk_individual_table_equals(const tsk_individual_table_t *self, const tsk_individual_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

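Parameters
  • self – A pointer to a tsk_individual_table_t object.

  • other – A pointer to a tsk_individual_table_t object.

  • options – Bitwise comparison options.

Returns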

Return true if the specified table is equal to this table.

int tsk_individual_table_copy(const tsk_individual_table_t *self, tsk_individual_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Indexes that are present are also copied to the destination table.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

  • dest – A pointer to a tsk_individual_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised individual table. If not, it must be an uninitialised individual table.

  • options – Bitwise option flags.

Returns

Return 0 on success or a negative value on failure.

int tsk_individual_table_get_row(const tsk_individual_table_t *self, tsk_id_t index, tsk_individual_t *row)

Get the row at the specified index.

Updates the specified individual struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_individual_t struct that is updated to reflect the values in the specified row.

Returns

Return 0 on success or a negative value on failure.
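The reason the pointers in a returned row stay owned by the table can be illustrated with a sketch of how one row of a ragged column is located. This is not tskit code: ragged_view_t and ragged_row_view are hypothetical names, and uint64_t stands in for tsk_size_t. The sketch assumes the usual ragged-column layout, in which an offset array with num_rows + 1 entries marks where each row starts and ends within a single flat column.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one row of a ragged column of doubles. */
typedef struct {
    const double *values; /* points into the column; not owned by the view */
    size_t length;        /* number of elements, not bytes */
} ragged_view_t;

/* Locate row `row` of a ragged column. offsets has num_rows + 1 entries. */
static ragged_view_t
ragged_row_view(
    const double *column, const uint64_t *offsets, size_t num_rows, size_t row)
{
    ragged_view_t view;

    assert(row < num_rows);
    assert(offsets[row] <= offsets[row + 1]);
    view.values = column + offsets[row];
    view.length = (size_t) (offsets[row + 1] - offsets[row]);
    return view;
}
```

Because the view points straight into the column memory, it becomes unsafe to use once the column is reallocated or rewritten, which matches the validity rule stated for get_row above.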

int tsk_individual_table_set_metadata_schema(tsk_individual_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns

Return 0 on success or a negative value on failure.

void tsk_individual_table_print_state(const tsk_individual_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

  • out – The stream to write the summary to.

int tsk_individual_table_set_columns(tsk_individual_table_t *self, tsk_size_t num_rows, const tsk_flags_t *flags, const double *location, const tsk_size_t *location_offset, const tsk_id_t *parents, const tsk_size_t *parents_offset, const char *metadata, const tsk_size_t *metadata_offset)

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • flags – The array of tsk_flags_t flag values to be copied.

  • location – The array of double location values to be copied.

  • location_offset – The array of tsk_size_t location offset values to be copied.

  • parents – The array of tsk_id_t parent values to be copied.

  • parents_offset – The array of tsk_size_t parent offset values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns

Return 0 on success or a negative value on failure.
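When preparing the offset arrays for a call like this, it can help to derive them from per-row lengths. The sketch below does not call tskit: lengths_to_offsets is a hypothetical helper and sketch_size_t is a stand-in for tsk_size_t. It assumes the usual ragged-column convention that an offset array has num_rows + 1 entries, starts at zero, and ends at the total length of the flattened column.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for tsk_size_t so the sketch compiles without tskit. */
typedef uint64_t sketch_size_t;

/* Hypothetical helper: converts per-row element counts into an offset array.
 * `offsets` must have room for num_rows + 1 entries. */
static void
lengths_to_offsets(
    const sketch_size_t *lengths, size_t num_rows, sketch_size_t *offsets)
{
    size_t j;

    assert(offsets != NULL);
    assert(lengths != NULL || num_rows == 0);
    offsets[0] = 0;
    for (j = 0; j < num_rows; j++) {
        /* Row j occupies the half-open range [offsets[j], offsets[j + 1]). */
        offsets[j + 1] = offsets[j] + lengths[j];
    }
}
```

Three individuals with 2, 0 and 3 location values would give the offsets 0, 2, 2, 5, with the flattened location array holding five doubles.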

int tsk_individual_table_append_columns(tsk_individual_table_t *self, tsk_size_t num_rows, const tsk_flags_t *flags, const double *location, const tsk_size_t *location_offset, const tsk_id_t *parents, const tsk_size_t *parents_offset, const char *metadata, const tsk_size_t *metadata_offset)

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • flags – The array of tsk_flags_t flag values to be copied.

  • location – The array of double location values to be copied.

  • location_offset – The array of tsk_size_t location offset values to be copied.

  • parents – The array of tsk_id_t parent values to be copied.

  • parents_offset – The array of tsk_size_t parent offset values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns

Return 0 on success or a negative value on failure.

int tsk_individual_table_set_max_rows_increment(tsk_individual_table_t *self, tsk_size_t max_rows_increment)

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.
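The difference between a fixed increment and a doubling strategy can be sketched with a deliberately simplified model. This is not tskit's implementation: next_capacity is a hypothetical function, and the real strategy described in the Memory allocation strategy section may apply additional bounds that are not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of growing a table's capacity until `required` rows fit.
 * increment == 0 selects doubling; any other value grows by that fixed step. */
static size_t
next_capacity(size_t capacity, size_t required, size_t increment)
{
    /* Guard so that doubling cannot overflow size_t. */
    assert(required <= ((size_t) -1) / 2);
    while (capacity < required) {
        if (increment == 0) {
            capacity = capacity == 0 ? 1 : capacity * 2;
        } else {
            capacity += increment;
        }
    }
    return capacity;
}
```

In this model, growing from 4 rows to hold 5 gives a capacity of 8 under doubling and 7 with a fixed increment of 3. A fixed increment keeps memory use tighter, at the cost of more frequent reallocations when many rows are added.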

int tsk_individual_table_set_max_metadata_length_increment(tsk_individual_table_t *self, tsk_size_t max_metadata_length_increment)

Controls the pre-allocation strategy for the metadata column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

  • max_metadata_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

int tsk_individual_table_set_max_location_length_increment(tsk_individual_table_t *self, tsk_size_t max_location_length_increment)

Controls the pre-allocation strategy for the location column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

  • max_location_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

int tsk_individual_table_set_max_parents_length_increment(tsk_individual_table_t *self, tsk_size_t max_parents_length_increment)

Controls the pre-allocation strategy for the parents column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_individual_table_t object.

  • max_parents_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

Nodes

struct tsk_node_t

A single node defined by a row in the node table.

See the data model section for the definition of a node and its properties.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

tsk_flags_t flags

Bitwise flags.

double time

Time.

tsk_id_t population

Population ID.

tsk_id_t individual

Individual ID.

const char *metadata

Metadata.

tsk_size_t metadata_length

Size of the metadata in bytes.

struct tsk_node_table_t

The node table.

See the node table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t metadata_length

The total length of the metadata column.

tsk_flags_t *flags

The flags column.

double *time

The time column.

tsk_id_t *population

The population column.

tsk_id_t *individual

The individual column.

char *metadata

The metadata column.

tsk_size_t *metadata_offset

The metadata_offset column.

char *metadata_schema

The metadata schema.

int tsk_node_table_init(tsk_node_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Parameters
  • self – A pointer to an uninitialised tsk_node_table_t object.

  • options – Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

int tsk_node_table_free(tsk_node_table_t *self)

Free the internal memory for the specified table.

Parameters
  • self – A pointer to a tsk_node_table_t object.

Returns

Always returns 0.

tsk_id_t tsk_node_table_add_row(tsk_node_table_t *self, tsk_flags_t flags, double time, tsk_id_t population, tsk_id_t individual, const char *metadata, tsk_size_t metadata_length)

Adds a row to this node table.

Add a new node with the specified flags, time, population, individual and metadata to the table. A copy of the metadata parameter is taken immediately. See the table definition for details of the columns in this table.

Parameters
  • self – A pointer to a tsk_node_table_t object.

  • flags – The bitwise flags for the new node.

  • time – The time for the new node.

  • population – The population for the new node. Set to TSK_NULL if not known.

  • individual – The individual for the new node. Set to TSK_NULL if not known.

  • metadata – The metadata to be associated with the new node. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns

Return the ID of the newly added node on success, or a negative value on failure.

int tsk_node_table_update_row(tsk_node_table_t *self, tsk_id_t index, tsk_flags_t flags, double time, tsk_id_t population, tsk_id_t individual, const char *metadata, tsk_size_t metadata_length)

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. A copy of the metadata parameter is taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and would therefore be inefficient for bulk updates of such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to only update the memory for the row in question.

Parameters
  • self – A pointer to a tsk_node_table_t object.

  • index – The row to update.

  • flags – The bitwise flags for the node.

  • time – The time for the node.

  • population – The population for the node. Set to TSK_NULL if not known.

  • individual – The individual for the node. Set to TSK_NULL if not known.

  • metadata – The metadata to be associated with the node. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns

Return 0 on success or a negative value on failure.
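The warning above about ragged columns can be made concrete with a small model of an offset array. This is an illustration of why a size change is costly, not a description of how tskit performs the update: same_ragged_length and resize_ragged_row are hypothetical helpers, and uint64_t stands in for tsk_size_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the new value has the same length as the one stored for `row`,
 * in which case only that row's bytes would need to be overwritten. */
static bool
same_ragged_length(const uint64_t *offsets, size_t row, uint64_t new_length)
{
    return offsets[row + 1] - offsets[row] == new_length;
}

/* Adjust the offsets after changing the length of `row`. Every later offset
 * moves, so the data for every later row has to move as well. offsets has
 * num_rows + 1 entries. */
static void
resize_ragged_row(
    uint64_t *offsets, size_t num_rows, size_t row, uint64_t new_length)
{
    size_t j;
    uint64_t old_length;

    assert(row < num_rows);
    old_length = offsets[row + 1] - offsets[row];
    for (j = row + 1; j <= num_rows; j++) {
        offsets[j] = offsets[j] - old_length + new_length;
    }
}
```

Changing the metadata length of the first row shifts every subsequent offset, which is the worst case the warning describes; an update that keeps the length unchanged touches no offsets at all.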

int tsk_node_table_clear(tsk_node_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_node_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Parameters
  • self – A pointer to a tsk_node_table_t object.

Returns

Return 0 on success or a negative value on failure.

int tsk_node_table_truncate(tsk_node_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Parameters
  • self – A pointer to a tsk_node_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns

Return 0 on success or a negative value on failure.

int tsk_node_table_extend(tsk_node_table_t *self, const tsk_node_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table. Note that metadata is copied as-is and is not checked for compatibility with any existing schema on this table.

Parameters
  • self – A pointer to a tsk_node_table_t object where rows are to be added.

  • other – A pointer to a tsk_node_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

bool tsk_node_table_equals(const tsk_node_table_t *self, const tsk_node_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

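Parameters
  • self – A pointer to a tsk_node_table_t object.

  • other – A pointer to a tsk_node_table_t object.

  • options – Bitwise comparison options.

Returns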

Return true if the specified table is equal to this table.

int tsk_node_table_copy(const tsk_node_table_t *self, tsk_node_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters
  • self – A pointer to a tsk_node_table_t object.

  • dest – A pointer to a tsk_node_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised node table. If not, it must be an uninitialised node table.

  • options – Bitwise option flags.

Returns

Return 0 on success or a negative value on failure.

int tsk_node_table_get_row(const tsk_node_table_t *self, tsk_id_t index, tsk_node_t *row)

Get the row at the specified index.

Updates the specified node struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters
  • self – A pointer to a tsk_node_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_node_t struct that is updated to reflect the values in the specified row.

Returns

Return 0 on success or a negative value on failure.

int tsk_node_table_set_metadata_schema(tsk_node_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Parameters
  • self – A pointer to a tsk_node_table_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns

Return 0 on success or a negative value on failure.

void tsk_node_table_print_state(const tsk_node_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self – A pointer to a tsk_node_table_t object.

  • out – The stream to write the summary to.

int tsk_node_table_set_columns(tsk_node_table_t *self, tsk_size_t num_rows, const tsk_flags_t *flags, const double *time, const tsk_id_t *population, const tsk_id_t *individual, const char *metadata, const tsk_size_t *metadata_offset)

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters
  • self – A pointer to a tsk_node_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • flags – The array of tsk_flags_t values to be copied.

  • time – The array of double time values to be copied.

  • population – The array of tsk_id_t population values to be copied.

  • individual – The array of tsk_id_t individual values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns

Return 0 on success or a negative value on failure.

int tsk_node_table_append_columns(tsk_node_table_t *self, tsk_size_t num_rows, const tsk_flags_t *flags, const double *time, const tsk_id_t *population, const tsk_id_t *individual, const char *metadata, const tsk_size_t *metadata_offset)

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters
  • self – A pointer to a tsk_node_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • flags – The array of tsk_flags_t values to be copied.

  • time – The array of double time values to be copied.

  • population – The array of tsk_id_t population values to be copied.

  • individual – The array of tsk_id_t individual values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns

Return 0 on success or a negative value on failure.

int tsk_node_table_set_max_rows_increment(tsk_node_table_t *self, tsk_size_t max_rows_increment)

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_node_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.
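
To illustrate the difference between the two settings, here is a simplified standalone sketch of how a new capacity might be chosen when a table needs more rows. It is not tskit's implementation: the function name next_max_rows is hypothetical, uint64_t stands in for tsk_size_t, and the real default strategy has further details that are described in Memory allocation strategy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch (not tskit code): choose a capacity that can hold
 * `required` rows. increment == 0 selects doubling; any other value
 * grows the capacity in fixed steps of that many rows. */
uint64_t
next_max_rows(uint64_t max_rows, uint64_t required, uint64_t increment)
{
    uint64_t new_max = max_rows;

    while (new_max < required) {
        if (increment == 0) {
            /* Doubling: amortises the cost of many repeated appends. */
            assert(new_max <= UINT64_MAX / 2);
            new_max = new_max == 0 ? 1 : new_max * 2;
        } else {
            /* Fixed step: over-allocates by less than `increment` rows. */
            new_max += increment;
        }
    }
    return new_max;
}
```

A fixed increment keeps memory use predictable when the final table size is roughly known, while doubling suits tables whose size is not known in advance.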

int tsk_node_table_set_max_metadata_length_increment(tsk_node_table_t *self, tsk_size_t max_metadata_length_increment)

Controls the pre-allocation strategy for the metadata column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_node_table_t object.

  • max_metadata_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

Edges

struct tsk_edge_t

A single edge defined by a row in the edge table.

See the data model section for the definition of an edge and its properties.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

tsk_id_t parent

Parent node ID.

tsk_id_t child

Child node ID.

double left

Left coordinate.

double right

Right coordinate.

const char *metadata

Metadata.

tsk_size_t metadata_length

Size of the metadata in bytes.

struct tsk_edge_table_t

The edge table.

See the edge table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t metadata_length

The total length of the metadata column.

double *left

The left column.

double *right

The right column.

tsk_id_t *parent

The parent column.

tsk_id_t *child

The child column.

char *metadata

The metadata column.

tsk_size_t *metadata_offset

The metadata_offset column.

char *metadata_schema

The metadata schema.

tsk_flags_t options

Flags for this table.

int tsk_edge_table_init(tsk_edge_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Options

Options can be specified by providing one or more of the following bitwise flags:

Parameters
  • self – A pointer to an uninitialised tsk_edge_table_t object.

  • options – Allocation time options.

Returns

Return 0 on success or a negative value on failure.

int tsk_edge_table_free(tsk_edge_table_t *self)

Free the internal memory for the specified table.

Parameters
  • self – A pointer to a tsk_edge_table_t object.

Returns

Always returns 0.

tsk_id_t tsk_edge_table_add_row(tsk_edge_table_t *self, double left, double right, tsk_id_t parent, tsk_id_t child, const char *metadata, tsk_size_t metadata_length)

Adds a row to this edge table.

Add a new edge with the specified left, right, parent, child and metadata to the table. See the table definition for details of the columns in this table.

Parameters
  • self – A pointer to a tsk_edge_table_t object.

  • left – The left coordinate for the new edge.

  • right – The right coordinate for the new edge.

  • parent – The parent node for the new edge.

  • child – The child node for the new edge.

  • metadata – The metadata to be associated with the new edge. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns

Return the ID of the newly added edge on success, or a negative value on failure.

int tsk_edge_table_update_row(tsk_edge_table_t *self, tsk_id_t index, double left, double right, tsk_id_t parent, tsk_id_t child, const char *metadata, tsk_size_t metadata_length)

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. A copy of the metadata parameter is taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and is therefore inefficient for bulk updates of such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to update only the memory for the row in question.

Parameters
  • self – A pointer to a tsk_edge_table_t object.

  • index – The row to update.

  • left – The left coordinate for the edge.

  • right – The right coordinate for the edge.

  • parent – The parent node for the edge.

  • child – The child node for the edge.

  • metadata – The metadata to be associated with the edge. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns

Return 0 on success or a negative value on failure.
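
The cost described in the warning follows from the ragged-column layout, in which all rows share one contiguous byte buffer indexed by an offset array. The standalone sketch below is not tskit's internal code; ragged_update and its fixed-capacity buffer are assumptions made for illustration, and uint64_t stands in for tsk_size_t. It overwrites a row in place when its length is unchanged, and otherwise has to move the bytes of every later row and adjust every later offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch (not tskit code): replace row `row` of a ragged column.
 * data has room for `capacity` bytes; offset holds num_rows + 1 entries.
 * Returns 0 on success, or -1 if the new value would not fit. */
int
ragged_update(char *data, uint64_t capacity, uint64_t *offset,
    uint64_t num_rows, uint64_t row, const char *value, uint64_t value_length)
{
    uint64_t old_length, total, tail, j;

    assert(row < num_rows);
    old_length = offset[row + 1] - offset[row];
    total = offset[num_rows];

    if (value_length != old_length) {
        if (total - old_length + value_length > capacity) {
            return -1;
        }
        /* Slow path: move the bytes of all later rows ... */
        tail = total - offset[row + 1];
        memmove(data + offset[row] + value_length, data + offset[row + 1],
            (size_t) tail);
        /* ... and shift their offsets by the change in length. */
        for (j = row + 1; j <= num_rows; j++) {
            offset[j] = offset[j] - old_length + value_length;
        }
    }
    /* Same-length updates only reach this point: write the row itself. */
    if (value_length > 0) {
        memcpy(data + offset[row], value, (size_t) value_length);
    }
    return 0;
}
```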

int tsk_edge_table_clear(tsk_edge_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_edge_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Parameters
  • self – A pointer to a tsk_edge_table_t object.

Returns

Return 0 on success or a negative value on failure.

int tsk_edge_table_truncate(tsk_edge_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Parameters
  • self – A pointer to a tsk_edge_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns

Return 0 on success or a negative value on failure.

int tsk_edge_table_extend(tsk_edge_table_t *self, const tsk_edge_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table. Note that metadata is copied as-is and is not checked for compatibility with any existing schema on this table.

Parameters
  • self – A pointer to a tsk_edge_table_t object where rows are to be added.

  • other – A pointer to a tsk_edge_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.
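
The row selection rules can be illustrated on a single fixed-width column. This is a standalone sketch rather than a call into tskit: extend_column is a hypothetical name, int32_t and uint64_t stand in for tsk_id_t and tsk_size_t, and the destination array is assumed to have enough room already. A real implementation would also need to validate the indexes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (not tskit code): append num_rows values from `other`
 * to `self`. If row_indexes is NULL the first num_rows values are taken;
 * otherwise the listed rows are taken in the order given, repeats allowed.
 * Returns the new number of rows in self. */
uint64_t
extend_column(double *self, uint64_t self_num_rows, const double *other,
    uint64_t num_rows, const int32_t *row_indexes)
{
    uint64_t j;

    assert(self != NULL && other != NULL);
    for (j = 0; j < num_rows; j++) {
        uint64_t source = row_indexes == NULL ? j : (uint64_t) row_indexes[j];

        self[self_num_rows + j] = other[source];
    }
    return self_num_rows + num_rows;
}
```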

bool tsk_edge_table_equals(const tsk_edge_table_t *self, const tsk_edge_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

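Parameters
  • self – A pointer to a tsk_edge_table_t object.

  • other – A pointer to a tsk_edge_table_t object.

  • options – Bitwise comparison options.

Returns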

Return true if the specified table is equal to this table.

int tsk_edge_table_copy(const tsk_edge_table_t *self, tsk_edge_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters
  • self – A pointer to a tsk_edge_table_t object.

  • dest – A pointer to a tsk_edge_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised edge table. If not, it must be an uninitialised edge table.

  • options – Bitwise option flags.

Returns

Return 0 on success or a negative value on failure.

int tsk_edge_table_get_row(const tsk_edge_table_t *self, tsk_id_t index, tsk_edge_t *row)

Get the row at the specified index.

Updates the specified edge struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters
  • self – A pointer to a tsk_edge_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_edge_t struct that is updated to reflect the values in the specified row.

Returns

Return 0 on success or a negative value on failure.

int tsk_edge_table_set_metadata_schema(tsk_edge_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Parameters
  • self – A pointer to a tsk_edge_table_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns

Return 0 on success or a negative value on failure.

void tsk_edge_table_print_state(const tsk_edge_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self – A pointer to a tsk_edge_table_t object.

  • out – The stream to write the summary to.

int tsk_edge_table_set_columns(tsk_edge_table_t *self, tsk_size_t num_rows, const double *left, const double *right, const tsk_id_t *parent, const tsk_id_t *child, const char *metadata, const tsk_size_t *metadata_offset)

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters
  • self – A pointer to a tsk_edge_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • left – The array of double left values to be copied.

  • right – The array of double right values to be copied.

  • parent – The array of tsk_id_t parent values to be copied.

  • child – The array of tsk_id_t child values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns

Return 0 on success or a negative value on failure.

int tsk_edge_table_append_columns(tsk_edge_table_t *self, tsk_size_t num_rows, const double *left, const double *right, const tsk_id_t *parent, const tsk_id_t *child, const char *metadata, const tsk_size_t *metadata_offset)

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters
  • self – A pointer to a tsk_edge_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • left – The array of double left values to be copied.

  • right – The array of double right values to be copied.

  • parent – The array of tsk_id_t parent values to be copied.

  • child – The array of tsk_id_t child values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns

Return 0 on success or a negative value on failure.

int tsk_edge_table_set_max_rows_increment(tsk_edge_table_t *self, tsk_size_t max_rows_increment)

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_edge_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

int tsk_edge_table_set_max_metadata_length_increment(tsk_edge_table_t *self, tsk_size_t max_metadata_length_increment)

Controls the pre-allocation strategy for the metadata column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_edge_table_t object.

  • max_metadata_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

int tsk_edge_table_squash(tsk_edge_table_t *self)

Squash adjacent edges in-place.

Sorts, then condenses the table into the smallest possible number of rows by combining any adjacent edges. A pair of edges is said to be adjacent if they have the same parent and child nodes, and if the left coordinate of one of the edges is equal to the right coordinate of the other edge. This process is performed in-place so that any set of adjacent edges is replaced by a single edge. The new edge will have the same parent and child node, a left coordinate equal to the smallest left coordinate in the set, and a right coordinate equal to the largest right coordinate in the set. The new edge table will be sorted in the canonical order (P, C, L, R).

Note

This method will fail if any edges have non-empty metadata.

Parameters
  • self – A pointer to a tsk_edge_table_t object.

Returns

Return 0 on success or a negative value on failure.
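
As a rough illustration of the procedure described above, the following standalone sketch sorts a small array of edges by (parent, child, left, right) and merges adjacent ones. It is not the library's implementation: the edge_t struct and the squash_edges function are hypothetical stand-ins, metadata is ignored, and int32_t stands in for tsk_id_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an edge row (no metadata). */
typedef struct {
    double left;
    double right;
    int32_t parent;
    int32_t child;
} edge_t;

/* Order edges by parent, then child, then left, then right. */
static int
cmp_edge(const void *a, const void *b)
{
    const edge_t *x = a;
    const edge_t *y = b;

    if (x->parent != y->parent) return x->parent < y->parent ? -1 : 1;
    if (x->child != y->child) return x->child < y->child ? -1 : 1;
    if (x->left != y->left) return x->left < y->left ? -1 : 1;
    if (x->right != y->right) return x->right < y->right ? -1 : 1;
    return 0;
}

/* Sort edges in place and merge adjacent ones. Returns the new edge count. */
size_t
squash_edges(edge_t *edges, size_t num_edges)
{
    size_t j, k;

    assert(edges != NULL || num_edges == 0);
    if (num_edges == 0) {
        return 0;
    }
    qsort(edges, num_edges, sizeof(*edges), cmp_edge);
    k = 0; /* index of the edge currently being extended */
    for (j = 1; j < num_edges; j++) {
        if (edges[j].parent == edges[k].parent
            && edges[j].child == edges[k].child
            && edges[j].left == edges[k].right) {
            /* Adjacent: extend the current edge to the right. */
            edges[k].right = edges[j].right;
        } else {
            /* Not adjacent: start a new output edge. */
            k++;
            edges[k] = edges[j];
        }
    }
    return k + 1;
}
```

In this sketch, three edges with parent 2 and child 0 covering [0, 1), [1, 2) and [2, 3) collapse into a single edge covering [0, 3), whereas two edges covering [0, 1) and [2, 3) stay separate because their coordinates do not touch.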

Migrations

struct tsk_migration_t

A single migration defined by a row in the migration table.

See the data model section for the definition of a migration and its properties.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

tsk_id_t source

Source population ID.

tsk_id_t dest

Destination population ID.

tsk_id_t node

Node ID.

double left

Left coordinate.

double right

Right coordinate.

double time

Time.

const char *metadata

Metadata.

tsk_size_t metadata_length

Size of the metadata in bytes.

struct tsk_migration_table_t

The migration table.

See the migration table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t metadata_length

The total length of the metadata column.

tsk_id_t *source

The source column.

tsk_id_t *dest

The dest column.

tsk_id_t *node

The node column.

double *left

The left column.

double *right

The right column.

double *time

The time column.

char *metadata

The metadata column.

tsk_size_t *metadata_offset

The metadata_offset column.

char *metadata_schema

The metadata schema.

int tsk_migration_table_init(tsk_migration_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Parameters
  • self – A pointer to an uninitialised tsk_migration_table_t object.

  • options – Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

int tsk_migration_table_free(tsk_migration_table_t *self)

Free the internal memory for the specified table.

Parameters
  • self – A pointer to a tsk_migration_table_t object.

Returns

Always returns 0.

tsk_id_t tsk_migration_table_add_row(tsk_migration_table_t *self, double left, double right, tsk_id_t node, tsk_id_t source, tsk_id_t dest, double time, const char *metadata, tsk_size_t metadata_length)

Adds a row to this migration table.

Add a new migration with the specified left, right, node, source, dest, time and metadata to the table. See the table definition for details of the columns in this table.

Parameters
  • self – A pointer to a tsk_migration_table_t object.

  • left – The left coordinate for the new migration.

  • right – The right coordinate for the new migration.

  • node – The node ID for the new migration.

  • source – The source population ID for the new migration.

  • dest – The destination population ID for the new migration.

  • time – The time for the new migration.

  • metadata – The metadata to be associated with the new migration. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns

Return the ID of the newly added migration on success, or a negative value on failure.

int tsk_migration_table_update_row(tsk_migration_table_t *self, tsk_id_t index, double left, double right, tsk_id_t node, tsk_id_t source, tsk_id_t dest, double time, const char *metadata, tsk_size_t metadata_length)

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. A copy of the metadata parameter is taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and is therefore inefficient for bulk updates of such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to update only the memory for the row in question.

Parameters
  • self – A pointer to a tsk_migration_table_t object.

  • index – The row to update.

  • left – The left coordinate for the migration.

  • right – The right coordinate for the migration.

  • node – The node ID for the migration.

  • source – The source population ID for the migration.

  • dest – The destination population ID for the migration.

  • time – The time for the migration.

  • metadata – The metadata to be associated with the migration. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns

Return 0 on success or a negative value on failure.

int tsk_migration_table_clear(tsk_migration_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_migration_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Parameters
  • self – A pointer to a tsk_migration_table_t object.

Returns

Return 0 on success or a negative value on failure.

int tsk_migration_table_truncate(tsk_migration_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Parameters
  • self – A pointer to a tsk_migration_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns

Return 0 on success or a negative value on failure.

int tsk_migration_table_extend(tsk_migration_table_t *self, const tsk_migration_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table. Note that metadata is copied as-is and is not checked for compatibility with any existing schema on this table.

Parameters
  • self – A pointer to a tsk_migration_table_t object where rows are to be added.

  • other – A pointer to a tsk_migration_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

bool tsk_migration_table_equals(const tsk_migration_table_t *self, const tsk_migration_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

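Parameters
  • self – A pointer to a tsk_migration_table_t object.

  • other – A pointer to a tsk_migration_table_t object.

  • options – Bitwise comparison options.

Returns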

Return true if the specified table is equal to this table.

int tsk_migration_table_copy(const tsk_migration_table_t *self, tsk_migration_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters
  • self – A pointer to a tsk_migration_table_t object.

  • dest – A pointer to a tsk_migration_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised migration table. If not, it must be an uninitialised migration table.

  • options – Bitwise option flags.

Returns

Return 0 on success or a negative value on failure.

int tsk_migration_table_get_row(const tsk_migration_table_t *self, tsk_id_t index, tsk_migration_t *row)

Get the row at the specified index.

Updates the specified migration struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters
  • self – A pointer to a tsk_migration_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_migration_t struct that is updated to reflect the values in the specified row.

Returns

Return 0 on success or a negative value on failure.

int tsk_migration_table_set_metadata_schema(tsk_migration_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Parameters
  • self – A pointer to a tsk_migration_table_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns

Return 0 on success or a negative value on failure.

void tsk_migration_table_print_state(const tsk_migration_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self – A pointer to a tsk_migration_table_t object.

  • out – The stream to write the summary to.

int tsk_migration_table_set_columns(tsk_migration_table_t *self, tsk_size_t num_rows, const double *left, const double *right, const tsk_id_t *node, const tsk_id_t *source, const tsk_id_t *dest, const double *time, const char *metadata, const tsk_size_t *metadata_offset)

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters
  • self – A pointer to a tsk_migration_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • left – The array of double left values to be copied.

  • right – The array of double right values to be copied.

  • node – The array of tsk_id_t node values to be copied.

  • source – The array of tsk_id_t source values to be copied.

  • dest – The array of tsk_id_t dest values to be copied.

  • time – The array of double time values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns

Return 0 on success or a negative value on failure.

int tsk_migration_table_append_columns(tsk_migration_table_t *self, tsk_size_t num_rows, const double *left, const double *right, const tsk_id_t *node, const tsk_id_t *source, const tsk_id_t *dest, const double *time, const char *metadata, const tsk_size_t *metadata_offset)

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters
  • self – A pointer to a tsk_migration_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • left – The array of double left values to be copied.

  • right – The array of double right values to be copied.

  • node – The array of tsk_id_t node values to be copied.

  • source – The array of tsk_id_t source values to be copied.

  • dest – The array of tsk_id_t dest values to be copied.

  • time – The array of double time values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns

Return 0 on success or a negative value on failure.

int tsk_migration_table_set_max_rows_increment(tsk_migration_table_t *self, tsk_size_t max_rows_increment)

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_migration_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

int tsk_migration_table_set_max_metadata_length_increment(tsk_migration_table_t *self, tsk_size_t max_metadata_length_increment)

Controls the pre-allocation strategy for the metadata column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_migration_table_t object.

  • max_metadata_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

Sites

struct tsk_site_t

A single site defined by a row in the site table.

See the data model section for the definition of a site and its properties.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

double position

Position coordinate.

const char *ancestral_state

Ancestral state.

tsk_size_t ancestral_state_length

Ancestral state length in bytes.

const char *metadata

Metadata.

tsk_size_t metadata_length

Metadata length in bytes.

const tsk_mutation_t *mutations

An array of this site’s mutations.

tsk_size_t mutations_length

The number of mutations at this site.

struct tsk_site_table_t

The site table.

See the site table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t metadata_length

The total length of the metadata column.

double *position

The position column.

char *ancestral_state

The ancestral_state column.

tsk_size_t *ancestral_state_offset

The ancestral_state_offset column.

char *metadata

The metadata column.

tsk_size_t *metadata_offset

The metadata_offset column.

char *metadata_schema

The metadata schema.

int tsk_site_table_init(tsk_site_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Parameters
  • self – A pointer to an uninitialised tsk_site_table_t object.

  • options – Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

int tsk_site_table_free(tsk_site_table_t *self)

Free the internal memory for the specified table.

Parameters
  • self – A pointer to a tsk_site_table_t object.

Returns

Always returns 0.

tsk_id_t tsk_site_table_add_row(tsk_site_table_t *self, double position, const char *ancestral_state, tsk_size_t ancestral_state_length, const char *metadata, tsk_size_t metadata_length)

Adds a row to this site table.

Add a new site with the specified position, ancestral_state and metadata to the table. Copies of ancestral_state and metadata are immediately taken. See the table definition for details of the columns in this table.

Parameters
  • self – A pointer to a tsk_site_table_t object.

  • position – The position coordinate for the new site.

  • ancestral_state – The ancestral_state for the new site.

  • ancestral_state_length – The length of the ancestral_state in bytes.

  • metadata – The metadata to be associated with the new site. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns

Return the ID of the newly added site on success, or a negative value on failure.
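
A site row contributes to two ragged columns, ancestral_state and metadata, each with its own offset array. The standalone sketch below shows what appending one value to such a column involves. It is an illustration under stated assumptions, not tskit code: ragged_append is a hypothetical helper, uint64_t stands in for tsk_size_t, and the buffers are assumed to be large enough already, whereas the library grows them as needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch (not tskit code): append one value to a ragged column.
 * offset holds num_rows + 1 entries on entry and must have room for one more.
 * Returns the index of the new row. */
uint64_t
ragged_append(char *data, uint64_t *offset, uint64_t num_rows,
    const char *value, uint64_t value_length)
{
    uint64_t start = offset[num_rows];

    assert(value != NULL || value_length == 0);
    if (value_length > 0) {
        /* Copy the bytes immediately, so the caller may reuse `value`. */
        memcpy(data + start, value, (size_t) value_length);
    }
    /* The new final offset is the total length of the column in bytes. */
    offset[num_rows + 1] = start + value_length;
    return num_rows;
}
```

Adding a site would perform this step once for ancestral_state and once for metadata, which is why the function takes a separate length for each.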

int tsk_site_table_update_row(tsk_site_table_t *self, tsk_id_t index, double position, const char *ancestral_state, tsk_size_t ancestral_state_length, const char *metadata, tsk_size_t metadata_length)

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. Copies of the ancestral_state and metadata parameters are taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and is therefore inefficient for bulk updates of such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to update only the memory for the row in question.

Parameters
  • self – A pointer to a tsk_site_table_t object.

  • index – The row to update.

  • position – The position coordinate for the site.

  • ancestral_state – The ancestral_state for the site.

  • ancestral_state_length – The length of the ancestral_state in bytes.

  • metadata – The metadata to be associated with the site. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns

Return 0 on success or a negative value on failure.

int tsk_site_table_clear(tsk_site_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_site_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Parameters
  • self – A pointer to a tsk_site_table_t object.

Returns

Return 0 on success or a negative value on failure.

int tsk_site_table_truncate(tsk_site_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Parameters
  • self – A pointer to a tsk_site_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns

Return 0 on success or a negative value on failure.

int tsk_site_table_extend(tsk_site_table_t *self, const tsk_site_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table. Note that metadata is copied as-is and is not checked for compatibility with any existing schema on this table.

Parameters
  • self – A pointer to a tsk_site_table_t object where rows are to be added.

  • other – A pointer to a tsk_site_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.
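
The row selection rules for extend (indexes may repeat and appear in any order, and a NULL index array means "the first num_rows rows") can be sketched with a standalone model that works on a single column of double values. The function extend_column and its error convention are hypothetical and exist only for this illustration; in particular, validating all indexes before copying is a choice made by the model, not a documented property of tskit.

```c
#include <assert.h>
#include <stddef.h>

/* Append rows of `src` to `dest`, which currently holds `dest_rows` values
 * and has room for `dest_capacity`. If `row_indexes` is NULL, the first
 * `num_rows` rows of `src` are used. Returns the new row count, or -1 if an
 * index is out of bounds or `dest` is too small. */
long
extend_column(double *dest, size_t dest_rows, size_t dest_capacity,
    const double *src, size_t src_rows, size_t num_rows, const long *row_indexes)
{
    size_t j;
    long row;

    assert(dest_rows <= dest_capacity);
    if (num_rows > dest_capacity - dest_rows) {
        return -1;
    }
    /* Validate every index first so that a failure leaves dest untouched. */
    for (j = 0; j < num_rows; j++) {
        row = row_indexes == NULL ? (long) j : row_indexes[j];
        if (row < 0 || (size_t) row >= src_rows) {
            return -1;
        }
    }
    for (j = 0; j < num_rows; j++) {
        row = row_indexes == NULL ? (long) j : row_indexes[j];
        dest[dest_rows + j] = src[row];
    }
    return (long) (dest_rows + num_rows);
}
```

Passing the index array {2, 0, 2} appends the third row, then the first, then the third again; passing NULL with num_rows set to 2 appends the first two rows in order.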

bool tsk_site_table_equals(const tsk_site_table_t *self, const tsk_site_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

  • TSK_CMP_IGNORE_METADATA

    Do not include metadata in the comparison.

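Parameters
  • self – A pointer to a tsk_site_table_t object.

  • other – A pointer to a tsk_site_table_t object.

  • options – Bitwise comparison options.

Returns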

Return true if the specified table is equal to this table.

int tsk_site_table_copy(const tsk_site_table_t *self, tsk_site_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters
  • self – A pointer to a tsk_site_table_t object.

  • dest – A pointer to a tsk_site_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised site table. If not, it must be an uninitialised site table.

  • options – Bitwise option flags.

Returns

Return 0 on success or a negative value on failure.

int tsk_site_table_get_row(const tsk_site_table_t *self, tsk_id_t index, tsk_site_t *row)

Get the row at the specified index.

Updates the specified site struct to reflect the values in the specified row.

This function always sets the mutations and mutations_length fields in the parameter tsk_site_t to NULL and 0 respectively. To get access to the mutations for a particular site, please use the tree sequence method, tsk_treeseq_get_site().

Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters
  • self – A pointer to a tsk_site_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_site_t struct that is updated to reflect the values in the specified row.

Returns

Return 0 on success or a negative value on failure.

int tsk_site_table_set_metadata_schema(tsk_site_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Parameters
  • self – A pointer to a tsk_site_table_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns

Return 0 on success or a negative value on failure.

void tsk_site_table_print_state(const tsk_site_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self – A pointer to a tsk_site_table_t object.

  • out – The stream to write the summary to.

int tsk_site_table_set_columns(tsk_site_table_t *self, tsk_size_t num_rows, const double *position, const char *ancestral_state, const tsk_size_t *ancestral_state_offset, const char *metadata, const tsk_size_t *metadata_offset)

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters
  • self – A pointer to a tsk_site_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • position – The array of double position values to be copied.

  • ancestral_state – The array of char ancestral state values to be copied.

  • ancestral_state_offset – The array of tsk_size_t ancestral state offset values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns

Return 0 on success or a negative value on failure.
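
The ragged columns passed to this function (ancestral_state and metadata) are each described by a flat byte array plus an offset array. The following standalone sketch shows one way such a pair could be built from a list of strings. The helper pack_ragged is a hypothetical name, and the sketch assumes the usual ragged column layout in which the offset array has num_rows + 1 entries, starts at 0 and ends with the total number of bytes. It uses size_t for offsets to stay self-contained; arrays handed to tskit would need to use tsk_size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pack `num_rows` NUL-terminated strings into the flat buffer `data` and
 * fill `offset`, which must have room for num_rows + 1 entries.
 * The NUL terminators are not stored. Returns the total number of bytes
 * written, or -1 if `data_capacity` is too small. */
long
pack_ragged(const char *const *values, size_t num_rows, char *data,
    size_t data_capacity, size_t *offset)
{
    size_t j, length, total = 0;

    assert(values != NULL && data != NULL && offset != NULL);
    offset[0] = 0;
    for (j = 0; j < num_rows; j++) {
        length = strlen(values[j]);
        if (length > data_capacity - total) {
            return -1;
        }
        memcpy(data + total, values[j], length);
        total += length;
        /* Row j ends, and row j + 1 begins, at this byte position. */
        offset[j + 1] = total;
    }
    return (long) total;
}
```

For the three values "A", "" and "GT" this produces the bytes AGT and the offsets 0, 1, 1, 3: an empty value is represented by two equal consecutive offsets.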

int tsk_site_table_append_columns(tsk_site_table_t *self, tsk_size_t num_rows, const double *position, const char *ancestral_state, const tsk_size_t *ancestral_state_offset, const char *metadata, const tsk_size_t *metadata_offset)

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters
  • self – A pointer to a tsk_site_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • position – The array of double position values to be copied.

  • ancestral_state – The array of char ancestral state values to be copied.

  • ancestral_state_offset – The array of tsk_size_t ancestral state offset values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns

Return 0 on success or a negative value on failure.

int tsk_site_table_set_max_rows_increment(tsk_site_table_t *self, tsk_size_t max_rows_increment)

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_site_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.
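
The trade-off between a fixed increment and doubling can be illustrated with a deliberately simplified standalone model. The functions next_capacity and count_growth_steps are hypothetical and do not reproduce tskit's actual policy, which is described in the Memory allocation strategy section; the model only counts how often a column would have to grow while rows are appended one at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Capacity after growing a column of capacity `current` so that it can hold
 * `needed` rows. A zero increment selects doubling; otherwise the capacity
 * grows by `max_rows_increment` rows at a time. */
size_t
next_capacity(size_t current, size_t needed, size_t max_rows_increment)
{
    size_t capacity = current == 0 ? 1 : current;

    assert(needed > 0);
    while (capacity < needed) {
        capacity = max_rows_increment == 0 ? capacity * 2
                                           : capacity + max_rows_increment;
    }
    return capacity;
}

/* Number of times the column grows while `num_appends` rows are appended
 * one by one, starting from a capacity of one row. */
size_t
count_growth_steps(size_t num_appends, size_t max_rows_increment)
{
    size_t capacity = 1, rows, steps = 0;

    for (rows = 1; rows <= num_appends; rows++) {
        if (rows > capacity) {
            capacity = next_capacity(capacity, rows, max_rows_increment);
            steps++;
        }
    }
    return steps;
}
```

In this model, appending 1000 rows takes 10 growth steps with doubling and 100 with a fixed increment of 10 rows. A fixed increment bounds the amount of unused memory, at the price of more frequent reallocation.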

int tsk_site_table_set_max_metadata_length_increment(tsk_site_table_t *self, tsk_size_t max_metadata_length_increment)

Controls the pre-allocation strategy for the metadata column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_site_table_t object.

  • max_metadata_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

int tsk_site_table_set_max_ancestral_state_length_increment(tsk_site_table_t *self, tsk_size_t max_ancestral_state_length_increment)

Controls the pre-allocation strategy for the ancestral_state column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_site_table_t object.

  • max_ancestral_state_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

Mutations

struct tsk_mutation_t

A single mutation defined by a row in the mutation table.

See the data model section for the definition of a mutation and its properties.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

tsk_id_t site

Site ID.

tsk_id_t node

Node ID.

tsk_id_t parent

Parent mutation ID.

double time

Mutation time.

const char *derived_state

Derived state.

tsk_size_t derived_state_length

Size of the derived state in bytes.

const char *metadata

Metadata.

tsk_size_t metadata_length

Size of the metadata in bytes.

tsk_id_t edge

The ID of the edge that this mutation lies on, or TSK_NULL if there is no corresponding edge.

struct tsk_mutation_table_t

The mutation table.

See the mutation table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t metadata_length

The total length of the metadata column.

tsk_id_t *node

The node column.

tsk_id_t *site

The site column.

tsk_id_t *parent

The parent column.

double *time

The time column.

char *derived_state

The derived_state column.

tsk_size_t *derived_state_offset

The derived_state_offset column.

char *metadata

The metadata column.

tsk_size_t *metadata_offset

The metadata_offset column.

char *metadata_schema

The metadata schema.

int tsk_mutation_table_init(tsk_mutation_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Parameters
  • self – A pointer to an uninitialised tsk_mutation_table_t object.

  • options – Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

int tsk_mutation_table_free(tsk_mutation_table_t *self)

Free the internal memory for the specified table.

Parameters
  • self – A pointer to a tsk_mutation_table_t object.

Returns

Always returns 0.

tsk_id_t tsk_mutation_table_add_row(tsk_mutation_table_t *self, tsk_id_t site, tsk_id_t node, tsk_id_t parent, double time, const char *derived_state, tsk_size_t derived_state_length, const char *metadata, tsk_size_t metadata_length)

Adds a row to this mutation table.

Add a new mutation with the specified site, node, parent, time, derived_state and metadata to the table. Copies of derived_state and metadata are immediately taken. See the table definition for details of the columns in this table.

Parameters
  • self – A pointer to a tsk_mutation_table_t object.

  • site – The site ID for the new mutation.

  • node – The ID of the node this mutation occurs over.

  • parent – The ID of the parent mutation.

  • time – The time of the mutation.

  • derived_state – The derived_state for the new mutation.

  • derived_state_length – The length of the derived_state in bytes.

  • metadata – The metadata to be associated with the new mutation. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns

Return the ID of the newly added mutation on success, or a negative value on failure.

int tsk_mutation_table_update_row(tsk_mutation_table_t *self, tsk_id_t index, tsk_id_t site, tsk_id_t node, tsk_id_t parent, double time, const char *derived_state, tsk_size_t derived_state_length, const char *metadata, tsk_size_t metadata_length)

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. Copies of the derived_state and metadata parameters are taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and is therefore inefficient for bulk updates of such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to only update the memory for the row in question.

Parameters
  • self – A pointer to a tsk_mutation_table_t object.

  • index – The row to update.

  • site – The site ID for the mutation.

  • node – The ID of the node this mutation occurs over.

  • parent – The ID of the parent mutation.

  • time – The time of the mutation.

  • derived_state – The derived_state for the mutation.

  • derived_state_length – The length of the derived_state in bytes.

  • metadata – The metadata to be associated with the mutation. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns

Return 0 on success or a negative value on failure.

int tsk_mutation_table_clear(tsk_mutation_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_mutation_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Parameters
  • self – A pointer to a tsk_mutation_table_t object.

Returns

Return 0 on success or a negative value on failure.

int tsk_mutation_table_truncate(tsk_mutation_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Parameters
  • self – A pointer to a tsk_mutation_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns

Return 0 on success or a negative value on failure.

int tsk_mutation_table_extend(tsk_mutation_table_t *self, const tsk_mutation_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table. Note that metadata is copied as-is and is not checked for compatibility with any existing schema on this table.

Parameters
  • self – A pointer to a tsk_mutation_table_t object where rows are to be added.

  • other – A pointer to a tsk_mutation_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

bool tsk_mutation_table_equals(const tsk_mutation_table_t *self, const tsk_mutation_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

  • TSK_CMP_IGNORE_METADATA

    Do not include metadata in the comparison.

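Parameters
  • self – A pointer to a tsk_mutation_table_t object.

  • other – A pointer to a tsk_mutation_table_t object.

  • options – Bitwise comparison options.

Returns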

Return true if the specified table is equal to this table.

int tsk_mutation_table_copy(const tsk_mutation_table_t *self, tsk_mutation_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters
  • self – A pointer to a tsk_mutation_table_t object.

  • dest – A pointer to a tsk_mutation_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised mutation table. If not, it must be an uninitialised mutation table.

  • options – Bitwise option flags.

Returns

Return 0 on success or a negative value on failure.

int tsk_mutation_table_get_row(const tsk_mutation_table_t *self, tsk_id_t index, tsk_mutation_t *row)

Get the row at the specified index.

Updates the specified mutation struct to reflect the values in the specified row.

This function always sets the edge field in the tsk_mutation_t parameter to TSK_NULL. To determine the ID of the edge associated with a particular mutation, please use the tree sequence method, tsk_treeseq_get_mutation().

Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters
  • self – A pointer to a tsk_mutation_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_mutation_t struct that is updated to reflect the values in the specified row.

Returns

Return 0 on success or a negative value on failure.
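
Because the derived_state and metadata pointers in the returned struct refer to memory owned by the table, client code that needs a value to outlive the next table modification has to take its own copy. The following standalone helper sketches one way to do that. The name copy_field is hypothetical, and the sketch assumes, as the separate length fields suggest, that the bytes are not NUL-terminated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated, NUL-terminated copy of a (pointer, length)
 * field, or NULL if allocation fails. The caller owns the result and must
 * release it with free(). */
char *
copy_field(const char *data, size_t length)
{
    char *copy = malloc(length + 1);

    if (copy == NULL) {
        return NULL;
    }
    if (length > 0) {
        /* A non-empty field must point at real memory. */
        assert(data != NULL);
        memcpy(copy, data, length);
    }
    copy[length] = '\0';
    return copy;
}
```

A caller would pass the derived_state pointer and derived_state_length from the row struct immediately after a successful get_row call, before adding or updating any rows, and would call free() on the result when finished. Note that the copy is only safe to treat as a C string if the value itself contains no embedded NUL bytes.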

int tsk_mutation_table_set_metadata_schema(tsk_mutation_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Parameters
  • self – A pointer to a tsk_mutation_table_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns

Return 0 on success or a negative value on failure.

void tsk_mutation_table_print_state(const tsk_mutation_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self – A pointer to a tsk_mutation_table_t object.

  • out – The stream to write the summary to.

int tsk_mutation_table_set_columns(tsk_mutation_table_t *self, tsk_size_t num_rows, const tsk_id_t *site, const tsk_id_t *node, const tsk_id_t *parent, const double *time, const char *derived_state, const tsk_size_t *derived_state_offset, const char *metadata, const tsk_size_t *metadata_offset)

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters
  • self – A pointer to a tsk_mutation_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • site – The array of tsk_id_t site values to be copied.

  • node – The array of tsk_id_t node values to be copied.

  • parent – The array of tsk_id_t parent values to be copied.

  • time – The array of double time values to be copied.

  • derived_state – The array of char derived_state values to be copied.

  • derived_state_offset – The array of tsk_size_t derived state offset values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns

Return 0 on success or a negative value on failure.

int tsk_mutation_table_append_columns(tsk_mutation_table_t *self, tsk_size_t num_rows, const tsk_id_t *site, const tsk_id_t *node, const tsk_id_t *parent, const double *time, const char *derived_state, const tsk_size_t *derived_state_offset, const char *metadata, const tsk_size_t *metadata_offset)

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters
  • self – A pointer to a tsk_mutation_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • site – The array of tsk_id_t site values to be copied.

  • node – The array of tsk_id_t node values to be copied.

  • parent – The array of tsk_id_t parent values to be copied.

  • time – The array of double time values to be copied.

  • derived_state – The array of char derived_state values to be copied.

  • derived_state_offset – The array of tsk_size_t derived state offset values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns

Return 0 on success or a negative value on failure.

int tsk_mutation_table_set_max_rows_increment(tsk_mutation_table_t *self, tsk_size_t max_rows_increment)

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_mutation_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

int tsk_mutation_table_set_max_metadata_length_increment(tsk_mutation_table_t *self, tsk_size_t max_metadata_length_increment)

Controls the pre-allocation strategy for the metadata column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_mutation_table_t object.

  • max_metadata_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

int tsk_mutation_table_set_max_derived_state_length_increment(tsk_mutation_table_t *self, tsk_size_t max_derived_state_length_increment)

Controls the pre-allocation strategy for the derived_state column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_mutation_table_t object.

  • max_derived_state_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

Populations

struct tsk_population_t

A single population defined by a row in the population table.

See the data model section for the definition of a population and its properties.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

const char *metadata

Metadata.

tsk_size_t metadata_length

Metadata length in bytes.

struct tsk_population_table_t

The population table.

See the population table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t metadata_length

The total length of the metadata column.

char *metadata

The metadata column.

tsk_size_t *metadata_offset

The metadata_offset column.

char *metadata_schema

The metadata schema.

int tsk_population_table_init(tsk_population_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Parameters
  • self – A pointer to an uninitialised tsk_population_table_t object.

  • options – Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

int tsk_population_table_free(tsk_population_table_t *self)

Free the internal memory for the specified table.

Parameters
  • self – A pointer to a tsk_population_table_t object.

Returns

Always returns 0.

tsk_id_t tsk_population_table_add_row(tsk_population_table_t *self, const char *metadata, tsk_size_t metadata_length)

Adds a row to this population table.

Add a new population with the specified metadata to the table. A copy of the metadata is immediately taken. See the table definition for details of the columns in this table.

Parameters
  • self – A pointer to a tsk_population_table_t object.

  • metadata – The metadata to be associated with the new population. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns

Return the ID of the newly added population on success, or a negative value on failure.

int tsk_population_table_update_row(tsk_population_table_t *self, tsk_id_t index, const char *metadata, tsk_size_t metadata_length)

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. A copy of the metadata parameter is taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and is therefore inefficient for bulk updates of such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to only update the memory for the row in question.

Parameters
  • self – A pointer to a tsk_population_table_t object.

  • index – The row to update.

  • metadata – The metadata to be associated with the population. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns

Return 0 on success or a negative value on failure.

int tsk_population_table_clear(tsk_population_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_population_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Parameters
  • self – A pointer to a tsk_population_table_t object.

Returns

Return 0 on success or a negative value on failure.

int tsk_population_table_truncate(tsk_population_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Parameters
  • self – A pointer to a tsk_population_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns

Return 0 on success or a negative value on failure.

int tsk_population_table_extend(tsk_population_table_t *self, const tsk_population_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table. Note that metadata is copied as-is and is not checked for compatibility with any existing schema on this table.

Parameters
  • self – A pointer to a tsk_population_table_t object where rows are to be added.

  • other – A pointer to a tsk_population_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

bool tsk_population_table_equals(const tsk_population_table_t *self, const tsk_population_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

  • TSK_CMP_IGNORE_METADATA

    Do not include metadata in the comparison. Note that, as metadata is the only column in the population table, two population tables are considered equal when this flag is specified if they have the same number of rows.

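Parameters
  • self – A pointer to a tsk_population_table_t object.

  • other – A pointer to a tsk_population_table_t object.

  • options – Bitwise comparison options.

Returns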

Return true if the specified table is equal to this table.

int tsk_population_table_copy(const tsk_population_table_t *self, tsk_population_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters
  • self – A pointer to a tsk_population_table_t object.

  • dest – A pointer to a tsk_population_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised population table. If not, it must be an uninitialised population table.

  • options – Bitwise option flags.

Returns

Return 0 on success or a negative value on failure.

int tsk_population_table_get_row(const tsk_population_table_t *self, tsk_id_t index, tsk_population_t *row)

Get the row at the specified index.

Updates the specified population struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters
  • self – A pointer to a tsk_population_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_population_t struct that is updated to reflect the values in the specified row.

Returns

Return 0 on success or a negative value on failure.

int tsk_population_table_set_metadata_schema(tsk_population_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Parameters
  • self – A pointer to a tsk_population_table_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns

Return 0 on success or a negative value on failure.

void tsk_population_table_print_state(const tsk_population_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self – A pointer to a tsk_population_table_t object.

  • out – The stream to write the summary to.

int tsk_population_table_set_columns(tsk_population_table_t *self, tsk_size_t num_rows, const char *metadata, const tsk_size_t *metadata_offset)

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters
  • self – A pointer to a tsk_population_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns

Return 0 on success or a negative value on failure.

int tsk_population_table_append_columns(tsk_population_table_t *self, tsk_size_t num_rows, const char *metadata, const tsk_size_t *metadata_offset)

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters
  • self – A pointer to a tsk_population_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns

Return 0 on success or a negative value on failure.

int tsk_population_table_set_max_rows_increment(tsk_population_table_t *self, tsk_size_t max_rows_increment)

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_population_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

int tsk_population_table_set_max_metadata_length_increment(tsk_population_table_t *self, tsk_size_t max_metadata_length_increment)

Controls the pre-allocation strategy for the metadata column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_population_table_t object.

  • max_metadata_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

Provenances

struct tsk_provenance_t

A single provenance defined by a row in the provenance table.

See the data model section for the definition of a provenance object and its properties. See the Provenance section for more information on how provenance records should be structured.

Public Members

tsk_id_t id

Non-negative ID value corresponding to table row.

const char *timestamp

The timestamp.

tsk_size_t timestamp_length

The timestamp length in bytes.

const char *record

The record.

tsk_size_t record_length

The record length in bytes.
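Because the timestamp and record fields each come with an explicit length, it is safest not to assume that they end in a NUL byte. The sketch below shows one way to copy such a length-delimited field into a NUL-terminated buffer before passing it to string functions. It is plain C with no tskit dependency, and copy_field is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of tskit): copies a length-delimited field
 * into dest as a NUL-terminated string, truncating if dest is too small.
 * Returns the number of bytes copied, excluding the terminator. */
static size_t
copy_field(const char *field, size_t field_length, char *dest, size_t dest_size)
{
    size_t n;

    assert(dest != NULL);
    if (dest_size == 0) {
        return 0;
    }
    n = field_length < dest_size - 1 ? field_length : dest_size - 1;
    if (n > 0) {
        memcpy(dest, field, n);
    }
    dest[n] = '\0';
    return n;
}
```

For a tsk_provenance_t value p, a call might look like copy_field(p.timestamp, p.timestamp_length, buffer, sizeof(buffer)). For output only, printf with the "%.*s" format and an int length achieves the same without copying.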

struct tsk_provenance_table_t

The provenance table.

See the provenance table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows

The number of rows in this table.

tsk_size_t timestamp_length

The total length of the timestamp column.

tsk_size_t record_length

The total length of the record column.

char *timestamp

The timestamp column.

tsk_size_t *timestamp_offset

The timestamp_offset column.

char *record

The record column.

tsk_size_t *record_offset

The record_offset column.

int tsk_provenance_table_init(tsk_provenance_table_t *self, tsk_flags_t options)

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Parameters
  • self – A pointer to an uninitialised tsk_provenance_table_t object.

  • options – Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

int tsk_provenance_table_free(tsk_provenance_table_t *self)

Free the internal memory for the specified table.

Parameters
  • self – A pointer to an initialised tsk_provenance_table_t object.

Returns

Always returns 0.

tsk_id_t tsk_provenance_table_add_row(tsk_provenance_table_t *self, const char *timestamp, tsk_size_t timestamp_length, const char *record, tsk_size_t record_length)

Adds a row to this provenance table.

Add a new provenance with the specified timestamp and record to the table. Copies of the timestamp and record are immediately taken. See the table definition for details of the columns in this table.

Parameters
  • self – A pointer to a tsk_provenance_table_t object.

  • timestamp – The timestamp to be associated with the new provenance. This is a pointer to arbitrary memory. Can be NULL if timestamp_length is 0.

  • timestamp_length – The size of the timestamp array in bytes.

  • record – The record to be associated with the new provenance. This is a pointer to arbitrary memory. Can be NULL if record_length is 0.

  • record_length – The size of the record array in bytes.

Returns

Return the ID of the newly added provenance on success, or a negative value on failure.

int tsk_provenance_table_update_row(tsk_provenance_table_t *self, tsk_id_t index, const char *timestamp, tsk_size_t timestamp_length, const char *record, tsk_size_t record_length)

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. Copies of the timestamp and record parameters are taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and would therefore be inefficient for bulk updates of such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to only update the memory for the row in question.

Parameters
  • self – A pointer to a tsk_provenance_table_t object.

  • index – The row to update.

  • timestamp – The timestamp to be associated with the provenance. This is a pointer to arbitrary memory. Can be NULL if timestamp_length is 0.

  • timestamp_length – The size of the timestamp array in bytes.

  • record – The record to be associated with the provenance. This is a pointer to arbitrary memory. Can be NULL if record_length is 0.

  • record_length – The size of the record array in bytes.

Returns

Return 0 on success or a negative value on failure.
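The cost described in the warning above follows directly from the ragged column layout. The sketch below is a simplified model in plain C, not tskit's implementation: ragged_update_row is a hypothetical helper that overwrites the row in place when its length is unchanged, and otherwise has to move every later row and rewrite every later offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model (not tskit code): replace row `index` of a ragged
 * column stored as data/offset, where offset has num_rows + 1 entries.
 * Returns 0 on success or -1 if the new contents do not fit. */
static int
ragged_update_row(char *data, size_t data_capacity, size_t *offset,
    size_t num_rows, size_t index, const char *value, size_t value_length)
{
    size_t start, old_length, tail_length, j;

    assert(index < num_rows);
    start = offset[index];
    old_length = offset[index + 1] - start;
    tail_length = offset[num_rows] - offset[index + 1];

    if (value_length != old_length) {
        if (offset[num_rows] - old_length + value_length > data_capacity) {
            return -1;
        }
        /* Slow path: shift all later rows and rewrite their offsets. */
        memmove(data + start + value_length, data + offset[index + 1],
            tail_length);
        for (j = index + 1; j <= num_rows; j++) {
            offset[j] = offset[j] - old_length + value_length;
        }
    }
    /* Fast path (and final step of the slow path): write the new bytes. */
    if (value_length > 0) {
        memcpy(data + start, value, value_length);
    }
    return 0;
}
```

In this model a same-length update touches only the bytes of the row itself, whereas a length change costs time proportional to the amount of data stored after the row.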

int tsk_provenance_table_clear(tsk_provenance_table_t *self)

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_provenance_table_free() to free the table’s internal resources.

Parameters
  • self – A pointer to a tsk_provenance_table_t object.

Returns

Return 0 on success or a negative value on failure.

int tsk_provenance_table_truncate(tsk_provenance_table_t *self, tsk_size_t num_rows)

Truncates this table so that only the first num_rows are retained.

Parameters
  • self – A pointer to a tsk_provenance_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns

Return 0 on success or a negative value on failure.

int tsk_provenance_table_extend(tsk_provenance_table_t *self, const tsk_provenance_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table.

Parameters
  • self – A pointer to a tsk_provenance_table_t object where rows are to be added.

  • other – A pointer to a tsk_provenance_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

bool tsk_provenance_table_equals(const tsk_provenance_table_t *self, const tsk_provenance_table_t *other, tsk_flags_t options)

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns.

Parameters
Returns

Return true if the specified table is equal to this table.

int tsk_provenance_table_copy(const tsk_provenance_table_t *self, tsk_provenance_table_t *dest, tsk_flags_t options)

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters
  • self – A pointer to a tsk_provenance_table_t object.

  • dest – A pointer to a tsk_provenance_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised provenance table. If not, it must be an uninitialised provenance table.

  • options – Bitwise option flags.

Returns

Return 0 on success or a negative value on failure.

int tsk_provenance_table_get_row(const tsk_provenance_table_t *self, tsk_id_t index, tsk_provenance_t *row)

Get the row at the specified index.

Updates the specified provenance struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters
  • self – A pointer to a tsk_provenance_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_provenance_t struct that is updated to reflect the values in the specified row.

Returns

Return 0 on success or a negative value on failure.

void tsk_provenance_table_print_state(const tsk_provenance_table_t *self, FILE *out)

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self – A pointer to a tsk_provenance_table_t object.

  • out – The stream to write the summary to.

int tsk_provenance_table_set_columns(tsk_provenance_table_t *self, tsk_size_t num_rows, const char *timestamp, const tsk_size_t *timestamp_offset, const char *record, const tsk_size_t *record_offset)

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows.

Parameters
  • self – A pointer to a tsk_provenance_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • timestamp – The array of char timestamp values to be copied.

  • timestamp_offset – The array of tsk_size_t timestamp offset values to be copied.

  • record – The array of char record values to be copied.

  • record_offset – The array of tsk_size_t record offset values to be copied.

Returns

Return 0 on success or a negative value on failure.

int tsk_provenance_table_append_columns(tsk_provenance_table_t *self, tsk_size_t num_rows, const char *timestamp, const tsk_size_t *timestamp_offset, const char *record, const tsk_size_t *record_offset)

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows.

Parameters
  • self – A pointer to a tsk_provenance_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • timestamp – The array of char timestamp values to be copied.

  • timestamp_offset – The array of tsk_size_t timestamp offset values to be copied.

  • record – The array of char record values to be copied.

  • record_offset – The array of tsk_size_t record offset values to be copied.

Returns

Return 0 on success or a negative value on failure.

int tsk_provenance_table_set_max_rows_increment(tsk_provenance_table_t *self, tsk_size_t max_rows_increment)

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_provenance_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

int tsk_provenance_table_set_max_timestamp_length_increment(tsk_provenance_table_t *self, tsk_size_t max_timestamp_length_increment)

Controls the pre-allocation strategy for the timestamp column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_provenance_table_t object.

  • max_timestamp_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

int tsk_provenance_table_set_max_record_length_increment(tsk_provenance_table_t *self, tsk_size_t max_record_length_increment)

Controls the pre-allocation strategy for the record column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters
  • self – A pointer to a tsk_provenance_table_t object.

  • max_record_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns

Return 0 on success or a negative value on failure.

Table indexes

Along with the tree sequence ordering requirements, the table indexes allow us to take a table collection and efficiently operate on the trees defined within it. This section defines the rules for safely operating on table indexes and their life-cycle.

The edge index used for tree generation consists of two arrays, each holding N edge IDs (where N is the size of the edge table). When the index is computed using tsk_table_collection_build_index(), we store the current size of the edge table along with the two arrays of edge IDs. The function tsk_table_collection_has_index() then returns true if and only if (a) neither of these arrays is NULL and (b) the stored number of edges is the same as the current size of the edge table.

Updating the edge table does not automatically invalidate the indexes. Thus, if we call tsk_edge_table_clear() on an edge table which has an index, this index will still exist. However, it will not be considered a valid index by tsk_table_collection_has_index() because of the size mismatch. The same applies to functions that increase the size of the table. Note that tsk_table_collection_has_index() can therefore return true even though the index is not actually valid, for example if the user has manipulated the node and edge tables to describe a different topology that happens to have the same number of edges. The behaviour of methods that use the indexes is undefined in this case.

Thus, if you are manipulating an existing table collection that may be indexed, it is always recommended to call tsk_table_collection_drop_index() first.
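The validity rule described in this section can be modelled in a few lines of plain C. This is only an illustrative sketch: edge_index_model_t and edge_index_model_is_current are hypothetical names, and the struct is a stand-in for whatever tskit stores internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the stored index state (not a tskit type). */
typedef struct {
    size_t num_edges;              /* Edge table size when the index was built */
    int32_t *edge_insertion_order; /* First array of N edge IDs, or NULL */
    int32_t *edge_removal_order;   /* Second array of N edge IDs, or NULL */
} edge_index_model_t;

/* Mirrors the documented rule: both arrays must be present and the stored
 * size must match the current edge table size. Note that this cannot detect
 * a table that was edited but still has the same number of edges. */
static bool
edge_index_model_is_current(
    const edge_index_model_t *index, size_t current_num_edges)
{
    assert(index != NULL);
    return index->edge_insertion_order != NULL
           && index->edge_removal_order != NULL
           && index->num_edges == current_num_edges;
}
```

The comparison is purely on sizes, which is why dropping the index before editing the tables is the recommended practice.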

Tree sequences

struct tsk_treeseq_t

The tree sequence object.

Public Members

tsk_table_collection_t *tables

The table collection underlying this tree sequence. This table collection must be treated as read-only, and any changes to it will lead to undefined behaviour.

int tsk_treeseq_init(tsk_treeseq_t *self, tsk_table_collection_t *tables, tsk_flags_t options)

Initialises the tree sequence based on the specified table collection.

This method will copy the supplied table collection unless TSK_TAKE_OWNERSHIP is specified. The table collection will be checked for integrity and index maps built.

This must be called before any operations are performed on the tree sequence. See the API structure for details on how objects are initialised and freed.

If specified, TSK_TAKE_OWNERSHIP takes immediate ownership of the tables, regardless of error conditions.

Options

Parameters
Returns

Return 0 on success or a negative value on failure.

int tsk_treeseq_load(tsk_treeseq_t *self, const char *filename, tsk_flags_t options)

Load a tree sequence from a file path.

Loads the data from the specified file into this tree sequence. The tree sequence is also initialised. The resources allocated must be freed using tsk_treeseq_free() even in error conditions.

Works similarly to tsk_table_collection_load(); please see that function’s documentation for details and options.

Examples

int ret;
tsk_treeseq_t ts;
ret = tsk_treeseq_load(&ts, "data.trees", 0);
if (ret != 0) {
    fprintf(stderr, "Load error: %s\n", tsk_strerror(ret));
    /* Resources must be freed even when loading fails. */
    tsk_treeseq_free(&ts);
    exit(EXIT_FAILURE);
}

Parameters
  • self – A pointer to an uninitialised tsk_treeseq_t object.

  • filename – A NULL terminated string containing the filename.

  • options – Bitwise options. See tsk_table_collection_load() for details.

Returns

Return 0 on success or a negative value on failure.

int tsk_treeseq_loadf(tsk_treeseq_t *self, FILE *file, tsk_flags_t options)

Load a tree sequence from a stream.

Loads a tree sequence from the specified file stream. The tree sequence is also initialised. The resources allocated must be freed using tsk_treeseq_free() even in error conditions.

Works similarly to tsk_table_collection_loadf(); please see that function’s documentation for details and options.

Parameters
  • self – A pointer to an uninitialised tsk_treeseq_t object.

  • file – A FILE stream opened in an appropriate mode for reading (e.g. “r”, “r+” or “w+”) positioned at the beginning of a tree sequence definition.

  • options – Bitwise options. See tsk_table_collection_loadf() for details.

Returns

Return 0 on success or a negative value on failure.

int tsk_treeseq_dump(const tsk_treeseq_t *self, const char *filename, tsk_flags_t options)

Write a tree sequence to file.

Writes the data from this tree sequence to the specified file.

If an error occurs, the file at the specified path is deleted, ensuring that only complete and well-formed files are written.

Parameters
  • self – A pointer to an initialised tsk_treeseq_t object.

  • filename – A NULL terminated string containing the filename.

  • options – Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

int tsk_treeseq_dumpf(const tsk_treeseq_t *self, FILE *file, tsk_flags_t options)

Write a tree sequence to a stream.

Writes the data from this tree sequence to the specified FILE stream. Semantics are identical to tsk_treeseq_dump().

Please see the File streaming section for an example of how to sequentially dump and load tree sequences from a stream.

Parameters
  • self – A pointer to an initialised tsk_treeseq_t object.

  • file – A FILE stream opened in an appropriate mode for writing (e.g. “w”, “a”, “r+” or “w+”).

  • options – Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

int tsk_treeseq_copy_tables(const tsk_treeseq_t *self, tsk_table_collection_t *tables, tsk_flags_t options)

Copies the state of the table collection underlying this tree sequence into the specified destination table collection.

By default the method initialises the specified destination table collection. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

  • tables – A pointer to a tsk_table_collection_t object. If the TSK_NO_INIT option is specified, this must be an initialised table collection. If not, it must be an uninitialised table collection.

  • options – Bitwise option flags.

Returns

Return 0 on success or a negative value on failure.

int tsk_treeseq_free(tsk_treeseq_t *self)

Free the internal memory for the specified tree sequence.

Parameters
  • self – A pointer to an initialised tsk_treeseq_t object.

Returns

Always returns 0.

void tsk_treeseq_print_state(const tsk_treeseq_t *self, FILE *out)

Print out the state of this tree sequence to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

  • out – The stream to write the summary to.

tsk_size_t tsk_treeseq_get_num_nodes(const tsk_treeseq_t *self)

Get the number of nodes.

Returns the number of nodes in this tree sequence.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the number of nodes.

tsk_size_t tsk_treeseq_get_num_edges(const tsk_treeseq_t *self)

Get the number of edges.

Returns the number of edges in this tree sequence.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the number of edges.

tsk_size_t tsk_treeseq_get_num_migrations(const tsk_treeseq_t *self)

Get the number of migrations.

Returns the number of migrations in this tree sequence.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the number of migrations.

tsk_size_t tsk_treeseq_get_num_sites(const tsk_treeseq_t *self)

Get the number of sites.

Returns the number of sites in this tree sequence.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the number of sites.

tsk_size_t tsk_treeseq_get_num_mutations(const tsk_treeseq_t *self)

Get the number of mutations.

Returns the number of mutations in this tree sequence.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the number of mutations.

tsk_size_t tsk_treeseq_get_num_provenances(const tsk_treeseq_t *self)

Get the number of provenances.

Returns the number of provenances in this tree sequence.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the number of provenances.

tsk_size_t tsk_treeseq_get_num_populations(const tsk_treeseq_t *self)

Get the number of populations.

Returns the number of populations in this tree sequence.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the number of populations.

tsk_size_t tsk_treeseq_get_num_individuals(const tsk_treeseq_t *self)

Get the number of individuals.

Returns the number of individuals in this tree sequence.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the number of individuals.

tsk_size_t tsk_treeseq_get_num_trees(const tsk_treeseq_t *self)

Return the number of trees in this tree sequence.

This is a constant time operation.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

The number of trees in the tree sequence.

tsk_size_t tsk_treeseq_get_num_samples(const tsk_treeseq_t *self)

Get the number of samples.

Returns the number of nodes marked as samples in this tree sequence.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the number of samples.

const char *tsk_treeseq_get_metadata(const tsk_treeseq_t *self)

Get the top-level tree sequence metadata.

Returns a pointer to the metadata string, which is owned by the tree sequence and not null-terminated.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns a pointer to the metadata.

tsk_size_t tsk_treeseq_get_metadata_length(const tsk_treeseq_t *self)

Get the length of top-level tree sequence metadata.

Returns the length of the metadata string.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the length of the metadata.

const char *tsk_treeseq_get_metadata_schema(const tsk_treeseq_t *self)

Get the top-level tree sequence metadata schema.

Returns a pointer to the metadata schema string, which is owned by the tree sequence and not null-terminated.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns a pointer to the metadata schema.

tsk_size_t tsk_treeseq_get_metadata_schema_length(const tsk_treeseq_t *self)

Get the length of the top-level tree sequence metadata schema.

Returns the length of the metadata schema string.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the length of the metadata schema.

const char *tsk_treeseq_get_time_units(const tsk_treeseq_t *self)

Get the time units string.

Returns a pointer to the time units string, which is owned by the tree sequence and not null-terminated.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns a pointer to the time units.

tsk_size_t tsk_treeseq_get_time_units_length(const tsk_treeseq_t *self)

Get the length of time units string.

Returns the length of the time units string.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the length of the time units.

const char *tsk_treeseq_get_file_uuid(const tsk_treeseq_t *self)

Get the file uuid.

Returns a pointer to the null-terminated file uuid string, which is owned by the tree sequence.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns a pointer to the file uuid.

double tsk_treeseq_get_sequence_length(const tsk_treeseq_t *self)

Get the sequence length.

Returns the sequence length of this tree sequence.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the sequence length.

const double *tsk_treeseq_get_breakpoints(const tsk_treeseq_t *self)

Get the breakpoints.

Returns an array of breakpoint locations. The array is owned by the tree sequence.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the pointer to the breakpoint array.
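A typical use of the breakpoint array is to find which tree covers a given genomic position. The sketch below does this with a binary search in plain C. It assumes that the array holds num_trees + 1 sorted values, starting at 0 and ending at the sequence length, so that tree k covers the half-open interval from breakpoints[k] to breakpoints[k + 1]. The function find_tree_index is a hypothetical helper, not a tskit function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of tskit): returns the index k such that
 * breakpoints[k] <= x < breakpoints[k + 1], or -1 if x lies outside the
 * sequence. Assumes breakpoints has num_trees + 1 sorted entries. */
static long
find_tree_index(const double *breakpoints, size_t num_trees, double x)
{
    size_t lo = 0, hi = num_trees, mid;

    assert(breakpoints != NULL);
    if (num_trees == 0 || x < breakpoints[0] || x >= breakpoints[num_trees]) {
        return -1;
    }
    /* Invariant: breakpoints[lo] <= x < breakpoints[hi]. */
    while (hi - lo > 1) {
        mid = lo + (hi - lo) / 2;
        if (breakpoints[mid] <= x) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return (long) lo;
}
```

With tskit, the inputs would come from tsk_treeseq_get_breakpoints() and tsk_treeseq_get_num_trees().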

const tsk_id_t *tsk_treeseq_get_samples(const tsk_treeseq_t *self)

Get the samples.

Returns an array of the ids of the sample nodes in this tree sequence, that is, the nodes that have the TSK_NODE_IS_SAMPLE flag set. The array is owned by the tree sequence and should not be modified or freed.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the pointer to the sample node id array.

const tsk_id_t *tsk_treeseq_get_sample_index_map(const tsk_treeseq_t *self)

Get the map of node id to sample index.

Returns the location of each node in the list of samples or TSK_NULL for nodes that are not samples.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns the pointer to the array of sample indexes.
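The sample index map is the inverse of the samples array: if samples[j] == u then sample_index_map[u] == j, and every non-sample node maps to TSK_NULL. The sketch below shows this relationship by rebuilding such a map from a samples array in plain C. The names build_sample_index_map and MODEL_NULL are hypothetical stand-ins, and int32_t is used in place of tsk_id_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NULL (-1) /* Stand-in for TSK_NULL */

/* Hypothetical helper (not part of tskit): fills sample_index_map, which
 * must have room for num_nodes entries, with the position of each node in
 * the samples array, or MODEL_NULL for nodes that are not samples. */
static void
build_sample_index_map(const int32_t *samples, size_t num_samples,
    int32_t *sample_index_map, size_t num_nodes)
{
    size_t j;

    for (j = 0; j < num_nodes; j++) {
        sample_index_map[j] = MODEL_NULL;
    }
    for (j = 0; j < num_samples; j++) {
        assert(samples[j] >= 0 && (size_t) samples[j] < num_nodes);
        sample_index_map[samples[j]] = (int32_t) j;
    }
}
```

In practice there is no need to rebuild the map, since the tree sequence already maintains it; the sketch is only meant to make the meaning of the returned array concrete.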

bool tsk_treeseq_is_sample(const tsk_treeseq_t *self, tsk_id_t u)

Check if a node is a sample.

Returns the sample status of a given node id.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

  • u – The id of the node to be checked.

Returns

Returns true if the node is a sample.

bool tsk_treeseq_get_discrete_genome(const tsk_treeseq_t *self)

Get the discrete genome status.

If all the genomic locations in the tree sequence are discrete integer values then this flag will be true.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns true if all genomic locations are discrete.

bool tsk_treeseq_get_discrete_time(const tsk_treeseq_t *self)

Get the discrete time status.

If all times in the tree sequence are discrete integer values then this flag will be true.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

Returns

Returns true if all times are discrete.

int tsk_treeseq_get_node(const tsk_treeseq_t *self, tsk_id_t index, tsk_node_t *node)

Get a node by its index.

Copies a node from this tree sequence to the specified destination.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

  • index – The node index to copy.

  • node – A pointer to a tsk_node_t object.

Returns

Return 0 on success or a negative value on failure.

int tsk_treeseq_get_edge(const tsk_treeseq_t *self, tsk_id_t index, tsk_edge_t *edge)

Get an edge by its index.

Copies an edge from this tree sequence to the specified destination.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

  • index – The edge index to copy.

  • edge – A pointer to a tsk_edge_t object.

Returns

Return 0 on success or a negative value on failure.

int tsk_treeseq_get_migration(const tsk_treeseq_t *self, tsk_id_t index, tsk_migration_t *migration)

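Get a migration by its index.

Copies a migration from this tree sequence to the specified destination.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

  • index – The migration index to copy.

  • migration – A pointer to a tsk_migration_t object.

Returns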

Return 0 on success or a negative value on failure.

int tsk_treeseq_get_site(const tsk_treeseq_t *self, tsk_id_t index, tsk_site_t *site)

Get a site by its index.

Copies a site from this tree sequence to the specified destination.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

  • index – The site index to copy.

  • site – A pointer to a tsk_site_t object.

Returns

Return 0 on success or a negative value on failure.

int tsk_treeseq_get_mutation(const tsk_treeseq_t *self, tsk_id_t index, tsk_mutation_t *mutation)

Get a mutation by its index.

Copies a mutation from this tree sequence to the specified destination.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

  • index – The mutation index to copy.

  • mutation – A pointer to a tsk_mutation_t object.

Returns

Return 0 on success or a negative value on failure.

int tsk_treeseq_get_provenance(const tsk_treeseq_t *self, tsk_id_t index, tsk_provenance_t *provenance)

Get a provenance by its index.

Copies a provenance from this tree sequence to the specified destination.

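Parameters
  • self – A pointer to a tsk_treeseq_t object.

  • index – The provenance index to copy.

  • provenance – A pointer to a tsk_provenance_t object.

Returns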

Return 0 on success or a negative value on failure.

int tsk_treeseq_get_population(const tsk_treeseq_t *self, tsk_id_t index, tsk_population_t *population)

Get a population by its index.

Copies a population from this tree sequence to the specified destination.

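Parameters
  • self – A pointer to a tsk_treeseq_t object.

  • index – The population index to copy.

  • population – A pointer to a tsk_population_t object.

Returns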

Return 0 on success or a negative value on failure.

int tsk_treeseq_get_individual(const tsk_treeseq_t *self, tsk_id_t index, tsk_individual_t *individual)

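Get an individual by its index.

Copies an individual from this tree sequence to the specified destination.

Parameters
  • self – A pointer to a tsk_treeseq_t object.

  • index – The individual index to copy.

  • individual – A pointer to a tsk_individual_t object.

Returns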

Return 0 on success or a negative value on failure.

int tsk_treeseq_simplify(const tsk_treeseq_t *self, const tsk_id_t *samples, tsk_size_t num_samples, tsk_flags_t options, tsk_treeseq_t *output, tsk_id_t *node_map)

Create a simplified instance of this tree sequence.

Copies this tree sequence to the specified destination and performs simplification. The destination tree sequence should be uninitialised. Simplification transforms the tables to remove redundancy and canonicalise tree sequence data. See the simplification tutorial for more details.

For full details and flags see tsk_table_collection_simplify() which performs the same operation in place.

Parameters
  • self – A pointer to an initialised tsk_treeseq_t object.

  • samples – Either NULL or an array of num_samples distinct and valid node IDs. If non-null the nodes in this array will be marked as samples in the output. If NULL, the num_samples parameter is ignored and the samples in the output will be the same as the samples in the input. This is equivalent to populating the samples array with all of the sample nodes in the input in increasing order of ID.

  • num_samples – The number of node IDs in the input samples array. Ignored if the samples array is NULL.

  • options – Simplify options; see tsk_table_collection_simplify() for the available bitwise flags. For the default behaviour, a value of 0 should be provided.

  • output – A pointer to an uninitialised tsk_treeseq_t object.

  • node_map – If not NULL, this array will be filled to define the mapping between node IDs in the table collection before and after simplification.

Returns

Return 0 on success or a negative value on failure.
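Once a node_map has been filled, client code usually needs to translate node IDs that it holds from the old numbering to the new one. The sketch below shows one way to do that in plain C. It assumes that node_map has one entry per node in the input and uses TSK_NULL for nodes that are absent from the output; see tsk_table_collection_simplify() for the authoritative description. The names remap_node_ids and MODEL_NULL are hypothetical stand-ins, and int32_t is used in place of tsk_id_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NULL (-1) /* Stand-in for TSK_NULL */

/* Hypothetical helper (not part of tskit): rewrites each entry of ids from
 * the pre-simplification numbering to the post-simplification numbering.
 * Entries that are already null, or whose node was removed, become
 * MODEL_NULL. Returns the number of entries that still refer to a node. */
static size_t
remap_node_ids(const int32_t *node_map, int32_t *ids, size_t num_ids)
{
    size_t j, kept = 0;

    assert(node_map != NULL && ids != NULL);
    for (j = 0; j < num_ids; j++) {
        if (ids[j] != MODEL_NULL) {
            ids[j] = node_map[ids[j]];
        }
        if (ids[j] != MODEL_NULL) {
            kept++;
        }
    }
    return kept;
}
```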

Trees

struct tsk_tree_t

A single tree in a tree sequence.

A tsk_tree_t object has two basic functions:

  1. Represent the state of a single tree in a tree sequence;

  2. Provide methods to transform this state into different trees in the sequence.

The state of a single tree in the tree sequence is represented using the quintuply linked encoding: please see the data model section for details on how this works. The left-to-right ordering of nodes in this encoding is arbitrary, and may change depending on the order in which trees are accessed within the sequence. Please see the Tree traversals examples for recommended usage.

On initialisation, a tree is in the null state and we must call one of the seeking methods to make the state of the tree object correspond to a particular tree in the sequence. Please see the Tree iteration examples for recommended usage.

Public Members

const tsk_treeseq_t *tree_sequence

The parent tree sequence.

tsk_id_t virtual_root

The ID of the “virtual root” whose children are the roots of the tree.

tsk_id_t *parent

The parent of node u is parent[u]. Equal to TSK_NULL if node u is a root or is not a node in the current tree.

tsk_id_t *left_child

The leftmost child of node u is left_child[u]. Equal to TSK_NULL if node u is a leaf or is not a node in the current tree.

tsk_id_t *right_child

The rightmost child of node u is right_child[u]. Equal to TSK_NULL if node u is a leaf or is not a node in the current tree.

tsk_id_t *left_sib

The sibling to the left of node u is left_sib[u]. Equal to TSK_NULL if node u has no siblings to its left.

tsk_id_t *right_sib

The sibling to the right of node u is right_sib[u]. Equal to TSK_NULL if node u has no siblings to its right.

tsk_id_t *num_children

The number of children of node u is num_children[u].

tsk_id_t *edge

Array of edge ids where edge[u] is the edge that encodes the relationship between the child node u and its parent. Equal to TSK_NULL if node u is a root, virtual root or is not a node in the current tree.

tsk_size_t num_edges

The total number of edges defining the topology of this tree. This is equal to the number of tree sequence edges that intersect with the tree’s genomic interval.

struct tsk_tree_t.[anonymous] interval

Left and right coordinates of the genomic interval that this tree covers. The left coordinate is inclusive and the right coordinate exclusive.

Example:

tsk_tree_t tree;
int ret;
// initialise etc
ret = tsk_tree_first(&tree);
// Check for error
assert(ret == TSK_TREE_OK);
printf("Coordinates covered by first tree are left=%f, right=%f\n",
    tree.interval.left, tree.interval.right);

tsk_id_t index

The index of this tree in the tree sequence.

This attribute provides the zero-based index of the tree represented by the current state of the struct within the parent tree sequence. For example, immediately after we call tsk_tree_first(&tree), tree.index will be zero, and after we call tsk_tree_last(&tree), tree.index will be the number of trees - 1 (see tsk_treeseq_get_num_trees()). When the tree is in the null state (immediately after initialisation, or after, e.g., calling tsk_tree_prev() on the first tree) the value of the index is -1.
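The parent, right_child and left_sib arrays described above are enough to walk a tree without recursion. The sketch below operates on plain arrays laid out in the same way, so that it can be compiled without tskit. The names preorder, node_depth and MODEL_NULL are hypothetical, int32_t stands in for tsk_id_t, and the caller is assumed to supply stack and order buffers with room for every node in the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NULL (-1) /* Stand-in for TSK_NULL */

/* Hypothetical helper: writes the nodes at or below root to order in
 * preorder and returns how many were visited. Children are pushed from
 * right to left so that the leftmost child is visited first. */
static size_t
preorder(const int32_t *right_child, const int32_t *left_sib, int32_t root,
    int32_t *stack, int32_t *order)
{
    size_t n = 0;
    int stack_top = 0;
    int32_t u, v;

    assert(root != MODEL_NULL);
    stack[0] = root;
    while (stack_top >= 0) {
        u = stack[stack_top];
        stack_top--;
        order[n] = u;
        n++;
        for (v = right_child[u]; v != MODEL_NULL; v = left_sib[v]) {
            stack_top++;
            stack[stack_top] = v;
        }
    }
    return n;
}

/* Hypothetical helper: number of edges between u and the root above it,
 * found by following parent links until MODEL_NULL is reached. */
static int
node_depth(const int32_t *parent, int32_t u)
{
    int depth = 0;

    for (u = parent[u]; u != MODEL_NULL; u = parent[u]) {
        depth++;
    }
    return depth;
}
```

With a tsk_tree_t, the corresponding arrays would be tree.right_child, tree.left_sib and tree.parent, and starting the traversal at tree.virtual_root would visit every root in turn.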

Lifecycle

int tsk_tree_init(tsk_tree_t *self, const tsk_treeseq_t *tree_sequence, tsk_flags_t options)

Initialises the tree by allocating internal memory and associating with the specified tree sequence.

This must be called before any operations are performed on the tree.

The specified tree sequence object must be initialised, and must be valid for the full lifetime of this tree.

See the API structure for details on how objects are initialised and freed.

The options parameter is provided to support future expansions of the API. A number of undocumented internal features are controlled via this parameter, and it must be set to 0 to ensure that operations work as expected and for compatibility with future versions of tskit.

Parameters
  • self – A pointer to an uninitialised tsk_tree_t object.

  • tree_sequence – A pointer to an initialised tsk_treeseq_t object.

  • options – Allocation time options. Must be 0, or behaviour is undefined.

Returns

Return 0 on success or a negative value on failure.

int tsk_tree_free(tsk_tree_t *self)

Free the internal memory for the specified tree.

Parameters
  • self – A pointer to an initialised tsk_tree_t object.

Returns

Always returns 0.

int tsk_tree_copy(const tsk_tree_t *self, tsk_tree_t *dest, tsk_flags_t options)

Copies the state of this tree into the specified destination.

By default (options = 0) the method initialises the specified destination tree by calling tsk_tree_init(). If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory. If TSK_NO_INIT is supplied and the tree sequence associated with the dest tree is not equal to the tree sequence associated with self, an error is raised.

The destination tree will keep a reference to the tree sequence object associated with the source tree, and this tree sequence must be valid for the full lifetime of the destination tree.

Options

If TSK_NO_INIT is not specified, options for tsk_tree_init() can be provided and will be passed on.

Parameters
  • self – A pointer to an initialised tsk_tree_t object.

  • dest – A pointer to a tsk_tree_t object. If the TSK_NO_INIT option is specified, this must be an initialised tree. If not, it must be an uninitialised tree.

  • options – Copy and allocation time options. See the notes above for details.

Returns

Return 0 on success or a negative value on failure.

Null state

Trees are initially in a “null state” where each sample is a root and there are no branches. The index of a tree in the null state is -1.

We must call one of the seeking methods to make the state of the tree object correspond to a particular tree in the sequence.

Seeking

When we are examining many trees along a tree sequence, we usually allocate a single tsk_tree_t object and update its state. This allows us to efficiently transform the state of a tree into nearby trees, using the underlying succinct tree sequence data structure.

The simplest example is to visit the trees left-to-right along the genome:

 1int
 2visit_trees(const tsk_treeseq_t *ts)
 3{
 4    tsk_tree_t tree;
 5    int ret;
 6
 7    ret = tsk_tree_init(&tree, ts, 0);
 8    if (ret != 0) {
 9        goto out;
10    }
11    for (ret = tsk_tree_first(&tree); ret == TSK_TREE_OK; ret = tsk_tree_next(&tree)) {
12        printf("\ttree %lld covers interval left=%f right=%f\n",
13            (long long) tree.index, tree.interval.left, tree.interval.right);
14    }
15    if (ret != 0) {
16        goto out;
17    }
18    // Do other things in the function...
19out:
20    tsk_tree_free(&tree);
21    return ret;
22}

In this example we first initialise a tsk_tree_t object, associating it with the input tree sequence. We then iterate over the trees along the sequence using a for loop, with the ret variable controlling iteration. The usage of ret here follows a slightly different pattern to other functions in the tskit C API (see the Error handling section). The interaction between error handling and states of the tree object here is somewhat subtle, and is worth explaining in detail.

After successful initialisation (after line 10), the tree is in the null state where all samples are roots. The for loop begins by calling tsk_tree_first() which transforms the state of the tree into the first (leftmost) tree in the sequence. If this operation is successful, tsk_tree_first() returns TSK_TREE_OK. We then check the value of ret in the loop condition to see if it is equal to TSK_TREE_OK and execute the loop body for the first tree in the sequence.

On completing the loop body for the first tree in the sequence, we then execute the for loop increment operation, which calls tsk_tree_next() and assigns the returned value to ret. This function efficiently transforms the current state of tree so that it represents the next tree along the genome, and returns TSK_TREE_OK if the operation succeeds. When tsk_tree_next() is called on the last tree in the sequence, the state of tree is set back to the null state and the return value is 0.

Thus, the loop on lines 11-14 can exit in two ways:

  1. Either we successfully iterate over all trees in the sequence and ret has the value 0 at line 15; or

  2. An error occurs during tsk_tree_first() or tsk_tree_next(), and ret contains a negative value.

Warning

It is vital that you check the value of ret immediately after the loop exits like we do here at line 15, or errors can be silently lost. (Although it’s redundant here, as we don’t do anything else in the function.)
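
The three-way return convention can be exercised in isolation. The sketch below does not use tskit at all: mock_tree_t, mock_tree_first(), mock_tree_next(), MOCK_TREE_OK and MOCK_ERR are hypothetical stand-ins that only mimic the return values described above (1 for a non-null tree, 0 on reaching the null state, negative on error).

```c
/* Hypothetical mock of the seeking return protocol; none of this is tskit API. */
#define MOCK_TREE_OK 1
#define MOCK_ERR (-1)

typedef struct {
    int index;     /* -1 while in the null state */
    int num_trees; /* number of trees in the mock sequence */
    int fail_at;   /* index at which seeking fails, or -1 for never */
} mock_tree_t;

static int
mock_tree_next(mock_tree_t *t)
{
    int next = t->index + 1;

    if (next == t->fail_at) {
        return MOCK_ERR;
    }
    if (next >= t->num_trees) {
        t->index = -1; /* stepped past the last tree: back to the null state */
        return 0;
    }
    t->index = next;
    return MOCK_TREE_OK;
}

static int
mock_tree_first(mock_tree_t *t)
{
    t->index = -1;
    return mock_tree_next(t);
}

/* Returns the number of trees visited, or a negative value if seeking failed. */
static int
visit_all(int num_trees, int fail_at)
{
    mock_tree_t tree = { -1, num_trees, fail_at };
    int visited = 0;
    int ret;

    for (ret = mock_tree_first(&tree); ret == MOCK_TREE_OK; ret = mock_tree_next(&tree)) {
        visited++;
    }
    if (ret != 0) {
        return ret; /* the loop ended because of an error, not the end of the sequence */
    }
    return visited;
}
```

Dropping the check on ret after the loop would make a failed pass indistinguishable from a short but complete one.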

See also

See the examples section for more examples of sequential seeking, including an example of using tsk_tree_last() and tsk_tree_prev() to iterate from right-to-left.

Note

Seeking functions tsk_tree_first(), tsk_tree_last(), tsk_tree_next(), tsk_tree_prev(), and tsk_tree_seek() can be called in any order and from any non-error state.

int tsk_tree_first(tsk_tree_t *self)

Seek to the first tree in the sequence.

Set the state of this tree to reflect the first tree in the parent tree sequence.

Parameters
  • self – A pointer to an initialised tsk_tree_t object.

Returns

Return TSK_TREE_OK on success; or a negative value if an error occurs.

int tsk_tree_last(tsk_tree_t *self)

Seek to the last tree in the sequence.

Set the state of this tree to reflect the last tree in the parent tree sequence.

Parameters
  • self – A pointer to an initialised tsk_tree_t object.

Returns

Return TSK_TREE_OK on success; or a negative value if an error occurs.

int tsk_tree_next(tsk_tree_t *self)

Seek to the next tree in the sequence.

Set the state of this tree to reflect the next tree in the parent tree sequence. If the index of the current tree is j, then after this operation the index will be j + 1.

Calling tsk_tree_next() on a tree in the null state is equivalent to calling tsk_tree_first().

Calling tsk_tree_next() on the last tree in the sequence will transform it into the null state (equivalent to calling tsk_tree_clear()).

Please see the Tree iteration examples for recommended usage.

Parameters
  • self – A pointer to an initialised tsk_tree_t object.

Returns

Return TSK_TREE_OK on successfully transforming to a non-null tree; 0 on successfully transforming into the null tree; or a negative value if an error occurs.

int tsk_tree_prev(tsk_tree_t *self)

Seek to the previous tree in the sequence.

Set the state of this tree to reflect the previous tree in the parent tree sequence. If the index of the current tree is j, then after this operation the index will be j - 1.

Calling tsk_tree_prev() on a tree in the null state is equivalent to calling tsk_tree_last().

Calling tsk_tree_prev() on the first tree in the sequence will transform it into the null state (equivalent to calling tsk_tree_clear()).

Please see the Tree iteration examples for recommended usage.

Parameters
  • self – A pointer to an initialised tsk_tree_t object.

Returns

Return TSK_TREE_OK on successfully transforming to a non-null tree; 0 on successfully transforming into the null tree; or a negative value if an error occurs.

int tsk_tree_clear(tsk_tree_t *self)

Set the tree into the null state.

Transform this tree into the null state.

Parameters
  • self – A pointer to an initialised tsk_tree_t object.

Returns

Return 0 on success or a negative value on failure.

int tsk_tree_seek(tsk_tree_t *self, double position, tsk_flags_t options)

Seek to a particular position on the genome.

Set the state of this tree to reflect the tree in the parent tree sequence covering the specified position. That is, on success we will have tree.interval.left <= position and position < tree.interval.right.

Seeking to a position currently covered by the tree is a constant time operation.

Warning

The current implementation of seek does not provide efficient random access to arbitrary positions along the genome. However, sequentially seeking in either direction is as efficient as calling tsk_tree_next() or tsk_tree_prev() directly.
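
To make the half-open covering condition concrete, here is a small sketch that finds which interval covers a position by scanning a list of boundaries. It is not how tsk_tree_seek() is implemented; the breakpoints array and find_covering_tree() are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical boundaries of three trees: [0, 2.5), [2.5, 7), [7, 10). */
static const double demo_breakpoints[] = { 0.0, 2.5, 7.0, 10.0 };

/* Return the index j with breakpoints[j] <= position < breakpoints[j + 1],
 * or -1 if the position lies outside every interval. */
static int
find_covering_tree(const double *breakpoints, size_t num_trees, double position)
{
    size_t j;

    for (j = 0; j < num_trees; j++) {
        if (breakpoints[j] <= position && position < breakpoints[j + 1]) {
            return (int) j;
        }
    }
    return -1;
}
```

Because the left coordinate is inclusive and the right one exclusive, a position that falls exactly on a boundary belongs to the tree on its right, and the right end of the last interval is covered by no tree.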

Parameters
  • self – A pointer to an initialised tsk_tree_t object.

  • position – The position in genome coordinates

  • options – Seek options. Currently unused. Set to 0 for compatibility with future versions of tskit.

Returns

Return 0 on success or a negative value on failure.

TSK_TREE_OK 1

Value returned by seeking methods when they have successfully seeked to a non-null tree.

Tree queries

tsk_size_t tsk_tree_get_num_roots(const tsk_tree_t *self)

Returns the number of roots in this tree.

See the Roots section for more information on how the roots of a tree are defined.

Parameters
  • self – A pointer to an initialised tsk_tree_t object.

Returns

Returns the number of roots in this tree.

tsk_id_t tsk_tree_get_left_root(const tsk_tree_t *self)

Returns the leftmost root in this tree.

See the Roots section for more information on how the roots of a tree are defined.

This function is equivalent to tree.left_child[tree.virtual_root].
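
Since the leftmost root is the leftmost child of the virtual root, all roots can be listed by following sibling links from there. The sketch below assumes the roots are chained through right_sib in the same way as ordinary siblings; node_id_t, NODE_NULL and the demo arrays are hypothetical stand-ins so that it compiles without tskit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for tsk_id_t and TSK_NULL (not tskit definitions). */
typedef int32_t node_id_t;
#define NODE_NULL (-1)

/* Demo forest with nodes 0..5 and the virtual root stored at index 6.
 * Node 4 is a root with children 3 and 2; node 3 has children 0 and 1;
 * node 5 is an isolated root. */
static const node_id_t demo_left_child[] = { -1, -1, -1, 0, 3, -1, 4 };
static const node_id_t demo_right_sib[] = { 1, -1, -1, 2, 5, -1, -1 };

/* Write the roots into the caller-provided array and return how many there are. */
static size_t
list_roots(const node_id_t *left_child, const node_id_t *right_sib,
    node_id_t virtual_root, node_id_t *roots)
{
    size_t n = 0;
    node_id_t u;

    for (u = left_child[virtual_root]; u != NODE_NULL; u = right_sib[u]) {
        roots[n++] = u;
    }
    return n;
}
```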

Parameters
  • self – A pointer to an initialised tsk_tree_t object.

Returns

Returns the leftmost root in the tree.

tsk_id_t tsk_tree_get_right_root(const tsk_tree_t *self)

Returns the rightmost root in this tree.

See the Roots section for more information on how the roots of a tree are defined.

This function is equivalent to tree.right_child[tree.virtual_root].

Parameters
  • self – A pointer to an initialised tsk_tree_t object.

Returns

Returns the rightmost root in the tree.

int tsk_tree_get_sites(const tsk_tree_t *self, const tsk_site_t **sites, tsk_size_t *sites_length)

Get the list of sites for this tree.

Gets the list of tsk_site_t objects in the parent tree sequence for which the position lies within this tree’s genomic interval.

The memory pointed to by the sites parameter is managed by the tsk_tree_t object and must not be altered or freed by client code.

static void
print_sites(const tsk_tree_t *tree)
{
    int ret;
    tsk_size_t j, num_sites;
    const tsk_site_t *sites;

    ret = tsk_tree_get_sites(tree, &sites, &num_sites);
    check_tsk_error(ret);
    for (j = 0; j < num_sites; j++) {
        printf("position = %f\n", sites[j].position);
    }
}

This is a constant time operation.

Parameters
  • self – A pointer to a tsk_tree_t object.

  • sites – The destination pointer for the list of sites.

  • sites_length – A pointer to a tsk_size_t value in which the number of sites is stored.

Returns

0 on success or a negative value on failure.

tsk_size_t tsk_tree_get_size_bound(const tsk_tree_t *self)

Return an upper bound on the number of nodes reachable from the roots of this tree.

This function provides an upper bound on the number of nodes that can be reached in tree traversals, and is intended to be used for memory allocation purposes. If num_nodes is the number of nodes visited in a tree traversal from the virtual root (e.g., tsk_tree_preorder_from(tree, tree->virtual_root, nodes, &num_nodes)), the bound N returned here is guaranteed to be greater than or equal to num_nodes.

Warning

The precise value returned is not defined and should not be depended on, as it may change from version-to-version.

Parameters
  • self – A pointer to a tsk_tree_t object.

Returns

An upper bound on the number of nodes reachable from the roots of this tree, or zero if this tree has not been initialised.

void tsk_tree_print_state(const tsk_tree_t *self, FILE *out)

Print out the state of this tree to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self – A pointer to a tsk_tree_t object.

  • out – The stream to write the summary to.

Node queries

int tsk_tree_get_parent(const tsk_tree_t *self, tsk_id_t u, tsk_id_t *parent)

Returns the parent of the specified node.

Equivalent to tree.parent[u] with bounds checking for the node u. Performance sensitive code which can guarantee that the node u is valid should use the direct array access in preference to this method.

Parameters
  • self – A pointer to a tsk_tree_t object.

  • u – The tree node.

  • parent – A tsk_id_t pointer to store the returned parent node.

Returns

0 on success or a negative value on failure.

int tsk_tree_get_time(const tsk_tree_t *self, tsk_id_t u, double *ret_time)

Returns the time of the specified node.

Equivalent to tables->nodes.time[u] with bounds checking for the node u. Performance sensitive code which can guarantee that the node u is valid should use the direct array access in preference to this method, for example:

static void
print_times(const tsk_tree_t *tree)
{
    int ret;
    tsk_size_t num_nodes, j;
    const double *node_time = tree->tree_sequence->tables->nodes.time;
    tsk_id_t *nodes = malloc(tsk_tree_get_size_bound(tree) * sizeof(*nodes));

    if (nodes == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    ret = tsk_tree_preorder(tree, nodes, &num_nodes);
    check_tsk_error(ret);
    for (j = 0; j < num_nodes; j++) {
        printf("time = %f\n", node_time[nodes[j]]);
    }
    free(nodes);
}

Parameters
  • self – A pointer to a tsk_tree_t object.

  • u – The tree node.

  • ret_time – A double pointer to store the returned node time.

Returns

0 on success or a negative value on failure.

int tsk_tree_get_depth(const tsk_tree_t *self, tsk_id_t u, int *ret_depth)

Return number of nodes on the path from the specified node to root.

Return the number of nodes on the path from u to root, not including u. The depth of a root is therefore zero.

As a special case, the depth of the virtual root is defined as -1.
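
The definition can be sketched directly as a walk up a parent array. This is an illustration only, using hypothetical stand-ins (node_id_t, NODE_NULL, demo_parent) in place of tsk_id_t, TSK_NULL and tree.parent, and assuming that the parent of a root is the null node.

```c
#include <stdint.h>

/* Hypothetical stand-ins for tsk_id_t and TSK_NULL (not tskit definitions). */
typedef int32_t node_id_t;
#define NODE_NULL (-1)

/* Demo tree: 0 and 1 are children of 3; 3 and 2 are children of the root 4.
 * Index 5 plays the role of the virtual root. */
static const node_id_t demo_parent[] = { 3, 3, 4, 4, -1, -1 };

/* Number of nodes on the path from u to its root, not counting u itself. */
static int
node_depth(const node_id_t *parent, node_id_t virtual_root, node_id_t u)
{
    int depth = 0;
    node_id_t v;

    if (u == virtual_root) {
        return -1; /* special case described above */
    }
    for (v = parent[u]; v != NODE_NULL; v = parent[v]) {
        depth++;
    }
    return depth;
}
```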

Parameters
  • self – A pointer to a tsk_tree_t object.

  • u – The tree node.

  • ret_depth – An int pointer to store the returned node depth.

Returns

0 on success or a negative value on failure.

int tsk_tree_get_branch_length(const tsk_tree_t *self, tsk_id_t u, double *ret_branch_length)

Return the length of the branch ancestral to the specified node.

Return the length of the branch ancestral to the specified node. Branch length is defined as the difference between the time of a node and the time of its parent. The branch length of a root is zero.

Parameters
  • self – A pointer to a tsk_tree_t object.

  • u – The tree node.

  • ret_branch_length – A double pointer to store the returned branch length.

Returns

0 on success or a negative value on failure.

int tsk_tree_get_total_branch_length(const tsk_tree_t *self, tsk_id_t u, double *ret_tbl)

Computes the sum of the lengths of all branches reachable from the specified node, or from all roots if u=TSK_NULL.

Return the total branch length in a particular subtree or of the entire tree. If the specified node is TSK_NULL (or the virtual root) the sum of the lengths of all branches reachable from roots is returned. Branch length is defined as the difference between the time of a node and the time of its parent. The branch length of a root is zero.

Note that if the specified node is internal its branch length is not included, so that, e.g., the total branch length of a leaf node is zero.
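
As a worked illustration of which branches are counted, the sketch below sums branch lengths below a node using child, sibling and time arrays. It covers only the case of a real node u (not TSK_NULL or the virtual root), uses recursion for brevity, and relies on hypothetical stand-ins (node_id_t, NODE_NULL and the demo_* arrays) rather than tskit itself.

```c
#include <stdint.h>

/* Hypothetical stand-ins for tsk_id_t and TSK_NULL (not tskit definitions). */
typedef int32_t node_id_t;
#define NODE_NULL (-1)

/* Demo tree: root 4 (time 2) has children 3 (time 1) and 2 (time 0);
 * node 3 has children 0 and 1 (both time 0). */
static const node_id_t demo_left_child[] = { -1, -1, -1, 0, 3 };
static const node_id_t demo_right_sib[] = { 1, -1, -1, 2, -1 };
static const double demo_time[] = { 0.0, 0.0, 0.0, 1.0, 2.0 };

/* Sum the lengths of all branches strictly below u; the branch above u is excluded. */
static double
total_branch_length(const node_id_t *left_child, const node_id_t *right_sib,
    const double *time, node_id_t u)
{
    double total = 0.0;
    node_id_t v;

    for (v = left_child[u]; v != NODE_NULL; v = right_sib[v]) {
        total += time[u] - time[v]; /* branch joining child v to u */
        total += total_branch_length(left_child, right_sib, time, v);
    }
    return total;
}
```

In the demo tree the subtree below node 3 contributes 1 + 1 = 2, and the whole tree contributes 2 + 1 + 2 = 5.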

Parameters
  • self – A pointer to a tsk_tree_t object.

  • u – The root of the subtree of interest, or TSK_NULL to return the total branch length of the tree.

  • ret_tbl – A double pointer to store the returned total branch length.

Returns

0 on success or a negative value on failure.

int tsk_tree_get_num_samples(const tsk_tree_t *self, tsk_id_t u, tsk_size_t *ret_num_samples)

Counts the number of samples in the subtree rooted at a node.

Returns the number of samples descending from a particular node, including the node itself.

This is a constant time operation.

Parameters
  • self – A pointer to a tsk_tree_t object.

  • u – The tree node.

  • ret_num_samples – A tsk_size_t pointer to store the returned number of samples.

Returns

0 on success or a negative value on failure.

int tsk_tree_get_mrca(const tsk_tree_t *self, tsk_id_t u, tsk_id_t v, tsk_id_t *mrca)

Compute the most recent common ancestor of two nodes.

If two nodes do not share a common ancestor in the current tree, the MRCA node is TSK_NULL.
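
One simple way to think about the MRCA is to lift the deeper node until both nodes are at the same depth, then move both up together until they meet. The sketch below implements that idea on a plain parent array; it is not necessarily the algorithm used by tsk_tree_get_mrca(), and node_id_t, NODE_NULL and demo_parent are hypothetical stand-ins.

```c
#include <stdint.h>

/* Hypothetical stand-ins for tsk_id_t and TSK_NULL (not tskit definitions). */
typedef int32_t node_id_t;
#define NODE_NULL (-1)

/* Demo forest: 0 and 1 are children of 3; 3 and 2 are children of root 4;
 * node 5 is an isolated root. */
static const node_id_t demo_parent[] = { 3, 3, 4, 4, -1, -1 };

static int
depth_of(const node_id_t *parent, node_id_t u)
{
    int d = 0;

    while (parent[u] != NODE_NULL) {
        u = parent[u];
        d++;
    }
    return d;
}

/* Return the most recent common ancestor of u and v, or NODE_NULL if none exists. */
static node_id_t
find_mrca(const node_id_t *parent, node_id_t u, node_id_t v)
{
    int du = depth_of(parent, u);
    int dv = depth_of(parent, v);

    /* Bring the deeper node up to the depth of the shallower one. */
    for (; du > dv; du--) {
        u = parent[u];
    }
    for (; dv > du; dv--) {
        v = parent[v];
    }
    /* Climb in lockstep until the paths join or both run past their roots. */
    while (u != v) {
        u = parent[u];
        v = parent[v];
        if (u == NODE_NULL || v == NODE_NULL) {
            return NODE_NULL;
        }
    }
    return u;
}
```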

Parameters
  • self – A pointer to a tsk_tree_t object.

  • u – A tree node.

  • v – A tree node.

  • mrca – A tsk_id_t pointer to store the returned most recent common ancestor node.

Returns

0 on success or a negative value on failure.

bool tsk_tree_is_descendant(const tsk_tree_t *self, tsk_id_t u, tsk_id_t v)

Returns true if u is a descendant of v.

Returns true if u and v are both valid nodes in the tree sequence and v lies on the path from u to root, and false otherwise.

Any node is a descendant of itself.
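
The stated semantics can be sketched as a walk from u towards the root, looking for v along the way. This is an illustration on a plain parent array with hypothetical names (node_id_t, NODE_NULL, demo_parent); the library function may use a faster test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for tsk_id_t and TSK_NULL (not tskit definitions). */
typedef int32_t node_id_t;
#define NODE_NULL (-1)

/* Demo tree: 0 and 1 are children of 3; 3 and 2 are children of the root 4. */
static const node_id_t demo_parent[] = { 3, 3, 4, 4, -1 };

/* True if both IDs are in range and v lies on the path from u to its root. */
static bool
is_descendant(const node_id_t *parent, node_id_t num_nodes, node_id_t u, node_id_t v)
{
    node_id_t w;

    if (u < 0 || u >= num_nodes || v < 0 || v >= num_nodes) {
        return false; /* out-of-range IDs are never related */
    }
    for (w = u; w != NODE_NULL; w = parent[w]) {
        if (w == v) {
            return true; /* includes the case u == v */
        }
    }
    return false;
}
```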

Parameters
  • self – A pointer to a tsk_tree_t object.

  • u – The descendant node.

  • v – The ancestral node.

Returns

true if u is a descendant of v, and false otherwise.

Traversal orders

int tsk_tree_preorder(const tsk_tree_t *self, tsk_id_t *nodes, tsk_size_t *num_nodes)

Fill an array with the nodes of this tree in preorder.

Populate an array with the nodes in this tree in preorder. The array must be pre-allocated and be sufficiently large to hold the array of nodes visited. The recommended approach is to use the tsk_tree_get_size_bound() function, as in the following example:

static void
print_preorder(tsk_tree_t *tree)
{
    int ret;
    tsk_size_t num_nodes, j;
    tsk_id_t *nodes = malloc(tsk_tree_get_size_bound(tree) * sizeof(*nodes));

    if (nodes == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    ret = tsk_tree_preorder(tree, nodes, &num_nodes);
    check_tsk_error(ret);
    for (j = 0; j < num_nodes; j++) {
        printf("Visit preorder %lld\n", (long long) nodes[j]);
    }
    free(nodes);
}

See also

See the Tree traversals section for more examples.

Parameters
  • self – A pointer to a tsk_tree_t object.

  • nodes – The tsk_id_t array to store nodes in. See notes above for details.

  • num_nodes – A pointer to a tsk_size_t value where we store the number of nodes in the traversal.

Returns

0 on success or a negative value on failure.

int tsk_tree_preorder_from(const tsk_tree_t *self, tsk_id_t root, tsk_id_t *nodes, tsk_size_t *num_nodes)

Fill an array with the nodes of this tree starting from a particular node.

As for tsk_tree_preorder() but starting the traversal at a particular node (which will be the first node in the traversal list). The virtual root is a valid input for this function and will be treated like any other tree node. The value -1 is a special case, in which we visit all nodes reachable from the roots, and equivalent to calling tsk_tree_preorder().
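
For readers who want to see what a preorder traversal from a given node involves, here is a self-contained sketch using an explicit stack. It is not the library's implementation: node_id_t, NODE_NULL, the demo arrays and the max_nodes parameter are hypothetical, and the output array is assumed to be large enough, as with the size bound discussed above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for tsk_id_t and TSK_NULL (not tskit definitions). */
typedef int32_t node_id_t;
#define NODE_NULL (-1)

/* Demo tree: root 4 has children 3 and 2; node 3 has children 0 and 1. */
static const node_id_t demo_right_child[] = { -1, -1, -1, 1, 2 };
static const node_id_t demo_left_sib[] = { -1, 0, 3, -1, -1 };

/* Fill nodes with the preorder traversal starting at root.
 * Returns 0 on success or -1 if the temporary stack cannot be allocated. */
static int
preorder_from(const node_id_t *right_child, const node_id_t *left_sib,
    size_t max_nodes, node_id_t root, node_id_t *nodes, size_t *num_nodes)
{
    node_id_t *stack = malloc(max_nodes * sizeof(*stack));
    size_t top = 0;
    size_t n = 0;
    node_id_t u, v;

    if (stack == NULL) {
        return -1;
    }
    stack[top++] = root;
    while (top > 0) {
        u = stack[--top];
        nodes[n++] = u; /* a node is emitted before any of its descendants */
        /* Push children right-to-left so that the leftmost child is popped first. */
        for (v = right_child[u]; v != NODE_NULL; v = left_sib[v]) {
            stack[top++] = v;
        }
    }
    free(stack);
    *num_nodes = n;
    return 0;
}
```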

See tsk_tree_preorder() for details of the requirements for the nodes array.

Parameters
  • self – A pointer to a tsk_tree_t object.

  • root – The root of the subtree to traverse, or -1 to visit all nodes.

  • nodes – The tsk_id_t array to store nodes in.

  • num_nodes – A pointer to a tsk_size_t value where we store the number of nodes in the traversal.

Returns

0 on success or a negative value on failure.

int tsk_tree_postorder(const tsk_tree_t *self, tsk_id_t *nodes, tsk_size_t *num_nodes)

Fill an array with the nodes of this tree in postorder.

Populate an array with the nodes in this tree in postorder. The array must be pre-allocated and be sufficiently large to hold the array of nodes visited. The recommended approach is to use the tsk_tree_get_size_bound() function, as in the following example:

static void
print_postorder(tsk_tree_t *tree)
{
    int ret;
    tsk_size_t num_nodes, j;
    tsk_id_t *nodes = malloc(tsk_tree_get_size_bound(tree) * sizeof(*nodes));

    if (nodes == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    ret = tsk_tree_postorder(tree, nodes, &num_nodes);
    check_tsk_error(ret);
    for (j = 0; j < num_nodes; j++) {
        printf("Visit postorder %lld\n", (long long) nodes[j]);
    }
    free(nodes);
}

See also

See the Tree traversals section for more examples.

Parameters
  • self – A pointer to a tsk_tree_t object.

  • nodes – The tsk_id_t array to store nodes in. See notes above for details.

  • num_nodes – A pointer to a tsk_size_t value where we store the number of nodes in the traversal.

Returns

0 on success or a negative value on failure.

int tsk_tree_postorder_from(const tsk_tree_t *self, tsk_id_t root, tsk_id_t *nodes, tsk_size_t *num_nodes)

Fill an array with the nodes of this tree starting from a particular node.

As for tsk_tree_postorder() but starting the traversal at a particular node (which will be the last node in the traversal list). The virtual root is a valid input for this function and will be treated like any other tree node. The value -1 is a special case, in which we visit all nodes reachable from the roots, and equivalent to calling tsk_tree_postorder().
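
The ordering guarantee (the starting node comes last) can be seen in a small recursive sketch. It uses recursion only for brevity, which may not suit very deep trees, and the names node_id_t, NODE_NULL and the demo arrays are hypothetical stand-ins rather than tskit definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for tsk_id_t and TSK_NULL (not tskit definitions). */
typedef int32_t node_id_t;
#define NODE_NULL (-1)

/* Demo tree: root 4 has children 3 and 2; node 3 has children 0 and 1. */
static const node_id_t demo_left_child[] = { -1, -1, -1, 0, 3 };
static const node_id_t demo_right_sib[] = { 1, -1, -1, 2, -1 };

static void
postorder_visit(const node_id_t *left_child, const node_id_t *right_sib,
    node_id_t u, node_id_t *nodes, size_t *num_nodes)
{
    node_id_t v;

    for (v = left_child[u]; v != NODE_NULL; v = right_sib[v]) {
        postorder_visit(left_child, right_sib, v, nodes, num_nodes);
    }
    nodes[(*num_nodes)++] = u; /* a node is emitted after all of its descendants */
}

/* Fill nodes with the postorder traversal of the subtree at root; returns its length. */
static size_t
postorder_from(const node_id_t *left_child, const node_id_t *right_sib,
    node_id_t root, node_id_t *nodes)
{
    size_t n = 0;

    postorder_visit(left_child, right_sib, root, nodes, &n);
    return n;
}
```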

See tsk_tree_postorder() for details of the requirements for the nodes array.

Parameters
  • self – A pointer to a tsk_tree_t object.

  • root – The root of the subtree to traverse, or -1 to visit all nodes.

  • nodes – The tsk_id_t array to store nodes in. See tsk_tree_postorder() for more details.

  • num_nodes – A pointer to a tsk_size_t value where we store the number of nodes in the traversal.

Returns

0 on success or a negative value on failure.

Low-level sorting

In some highly performance sensitive cases it can be useful to have more control over the process of sorting tables. This low-level API allows a user to provide their own edge sorting function. This can be useful, for example, to use parallel sorting algorithms, or to take advantage of the more efficient sorting procedures available in C++. It is the user’s responsibility to ensure that the edge sorting requirements are fulfilled by this function.

Todo

Create an idiomatic C++11 example where we load a table collection file from argv[1], and sort the edges using std::sort, based on the example in tests/test_minimal_cpp.cpp. We can include this in the examples below, and link to it here.

struct _tsk_table_sorter_t

Low-level table sorting method.

Public Members

tsk_table_collection_t *tables

The input tables that are being sorted.

int (*sort_edges)(struct _tsk_table_sorter_t *self, tsk_size_t start)

The edge sorting function. If set to NULL, edges are not sorted.

int (*sort_mutations)(struct _tsk_table_sorter_t *self)

The mutation sorting function.

int (*sort_individuals)(struct _tsk_table_sorter_t *self)

The individual sorting function.

void *user_data

An opaque pointer for use by client code.

tsk_id_t *site_id_map

Mapping from input site IDs to output site IDs.

int tsk_table_sorter_init(struct _tsk_table_sorter_t *self, tsk_table_collection_t *tables, tsk_flags_t options)

Initialises the memory for the sorter object.

This must be called before any operations are performed on the table sorter and initialises all fields. The sort_edges function is set to the default method using qsort. The user_data field is set to NULL. This method supports the same options as tsk_table_collection_sort().

Parameters
  • self – A pointer to an uninitialised tsk_table_sorter_t object.

  • tables – The table collection to sort.

  • options – Sorting options.

Returns

Return 0 on success or a negative value on failure.

int tsk_table_sorter_run(struct _tsk_table_sorter_t *self, const tsk_bookmark_t *start)

Runs the sort using the configured functions.

Runs the sorting process:

  1. Drop the table indexes.

  2. If the sort_edges function pointer is not NULL, run it. The first parameter to the called function will be a pointer to this tsk_table_sorter_t object. The second parameter will be the value start->edges. This specifies the offset at which sorting should start in the edge table. This offset is guaranteed to be within the bounds of the edge table.

  3. Sort the site table, building the mapping between site IDs in the current and sorted tables.

  4. Sort the mutation table, using the sort_mutations pointer.

If an error occurs during the execution of a user-supplied sorting function a non-zero value must be returned. This value will then be returned by tsk_table_sorter_run. The error return value should be chosen to avoid conflicts with tskit error codes.

See tsk_table_collection_sort() for details on the start parameter.
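
The role of the start offset and of the error return can be illustrated without tskit. In the sketch below, demo_edge_t is a hypothetical row-wise stand-in for the edge table, the comparison key (parent, then child, then left) is a simplification chosen for the example rather than the library's documented sort order, and the positive error value is picked on the assumption that tskit's own error codes are negative.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical row-wise edge record; real tskit tables are stored by column. */
typedef struct {
    double left;
    double right;
    int parent;
    int child;
} demo_edge_t;

/* Simplified ordering for illustration: by parent, then child, then left. */
static int
cmp_edge(const void *a, const void *b)
{
    const demo_edge_t *x = a;
    const demo_edge_t *y = b;

    if (x->parent != y->parent) {
        return (x->parent > y->parent) - (x->parent < y->parent);
    }
    if (x->child != y->child) {
        return (x->child > y->child) - (x->child < y->child);
    }
    return (x->left > y->left) - (x->left < y->left);
}

/* Sort only the rows from start onwards, leaving earlier rows untouched.
 * Returns 0 on success, or a positive application-defined value on bad input. */
static int
sort_edges_from(demo_edge_t *edges, size_t num_edges, size_t start)
{
    if (start > num_edges) {
        return 1;
    }
    qsort(edges + start, num_edges - start, sizeof(*edges), cmp_edge);
    return 0;
}
```

A real sort_edges callback would have the signature shown in the struct above and would reach the edge columns through self->tables.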

Parameters
  • self – A pointer to a tsk_table_sorter_t object.

  • start – The position in the tables at which sorting starts.

Returns

Return 0 on success or a negative value on failure.

int tsk_table_sorter_free(struct _tsk_table_sorter_t *self)

Free the internal memory for the specified table sorter.

Parameters
  • self – A pointer to an initialised tsk_table_sorter_t object.

Returns

Always returns 0.

Decoding genotypes

Obtaining genotypes for samples at specific sites is achieved via tsk_variant_t and its methods.

struct tsk_variant_t

A variant at a specific site.

Used to generate the genotypes for a given set of samples at a given site.

Public Members

const tsk_treeseq_t *tree_sequence

Unowned reference to the tree sequence of the variant.

tsk_site_t site

The site this variant is currently decoded at.

const char **alleles

Array of allele strings that the genotypes of the variant refer to. These are not NULL terminated, so use allele_lengths when printing them, for example: printf("%.*s", (int) var->allele_lengths[j], var->alleles[j]);

tsk_size_t *allele_lengths

Lengths of the allele strings.
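
Because the allele strings carry no terminator, every read must be bounded by the matching length. The following sketch shows the "%.*s" idiom on hypothetical demo data, using size_t in place of tsk_size_t so that it compiles without tskit.

```c
#include <stddef.h>
#include <stdio.h>

/* Demo storage: three alleles ("A", "CG", "T") packed back to back,
 * with no terminator between them. */
static const char demo_storage[] = "ACGT";
static const char *demo_alleles[] = { demo_storage, demo_storage + 1, demo_storage + 3 };
static const size_t demo_lengths[] = { 1, 2, 1 };

/* Copy allele j into buf as a terminated C string, reading exactly
 * allele_lengths[j] bytes. Returns the value reported by snprintf. */
static int
format_allele(char *buf, size_t buf_size, const char **alleles,
    const size_t *allele_lengths, size_t j)
{
    return snprintf(buf, buf_size, "%.*s", (int) allele_lengths[j], alleles[j]);
}
```

Printing demo_alleles[1] with a plain "%s" would run on into the following allele, which is the mistake the length array guards against.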

tsk_size_t num_alleles

Length of the allele array.

bool has_missing_data

If true, the genotypes of isolated nodes have been decoded to the “missing” genotype. If false, they are set to the ancestral state (in the absence of mutations above them).

int32_t *genotypes

Array of genotypes for the current site.

tsk_size_t num_samples

Number of samples.

tsk_id_t *samples

Array of sample ids used.

int tsk_variant_init(tsk_variant_t *self, const tsk_treeseq_t *tree_sequence, const tsk_id_t *samples, tsk_size_t num_samples, const char **alleles, tsk_flags_t options)

Initialises the variant by allocating the internal memory.

This must be called before any operations are performed on the variant. See the API structure for details on how objects are initialised and freed.

Parameters
  • self – A pointer to an uninitialised tsk_variant_t object.

  • tree_sequence – A pointer to the tree sequence from which this variant will decode genotypes. No copy is taken, so this tree sequence must persist for the lifetime of the variant.

  • samples – Optional. Either NULL or an array of node ids of the samples that are to have their genotypes decoded. A copy of this array will be taken by the variant. If NULL then the samples from the tree sequence will be used.

  • num_samples – The number of ids in the samples array, ignored if samples is NULL.

  • alleles – Optional. Either NULL or an array of string alleles with a terminal NULL sentinel value. If specified, the genotypes will be decoded to match the index in this allele array. If NULL then alleles will be automatically determined from the mutations encountered.

  • options – Variant options. Either 0 or TSK_ISOLATED_NOT_MISSING which if specified indicates that isolated sample nodes should not be decoded as the “missing” state but as the ancestral state (or the state of any mutation above them).

Returns

Return 0 on success or a negative value on failure.

int tsk_variant_restricted_copy(const tsk_variant_t *self, tsk_variant_t *other)

Copies the state of this variant to another variant.

Copies the site, genotypes and alleles from this variant to another. Note that the other variant should be uninitialised as this method does not free any memory that the other variant owns. After copying other is frozen and this restricts it from being further decoded at any site. self remains unchanged.

Parameters
  • self – A pointer to an initialised and decoded tsk_variant_t object.

  • other – A pointer to an uninitialised tsk_variant_t object.

Returns

Return 0 on success or a negative value on failure.

int tsk_variant_decode(tsk_variant_t *self, tsk_id_t site_id, tsk_flags_t options)

Decode the genotypes at the given site, storing them in this variant.

Decodes the genotypes for this variant’s samples, indexed to this variant’s alleles, at the specified site. This method is most efficient at decoding sites in-order, either forwards or backwards along the tree sequence. Resulting genotypes are stored in the genotypes member of this variant.

Parameters
  • self – A pointer to an initialised tsk_variant_t object.

  • site_id – A valid site id for the tree sequence of this variant.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns

Return 0 on success or a negative value on failure.

int tsk_variant_free(tsk_variant_t *self)

Free the internal memory for the specified variant.

Parameters
  • self – A pointer to an initialised tsk_variant_t object.

Returns

Always returns 0.

void tsk_variant_print_state(const tsk_variant_t *self, FILE *out)

Print out the state of this variant to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters
  • self – A pointer to a tsk_variant_t object.

  • out – The stream to write the summary to.

Miscellaneous functions

const char *tsk_strerror(int err)

Return a description of the specified error.

The memory for the returned string is handled by the library and should not be freed by client code.

Parameters
  • err – A tskit error code.

Returns

A description of the error.

bool tsk_is_unknown_time(double val)

Check if a number is TSK_UNKNOWN_TIME.

Unknown time values in tskit are represented by a particular NaN value. Since NaN values are not equal to each other by definition, a simple comparison like mutation.time == TSK_UNKNOWN_TIME will fail even if the mutation’s time is TSK_UNKNOWN_TIME. This function compares the underlying bit representation of a double value and returns true iff it is equal to the specific NaN value TSK_UNKNOWN_TIME.
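
The difference between comparing values and comparing bit patterns can be demonstrated with a short sketch. The bit pattern below is an assumption made for illustration (a quiet NaN with a payload of 1), and the demo_* helpers are hypothetical; real code should call tsk_is_unknown_time().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit pattern for illustration only: a quiet NaN with payload 1. */
#define DEMO_UNKNOWN_TIME_BITS UINT64_C(0x7FF8000000000001)

/* Build a double from a raw 64-bit pattern without violating aliasing rules. */
static double
double_from_bits(uint64_t bits)
{
    double d;

    memcpy(&d, &bits, sizeof(d));
    return d;
}

/* True only if val has exactly the marker bit pattern. */
static bool
demo_is_unknown_time(double val)
{
    uint64_t bits;

    memcpy(&bits, &val, sizeof(bits));
    return bits == DEMO_UNKNOWN_TIME_BITS;
}
```

Any NaN compares unequal to itself under ==, so only the bitwise test can tell the marker value apart from other NaNs and from ordinary times.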

Parameters
  • val – The number to check

Returns

true if the number is TSK_UNKNOWN_TIME else false

Function Specific Options

Load and init

TSK_LOAD_SKIP_TABLES (1 << 0)

Skip reading tables, and only load top-level information.

TSK_LOAD_SKIP_REFERENCE_SEQUENCE (1 << 1)

Do not load reference sequence.

TSK_TABLE_NO_METADATA (1 << 2)

Do not allocate space to store metadata in this table. Operations attempting to add non-empty metadata to the table will fail with error TSK_ERR_METADATA_DISABLED.

TSK_TC_NO_EDGE_METADATA (1 << 3)

Do not allocate space to store metadata in the edge table. Operations attempting to add non-empty metadata to the edge table will fail with error TSK_ERR_METADATA_DISABLED.

tsk_treeseq_init()

TSK_TS_INIT_BUILD_INDEXES (1 << 0)

If specified, edge indexes will be built and stored in the table collection when the tree sequence is initialised. Indexes are required for a valid tree sequence, and are not built by default for performance reasons.

tsk_treeseq_simplify(), tsk_table_collection_simplify()

TSK_SIMPLIFY_FILTER_SITES (1 << 0)

Remove sites from the output if there are no mutations that reference them.

TSK_SIMPLIFY_FILTER_POPULATIONS (1 << 1)

Remove populations from the output if there are no nodes or migrations that reference them.

TSK_SIMPLIFY_FILTER_INDIVIDUALS (1 << 2)

Remove individuals from the output if there are no nodes that reference them.

TSK_SIMPLIFY_REDUCE_TO_SITE_TOPOLOGY (1 << 3)

Reduce the topological information in the tables to the minimum necessary to represent the trees that contain sites. If there are zero sites this will result in zero output edges. When the number of sites is greater than zero, every tree in the output tree sequence will contain at least one site. For a given site, the topology of the tree containing that site will be identical (up to node ID remapping) to the topology of the corresponding tree in the input.

TSK_SIMPLIFY_KEEP_UNARY (1 << 4)

By default simplify removes unary nodes (i.e., nodes with exactly one child) along the path from samples to root. If this option is specified such unary nodes will be preserved in the output.

TSK_SIMPLIFY_KEEP_INPUT_ROOTS (1 << 5)

By default simplify removes all topology ancestral to the MRCAs of the samples. This option inserts edges from these MRCAs back to the roots of the input trees.

TSK_SIMPLIFY_KEEP_UNARY_IN_INDIVIDUALS (1 << 6)

This acts like TSK_SIMPLIFY_KEEP_UNARY (and is mutually exclusive with that flag). It keeps unary nodes, but only if the unary node is referenced from an individual.

tsk_table_collection_check_integrity()

TSK_CHECK_EDGE_ORDERING (1 << 0)

Check edge ordering constraints for a tree sequence.

TSK_CHECK_SITE_ORDERING (1 << 1)

Check that sites are in non-decreasing position order.

TSK_CHECK_SITE_DUPLICATES (1 << 2)

Check for any duplicate site positions.

TSK_CHECK_MUTATION_ORDERING (1 << 3)

Check constraints on the ordering of mutations. Any non-null mutation parents and known times are checked for ordering constraints.

TSK_CHECK_INDIVIDUAL_ORDERING (1 << 4)

Check individual parents are before children, where specified.

TSK_CHECK_MIGRATION_ORDERING (1 << 5)

Check migrations are ordered by time.

TSK_CHECK_INDEXES (1 << 6)

Check that the table indexes exist, and contain valid edge references.

TSK_CHECK_TREES (1 << 7)

All checks needed to define a valid tree sequence. Note that this implies all of the above checks.

TSK_NO_CHECK_POPULATION_REFS (1 << 12)

Do not check integrity of references to populations. This can be safely combined with the other checks.

tsk_table_collection_clear()

TSK_CLEAR_METADATA_SCHEMAS (1 << 0)

Additionally clear the table metadata schemas

TSK_CLEAR_TS_METADATA_AND_SCHEMA (1 << 1)

Additionally clear the tree-sequence metadata and schema

TSK_CLEAR_PROVENANCE (1 << 2)

Additionally clear the provenance table

tsk_table_collection_copy()

TSK_COPY_FILE_UUID (1 << 0)

Copy the file UUID; by default this is not copied.

All equality functions

TSK_CMP_IGNORE_TS_METADATA (1 << 0)

Do not include the top-level tree sequence metadata and metadata schemas in the comparison.

TSK_CMP_IGNORE_PROVENANCE (1 << 1)

Do not include the provenance table in comparison.

TSK_CMP_IGNORE_METADATA (1 << 2)

Do not include metadata when comparing the table collections. This includes both the top-level tree sequence metadata as well as the metadata for each of the tables (i.e., TSK_CMP_IGNORE_TS_METADATA is implied). All metadata schemas are also ignored.

TSK_CMP_IGNORE_TIMESTAMPS (1 << 3)

Do not include the timestamp information when comparing the provenance tables. This has no effect if TSK_CMP_IGNORE_PROVENANCE is specified.

TSK_CMP_IGNORE_TABLES (1 << 4)

Do not include any tables in the comparison, thus comparing only the top-level information of the table collections being compared.

TSK_CMP_IGNORE_REFERENCE_SEQUENCE (1 << 5)

Do not include the reference sequence in the comparison.

tsk_table_collection_subset()

TSK_SUBSET_NO_CHANGE_POPULATIONS (1 << 0)

If this flag is provided, the population table will not be changed in any way.

TSK_SUBSET_KEEP_UNREFERENCED (1 << 1)

If this flag is provided, then unreferenced sites, individuals, and populations will not be removed. If so, the site and individual tables will not be changed, and (unless TSK_SUBSET_NO_CHANGE_POPULATIONS is also provided) unreferenced populations will be placed last, in their original order.

tsk_table_collection_union()

TSK_UNION_NO_CHECK_SHARED (1 << 0)

By default, union checks that the portion of shared history between self and other, as implied by other_node_mapping, is indeed equivalent. It does so by subsetting both self and other on the equivalent nodes specified in other_node_mapping, and then checking for equality of the subsets. If this option is specified, that check is skipped.

TSK_UNION_NO_ADD_POP (1 << 1)

By default, all nodes new to self are assigned new populations. If this option is specified, nodes that are added to self will retain the population IDs they have in other.

Constants

API Version

TSK_VERSION_MAJOR 1

The library major version. Incremented when breaking changes to the API or ABI are introduced. This includes any changes to the signatures of functions and the sizes and types of externally visible structs.

TSK_VERSION_MINOR 0

The library minor version. Incremented when non-breaking backward-compatible changes to the API or ABI are introduced, e.g., the addition of a new function.

TSK_VERSION_PATCH 0

The library patch version. Incremented when any changes not relevant to the API or ABI are introduced, e.g., internal refactors or bugfixes.

Common constants

TSK_NODE_IS_SAMPLE 1u

Used in node flags to indicate that a node is a sample node.

TSK_NULL ((tsk_id_t) -1)

Null value used for cases such as absent id references.

TSK_MISSING_DATA (-1)

Value used for missing data in genotype arrays.

TSK_UNKNOWN_TIME __tsk_nan_f()

Value to indicate that a time is unknown. Note that this value is a non-signalling NaN whose representation differs from the NaN generated by computations such as division by zero.

Generic Errors

TSK_ERR_GENERIC -1

Generic error thrown when no other message can be generated.

TSK_ERR_NO_MEMORY -2

Memory could not be allocated.

TSK_ERR_IO -3

An IO error occurred.

TSK_ERR_BAD_PARAM_VALUE -4

TSK_ERR_BUFFER_OVERFLOW -5

TSK_ERR_UNSUPPORTED_OPERATION -6

TSK_ERR_GENERATE_UUID -7

TSK_ERR_EOF -8

The file stream ended after reading zero bytes.

File format errors

TSK_ERR_FILE_FORMAT -100

A file could not be read because it is in the wrong format

TSK_ERR_FILE_VERSION_TOO_OLD -101

The file is in tskit format, but the version is too old for the library to read. The file should be upgraded to the latest version using the tskit upgrade command line utility.

TSK_ERR_FILE_VERSION_TOO_NEW -102

The file is in tskit format, but the version is too new for the library to read. To read the file you must upgrade the version of tskit.

TSK_ERR_REQUIRED_COL_NOT_FOUND -103

A column that is a required member of a table was not found in the file.

TSK_ERR_BOTH_COLUMNS_REQUIRED -104

One of a pair of columns that must be specified together was not found in the file.

TSK_ERR_BAD_COLUMN_TYPE -105

An unsupported type was provided for a column in the file.

Out-of-bounds errors

TSK_ERR_BAD_OFFSET -200

A bad value was provided for a ragged column offset; values should start at zero and be monotonically increasing.

TSK_ERR_SEEK_OUT_OF_BOUNDS -201

A position to seek to was less than zero or greater than the length of the genome

TSK_ERR_NODE_OUT_OF_BOUNDS -202

A node id was less than zero or greater than the final index

TSK_ERR_EDGE_OUT_OF_BOUNDS -203

An edge id was less than zero or greater than the final index

TSK_ERR_POPULATION_OUT_OF_BOUNDS -204

A population id was less than zero or greater than the final index

TSK_ERR_SITE_OUT_OF_BOUNDS -205

A site id was less than zero or greater than the final index

TSK_ERR_MUTATION_OUT_OF_BOUNDS -206

A mutation id was less than zero or greater than the final index

TSK_ERR_INDIVIDUAL_OUT_OF_BOUNDS -207

An individual id was less than zero or greater than the final index

TSK_ERR_MIGRATION_OUT_OF_BOUNDS -208

A migration id was less than zero or greater than the final index

TSK_ERR_PROVENANCE_OUT_OF_BOUNDS -209

A provenance id was less than zero or greater than the final index

TSK_ERR_TIME_NONFINITE -210

A time value was non-finite (NaN counts as finite)

TSK_ERR_GENOME_COORDS_NONFINITE -211

A genomic position was non-finite

Edge errors

TSK_ERR_NULL_PARENT -300

A parent node of an edge was TSK_NULL.

TSK_ERR_NULL_CHILD -301

A child node of an edge was TSK_NULL.

TSK_ERR_EDGES_NOT_SORTED_PARENT_TIME -302

The edge table was not sorted by the time of each edge’s parent nodes. Sort order is (time[parent], child, left).

TSK_ERR_EDGES_NONCONTIGUOUS_PARENTS -303

A parent node had edges that were non-contiguous.

TSK_ERR_EDGES_NOT_SORTED_CHILD -304

The edge table was not sorted by the id of the child node of each edge. Sort order is (time[parent], child, left).

TSK_ERR_EDGES_NOT_SORTED_LEFT -305

The edge table was not sorted by the left coordinate of each edge. Sort order is (time[parent], child, left).

TSK_ERR_BAD_NODE_TIME_ORDERING -306

An edge had a child node that was older than its parent. Parent times must be greater than child times.

TSK_ERR_BAD_EDGE_INTERVAL -307

An edge had a genomic interval where right was less than or equal to left.

TSK_ERR_DUPLICATE_EDGES -308

An edge was duplicated.

TSK_ERR_RIGHT_GREATER_SEQ_LENGTH -309

An edge had a right coord greater than the genomic length.

TSK_ERR_LEFT_LESS_ZERO -310

An edge had a left coord less than zero.

TSK_ERR_BAD_EDGES_CONTRADICTORY_CHILDREN -311

A parent node had edges that were contradictory over an interval.

TSK_ERR_CANT_PROCESS_EDGES_WITH_METADATA -312

A method that doesn’t support edge metadata was attempted on an edge table containing metadata.

Site errors

TSK_ERR_UNSORTED_SITES -400

The site table was not in order of increasing genomic position.

TSK_ERR_DUPLICATE_SITE_POSITION -401

The site table had more than one site at a single genomic position.

TSK_ERR_BAD_SITE_POSITION -402

A site had a position that was less than zero or greater than the sequence length.

Mutation errors

TSK_ERR_MUTATION_PARENT_DIFFERENT_SITE -500

A mutation had a parent mutation that was at a different site.

TSK_ERR_MUTATION_PARENT_EQUAL -501

A mutation had a parent mutation that was itself.

TSK_ERR_MUTATION_PARENT_AFTER_CHILD -502

A mutation had a parent mutation that had a greater id.

TSK_ERR_MUTATION_PARENT_INCONSISTENT -503

Two or more mutation parent references formed a loop

TSK_ERR_UNSORTED_MUTATIONS -504

The mutation table was not in the order of non-decreasing site id and non-increasing time within each site.

TSK_ERR_MUTATION_TIME_YOUNGER_THAN_NODE -506

A mutation’s time was younger than (not >=) the time of its node and wasn’t TSK_UNKNOWN_TIME.

TSK_ERR_MUTATION_TIME_OLDER_THAN_PARENT_MUTATION -507

A mutation’s time was older (not <=) than the time of its parent mutation, and wasn’t TSK_UNKNOWN_TIME.

TSK_ERR_MUTATION_TIME_OLDER_THAN_PARENT_NODE -508

A mutation’s time was older (not <) than the time of the parent node of the edge on which it occurs, and wasn’t TSK_UNKNOWN_TIME.

TSK_ERR_MUTATION_TIME_HAS_BOTH_KNOWN_AND_UNKNOWN -509

A single site had a mixture of known mutation times and TSK_UNKNOWN_TIME

Migration errors

TSK_ERR_UNSORTED_MIGRATIONS -550

The migration table was not sorted by time.

Sample errors

TSK_ERR_DUPLICATE_SAMPLE -600

A duplicate sample was specified.

TSK_ERR_BAD_SAMPLES -601

A sample id that was not valid was specified.

Table errors

TSK_ERR_BAD_TABLE_POSITION -700

An invalid table position was specified.

TSK_ERR_BAD_SEQUENCE_LENGTH -701

A sequence length equal to or less than zero was specified.

TSK_ERR_TABLES_NOT_INDEXED -702

The table collection was not indexed.

TSK_ERR_TABLE_OVERFLOW -703

Tables cannot be larger than 2**31 rows.

TSK_ERR_COLUMN_OVERFLOW -704

Ragged array columns cannot be larger than 2**64 bytes.

TSK_ERR_TREE_OVERFLOW -705

The table collection contains more than 2**31 trees.

TSK_ERR_METADATA_DISABLED -706

An attempt was made to set metadata on a table where it is disabled.

TSK_ERR_TABLES_BAD_INDEXES -707

There was an error with the table’s indexes.

Genotype decoding errors

TSK_ERR_MUST_IMPUTE_NON_SAMPLES -1100

Genotypes were requested for non-samples at the same time as asking that isolated nodes be marked as missing. This is not supported.

TSK_ERR_ALLELE_NOT_FOUND -1101

A user-specified allele map was used, but didn’t contain an allele found in the tree sequence.

TSK_ERR_TOO_MANY_ALLELES -1102

More than 2147483647 alleles were specified.

TSK_ERR_ZERO_ALLELES -1103

A user-specified allele map was used, but it contained zero alleles.

Union errors

TSK_ERR_UNION_BAD_MAP -1400

A node map was specified that contained a node not present in the specified table collection.

TSK_ERR_UNION_DIFF_HISTORIES -1401

The shared portions of the specified tree sequences are not equal. Note that this may be the case if the table collections were not fully sorted before union was called.

Simplify errors

TSK_ERR_KEEP_UNARY_MUTUALLY_EXCLUSIVE -1600

Both TSK_SIMPLIFY_KEEP_UNARY and TSK_SIMPLIFY_KEEP_UNARY_IN_INDIVIDUALS were specified. Only one can be used.

Individual errors

TSK_ERR_UNSORTED_INDIVIDUALS -1700

Individuals were provided in an order where parents were after their children.

TSK_ERR_INDIVIDUAL_SELF_PARENT -1701

An individual was its own parent.

TSK_ERR_INDIVIDUAL_PARENT_CYCLE -1702

An individual was its own ancestor in a cycle of references.

TSK_ERR_INDIVIDUAL_POPULATION_MISMATCH -1703

An individual had nodes from more than one population (and only one was requested).

TSK_ERR_INDIVIDUAL_TIME_MISMATCH -1704

An individual had nodes from more than one time (and only one was requested).

Examples

Basic forwards simulator

This is an example of using the tables API to define a simple haploid Wright-Fisher simulator. Because this simple example repeatedly sorts the edge data, it is quite inefficient and should not be used as the basis of a large-scale simulator.

Note

This example uses the C function rand and constant RAND_MAX for random number generation. These methods are used for example purposes only and a high-quality random number library should be preferred for code used for research. Examples include, but are not limited to:

  1. The GNU Scientific Library, which is licensed under the GNU General Public License, version 3 (GPL3+).

  2. For C++ projects using C++11 or later, the built-in random number library.

  3. The numpy C API may be useful for those writing Python extension modules in C/C++.

Todo

Give a pointer to an example that caches and flushes edge data efficiently. Probably using the C++ API?

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <err.h>

#include <tskit/tables.h>

#define check_tsk_error(val)                                                            \
    if (val < 0) {                                                                      \
        errx(EXIT_FAILURE, "line %d: %s", __LINE__, tsk_strerror(val));                 \
    }

void
simulate(
    tsk_table_collection_t *tables, int N, int T, int simplify_interval)
{
    tsk_id_t *buffer, *parents, *children, child, left_parent, right_parent;
    double breakpoint;
    int ret, j, t, b;

    assert(simplify_interval != 0); // leads to division by zero
    buffer = malloc(2 * N * sizeof(tsk_id_t));
    if (buffer == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    tables->sequence_length = 1.0;
    parents = buffer;
    for (j = 0; j < N; j++) {
        parents[j]
            = tsk_node_table_add_row(&tables->nodes, 0, T, TSK_NULL, TSK_NULL, NULL, 0);
        check_tsk_error(parents[j]);
    }
    b = 0;
    for (t = T - 1; t >= 0; t--) {
        /* Alternate between using the first and last N values in the buffer */
        parents = buffer + (b * N);
        b = (b + 1) % 2;
        children = buffer + (b * N);
        for (j = 0; j < N; j++) {
            child = tsk_node_table_add_row(
                &tables->nodes, 0, t, TSK_NULL, TSK_NULL, NULL, 0);
            check_tsk_error(child);
            /* NOTE: the use of rand() is discouraged for
             * research code and proper random number generator
             * libraries should be preferred.
             */
            left_parent = parents[(size_t)((rand()/(1.+RAND_MAX))*N)];
            right_parent = parents[(size_t)((rand()/(1.+RAND_MAX))*N)];
            do {
                breakpoint = rand()/(1.+RAND_MAX);
            } while (breakpoint == 0); /* tiny proba of breakpoint being 0 */
            ret = tsk_edge_table_add_row(
                &tables->edges, 0, breakpoint, left_parent, child, NULL, 0);
            check_tsk_error(ret);
            ret = tsk_edge_table_add_row(
                &tables->edges, breakpoint, 1, right_parent, child, NULL, 0);
            check_tsk_error(ret);
            children[j] = child;
        }
        if (t % simplify_interval == 0) {
            printf("Simplify at generation %lld: (%lld nodes %lld edges)",
                (long long) t,
                (long long) tables->nodes.num_rows,
                (long long) tables->edges.num_rows);
            /* Note: Edges must be sorted for simplify to work, and we use a brute force
             * approach of sorting each time here for simplicity. This is inefficient. */
            ret = tsk_table_collection_sort(tables, NULL, 0);
            check_tsk_error(ret);
            ret = tsk_table_collection_simplify(tables, children, N, 0, NULL);
            check_tsk_error(ret);
            printf(" -> (%lld nodes %lld edges)\n",
                (long long) tables->nodes.num_rows,
                (long long) tables->edges.num_rows);
            for (j = 0; j < N; j++) {
                children[j] = j;
            }
        }
    }
    free(buffer);
}

int
main(int argc, char **argv)
{
    int ret;
    tsk_table_collection_t tables;

    if (argc != 6) {
        errx(EXIT_FAILURE, "usage: N T simplify-interval output-file seed");
    }
    ret = tsk_table_collection_init(&tables, 0);
    check_tsk_error(ret);
    srand((unsigned)atoi(argv[5]));
    simulate(&tables, atoi(argv[1]), atoi(argv[2]), atoi(argv[3]));
    ret = tsk_table_collection_dump(&tables, argv[4], 0);
    check_tsk_error(ret);

    tsk_table_collection_free(&tables);
    return 0;
}

Tree iteration

#include <stdio.h>
#include <stdlib.h>
#include <err.h>

#include <tskit.h>

#define check_tsk_error(val)                                                            \
    if (val < 0) {                                                                      \
        errx(EXIT_FAILURE, "line %d: %s", __LINE__, tsk_strerror(val));                 \
    }

int
main(int argc, char **argv)
{
    int ret;
    tsk_treeseq_t ts;
    tsk_tree_t tree;

    if (argc != 2) {
        errx(EXIT_FAILURE, "usage: <tree sequence file>");
    }
    ret = tsk_treeseq_load(&ts, argv[1], 0);
    check_tsk_error(ret);
    ret = tsk_tree_init(&tree, &ts, 0);
    check_tsk_error(ret);

    printf("Iterate forwards\n");
    for (ret = tsk_tree_first(&tree); ret == TSK_TREE_OK; ret = tsk_tree_next(&tree)) {
        printf("\ttree %lld has %lld roots\n",
            (long long) tree.index,
            (long long) tsk_tree_get_num_roots(&tree));
    }
    check_tsk_error(ret);

    printf("Iterate backwards\n");
    for (ret = tsk_tree_last(&tree); ret == TSK_TREE_OK; ret = tsk_tree_prev(&tree)) {
        printf("\ttree %lld has %lld roots\n",
            (long long) tree.index,
            (long long) tsk_tree_get_num_roots(&tree));
    }
    check_tsk_error(ret);

    tsk_tree_free(&tree);
    tsk_treeseq_free(&ts);
    return 0;
}

Tree traversals

In this example we load a tree sequence file, and then traverse the first tree in four different ways:

  1. We first traverse the tree in preorder and postorder using the tsk_tree_preorder() and tsk_tree_postorder() functions to fill an array of nodes in the appropriate orders. This is the recommended approach and will be convenient and efficient for most purposes.

  2. As an example of how we might build our own traversal algorithms, we then traverse the tree in preorder using recursion. This is a very common way of navigating around trees and can be convenient for some applications. For example, here we compute the depth of each node (i.e., its distance from the root) and use this when printing out the nodes as we visit them.

  3. Then we traverse the tree in preorder using an iterative approach. This is a little more efficient than using recursion, and is sometimes more convenient than structuring the calculation recursively.

  4. In the fourth example we iterate upwards from the samples rather than downwards from the root.

#include <stdio.h>
#include <stdlib.h>
#include <err.h>

#include <tskit.h>

#define check_tsk_error(val)                                                            \
    if (val < 0) {                                                                      \
        errx(EXIT_FAILURE, "line %d: %s", __LINE__, tsk_strerror(val));                 \
    }

static void
traverse_standard(const tsk_tree_t *tree)
{
    int ret;
    tsk_size_t num_nodes, j;
    tsk_id_t *nodes = malloc(tsk_tree_get_size_bound(tree) * sizeof(*nodes));

    if (nodes == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    ret = tsk_tree_preorder(tree, nodes, &num_nodes);
    check_tsk_error(ret);
    for (j = 0; j < num_nodes; j++) {
        printf("Visit preorder %lld\n", (long long) nodes[j]);
    }

    ret = tsk_tree_postorder(tree, nodes, &num_nodes);
    check_tsk_error(ret);
    for (j = 0; j < num_nodes; j++) {
        printf("Visit postorder %lld\n", (long long) nodes[j]);
    }

    free(nodes);
}

static void
_traverse(const tsk_tree_t *tree, tsk_id_t u, int depth)
{
    tsk_id_t v;
    int j;

    for (j = 0; j < depth; j++) {
        printf("    ");
    }
    printf("Visit recursive %lld\n", (long long) u);
    for (v = tree->left_child[u]; v != TSK_NULL; v = tree->right_sib[v]) {
        _traverse(tree, v, depth + 1);
    }
}

static void
traverse_recursive(const tsk_tree_t *tree)
{
    _traverse(tree, tree->virtual_root, -1);
}

static void
traverse_stack(const tsk_tree_t *tree)
{
    int stack_top;
    tsk_id_t u, v;
    tsk_id_t *stack = malloc(tsk_tree_get_size_bound(tree) * sizeof(*stack));

    if (stack == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    stack_top = 0;
    stack[stack_top] = tree->virtual_root;
    while (stack_top >= 0) {
        u = stack[stack_top];
        stack_top--;
        printf("Visit stack %lld\n", (long long) u);
        /* Put nodes on the stack right-to-left, so we visit in left-to-right */
        for (v = tree->right_child[u]; v != TSK_NULL; v = tree->left_sib[v]) {
            stack_top++;
            stack[stack_top] = v;
        }
    }
    free(stack);
}

static void
traverse_upwards(const tsk_tree_t *tree)
{
    const tsk_id_t *samples = tsk_treeseq_get_samples(tree->tree_sequence);
    tsk_size_t num_samples = tsk_treeseq_get_num_samples(tree->tree_sequence);
    tsk_size_t j;
    tsk_id_t u;

    for (j = 0; j < num_samples; j++) {
        u = samples[j];
        while (u != TSK_NULL) {
            printf("Visit upwards: %lld\n", (long long) u);
            u = tree->parent[u];
        }
    }
}

int
main(int argc, char **argv)
{
    int ret;
    tsk_treeseq_t ts;
    tsk_tree_t tree;

    if (argc != 2) {
        errx(EXIT_FAILURE, "usage: <tree sequence file>");
    }
    ret = tsk_treeseq_load(&ts, argv[1], 0);
    check_tsk_error(ret);
    ret = tsk_tree_init(&tree, &ts, 0);
    check_tsk_error(ret);
    ret = tsk_tree_first(&tree);
    check_tsk_error(ret);

    traverse_standard(&tree);

    traverse_recursive(&tree);

    traverse_stack(&tree);

    traverse_upwards(&tree);

    tsk_tree_free(&tree);
    tsk_treeseq_free(&ts);
    return 0;
}

File streaming

It is often useful to read tree sequence files from a stream rather than from a fixed filename. This example shows how to do this using the tsk_table_collection_loadf() and tsk_table_collection_dumpf() functions. Here, we sequentially load table collections from the stdin stream and write them back out to stdout with their mutations removed.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <tskit/tables.h>

#define check_tsk_error(val)                                                            \
    if (val < 0) {                                                                      \
        fprintf(stderr, "Error: line %d: %s\n", __LINE__, tsk_strerror(val));           \
        exit(EXIT_FAILURE);                                                             \
    }

int
main(int argc, char **argv)
{
    int ret;
    int j = 0;
    tsk_table_collection_t tables;

    ret = tsk_table_collection_init(&tables, 0);
    check_tsk_error(ret);

    while (true) {
        ret = tsk_table_collection_loadf(&tables, stdin, TSK_NO_INIT);
        if (ret == TSK_ERR_EOF) {
            break;
        }
        check_tsk_error(ret);
        fprintf(stderr, "Tree sequence %d had %lld mutations\n", j,
            (long long) tables.mutations.num_rows);
        ret = tsk_mutation_table_truncate(&tables.mutations, 0);
        check_tsk_error(ret);
        ret = tsk_table_collection_dumpf(&tables, stdout, 0);
        check_tsk_error(ret);
        j++;
    }
    tsk_table_collection_free(&tables);
    return EXIT_SUCCESS;
}

Note that we use the value TSK_ERR_EOF to detect when the stream ends, as we don’t know how many tree sequences to expect on the input. In this case, TSK_ERR_EOF is not considered an error and we exit normally.

Running this program on some tree sequence files we might get:

$ cat tmp1.trees tmp2.trees | ./build/streaming > no_mutations.trees
Tree sequence 0 had 38 mutations
Tree sequence 1 had 132 mutations

Then, running this program again on the output of the previous command, we see that we now have two tree sequences with their mutations removed stored in the file no_mutations.trees:

$ ./build/streaming < no_mutations.trees > /dev/null
Tree sequence 0 had 0 mutations
Tree sequence 1 had 0 mutations