C API#

This is the documentation for the tskit C API, a low-level library for manipulating and processing tree sequence data. The library is written using the C99 standard and is fully thread safe. Tskit uses kastore to define a simple storage format for the tree sequence data.

To see the API in action, please see the Examples section.

Overview#

Do I need the C API?#

The tskit C API is generally useful in the following situations:

  • You want to use the tskit API in a larger C/C++ application (e.g., in order to output data in the .trees format);

  • You need to perform lots of tree traversals/loops etc. to analyse some data that is in tree sequence form.

For high-level operations that are not performance sensitive, the Python API is generally more useful. Python is much more convenient than C, and since the tskit Python module is essentially a wrapper for this C library, there’s often no real performance penalty for using it.

Differences with the Python API#

Much of the explanatory material about the Python API (for example, the tutorials) also applies to the equivalent C methods, because the Python API wraps this API.

The main difference is that, unlike the Python API, the C API does not decode, encode or validate the schema of Metadata fields; it only handles the byte string representation of the metadata. Metadata is therefore never used directly by any tskit C API method, just stored.

API stability contract#

Since the C API 1.0 release we pledge to make no breaking changes to the documented API in subsequent releases in the 1.0 series. What this means is that any code that compiles under the 1.0 release should also compile without changes in subsequent 1.x releases. We will not change the semantics of documented functions, unless it is to fix clearly buggy behaviour. We will not change the values of macro constants.

Undocumented functions do not have this guarantee, and may be changed arbitrarily between releases.

Note

We do not currently make any guarantees about ABI stability, since the primary use-case is for tskit to be embedded within another application rather than used as a shared library. If you do intend to use tskit as a shared library and ABI stability is therefore important to you, please let us know and we can plan accordingly.

API structure#

Tskit uses a set of conventions to provide a pseudo object-oriented API. Each ‘object’ is represented by a C struct and has a set of ‘methods’. This is most easily explained by an example:

#include <stdio.h>
#include <stdlib.h>
#include <tskit/tables.h>

#define check_tsk_error(val)                                                            \
    if (val < 0) {                                                                      \
        fprintf(stderr, "line %d: %s", __LINE__, tsk_strerror(val));                    \
        exit(EXIT_FAILURE);                                                             \
    }

int
main(int argc, char **argv)
{
    int j, ret;
    tsk_edge_table_t edges;

    ret = tsk_edge_table_init(&edges, 0);
    check_tsk_error(ret);
    for (j = 0; j < 5; j++) {
        ret = tsk_edge_table_add_row(&edges, 0, 1, j + 1, j, NULL, 0);
        check_tsk_error(ret);
    }
    tsk_edge_table_print_state(&edges, stdout);
    tsk_edge_table_free(&edges);

    return EXIT_SUCCESS;
}

In this program we create a tsk_edge_table_t instance, add five rows using tsk_edge_table_add_row(), print out its contents using the tsk_edge_table_print_state() debugging method, and finally free the memory used by the edge table object. We define this edge table ‘class’ by using some simple naming conventions which are adhered to throughout tskit. This is simply a naming convention that helps to keep code written in plain C logically structured; there are no extra C++ style features. We use object oriented terminology freely throughout this documentation with this understanding.

In this convention, a class is defined by a struct tsk_class_name_t (e.g. tsk_edge_table_t) and its methods all have the form tsk_class_name_method_name, whose first argument is always a pointer to an instance of the class (e.g., tsk_edge_table_add_row above). Each class has an initialise and free method, called tsk_class_name_init and tsk_class_name_free, respectively. The init method must be called to ensure that the object is correctly initialised (except for functions such as tsk_table_collection_load() and tsk_table_collection_copy(), which automatically initialise the object by default for convenience). The free method must always be called to avoid leaking memory, even in the case of an error occurring during initialisation. If tsk_class_name_init has been called successfully, we say the object has been “initialised”; if not, it is “uninitialised”. After tsk_class_name_free has been called, the object is again uninitialised.

It is important to note that the init methods only allocate internal memory; the memory for the instance itself must be allocated either on the heap or the stack:

// Instance allocated on the stack
tsk_node_table_t nodes;
tsk_node_table_init(&nodes, 0);
tsk_node_table_free(&nodes);

// Instance allocated on the heap
tsk_edge_table_t *edges = malloc(sizeof(tsk_edge_table_t));
tsk_edge_table_init(edges, 0);
tsk_edge_table_free(edges);
free(edges);
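The init/free contract described above can also be illustrated independently of tskit. The following is a minimal sketch using a hypothetical sketch_buffer_t class (the type and its methods are invented for illustration and are not part of tskit). It assumes that init zeroes the struct before allocating, which is one common way to make the free method safe to call after a failed initialisation; tskit's own internals may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical class used only for illustration; not part of tskit. */
typedef struct {
    double *values;
    size_t num_values;
    size_t max_values;
} sketch_buffer_t;

/* Allocate internal memory. Zeroing the struct first means that
 * sketch_buffer_free is safe even if the allocation below fails. */
int
sketch_buffer_init(sketch_buffer_t *self, size_t max_values)
{
    assert(self != NULL);
    memset(self, 0, sizeof(*self));
    if (max_values == 0) {
        max_values = 1; /* avoid a zero-sized allocation */
    }
    self->values = malloc(max_values * sizeof(*self->values));
    if (self->values == NULL) {
        return -1; /* a negative value signals an error */
    }
    self->max_values = max_values;
    return 0;
}

/* A "method": the first argument is always the instance pointer. */
int
sketch_buffer_append(sketch_buffer_t *self, double value)
{
    if (self->num_values == self->max_values) {
        return -2; /* buffer is full */
    }
    self->values[self->num_values] = value;
    self->num_values++;
    return 0;
}

/* Free internal memory only; the instance itself belongs to the caller. */
int
sketch_buffer_free(sketch_buffer_t *self)
{
    free(self->values); /* free(NULL) is a no-op */
    memset(self, 0, sizeof(*self));
    return 0;
}
```

As with the tskit classes, the caller decides whether the sketch_buffer_t instance lives on the stack or the heap; the init and free methods only manage the memory the instance points to.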

Error handling#

C does not have a mechanism for propagating exceptions, and great care must be taken to ensure that errors are correctly and safely handled. The convention adopted in tskit is that every function (except for trivial accessor methods) returns an integer. If this return value is negative, an error has occurred which must be handled. A description of the error that occurred can be obtained using the tsk_strerror() function. The following example illustrates the key conventions around error handling in tskit:

#include <stdio.h>
#include <stdlib.h>
#include <tskit.h>

int
main(int argc, char **argv)
{
    int ret;
    tsk_treeseq_t ts;

    if (argc != 2) {
        fprintf(stderr, "usage: <tree sequence file>\n");
        exit(EXIT_FAILURE);
    }
    ret = tsk_treeseq_load(&ts, argv[1], 0);
    if (ret < 0) {
        /* Error condition. Free and exit */
        tsk_treeseq_free(&ts);
        fprintf(stderr, "%s\n", tsk_strerror(ret));
        exit(EXIT_FAILURE);
    }
    printf("Loaded tree sequence with %lld nodes and %lld edges from %s\n",
        (long long) tsk_treeseq_get_num_nodes(&ts),
        (long long) tsk_treeseq_get_num_edges(&ts),
        argv[1]);
    tsk_treeseq_free(&ts);

    return EXIT_SUCCESS;
}

In this example we load a tree sequence from file and print out a summary of the number of nodes and edges it contains. After calling tsk_treeseq_load() we check the return value ret to see if an error occurred. If an error has occurred we exit with an error message produced by tsk_strerror(). Note that in this example we call tsk_treeseq_free() whether or not an error occurs: in general, once a function that initialises an object (e.g., X_init, X_copy or X_load) is called, then X_free must be called to ensure that memory is not leaked.

Most functions in tskit return an error status; we recommend that every return value is checked.
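When several calls can fail in one function, a common way to honour this convention in plain C is to funnel every error through a single cleanup label. The following is a library-independent sketch of that pattern; the SKETCH_ERR_* codes and the sketch_* functions are hypothetical and are not tskit identifiers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes for illustration; tskit defines its own. */
#define SKETCH_ERR_NO_MEMORY (-1)
#define SKETCH_ERR_BAD_PARAM (-2)

/* Map an error code to a human-readable description. */
const char *
sketch_strerror(int err)
{
    switch (err) {
        case 0:
            return "Normal exit condition";
        case SKETCH_ERR_NO_MEMORY:
            return "Out of memory";
        case SKETCH_ERR_BAD_PARAM:
            return "Bad parameter value";
        default:
            return "Unknown error";
    }
}

/* Copy n doubles into a newly allocated array; returns 0 or a negative code. */
int
sketch_copy_array(const double *src, size_t n, double **dest)
{
    *dest = NULL;
    if (src == NULL || n == 0) {
        return SKETCH_ERR_BAD_PARAM;
    }
    *dest = malloc(n * sizeof(**dest));
    if (*dest == NULL) {
        return SKETCH_ERR_NO_MEMORY;
    }
    memcpy(*dest, src, n * sizeof(**dest));
    return 0;
}

/* The result is passed back through a pointer so that the return value
 * is free to carry the error status. */
int
sketch_sum(const double *src, size_t n, double *result)
{
    int ret;
    size_t j;
    double *copy = NULL;

    assert(result != NULL);
    ret = sketch_copy_array(src, n, &copy);
    if (ret < 0) {
        goto out; /* propagate the error to the caller */
    }
    *result = 0;
    for (j = 0; j < n; j++) {
        *result += copy[j];
    }
out:
    free(copy); /* runs on both the success and the error path */
    return ret;
}
```

Because cleanup happens in one place, adding another fallible call only requires one more `if (ret < 0) goto out;` check.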

Memory allocation strategy#

To reduce the frequency of memory allocations tskit pre-allocates space for additional table rows in each table, along with space for the contents of ragged columns. The default behaviour is to start with space for 1,024 rows in each table and 65,536 bytes in each ragged column. The table then grows as needed by doubling, until a maximum pre-allocation of 2,097,152 rows for a table or 104,857,600 bytes for a ragged column. This behaviour can be disabled and a fixed increment used, on a per-table and per-ragged-column basis using the tsk_X_table_set_max_rows_increment and tsk_provenance_table_set_max_X_length_increment methods where X is the name of the table or column.
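The growth policy for table rows can be sketched as a small function. This is only one reading of the description above, not tskit's actual implementation: the function name and constants are hypothetical, and the sketch assumes that "doubling until a maximum pre-allocation" means the number of rows added in a single step is capped at 2,097,152.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the text above; the names are hypothetical. */
#define SKETCH_DEFAULT_ROWS 1024u
#define SKETCH_MAX_ROW_GROWTH 2097152u

/* Return the new row capacity for a table that currently has space for
 * `current` rows and must hold at least `required` rows. */
uint64_t
sketch_next_capacity(uint64_t current, uint64_t required)
{
    uint64_t next;

    assert(current <= UINT64_MAX / 2); /* guard against overflow */
    next = current * 2;
    if (next < SKETCH_DEFAULT_ROWS) {
        next = SKETCH_DEFAULT_ROWS; /* initial allocation */
    }
    if (next - current > SKETCH_MAX_ROW_GROWTH) {
        next = current + SKETCH_MAX_ROW_GROWTH; /* cap the growth step */
    }
    if (next < required) {
        next = required; /* always make room for the request */
    }
    return next;
}
```

Under these assumptions an empty table starts with room for 1,024 rows, a table with 1,024 rows grows to 2,048, and a table with 4,194,304 rows grows by the capped step to 6,291,456.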

Using tskit in your project#

Tskit is built as a standard C library and so there are many different ways in which it can be included in downstream projects. It is possible to install tskit onto a system (i.e., installing a shared library and header files to standard locations on Unix) and link against it, but there are many different ways in which this can go wrong. In the interest of simplicity and improving the end-user experience we recommend embedding tskit directly into your applications.

There are many different build systems and approaches to compiling code, and so it’s not possible to give definitive documentation on how tskit should be included in downstream projects. Please see the build examples repo for some examples of how to incorporate tskit into different project structures and build systems.

Tskit uses the meson build system internally, and supports being used as a meson subproject. We show an example in which this is combined with the tskit distribution tarball to neatly abstract many details of cross-platform C development.

Some users may choose to check the source for tskit directly into their source control repositories. If you wish to do this, the code is in the c subdirectory of the tskit repo. The following header files should be placed in the search path: subprojects/kastore/kastore.h, tskit.h, and tskit/*.h. The C files subprojects/kastore/kastore.c and tskit/*.c should be compiled. For those who wish to minimise the size of their compiled binaries, tskit is quite modular, and C files can be omitted if not needed. For example, if you are just using the Tables API then only the files tskit/core.[c,h] and tskit/tables.[c,h] are needed.

However you include tskit in your project, please ensure that it is a released version. Released versions are tagged on GitHub using the convention C_{VERSION}. The code can be downloaded from the GitHub releases page, where each release has a distribution tarball (for example, tskit-dev/tskit). Alternatively, the code can be checked out using git. For example, to check out the C_1.0.0 release:

$ git clone https://github.com/tskit-dev/tskit.git
$ cd tskit
$ git checkout C_1.0.0

Basic Types#

typedef int32_t tsk_id_t#

Tskit Object IDs.

All objects in tskit are referred to by integer IDs corresponding to the row they occupy in the relevant table. The tsk_id_t type should be used when manipulating these ID values. The reserved value TSK_NULL (-1) defines missing data.
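As an illustration of how integer IDs and the null value are typically used together, the sketch below follows parent links until it reaches the null ID. The sketch_id_t type and SKETCH_NULL macro are hypothetical stand-ins that mirror the definitions of tsk_id_t and TSK_NULL given above, and the parent array is invented for the example rather than being a tskit data structure.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t sketch_id_t;           /* same underlying type as tsk_id_t */
#define SKETCH_NULL ((sketch_id_t) -1) /* same value as TSK_NULL */

/* Count the edges between node u and its root, where parent[v] holds the
 * ID of the parent of v, or SKETCH_NULL if v is a root. */
int
sketch_depth(const sketch_id_t *parent, sketch_id_t u)
{
    int depth = 0;

    assert(u != SKETCH_NULL);
    while (parent[u] != SKETCH_NULL) {
        u = parent[u];
        depth++;
    }
    return depth;
}
```

Because the ID type is signed, the null value can be stored in the same array as valid row indexes and tested with a simple comparison.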

typedef uint64_t tsk_size_t#

Tskit sizes.

The tsk_size_t type is an unsigned integer used for any size or count value.

typedef uint32_t tsk_flags_t#

Container for bitwise flags.

Bitwise flags are used in tskit as a column type and also as a way to specify options to API functions.

typedef uint8_t tsk_bool_t#

Boolean type.

Fixed-size (1 byte) boolean values.

Common options#

TSK_DEBUG (1u << 31)#

Turn on debugging output. Not supported by all functions.

TSK_NO_INIT (1u << 30)#

Do not initialise the parameter object.

TSK_NO_CHECK_INTEGRITY (1u << 29)#

Do not run integrity checks before performing an operation. This performance optimisation should not be used unless the calling code can guarantee reference integrity within the table collection. References to rows not in the table or bad offsets will result in undefined behaviour.

TSK_TAKE_OWNERSHIP (1u << 28)#

Instead of taking a copy of input objects, the function should take ownership of them and manage their lifecycle. The caller specifying this flag should no longer modify or free the object or objects passed. See individual functions using this flag for what object it applies to.
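Options of this kind are combined with bitwise OR and tested with bitwise AND. The following sketch reuses the bit positions listed above under hypothetical SKETCH_ names (so that it compiles without the tskit headers); sketch_has_option is an illustrative helper, not a tskit function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t sketch_flags_t; /* same underlying type as tsk_flags_t */

/* Bit positions copied from the definitions above; names are hypothetical. */
#define SKETCH_DEBUG (1u << 31)
#define SKETCH_NO_INIT (1u << 30)
#define SKETCH_NO_CHECK_INTEGRITY (1u << 29)
#define SKETCH_TAKE_OWNERSHIP (1u << 28)

/* Return true if the given flag is set in options. */
bool
sketch_has_option(sketch_flags_t options, sketch_flags_t flag)
{
    assert(flag != 0);
    return (options & flag) != 0;
}
```

For example, `SKETCH_NO_INIT | SKETCH_NO_CHECK_INTEGRITY` requests both behaviours at once, while passing 0 requests the defaults.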

Tables API#

The tables API section of tskit is defined in the tskit/tables.h header.

Table collections#

struct tsk_table_collection_t#

A collection of tables defining the data for a tree sequence.

Public Members

double sequence_length#

The sequence length defining the tree sequence’s coordinate space.

char *time_units#

The units of the time dimension.

char *metadata#

The tree-sequence metadata.

char *metadata_schema#

The metadata schema.

tsk_individual_table_t individuals#

The individual table.

tsk_node_table_t nodes#

The node table.

tsk_edge_table_t edges#

The edge table.

tsk_migration_table_t migrations#

The migration table.

tsk_site_table_t sites#

The site table.

tsk_mutation_table_t mutations#

The mutation table.

tsk_population_table_t populations#

The population table.

tsk_provenance_table_t provenances#

The provenance table.

struct tsk_bookmark_t#

A bookmark recording the position of all the tables in a table collection.

Public Members

tsk_size_t individuals#

The position in the individual table.

tsk_size_t nodes#

The position in the node table.

tsk_size_t edges#

The position in the edge table.

tsk_size_t migrations#

The position in the migration table.

tsk_size_t sites#

The position in the site table.

tsk_size_t mutations#

The position in the mutation table.

tsk_size_t populations#

The position in the population table.

tsk_size_t provenances#

The position in the provenance table.

int tsk_table_collection_init(tsk_table_collection_t *self, tsk_flags_t options)#

Initialises the table collection by allocating the internal memory and initialising all the constituent tables.

This must be called before any operations are performed on the table collection. See the API structure for details on how objects are initialised and freed.

Options

Options can be specified by providing bitwise flags:

Parameters:
  • self – A pointer to an uninitialised tsk_table_collection_t object.

  • options – Allocation time options as above.

Returns:

Return 0 on success or a negative value on failure.

int tsk_table_collection_free(tsk_table_collection_t *self)#

Free the internal memory for the specified table collection.

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

Returns:

Always returns 0.

int tsk_table_collection_clear(tsk_table_collection_t *self, tsk_flags_t options)#

Clears data tables (and optionally provenances and metadata) in this table collection.

By default this operation clears all tables except the provenance table, retaining table metadata schemas and the tree-sequence level metadata and schema.

No memory is freed as a result of this operation; please use tsk_table_collection_free() to free internal resources.

Options

Options can be specified by providing one or more of the following bitwise flags:

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • options – Bitwise option flags.

Returns:

Return 0 on success or a negative value on failure.

bool tsk_table_collection_equals(const tsk_table_collection_t *self, const tsk_table_collection_t *other, tsk_flags_t options)#

Returns true if the data in the specified table collection is equal to the data in this table collection.

Indexes are not considered, as these are derived from the tables. The file_uuid is also not considered, since it is a property of the file that the set of tables is stored in.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) two table collections are considered equal if all of the tables are byte-wise identical, and the sequence lengths, metadata and metadata schemas of the two table collections are identical.

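Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • other – A pointer to a tsk_table_collection_t object.

  • options – Bitwise comparison options.

Returns:

Return true if the specified table collection is equal to this table collection.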

int tsk_table_collection_copy(const tsk_table_collection_t *self, tsk_table_collection_t *dest, tsk_flags_t options)#

Copies the state of this table collection into the specified destination.

By default the method initialises the specified destination table collection. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Options

Options can be specified by providing bitwise flags:

TSK_COPY_FILE_UUID

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • dest – A pointer to a tsk_table_collection_t object. If the TSK_NO_INIT option is specified, this must be an initialised table collection. If not, it must be an uninitialised table collection.

  • options – Bitwise option flags.

Returns:

Return 0 on success or a negative value on failure.

void tsk_table_collection_print_state(const tsk_table_collection_t *self, FILE *out)#

Print out the state of this table collection to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • out – The stream to write the state to.

int tsk_table_collection_load(tsk_table_collection_t *self, const char *filename, tsk_flags_t options)#

Load a table collection from a file path.

Loads the data from the specified file into this table collection. By default, the table collection is also initialised. The resources allocated must be freed using tsk_table_collection_free() even in error conditions.

If the TSK_NO_INIT option is set, the table collection is not initialised, allowing an already initialised table collection to be overwritten with the data from a file.

If the file contains multiple table collections, this function will load the first. Please see tsk_table_collection_loadf() for details on how to sequentially load table collections from a stream.

If the TSK_LOAD_SKIP_TABLES option is set, only the non-table information from the table collection will be read, leaving all tables with zero rows and no metadata or schema. If the TSK_LOAD_SKIP_REFERENCE_SEQUENCE option is set, the table collection is read without loading the reference sequence.

Options

Options can be specified by providing one or more of the following bitwise flags:

Examples

int ret;
tsk_table_collection_t tables;
ret = tsk_table_collection_load(&tables, "data.trees", 0);
if (ret != 0) {
    fprintf(stderr, "Load error:%s\n", tsk_strerror(ret));
    exit(EXIT_FAILURE);
}

Parameters:
  • self – A pointer to an uninitialised tsk_table_collection_t object if the TSK_NO_INIT option is not set (default), or an initialised tsk_table_collection_t otherwise.

  • filename – A NULL terminated string containing the filename.

  • options – Bitwise options. See above for details.

Returns:

Return 0 on success or a negative value on failure.

int tsk_table_collection_loadf(tsk_table_collection_t *self, FILE *file, tsk_flags_t options)#

Load a table collection from a stream.

Loads a tables definition from the specified file stream to this table collection. By default, the table collection is also initialised. The resources allocated must be freed using tsk_table_collection_free() even in error conditions.

If the TSK_NO_INIT option is set, the table collection is not initialised, allowing an already initialised table collection to be overwritten with the data from a file.

The stream can be an arbitrary file descriptor, for example a network socket. No seek operations are performed.

If the stream contains multiple table collection definitions, this function will load the next table collection from the stream. If the stream contains no more table collection definitions the error value TSK_ERR_EOF will be returned. Note that EOF is only returned in the case where zero bytes are read from the stream — malformed files or other errors will result in different error conditions. Please see the File streaming section for an example of how to sequentially load tree sequences from a stream.

Please note that this streaming behaviour is not supported if the TSK_LOAD_SKIP_TABLES or TSK_LOAD_SKIP_REFERENCE_SEQUENCE option is set. If the TSK_LOAD_SKIP_TABLES option is set, only the non-table information from the table collection will be read, leaving all tables with zero rows and no metadata or schema. If the TSK_LOAD_SKIP_REFERENCE_SEQUENCE option is set, the table collection is read without loading the reference sequence. When attempting to read from a stream with multiple table collection definitions and either of these two options set, the requested information from the first table collection will be read on the first call to tsk_table_collection_loadf(), with subsequent calls leading to errors.

Options

Options can be specified by providing one or more of the following bitwise flags:

Parameters:
  • self – A pointer to an uninitialised tsk_table_collection_t object if the TSK_NO_INIT option is not set (default), or an initialised tsk_table_collection_t otherwise.

  • file – A FILE stream opened in an appropriate mode for reading (e.g. “r”, “r+” or “w+”) positioned at the beginning of a table collection definition.

  • options – Bitwise options. See above for details.

Returns:

Return 0 on success or a negative value on failure.

int tsk_table_collection_dump(const tsk_table_collection_t *self, const char *filename, tsk_flags_t options)#

Write a table collection to file.

Writes the data from this table collection to the specified file.

If an error occurs, the file at the specified path is deleted, ensuring that only complete and well-formed files are written.

Examples

int ret;
tsk_table_collection_t tables;

ret = tsk_table_collection_init(&tables, 0);
error_check(ret);
tables.sequence_length = 1.0;
// Write out the empty tree sequence
ret = tsk_table_collection_dump(&tables, "empty.trees", 0);
error_check(ret);

Parameters:
  • self – A pointer to an initialised tsk_table_collection_t object.

  • filename – A NULL terminated string containing the filename.

  • options – Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_table_collection_dumpf(const tsk_table_collection_t *self, FILE *file, tsk_flags_t options)#

Write a table collection to a stream.

Writes the data from this table collection to the specified FILE stream. Semantics are identical to tsk_table_collection_dump().

Please see the File streaming section for an example of how to sequentially dump and load tree sequences from a stream.

Parameters:
  • self – A pointer to an initialised tsk_table_collection_t object.

  • file – A FILE stream opened in an appropriate mode for writing (e.g. “w”, “a”, “r+” or “w+”).

  • options – Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_table_collection_record_num_rows(const tsk_table_collection_t *self, tsk_bookmark_t *bookmark)#

Record the number of rows in each table in the specified tsk_bookmark_t object.

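Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • bookmark – A pointer to a tsk_bookmark_t object in which to record the number of rows in each table.

Returns: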

Return 0 on success or a negative value on failure.

int tsk_table_collection_truncate(tsk_table_collection_t *self, tsk_bookmark_t *bookmark)#

Truncates the tables in this table collection according to the specified bookmark.

Truncate the tables in this collection so that each one has the number of rows specified in the parameter tsk_bookmark_t. Use the tsk_table_collection_record_num_rows() function to record the number of rows for each table in a table collection at a particular time.

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • bookmark – The number of rows to retain in each table.

Returns:

Return 0 on success or a negative value on failure.
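Recording a bookmark and later truncating to it gives a simple way to roll back rows that were added speculatively. The sketch below models that pattern with a hypothetical two-table collection that tracks only row counts; none of the sketch_ names are tskit identifiers, and the choice to return an error when a bookmark position exceeds the current number of rows is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical collection; only the row counts are modelled. */
typedef struct {
    size_t num_nodes;
    size_t num_edges;
} sketch_tables_t;

/* Mirrors the idea of tsk_bookmark_t: one position per table. */
typedef struct {
    size_t nodes;
    size_t edges;
} sketch_bookmark_t;

/* Record the current number of rows in each table. */
int
sketch_tables_record_num_rows(const sketch_tables_t *self, sketch_bookmark_t *bookmark)
{
    assert(self != NULL && bookmark != NULL);
    bookmark->nodes = self->num_nodes;
    bookmark->edges = self->num_edges;
    return 0;
}

/* Discard all rows added after the bookmark was recorded. */
int
sketch_tables_truncate(sketch_tables_t *self, const sketch_bookmark_t *bookmark)
{
    if (bookmark->nodes > self->num_nodes || bookmark->edges > self->num_edges) {
        return -1; /* assumed: a bookmark cannot point past the end */
    }
    self->num_nodes = bookmark->nodes;
    self->num_edges = bookmark->edges;
    return 0;
}
```

The usage pattern is record, append rows, then truncate if the new rows should be discarded.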

int tsk_table_collection_sort(tsk_table_collection_t *self, const tsk_bookmark_t *start, tsk_flags_t options)#

Sorts the tables in this collection.

Some of the tables in a table collection must satisfy specific sortedness requirements in order to define a valid tree sequence. This method sorts the edge, site, mutation and individual tables such that these requirements are guaranteed to be fulfilled. The node, population and provenance tables do not have any sortedness requirements, and are therefore ignored by this method.

The specified tsk_bookmark_t allows us to specify a start position for sorting in each of the tables; rows before this value are assumed to already be in sorted order and this information is used to make sorting more efficient. Positions in tables that are not sorted (node, population and provenance) are ignored and can be set to arbitrary values.

The table collection will always be unindexed after sort successfully completes.

For more control over the sorting process, see the Low-level sorting section.

Options

Options can be specified by providing one or more of the following bitwise flags:

TSK_NO_CHECK_INTEGRITY

Do not run integrity checks using tsk_table_collection_check_integrity() before sorting, potentially leading to a small reduction in execution time. This performance optimisation should not be used unless the calling code can guarantee reference integrity within the table collection. References to rows not in the table or bad offsets will result in undefined behaviour.

Note

The current implementation may sort in such a way that exceeds these requirements, but this behaviour should not be relied upon and later versions may weaken the level of sortedness. However, the method does guarantee that the resulting tables describe a valid tree sequence.

Warning

Sorting migrations is currently not supported and an error will be raised if a table collection containing a non-empty migration table is specified.

Warning

The current implementation only supports specifying a start position for the edge table and in a limited form for the site, mutation and individual tables. Specifying a non-zero migration start position results in an error. The start positions for the site, mutation and individual tables can either be 0 or the length of the respective tables, allowing these tables to either be fully sorted, or not sorted at all.

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • start – The position to begin sorting in each table; all rows less than this position must fulfill the tree sequence sortedness requirements. If this is NULL, sort all rows.

  • options – Sort options.

Returns:

Return 0 on success or a negative value on failure.

int tsk_table_collection_individual_topological_sort(tsk_table_collection_t *self, tsk_flags_t options)#

Sorts the individual table in this collection.

Sorts the individual table in place, so that parents come before children, and the parent column is remapped as required. Node references to individuals are also updated.

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • options – Sort options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_table_collection_canonicalise(tsk_table_collection_t *self, tsk_flags_t options)#

Puts the tables into canonical form.

Put tables into canonical form such that randomly reshuffled tables are guaranteed to always be sorted in the same order, and redundant information is removed. The canonical sorting exceeds the usual tree sequence sortedness requirements.

Options:

Options can be specified by providing one or more of the following bitwise flags:

Returns:

Return 0 on success or a negative value on failure.

int tsk_table_collection_simplify(tsk_table_collection_t *self, const tsk_id_t *samples, tsk_size_t num_samples, tsk_flags_t options, tsk_id_t *node_map)#

Simplify the tables to remove redundant information.

Simplification transforms the tables to remove redundancy and canonicalise tree sequence data. See the simplification tutorial for more details.

A mapping from the node IDs in the table before simplification to their equivalent values after simplification can be obtained via the node_map argument. If this is non NULL, node_map[u] will contain the new ID for node u after simplification, or TSK_NULL if the node has been removed. Thus, node_map must be an array of at least self->nodes.num_rows tsk_id_t values.

If the TSK_SIMPLIFY_NO_FILTER_NODES option is specified, the node table will be unaltered except for changing the sample status of nodes (but see the TSK_SIMPLIFY_NO_UPDATE_SAMPLE_FLAGS option below) and to update references to other tables that may have changed as a result of filtering (see below). The node_map (if specified) will always be the identity mapping, such that node_map[u] == u for all nodes. Note also that the order of the list of samples is not important in this case.

When a table is not filtered (i.e., if the TSK_SIMPLIFY_NO_FILTER_NODES option is provided or the TSK_SIMPLIFY_FILTER_SITES, TSK_SIMPLIFY_FILTER_POPULATIONS or TSK_SIMPLIFY_FILTER_INDIVIDUALS options are not provided) the corresponding table is modified as little as possible, and all pointers are guaranteed to remain valid after simplification. The only changes made to an unfiltered table are to update any references to tables that may have changed (for example, remapping population IDs in the node table if TSK_SIMPLIFY_FILTER_POPULATIONS was specified) or altering the sample status flag of nodes.

By default, the node sample flags are updated by unsetting the TSK_NODE_IS_SAMPLE flag for all nodes and subsequently setting it for the nodes provided as input to this function. The TSK_SIMPLIFY_NO_UPDATE_SAMPLE_FLAGS option will prevent this from occurring, making it the responsibility of calling code to keep track of the ultimate sample status of nodes. Using this option in conjunction with TSK_SIMPLIFY_NO_FILTER_NODES (and without the TSK_SIMPLIFY_FILTER_POPULATIONS and TSK_SIMPLIFY_FILTER_INDIVIDUALS options) guarantees that the node table will not be written to during the lifetime of this function.

The table collection will always be unindexed after simplify successfully completes.

Options:

Options can be specified by providing one or more of the following bitwise flags:

Note

It is possible for populations and individuals to be filtered even if TSK_SIMPLIFY_NO_FILTER_NODES is specified because there may be entirely unreferenced entities in the input tables, which are not affected by whether we filter nodes or not.

Note

Migrations are currently not supported by simplify, and an error will be raised if we attempt to call simplify on a table collection with greater than zero migrations. See tskit-dev/tskit#20

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • samples – Either NULL or an array of num_samples distinct and valid node IDs. If non-null the nodes in this array will be marked as samples in the output. If NULL, the num_samples parameter is ignored and the samples in the output will be the same as the samples in the input. This is equivalent to populating the samples array with all of the sample nodes in the input in increasing order of ID.

  • num_samples – The number of node IDs in the input samples array. Ignored if the samples array is NULL.

  • options – Simplify options; see above for the available bitwise flags. For the default behaviour, a value of 0 should be provided.

  • node_map – If not NULL, this array will be filled to define the mapping between node IDs in the table collection before and after simplification.

Returns:

Return 0 on success or a negative value on failure.
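Calling code often holds its own arrays of node IDs that must be updated after simplification. The sketch below shows how a node_map of the form described above could be applied to such an array. The sketch_ names are hypothetical, and the node_map in the example is written by hand rather than produced by tskit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t sketch_id_t;           /* same underlying type as tsk_id_t */
#define SKETCH_NULL ((sketch_id_t) -1) /* same value as TSK_NULL */

/* Rewrite each entry of ids in place using node_map, where node_map[u] is
 * the new ID of node u, or SKETCH_NULL if u was removed. Returns the number
 * of entries that refer to removed nodes. */
size_t
sketch_remap_nodes(sketch_id_t *ids, size_t num_ids, const sketch_id_t *node_map,
    size_t num_nodes)
{
    size_t j;
    size_t num_removed = 0;

    for (j = 0; j < num_ids; j++) {
        /* Every old ID must index into node_map. */
        assert(ids[j] >= 0 && (size_t) ids[j] < num_nodes);
        ids[j] = node_map[ids[j]];
        if (ids[j] == SKETCH_NULL) {
            num_removed++;
        }
    }
    return num_removed;
}
```

Here num_nodes is the number of rows in the node table before simplification, matching the required size of node_map.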

int tsk_table_collection_subset(tsk_table_collection_t *self, const tsk_id_t *nodes, tsk_size_t num_nodes, tsk_flags_t options)#

Subsets and reorders a table collection according to an array of nodes.

Reduces the table collection to contain only the entries referring to the provided list of nodes, with nodes reordered according to the order they appear in the nodes argument. Specifically, this subsets and reorders each of the tables as follows (but see options, below):

  1. Nodes: if in the list of nodes, and in the order provided.

  2. Individuals: if referred to by a retained node.

  3. Populations: if referred to by a retained node, and in the order first seen when traversing the list of retained nodes.

  4. Edges: if both parent and child are retained nodes.

  5. Mutations: if the mutation’s node is a retained node.

  6. Sites: if any mutations remain at the site after removing mutations.

Retained individuals, edges, mutations, and sites appear in the same order as in the original tables. Note that only the information directly associated with the provided nodes is retained - for instance, subsetting to nodes=[A, B] does not retain nodes ancestral to A and B, and only retains the individuals A and B are in, and not their parents.

This function does not require the tables to be sorted.
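The node and edge rules above can be illustrated without calling tskit at all. The following is a minimal sketch in plain C, assuming the node IDs passed in are distinct and valid. The names ex_id, EX_NULL, ex_edge, ex_build_node_map and ex_subset_edges are hypothetical stand-ins for illustration and are not part of the tskit API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t ex_id; /* hypothetical stand-in for tsk_id_t */
#define EX_NULL (-1)   /* hypothetical stand-in for TSK_NULL */

typedef struct {
    ex_id parent;
    ex_id child;
} ex_edge;

/* Build the old-to-new node ID map implied by the nodes argument:
 * old_to_new[nodes[j]] = j, and every unlisted node maps to EX_NULL. */
static void
ex_build_node_map(const ex_id *nodes, size_t num_nodes, ex_id *old_to_new,
    size_t num_old_nodes)
{
    size_t j;

    for (j = 0; j < num_old_nodes; j++) {
        old_to_new[j] = EX_NULL;
    }
    for (j = 0; j < num_nodes; j++) {
        assert(nodes[j] >= 0 && (size_t) nodes[j] < num_old_nodes);
        old_to_new[nodes[j]] = (ex_id) j;
    }
}

/* Copy the edges whose parent and child are both retained into out,
 * preserving their original order and rewriting the node IDs.
 * Returns the number of edges kept. */
static size_t
ex_subset_edges(const ex_edge *edges, size_t num_edges, const ex_id *old_to_new,
    ex_edge *out)
{
    size_t j, num_kept = 0;

    for (j = 0; j < num_edges; j++) {
        ex_id parent = old_to_new[edges[j].parent];
        ex_id child = old_to_new[edges[j].child];

        if (parent != EX_NULL && child != EX_NULL) {
            out[num_kept].parent = parent;
            out[num_kept].child = child;
            num_kept++;
        }
    }
    return num_kept;
}
```

With five original nodes and nodes = {3, 1}, node 3 becomes node 0 and node 1 becomes node 1, so of the edges (4, 3), (3, 1) and (3, 0) only (3, 1) survives, rewritten as (0, 1).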

Options:

Options can be specified by providing one or more of the following bitwise flags:

Note

Migrations are currently not supported by subset, and an error will be raised if subset is called on a table collection that contains any migrations.

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • nodes – An array of num_nodes valid node IDs.

  • num_nodes – The number of node IDs in the input nodes array.

  • options – Bitwise option flags.

Returns:

Return 0 on success or a negative value on failure.

int tsk_table_collection_union(tsk_table_collection_t *self, const tsk_table_collection_t *other, const tsk_id_t *other_node_mapping, tsk_flags_t options)#

Forms the node-wise union of two table collections.

Expands this table collection by adding the non-shared portions of another table collection to itself. The other_node_mapping encodes which nodes in other are equivalent to a node in self. The positions in the other_node_mapping array correspond to node IDs in other, and the elements encode the equivalent node ID in self or TSK_NULL if the node is exclusive to other. Nodes that are exclusive to other are added to self, along with:

  1. Individuals which are new to self.

  2. Edges whose parent or child are new to self.

  3. Sites which were not present in self.

  4. Mutations whose nodes are new to self.

By default, populations of newly added nodes are assumed to be new populations, and added to the population table as well.

This operation will also sort the resulting tables, so the tables may change even if nothing new is added, if the original tables were not sorted.

Options:

Options can be specified by providing one or more of the following bitwise flags:

Note

Migrations are currently not supported by union, and an error will be raised if union is called on a table collection that contains migrations.

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • other – A pointer to a tsk_table_collection_t object.

  • other_node_mapping – An array of node IDs that relate nodes in other to nodes in self: the k-th element of other_node_mapping should be the index of the equivalent node in self, or TSK_NULL if the node is not present in self (in which case it will be added to self).

  • options – Union options; see above for the available bitwise flags. For the default behaviour, a value of 0 should be provided.

Returns:

Return 0 on success or a negative value on failure.
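The layout of other_node_mapping described above can be sketched in plain C as follows. This is an illustration only and does not call tskit: ex_id, EX_NULL, ex_fill_node_mapping and ex_count_new_nodes are hypothetical names, and the list of shared node pairs is assumed to be known to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t ex_id; /* hypothetical stand-in for tsk_id_t */
#define EX_NULL (-1)   /* hypothetical stand-in for TSK_NULL */

/* Fill mapping, which has one entry per node in other. Entry k holds the
 * equivalent node ID in self, or EX_NULL if node k exists only in other.
 * shared_other[j] and shared_self[j] name the same node in each collection. */
static void
ex_fill_node_mapping(ex_id *mapping, size_t num_other_nodes,
    const ex_id *shared_other, const ex_id *shared_self, size_t num_shared)
{
    size_t j;

    for (j = 0; j < num_other_nodes; j++) {
        mapping[j] = EX_NULL;
    }
    for (j = 0; j < num_shared; j++) {
        assert(shared_other[j] >= 0 && (size_t) shared_other[j] < num_other_nodes);
        mapping[shared_other[j]] = shared_self[j];
    }
}

/* Count the nodes that a union would add to self, that is, the entries
 * of the mapping that are EX_NULL. */
static size_t
ex_count_new_nodes(const ex_id *mapping, size_t num_other_nodes)
{
    size_t j, num_new = 0;

    for (j = 0; j < num_other_nodes; j++) {
        if (mapping[j] == EX_NULL) {
            num_new++;
        }
    }
    return num_new;
}
```

An array built this way has the shape expected for the other_node_mapping argument: its length equals the number of nodes in other, and it is indexed by node ID in other.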

int tsk_table_collection_set_time_units(tsk_table_collection_t *self, const char *time_units, tsk_size_t time_units_length)#

Set the time_units.

Copies the time_units string to this table collection, replacing any existing value.

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • time_units – A pointer to a char array.

  • time_units_length – The size of the time units string in bytes.

Returns:

Return 0 on success or a negative value on failure.

int tsk_table_collection_set_metadata(tsk_table_collection_t *self, const char *metadata, tsk_size_t metadata_length)#

Set the metadata.

Copies the metadata string to this table collection, replacing any existing metadata.

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • metadata – A pointer to a char array.

  • metadata_length – The size of the metadata in bytes.

Returns:

Return 0 on success or a negative value on failure.

int tsk_table_collection_set_metadata_schema(tsk_table_collection_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)#

Set the metadata schema.

Copies the metadata schema string to this table collection, replacing any existing schema.

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns:

Return 0 on success or a negative value on failure.

bool tsk_table_collection_has_index(const tsk_table_collection_t *self, tsk_flags_t options)#

Returns true if this table collection is indexed.

This method returns true if the table collection has an index for the edge table. It guarantees that the index exists, and that it is for the same number of edges that are in the edge table. It does not guarantee that the index is valid (for example, the rows in the edge table may have been permuted in some way since the index was built).

See the Table indexes section for details on the index life-cycle.

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • options – Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return true if there is an index present for this table collection.

int tsk_table_collection_drop_index(tsk_table_collection_t *self, tsk_flags_t options)#

Deletes the indexes for this table collection.

Unconditionally drop the indexes that may be present for this table collection. It is not an error to call this method on an unindexed table collection. See the Table indexes section for details on the index life-cycle.

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • options – Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Always returns 0.

int tsk_table_collection_build_index(tsk_table_collection_t *self, tsk_flags_t options)#

Builds indexes for this table collection.

Builds the tree traversal indexes for this table collection. Any existing index is first dropped using tsk_table_collection_drop_index(). See the Table indexes section for details on the index life-cycle.

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • options – Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

tsk_id_t tsk_table_collection_check_integrity(const tsk_table_collection_t *self, tsk_flags_t options)#

Runs integrity checks on this table collection.

Checks the integrity of this table collection. The default checks (i.e., with options = 0) guarantee the integrity of memory and entity references within the table collection. All positions along the genome are checked to see if they are finite values and within the required bounds. Time values are checked to see if they are finite or marked as unknown. Consistency of the direction of inheritance is also checked: that parents are older than their children, that mutations are not more recent than their nodes and not older than their mutation parents, etcetera.

To check if a set of tables fulfills the requirements needed for a valid tree sequence, use the TSK_CHECK_TREES option. When this method is called with TSK_CHECK_TREES, the number of trees in the tree sequence is returned. Thus, to check for errors client code should verify that the return value is less than zero. All other options will return zero on success and a negative value on failure.

More fine-grained checks can be achieved using bitwise combinations of the other options.

Options:

Options can be specified by providing one or more of the following bitwise flags:

Parameters:
  • self – A pointer to a tsk_table_collection_t object.

  • options – Bitwise options.

Returns:

Return a negative error value if any problems are detected in the tree sequence. If the TSK_CHECK_TREES option is provided, the number of trees in the tree sequence will be returned on success.

Individuals#

struct tsk_individual_t#

A single individual defined by a row in the individual table.

See the data model section for the definition of an individual and its properties.

Public Members

tsk_id_t id#

Non-negative ID value corresponding to table row.

tsk_flags_t flags#

Bitwise flags.

const double *location#

Spatial location. The number of dimensions is defined by location_length.

tsk_size_t location_length#

Number of spatial dimensions.

tsk_id_t *parents#

IDs of the parents. The number of parents is given by parents_length.

tsk_size_t parents_length#

Number of parents.

const char *metadata#

Metadata.

tsk_size_t metadata_length#

Size of the metadata in bytes.

const tsk_id_t *nodes#

An array of the nodes associated with this individual.

tsk_size_t nodes_length#

The number of nodes associated with this individual.

struct tsk_individual_table_t#

The individual table.

See the individual table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows#

The number of rows in this table.

tsk_size_t location_length#

The total length of the location column.

tsk_size_t parents_length#

The total length of the parents column.

tsk_size_t metadata_length#

The total length of the metadata column.

tsk_flags_t *flags#

The flags column.

double *location#

The location column.

tsk_size_t *location_offset#

The location_offset column.

tsk_id_t *parents#

The parents column.

tsk_size_t *parents_offset#

The parents_offset column.

char *metadata#

The metadata column.

tsk_size_t *metadata_offset#

The metadata_offset column.

char *metadata_schema#

The metadata schema.

int tsk_individual_table_init(tsk_individual_table_t *self, tsk_flags_t options)#

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Parameters:
  • self – A pointer to an uninitialised tsk_individual_table_t object.

  • options – Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_individual_table_free(tsk_individual_table_t *self)#

Free the internal memory for the specified table.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

Returns:

Always returns 0.

tsk_id_t tsk_individual_table_add_row(tsk_individual_table_t *self, tsk_flags_t flags, const double *location, tsk_size_t location_length, const tsk_id_t *parents, tsk_size_t parents_length, const char *metadata, tsk_size_t metadata_length)#

Adds a row to this individual table.

Add a new individual with the specified flags, location, parents and metadata to the table. Copies of the location, parents and metadata parameters are taken immediately. See the table definition for details of the columns in this table.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • flags – The bitwise flags for the new individual.

  • location – A pointer to a double array representing the spatial location of the new individual. Can be NULL if location_length is 0.

  • location_length – The number of dimensions in the location. Note that this is the number of elements in the corresponding double array, not the number of bytes.

  • parents – A pointer to a tsk_id_t array representing the parents of the new individual. Can be NULL if parents_length is 0.

  • parents_length – The number of parents. Note that this is the number of elements in the corresponding tsk_id_t array, not the number of bytes.

  • metadata – The metadata to be associated with the new individual. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns:

Return the ID of the newly added individual on success, or a negative value on failure.

int tsk_individual_table_update_row(tsk_individual_table_t *self, tsk_id_t index, tsk_flags_t flags, const double *location, tsk_size_t location_length, const tsk_id_t *parents, tsk_size_t parents_length, const char *metadata, tsk_size_t metadata_length)#

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. Copies of the location, parents and metadata parameters are taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and would therefore be inefficient for bulk updates of such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to only update the memory for the row in question.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • index – The row to update.

  • flags – The bitwise flags for the individual.

  • location – A pointer to a double array representing the spatial location of the individual. Can be NULL if location_length is 0.

  • location_length – The number of dimensions in the location. Note that this is the number of elements in the corresponding double array, not the number of bytes.

  • parents – A pointer to a tsk_id_t array representing the parents of the individual. Can be NULL if parents_length is 0.

  • parents_length – The number of parents. Note that this is the number of elements in the corresponding tsk_id_t array, not the number of bytes.

  • metadata – The metadata to be associated with the individual. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns:

Return 0 on success or a negative value on failure.
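The cost described in the warning above follows from how a ragged column is stored: one flat array of values plus an array of offsets. The following is a minimal sketch of updating one row of such a column, assuming a values buffer obtained from malloc and an offset array of num_rows + 1 entries. The ex_ragged struct and ex_ragged_update function are hypothetical and are not tskit's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ragged column: row j occupies
 * values[offset[j]] .. values[offset[j + 1] - 1]. */
typedef struct {
    double *values; /* heap-allocated, offset[num_rows] elements in use */
    size_t *offset; /* num_rows + 1 entries, starting at 0 */
    size_t num_rows;
} ex_ragged;

/* Replace the values of one row. Returns the number of elements belonging
 * to later rows that had to be moved, or -1 if memory allocation fails. */
static long
ex_ragged_update(ex_ragged *col, size_t row, const double *new_values,
    size_t new_len)
{
    size_t start, old_len, total, tail, j;
    long moved = 0;

    assert(row < col->num_rows);
    start = col->offset[row];
    old_len = col->offset[row + 1] - start;
    total = col->offset[col->num_rows];
    tail = total - col->offset[row + 1];

    if (new_len != old_len) {
        if (new_len > old_len) {
            double *tmp
                = realloc(col->values, (total - old_len + new_len) * sizeof(*tmp));
            if (tmp == NULL) {
                return -1;
            }
            col->values = tmp;
        }
        /* Size changed: shift every later row and fix up its offsets. */
        memmove(col->values + start + new_len, col->values + start + old_len,
            tail * sizeof(*col->values));
        for (j = row + 1; j <= col->num_rows; j++) {
            col->offset[j] = col->offset[j] - old_len + new_len;
        }
        moved = (long) tail;
    }
    /* Same size: only the memory of this row is touched. */
    if (new_len > 0) {
        memcpy(col->values + start, new_values, new_len * sizeof(*col->values));
    }
    return moved;
}
```

In this sketch an update that keeps the row length moves nothing, while growing the first row moves every element stored after it, which is why bulk updates that change sizes are expensive.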

int tsk_individual_table_clear(tsk_individual_table_t *self)#

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_individual_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

Returns:

Return 0 on success or a negative value on failure.

int tsk_individual_table_truncate(tsk_individual_table_t *self, tsk_size_t num_rows)#

Truncates this table so that only the first num_rows are retained.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns:

Return 0 on success or a negative value on failure.

int tsk_individual_table_extend(tsk_individual_table_t *self, const tsk_individual_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)#

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table. Note that metadata is copied as-is and is not checked for compatibility with any existing schema on this table.

Parameters:
  • self – A pointer to a tsk_individual_table_t object where rows are to be added.

  • other – A pointer to a tsk_individual_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.
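The row selection rules for extend (indexes may repeat, may come in any order, and a NULL array means the first num_rows rows) can be sketched on a single column of doubles. This is a plain C illustration with hypothetical names (ex_id, ex_column_extend) that does not call tskit; validating everything before copying is a choice made for this sketch, not a statement about the library's behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t ex_id; /* hypothetical stand-in for tsk_id_t */

/* Append num_rows values from src to dst. If row_indexes is NULL the first
 * num_rows values of src are used; otherwise row_indexes selects them.
 * Returns 0 on success or -1 if an index is out of bounds or dst is full.
 * Nothing is modified when -1 is returned. */
static int
ex_column_extend(double *dst, size_t *dst_len, size_t dst_capacity,
    const double *src, size_t src_len, size_t num_rows, const ex_id *row_indexes)
{
    size_t j;

    assert(*dst_len <= dst_capacity);
    if (num_rows > dst_capacity - *dst_len) {
        return -1;
    }
    if (row_indexes == NULL) {
        if (num_rows > src_len) {
            return -1;
        }
    } else {
        for (j = 0; j < num_rows; j++) {
            if (row_indexes[j] < 0 || (size_t) row_indexes[j] >= src_len) {
                return -1;
            }
        }
    }
    for (j = 0; j < num_rows; j++) {
        size_t k = row_indexes == NULL ? j : (size_t) row_indexes[j];
        dst[*dst_len + j] = src[k];
    }
    *dst_len += num_rows;
    return 0;
}
```

For example, extending with row_indexes = {2, 0, 2} appends the third, first and third source values in that order.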

int tsk_individual_table_keep_rows(tsk_individual_table_t *self, const tsk_bool_t *keep, tsk_flags_t options, tsk_id_t *id_map)#

Subset this table by keeping rows according to a boolean mask.

Deletes rows from this table and optionally returns the mapping from IDs in the current table to the updated table. Rows are kept or deleted according to the specified boolean array keep, such that for each row j, if keep[j] is false (zero) the row is deleted, and otherwise the row is retained. Thus, keep must be an array of at least num_rows tsk_bool_t values.

If the id_map argument is non-null, this array will be updated to represent the mapping between IDs before and after row deletion. For row j, id_map[j] will contain the new ID for row j if it is retained, or TSK_NULL if the row has been removed. Thus, id_map must be an array of at least num_rows tsk_id_t values.

The values in the parents column are updated according to this map, so that reference integrity within the table is maintained. As a consequence of this, the values in the parents column for kept rows are bounds-checked and an error raised if they are not valid. Rows that are deleted are not checked for parent ID integrity.

If an attempt is made to delete rows that are referred to by the parents column of rows that are retained, an error is raised.

These error conditions are checked before any alterations to the table are made.

Warning

C++ users need to be careful to specify the correct type when passing in values for the keep array, using std::vector<tsk_bool_t> and not std::vector<bool>, as the latter may not be the correct size.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • keep – Array of boolean flags describing whether a particular row should be kept or not. Must be at least num_rows long.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

  • id_map – An array in which to store the mapping between new and old IDs. If NULL, this will be ignored.

Returns:

Return 0 on success or a negative value on failure.
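The id_map semantics and the parent checks described for this function can be sketched in plain C. The code below is an illustration under stated assumptions and is not tskit's implementation: ex_id, ex_bool, EX_NULL, ex_build_id_map and ex_check_parents are hypothetical names, the one-byte ex_bool is assumed to mirror the fixed-size tsk_bool_t, and null parent entries are assumed to be permitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t ex_id;   /* hypothetical stand-in for tsk_id_t */
typedef uint8_t ex_bool; /* hypothetical one-byte stand-in for tsk_bool_t */
#define EX_NULL (-1)     /* hypothetical stand-in for TSK_NULL */

/* Fill id_map so that id_map[j] is the new ID of row j if it is kept,
 * or EX_NULL if it is deleted. Returns the number of rows kept. */
static size_t
ex_build_id_map(const ex_bool *keep, size_t num_rows, ex_id *id_map)
{
    size_t j;
    ex_id next = 0;

    for (j = 0; j < num_rows; j++) {
        id_map[j] = keep[j] ? next++ : EX_NULL;
    }
    return (size_t) next;
}

/* Check the ragged parents column before anything is modified.
 * Returns 0 if every kept row is consistent, -1 if a kept row has an
 * out-of-bounds parent, or -2 if a kept row refers to a deleted row.
 * Rows that are going to be deleted are not checked. */
static int
ex_check_parents(const ex_bool *keep, size_t num_rows, const ex_id *parents,
    const size_t *parents_offset)
{
    size_t j, k;

    for (j = 0; j < num_rows; j++) {
        if (!keep[j]) {
            continue;
        }
        for (k = parents_offset[j]; k < parents_offset[j + 1]; k++) {
            ex_id parent = parents[k];

            if (parent == EX_NULL) {
                continue; /* assumption: null parents are allowed */
            }
            if (parent < 0 || (size_t) parent >= num_rows) {
                return -1;
            }
            if (!keep[parent]) {
                return -2;
            }
        }
    }
    return 0;
}
```

Running the checks first and building the map afterwards mirrors the statement above that error conditions are detected before the table is altered.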

bool tsk_individual_table_equals(const tsk_individual_table_t *self, const tsk_individual_table_t *other, tsk_flags_t options)#

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

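Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • other – A pointer to a tsk_individual_table_t object.

  • options – Bitwise comparison options.

Returns: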

Return true if the specified table is equal to this table.

int tsk_individual_table_copy(const tsk_individual_table_t *self, tsk_individual_table_t *dest, tsk_flags_t options)#

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Indexes that are present are also copied to the destination table.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • dest – A pointer to a tsk_individual_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised individual table. If not, it must be an uninitialised individual table.

  • options – Bitwise option flags.

Returns:

Return 0 on success or a negative value on failure.

int tsk_individual_table_get_row(const tsk_individual_table_t *self, tsk_id_t index, tsk_individual_t *row)#

Get the row at the specified index.

Updates the specified individual struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_individual_t struct that is updated to reflect the values in the specified row.

Returns:

Return 0 on success or a negative value on failure.

int tsk_individual_table_set_metadata_schema(tsk_individual_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)#

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns:

Return 0 on success or a negative value on failure.

void tsk_individual_table_print_state(const tsk_individual_table_t *self, FILE *out)#

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • out – The stream to write the summary to.

int tsk_individual_table_set_columns(tsk_individual_table_t *self, tsk_size_t num_rows, const tsk_flags_t *flags, const double *location, const tsk_size_t *location_offset, const tsk_id_t *parents, const tsk_size_t *parents_offset, const char *metadata, const tsk_size_t *metadata_offset)#

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • flags – The array of tsk_flags_t flag values to be copied.

  • location – The array of double location values to be copied.

  • location_offset – The array of tsk_size_t location offset values to be copied.

  • parents – The array of tsk_id_t parent values to be copied.

  • parents_offset – The array of tsk_size_t parent offset values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.
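The value and offset arrays passed to this function come in pairs. As a minimal sketch of that pairing, assuming the usual ragged encoding in which an offset array has num_rows + 1 entries, starts at zero and ends at the total length of the values array, the helpers below build offsets from per-row lengths and look up one row. The names ex_offsets_from_lengths and ex_row_values are hypothetical, and size_t is used here in place of tsk_size_t.

```c
#include <assert.h>
#include <stddef.h>

/* Fill offset (num_rows + 1 entries) from the per-row lengths.
 * Returns the total number of values, which equals offset[num_rows]. */
static size_t
ex_offsets_from_lengths(const size_t *lengths, size_t num_rows, size_t *offset)
{
    size_t j;

    offset[0] = 0;
    for (j = 0; j < num_rows; j++) {
        offset[j + 1] = offset[j] + lengths[j];
    }
    return offset[num_rows];
}

/* Return a pointer to the first value of the given row and store the
 * number of values in that row in *length. */
static const double *
ex_row_values(const double *values, const size_t *offset, size_t num_rows,
    size_t row, size_t *length)
{
    assert(row < num_rows);
    *length = offset[row + 1] - offset[row];
    return values + offset[row];
}
```

Three individuals with 2, 0 and 1 location values would therefore have location_offset = {0, 2, 2, 3} and a location array of three doubles.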

int tsk_individual_table_append_columns(tsk_individual_table_t *self, tsk_size_t num_rows, const tsk_flags_t *flags, const double *location, const tsk_size_t *location_offset, const tsk_id_t *parents, const tsk_size_t *parents_offset, const char *metadata, const tsk_size_t *metadata_offset)#

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • flags – The array of tsk_flags_t flag values to be copied.

  • location – The array of double location values to be copied.

  • location_offset – The array of tsk_size_t location offset values to be copied.

  • parents – The array of tsk_id_t parent values to be copied.

  • parents_offset – The array of tsk_size_t parent offset values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.

int tsk_individual_table_set_max_rows_increment(tsk_individual_table_t *self, tsk_size_t max_rows_increment)#

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.
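The difference between a fixed increment and a doubling strategy can be sketched as a capacity calculation. This is a deliberately simplified illustration and not tskit's actual algorithm, which is described in the Memory allocation strategy section. The function name ex_next_capacity is hypothetical and overflow is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Return the capacity to allocate so that at least `needed` rows fit.
 * increment == 0 selects doubling; any other value grows the capacity
 * in whole multiples of that fixed increment. */
static size_t
ex_next_capacity(size_t capacity, size_t needed, size_t increment)
{
    size_t new_capacity = capacity;

    if (needed > capacity) {
        if (increment == 0) {
            if (new_capacity == 0) {
                new_capacity = 1;
            }
            while (new_capacity < needed) {
                new_capacity *= 2;
            }
        } else {
            size_t deficit = needed - capacity;
            size_t steps = (deficit + increment - 1) / increment;

            new_capacity = capacity + steps * increment;
        }
    }
    assert(new_capacity >= needed);
    return new_capacity;
}
```

A large fixed increment trades memory for fewer reallocations when the final table size is roughly known, while doubling keeps the number of reallocations logarithmic in the final size without any tuning.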

int tsk_individual_table_set_max_metadata_length_increment(tsk_individual_table_t *self, tsk_size_t max_metadata_length_increment)#

Controls the pre-allocation strategy for the metadata column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • max_metadata_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

int tsk_individual_table_set_max_location_length_increment(tsk_individual_table_t *self, tsk_size_t max_location_length_increment)#

Controls the pre-allocation strategy for the location column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • max_location_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

int tsk_individual_table_set_max_parents_length_increment(tsk_individual_table_t *self, tsk_size_t max_parents_length_increment)#

Controls the pre-allocation strategy for the parents column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_individual_table_t object.

  • max_parents_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

Nodes#

struct tsk_node_t#

A single node defined by a row in the node table.

See the data model section for the definition of a node and its properties.

Public Members

tsk_id_t id#

Non-negative ID value corresponding to table row.

tsk_flags_t flags#

Bitwise flags.

double time#

Time.

tsk_id_t population#

Population ID.

tsk_id_t individual#

Individual ID.

const char *metadata#

Metadata.

tsk_size_t metadata_length#

Size of the metadata in bytes.

struct tsk_node_table_t#

The node table.

See the node table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows#

The number of rows in this table.

tsk_size_t metadata_length#

The total length of the metadata column.

tsk_flags_t *flags#

The flags column.

double *time#

The time column.

tsk_id_t *population#

The population column.

tsk_id_t *individual#

The individual column.

char *metadata#

The metadata column.

tsk_size_t *metadata_offset#

The metadata_offset column.

char *metadata_schema#

The metadata schema.

int tsk_node_table_init(tsk_node_table_t *self, tsk_flags_t options)#

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Parameters:
  • self – A pointer to an uninitialised tsk_node_table_t object.

  • options – Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_node_table_free(tsk_node_table_t *self)#

Free the internal memory for the specified table.

Parameters:
  • self – A pointer to a tsk_node_table_t object.

Returns:

Always returns 0.

tsk_id_t tsk_node_table_add_row(tsk_node_table_t *self, tsk_flags_t flags, double time, tsk_id_t population, tsk_id_t individual, const char *metadata, tsk_size_t metadata_length)#

Adds a row to this node table.

Add a new node with the specified flags, time, population, individual and metadata to the table. A copy of the metadata parameter is taken immediately. See the table definition for details of the columns in this table.

Parameters:
  • self – A pointer to a tsk_node_table_t object.

  • flags – The bitwise flags for the new node.

  • time – The time for the new node.

  • population – The population for the new node. Set to TSK_NULL if not known.

  • individual – The individual for the new node. Set to TSK_NULL if not known.

  • metadata – The metadata to be associated with the new node. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns:

Return the ID of the newly added node on success, or a negative value on failure.

int tsk_node_table_update_row(tsk_node_table_t *self, tsk_id_t index, tsk_flags_t flags, double time, tsk_id_t population, tsk_id_t individual, const char *metadata, tsk_size_t metadata_length)#

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. A copy of the metadata parameter is taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and would therefore be inefficient for bulk updates of such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to only update the memory for the row in question.

Parameters:
  • self – A pointer to a tsk_node_table_t object.

  • index – The row to update.

  • flags – The bitwise flags for the node.

  • time – The time for the node.

  • population – The population for the node. Set to TSK_NULL if not known.

  • individual – The individual for the node. Set to TSK_NULL if not known.

  • metadata – The metadata to be associated with the node. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns:

Return 0 on success or a negative value on failure.

int tsk_node_table_clear(tsk_node_table_t *self)#

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_node_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Parameters:
  • self – A pointer to a tsk_node_table_t object.

Returns:

Return 0 on success or a negative value on failure.

int tsk_node_table_truncate(tsk_node_table_t *self, tsk_size_t num_rows)#

Truncates this table so that only the first num_rows are retained.

Parameters:
  • self – A pointer to a tsk_node_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns:

Return 0 on success or a negative value on failure.

int tsk_node_table_extend(tsk_node_table_t *self, const tsk_node_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)#

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table. Note that metadata is copied as-is and is not checked for compatibility with any existing schema on this table.

Parameters:
  • self – A pointer to a tsk_node_table_t object where rows are to be added.

  • other – A pointer to a tsk_node_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_node_table_keep_rows(tsk_node_table_t *self, const tsk_bool_t *keep, tsk_flags_t options, tsk_id_t *id_map)#

Subset this table by keeping rows according to a boolean mask.

Deletes rows from this table and optionally returns the mapping from IDs in the current table to the updated table. Rows are kept or deleted according to the specified boolean array keep, such that for each row j, if keep[j] is false (zero) the row is deleted, and otherwise the row is retained. Thus, keep must be an array of at least num_rows tsk_bool_t values.

If the id_map argument is non-null, this array will be updated to represent the mapping between IDs before and after row deletion. For row j, id_map[j] will contain the new ID for row j if it is retained, or TSK_NULL if the row has been removed. Thus, id_map must be an array of at least num_rows tsk_id_t values.

Warning

C++ users need to be careful to specify the correct type when passing in values for the keep array, using std::vector<tsk_bool_t> and not std::vector<bool>, as the latter may not be the correct size.

Parameters:
  • self – A pointer to a tsk_node_table_t object.

  • keep – Array of boolean flags describing whether a particular row should be kept or not. Must be at least num_rows long.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

  • id_map – An array in which to store the mapping between new and old IDs. If NULL, this will be ignored.

Returns:

Return 0 on success or a negative value on failure.
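
The relationship between keep and id_map can be sketched on a single column. This is an illustration under stated assumptions rather than the library's code: keep_rows_column is a hypothetical helper, bool and int32_t stand in for tsk_bool_t and tsk_id_t, and NULL_ID stands in for TSK_NULL (assumed here to be -1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NULL_ID (-1) /* stand-in for TSK_NULL; assumed to be -1 in this sketch */

/* Hypothetical helper (not part of tskit): compacts one column in place,
 * keeping column[j] only where keep[j] is true. If id_map is non-NULL,
 * id_map[j] receives the new index of row j, or NULL_ID if it was removed.
 * Returns the number of rows retained. */
size_t
keep_rows_column(double *column, size_t num_rows, const bool *keep, int32_t *id_map)
{
    size_t j;
    size_t k = 0; /* next free slot in the compacted column */

    assert(column != NULL && keep != NULL);
    for (j = 0; j < num_rows; j++) {
        if (keep[j]) {
            column[k] = column[j];
            if (id_map != NULL) {
                id_map[j] = (int32_t) k;
            }
            k++;
        } else if (id_map != NULL) {
            id_map[j] = NULL_ID;
        }
    }
    return k;
}
```

Because retained rows keep their relative order, the non-null entries of id_map are strictly increasing.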

bool tsk_node_table_equals(const tsk_node_table_t *self, const tsk_node_table_t *other, tsk_flags_t options)#

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more bitwise flags in options. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

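Parameters:
  • self – A pointer to a tsk_node_table_t object.

  • other – A pointer to a tsk_node_table_t object.

  • options – Bitwise comparison options.
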
Returns:

Return true if the specified table is equal to this table.

int tsk_node_table_copy(const tsk_node_table_t *self, tsk_node_table_t *dest, tsk_flags_t options)#

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters:
  • self – A pointer to a tsk_node_table_t object.

  • dest – A pointer to a tsk_node_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised node table. If not, it must be an uninitialised node table.

  • options – Bitwise option flags.

Returns:

Return 0 on success or a negative value on failure.

int tsk_node_table_get_row(const tsk_node_table_t *self, tsk_id_t index, tsk_node_t *row)#

Get the row at the specified index.

Updates the specified node struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters:
  • self – A pointer to a tsk_node_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_node_t struct that is updated to reflect the values in the specified row.

Returns:

Return 0 on success or a negative value on failure.

int tsk_node_table_set_metadata_schema(tsk_node_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)#

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Parameters:
  • self – A pointer to a tsk_node_table_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns:

Return 0 on success or a negative value on failure.

void tsk_node_table_print_state(const tsk_node_table_t *self, FILE *out)#

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters:
  • self – A pointer to a tsk_node_table_t object.

  • out – The stream to write the summary to.

int tsk_node_table_set_columns(tsk_node_table_t *self, tsk_size_t num_rows, const tsk_flags_t *flags, const double *time, const tsk_id_t *population, const tsk_id_t *individual, const char *metadata, const tsk_size_t *metadata_offset)#

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters:
  • self – A pointer to a tsk_node_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • flags – The array of tsk_flags_t values to be copied.

  • time – The array of double time values to be copied.

  • population – The array of tsk_id_t population values to be copied.

  • individual – The array of tsk_id_t individual values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.
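
The metadata and metadata_offset arrays together encode one variable-length value per row. The sketch below assumes the ragged-column layout described in the tskit data model, in which the offset array holds num_rows + 1 entries, starts at zero, and ends at the total metadata length. pack_ragged and ragged_row are hypothetical helpers, and uint64_t stands in for tsk_size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of tskit): packs num_rows byte strings into
 * one flat buffer and records where each row starts. offset must have room
 * for num_rows + 1 entries and buffer for the sum of all lengths.
 * Returns the total number of bytes written. */
uint64_t
pack_ragged(const char *const *values, const uint64_t *lengths, size_t num_rows,
    char *buffer, uint64_t *offset)
{
    size_t j;
    uint64_t total = 0;

    assert(buffer != NULL && offset != NULL);
    offset[0] = 0;
    for (j = 0; j < num_rows; j++) {
        if (lengths[j] > 0) {
            memcpy(buffer + total, values[j], (size_t) lengths[j]);
        }
        total += lengths[j];
        offset[j + 1] = total; /* end of row j, which is the start of row j + 1 */
    }
    return total;
}

/* Hypothetical helper: returns a pointer to the bytes of one row and stores
 * the row's length. The pointer borrows from buffer and must not be freed. */
const char *
ragged_row(const char *buffer, const uint64_t *offset, size_t row, uint64_t *length)
{
    assert(buffer != NULL && offset != NULL && length != NULL);
    *length = offset[row + 1] - offset[row];
    return buffer + offset[row];
}
```

Arrays built this way have the shape expected for the metadata and metadata_offset arguments; rows without metadata simply repeat the previous offset.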

int tsk_node_table_append_columns(tsk_node_table_t *self, tsk_size_t num_rows, const tsk_flags_t *flags, const double *time, const tsk_id_t *population, const tsk_id_t *individual, const char *metadata, const tsk_size_t *metadata_offset)#

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters:
  • self – A pointer to a tsk_node_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • flags – The array of tsk_flags_t values to be copied.

  • time – The array of double time values to be copied.

  • population – The array of tsk_id_t population values to be copied.

  • individual – The array of tsk_id_t individual values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.

int tsk_node_table_set_max_rows_increment(tsk_node_table_t *self, tsk_size_t max_rows_increment)#

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_node_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.
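
The difference between a fixed increment and the doubling strategy can be sketched as a capacity calculation. This is a deliberately simplified model and not the library's allocator: next_capacity is a hypothetical function, and any minimum or maximum step sizes that tskit applies (see Memory allocation strategy) are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model (not part of tskit): returns the capacity to allocate
 * when `required` rows must fit. A max_rows_increment of zero selects
 * doubling; any other value grows the capacity by that fixed step. The
 * result is never smaller than `required`. */
size_t
next_capacity(size_t capacity, size_t required, size_t max_rows_increment)
{
    size_t grown;

    if (required <= capacity) {
        return capacity; /* already large enough, nothing to do */
    }
    if (max_rows_increment == 0) {
        assert(capacity <= SIZE_MAX / 2);
        grown = capacity == 0 ? 1 : capacity * 2;
    } else {
        assert(max_rows_increment <= SIZE_MAX - capacity);
        grown = capacity + max_rows_increment;
    }
    return grown > required ? grown : required;
}
```

Under this model a fixed increment gives predictable memory use at the cost of more frequent reallocations, while doubling keeps the number of reallocations logarithmic in the final table size.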

int tsk_node_table_set_max_metadata_length_increment(tsk_node_table_t *self, tsk_size_t max_metadata_length_increment)#

Controls the pre-allocation strategy for the metadata column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_node_table_t object.

  • max_metadata_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

Edges#

struct tsk_edge_t#

A single edge defined by a row in the edge table.

See the data model section for the definition of an edge and its properties.

Public Members

tsk_id_t id#

Non-negative ID value corresponding to table row.

tsk_id_t parent#

Parent node ID.

tsk_id_t child#

Child node ID.

double left#

Left coordinate.

double right#

Right coordinate.

const char *metadata#

Metadata.

tsk_size_t metadata_length#

Size of the metadata in bytes.

struct tsk_edge_table_t#

The edge table.

See the edge table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows#

The number of rows in this table.

tsk_size_t metadata_length#

The total length of the metadata column.

double *left#

The left column.

double *right#

The right column.

tsk_id_t *parent#

The parent column.

tsk_id_t *child#

The child column.

char *metadata#

The metadata column.

tsk_size_t *metadata_offset#

The metadata_offset column.

char *metadata_schema#

The metadata schema.

tsk_flags_t options#

Flags for this table.

int tsk_edge_table_init(tsk_edge_table_t *self, tsk_flags_t options)#

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Options

Options can be specified by providing one or more bitwise flags in options.

Parameters:
  • self – A pointer to an uninitialised tsk_edge_table_t object.

  • options – Allocation time options.

Returns:

Return 0 on success or a negative value on failure.

int tsk_edge_table_free(tsk_edge_table_t *self)#

Free the internal memory for the specified table.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

Returns:

Always returns 0.

tsk_id_t tsk_edge_table_add_row(tsk_edge_table_t *self, double left, double right, tsk_id_t parent, tsk_id_t child, const char *metadata, tsk_size_t metadata_length)#

Adds a row to this edge table.

Add a new edge with the specified left, right, parent, child and metadata to the table. See the table definition for details of the columns in this table.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

  • left – The left coordinate for the new edge.

  • right – The right coordinate for the new edge.

  • parent – The parent node for the new edge.

  • child – The child node for the new edge.

  • metadata – The metadata to be associated with the new edge. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns:

Return the ID of the newly added edge on success, or a negative value on failure.

int tsk_edge_table_update_row(tsk_edge_table_t *self, tsk_id_t index, double left, double right, tsk_id_t parent, tsk_id_t child, const char *metadata, tsk_size_t metadata_length)#

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. A copy of the metadata parameter is taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and would therefore be inefficient for bulk updates for such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to only update the memory for the row in question.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

  • index – The row to update.

  • left – The left coordinate for the edge.

  • right – The right coordinate for the edge.

  • parent – The parent node for the edge.

  • child – The child node for the edge.

  • metadata – The metadata to be associated with the edge. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns:

Return 0 on success or a negative value on failure.
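
The cost difference described in the warning above follows from how a ragged column stores its rows back to back. The sketch below is not how tskit implements row updates; update_ragged_row is a hypothetical helper operating on a flat byte buffer and an offset array with num_rows + 1 entries, and it only shows why a change in length touches every later row while a same-length update does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of tskit): replaces the bytes of one row in
 * a ragged column. buffer must be large enough for the new total length.
 * Returns true if only the row's own bytes were written, or false if the
 * following rows had to be shifted and their offsets rewritten. */
bool
update_ragged_row(char *buffer, uint64_t *offset, size_t num_rows, size_t row,
    const char *value, uint64_t length)
{
    uint64_t start, old_end, old_length, tail;
    size_t j;

    assert(buffer != NULL && offset != NULL && row < num_rows);
    start = offset[row];
    old_end = offset[row + 1];
    old_length = old_end - start;
    tail = offset[num_rows] - old_end; /* bytes belonging to later rows */

    if (length == old_length) {
        /* Same size: overwrite in place, no other row is affected. */
        if (length > 0) {
            memcpy(buffer + start, value, (size_t) length);
        }
        return true;
    }
    /* Size changed: move every later row, then fix every later offset. */
    memmove(buffer + start + length, buffer + old_end, (size_t) tail);
    if (length > 0) {
        memcpy(buffer + start, value, (size_t) length);
    }
    for (j = row + 1; j <= num_rows; j++) {
        offset[j] = offset[j] - old_length + length;
    }
    return false;
}
```

Repeating the slow path once per row is quadratic in the size of the column, which is why bulk changes are better expressed by rebuilding the columns in one pass.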

int tsk_edge_table_clear(tsk_edge_table_t *self)#

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_edge_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

Returns:

Return 0 on success or a negative value on failure.

int tsk_edge_table_truncate(tsk_edge_table_t *self, tsk_size_t num_rows)#

Truncates this table so that only the first num_rows are retained.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns:

Return 0 on success or a negative value on failure.

int tsk_edge_table_extend(tsk_edge_table_t *self, const tsk_edge_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)#

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table. Note that metadata is copied as-is and is not checked for compatibility with any existing schema on this table.

Parameters:
  • self – A pointer to a tsk_edge_table_t object where rows are to be added.

  • other – A pointer to a tsk_edge_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_edge_table_keep_rows(tsk_edge_table_t *self, const tsk_bool_t *keep, tsk_flags_t options, tsk_id_t *id_map)#

Subset this table by keeping rows according to a boolean mask.

Deletes rows from this table and optionally returns the mapping from IDs in the current table to the updated table. Rows are kept or deleted according to the specified boolean array keep such that for each row j if keep[j] is false (zero) the row is deleted, and otherwise the row is retained. Thus, keep must be an array of at least num_rows tsk_bool_t values.

If the id_map argument is non-null, this array will be updated to represent the mapping between IDs before and after row deletion. For row j, id_map[j] will contain the new ID for row j if it is retained, or TSK_NULL if the row has been removed. Thus, id_map must be an array of at least num_rows tsk_id_t values.

Warning

C++ users need to be careful to specify the correct type when passing in values for the keep array, using std::vector<tsk_bool_t> and not std::vector<bool>, as the latter may not be the correct size.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

  • keep – Array of boolean flags describing whether a particular row should be kept or not. Must be at least num_rows long.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

  • id_map – An array in which to store the mapping between new and old IDs. If NULL, this will be ignored.

Returns:

Return 0 on success or a negative value on failure.

bool tsk_edge_table_equals(const tsk_edge_table_t *self, const tsk_edge_table_t *other, tsk_flags_t options)#

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more bitwise flags in options. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

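Parameters:
  • self – A pointer to a tsk_edge_table_t object.

  • other – A pointer to a tsk_edge_table_t object.

  • options – Bitwise comparison options.
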
Returns:

Return true if the specified table is equal to this table.

int tsk_edge_table_copy(const tsk_edge_table_t *self, tsk_edge_table_t *dest, tsk_flags_t options)#

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

  • dest – A pointer to a tsk_edge_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised edge table. If not, it must be an uninitialised edge table.

  • options – Bitwise option flags.

Returns:

Return 0 on success or a negative value on failure.

int tsk_edge_table_get_row(const tsk_edge_table_t *self, tsk_id_t index, tsk_edge_t *row)#

Get the row at the specified index.

Updates the specified edge struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_edge_t struct that is updated to reflect the values in the specified row.

Returns:

Return 0 on success or a negative value on failure.

int tsk_edge_table_set_metadata_schema(tsk_edge_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)#

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns:

Return 0 on success or a negative value on failure.

void tsk_edge_table_print_state(const tsk_edge_table_t *self, FILE *out)#

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

  • out – The stream to write the summary to.

int tsk_edge_table_set_columns(tsk_edge_table_t *self, tsk_size_t num_rows, const double *left, const double *right, const tsk_id_t *parent, const tsk_id_t *child, const char *metadata, const tsk_size_t *metadata_offset)#

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • left – The array of double left values to be copied.

  • right – The array of double right values to be copied.

  • parent – The array of tsk_id_t parent values to be copied.

  • child – The array of tsk_id_t child values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.

int tsk_edge_table_append_columns(tsk_edge_table_t *self, tsk_size_t num_rows, const double *left, const double *right, const tsk_id_t *parent, const tsk_id_t *child, const char *metadata, const tsk_size_t *metadata_offset)#

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • left – The array of double left values to be copied.

  • right – The array of double right values to be copied.

  • parent – The array of tsk_id_t parent values to be copied.

  • child – The array of tsk_id_t child values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.

int tsk_edge_table_set_max_rows_increment(tsk_edge_table_t *self, tsk_size_t max_rows_increment)#

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

int tsk_edge_table_set_max_metadata_length_increment(tsk_edge_table_t *self, tsk_size_t max_metadata_length_increment)#

Controls the pre-allocation strategy for the metadata column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

  • max_metadata_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

int tsk_edge_table_squash(tsk_edge_table_t *self)#

Squash adjacent edges in-place.

Sorts, then condenses the table into the smallest possible number of rows by combining any adjacent edges. A pair of edges is said to be adjacent if they have the same parent and child nodes, and if the left coordinate of one of the edges is equal to the right coordinate of the other edge. This process is performed in-place so that any set of adjacent edges is replaced by a single edge. The new edge will have the same parent and child node, a left coordinate equal to the smallest left coordinate in the set, and a right coordinate equal to the largest right coordinate in the set. The new edge table will be sorted in the canonical order (P, C, L, R).

Note

This method will fail if any edges have non-empty metadata.

Parameters:
  • self – A pointer to a tsk_edge_table_t object.

Returns:

Return 0 on success or a negative value on failure.
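
The sort-then-merge procedure described above can be sketched on a plain array of edges. This is an illustration only and not the library's implementation: edge_t and squash_edges are hypothetical names, metadata is ignored entirely, and int32_t stands in for tsk_id_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an edge row; metadata is deliberately omitted. */
typedef struct {
    double left;
    double right;
    int32_t parent;
    int32_t child;
} edge_t;

/* Orders edges by (parent, child, left, right). */
static int
cmp_edge(const void *a, const void *b)
{
    const edge_t *x = a;
    const edge_t *y = b;

    if (x->parent != y->parent) {
        return x->parent < y->parent ? -1 : 1;
    }
    if (x->child != y->child) {
        return x->child < y->child ? -1 : 1;
    }
    if (x->left != y->left) {
        return x->left < y->left ? -1 : 1;
    }
    if (x->right != y->right) {
        return x->right < y->right ? -1 : 1;
    }
    return 0;
}

/* Sorts the edges, then merges each run of adjacent edges (same parent and
 * child, with one edge starting exactly where the previous one ends) into a
 * single edge, in place. Returns the number of edges remaining. */
size_t
squash_edges(edge_t *edges, size_t num_edges)
{
    size_t j;
    size_t k = 0; /* index of the edge currently being extended */

    assert(edges != NULL || num_edges == 0);
    if (num_edges == 0) {
        return 0;
    }
    qsort(edges, num_edges, sizeof(*edges), cmp_edge);
    for (j = 1; j < num_edges; j++) {
        if (edges[j].parent == edges[k].parent && edges[j].child == edges[k].child
            && edges[j].left == edges[k].right) {
            edges[k].right = edges[j].right;
        } else {
            k++;
            edges[k] = edges[j];
        }
    }
    return k + 1;
}
```

Sorting first means that adjacent edges always end up next to each other in the array, so a single linear pass is enough to merge them.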

Migrations#

struct tsk_migration_t#

A single migration defined by a row in the migration table.

See the data model section for the definition of a migration and its properties.

Public Members

tsk_id_t id#

Non-negative ID value corresponding to table row.

tsk_id_t source#

Source population ID.

tsk_id_t dest#

Destination population ID.

tsk_id_t node#

Node ID.

double left#

Left coordinate.

double right#

Right coordinate.

double time#

Time.

const char *metadata#

Metadata.

tsk_size_t metadata_length#

Size of the metadata in bytes.

struct tsk_migration_table_t#

The migration table.

See the migration table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows#

The number of rows in this table.

tsk_size_t metadata_length#

The total length of the metadata column.

tsk_id_t *source#

The source column.

tsk_id_t *dest#

The dest column.

tsk_id_t *node#

The node column.

double *left#

The left column.

double *right#

The right column.

double *time#

The time column.

char *metadata#

The metadata column.

tsk_size_t *metadata_offset#

The metadata_offset column.

char *metadata_schema#

The metadata schema.

int tsk_migration_table_init(tsk_migration_table_t *self, tsk_flags_t options)#

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Parameters:
  • self – A pointer to an uninitialised tsk_migration_table_t object.

  • options – Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_migration_table_free(tsk_migration_table_t *self)#

Free the internal memory for the specified table.

Parameters:
  • self – A pointer to a tsk_migration_table_t object.

Returns:

Always returns 0.

tsk_id_t tsk_migration_table_add_row(tsk_migration_table_t *self, double left, double right, tsk_id_t node, tsk_id_t source, tsk_id_t dest, double time, const char *metadata, tsk_size_t metadata_length)#

Adds a row to this migration table.

Add a new migration with the specified left, right, node, source, dest, time and metadata to the table. See the table definition for details of the columns in this table.

Parameters:
  • self – A pointer to a tsk_migration_table_t object.

  • left – The left coordinate for the new migration.

  • right – The right coordinate for the new migration.

  • node – The node ID for the new migration.

  • source – The source population ID for the new migration.

  • dest – The destination population ID for the new migration.

  • time – The time for the new migration.

  • metadata – The metadata to be associated with the new migration. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns:

Return the ID of the newly added migration on success, or a negative value on failure.

int tsk_migration_table_update_row(tsk_migration_table_t *self, tsk_id_t index, double left, double right, tsk_id_t node, tsk_id_t source, tsk_id_t dest, double time, const char *metadata, tsk_size_t metadata_length)#

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. A copy of the metadata parameter is taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and would therefore be inefficient for bulk updates for such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to only update the memory for the row in question.

Parameters:
  • self – A pointer to a tsk_migration_table_t object.

  • index – The row to update.

  • left – The left coordinate for the migration.

  • right – The right coordinate for the migration.

  • node – The node ID for the migration.

  • source – The source population ID for the migration.

  • dest – The destination population ID for the migration.

  • time – The time for the migration.

  • metadata – The metadata to be associated with the migration. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns:

Return 0 on success or a negative value on failure.

int tsk_migration_table_clear(tsk_migration_table_t *self)#

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_migration_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Parameters:
  • self – A pointer to a tsk_migration_table_t object.

Returns:

Return 0 on success or a negative value on failure.

int tsk_migration_table_truncate(tsk_migration_table_t *self, tsk_size_t num_rows)#

Truncates this table so that only the first num_rows are retained.

Parameters:
  • self – A pointer to a tsk_migration_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns:

Return 0 on success or a negative value on failure.

int tsk_migration_table_extend(tsk_migration_table_t *self, const tsk_migration_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)#

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table. Note that metadata is copied as-is and is not checked for compatibility with any existing schema on this table.

Parameters:
  • self – A pointer to a tsk_migration_table_t object where rows are to be added.

  • other – A pointer to a tsk_migration_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_migration_table_keep_rows(tsk_migration_table_t *self, const tsk_bool_t *keep, tsk_flags_t options, tsk_id_t *id_map)#

Subset this table by keeping rows according to a boolean mask.

Deletes rows from this table and optionally returns the mapping from IDs in the current table to the updated table. Rows are kept or deleted according to the specified boolean array keep such that for each row j if keep[j] is false (zero) the row is deleted, and otherwise the row is retained. Thus, keep must be an array of at least num_rows tsk_bool_t values.

If the id_map argument is non-null, this array will be updated to represent the mapping between IDs before and after row deletion. For row j, id_map[j] will contain the new ID for row j if it is retained, or TSK_NULL if the row has been removed. Thus, id_map must be an array of at least num_rows tsk_id_t values.

Warning

C++ users need to be careful to specify the correct type when passing in values for the keep array, using std::vector<tsk_bool_t> and not std::vector<bool>, as the latter may not be the correct size.

Parameters:
  • self – A pointer to a tsk_migration_table_t object.

  • keep – Array of boolean flags describing whether a particular row should be kept or not. Must be at least num_rows long.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

  • id_map – An array in which to store the mapping between new and old IDs. If NULL, this will be ignored.

Returns:

Return 0 on success or a negative value on failure.

bool tsk_migration_table_equals(const tsk_migration_table_t *self, const tsk_migration_table_t *other, tsk_flags_t options)#

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more bitwise flags in options. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

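Parameters:
  • self – A pointer to a tsk_migration_table_t object.

  • other – A pointer to a tsk_migration_table_t object.

  • options – Bitwise comparison options.
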
Returns:

Return true if the specified table is equal to this table.

int tsk_migration_table_copy(const tsk_migration_table_t *self, tsk_migration_table_t *dest, tsk_flags_t options)#

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters:
  • self – A pointer to a tsk_migration_table_t object.

  • dest – A pointer to a tsk_migration_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised migration table. If not, it must be an uninitialised migration table.

  • options – Bitwise option flags.

Returns:

Return 0 on success or a negative value on failure.

int tsk_migration_table_get_row(const tsk_migration_table_t *self, tsk_id_t index, tsk_migration_t *row)#

Get the row at the specified index.

Updates the specified migration struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters:
  • self – A pointer to a tsk_migration_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_migration_t struct that is updated to reflect the values in the specified row.

Returns:

Return 0 on success or a negative value on failure.

int tsk_migration_table_set_metadata_schema(tsk_migration_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)#

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Parameters:
  • self – A pointer to a tsk_migration_table_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns:

Return 0 on success or a negative value on failure.

void tsk_migration_table_print_state(const tsk_migration_table_t *self, FILE *out)#

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters:
  • self – A pointer to a tsk_migration_table_t object.

  • out – The stream to write the summary to.

int tsk_migration_table_set_columns(tsk_migration_table_t *self, tsk_size_t num_rows, const double *left, const double *right, const tsk_id_t *node, const tsk_id_t *source, const tsk_id_t *dest, const double *time, const char *metadata, const tsk_size_t *metadata_offset)#

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters:
  • self – A pointer to a tsk_migration_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • left – The array of double left values to be copied.

  • right – The array of double right values to be copied.

  • node – The array of tsk_id_t node values to be copied.

  • source – The array of tsk_id_t source values to be copied.

  • dest – The array of tsk_id_t dest values to be copied.

  • time – The array of double time values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.

int tsk_migration_table_append_columns(tsk_migration_table_t *self, tsk_size_t num_rows, const double *left, const double *right, const tsk_id_t *node, const tsk_id_t *source, const tsk_id_t *dest, const double *time, const char *metadata, const tsk_size_t *metadata_offset)#

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters:
  • self – A pointer to a tsk_migration_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • left – The array of double left values to be copied.

  • right – The array of double right values to be copied.

  • node – The array of tsk_id_t node values to be copied.

  • source – The array of tsk_id_t source values to be copied.

  • dest – The array of tsk_id_t dest values to be copied.

  • time – The array of double time values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.

int tsk_migration_table_set_max_rows_increment(tsk_migration_table_t *self, tsk_size_t max_rows_increment)#

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_migration_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.
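
As a rough illustration of the two growth modes named above, the following sketch computes a new capacity either by doubling or by a fixed increment. It is a simplified model only: next_capacity is a hypothetical helper that is not part of tskit, and the library's actual default strategy (described under Memory allocation strategy) may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of tskit. Returns the capacity (in rows)
 * for a column that currently has room for `capacity` rows and must hold
 * at least `required` rows. An `increment` of zero selects doubling; any
 * other value grows by that fixed number of rows at a time. */
size_t
next_capacity(size_t capacity, size_t required, size_t increment)
{
    size_t new_capacity = capacity;

    while (new_capacity < required) {
        if (increment == 0) {
            /* Doubling: start from 1 so that an empty column can grow. */
            assert(new_capacity <= SIZE_MAX / 2);
            new_capacity = new_capacity == 0 ? 1 : 2 * new_capacity;
        } else {
            assert(new_capacity <= SIZE_MAX - increment);
            new_capacity += increment;
        }
    }
    return new_capacity;
}
```

In this model a fixed increment gives predictable memory use at the cost of more frequent reallocations, while doubling keeps the number of reallocations logarithmic in the final table size.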

int tsk_migration_table_set_max_metadata_length_increment(tsk_migration_table_t *self, tsk_size_t max_metadata_length_increment)#

Controls the pre-allocation strategy for the metadata column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_migration_table_t object.

  • max_metadata_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

Sites#

struct tsk_site_t#

A single site defined by a row in the site table.

See the data model section for the definition of a site and its properties.

Public Members

tsk_id_t id#

Non-negative ID value corresponding to table row.

double position#

Position coordinate.

const char *ancestral_state#

Ancestral state.

tsk_size_t ancestral_state_length#

Ancestral state length in bytes.

const char *metadata#

Metadata.

tsk_size_t metadata_length#

Metadata length in bytes.

const tsk_mutation_t *mutations#

An array of this site’s mutations.

tsk_size_t mutations_length#

The number of mutations at this site.

struct tsk_site_table_t#

The site table.

See the site table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows#

The number of rows in this table.

tsk_size_t metadata_length#

The total length of the metadata column.

double *position#

The position column.

char *ancestral_state#

The ancestral_state column.

tsk_size_t *ancestral_state_offset#

The ancestral_state_offset column.

char *metadata#

The metadata column.

tsk_size_t *metadata_offset#

The metadata_offset column.

char *metadata_schema#

The metadata schema.

int tsk_site_table_init(tsk_site_table_t *self, tsk_flags_t options)#

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Parameters:
  • self – A pointer to an uninitialised tsk_site_table_t object.

  • options – Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_site_table_free(tsk_site_table_t *self)#

Free the internal memory for the specified table.

Parameters:
  • self – A pointer to an initialised tsk_site_table_t object.

Returns:

Always returns 0.

tsk_id_t tsk_site_table_add_row(tsk_site_table_t *self, double position, const char *ancestral_state, tsk_size_t ancestral_state_length, const char *metadata, tsk_size_t metadata_length)#

Adds a row to this site table.

Add a new site with the specified position, ancestral_state and metadata to the table. Copies of ancestral_state and metadata are immediately taken. See the table definition for details of the columns in this table.

Parameters:
  • self – A pointer to a tsk_site_table_t object.

  • position – The position coordinate for the new site.

  • ancestral_state – The ancestral_state for the new site.

  • ancestral_state_length – The length of the ancestral_state in bytes.

  • metadata – The metadata to be associated with the new site. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns:

Return the ID of the newly added site on success, or a negative value on failure.

int tsk_site_table_update_row(tsk_site_table_t *self, tsk_id_t index, double position, const char *ancestral_state, tsk_size_t ancestral_state_length, const char *metadata, tsk_size_t metadata_length)#

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. Copies of the ancestral_state and metadata parameters are taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and would therefore be inefficient for bulk updates for such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to only update the memory for the row in question.

Parameters:
  • self – A pointer to a tsk_site_table_t object.

  • index – The row to update.

  • position – The position coordinate for the site.

  • ancestral_state – The ancestral_state for the site.

  • ancestral_state_length – The length of the ancestral_state in bytes.

  • metadata – The metadata to be associated with the site. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns:

Return 0 on success or a negative value on failure.
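
The cost described in the warning above follows from the packed layout of ragged columns. The sketch below models one such column with stand-in types and shows why a same-size update touches only one row while a size-changing update moves every later row. ragged_column_t and ragged_column_update are hypothetical names used for illustration; this is not tskit's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one ragged column: `data` holds all values back
 * to back, and offset[j]..offset[j + 1] delimits the bytes of row j. */
typedef struct {
    char *data;       /* packed bytes, with room for `capacity` bytes */
    uint64_t *offset; /* num_rows + 1 entries, offset[0] == 0 */
    size_t num_rows;
    size_t capacity;
} ragged_column_t;

/* Replace the value of row j. Returns the number of bytes belonging to
 * other rows that had to be moved, or -1 if the buffer is too small. */
long
ragged_column_update(ragged_column_t *col, size_t j, const char *value, uint64_t length)
{
    uint64_t start, end, total, old_length, tail;
    size_t k;

    assert(j < col->num_rows);
    start = col->offset[j];
    end = col->offset[j + 1];
    total = col->offset[col->num_rows];
    old_length = end - start;
    tail = total - end;

    if (length != old_length) {
        if (total - old_length + length > col->capacity) {
            return -1;
        }
        /* Different size: every later row moves and its offset changes. */
        memmove(col->data + start + length, col->data + end, (size_t) tail);
        for (k = j + 1; k <= col->num_rows; k++) {
            col->offset[k] = col->offset[k] - old_length + length;
        }
    } else {
        /* Same size: no other row is affected. */
        tail = 0;
    }
    if (length > 0) {
        memcpy(col->data + start, value, (size_t) length);
    }
    return (long) tail;
}
```

For bulk changes to ragged values, building fresh column arrays and passing them to tsk_site_table_set_columns() in one call avoids repeating this shift for every row.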

int tsk_site_table_clear(tsk_site_table_t *self)#

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_site_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Parameters:
  • self – A pointer to a tsk_site_table_t object.

Returns:

Return 0 on success or a negative value on failure.

int tsk_site_table_truncate(tsk_site_table_t *self, tsk_size_t num_rows)#

Truncates this table so that only the first num_rows are retained.

Parameters:
  • self – A pointer to a tsk_site_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns:

Return 0 on success or a negative value on failure.

int tsk_site_table_extend(tsk_site_table_t *self, const tsk_site_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)#

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table. Note that metadata is copied as-is and is not checked for compatibility with any existing schema on this table.

Parameters:
  • self – A pointer to a tsk_site_table_t object where rows are to be added.

  • other – A pointer to a tsk_site_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.
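
The row selection rule can be sketched on a single fixed-width column as follows. extend_column is a hypothetical helper, not a tskit function, and int32_t is assumed here as a stand-in for tsk_id_t; the real function applies the same selection to every column of the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the row selection rule on one column.
 * Appends num_rows values from src to dest, which already holds dest_len
 * values and must have room for num_rows more. Returns the new length. */
size_t
extend_column(double *dest, size_t dest_len, const double *src, size_t num_rows,
    const int32_t *row_indexes)
{
    size_t j;

    for (j = 0; j < num_rows; j++) {
        /* A NULL index array means "the first num_rows rows, in order". */
        size_t row = j;
        if (row_indexes != NULL) {
            assert(row_indexes[j] >= 0);
            row = (size_t) row_indexes[j];
        }
        dest[dest_len + j] = src[row];
    }
    return dest_len + num_rows;
}
```

With indexes {2, 0, 2}, for example, row 2 of the source is appended twice and row 1 is not copied at all.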

int tsk_site_table_keep_rows(tsk_site_table_t *self, const tsk_bool_t *keep, tsk_flags_t options, tsk_id_t *id_map)#

Subset this table by keeping rows according to a boolean mask.

Deletes rows from this table and optionally returns the mapping from IDs in the current table to the updated table. Rows are kept or deleted according to the specified boolean array keep such that for each row j if keep[j] is false (zero) the row is deleted, and otherwise the row is retained. Thus, keep must be an array of at least num_rows bool values.

If the id_map argument is non-null, this array will be updated to represent the mapping between IDs before and after row deletion. For row j, id_map[j] will contain the new ID for row j if it is retained, or TSK_NULL if the row has been removed. Thus, id_map must be an array of at least num_rows tsk_id_t values.

Warning

C++ users need to be careful to specify the correct type when passing in values for the keep array, using std::vector<tsk_bool_t> and not std::vector<bool>, as the latter may not be the correct size.

Parameters:
  • self – A pointer to a tsk_site_table_t object.

  • keep – Array of boolean flags describing whether a particular row should be kept or not. Must be at least num_rows long.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

  • id_map – An array in which to store the mapping between new and old IDs. If NULL, this will be ignored.

Returns:

Return 0 on success or a negative value on failure.
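
The relationship between keep and id_map described above can be written out directly. In this sketch build_id_map is a hypothetical helper, NULL_ID stands in for TSK_NULL (assumed to be -1), and int32_t stands in for tsk_id_t; it computes only the map and does not delete any rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NULL_ID (-1) /* stand-in for TSK_NULL; an assumption of this sketch */

/* Hypothetical helper: fills id_map as described for keep_rows and returns
 * the number of rows that would be retained. */
size_t
build_id_map(const bool *keep, size_t num_rows, int32_t *id_map)
{
    size_t j, num_kept = 0;

    assert(keep != NULL && id_map != NULL);
    for (j = 0; j < num_rows; j++) {
        if (keep[j]) {
            /* Retained rows are renumbered consecutively from zero. */
            id_map[j] = (int32_t) num_kept;
            num_kept++;
        } else {
            id_map[j] = NULL_ID;
        }
    }
    return num_kept;
}
```

A caller would typically use the returned map to rewrite columns in other tables that refer to these rows, for example the site column of the mutation table after sites have been removed.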

bool tsk_site_table_equals(const tsk_site_table_t *self, const tsk_site_table_t *other, tsk_flags_t options)#

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

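Parameters:
  • self – A pointer to a tsk_site_table_t object.

  • other – A pointer to a tsk_site_table_t object.

  • options – Bitwise comparison options.

Returns: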

Return true if the specified table is equal to this table.

int tsk_site_table_copy(const tsk_site_table_t *self, tsk_site_table_t *dest, tsk_flags_t options)#

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters:
  • self – A pointer to a tsk_site_table_t object.

  • dest – A pointer to a tsk_site_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised site table. If not, it must be an uninitialised site table.

  • options – Bitwise option flags.

Returns:

Return 0 on success or a negative value on failure.

int tsk_site_table_get_row(const tsk_site_table_t *self, tsk_id_t index, tsk_site_t *row)#

Get the row at the specified index.

Updates the specified site struct to reflect the values in the specified row.

This function always sets the mutations and mutations_length fields in the parameter tsk_site_t to NULL and 0 respectively. To get access to the mutations for a particular site, please use the tree sequence method, tsk_treeseq_get_site().

Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters:
  • self – A pointer to a tsk_site_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_site_t struct that is updated to reflect the values in the specified row.

Returns:

Return 0 on success or a negative value on failure.
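
Because the pointers in the row struct refer to memory owned by the table, and the values are delimited by a length rather than a terminating NUL, client code that needs a value beyond the next table modification has to take its own copy. A minimal sketch of such a copy follows; copy_value is a hypothetical helper and not part of tskit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated, NUL-terminated copy of a
 * length-delimited value, or NULL if allocation fails. The caller owns the
 * result and must free() it. */
char *
copy_value(const char *value, size_t length)
{
    char *copy;

    assert(value != NULL || length == 0);
    copy = malloc(length + 1);
    if (copy == NULL) {
        return NULL;
    }
    if (length > 0) {
        memcpy(copy, value, length);
    }
    copy[length] = '\0';
    return copy;
}
```

With a tsk_site_t named row that has been filled in by tsk_site_table_get_row(), the arguments would be row.ancestral_state and row.ancestral_state_length. The copy remains valid after rows are added to the table, whereas row.ancestral_state does not.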

int tsk_site_table_set_metadata_schema(tsk_site_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)#

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Parameters:
  • self – A pointer to a tsk_site_table_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns:

Return 0 on success or a negative value on failure.

void tsk_site_table_print_state(const tsk_site_table_t *self, FILE *out)#

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters:
  • self – A pointer to a tsk_site_table_t object.

  • out – The stream to write the summary to.

int tsk_site_table_set_columns(tsk_site_table_t *self, tsk_size_t num_rows, const double *position, const char *ancestral_state, const tsk_size_t *ancestral_state_offset, const char *metadata, const tsk_size_t *metadata_offset)#

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters:
  • self – A pointer to a tsk_site_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • position – The array of double position values to be copied.

  • ancestral_state – The array of char ancestral state values to be copied.

  • ancestral_state_offset – The array of tsk_size_t ancestral state offset values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.
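
The ragged column arguments come in pairs: a packed byte array and an offset array. The sketch below builds such a pair from C strings, assuming the encoding given in the tskit data model, in which an offset column has num_rows + 1 entries beginning at zero. pack_ragged_column is a hypothetical helper, and uint64_t is assumed as a stand-in for tsk_size_t; check the typedef in your tskit version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: packs num_rows C strings into data and writes the
 * num_rows + 1 offsets. Returns the total number of bytes packed. data must
 * have room for all the bytes and offset for num_rows + 1 entries. */
uint64_t
pack_ragged_column(const char **values, size_t num_rows, char *data, uint64_t *offset)
{
    size_t j;
    uint64_t total = 0;

    assert(values != NULL && data != NULL && offset != NULL);
    offset[0] = 0;
    for (j = 0; j < num_rows; j++) {
        size_t length = strlen(values[j]);
        /* Values are stored back to back with no NUL terminator. */
        memcpy(data + total, values[j], length);
        total += length;
        offset[j + 1] = total;
    }
    return total;
}
```

For the values "A", "" and "GT" this produces the bytes AGT and the offsets 0, 1, 1, 3. Arrays of this shape are what the ancestral_state and ancestral_state_offset (and metadata and metadata_offset) arguments are expected to contain.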

int tsk_site_table_append_columns(tsk_site_table_t *self, tsk_size_t num_rows, const double *position, const char *ancestral_state, const tsk_size_t *ancestral_state_offset, const char *metadata, const tsk_size_t *metadata_offset)#

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters:
  • self – A pointer to a tsk_site_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • position – The array of double position values to be copied.

  • ancestral_state – The array of char ancestral state values to be copied.

  • ancestral_state_offset – The array of tsk_size_t ancestral state offset values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.

int tsk_site_table_set_max_rows_increment(tsk_site_table_t *self, tsk_size_t max_rows_increment)#

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_site_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

int tsk_site_table_set_max_metadata_length_increment(tsk_site_table_t *self, tsk_size_t max_metadata_length_increment)#

Controls the pre-allocation strategy for the metadata column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_site_table_t object.

  • max_metadata_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

int tsk_site_table_set_max_ancestral_state_length_increment(tsk_site_table_t *self, tsk_size_t max_ancestral_state_length_increment)#

Controls the pre-allocation strategy for the ancestral_state column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_site_table_t object.

  • max_ancestral_state_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

Mutations#

struct tsk_mutation_t#

A single mutation defined by a row in the mutation table.

See the data model section for the definition of a mutation and its properties.

Public Members

tsk_id_t id#

Non-negative ID value corresponding to table row.

tsk_id_t site#

Site ID.

tsk_id_t node#

Node ID.

tsk_id_t parent#

Parent mutation ID.

double time#

Mutation time.

const char *derived_state#

Derived state.

tsk_size_t derived_state_length#

Size of the derived state in bytes.

const char *metadata#

Metadata.

tsk_size_t metadata_length#

Size of the metadata in bytes.

tsk_id_t edge#

The ID of the edge that this mutation lies on, or TSK_NULL if there is no corresponding edge.

struct tsk_mutation_table_t#

The mutation table.

See the mutation table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows#

The number of rows in this table.

tsk_size_t metadata_length#

The total length of the metadata column.

tsk_id_t *node#

The node column.

tsk_id_t *site#

The site column.

tsk_id_t *parent#

The parent column.

double *time#

The time column.

char *derived_state#

The derived_state column.

tsk_size_t *derived_state_offset#

The derived_state_offset column.

char *metadata#

The metadata column.

tsk_size_t *metadata_offset#

The metadata_offset column.

char *metadata_schema#

The metadata schema.

int tsk_mutation_table_init(tsk_mutation_table_t *self, tsk_flags_t options)#

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Parameters:
  • self – A pointer to an uninitialised tsk_mutation_table_t object.

  • options – Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_mutation_table_free(tsk_mutation_table_t *self)#

Free the internal memory for the specified table.

Parameters:
  • self – A pointer to an initialised tsk_mutation_table_t object.

Returns:

Always returns 0.

tsk_id_t tsk_mutation_table_add_row(tsk_mutation_table_t *self, tsk_id_t site, tsk_id_t node, tsk_id_t parent, double time, const char *derived_state, tsk_size_t derived_state_length, const char *metadata, tsk_size_t metadata_length)#

Adds a row to this mutation table.

Add a new mutation with the specified site, node, parent, time, derived_state and metadata to the table. Copies of derived_state and metadata are immediately taken. See the table definition for details of the columns in this table.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

  • site – The site ID for the new mutation.

  • node – The ID of the node this mutation occurs over.

  • parent – The ID of the parent mutation.

  • time – The time of the mutation.

  • derived_state – The derived_state for the new mutation.

  • derived_state_length – The length of the derived_state in bytes.

  • metadata – The metadata to be associated with the new mutation. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns:

Return the ID of the newly added mutation on success, or a negative value on failure.

int tsk_mutation_table_update_row(tsk_mutation_table_t *self, tsk_id_t index, tsk_id_t site, tsk_id_t node, tsk_id_t parent, double time, const char *derived_state, tsk_size_t derived_state_length, const char *metadata, tsk_size_t metadata_length)#

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. Copies of the derived_state and metadata parameters are taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and would therefore be inefficient for bulk updates for such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to only update the memory for the row in question.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

  • index – The row to update.

  • site – The site ID for the mutation.

  • node – The ID of the node this mutation occurs over.

  • parent – The ID of the parent mutation.

  • time – The time of the mutation.

  • derived_state – The derived_state for the mutation.

  • derived_state_length – The length of the derived_state in bytes.

  • metadata – The metadata to be associated with the mutation. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns:

Return 0 on success or a negative value on failure.

int tsk_mutation_table_clear(tsk_mutation_table_t *self)#

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_mutation_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

Returns:

Return 0 on success or a negative value on failure.

int tsk_mutation_table_truncate(tsk_mutation_table_t *self, tsk_size_t num_rows)#

Truncates this table so that only the first num_rows are retained.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns:

Return 0 on success or a negative value on failure.

int tsk_mutation_table_extend(tsk_mutation_table_t *self, const tsk_mutation_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)#

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table. Note that metadata is copied as-is and is not checked for compatibility with any existing schema on this table.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object where rows are to be added.

  • other – A pointer to a tsk_mutation_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_mutation_table_keep_rows(tsk_mutation_table_t *self, const tsk_bool_t *keep, tsk_flags_t options, tsk_id_t *id_map)#

Subset this table by keeping rows according to a boolean mask.

Deletes rows from this table and optionally returns the mapping from IDs in the current table to the updated table. Rows are kept or deleted according to the specified boolean array keep such that for each row j if keep[j] is false (zero) the row is deleted, and otherwise the row is retained. Thus, keep must be an array of at least num_rows bool values.

If the id_map argument is non-null, this array will be updated to represent the mapping between IDs before and after row deletion. For row j, id_map[j] will contain the new ID for row j if it is retained, or TSK_NULL if the row has been removed. Thus, id_map must be an array of at least num_rows tsk_id_t values.

The values in the parent column are updated according to this map, so that reference integrity within the table is maintained. As a consequence of this, the values in the parent column for kept rows are bounds-checked and an error raised if they are not valid. Rows that are deleted are not checked for parent ID integrity.

If an attempt is made to delete rows that are referred to by the parent column of rows that are retained, an error is raised.

These error conditions are checked before any alterations to the table are made.

Warning

C++ users need to be careful to specify the correct type when passing in values for the keep array, using std::vector<tsk_bool_t> and not std::vector<bool>, as the latter may not be the correct size.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

  • keep – Array of boolean flags describing whether a particular row should be kept or not. Must be at least num_rows long.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

  • id_map – An array in which to store the mapping between new and old IDs. If NULL, this will be ignored.

Returns:

Return 0 on success or a negative value on failure.
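
The parent column rules above amount to a check pass followed by a remapping pass. The sketch below models that order of operations on a bare parent array. remap_parents is a hypothetical helper, NULL_ID stands in for TSK_NULL (assumed to be -1), and int32_t stands in for tsk_id_t. Compaction of the rows themselves is omitted, and the code is an illustration of the documented behaviour rather than tskit's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NULL_ID (-1) /* stand-in for TSK_NULL; an assumption of this sketch */

/* Hypothetical helper: validates and remaps the parent values for a keep
 * mask. Returns 0 on success, or -1 if a kept row has an out-of-bounds
 * parent or a parent that would be deleted. Nothing is modified on failure.
 * id_map must have room for num_rows entries. */
int
remap_parents(const bool *keep, int32_t *parent, size_t num_rows, int32_t *id_map)
{
    size_t j;
    int32_t next_id = 0;

    assert(keep != NULL && parent != NULL && id_map != NULL);
    /* Pass 1: check every kept row before altering anything. Deleted rows
     * are not checked. */
    for (j = 0; j < num_rows; j++) {
        int32_t p = parent[j];
        if (!keep[j] || p == NULL_ID) {
            continue;
        }
        if (p < 0 || (size_t) p >= num_rows || !keep[p]) {
            return -1;
        }
    }
    /* Pass 2: build the old-to-new ID map. */
    for (j = 0; j < num_rows; j++) {
        id_map[j] = keep[j] ? next_id++ : NULL_ID;
    }
    /* Pass 3: rewrite the parent of each kept row through the map. */
    for (j = 0; j < num_rows; j++) {
        if (keep[j] && parent[j] != NULL_ID) {
            parent[j] = id_map[parent[j]];
        }
    }
    return 0;
}
```

Running the checks first is what allows a failed call to leave the table in its original state, matching the guarantee that error conditions are checked before any alterations are made.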

bool tsk_mutation_table_equals(const tsk_mutation_table_t *self, const tsk_mutation_table_t *other, tsk_flags_t options)#

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

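Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

  • other – A pointer to a tsk_mutation_table_t object.

  • options – Bitwise comparison options.

Returns: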

Return true if the specified table is equal to this table.

int tsk_mutation_table_copy(const tsk_mutation_table_t *self, tsk_mutation_table_t *dest, tsk_flags_t options)#

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

  • dest – A pointer to a tsk_mutation_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised mutation table. If not, it must be an uninitialised mutation table.

  • options – Bitwise option flags.

Returns:

Return 0 on success or a negative value on failure.

int tsk_mutation_table_get_row(const tsk_mutation_table_t *self, tsk_id_t index, tsk_mutation_t *row)#

Get the row at the specified index.

Updates the specified mutation struct to reflect the values in the specified row.

This function always sets the edge field in parameter tsk_mutation_t to TSK_NULL. To determine the ID of the edge associated with a particular mutation, please use the tree sequence method, tsk_treeseq_get_mutation().

Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_mutation_t struct that is updated to reflect the values in the specified row.

Returns:

Return 0 on success or a negative value on failure.

int tsk_mutation_table_set_metadata_schema(tsk_mutation_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)#

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns:

Return 0 on success or a negative value on failure.

void tsk_mutation_table_print_state(const tsk_mutation_table_t *self, FILE *out)#

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

  • out – The stream to write the summary to.

int tsk_mutation_table_set_columns(tsk_mutation_table_t *self, tsk_size_t num_rows, const tsk_id_t *site, const tsk_id_t *node, const tsk_id_t *parent, const double *time, const char *derived_state, const tsk_size_t *derived_state_offset, const char *metadata, const tsk_size_t *metadata_offset)#

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • site – The array of tsk_id_t site values to be copied.

  • node – The array of tsk_id_t node values to be copied.

  • parent – The array of tsk_id_t parent values to be copied.

  • time – The array of double time values to be copied.

  • derived_state – The array of char derived_state values to be copied.

  • derived_state_offset – The array of tsk_size_t derived state offset values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.

int tsk_mutation_table_append_columns(tsk_mutation_table_t *self, tsk_size_t num_rows, const tsk_id_t *site, const tsk_id_t *node, const tsk_id_t *parent, const double *time, const char *derived_state, const tsk_size_t *derived_state_offset, const char *metadata, const tsk_size_t *metadata_offset)#

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • site – The array of tsk_id_t site values to be copied.

  • node – The array of tsk_id_t node values to be copied.

  • parent – The array of tsk_id_t parent values to be copied.

  • time – The array of double time values to be copied.

  • derived_state – The array of char derived_state values to be copied.

  • derived_state_offset – The array of tsk_size_t derived state offset values to be copied.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.

int tsk_mutation_table_set_max_rows_increment(tsk_mutation_table_t *self, tsk_size_t max_rows_increment)#

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.
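To make the difference between a fixed increment and doubling concrete, here is a deliberately simplified model of a growth policy. It is not tskit's implementation: model_next_capacity is a hypothetical function, and the library's real default strategy has details (see Memory allocation strategy) that this sketch leaves out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of a growth policy. Returns the capacity (in rows)
 * needed to hold at least `required` rows.
 *   increment == 0 : double the capacity until it is large enough
 *   increment  > 0 : grow in fixed steps of `increment` rows
 * Overflow is not handled in this sketch.
 */
static uint64_t
model_next_capacity(uint64_t capacity, uint64_t required, uint64_t increment)
{
    assert(capacity > 0);
    while (capacity < required) {
        if (increment == 0) {
            capacity *= 2;
        } else {
            capacity += increment;
        }
    }
    return capacity;
}
```

A fixed increment keeps memory overhead bounded at the cost of more frequent reallocations, whereas doubling trades some unused memory for fewer reallocations as the table grows.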

int tsk_mutation_table_set_max_metadata_length_increment(tsk_mutation_table_t *self, tsk_size_t max_metadata_length_increment)#

Controls the pre-allocation strategy for the metadata column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

  • max_metadata_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

int tsk_mutation_table_set_max_derived_state_length_increment(tsk_mutation_table_t *self, tsk_size_t max_derived_state_length_increment)#

Controls the pre-allocation strategy for the derived_state column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_mutation_table_t object.

  • max_derived_state_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

Populations#

struct tsk_population_t#

A single population defined by a row in the population table.

See the data model section for the definition of a population and its properties.

Public Members

tsk_id_t id#

Non-negative ID value corresponding to table row.

const char *metadata#

Metadata.

tsk_size_t metadata_length#

Metadata length in bytes.

struct tsk_population_table_t#

The population table.

See the population table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows#

The number of rows in this table.

tsk_size_t metadata_length#

The total length of the metadata column.

char *metadata#

The metadata column.

tsk_size_t *metadata_offset#

The metadata_offset column.

char *metadata_schema#

The metadata schema.

int tsk_population_table_init(tsk_population_table_t *self, tsk_flags_t options)#

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Parameters:
  • self – A pointer to an uninitialised tsk_population_table_t object.

  • options – Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_population_table_free(tsk_population_table_t *self)#

Free the internal memory for the specified table.

Parameters:
  • self – A pointer to an initialised tsk_population_table_t object.

Returns:

Always returns 0.

tsk_id_t tsk_population_table_add_row(tsk_population_table_t *self, const char *metadata, tsk_size_t metadata_length)#

Adds a row to this population table.

Add a new population with the specified metadata to the table. A copy of the metadata is immediately taken. See the table definition for details of the columns in this table.

Parameters:
  • self – A pointer to a tsk_population_table_t object.

  • metadata – The metadata to be associated with the new population. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns:

Return the ID of the newly added population on success, or a negative value on failure.

int tsk_population_table_update_row(tsk_population_table_t *self, tsk_id_t index, const char *metadata, tsk_size_t metadata_length)#

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. A copy of the metadata parameter is taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and would therefore be inefficient for bulk updates to such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to only update the memory for the row in question.

Parameters:
  • self – A pointer to a tsk_population_table_t object.

  • index – The row to update.

  • metadata – The metadata to be associated with the population. This is a pointer to arbitrary memory. Can be NULL if metadata_length is 0.

  • metadata_length – The size of the metadata array in bytes.

Returns:

Return 0 on success or a negative value on failure.
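The cost difference described in the warning follows from the offset encoding of ragged columns. The standalone sketch below models it on a plain data/offset pair; it does not call tskit, and model_update_ragged_row is a hypothetical name. When the new value has the same length, only that row's bytes are written; otherwise every later row must be shifted and its offset adjusted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Replace the value of row j in a ragged column (data plus num_rows + 1
 * offsets). Returns 1 if only the bytes of row j were touched, or 0 if
 * the tail of the column had to be shifted. The data buffer must have
 * capacity for the updated column.
 */
static int
model_update_ragged_row(char *data, uint64_t *offset, size_t num_rows, size_t j,
    const char *value, uint64_t value_length)
{
    uint64_t old_length, tail_start, tail_length;
    size_t k;

    assert(data != NULL && offset != NULL && j < num_rows);
    old_length = offset[j + 1] - offset[j];
    if (value_length == old_length) {
        /* Same size: overwrite in place, no other row is affected. */
        if (value_length > 0) {
            memcpy(data + offset[j], value, value_length);
        }
        return 1;
    }
    /* Different size: move all later rows, then write the new value. */
    tail_start = offset[j + 1];
    tail_length = offset[num_rows] - tail_start;
    memmove(data + offset[j] + value_length, data + tail_start, tail_length);
    if (value_length > 0) {
        memcpy(data + offset[j], value, value_length);
    }
    for (k = j + 1; k <= num_rows; k++) {
        offset[k] = offset[k] - old_length + value_length;
    }
    return 0;
}
```

Updating many rows with changed sizes through this path repeats the tail shift for each row, which is why rebuilding the columns in one pass is usually preferable for bulk changes.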

int tsk_population_table_clear(tsk_population_table_t *self)#

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_population_table_free() to free the table’s internal resources. Note that the metadata schema is not cleared.

Parameters:
  • self – A pointer to a tsk_population_table_t object.

Returns:

Return 0 on success or a negative value on failure.

int tsk_population_table_truncate(tsk_population_table_t *self, tsk_size_t num_rows)#

Truncates this table so that only the first num_rows are retained.

Parameters:
  • self – A pointer to a tsk_population_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns:

Return 0 on success or a negative value on failure.

int tsk_population_table_extend(tsk_population_table_t *self, const tsk_population_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)#

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table. Note that metadata is copied as-is and is not checked for compatibility with any existing schema on this table.

Parameters:
  • self – A pointer to a tsk_population_table_t object where rows are to be added.

  • other – A pointer to a tsk_population_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.
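The two modes of row selection (explicit row_indexes, or NULL for the first num_rows) can be illustrated with a standalone model that operates on a single int32_t column. This is a sketch only: model_extend is a hypothetical function, it does not call tskit, and validating all indexes before copying is a choice made for the sketch rather than a statement about the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Append num_rows values from `other` to `dest`. If row_indexes is NULL
 * the first num_rows values of `other` are used; otherwise the values at
 * the given indexes (repeats and any order allowed). dest must have room
 * for *dest_rows + num_rows values. Returns 0 on success, -1 on a bad index.
 */
static int
model_extend(int32_t *dest, size_t *dest_rows, const int32_t *other,
    size_t other_rows, size_t num_rows, const int32_t *row_indexes)
{
    size_t j;

    assert(dest != NULL && dest_rows != NULL && other != NULL);
    /* Validate every requested index before modifying dest. */
    for (j = 0; j < num_rows; j++) {
        int64_t index = row_indexes == NULL ? (int64_t) j : row_indexes[j];
        if (index < 0 || (uint64_t) index >= other_rows) {
            return -1;
        }
    }
    for (j = 0; j < num_rows; j++) {
        size_t index = row_indexes == NULL ? j : (size_t) row_indexes[j];
        dest[*dest_rows + j] = other[index];
    }
    *dest_rows += num_rows;
    return 0;
}
```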

int tsk_population_table_keep_rows(tsk_population_table_t *self, const tsk_bool_t *keep, tsk_flags_t options, tsk_id_t *id_map)#

Subset this table by keeping rows according to a boolean mask.

Deletes rows from this table and optionally returns the mapping from IDs in the current table to the updated table. Rows are kept or deleted according to the specified boolean array keep such that for each row j if keep[j] is false (zero) the row is deleted, and otherwise the row is retained. Thus, keep must be an array of at least num_rows tsk_bool_t values.

If the id_map argument is non-null, this array will be updated to represent the mapping between IDs before and after row deletion. For row j, id_map[j] will contain the new ID for row j if it is retained, or TSK_NULL if the row has been removed. Thus, id_map must be an array of at least num_rows tsk_id_t values.

Warning

C++ users need to be careful to specify the correct type when passing in values for the keep array, using std::vector<tsk_bool_t> and not std::vector<bool>, as the latter may not be the correct size.

Parameters:
  • self – A pointer to a tsk_population_table_t object.

  • keep – Array of boolean flags describing whether a particular row should be kept or not. Must be at least num_rows long.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

  • id_map – An array in which to store the mapping between new and old IDs. If NULL, this will be ignored.

Returns:

Return 0 on success or a negative value on failure.
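The relationship between keep and id_map described above can be modelled on a single column of values. The sketch below is standalone and does not call tskit: model_keep_rows is a hypothetical name, and MODEL_NULL stands in for TSK_NULL, which is assumed here to be -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NULL (-1) /* stand-in for TSK_NULL (assumed to be -1) */

/*
 * Compact `values` in place, keeping row j only when keep[j] is true.
 * If id_map is non-NULL, id_map[j] receives the new ID of row j, or
 * MODEL_NULL if the row was deleted. Returns the new number of rows.
 */
static size_t
model_keep_rows(int32_t *values, size_t num_rows, const bool *keep,
    int32_t *id_map)
{
    size_t j, next = 0;

    assert(values != NULL && keep != NULL);
    for (j = 0; j < num_rows; j++) {
        if (id_map != NULL) {
            id_map[j] = keep[j] ? (int32_t) next : MODEL_NULL;
        }
        if (keep[j]) {
            /* Retained rows keep their relative order. */
            values[next] = values[j];
            next++;
        }
    }
    return next;
}
```

An id_map produced this way is what a caller would use to rewrite references held in other tables, for example population IDs stored in a node table.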

bool tsk_population_table_equals(const tsk_population_table_t *self, const tsk_population_table_t *other, tsk_flags_t options)#

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns, and their metadata schemas are byte-wise identical.

  • TSK_CMP_IGNORE_METADATA

    Do not include metadata in the comparison. Note that as metadata is the only column in the population table, when this flag is specified two population tables are considered equal if they have the same number of rows.

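Parameters:
  • self – A pointer to a tsk_population_table_t object.

  • other – A pointer to a tsk_population_table_t object.

  • options – Bitwise comparison options.

Returns: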

Return true if the specified table is equal to this table.

int tsk_population_table_copy(const tsk_population_table_t *self, tsk_population_table_t *dest, tsk_flags_t options)#

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters:
  • self – A pointer to a tsk_population_table_t object.

  • dest – A pointer to a tsk_population_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised population table. If not, it must be an uninitialised population table.

  • options – Bitwise option flags.

Returns:

Return 0 on success or a negative value on failure.

int tsk_population_table_get_row(const tsk_population_table_t *self, tsk_id_t index, tsk_population_t *row)#

Get the row at the specified index.

Updates the specified population struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters:
  • self – A pointer to a tsk_population_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_population_t struct that is updated to reflect the values in the specified row.

Returns:

Return 0 on success or a negative value on failure.
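The pointer-validity rule above follows from the fact that a row is a view into the table's own column memory rather than a copy. The standalone sketch below models this for a population-like table with a single ragged metadata column. It does not call tskit; model_row_t and model_get_row are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical row view mirroring the shape of tsk_population_t. */
typedef struct {
    int32_t id;
    const char *metadata;
    uint64_t metadata_length;
} model_row_t;

/*
 * Fill `row` with a view of row `index`. The metadata pointer refers
 * directly into the column buffer, so it is only usable while that
 * buffer is neither reallocated nor rewritten.
 * Returns 0 on success or -1 if index is out of bounds.
 */
static int
model_get_row(const char *metadata, const uint64_t *metadata_offset,
    size_t num_rows, int32_t index, model_row_t *row)
{
    assert(metadata != NULL && metadata_offset != NULL && row != NULL);
    if (index < 0 || (size_t) index >= num_rows) {
        return -1;
    }
    row->id = index;
    row->metadata = metadata + metadata_offset[index];
    row->metadata_length = metadata_offset[index + 1] - metadata_offset[index];
    return 0;
}
```

Because nothing is copied, client code that needs the metadata after modifying the table should copy the bytes out first.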

int tsk_population_table_set_metadata_schema(tsk_population_table_t *self, const char *metadata_schema, tsk_size_t metadata_schema_length)#

Set the metadata schema.

Copies the metadata schema string to this table, replacing any existing schema.

Parameters:
  • self – A pointer to a tsk_population_table_t object.

  • metadata_schema – A pointer to a char array.

  • metadata_schema_length – The size of the metadata schema in bytes.

Returns:

Return 0 on success or a negative value on failure.

void tsk_population_table_print_state(const tsk_population_table_t *self, FILE *out)#

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters:
  • self – A pointer to a tsk_population_table_t object.

  • out – The stream to write the summary to.

int tsk_population_table_set_columns(tsk_population_table_t *self, tsk_size_t num_rows, const char *metadata, const tsk_size_t *metadata_offset)#

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters:
  • self – A pointer to a tsk_population_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.

int tsk_population_table_append_columns(tsk_population_table_t *self, tsk_size_t num_rows, const char *metadata, const tsk_size_t *metadata_offset)#

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows. The metadata schema is not affected.

Parameters:
  • self – A pointer to a tsk_population_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • metadata – The array of char metadata values to be copied.

  • metadata_offset – The array of tsk_size_t metadata offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.

int tsk_population_table_set_max_rows_increment(tsk_population_table_t *self, tsk_size_t max_rows_increment)#

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_population_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

int tsk_population_table_set_max_metadata_length_increment(tsk_population_table_t *self, tsk_size_t max_metadata_length_increment)#

Controls the pre-allocation strategy for the metadata column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_population_table_t object.

  • max_metadata_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

Provenances#

struct tsk_provenance_t#

A single provenance defined by a row in the provenance table.

See the data model section for the definition of a provenance object and its properties. See the Provenance section for more information on how provenance records should be structured.

Public Members

tsk_id_t id#

Non-negative ID value corresponding to table row.

const char *timestamp#

The timestamp.

tsk_size_t timestamp_length#

The timestamp length in bytes.

const char *record#

The record.

tsk_size_t record_length#

The record length in bytes.

struct tsk_provenance_table_t#

The provenance table.

See the provenance table definition for details of the columns in this table.

Public Members

tsk_size_t num_rows#

The number of rows in this table.

tsk_size_t timestamp_length#

The total length of the timestamp column.

tsk_size_t record_length#

The total length of the record column.

char *timestamp#

The timestamp column.

tsk_size_t *timestamp_offset#

The timestamp_offset column.

char *record#

The record column.

tsk_size_t *record_offset#

The record_offset column.

int tsk_provenance_table_init(tsk_provenance_table_t *self, tsk_flags_t options)#

Initialises the table by allocating the internal memory.

This must be called before any operations are performed on the table. See the API structure for details on how objects are initialised and freed.

Parameters:
  • self – A pointer to an uninitialised tsk_provenance_table_t object.

  • options – Allocation time options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_provenance_table_free(tsk_provenance_table_t *self)#

Free the internal memory for the specified table.

Parameters:
  • self – A pointer to an initialised tsk_provenance_table_t object.

Returns:

Always returns 0.

tsk_id_t tsk_provenance_table_add_row(tsk_provenance_table_t *self, const char *timestamp, tsk_size_t timestamp_length, const char *record, tsk_size_t record_length)#

Adds a row to this provenance table.

Add a new provenance with the specified timestamp and record to the table. Copies of the timestamp and record are immediately taken. See the table definition for details of the columns in this table.

Parameters:
  • self – A pointer to a tsk_provenance_table_t object.

  • timestamp – The timestamp to be associated with the new provenance. This is a pointer to arbitrary memory. Can be NULL if timestamp_length is 0.

  • timestamp_length – The size of the timestamp array in bytes.

  • record – The record to be associated with the new provenance. This is a pointer to arbitrary memory. Can be NULL if record_length is 0.

  • record_length – The size of the record array in bytes.

Returns:

Return the ID of the newly added provenance on success, or a negative value on failure.

int tsk_provenance_table_update_row(tsk_provenance_table_t *self, tsk_id_t index, const char *timestamp, tsk_size_t timestamp_length, const char *record, tsk_size_t record_length)#

Updates the row at the specified index.

Rewrite the row at the specified index in this table to use the specified values. Copies of the timestamp and record parameters are taken immediately. See the table definition for details of the columns in this table.

Warning

Because of the way that ragged columns are encoded, this method requires a full rewrite of the internal column memory in the worst case, and would therefore be inefficient for bulk updates to such columns. However, if the sizes of all ragged column values are unchanged in the updated row, this method is guaranteed to only update the memory for the row in question.

Parameters:
  • self – A pointer to a tsk_provenance_table_t object.

  • index – The row to update.

  • timestamp – The timestamp to be associated with the provenance. This is a pointer to arbitrary memory. Can be NULL if timestamp_length is 0.

  • timestamp_length – The size of the timestamp array in bytes.

  • record – The record to be associated with the provenance. This is a pointer to arbitrary memory. Can be NULL if record_length is 0.

  • record_length – The size of the record array in bytes.

Returns:

Return 0 on success or a negative value on failure.

int tsk_provenance_table_clear(tsk_provenance_table_t *self)#

Clears this table, setting the number of rows to zero.

No memory is freed as a result of this operation; please use tsk_provenance_table_free() to free the table’s internal resources.

Parameters:
  • self – A pointer to a tsk_provenance_table_t object.

Returns:

Return 0 on success or a negative value on failure.

int tsk_provenance_table_truncate(tsk_provenance_table_t *self, tsk_size_t num_rows)#

Truncates this table so that only the first num_rows are retained.

Parameters:
  • self – A pointer to a tsk_provenance_table_t object.

  • num_rows – The number of rows to retain in the table.

Returns:

Return 0 on success or a negative value on failure.

int tsk_provenance_table_extend(tsk_provenance_table_t *self, const tsk_provenance_table_t *other, tsk_size_t num_rows, const tsk_id_t *row_indexes, tsk_flags_t options)#

Extends this table by appending rows copied from another table.

Appends the rows at the specified indexes from the table other to the end of this table. Row indexes can be repeated and in any order. If row_indexes is NULL, append the first num_rows from other to this table.

Parameters:
  • self – A pointer to a tsk_provenance_table_t object where rows are to be added.

  • other – A pointer to a tsk_provenance_table_t object where rows are copied from.

  • num_rows – The number of rows from other to append to this table.

  • row_indexes – Array of row indexes in other. If NULL is passed then the first num_rows of other are used.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_provenance_table_keep_rows(tsk_provenance_table_t *self, const tsk_bool_t *keep, tsk_flags_t options, tsk_id_t *id_map)#

Subset this table by keeping rows according to a boolean mask.

Deletes rows from this table and optionally returns the mapping from IDs in the current table to the updated table. Rows are kept or deleted according to the specified boolean array keep such that for each row j if keep[j] is false (zero) the row is deleted, and otherwise the row is retained. Thus, keep must be an array of at least num_rows tsk_bool_t values.

If the id_map argument is non-null, this array will be updated to represent the mapping between IDs before and after row deletion. For row j, id_map[j] will contain the new ID for row j if it is retained, or TSK_NULL if the row has been removed. Thus, id_map must be an array of at least num_rows tsk_id_t values.

Warning

C++ users need to be careful to specify the correct type when passing in values for the keep array, using std::vector<tsk_bool_t> and not std::vector<bool>, as the latter may not be the correct size.

Parameters:
  • self – A pointer to a tsk_provenance_table_t object.

  • keep – Array of boolean flags describing whether a particular row should be kept or not. Must be at least num_rows long.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

  • id_map – An array in which to store the mapping between new and old IDs. If NULL, this will be ignored.

Returns:

Return 0 on success or a negative value on failure.

bool tsk_provenance_table_equals(const tsk_provenance_table_t *self, const tsk_provenance_table_t *other, tsk_flags_t options)#

Returns true if the data in the specified table is identical to the data in this table.

Options

Options to control the comparison can be specified by providing one or more of the following bitwise flags. By default (options=0) tables are considered equal if they are byte-wise identical in all columns.

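Parameters:
  • self – A pointer to a tsk_provenance_table_t object.

  • other – A pointer to a tsk_provenance_table_t object.

  • options – Bitwise comparison options.

Returns: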

Return true if the specified table is equal to this table.

int tsk_provenance_table_copy(const tsk_provenance_table_t *self, tsk_provenance_table_t *dest, tsk_flags_t options)#

Copies the state of this table into the specified destination.

By default the method initialises the specified destination table. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters:
  • self – A pointer to a tsk_provenance_table_t object.

  • dest – A pointer to a tsk_provenance_table_t object. If the TSK_NO_INIT option is specified, this must be an initialised provenance table. If not, it must be an uninitialised provenance table.

  • options – Bitwise option flags.

Returns:

Return 0 on success or a negative value on failure.

int tsk_provenance_table_get_row(const tsk_provenance_table_t *self, tsk_id_t index, tsk_provenance_t *row)#

Get the row at the specified index.

Updates the specified provenance struct to reflect the values in the specified row. Pointers to memory within this struct are handled by the table and should not be freed by client code. These pointers are guaranteed to be valid until the next operation that modifies the table (e.g., by adding a new row), but not afterwards.

Parameters:
  • self – A pointer to a tsk_provenance_table_t object.

  • index – The requested table row.

  • row – A pointer to a tsk_provenance_t struct that is updated to reflect the values in the specified row.

Returns:

Return 0 on success or a negative value on failure.

void tsk_provenance_table_print_state(const tsk_provenance_table_t *self, FILE *out)#

Print out the state of this table to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters:
  • self – A pointer to a tsk_provenance_table_t object.

  • out – The stream to write the summary to.

int tsk_provenance_table_set_columns(tsk_provenance_table_t *self, tsk_size_t num_rows, const char *timestamp, const tsk_size_t *timestamp_offset, const char *record, const tsk_size_t *record_offset)#

Replace this table’s data by copying from a set of column arrays.

Clears the data columns of this table and then copies column data from the specified set of arrays. The supplied arrays should all contain data on the same number of rows.

Parameters:
  • self – A pointer to a tsk_provenance_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • timestamp – The array of char timestamp values to be copied.

  • timestamp_offset – The array of tsk_size_t timestamp offset values to be copied.

  • record – The array of char record values to be copied.

  • record_offset – The array of tsk_size_t record offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.

int tsk_provenance_table_append_columns(tsk_provenance_table_t *self, tsk_size_t num_rows, const char *timestamp, const tsk_size_t *timestamp_offset, const char *record, const tsk_size_t *record_offset)#

Extends this table by copying from a set of column arrays.

Copies column data from the specified set of arrays to create new rows at the end of the table. The supplied arrays should all contain data on the same number of rows.

Parameters:
  • self – A pointer to a tsk_provenance_table_t object.

  • num_rows – The number of rows to copy from the specified arrays.

  • timestamp – The array of char timestamp values to be copied.

  • timestamp_offset – The array of tsk_size_t timestamp offset values to be copied.

  • record – The array of char record values to be copied.

  • record_offset – The array of tsk_size_t record offset values to be copied.

Returns:

Return 0 on success or a negative value on failure.

int tsk_provenance_table_set_max_rows_increment(tsk_provenance_table_t *self, tsk_size_t max_rows_increment)#

Controls the pre-allocation strategy for this table.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_provenance_table_t object.

  • max_rows_increment – The number of rows to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

int tsk_provenance_table_set_max_timestamp_length_increment(tsk_provenance_table_t *self, tsk_size_t max_timestamp_length_increment)#

Controls the pre-allocation strategy for the timestamp column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_provenance_table_t object.

  • max_timestamp_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

int tsk_provenance_table_set_max_record_length_increment(tsk_provenance_table_t *self, tsk_size_t max_record_length_increment)#

Controls the pre-allocation strategy for the record column.

Set a fixed pre-allocation size, or use the default doubling strategy. See Memory allocation strategy for details on the default pre-allocation strategy.

Parameters:
  • self – A pointer to a tsk_provenance_table_t object.

  • max_record_length_increment – The number of bytes to pre-allocate, or zero for the default doubling strategy.

Returns:

Return 0 on success or a negative value on failure.

Table indexes#

Along with the tree sequence ordering requirements, the table indexes allow us to take a table collection and efficiently operate on the trees defined within it. This section defines the rules for safely operating on table indexes and their life-cycle.

The edge index used for tree generation consists of two arrays, each holding N edge IDs (where N is the size of the edge table). When the index is computed using tsk_table_collection_build_index(), we store the current size of the edge table along with the two arrays of edge IDs. The function tsk_table_collection_has_index() then returns true iff (a) both of these arrays are not NULL and (b) the stored number of edges is the same as the current size of the edge table.

Updating the edge table does not automatically invalidate the indexes. Thus, if we call tsk_edge_table_clear() on an edge table which has an index, this index will still exist. However, it will not be considered a valid index by tsk_table_collection_has_index() because of the size mismatch. The same applies to functions that increase the size of the table. Note that tsk_table_collection_has_index() can therefore return true even though the index is not actually valid, for example if the user has manipulated the node and edge tables to describe a different topology that happens to have the same number of edges. The behaviour of methods that use the indexes is undefined in this case.

Thus, if you are manipulating an existing table collection that may be indexed, it is always recommended to call tsk_table_collection_drop_index() first.
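The validity rule described in this section (both arrays non-NULL and a matching stored edge count) can be written down as a small standalone model. This is a sketch of the rule as stated in the text, not the library's code: model_index_t, its field names and model_has_index are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the edge index: two arrays of edge IDs plus the
 * size of the edge table at the time the index was built. */
typedef struct {
    int32_t *edge_insertion_order;
    int32_t *edge_removal_order;
    uint64_t num_edges;
} model_index_t;

/*
 * Returns true if and only if both index arrays are present and the
 * recorded edge count matches the current size of the edge table.
 */
static bool
model_has_index(const model_index_t *index, uint64_t current_num_edges)
{
    assert(index != NULL);
    return index->edge_insertion_order != NULL
           && index->edge_removal_order != NULL
           && index->num_edges == current_num_edges;
}
```

The check only compares sizes, so it cannot detect an edge table that was rewritten to a different topology with the same number of rows. That is the case in which dropping the index before editing avoids undefined behaviour.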

Tree sequences#

struct tsk_treeseq_t#

The tree sequence object.

Public Members

tsk_table_collection_t *tables#

The table collection underlying this tree sequence. This table collection must be treated as read-only, and any changes to it will lead to undefined behaviour.

int tsk_treeseq_init(tsk_treeseq_t *self, tsk_table_collection_t *tables, tsk_flags_t options)#

Initialises the tree sequence based on the specified table collection.

This method will copy the supplied table collection unless TSK_TAKE_OWNERSHIP is specified. The table collection will be checked for integrity and index maps built.

This must be called before any operations are performed on the tree sequence. See the API structure for details on how objects are initialised and freed.

If specified, TSK_TAKE_OWNERSHIP takes immediate ownership of the tables, regardless of error conditions.

Options

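Parameters:
  • self – A pointer to an uninitialised tsk_treeseq_t object.

  • tables – A pointer to a tsk_table_collection_t object.

  • options – Bitwise option flags (for example TSK_TAKE_OWNERSHIP, described above).

Returns: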

Return 0 on success or a negative value on failure.

int tsk_treeseq_load(tsk_treeseq_t *self, const char *filename, tsk_flags_t options)#

Load a tree sequence from a file path.

Loads the data from the specified file into this tree sequence. The tree sequence is also initialised. The resources allocated must be freed using tsk_treeseq_free() even in error conditions.

Works similarly to tsk_table_collection_load(); please see that function’s documentation for details and options.

Examples

int ret;
tsk_treeseq_t ts;
ret = tsk_treeseq_load(&ts, "data.trees", 0);
if (ret != 0) {
    fprintf(stderr, "Load error:%s\n", tsk_strerror(ret));
    exit(EXIT_FAILURE);
}

Parameters:
  • self – A pointer to an uninitialised tsk_treeseq_t object.

  • filename – A NULL terminated string containing the filename.

  • options – Bitwise options. See above for details.

Returns:

Return 0 on success or a negative value on failure.

int tsk_treeseq_loadf(tsk_treeseq_t *self, FILE *file, tsk_flags_t options)#

Load a tree sequence from a stream.

Loads a tree sequence from the specified file stream. The tree sequence is also initialised. The resources allocated must be freed using tsk_treeseq_free() even in error conditions.

Works similarly to tsk_table_collection_loadf(); please see that function’s documentation for details and options.

Parameters:
  • self – A pointer to an uninitialised tsk_treeseq_t object.

  • file – A FILE stream opened in an appropriate mode for reading (e.g. “r”, “r+” or “w+”) positioned at the beginning of a tree sequence definition.

  • options – Bitwise options. See above for details.

Returns:

Return 0 on success or a negative value on failure.

int tsk_treeseq_dump(const tsk_treeseq_t *self, const char *filename, tsk_flags_t options)#

Write a tree sequence to file.

Writes the data from this tree sequence to the specified file.

If an error occurs, the file at the specified path is deleted, ensuring that only complete and well-formed files are written.

Parameters:
  • self – A pointer to an initialised tsk_treeseq_t object.

  • filename – A NULL terminated string containing the filename.

  • options – Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_treeseq_dumpf(const tsk_treeseq_t *self, FILE *file, tsk_flags_t options)#

Write a tree sequence to a stream.

Writes the data from this tree sequence to the specified FILE stream. Semantics are identical to tsk_treeseq_dump().

Please see the File streaming section for an example of how to sequentially dump and load tree sequences from a stream.

Parameters:
  • self – A pointer to an initialised tsk_treeseq_t object.

  • file – A FILE stream opened in an appropriate mode for writing (e.g. “w”, “a”, “r+” or “w+”).

  • options – Bitwise options. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_treeseq_copy_tables(const tsk_treeseq_t *self, tsk_table_collection_t *tables, tsk_flags_t options)#

Copies the state of the table collection underlying this tree sequence into the specified destination table collection.

By default the method initialises the specified destination table collection. If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

  • tables – A pointer to a tsk_table_collection_t object. If the TSK_NO_INIT option is specified, this must be an initialised table collection. If not, it must be an uninitialised table collection.

  • options – Bitwise option flags.

Returns:

Return 0 on success or a negative value on failure.

int tsk_treeseq_free(tsk_treeseq_t *self)#

Free the internal memory for the specified tree sequence.

Parameters:
  • self – A pointer to an initialised tsk_treeseq_t object.

Returns:

Always returns 0.

void tsk_treeseq_print_state(const tsk_treeseq_t *self, FILE *out)#

Print out the state of this tree sequence to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

  • out – The stream to write the summary to.

tsk_size_t tsk_treeseq_get_num_nodes(const tsk_treeseq_t *self)#

Get the number of nodes.

Returns the number of nodes in this tree sequence.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the number of nodes.

tsk_size_t tsk_treeseq_get_num_edges(const tsk_treeseq_t *self)#

Get the number of edges.

Returns the number of edges in this tree sequence.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the number of edges.

tsk_size_t tsk_treeseq_get_num_migrations(const tsk_treeseq_t *self)#

Get the number of migrations.

Returns the number of migrations in this tree sequence.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the number of migrations.

tsk_size_t tsk_treeseq_get_num_sites(const tsk_treeseq_t *self)#

Get the number of sites.

Returns the number of sites in this tree sequence.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the number of sites.

tsk_size_t tsk_treeseq_get_num_mutations(const tsk_treeseq_t *self)#

Get the number of mutations.

Returns the number of mutations in this tree sequence.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the number of mutations.

tsk_size_t tsk_treeseq_get_num_provenances(const tsk_treeseq_t *self)#

Get the number of provenances.

Returns the number of provenances in this tree sequence.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the number of provenances.

tsk_size_t tsk_treeseq_get_num_populations(const tsk_treeseq_t *self)#

Get the number of populations.

Returns the number of populations in this tree sequence.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the number of populations.

tsk_size_t tsk_treeseq_get_num_individuals(const tsk_treeseq_t *self)#

Get the number of individuals.

Returns the number of individuals in this tree sequence.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the number of individuals.

tsk_size_t tsk_treeseq_get_num_trees(const tsk_treeseq_t *self)#

Return the number of trees in this tree sequence.

This is a constant time operation.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

The number of trees in the tree sequence.

tsk_size_t tsk_treeseq_get_num_samples(const tsk_treeseq_t *self)#

Get the number of samples.

Returns the number of nodes marked as samples in this tree sequence.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the number of samples.

const char *tsk_treeseq_get_metadata(const tsk_treeseq_t *self)#

Get the top-level tree sequence metadata.

Returns a pointer to the metadata string, which is owned by the tree sequence and not null-terminated.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns a pointer to the metadata.

tsk_size_t tsk_treeseq_get_metadata_length(const tsk_treeseq_t *self)#

Get the length of top-level tree sequence metadata.

Returns the length of the metadata string.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the length of the metadata.
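Because the metadata pointer is not null-terminated, it has to be used together with the length returned by tsk_treeseq_get_metadata_length(). The following is a minimal sketch of one way to obtain a C string from such a pair. The helper bytes_to_cstring is hypothetical (it is not part of tskit) and uses only the standard library; in real code its two arguments would come from tsk_treeseq_get_metadata() and tsk_treeseq_get_metadata_length().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of tskit: copy `length` bytes that are not
 * null-terminated into a newly allocated, null-terminated buffer.
 * Returns NULL if allocation fails; the caller must free() the result. */
char *
bytes_to_cstring(const char *data, size_t length)
{
    char *copy = malloc(length + 1);

    if (copy == NULL) {
        return NULL;
    }
    if (length > 0) {
        memcpy(copy, data, length);
    }
    copy[length] = '\0';
    return copy;
}
```

If the bytes only need to be printed, printf("%.*s", (int) length, data) avoids the copy. Binary metadata may contain embedded zero bytes, in which case string functions will only see the part before the first zero byte.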

const char *tsk_treeseq_get_metadata_schema(const tsk_treeseq_t *self)#

Get the top-level tree sequence metadata schema.

Returns a pointer to the metadata schema string, which is owned by the tree sequence and not null-terminated.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns a pointer to the metadata schema.

tsk_size_t tsk_treeseq_get_metadata_schema_length(const tsk_treeseq_t *self)#

Get the length of the top-level tree sequence metadata schema.

Returns the length of the metadata schema string.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the length of the metadata schema.

const char *tsk_treeseq_get_time_units(const tsk_treeseq_t *self)#

Get the time units string.

Returns a pointer to the time units string, which is owned by the tree sequence and not null-terminated.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns a pointer to the time units.

tsk_size_t tsk_treeseq_get_time_units_length(const tsk_treeseq_t *self)#

Get the length of time units string.

Returns the length of the time units string.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the length of the time units.

const char *tsk_treeseq_get_file_uuid(const tsk_treeseq_t *self)#

Get the file uuid.

Returns a pointer to the null-terminated file uuid string, which is owned by the tree sequence.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns a pointer to the null-terminated file uuid.

double tsk_treeseq_get_sequence_length(const tsk_treeseq_t *self)#

Get the sequence length.

Returns the sequence length of this tree sequence.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the sequence length.

const double *tsk_treeseq_get_breakpoints(const tsk_treeseq_t *self)#

Get the breakpoints.

Returns an array of breakpoint locations. The array is owned by the tree sequence.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the pointer to the breakpoint array.
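The breakpoint array can be used to locate the tree covering a genomic position without touching any tree state. The sketch below is illustrative only: find_tree_index is a hypothetical helper, not a tskit function, and it assumes that the array holds num_trees + 1 strictly increasing values (running from 0 to the sequence length), so that tree j covers the half-open interval from breakpoints[j] to breakpoints[j + 1].

```c
#include <stddef.h>

/* Hypothetical helper, not part of tskit. Assumes `breakpoints` holds
 * num_trees + 1 strictly increasing values, so that tree j covers the
 * half-open interval [breakpoints[j], breakpoints[j + 1]).
 * Returns the index of the tree covering `position`, or -1 if the
 * position lies outside [breakpoints[0], breakpoints[num_trees]). */
long
find_tree_index(const double *breakpoints, size_t num_trees, double position)
{
    size_t lo = 0;
    size_t hi = num_trees;

    if (num_trees == 0 || position < breakpoints[0]
        || position >= breakpoints[num_trees]) {
        return -1;
    }
    /* Invariant: breakpoints[lo] <= position < breakpoints[hi]. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (breakpoints[mid] <= position) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return (long) lo;
}
```

In practice tsk_tree_seek() performs the lookup and also updates the tree state; the sketch is only meant to make the inclusive-left, exclusive-right interval convention concrete.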

const tsk_id_t *tsk_treeseq_get_samples(const tsk_treeseq_t *self)#

Get the samples.

Returns an array of ids of sample nodes in this tree sequence, i.e. nodes that have the TSK_NODE_IS_SAMPLE flag set. The array is owned by the tree sequence and should not be modified or freed.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the pointer to the sample node id array.

const tsk_id_t *tsk_treeseq_get_sample_index_map(const tsk_treeseq_t *self)#

Get the map of node id to sample index.

Returns the location of each node in the list of samples or TSK_NULL for nodes that are not samples.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the pointer to the array of sample indexes.
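The relationship between the sample array and the sample index map can be sketched with plain arrays. Nothing below calls tskit: build_sample_index_map is a hypothetical helper, NULL_ID and IS_SAMPLE_FLAG are stand-ins for TSK_NULL and TSK_NODE_IS_SAMPLE, and the sketch assumes that samples are listed in increasing order of node ID.

```c
#include <stddef.h>
#include <stdint.h>

#define NULL_ID (-1)      /* stand-in for TSK_NULL */
#define IS_SAMPLE_FLAG 1u /* stand-in for TSK_NODE_IS_SAMPLE */

/* Hypothetical helper, not part of tskit. Given per-node flags, fill
 * `samples` with the IDs of sample nodes in increasing ID order and
 * `index_map` with each node's position in `samples` (NULL_ID for nodes
 * that are not samples). Both outputs need room for num_nodes entries.
 * Returns the number of samples found. */
size_t
build_sample_index_map(const uint32_t *flags, size_t num_nodes,
    int32_t *samples, int32_t *index_map)
{
    size_t u;
    size_t num_samples = 0;

    for (u = 0; u < num_nodes; u++) {
        if (flags[u] & IS_SAMPLE_FLAG) {
            samples[num_samples] = (int32_t) u;
            index_map[u] = (int32_t) num_samples;
            num_samples++;
        } else {
            index_map[u] = NULL_ID;
        }
    }
    return num_samples;
}
```

The two arrays are inverses of each other on the sample nodes: index_map[samples[j]] equals j for every sample index j.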

bool tsk_treeseq_is_sample(const tsk_treeseq_t *self, tsk_id_t u)#

Check if a node is a sample.

Returns the sample status of a given node id.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

  • u – The id of the node to be checked.

Returns:

Returns true if the node is a sample.

bool tsk_treeseq_get_discrete_genome(const tsk_treeseq_t *self)#

Get the discrete genome status.

If all the genomic locations in the tree sequence are discrete integer values then this flag will be true.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns true if all genomic locations are discrete.

bool tsk_treeseq_get_discrete_time(const tsk_treeseq_t *self)#

Get the discrete time status.

If all times in the tree sequence are discrete integer values then this flag will be true.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns true if all times are discrete.

double tsk_treeseq_get_min_time(const tsk_treeseq_t *self)#

Get the min time in node table and mutation table.

The times stored in both the node and mutation tables are considered.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the min time of all nodes and mutations.

double tsk_treeseq_get_max_time(const tsk_treeseq_t *self)#

Get the max time in node table and mutation table.

The times stored in both the node and mutation tables are considered.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

Returns:

Returns the max time of all nodes and mutations.

int tsk_treeseq_get_node(const tsk_treeseq_t *self, tsk_id_t index, tsk_node_t *node)#

Get a node by its index.

Copies a node from this tree sequence to the specified destination.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

  • index – The node index to copy

  • node – A pointer to a tsk_node_t object.

Returns:

Return 0 on success or a negative value on failure.

int tsk_treeseq_get_edge(const tsk_treeseq_t *self, tsk_id_t index, tsk_edge_t *edge)#

Get an edge by its index.

Copies an edge from this tree sequence to the specified destination.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

  • index – The edge index to copy

  • edge – A pointer to a tsk_edge_t object.

Returns:

Return 0 on success or a negative value on failure.

int tsk_treeseq_get_migration(const tsk_treeseq_t *self, tsk_id_t index, tsk_migration_t *migration)#

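Get a migration by its index.

Copies a migration from this tree sequence to the specified destination.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

  • index – The migration index to copy.

  • migration – A pointer to a tsk_migration_t object.

Returns: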

Return 0 on success or a negative value on failure.

int tsk_treeseq_get_site(const tsk_treeseq_t *self, tsk_id_t index, tsk_site_t *site)#

Get a site by its index.

Copies a site from this tree sequence to the specified destination.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

  • index – The site index to copy

  • site – A pointer to a tsk_site_t object.

Returns:

Return 0 on success or a negative value on failure.

int tsk_treeseq_get_mutation(const tsk_treeseq_t *self, tsk_id_t index, tsk_mutation_t *mutation)#

Get a mutation by its index.

Copies a mutation from this tree sequence to the specified destination.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

  • index – The mutation index to copy

  • mutation – A pointer to a tsk_mutation_t object.

Returns:

Return 0 on success or a negative value on failure.

int tsk_treeseq_get_provenance(const tsk_treeseq_t *self, tsk_id_t index, tsk_provenance_t *provenance)#

Get a provenance by its index.

Copies a provenance from this tree sequence to the specified destination.

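Parameters:
  • self – A pointer to a tsk_treeseq_t object.

  • index – The provenance index to copy.

  • provenance – A pointer to a tsk_provenance_t object.

Returns: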

Return 0 on success or a negative value on failure.

int tsk_treeseq_get_population(const tsk_treeseq_t *self, tsk_id_t index, tsk_population_t *population)#

Get a population by its index.

Copies a population from this tree sequence to the specified destination.

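Parameters:
  • self – A pointer to a tsk_treeseq_t object.

  • index – The population index to copy.

  • population – A pointer to a tsk_population_t object.

Returns: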

Return 0 on success or a negative value on failure.

int tsk_treeseq_get_individual(const tsk_treeseq_t *self, tsk_id_t index, tsk_individual_t *individual)#

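Get an individual by its index.

Copies an individual from this tree sequence to the specified destination.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

  • index – The individual index to copy.

  • individual – A pointer to a tsk_individual_t object.

Returns: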

Return 0 on success or a negative value on failure.

int tsk_treeseq_simplify(const tsk_treeseq_t *self, const tsk_id_t *samples, tsk_size_t num_samples, tsk_flags_t options, tsk_treeseq_t *output, tsk_id_t *node_map)#

Create a simplified instance of this tree sequence.

Copies this tree sequence to the specified destination and performs simplification. The destination tree sequence should be uninitialised. Simplification transforms the tables to remove redundancy and canonicalise tree sequence data. See the simplification tutorial for more details.

For full details and flags see tsk_table_collection_simplify() which performs the same operation in place.

Parameters:
  • self – A pointer to an initialised tsk_treeseq_t object.

  • samples – Either NULL or an array of num_samples distinct and valid node IDs. If non-null the nodes in this array will be marked as samples in the output. If NULL, the num_samples parameter is ignored and the samples in the output will be the same as the samples in the input. This is equivalent to populating the samples array with all of the sample nodes in the input in increasing order of ID.

  • num_samples – The number of node IDs in the input samples array. Ignored if the samples array is NULL.

  • options – Simplify options; see above for the available bitwise flags. For the default behaviour, a value of 0 should be provided.

  • output – A pointer to an uninitialised tsk_treeseq_t object.

  • node_map – If not NULL, this array will be filled to define the mapping between node IDs in the table collection before and after simplification.

Returns:

Return 0 on success or a negative value on failure.
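A typical use of node_map is to translate node IDs recorded against the input tree sequence into IDs that are valid in the simplified output. The sketch below shows only that translation step with plain arrays. It is not tskit code: remap_node_ids is a hypothetical helper, NULL_ID stands in for TSK_NULL, and it assumes that node_map has one entry per node of the input and holds the null ID for nodes that were removed.

```c
#include <stddef.h>
#include <stdint.h>

#define NULL_ID (-1) /* stand-in for TSK_NULL */

/* Hypothetical helper, not part of tskit. Translates `ids` (node IDs in the
 * input tree sequence) in place to IDs in the simplified output, using a
 * `node_map` assumed to have one entry per input node and to hold NULL_ID
 * for removed nodes. Out-of-range input IDs are also mapped to NULL_ID.
 * Returns the number of IDs that are still valid after simplification. */
size_t
remap_node_ids(const int32_t *node_map, size_t num_input_nodes, int32_t *ids,
    size_t num_ids)
{
    size_t j;
    size_t num_kept = 0;

    for (j = 0; j < num_ids; j++) {
        if (ids[j] >= 0 && (size_t) ids[j] < num_input_nodes) {
            ids[j] = node_map[ids[j]];
        } else {
            ids[j] = NULL_ID;
        }
        if (ids[j] != NULL_ID) {
            num_kept++;
        }
    }
    return num_kept;
}
```

Under the same assumption, the caller would allocate node_map with tsk_treeseq_get_num_nodes(self) entries before calling tsk_treeseq_simplify().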

int tsk_treeseq_extend_haplotypes(const tsk_treeseq_t *self, int max_iter, tsk_flags_t options, tsk_treeseq_t *output)#

Extends haplotypes.

Returns a new tree sequence in which the span covered by ancestral nodes is “extended” to regions of the genome according to the following rule: if an ancestral segment corresponding to node n has ancestor p and descendant c on some portion of the genome, and on an adjacent segment of genome p is still an ancestor of c, then n is inserted into the path from p to c. For instance, if p is the parent of n and n is the parent of c, then the spans of the edges from p to n and n to c are extended, and the span of the edge from p to c is reduced. However, any edges whose child node is a sample are not modified.

The node of certain mutations may also be remapped; to do this unambiguously we need to know mutation times. If mutation times are unknown, use tsk_table_collection_compute_mutation_times() first.

The method will not affect any tables except the edge table, or the node column in the mutation table.

The method works by iterating over the genome to look for edges that can be extended in this way; the maximum number of such iterations is controlled by max_iter.

Options: None currently defined.

Parameters:
  • self – A pointer to a tsk_treeseq_t object.

  • max_iter – The maximum number of iterations over the tree sequence.

  • options – Bitwise option flags. (UNUSED)

  • output – A pointer to an uninitialised tsk_treeseq_t object.

Returns:

Return 0 on success or a negative value on failure.

Trees#

struct tsk_tree_t#

A single tree in a tree sequence.

A tsk_tree_t object has two basic functions:

  1. Represent the state of a single tree in a tree sequence;

  2. Provide methods to transform this state into different trees in the sequence.

The state of a single tree in the tree sequence is represented using the quintuply linked encoding: please see the data model section for details on how this works. The left-to-right ordering of nodes in this encoding is arbitrary, and may change depending on the order in which trees are accessed within the sequence. Please see the Tree traversals examples for recommended usage.

On initialisation, a tree is in the null state and we must call one of the seeking methods to make the state of the tree object correspond to a particular tree in the sequence. Please see the Tree iteration examples for recommended usage.

Public Members

const tsk_treeseq_t *tree_sequence#

The parent tree sequence.

tsk_id_t virtual_root#

The ID of the “virtual root” whose children are the roots of the tree.

tsk_id_t *parent#

The parent of node u is parent[u]. Equal to TSK_NULL if node u is a root or is not a node in the current tree.

tsk_id_t *left_child#

The leftmost child of node u is left_child[u]. Equal to TSK_NULL if node u is a leaf or is not a node in the current tree.

tsk_id_t *right_child#

The rightmost child of node u is right_child[u]. Equal to TSK_NULL if node u is a leaf or is not a node in the current tree.

tsk_id_t *left_sib#

The sibling to the left of node u is left_sib[u]. Equal to TSK_NULL if node u has no siblings to its left.

tsk_id_t *right_sib#

The sibling to the right of node u is right_sib[u]. Equal to TSK_NULL if node u has no siblings to its right.

tsk_id_t *num_children#

The number of children of node u is num_children[u].
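The left_child and right_sib arrays are enough to visit every child of a node, and therefore to traverse a whole subtree. The following sketch shows a recursive preorder traversal over plain arrays laid out like the members described above. It makes no tskit calls: preorder_from is a hypothetical helper written for this illustration and NULL_ID stands in for TSK_NULL.

```c
#include <stddef.h>
#include <stdint.h>

#define NULL_ID (-1) /* stand-in for TSK_NULL */

/* Hypothetical helper, not part of tskit: append the nodes of the subtree
 * rooted at u to `out` in preorder, visiting children left to right.
 * `out` must have room for every node in the subtree; `num_out` is the
 * number of entries already written. Returns the updated count. */
size_t
preorder_from(const int32_t *left_child, const int32_t *right_sib, int32_t u,
    int32_t *out, size_t num_out)
{
    int32_t v;

    out[num_out++] = u;
    /* Walk the sibling chain: first child, then each sibling to its right. */
    for (v = left_child[u]; v != NULL_ID; v = right_sib[v]) {
        num_out = preorder_from(left_child, right_sib, v, out, num_out);
    }
    return num_out;
}
```

Recursion is used here only for brevity. The library's own tsk_tree_preorder() and tsk_tree_preorder_from() should be preferred in real code, and the number of iterations of the inner loop for a node u should agree with num_children[u].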

tsk_id_t *edge#

Array of edge ids where edge[u] is the edge that encodes the relationship between the child node u and its parent. Equal to TSK_NULL if node u is a root, virtual root or is not a node in the current tree.

tsk_size_t num_edges#

The total number of edges defining the topology of this tree. This is equal to the number of tree sequence edges that intersect with the tree’s genomic interval.

struct tsk_tree_t.[anonymous] interval#

Left and right coordinates of the genomic interval that this tree covers. The left coordinate is inclusive and the right coordinate exclusive.

Example:

tsk_tree_t tree;
int ret;
// initialise etc
ret = tsk_tree_first(&tree);
// Check for error
assert(ret == TSK_TREE_OK);
printf("Coordinates covered by first tree are left=%f, right=%f\n",
    tree.interval.left, tree.interval.right);

tsk_id_t index#

The index of this tree in the tree sequence.

This attribute provides the zero-based index of the tree represented by the current state of the struct within the parent tree sequence. For example, immediately after we call tsk_tree_first(&tree), tree.index will be zero, and after we call tsk_tree_last(&tree), tree.index will be the number of trees - 1 (see tsk_treeseq_get_num_trees()). When the tree is in the null state (immediately after initialisation, or after, e.g., calling tsk_tree_prev() on the first tree) the value of the index is -1.

Lifecycle#

int tsk_tree_init(tsk_tree_t *self, const tsk_treeseq_t *tree_sequence, tsk_flags_t options)#

Initialises the tree by allocating internal memory and associating with the specified tree sequence.

This must be called before any operations are performed on the tree.

The specified tree sequence object must be initialised, and must be valid for the full lifetime of this tree.

See the API structure for details on how objects are initialised and freed.

The options parameter is provided to support future expansions of the API. A number of undocumented internal features are controlled via this parameter, and it must be set to 0 to ensure that operations work as expected and for compatibility with future versions of tskit.

Parameters:
  • self – A pointer to an uninitialised tsk_tree_t object.

  • tree_sequence – A pointer to an initialised tsk_treeseq_t object.

  • options – Allocation time options. Must be 0, or behaviour is undefined.

Returns:

Return 0 on success or a negative value on failure.

int tsk_tree_free(tsk_tree_t *self)#

Free the internal memory for the specified tree.

Parameters:
  • self – A pointer to an initialised tsk_tree_t object.

Returns:

Always returns 0.

int tsk_tree_copy(const tsk_tree_t *self, tsk_tree_t *dest, tsk_flags_t options)#

Copies the state of this tree into the specified destination.

By default (options = 0) the method initialises the specified destination tree by calling tsk_tree_init(). If the destination is already initialised, the TSK_NO_INIT option should be supplied to avoid leaking memory. If TSK_NO_INIT is supplied and the tree sequence associated with the dest tree is not equal to the tree sequence associated with self, an error is raised.

The destination tree will keep a reference to the tree sequence object associated with the source tree, and this tree sequence must be valid for the full lifetime of the destination tree.

Options

If TSK_NO_INIT is not specified, options for tsk_tree_init() can be provided and will be passed on.

Parameters:
  • self – A pointer to an initialised tsk_tree_t object.

  • dest – A pointer to a tsk_tree_t object. If the TSK_NO_INIT option is specified, this must be an initialised tree. If not, it must be an uninitialised tree.

  • options – Copy and allocation time options. See the notes above for details.

Returns:

Return 0 on success or a negative value on failure.

Null state#

Trees are initially in a “null state” where each sample is a root and there are no branches. The index of a tree in the null state is -1.

We must call one of the seeking methods to make the state of the tree object correspond to a particular tree in the sequence.

Seeking#

When we are examining many trees along a tree sequence, we usually allocate a single tsk_tree_t object and update its state. This allows us to efficiently transform the state of a tree into nearby trees, using the underlying succinct tree sequence data structure.

The simplest example is to visit trees left-to-right along the genome:

 1  int
 2  visit_trees(const tsk_treeseq_t *ts)
 3  {
 4      tsk_tree_t tree;
 5      int ret;
 6
 7      ret = tsk_tree_init(&tree, ts, 0); /* ts is already a pointer */
 8      if (ret != 0) {
 9          goto out;
10      }
11      for (ret = tsk_tree_first(&tree); ret == TSK_TREE_OK; ret = tsk_tree_next(&tree)) {
12          printf("\ttree %lld covers interval left=%f right=%f\n",
13              (long long) tree.index, tree.interval.left, tree.interval.right);
14      }
15      if (ret != 0) {
16          goto out;
17      }
18      // Do other things in the function...
19  out:
20      tsk_tree_free(&tree);
21      return ret;
22  }

In this example we first initialise a tsk_tree_t object, associating it with the input tree sequence. We then iterate over the trees along the sequence using a for loop, with the ret variable controlling iteration. The usage of ret here follows a slightly different pattern to other functions in the tskit C API (see the Error handling section). The interaction between error handling and states of the tree object here is somewhat subtle, and is worth explaining in detail.

After successful initialisation (after line 10), the tree is in the null state where all samples are roots. The for loop begins by calling tsk_tree_first() which transforms the state of the tree into the first (leftmost) tree in the sequence. If this operation is successful, tsk_tree_first() returns TSK_TREE_OK. We then check the value of ret in the loop condition to see if it is equal to TSK_TREE_OK and execute the loop body for the first tree in the sequence.

On completing the loop body for the first tree in the sequence, we then execute the for loop increment operation, which calls tsk_tree_next() and assigns the returned value to ret. This function efficiently transforms the current state of tree so that it represents the next tree along the genome, and returns TSK_TREE_OK if the operation succeeds. When tsk_tree_next() is called on the last tree in the sequence, the state of tree is set back to the null state and the return value is 0.

Thus, the loop on lines 11-14 can exit in two ways:

  1. Either we successfully iterate over all trees in the sequence and ret has the value 0 at line 15; or

  2. An error occurs during tsk_tree_first() or tsk_tree_next(), and ret contains a negative value.

Warning

It is vital that you check the value of ret immediately after the loop exits like we do here at line 15, or errors can be silently lost. (Although it’s redundant here, as we don’t do anything else in the function.)

See also

See the examples section for more examples of sequential seeking, including an example of using tsk_tree_last() and tsk_tree_prev() to iterate from right-to-left.

Note

Seeking functions tsk_tree_first(), tsk_tree_last(), tsk_tree_next(), tsk_tree_prev(), and tsk_tree_seek() can be called in any order and from any non-error state.

int tsk_tree_first(tsk_tree_t *self)#

Seek to the first tree in the sequence.

Set the state of this tree to reflect the first tree in the parent tree sequence.

Parameters:
  • self – A pointer to an initialised tsk_tree_t object.

Returns:

Return TSK_TREE_OK on success; or a negative value if an error occurs.

int tsk_tree_last(tsk_tree_t *self)#

Seek to the last tree in the sequence.

Set the state of this tree to reflect the last tree in the parent tree sequence.

Parameters:
  • self – A pointer to an initialised tsk_tree_t object.

Returns:

Return TSK_TREE_OK on success; or a negative value if an error occurs.

int tsk_tree_next(tsk_tree_t *self)#

Seek to the next tree in the sequence.

Set the state of this tree to reflect the next tree in the parent tree sequence. If the index of the current tree is j, then after this operation the index will be j + 1.

Calling tsk_tree_next() on a tree in the null state is equivalent to calling tsk_tree_first().

Calling tsk_tree_next() on the last tree in the sequence will transform it into the null state (equivalent to calling tsk_tree_clear()).

Please see the Tree iteration examples for recommended usage.

Parameters:
  • self – A pointer to an initialised tsk_tree_t object.

Returns:

Return TSK_TREE_OK on successfully transforming to a non-null tree; 0 on successfully transforming into the null tree; or a negative value if an error occurs.

int tsk_tree_prev(tsk_tree_t *self)#

Seek to the previous tree in the sequence.

Set the state of this tree to reflect the previous tree in the parent tree sequence. If the index of the current tree is j, then after this operation the index will be j - 1.

Calling tsk_tree_prev() on a tree in the null state is equivalent to calling tsk_tree_last().

Calling tsk_tree_prev() on the first tree in the sequence will transform it into the null state (equivalent to calling tsk_tree_clear()).

Please see the Tree iteration examples for recommended usage.

Parameters:
  • self – A pointer to an initialised tsk_tree_t object.

Returns:

Return TSK_TREE_OK on successfully transforming to a non-null tree; 0 on successfully transforming into the null tree; or a negative value if an error occurs.
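The state transitions documented for tsk_tree_next() and tsk_tree_prev() can be summarised with a small model. The code below is a hedged sketch and deliberately does not use tskit: mock_tree_t, mock_tree_next, mock_tree_prev and count_trees_backwards are hypothetical names, MOCK_TREE_OK stands in for TSK_TREE_OK, and only the tree index is modelled (with -1 meaning the null state). Error returns are not modelled.

```c
#define MOCK_TREE_OK 1 /* stand-in for TSK_TREE_OK */

/* Hypothetical stand-in for the seek state of a tsk_tree_t: only the tree
 * index is modelled, with -1 meaning the null state. */
typedef struct {
    int index;
    int num_trees;
} mock_tree_t;

/* Models the documented behaviour of tsk_tree_next(). */
int
mock_tree_next(mock_tree_t *tree)
{
    if (tree->index + 1 < tree->num_trees) {
        tree->index++; /* from the null state (-1) this lands on tree 0 */
        return MOCK_TREE_OK;
    }
    tree->index = -1; /* stepped past the last tree: back to the null state */
    return 0;
}

/* Models the documented behaviour of tsk_tree_prev(). */
int
mock_tree_prev(mock_tree_t *tree)
{
    if (tree->index == -1 && tree->num_trees > 0) {
        tree->index = tree->num_trees - 1; /* null state: seek to last tree */
        return MOCK_TREE_OK;
    }
    if (tree->index > 0) {
        tree->index--;
        return MOCK_TREE_OK;
    }
    tree->index = -1; /* stepped before the first tree: null state */
    return 0;
}

/* Visits every tree right-to-left and returns how many were seen, using the
 * same loop shape as the left-to-right example in the Seeking section. */
int
count_trees_backwards(mock_tree_t *tree)
{
    int ret;
    int count = 0;

    tree->index = -1; /* start from the null state */
    for (ret = mock_tree_prev(tree); ret == MOCK_TREE_OK;
         ret = mock_tree_prev(tree)) {
        count++;
    }
    return ret == 0 ? count : ret;
}
```

With the real API the same loop would call tsk_tree_last() and tsk_tree_prev(), and the value of ret must still be checked for a negative error code once the loop exits.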

int tsk_tree_clear(tsk_tree_t *self)#

Set the tree into the null state.

Transform this tree into the null state.

Parameters:
  • self – A pointer to an initialised tsk_tree_t object.

Returns:

Return 0 on success or a negative value on failure.

int tsk_tree_seek(tsk_tree_t *self, double position, tsk_flags_t options)#

Seek to a particular position on the genome.

Set the state of this tree to reflect the tree in the parent tree sequence covering the specified position. That is, on success we will have tree.interval.left <= position < tree.interval.right.

Seeking to a position currently covered by the tree is a constant time operation.

Parameters:
  • self – A pointer to an initialised tsk_tree_t object.

  • position – The position in genome coordinates

  • options – Seek options. Currently unused. Set to 0 for compatibility with future versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_tree_seek_index(tsk_tree_t *self, tsk_id_t tree, tsk_flags_t options)#

Seek to a specific tree in a tree sequence.

Set the state of this tree to reflect the tree in the parent tree sequence with the specified index, which must satisfy 0 <= tree < num_trees.

Parameters:
  • self – A pointer to an initialised tsk_tree_t object.

  • tree – The target tree index.

  • options – Seek options. Currently unused. Set to 0 for compatibility with future versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

TSK_TREE_OK 1#

Value returned by seeking methods when they have successfully seeked to a non-null tree.

Tree queries#

tsk_size_t tsk_tree_get_num_roots(const tsk_tree_t *self)#

Returns the number of roots in this tree.

See the Roots section for more information on how the roots of a tree are defined.

Parameters:
  • self – A pointer to an initialised tsk_tree_t object.

Returns:

Returns the number of roots in this tree.

tsk_id_t tsk_tree_get_left_root(const tsk_tree_t *self)#

Returns the leftmost root in this tree.

See the Roots section for more information on how the roots of a tree are defined.

This function is equivalent to tree.left_child[tree.virtual_root].

Parameters:
  • self – A pointer to an initialised tsk_tree_t object.

Returns:

Returns the leftmost root in the tree.

tsk_id_t tsk_tree_get_right_root(const tsk_tree_t *self)#

Returns the rightmost root in this tree.

See the Roots section for more information on how the roots of a tree are defined.

This function is equivalent to tree.right_child[tree.virtual_root].

Parameters:
  • self – A pointer to an initialised tsk_tree_t object.

Returns:

Returns the rightmost root in the tree.
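Since the roots are the children of the virtual root, they can be enumerated by starting at the leftmost root and following right_sib. The sketch below demonstrates this on plain arrays. It is not tskit code: collect_roots is a hypothetical helper, NULL_ID stands in for TSK_NULL, and the arrays are assumed to have one extra entry at the end for the virtual root.

```c
#include <stddef.h>
#include <stdint.h>

#define NULL_ID (-1) /* stand-in for TSK_NULL */

/* Hypothetical helper, not part of tskit: write the roots of a tree to
 * `roots` in left-to-right order by walking the children of the virtual
 * root. `roots` must have room for every root. Returns the number found. */
size_t
collect_roots(const int32_t *left_child, const int32_t *right_sib,
    int32_t virtual_root, int32_t *roots)
{
    size_t num_roots = 0;
    int32_t u;

    /* left_child[virtual_root] is the leftmost root. */
    for (u = left_child[virtual_root]; u != NULL_ID; u = right_sib[u]) {
        roots[num_roots++] = u;
    }
    return num_roots;
}
```

With a real tsk_tree_t the loop would start from tsk_tree_get_left_root(&tree) and step with tree.right_sib[u]; the count should then match tsk_tree_get_num_roots().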

int tsk_tree_get_sites(const tsk_tree_t *self, const tsk_site_t **sites, tsk_size_t *sites_length)#

Get the list of sites for this tree.

Gets the list of tsk_site_t objects in the parent tree sequence for which the position lies within this tree’s genomic interval.

The memory pointed to by the sites parameter is managed by the tsk_tree_t object and must not be altered or freed by client code.

static void
print_sites(const tsk_tree_t *tree)
{
    int ret;
    tsk_size_t j, num_sites;
    const tsk_site_t *sites;

    ret = tsk_tree_get_sites(tree, &sites, &num_sites);
    check_tsk_error(ret);
    for (j = 0; j < num_sites; j++) {
        printf("position = %f\n", sites[j].position);
    }
}

This is a constant time operation.

Parameters:
  • self – A pointer to a tsk_tree_t object.

  • sites – The destination pointer for the list of sites.

  • sites_length – A pointer to a tsk_size_t value in which the number of sites is stored.

Returns:

0 on success or a negative value on failure.

tsk_size_t tsk_tree_get_size_bound(const tsk_tree_t *self)#

Return an upper bound on the number of nodes reachable from the roots of this tree.

This function provides an upper bound on the number of nodes that can be reached in tree traversals, and is intended to be used for memory allocation purposes. If num_nodes is the number of nodes visited in a tree traversal from the virtual root (e.g., tsk_tree_preorder_from(tree, tree->virtual_root, nodes, &num_nodes)), the bound N returned here is guaranteed to be greater than or equal to num_nodes.

Warning

The precise value returned is not defined and should not be depended on, as it may change from version-to-version.

Parameters:
  • self – A pointer to a tsk_tree_t object.

Returns:

An upper bound on the number of nodes reachable from the roots of this tree, or zero if this tree has not been initialised.

void tsk_tree_print_state(const tsk_tree_t *self, FILE *out)#

Print out the state of this tree to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters:
  • self – A pointer to a tsk_tree_t object.

  • out – The stream to write the summary to.

Node queries#

int tsk_tree_get_parent(const tsk_tree_t *self, tsk_id_t u, tsk_id_t *parent)#

Returns the parent of the specified node.

Equivalent to tree.parent[u] with bounds checking for the node u. Performance sensitive code which can guarantee that the node u is valid should use the direct array access in preference to this method.

Parameters:
  • self – A pointer to a tsk_tree_t object.

  • u – The tree node.

  • parent – A tsk_id_t pointer to store the returned parent node.

Returns:

0 on success or a negative value on failure.

int tsk_tree_get_time(const tsk_tree_t *self, tsk_id_t u, double *ret_time)#

Returns the time of the specified node.

Equivalent to tables->nodes.time[u] with bounds checking for the node u. Performance sensitive code which can guarantee that the node u is valid should use the direct array access in preference to this method, for example:

static void
print_times(const tsk_tree_t *tree)
{
    int ret;
    tsk_size_t num_nodes, j;
    const double *node_time = tree->tree_sequence->tables->nodes.time;
    tsk_id_t *nodes = malloc(tsk_tree_get_size_bound(tree) * sizeof(*nodes));

    if (nodes == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    ret = tsk_tree_preorder(tree, nodes, &num_nodes);
    check_tsk_error(ret);
    for (j = 0; j < num_nodes; j++) {
        printf("time = %f\n", node_time[nodes[j]]);
    }
    free(nodes);
}

Parameters:
  • self – A pointer to a tsk_tree_t object.

  • u – The tree node.

  • ret_time – A double pointer to store the returned node time.

Returns:

0 on success or a negative value on failure.

int tsk_tree_get_depth(const tsk_tree_t *self, tsk_id_t u, int *ret_depth)#

Return the number of nodes on the path from the specified node to root.

Return the number of nodes on the path from u to root, not including u. The depth of a root is therefore zero.

As a special case, the depth of the virtual root is defined as -1.

Parameters:
  • self – A pointer to a tsk_tree_t object.

  • u – The tree node.

  • ret_depth – An int pointer to store the returned node depth.

Returns:

0 on success or a negative value on failure.
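The depth definition above can be illustrated without the library. The following is a minimal sketch that assumes a plain int parent array in which -1 (the hypothetical NO_NODE) marks a root; node_depth is an illustrative name, not a tskit function, and the virtual root special case is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for TSK_NULL: marks "no parent" in the hypothetical parent array. */
#define NO_NODE (-1)

/* Count the nodes on the path from u to its root, not including u itself.
 * parent[x] is the parent of node x, or NO_NODE if x is a root. */
static int
node_depth(const int *parent, size_t num_nodes, int u)
{
    int depth = 0;

    assert(u >= 0 && (size_t) u < num_nodes);
    while (parent[u] != NO_NODE) {
        u = parent[u];
        depth++;
    }
    return depth;
}
```

With this definition a root has depth zero, and each step towards the leaves adds one.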

int tsk_tree_get_branch_length(const tsk_tree_t *self, tsk_id_t u, double *ret_branch_length)#

Return the length of the branch ancestral to the specified node.

Return the length of the branch ancestral to the specified node. Branch length is defined as the difference between the time of a node's parent and the time of the node itself. The branch length of a root is zero.

Parameters:
  • self – A pointer to a tsk_tree_t object.

  • u – The tree node.

  • ret_branch_length – A double pointer to store the returned branch length.

Returns:

0 on success or a negative value on failure.
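As a rough illustration of this definition, the sketch below computes a branch length from plain parent and time arrays. The arrays, the NO_NODE marker and the function name are assumptions made for the example; this is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for TSK_NULL: marks "no parent" in the hypothetical parent array. */
#define NO_NODE (-1)

/* Length of the branch above u: time of the parent minus time of u.
 * A root has no ancestral branch, so its branch length is zero. */
static double
branch_length(const int *parent, const double *time, size_t num_nodes, int u)
{
    assert(u >= 0 && (size_t) u < num_nodes);
    if (parent[u] == NO_NODE) {
        return 0.0;
    }
    return time[parent[u]] - time[u];
}
```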

int tsk_tree_get_total_branch_length(const tsk_tree_t *self, tsk_id_t u, double *ret_tbl)#

Computes the sum of the lengths of all branches reachable from the specified node, or from all roots if u=TSK_NULL.

Return the total branch length in a particular subtree or of the entire tree. If the specified node is TSK_NULL (or the virtual root), the sum of the lengths of all branches reachable from the roots is returned. Branch length is defined as the difference between the time of a node's parent and the time of the node itself. The branch length of a root is zero.

Note that the branch ancestral to the specified node is itself not included in the sum, so that, e.g., the total branch length of a leaf node is zero.

Parameters:
  • self – A pointer to a tsk_tree_t object.

  • u – The root of the subtree of interest, or TSK_NULL to return the total branch length of the tree.

  • ret_tbl – A double pointer to store the returned total branch length.

Returns:

0 on success or a negative value on failure.
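The following sketch shows one simple (and deliberately inefficient) way to compute this quantity from plain parent and time arrays. The array layout, NO_NODE and the function name are assumptions for illustration only; the library is free to use a different algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for TSK_NULL: marks "no parent", and also "whole tree" for u. */
#define NO_NODE (-1)

/* Sum the branch lengths of all nodes strictly below u. If u is NO_NODE,
 * every branch in the tree is counted. The branch above u is never included. */
static double
total_branch_length(const int *parent, const double *time, size_t num_nodes, int u)
{
    double total = 0.0;
    size_t j;
    int v;

    assert(u == NO_NODE || (u >= 0 && (size_t) u < num_nodes));
    for (j = 0; j < num_nodes; j++) {
        if (parent[j] == NO_NODE) {
            continue; /* roots and nodes outside the tree have no branch */
        }
        /* Climb from the parent of j until we meet u or run off the top. */
        v = parent[j];
        while (v != NO_NODE && v != u) {
            v = parent[v];
        }
        if (v == u) {
            total += time[parent[j]] - time[j];
        }
    }
    return total;
}
```

Each node climbs towards its root, so the cost grows with the number of nodes times the tree height; a traversal of the subtree would avoid the repeated climbs.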

int tsk_tree_get_num_samples(const tsk_tree_t *self, tsk_id_t u, tsk_size_t *ret_num_samples)#

Counts the number of samples in the subtree rooted at a node.

Returns the number of samples descending from a particular node, including the node itself.

This is a constant time operation.

Parameters:
  • self – A pointer to a tsk_tree_t object.

  • u – The tree node.

  • ret_num_samples – A tsk_size_t pointer to store the returned number of samples.

Returns:

0 on success or a negative value on failure.

int tsk_tree_get_mrca(const tsk_tree_t *self, tsk_id_t u, tsk_id_t v, tsk_id_t *mrca)#

Compute the most recent common ancestor of two nodes.

If two nodes do not share a common ancestor in the current tree, the MRCA node is TSK_NULL.

Parameters:
  • self – A pointer to a tsk_tree_t object.

  • u – A tree node.

  • v – A tree node.

  • mrca – A tsk_id_t pointer to store the returned most recent common ancestor node.

Returns:

0 on success or a negative value on failure.
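To make the MRCA semantics concrete, here is a minimal sketch over plain parent and time arrays. It assumes that every parent is strictly older than its children and uses a hypothetical NO_NODE marker in place of TSK_NULL; it is an illustration of the idea, not necessarily how the library computes the result.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for TSK_NULL: marks "no parent" and "no common ancestor". */
#define NO_NODE (-1)

/* Repeatedly move the younger of the two nodes up to its parent. Because
 * parents are strictly older than children, the two paths meet at the most
 * recent common ancestor, or one of them runs off the top of its tree. */
static int
find_mrca(const int *parent, const double *time, int u, int v)
{
    assert(parent != NULL && time != NULL);
    while (u != v && u != NO_NODE && v != NO_NODE) {
        if (time[u] < time[v]) {
            u = parent[u];
        } else {
            v = parent[v];
        }
    }
    return u == v ? u : NO_NODE;
}
```

When the two nodes sit under different roots, one walk reaches NO_NODE before the paths meet and NO_NODE is returned.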

bool tsk_tree_is_descendant(const tsk_tree_t *self, tsk_id_t u, tsk_id_t v)#

Returns true if u is a descendant of v.

Returns true if u and v are both valid nodes in the tree sequence and v lies on the path from u to root, and false otherwise.

Any node is a descendant of itself.

Parameters:
  • self – A pointer to a tsk_tree_t object.

  • u – The descendant node.

  • v – The ancestral node.

Returns:

true if u is a descendant of v, and false otherwise.
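The rule above (v lies on the path from u to root, and any node is a descendant of itself) can be sketched with a plain parent array. The array, NO_NODE and the function name are assumptions for this example rather than part of the tskit API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for TSK_NULL: marks "no parent" in the hypothetical parent array. */
#define NO_NODE (-1)

/* Return true if v is found on the path from u up to its root.
 * Out-of-range node ids give false instead of an error. */
static bool
node_is_descendant(const int *parent, size_t num_nodes, int u, int v)
{
    if (u < 0 || v < 0 || (size_t) u >= num_nodes || (size_t) v >= num_nodes) {
        return false;
    }
    while (u != NO_NODE && u != v) {
        u = parent[u];
    }
    return u == v;
}
```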

Traversal orders#

int tsk_tree_preorder(const tsk_tree_t *self, tsk_id_t *nodes, tsk_size_t *num_nodes)#

Fill an array with the nodes of this tree in preorder.

Populate an array with the nodes in this tree in preorder. The array must be pre-allocated and be sufficiently large to hold the array of nodes visited. The recommended approach is to use the tsk_tree_get_size_bound() function, as in the following example:

static void
print_preorder(tsk_tree_t *tree)
{
    int ret;
    tsk_size_t num_nodes, j;
    tsk_id_t *nodes = malloc(tsk_tree_get_size_bound(tree) * sizeof(*nodes));

    if (nodes == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    ret = tsk_tree_preorder(tree, nodes, &num_nodes);
    check_tsk_error(ret);
    for (j = 0; j < num_nodes; j++) {
        printf("Visit preorder %lld\n", (long long) nodes[j]);
    }
    free(nodes);
}

See also

See the Tree traversals section for more examples.

Parameters:
  • self – A pointer to a tsk_tree_t object.

  • nodes – The tsk_id_t array to store nodes in. See notes above for details.

  • num_nodes – A pointer to a tsk_size_t value where we store the number of nodes in the traversal.

Returns:

0 on success or a negative value on failure.

int tsk_tree_preorder_from(const tsk_tree_t *self, tsk_id_t root, tsk_id_t *nodes, tsk_size_t *num_nodes)#

Fill an array with the nodes of this tree in preorder, starting from a particular node.

As for tsk_tree_preorder() but starting the traversal at a particular node (which will be the first node in the traversal list). The virtual root is a valid input for this function and will be treated like any other tree node. The value -1 is a special case, in which we visit all nodes reachable from the roots, and is equivalent to calling tsk_tree_preorder().

See tsk_tree_preorder() for details of the requirements for the nodes array.

Parameters:
  • self – A pointer to a tsk_tree_t object.

  • root – The root of the subtree to traverse, or -1 to visit all nodes.

  • nodes – The tsk_id_t array to store nodes in.

  • num_nodes – A pointer to a tsk_size_t value where we store the number of nodes in the traversal.

Returns:

0 on success or a negative value on failure.

int tsk_tree_postorder(const tsk_tree_t *self, tsk_id_t *nodes, tsk_size_t *num_nodes)#

Fill an array with the nodes of this tree in postorder.

Populate an array with the nodes in this tree in postorder. The array must be pre-allocated and be sufficiently large to hold the array of nodes visited. The recommended approach is to use the tsk_tree_get_size_bound() function, as in the following example:

static void
print_postorder(tsk_tree_t *tree)
{
    int ret;
    tsk_size_t num_nodes, j;
    tsk_id_t *nodes = malloc(tsk_tree_get_size_bound(tree) * sizeof(*nodes));

    if (nodes == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    ret = tsk_tree_postorder(tree, nodes, &num_nodes);
    check_tsk_error(ret);
    for (j = 0; j < num_nodes; j++) {
        printf("Visit postorder %lld\n", (long long) nodes[j]);
    }
    free(nodes);
}

See also

See the Tree traversals section for more examples.

Parameters:
  • self – A pointer to a tsk_tree_t object.

  • nodes – The tsk_id_t array to store nodes in. See notes above for details.

  • num_nodes – A pointer to a tsk_size_t value where we store the number of nodes in the traversal.

Returns:

0 on success or a negative value on failure.

int tsk_tree_postorder_from(const tsk_tree_t *self, tsk_id_t root, tsk_id_t *nodes, tsk_size_t *num_nodes)#

Fill an array with the nodes of this tree in postorder, starting from a particular node.

As for tsk_tree_postorder() but starting the traversal at a particular node (which will be the last node in the traversal list). The virtual root is a valid input for this function and will be treated like any other tree node. The value -1 is a special case, in which we visit all nodes reachable from the roots, and is equivalent to calling tsk_tree_postorder().

See tsk_tree_postorder() for details of the requirements for the nodes array.

Parameters:
  • self – A pointer to a tsk_tree_t object.

  • root – The root of the subtree to traverse, or -1 to visit all nodes.

  • nodes – The tsk_id_t array to store nodes in. See tsk_tree_postorder() for more details.

  • num_nodes – A pointer to a tsk_size_t value where we store the number of nodes in the traversal.

Returns:

0 on success or a negative value on failure.

Low-level sorting#

In some highly performance sensitive cases it can be useful to have more control over the process of sorting tables. This low-level API allows a user to provide their own edge sorting function. This can be useful, for example, to use parallel sorting algorithms, or to take advantage of the more efficient sorting procedures available in C++. It is the user’s responsibility to ensure that the edge sorting requirements are fulfilled by this function.
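As a sketch of what such an ordering can look like, the comparator below sorts a hypothetical edge_row_t array with qsort. The key order (time[parent], child, left) follows the edge error descriptions in this document; using the parent id as an extra tie-break, so that each parent's edges stay contiguous, is an assumption of this example. The struct and function names are illustrative and are not tskit types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flattened edge row; parent_time caches time[parent]. */
typedef struct {
    double left;
    double right;
    int parent;
    int child;
    double parent_time;
} edge_row_t;

/* Three-way comparisons that avoid overflow and work for doubles. */
#define CMP(a, b) (((a) > (b)) - ((a) < (b)))

/* Order by (parent_time, parent, child, left). */
static int
cmp_edge_row(const void *a, const void *b)
{
    const edge_row_t *ea = (const edge_row_t *) a;
    const edge_row_t *eb = (const edge_row_t *) b;
    int ret = CMP(ea->parent_time, eb->parent_time);

    if (ret == 0) {
        ret = CMP(ea->parent, eb->parent);
    }
    if (ret == 0) {
        ret = CMP(ea->child, eb->child);
    }
    if (ret == 0) {
        ret = CMP(ea->left, eb->left);
    }
    return ret;
}

static void
sort_edge_rows(edge_row_t *rows, size_t num_rows)
{
    assert(rows != NULL || num_rows == 0);
    qsort(rows, num_rows, sizeof(*rows), cmp_edge_row);
}
```

A replacement sorting routine (for example a parallel sort) would need to produce the same final order as this comparator, whatever algorithm it uses internally.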

Todo

Create an idiomatic C++11 example where we load a table collection file from argv[1], and sort the edges using std::sort, based on the example in tests/test_minimal_cpp.cpp. We can include this in the examples below, and link to it here.

struct _tsk_table_sorter_t#

Low-level table sorting method.

Public Members

tsk_table_collection_t *tables#

The input tables that are being sorted.

int (*sort_edges)(struct _tsk_table_sorter_t *self, tsk_size_t start)#

The edge sorting function. If set to NULL, edges are not sorted.

int (*sort_mutations)(struct _tsk_table_sorter_t *self)#

The mutation sorting function.

int (*sort_individuals)(struct _tsk_table_sorter_t *self)#

The individual sorting function.

void *user_data#

An opaque pointer for use by client code.

tsk_id_t *site_id_map#

Mapping from input site IDs to output site IDs.

int tsk_table_sorter_init(struct _tsk_table_sorter_t *self, tsk_table_collection_t *tables, tsk_flags_t options)#

Initialises the memory for the sorter object.

This must be called before any operations are performed on the table sorter and initialises all fields. The sort_edges function is set to the default method using qsort. The user_data field is set to NULL. This method supports the same options as tsk_table_collection_sort().

Parameters:
  • self – A pointer to an uninitialised tsk_table_sorter_t object.

  • tables – The table collection to sort.

  • options – Sorting options.

Returns:

Return 0 on success or a negative value on failure.

int tsk_table_sorter_run(struct _tsk_table_sorter_t *self, const tsk_bookmark_t *start)#

Runs the sort using the configured functions.

Runs the sorting process:

  1. Drop the table indexes.

  2. If the sort_edges function pointer is not NULL, run it. The first parameter to the called function will be a pointer to this tsk_table_sorter_t object. The second parameter will be the value start.edges. This specifies the offset at which sorting should start in the edge table. This offset is guaranteed to be within the bounds of the edge table.

  3. Sort the site table, building the mapping between site IDs in the current and sorted tables.

  4. Sort the mutation table, using the sort_mutations pointer.

If an error occurs during the execution of a user-supplied sorting function a non-zero value must be returned. This value will then be returned by tsk_table_sorter_run. The error return value should be chosen to avoid conflicts with tskit error codes.

See tsk_table_collection_sort() for details on the start parameter.

Parameters:
  • self – A pointer to a tsk_table_sorter_t object.

  • start – The position in the tables at which sorting starts.

Returns:

Return 0 on success or a negative value on failure.

int tsk_table_sorter_free(struct _tsk_table_sorter_t *self)#

Free the internal memory for the specified table sorter.

Parameters:
  • self – A pointer to an initialised tsk_table_sorter_t object.

Returns:

Always returns 0.

Decoding genotypes#

Obtaining genotypes for samples at specific sites is achieved via tsk_variant_t and its methods.

struct tsk_variant_t#

A variant at a specific site.

Used to generate the genotypes for a given set of samples at a given site.

Public Members

const tsk_treeseq_t *tree_sequence#

Unowned reference to the tree sequence of the variant.

tsk_site_t site#

The site this variant is currently decoded at.

const char **alleles#

Array of allele strings that the genotypes of the variant refer to. These are not NULL terminated; use allele_lengths when printing, for example: printf("%.*s", (int) var->allele_lengths[j], var->alleles[j]);

tsk_size_t *allele_lengths#

Lengths of the allele strings.
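Because the allele strings carry no terminator, any code that needs an ordinary C string has to copy with an explicit length. The helper below is a small sketch of that pattern using the standard "%.*s" precision form; format_allele is a hypothetical name, not part of tskit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Copy an unterminated allele of allele_length bytes into buf as a
 * terminated C string. Returns the snprintf result: the number of
 * characters the full allele needs, excluding the terminator. */
static int
format_allele(char *buf, size_t buf_size, const char *allele, size_t allele_length)
{
    assert(buf != NULL && buf_size > 0);
    /* The precision argument of %.*s must be an int, hence the cast. */
    return snprintf(buf, buf_size, "%.*s", (int) allele_length, allele);
}
```

In client code the last two arguments would typically be var->alleles[j] and var->allele_lengths[j] for a decoded variant.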

tsk_size_t num_alleles#

Length of the allele array.

bool has_missing_data#

If true, the genotypes of isolated nodes have been decoded to the “missing” genotype. If false, they are set to the ancestral state (in the absence of mutations above them).

int32_t *genotypes#

Array of genotypes for the current site.

tsk_size_t num_samples#

Number of samples.

tsk_id_t *samples#

Array of sample ids used.

int tsk_variant_init(tsk_variant_t *self, const tsk_treeseq_t *tree_sequence, const tsk_id_t *samples, tsk_size_t num_samples, const char **alleles, tsk_flags_t options)#

Initialises the variant by allocating the internal memory.

This must be called before any operations are performed on the variant. See the API structure for details on how objects are initialised and freed.

Parameters:
  • self – A pointer to an uninitialised tsk_variant_t object.

  • tree_sequence – A pointer to the tree sequence from which this variant will decode genotypes. No copy is taken, so this tree sequence must persist for the lifetime of the variant.

  • samples – Optional. Either NULL or an array of node ids of the samples that are to have their genotypes decoded. A copy of this array will be taken by the variant. If NULL then the samples from the tree sequence will be used.

  • num_samples – The number of ids in the samples array, ignored if samples is NULL.

  • alleles – Optional. Either NULL or an array of string alleles with a terminal NULL sentinel value. If specified, the genotypes will be decoded to match the index in this allele array. If NULL then alleles will be automatically determined from the mutations encountered.

  • options – Variant options. Either 0 or TSK_ISOLATED_NOT_MISSING which if specified indicates that isolated sample nodes should not be decoded as the “missing” state but as the ancestral state (or the state of any mutation above them).

Returns:

Return 0 on success or a negative value on failure.

int tsk_variant_restricted_copy(const tsk_variant_t *self, tsk_variant_t *other)#

Copies the state of this variant to another variant.

Copies the site, genotypes and alleles from this variant to another. Note that the other variant should be uninitialised as this method does not free any memory that the other variant owns. After copying other is frozen and this restricts it from being further decoded at any site. self remains unchanged.

Parameters:
  • self – A pointer to an initialised and decoded tsk_variant_t object.

  • other – A pointer to an uninitialised tsk_variant_t object.

Returns:

Return 0 on success or a negative value on failure.

int tsk_variant_decode(tsk_variant_t *self, tsk_id_t site_id, tsk_flags_t options)#

Decode the genotypes at the given site, storing them in this variant.

Decodes the genotypes for this variant’s samples, indexed to this variant’s alleles, at the specified site. This method is most efficient at decoding sites in-order, either forwards or backwards along the tree sequence. Resulting genotypes are stored in the genotypes member of this variant.

Parameters:
  • self – A pointer to an initialised tsk_variant_t object.

  • site_id – A valid site id for the tree sequence of this variant.

  • options – Bitwise option flags. Currently unused; should be set to zero to ensure compatibility with later versions of tskit.

Returns:

Return 0 on success or a negative value on failure.

int tsk_variant_free(tsk_variant_t *self)#

Free the internal memory for the specified variant.

Parameters:
  • self – A pointer to an initialised tsk_variant_t object.

Returns:

Always returns 0.

void tsk_variant_print_state(const tsk_variant_t *self, FILE *out)#

Print out the state of this variant to the specified stream.

This method is intended for debugging purposes and should not be used in production code. The format of the output should not be depended on and may change arbitrarily between versions.

Parameters:
  • self – A pointer to a tsk_variant_t object.

  • out – The stream to write the summary to.

Miscellaneous functions#

const char *tsk_strerror(int err)#

Return a description of the specified error.

The memory for the returned string is handled by the library and should not be freed by client code.

Parameters:
  • err – A tskit error code.

Returns:

A description of the error.

bool tsk_is_unknown_time(double val)#

Check if a number is TSK_UNKNOWN_TIME.

Unknown time values in tskit are represented by a particular NaN value. Since NaN values are not equal to each other by definition, a simple comparison like mutation.time == TSK_UNKNOWN_TIME will fail even if the mutation’s time is TSK_UNKNOWN_TIME. This function compares the underlying bit representation of a double value and returns true iff it is equal to the specific NaN value TSK_UNKNOWN_TIME.

Parameters:
  • val – The number to check

Returns:

true if the number is TSK_UNKNOWN_TIME else false
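The bit-comparison idea can be demonstrated without the library. The sketch below uses a hypothetical marker value (a quiet NaN with payload 1, chosen for illustration and not taken from the tskit headers) and assumes IEEE 754 doubles that are 64 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker bit pattern: a quiet NaN with a payload of 1. */
#define MARKER_BITS UINT64_C(0x7FF8000000000001)

/* Build the marker NaN by copying the bit pattern into a double. */
static double
marker_nan(void)
{
    uint64_t bits = MARKER_BITS;
    double value;

    assert(sizeof(value) == sizeof(bits));
    memcpy(&value, &bits, sizeof(value));
    return value;
}

/* Compare bit patterns, because NaN == NaN is always false. */
static bool
is_marker_nan(double value)
{
    uint64_t bits;

    memcpy(&bits, &value, sizeof(bits));
    return bits == MARKER_BITS;
}
```

An ordinary NaN, such as the result of dividing zero by zero, fails this check because its payload differs from the marker's, which is why the test distinguishes "unknown" from a NaN produced by a calculation.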

Function Specific Options#

Load and init#

TSK_LOAD_SKIP_TABLES (1 << 0)#

Skip reading tables, and only load top-level information.

TSK_LOAD_SKIP_REFERENCE_SEQUENCE (1 << 1)#

Do not load reference sequence.

TSK_TABLE_NO_METADATA (1 << 2)#

Do not allocate space to store metadata in this table. Operations attempting to add non-empty metadata to the table will fail with error TSK_ERR_METADATA_DISABLED.

TSK_TC_NO_EDGE_METADATA (1 << 3)#

Do not allocate space to store metadata in the edge table. Operations attempting to add non-empty metadata to the edge table will fail with error TSK_ERR_METADATA_DISABLED.

tsk_treeseq_init()#

TSK_TS_INIT_BUILD_INDEXES (1 << 0)#

If specified, edge indexes will be built and stored in the table collection when the tree sequence is initialised. Indexes are required for a valid tree sequence, and are not built by default for performance reasons.

tsk_treeseq_simplify(), tsk_table_collection_simplify()#

TSK_SIMPLIFY_FILTER_SITES (1 << 0)#

Remove sites from the output if there are no mutations that reference them.

TSK_SIMPLIFY_FILTER_POPULATIONS (1 << 1)#

Remove populations from the output if there are no nodes or migrations that reference them.

TSK_SIMPLIFY_FILTER_INDIVIDUALS (1 << 2)#

Remove individuals from the output if there are no nodes that reference them.

TSK_SIMPLIFY_NO_FILTER_NODES (1 << 7)#

Do not remove nodes from the output if there are no edges that reference them and do not reorder nodes so that the samples are nodes 0 to num_samples - 1. Note that this flag is negated compared to other filtering options because the default behaviour is to filter unreferenced nodes and reorder to put samples first.

TSK_SIMPLIFY_NO_UPDATE_SAMPLE_FLAGS (1 << 8)#

Do not update the sample status of nodes as a result of simplification.

TSK_SIMPLIFY_REDUCE_TO_SITE_TOPOLOGY (1 << 3)#

Reduce the topological information in the tables to the minimum necessary to represent the trees that contain sites. If there are zero sites this will result in zero output edges. When the number of sites is greater than zero, every tree in the output tree sequence will contain at least one site. For a given site, the topology of the tree containing that site will be identical (up to node ID remapping) to the topology of the corresponding tree in the input.

TSK_SIMPLIFY_KEEP_UNARY (1 << 4)#

By default simplify removes unary nodes (i.e., nodes with exactly one child) along the path from samples to root. If this option is specified such unary nodes will be preserved in the output.

TSK_SIMPLIFY_KEEP_INPUT_ROOTS (1 << 5)#

By default simplify removes all topology ancestral to the MRCAs of the samples. This option inserts edges from these MRCAs back to the roots of the input trees.

TSK_SIMPLIFY_KEEP_UNARY_IN_INDIVIDUALS (1 << 6)#

This acts like TSK_SIMPLIFY_KEEP_UNARY (and is mutually exclusive with that flag). It keeps unary nodes, but only if the unary node is referenced from an individual.
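Option flags are combined with bitwise OR, and mutually exclusive pairs can be detected with a mask. The sketch below uses local EX_ copies of the values documented in this section so that it compiles without tskit.h; the function name is hypothetical, and using uint32_t for the flags type is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of documented values, prefixed EX_ to avoid clashing with tskit. */
#define EX_SIMPLIFY_FILTER_SITES (1u << 0)
#define EX_SIMPLIFY_KEEP_UNARY (1u << 4)
#define EX_SIMPLIFY_KEEP_UNARY_IN_INDIVIDUALS (1u << 6)
#define EX_ERR_KEEP_UNARY_MUTUALLY_EXCLUSIVE (-1600)

/* Return 0 if the option set is acceptable, or a negative error code if
 * both of the mutually exclusive "keep unary" flags are set. */
static int
check_simplify_options(uint32_t options)
{
    const uint32_t both
        = EX_SIMPLIFY_KEEP_UNARY | EX_SIMPLIFY_KEEP_UNARY_IN_INDIVIDUALS;

    if ((options & both) == both) {
        return EX_ERR_KEEP_UNARY_MUTUALLY_EXCLUSIVE;
    }
    return 0;
}

/* Example usage: unrelated flags combine freely; the exclusive pair does not. */
static void
example_usage(void)
{
    assert(check_simplify_options(0) == 0);
    assert(check_simplify_options(EX_SIMPLIFY_FILTER_SITES | EX_SIMPLIFY_KEEP_UNARY) == 0);
    assert(check_simplify_options(
               EX_SIMPLIFY_KEEP_UNARY | EX_SIMPLIFY_KEEP_UNARY_IN_INDIVIDUALS)
           == EX_ERR_KEEP_UNARY_MUTUALLY_EXCLUSIVE);
}
```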

tsk_table_collection_check_integrity()#

TSK_CHECK_EDGE_ORDERING (1 << 0)#

Check edge ordering constraints for a tree sequence.

TSK_CHECK_SITE_ORDERING (1 << 1)#

Check that sites are in non-decreasing position order.

TSK_CHECK_SITE_DUPLICATES (1 << 2)#

Check for any duplicate site positions.

TSK_CHECK_MUTATION_ORDERING (1 << 3)#

Check constraints on the ordering of mutations. Any non-null mutation parents and known times are checked for ordering constraints.

TSK_CHECK_INDIVIDUAL_ORDERING (1 << 4)#

Check individual parents are before children, where specified.

TSK_CHECK_MIGRATION_ORDERING (1 << 5)#

Check migrations are ordered by time.

TSK_CHECK_INDEXES (1 << 6)#

Check that the table indexes exist, and contain valid edge references.

TSK_CHECK_TREES (1 << 7)#

All checks needed to define a valid tree sequence. Note that this implies all of the above checks.

TSK_NO_CHECK_POPULATION_REFS (1 << 12)#

Do not check integrity of references to populations. This can be safely combined with the other checks.

tsk_table_collection_clear()#

TSK_CLEAR_METADATA_SCHEMAS (1 << 0)#

Additionally clear the table metadata schemas

TSK_CLEAR_TS_METADATA_AND_SCHEMA (1 << 1)#

Additionally clear the tree-sequence metadata and schema

TSK_CLEAR_PROVENANCE (1 << 2)#

Additionally clear the provenance table

tsk_table_collection_copy()#

TSK_COPY_FILE_UUID (1 << 0)#

Copy the file UUID. By default this is not copied.

All equality functions#

TSK_CMP_IGNORE_TS_METADATA (1 << 0)#

Do not include the top-level tree sequence metadata and metadata schemas in the comparison.

TSK_CMP_IGNORE_PROVENANCE (1 << 1)#

Do not include the provenance table in comparison.

TSK_CMP_IGNORE_METADATA (1 << 2)#

Do not include metadata when comparing the table collections. This includes both the top-level tree sequence metadata as well as the metadata for each of the tables (i.e., TSK_CMP_IGNORE_TS_METADATA is implied). All metadata schemas are also ignored.

TSK_CMP_IGNORE_TIMESTAMPS (1 << 3)#

Do not include the timestamp information when comparing the provenance tables. This has no effect if TSK_CMP_IGNORE_PROVENANCE is specified.

TSK_CMP_IGNORE_TABLES (1 << 4)#

Do not include any tables in the comparison, thus comparing only the top-level information of the table collections being compared.

TSK_CMP_IGNORE_REFERENCE_SEQUENCE (1 << 5)#

Do not include the reference sequence in the comparison.

tsk_table_collection_subset()#

TSK_SUBSET_NO_CHANGE_POPULATIONS (1 << 0)#

If this flag is provided, the population table will not be changed in any way.

TSK_SUBSET_KEEP_UNREFERENCED (1 << 1)#

If this flag is provided, then unreferenced sites, individuals, and populations will not be removed. If so, the site and individual tables will not be changed, and (unless TSK_SUBSET_NO_CHANGE_POPULATIONS is also provided) unreferenced populations will be placed last, in their original order.

tsk_table_collection_union()#

TSK_UNION_NO_CHECK_SHARED (1 << 0)#

By default, union checks that the portion of shared history between self and other, as implied by other_node_mapping, is indeed equivalent. It does so by subsetting both self and other on the equivalent nodes specified in other_node_mapping, and then checking for equality of the subsets. If this flag is specified, that check is skipped.

TSK_UNION_NO_ADD_POP (1 << 1)#

By default, all nodes new to self are assigned new populations. If this option is specified, nodes that are added to self will retain the population IDs they have in other.

Constants#

API Version#

TSK_VERSION_MAJOR 1#

The library major version. Incremented when breaking changes to the API or ABI are introduced. This includes any changes to the signatures of functions and the sizes and types of externally visible structs.

TSK_VERSION_MINOR 1#

The library minor version. Incremented when non-breaking backward-compatible changes to the API or ABI are introduced, e.g., the addition of a new function.

TSK_VERSION_PATCH 3#

The library patch version. Incremented when any changes not relevant to the API or ABI are introduced, e.g., internal refactors or bugfixes.
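Read together, these rules suggest a simple compatibility test for client code: the major version must match and the minor version must be at least the one the code was written against. The sketch below uses local EX_ copies of the values shown above; api_compatible is a hypothetical helper, not a tskit function.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the documented version values. */
#define EX_VERSION_MAJOR 1
#define EX_VERSION_MINOR 1
#define EX_VERSION_PATCH 3

/* True if a library at lib_major.lib_minor should provide everything that
 * code written against req_major.req_minor uses. The patch level is ignored
 * because patch releases do not change the API. */
static bool
api_compatible(int lib_major, int lib_minor, int req_major, int req_minor)
{
    return lib_major == req_major && lib_minor >= req_minor;
}

/* Example usage with the version values listed in this section. */
static void
example_usage(void)
{
    assert(api_compatible(EX_VERSION_MAJOR, EX_VERSION_MINOR, 1, 0));
    assert(!api_compatible(EX_VERSION_MAJOR, EX_VERSION_MINOR, 1, 2));
    assert(!api_compatible(EX_VERSION_MAJOR, EX_VERSION_MINOR, 2, 0));
}
```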

Common constants#

TSK_NODE_IS_SAMPLE 1u#

Used in node flags to indicate that a node is a sample node.

TSK_NULL ((tsk_id_t) -1)#

Null value used for cases such as absent id references.

TSK_MISSING_DATA (-1)#

Value used for missing data in genotype arrays.
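Code that tallies genotypes needs to treat this value separately from allele indexes. The following sketch counts alleles in a genotype array while skipping missing entries; EX_MISSING_DATA is a local copy of the documented value and count_alleles is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the documented missing data value. */
#define EX_MISSING_DATA (-1)

/* Tally how many samples carry each allele index into counts[0..num_alleles-1].
 * Returns the number of samples whose genotype is missing. */
static size_t
count_alleles(const int32_t *genotypes, size_t num_samples, size_t num_alleles,
    size_t *counts)
{
    size_t j, num_missing = 0;

    for (j = 0; j < num_alleles; j++) {
        counts[j] = 0;
    }
    for (j = 0; j < num_samples; j++) {
        if (genotypes[j] == EX_MISSING_DATA) {
            num_missing++;
        } else {
            assert(genotypes[j] >= 0 && (size_t) genotypes[j] < num_alleles);
            counts[genotypes[j]]++;
        }
    }
    return num_missing;
}
```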

TSK_UNKNOWN_TIME __tsk_nan_f()#

Value to indicate that a time is unknown. Note that this value is a non-signalling NaN whose representation differs from the NaN generated by computations such as division by zero.

Generic Errors#

TSK_ERR_GENERIC -1#

Generic error thrown when no other message can be generated.

TSK_ERR_NO_MEMORY -2#

Memory could not be allocated.

TSK_ERR_IO -3#

An IO error occurred.

TSK_ERR_BAD_PARAM_VALUE -4#
TSK_ERR_BUFFER_OVERFLOW -5#
TSK_ERR_UNSUPPORTED_OPERATION -6#
TSK_ERR_GENERATE_UUID -7#
TSK_ERR_EOF -8#

The file stream ended after reading zero bytes.

File format errors#

TSK_ERR_FILE_FORMAT -100#

A file could not be read because it is in the wrong format

TSK_ERR_FILE_VERSION_TOO_OLD -101#

The file is in tskit format, but the version is too old for the library to read. The file should be upgraded to the latest version using the tskit upgrade command line utility.

TSK_ERR_FILE_VERSION_TOO_NEW -102#

The file is in tskit format, but the version is too new for the library to read. To read the file you must upgrade the version of tskit.

TSK_ERR_REQUIRED_COL_NOT_FOUND -103#

A column that is a required member of a table was not found in the file.

TSK_ERR_BOTH_COLUMNS_REQUIRED -104#

One of a pair of columns that must be specified together was not found in the file.

TSK_ERR_BAD_COLUMN_TYPE -105#

An unsupported type was provided for a column in the file.

Out-of-bounds errors#

TSK_ERR_BAD_OFFSET -200#

A bad value was provided for a ragged column offset. Values should start at zero and be monotonically increasing.

TSK_ERR_SEEK_OUT_OF_BOUNDS -201#

A position to seek to was less than zero or greater than the length of the genome

TSK_ERR_NODE_OUT_OF_BOUNDS -202#

A node id was less than zero or greater than the final index

TSK_ERR_EDGE_OUT_OF_BOUNDS -203#

An edge id was less than zero or greater than the final index

TSK_ERR_POPULATION_OUT_OF_BOUNDS -204#

A population id was less than zero or greater than the final index

TSK_ERR_SITE_OUT_OF_BOUNDS -205#

A site id was less than zero or greater than the final index

TSK_ERR_MUTATION_OUT_OF_BOUNDS -206#

A mutation id was less than zero or greater than the final index

TSK_ERR_INDIVIDUAL_OUT_OF_BOUNDS -207#

An individual id was less than zero or greater than the final index

TSK_ERR_MIGRATION_OUT_OF_BOUNDS -208#

A migration id was less than zero or greater than the final index

TSK_ERR_PROVENANCE_OUT_OF_BOUNDS -209#

A provenance id was less than zero or greater than the final index

TSK_ERR_TIME_NONFINITE -210#

A time value was non-finite (NaN counts as finite)

TSK_ERR_GENOME_COORDS_NONFINITE -211#

A genomic position was non-finite

TSK_ERR_KEEP_ROWS_MAP_TO_DELETED -212#

One of the rows in the retained table refers to a row that has been deleted.

TSK_ERR_POSITION_OUT_OF_BOUNDS -213#

A genomic position was less than zero or greater than or equal to the sequence length.

Edge errors#

TSK_ERR_NULL_PARENT -300#

A parent node of an edge was TSK_NULL.

TSK_ERR_NULL_CHILD -301#

A child node of an edge was TSK_NULL.

TSK_ERR_EDGES_NOT_SORTED_PARENT_TIME -302#

The edge table was not sorted by the time of each edge’s parent nodes. Sort order is (time[parent], child, left).

TSK_ERR_EDGES_NONCONTIGUOUS_PARENTS -303#

A parent node had edges that were non-contiguous.

TSK_ERR_EDGES_NOT_SORTED_CHILD -304#

The edge table was not sorted by the id of the child node of each edge. Sort order is (time[parent], child, left).

TSK_ERR_EDGES_NOT_SORTED_LEFT -305#

The edge table was not sorted by the left coordinate of each edge. Sort order is (time[parent], child, left).

TSK_ERR_BAD_NODE_TIME_ORDERING -306#

An edge had a child node that was older than its parent. The parent's time must be greater than the child's time.

TSK_ERR_BAD_EDGE_INTERVAL -307#

An edge had a genomic interval where right was less than or equal to left.

TSK_ERR_DUPLICATE_EDGES -308#

An edge was duplicated.

TSK_ERR_RIGHT_GREATER_SEQ_LENGTH -309#

An edge had a right coord greater than the genomic length.

TSK_ERR_LEFT_LESS_ZERO -310#

An edge had a left coord less than zero.

TSK_ERR_BAD_EDGES_CONTRADICTORY_CHILDREN -311#

A parent node had edges that were contradictory over an interval.

TSK_ERR_CANT_PROCESS_EDGES_WITH_METADATA -312#

A method that doesn’t support edge metadata was attempted on an edge table containing metadata.

Site errors#

TSK_ERR_UNSORTED_SITES -400#

The site table was not in order of increasing genomic position.

TSK_ERR_DUPLICATE_SITE_POSITION -401#

The site table had more than one site at a single genomic position.

TSK_ERR_BAD_SITE_POSITION -402#

A site had a position that was less than zero or greater than the sequence length.

Mutation errors#

TSK_ERR_MUTATION_PARENT_DIFFERENT_SITE -500#

A mutation had a parent mutation that was at a different site.

TSK_ERR_MUTATION_PARENT_EQUAL -501#

A mutation had a parent mutation that was itself.

TSK_ERR_MUTATION_PARENT_AFTER_CHILD -502#

A mutation had a parent mutation that had a greater id.

TSK_ERR_MUTATION_PARENT_INCONSISTENT -503#

Two or more mutation parent references formed a loop

TSK_ERR_UNSORTED_MUTATIONS -504#

The mutation table was not in the order of non-decreasing site id and non-increasing time within each site.

TSK_ERR_MUTATION_TIME_YOUNGER_THAN_NODE -506#

A mutation’s time was younger (not >=) than the time of its node and wasn’t TSK_UNKNOWN_TIME.

TSK_ERR_MUTATION_TIME_OLDER_THAN_PARENT_MUTATION -507#

A mutation’s time was older (not <=) than the time of its parent mutation, and wasn’t TSK_UNKNOWN_TIME.

TSK_ERR_MUTATION_TIME_OLDER_THAN_PARENT_NODE -508#

A mutation’s time was older (not <) than the time of the parent node of the edge on which it occurs, and wasn’t TSK_UNKNOWN_TIME.

TSK_ERR_MUTATION_TIME_HAS_BOTH_KNOWN_AND_UNKNOWN -509#

A single site had a mixture of known mutation times and TSK_UNKNOWN_TIME

TSK_ERR_DISALLOWED_UNKNOWN_MUTATION_TIME -510#

Some mutations have TSK_UNKNOWN_TIME in an algorithm where that’s disallowed (use compute_mutation_times?).

Migration errors#

TSK_ERR_UNSORTED_MIGRATIONS -550#

The migration table was not sorted by time.

Sample errors#

TSK_ERR_DUPLICATE_SAMPLE -600#

A duplicate sample was specified.

TSK_ERR_BAD_SAMPLES -601#

A sample id that was not valid was specified.

Table errors#

TSK_ERR_BAD_TABLE_POSITION -700#

An invalid table position was specified.

TSK_ERR_BAD_SEQUENCE_LENGTH -701#

A sequence length equal to or less than zero was specified.

TSK_ERR_TABLES_NOT_INDEXED -702#

The table collection was not indexed.

TSK_ERR_TABLE_OVERFLOW -703#

Tables cannot be larger than 2**31 rows.

TSK_ERR_COLUMN_OVERFLOW -704#

Ragged array columns cannot be larger than 2**64 bytes.

TSK_ERR_TREE_OVERFLOW -705#

The table collection contains more than 2**31 trees.

TSK_ERR_METADATA_DISABLED -706#

An attempt was made to set metadata on a table where it is disabled.

TSK_ERR_TABLES_BAD_INDEXES -707#

There was an error with the table’s indexes.

Genotype decoding errors#

TSK_ERR_MUST_IMPUTE_NON_SAMPLES -1100#

Genotypes were requested for non-samples at the same time as asking that isolated nodes be marked as missing. This is not supported.

TSK_ERR_ALLELE_NOT_FOUND -1101#

A user-specified allele map was used, but didn’t contain an allele found in the tree sequence.

TSK_ERR_TOO_MANY_ALLELES -1102#

More than 2147483647 alleles were specified.

TSK_ERR_ZERO_ALLELES -1103#

A user-specified allele map was used, but it contained zero alleles.

Union errors#

TSK_ERR_UNION_BAD_MAP -1400#

A node map was specified that contained a node not present in the specified table collection.

TSK_ERR_UNION_DIFF_HISTORIES -1401#

The shared portions of the specified tree sequences are not equal. Note that this may be the case if the table collections were not fully sorted before union was called.

Simplify errors#

TSK_ERR_KEEP_UNARY_MUTUALLY_EXCLUSIVE -1600#

Both TSK_SIMPLIFY_KEEP_UNARY and TSK_SIMPLIFY_KEEP_UNARY_IN_INDIVIDUALS were specified. Only one can be used.

Individual errors#

TSK_ERR_UNSORTED_INDIVIDUALS -1700#

Individuals were provided in an order where parents were after their children.

TSK_ERR_INDIVIDUAL_SELF_PARENT -1701#

An individual was its own parent.
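To make the ordering and self-parent conditions above concrete, here is a small sketch that scans a ragged "parents" column stored as a flat array plus offsets. The function name, the macro names and the array layout are assumptions made for illustration. Only the numeric codes come from this page, and the library may perform these checks at different points and in a different order.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NULL (-1) /* hypothetical stand-in for TSK_NULL */
/* Hypothetical names; the numeric values are those documented above. */
#define TOY_ERR_UNSORTED_INDIVIDUALS (-1700)
#define TOY_ERR_INDIVIDUAL_SELF_PARENT (-1701)

/* The parents of individual j are parents[parents_offset[j]] up to, but not
 * including, parents[parents_offset[j + 1]].
 * Assumption: every non-null parent id is a valid individual id. */
static int
toy_check_individual_order(
    const int *parents, const size_t *parents_offset, int num_individuals)
{
    int j, p;
    size_t k;

    assert(parents_offset != NULL);
    for (j = 0; j < num_individuals; j++) {
        for (k = parents_offset[j]; k < parents_offset[j + 1]; k++) {
            p = parents[k];
            if (p == TOY_NULL) {
                continue; /* Unknown parent: nothing to check */
            }
            if (p == j) {
                return TOY_ERR_INDIVIDUAL_SELF_PARENT;
            }
            if (p > j) {
                return TOY_ERR_UNSORTED_INDIVIDUALS; /* Parent listed after child */
            }
        }
    }
    return 0;
}
```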

TSK_ERR_INDIVIDUAL_PARENT_CYCLE -1702#

An individual was its own ancestor in a cycle of references.

TSK_ERR_INDIVIDUAL_POPULATION_MISMATCH -1703#

An individual had nodes from more than one population (and only one was requested).

TSK_ERR_INDIVIDUAL_TIME_MISMATCH -1704#

An individual had nodes from more than one time (and only one was requested).

Extend edges errors#

TSK_ERR_EXTEND_EDGES_BAD_MAXITER -1800#

Maximum iteration number (max_iter) must be positive.

Examples#

Basic forwards simulator#

This is an example of using the tables API to define a simple haploid Wright-Fisher simulator. Because this simple example repeatedly sorts the edge data, it is quite inefficient and should not be used as the basis of a large-scale simulator.

Note

This example uses the C function rand and constant RAND_MAX for random number generation. These methods are used for example purposes only and a high-quality random number library should be preferred for code used for research. Examples include, but are not limited to:

  1. The GNU Scientific Library, which is licensed under the GNU General Public License, version 3 (GPL3+).

  2. For C++ projects using C++11 or later, the built-in random number library.

  3. The numpy C API may be useful for those writing Python extension modules in C/C++.
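For reference, the following sketch isolates the mapping that the example applies to the result of rand: dividing by 1.0 + RAND_MAX gives a value in the half-open interval [0, 1), so scaling by N and truncating always yields a valid array index. The helper names are hypothetical. The functions take the raw integer as an argument so that the boundary cases can be checked directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Maps a raw rand() result r in [0, RAND_MAX] to a double in [0, 1).
 * Adding 1.0 to RAND_MAX in the divisor keeps the result strictly below 1. */
static double
toy_unit_interval(int r)
{
    assert(r >= 0);
    return r / (1.0 + RAND_MAX);
}

/* Maps a raw rand() result to an index in [0, n). */
static size_t
toy_index(int r, size_t n)
{
    assert(n > 0);
    return (size_t) (toy_unit_interval(r) * (double) n);
}
```

A call such as `toy_index(rand(), N)` corresponds to the parent selection in the simulator below. The same mapping applies with a higher-quality generator, provided the raw value is divided by one more than the generator's maximum.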

Todo

Give a pointer to an example that caches and flushes edge data efficiently. Probably using the C++ API?

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <err.h>

#include <tskit/tables.h>

#define check_tsk_error(val)                                                            \
    if (val < 0) {                                                                      \
        errx(EXIT_FAILURE, "line %d: %s", __LINE__, tsk_strerror(val));                 \
    }

void
simulate(
    tsk_table_collection_t *tables, int N, int T, int simplify_interval)
{
    tsk_id_t *buffer, *parents, *children, child, left_parent, right_parent;
    double breakpoint;
    int ret, j, t, b;

    assert(simplify_interval != 0); // leads to division by zero
    buffer = malloc(2 * N * sizeof(tsk_id_t));
    if (buffer == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    tables->sequence_length = 1.0;
    parents = buffer;
    for (j = 0; j < N; j++) {
        parents[j]
            = tsk_node_table_add_row(&tables->nodes, 0, T, TSK_NULL, TSK_NULL, NULL, 0);
        check_tsk_error(parents[j]);
    }
    b = 0;
    for (t = T - 1; t >= 0; t--) {
        /* Alternate between using the first and last N values in the buffer */
        parents = buffer + (b * N);
        b = (b + 1) % 2;
        children = buffer + (b * N);
        for (j = 0; j < N; j++) {
            child = tsk_node_table_add_row(
                &tables->nodes, 0, t, TSK_NULL, TSK_NULL, NULL, 0);
            check_tsk_error(child);
            /* NOTE: the use of rand() is discouraged for
             * research code and proper random number generator
             * libraries should be preferred.
             */
            left_parent = parents[(size_t)((rand()/(1.+RAND_MAX))*N)];
            right_parent = parents[(size_t)((rand()/(1.+RAND_MAX))*N)];
            do {
                breakpoint = rand()/(1.+RAND_MAX);
            } while (breakpoint == 0); /* small probability of the breakpoint being 0 */
            ret = tsk_edge_table_add_row(
                &tables->edges, 0, breakpoint, left_parent, child, NULL, 0);
            check_tsk_error(ret);
            ret = tsk_edge_table_add_row(
                &tables->edges, breakpoint, 1, right_parent, child, NULL, 0);
            check_tsk_error(ret);
            children[j] = child;
        }
        if (t % simplify_interval == 0) {
            printf("Simplify at generation %lld: (%lld nodes %lld edges)",
                (long long) t,
                (long long) tables->nodes.num_rows,
                (long long) tables->edges.num_rows);
            /* Note: Edges must be sorted for simplify to work, and we use a brute force
             * approach of sorting each time here for simplicity. This is inefficient. */
            ret = tsk_table_collection_sort(tables, NULL, 0);
            check_tsk_error(ret);
            ret = tsk_table_collection_simplify(tables, children, N, 0, NULL);
            check_tsk_error(ret);
            printf(" -> (%lld nodes %lld edges)\n",
                (long long) tables->nodes.num_rows,
                (long long) tables->edges.num_rows);
            for (j = 0; j < N; j++) {
                children[j] = j;
            }
        }
    }
    free(buffer);
}

int
main(int argc, char **argv)
{
    int ret;
    tsk_table_collection_t tables;

    if (argc != 6) {
        errx(EXIT_FAILURE, "usage: N T simplify-interval output-file seed");
    }
    ret = tsk_table_collection_init(&tables, 0);
    check_tsk_error(ret);
    srand((unsigned)atoi(argv[5]));
    simulate(&tables, atoi(argv[1]), atoi(argv[2]), atoi(argv[3]));
    ret = tsk_table_collection_dump(&tables, argv[4], 0);
    check_tsk_error(ret);

    tsk_table_collection_free(&tables);
    return 0;
}
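The simulator above keeps parents and children in the two halves of a single buffer and swaps their roles each generation, so no copying is needed. The sketch below isolates that step. `toy_swap_halves` is a hypothetical helper written for illustration, and it uses plain `int` ids rather than `tsk_id_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Given a buffer holding 2 * n ids and the current half index b (0 or 1),
 * points *parents at half b and *children at the other half, then returns
 * the half index to pass in for the next generation. */
static int
toy_swap_halves(int *buffer, size_t n, int b, int **parents, int **children)
{
    assert(buffer != NULL && parents != NULL && children != NULL);
    assert(b == 0 || b == 1);
    *parents = buffer + ((size_t) b * n);
    b = (b + 1) % 2;
    *children = buffer + ((size_t) b * n);
    return b;
}
```

Calling this once per generation makes the children written in one generation become the parents read in the next.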

Tree iteration#

#include <stdio.h>
#include <stdlib.h>
#include <err.h>

#include <tskit.h>

#define check_tsk_error(val)                                                            \
    if (val < 0) {                                                                      \
        errx(EXIT_FAILURE, "line %d: %s", __LINE__, tsk_strerror(val));                 \
    }

int
main(int argc, char **argv)
{
    int ret;
    tsk_treeseq_t ts;
    tsk_tree_t tree;

    if (argc != 2) {
        errx(EXIT_FAILURE, "usage: <tree sequence file>");
    }
    ret = tsk_treeseq_load(&ts, argv[1], 0);
    check_tsk_error(ret);
    ret = tsk_tree_init(&tree, &ts, 0);
    check_tsk_error(ret);

    printf("Iterate forwards\n");
    for (ret = tsk_tree_first(&tree); ret == TSK_TREE_OK; ret = tsk_tree_next(&tree)) {
        printf("\ttree %lld has %lld roots\n",
            (long long) tree.index,
            (long long) tsk_tree_get_num_roots(&tree));
    }
    check_tsk_error(ret);

    printf("Iterate backwards\n");
    for (ret = tsk_tree_last(&tree); ret == TSK_TREE_OK; ret = tsk_tree_prev(&tree)) {
        printf("\ttree %lld has %lld roots\n",
            (long long) tree.index,
            (long long) tsk_tree_get_num_roots(&tree));
    }
    check_tsk_error(ret);

    tsk_tree_free(&tree);
    tsk_treeseq_free(&ts);
    return 0;
}

Tree traversals#

In this example we load a tree sequence file, and then traverse the first tree in four different ways:

  1. We first traverse the tree in preorder and postorder using the tsk_tree_preorder() and tsk_tree_postorder() functions to fill an array of nodes in the appropriate orders. This is the recommended approach and will be convenient and efficient for most purposes.

  2. As an example of how we might build our own traversal algorithms, we then traverse the tree in preorder using recursion. This is a very common way of navigating around trees and can be convenient for some applications. For example, here we compute the depth of each node (i.e., its distance from the root) and use this when printing out the nodes as we visit them.

  3. Then we traverse the tree in preorder using an iterative approach. This is a little more efficient than using recursion, and is sometimes more convenient than structuring the calculation recursively.

  4. Finally, we iterate upwards from the samples rather than downwards from the root.

#include <stdio.h>
#include <stdlib.h>
#include <err.h>

#include <tskit.h>

#define check_tsk_error(val)                                                            \
    if (val < 0) {                                                                      \
        errx(EXIT_FAILURE, "line %d: %s", __LINE__, tsk_strerror(val));                 \
    }

static void
traverse_standard(const tsk_tree_t *tree)
{
    int ret;
    tsk_size_t num_nodes, j;
    tsk_id_t *nodes = malloc(tsk_tree_get_size_bound(tree) * sizeof(*nodes));

    if (nodes == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    ret = tsk_tree_preorder(tree, nodes, &num_nodes);
    check_tsk_error(ret);
    for (j = 0; j < num_nodes; j++) {
        printf("Visit preorder %lld\n", (long long) nodes[j]);
    }

    ret = tsk_tree_postorder(tree, nodes, &num_nodes);
    check_tsk_error(ret);
    for (j = 0; j < num_nodes; j++) {
        printf("Visit postorder %lld\n", (long long) nodes[j]);
    }

    free(nodes);
}

static void
_traverse(const tsk_tree_t *tree, tsk_id_t u, int depth)
{
    tsk_id_t v;
    int j;

    for (j = 0; j < depth; j++) {
        printf("    ");
    }
    printf("Visit recursive %lld\n", (long long) u);
    for (v = tree->left_child[u]; v != TSK_NULL; v = tree->right_sib[v]) {
        _traverse(tree, v, depth + 1);
    }
}

static void
traverse_recursive(const tsk_tree_t *tree)
{
    _traverse(tree, tree->virtual_root, -1);
}

static void
traverse_stack(const tsk_tree_t *tree)
{
    int stack_top;
    tsk_id_t u, v;
    tsk_id_t *stack = malloc(tsk_tree_get_size_bound(tree) * sizeof(*stack));

    if (stack == NULL) {
        errx(EXIT_FAILURE, "Out of memory");
    }
    stack_top = 0;
    stack[stack_top] = tree->virtual_root;
    while (stack_top >= 0) {
        u = stack[stack_top];
        stack_top--;
        printf("Visit stack %lld\n", (long long) u);
        /* Put nodes on the stack right-to-left, so we visit in left-to-right */
        for (v = tree->right_child[u]; v != TSK_NULL; v = tree->left_sib[v]) {
            stack_top++;
            stack[stack_top] = v;
        }
    }
    free(stack);
}

static void
traverse_upwards(const tsk_tree_t *tree)
{
    const tsk_id_t *samples = tsk_treeseq_get_samples(tree->tree_sequence);
    tsk_size_t num_samples = tsk_treeseq_get_num_samples(tree->tree_sequence);
    tsk_size_t j;
    tsk_id_t u;

    for (j = 0; j < num_samples; j++) {
        u = samples[j];
        while (u != TSK_NULL) {
            printf("Visit upwards: %lld\n", (long long) u);
            u = tree->parent[u];
        }
    }
}

int
main(int argc, char **argv)
{
    int ret;
    tsk_treeseq_t ts;
    tsk_tree_t tree;

    if (argc != 2) {
        errx(EXIT_FAILURE, "usage: <tree sequence file>");
    }
    ret = tsk_treeseq_load(&ts, argv[1], 0);
    check_tsk_error(ret);
    ret = tsk_tree_init(&tree, &ts, 0);
    check_tsk_error(ret);
    ret = tsk_tree_first(&tree);
    check_tsk_error(ret);

    traverse_standard(&tree);

    traverse_recursive(&tree);

    traverse_stack(&tree);

    traverse_upwards(&tree);

    tsk_tree_free(&tree);
    tsk_treeseq_free(&ts);
    return 0;
}
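The stack-based and upward traversals above depend only on a handful of integer arrays, so they can be tried out on a hand-built tree without loading a file. The sketch below does this with plain `int` arrays that mimic the `parent`, `right_child` and `left_sib` arrays of a tree. All names are hypothetical. Unlike the example above, it starts from an ordinary root rather than the virtual root.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NULL (-1) /* hypothetical stand-in for TSK_NULL */

/* Writes the preorder of the subtree rooted at `root` into `order` and returns
 * the number of nodes visited. `stack` and `order` must each have room for
 * every node in the subtree. */
static size_t
toy_preorder(
    const int *right_child, const int *left_sib, int root, int *stack, int *order)
{
    size_t num_visited = 0;
    int stack_top = 0;
    int u, v;

    assert(root != TOY_NULL);
    stack[0] = root;
    while (stack_top >= 0) {
        u = stack[stack_top];
        stack_top--;
        order[num_visited] = u;
        num_visited++;
        /* Push children right-to-left so that they are popped left-to-right */
        for (v = right_child[u]; v != TOY_NULL; v = left_sib[v]) {
            stack_top++;
            stack[stack_top] = v;
        }
    }
    return num_visited;
}

/* Counts the edges between u and its root by following parent links upwards. */
static int
toy_depth(const int *parent, int u)
{
    int depth = 0;

    while (parent[u] != TOY_NULL) {
        u = parent[u];
        depth++;
    }
    return depth;
}
```

For a tree in which node 4 is the root with children 3 and 2, and node 3 has children 0 and 1, the preorder produced is 4, 3, 0, 1, 2.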

File streaming#

It is often useful to read tree sequence files from a stream rather than from a fixed filename. This example shows how to do this using the tsk_table_collection_loadf() and tsk_table_collection_dumpf() functions. Here, we sequentially load table collections from the stdin stream and write them back out to stdout with their mutations removed.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <tskit/tables.h>

#define check_tsk_error(val)                                                            \
    if (val < 0) {                                                                      \
        fprintf(stderr, "Error: line %d: %s\n", __LINE__, tsk_strerror(val));           \
        exit(EXIT_FAILURE);                                                             \
    }

int
main(int argc, char **argv)
{
    int ret;
    int j = 0;
    tsk_table_collection_t tables;

    ret = tsk_table_collection_init(&tables, 0);
    check_tsk_error(ret);

    while (true) {
        ret = tsk_table_collection_loadf(&tables, stdin, TSK_NO_INIT);
        if (ret == TSK_ERR_EOF) {
            break;
        }
        check_tsk_error(ret);
        fprintf(stderr, "Tree sequence %d had %lld mutations\n", j,
            (long long) tables.mutations.num_rows);
        ret = tsk_mutation_table_truncate(&tables.mutations, 0);
        check_tsk_error(ret);
        ret = tsk_table_collection_dumpf(&tables, stdout, 0);
        check_tsk_error(ret);
        j++;
    }
    tsk_table_collection_free(&tables);
    return EXIT_SUCCESS;
}

Note that we use the value TSK_ERR_EOF to detect when the stream ends, as we don’t know how many tree sequences to expect on the input. In this case, TSK_ERR_EOF is not considered an error and we exit normally.

Running this program on some tree sequence files we might get:

$ cat tmp1.trees tmp2.trees | ./build/streaming > no_mutations.trees
Tree sequence 0 had 38 mutations
Tree sequence 1 had 132 mutations

Then, running this program again on the output of the previous command, we see that we now have two tree sequences with their mutations removed stored in the file no_mutations.trees:

$ ./build/streaming < no_mutations.trees > /dev/null
Tree sequence 0 had 0 mutations
Tree sequence 1 had 0 mutations
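The loop structure used in this example, where an end-of-stream code ends the loop normally and any other negative code is an error, can be sketched independently of tskit. In the toy version below each "record" is a single 32-bit integer, and the `TOY_ERR_*` codes and function names are hypothetical. It shows only the control flow and does not reflect how tskit detects the end of a stream internally.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical codes: end of stream at a record boundary is not a failure */
#define TOY_ERR_EOF (-1)
#define TOY_ERR_TRUNCATED (-2)

/* Reads one fixed-size record. Returns 0 on success, TOY_ERR_EOF if the
 * stream ended cleanly, or TOY_ERR_TRUNCATED on a partial record. */
static int
toy_read_record(FILE *in, uint32_t *value)
{
    size_t n = fread(value, 1, sizeof(*value), in);

    if (n == sizeof(*value)) {
        return 0;
    }
    if (n == 0 && feof(in)) {
        return TOY_ERR_EOF;
    }
    return TOY_ERR_TRUNCATED;
}

/* Counts the records on the stream. Returns 0 once the stream ends cleanly,
 * or a negative code on error; *count holds the records read so far. */
static int
toy_count_records(FILE *in, size_t *count)
{
    uint32_t value;
    int ret;

    assert(in != NULL && count != NULL);
    *count = 0;
    for (;;) {
        ret = toy_read_record(in, &value);
        if (ret == TOY_ERR_EOF) {
            return 0; /* Normal termination, as with TSK_ERR_EOF above */
        }
        if (ret < 0) {
            return ret;
        }
        (*count)++;
    }
}
```

Distinguishing a clean end of stream from a truncated record is what allows a reader to consume an unknown number of concatenated items safely.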