Development#
If you would like to add some features to tskit
, this
documentation should help you get set up and contributing.
Please help us to improve the documentation by either
opening an issue or
pull request if you
see any problems.
The tskit-dev team strives to create a welcoming and open environment for contributors; please see our code of conduct for details. We wish our code and documentation to be inclusive and in particular to be gender and racially neutral.
Project structure#
Tskit is a multi-language project, which is reflected in the directory structure:
The
python
directory contains the Python library and command line interface, which is what most contributors are likely to be interested in. Please see the Python library section for details. The low-level Python C Interface is also defined here.The
c
directory contains the high-performance C library code. Please see the C Library for details on how to contribute.The
docs
directory contains the source for this documentation, which covers both the Python and C APIs. Please see the Documentation for details.
The remaining files in the root directory of the project are for controlling Continuous Integration tests providers and other administrative purposes.
Please see the Best Practices for Development section for
an overview of how to contribute a new feature to tskit
.
Getting started#
Requirements#
To develop the Python code you will need a working C compiler and a development installation of Python (>= 3.8). On Debian/Ubuntu we can install these with:
$ sudo apt install python3-dev build-essential doxygen
Python packages required for development are listed in python/requirements/development.txt
.
These can be installed using pip
::
$ python3 -m pip install -r python/requirements/development.txt
You may wish isolate your development environment using a virtualenv <https://docs.python-guide.org/dev/virtualenvs/>
_.
A few extra dependencies are required if you wish to work on the C library.
For OSX and Windows users we recommending using conda_, and isolating development in a dedicated environment as follows::
$ conda env create -f python/requirements/development.yml
$ conda activate tskit-dev
On macOS, conda builds are generally done using clang
packages that are kept up to date:
$ conda install clang_osx-64 clangxx_osx-64
In order to make sure that these compilers work correctly (e.g., so that they can find
other dependencies installed via conda
), you need to compile tskit
with this command
on versions of macOS older than “Mojave”:
$ cd python
$ CONDA_BUILD_SYSROOT=/ python3 setup.py build_ext -i
On more recent macOS releases, you may omit the CONDA_BUILD_SYSROOT
prefix.
If you run into issues with the conda compiler, be sure that your command line tools are installed and up to date (you should also reboot your system after installing CLI tools). Note that you may also have to install a specific version of the Xcode command line tools.
Note
The use of the C toolchain on macOS is a moving target. The above advice
was updated on 22 June, 2021 and was validated by a few tskit
contributors.
Caveat emptor, etc..
Environment#
To get a local git development environment, please follow these steps:
Make a fork of the tskit repo on GitHub
Clone your fork into a local directory:
$ git clone git@github.com:YOUR_GITHUB_USERNAME/tskit.git
Install the Pre-commit checks:
$ pre-commit install
If you later have trouble with these checks, you can skip them with
git commit --no-verify
.
See the Git workflow section for detailed information on the recommended way to use git and GitHub.
Workflow#
Git workflow#
If you would like to make an addition/fix to tskit, then follow the steps below to get things set up. If you would just like to review someone else’s proposed changes (either to the code or to the docs), then skip to Checking out someone else’s pull request.
Open an issue with your proposed functionality/fix. If adding or changing the public API close thought should be given to names and signatures of proposed functions. If consensus is reached that your proposed addition should be added to the codebase, proceed!
Make your own fork of the
tskit
repository on GitHub, and clone a local copy as detailed in Environment.Make sure that your local repository has been configured with an upstream remote:
$ git remote add upstream https://github.com/tskit-dev/tskit.git
Create a “topic branch” to work on. One reliable way to do it is to follow this recipe:
$ git fetch upstream $ git checkout upstream/main $ git checkout -b topic_branch_name
Write your code following the outline in Best Practices for Development. As you work on your topic branch you can add commits to it. Once you’re ready to share this, you can then open a pull request (PR). This can be done at any time! You don’t have to have code that is completely functional and tested to get feedback. Use the drop-down button to create a “draft PR” to indicate that it’s not done, and explain in the comments what feedback you need and/or what you think needs to be done.
As you code it is best to rebase your work onto the
main
branch periodically (e.g. once a week) to keep up with changes. If you mergemain
viagit pull upstream main
it will create a much more complex rebase when your code is finally ready to be incorporated into the main branch, so should be avoided.Once you’re done coding add content to the tutorial and other documentation pages if appropriate.
Update the change logs at
python/CHANGELOG.rst
andc/CHANGELOG.rst
, taking care to document any breaking changes separately in a “breaking changes” section.Push your changes to your topic branch and either open the PR or, if you opened a draft PR above change it to a non-draft PR by clicking “Ready to Review”.
The tskit community will review the code, asking you to make changes where appropriate. This usually takes at least two rounds of review.
Once the review process is complete, squash the commits to the minimal set of changes - usually one or two commits. Please follow this guide for step-by-step instructions on rebasing and squashing commits.
Your PR will be merged, time to celebrate! 🎉🍾
Checking out someone else’s pull request#
Sometimes you want to just check out someone else’s pull request,
for the purpose of trying it out and giving them feedback.
To do this, you first need your own local version of the git repository,
so you should first do steps 1 and 2 above.
(Strictly speaking, you don’t need a fork on github
if you don’t plan to edit, but it won’t hurt.)
Continuing from there, let’s say you want to check out the current
state of the code on pull request #854.
(So, below you should replace 854
with the number of the pull request
that you actually want to investigate.)
Then, continuing from above:
Fetch the pull request, and store it as a local branch. For instance, to name the local branch
my_pr_copy
:$ git fetch upstream pull/854/head:my_pr_copy
You should probably call the branch something more descriptive, though. (Also note that you might need to put
origin
instead ofupstream
for the remote repository name: seegit remote -v
for a list of possible remotes.)Check out the pull request’s local branch:
$ git checkout my_pr_copy
Now, your repository will be in exactly the same state as that of the person who’s submitted the pull request. Great! Now you can test things out.
To view the documentation,
cd docs && make
, which should build the documentation,
and then navigate your web browser to the docs/_build/html/
subdirectory.
To test out changes to the code, you can change to the python/
subdirectory,
and run make
to compile the C code.
If you then execute python
from this subdirectory (and only this one!),
it will use the modified version of the package.
(For instance, you might want to
open an interactive python
shell from the python/
subdirectory,
or running python3 -m pytest
from this subdirectory.)
After you’re done, you should do:
$ git checkout main
to get your repository back to the “main” branch of development.
If the pull request is changed and you want to do the same thing again,
then first delete your local copy (by doing git branch -d my_pr_copy
)
and repeat the steps again.
Pre-commit checks#
On each commit a pre-commit hook will run
that checks for violations of code style
(see the Code style section for details)
and other common problems.
Where possible, these hooks will try to fix any problems that they find (including reformatting
your code to conform to the required style). In this case, the commit
will not complete and report that “files were modified by this hook”.
To include the changes that the hooks made, git add
any
files that were modified and run git commit
(or, use git commit -a
to commit all changed files.)
If you would like to run the checks without committing, use pre-commit run
(but, note that this will only check changes that have been staged;
do pre-commit run --all
to check unstaged changes as well).
To bypass the checks (to save or get feedback on work-in-progress) use
git commit --no-verify
Documentation#
The documentation for tskit is written using
Sphinx
and contained in the docs
directory. The files in this directory are
markdown files that serve as an input to jupyterbook,
which allows jupyter notebook code, primarily in Python, to be automatically
executed and the output inserted before deployment. The docs are then
deployed automatically to the tskit.dev website.
API documentation for both Python and C are generated automatically from
source: documentation embedded in the source code makes use of sphinx and
the reStructuredText format to
alloow formating and cross referencing.
For the C code, a combination of Doxygen
and breathe is used to
generate API documentation.
Please help us to improve the documentation! You can check on the list of documentation issues on GitHub, and help us fix any, or add issues for anything that’s wrong or missing.
Small edits#
If you see a typo or some other small problem that you’d like to fix, this is most easily done through the GitHub UI.
If the typo is in a large section of text (like this page), go to the top of the page and click on the “Edit on GitHub” link at the top right. This will bring you to the page on GitHub for the RST source file in question. Then, click on the pencil icon on the right hand side. This will open a web editor allowing you to quickly fix the typo and submit a pull request with the changes. Fix the typo, add a commit message like “Fixed typo” and click on the green “Propose file change” button. Then follow the dialogues until you’ve created a new pull request with your changes, so that we can incorporate them.
If the change you’d like to make is in the API documentation for a particular function, then you’ll need to find where this function is defined first. The simplest way to do this is to click the green “[source]” link next to the function. This will show you a HTML rendered version of the function, and the rest of the file that it is in. You can then navigate to this file on GitHub, and edit it using the same approach as above.
Significant edits#
When making changes more substantial than typo fixes it’s best to check out a local copy. Follow the steps in the Git workflow to get a fork of tskit, a local clone and newly checked out feature branch. Then follow the steps in the Getting started section to get a working development environment.
Once you are ready to make edits to the documentation,
cd
into the docs
directory and run make
.
This should build the HTML
documentation in docs/_build/html/
, which you can then view in
your browser. As you make changes, run make
regularly and
view the final result to see if it matches your expectations.
Once you are happy with the changes, commit your updates and open a pull request on GitHub.
Tips and resources#
The reStructuredText primer is a useful general resource on rst.
See also the sphinx and rst cheatsheet
The Sphinx Python and C domains have extensive options for marking up code.
Make extensive use of cross referencing. When linking to sections in the documentation, use the
:ref:`sec_some_section_label`
form rather than matching on the section title (which is brittle). Use:meth:`.Tree.some_method`
,:func:`some_function`
etc to refer to parts of the API.
Python library#
The Python library is defined in the python
directory. We assume throughout
this section that you have cd
’d into this directory.
We also assume that the tskit
package is built and
run locally within this directory. That is, tskit
is not installed
into the Python installation using pip install -e
or setuptools
development mode.
Please see the Troubleshooting section for help
if you encounter problems with compiling or running the tests.
Getting started#
After you have installed the basic Requirements
and created a development environment,
you will need to compile the low-level Python C Interface module.
This is most easily done using make
:
$ make
If this has completed successfully you should see a file _tskit.cpython-XXXXXX.so
in the current directory (the suffix depends on your platform and Python version;
with Python 3.11 on Linux it’s _tskit.cpython-311-x86_64-linux-gnu.so
).
To make sure that your development environment is working, run some tests.
Layout#
Code for the tskit
module is in the tskit
directory. The code is split
into a number of modules that are roughly split by function; for example,
code for visualisation is kept in the tskit/drawing.py
.
Test code is contained in the tests
directory. Tests are also roughly split
by function, so that tests for the drawing
module are in the
tests/test_drawing.py
file. This is not a one-to-one mapping, though.
The requirements
directory contains descriptions of the requirements
needed for development and on various
Continuous Integration tests providers.
Code style#
Python code in tskit is formatted using Black. Any code submitted as a pull request will be checked to see if it conforms to this format as part of the Continuous Integration tests. Black is very strict, which seems unhelpful and nitpicky at first but is actually very useful. This is because it can also automatically format code for you, removing tedious and subjective choices (and even more tedious and subjective discussions!) about exactly how a particular code block should be aligned and indented.
In addition to Black autoformatting, code is checked for common problems using flake8
Black autoformatting and flake8 checks are performed as part of the pre-commit checks, which ensures that your code is always formatted correctly.
Vim users may find the black, and vim-flake8 plugins useful for automatically formatting code and lint checking within vim. There is good support for Black in a number of other editors.
Tests#
The tests are defined in the tests
directory, and run using
pytest from the python
directory.
If you want to run the tests in a particular module (say, test_tables.py
), use:
$ python3 -m pytest tests/test_tables.py
To run all the tests in a particular class in this module (say, TestNodeTable
)
use:
$ python3 -m pytest tests/test_tables.py::TestNodeTable
To run a specific test case in this class (say, test_copy
) use:
$ python3 -m pytest tests/test_tables.py::TestNodeTable::test_copy
You can also run tests with a keyword expression search. For example this will
run all tests that have TestNodeTable
but not copy
in their name:
$ python3 -m pytest -k "TestNodeTable and not copy"
When developing your own tests, it is much quicker to run the specific tests that you are developing rather than rerunning large sections of the test suite each time.
To run all of the tests, we can use:
$ python3 -m pytest
By default the tests are run on 4 cores, if you have more you can specify:
$ python3 -m pytest -n8
A few of the tests take most of the time, we can skip the slow tests to get the test run under 20 seconds on an modern workstation:
$ python3 -m pytest --skip-slow
If you have a lot of failing tests it can be useful to have a shorter summary of the failing lines:
$ python3 -m pytest --tb=line
If you need to see the output of tests (e.g. print
statements) then you need to use
these flags to run a single thread and capture output:
$ python3 -m pytest -n0 -vs
All new code must have high test coverage, which will be checked as part of the Continuous Integration tests tests by CodeCov. All tests must pass for a PR to be accepted.
Packaging#
The tskit
Python module follows the current
best-practices advocated by the
Python Packaging Authority. The primary means of
distribution is though PyPI, which provides the
canonical source for each release.
A package for conda is also available on conda-forge.
Interfacing with low-level module#
Much of the high-level Python code only exists to provide a simpler interface to
the low-level _tskit module.
As such, many objects (e.g. Tree
)
are really just a shallow layer on top of the corresponding low-level object.
The usual convention here is to keep a reference to the low-level object via
a private instance variable such as self._ll_tree
.
Command line interface#
The command line interface for tskit
is defined in the tskit/cli.py
file.
The CLI has a single entry point (e.g. tskit_main
) which is invoked to run the
program. These entry points are registered with setuptools
using the
console_scripts
argument in setup.py
, which allows them to be deployed as
first-class executable programs in a cross-platform manner.
The CLI can also be run using python3 -m tskit
. This is the recommended
approach for running the CLI during development.
Installing development versions#
We strongly recommend that you do not install development versions of
tskit
and instead use versions released to PyPI and conda-forge.
However, if you really need to be on the bleeding edge, you can use
the following command to install:
$ python3 -m pip install git+https://github.com/tskit-dev/tskit.git#subdirectory=python
(Because the Python package is not defined in the project root directory, using pip to
install directly from GitHub requires you to specify subdirectory=python
.)
Troubleshooting#
If
make
is giving you strange errors, or if tests are failing for strange reasons, try runningmake clean
in the project root and then rebuilding.Beware of multiple versions of the python library installed by different programs (e.g., pip versus installing locally from source)! In python,
tskit.__file__
will tell you the location of the package that is being used.
C Library#
The Python module uses the high-performance tskit C API
behind the scenes. All C code and associated development infrastructure
is held in the c
directory.
Requirements#
We use the meson build system in conjunction with ninja-build to compile the C code. Unit tests use the CUnit library and we use clang-format to automatically format code. On Debian/Ubuntu, these can be installed using
$ sudo apt install libcunit1-dev ninja-build meson clang-format-6.0
Notes:
A more recent version of meson can alternatively be installed using
pip
, if you wish.Recent versions of Debian do not have clang-format-6.0 available; if so, you can install it instead with
pip
by runningpip3 install clang-format==6.0.1 && ln -s clang-format $(which clang-format)-6.0
.
Conda users can install the basic requirements from python/requirements/development.txt
.
Unfortunately clang-format is not available on conda, but it is not essential.
Code style#
C code is formatted using clang-format with a custom configuration and version 6.0. This is checked as part of the pre-commit checks. To manually format run:
$ clang-format-6.0 -i c/tskit/* c/tests/*.c c/tests/*.h
Vim users may find the vim-clang-format plugin useful for automatically formatting code.
Building#
We use meson and ninja-build to
compile the C code. Meson keeps all compiled binaries in a build directory (this has many advantages
such as allowing multiple builds with different options to coexist). The build configuration
is defined in meson.build
. To set up the initial build
directory, run
$ cd c
$ meson build
To compile the code run
$ ninja -C build
All the tests and other artefacts are in the build directory. Individual test
suites can be run, via (e.g.) ./build/test_trees
. To run all of the tests,
run
$ ninja -C build test
For vim users, the mesonic plugin simplifies this process and allows code to be compiled seamlessly within the editor.
Unit Tests#
The C-library has an extensive suite of unit tests written using CUnit. These tests aim to establish that the low-level APIs work correctly over a variety of inputs, and particularly, that the tests don’t result in leaked memory or illegal memory accesses. All tests are run under valgrind to make sure of this as part of the Continuous Integration tests.
Tests are defined in the tests/*.c
files. These are roughly split by
the source files, so that the tests for functionality in the tskit/tables.c
file
will be tested in tests/test_tables.c
.
To run all the tests
in the test_tables
suite, run (e.g.) ./build/test_tables
.
To just run a specific test on its own, provide
this test name as a command line argument, e.g.:
$ ./build/test_tables test_node_table
While 100% test coverage is not feasible for C code, we aim to cover all code that can be reached. (Some classes of error such as malloc failures and IO errors are difficult to simulate in C.) Code coverage statistics are automatically tracked using CodeCov.
Coding conventions#
The code is written using the C99 standard. All variable declarations should be done at the start of a function, and functions kept short and simple where at all possible.
No global or module level variables are used for production code.
Function parameters should be marked as const
where possible.
Parameters that are used as return variables should come last.
The common options
parameter should be the last non-output
parameter.
Please see the API structure section for more information about how the API is structured.
Error handling#
A critical element of producing reliable C programs is consistent error handling and checking of return values. All return values must be checked! In tskit, all functions (except the most trivial accessors) return an integer to indicate success or failure. Any negative value is an error, and must be handled accordingly. The following pattern is canonical:
ret = tsk_tree_do_something(self, argument);
if (ret != 0) {
goto out;
}
// rest of function
out:
return ret;
Here we test the return value of tsk_tree_do_something
and if it is non-zero,
abort the function and return this same value from the current function. This
is a bit like throwing an exception in higher-level languages, but discipline
is required to ensure that the error codes are propagated back to the original
caller correctly.
Particular care must be taken in functions that allocate memory, because we must ensure that this memory is freed in all possible success and failure scenarios. The following pattern is used throughout for this purpose:
double *x = NULL;
x = malloc(n * sizeof(double));
if (x == NULL) {
ret = TSK_ERR_NO_MEMORY;
goto out;
}
// rest of function
out:
tsk_safe_free(x);
return ret;
It is vital here that x
is initialised to NULL
so that we are guaranteed
correct behaviour in all cases. For this reason, the convention is to declare all
pointer variables on a single line and to initialise them to NULL
as part
of the declaration.
Error codes are defined in core.h
, and these can be translated into a
message using tsk_strerror(err)
.
Using assertions#
There are two different ways to express assertions in tskit code.
The first is using the custom tsk_bug_assert
macro, which is used to
make inexpensive checks at key points during execution. These assertions
are always run, regardless of the compiler settings, and should not
contribute significantly to the overall runtime.
More expensive assertions, used, for example, to check pre and post conditions
on performance critical loops should be expressed using the standard
assert
macro from assert.h
. These assertions will be checked
during the execution of C unit tests, but will not be enabled when
compiled into the Python C module.
Type conventions#
tsk_id_t
is an ID for any entity in a table.tsk_size_t
refers to any size or count values in tskit.size_t
is a standard C type and refers to the size of a memory block. This should only be used when computing memory block sizes for functions likemalloc
or passing the size of a memory buffer as a parameter.Error indicators (the return type of most functions) are
int
.uint32_t
etc should be avoided (any that exist are a leftover from older code that didn’t usetsk_size_t
etc.)int64_t
anduint64_t
are sometimes useful when working with bitstrings (e.g. to implement a set).
Python C Interface#
Overview#
The Python C interface is defined in the python
directory
and written using the Python C API.
The source code for this interface is in the _tskitmodule.c
file.
When compiled, this produces the _tskit
module,
which is imported by the high-level Python code. The low-level Python module is
not intended to be used directly by users and may change arbitrarily over time.
The usual pattern in the low-level Python API is to define a Python class
which corresponds to a given “class” in the C API. For example, we define
a TreeSequence
class, which is essentially a thin wrapper around the
tsk_tree_t
type from the C library.
The _tskitmodule.c
file follows the standard conventions given in the
Python documentation.
Compiling and debugging#
The setup.py
file describes the requirements for the low-level _tskit
module and how it is built from source. The simplest way to compile the
low-level module is to run make
in the python
directory:
$ make
If make
is not available, you can run the same command manually:
$ python3 setup.py build_ext --inplace
It is sometimes useful to specify compiler flags when building the low level module. For example, to make a debug build you can use:
$ CFLAGS='-Wall -O0 -g' make
If you need to track down a segfault etc, running some code through gdb can be very useful. For example, to run a particular test case, we can do:
$ gdb python3
(gdb) run -m pytest tests/test_lowlevel.py
(gdb) run -m pytest -vs tests/test_tables.py::TestNodeTable::test_copy
Starting program: /usr/bin/python3 run -m pytest tests/test_tables.py::TestNodeTable::test_copy
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff1e48700 (LWP 1503)]
[New Thread 0x7fffef647700 (LWP 1504)]
[New Thread 0x7fffeee46700 (LWP 1505)]
[Thread 0x7fffeee46700 (LWP 1505) exited]
[Thread 0x7fffef647700 (LWP 1504) exited]
[Thread 0x7ffff1e48700 (LWP 1503) exited]
collected 1 item
tests/test_tables.py::TestNodeTable::test_copy PASSED
[Inferior 1 (process 1499) exited normally]
(gdb)
Tracing problems in C code is many times more difficult when the Python C API is involved because of the complexity of Python’s memory management. It is nearly always best to start by making sure that the tskit C API part of your addition is thoroughly tested with valgrind before resorting to the debugger.
Testing for memory leaks#
The Python C API can be subtle, and it is easy to get the reference counting wrong.
The stress_lowlevel.py
script makes it easier to track down memory leaks
when they do occur. The script runs the unit tests in a loop, and outputs
memory usage statistics.
Continuous Integration tests#
A number of different continuous integration providers are used, which run different combinations of tests on different platforms, as well as running various checks for code quality.
A Github action runs some code style and quality checks along with running the Python test suite on Linux, OSX and Windows. It uses conda for those dependencies which are tricky to compile on all systems. An additional action builds the docs and posts a link to preview them.
CircleCI runs all Python tests using the apt-get infrastructure for system requirements. We also runs C tests, compiled using gcc and clang, and check for memory leaks using valgrind.
CodeCov_ tracks test coverage in Python and C.
PyUp Runs monthly checks on the Python dependencies listed in the requirements files, which are pinned to ensure CI reproducibility. PyUp opens one PR a month with updated pins.
Best Practices for Development#
The following is a rough guide of best practices for contributing a function to the tskit codebase.
Note that this guide covers the most complex case of adding a new function to both the C and Python APIs.
Write your function in Python: in
python/tests/
find the test module that pertains to the functionality you wish to add. For instance, the kc_distance metric was added to test_topology.py. Add a python version of your function here.Create a new class in this module to write unit tests for your function: in addition to making sure that your function is correct, make sure it fails on inappropriate inputs. This can often require judgement. For instance,
Tree.kc_distance()
fails on a tree with multiple roots, but allows users to input parameter values that are nonsensical, as long as they don’t break functionality. See the TestKCMetric for example.Write your function in C: check out the C API for guidance. There are also many examples in the c directory. Your function will probably go in trees.c.
Write a few tests for your function in C: again, write your tests in tskit/c/tests/test_tree.c. The key here is code coverage, you don’t need to worry as much about covering every corner case, as we will proceed to link this function to the Python tests you wrote earlier.
Create a low-level definition of your function using Python’s C API: this will go in _tskitmodule.c.
Test your low-level implementation in tskit/python/tests/test_lowlevel.py : again, these tests don’t need to be as comprehensive as your first python tests, instead, they should focus on the interface, e.g., does the function behave correctly on malformed inputs?
Link your C function to the Python API: write a function in tskit’s Python API, for example the kc_distance function lives in tskit/python/tskit/trees.py.
Modify your Python tests to test the new C-linked function: if you followed the example of other tests, you might need to only add a single line of code here. In this case, the tests are well factored so that we can easily compare the results from both the Python and C versions.
Write a docstring for your function in the Python API: for instance, the kc_distance docstring is in tskit/python/tskit/trees.py. Ensure that your docstring renders correctly by building the documentation (see Documentation).
Troubleshooting#
pre-commit is blocking me!#
You might be having a hard time committing because of the “pre-commit” checks
(described above). First, consider: the pre-commit hooks are supposed to make your life easier,
not add a layer of frustration to contributing.
So, you should feel free to just ask git to skip the pre-commit!
There’s no shame in a broken build - you can get it fixed up (and we’ll help)
before it’s merged into the rest of the project.
To skip, just append --no-verify
to the git commit
command.
Below are some more specific situations.
pre-commit complains about files I didn’t edit#
For instance, suppose you have not edited util.py
and yet:
> git commit -a -m 'dev docs'
python/tskit/util.py:117:26: E203 whitespace before ':'
python/tskit/util.py:135:31: E203 whitespace before ':'
python/tskit/util.py:195:23: E203 whitespace before ':'
python/tskit/util.py:213:36: E203 whitespace before ':'
... lots more, gah, what is this ...
First, check (with git status
) that you didn’t actually edit util.py
.
Then, you should not try to fix these errors; this is not your problem.
You might first try restarting your pre-commit, by running
pre-commit clean
pre-commit gc
You might also check you don’t have other pre-commit hook files in .git/hooks
.
If this doesn’t fix the problem,
then you should just skip the pre-commit (but alert us to the problem),
by appending --no-verify
:
> git commit -a -m 'dev docs' --no-verify
[main 46f3f2e] dev docs
1 file changed, 43 insertions(+)
Now you can go ahead and push your changes!
pre-commit won’t run#
For instance:
> git commit -a -m 'fixed all the things'
/usr/bin/env: ‘python3.8’: No such file or directory
What the heck? Why is this even looking for python3.8?
This is because of the “pre-commit hook”, mentioned above.
As above, you can proceed by just appending --no-verify
:
> git commit -a -m 'fixed all the things' --no-verify
[main 99a01da] fixed all the things
1 file changed, 10 insertions(+)
We’ll help you sort it out in the PR. But, you should fix the problem at some point. In this case, uninstalling and reinstalling the pre-commit hooks fixed the problem:
> pre-commit uninstall
pre-commit uninstalled
Restored previous hooks to .git/hooks/pre-commit
> pre-commit install -f
pre-commit installed at .git/hooks/pre-commit
> # do some more edits
> git commit -a -m 'wrote the docs'
[main 79b81ff] fixed all the things
1 file changed, 42 insertions(+)
Benchmarking#
Tskit has a simple benchmarking tool to help keep track of performance.
Running benchmarks#
The benchmark suite can be run with:
> cd python/benchmark
> python run.py
A subset of benchmarks can be run by specifying a string. For example, the following command runs all the benchmarks whose names contain “genotype”, e.g. “genotype_matrix”.
> python run.py -k genotype
If desired, the results of the benchmarks can be printed to STDOUT.
> python run.py -k genotype -p
Results are written to bench-results.json
in the same folder. Note that if any version of tskit
is installed then that will be used for the benchmarking. To use the local development version of
tskit ensure you have pip uninstall tskit
before running the benchmarking. The version used is
shown in the header of the report.
Adding a new benchmark#
The benchmarks are specified by the config.yaml
file in python/benchmark
. To add a new benchmark
add an entry to the benchmarks
dictionary. For example:
- code: do_my_thing({option_name})
setup: |
import a_module
name: my_benchmark #optional, the code is used by default
parameters:
option_name:
- "reticulate_splines"
- "foobar"
Strings are interpreted as Python f-strings, so you can use the parameters
dictionary to provide
values that will be interpolated into both the setup
and code
strings.
The suite can be run for all released versions with the run-for-all-releases.py
script.
Releasing a new version#
Tskit maintains separate versioning for the C API and Python package, each has its own release process.
C API#
To release the C API, the TSK_VERSION_*
macros should be updated,
along with VERSION.txt
and the changelog updated with the release
date and version. The changelog should also be checked for
completeness. Comparing git log --follow --oneline -- c
with
git log --follow --oneline -- c/CHANGELOG.rst
may help here.
After the commit including these changes has been merged, tag a
release on GitHub using the pattern C_MAJOR.MINOR.PATCH
, with:
git tag -a C_MAJOR.MINOR.PATCH -m "C API version C_MAJOR.MINOR.PATCH"
git push upstream --tags
After a couple of minutes a github action will make a draft release with the changelog at the releases page. Check it looks right and publish the release (Click on the little pencil). After release, start a section in the changelog for new developments and close the GitHub issue milestone of the release.
Python#
It is worth running the benchmarks as above before release to check for any unexpected
major regressions. To make a release first prepare a pull request that sets the correct
version number in tskit/_version.py
following PEP440 format. For a normal release
this should be MAJOR.MINOR.PATCH, for a beta release use MAJOR.MINOR.PATCHbX
e.g. 1.0.0b1. Update the Python CHANGELOG.rst, ensuring that all significant
changes since the last release have been listed. Comparing
git log --follow --oneline -- python
with git log --follow --oneline -- python/CHANGELOG.rst
may help here.
Once this PR is merged, push a tag to github:
git tag -a MAJOR.MINOR.PATCH -m "Python version MAJOR.MINOR.PATCH"
git push upstream --tags
This will trigger a build of the distribution artifacts for Python
on Github Actions. and deploy
them to the test PyPI. Check
the release looks good there, then publish the draft release on the
releases page (Click on the little pencil).
Publishing this release will cause the github
action to deploy to the production PyPI.
After release, start a section in the changelog for new developments, close the
GitHub issue milestone of the release and update ROADMAP.md.
For a major release the website (github repo tskit-dev/tskit-site) should then
be updated with a notebook of new features. The benchmarks should be run as above
and the bench-results.html
updated on the website.