API Reference#

This page provides a detailed explanation of all public tstrait objects and functions.

Summary#

Simulation functions#

sim_phenotype(ts, model, *[, num_causal, ...])

Simulate quantitative traits.

sim_trait(ts, model, *[, num_causal, ...])

Simulates traits.

genetic_value(ts, trait_df)

Obtains genetic value from a trait dataframe.

sim_env(genetic_df, *[, h2, random_seed])

Simulates environmental noise.

Effect size distributions#

trait_model(distribution, **kwargs)

Return a trait model corresponding to the specified model.

TraitModel(name)

Superclass of the trait model.

TraitModelNormal(mean, var)

Normal distribution trait model.

TraitModelT(mean, var, df)

Student's t distribution trait model.

TraitModelFixed(value[, random_sign])

Fixed value trait model.

TraitModelExponential(scale[, random_sign])

Exponential distribution trait model.

TraitModelGamma(shape, scale[, random_sign])

Gamma distribution trait model.

TraitModelMultivariateNormal(mean, cov)

Multivariate normal distribution trait model.

Postprocessing functions#

normalise_phenotypes(phenotype_df[, mean, ...])

Normalise phenotype dataframe.

normalise_genetic_value(genetic_df[, mean, ...])

Normalise genetic value dataframe.

Result data classes#

PhenotypeResult(trait, phenotype)

Dataclass that contains effect size dataframe and phenotype dataframe.

Reference documentation#

Simulation functions#

tstrait.sim_phenotype(ts, model, *, num_causal=None, causal_sites=None, alpha=None, h2=None, random_seed=None)#

Simulate quantitative traits.

Parameters:
ts : tskit.TreeSequence

The tree sequence data that will be used in the quantitative trait simulation.

model : tstrait.TraitModel

Trait model that will be used to simulate effect sizes.

num_causal : int, default None

Number of causal sites. If None, the number of causal sites will be 1.

causal_sites : list, default None

List of site IDs that have a causal allele. If None, causal site IDs will be chosen randomly according to num_causal.

alpha : float, default None

Parameter that determines the degree of the frequency dependence model. Please see frequency_dependence for details on how this parameter influences effect size simulation. If None, alpha will be 0.

h2 : float or array-like, default None

Narrow-sense heritability. When it is 1, environmental noise will be a vector of zeros. If h2 is array-like, the dimension of h2 must match the number of traits to be simulated. If None, h2 will be 1.

random_seed : int, default None

Random seed of the simulation. If None, the random number generator is not seeded, so results will differ between runs.

Returns:
PhenotypeResult

Dataclass object that includes the phenotype and trait dataframes.

Raises:
ValueError

If the number of mutations in ts is smaller than num_causal.

ValueError

If h2 <= 0 or h2 > 1.

See also

trait_model

Returns a trait model, which can be used as model input.

PhenotypeResult

Dataclass object that will be used as an output.

sim_trait

Used to simulate a trait dataframe.

genetic_value

Used to determine genetic value of individuals.

sim_env

Used to simulate environmental noise.

Notes

The simulation outputs of traits and phenotypes are given as a pandas.DataFrame.

The trait dataframe can be extracted by using .trait in the resulting object and contains the following columns:

  • position: Position of sites that have causal allele in genome coordinates.

  • site_id: Site IDs that have causal allele.

  • effect_size: Simulated effect size of causal allele.

  • causal_allele: Causal allele.

  • allele_freq: Allele frequency of causal allele. It is described in detail in Frequency Dependence.

  • trait_id: Trait ID.

The phenotype dataframe can be extracted by using .phenotype in the resulting object and contains the following columns:

  • trait_id: Trait ID.

  • individual_id: Individual ID inside the tree sequence input.

  • genetic_value: Simulated genetic values.

  • environmental_noise: Simulated environmental noise.

  • phenotype: Simulated phenotype.

Please refer to Phenotype Model for mathematical details of the phenotypic model.

Examples

See Quick start for worked examples.

tstrait.sim_trait(ts, model, *, num_causal=None, causal_sites=None, alpha=None, random_seed=None)#

Simulates traits.

Parameters:
ts : tskit.TreeSequence

The tree sequence data that will be used in the quantitative trait simulation.

model : tstrait.TraitModel

Trait model that will be used to simulate effect sizes.

num_causal : int, default None

Number of causal sites that will be randomly selected. If both num_causal and causal_sites are None, the number of causal sites will be 1.

causal_sites : list, default None

List of site IDs that have a causal allele. If None, causal site IDs will be chosen randomly according to num_causal.

alpha : float, default None

Parameter that determines the degree of the frequency dependence model. Please see frequency_dependence for details on how this parameter influences effect size simulation. If None, alpha will be 0.

random_seed : int, default None

Random seed of the simulation. If None, the random number generator is not seeded, so results will differ between runs.

Returns:
pandas.DataFrame

Trait dataframe that includes simulated effect sizes.

Raises:
ValueError

If the number of mutations in ts is smaller than num_causal.

ValueError

If both num_causal and causal_sites are specified.

ValueError

If there are repeated values in causal_sites.
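The interaction between num_causal and causal_sites described above can be sketched in plain Python. This is only an illustration of the documented rules, not tstrait's implementation: the helper name choose_causal_sites and the num_sites argument are hypothetical, and the number of sites stands in for the check against the tree sequence.

```python
import random


def choose_causal_sites(num_sites, num_causal=None, causal_sites=None,
                        random_seed=None):
    """Hypothetical helper mirroring the documented argument rules."""
    if num_causal is not None and causal_sites is not None:
        raise ValueError("Specify either num_causal or causal_sites, not both.")
    if causal_sites is not None:
        if len(set(causal_sites)) != len(causal_sites):
            raise ValueError("causal_sites contains repeated values.")
        # The output dataframe is documented to have sorted site IDs.
        return sorted(causal_sites)
    # Documented default: one causal site when neither argument is given.
    if num_causal is None:
        num_causal = 1
    if num_causal > num_sites:
        raise ValueError("Not enough sites to choose num_causal causal sites.")
    rng = random.Random(random_seed)
    # Sample without replacement and return the IDs in ascending order.
    return sorted(rng.sample(range(num_sites), num_causal))
```

Passing the same random_seed twice returns the same site IDs, which is the behaviour the random_seed parameter is meant to provide.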

See also

trait_model

Return a trait model, which can be used as model input.

genetic_value

The trait dataframe output can be used as an input to obtain genetic values.

Notes

The simulation output is given as a pandas.DataFrame and contains the following columns:

  • position: Position of sites that have causal allele in genome coordinates.

  • site_id: Site IDs that have causal allele. The output dataframe has sorted site IDs.

  • effect_size: Simulated effect size of causal allele.

  • causal_allele: Causal allele.

  • allele_freq: Allele frequency of causal allele. It is described in detail in Frequency Dependence.

  • trait_id: Trait ID.

Examples

See Trait simulation for worked examples.

tstrait.genetic_value(ts, trait_df)#

Obtains genetic value from a trait dataframe.

Parameters:
ts : tskit.TreeSequence

The tree sequence data that will be used in the quantitative trait simulation.

trait_df : pandas.DataFrame

Trait dataframe.

Returns:
pandas.DataFrame

Pandas dataframe that includes genetic value of individuals in the tree sequence.

See also

trait_model

Return a trait model, which can be used as model input.

sim_trait

Return a trait dataframe, which can be used as a trait_df input.

sim_env

Genetic value dataframe output can be used as an input to simulate environmental noise.

Notes

The trait_df input has some requirements that will be noted below.

  1. Columns

The following columns must be included in trait_df:

  • site_id: Site IDs that have causal allele.

  • effect_size: Simulated effect size of causal allele.

  • causal_allele: Causal allele.

  • trait_id: Trait ID.

  2. Data requirements

    • Site IDs in site_id column must be sorted in an ascending order. Please refer to pandas.DataFrame.sort_values() for details on sorting values in a pandas.DataFrame.

    • Trait IDs in trait_id column must start from zero and be consecutive.
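The requirements above can be checked before calling genetic_value. The following is a minimal sketch that works on plain Python rows (a list of dictionaries) rather than on a pandas.DataFrame; the helper name check_trait_rows is hypothetical and is not part of tstrait.

```python
def check_trait_rows(rows):
    """Check the documented trait_df requirements on a list of row dicts."""
    required = {"site_id", "effect_size", "causal_allele", "trait_id"}
    for row in rows:
        missing = required - row.keys()
        if missing:
            raise ValueError(f"Missing columns: {sorted(missing)}")
    # Site IDs must be in ascending order (ties are allowed, since several
    # traits can share a site).
    site_ids = [row["site_id"] for row in rows]
    if site_ids != sorted(site_ids):
        raise ValueError("site_id must be sorted in ascending order.")
    # Trait IDs must be 0, 1, 2, ... with no gaps.
    trait_ids = sorted({row["trait_id"] for row in rows})
    if trait_ids != list(range(len(trait_ids))):
        raise ValueError("trait_id must start from zero and be consecutive.")
    return True
```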

The genetic value dataframe contains the following columns:

  • trait_id: Trait ID.

  • individual_id: Individual ID inside the tree sequence input.

  • genetic_value: Genetic values that are obtained from the trait dataframe.
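To make the genetic_value column concrete, here is a small sketch under the assumption of a purely additive model, in which an individual's genetic value is the sum over causal sites of the effect size times the number of copies of the causal allele it carries. The function name and the input layout are hypothetical; the exact model is described in the Phenotype Model page referenced elsewhere in this reference.

```python
def additive_genetic_values(genotypes, effects):
    """Sum effect_size * causal allele count per individual (one trait).

    genotypes: {individual_id: {site_id: number of causal allele copies}}
    effects:   list of (site_id, effect_size) pairs
    """
    values = {}
    for individual_id, counts in genotypes.items():
        values[individual_id] = sum(
            effect_size * counts.get(site_id, 0)
            for site_id, effect_size in effects
        )
    return values
```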

Examples

See genetic_value for worked examples.

tstrait.sim_env(genetic_df, *, h2=None, random_seed=None)#

Simulates environmental noise.

Parameters:
genetic_df : pandas.DataFrame

Genetic value dataframe.

h2 : float or array-like, default None

Narrow-sense heritability. When it is 1, environmental noise will be a vector of zeros. If h2 is array-like, the dimension of h2 must match the number of traits to be simulated. If None, h2 will be 1.

random_seed : int, default None

Random seed of the simulation. If None, the random number generator is not seeded, so results will differ between runs.

Returns:
pandas.DataFrame

Dataframe with simulated environmental noise.

Raises:
ValueError

If h2 <= 0 or h2 > 1.

See also

genetic_value

Return a genetic value dataframe, which can be used as genetic_df input.

Notes

The genetic_df input has some requirements that will be noted below.

  1. Columns

The following columns must be included in genetic_df:

  • trait_id: Trait ID.

  • individual_id: Individual ID inside the tree sequence input.

  • genetic_value: Simulated genetic values.

  2. Data requirement

Trait IDs in trait_id column must start from 0 and be consecutive.

The dataframe output has the following columns:

  • trait_id: Trait ID.

  • individual_id: Individual ID inside the tree sequence input.

  • genetic_value: Simulated genetic values.

  • environmental_noise: Simulated environmental noise.

  • phenotype: Simulated phenotype.
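The relationship between the three value columns can be sketched for a single trait as follows. This assumes the standard definition of narrow-sense heritability, h2 = V_G / (V_G + V_E), normally distributed noise with mean zero, and the sample variance of the genetic values as V_G. These are assumptions made for illustration (see Phenotype Model for the actual definition), and sim_env_sketch is a hypothetical name.

```python
import math
import random
import statistics


def sim_env_sketch(genetic_values, h2=1.0, random_seed=None):
    """Return (environmental_noise, phenotype) lists for one trait."""
    if h2 <= 0 or h2 > 1:
        raise ValueError("h2 must satisfy 0 < h2 <= 1.")
    rng = random.Random(random_seed)
    # Assumption: V_E is chosen so that V_G / (V_G + V_E) equals h2,
    # which gives V_E = V_G * (1 - h2) / h2.
    var_g = statistics.variance(genetic_values)
    sd_e = math.sqrt(var_g * (1 - h2) / h2)
    noise = [rng.gauss(0.0, sd_e) for _ in genetic_values]
    # The phenotype is the genetic value plus the environmental noise.
    phenotype = [g + e for g, e in zip(genetic_values, noise)]
    return noise, phenotype
```

Under this assumption the two documented edge cases follow directly: h2 = 1 gives a noise standard deviation of zero, and h2 = 0 would require dividing by zero, which is consistent with the ValueError.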

Examples

See Environmental noise for worked examples.

Effect size distributions#

tstrait.trait_model(distribution, **kwargs)#

Return a trait model corresponding to the specified model.

Parameters:
distribution : str

String describing the trait model. The supported distributions are:

  • “normal”: Normal distribution

  • “t”: Student’s t distribution

  • “fixed”: Fixed value

  • “exponential”: Exponential distribution

  • “gamma”: Gamma distribution

  • “multi_normal”: Multivariate normal distribution

**kwargs

These parameters will be used to specify the trait model.

Returns:
TraitModel

Trait model that specifies the distribution of effect size simulation.

See also

TraitModelNormal

Return a normal distribution trait model.

TraitModelT

Return a Student’s t-distribution trait model.

TraitModelFixed

Return a fixed value trait model.

TraitModelExponential

Return an exponential distribution trait model.

TraitModelGamma

Return a gamma distribution trait model.

TraitModelMultivariateNormal

Return a multivariate normal distribution trait model.

Notes

Please see effect_size for details on the effect size simulation. The multivariate normal distribution trait model is used in multi-trait simulation, which is described in Multi-trait simulation.

Examples

>>> import tstrait

Constructing a normal distribution trait model with mean \(0\) and variance \(1\).

>>> model = tstrait.trait_model(distribution="normal", mean=0, var=1)
>>> model.name
'normal'

Constructing a Student’s t-distribution trait model with mean \(0\), variance \(1\) and degrees of freedom \(1\).

>>> model = tstrait.trait_model(distribution="t", mean=0, var=1, df=1)
>>> model.name
't'

Constructing a fixed value trait model with value \(1\).

>>> model = tstrait.trait_model(distribution="fixed", value=1)
>>> model.name
'fixed'

Constructing an exponential distribution trait model with scale \(1\).

>>> model = tstrait.trait_model(distribution="exponential", scale=1)
>>> model.name
'exponential'

Constructing an exponential distribution trait model with scale \(1\) that allows simulation of negative values.

>>> model = tstrait.trait_model(distribution="exponential", scale=1,
...                             random_sign=True)

Constructing a gamma distribution trait model with shape \(1\) and scale \(2\).

>>> model = tstrait.trait_model(distribution="gamma", shape=1, scale=2)
>>> model.name
'gamma'

Constructing a gamma distribution trait model with shape \(1\) and scale \(2\) that allows simulation of negative values.

>>> model = tstrait.trait_model(distribution="gamma", shape=1, scale=2,
...                             random_sign=True)
>>> model.name
'gamma'

Constructing a multivariate normal distribution trait model with mean vector \([0, 0]\) and an identity covariance matrix.

>>> import numpy as np
>>> model = tstrait.trait_model(distribution="multi_normal",
...                             mean=np.zeros(2), cov=np.eye(2))
>>> model.name
'multi_normal'
>>> model.num_trait
2

class tstrait.TraitModel(name)#

Superclass of the trait model.

See also

trait_model

Construct a trait model.

TraitModelNormal

Return a normal distribution trait model.

TraitModelT

Return a Student’s t-distribution trait model.

TraitModelFixed

Return a fixed value trait model.

TraitModelExponential

Return an exponential distribution trait model.

TraitModelGamma

Return a gamma distribution trait model.

TraitModelMultivariateNormal

Return a multivariate normal distribution trait model.

Notes

This is the base class for all trait models in tstrait. All trait models should set all parameters in their __init__ as arguments.

Attributes:
name : str

Name of the trait model.

num_trait : int

Number of traits to be simulated.

class tstrait.TraitModelNormal(mean, var)#

Normal distribution trait model.

Parameters:
mean : float

Mean of the simulated effect size.

var : float

Variance of the simulated effect size. Must be non-negative.

Returns:
TraitModel

Normal distribution trait model.

See also

trait_model

Construct a trait model.

numpy.random.Generator.normal

Details on the input parameters and distribution.

Notes

This is a trait model built on top of numpy.random.Generator.normal(), so please see its documentation for the details of the normal distribution simulation.

Examples

Please see the docstring example of trait_model() for constructing a normal distribution trait model.

class tstrait.TraitModelT(mean, var, df)#

Student’s t distribution trait model.

Parameters:
mean : float

Mean of the simulated effect size.

var : float

Variance of the simulated effect size. Must be > 0.

df : float

Degrees of freedom. Must be > 0.

Returns:
TraitModel

Student’s t distribution trait model.

See also

trait_model

Construct a trait model.

numpy.random.Generator.standard_t

Details on the input parameters and distribution.

Notes

This is a trait model built on top of numpy.random.Generator.standard_t(), so please see its documentation for the details of the Student’s t distribution simulation.

Examples

Please see the docstring example of trait_model() for constructing a Student’s t distribution trait model.

class tstrait.TraitModelFixed(value, random_sign=False)#

Fixed value trait model.

Parameters:
value : float

Value of the simulated effect size.

random_sign : bool, default False

If True, each simulated effect size is multiplied by \(1\) or \(-1\) chosen at random, so that constant-magnitude effect sizes have randomly chosen signs.

Returns:
TraitModel

Fixed value trait model.

See also

trait_model

Construct a trait model.

Notes

This trait model assigns the value specified in value to every effect size if random_sign is False. If random_sign is True, each effect size is value with a randomly chosen sign.
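The effect of random_sign can be sketched in a few lines of plain Python. This is an illustration of the behaviour described above rather than tstrait's code; the function name fixed_effect_sizes and its num_causal and random_seed arguments are hypothetical.

```python
import random


def fixed_effect_sizes(value, num_causal, random_sign=False, random_seed=None):
    """Return num_causal effect sizes of constant magnitude."""
    if not random_sign:
        # Every causal site receives exactly the same effect size.
        return [value] * num_causal
    rng = random.Random(random_seed)
    # Multiply each effect size by 1 or -1, chosen independently at random.
    return [value * rng.choice((1, -1)) for _ in range(num_causal)]
```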

Examples

Please see the docstring example of trait_model() for constructing a fixed value trait model.

class tstrait.TraitModelExponential(scale, random_sign=False)#

Exponential distribution trait model.

Parameters:
scale : float

Scale of the exponential distribution. Must be non-negative.

random_sign : bool, default False

If True, each simulated effect size is multiplied by \(1\) or \(-1\) chosen at random, so that effect sizes have randomly chosen signs. If False, only positive values are simulated, because the exponential distribution has positive support.

Returns:
TraitModel

Exponential distribution trait model.

See also

trait_model

Construct a trait model.

numpy.random.Generator.exponential

Details on the input parameters and distribution.

Notes

This is a trait model built on top of numpy.random.Generator.exponential(), so please see its documentation for the details of the exponential distribution simulation.
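Note that scale is the mean of the distribution (the inverse of the rate), following the NumPy parameterisation. The sketch below illustrates this with the standard library, whose random.expovariate takes the rate instead. The function name is hypothetical, the code is not tstrait's implementation, and it assumes scale > 0 because it divides by scale.

```python
import random


def exponential_effect_sizes(scale, num_causal, random_sign=False,
                             random_seed=None):
    """Draw effect sizes from an exponential distribution with mean `scale`."""
    rng = random.Random(random_seed)
    # random.expovariate expects the rate, which is 1 / scale.
    sizes = [rng.expovariate(1.0 / scale) for _ in range(num_causal)]
    if random_sign:
        # Flip the sign of each draw independently with probability 1/2.
        sizes = [s * rng.choice((1, -1)) for s in sizes]
    return sizes
```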

Examples

Please see the docstring example of trait_model() for constructing an exponential distribution trait model.

class tstrait.TraitModelGamma(shape, scale, random_sign=False)#

Gamma distribution trait model.

Parameters:
shape : float

Shape of the gamma distribution. Must be non-negative.

scale : float

Scale of the gamma distribution. Must be non-negative.

random_sign : bool, default False

If True, each simulated effect size is multiplied by \(1\) or \(-1\) chosen at random, so that effect sizes have randomly chosen signs. If False, only positive values are simulated, because the gamma distribution has positive support.

Returns:
TraitModel

Gamma distribution trait model.

See also

trait_model

Construct a trait model.

numpy.random.Generator.gamma

Details on the input parameters and distribution.

Notes

This is a trait model built on top of numpy.random.Generator.gamma(), so please see its documentation for the details of the gamma distribution simulation.

Examples

Please see the docstring example of trait_model() for constructing a gamma distribution trait model.

class tstrait.TraitModelMultivariateNormal(mean, cov)#

Multivariate normal distribution trait model.

Parameters:
mean : 1-D array_like, of length N

Mean vector.

cov : 2-D array_like, of shape (N, N)

Covariance matrix. Must be symmetric and positive-semidefinite.

Returns:
TraitModel

Multivariate normal distribution trait model.

See also

trait_model

Construct a trait model.

numpy.random.Generator.multivariate_normal

Details on the input parameters and distribution.

Notes

Multivariate normal distribution simulation is used in multi-trait simulation, which is described in Multi-trait simulation.

This is a trait model built on top of numpy.random.Generator.multivariate_normal(), so please see its documentation for the details of the multivariate normal distribution simulation.

The number of dimensions of mean vector and covariance matrix should match, and the length of the mean vector specifies the number of traits that will be simulated by using this model.
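The dimension requirement can be expressed as a short check. This sketch uses nested lists instead of NumPy arrays, verifies shape and symmetry only (not positive-semidefiniteness), and uses a hypothetical helper name; it is not how tstrait validates its input.

```python
def check_multi_normal_params(mean, cov):
    """Return the number of traits implied by mean and cov, or raise."""
    n = len(mean)
    # cov must be an N x N matrix, where N is the length of the mean vector.
    if len(cov) != n or any(len(row) != n for row in cov):
        raise ValueError("cov must have shape (N, N) where N = len(mean).")
    for i in range(n):
        for j in range(i + 1, n):
            if cov[i][j] != cov[j][i]:
                raise ValueError("cov must be symmetric.")
    # The length of the mean vector is the number of simulated traits.
    return n
```

For the mean vector \([0, 0]\) and a 2 x 2 identity covariance matrix this returns 2, matching the num_trait value shown in the trait_model() docstring example.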

Examples

Please see the docstring example of trait_model() for constructing a multivariate normal distribution trait model.

Postprocessing functions#

tstrait.normalise_phenotypes(phenotype_df, mean=0, var=1, ddof=1)#

Normalise phenotype dataframe.

Parameters:
phenotype_df : pandas.DataFrame

Phenotype dataframe.

mean : float, default 0

Mean of the resulting phenotype.

var : float, default 1

Variance of the resulting phenotype.

ddof : int, default 1

Delta degrees of freedom. The divisor used in computing the variance is N - ddof, where N represents the number of elements.

Returns:
pandas.DataFrame

Dataframe with normalised phenotype.

Raises:
ValueError

If var <= 0.

Notes

The following columns must be included in phenotype_df:

  • trait_id: Trait ID.

  • individual_id: Individual ID.

  • phenotype: Simulated phenotypes.

The dataframe output has the following columns:

  • trait_id: Trait ID inside the phenotype_df input.

  • individual_id: Individual ID inside the phenotype_df input.

  • phenotype: Normalised phenotype.
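The role of mean, var and ddof can be illustrated with the usual standardise-then-rescale transformation. The sketch below handles the values of a single trait as a plain list and assumes that this is the transformation applied; the function name normalise is hypothetical.

```python
import math
import statistics


def normalise(values, mean=0.0, var=1.0, ddof=1):
    """Rescale values to the requested mean and variance."""
    if var <= 0:
        raise ValueError("var must be greater than 0.")
    n = len(values)
    centre = statistics.fmean(values)
    # Variance with divisor N - ddof, as described for the ddof parameter.
    current_var = sum((v - centre) ** 2 for v in values) / (n - ddof)
    scale = math.sqrt(var / current_var)
    # Centre, rescale to the target variance, then shift to the target mean.
    return [(v - centre) * scale + mean for v in values]
```

With ddof=1 the variance of the output, computed with divisor N - 1, equals var; with ddof=0 the same holds for the divisor N.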

Examples

See Normalise Phenotype section for worked examples.

tstrait.normalise_genetic_value(genetic_df, mean=0, var=1, ddof=1)#

Normalise genetic value dataframe.

Parameters:
genetic_df : pandas.DataFrame

Genetic value dataframe.

mean : float, default 0

Mean of the resulting genetic value.

var : float, default 1

Variance of the resulting genetic value.

ddof : int, default 1

Delta degrees of freedom. The divisor used in computing the variance is N - ddof, where N represents the number of elements.

Returns:
pandas.DataFrame

Dataframe with normalised genetic value.

Raises:
ValueError

If var <= 0.

Notes

The following columns must be included in genetic_df:

  • trait_id: Trait ID.

  • individual_id: Individual ID inside the tree sequence input.

  • genetic_value: Simulated genetic values.

The dataframe output has the following columns:

  • trait_id: Trait ID.

  • individual_id: Individual ID inside the tree sequence input.

  • genetic_value: Normalised genetic values.

Examples

See Normalise Genetic Value section for worked examples.

Result data classes#

class tstrait.PhenotypeResult(trait: DataFrame, phenotype: DataFrame)#

Dataclass that contains effect size dataframe and phenotype dataframe.

See also

sim_phenotype

Uses this dataclass as its simulation output.

Examples

See Trait Dataframe for details on extracting the trait dataframe, and Phenotype Output for details on extracting the phenotype dataframe.

Attributes:
trait : pandas.DataFrame

Trait dataframe that includes simulated effect sizes.

phenotype : pandas.DataFrame

Phenotype dataframe that includes simulated phenotype.