API Reference#

This page provides a detailed explanation of all public tstrait objects and functions.

Summary#

Simulation functions#

`sim_phenotype`(ts, model, *[, num_causal, ...])	Simulate quantitative traits.
`sim_trait`(ts, model, *[, num_causal, ...])	Simulates traits.
`genetic_value`(ts, trait_df)	Obtains genetic value from a trait dataframe.
`sim_env`(genetic_df, *[, h2, random_seed])	Simulates environmental noise.

Effect size distributions#

`trait_model`(distribution, **kwargs)	Return a trait model corresponding to the specified model.
`TraitModel`(name)	Superclass of the trait model.
`TraitModelNormal`(mean, var)	Normal distribution trait model.
`TraitModelT`(mean, var, df)	Student's t distribution trait model.
`TraitModelFixed`(value[, random_sign])	Fixed value trait model.
`TraitModelExponential`(scale[, random_sign])	Exponential distribution trait model.
`TraitModelGamma`(shape, scale[, random_sign])	Gamma distribution trait model.
`TraitModelMultivariateNormal`(mean, cov)	Multivariate normal distribution trait model.

Postprocessing functions#

`normalise_phenotypes`(phenotype_df[, mean, ...])	Normalise phenotype dataframe.
`normalise_genetic_value`(genetic_df[, mean, ...])	Normalise genetic value dataframe.

Result data classes#

`PhenotypeResult`(trait, phenotype)	Dataclass that contains effect size dataframe and phenotype dataframe.

Reference documentation#

Simulation functions#

tstrait.sim_phenotype(ts, model, *, num_causal=None, causal_sites=None, alpha=None, h2=None, random_seed=None)[source]#

Simulate quantitative traits.

Parameters:

ts (tskit.TreeSequence): The tree sequence data that will be used in the quantitative trait simulation.
model (tstrait.TraitModel): Trait model that will be used to simulate effect sizes.
num_causal (int, default None): Number of causal sites. If None, the number of causal sites will be 1.
causal_sites (list, default None): List of site IDs that have a causal allele. If None, causal site IDs will be chosen randomly according to num_causal.
alpha (float, default None): Parameter that determines the degree of the frequency dependence model. Please see Frequency Dependence for details on how this parameter influences effect size simulation. If None, alpha will be 0.
h2 (float or array-like, default None): Narrow-sense heritability. When it is 1, environmental noise will be a vector of zeros. If h2 is array-like, the dimension of h2 must match the number of traits to be simulated. If None, h2 will be 1.
random_seed (int, default None): Random seed of the simulation. If None, the simulation is not seeded, so results will differ between runs.

Returns:

PhenotypeResult: Dataclass object that includes the phenotype and trait dataframes.

Raises:

ValueError: If the number of mutations in ts is smaller than num_causal.
ValueError: If h2 <= 0 or h2 > 1.

See also

trait_model: Returns a trait model, which can be used as model input.
PhenotypeResult: Dataclass object that will be used as an output.
sim_trait: Used to simulate a trait dataframe.
genetic_value: Used to determine genetic value of individuals.
sim_env: Used to simulate environmental noise.

Notes

The simulation outputs of traits and phenotypes are given as a pandas.DataFrame.

The trait dataframe can be extracted by using .trait in the resulting object and contains the following columns:

position: Position of sites that have causal allele in genome coordinates.

site_id: Site IDs that have causal allele.

effect_size: Simulated effect size of causal allele.

causal_allele: Causal allele.

allele_freq: Allele frequency of causal allele. It is described in detail in Frequency Dependence.

trait_id: Trait ID.

The phenotype dataframe can be extracted by using .phenotype in the resulting object and contains the following columns:

trait_id: Trait ID.

individual_id: Individual ID inside the tree sequence input.

genetic_value: Simulated genetic values.

environmental_noise: Simulated environmental noise.

phenotype: Simulated phenotype.

Please refer to Phenotype Model for mathematical details of the phenotypic model.

Examples

See Quick start for worked examples.

tstrait.sim_trait(ts, model, *, num_causal=None, causal_sites=None, alpha=None, random_seed=None)[source]#

Simulates traits.

Parameters:

ts (tskit.TreeSequence): The tree sequence data that will be used in the quantitative trait simulation.
model (tstrait.TraitModel): Trait model that will be used to simulate effect sizes.
num_causal (int, default None): Number of causal sites that will be randomly selected. If both num_causal and causal_sites are None, the number of causal sites will be 1.
causal_sites (list, default None): List of site IDs that have a causal allele. If None, causal site IDs will be chosen randomly according to num_causal.
alpha (float, default None): Parameter that determines the degree of the frequency dependence model. Please see Frequency Dependence for details on how this parameter influences effect size simulation. If None, alpha will be 0.
random_seed (int, default None): Random seed of the simulation. If None, the simulation is not seeded, so results will differ between runs.

Returns:

pandas.DataFrame: Trait dataframe that includes simulated effect sizes.

Raises:

ValueError: If the number of mutations in ts is smaller than num_causal.
ValueError: If both num_causal and causal_sites are specified.
ValueError: If there are repeated values in causal_sites.

See also

trait_model: Return a trait model, which can be used as model input.
genetic_value: The trait dataframe output can be used as an input to obtain genetic values.

Notes

The simulation output is given as a pandas.DataFrame and contains the following columns:

position: Position of sites that have causal allele in genome coordinates.

site_id: Site IDs that have causal allele. The output dataframe has sorted site IDs.

effect_size: Simulated effect size of causal allele.

causal_allele: Causal allele.

allele_freq: Allele frequency of causal allele. It is described in detail in Frequency Dependence.

trait_id: Trait ID.

Examples

See Trait simulation for worked examples.

tstrait.genetic_value(ts, trait_df)[source]#

Obtains genetic value from a trait dataframe.

Parameters:

ts (tskit.TreeSequence): The tree sequence data that will be used in the quantitative trait simulation.
trait_df (pandas.DataFrame): Trait dataframe.

Returns:

pandas.DataFrame: Pandas dataframe that includes genetic value of individuals in the tree sequence.

See also

trait_model: Return a trait model, which can be used as model input.
sim_trait: Return a trait dataframe, which can be used as a trait_df input.
sim_env: Genetic value dataframe output can be used as an input to simulate environmental noise.

Notes

The trait_df input has some requirements that will be noted below.

Columns

The following columns must be included in trait_df:

site_id: Site IDs that have causal allele.

effect_size: Simulated effect size of causal allele.

causal_allele: Causal allele.

trait_id: Trait ID.

Data requirements
- Site IDs in the site_id column must be sorted in ascending order. Please refer to pandas.DataFrame.sort_values() for details on sorting values in a pandas.DataFrame.
- Trait IDs in trait_id column must start from zero and be consecutive.

The genetic value dataframe contains the following columns:

trait_id: Trait ID.

individual_id: Individual ID inside the tree sequence input.

genetic_value: Genetic values that are obtained from the trait dataframe.

Examples

See genetic_value for worked examples.
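
The data requirements on trait_df listed above can be checked before calling genetic_value(). The following is a minimal sketch using only pandas; the dataframe contents are made up for illustration, and the consecutive-ID check is one possible way to test the stated requirement, not the validation that tstrait itself performs.

```python
import pandas as pd

# Hypothetical hand-built trait dataframe with the four required columns.
trait_df = pd.DataFrame(
    {
        "site_id": [7, 2, 5],
        "effect_size": [0.3, -1.2, 0.8],
        "causal_allele": ["A", "T", "G"],
        "trait_id": [0, 0, 0],
    }
)

# Requirement 1: site IDs must be sorted in ascending order.
trait_df = trait_df.sort_values(by="site_id").reset_index(drop=True)

# Requirement 2: trait IDs must start from zero and be consecutive.
trait_ids = sorted(trait_df["trait_id"].unique())
ids_ok = trait_ids == list(range(len(trait_ids)))
```

After sorting, each row still pairs a site with its own effect size and causal allele, because sort_values() reorders whole rows.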

tstrait.sim_env(genetic_df, *, h2=None, random_seed=None)[source]#

Simulates environmental noise.

Parameters:

genetic_df (pandas.DataFrame): Genetic value dataframe.
h2 (float or array-like, default None): Narrow-sense heritability. When it is 1, environmental noise will be a vector of zeros. If h2 is array-like, the dimension of h2 must match the number of traits to be simulated. If None, h2 will be 1.
random_seed (int, default None): Random seed of the simulation. If None, the simulation is not seeded, so results will differ between runs.

Returns:

pandas.DataFrame: Dataframe with simulated environmental noise.

Raises:

ValueError: If h2 <= 0 or h2 > 1

See also

genetic_value: Return a genetic value dataframe, which can be used as genetic_df input.

Notes

The genetic_df input has some requirements that will be noted below.

Columns

The following columns must be included in genetic_df:

trait_id: Trait ID.

individual_id: Individual ID inside the tree sequence input.

genetic_value: Simulated genetic values.

Data requirement

Trait IDs in trait_id column must start from 0 and be consecutive.

The dataframe output has the following columns:

trait_id: Trait ID.

individual_id: Individual ID inside the tree sequence input.

genetic_value: Simulated genetic values.

environmental_noise: Simulated environmental noise.

phenotype: Simulated phenotype.

Examples

See Environmental noise for worked examples.
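
To make the role of h2 concrete, here is a rough sketch of how noise could be drawn for a single trait. It assumes the standard definition of narrow-sense heritability, h2 = Var(G) / (Var(G) + Var(E)), and that the phenotype is the sum of the genetic value and the noise; the exact model used by tstrait is described in Phenotype Model. The function name noise_for_h2 and the input values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical genetic values for 1,000 individuals of a single trait.
genetic_value = rng.normal(loc=0.0, scale=2.0, size=1000)


def noise_for_h2(genetic_value, h2, rng):
    """Draw noise so that Var(G) / (Var(G) + Var(E)) is roughly h2 (assumed model)."""
    if h2 <= 0 or h2 > 1:
        raise ValueError("h2 must satisfy 0 < h2 <= 1")
    if h2 == 1:
        # Fully heritable trait: the noise is a vector of zeros.
        return np.zeros_like(genetic_value)
    env_var = np.var(genetic_value) * (1 - h2) / h2
    return rng.normal(loc=0.0, scale=np.sqrt(env_var), size=genetic_value.size)


noise = noise_for_h2(genetic_value, h2=0.5, rng=rng)
phenotype = genetic_value + noise

# The realised heritability fluctuates around the requested value.
realised_h2 = np.var(genetic_value) / (np.var(genetic_value) + np.var(noise))
```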

Effect size distributions#

tstrait.trait_model(distribution, **kwargs)[source]#

Return a trait model corresponding to the specified model.

Parameters:

distribution (str): String describing the trait model. The supported distributions are:
- "normal": Normal distribution
- "t": Student's t distribution
- "fixed": Fixed value
- "exponential": Exponential distribution
- "gamma": Gamma distribution
- "multi_normal": Multivariate normal distribution
**kwargs: These parameters will be used to specify the trait model.

Returns:

TraitModel: Trait model that specifies the distribution of effect size simulation.

See also

TraitModelNormal: Return a normal distribution trait model.
TraitModelT: Return a Student’s t-distribution trait model.
TraitModelFixed: Return a fixed value trait model.
TraitModelExponential: Return an exponential distribution trait model.
TraitModelGamma: Return a gamma distribution trait model.
TraitModelMultivariateNormal: Return a multivariate normal distribution trait model.

Notes

Please refer to effect_size for details on the effect size simulation. The multivariate normal distribution trait model is used in multi-trait simulation, which is described in Multi-trait simulation.

Examples

>>> import tstrait

Constructing a normal distribution trait model with mean \(0\) and variance \(1\).

>>> model = tstrait.trait_model(distribution="normal", mean=0, var=1)
>>> model.name
'normal'

Constructing a Student's t-distribution trait model with mean \(0\), variance \(1\), and degrees of freedom \(1\).

>>> model = tstrait.trait_model(distribution="t", mean=0, var=1, df=1)
>>> model.name
't'

Constructing a fixed value trait model with value \(1\).

>>> model = tstrait.trait_model(distribution="fixed", value=1)
>>> model.name
'fixed'

Constructing an exponential distribution trait model with scale \(1\).

>>> model = tstrait.trait_model(distribution="exponential", scale=1)
>>> model.name
'exponential'

Constructing an exponential distribution trait model with scale \(1\) that also allows simulation of negative values.

>>> model = tstrait.trait_model(distribution="exponential", scale=1,
...                             random_sign=True)

Constructing a gamma distribution trait model with shape \(1\) and scale \(2\).

>>> model = tstrait.trait_model(distribution="gamma", shape=1, scale=2)
>>> model.name
'gamma'

Constructing a gamma distribution trait model with shape \(1\) and scale \(2\) that also allows simulation of negative values.

>>> model = tstrait.trait_model(distribution="gamma", shape=1, scale=2,
...                             random_sign=True)
>>> model.name
'gamma'

Constructing a multivariate normal distribution trait model with mean vector \([0, 0]\) and an identity covariance matrix.

>>> import numpy as np
>>> model = tstrait.trait_model(distribution="multi_normal",
...                             mean=np.zeros(2), cov=np.eye(2))
>>> model.name
'multi_normal'
>>> model.num_trait
2

class tstrait.TraitModel(name)[source]#

Superclass of the trait model.

Attributes:

name (str): Name of the trait model.
num_trait (int): Number of traits to be simulated.

See also

trait_model: Construct a trait model.
TraitModelNormal: Return a normal distribution trait model.
TraitModelT: Return a Student’s t-distribution trait model.
TraitModelFixed: Return a fixed value trait model.
TraitModelExponential: Return an exponential distribution trait model.
TraitModelGamma: Return a gamma distribution trait model.
TraitModelMultivariateNormal: Return a multivariate normal distribution trait model.

Notes

This is the base class for all trait models in tstrait. All trait models should set all parameters in their __init__ as arguments.

class tstrait.TraitModelNormal(mean, var)[source]#

Normal distribution trait model.

Parameters:

mean (float): Mean of the simulated effect size.
var (float): Variance of the simulated effect size. Must be non-negative.

Returns:

TraitModel: Normal distribution trait model.

See also

trait_model: Construct a trait model.
numpy.random.Generator.normal: Details on the input parameters and distribution.

Notes

This is a trait model built on top of numpy.random.Generator.normal(), so please see its documentation for the details of the normal distribution simulation.

Examples

Please see the docstring example of trait_model() for constructing a normal distribution trait model.
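
Because numpy.random.Generator.normal() is parameterised by the standard deviation rather than the variance, a draw with the documented mean and var parameters needs a square root somewhere. The sketch below shows the underlying NumPy call under that assumption; it is an illustration of the distribution, not the internal code of TraitModelNormal.

```python
import numpy as np

rng = np.random.default_rng(42)

mean, var = 0.0, 4.0

# Generator.normal takes the standard deviation (scale), so convert the variance.
effect_sizes = rng.normal(loc=mean, scale=np.sqrt(var), size=10_000)
```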

class tstrait.TraitModelT(mean, var, df)[source]#

Student’s t distribution trait model.

Parameters:

mean (float): Mean of the simulated effect size.
var (float): Variance of the simulated effect size. Must be > 0.
df (float): Degrees of freedom. Must be > 0.

Returns:

TraitModel: Student’s t distribution trait model.

See also

trait_model: Construct a trait model.
numpy.random.Generator.standard_t: Details on the input parameters and distribution.

Notes

This is a trait model built on top of numpy.random.Generator.standard_t(), so please see its documentation for the details of the Student's t distribution simulation.

Examples

Please see the docstring example of trait_model() for constructing a Student's t distribution trait model.
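
numpy.random.Generator.standard_t() accepts only the degrees of freedom, so the mean and var parameters have to be applied to its output separately. One plausible reading, assumed here and not confirmed by this page, is a location-scale transform of the standard draws, as sketched below.

```python
import numpy as np

rng = np.random.default_rng(7)

mean, var, df = 1.0, 4.0, 10

# Standard Student's t draws (location 0, scale 1).
draws = rng.standard_t(df, size=5)

# Assumed location-scale transform: shift by mean, scale by sqrt(var).
effect_sizes = mean + np.sqrt(var) * draws
```

Under this transform, var acts as a squared scale parameter; the variance of a Student's t variable also depends on df.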

class tstrait.TraitModelFixed(value, random_sign=False)[source]#

Fixed value trait model.

Parameters:

value (float): Value of the simulated effect size.
random_sign (bool, default False): If True, each simulated effect size is multiplied by \(1\) or \(-1\), chosen at random, so that constant-magnitude effect sizes with randomly chosen signs can be simulated.

Returns:

TraitModel: Fixed value trait model.

See also

trait_model: Construct a trait model.

Notes

This trait model returns the fixed value specified in value if random_sign is False. If random_sign is True, it simulates effect sizes of that magnitude with randomly chosen signs.

Examples

Please see the docstring example of trait_model() for constructing a fixed value trait model.
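
The effect of random_sign on a fixed value model can be pictured with plain NumPy. This is a sketch of the idea described above, assuming the sign of each causal site is drawn independently; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

value = 1.5
num_causal = 8

# Draw a sign of -1 or 1 for each causal site and apply it to the fixed value.
signs = rng.choice([-1, 1], size=num_causal)
effect_sizes = value * signs
```

Every entry of effect_sizes has absolute value 1.5; only the sign varies.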

class tstrait.TraitModelExponential(scale, random_sign=False)[source]#

Exponential distribution trait model.

Parameters:

scale (float): Scale of the exponential distribution. Must be non-negative.
random_sign (bool, default False): If True, each simulated effect size is multiplied by \(1\) or \(-1\), chosen at random, so that effect sizes with randomly chosen signs can be simulated. If False, only positive values are simulated, as follows from the exponential distribution.

Returns:

TraitModel: Exponential distribution trait model.

See also

trait_model: Construct a trait model.
numpy.random.Generator.exponential: Details on the input parameters and distribution.

Notes

This is a trait model built on top of numpy.random.Generator.exponential(), so please see its documentation for the details of the exponential distribution simulation.

Examples

Please see the docstring example of trait_model() for constructing an exponential distribution trait model.

class tstrait.TraitModelGamma(shape, scale, random_sign=False)[source]#

Gamma distribution trait model.

Parameters:

shape (float): Shape of the gamma distribution. Must be non-negative.
scale (float): Scale of the gamma distribution. Must be non-negative.
random_sign (bool, default False): If True, each simulated effect size is multiplied by \(1\) or \(-1\), chosen at random, so that effect sizes with randomly chosen signs can be simulated. If False, only positive values are simulated, as follows from the gamma distribution.

Returns:

TraitModel: Gamma distribution trait model.

See also

trait_model: Construct a trait model.
numpy.random.Generator.gamma: Details on the input parameters and distribution.

Notes

This is a trait model built on top of numpy.random.Generator.gamma(), so please see its documentation for the details of the gamma distribution simulation.

Examples

Please see the docstring example of trait_model() for constructing a gamma distribution trait model.
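
A practical consequence of random_sign for the gamma model is that it changes the mean of the effect sizes: gamma draws have mean shape * scale, whereas sign-flipped draws are centred near zero. The NumPy sketch below illustrates this, assuming signs are drawn independently with equal probability; it is not the library's own code.

```python
import numpy as np

rng = np.random.default_rng(11)

shape, scale = 1.0, 2.0

# Gamma draws are non-negative, with mean shape * scale.
magnitudes = rng.gamma(shape, scale, size=1000)

# Flip each draw's sign at random, as random_sign=True is described to do.
signs = rng.choice([-1, 1], size=magnitudes.size)
signed = magnitudes * signs
```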

class tstrait.TraitModelMultivariateNormal(mean, cov)[source]#

Multivariate normal distribution trait model.

Parameters:

mean (1-D array_like, of length N): Mean vector.
cov (2-D array_like, of shape (N, N)): Covariance matrix. Must be symmetric and positive-semidefinite.

Returns:

TraitModel: Multivariate normal distribution trait model.

See also

trait_model: Construct a trait model.
numpy.random.Generator.multivariate_normal: Details on the input parameters and distribution.

Notes

Multivariate normal distribution simulation is used in multi-trait simulation, which is described in Multi-trait simulation.

This is a trait model built on top of numpy.random.Generator.multivariate_normal(), so please see its documentation for the details of the multivariate normal distribution simulation.

The number of dimensions of mean vector and covariance matrix should match, and the length of the mean vector specifies the number of traits that will be simulated by using this model.

Examples

Please see the docstring example of trait_model() for constructing a multivariate normal distribution trait model.
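
The relationship between the length of the mean vector and the number of traits can be seen directly from the underlying NumPy call. In this sketch, num_causal is an illustrative variable for the number of draws; each row of the result holds one effect size per trait.

```python
import numpy as np

rng = np.random.default_rng(5)

mean = np.zeros(2)
cov = np.eye(2)
num_causal = 4

# One row per draw, one column per trait.
effect_sizes = rng.multivariate_normal(mean, cov, size=num_causal)

# The length of the mean vector determines the number of traits.
num_trait = len(mean)
```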

Postprocessing functions#

tstrait.normalise_phenotypes(phenotype_df, mean=0, var=1, ddof=1)[source]#

Normalise phenotype dataframe.

Parameters:

phenotype_df (pandas.DataFrame): Phenotype dataframe.
mean (float, default 0): Mean of the resulting phenotype.
var (float, default 1): Variance of the resulting phenotype.
ddof (int, default 1): Delta degrees of freedom. The divisor used in computing the variance is N - ddof, where N represents the number of elements.

Returns:

pandas.DataFrame: Dataframe with normalised phenotype.

Raises:

ValueError: If var <= 0.

Notes

The following columns must be included in phenotype_df:

trait_id: Trait ID.

individual_id: Individual ID.

phenotype: Simulated phenotypes.

The dataframe output has the following columns:

trait_id: Trait ID inside the phenotype_df input.

individual_id: Individual ID inside the phenotype_df input.

phenotype: Normalised phenotype.

Examples

See Normalise Phenotype section for worked examples.
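
As a rough illustration of what the mean, var, and ddof parameters control, the sketch below standardises a column within each trait and then rescales it. It assumes normalisation is applied separately per trait_id; the helper name normalise and the data are hypothetical, and this is not the library's implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical phenotype dataframe with two traits and three individuals.
phenotype_df = pd.DataFrame(
    {
        "trait_id": [0, 0, 0, 1, 1, 1],
        "individual_id": [0, 1, 2, 0, 1, 2],
        "phenotype": [1.0, 2.0, 6.0, -3.0, 0.0, 9.0],
    }
)


def normalise(df, column, mean=0.0, var=1.0, ddof=1):
    """Rescale `column` within each trait to the requested mean and variance."""
    if var <= 0:
        raise ValueError("var must be greater than 0")
    grouped = df.groupby("trait_id")[column]
    # Standardise within each trait; the variance divisor is N - ddof.
    standardised = grouped.transform(lambda x: (x - x.mean()) / x.std(ddof=ddof))
    out = df.copy()
    out[column] = standardised * np.sqrt(var) + mean
    return out


normalised_df = normalise(phenotype_df, "phenotype", mean=2.0, var=4.0)
```

The same helper applies to a genetic value dataframe by passing "genetic_value" as the column name.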

tstrait.normalise_genetic_value(genetic_df, mean=0, var=1, ddof=1)[source]#

Normalise genetic value dataframe.

Parameters:

genetic_df (pandas.DataFrame): Genetic value dataframe.
mean (float, default 0): Mean of the resulting genetic value.
var (float, default 1): Variance of the resulting genetic value.
ddof (int, default 1): Delta degrees of freedom. The divisor used in computing the variance is N - ddof, where N represents the number of elements.

Returns:

pandas.DataFrame: Dataframe with normalised genetic value.

Raises:

ValueError: If var <= 0.

Notes

The following columns must be included in genetic_df:

trait_id: Trait ID.

individual_id: Individual ID inside the tree sequence input.

genetic_value: Simulated genetic values.

The dataframe output has the following columns:

trait_id: Trait ID.

individual_id: Individual ID inside the tree sequence input.

genetic_value: Normalised genetic values.

Examples

See Normalise Genetic Value section for worked examples.

Result data classes#

class tstrait.PhenotypeResult(trait: DataFrame, phenotype: DataFrame)[source]#

Dataclass that contains effect size dataframe and phenotype dataframe.

Attributes:

trait (pandas.DataFrame): Trait dataframe that includes simulated effect sizes.
phenotype (pandas.DataFrame): Phenotype dataframe that includes simulated phenotype.

See also

sim_phenotype: Use this dataclass as a simulation output.

Examples

See Trait Dataframe for details on extracting the trait dataframe, and Phenotype Output for details on extracting the phenotype dataframe.
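
The shape of this result object can be pictured with a small stand-in dataclass. ResultSketch below is a hypothetical class that mirrors the two documented attributes; it is not tstrait.PhenotypeResult itself, and the dataframe contents are invented for illustration.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class ResultSketch:
    """Hypothetical stand-in with the two documented attributes."""

    trait: pd.DataFrame
    phenotype: pd.DataFrame


result = ResultSketch(
    trait=pd.DataFrame({"site_id": [0], "effect_size": [0.5], "trait_id": [0]}),
    phenotype=pd.DataFrame(
        {"trait_id": [0], "individual_id": [0], "phenotype": [0.5]}
    ),
)

# Each dataframe is reached by plain attribute access.
trait_df = result.trait
phenotype_df = result.phenotype
```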
