API Reference
This page provides a detailed explanation of all public tstrait objects and functions.
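Summary
Simulation functions
| tstrait.sim_phenotype | Simulate quantitative traits. |
| tstrait.sim_trait | Simulates traits. |
| tstrait.genetic_value | Obtains genetic value from a trait dataframe. |
| tstrait.sim_env | Simulates environmental noise. |
Effect size distributions
| tstrait.trait_model | Return a trait model corresponding to the specified model. |
| tstrait.TraitModel | Superclass of the trait model. |
| tstrait.TraitModelNormal | Normal distribution trait model. |
| tstrait.TraitModelT | Student's t distribution trait model. |
| tstrait.TraitModelFixed | Fixed value trait model. |
| tstrait.TraitModelExponential | Exponential distribution trait model. |
| tstrait.TraitModelGamma | Gamma distribution trait model. |
| tstrait.TraitModelMultivariateNormal | Multivariate normal distribution trait model. |
Postprocessing functions
| tstrait.normalise_phenotypes | Normalise phenotype dataframe. |
| tstrait.normalise_genetic_value | Normalise genetic value dataframe. |
Result data classes
| tstrait.PhenotypeResult | Dataclass that contains effect size dataframe and phenotype dataframe. |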
Reference documentation
Simulation functions
- tstrait.sim_phenotype(ts, model, *, num_causal=None, causal_sites=None, alpha=None, h2=None, random_seed=None)
Simulate quantitative traits.
- Parameters:
- ts : tskit.TreeSequence
The tree sequence data that will be used in the quantitative trait simulation.
- model : tstrait.TraitModel
Trait model that will be used to simulate effect sizes.
- num_causal : int, default None
Number of causal sites. If None, number of causal sites will be 1.
- causal_sites : list, default None
List of site IDs that have a causal allele. If None, causal site IDs will be chosen randomly according to num_causal.
- alpha : float, default None
Parameter that determines the degree of the frequency dependence model. Please see frequency_dependence for details on how this parameter influences effect size simulation. If None, alpha will be 0.
- h2 : float or array-like, default None
Narrow-sense heritability. When it is 1, environmental noise will be a vector of zeros. If h2 is array-like, the dimension of h2 must match the number of traits to be simulated. If None, h2 will be 1.
- random_seed : int, default None
Random seed of simulation. If None, the simulation is not seeded and results will differ between runs.
- Returns:
- PhenotypeResult
Dataclass object that includes phenotype and trait dataframe.
- Raises:
- ValueError
If the number of mutations in ts is smaller than num_causal.
- ValueError
If h2 <= 0 or h2 > 1.
See also
trait_model
Returns a trait model, which can be used as model input.
PhenotypeResult
Dataclass object that will be used as an output.
sim_trait
Used to simulate a trait dataframe.
genetic_value
Used to determine genetic value of individuals.
sim_env
Used to simulate environmental noise.
Notes
The simulation outputs of traits and phenotypes are given as a pandas.DataFrame.
The trait dataframe can be extracted by using .trait in the resulting object and contains the following columns:
position: Position of sites that have causal allele in genome coordinates.
site_id: Site IDs that have causal allele.
effect_size: Simulated effect size of causal allele.
causal_allele: Causal allele.
allele_freq: Allele frequency of causal allele. It is described in detail in Frequency Dependence.
trait_id: Trait ID.
The phenotype dataframe can be extracted by using .phenotype in the resulting object and contains the following columns:
trait_id: Trait ID.
individual_id: Individual ID inside the tree sequence input.
genetic_value: Simulated genetic values.
environmental_noise: Simulated environmental noise.
phenotype: Simulated phenotype.
Please refer to Phenotype Model for mathematical details of the phenotypic model.
Examples
See Quick start for worked examples.
- tstrait.sim_trait(ts, model, *, num_causal=None, causal_sites=None, alpha=None, random_seed=None)
Simulates traits.
- Parameters:
- ts : tskit.TreeSequence
The tree sequence data that will be used in the quantitative trait simulation.
- model : tstrait.TraitModel
Trait model that will be used to simulate effect sizes.
- num_causal : int, default None
Number of causal sites that will be randomly selected. If both num_causal and causal_sites are None, number of causal sites will be 1.
- causal_sites : list, default None
List of site IDs that have a causal allele. If None, causal site IDs will be chosen randomly according to num_causal.
- alpha : float, default None
Parameter that determines the degree of the frequency dependence model. Please see frequency_dependence for details on how this parameter influences effect size simulation. If None, alpha will be 0.
- random_seed : int, default None
Random seed of simulation. If None, the simulation is not seeded and results will differ between runs.
- Returns:
- pandas.DataFrame
Trait dataframe that includes simulated effect sizes.
- Raises:
- ValueError
If the number of mutations in ts is smaller than num_causal.
- ValueError
If both num_causal and causal_sites are specified.
- ValueError
If there are repeated values in causal_sites.
See also
trait_model
Return a trait model, which can be used as model input.
genetic_value
The trait dataframe output can be used as an input to obtain genetic values.
Notes
The simulation output is given as a pandas.DataFrame and contains the following columns:
position: Position of sites that have causal allele in genome coordinates.
site_id: Site IDs that have causal allele. The output dataframe has sorted site IDs.
effect_size: Simulated effect size of causal allele.
causal_allele: Causal allele.
allele_freq: Allele frequency of causal allele. It is described in detail in Frequency Dependence.
trait_id: Trait ID.
Examples
See Trait simulation for worked examples.
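As a rough illustration of the random choice described for num_causal, the sketch below draws distinct site IDs without replacement and sorts them, mirroring the sorted site_id column noted above. The helper name choose_causal_sites and the use of numpy.random.Generator.choice are assumptions made for illustration and are not taken from tstrait's implementation.

```python
import numpy as np


def choose_causal_sites(num_sites, num_causal=1, random_seed=None):
    """Pick `num_causal` distinct site IDs out of `num_sites` (hypothetical helper)."""
    if num_causal > num_sites:
        raise ValueError("num_causal exceeds the number of available sites")
    rng = np.random.default_rng(random_seed)
    # Sample without replacement so that no site ID is repeated,
    # then sort so that the IDs are in ascending order.
    site_ids = rng.choice(num_sites, size=num_causal, replace=False)
    return np.sort(site_ids)


causal_sites = choose_causal_sites(num_sites=100, num_causal=5, random_seed=1)
```

Passing the same random_seed reproduces the same selection, which is the behaviour one would expect from a seeded simulation.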
- tstrait.genetic_value(ts, trait_df)
Obtains genetic value from a trait dataframe.
- Parameters:
- ts : tskit.TreeSequence
The tree sequence data that will be used in the quantitative trait simulation.
- trait_df : pandas.DataFrame
Trait dataframe.
- Returns:
- pandas.DataFrame
Pandas dataframe that includes genetic value of individuals in the tree sequence.
See also
trait_model
Return a trait model, which can be used as model input.
sim_trait
Return a trait dataframe, which can be used as a trait_df input.
sim_env
Genetic value dataframe output can be used as an input to simulate environmental noise.
Notes
The trait_df input has some requirements that will be noted below.
Columns
The following columns must be included in trait_df:
site_id: Site IDs that have causal allele.
effect_size: Simulated effect size of causal allele.
causal_allele: Causal allele.
trait_id: Trait ID.
Data requirements
Site IDs in site_id column must be sorted in an ascending order. Please refer to pandas.DataFrame.sort_values() for details on sorting values in a pandas.DataFrame.
Trait IDs in trait_id column must start from zero and be consecutive.
The genetic value dataframe contains the following columns:
trait_id: Trait ID.
individual_id: Individual ID inside the tree sequence input.
genetic_value: Genetic values that are obtained from the trait dataframe.
Examples
See genetic_value for worked examples.
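The trait_df requirements listed in the Notes can be checked before calling genetic_value. The sketch below is one possible way to do that with pandas; check_trait_df is a hypothetical helper written for this page, not a tstrait function, and the error messages are illustrative.

```python
import numpy as np
import pandas as pd

REQUIRED_COLUMNS = ["site_id", "effect_size", "causal_allele", "trait_id"]


def check_trait_df(trait_df):
    """Raise ValueError if trait_df breaks the documented requirements (hypothetical helper)."""
    missing = [col for col in REQUIRED_COLUMNS if col not in trait_df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Site IDs must be in ascending order (ties are allowed here).
    if not trait_df["site_id"].is_monotonic_increasing:
        raise ValueError("site_id must be sorted in ascending order")
    # Trait IDs must be 0, 1, 2, ... with no gaps.
    trait_ids = np.unique(trait_df["trait_id"])
    if not np.array_equal(trait_ids, np.arange(len(trait_ids))):
        raise ValueError("trait_id must start from zero and be consecutive")


trait_df = pd.DataFrame(
    {
        "site_id": [7, 2, 5],
        "effect_size": [0.1, -0.3, 0.2],
        "causal_allele": ["A", "T", "G"],
        "trait_id": [0, 0, 0],
    }
)
# Sort by site ID first, as the data requirements ask, then validate.
trait_df = trait_df.sort_values("site_id").reset_index(drop=True)
check_trait_df(trait_df)
```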
- tstrait.sim_env(genetic_df, *, h2=None, random_seed=None)
Simulates environmental noise.
- Parameters:
- genetic_df : pandas.DataFrame
Genetic value dataframe.
- h2 : float or array-like, default None
Narrow-sense heritability. When it is 1, environmental noise will be a vector of zeros. If h2 is array-like, the dimension of h2 must match the number of traits to be simulated. If None, h2 will be 1.
- random_seed : int, default None
Random seed of simulation. If None, the simulation is not seeded and results will differ between runs.
- Returns:
- pandas.DataFrame
Dataframe with simulated environmental noise.
- Raises:
- ValueError
If h2 <= 0 or h2 > 1.
See also
genetic_value
Return a genetic value dataframe, which can be used as genetic_df input.
Notes
The genetic_df input has some requirements that will be noted below.
Columns
The following columns must be included in genetic_df:
trait_id: Trait ID.
individual_id: Individual ID inside the tree sequence input.
genetic_value: Simulated genetic values.
Data requirement
Trait IDs in trait_id column must start from 0 and be consecutive.
The dataframe output has the following columns:
trait_id: Trait ID.
individual_id: Individual ID inside the tree sequence input.
genetic_value: Simulated genetic values.
environmental_noise: Simulated environmental noise.
phenotype: Simulated phenotype.
Examples
See Environmental noise for worked examples.
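To make the role of h2 concrete, the sketch below assumes the standard definition of narrow-sense heritability, h2 = V_G / (V_G + V_E), and normally distributed noise with mean zero. Under that assumption the noise variance is V_G * (1 - h2) / h2, which is zero when h2 is 1. The function add_environmental_noise is a hypothetical single-trait helper; see Phenotype Model for the model tstrait actually uses.

```python
import numpy as np


def add_environmental_noise(genetic_value, h2=1.0, random_seed=None):
    """Return (noise, phenotype) for one trait (hypothetical helper)."""
    if h2 <= 0 or h2 > 1:
        raise ValueError("h2 must satisfy 0 < h2 <= 1")
    genetic_value = np.asarray(genetic_value, dtype=float)
    rng = np.random.default_rng(random_seed)
    # From h2 = V_G / (V_G + V_E) it follows that V_E = V_G * (1 - h2) / h2.
    noise_sd = np.sqrt(np.var(genetic_value) * (1 - h2) / h2)
    noise = rng.normal(loc=0.0, scale=noise_sd, size=genetic_value.shape)
    # Phenotype is the sum of genetic value and environmental noise.
    return noise, genetic_value + noise


g = [0.5, -1.2, 0.3, 2.0]
noise, phenotype = add_environmental_noise(g, h2=1.0)
```

With h2=1.0 the noise vector is all zeros and the phenotype equals the genetic value; smaller values of h2 give a larger noise variance.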
Effect size distributions
- tstrait.trait_model(distribution, **kwargs)
Return a trait model corresponding to the specified model.
- Parameters:
- distribution : str
String describing the trait model. The supported distributions are:
* “normal”: Normal distribution
* “t”: Student’s t distribution
* “fixed”: Fixed value
* “exponential”: Exponential distribution
* “gamma”: Gamma distribution
* “multi_normal”: Multivariate normal distribution
- **kwargs
These parameters will be used to specify the trait model.
- Returns:
- TraitModel
Trait model that specifies the distribution of effect size simulation.
See also
TraitModelNormal
Return a normal distribution trait model.
TraitModelT
Return a Student’s t-distribution trait model.
TraitModelFixed
Return a fixed value trait model.
TraitModelExponential
Return an exponential distribution trait model.
TraitModelGamma
Return a gamma distribution trait model.
TraitModelMultivariateNormal
Return a multivariate normal distribution trait model.
Notes
Please refer to effect_size for details on the effect size simulation. The multivariate normal distribution trait model is used in multi-trait simulation, which is described in Multi-trait simulation.
Examples
>>> import tstrait
Constructing a normal distribution trait model with mean \(0\) and variance \(1\).
>>> model = tstrait.trait_model(distribution="normal", mean=0, var=1)
>>> model.name
'normal'
Constructing a Student’s t-distribution trait model with mean \(0\), variance \(1\) and degrees of freedom \(1\).
>>> model = tstrait.trait_model(distribution="t", mean=0, var=1, df=1)
>>> model.name
't'
Constructing a fixed value trait model with value \(1\).
>>> model = tstrait.trait_model(distribution="fixed", value=1)
>>> model.name
'fixed'
Constructing an exponential distribution trait model with scale \(1\).
>>> model = tstrait.trait_model(distribution="exponential", scale=1)
>>> model.name
'exponential'
Constructing an exponential distribution trait model with scale \(1\) that also allows simulation of negative values.
>>> model = tstrait.trait_model(distribution="exponential", scale=1, random_sign=True)
Constructing a gamma distribution trait model with shape \(1\) and scale \(2\).
>>> model = tstrait.trait_model(distribution="gamma", shape=1, scale=2)
>>> model.name
'gamma'
Constructing a gamma distribution trait model with shape \(1\) and scale \(2\) that also allows simulation of negative values.
>>> model = tstrait.trait_model(distribution="gamma", shape=1, scale=2, random_sign=True)
>>> model.name
'gamma'
Constructing a multivariate normal distribution trait model with a mean vector \([0, 0]\) and an identity covariance matrix.
>>> import numpy as np
>>> model = tstrait.trait_model(distribution="multi_normal", mean=np.zeros(2), cov=np.eye(2))
>>> model.name
'multi_normal'
>>> model.num_trait
2
- class tstrait.TraitModel(name)
Superclass of the trait model.
See also
trait_model
Construct a trait model.
TraitModelNormal
Return a normal distribution trait model.
TraitModelT
Return a Student’s t-distribution trait model.
TraitModelFixed
Return a fixed value trait model.
TraitModelExponential
Return an exponential distribution trait model.
TraitModelGamma
Return a gamma distribution trait model.
TraitModelMultivariateNormal
Return a multivariate normal distribution trait model.
Notes
This is the base class for all trait models in tstrait. All trait models should set all parameters in their __init__ as arguments.
- Attributes:
- name : str
Name of the trait model.
- num_trait : int
Number of traits to be simulated.
- class tstrait.TraitModelNormal(mean, var)
Normal distribution trait model.
- Parameters:
- mean : float
Mean of the simulated effect size.
- var : float
Variance of the simulated effect size. Must be non-negative.
- Returns:
- TraitModel
Normal distribution trait model.
See also
trait_model
Construct a trait model.
numpy.random.Generator.normal
Details on the input parameters and distribution.
Notes
This is a trait model built on top of numpy.random.Generator.normal(), so please see its documentation for the details of the normal distribution simulation.
Examples
Please see the docstring example of trait_model() for constructing a normal distribution trait model.
- class tstrait.TraitModelT(mean, var, df)
Student’s t distribution trait model.
- Parameters:
- mean : float
Mean of the simulated effect size.
- var : float
Variance of the simulated effect size. Must be > 0.
- df : float
Degrees of freedom. Must be > 0.
- Returns:
- TraitModel
Student’s t distribution trait model.
See also
trait_model
Construct a trait model.
numpy.random.Generator.standard_t
Details on the input parameters and distribution.
Notes
This is a trait model built on top of numpy.random.Generator.standard_t(), so please see its documentation for the details of the Student’s t distribution simulation.
Examples
Please see the docstring example of trait_model() for constructing a Student’s t distribution trait model.
- class tstrait.TraitModelFixed(value, random_sign=False)
Fixed value trait model.
- Parameters:
- value : float
Value of the simulated effect size.
- random_sign : bool, default False
If True, the simulated effect sizes will be multiplied by \(1\) or \(-1\) chosen at random, so that constant value effect sizes can be simulated with randomly chosen signs.
- Returns:
- TraitModel
Fixed value trait model.
See also
trait_model
Construct a trait model.
Notes
This is a trait model that gives the fixed value specified in value if random_sign is False. If random_sign is True, the sign of each effect size is chosen at random.
Examples
Please see the docstring example of trait_model() for constructing a fixed value trait model.
- class tstrait.TraitModelExponential(scale, random_sign=False)
Exponential distribution trait model.
- Parameters:
- scale : float
Scale of the exponential distribution. Must be non-negative.
- random_sign : bool, default False
If True, the simulated effect sizes will be multiplied by \(1\) or \(-1\) chosen at random, so that effect sizes can be simulated with randomly chosen signs. If False, only positive values are simulated, as is the property of the exponential distribution.
- Returns:
- TraitModel
Exponential distribution trait model.
See also
trait_model
Construct a trait model.
numpy.random.Generator.exponential
Details on the input parameters and distribution.
Notes
This is a trait model built on top of numpy.random.Generator.exponential(), so please see its documentation for the details of the exponential distribution simulation.
Examples
Please see the docstring example of trait_model() for constructing an exponential distribution trait model.
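The effect of random_sign can be pictured with plain NumPy. The sketch below draws exponential effect sizes and multiplies each one by a randomly chosen \(1\) or \(-1\), as the parameter description states. The variable names and the use of numpy.random.Generator.choice for the signs are assumptions made for illustration, not a description of tstrait's internals.

```python
import numpy as np

rng = np.random.default_rng(42)
num_causal = 5

# Exponential draws are never negative on their own.
effect_size = rng.exponential(scale=1.0, size=num_causal)

# Multiply each draw by 1 or -1, chosen independently at random.
signs = rng.choice([-1, 1], size=num_causal)
signed_effect_size = effect_size * signs
```

The magnitudes are unchanged by this step; only the signs differ from the original draws.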
- class tstrait.TraitModelGamma(shape, scale, random_sign=False)
Gamma distribution trait model.
- Parameters:
- shape : float
Shape of the gamma distribution. Must be non-negative.
- scale : float
Scale of the gamma distribution. Must be non-negative.
- random_sign : bool, default False
If True, the simulated effect sizes will be multiplied by \(1\) or \(-1\) chosen at random, so that effect sizes can be simulated with randomly chosen signs. If False, only positive values are simulated, as is the property of the gamma distribution.
- Returns:
- TraitModel
Gamma distribution trait model.
See also
trait_model
Construct a trait model.
numpy.random.Generator.gamma
Details on the input parameters and distribution.
Notes
This is a trait model built on top of numpy.random.Generator.gamma(), so please see its documentation for the details of the gamma distribution simulation.
Examples
Please see the docstring example of trait_model() for constructing a gamma distribution trait model.
- class tstrait.TraitModelMultivariateNormal(mean, cov)
Multivariate normal distribution trait model.
- Parameters:
- mean : 1-D array_like, of length N
Mean vector.
- cov : 2-D array_like, of shape (N, N)
Covariance matrix. Must be symmetric and positive-semidefinite.
- Returns:
- TraitModel
Multivariate normal distribution trait model.
See also
trait_model
Construct a trait model.
numpy.random.Generator.multivariate_normal
Details on the input parameters and distribution.
Notes
Multivariate normal distribution simulation is used in multi-trait simulation, which is described in Multi-trait simulation.
This is a trait model built on top of numpy.random.Generator.multivariate_normal(), so please see its documentation for the details of the multivariate normal distribution simulation.
The number of dimensions of mean vector and covariance matrix should match, and the length of the mean vector specifies the number of traits that will be simulated by using this model.
Examples
Please see the docstring example of trait_model() for constructing a multivariate normal distribution trait model.
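The relationship between the mean vector, the covariance matrix and the number of traits can be seen directly with numpy.random.Generator.multivariate_normal(), which this model is built on. The sketch below is only a NumPy illustration; the variable names and the chosen covariance values are assumptions, and each row is read here as the effect sizes of one causal site across traits.

```python
import numpy as np

rng = np.random.default_rng(3)
num_causal = 4

mean = np.zeros(2)  # length N = 2, so two traits
cov = np.array([[1.0, 0.5], [0.5, 1.0]])  # symmetric, positive-semidefinite, shape (N, N)

# One row per causal site, one column per trait.
effect_size = rng.multivariate_normal(mean=mean, cov=cov, size=num_causal)
num_trait = len(mean)
```

The resulting array has shape (num_causal, num_trait), so the length of mean fixes how many traits are simulated.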
Postprocessing functions
- tstrait.normalise_phenotypes(phenotype_df, mean=0, var=1, ddof=1)
Normalise phenotype dataframe.
- Parameters:
- phenotype_df : pandas.DataFrame
Phenotype dataframe.
- mean : float, default 0
Mean of the resulting phenotype.
- var : float, default 1
Variance of the resulting phenotype.
- ddof : int, default 1
Delta degrees of freedom. The divisor used in computing the variance is N - ddof, where N represents the number of elements.
- Returns:
- pandas.DataFrame
Dataframe with normalised phenotype.
- Raises:
- ValueError
If var <= 0.
Notes
The following columns must be included in phenotype_df:
trait_id: Trait ID.
individual_id: Individual ID.
phenotype: Simulated phenotypes.
The dataframe output has the following columns:
trait_id: Trait ID inside the phenotype_df input.
individual_id: Individual ID inside the phenotype_df input.
phenotype: Normalised phenotype.
Examples
See Normalise Phenotype section for worked examples.
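The roles of mean, var and ddof can be sketched for a single trait with NumPy: standardise the values using a standard deviation whose divisor is N - ddof, then rescale to the requested variance and shift to the requested mean. The function normalise below is a hypothetical helper written for this page and is not tstrait's implementation.

```python
import numpy as np


def normalise(values, mean=0.0, var=1.0, ddof=1):
    """Rescale one trait's values to the given mean and variance (hypothetical helper)."""
    if var <= 0:
        raise ValueError("var must be greater than 0")
    values = np.asarray(values, dtype=float)
    # The divisor used for the variance is N - ddof.
    standardised = (values - values.mean()) / values.std(ddof=ddof)
    return standardised * np.sqrt(var) + mean


phenotype = [1.0, 2.0, 4.0, 7.0]
normalised = normalise(phenotype, mean=10, var=4, ddof=1)
```

After this step the sample mean of normalised is 10 and its variance, computed with the same ddof, is 4.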
- tstrait.normalise_genetic_value(genetic_df, mean=0, var=1, ddof=1)
Normalise genetic value dataframe.
- Parameters:
- genetic_df : pandas.DataFrame
Genetic value dataframe.
- mean : float, default 0
Mean of the resulting genetic value.
- var : float, default 1
Variance of the resulting genetic value.
- ddof : int, default 1
Delta degrees of freedom. The divisor used in computing the variance is N - ddof, where N represents the number of elements.
- Returns:
- pandas.DataFrame
Dataframe with normalised genetic value.
- Raises:
- ValueError
If var <= 0.
Notes
The following columns must be included in genetic_df:
trait_id: Trait ID.
individual_id: Individual ID inside the tree sequence input.
genetic_value: Simulated genetic values.
The dataframe output has the following columns:
trait_id: Trait ID.
individual_id: Individual ID inside the tree sequence input.
genetic_value: Normalised genetic values.
Examples
See Normalise Genetic Value section for worked examples.
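When a genetic value dataframe holds several traits, one natural reading is that each trait is normalised separately. The pandas sketch below works under that assumption, which this page does not state explicitly, and uses the default targets of mean 0 and variance 1 with ddof=1. The example data are made up for illustration.

```python
import pandas as pd

genetic_df = pd.DataFrame(
    {
        "trait_id": [0, 0, 0, 1, 1, 1],
        "individual_id": [0, 1, 2, 0, 1, 2],
        "genetic_value": [1.0, 2.0, 3.0, 10.0, 20.0, 60.0],
    }
)

# Per-trait mean and standard deviation, aligned with the original rows.
# pandas uses ddof=1 for "std" by default.
grouped = genetic_df.groupby("trait_id")["genetic_value"]
normalised_df = genetic_df.copy()
normalised_df["genetic_value"] = (
    genetic_df["genetic_value"] - grouped.transform("mean")
) / grouped.transform("std")
```

The trait_id and individual_id columns are carried over unchanged, matching the output columns listed above.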
Result data classes
- class tstrait.PhenotypeResult(trait: DataFrame, phenotype: DataFrame)
Dataclass that contains effect size dataframe and phenotype dataframe.
See also
sim_phenotype
Use this dataclass as a simulation output.
Examples
See Trait Dataframe for details on extracting the trait dataframe, and Phenotype Output for details on extracting the phenotype dataframe.
- Attributes:
- trait : pandas.DataFrame
Trait dataframe that includes simulated effect sizes.
- phenotype : pandas.DataFrame
Phenotype dataframe that includes simulated phenotype.