Environmental noise#
This page describes how environmental noise is simulated in tstrait. Please refer to Phenotype Model for mathematical details on the phenotype model.
Learning Objectives
After this environmental noise page, you will be able to:
Understand how to simulate environmental noise in tstrait
Understand how to use a user-defined distribution to simulate environmental noise
Environmental noise simulation#
Environmental noise can be simulated by using sim_env(). The required inputs are:
- genetic_df
Genetic value dataframe. Please see Dataframe requirements for details.
- h2
Narrow-sense heritability.
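To make the role of h2 concrete, the sketch below assumes the standard definition h2 = V_G / (V_G + V_E), which implies an environmental variance of V_E = V_G (1 - h2) / h2. It uses only NumPy and synthetic genetic values as a stand-in for the genetic_value column. The helper noise_variance is hypothetical and not part of tstrait; the Phenotype Model page remains the reference for the exact formula used by sim_env().

```python
import numpy as np


def noise_variance(genetic_values, h2):
    """Environmental variance implied by h2 = V_G / (V_G + V_E).

    Hypothetical helper for illustration; not a tstrait function.
    """
    if not 0 < h2 <= 1:
        raise ValueError("h2 must be in the interval (0, 1]")
    v_g = np.var(genetic_values)
    return v_g * (1 - h2) / h2


rng = np.random.default_rng(1)
# Synthetic stand-in for genetic_df["genetic_value"]
genetic_values = rng.normal(loc=10, scale=5, size=100_000)

h2 = 0.3
v_e = noise_variance(genetic_values, h2)
noise = rng.normal(loc=0, scale=np.sqrt(v_e), size=genetic_values.size)
phenotype = genetic_values + noise

# Ratio of genetic variance to total variance in the simulated sample
realized_h2 = np.var(genetic_values) / (np.var(genetic_values) + np.var(noise))
print(round(realized_h2, 2))
```

Under this assumption, h2 = 1 gives a noise variance of zero, and smaller values of h2 give proportionally more noise relative to the genetic variance.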
Dataframe requirements#
The simplest way to simulate environmental noise is to use the genetic value dataframe
output of genetic_value(). If you would like to define your own genetic value
dataframe, it must satisfy the requirements below.
Columns#
The following columns must be included in genetic_df:
trait_id: Trait ID. This will be used in multi-trait simulation.
individual_id: Individual ID.
genetic_value: Simulated genetic value.
Data requirement#
Trait IDs in the trait_id column must start from 0 and be consecutive. If you are simulating a single trait, the trait_id column should be filled with zeros.
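The requirements above can be checked before a hand-built dataframe is passed on. The following is a minimal sketch using pandas and NumPy; check_genetic_df is a hypothetical helper written for illustration and is not part of tstrait, which may perform its own validation.

```python
import numpy as np
import pandas as pd

REQUIRED_COLUMNS = ("trait_id", "individual_id", "genetic_value")


def check_genetic_df(genetic_df):
    """Raise ValueError if the dataframe breaks the documented requirements.

    Hypothetical helper for illustration; not a tstrait function.
    """
    missing = [c for c in REQUIRED_COLUMNS if c not in genetic_df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Trait IDs must be exactly 0, 1, ..., k-1 for k distinct traits
    trait_ids = np.sort(genetic_df["trait_id"].unique())
    if not np.array_equal(trait_ids, np.arange(len(trait_ids))):
        raise ValueError("trait_id must start from 0 and be consecutive")


# Single-trait example: the trait_id column is filled with zeros
single_trait_df = pd.DataFrame(
    {
        "trait_id": np.zeros(4, dtype=int),
        "individual_id": np.arange(4),
        "genetic_value": [0.5, -1.2, 0.0, 2.3],
    }
)
check_genetic_df(single_trait_df)  # passes silently
```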
Example#
We will show an example of environmental noise simulation by using simulated tree sequence data with 10,000 individuals. The narrow-sense heritability is set to 0.3.
See also
msprime for simulating whole-genome tree sequence data.
for simulating trait dataframe in tstrait.
Genetic value for simulating the genetic value dataframe in tstrait.
import msprime
import tstrait
ts = msprime.sim_ancestry(
    samples=10_000,
    recombination_rate=1e-8,
    sequence_length=1_000_000,
    population_size=10_000,
    random_seed=5,
)
ts = msprime.sim_mutations(ts, rate=1e-8, random_seed=5)
model = tstrait.trait_model(distribution="normal", mean=0, var=1)
trait_df = tstrait.sim_trait(ts, num_causal=1000, model=model, random_seed=5)
genetic_df = tstrait.genetic_value(ts, trait_df)
phenotype_df = tstrait.sim_env(genetic_df, h2=0.3, random_seed=5)
phenotype_df.head()
| | trait_id | individual_id | genetic_value | environmental_noise | phenotype |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 15.628757 | -12.099555 | 3.529202 |
| 1 | 0 | 1 | 9.790460 | -19.981950 | -10.191491 |
| 2 | 0 | 2 | 26.421206 | -3.747284 | 22.673922 |
| 3 | 0 | 3 | 11.717086 | 6.343685 | 18.060771 |
| 4 | 0 | 4 | 2.442051 | 17.140689 | 19.582740 |
The resulting dataframe has the following columns:
trait_id: Trait ID.
individual_id: Individual ID inside the tree sequence input.
genetic_value: Simulated genetic values.
environmental_noise: Simulated environmental noise.
phenotype: Simulated phenotype.
The distribution of simulated environmental noise is shown below.
import matplotlib.pyplot as plt
plt.hist(phenotype_df["environmental_noise"], bins=40)
plt.title("Environmental Noise")
plt.show()
The simulated environmental noise follows a normal distribution, as expected.
User-defined environmental noise#
Users can also define their own environmental noise, and several options are available.
Simulating from the output of genetic_value()#
The output of genetic_value() only includes information on genetic values and
does not contain environmental noise. For example, if the user wants to simulate
environmental noise from a normal distribution with mean 0 and variance 1, the
following code can be used:
import numpy as np
genetic_df = tstrait.genetic_value(ts, trait_df)
rng = np.random.default_rng(seed=50)
env_noise = rng.normal(loc=0, scale=1, size=len(genetic_df))
genetic_df["environmental_noise"] = env_noise
genetic_df["phenotype"] = (
    genetic_df["environmental_noise"] + genetic_df["genetic_value"]
)
genetic_df.head()
| | trait_id | individual_id | genetic_value | environmental_noise | phenotype |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 15.628757 | 0.486381 | 16.115138 |
| 1 | 0 | 1 | 9.790460 | -0.477392 | 9.313068 |
| 2 | 0 | 2 | 26.421206 | 0.669340 | 27.090546 |
| 3 | 0 | 3 | 11.717086 | -1.387295 | 10.329791 |
| 4 | 0 | 4 | 2.442051 | 1.665888 | 4.107940 |
The random samples in this example are drawn from a normal distribution by using
numpy.random.Generator.normal().
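The same pattern extends to distributions other than the normal. As a sketch, the code below draws noise from a Student's t distribution with numpy.random.Generator.standard_t() and rescales it so that the sample variances match a target heritability. It assumes the standard definition h2 = V_G / (V_G + V_E); the synthetic genetic_values array stands in for genetic_df["genetic_value"], and target_h2 and the degrees of freedom are illustrative choices rather than tstrait defaults.

```python
import numpy as np

rng = np.random.default_rng(seed=50)
# Synthetic stand-in for genetic_df["genetic_value"]
genetic_values = rng.normal(loc=0, scale=3, size=50_000)

target_h2 = 0.5  # illustrative target heritability
# Environmental variance implied by h2 = V_G / (V_G + V_E)
v_e = np.var(genetic_values) * (1 - target_h2) / target_h2

# Heavy-tailed raw noise; df=5 is an arbitrary illustrative choice
raw_noise = rng.standard_t(df=5, size=genetic_values.size)
# Center the draws and rescale them so the sample variance equals v_e
env_noise = (raw_noise - raw_noise.mean()) / raw_noise.std() * np.sqrt(v_e)

phenotype = genetic_values + env_noise
# Ratio of genetic variance to the sum of genetic and noise variances
realized_h2 = np.var(genetic_values) / (np.var(genetic_values) + np.var(env_noise))
```

Rescaling the sample fixes the variance ratio exactly for this draw; the shape of the noise distribution, including its heavy tails, is left unchanged.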
Setting h2 to be 1#
When h2 is set to 1 in sim_phenotype() or sim_env(), the
environmental noise will be a vector of zeros. After obtaining the output, the user
can define their own environmental noise.
sim_result = tstrait.sim_phenotype(
    ts=ts, num_causal=100, model=model, h2=1, random_seed=1
)
sim_result.phenotype.head()
| | trait_id | individual_id | genetic_value | environmental_noise | phenotype |
|---|---|---|---|---|---|
| 0 | 0 | 0 | -4.772837 | 0.0 | -4.772837 |
| 1 | 0 | 1 | -4.604359 | 0.0 | -4.604359 |
| 2 | 0 | 2 | 0.534293 | 0.0 | 0.534293 |
| 3 | 0 | 3 | 2.972380 | 0.0 | 2.972380 |
| 4 | 0 | 4 | -1.384315 | 0.0 | -1.384315 |
We see that all values in the environmental_noise column are zero.
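Once the noise column is all zeros, it can be overwritten and the phenotype recomputed. The sketch below shows one way to do this with a different noise standard deviation per trait. The small dataframe is a hand-built stand-in for an h2=1 output, with made-up values, and the noise_sd mapping is a hypothetical choice for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)

# Hand-built stand-in for an h2=1 phenotype dataframe with two traits
phenotype_df = pd.DataFrame(
    {
        "trait_id": np.repeat([0, 1], 3),
        "individual_id": np.tile(np.arange(3), 2),
        "genetic_value": [-4.8, -4.6, 0.5, 3.0, -1.4, 2.2],
    }
)
phenotype_df["environmental_noise"] = 0.0
phenotype_df["phenotype"] = phenotype_df["genetic_value"]

# Hypothetical noise standard deviation for each trait ID
noise_sd = {0: 1.0, 1: 2.5}
sd_per_row = phenotype_df["trait_id"].map(noise_sd).to_numpy()

# One normal draw per row, scaled by that row's trait-specific SD
phenotype_df["environmental_noise"] = rng.normal(loc=0, scale=sd_per_row)
phenotype_df["phenotype"] = (
    phenotype_df["genetic_value"] + phenotype_df["environmental_noise"]
)
```

Mapping trait_id to a per-row scale keeps the draw vectorized, so the same code works for any number of traits as long as every trait ID has an entry in noise_sd.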