Environmental noise#

This page describes how environmental noise is simulated in tstrait. Please refer to Phenotype Model for mathematical details on the phenotype model.

Learning Objectives

After reading this page, you will be able to:

Understand how to simulate environmental noise in tstrait
Understand how to use a user-defined distribution to simulate environmental noise

Environmental noise simulation#

Environmental noise can be simulated by using sim_env(). The required inputs are

genetic_df: Genetic value dataframe. See Dataframe requirements for the columns and data it must contain.
h2: Narrow-sense heritability.
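
To give a feel for what h2 controls, here is a minimal sketch that does not call tstrait. It assumes the noise is drawn from a normal distribution with mean zero and variance V_G * (1 - h2) / h2, where V_G is the variance of the genetic values; Phenotype Model remains the authoritative reference for the exact definition. The helper noise_variance() and the genetic values are hypothetical and only serve the illustration.

```python
import random
import statistics


def noise_variance(genetic_values, h2):
    """Noise variance implied by h2, assuming V_E = V_G * (1 - h2) / h2."""
    if not 0 < h2 <= 1:
        raise ValueError("h2 must satisfy 0 < h2 <= 1")
    var_g = statistics.pvariance(genetic_values)
    return var_g * (1 - h2) / h2


rng = random.Random(5)
# Made-up genetic values standing in for the genetic_value column.
genetic_values = [rng.gauss(0, 10) for _ in range(10_000)]

h2 = 0.3
var_e = noise_variance(genetic_values, h2)
noise = [rng.gauss(0, var_e ** 0.5) for _ in genetic_values]
phenotype = [g + e for g, e in zip(genetic_values, noise)]

# Realized heritability: share of phenotypic variance due to genetic values.
realized_h2 = statistics.pvariance(genetic_values) / statistics.pvariance(phenotype)
print(f"noise variance: {var_e:.2f}, realized h2: {realized_h2:.3f}")
```

Under this assumption, a lower h2 gives a larger noise variance relative to the genetic variance, and h2 = 1 gives a noise variance of zero.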

Dataframe requirements#

The simplest way to simulate environmental noise is to use the genetic value dataframe returned by genetic_value(). If you would like to define your own genetic value dataframe, it must meet the requirements below.

Columns#

The following columns must be included in genetic_df:

trait_id: Trait ID. This will be used in multi-trait simulation.

individual_id: Individual ID.

genetic_value: Simulated genetic value.

Data requirement#

Trait IDs in the trait_id column must start from 0 and be consecutive. If you are simulating a single trait, fill the trait_id column with zeros.
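
The requirements above can be sketched as a small check on a hand-built dataframe. This is an illustration, not part of tstrait: check_genetic_df() is a hypothetical helper, the genetic values are made up, and "consecutive" is interpreted here as the distinct trait IDs being exactly 0, 1, ..., k - 1.

```python
import numpy as np
import pandas as pd


def check_genetic_df(df):
    """Check the column and trait ID requirements described above."""
    required = {"trait_id", "individual_id", "genetic_value"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Distinct trait IDs, sorted, must equal 0, 1, ..., k - 1.
    trait_ids = np.sort(df["trait_id"].unique())
    if not np.array_equal(trait_ids, np.arange(len(trait_ids))):
        raise ValueError("trait_id must start from 0 and be consecutive")


# Single-trait dataframe: the trait_id column is filled with zeros.
genetic_df = pd.DataFrame(
    {
        "trait_id": np.zeros(4, dtype=int),
        "individual_id": np.arange(4),
        "genetic_value": [0.5, -1.2, 2.0, 0.1],  # made-up values
    }
)
check_genetic_df(genetic_df)  # raises ValueError if a requirement is violated
```

A dataframe that passes this check can then be given to sim_env() as genetic_df.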

Example#

We will show an example of simulating environmental noise by using simulated tree sequence data with 10,000 individuals. The narrow-sense heritability is set to 0.3.

See also

msprime for simulating whole genomes as tree sequence data.
sim_trait_doc for simulating the trait dataframe in tstrait.
Genetic value for simulating the genetic value dataframe in tstrait.

import msprime
import tstrait

ts = msprime.sim_ancestry(
    samples=10_000,
    recombination_rate=1e-8,
    sequence_length=1_000_000,
    population_size=10_000,
    random_seed=5,
)
ts = msprime.sim_mutations(ts, rate=1e-8, random_seed=5)

model = tstrait.trait_model(distribution="normal", mean=0, var=1)
trait_df = tstrait.sim_trait(ts, num_causal=1000, model=model, random_seed=5)
genetic_df = tstrait.genetic_value(ts, trait_df)

phenotype_df = tstrait.sim_env(genetic_df, h2=0.3, random_seed=5)
phenotype_df.head()

	individual_id	genetic_value	environmental_noise	phenotype
0	0	15.628757	-12.099555	3.529202
1	1	9.790460	-19.981950	-10.191491
2	2	26.421206	-3.747284	22.673922
3	3	11.717086	6.343685	18.060771
4	4	2.442051	17.140689	19.582740

The resulting dataframe has the following columns:

trait_id: Trait ID.

individual_id: Individual ID inside the tree sequence input.

genetic_value: Simulated genetic values.

environmental_noise: Simulated environmental noise.

phenotype: Simulated phenotype.

The distribution of simulated environmental noise is shown below.

import matplotlib.pyplot as plt

plt.hist(phenotype_df["environmental_noise"], bins=40)
plt.title("Environmental Noise")
plt.show()

[Figure: histogram of the simulated environmental noise, titled "Environmental Noise"]

The simulated environmental noise follows a normal distribution, as expected.

User-defined environmental noise#

Users can also define their own environmental noise. Several options are available.

Simulating from the output of `genetic_value()`#

The output of genetic_value() contains only the genetic values and does not include environmental noise, so the user can add noise to it directly. For example, to simulate environmental noise from a normal distribution with mean 0 and variance 1, run the following code:

import numpy as np

genetic_df = tstrait.genetic_value(ts, trait_df)

rng = np.random.default_rng(seed=50)
env_noise = rng.normal(loc=0, scale=1, size=len(genetic_df))
genetic_df["environmental_noise"] = env_noise
genetic_df["phenotype"] = (
    genetic_df["environmental_noise"] + genetic_df["genetic_value"]
)
genetic_df.head()

	individual_id	genetic_value	environmental_noise	phenotype
0	0	15.628757	0.486381	16.115138
1	1	9.790460	-0.477392	9.313068
2	2	26.421206	0.669340	27.090546
3	3	11.717086	-1.387295	10.329791
4	4	2.442051	1.665888	4.107940

Here, random samples are drawn from a normal distribution by using numpy.random.Generator.normal().
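
The same approach extends to distributions other than the normal. The sketch below is illustrative only: it uses a made-up stand-in for the genetic value dataframe instead of calling tstrait, draws heavy-tailed noise with numpy.random.Generator.standard_t(), and rescales it so that its variance matches V_G * (1 - h2) / h2. That variance target is an assumption about how h2 relates to the noise; see Phenotype Model for the exact definition.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=50)
num_ind = 10_000

# Stand-in for the output of genetic_value(); the values are made up.
genetic_df = pd.DataFrame(
    {
        "trait_id": np.zeros(num_ind, dtype=int),
        "individual_id": np.arange(num_ind),
        "genetic_value": rng.normal(loc=0, scale=5, size=num_ind),
    }
)

h2 = 0.3
# Target noise variance, assuming V_E = V_G * (1 - h2) / h2.
target_var = genetic_df["genetic_value"].var() * (1 - h2) / h2

# Heavy-tailed noise from a Student's t distribution (5 degrees of freedom),
# standardized and rescaled so its sample variance equals the target.
raw = rng.standard_t(df=5, size=num_ind)
env_noise = (raw - raw.mean()) / raw.std(ddof=1) * np.sqrt(target_var)

genetic_df["environmental_noise"] = env_noise
genetic_df["phenotype"] = (
    genetic_df["environmental_noise"] + genetic_df["genetic_value"]
)

# Share of phenotypic variance explained by the genetic values.
realized_h2 = genetic_df["genetic_value"].var() / genetic_df["phenotype"].var()
print(f"realized h2: {realized_h2:.3f}")
```

Rescaling the noise keeps the chosen distribution's shape while fixing its variance, so the realized heritability stays close to the intended h2 up to sampling error.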

Setting `h2` to be 1#

When h2 is set to 1 in sim_phenotype() or sim_env(), the environmental noise is a vector of zeros, so the phenotype equals the genetic value. After obtaining the output, the user can add their own environmental noise to it.

sim_result = tstrait.sim_phenotype(
    ts=ts, num_causal=100, model=model, h2=1, random_seed=1
)
sim_result.phenotype.head()

	individual_id	genetic_value	phenotype
0	0	-4.772837	-4.772837
1	1	-4.604359	-4.604359
2	2	0.534293	0.534293
3	3	2.972380	2.972380
4	4	-1.384315	-1.384315

We see that the phenotype equals the genetic value for every individual, because the environmental noise is zero.