Environmental noise#

This page describes how environmental noise is simulated in tstrait. Please refer to Phenotype Model for mathematical details on the phenotype model.

Learning Objectives

After this environmental noise page, you will be able to:

  • Understand how to simulate environmental noise in tstrait

  • Understand how to simulate environmental noise from a user-defined distribution

Environmental noise simulation#

Environmental noise can be simulated by using sim_env(). The required inputs are

genetic_df

Genetic value dataframe. See Dataframe requirements for the columns and data it must contain.

h2

Narrow-sense heritability.

Dataframe requirements#

The simplest way to simulate environmental noise is by using the genetic value dataframe output of genetic_value(). If you would like to define your own genetic value dataframe, there are some requirements that you must follow.

Columns#

The following columns must be included in genetic_df:

  • trait_id: Trait ID. This will be used in multi-trait simulation.

  • individual_id: Individual ID.

  • genetic_value: Simulated genetic value.

Data requirement#

Trait IDs in the trait_id column must start from 0 and be consecutive. If you are simulating a single trait, the trait_id column should be filled with zeros.
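The requirements above can be sketched with pandas and NumPy as follows. The helper check_genetic_df and the dataframe name custom_genetic_df are hypothetical names introduced only for this illustration; they are not part of tstrait, and the genetic values are random placeholders rather than values computed from a tree sequence.

```python
import numpy as np
import pandas as pd


def check_genetic_df(genetic_df):
    """Hypothetical helper: raise ValueError if the requirements are not met."""
    required = ["trait_id", "individual_id", "genetic_value"]
    missing = [col for col in required if col not in genetic_df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Sorted unique trait IDs must be exactly 0, 1, ..., k - 1.
    trait_ids = np.unique(genetic_df["trait_id"])
    if not np.array_equal(trait_ids, np.arange(len(trait_ids))):
        raise ValueError("trait_id must start from 0 and be consecutive")


num_ind = 5
rng = np.random.default_rng(seed=1)

# Single-trait dataframe: trait_id is filled with zeros.
custom_genetic_df = pd.DataFrame(
    {
        "trait_id": np.zeros(num_ind, dtype=int),
        "individual_id": np.arange(num_ind),
        "genetic_value": rng.normal(size=num_ind),  # placeholder values
    }
)

check_genetic_df(custom_genetic_df)  # passes silently
```

A dataframe that passes this check has the column layout described above and could then be used as the genetic_df input.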

Example#

The following example simulates environmental noise for a simulated tree sequence with 10,000 individuals. The narrow-sense heritability is set to 0.3.

See also

  • msprime for simulating whole genomes as tree sequence data.

  • for simulating trait dataframe in tstrait.

  • Genetic value for simulating the genetic value dataframe in tstrait.

import msprime
import tstrait

ts = msprime.sim_ancestry(
    samples=10_000,
    recombination_rate=1e-8,
    sequence_length=1_000_000,
    population_size=10_000,
    random_seed=5,
)
ts = msprime.sim_mutations(ts, rate=1e-8, random_seed=5)

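# Effect sizes are drawn from a normal distribution with mean 0 and variance 1.
model = tstrait.trait_model(distribution="normal", mean=0, var=1)
# Choose 1,000 causal sites and simulate their effect sizes.
trait_df = tstrait.sim_trait(ts, num_causal=1000, model=model, random_seed=5)
# Compute the genetic value of each individual from the causal sites.
genetic_df = tstrait.genetic_value(ts, trait_df)

# Add environmental noise with the narrow-sense heritability set to 0.3.
phenotype_df = tstrait.sim_env(genetic_df, h2=0.3, random_seed=5)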
phenotype_df.head()
trait_id individual_id genetic_value environmental_noise phenotype
0 0 0 15.628757 -12.099555 3.529202
1 0 1 9.790460 -19.981950 -10.191491
2 0 2 26.421206 -3.747284 22.673922
3 0 3 11.717086 6.343685 18.060771
4 0 4 2.442051 17.140689 19.582740

The resulting dataframe has the following columns:

  • trait_id: Trait ID.

  • individual_id: Individual ID inside the tree sequence input.

  • genetic_value: Simulated genetic values.

  • environmental_noise: Simulated environmental noise.

  • phenotype: Simulated phenotype.

The distribution of simulated environmental noise is shown below.

import matplotlib.pyplot as plt

plt.hist(phenotype_df["environmental_noise"], bins=40)
plt.title("Environmental Noise")
plt.show()
[Figure: histogram of the simulated environmental noise, titled "Environmental Noise"]

The simulated environmental noise follows a normal distribution, as expected.
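To see how a heritability value can translate into a noise variance, here is a minimal NumPy sketch. It assumes the common definition h2 = V_G / (V_G + V_E), which gives V_E = V_G (1 - h2) / h2; whether sim_env() uses exactly this formula is documented in Phenotype Model, so treat the relation here as an assumption. The genetic values are random stand-ins, not output from tstrait.

```python
import numpy as np

rng = np.random.default_rng(seed=5)
h2 = 0.3

# Stand-in genetic values (not computed from a tree sequence).
genetic_value = rng.normal(loc=0, scale=2, size=100_000)

# Assumption: h2 = V_G / (V_G + V_E), so V_E = V_G * (1 - h2) / h2.
var_g = np.var(genetic_value)
var_e = var_g * (1 - h2) / h2

# Normal noise with mean 0 and variance V_E.
environmental_noise = rng.normal(
    loc=0, scale=np.sqrt(var_e), size=len(genetic_value)
)
phenotype = genetic_value + environmental_noise

# Realized heritability in this sample; it should be close to h2.
realized_h2 = var_g / np.var(phenotype)
print(realized_h2)
```

Under this assumption, a lower h2 gives a larger noise variance relative to the genetic variance, which matches the example output above, where the noise values are of a similar or larger magnitude than the genetic values.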

User-defined environmental noise#

You can also define your own environmental noise instead of using the default simulation. The options are described below.

Simulating from the output of genetic_value()#

The output of genetic_value() contains only the genetic values; it does not include environmental noise. To simulate environmental noise from, for example, a normal distribution with mean 0 and variance 1, you can run the following code:

import numpy as np

genetic_df = tstrait.genetic_value(ts, trait_df)

rng = np.random.default_rng(seed=50)
env_noise = rng.normal(loc=0, scale=1, size=len(genetic_df))
genetic_df["environmental_noise"] = env_noise
genetic_df["phenotype"] = (
    genetic_df["environmental_noise"] + genetic_df["genetic_value"]
)
genetic_df.head()
trait_id individual_id genetic_value environmental_noise phenotype
0 0 0 15.628757 0.486381 16.115138
1 0 1 9.790460 -0.477392 9.313068
2 0 2 26.421206 0.669340 27.090546
3 0 3 11.717086 -1.387295 10.329791
4 0 4 2.442051 1.665888 4.107940

The random samples are drawn from a normal distribution by using numpy.random.Generator.normal().
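The same pattern works for distributions other than the normal. The sketch below draws heavy-tailed noise from a Student's t distribution with numpy.random.Generator.standard_t(). The choice of distribution and its degrees of freedom are arbitrary assumptions for illustration, and the genetic value dataframe is a random stand-in with the column layout described in Dataframe requirements, so the example runs without a tree sequence.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=50)
num_ind = 1_000

# Stand-in for the output of genetic_value(); values are placeholders.
genetic_df = pd.DataFrame(
    {
        "trait_id": np.zeros(num_ind, dtype=int),
        "individual_id": np.arange(num_ind),
        "genetic_value": rng.normal(size=num_ind),
    }
)

# Work on a copy so the genetic value dataframe stays unchanged.
phenotype_df = genetic_df.copy()

# Heavy-tailed noise: Student's t with 5 degrees of freedom (assumed choice).
phenotype_df["environmental_noise"] = rng.standard_t(df=5, size=num_ind)
phenotype_df["phenotype"] = (
    phenotype_df["genetic_value"] + phenotype_df["environmental_noise"]
)
print(phenotype_df.head())
```

Copying the dataframe first is a design choice that keeps the original genetic values available if you want to try several noise distributions.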

Setting h2 to be 1#

When h2 is set to 1 in sim_phenotype() or sim_env(), the environmental noise is a vector of zeros. After obtaining the output, you can replace it with your own environmental noise.

sim_result = tstrait.sim_phenotype(
    ts=ts, num_causal=100, model=model, h2=1, random_seed=1
)
sim_result.phenotype.head()
trait_id individual_id genetic_value environmental_noise phenotype
0 0 0 -4.772837 0.0 -4.772837
1 0 1 -4.604359 0.0 -4.604359
2 0 2 0.534293 0.0 0.534293
3 0 3 2.972380 0.0 2.972380
4 0 4 -1.384315 0.0 -1.384315

We see that all values in the environmental_noise column are zero.
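Once the noise column is all zeros, it can be overwritten with user-defined noise. The sketch below uses a stand-in dataframe with the same columns as the output above, rather than calling tstrait, and fills it with uniform noise scaled to a target heritability. The name target_h2, the uniform distribution, and the relation V_E = V_G (1 - h2) / h2 are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
num_ind = 50_000

# Stand-in for a phenotype dataframe simulated with h2=1:
# the noise is zero and the phenotype equals the genetic value.
phenotype_df = pd.DataFrame(
    {
        "trait_id": np.zeros(num_ind, dtype=int),
        "individual_id": np.arange(num_ind),
        "genetic_value": rng.normal(size=num_ind),
        "environmental_noise": np.zeros(num_ind),
    }
)
phenotype_df["phenotype"] = phenotype_df["genetic_value"]

# Assumed relation: V_E = V_G * (1 - h2) / h2.
target_h2 = 0.5
var_g = phenotype_df["genetic_value"].var()
var_e = var_g * (1 - target_h2) / target_h2

# Uniform(-a, a) has mean 0 and variance a**2 / 3, so a = sqrt(3 * V_E).
half_width = np.sqrt(3 * var_e)
phenotype_df["environmental_noise"] = rng.uniform(
    -half_width, half_width, size=num_ind
)

# Recompute the phenotype after replacing the noise.
phenotype_df["phenotype"] = (
    phenotype_df["genetic_value"] + phenotype_df["environmental_noise"]
)
```

Remember to recompute the phenotype column after changing the noise, since it is otherwise left equal to the genetic value.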