Multi-trait simulation

Multi-trait simulation#

This page describes how to simulate multiple correlated traits in tstrait.

Learning Objectives

After this effect size page, you will be able to:

Understand how multi-trait simulation is conducted in tstrait.
Understand the trait models and inputs of multi-trait simulation.

Pleiotropy#

tstrait supports simulation of multiple correlated traits, assuming that they are influenced by pleiotropic genes. Pleiotropy is used to describe the phenomenon that a single gene contributes to multiple traits (See here for details). The only trait model that is supported to simulate multiple traits in tstrait is multivariate normal distribution (TraitModelMultivariateNormal). We will be showing how to simulate traits by using a multivariate normal distribution trait model.

Multivariate normal distribution#

The multivariate normal distribution trait model can be specified by inputting the mean vector and covariance matrix in trait_model(). Note that the covariance matrix must be symmetric and positive-semidefinite, and the dimensions of the mean vector and covariance matrix must match.

In the following example, we will be generating a multivariate normal distribution trait model with mean vector being a vector of zeros and covariance matrix being an identity matrix.

import tstrait
import numpy as np

model = tstrait.trait_model(
    distribution="multi_normal", mean=np.zeros(2), cov=np.eye(2)
)
model.num_trait

Note that 2 traits will be simulated from this model, as the dimension of the mean vector is 2.

Multi-trait simulation#

We will now be simulating multiple traits by using a tree sequence data with 3 individuals and 2 causal sites.

import msprime

ts = msprime.sim_ancestry(
    samples=3,
    recombination_rate=1e-8,
    sequence_length=1_000_000,
    population_size=10_000,
    random_seed=5,
)
ts = msprime.sim_mutations(ts, rate=1e-8, random_seed=5)

sim_result = tstrait.sim_phenotype(
    ts=ts, num_causal=2, model=model, h2=[0.3, 0.3], random_seed=1
)

sim_result.phenotype

	trait_id	individual_id	genetic_value	environmental_noise	phenotype
0	0	0	0.905356	0.241850	1.147206
1	0	1	0.330437	0.574993	0.905430
2	0	2	0.000000	0.231250	0.231250
3	1	0	0.446375	-1.809578	-1.363203
4	1	1	-1.303157	1.257187	-0.045970
5	1	2	0.000000	0.619841	0.619841

sim_result.trait

	position	site_id	effect_size	causal_allele	allele_freq	trait_id
0	493419	474	0.330437	G	0.166667	0
1	493419	474	-1.303157	G	0.166667	1
2	523215	513	0.905356	G	0.166667	0
3	523215	513	0.446375	G	0.166667	1

Note

The dimension of narrow-sense heritability h2 must match the number of traits being simulated, and the multivariate normal distribution trait model can also be used as a model input of sim_trait() as well.

In the above example, phenotypic and effect size information of 2 traits are being simulated. The trait_id column represents the trait ID of the simulated traits.

As a next example, we will be illustrating correlated quantitative traits by simulating correlated traits with 1000 causal sites and 1000 individuals.

ts = msprime.sim_ancestry(
    samples=1000,
    recombination_rate=1e-8,
    sequence_length=1_000_000,
    population_size=10_000,
    random_seed=5,
)
ts = msprime.sim_mutations(ts, rate=1e-8, random_seed=5)

cov = np.array([[1, 0.9], [0.9, 1]])

model = tstrait.trait_model(distribution="multi_normal", mean=np.zeros(2), cov=cov)

sim_result = tstrait.sim_phenotype(
    ts=ts, num_causal=100, model=model, h2=[0.8, 0.8], random_seed=1
)

We will be showing the correlation by creating a scatterplot in matplotlib.

import matplotlib.pyplot as plt

phenotype_df = sim_result.phenotype
trait0 = phenotype_df.loc[phenotype_df["trait_id"] == 0].phenotype
trait1 = phenotype_df.loc[phenotype_df["trait_id"] == 1].phenotype
plt.scatter(trait0, trait1)
plt.xlabel("Trait 0")
plt.ylabel("Trait 1")
plt.title("Polygenic Traits")
plt.show()

_images/081fad4f1f76aa67aa3eeda0d9daf07a646a9082868f0d1fb49a9071341c460e.png

Multi-trait simulation

Contents

Multi-trait simulation#

Pleiotropy#

Multivariate normal distribution#

Multi-trait simulation#