Chapter 2 Simulation Approach for Individual-Level Topic Weights

2.1 Model Description

2.1.1 Hierarchical Model Structure: Topic Weights (\(\eta_{ik}(t)\))

We model the individual-level topic weights using Gaussian Processes (GPs) with genetic effects. The prior for the topic weights \(\eta_{ik}(t)\) is given by:

\[ \eta_{ik}(t) \sim GP(\gamma_k \mathbf{g}_i, K_{\eta,k}(t, t')) \]

Here:

- \(\eta_{ik}(t)\) is the topic weight for individual \(i\) and topic \(k\) at time \(t\).
- \(\gamma_k \mathbf{g}_i\) is the genetic effect of individual \(i\)'s genotype \(\mathbf{g}_i\) on topic \(k\), which serves as the (constant-in-time) prior mean.
- \(K_{\eta,k}(t, t')\) is the covariance function for the GP, shared across all individuals for a given topic \(k\).

The observed noisy weights \(\hat{w}_{ik}(t)\) are modeled as:

\[ \hat{w}_{ik}(t) \mid \eta_{ik}(t) \sim N(\eta_{ik}(t), \sigma^2_{\text{noise}} \mathbf{I}) \]

The posterior distribution of the topic weights given the observed data is:

\[ \eta_{ik}(t) \mid \text{data} \sim GP(\mu_{ik,\text{posterior}}(t) + \gamma_k \mathbf{g}_i, K_{ik,\text{posterior}}(t, t')) \]

Here, \(\mu_{ik,\text{posterior}}(t)\) is the posterior mean and \(K_{ik,\text{posterior}}(t, t')\) is the posterior covariance function, which incorporate the observed data.
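The generative model above can be sketched numerically. The following is a minimal NumPy illustration, not the fitted model: the squared-exponential kernel, the genotype vector, the effect sizes, and all numeric values are placeholder assumptions chosen only to show the structure of a prior draw of \(\eta_{ik}(t)\) followed by the noisy observation step.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Illustrative squared-exponential choice for K_eta,k(t, t')."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
n_times, n_snps = 20, 5
t = np.linspace(0.0, 10.0, n_times)

# Hypothetical genetic data: genotypes g_i in {0, 1, 2} and effect sizes gamma_k.
g_i = rng.integers(0, 3, size=n_snps).astype(float)
gamma_k = rng.normal(0.0, 0.1, size=n_snps)

# Prior: eta_ik(t) ~ GP(gamma_k . g_i, K_eta,k); the genetic effect is a
# constant-in-time mean shift for this individual and topic.
prior_mean = np.full(n_times, gamma_k @ g_i)
K = rbf_kernel(t, t) + 1e-8 * np.eye(n_times)  # jitter for numerical stability
eta_ik = rng.multivariate_normal(prior_mean, K)

# Observation model: w_hat_ik(t) ~ N(eta_ik(t), sigma_noise^2) independently per time.
sigma_noise = 0.1
w_hat = eta_ik + rng.normal(0.0, sigma_noise, size=n_times)
```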

2.1.2 Posterior Inference Derivation

Given the observed weights \(y_{ik}(t)\) (the noisy \(\hat{w}_{ik}(t)\) above), the goal is to infer the posterior distribution of the true topic weights \(\eta_{ik}(t)\). The prior distribution for \(\eta_{ik}(t)\) is:

\[ \eta_{ik}(t) \sim GP(\gamma_k \mathbf{g}_i, K_{\eta,k}(t, t')) \]

The observed weights are modeled as:

\[ y_{ik}(t) \mid \eta_{ik}(t) \sim N(\eta_{ik}(t), \sigma^2_{\text{noise}} \mathbf{I}) \]

The posterior distribution of \(\eta_{ik}(t)\) given the observed data \(y\) is:

\[ \eta_{ik}(t) \mid y \sim GP(\gamma_k \mathbf{g}_i + \mu_{ik,\text{posterior}}(t), K_{ik,\text{posterior}}(t, t')) \]

where:

\[ \mu_{ik,\text{posterior}}(t) = K_{\eta,k}(t, \mathbf{t})\,[K_{\eta,k}(\mathbf{t}, \mathbf{t}) + \sigma^2_{\text{noise}} \mathbf{I}]^{-1} (y - \gamma_k \mathbf{g}_i) \]

\[ K_{ik,\text{posterior}}(t, t') = K_{\eta,k}(t, t') - K_{\eta,k}(t, \mathbf{t})\,[K_{\eta,k}(\mathbf{t}, \mathbf{t}) + \sigma^2_{\text{noise}} \mathbf{I}]^{-1} K_{\eta,k}(\mathbf{t}, t') \]

Here \(\mathbf{t}\) denotes the vector of observed time points, so \(K_{\eta,k}(\mathbf{t}, \mathbf{t})\) is the covariance matrix evaluated at the observations and \(K_{\eta,k}(t, \mathbf{t})\) is the cross-covariance between a query time \(t\) and the observed times.

The posterior mean \(\mu_{ik,\text{posterior}}(t)\) incorporates the genetic effects \(\gamma_k \mathbf{g}_i\) and the observed data. The posterior covariance \(K_{ik,\text{posterior}}\) describes the uncertainty in the topic weights after accounting for the observed data.
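These posterior formulas are the standard GP regression update with a nonzero prior mean, and can be implemented directly. The sketch below assumes a squared-exponential kernel and a synthetic trajectory purely for illustration; `gamma_g` stands in for the scalar product \(\gamma_k \mathbf{g}_i\). Note that the full posterior mean adds the genetic prior mean back onto the data-driven correction term.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Illustrative squared-exponential choice for K_eta,k(t, t')."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(t_obs, y, t_new, gamma_g, sigma_noise, kernel=rbf_kernel):
    """Posterior mean and covariance of eta_ik at t_new given noisy y at t_obs.

    gamma_g is the scalar prior mean gamma_k . g_i for this individual/topic.
    """
    K_oo = kernel(t_obs, t_obs)
    K_no = kernel(t_new, t_obs)
    K_nn = kernel(t_new, t_new)
    A = K_oo + sigma_noise**2 * np.eye(len(t_obs))
    # Full posterior mean: gamma_k g_i + K(t, t_obs) [K + sigma^2 I]^{-1} (y - gamma_k g_i)
    mu_post = gamma_g + K_no @ np.linalg.solve(A, y - gamma_g)
    # Posterior covariance: K(t, t') - K(t, t_obs) [K + sigma^2 I]^{-1} K(t_obs, t')
    K_post = K_nn - K_no @ np.linalg.solve(A, K_no.T)
    return mu_post, K_post

rng = np.random.default_rng(0)
t_obs = np.linspace(0.0, 10.0, 15)
gamma_g = 0.3  # hypothetical gamma_k . g_i
y = gamma_g + np.sin(t_obs) + rng.normal(0.0, 0.1, t_obs.size)
mu_post, K_post = gp_posterior(t_obs, y, t_obs, gamma_g, sigma_noise=0.1)
```

As expected, the posterior variance at the observed times is shrunk below the prior variance, since the data constrain the process there.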

2.2 Practical Steps for Fitting and Simulation

2.2.1 Simulate Individual-Level Weights

Fit a GP to the observed data to estimate the topic-level posterior mean function \(\mu_{k,\text{posterior}}(t)\) and the posterior covariance function \(K_{k,\text{posterior}}(t, t')\).

From the posterior GP, we can draw samples to generate new realizations of the process, incorporating both the learned mean and the covariance structure. To simplify simulation, we add the topic-level posterior mean to each individual's genetic effect \(\gamma_k \mathbf{g}_i\) and draw from a GP with this shifted mean and the topic-level posterior covariance:

\[ \eta_{ik}(t) \mid \text{data} \sim GP(\mu_{k,\text{posterior}}(t) + \gamma_k \mathbf{g}_i, K_{k,\text{posterior}}(t, t')) \]
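This simulation step can be sketched as follows. The topic-level posterior mean and covariance here are synthetic stand-ins (in practice they come from the GP fit above), and the genotype matrix and effect sizes are placeholder assumptions; the point is that a single Cholesky factor of the shared covariance lets us draw all individuals at once, each shifted by their own genetic effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n_times, n_ind, n_snps = 20, 100, 5
t = np.linspace(0.0, 10.0, n_times)

# Stand-ins for the fitted topic-level posterior mean and covariance for topic k.
mu_k_post = 0.5 * np.sin(t)
d = t[:, None] - t[None, :]
K_k_post = 0.05 * np.exp(-0.5 * d**2)
K_k_post += 1e-6 * np.eye(n_times)  # jitter for a stable Cholesky factorization

# Hypothetical genotypes G (n_ind x n_snps) and effect sizes gamma_k.
G = rng.integers(0, 3, size=(n_ind, n_snps)).astype(float)
gamma_k = rng.normal(0.0, 0.1, size=n_snps)

# Each individual's mean is the topic posterior mean shifted by gamma_k . g_i;
# correlated draws come from multiplying white noise by the Cholesky factor.
L = np.linalg.cholesky(K_k_post)
eta = mu_k_post + (G @ gamma_k)[:, None] + rng.standard_normal((n_ind, n_times)) @ L.T
```

Drawing all individuals from one factorization is cheap because \(K_{k,\text{posterior}}\) is shared across individuals within a topic; only the mean shift is individual-specific.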