Chapter 2 Introduction

Topic models are well understood, and the age-dependent process used by McVean et al. adds a further layer of structure. To recap the basic model:

  1. In a topic model, we draw a distribution across topics for an individual:

\(\theta_{i} \sim \mathrm{Dirichlet}_K(\alpha)\).

Each individual i has \(N_{i}\) diagnoses, which come from a vocabulary of V possible codes (drawn without replacement in this setting).

  2. For each diagnosis \(n = 1, \dots, N_{i}\) of individual i:
  • Choose a latent topic indicator \(z_{i,n}\) from the individual’s topic proportions: \(z_{i,n} \mid \theta_{i} \sim \mathrm{Multi}(\theta_{i})\)

  • Draw a diagnosis from the corresponding topic: \(w_{i,n} \mid z_{i,n}, \beta_{1:K} \sim \mathrm{Multi}(\beta_{z_{i,n}})\)
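As a concrete reference point, the generative process above can be simulated in a few lines. This is a minimal NumPy sketch under our own assumptions: `simulate_individual` is a hypothetical helper, `beta` is a K × V matrix whose rows are topic distributions over codes, and the "without replacement" constraint is implemented by zeroing out codes the individual already has and renormalising the chosen topic's distribution (falling back to a uniform draw over unused codes if that topic has no remaining mass). Age is ignored entirely.

```python
import numpy as np

def simulate_individual(alpha, beta, n_diagnoses, rng):
    """Simulate one individual's diagnoses under the basic topic model (hypothetical helper).

    alpha:       (K,) Dirichlet concentration
    beta:        (K, V) topic-by-code probabilities, each row sums to 1
    n_diagnoses: N_i, at most V because codes are drawn without replacement
    """
    K, V = beta.shape
    if n_diagnoses > V:
        raise ValueError("cannot draw more distinct codes than V")
    theta = rng.dirichlet(alpha)              # theta_i ~ Dirichlet_K(alpha)
    used = np.zeros(V, dtype=bool)            # codes already assigned to this individual
    z = np.empty(n_diagnoses, dtype=int)
    w = np.empty(n_diagnoses, dtype=int)
    for n in range(n_diagnoses):
        z[n] = rng.choice(K, p=theta)         # z_{i,n} | theta_i ~ Multi(theta_i)
        p = np.where(used, 0.0, beta[z[n]])   # remove codes already drawn
        if p.sum() == 0:                      # topic has no mass left (assumed fallback)
            p = np.where(used, 0.0, 1.0)      # uniform over unused codes
        p = p / p.sum()
        w[n] = rng.choice(V, p=p)             # w_{i,n} | z_{i,n} ~ Multi(beta_{z_{i,n}})
        used[w[n]] = True
    return theta, z, w

rng = np.random.default_rng(0)
K, V = 3, 10
beta = rng.dirichlet(np.ones(V), size=K)      # toy topics, shape (K, V)
theta, z, w = simulate_individual(np.full(K, 0.5), beta, n_diagnoses=5, rng=rng)
```

Renormalising over unused codes is only one way to read "without replacement"; an alternative is to reject and redraw a repeated code, which targets the same conditional distribution for a fixed topic.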

Our goals are threefold:

  1. Model changing disease proportions over time using a principled autoregressive approach (as opposed to a spline) that fits within the framework of a dynamic topic model.

  2. Allow an individual’s topic distribution to change conditional on new information (new diagnoses).

  3. Incorporate the ‘stickiness’ of genetics as a force pulling towards the prior distribution in objective (2), and use it to increase the speed of within-topic evolution in (2).
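The pull towards the prior in objectives (2) and (3) can be illustrated with a toy online update. This is only one possible formulation, not the model developed in this chapter: it tracks the Dirichlet parameters `gamma` of an individual's topic distribution, adds a soft count for each new diagnosis using responsibilities computed from the current posterior mean (a crude plug-in approximation to the posterior over \(z_{i,n}\)), and then shrinks the data-driven part of `gamma` back towards the prior `alpha` by a hypothetical `stickiness` weight in [0, 1]. The function name and the treatment of `alpha` as a genetically informed prior are assumptions for illustration; the within-topic speed-up in (3) is not modelled.

```python
import numpy as np

def update_topic_counts(gamma, alpha, beta, code, stickiness):
    """Hypothetical online update of an individual's Dirichlet parameters.

    gamma:      (K,) current Dirichlet parameters for theta_i (start at alpha)
    alpha:      (K,) prior parameters (assumed here to be genetically informed)
    beta:       (K, V) topic-by-code probabilities; beta[:, code] must not be all zero
    code:       index of the newly observed diagnosis
    stickiness: weight in [0, 1] pulling gamma back towards alpha
    """
    theta_mean = gamma / gamma.sum()          # E[theta_i] under the current Dirichlet
    resp = theta_mean * beta[:, code]         # p(z = k | w = code), up to a constant
    resp = resp / resp.sum()
    gamma_new = gamma + resp                  # add one soft count for the new diagnosis
    # Shrink the data-driven part of gamma towards the prior ("stickiness")
    return alpha + (1.0 - stickiness) * (gamma_new - alpha)

alpha = np.array([1.0, 1.0])
beta = np.array([[0.9, 0.1],
                 [0.1, 0.9]])
gamma = alpha.copy()
for code in [0, 0, 0]:
    gamma = update_topic_counts(gamma, alpha, beta, code, stickiness=0.2)
print(gamma / gamma.sum())                    # posterior mean now favours topic 0
```

With `stickiness = 0` this reduces to accumulating expected topic counts; with `stickiness = 1` every update is discarded and the individual stays at the prior. In between, each update shrinks all accumulated evidence, so older diagnoses decay geometrically towards the prior.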

We will use dynamic topic models and hierarchical Dirichlet processes to accomplish these objectives.
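For objective (1), a standard dynamic topic model chains each topic's natural parameters (logits over the V codes) through time with a Gaussian random walk and maps them to probabilities with a softmax. The sketch below generalises this to an AR(1) process over hypothetical age bins, reverting towards a baseline `mu` with coefficient `phi` (`phi = 1` recovers the random walk) and innovation scale `sigma`. The function name, the choice of baseline, and the parameter values are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def simulate_topic_trajectories(K, V, T, phi, sigma, rng):
    """Simulate K topics over T age bins with AR(1) dynamics on their logits.

    eta[t, k] = mu[k] + phi * (eta[t-1, k] - mu[k]) + sigma * eps,  eps ~ N(0, I_V)
    beta[t, k] = softmax(eta[t, k])
    With phi = 1 this is the Gaussian random walk of a standard dynamic topic model.
    """
    mu = rng.normal(0.0, 1.0, size=(K, V))    # baseline logits per topic (assumption)
    eta = np.empty((T, K, V))
    eta[0] = mu
    for t in range(1, T):
        eta[t] = mu + phi * (eta[t - 1] - mu) + sigma * rng.normal(size=(K, V))
    return softmax(eta, axis=-1)              # (T, K, V): code probabilities per topic and age bin

rng = np.random.default_rng(1)
beta = simulate_topic_trajectories(K=3, V=20, T=10, phi=0.9, sigma=0.1, rng=rng)
print(beta.shape)                             # (10, 3, 20)
```

With `|phi| < 1` each topic fluctuates around its baseline with stationary logit variance \(\sigma^2/(1-\phi^2)\) rather than drifting without bound, which is one practical difference between an autoregressive formulation and the plain random walk.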