11 Day 11 (February 25)

11.1 Announcements

  • Last class was awesome!

  • Confidence intervals for rare events (link)

  • A guide for activity 3 is available

  • Selected questions/clarifications from journals

    • Good paper on understanding the Gibbs sampler (link)
    • Exploratory data analysis vs confirmatory data analysis
    • Overfitting vs. underfitting a model
    • Extended example of measurement error with age vs height
    • Warning about linear models
    • Very quick review of linear models here

11.2 Our second statistical model

  • Dig into the rabies test a bit more….
    • Rabies test results
    • Building a statistical model using a hierarchical Bayesian approach
    • Specify (write out) the data model
    • Specify the process model
    • Specify the parameter model (or prior) including hyper-parameters
    • Select an approach to obtain the posterior distribution
      • Gibbs sampler (see pg. 35 in BBM2L)
      • Derive full conditionals
    • Discussion of trade-offs between a Gibbs sampler with analytical full conditionals and Metropolis-Hastings
    • Live example
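
    The four steps above can be sketched with a minimal, hypothetical example (the numbers below are made up for illustration and are not the class rabies data): y positive tests out of N bats sampled, a binomial data model for the prevalence p, and a beta parameter model. Because the beta prior is conjugate here, the posterior is available in closed form and no Gibbs sampler is needed for this toy case.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Data model: y | p ~ Binomial(N, p)  (assumed numbers, for illustration only)
    N, y = 52, 3

    # Parameter model (prior): p ~ Beta(a, b), with hyper-parameters a and b
    a, b = 1.0, 1.0  # a = b = 1 gives a uniform prior on [0, 1]

    # Posterior (by conjugacy): p | y ~ Beta(a + y, b + N - y)
    draws = rng.beta(a + y, b + N - y, size=10_000)
    print("posterior mean of p:", draws.mean())
    print("95% credible interval:", np.quantile(draws, [0.025, 0.975]))
    ```

    When the full conditionals are not all conjugate like this, the same workflow leads to a Gibbs or Metropolis-Hastings sampler instead of a closed-form posterior.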

11.3 Summary and future direction

  • We have just finished our second statistical model
    • Time to pause, review, and plot the path forward
  • So far we have accomplished the following.
    • A review of mathematical statistics and distribution theory (Activity 1)
    • We have learned/rediscovered common methods for numerical integration (Activity 2)
    • We have learned common methods to sample from complex multivariate probability distributions (Activity 3/4).
    • We have learned how to creatively develop a Bayesian model that is appropriate for the question and data that are available (Bat model 1 and model 2).
    • We have learned how to use numerical algorithms to “fit” a Bayesian model to data (Bat model 1 and model 2).
    • We have learned how to make statistical inference from the posterior distribution of unknown random variables (i.e., model parameters).
  • What we don’t know
    • How to do prediction, backcasting, and forecasting
    • How to “test” or “check” our model
    • The wide variety of models that can be used

11.4 The Bayesian Linear Model

  • The classic paper is Lindley and Smith (1972)

  • Implemented with MCMC from Gelfand and Smith (1990)

  • Recent elaborations include the Bayesian Lasso (see Park and Casella 2008)

  • The model

    • Distribution of the data: \mathbf{y} given \boldsymbol{\beta} and \sigma_{\varepsilon}^{2}: [\mathbf{y}|\boldsymbol{\beta},\sigma_{\varepsilon}^{2}]\equiv\text{N}(\mathbf{X}\boldsymbol{\beta},\sigma_{\varepsilon}^{2}\mathbf{I}). This is also sometimes called the data model.
    • Distribution of the parameters: \boldsymbol{\beta} and \sigma_{\varepsilon}^{2}: [\boldsymbol{\beta}]\equiv\text{N}(\mathbf{0},\sigma_{\beta}^{2}\mathbf{I}) and [\sigma_{\varepsilon}^{2}]\equiv\text{inverse gamma}(q,r)
  • Computations and statistical inference

    • Using Bayes' rule (Bayes 1763), we can obtain the joint posterior distribution: [\boldsymbol{\beta},\sigma_{\varepsilon}^{2}|\mathbf{y}]=\frac{[\mathbf{y}|\boldsymbol{\beta},\sigma_{\varepsilon}^{2}][\boldsymbol{\beta}][\sigma_{\varepsilon}^{2}]}{\int\int [\mathbf{y}|\boldsymbol{\beta},\sigma_{\varepsilon}^{2}][\boldsymbol{\beta}][\sigma_{\varepsilon}^{2}]\,d\boldsymbol{\beta}\,d\sigma_{\varepsilon}^{2}}
      • Statistical inference about a parameter is obtained from its marginal posterior distribution: [\boldsymbol{\beta}|\mathbf{y}]=\int[\boldsymbol{\beta},\sigma_{\varepsilon}^{2}|\mathbf{y}]\,d\sigma_{\varepsilon}^{2} and [\sigma_{\varepsilon}^{2}|\mathbf{y}]=\int[\boldsymbol{\beta},\sigma_{\varepsilon}^{2}|\mathbf{y}]\,d\boldsymbol{\beta}
    • In practice it is difficult to find closed-form solutions for the marginal posterior distributions, but easy to find closed-form solutions for the conditional posterior distributions.
      • Full-conditional distributions for \boldsymbol{\beta} and \sigma_{\varepsilon}^{2}: [\boldsymbol{\beta}|\sigma_{\varepsilon}^{2},\mathbf{y}]\equiv\text{N}\bigg{(}\big{(}\frac{\mathbf{X}^{\prime}\mathbf{X}}{\sigma_{\varepsilon}^{2}}+\frac{1}{\sigma_{\beta}^{2}}\mathbf{I}\big{)}^{-1}\frac{\mathbf{X}^{\prime}\mathbf{y}}{\sigma_{\varepsilon}^{2}},\big{(}\frac{\mathbf{X}^{\prime}\mathbf{X}}{\sigma_{\varepsilon}^{2}}+\frac{1}{\sigma_{\beta}^{2}}\mathbf{I}\big{)}^{-1}\bigg{)} and [\sigma_{\varepsilon}^{2}|\boldsymbol{\beta},\mathbf{y}]\equiv\text{inverse gamma}\left(q+\frac{n}{2},\big{(}\frac{1}{r}+\frac{1}{2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^{\prime}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})\big{)}^{-1}\right)
      • Gibbs sampler for the Bayesian linear model
        • Write out on the whiteboard
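
  A minimal sketch of this Gibbs sampler, assuming the inverse gamma(q, r) parameterization used in BBM2L (so the second full-conditional parameter is a scale, \big{(}\frac{1}{r}+\frac{1}{2}\text{SS}\big{)}^{-1}) and simulated data in place of the class data; the hyper-parameter values are illustrative choices, not the ones used in class:

  ```python
  import numpy as np

  rng = np.random.default_rng(1)

  # Simulated data (assumed, for illustration only)
  n, p = 100, 3
  X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
  beta_true = np.array([1.0, -2.0, 0.5])
  y = X @ beta_true + rng.normal(scale=0.7, size=n)

  # Hyper-parameters: beta ~ N(0, sigma_beta^2 I), sigma2 ~ IG(q, r)
  sigma_beta2 = 100.0
  q, r = 2.0, 1.0  # illustrative values; IG(q, r) in the BBM2L parameterization

  n_iter, burn = 5000, 1000
  beta, sigma2 = np.zeros(p), 1.0
  beta_save = np.empty((n_iter, p))
  sigma2_save = np.empty(n_iter)

  XtX, Xty = X.T @ X, X.T @ y

  for k in range(n_iter):
      # Sample beta from its full conditional: multivariate normal
      A_inv = np.linalg.inv(XtX / sigma2 + np.eye(p) / sigma_beta2)
      beta = rng.multivariate_normal(A_inv @ (Xty / sigma2), A_inv)

      # Sample sigma2 from its full conditional: inverse gamma
      resid = y - X @ beta
      shape = q + n / 2
      rate = 1 / r + 0.5 * (resid @ resid)  # scale parameter is 1/rate
      sigma2 = 1 / rng.gamma(shape, 1 / rate)

      beta_save[k], sigma2_save[k] = beta, sigma2

  print("posterior mean of beta:", beta_save[burn:].mean(axis=0))
  print("posterior mean of sigma2:", sigma2_save[burn:].mean())
  ```

  Each pass alternates draws from the two full conditionals; after discarding the burn-in, the saved draws approximate the joint posterior, and inference for each parameter comes from its marginal (i.e., the corresponding column of draws).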