11 Day 11 (February 25)
11.1 Announcements
Last class was awesome!
Confidence intervals for rare events (link)
A guide for Activity 3 is available
Selected questions/clarifications from journals
11.2 Our second statistical model
- Dig into the rabies test a bit more…
- Rabies test results
- Building a statistical model using a hierarchical Bayesian approach
- Specify (write out) the data model
- Specify the process model
- Specify the parameter model (or prior) including hyper-parameters
- Select an approach to obtain the posterior distribution
- Gibbs sampler (see pg. 35 in BBM2L)
- Derive full conditionals
- Discussion of trade-offs: a Gibbs sampler with analytical full conditionals vs. Metropolis-Hastings updates (see the sketch after this list)
- Live example
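To make the trade-off above concrete, here is a minimal sketch (Python/NumPy; the function names and densities are hypothetical illustrations, not the rabies model itself) contrasting the two update styles within one MCMC iteration: a Gibbs update draws directly from a closed-form full conditional, while a Metropolis-Hastings update needs only the unnormalized log full conditional, at the cost of a tuning parameter and rejected proposals.

```python
import numpy as np

rng = np.random.default_rng()

# Gibbs-style update: possible when the full conditional is known in closed
# form (here, a hypothetical conjugate beta full conditional for a probability).
def gibbs_update_prob(a_post, b_post):
    return rng.beta(a_post, b_post)

# Metropolis-Hastings update: needs only the unnormalized log full
# conditional, but introduces a proposal tuning parameter and rejections.
def mh_update(theta, log_full_conditional, tune=0.1):
    theta_star = rng.normal(theta, tune)  # random-walk proposal
    log_ratio = log_full_conditional(theta_star) - log_full_conditional(theta)
    if np.log(rng.uniform()) < log_ratio:
        return theta_star                 # accept the proposal
    return theta                          # reject; keep the current value
```

The Gibbs update is rejection-free and tuning-free but requires the algebra to derive the full conditional; the Metropolis-Hastings update trades that derivation for run-time tuning and some loss of efficiency.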
11.3 Summary and future direction
- We have just finished our second statistical model
- Time to pause, review, and plot the path forward
- So far we have accomplished the following.
- A review of mathematical statistics and distribution theory (Activity 1)
- We have learned/rediscovered common methods for numerical integration (Activity 2)
- We have learned common methods to sample from complex multivariate probability distributions (Activity 3/4).
- We have learned how to creatively develop a Bayesian model that is appropriate for the question and data that are available (Bat model 1 and model 2).
- We have learned how to use numerical algorithms to “fit” a Bayesian model to data (Bat model 1 and model 2).
- We have learned how to make statistical inference from the posterior distribution of unknown random variables (i.e., model parameters).
- What we don’t know
- How to do prediction, backcasting, and forecasting
- How to “test” or “check” our model
- The wide variety of models that can be used
11.4 The Bayesian Linear Model
The classic paper is Lindley and Smith (1972)
Implemented with MCMC from Gelfand and Smith (1990)
More recent elaborations include the Bayesian Lasso (see Park and Casella 2008)
The model
- Distribution of the data (“y given \boldsymbol{\beta} and \sigma_{\varepsilon}^{2}”):
[\mathbf{y}|\boldsymbol{\beta},\sigma_{\varepsilon}^{2}]\equiv\text{N}(\mathbf{X}\boldsymbol{\beta},\sigma_{\varepsilon}^{2}\mathbf{I})
This is also sometimes called the data model.
- Distribution of the parameters \boldsymbol{\beta} and \sigma_{\varepsilon}^{2}:
[\boldsymbol{\beta}]\equiv\text{N}(\mathbf{0},\sigma_{\beta}^{2}\mathbf{I})
[\sigma_{\varepsilon}^{2}]\equiv\text{inverse gamma}(q,r)
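To see the model as a generative recipe, the following sketch (Python/NumPy; the dimensions and hyper-parameter values are made up for illustration) simulates a data set by drawing \boldsymbol{\beta} and \sigma_{\varepsilon}^{2} from their priors and then \mathbf{y} from the data model.

```python
import numpy as np

rng = np.random.default_rng(seed=11)

n, p = 100, 3        # sample size and number of coefficients (illustrative)
sigma2_beta = 10.0   # prior variance of beta (illustrative hyper-parameter)
q, r = 2.0, 1.0      # inverse gamma hyper-parameters (illustrative)

# Design matrix with an intercept and p - 1 covariates
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# [beta] = N(0, sigma2_beta * I)
beta = rng.normal(0.0, np.sqrt(sigma2_beta), size=p)

# [sigma2_eps] = inverse gamma(q, r), drawn as the reciprocal of a gamma
# draw (assuming the IG(q, r) parameterization used in BBM2L)
sigma2_eps = 1.0 / rng.gamma(shape=q, scale=r)

# [y | beta, sigma2_eps] = N(X beta, sigma2_eps * I)
y = X @ beta + rng.normal(0.0, np.sqrt(sigma2_eps), size=n)
```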
Computations and statistical inference
- Using Bayes rule (Bayes 1763) we can obtain the joint posterior distribution
[\boldsymbol{\beta},\sigma_{\varepsilon}^{2}|\mathbf{y}]=\frac{[\mathbf{y}|\boldsymbol{\beta},\sigma_{\varepsilon}^{2}][\boldsymbol{\beta}][\sigma_{\varepsilon}^{2}]}{\int\int [\mathbf{y}|\boldsymbol{\beta},\sigma_{\varepsilon}^{2}][\boldsymbol{\beta}][\sigma_{\varepsilon}^{2}]d\boldsymbol{\beta}d\sigma_{\varepsilon}^{2}}
- Statistical inference about a parameter is obtained from its marginal posterior distribution:
[\boldsymbol{\beta}|\mathbf{y}]=\int[\boldsymbol{\beta},\sigma_{\varepsilon}^{2}|\mathbf{y}]d\sigma_{\varepsilon}^{2}
[\sigma_{\varepsilon}^{2}|\mathbf{y}]=\int[\boldsymbol{\beta},\sigma_{\varepsilon}^{2}|\mathbf{y}]d\boldsymbol{\beta}
- In practice it is difficult to find closed-form solutions for the marginal posterior distributions, but easy to find closed-form solutions for the conditional posterior distributions.
- Full-conditional distributions for \boldsymbol{\beta} and \sigma_{\varepsilon}^{2}:
[\boldsymbol{\beta}|\sigma_{\varepsilon}^{2},\mathbf{y}]\equiv\text{N}\Big(\big(\mathbf{X}^{\prime}\mathbf{X}+\tfrac{\sigma_{\varepsilon}^{2}}{\sigma_{\beta}^{2}}\mathbf{I}\big)^{-1}\mathbf{X}^{\prime}\mathbf{y},\,\sigma_{\varepsilon}^{2}\big(\mathbf{X}^{\prime}\mathbf{X}+\tfrac{\sigma_{\varepsilon}^{2}}{\sigma_{\beta}^{2}}\mathbf{I}\big)^{-1}\Big)
[\sigma_{\varepsilon}^{2}|\boldsymbol{\beta},\mathbf{y}]\equiv\text{inverse gamma}\Big(q+\frac{n}{2},\big(\tfrac{1}{r}+\tfrac{1}{2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^{\prime}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})\big)^{-1}\Big)
- Gibbs sampler for the Bayesian linear model
- Write out on the whiteboard
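As a companion to the whiteboard derivation, a minimal Gibbs sampler implementing the two full conditionals above might look like this (Python/NumPy sketch; starting values, hyper-parameter defaults, and the number of iterations are illustrative, and the inverse gamma is sampled as the reciprocal of a gamma draw).

```python
import numpy as np

def gibbs_blm(y, X, sigma2_beta=10.0, q=2.0, r=1.0, n_iter=5000, seed=11):
    """Gibbs sampler for the Bayesian linear model, alternating between
    the two full-conditional distributions derived above. Hyper-parameter
    defaults are illustrative, not values from the lecture."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y

    beta = np.zeros(p)   # starting values (illustrative)
    sigma2 = 1.0
    beta_save = np.empty((n_iter, p))
    sigma2_save = np.empty(n_iter)

    for k in range(n_iter):
        # [beta | sigma2, y] = N(A^{-1} X'y, sigma2 * A^{-1}),
        # where A = X'X + (sigma2 / sigma2_beta) I
        A_inv = np.linalg.inv(XtX + (sigma2 / sigma2_beta) * np.eye(p))
        beta = rng.multivariate_normal(A_inv @ Xty, sigma2 * A_inv)

        # [sigma2 | beta, y] = inverse gamma(q + n/2, (1/r + SSE/2)^{-1}),
        # sampled as the reciprocal of a gamma draw
        sse = np.sum((y - X @ beta) ** 2)
        sigma2 = 1.0 / rng.gamma(shape=q + n / 2,
                                 scale=1.0 / (1.0 / r + 0.5 * sse))

        beta_save[k], sigma2_save[k] = beta, sigma2

    return beta_save, sigma2_save
```

After discarding a burn-in, the saved draws approximate the marginal posteriors [\boldsymbol{\beta}|\mathbf{y}] and [\sigma_{\varepsilon}^{2}|\mathbf{y}], so point estimates and credible intervals come directly from sample summaries, e.g., np.quantile(beta_save[1000:, 0], [0.025, 0.975]).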