24 Day 24 (July 10)

24.1 Announcements

  • Read Ch. 6 (Model diagnostics) in Linear Models with R
  • Work day on Tuesday and Thursday
  • Assignment 4 will be posted this afternoon

24.2 Model checking

  • Given a statistical model, estimation, prediction, and statistical inference are somewhat “automatic.”
  • If the statistical model is misspecified (i.e., wrong) in any way, the resulting statistical inference (including predictions and prediction uncertainty) rests on a house of cards.
  • George Box quote: “All models are wrong but some are useful.”
    • Box (1976) “Since all models are wrong the scientist cannot obtain a correct one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity.”
  • We have assumed the linear model \(\mathbf{y}\sim\text{N}(\mathbf{X\boldsymbol{\beta}},\sigma^{2}\mathbf{I})\), which allowed us to:
    • Estimate \(\boldsymbol{\beta}\) and \(\sigma^2\)
    • Make statistical inference about \(\boldsymbol{\beta}\)
    • Make predictions and obtain prediction intervals for future values of \(\mathbf{y}\)
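  • All statistical inference we obtained requires that the linear model \(\mathbf{y}\sim\text{N}(\mathbf{X\boldsymbol{\beta}},\sigma^{2}\mathbf{I})\) gave rise to the data. This involves assumptions about:
    • Support: each element of \(\mathbf{y}\) can take any real value
    • Linear: the expected value of \(\mathbf{y}\) is \(\mathbf{X}\boldsymbol{\beta}\)
    • Constant variance: every error has the same variance \(\sigma^{2}\)
    • Independence: the errors are independent (the off-diagonal entries of \(\sigma^{2}\mathbf{I}\) are zero)
    • Outliers: no observations arise from a different process than the rest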
  • Model diagnostics (Ch. 6 in Faraway (2014)) is a set of tools and procedures for checking whether the assumptions of our model are approximately correct.
    • Statistical tests (e.g., Shapiro-Wilk test for normality)
      • Specific
      • What if you reject the null?
    • Graphical
      • Broad
      • Subjective
      • Widely used
    • Predictive model checks
      • More common for Bayesian models (e.g., posterior predictive checks)
  • We will explore numerous ways to check:
    • Distributional assumptions
      • Normality
      • Constant variance
      • Correlation among errors
    • Detection of outliers
    • Deterministic model structure
      • Is \(\mathbf{X}\boldsymbol{\beta}\) a reasonable model for the expected value of \(\mathbf{y}\)?
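
The checks listed above can also be sketched numerically. What follows is a minimal Python sketch, not the course's R workflow and not code from Faraway (2014). It simulates data from a linear model with a single predictor, fits the line by least squares, and summarizes the residuals. The coefficients (1 and 2), the error standard deviation (1), the sample size, and the seed are made-up values. The function names `fit_simple_ols` and `residual_checks`, the cutoffs of 2 and 3 standard deviations, and the split at the median fitted value are illustrative assumptions; in practice these quantities are usually examined with plots.

```python
import random
import statistics


def fit_simple_ols(x, y):
    """Return the least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_checks(x, y):
    """Numeric stand-ins for common residual plots (illustrative only)."""
    intercept, slope = fit_simple_ols(x, y)
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    n = len(resid)
    sum_sq = sum(e ** 2 for e in resid)

    # Estimate sigma with n - 2 degrees of freedom (two coefficients fitted).
    sigma_hat = (sum_sq / (n - 2)) ** 0.5
    # Scaled residuals; leverage is ignored, so these are not studentized.
    scaled = [e / sigma_hat for e in resid]

    # Constant variance: mean squared residual, large vs. small fitted values.
    order = sorted(range(n), key=lambda i: fitted[i])
    low = [resid[i] ** 2 for i in order[: n // 2]]
    high = [resid[i] ** 2 for i in order[n // 2:]]
    variance_ratio = statistics.fmean(high) / statistics.fmean(low)

    # Correlation among errors: lag-1 autocorrelation in the observed order.
    lag1 = sum(a * b for a, b in zip(resid[1:], resid[:-1])) / sum_sq

    # Normality: roughly 95% of scaled residuals should fall within +/- 2.
    share_within_2 = sum(abs(s) <= 2 for s in scaled) / n

    # Outliers: flag observations more than 3 estimated SDs from the fit.
    possible_outliers = [i for i, s in enumerate(scaled) if abs(s) > 3]

    return {
        "intercept": intercept,
        "slope": slope,
        "residual_sum": sum(resid),
        "variance_ratio": variance_ratio,
        "lag1_autocorrelation": lag1,
        "share_within_2_sd": share_within_2,
        "possible_outliers": possible_outliers,
    }


# Simulate data that satisfy the assumed model (all values are made up).
rng = random.Random(24)
n_obs = 200
x = [rng.uniform(0, 10) for _ in range(n_obs)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1.0) for xi in x]

checks = residual_checks(x, y)
for name, value in checks.items():
    print(name, value)
```

Because the simulated data satisfy the model, the variance ratio should be near 1, the lag-1 autocorrelation near 0, and about 95% of the scaled residuals within 2 standard deviations. Changing the simulation (for example, letting the error standard deviation grow with `x`) shows how each summary reacts when an assumption is violated.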

References

Box, George EP. 1976. “Science and Statistics.” Journal of the American Statistical Association 71 (356): 791–99.
Faraway, J. J. 2014. Linear Models with R. CRC Press.