Chapter 6 M6: Return to Regression

In this module, we come back to the wonders of regression – and learn some new tricks! First, we perform (classical) inference in the simple linear regression context: checking conditions, performing appropriate hypothesis tests, and finding confidence intervals for a slope and for an average \(y\) value, plus prediction intervals for an individual \(y\) value. Then we expand into the world of multiple regression, where we have one response variable but many predictor variables. We explore how to include categorical variables as predictors, how to check the conditions for inference (including the new multicollinearity check!), and how to think about inference in the MLR context. Some key skills are:

  • Use all the key regression skills from module 2, including writing and interpreting the regression equation, getting predictions, and finding/interpreting residuals
  • Write and apply the inference framework to the regression context:
    • List the assumptions and conditions for regression inference, and check them with appropriate plots or other reasoning
    • Describe the appropriate hypotheses, test statistic, and null distribution. As usual, you won’t have to do a bunch of arithmetic, but you should be able to set up a formula and/or use numbers already provided to you.
    • Interpret test results (whether from your own work, a word description, or R output) and draw conclusions in context
    • Set up and interpret a confidence interval for a regression slope (see the first R sketch after this list)
  • Describe and identify the different types of special points (key terms: outlier, leverage, influence)
  • Write a multiple regression equation, either using appropriate math symbols or plugging in numbers if they’re provided (for example, in R output)
  • Use indicator variables to encode categorical predictors: put them in an equation, and interpret what they mean (the second sketch after this list shows how R sets these up)
  • List the assumptions and conditions for MLR inference, and check them using appropriate plots or reasoning
    • Yep, this includes multicollinearity! Be able to recognize it when it shows up in a scatterplot of two predictors (the third sketch after this list shows one way to check). While you don’t have to prove anything about its effects, it’s good to have a sense of why it might be a problem: the idea is that your model “doesn’t know which predictor matters.”
  • Interpret inference tests for individual predictors in multiple regression
    • You don’t have to set up the formulas for these (you can do that for simple regression, where there’s only one predictor, but the multiple-predictor case is more complicated). You should, though, be able to find the relevant results in R output, like the coefficient table in the sketches below, and interpret them in the context of the problem.
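
Here’s a minimal R sketch of the simple-regression inference pieces above, using the built-in cars data (stopping distance vs. speed) purely as a stand-in example; your own data and variable names will differ.

```r
# Minimal sketch: inference for simple linear regression in R,
# using the built-in 'cars' data (dist ~ speed) as a stand-in example.

fit <- lm(dist ~ speed, data = cars)

# Coefficient table: estimates, SEs, t statistics, and p-values
# (the slope row is the test of H0: beta_1 = 0)
summary(fit)

# 95% confidence interval for the intercept and the slope
confint(fit, level = 0.95)

# Interval estimates at a new x value (here, speed = 15):
new_point <- data.frame(speed = 15)

# Confidence interval for the *average* dist when speed = 15
predict(fit, newdata = new_point, interval = "confidence", level = 0.95)

# Prediction interval for an *individual* dist when speed = 15 (wider!)
predict(fit, newdata = new_point, interval = "prediction", level = 0.95)

# Condition checks: residuals vs. fitted, and a normal Q-Q plot
plot(fit, which = 1)
plot(fit, which = 2)
```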
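
A second sketch shows multiple regression with a categorical predictor, again using a built-in dataset (iris) purely as a stand-in: the point is how R turns a factor into indicator variables, and where the tests for individual predictors show up in the output.

```r
# Minimal sketch: multiple regression with a categorical predictor,
# using the built-in 'iris' data as a stand-in example.

mlr <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)

# R creates the indicator (dummy) variables for Species automatically:
# 'Speciesversicolor' and 'Speciesvirginica', with setosa as the baseline.
# Each of those coefficients shifts the intercept relative to the baseline
# group, holding Petal.Length fixed.
# The coefficient table also gives a t statistic and p-value for each
# individual predictor: does it add information beyond the other
# predictors already in the model?
summary(mlr)

# To see the 0/1 indicator coding R builds behind the scenes:
head(model.matrix(mlr))
```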
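
Finally, a sketch of the condition checks for multiple regression, including the new multicollinearity check on the predictors. It uses the built-in mtcars data as a stand-in; its disp and wt variables are strongly related, which is exactly the kind of thing you’re watching for.

```r
# Minimal sketch: condition checks for a multiple regression model,
# using the built-in 'mtcars' data as a stand-in example.

mlr2 <- lm(mpg ~ disp + wt, data = mtcars)

# The usual residual checks: linearity / equal variance, then normality
plot(mlr2, which = 1)  # residuals vs. fitted values
plot(mlr2, which = 2)  # normal Q-Q plot of the residuals

# The new check: multicollinearity among the *predictors*.
# A scatterplot (or scatterplot matrix) of the predictors, plus their
# correlations, shows whether two predictors carry nearly the same
# information.
pairs(mtcars[, c("disp", "wt")])
cor(mtcars[, c("disp", "wt")])
```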

As usual, the focus here isn’t on arithmetic. It’s always sufficient to set up a formula and plug in any values you know from the problem description, then leave it at that.