Chapter 4 M4: Thinking About Predictors

This module looks at some creative ideas for working with the predictors in your dataset. For example, you might have non-linear relationships between your predictors and the response; one approach is to apply transformations (functions) to the various predictors before fitting the model. We look at a few common types of functions people use for this, including polynomials, step functions, and splines.
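
If you want to try these ideas in code, here’s a minimal sketch of all three transformation types in Python with scikit-learn (the toy data below are made up for illustration, and `SplineTransformer` needs scikit-learn 1.0 or newer):

```python
# A minimal sketch: polynomial, step-function, and spline transformations
# of a single predictor, each followed by ordinary least squares.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, KBinsDiscretizer, SplineTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(200, 1))            # one predictor
y = np.sin(x).ravel() + rng.normal(0, 0.3, 200)  # a non-linear response

# Polynomial regression: fit y on x, x^2, x^3
poly_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
poly_model.fit(x, y)

# Step functions: cut x into bins and fit a constant within each bin
step_model = make_pipeline(
    KBinsDiscretizer(n_bins=6, encode="onehot-dense", strategy="uniform"),
    LinearRegression(),
)
step_model.fit(x, y)

# Regression spline: piecewise cubics joined smoothly at the knots
spline_model = make_pipeline(SplineTransformer(degree=3, n_knots=6), LinearRegression())
spline_model.fit(x, y)

for name, model in [("poly", poly_model), ("step", step_model), ("spline", spline_model)]:
    print(name, round(model.score(x, y), 3))  # in-sample R^2, just to compare fits
```

Notice that all three are just “transform the predictor, then run ordinary least squares,” which is the basis-function view the Chapter 7 reading develops.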

But what do you do when you just have too many predictors? We also look at the kinds of problems that can show up with high-dimensional data and some ways of reducing the dimension (or the effective dimension) of the dataset. These include creating “combination” variables (like principal components) and dropping predictors from the model.
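
To make the “combination” variables idea concrete, here’s a minimal sketch of principal components regression in the spirit of Section 6.3, again with scikit-learn on made-up data: forty correlated predictors get replaced by two principal components before fitting.

```python
# A minimal sketch of "combination" variables: replace many correlated
# predictors with a few principal components, then regress on those.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n, p = 100, 40                        # more predictors than we'd like
z = rng.normal(size=(n, 2))           # two underlying signals
X = z @ rng.normal(size=(2, p)) + rng.normal(0, 0.1, size=(n, p))
y = z[:, 0] - 2 * z[:, 1] + rng.normal(0, 0.5, n)

# Standardize, keep the first two components, then fit a small linear model
pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
pcr.fit(X, y)
print("R^2 with 2 components instead of 40 predictors:", round(pcr.score(X, y), 3))
```

Dropping predictors outright is the other route; that’s what the subset-selection reading (Section 6.1 below) covers.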

This module’s reading is all in the textbook! Relevant sections include:

  • Intro to Chapter 7, “Moving Beyond Linearity”
  • 7.7 “Generalized Additive Models”
  • 7.1 “Polynomial Regression”
  • 7.2 “Step Functions”
  • 7.3 “Basis Functions”
  • 7.4 “Regression Splines”
  • 7.6 “Local Regression”
  • Optional: 7.5 “Smoothing Splines”
  • Chapter 6 intro, “Linear Model Selection and Regularization”
  • 6.3 “Dimension Reduction Methods”
    • Skip or skim subsection 6.3.2, “Partial Least Squares”
  • Subsection 3.3.3, “Potential Problems” (in Section 3.3, “Other Considerations in the Regression Model”)
    • Focus on sub-subsection 6, “Collinearity,” which discusses multicollinearity and the VIF; there’s a small VIF sketch after this list. You can skim the other parts if you like, as a refresher on the list of regression assumptions and conditions :)
  • Section 6.1, “Subset Selection”
  • Section 6.4, “Considerations in High Dimensions”
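
Finally, here’s the small VIF sketch promised above. It computes each predictor’s variance inflation factor straight from the definition, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors; the made-up data deliberately make two predictors nearly copies of each other.

```python
# A minimal sketch of the variance inflation factor (VIF) from the
# collinearity reading: regress each predictor on the others and take
# VIF_j = 1 / (1 - R_j^2). Toy data are made up for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 0.1, n)   # nearly a copy of x1: collinear
x3 = rng.normal(size=n)           # unrelated to the others
X = np.column_stack([x1, x2, x3])

for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    print(f"VIF for predictor {j + 1}: {1 / (1 - r2):.1f}")

# Rules of thumb vary, but VIFs well above 5 or 10 flag trouble;
# here x1 and x2 should come out large, and x3 should be near 1.
```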