2.2 Why use regression?

Regression can be used for any or all of the following purposes:

  • Testing a theory: A theory that implies a certain functional relationship between \(Y\) and \(X\) can be tested by comparing the hypothesized model to simpler or more complex models and seeing which model fits best.

  • Prediction: After fitting the model to observed data, specified values of \(X\) can be used to predict yet-to-be-observed values of \(Y\).

  • Machine learning: The fields of artificial intelligence and machine learning (AI/ML), sometimes placed under the umbrella of “data science”, “data mining”, “analytics”, or “statistical learning”, are essentially attempts to predict an outcome based on a set of predictors (“features”). Regression is one of many methods for making such a prediction. For a gentle introduction to these methods in R, see James et al. (2021). For a more in-depth treatment, see Hastie, Tibshirani, and Friedman (2016).

  • Testing an association: Is there a statistically significant association between \(Y\) and \(X\)? In the case of simple linear regression, this question is answered by testing the null hypothesis \(H_0:\beta_1=0\). Under the null hypothesis, the mean of the outcome does not depend on the predictor. If there is enough evidence to reject the null hypothesis, we conclude that there is a statistically significant association.

  • Estimating a rate of change: How does \(Y\) change as \(X\) changes? In the case of simple linear regression, this question is answered by estimating the regression slope \(\beta_1\), the difference in the mean of \(Y\) associated with a one-unit difference in \(X\).

  • Controlling for confounding: Is there an association between \(Y\) and \(X_1\) after adjusting for \(X_2, \ldots, X_K\)? When the data arise from an observational study, an observed association between a single predictor and the outcome may be spurious due to confounding: a third variable may be associated with both the predictor and the outcome, and those associations induce an association between the predictor of interest and the outcome. Alternatively, the estimate of a real association may be biased due to confounding. Regression adjustment is a powerful tool for attempting to isolate the effect of one predictor on the outcome from the effects of other potentially confounding variables.

  • Effect modification: Does the association between \(Y\) and \(X\) depend on the value of another variable, \(Z\)? If so, then \(Z\) is an effect modifier. Ignoring an effect modifier may lead to estimating an effect that does not apply to any individual case, because it is an average of the effects specific to each level of the modifier. Regression methods can be used to estimate the extent of effect modification by including an “interaction” term.
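To make the confounding purpose above more concrete, here is a minimal sketch using simulated data. It is written in Python with only the standard library, purely for illustration. Everything in it is an assumption, not taken from a real study: the variable names (`x1`, `x2`, `y`), the helper functions `simple_ols` and `residuals`, the sample size, and the data-generating model. In this simulation the confounder `x2` drives both `x1` and `y`, while `x1` has no effect of its own on `y`. The adjusted slope is obtained by regressing the part of `y` not explained by `x2` on the part of `x1` not explained by `x2`, which gives the same slope for `x1` as a multiple regression of `y` on `x1` and `x2`.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Return the residuals from the least-squares line of y on x."""
    intercept, slope = simple_ols(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical simulated data: x2 is a confounder that affects both x1 and y.
# By construction, x1 has NO effect on y once x2 is accounted for.
random.seed(1)
n = 2000
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [c + random.gauss(0, 1) for c in x2]
y = [2 * c + random.gauss(0, 1) for c in x2]

# Unadjusted: simple linear regression of y on x1 alone.
_, unadjusted_slope = simple_ols(x1, y)

# Adjusted: remove the part of x1 and of y explained by x2, then regress
# the leftover part of y on the leftover part of x1.
_, adjusted_slope = simple_ols(residuals(x2, x1), residuals(x2, y))

print(f"Unadjusted slope for x1: {unadjusted_slope:.2f}")  # clearly non-zero
print(f"Adjusted slope for x1:   {adjusted_slope:.2f}")    # near zero
```

Under these assumptions, the unadjusted slope is well away from zero even though `x1` has no effect on `y`, which is the spurious association described above. After adjusting for `x2`, the slope is close to zero. In practice the adjustment would be done by fitting a single multiple regression model rather than by hand.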

References

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2016. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, NY: Springer.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning: With Applications in R. 2nd ed. New York, NY: Springer.