2.2 Why use regression?
Regression can be used for any or all of the following purposes:
Testing a theory: A theory that implies a certain functional relationship between the outcome \(Y\) and a predictor \(X\) can be tested by comparing the hypothesized model to simpler or more complex models and seeing which model fits best.
Estimation and prediction: After fitting the model to observed data, one can estimate the average outcome value at a specified value of a predictor, as well as the distribution of outcome values for individuals with that predictor value.
Machine learning: The fields of artificial intelligence and machine learning (AI/ML), sometimes placed under the umbrella of “data science”, “data mining”, “analytics”, or “statistical learning”, are largely concerned with predicting an outcome from a set of predictors (“features”). Regression is one of many methods for making such predictions. For a gentle introduction to these methods in R, see James et al. (2021). For a more in-depth treatment, see Hastie, Tibshirani, and Friedman (2016).
Testing an association: Is there a statistically significant association between \(Y\) and \(X\)? In the case of simple linear regression, this question is answered by testing the null hypothesis \(H_0:\beta_1=0\), where \(\beta_1\) is the regression slope. Under the null hypothesis, the mean outcome does not change with the predictor. If there is enough evidence to reject the null hypothesis, we conclude that there is a statistically significant association.
Estimating a rate of change: How does \(Y\) change as \(X\) changes? In the case of simple linear regression, this question is answered by estimating \(\beta_1\), the regression slope, which is the difference in mean outcome associated with a one-unit difference in \(X\).
Controlling for confounding: Is there an association between \(Y\) and \(X_1\) after adjusting for \(X_2, \ldots, X_K\)? When the data arise from an observational study, an observed association between a single predictor and the outcome may be spurious due to confounding. A third variable may be associated with both the predictor and the outcome, and those associations can induce an association between the predictor of interest and the outcome even when none exists. Alternatively, the estimate of a real association may be biased due to confounding. Regression adjustment for confounding is a powerful tool for attempting to isolate the effect of one predictor on the outcome from the effects of other potentially confounding variables.
Effect modification: Does the association between \(Y\) and \(X\) depend on the value of another variable, \(Z\)? If so, then \(Z\) is an effect modifier. Ignoring an effect modifier may lead to estimating an effect that does not apply to any individual case, because the estimate averages over the effects specific to each level of the modifier. Regression methods can be used to estimate the extent of effect modification by including an “interaction” term.
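Several of the purposes above can be sketched in R on simulated data. This is a minimal illustration rather than an analysis from this text: the variable names (`x1`, `x2`, `z`, `y`), the sample size, and the effect sizes are all assumptions chosen for the example, and the models are fit with base R's `lm()`.

```r
# Simulated data; every name and effect size below is hypothetical.
set.seed(2024)
n  <- 2000
x2 <- rnorm(n)                # confounder
x1 <- 0.8 * x2 + rnorm(n)     # predictor of interest, correlated with x2
z  <- rbinom(n, 1, 0.5)       # binary effect modifier, independent of x1 and x2
y  <- 1 + 0.5 * x1 + 1.0 * x2 + 0.6 * x1 * z + rnorm(n)
dat <- data.frame(y, x1, x2, z)

# Testing an association and estimating a rate of change:
# the x1 row holds the estimated slope and the t-test of H0: beta_1 = 0.
fit_unadj <- lm(y ~ x1, data = dat)
coef(summary(fit_unadj))
confint(fit_unadj, "x1")

# Controlling for confounding: add the confounder x2 to the model
# and compare the x1 slope before and after adjustment.
fit_adj     <- lm(y ~ x1 + x2, data = dat)
slope_unadj <- coef(fit_unadj)["x1"]
slope_adj   <- coef(fit_adj)["x1"]

# Effect modification: x1 * z expands to x1 + z + x1:z, so the x1:z
# coefficient estimates how the x1 slope differs between z = 1 and z = 0.
fit_int <- lm(y ~ x1 * z + x2, data = dat)
coef(fit_int)["x1:z"]

# Testing a theory: compare the nested models with an F-test.
anova(fit_adj, fit_int)

# Estimation and prediction at x1 = 1, x2 = 0, z = 0.
new_obs <- data.frame(x1 = 1, x2 = 0, z = 0)
predict(fit_int, new_obs, interval = "confidence")  # interval for the mean outcome
predict(fit_int, new_obs, interval = "prediction")  # interval for an individual outcome
```

In this simulation the unadjusted slope for `x1` should come out larger than the adjusted slope, because `x2` was constructed to be associated with both `x1` and `y`. With real data, the direction and size of any confounding bias are not known in advance.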