2.2 Why use regression?
Regression can be used for any or all of the following purposes:
Testing a theory: A theory that implies a certain functional relationship between the outcome Y and a predictor X can be tested by comparing the hypothesized model to simpler or more complex models and seeing which model fits best.
Estimation and prediction: After fitting the model to observed data, one can estimate the average outcome value at a specified value of a predictor, as well as the distribution of outcome values for individuals with that predictor value.
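As a minimal sketch of estimation and prediction (the book's examples use R; here Python with numpy stands in, and the data are simulated, not from any real study): fit a simple linear regression by least squares, then estimate the average outcome at a specified predictor value.

```python
import numpy as np

# Simulated (hypothetical) data: Y roughly linear in X
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)

# Fit Y = b0 + b1*X by ordinary least squares
# (np.polyfit returns coefficients highest degree first)
b1, b0 = np.polyfit(x, y, deg=1)

# Estimated average outcome at a specified predictor value, e.g. X = 5
x_new = 5.0
y_hat = b0 + b1 * x_new
print(b1, y_hat)
```

The same fitted line also describes the distribution of individual outcomes at X = 5: under the usual model assumptions, individuals scatter around the estimated mean y_hat with the residual standard deviation.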
Machine learning: The fields of artificial intelligence and machine learning (AI/ML), sometimes placed under the umbrella of “data science”, “data mining”, “analytics”, or “statistical learning”, are, essentially, attempts to predict an outcome based on a set of predictors (“features”). Regression is one of many methods for making such a prediction. For a gentle introduction to these methods in R, see James et al. (2021). For a more in-depth treatment, see Hastie, Tibshirani, and Friedman (2016).
Testing an association: Is there a significant association between Y and X? In the case of simple linear regression, this question is answered by testing the null hypothesis H0: β1 = 0. Under the null hypothesis, the outcome does not depend on the predictor. If there is enough evidence to reject the null hypothesis, we conclude that there is a significant association.
Estimating a rate of change: How does Y change as X changes? In the case of simple linear regression, this question is answered by estimating the magnitude of β1, the regression slope.
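The two questions above, testing H0: β1 = 0 and estimating the magnitude of β1, can be illustrated together. A minimal sketch using simulated (hypothetical) data, with Python's scipy standing in for R: scipy.stats.linregress returns both the slope estimate and the p-value for the test of zero slope.

```python
import numpy as np
from scipy import stats

# Simulated (hypothetical) data with a true slope of 0.8
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 1.0 + 0.8 * x + rng.normal(0, 2, 200)

res = stats.linregress(x, y)
# res.slope estimates beta1, the rate of change in Y per unit X;
# res.pvalue tests the null hypothesis H0: beta1 = 0
print(res.slope, res.pvalue)
```

With a true nonzero slope and 200 observations, the p-value is small and the null hypothesis is rejected; the estimated slope is close to the value used to simulate the data.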
Controlling for confounding: Is there an association between Y and X1 after adjusting for X2,…,XK? When the data arise from an observational study, an observed association between a single predictor and the outcome may be spurious due to confounding. A third variable may actually be associated with both the predictor and the outcome, and those associations induce an apparent association between the predictor of interest and the outcome. Alternatively, the estimate of a real association may be biased due to confounding. Regression adjustment for confounding is a powerful tool for attempting to isolate the effect of one predictor on the outcome from the effects of other potentially confounding variables.
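A minimal sketch of regression adjustment for confounding, using simulated (hypothetical) data in Python: Z influences both X and Y, while Y does not actually depend on X. Regressing Y on X alone shows a spurious association; adding Z to the model removes it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
z = rng.normal(0, 1, n)                       # confounder
x = z + rng.normal(0, 1, n)                   # predictor influenced by z
y = 2.0 * z + rng.normal(0, 1, n)             # outcome depends on z, NOT on x

# Unadjusted: regress Y on X alone -> spurious association
b_unadj = np.polyfit(x, y, deg=1)[0]

# Adjusted: regress Y on X and Z together
X = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b_adj = coef[1]                               # coefficient of x, adjusted for z
print(b_unadj, b_adj)
```

The unadjusted slope is far from zero even though X has no effect on Y, while the adjusted slope is near zero, which is exactly the isolation of one predictor's effect that regression adjustment aims for.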
Effect modification: Does the association between Y and X depend on the value of another variable, Z? If so, then Z is an effect modifier. Ignoring an effect modifier may lead to an estimated effect that applies to no individual case, since it averages over the modifier-level-specific effects. Regression methods can be used to estimate the extent of effect modification by including an “interaction” term.
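A minimal sketch of effect modification via an interaction term, again with simulated (hypothetical) data in Python: Z is binary, and the slope of X is different at each level of Z. Including the product X·Z in the design matrix lets the fit recover both level-specific slopes.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(0, 10, n)
z = rng.integers(0, 2, n)                       # binary effect modifier
# True slope of X is 0.5 when z = 0 and 1.5 when z = 1 (interaction = 1.0)
y = 1.0 + 0.5 * x + 0.3 * z + 1.0 * x * z + rng.normal(0, 1, n)

# Fit Y ~ X + Z + X:Z by least squares
X = np.column_stack([np.ones(n), x, z, x * z])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
slope_z0 = b[1]            # slope of X when Z = 0
slope_z1 = b[1] + b[3]     # slope of X when Z = 1
print(slope_z0, slope_z1)
```

Omitting the interaction term would instead report a single slope averaged over the two groups, which applies to neither group, precisely the hazard described above.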