Linear regression for prediction
Learning outcomes/objective: Learn…
- …recap how the linear regression model works.
- …in which situations we can use it for predictions.
- …how we use it as a ML model within R.
Sources: #TidyTuesday and tidymodels
1 Regression vs. classification
- See introductory session.
2 Linear model
2.1 Linear model (Equation) (1)
- Linear Model = LM = Linear regression model
- Aim (normally)
- Model (and understand) the relationship between an outcome variable (output) and one or more explanatory variables (features)
- But very popular machine learning model as well!
- Q: What are other names for \(\color{blue}{\beta_{0}}\), \(\color{orange}{\beta _{1}}\) and \(\color{orange}{\beta _{2}}\)?
2.2 Linear model (Equation) (2)
Q: Why is the linear model called “linear” model?
Important: Variable values (e.g., \(y_{i}\) or \(x_{1,i}\)) vary, parameter values (e.g., \(\boldsymbol{\color{blue}{\beta_{0}}}\)) are constant across rows
Important: \(\color{green}{\widehat{y}_{i}}\) varies across units
| Name | \(Lifesatisfaction\) \(y_{i}\) | \(\boldsymbol{\color{blue}{\beta_{0}}}\) | \(\boldsymbol{\color{orange}{\beta_{1}}}\) | \(Unemployed\) \(x_{1,i}\) | \(\boldsymbol{\color{orange}{\beta_{2}}}\) | \(Education\) \(x_{2,i}\) | \(\boldsymbol{\color{red}{\varepsilon_{i}}}\) | \(\color{green}{\widehat{y}_{i}}\) |
|---|---|---|---|---|---|---|---|---|
| Samuel | 8 | ? | ? | 0 | ? | 7 | ? | ? |
| Ruth | 4 | ? | ? | 0 | ? | 3 | ? | ? |
| William | 5 | ? | ? | 1 | ? | 2 | ? | ? |
| .. | .. | .. | .. | .. | .. | .. | .. | .. |
2.3 Linear model (Visualization)
- Figure 1 visualizes the distribution of our data and a linear model that we fit to the data
- \(Lifesatisfaction_{i} = \beta_{0} + \beta_{1} Unemployed_{i} + \beta_{2} Education_{i} + \epsilon_{i}\) (Wikipedia)
- The plane in Figure 1 is not an exact model of the data
- An admissible model must be consistent with all the data points
- The plane alone cannot be the model, because it does not fit all the data points exactly
- Hence the error term \(\epsilon_{i}\) must be included in the model equation, so that the model is consistent with all data points
- Predictive accuracy: How well does our model predict observations (in the test dataset)?
- Calculate average error across all errors \(\epsilon\)i (in the test dataset)
2.4 Linear model (Estimation)
- Estimation = Fitting the model to the data (by adapting/finding the parameters)
- e.g., easy in the case of the mean (analytical solution), but more difficult for linear (or other) models
- Model parameters: \(\color{blue}{\beta_{0}}\), \(\color{orange}{\beta_{1}}\) and \(\color{orange}{\beta_{2}}\)
- Ordinary Least Squares (OLS)
- Least squares methods (Astronomy)
- Choose \(\color{orange}{\beta_{0}}\), \(\color{orange}{\beta_{1}}\) and \(\color{orange}{\beta_{2}}\) (= plane) so that the sum of the squared errors \(\color{red}{\varepsilon}_{i}\) is minimized (See graph!)
- Q: Why do we square the errors?
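The least-squares idea above can be sketched numerically. Although the course works in R, the following minimal sketch uses Python with made-up data and the closed-form OLS solution for a single predictor (\(\beta_{1} = \operatorname{cov}(x,y)/\operatorname{var}(x)\), \(\beta_{0} = \bar{y} - \beta_{1}\bar{x}\)); with two predictors the same principle applies, just in matrix form.

```python
# Minimal OLS sketch with ONE predictor and made-up data:
# choose beta0, beta1 so that the sum of squared errors is minimized.

x = [0, 1, 2, 3, 4]            # explanatory variable (illustrative values)
y = [1.1, 2.9, 5.2, 7.1, 8.8]  # outcome variable (illustrative values)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form least-squares estimates for the simple linear model
beta1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
beta0 = mean_y - beta1 * mean_x

# The quantity OLS minimizes: the sum of squared errors
sse = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))

print(round(beta0, 2), round(beta1, 2), round(sse, 3))
```

Squaring the errors (rather than summing them raw) keeps positive and negative errors from cancelling out and penalizes large errors more heavily.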
2.5 Linear model (Prediction)
\(y_{Samuel} = \color{blue}{6.23} + \color{orange}{-0.58} \times x_{1Samuel} + \color{orange}{0.20} \times x_{2Samuel} + \color{red}{\varepsilon}_{Samuel}\)
\(8 = \color{blue}{6.23} + \color{orange}{-0.58} \times 0 + \color{orange}{0.20} \times 7 + \color{red}{0.373} = \color{green}{7.63} + \color{red}{0.373}\)
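The arithmetic of Samuel's prediction can be checked directly; the sketch below plugs the slide's coefficients (6.23, \(-0.58\), 0.20) and Samuel's feature values into the model equation.

```python
# Reproduce the prediction for Samuel using the coefficients from the slide.
beta0, beta1, beta2 = 6.23, -0.58, 0.20
unemployed, education = 0, 7   # Samuel's feature values
y_true = 8                     # Samuel's observed life satisfaction

y_hat = beta0 + beta1 * unemployed + beta2 * education
residual = y_true - y_hat      # the error epsilon_Samuel

print(round(y_hat, 2), round(residual, 2))  # → 7.63 0.37
```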
2.6 Linear model: Accuracy (MSE, RMSE, R-squared)
- Mean squared error (James et al. 2013, Ch. 2.2)
- \(MSE=\frac{1}{n}\sum_{i=1}^{n}(y_{i}- \hat{f}(x_{i}))^{2}\) (James et al. 2013, Ch. 2.2.1)
- \(y_{i}\) is the true outcome value of the \(i\)th observation
- \(\hat{f}(x_{i})\) is the prediction \(\hat{y}_{i}\) that \(\hat{f}\) gives for the \(i\)th observation
- MSE is small if the predicted responses are close to the true responses, and large if they differ substantially
- Training MSE: MSE computed using the training data
- Test MSE: the accuracy of the predictions that we obtain when we apply our method to previously unseen test data
- \(\text{Ave}(y_{0} - \hat{f}(x_{0}))^{2}\): the average squared prediction error for test observations \((y_{0},x_{0})\)
- Usually, when building a model we use three datasets to assess accuracy: analysis (training) data, assessment (validation) data and test data
- Fundamental property of ML (cf. James et al. 2013, 31, Figure 2.9)
- As model flexibility increases, training MSE will decrease, but the test MSE may not (danger of overfitting)
- In practice we use the Root Mean Squared Error (RMSE)
- The MSE is expressed in squared units of the outcome and is therefore not directly interpretable on the outcome variable's scale
- RMSE takes the square root of the MSE and brings the units back to the original scale of the outcome variable
- RMSE is more interpretable and comparable across models/datasets
- R-squared measures the proportion of variance in the outcome variable that is explained by the model
- Ranges from 0 to 1, where 0 means the model explains none of the variance and 1 means the model explains all of the variance
- Calculated as the ratio of the explained variance to the total variance of the outcome variable (measure of the model’s goodness of fit)
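The three accuracy measures above can be computed in a few lines. The sketch below uses made-up true and predicted values (in the course itself these would come from a fitted tidymodels workflow); R-squared is computed here via the equivalent formula \(1 - SS_{res}/SS_{tot}\).

```python
# Toy illustration of MSE, RMSE and R-squared; the data are made up.
import math

y_true = [8, 4, 5, 6, 7]           # observed outcome values
y_pred = [7.63, 4.4, 5.2, 6.1, 6.5]  # model predictions

n = len(y_true)

# MSE: average squared prediction error (in squared outcome units)
mse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n

# RMSE: square root brings the error back to the outcome's original scale
rmse = math.sqrt(mse)

# R-squared: 1 - (unexplained variation / total variation)
mean_y = sum(y_true) / n
ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
r_squared = 1 - ss_res / ss_tot

print(round(mse, 3), round(rmse, 3), round(r_squared, 3))
```

For accuracy assessment on test data, the same formulas are simply applied to the held-out observations instead of the training observations.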
References
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.