Linear regression for prediction

Learning outcomes/objective: Learn…

Sources: #TidyTuesday and tidymodels

1 Regression vs. classification

2 Linear model

2.1 Linear model (Equation) (1)

  • Linear Model = LM = Linear regression model
  • Aim (normally)
    • Model (and also understand) the relationship between an outcome variable (output) and one or more explanatory variables (features)
    • But it is also a very popular machine learning model!
$$y_i = \beta_0 + \beta_1 \times x_{1i} + \beta_2 \times x_{2i} + \varepsilon_i$$


  • Q: What are $\beta_0$, $\beta_1$ and $\beta_2$ also called?

2.2 Linear model (Equation) (2)

$$y_i = \underbrace{\beta_0 + \beta_1 \times x_{1i} + \beta_2 \times x_{2i}}_{\hat{f} \text{ giving predicted values } \hat{y}_i} + \underbrace{\varepsilon_i}_{\text{Error}} = \hat{y}_i + \varepsilon_i$$


  • Q: Why is the linear model called a “linear” model?

  • Important: Variable values (e.g., $y_i$ or $x_{1,i}$) vary, while parameter values (e.g., $\beta_0$) are constant across rows

  • Important: $\hat{y}_i$ varies across units

| Name    | Lifesatisfaction $y_i$ | $\beta_0$ | $\beta_1$ | Unemployed $x_{1,i}$ | $\beta_2$ | Education $x_{2,i}$ | $\varepsilon_i$ | $\hat{y}_i$ |
|---------|------------------------|-----------|-----------|----------------------|-----------|---------------------|-----------------|-------------|
| Samuel  | 8                      | ?         | ?         | 0                    | ?         | 7                   | ?               | ?           |
| Ruth    | 4                      | ?         | ?         | 0                    | ?         | 3                   | ?               | ?           |
| William | 5                      | ?         | ?         | 1                    | ?         | 2                   | ?               | ?           |
| ..      | ..                     | ..        | ..        | ..                   | ..        | ..                  | ..              | ..          |

2.3 Linear model (Visualization)

  • Figure 1 visualizes the distribution of our data and a linear model that we fit to the data

Figure 1: Joint distribution + linear model (Lifesatisfaction, Unemployment and Education)

  • $\text{Lifesatisfaction}_i = b_0 + b_1 \text{Unemployed}_i + b_2 \text{Education}_i + \varepsilon_i$ (Wikipedia)
  • The plane in Figure 1 is not an exact model of the data
    • An admissible model must be consistent with all the data points
    • The plane alone cannot be the model, unless it exactly fits all the data points
    • Hence, the error term $\varepsilon_i$ must be included in the model equation, so that the model is consistent with all data points
  • Predictive accuracy: How well does our model predict observations (in the test dataset)?
    • Calculate the average error across all errors $\varepsilon_i$ (in the test dataset)

2.4 Linear model (Estimation)

  • Estimation = fitting the model to the data (by adapting/finding the parameters)
    • e.g., easy in the case of the mean (analytical solution) but more difficult for linear (or other) models
  • Model parameters: $\beta_0$, $\beta_1$ and $\beta_2$
  • Ordinary Least Squares (OLS)
    • Least squares methods have their historical origins in astronomy
    • Choose $\beta_0$, $\beta_1$ and $\beta_2$ (= the plane) so that the sum of the squared errors $\varepsilon_i$ is minimized (see graph, and the code sketch below!)
    • Q: Why do we square the errors?
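
A minimal sketch of OLS estimation in R: the dataset `toy`, its variable names, and the coefficient values (6.2, -0.6, 0.2) are made up for illustration, mirroring the slide example; `lm()` performs the OLS fit.

```r
# Simulated toy data mirroring the slide example
# (Lifesatisfaction ~ Unemployed + Education)
set.seed(42)
n <- 200
toy <- data.frame(
  unemployed = rbinom(n, size = 1, prob = 0.2),  # 0/1 indicator
  education  = sample(0:10, n, replace = TRUE)   # years of education
)
# Outcome generated according to y = b0 + b1*x1 + b2*x2 + e
toy$life_satisfaction <- 6.2 - 0.6 * toy$unemployed +
  0.2 * toy$education + rnorm(n, sd = 1)

# lm() estimates beta_0, beta_1 and beta_2 via OLS, i.e., it chooses
# the plane that minimizes the sum of squared errors
fit <- lm(life_satisfaction ~ unemployed + education, data = toy)
coef(fit)
```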

2.5 Linear model (Prediction)

$$y_{\text{Samuel}} = 6.23 - 0.58 \times x_{1,\text{Samuel}} + 0.20 \times x_{2,\text{Samuel}} + \varepsilon_{\text{Samuel}}$$

$$8 = 6.23 - 0.58 \times 0 + 0.20 \times 7 + 0.373 = 7.63 + 0.373$$

| Name    | Lifesatisfaction $y_i$ | $\beta_0$ | $\beta_1$ | Unemployed $x_{1,i}$ | $\beta_2$ | Education $x_{2,i}$ | $\varepsilon_i$ | $\hat{y}_i$ |
|---------|------------------------|-----------|-----------|----------------------|-----------|---------------------|-----------------|-------------|
| Samuel  | 8                      | 6.23      | -0.58     | 0                    | 0.20      | 7                   | 0.373           | 7.63        |
| Ruth    | 4                      | 6.23      | -0.58     | 0                    | 0.20      | 3                   | -2.83           | 6.83        |
| William | 5                      | 6.23      | -0.58     | 1                    | 0.20      | 2                   | -1.05           | 6.05        |
| ..      | ..                     | ..        | ..        | ..                   | ..        | ..                  | ..              | ..          |
  • Important note on “prediction”
    • In Figure 1 we simply fitted the model to all observations in the dataset
    • When using the linear model as an ML model, we split the data first, fit the model to the observations in the training data subset, and use this model to predict the observations in the test data subset (as sketched in the code below)
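
A minimal sketch of this split-fit-predict workflow with tidymodels, reusing the hypothetical `toy` data simulated in the estimation sketch above:

```r
library(tidymodels)  # rsample, parsnip, yardstick, ...

# Split first: 80% training data, 20% test data
set.seed(123)
split     <- initial_split(toy, prop = 0.8)
train_dat <- training(split)
test_dat  <- testing(split)

# Fit the linear model to the training data only
lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(life_satisfaction ~ unemployed + education, data = train_dat)

# Predict the previously unseen test observations (adds a .pred column)
test_pred <- predict(lm_fit, new_data = test_dat) |>
  bind_cols(test_dat)
head(test_pred)
```

Note that `initial_split()` is called before any fitting, so the test rows never influence the estimated parameters.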

2.6 Linear model: Accuracy (MSE, RMSE, R-squared)

  • Mean squared error (James et al. 2013, Ch. 2.2)
    • $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{f}(x_i)\right)^2$ (James et al. 2013, Ch. 2.2.1)
      • $y_i$ is the true outcome value
      • $\hat{f}(x_i)$ is the prediction that $\hat{f}$ gives for the $i$th observation, i.e., $\hat{y}_i$
      • MSE is small if the predicted responses are close to the true responses, and large if they differ substantially
  • Training MSE: MSE computed using the training data
  • Test MSE: How accurate are the predictions that we obtain when we apply our method to previously unseen test data?
    • $\text{Ave}\left(y_0 - \hat{f}(x_0)\right)^2$: the average squared prediction error for test observations $(y_0, x_0)$
    • Usually, when building a model we use a third dataset to assess accuracy, i.e., analysis (training) data, assessment (validation) data, and test data
  • Fundamental property of ML (cf. James et al. 2013, Figure 2.9)
    • As model flexibility increases, training MSE will decrease, but the test MSE may not (danger of overfitting)
  • In practice we use the Root Mean Squared Error (RMSE)
    • MSE is expressed in squared units and is therefore not directly comparable to the target/outcome variable
    • RMSE takes the square root of the MSE, bringing the units back to the original scale of the outcome variable
    • RMSE is more interpretable and comparable across models/datasets
  • R-squared measures the proportion of variance in the outcome variable that is explained by the model
    • Ranges from 0 to 1, where 0 means the model explains none of the variance and 1 means it explains all of the variance
    • Calculated as the ratio of the explained variance to the total variance of the outcome variable (a measure of the model’s goodness of fit)
    • The code sketch below computes MSE, RMSE, and R-squared on the test data
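
A minimal sketch of these accuracy metrics, continuing the hypothetical `test_pred` predictions from the sketch above; `rmse()` and `rsq()` come from yardstick (attached with tidymodels):

```r
# MSE by hand, applying the definition directly to the test predictions
mean((test_pred$life_satisfaction - test_pred$.pred)^2)

# RMSE (on the original outcome scale) and R-squared via yardstick
metrics <- metric_set(rmse, rsq)
metrics(test_pred, truth = life_satisfaction, estimate = .pred)
```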

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.