2.5 Predictions

As well as using the model to better understand the relationship between the response and explanatory variables, we can also use the model to make predictions. For example, suppose we knew of a country that was not in our original data set that had a GDP per capita of 30000, but we did not know its average happiness score. We could then predict this country's average happiness score as follows.

Let \(x_0\) denote the 'new' value of \(x\). Then, our predicted response for this choice of \(x\) will be

\[\widehat{y}_0 = \widehat{\beta}_0 + \widehat{\beta}_1x_0.\]

Recalling that our new country had an income (GDP per capita) of \(x_0 = 30000\), and that our estimated model is

\[\widehat{\text{Happiness}} = 44.78 + 0.0006\times\text{Income},\]

we can estimate this country's average happiness score as

\[\widehat{\text{Happiness}} = 44.78 + 0.0006\times 30000 = 62.78.\]

It is also possible to estimate prediction intervals, although these are beyond the scope of this subject.
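The point prediction above can be reproduced in a few lines of Python. This is only a minimal sketch: the names `BETA0_HAT`, `BETA1_HAT` and `predict_happiness` are illustrative choices rather than part of any package, and the coefficients are the rounded estimates from the fitted model shown above.

```python
# Rounded coefficient estimates taken from the fitted model in the text.
BETA0_HAT = 44.78   # estimated intercept
BETA1_HAT = 0.0006  # estimated slope (change in happiness per unit of income)


def predict_happiness(income: float) -> float:
    """Return the predicted average happiness score for a given GDP per capita."""
    return BETA0_HAT + BETA1_HAT * income


x0 = 30000  # GDP per capita of the new country
print(round(predict_happiness(x0), 2))  # 62.78
```

Because the coefficients are rounded, a prediction computed from the unrounded estimates in statistical software may differ slightly in the later decimal places.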

2.5.1 A cautionary tale: Extrapolation

When using a model to make predictions, we should check whether \(x_0\) lies within the range of the data from which we estimated our model. If it does not, then we are extrapolating, which can often lead to inaccurate results. For example, consider the following.

Using our estimated model of \(\widehat{\text{Happiness}} = 44.78 + 0.0006\times\text{Income},\) for a new country with a value of \(x_0 = 90000\), what would be the predicted average happiness score?

\(44.78 + 0.0006\times 90000 = \ldots\)


However, consider the chart below:

As we can see, our prediction for this country, as represented by the red dot, is very unlikely to represent reality! This is because we have used our model to extrapolate for a value that was well outside of the range from which we estimated our model, i.e. the range between the two vertical dotted lines, from $10,000 to $50,000. While our model was a good fit for the countries with GDP per capita values within that range, it is not a good fit at all for countries with GDP per capita values outside that range!
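One way to guard against accidental extrapolation is to compare \(x_0\) with the observed range of the explanatory variable before predicting. The sketch below assumes the range of $10,000 to $50,000 quoted above; the names `X_MIN`, `X_MAX` and `is_extrapolating` are hypothetical and chosen only for illustration.

```python
# Range of GDP per capita values used to estimate the model,
# as quoted in the text (assumed here to be inclusive at both ends).
X_MIN = 10_000
X_MAX = 50_000


def is_extrapolating(x0: float, x_min: float = X_MIN, x_max: float = X_MAX) -> bool:
    """Return True if x0 falls outside the range of the data used to fit the model."""
    return not (x_min <= x0 <= x_max)


for x0 in (30_000, 90_000):
    if is_extrapolating(x0):
        print(f"x0 = {x0}: outside the fitted range, treat any prediction with caution")
    else:
        print(f"x0 = {x0}: within the fitted range")
```

A check like this does not make an out-of-range prediction any more reliable; it simply flags when the model is being used beyond the data that supports it.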