2.5 Predictions

As well as using the model to better understand the relationship between the response and explanatory variables, we can also use the model to make predictions. For example, suppose we knew of a country, not in our original data set, with a GDP per capita of $30,000, but we did not know its average happiness score. We could then predict this country's average happiness score as follows.

Let \(x_0\) denote the 'new' value of \(x\). Then, our predicted response for this choice of \(x\) will be

\[\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0.\]

Recalling that our new country had an income (GDP per capita) of \(x_0 = 30000\), and that our estimated model is

\[\widehat{\text{Happiness}} = 44.78 + 0.0006 \times \text{Income},\]

we can estimate this country's average happiness score as

\[\widehat{\text{Happiness}} = 44.78 + 0.0006 \times 30000 = 62.78.\]

It is also possible to estimate prediction intervals, although these are beyond the scope of this subject.
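The point prediction can also be reproduced in a few lines of code. The following is a minimal sketch in Python, assuming the estimated coefficients quoted above; the names `BETA0_HAT`, `BETA1_HAT` and `predict_happiness` are hypothetical and chosen only for illustration.

```python
# Estimated coefficients, copied from the fitted model quoted in the text.
BETA0_HAT = 44.78    # estimated intercept
BETA1_HAT = 0.0006   # estimated slope (change in happiness per unit of income)


def predict_happiness(income: float) -> float:
    """Return the predicted average happiness score for a given income.

    Implements y0_hat = beta0_hat + beta1_hat * x0.
    (Hypothetical helper, named here for illustration only.)
    """
    return BETA0_HAT + BETA1_HAT * income


# Predict for the new country with a GDP per capita of 30000.
prediction = predict_happiness(30000)
print(round(prediction, 2))  # 62.78
```

The result is rounded before printing because floating-point arithmetic may otherwise show a value that differs from 62.78 in the last few decimal places.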

2.5.1 A cautionary tale: Extrapolation

When using a model to make predictions, it is a good idea to check whether \(x_0\) is within the range of the data from which we estimated our model. If it is not, then we are extrapolating, which can often lead to inaccurate results. For example, consider the following.

Using our estimated model, \(\widehat{\text{Happiness}} = 44.78 + 0.0006 \times \text{Income}\), what would be the predicted average happiness score for a new country with a value of \(x_0 = 90000\)?

\[44.78 + 0.0006 \times 90000 = 98.78\]


However, consider the chart below:

As we can see, our prediction for this country, represented by the red dot, is very unlikely to reflect reality. This is because we have used our model to extrapolate to a value well outside the range of the data from which we estimated our model, i.e. the range between the two vertical dotted lines, from $10,000 to $50,000. While our model was a good fit for the countries with GDP per capita values within that range, it is not a good fit at all for countries with GDP per capita values outside that range.
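One way to guard against accidental extrapolation is to compare the new value against the range of the data before predicting. The following is a minimal sketch in Python, assuming the estimated coefficients from this section and the observed range of $10,000 to $50,000 described above; the names `X_MIN`, `X_MAX` and `predict_with_range_check` are hypothetical.

```python
import warnings

# Estimated coefficients, copied from the fitted model quoted in the text.
BETA0_HAT = 44.78
BETA1_HAT = 0.0006

# Range of GDP per capita covered by the data, as stated in the text.
X_MIN = 10_000
X_MAX = 50_000


def predict_with_range_check(income: float,
                             x_min: float = X_MIN,
                             x_max: float = X_MAX) -> float:
    """Predict happiness, warning if `income` lies outside [x_min, x_max].

    (Hypothetical helper, named here for illustration only.)
    """
    if not (x_min <= income <= x_max):
        warnings.warn(
            f"income={income} is outside the observed range "
            f"[{x_min}, {x_max}]; this prediction is an extrapolation."
        )
    return BETA0_HAT + BETA1_HAT * income


# Within the observed range: no warning is raised.
print(round(predict_with_range_check(30000), 2))

# Outside the observed range: a warning is raised alongside the prediction.
print(round(predict_with_range_check(90000), 2))
```

The sketch still returns the extrapolated value rather than refusing to predict, so the decision about whether to trust the number is left to the analyst.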