## 2.3 Evaluating and interpreting the model

We are now ready to carry out the simple linear regression analysis. The results of the analysis are as follows:

```
Call:
lm(formula = happiness_2019 ~ income_2019, data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-19.4572  -3.5785  -0.1413   3.8410  17.5070

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.478e+01  1.559e+00   28.72  < 2e-16 ***
income_2019 5.642e-04  5.489e-05   10.28 4.94e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.768 on 76 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.5816,	Adjusted R-squared:  0.5761
F-statistic: 105.6 on 1 and 76 DF,  p-value: 4.945e-16
```

From the above output, we can note the following:

- The results for \(\widehat{\beta}_0\) and \(\widehat{\beta}_1\) are under the heading `Coefficients:`.
- The first row, `(Intercept)`, corresponds to the intercept coefficient \(\widehat{\beta}_0\), while the second row, `income_2019`, corresponds to the slope coefficient \(\widehat{\beta}_1\).
- The estimate for \(\beta_0\) is `4.478e+01`. The `e+01` tells us to move the decimal point one place to the right, so we have that \(\widehat{\beta}_0 = 44.78\).
- The estimate for \(\beta_1\) is `5.642e-04`. The `e-04` tells us to move the decimal point four places to the left, giving \(0.0005642\), so we have that, rounded to four decimal places, \(\widehat{\beta}_1 = 0.0006\).
- Knowing the values of \(\widehat{\beta}_0\) and \(\widehat{\beta}_1\), we can write down the estimated model as:

  \[\widehat{\text{Happiness}} = 44.78 + 0.0006\times\text{Income}\]

- We can **interpret the value of \(\widehat{\beta}_1 = 0.0006\)** as follows: *"We estimate that, for every $1 increase in GDP per capita, the happiness score will be, on average, 0.0006 higher"*.
- Reading from the column labeled `Pr(>|t|)`, the \(p\)-value for the intercept coefficient is `< 2e-16`, which is very close to zero. This is a test of the form \(H_0 : \beta_0 = 0\) versus \(H_1 : \beta_0 \neq 0\).
- The \(p\)-value for the slope coefficient is `4.94e-16`, which is also very close to zero. This is a test of the form \(H_0 : \beta_1 = 0\) versus \(H_1 : \beta_1 \neq 0\). **Since we have \(p < 0.05\), we reject \(H_0\) and conclude that \(\beta_1\) is not zero. This means there is evidence of a significant linear association between income and happiness.** (More information on this below.)
- The `Multiple R-squared` value, which can be found in the second last row, is \(R^2 = 0.5816\). This indicates that 58.16% of the variation in the response can be explained by the model, which indicates a good fit. (More information on this below.)
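The quantities read off the printed output can also be pulled out of the fitted model object directly. The following is a minimal sketch rather than the analysis used in this section: the data frame is simulated as a hypothetical stand-in for `df` (the real data are not reproduced here), so the numbers it produces will differ from those shown above. Only the variable names `happiness_2019` and `income_2019` are taken from the output.

```r
# Hypothetical stand-in data: simulated values, NOT the data used above.
set.seed(1)
df <- data.frame(income_2019 = runif(80, min = 1000, max = 60000))
df$happiness_2019 <- 45 + 0.0005 * df$income_2019 + rnorm(80, sd = 6)

# Fit the simple linear regression and store its summary
model <- lm(happiness_2019 ~ income_2019, data = df)
model_summary <- summary(model)

# Coefficient table: one row per coefficient, same columns as the printed output
coef_table <- coef(model_summary)
b0 <- coef_table["(Intercept)", "Estimate"]
b1 <- coef_table["income_2019", "Estimate"]
p_slope <- coef_table["income_2019", "Pr(>|t|)"]

# Coefficient of determination (the "Multiple R-squared" value)
r_squared <- model_summary$r.squared

# Show the estimates in fixed notation instead of scientific notation
format(c(b0, b1), scientific = FALSE)

# Predicted happiness for a hypothetical GDP per capita of $30,000
predicted <- predict(model, newdata = data.frame(income_2019 = 30000))
predicted
```

The prediction is simply \(\widehat{\beta}_0 + \widehat{\beta}_1 \times 30000\) evaluated with the unrounded estimates, which is why it is usually better to predict from the model object than from coefficients rounded to four decimal places.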

### 2.3.1 Testing for \(H_0 : \beta_1 = 0\) versus \(H_1 : \beta_1 \neq 0\)

Recall the simple linear regression model

\[y = \beta_0 + \beta_1x + \epsilon.\]

If the true value of \(\beta_1\) were 0, then the regression model would become

\[y = \beta_0 + \epsilon,\]

meaning \(y\) does not depend on \(x\) in any way. In other words, there would be no association between \(x\) and \(y\). For this reason, the hypothesis test for \(\beta_1\) is very important.
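As a check on where the `t value` and `Pr(>|t|)` entries for the slope come from, the test statistic is the estimate divided by its standard error. Using the values from the `income_2019` row of the output:

\[t = \frac{\widehat{\beta}_1}{\text{SE}(\widehat{\beta}_1)} = \frac{5.642\times 10^{-4}}{5.489\times 10^{-5}} \approx 10.28.\]

Under \(H_0 : \beta_1 = 0\), this statistic is compared against a \(t\) distribution with 76 degrees of freedom (the degrees of freedom reported alongside the residual standard error). In simple linear regression the F-statistic in the last row is the square of this \(t\) value, \(10.28^2 \approx 105.7\), which agrees with the reported 105.6 up to rounding. This is why the \(p\)-value in the last row matches the \(p\)-value for the slope.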

### 2.3.2 \(R^2\), the Coefficient of Determination

\(R^2\) values are always between 0 and 1. In fact, for simple linear regression, the \(R^2\) value is simply the **correlation squared**. To see this, recall that in Section 1.1, we found that the correlation coefficient was \(r = 0.76263\). If we square this number, we get \(R^2 = 0.76263^2 = 0.5816\). Conversely, taking the square root of \(R^2\) gives the magnitude of the correlation; its sign is the same as the sign of the slope estimate \(\widehat{\beta}_1\).
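The relationship between the correlation and \(R^2\) can be checked numerically. The sketch below uses simulated data (the variables `x` and `y` are hypothetical, not the income and happiness data), so it illustrates the identity rather than reproducing the value 0.5816.

```r
# Hypothetical simulated data, used only to illustrate the identity
set.seed(1)
x <- runif(80, min = 1000, max = 60000)
y <- 45 + 0.0005 * x + rnorm(80, sd = 6)

fit <- lm(y ~ x)

r <- cor(x, y)                        # correlation coefficient
r_squared <- summary(fit)$r.squared   # R^2 from the regression

# Squaring the correlation gives R^2 (should be TRUE up to floating-point error)
all.equal(r^2, r_squared)

# Going the other way, the square root only gives |r|;
# the sign of r is the sign of the estimated slope
slope <- unname(coef(fit)["x"])
r_recovered <- sign(slope) * sqrt(r_squared)
all.equal(r_recovered, r)
```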

The \(R^2\) value can be used to **evaluate the fit of the model**. \(R^2\) values close to 0 indicate a poor fit, whereas \(R^2\) values close to 1 indicate an excellent fit. Although the interpretation of the \(R^2\) value can sometimes differ by subject matter, for the purposes of this subject, the table below can be used as a guide when interpreting \(R^2\) values:

| \(R^2\) value | Quality of the SLR model |
|---|---|
| \(0.8 \leq R^2 \leq 1\) | Excellent |
| \(0.5 \leq R^2 < 0.8\) | Good |
| \(0.25 \leq R^2 < 0.5\) | Moderate |
| \(0 \leq R^2 < 0.25\) | Weak |
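If many models need to be labeled, the guide in the table can be written as a small helper. This is only a sketch: the function name `classify_r_squared` is hypothetical, and the cut-offs are the ones from the table, which are a guide for this subject rather than a universal rule.

```r
# Hypothetical helper: maps R^2 values to the quality labels in the table
classify_r_squared <- function(r_squared) {
  stopifnot(is.numeric(r_squared), all(r_squared >= 0 & r_squared <= 1))
  cut(
    r_squared,
    breaks = c(0, 0.25, 0.5, 0.8, 1),
    labels = c("Weak", "Moderate", "Good", "Excellent"),
    right = FALSE,          # intervals are closed on the left: [0, 0.25), [0.25, 0.5), ...
    include.lowest = TRUE   # with right = FALSE, this closes the last interval at 1
  )
}

# The model in this section has R^2 = 0.5816
as.character(classify_r_squared(0.5816))   # "Good"
```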