## 2.3 Evaluating and interpreting the model

We are now ready to carry out the simple linear regression analysis. The results of the analysis are as follows:


```
Call:
lm(formula = happiness_2019 ~ income_2019, data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-19.4572  -3.5785  -0.1413   3.8410  17.5070

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.478e+01  1.559e+00   28.72  < 2e-16 ***
income_2019 5.642e-04  5.489e-05   10.28 4.94e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.768 on 76 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.5816,    Adjusted R-squared:  0.5761
F-statistic: 105.6 on 1 and 76 DF,  p-value: 4.945e-16
```

From the above output, we can note the following:

• The results for $$\widehat{\beta}_0$$ and $$\widehat{\beta}_1$$ appear under the heading Coefficients:
• The first row, (Intercept), corresponds to the intercept coefficient $$\widehat{\beta}_0$$, while the second row, income_2019, corresponds to the slope coefficient $$\widehat{\beta}_1$$
• The estimate for $$\beta_0$$ is 4.478e+01. The e+01 tells us to move the decimal point one place to the right, so $$\widehat{\beta}_0 = 44.78$$
• The estimate for $$\beta_1$$ is 5.642e-04. The e-04 tells us to move the decimal point four places to the left, so, rounded to four decimal places, $$\widehat{\beta}_1 = 0.0006$$
• Knowing the values of $$\widehat{\beta}_0$$ and $$\widehat{\beta}_1$$, we can write down the estimated model as $$\widehat{\text{Happiness}} = 44.78 + 0.0006\times\text{Income}$$
• We can interpret the value of $$\widehat{\beta}_1 = 0.0006$$ as follows: "We estimate that, for every \$1 increase in GDP per capita, the average happiness score is 0.0006 higher."
• Reading from the column labeled Pr(>|t|), the $$p$$-value for the intercept coefficient is < 2e-16, which is very close to zero. This corresponds to a test of $$H_0 : \beta_0 = 0$$ versus $$H_1 : \beta_0 \neq 0$$.
• The $$p$$-value for the slope coefficient is 4.94e-16, which is also very close to zero. This corresponds to a test of $$H_0 : \beta_1 = 0$$ versus $$H_1 : \beta_1 \neq 0$$. Since $$p < 0.05$$, we reject $$H_0$$ and conclude that $$\beta_1$$ is not zero. This means there is evidence of a significant linear association between income and happiness. (More information on this below)
• The Multiple R-squared value, which can be found in the second-to-last row, is $$R^2 = 0.5816$$. This indicates that 58.16% of the variation in the response can be explained by the model, which is a good fit. (More information on this below)
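As a quick check on the scientific notation and the estimated model above, the arithmetic can be sketched in Python (Python is used here purely for illustration; the analysis itself was done in R, and the \$20,000 input is an invented example value):

```python
# R prints coefficients in scientific notation; e+01 and e-04 shift the
# decimal point right and left, respectively.
beta0 = 4.478e+01   # intercept estimate from the R output
beta1 = 5.642e-04   # slope estimate from the R output

print(beta0)            # 44.78
print(round(beta1, 4))  # 0.0006

# Predicted happiness for an illustrative GDP per capita of $20,000:
prediction = beta0 + beta1 * 20000
print(round(prediction, 3))  # 56.064
```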

### 2.3.1 Testing for $$H_0 : \beta_1 = 0$$ versus $$H_1 : \beta_1 \neq 0$$

Recall the simple linear regression model

$$y = \beta_0 + \beta_1x + \epsilon.$$

If the true value of $$\beta_1$$ were 0, then the regression model would become

$$y = \beta_0 + \epsilon,$$

meaning $$y$$ does not depend on $$x$$ in any way. In other words, there would be no association between $$x$$ and $$y$$. For this reason, the hypothesis test for $$\beta_1$$ is very important.
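The test R reports is a $$t$$-test: the t value column is simply the estimate divided by its standard error. A quick check of the slope row from the output above (Python used only for the arithmetic):

```python
# t statistic for the slope: estimate / standard error
estimate  = 5.642e-04   # slope estimate from the R output
std_error = 5.489e-05   # its standard error from the R output

t_value = estimate / std_error
print(round(t_value, 2))  # 10.28, matching the t value column
```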

### 2.3.2 $$R^2$$, the Coefficient of Determination

$$R^2$$ values are always between 0 and 1. In simple linear regression, the $$R^2$$ value is simply the correlation squared. To see this, recall that in Section 1.1 we found the correlation coefficient to be $$r = 0.76263$$. Squaring this number gives $$R^2 = 0.76263^2 = 0.5816$$. Conversely, taking the square root of $$R^2$$ recovers the magnitude of the correlation; the sign of $$r$$ matches the sign of the slope $$\widehat{\beta}_1$$.
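A quick numerical check of this relationship, using the correlation from Section 1.1 (Python used only for the arithmetic):

```python
import math

r = 0.76263                  # correlation coefficient from Section 1.1
r_squared = round(r ** 2, 4)
print(r_squared)             # 0.5816, matching the Multiple R-squared in the R output

# Going the other way recovers |r|; the sign comes from the sign of the slope.
print(round(math.sqrt(r_squared), 4))  # 0.7626
```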

The $$R^2$$ value can be used to evaluate the fit of the model. $$R^2$$ values close to 0 indicate a poor fit, whereas $$R^2$$ values close to 1 indicate an excellent fit. Although the interpretation of the $$R^2$$ value can sometimes differ by subject matter, for the purposes of this subject, the below table can be used as a guide when interpreting $$R^2$$ values:

| $$R^2$$ value | Quality of the SLR model |
|---|---|
| $$0.8 \leq R^2 \leq 1$$ | Excellent |
| $$0.5 \leq R^2 < 0.8$$ | Good |
| $$0.25 \leq R^2 < 0.5$$ | Moderate |
| $$0 \leq R^2 < 0.25$$ | Weak |
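The guide above can be expressed as a small helper function. This is a hypothetical illustration in Python (the name `classify_r_squared` is our own, not part of any library):

```python
def classify_r_squared(r2: float) -> str:
    """Map an R^2 value to the quality labels in the guide above."""
    if not 0 <= r2 <= 1:
        raise ValueError("R^2 must be between 0 and 1")
    if r2 >= 0.8:
        return "Excellent"
    if r2 >= 0.5:
        return "Good"
    if r2 >= 0.25:
        return "Moderate"
    return "Weak"

print(classify_r_squared(0.5816))  # Good -- our model's R^2 falls in [0.5, 0.8)
```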