2.3 Evaluating and interpreting the model

We are now ready to carry out the simple linear regression analysis.

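The R code that produces this output is not shown above; a minimal sketch, assuming the data frame is called df with columns happiness_2019 and income_2019 (as confirmed by the Call line in the output), is:

# Fit the simple linear regression of happiness on income
model <- lm(happiness_2019 ~ income_2019, data = df)

# Print the coefficient estimates, standard errors, t values,
# p-values, residual standard error, R-squared and F-statistic
summary(model)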


The results of the analysis are as follows:


Call:
lm(formula = happiness_2019 ~ income_2019, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-19.4572  -3.5785  -0.1413   3.8410  17.5070 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 4.478e+01  1.559e+00   28.72  < 2e-16 ***
income_2019 5.642e-04  5.489e-05   10.28 4.94e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.768 on 76 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.5816,    Adjusted R-squared:  0.5761 
F-statistic: 105.6 on 1 and 76 DF,  p-value: 4.945e-16

From the above output, we can note the following:

  • The results related to \(\widehat{\beta}_0\) and \(\widehat{\beta}_1\) are under the heading Coefficients:
  • The first row (Intercept) corresponds to the intercept coefficient \(\widehat{\beta}_0\), while the second row income_2019 corresponds to the slope coefficient \(\widehat{\beta}_1\)
  • The estimate for \(\beta_0\) is 4.478e+01. The e+01 tells us to move the decimal point one place to the right, so we have that \(\widehat{\beta}_0 = 44.78\)
  • The estimate for \(\beta_1\) is 5.642e-04. The e-04 tells us to move the decimal point four places to the left, so we have that, rounded to four decimal places, \(\widehat{\beta}_1 = 0.0006\)
  • Knowing the values for \(\widehat{\beta}_0\) and \(\widehat{\beta}_1\), we can write down the estimated model as:
    • \(\widehat{\text{Happiness}} = 44.78 + 0.0006\times\text{Income}\) (fitted values from this equation are computed in the short sketch after this list)
  • We can interpret the value of \(\widehat{\beta}_1 = 0.0006\) as follows: "We estimate that, on average, a $1 increase in GDP per capita is associated with a happiness score that is 0.0006 higher".
  • Reading from the column labeled Pr(>|t|), the \(p\)-value for the intercept coefficient is < 2e-16, which is very close to zero. This is a test of the form \(H_0 : \beta_0 = 0\) versus \(H_1 : \beta_0 \neq 0\).
  • The \(p\)-value for the slope coefficient is 4.94e-16 which is also very close to zero. This is a test of the form \(H_0 : \beta_1 = 0\) versus \(H_1 : \beta_1 \neq 0\). Since we have \(p < 0.05\), we reject \(H_0\) and conclude that \(\beta_1\) is not zero. This means there is evidence of a significant linear association between income and happiness. (More information on this below)
  • The Multiple R-squared value, found in the second-last row, is \(R^2 = 0.5816\). This means that 58.16% of the variation in the response is explained by the model, indicating a good fit. (More information on this below)
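As a concrete use of the estimated model, the fitted happiness score for a given income can be computed either directly from the equation or with predict(). A minimal sketch, assuming the fitted object is called model and using a hypothetical GDP per capita of $40,000:

# Coefficient estimates from the fitted model
b <- coef(model)
b            # b[1] is the intercept (about 44.78), b[2] is the slope (about 0.00056)

# Fitted happiness score for a hypothetical GDP per capita of $40,000,
# using the unrounded coefficients
b[1] + b[2] * 40000

# The same prediction via predict(); note that the rounded equation gives
# 44.78 + 0.0006 * 40000 = 68.78, which differs slightly because the
# stored coefficients are not rounded
predict(model, newdata = data.frame(income_2019 = 40000))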

2.3.1 Testing for \(H_0 : \beta_1 = 0\) versus \(H_1 : \beta_1 \neq 0\)

Recall the simple linear regression model

\[y = \beta_0 + \beta_1x + \epsilon.\]

If the true value of \(\beta_1\) were 0, then the regression model would become

\[y = \beta_0 + \epsilon,\]

meaning \(y\) does not depend on \(x\) in any way. In other words, there would be no linear association between \(x\) and \(y\). For this reason, the hypothesis test for \(\beta_1\) is very important.
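As a check, the t value and p-value reported for income_2019 can be reproduced from the coefficient estimate and its standard error. A minimal sketch, again assuming the fitted object is called model:

# Coefficient table from the summary output
coefs <- summary(model)$coefficients

est <- coefs["income_2019", "Estimate"]     # 5.642e-04
se  <- coefs["income_2019", "Std. Error"]   # 5.489e-05

# The t value is the estimate divided by its standard error
t_stat <- est / se                          # approximately 10.28

# Two-sided p-value from a t distribution with 76 degrees of freedom
2 * pt(abs(t_stat), df = 76, lower.tail = FALSE)   # approximately 4.94e-16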

2.3.2 \(R^2\), the Coefficient of Determination

\(R^2\) values are always between 0 and 1. In fact, in simple linear regression, the \(R^2\) value is simply the square of the correlation coefficient. To see this, recall that in Section 1.1, we found that the correlation coefficient was \(r = 0.76263\). If we square this number, we get \(R^2 = 0.76263^2 = 0.5816\). Conversely, if we take the square root of \(R^2\), we can recover the magnitude of the correlation (its sign matches the sign of the slope).
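This relationship can be verified directly in R. A minimal sketch, assuming df and model are as above (use = "complete.obs" ignores the two observations with missing values, matching the note in the output):

# Correlation between income and happiness, ignoring missing values
r <- cor(df$income_2019, df$happiness_2019, use = "complete.obs")
r                          # approximately 0.76263

# Squaring the correlation gives R-squared
r^2                        # approximately 0.5816

# The same value extracted from the fitted model
summary(model)$r.squared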

The \(R^2\) value can be used to evaluate the fit of the model: values close to 0 indicate a poor fit, whereas values close to 1 indicate an excellent fit. Although the interpretation of the \(R^2\) value can vary between fields of study, for the purposes of this subject, the table below can be used as a guide:

\(R^2\) value               Quality of the SLR model
\(0.8 \leq R^2 \leq 1\)     Excellent
\(0.5 \leq R^2 < 0.8\)      Good
\(0.25 \leq R^2 < 0.5\)     Moderate
\(0 \leq R^2 < 0.25\)       Weak
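As an optional sketch, the guide above can be applied to an \(R^2\) value in code; the breaks below simply mirror the rows of the table:

# Classify an R-squared value according to the guide above
r2 <- summary(model)$r.squared
cut(r2,
    breaks = c(0, 0.25, 0.5, 0.8, 1),
    labels = c("Weak", "Moderate", "Good", "Excellent"),
    right = FALSE, include.lowest = TRUE)   # returns "Good" for R-squared = 0.5816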