2.3 Evaluating and interpreting the model

We are now ready to carry out the simple linear regression analysis.


The results of the analysis are as follows:


Call:
lm(formula = happiness_2019 ~ income_2019, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-19.4572  -3.5785  -0.1413   3.8410  17.5070 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 4.478e+01  1.559e+00   28.72  < 2e-16 ***
income_2019 5.642e-04  5.489e-05   10.28 4.94e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.768 on 76 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.5816,    Adjusted R-squared:  0.5761 
F-statistic: 105.6 on 1 and 76 DF,  p-value: 4.945e-16

From the above output, we can note the following:

  • The results for $\hat{\beta}_0$ and $\hat{\beta}_1$ appear under the heading "Coefficients:".
  • The first row, (Intercept), corresponds to the intercept coefficient $\hat{\beta}_0$, while the second row, income_2019, corresponds to the slope coefficient $\hat{\beta}_1$.
  • The estimate for $\beta_0$ is 4.478e+01. The e+01 tells us to move the decimal point one place to the right, so $\hat{\beta}_0 = 44.78$.
  • The estimate for $\beta_1$ is 5.642e-04. The e-04 tells us to move the decimal point four places to the left, giving 0.0005642, so, rounded to four decimal places, $\hat{\beta}_1 = 0.0006$.
  • Knowing the values of $\hat{\beta}_0$ and $\hat{\beta}_1$, we can write down the estimated model as:
    • $\widehat{\text{Happiness}} = 44.78 + 0.0006 \times \text{Income}$
  • We can interpret the value $\hat{\beta}_1 = 0.0006$ as follows: "We estimate that for every \$1 increase in GDP per capita, the happiness score is, on average, 0.0006 higher".
  • Reading from the column labeled Pr(>|t|), the p-value for the intercept coefficient is < 2e-16, which is very close to zero. This is a test of the form $H_0: \beta_0 = 0$ versus $H_1: \beta_0 \neq 0$. Since $p < 0.05$, we reject $H_0$ and conclude that $\beta_0$ is not zero.
  • The p-value for the slope coefficient is 4.94e-16, which is also very close to zero. This is a test of the form $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$. Since $p < 0.05$, we reject $H_0$ and conclude that $\beta_1$ is not zero. This means there is evidence of a significant linear association between income and happiness. (More information on this below)
  • The Multiple R-squared value, found in the second-last row of the output, is $R^2 = 0.5816$. This indicates that 58.16% of the variation in the response can be explained by the model, which is a good fit. (More information on this below)
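
As a rough check on the reading of the output above, the following R sketch converts the estimates from scientific notation to ordinary decimals and uses the estimated model for a single prediction. The two coefficient values are copied from the Coefficients table. The income of 20,000 is a hypothetical value chosen only for illustration, and the helper name predict_happiness is an assumption, not part of the original analysis.

```r
# Estimates copied from the Coefficients table of the output above
b0 <- 4.478e+01   # intercept estimate
b1 <- 5.642e-04   # slope estimate for income_2019

# Show the estimates as ordinary decimals rather than scientific notation
format(b0, scientific = FALSE)
format(b1, scientific = FALSE)
format(round(b1, 4), scientific = FALSE)

# Hypothetical helper: estimated mean happiness at a given income
predict_happiness <- function(income) {
  b0 + b1 * income
}

# Illustrative income value only (not taken from the data set)
predicted <- predict_happiness(20000)
predicted

# Difference in estimated happiness for a $1,000 difference in income
diff_per_1000 <- b1 * 1000
diff_per_1000
```

The sketch keeps the unrounded slope on purpose. With the rounded value 0.0006, the same prediction would be 44.78 + 12 = 56.78 rather than 44.78 + 11.284 = 56.064, so rounding is best left to the final reporting step.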

2.3.1 Testing for $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$

Recall the simple linear regression model

$$y = \beta_0 + \beta_1 x + \epsilon.$$

If the true value of $\beta_1$ were 0, then the regression model would become

$$y = \beta_0 + \epsilon,$$

meaning $y$ does not depend on $x$ in any way. In other words, there would be no association between $x$ and $y$. For this reason, the hypothesis test for $\beta_1$ is very important.
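
To make the test concrete, the sketch below recomputes the t value and the two-sided p-value for the slope from the numbers printed in the Coefficients table. It assumes the usual t test for a regression coefficient (estimate divided by its standard error, compared with a t distribution on the residual degrees of freedom). The variable names are illustrative only.

```r
# Values copied from the income_2019 row of the Coefficients table
estimate  <- 5.642e-04
std_error <- 5.489e-05
df_resid  <- 76   # residual degrees of freedom reported in the output

# t statistic for H0: beta1 = 0 is the estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with 76 degrees of freedom
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

round(t_value, 2)
p_value
p_value < 0.05   # TRUE means H0 is rejected at the 5% level
```

Because the inputs are the rounded values printed by R, the p-value computed this way should be close to, but need not exactly equal, the 4.94e-16 shown in the output.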

2.3.2 $R^2$, the Coefficient of Determination

$R^2$ values are always between 0 and 1. In simple linear regression, the $R^2$ value is simply the correlation squared. To see this, recall that in Section 1.1, we found that the correlation coefficient was $r = 0.76263$. If we square this number, we get $R^2 = 0.76263^2 = 0.5816$. Conversely, if we take the square root of $R^2$, we recover the magnitude of the correlation; its sign is the same as the sign of the slope estimate $\hat{\beta}_1$.
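
The relationship between the correlation and $R^2$ can be checked numerically. The sketch below first uses the two values quoted above, then confirms the identity on a small synthetic data set. The synthetic data are generated only for illustration and are not the happiness data; the variable names are assumptions.

```r
# Correlation reported in Section 1.1 and R-squared from the regression output
r  <- 0.76263
r2 <- 0.5816

round(r^2, 4)        # squared correlation
round(sqrt(r2), 4)   # magnitude of the correlation recovered from R-squared

# Synthetic data for illustration only (not the happiness data set)
set.seed(1)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)
fit <- lm(y ~ x)

# In simple linear regression these two quantities agree
same <- all.equal(cor(x, y)^2, summary(fit)$r.squared)
same
```

The identity holds for models with a single predictor and an intercept; with several predictors, $R^2$ is no longer the square of any one pairwise correlation.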

The $R^2$ value can be used to evaluate the fit of the model. $R^2$ values close to 0 indicate a poor fit, whereas $R^2$ values close to 1 indicate an excellent fit. Although the interpretation of an $R^2$ value can differ by subject matter, for the purposes of this subject, the table below can be used as a guide when interpreting $R^2$ values:

$R^2$ value                 Quality of the SLR model
$0.8 \le R^2 \le 1$         Excellent
$0.5 \le R^2 < 0.8$         Good
$0.25 \le R^2 < 0.5$        Moderate
$0 \le R^2 < 0.25$          Weak
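
The guide table can also be applied programmatically. The following is a minimal sketch using base R's cut(); the function name classify_r2 is hypothetical, and the break points are taken directly from the table above.

```r
# Hypothetical helper that applies the guide table to one or more R-squared values
classify_r2 <- function(r2) {
  cut(r2,
      breaks = c(0, 0.25, 0.5, 0.8, 1),
      labels = c("Weak", "Moderate", "Good", "Excellent"),
      right = FALSE,          # intervals are closed on the left: [a, b)
      include.lowest = TRUE)  # with right = FALSE, this also includes R2 = 1
}

# The value from the output above, plus a few boundary cases
quality <- as.character(classify_r2(c(0.5816, 0.8, 0.1, 1)))
quality
```

For the model in this section, $R^2 = 0.5816$ falls in the "Good" band, in line with the interpretation given earlier.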