## 4.3 Regression

We will use the following data to illustrate regression. Note that lm() expects the data to be supplied as a data frame.

x <- c(1, 2, 3, 5, 6, 7, 10, 12, 13)
y <- c(1, 4, 5, 6, 7, 8, 9, 10, 15)
z <- c(2, 3, 7, 8, 9, 12, 8, 7, 6)
df <- data.frame(x = x, y = y, z = z)

### 4.3.1 Simple linear regression

The following code fits a simple linear regression. The syntax is lm(y ~ x, data = dataframe).

SLR <- lm(y ~ x, df)
summary(SLR)
##
## Call:
## lm(formula = y ~ x, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.9660 -1.2234  0.2618  0.7470  2.1627
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.5104     0.8756   1.725 0.128193
## x             0.8713     0.1134   7.686 0.000118 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.389 on 7 degrees of freedom
## Multiple R-squared:  0.8941, Adjusted R-squared:  0.8789
## F-statistic: 59.08 on 1 and 7 DF,  p-value: 0.0001175
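Once the model is fitted, the usual base-R extractor functions can be applied to it; a minimal sketch (these functions are standard R, not shown in the output above):

```r
coef(SLR)     # intercept and slope estimates
confint(SLR)  # 95% confidence intervals for the coefficients
# Fitted value at a new x; with the estimates above, 1.5104 + 0.8713 * 4 is about 5.0
predict(SLR, newdata = data.frame(x = 4))
```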

### 4.3.2 Multiple linear regression

The following code fits a multiple linear regression. The syntax is lm(y ~ x + z, data = dataframe).

MLR <- lm(y ~ x + z, df)
summary(MLR)
##
## Call:
## lm(formula = y ~ x + z, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.8920 -1.2104  0.1329  0.8179  2.3030
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.25034    1.34526   0.929 0.388522
## x            0.85666    0.13321   6.431 0.000668 ***
## z            0.05167    0.19124   0.270 0.796060
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.492 on 6 degrees of freedom
## Multiple R-squared:  0.8953, Adjusted R-squared:  0.8605
## F-statistic: 25.67 on 2 and 6 DF,  p-value: 0.001146
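Since z is not significant here, one way to check whether adding it improves the fit is a nested-model F-test against the simple regression from the previous section; a quick sketch:

```r
# Compare the model without z (SLR) to the model with z (MLR);
# the F-test is for the null hypothesis that the coefficient on z is zero
anova(SLR, MLR)
```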

### 4.3.3 Interaction terms

The following code fits a multiple regression with an interaction term. The syntax is lm(y ~ x + z + x:z, data = dataframe).

MLRDummy <- lm(y ~ x + z + x:z, df)
summary(MLRDummy)
##
## Call:
## lm(formula = y ~ x + z + x:z, data = df)
##
## Residuals:
##       1       2       3       4       5       6       7       8       9
## -0.3080  0.9573 -0.3143 -0.7246 -0.1773  1.1230 -0.5891 -1.6242  1.6572
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.33212    1.96748  -0.677   0.5284
## x            1.59845    0.46581   3.432   0.0186 *
## z            0.64902    0.40027   1.621   0.1658
## x:z         -0.12819    0.07789  -1.646   0.1607
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.316 on 5 degrees of freedom
## Multiple R-squared:  0.9321, Adjusted R-squared:  0.8914
## F-statistic: 22.89 on 3 and 5 DF,  p-value: 0.002386
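In R's formula syntax, x * z is shorthand for the two main effects plus their interaction, so the model above can be written more compactly; a sketch (MLRInter is a hypothetical name):

```r
MLRInter <- lm(y ~ x * z, df)  # expands to y ~ x + z + x:z
all.equal(coef(MLRInter), coef(MLRDummy))  # same fitted coefficients
```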

### 4.3.4 Robust standard errors

To obtain a robust variance-covariance matrix, we use the sandwich package to compute heteroskedasticity-consistent standard errors, and then re-run the coefficient tests with the lmtest package.

First we install and load the packages.

install.packages(c("sandwich","lmtest"))
library(sandwich)
library(lmtest)

Next we construct the robust variance-covariance matrix with vcovHC().

To match the results of Stata's robust option, we use type "HC1". We then run coeftest() with this variance-covariance matrix.

MLR <- lm(y ~ x + z, df)
coeftest(MLR, vcov = vcovHC(MLR, "HC1")) 
##
## t test of coefficients:
##
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.250343   1.117886  1.1185 0.306129
## x           0.856663   0.194468  4.4052 0.004543 **
## z           0.051674   0.167447  0.3086 0.768062
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
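The same robust variance-covariance matrix can also be used for confidence intervals, via coefci() from the lmtest package; a sketch:

```r
# Robust 95% confidence intervals, using the same HC1 estimator as above
coefci(MLR, vcov = vcovHC(MLR, type = "HC1"))
```

vcovHC() also supports other estimators ("HC0" through "HC3" and beyond); "HC3" is the default in sandwich, so the type must be specified explicitly to reproduce Stata's results.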