# Chapter 7 Multivariate OLS: Where the Action Is

## 7.1 Computing Corner

Packages needed for this chapter: magrittr, broom, estimatr, car, and lm.beta.

In this chapter you will learn the basics of estimating multivariate OLS models.

### 7.1.1 Multiple Regression

To estimate a multiple regression (a regression with more than one independent variable), use the same function, `lm`, but change the formula argument to include the additional variables. In a simple regression, the formula argument was of the form `y ~ x`. In a multiple regression, the formula argument takes the form `y ~ x1 + x2`. To include additional variables, extend the argument in the same manner: `y ~ x1 + x2 + x3 + ...`. The remaining arguments are the same as in a simple regression, and you can assign the results to an object just as before. The output will still be a list of 12, but the objects in the list will change to reflect the additional variable(s).

To make use of the results, you can use any of the functions described in Chapter 3 of this manual, as well as any of the subsetting commands.
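For instance, a minimal sketch using the built-in mtcars data (any data frame works the same way):

```
# fit a multiple regression of mpg on weight and horsepower
ols <- lm(mpg ~ wt + hp, data = mtcars)

# the result is a list; extract components by name
coef(ols)       # named vector of estimated coefficients
ols$residuals   # vector of residuals, one per observation
```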

Estimate a regression with robust standard errors with `lm_robust` (from the estimatr package), using the same modified formula argument.

### 7.1.2 Multicollinearity

You can directly estimate the VIF’s with the `vif()` function from the car package. To estimate the VIF’s, call `ols %>% vif()`, where `ols` is the object you created with the `lm` call.

### 7.1.3 Standardized Coefficients

Estimate standardized regression coefficients with `lm.beta()` from the lm.beta package: `ols %>% lm.beta()`.

### 7.1.4 *F* tests

*F* tests in econometrics are generally about the joint significance of multiple variables. Suppose we estimate the regression on \(i=1,2,\ldots,n\) observations: \[y_i=\beta_0+\beta_1x_{1,i}+\beta_2x_{2,i}+\cdots+\beta_mx_{m,i}+\beta_{m+1}x_{m+1,i}+\cdots+\beta_kx_{k,i} + \epsilon_i\]

To test the joint significance of \(\beta_1,\ldots,\beta_m\) in the model, we use an *F* test to perform the following hypothesis test: \[H_0: \beta_1=\beta_2=\cdots=\beta_m=0\] \[H_1:\text{at least one }\beta_j\ne0\]

An *F* test essentially compares the residual sum of squares under the null and alternative hypotheses. If the difference is large enough relative to the unrestricted residual variance, we have evidence to reject the null hypothesis in favor of the alternative hypothesis. The mechanics of the test are as follows:

1. Estimate the model that does not hold under the null hypothesis, that is, the model above; call it the unrestricted model, U, and retrieve its residual sum of squares, \(rss_u\). The residuals from the unrestricted model have \(n-k-1\) degrees of freedom. The unrestricted model is: \[\text{U: }y_i=\beta_0+\beta_1x_{1,i}+\beta_2x_{2,i}+\cdots+\beta_mx_{m,i}+\beta_{m+1}x_{m+1,i}+\cdots+\beta_kx_{k,i} + \epsilon_i\]

2. Estimate the model that holds under the null hypothesis, that is, restrict the model so that the null hypothesis holds. That restricted model, R, is \[\text{R: }y_i=\beta_0+\beta_{m+1}x_{m+1,i}+\beta_{m+2}x_{m+2,i}+\cdots+\beta_kx_{k,i} + \eta_i\] Retrieve its residual sum of squares, \(rss_r\). The residuals from the restricted model have \(n-m-1\) degrees of freedom.

3. Calculate the difference in the residual sums of squares, \(rss_r - rss_u\), and divide by its degrees of freedom, \(q = (n-m-1)-(n-k-1) = k-m\). So \(q\) is the number of restrictions. A simple way to calculate the number of restrictions is to count the number of equal signs, \(=\), in the null hypothesis.

4. Calculate \(rss_u/(n-k-1)\).

5. Divide the result from step 3 by the result from step 4. This gives an *F* statistic with \(k-m\) and \(n-k-1\) degrees of freedom:

\[F_c=\frac{\frac{rss_r-rss_u}{q}}{\frac{rss_u}{n-k-1}}\]

The *F*-test (Wald test) can be used for any number of restrictions on the unrestricted model. For example, suppose we would like to know whether a production function with a Cobb-Douglas form has constant returns to scale. The Cobb-Douglas function for output as a function of labor and capital takes the form \[q=al^\alpha k^\beta\epsilon\] If constant returns to scale holds, \(\alpha+\beta=1\). So we test the following hypothesis: \[H_0:\alpha+\beta=1\] \[H_1:\alpha+\beta\ne1\]

To test this hypothesis, form the unrestricted and restricted models, estimate them, retrieve the sums of squared residuals, and calculate the *F* statistic. In the form presented above, the Cobb-Douglas model is not linear in the parameters, so it can’t be estimated with OLS. We can make it linear in the parameters by taking the logarithm of both sides: \[\ln(q)=\ln(al^\alpha k^\beta\epsilon)\] \[\text{U: }\ln(q)=\gamma+\alpha \ln(l)+\beta\ln(k)+\eta\] where \(\gamma=\ln(a)\) and \(\eta=\ln(\epsilon)\).

Form the restricted model by imposing the null hypothesis on the parameters. From the null hypothesis, \(\beta=1-\alpha\). Substituting for \(\beta\) in the unrestricted model yields the restricted model: \[\text{R: }\ln(q)-\ln(k)=\gamma+\alpha[\ln(l)-\ln(k)]+\eta\]

The *F*-stat is: \[F_c=\frac{\frac{rss_r-rss_u}{q}}{\frac{rss_u}{n-k-1}}\]

The degrees of freedom are \(q=1\) (the number of equal signs in the null hypothesis) and \(n-k-1\).
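The whole procedure can be sketched in R. The data frame `df` below is simulated so the chunk runs on its own; its columns `q`, `l`, and `k` are stand-ins for your own data:

```
# simulate Cobb-Douglas data with constant returns (alpha + beta = 1)
set.seed(42)
n  <- 100
df <- data.frame(l = runif(n, 1, 10), k = runif(n, 1, 10))
df$q <- 2 * df$l^0.6 * df$k^0.4 * exp(rnorm(n, sd = 0.1))

# U: unrestricted log-log model
ols_u <- lm(log(q) ~ log(l) + log(k), data = df)
# R: model with beta = 1 - alpha imposed
ols_r <- lm(I(log(q) - log(k)) ~ I(log(l) - log(k)), data = df)

# F statistic for the single (q = 1) restriction
rss_u  <- sum(ols_u$residuals^2)
rss_r  <- sum(ols_r$residuals^2)
F_stat <- ((rss_r - rss_u) / 1) / (rss_u / ols_u$df.residual)
F_stat
```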

#### 7.1.4.1 *F*-test for overall significance

Estimate the model \(y_i=\beta_0+\beta_1x_{1,i}+\beta_2x_{2,i}+\cdots+\beta_kx_{k,i}+\epsilon_i\). Test the hypothesis \[H_0: \beta_1=\beta_2=\cdots=\beta_k=0\] \[H_1:\text{at least one }\beta_j\ne0\]

If we reject the null hypothesis, we can say that we have explained some variation in \(y\) with variation in at least one of the \(x\)'s. In other words, we have a model that is significant. If we fail to reject the null hypothesis, our model has no explanatory power. There is no need to calculate this *F*-statistic by hand because it is reported as a matter of course by the base R call `summary` and by `glance` from the broom package. The degrees of freedom are \(q=k\) (the number of coefficients estimated minus 1) and \(n-k-1\).

`summary` will report the *F*-statistic, its degrees of freedom (numerator and denominator), and the p-value. `glance` reports the *F* as “statistic”, the p-value as “p.value”, \(k+1\) as “df”, and \(n-k-1\) as “df.residual”. Note that this test is also a test of the significance of \(R^2\).

#### 7.1.4.2 *F*-test of linear restrictions

The tests we performed above are tests of linear restrictions on the parameters. These hypotheses can be tested directly using `linearHypothesis` from the car package. Performing a test of linear restrictions using `linearHypothesis` requires two arguments: model and hypothesis.matrix.

Let the unrestricted model be \[y=\beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_3+\epsilon\] Estimate the model as `ols_u <- df %$% lm(y ~ x1 + x2 + x3)`, where `df` is the data frame containing the data.

Let’s test the hypothesis \(\beta_2=\beta_3=0\) against the alternative that at least one of the \(\beta's\ne0\) using `linearHypothesis(model = ols_u, hypothesis.matrix = c("x2 = 0", "x3 = 0"))`. The result will be an *F*-test on the restrictions; the *F*-statistic, its degrees of freedom, and its p-value will be returned.

Let’s test a linear restriction in the Cobb-Douglas model above. Estimate the model as `ols_u <- df %$% lm(log(q) ~ log(l) + log(k))`. To test the hypothesis \(\alpha=\beta\), pipe ols_u into `linearHypothesis` with the argument `c("log(l) = log(k)")`: `ols_u %>% linearHypothesis(c("log(l) = log(k)"))`. Again, the *F*-statistic, its degrees of freedom, and its p-value will be returned.

### 7.1.5 Examples

The Motor Trend Car Road Test (mtcars) data set is part of the datasets package in base R. The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). See `?mtcars` for more information on the data. `data(mtcars)` will load the data into your global environment as mtcars. We will perform each of the *F*-tests described above: overall significance, joint significance of a subset of variables, and a linear restriction on the coefficients.

#### 7.1.5.1 Multiple Regression

Suppose we want to estimate mpg as a function of the number of cylinders, the displacement, and the gross horsepower; then our (unrestricted) model is \[mpg=\beta_0+\beta_1cyl+\beta_2disp+\beta_3hp+\epsilon\]

Let’s estimate the unrestricted model using the exposition pipe `%$%`, both with and without robust standard errors.

```
# estimate the model without robust standard errors
ols <- mtcars %$% lm(mpg ~ cyl + disp + hp)
ols %>% tidy()
```

```
# A tibble: 4 x 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)  34.2       2.59       13.2  1.54e-13
2 cyl          -1.23      0.797      -1.54 1.35e- 1
3 disp         -0.0188    0.0104     -1.81 8.09e- 2
4 hp           -0.0147    0.0147     -1.00 3.25e- 1
```

```
# estimate the model with robust standard errors
ols_robust <- mtcars %$% lm_robust(mpg ~ cyl + disp + hp)
ols_robust %>% tidy()
```

```
         term estimate std.error statistic           p.value conf.low
1 (Intercept)  34.1849    2.4700     13.84 0.000000000000048  29.1253
2         cyl  -1.2274    0.5967     -2.06 0.049121075438813  -2.4498
3        disp  -0.0188    0.0083     -2.27 0.031138440490781  -0.0358
4          hp  -0.0147    0.0109     -1.34 0.190818678697032  -0.0371
  conf.high df outcome
1  39.24451 28     mpg
2  -0.00506 28     mpg
3  -0.00183 28     mpg
4   0.00775 28     mpg
```

#### 7.1.5.2 Multicollinearity

Using the model above \[mpg=\beta_0+\beta_1cyl+\beta_2disp+\beta_3hp+\epsilon\].

We can calculate the VIF’s as follows:
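For example, the first block of output below can be produced with a call like this (a sketch that re-estimates the model from section 7.1.5.1 so the chunk runs on its own):

```
library(car)       # provides vif()
library(magrittr)  # provides %>% and %$%

data(mtcars)
ols <- mtcars %$% lm(mpg ~ cyl + disp + hp)
# variance inflation factors for the three regressors
ols %>% vif()
```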

```
 cyl disp   hp 
6.73 5.52 3.35 
```

```
 cyl disp   hp 
3.67 2.90 2.71 
```

#### 7.1.5.3 Standardized Regression Coefficients

Using the model \[mpg=\beta_0+\beta_1cyl+\beta_2disp+\beta_3hp+\epsilon\], estimate standardized regression coefficients as follows:
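A call along these lines produces the output below (a sketch; the model is re-estimated here so the chunk stands alone):

```
library(lm.beta)   # provides lm.beta()
library(magrittr)  # provides %>% and %$%

data(mtcars)
ols <- mtcars %$% lm(mpg ~ cyl + disp + hp)
# standardized (beta) coefficients of the fitted model
ols %>% lm.beta()
```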

```
Call:
lm(formula = mpg ~ cyl + disp + hp)

Standardized Coefficients::
(Intercept)         cyl        disp          hp 
      0.000      -0.364      -0.387      -0.167 
```

#### 7.1.5.4 *F*-test for Overall significance

Suppose we want to estimate mpg as a function of the number of cylinders, the displacement, and the gross horsepower; then our (unrestricted) model is \[mpg=\beta_0+\beta_1cyl+\beta_2disp+\beta_3hp+\epsilon\]

We estimate the unrestricted model using the exposition pipe `%$%` as above. The test for overall significance is: \[H_0:\beta_1=\beta_2=\beta_3=0\] \[H_1: \text{at least one }\beta_j\ne0\]

Recall that the *F*-test for overall significance is reported as a matter of course by `summary` from base R and by `glance` from the broom package.
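The two blocks of output below come from calls along these lines (assuming nothing beyond the model estimated in section 7.1.5.1):

```
library(broom)     # provides glance()
library(magrittr)  # provides %>% and %$%

data(mtcars)
ols <- mtcars %$% lm(mpg ~ cyl + disp + hp)
# base R model summary, including the overall F-statistic
ols %>% summary()
# broom's one-row model summary; the F appears in the "statistic" column
ols %>% glance()
```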

```
Call:
lm(formula = mpg ~ cyl + disp + hp)

Residuals:
   Min     1Q Median     3Q    Max 
-4.089 -2.085 -0.774  1.397  6.918 

Coefficients:
            Estimate Std. Error t value         Pr(>|t|)    
(Intercept)  34.1849     2.5908   13.19 0.00000000000015 ***
cyl          -1.2274     0.7973   -1.54            0.135    
disp         -0.0188     0.0104   -1.81            0.081 .  
hp           -0.0147     0.0147   -1.00            0.325    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.06 on 28 degrees of freedom
Multiple R-squared:  0.768,  Adjusted R-squared:  0.743 
F-statistic: 30.9 on 3 and 28 DF,  p-value: 0.00000000505
```

```
# A tibble: 1 x 11
  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>   <dbl> <int>  <dbl> <dbl> <dbl>
1     0.768         0.743  3.06      30.9 5.05e-9     4  -79.0  168.  175.
# ... with 2 more variables: deviance <dbl>, df.residual <int>
```

So we see that \(F=30.877\), \(q=3\), and \(df2=28\). The critical *F* with \(\alpha=.05\) is \(2.947\). Since the calculated *F*-stat is greater than the critical *F*-stat, we reject \(H_0\) in favor of \(H_1\). That is, the explanatory power of the model is statistically significant.

#### 7.1.5.5 *F*-test of Joint Significance

Suppose we’d like to know whether adding the weight (wt), the number of gears (gear), and the number of carburetors (carb) together increases the explanatory power of the model at the \(\alpha=.05\) level of significance. Our unrestricted model becomes: \[mpg=\beta_0+\beta_1cyl+\beta_2disp+\beta_3hp+\beta_4wt+\beta_5gear+\beta_6carb+\eta\]

The null and alternative hypotheses are: \[H_0:\beta_4=\beta_5=\beta_6=0\] \[H_1:\text{at least one }\beta_j\ne0\]

#### 7.1.5.6 Perform the test “manually”

```
# estimate the unrestricted model
ols_u <- mtcars %$% lm(mpg ~ cyl + disp + hp + wt + gear + carb)
# generate the residual sum of squares
rss_u <- ols_u$residuals^2 %>%
  sum()
# retrieve the degrees of freedom for the unrestricted model
df_u <- ols_u$df.residual
# estimate the restricted model
ols_r <- mtcars %$% lm(mpg ~ cyl + disp + hp)
# generate the residual sum of squares
rss_r <- ols_r$residuals^2 %>%
  sum()
# retrieve the degrees of freedom for the restricted model
df_r <- ols_r$df.residual
# calculate the number of restrictions
q <- df_r - df_u
# calculate F
(F_stat <- ((rss_r - rss_u)/q)/(rss_u/df_u)) # () around the call prints the result to the screen
```

`[1] 4.8`

`[1] 2.99`

`[1] 0.00897`
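The second and third values above (the critical *F* and the p-value) can be reproduced with base R’s *F* distribution functions; this sketch hard-codes \(q=3\) restrictions and the 25 residual degrees of freedom of the unrestricted model from the chunk above:

```
q      <- 3      # number of restrictions
df_u   <- 25     # residual df of the unrestricted model (32 - 6 - 1)
F_stat <- 4.796  # the calculated F-statistic from above

# critical F at alpha = .05
qf(p = 0.95, df1 = q, df2 = df_u)
# p-value of the calculated F
pf(q = F_stat, df1 = q, df2 = df_u, lower.tail = FALSE)
```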

Since 4.796 is greater than 2.991, we can reject \(H_0\) in favor of \(H_1\) and conclude that wt, gear, and carb add significant explanatory power to the model. We can also see that the p-value for our calculated *F*-statistic is 0.009. Since this is less than \(\alpha=.05\), we reject \(H_0\).

#### 7.1.5.7 Perform the test with `linearHypothesis`
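The output below can be produced with a call like the following (a sketch; `ols_u` is re-estimated so the chunk runs on its own):

```
library(car)       # provides linearHypothesis()
library(magrittr)  # provides %>% and %$%

data(mtcars)
ols_u <- mtcars %$% lm(mpg ~ cyl + disp + hp + wt + gear + carb)
# jointly test wt = 0, gear = 0, carb = 0
ols_u %>% linearHypothesis(c("wt = 0", "gear = 0", "carb = 0"))
```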

```
Linear hypothesis test

Hypothesis:
wt = 0
gear = 0
carb = 0

Model 1: restricted model
Model 2: mpg ~ cyl + disp + hp + wt + gear + carb

  Res.Df RSS Df Sum of Sq   F Pr(>F)   
1     28 261                           
2     25 166  3      95.5 4.8  0.009 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Of course, we have the same result.

#### 7.1.5.8 Test of Linear Restrictions

Let the model be \[\ln(mpg)=\beta_0+\beta_1\ln(cyl)+\beta_2\ln(wt)+\epsilon\] Suppose we’d like to test \[H_0:\beta_1+\beta_2=-1\] against \[H_1:\beta_1+\beta_2\ne-1\]

##### 7.1.5.8.1 Perform the Test “Manually”

Form the restricted model under \(H_0\). If \(H_0\) holds, \(\beta_2=-1-\beta_1\). Substituting into the unrestricted model yields the restricted model: \[\text{R: }\ln(mpg)+\ln(wt)=\beta_0+\beta_1(\ln(cyl)-\ln(wt))+\eta\]

```
# estimate the unrestricted model
ols_u <- mtcars %$% lm(log(mpg) ~ log(cyl) + log(wt))
# generate the residual sum of squares
rss_u <- ols_u$residuals^2 %>%
  sum()
# retrieve the degrees of freedom for the unrestricted model
df_u <- ols_u$df.residual
# estimate the restricted model
ols_r <- mtcars %$% lm(I(log(mpg) + log(wt)) ~ I(log(cyl) - log(wt)))
# generate the residual sum of squares
rss_r <- ols_r$residuals^2 %>%
  sum()
# retrieve the degrees of freedom for the restricted model
df_r <- ols_r$df.residual
# calculate the number of restrictions
q <- df_r - df_u
# calculate F
(F_stat <- ((rss_r - rss_u)/q)/(rss_u/df_u)) # () around the call prints the result to the screen
```

`[1] 1.29`

`[1] 4.18`

`[1] 0.266`
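As before, the critical *F* and the p-value above can be reproduced with base R’s *F* distribution functions; this sketch hard-codes \(q=1\) restriction and the 29 residual degrees of freedom of the unrestricted model:

```
q      <- 1      # number of restrictions
df_u   <- 29     # residual df of the unrestricted model (32 - 2 - 1)
F_stat <- 1.289  # the calculated F-statistic from above

# critical F at alpha = .05
qf(p = 0.95, df1 = q, df2 = df_u)
# p-value of the calculated F
pf(q = F_stat, df1 = q, df2 = df_u, lower.tail = FALSE)
```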

Since 1.289 is less than 4.183, we fail to reject \(H_0\) and conclude that we have no evidence to suggest that \(\beta_1+\beta_2\ne-1\). We can also see that the p-value for our calculated *F*-statistic is 0.266. Since this is greater than \(\alpha=.05\), we fail to reject \(H_0\).

#### 7.1.5.9 Perform the test with `linearHypothesis`
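The output below can be produced with a call like the following (a sketch; `ols_u` is re-estimated so the chunk runs on its own):

```
library(car)       # provides linearHypothesis()
library(magrittr)  # provides %>% and %$%

data(mtcars)
ols_u <- mtcars %$% lm(log(mpg) ~ log(cyl) + log(wt))
# test the single restriction beta1 + beta2 = -1
ols_u %>% linearHypothesis(c("log(cyl) + log(wt) = -1"))
```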

```
Linear hypothesis test

Hypothesis:
log(cyl)  + log(wt) = - 1

Model 1: restricted model
Model 2: log(mpg) ~ log(cyl) + log(wt)

  Res.Df   RSS Df Sum of Sq    F Pr(>F)
1     30 0.419                         
2     29 0.401  1    0.0178 1.29   0.27
```