## 4.5 Inference

Draw conclusions about the significance of the coefficient estimates with the t-test and/or F-test.

### 4.5.1 t-Test

By assumption, the residuals are normally distributed, so if $$\sigma^2$$ were known, a Z-test statistic could evaluate the parameter estimators,

$Z = \frac{\hat{\beta} - \beta_0}{\sqrt{\sigma^2 (X'X)^{-1}}}$

where $$\beta_0$$ is the null-hypothesized value, usually 0. $$\sigma$$ is unknown, but $$\frac{\hat{\sigma}^2 (n - k)}{\sigma^2} \sim \chi^2_{n-k}$$. A standard normal divided by the square root of a chi-square over its degrees of freedom, $$\sqrt{\chi^2_{n-k} / (n - k)}$$, is t-distributed, so substituting $$\hat{\sigma}^2$$ for $$\sigma^2$$ gives

$t = \frac{\hat{\beta} - \beta_0}{\sqrt{\hat{\sigma}^2 (X'X)^{-1}}} = \frac{\hat{\beta} - \beta_0}{SE(\hat{\beta})}$

The $$(1 - \alpha)$$ confidence intervals are $$CI = \hat{\beta} \pm t_{\alpha / 2, df} SE(\hat{\beta})$$, and the p-value is the probability of measuring a $$t$$ at least that extreme, $$p = 2P(T > |t|)$$ for a two-tailed test. For a one-tailed test, divide the reported p-value by two. $$SE(\hat{\beta})$$ decreases with 1) a better-fitting regression line (smaller $$\hat{\sigma}^2$$), 2) greater variation in the predictor (larger $$X'X$$), and 3) larger sample size (larger $$n$$).
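
A short simulation sketch illustrates the distributional claim (the sample size and degrees of freedom here are arbitrary choices, not values from the running example):

set.seed(1)
nu <- 20                                  # arbitrary degrees of freedom
z <- rnorm(1e5)                           # standard normal numerator
chi2 <- rchisq(1e5, df = nu)              # chi-square denominator
t_sim <- z / sqrt(chi2 / nu)              # the ratio described above
qqplot(qt(ppoints(1e5), df = nu), t_sim)  # quantiles track the t distribution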

#### 4.5.1.1 Example

Define a 95% confidence interval around the slope parameters.

The summary() output shows the t values and probabilities in the t value and Pr(>|t|) columns. You can verify this manually using matrix algebra for $$t = \frac{\hat{\beta} - \beta_0}{SE(\hat{\beta})}$$ with $$\beta_0 = 0$$. The $$(1 - \alpha)$$ confidence interval is $$CI = \hat{\beta} \pm t_{\alpha / 2, df} SE(\hat{\beta})$$. The table below gathers the parameter estimators and t-test results.
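
The inputs beta_hat, se_beta_hat, n, and k carry over from the estimation section; if you need to rebuild them, here is a minimal sketch, assuming m is the fitted lm object and d is the modeling data frame defined earlier:

X <- model.matrix(m)                             # design matrix of the fitted model
n <- nrow(X)                                     # sample size
k <- ncol(X) - 1                                 # predictors, excluding the intercept
XtX_inv <- solve(t(X) %*% X)                     # (X'X)^{-1}
beta_hat <- XtX_inv %*% t(X) %*% d$mpg           # OLS estimator (X'X)^{-1} X'y
sigma2_hat <- sum(m$residuals^2) / (n - k - 1)   # estimate of sigma^2
se_beta_hat <- sqrt(diag(sigma2_hat * XtX_inv))  # SEs from the variance-covariance diagonal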

t <- beta_hat / se_beta_hat              # t statistic for H0: beta = 0
p_value <- pt(q = abs(t),
              df = n - k - 1,
              lower.tail = FALSE) * 2    # two-tailed p-value
t_crit <- qt(p = .05 / 2, df = n - k - 1, lower.tail = FALSE)
lcl <- beta_hat - t_crit * se_beta_hat   # lower confidence limit
ucl <- beta_hat + t_crit * se_beta_hat   # upper confidence limit
data.frame(beta = round(beta_hat, 4),
           se = round(se_beta_hat, 4),
           t = round(t, 4),
           p = round(p_value, 4),
           lcl = round(lcl, 4),
           ucl = round(ucl, 4))
##                beta     se     t     p    lcl     ucl
## (Intercept) 19.5410 14.146  1.38 0.181 -9.797 48.8789
## cyl.L        0.3426  2.765  0.12 0.902 -5.391  6.0765
## cyl.Q        1.3884  1.112  1.25 0.225 -0.918  3.6948
## disp         0.0067  0.013  0.49 0.625 -0.021  0.0347
## hp          -0.0291  0.017 -1.70 0.104 -0.065  0.0065
## drat         0.5881  1.503  0.39 0.699 -2.529  3.7053
## wt          -3.1552  1.420 -2.22 0.037 -6.101 -0.2099
## qsec         0.5232  0.690  0.76 0.456 -0.908  1.9545
## vsS          1.2378  2.106  0.59 0.563 -3.130  5.6055
## ammanual     3.0009  1.853  1.62 0.120 -0.843  6.8446
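
The lcl and ucl columns can be checked against R's built-in confint(), which applies the same t-based formula to the fitted model:

confint(m, level = 0.95)   # matches the lcl and ucl columns above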

### 4.5.2 F-Test

The F-test for the model is a test of the null hypothesis that none of the independent variables linearly predicts the dependent variable, that is, that the model parameters are jointly zero: $$H_0 : \beta_1 = \ldots = \beta_k = 0$$. The regression mean sum of squares $$MSR = \frac{(\hat{y} - \bar{y})'(\hat{y} - \bar{y})}{k-1}$$ and the error mean sum of squares $$MSE = \frac{\hat{\epsilon}'\hat{\epsilon}}{n-k}$$ are each scaled chi-square variables. Their ratio has an F distribution with $$k - 1$$ numerator degrees of freedom and $$n - k$$ denominator degrees of freedom. The F statistic can also be expressed in terms of the coefficient of determination $$R^2 = \frac{SSR}{SST}$$.

$F(k - 1, n - k) = \frac{MSR}{MSE} = \frac{R^2}{1 - R^2} \frac{n-k}{k-1}$

MSE is an unbiased estimator of $$\sigma^2$$. If $$H_0$$ is true, that is, if there is no relationship between the predictors and the response, then $$MSR$$ also estimates $$\sigma^2$$, so $$F \approx 1$$. As $$R^2 \rightarrow 1$$, $$F \rightarrow \infty$$, and as $$R^2 \rightarrow 0$$, $$F \rightarrow 0$$. F increases with $$n$$ and decreases with $$k$$.
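
A quick sketch verifies the $$R^2$$ identity, assuming m, n, and k from the running example; because the code's k excludes the intercept, the text's $$(n-k)/(k-1)$$ becomes (n - k - 1) / k here:

r2 <- summary(m)$r.squared           # coefficient of determination
(r2 / (1 - r2)) * ((n - k - 1) / k)  # equals MSR / MSE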

#### 4.5.2.1 Example

Test the null hypothesis that the slope parameters are jointly equal to zero.

The F-statistic is reported at the bottom of the summary() output. You can verify it manually.

ssr <- sum((m$fitted.values - mean(d$mpg))^2)  # regression sum of squares
sse <- sum(m$residuals^2)                      # error sum of squares
sst <- sum((d$mpg - mean(d$mpg))^2)            # total sum of squares (= ssr + sse)
msr <- ssr / k                                 # regression mean square
mse <- sse / (n - k - 1)                       # error mean square
f <- msr / mse
p_value <- pf(q = f, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
cat("F-statistic: ", round(f, 4), " on ", k, " and ", n - k - 1,
    " DF,  p-value: ", p_value)
## F-statistic:  17  on  9  and  22  DF,  p-value:  0.000000048

There is sufficient evidence $$(F = 17.35, P < .0001)$$ to reject $$H_0$$ that the parameter estimators are jointly equal to zero.
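
The same value and degrees of freedom are stored in the model summary:

summary(m)$fstatistic   # named vector: value, numdf, dendf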

The aov() function calculates the sequential (Type I) sums of squares. The regression sum of squares SSR for mpg ~ cyl is 824.8. Adding disp to the model increases SSR by 57.6, and adding hp increases it by a further 18.5. It would seem that hp does not improve the model.

summary(aov(m))
##             Df Sum Sq Mean Sq F value        Pr(>F)
## cyl          2    825     412   65.26 0.00000000056 ***
## disp         1     58      58    9.12        0.0063 **
## hp           1     19      19    2.93        0.1011
## drat         1     12      12    1.89        0.1836
## wt           1     56      56    8.83        0.0070 **
## qsec         1      2       2    0.24        0.6282
## vs           1      0       0    0.05        0.8289
## am           1     17      17    2.62        0.1197
## Residuals   22    139       6
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Order matters. Had we started with disp and then added hp, we would have found both estimators significant.

summary(aov(lm(mpg ~ disp + hp + drat + wt + qsec + vs + am + cyl, data = d)))
##             Df Sum Sq Mean Sq F value        Pr(>F)
## disp         1    809     809  128.00 0.00000000012 ***
## hp           1     34      34    5.33         0.031 *
## drat         1     30      30    4.77         0.040 *
## wt           1     71      71   11.16         0.003 **
## qsec         1     13      13    2.01         0.170
## vs           1      0       0    0.04         0.852
## am           1     20      20    3.24         0.086 .
## cyl          2     10       5    0.82         0.451
## Residuals   22    139       6
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
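
Sequential sums of squares are order-dependent by construction. If order-independent tests are preferred, Type II sums of squares evaluate each term after all the others; a one-line sketch, assuming the car package is installed:

car::Anova(m)   # Type II sums of squares (the default); term order no longer matters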