4.5 Inference
Draw conclusions about the significance of the coefficient estimates with the t-test and/or F-test.
4.5.1 t-Test
By assumption, the model errors are normally distributed, so the parameter estimators are normally distributed as well, and a Z statistic could in principle test them,
\[Z = \frac{\hat{\beta} - \beta_0}{\sqrt{\sigma^2 (X'X)^{-1}}}\]
where \(\beta_0\) is the null-hypothesized value, usually 0. \(\sigma\) is unknown, but \(\frac{\hat{\sigma}^2 (n - k)}{\sigma^2} \sim \chi^2_{n-k}\). The ratio of a standard normal variable to the square root of an independent chi-square variable divided by its degrees of freedom, \(\sqrt{\chi^2_{n-k} / (n - k)}\), is t-distributed, so replacing \(\sigma^2\) with \(\hat{\sigma}^2\) yields
\[t = \frac{\hat{\beta} - \beta_0}{\sqrt{\hat{\sigma}^2 (X'X)^{-1}}} = \frac{\hat{\beta} - \beta_0}{SE(\hat{\beta})}\]
The \((1 - \alpha)\) confidence intervals are \(CI = \hat{\beta} \pm t_{\alpha / 2, df} SE(\hat{\beta})\), with p-value equal to the probability of observing a \(t\) at least as extreme, \(p = 2 \cdot P(T > |t|)\). For a one-tailed test, divide the reported p-value by two. \(SE(\hat{\beta})\) decreases with 1) a better-fitting regression line (smaller \(\hat{\sigma}^2\)), 2) greater variation in the predictor (larger \(X'X\)), and 3) a larger sample size (larger \(n\)). The simulation sketch below illustrates all three.
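To see these drivers in action, here is a minimal simulation sketch (illustrative only; the helper se_slope and its settings are not from the original example) that refits a simple regression while varying each factor:

set.seed(123)
se_slope <- function(n, x_sd, sigma) {
  # simulate y = 1 + 2x + e, then return the estimated SE of the slope
  x <- rnorm(n, sd = x_sd)
  y <- 1 + 2 * x + rnorm(n, sd = sigma)
  coef(summary(lm(y ~ x)))["x", "Std. Error"]
}
se_slope(n = 30,  x_sd = 1, sigma = 1)    # baseline
se_slope(n = 30,  x_sd = 1, sigma = 0.5)  # better fit (smaller sigma): smaller SE
se_slope(n = 30,  x_sd = 2, sigma = 1)    # more spread in x: smaller SE
se_slope(n = 120, x_sd = 1, sigma = 1)    # larger n: smaller SE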
4.5.1.1 Example
Define a 95% confidence interval around the slope parameters.
The summary() output shows the t values and probabilities in the t value and Pr(>|t|) columns. You can verify this manually using matrix algebra for \(t = \frac{\hat{\beta} - \beta_0}{SE(\hat{\beta})}\) with \(\beta_0 = 0\). The \((1 - \alpha)\) confidence interval is \(CI = \hat{\beta} \pm t_{\alpha / 2, df} SE(\hat{\beta})\). The table below gathers the parameter estimators and t-test results.
# beta_hat, se_beta_hat, n, and k are assumed from the earlier matrix-algebra
# calculations; k counts the non-intercept coefficients, so the residual
# degrees of freedom are n - k - 1.
t <- beta_hat / se_beta_hat
p_value <- pt(q = abs(t), df = n - k - 1, lower.tail = FALSE) * 2  # two-tailed
t_crit <- qt(p = .05 / 2, df = n - k - 1, lower.tail = FALSE)      # critical value
lcl <- beta_hat - t_crit * se_beta_hat  # lower confidence limit
ucl <- beta_hat + t_crit * se_beta_hat  # upper confidence limit
data.frame(beta = round(beta_hat, 4),
           se = round(se_beta_hat, 4),
           t = round(t, 4),
           p = round(p_value, 4),
           lcl = round(lcl, 4),
           ucl = round(ucl, 4))
##                beta     se     t     p    lcl     ucl
## (Intercept) 19.5410 14.146  1.38 0.181 -9.797 48.8789
## cyl.L        0.3426  2.765  0.12 0.902 -5.391  6.0765
## cyl.Q        1.3884  1.112  1.25 0.225 -0.918  3.6948
## disp         0.0067  0.013  0.49 0.625 -0.021  0.0347
## hp          -0.0291  0.017 -1.70 0.104 -0.065  0.0065
## drat         0.5881  1.503  0.39 0.699 -2.529  3.7053
## wt          -3.1552  1.420 -2.22 0.037 -6.101 -0.2099
## qsec         0.5232  0.690  0.76 0.456 -0.908  1.9545
## vsS          1.2378  2.106  0.59 0.563 -3.130  5.6055
## ammanual     3.0009  1.853  1.62 0.120 -0.843  6.8446
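As a cross-check, base R's confint() computes the same intervals directly from the fitted lm object (assumed here to be m, the model used in the F-test example below):

confint(m, level = 0.95)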
4.5.2 F-Test
The F-test for the model is a test of the null hypothesis that none of the independent variables linearly predict the dependent variable, that is, that the model parameters are jointly zero: \(H_0 : \beta_1 = \ldots = \beta_k = 0\). The regression mean sum of squares \(MSR = \frac{(\hat{y} - \bar{y})'(\hat{y} - \bar{y})}{k-1}\) and the error mean sum of squares \(MSE = \frac{\hat{\epsilon}'\hat{\epsilon}}{n-k}\) are each scaled chi-square variables under \(H_0\), and their ratio has an F distribution with \(k - 1\) numerator degrees of freedom and \(n - k\) denominator degrees of freedom. The F statistic can also be expressed in terms of the coefficient of determination, \(R^2 = \frac{SSR}{SST}\).
\[F(k - 1, n - k) = \frac{MSR}{MSE} = \frac{R^2}{1 - R^2} \frac{n-k}{k-1}\]
MSE is an unbiased estimator of \(\sigma^2\). If \(H_0\) is true, that is, there is no linear relationship between the predictors and the response, then \(MSR\) also estimates \(\sigma^2\), so \(F \approx 1\). As \(R^2 \rightarrow 1\), \(F \rightarrow \infty\), and as \(R^2 \rightarrow 0\), \(F \rightarrow 0\). F increases with \(n\) and decreases with \(k\). The sketch below verifies the \(R^2\) identity numerically.
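This is a minimal check assuming the fitted lm object m from the example that follows; note that the code convention here counts \(k\) as the non-intercept coefficients, so the formula's \(\frac{n-k}{k-1}\) becomes \(\frac{n-k-1}{k}\):

r2 <- summary(m)$r.squared
k <- length(coef(m)) - 1           # non-intercept coefficients
n <- length(m$residuals)
(r2 / (1 - r2)) * (n - k - 1) / k  # F reconstructed from R^2
summary(m)$fstatistic              # F value, numdf, dendf from summary()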
4.5.2.1 Example
Are the model parameters jointly equal to zero, or is there evidence that at least one predictor linearly explains the response?
The F-statistic is reported at the bottom of the summary() output. You can verify it manually.
# m, d, n, and k are assumed from the preceding example
ssr <- sum((m$fitted.values - mean(d$mpg))^2)  # regression sum of squares
sse <- sum(m$residuals^2)                      # error sum of squares
sst <- sum((d$mpg - mean(d$mpg))^2)            # total: sst = ssr + sse
msr <- ssr / k
mse <- sse / (n - k - 1)
f <- msr / mse
p_value <- pf(q = f, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
cat("F-statistic: ", round(f, 4), " on ", k, " and ", n - k - 1,
    " DF, p-value: ", p_value)
## F-statistic: 17.35 on 9 and 22 DF, p-value: 0.000000048
There is sufficient evidence \((F = 17.35, P < .0001)\) to reject \(H_0\) that the parameter estimators are jointly equal to zero.
The aov function calculates the sequential sum of squares. The regression sum of squares SSR for mpg ~ cyl is 824.8. Adding disp to the model increases SSR by 57.6. Adding hp to the model increases SSR by 18.5. It would seem that hp does not improve the model.
##             Df Sum Sq Mean Sq F value        Pr(>F)
## cyl          2    825     412   65.26 0.00000000056 ***
## disp         1     58      58    9.12        0.0063 **
## hp           1     19      19    2.93        0.1011
## drat         1     12      12    1.89        0.1836
## wt           1     56      56    8.83        0.0070 **
## qsec         1      2       2    0.24        0.6282
## vs           1      0       0    0.05        0.8289
## am           1     17      17    2.62        0.1197
## Residuals   22    139       6
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
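The sequential table above can be produced with a call along these lines (a sketch, assuming d is the prepared mtcars data frame used throughout, with cyl an ordered factor and vs and am labeled factors):

summary(aov(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am, data = d))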
Order matters. Had we started with disp, then added hp, we would find both estimators were significant.
##             Df Sum Sq Mean Sq F value        Pr(>F)
## disp         1    809     809  128.00 0.00000000012 ***
## hp           1     34      34    5.33         0.031 *
## drat         1     30      30    4.77         0.040 *
## wt           1     71      71   11.16         0.003 **
## qsec         1     13      13    2.01         0.170
## vs           1      0       0    0.04         0.852
## am           1     20      20    3.24         0.086 .
## cyl          2     10       5    0.82         0.451
## Residuals   22    139       6
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
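Because sequential sums of squares depend on entry order, a useful complement is a marginal test that adjusts each term for all of the others. One option (a suggestion beyond the original text) is base R's drop1() applied to the fitted lm object m; its single-term F-tests do not depend on the order of the formula:

drop1(m, test = "F")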