14.2 Wald test
\[ \begin{aligned} W &= (\hat{\theta}-\theta_0)'[cov(\hat{\theta})]^{-1}(\hat{\theta}-\theta_0) \\ W &\sim \chi_q^2 \end{aligned} \]
where \(cov(\hat{\theta})\) is given by the inverse Fisher Information matrix evaluated at \(\hat{\theta}\) and q is the rank of \(cov(\hat{\theta})\), which is the number of non-redundant parameters in \(\theta\).
Alternatively, for a scalar parameter,
\[ t_W=\frac{(\hat{\theta}-\theta_0)^2}{I(\hat{\theta})^{-1}} \sim \chi^2_{(v)} \]
where v is the degrees of freedom (\(v = 1\) for a scalar \(\theta\)).
Equivalently,
\[ s_W= \frac{\hat{\theta}-\theta_0}{\sqrt{I(\hat{\theta})^{-1}}} \sim Z \]
The statistic measures how far away in the distribution your sample estimate is from the hypothesized population parameter.
For a given null value, what is the probability that you would have obtained a realization “more extreme” or “worse” than the estimate you actually obtained?
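As a minimal sketch of the scalar Wald statistic, consider a Bernoulli success probability, where the Fisher information for n draws is \(n/[p(1-p)]\). The data, the seed, and the null value `p0` below are hypothetical and chosen only for illustration.

```r
# Hypothetical data: 100 simulated Bernoulli draws (not from the text)
set.seed(1)
x <- rbinom(100, size = 1, prob = 0.6)
n <- length(x)

p_hat <- mean(x)  # MLE of the success probability
p0 <- 0.5         # hypothesized null value (assumption for this example)

# Fisher information for n Bernoulli draws, evaluated at the MLE
info_hat <- n / (p_hat * (1 - p_hat))

# Chi-square form and standardized (Z) form of the Wald statistic
W <- (p_hat - p0)^2 / (1 / info_hat)
s_W <- (p_hat - p0) / sqrt(1 / info_hat)

# p-values from the chi-square(1) and standard normal reference distributions
p_chisq <- pchisq(W, df = 1, lower.tail = FALSE)
p_norm <- 2 * pnorm(abs(s_W), lower.tail = FALSE)

c(W = W, s_W = s_W, p_chisq = p_chisq, p_norm = p_norm)
```

The two forms carry the same information: `W` equals `s_W^2`, and the two p-values coincide.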
Significance Level (\(\alpha\)) and Confidence Level (\(1-\alpha\))
- The significance level is the threshold probability: if a realization at least as extreme as ours has probability below it under the null, we reject the null.
- The confidence level is the probability that sets the bounds on how far away the realization of the estimator would have to be to reject the null.
Test Statistics
- Standardize (transform) the estimator and null value to a test statistic that always has the same distribution
- Test Statistic for the OLS estimator for a single hypothesis
\[ T = \frac{\sqrt{n}(\hat{\beta}_j-\beta_{j0})}{\sqrt{n}SE(\hat{\beta_j})} \sim^a N(0,1) \]
Equivalently,
\[ T = \frac{(\hat{\beta}_j-\beta_{j0})}{SE(\hat{\beta_j})} \sim^a N(0,1) \]
The test statistic is itself a random variable that is a function of the data and the null hypothesis.
- T denotes the random variable test statistic
- t denotes the single realization of the test statistic
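The realization t can be computed by hand from a fitted model. The sketch below uses the built-in `mtcars` data and a null value of 0; both are assumptions for illustration, not part of the original text.

```r
# Illustrative model on a built-in dataset (assumption, not from the text)
fit <- lm(mpg ~ wt + hp, data = mtcars)

beta_hat <- coef(fit)["wt"]             # estimate of beta_j
se_hat <- sqrt(diag(vcov(fit)))["wt"]   # SE(beta_j)
beta_null <- 0                          # hypothesized value beta_j0

# Realization t of the test statistic T
t_real <- unname((beta_hat - beta_null) / se_hat)
t_real

# With beta_null = 0 this matches the "t value" column of summary()
summary(fit)$coefficients["wt", "t value"]
```

For a null value other than 0, only `beta_null` changes; `summary()` always reports the statistic for a null of 0.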
Evaluating the Test Statistic: determine whether we reject or fail to reject the null hypothesis at a given significance / confidence level
Three equivalent ways:
- Critical Value
- P-value
- Confidence Interval
Critical Value
For a given significance level, determine the critical value \((c)\)
- One-sided: \(H_0: \beta_j \ge \beta_{j0}\)
\[ P(T<c|H_0)=\alpha \]
Reject the null if \(t<c\)
- One-sided: \(H_0: \beta_j \le \beta_{j0}\)
\[ P(T>c|H_0)=\alpha \]
Reject the null if \(t>c\)
- Two-sided: \(H_0: \beta_j = \beta_{j0}\)
\[ P(|T|>c|H_0)=\alpha \]
Reject the null if \(|t|>c\)
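A short sketch of the three critical values under the asymptotic standard normal distribution. The significance level and the realized statistic `t_real` are hypothetical values.

```r
alpha <- 0.05  # significance level (assumed for this example)

c_lower <- qnorm(alpha)          # H0: beta_j >= beta_j0, reject if t < c
c_upper <- qnorm(1 - alpha)      # H0: beta_j <= beta_j0, reject if t > c
c_two <- qnorm(1 - alpha / 2)    # H0: beta_j = beta_j0, reject if |t| > c

t_real <- -2.1  # hypothetical realization of the test statistic

c(
  reject_lower = t_real < c_lower,
  reject_upper = t_real > c_upper,
  reject_two = abs(t_real) > c_two
)
```

Note that the two-sided test splits \(\alpha\) across both tails, which is why its critical value uses \(1-\alpha/2\).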
P-value
Calculate the probability, under the null, that the test statistic is more extreme than the realization you observed
- One-sided: \(H_0: \beta_j \ge \beta_{j0}\)
\[ \text{p-value} = P(T<t|H_0) \]
- One-sided: \(H_0: \beta_j \le \beta_{j0}\)
\[ \text{p-value} = P(T>t|H_0) \]
- Two-sided: \(H_0: \beta_j = \beta_{j0}\)
\[ \text{p-value} = P(|T|>|t| \mid H_0) \]
Reject the null if p-value \(< \alpha\)
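The three p-values can be sketched with `pnorm()` under the asymptotic standard normal distribution. The realization `t_real` and the level `alpha` are hypothetical.

```r
t_real <- -2.1  # hypothetical realization of the test statistic
alpha <- 0.05   # significance level (assumed for this example)

p_lower <- pnorm(t_real)                              # P(T < t | H0)
p_upper <- pnorm(t_real, lower.tail = FALSE)          # P(T > t | H0)
p_two <- 2 * pnorm(abs(t_real), lower.tail = FALSE)   # P(|T| > |t| | H0)

p_values <- c(lower = p_lower, upper = p_upper, two_sided = p_two)
p_values

# Decision rule: reject the null when the p-value is below alpha
p_values < alpha
```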
Confidence Interval
Using the critical value associated with a null hypothesis and significance level, create an interval
\[ CI(\hat{\beta}_j)_{\alpha} = [\hat{\beta}_j-(c \times SE(\hat{\beta}_j)),\hat{\beta}_j+(c \times SE(\hat{\beta}_j))] \]
If the null value lies outside the interval, then we reject the null.
- We are not testing whether the true population value is close to the estimate; we are testing, given a fixed true population value of the parameter, how likely it is that we observed this estimate.
- Can be interpreted as: in repeated samples, \((1-\alpha)\times 100 \%\) of the intervals constructed this way capture the true parameter value.
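A minimal sketch of the interval built from the two-sided normal critical value, checked against `confint.default()`, which also uses normal quantiles. The model and the null value are assumptions for illustration.

```r
# Illustrative model on a built-in dataset (assumption, not from the text)
fit <- lm(mpg ~ wt + hp, data = mtcars)
alpha <- 0.05

b <- unname(coef(fit)["wt"])
se <- unname(sqrt(diag(vcov(fit)))["wt"])
c_two <- qnorm(1 - alpha / 2)  # two-sided critical value

# Interval: estimate plus/minus critical value times standard error
ci <- c(lower = b - c_two * se, upper = b + c_two * se)
ci

# Same interval from the normal-based default method
confint.default(fit, "wt", level = 1 - alpha)

# Reject H0: beta_wt = beta_null when the null value falls outside the interval
beta_null <- 0
reject <- unname(beta_null < ci["lower"] || beta_null > ci["upper"])
reject
```

`confint()` on an `lm` object uses t quantiles instead, so its interval is slightly wider in small samples.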
With stronger assumptions (A1-A6), we could consider Finite Sample Properties
\[ T = \frac{\hat{\beta}_j-\beta_{j0}}{SE(\hat{\beta}_j)} \sim T(n-k) \]
- The distributional derivation above depends strongly on A4 and A5
- T has a Student t-distribution because, after scaling, the numerator is standard normal and the denominator is the square root of an independent \(\chi^2\) variable divided by its degrees of freedom.
- Critical values and p-values will be calculated from the Student t-distribution rather than the standard normal distribution.
- As \(n \to \infty\), \(T(n-k)\) is asymptotically standard normal.
Rule of thumb
If \(n-k>120\): the critical values and p-values from the t-distribution are (almost) the same as the critical values and p-values from the standard normal distribution.
If \(n-k<120\)
- if (A1-A6) hold then the t-test is an exact finite-sample test
- if (A1-A3a, A5) hold, because the t-distribution is asymptotically normal, computing the critical values from a t-distribution is still a valid asymptotic test (i.e., the critical values and p-values are not quite right, but the difference goes away as \(n \to \infty\))
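The rule of thumb can be checked numerically by comparing two-sided 5% critical values from the t-distribution and the standard normal. The grid of degrees of freedom is an arbitrary choice for illustration.

```r
alpha <- 0.05
df <- c(5, 10, 30, 60, 120, 1000)  # illustrative values of n - k

crit <- data.frame(
  df = df,
  t_crit = qt(1 - alpha / 2, df),  # Student t critical value
  z_crit = qnorm(1 - alpha / 2)    # standard normal critical value
)
crit$gap <- crit$t_crit - crit$z_crit  # shrinks toward 0 as df grows
crit
```

The gap is always positive, so t-based tests are slightly more conservative, and it becomes small once the degrees of freedom reach about 120.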
14.2.1 Multiple Hypothesis
Test multiple parameters at the same time
- \(H_0: \beta_1 = 0\ \& \ \beta_2 = 0\)
- \(H_0: \beta_1 = 1\ \& \ \beta_2 = 0\)
Performing a series of simple hypothesis tests does not answer the question (joint distribution vs. two marginal distributions).
The test statistic is based on a restriction written in matrix form.
\[ y=\beta_0+x_1\beta_1 + x_2\beta_2 + x_3\beta_3 + \epsilon \]
The null hypothesis \(H_0: \beta_1 = 0\) & \(\beta_2=0\) can be rewritten as \(H_0: \mathbf{R}\beta -\mathbf{q}=0\) where
- \(\mathbf{R}\) is an \(m \times k\) matrix where \(m\) is the number of restrictions and \(k\) is the number of parameters. \(\mathbf{q}\) is an \(m \times 1\) vector
- \(\mathbf{R}\) “picks up” the relevant parameters while \(\mathbf{q}\) is the null value of the parameters
\[ \mathbf{R}= \left( \begin{array}{cccc} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{array} \right), \mathbf{q} = \left( \begin{array}{c} 0 \\ 0 \\ \end{array} \right) \]
Test Statistic for OLS estimator for a multiple hypothesis
\[ F = \frac{(\mathbf{R\hat{\beta}-q})'\hat{\Sigma}^{-1}(\mathbf{R\hat{\beta}-q})}{m} \sim^a F(m,n-k) \]
\(\hat{\Sigma}\) is the estimator for the asymptotic variance-covariance matrix of \(\mathbf{R}\hat{\beta}-\mathbf{q}\), and \(\hat{\Sigma}^{-1}\) is its inverse
When \(m = 1\), there is only a single restriction, and the \(F\)-statistic is the \(t\)-statistic squared.
The \(F\) distribution takes only nonnegative values; check F-Distribution for more details.
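The statistic can be assembled directly from \(\mathbf{R}\) and \(\mathbf{q}\). The sketch below assumes an illustrative model on the built-in `mtcars` data and uses the classical (homoskedastic) `vcov()`, in which case the result matches the F test from comparing restricted and unrestricted fits.

```r
# Illustrative model with k = 4 parameters: (Intercept), wt, hp, qsec
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
b <- coef(fit)

# H0: beta_wt = 0 & beta_hp = 0, written as R beta - q = 0
R <- rbind(c(0, 1, 0, 0),
           c(0, 0, 1, 0))
q_vec <- c(0, 0)
m <- nrow(R)  # number of restrictions

d <- R %*% b - q_vec                   # R beta_hat - q
Sigma_hat <- R %*% vcov(fit) %*% t(R)  # estimated covariance of R beta_hat - q

F_stat <- as.numeric(t(d) %*% solve(Sigma_hat) %*% d) / m
p_val <- pf(F_stat, m, df.residual(fit), lower.tail = FALSE)

# Cross-check against the restricted vs. unrestricted model comparison
fit_r <- lm(mpg ~ qsec, data = mtcars)
F_anova <- anova(fit_r, fit)$F[2]

c(F_stat = F_stat, F_anova = F_anova, p_val = p_val)
```

Replacing `vcov(fit)` with a heteroskedasticity-robust covariance estimate would give a robust version of the same statistic, which no longer matches `anova()`.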
14.2.2 Linear Combination
Testing a linear combination of multiple parameters at the same time
\[ \begin{aligned} H_0&: \beta_1 -\beta_2 = 0 \\ H_0&: \beta_1 - \beta_2 > 0 \\ H_0&: \beta_1 - 2\times\beta_2 =0 \end{aligned} \]
Each is a single restriction on a function of the parameters.
Null hypothesis:
\[ H_0: \beta_1 -\beta_2 = 0 \]
can be rewritten as
\[ H_0: \mathbf{R}\beta -\mathbf{q}=0 \]
where \(\mathbf{R} = (0 \ \ 1 \ \ {-1} \ \ 0 \ \ 0)\) and \(\mathbf{q}=0\)
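A single linear restriction reduces to a t-test on \(\mathbf{R}\hat{\beta} - \mathbf{q}\). The sketch below assumes an illustrative four-parameter model on `mtcars`, so `R` has four entries rather than the five shown above.

```r
# Illustrative model: (Intercept), wt, hp, qsec (assumption, not from the text)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

R <- c(0, 1, -1, 0)  # picks beta_wt - beta_hp
q0 <- 0              # null value of the linear combination

est <- sum(R * coef(fit)) - q0                     # R beta_hat - q
se <- sqrt(as.numeric(t(R) %*% vcov(fit) %*% R))   # SE of the combination
t_stat <- est / se
p_val <- 2 * pt(abs(t_stat), df.residual(fit), lower.tail = FALSE)

# With one restriction, F equals t squared: compare with the restricted fit
# that imposes beta_wt = beta_hp through a common coefficient on (wt + hp)
fit_r <- lm(mpg ~ I(wt + hp) + qsec, data = mtcars)
F_anova <- anova(fit_r, fit)$F[2]

c(est = est, se = se, t = t_stat, p = p_val, t_squared = t_stat^2, F = F_anova)
```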
14.2.3 Estimate Difference in Coefficients
There is no dedicated package for estimating the difference between two coefficients and its CI, but a simple function created by Katherine Zee can be used to calculate this difference. Some modifications might be needed if you don’t use a standard lm model in R.
difftest_lm <- function(x1, x2, model) {
  coefs <- coef(summary(model))
  # difference between the two coefficient estimates
  diffest <- coefs[x1, "Estimate"] - coefs[x2, "Estimate"]
  # variance of x1 + variance of x2 - 2*covariance of x1 and x2
  vardiff <- (coefs[x1, "Std. Error"]^2 + coefs[x2, "Std. Error"]^2) -
    (2 * vcov(model)[x1, x2])
  diffse <- sqrt(vardiff)
  tdiff <- diffest / diffse
  # residual degrees of freedom (explicit, instead of partial matching on model$df)
  df <- df.residual(model)
  # two-sided p-value from the t-distribution
  ptdiff <- 2 * pt(abs(tdiff), df, lower.tail = FALSE)
  # the t quantile will usually be very close to 1.96
  upr <- diffest + qt(.975, df = df) * diffse
  lwr <- diffest + qt(.025, df = df) * diffse
  return(list(
    est = round(diffest, digits = 2),
    t = round(tdiff, digits = 2),
    p = round(ptdiff, digits = 4),
    lwr = round(lwr, digits = 2),
    upr = round(upr, digits = 2),
    df = df
  ))
}
14.2.4 Application
library("car")
# Multiple hypothesis
mod.davis <- lm(weight ~ repwt, data=Davis)
linearHypothesis(mod.davis, c("(Intercept) = 0", "repwt = 1"), white.adjust = TRUE)
#> Linear hypothesis test
#>
#> Hypothesis:
#> (Intercept) = 0
#> repwt = 1
#>
#> Model 1: restricted model
#> Model 2: weight ~ repwt
#>
#> Note: Coefficient covariance matrix supplied.
#>
#>   Res.Df Df      F  Pr(>F)
#> 1    183
#> 2    181  2 3.3896 0.03588 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Linear Combination
mod.duncan <- lm(prestige ~ income + education, data=Duncan)
linearHypothesis(mod.duncan, "1*income - 1*education = 0")
#> Linear hypothesis test
#>
#> Hypothesis:
#> income - education = 0
#>
#> Model 1: restricted model
#> Model 2: prestige ~ income + education
#>
#>   Res.Df    RSS Df Sum of Sq      F Pr(>F)
#> 1     43 7518.9
#> 2     42 7506.7  1    12.195 0.0682 0.7952
14.2.5 Nonlinear
Suppose that we have q nonlinear functions of the parameters
\[
\mathbf{h}(\theta) = \{ h_1 (\theta), ..., h_q (\theta)\}'
\]
Then, the Jacobian matrix (\(\mathbf{H}(\theta)\)), of rank q, is
\[ \mathbf{H}_{q \times p}(\theta) = \left( \begin{array} {ccc} \frac{\partial h_1(\theta)}{\partial \theta_1} & ... & \frac{\partial h_1(\theta)}{\partial \theta_p} \\ . & . & . \\ \frac{\partial h_q(\theta)}{\partial \theta_1} & ... & \frac{\partial h_q(\theta)}{\partial \theta_p} \end{array} \right) \]
where the null hypothesis \(H_0: \mathbf{h} (\theta) = 0\) can be tested against the 2-sided alternative with the Wald statistic
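\[ W = \frac{\mathbf{h(\hat{\theta})'\{H(\hat{\theta})[F(\hat{\theta})'F(\hat{\theta})]^{-1}H(\hat{\theta})'\}^{-1}h(\hat{\theta})}}{s^2q} \sim F_{q,n-p} \]
where, as in nonlinear least squares, \(\mathbf{F}(\hat{\theta})\) is the \(n \times p\) matrix of partial derivatives of the mean function with respect to \(\theta\) evaluated at \(\hat{\theta}\), \(s^2\) is the estimated error variance, and the \(F_{q,n-p}\) reference distribution is approximate.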
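A rough sketch of this statistic for one nonlinear restriction (\(q = 1\)) on the ratio of two coefficients. To keep it short it assumes a linear mean function, so that the design matrix plays the role of \(\mathbf{F}(\hat{\theta})\); the model, the restriction, and the null ratio `r0` are all hypothetical.

```r
# Illustrative linear model (assumption): for a linear mean, F(theta_hat) = X
fit <- lm(mpg ~ wt + hp, data = mtcars)
theta <- coef(fit)
X <- model.matrix(fit)
n <- nrow(X)
p <- ncol(X)
s2 <- sum(resid(fit)^2) / (n - p)  # estimated error variance

# Hypothetical restriction h(theta) = beta_wt / beta_hp - r0 = 0
r0 <- 100
h <- matrix(unname(theta["wt"] / theta["hp"] - r0), ncol = 1)

# Jacobian of h with respect to (Intercept, beta_wt, beta_hp): 1 x p
H <- matrix(unname(c(0, 1 / theta["hp"], -theta["wt"] / theta["hp"]^2)), nrow = 1)
q_num <- nrow(H)  # number of restrictions

# Wald statistic and its approximate F(q, n - p) p-value
middle <- solve(H %*% solve(crossprod(X)) %*% t(H))
W <- as.numeric(t(h) %*% middle %*% h) / (s2 * q_num)
p_val <- pf(W, q_num, n - p, lower.tail = FALSE)

# Delta-method cross-check using vcov(fit) = s2 * (X'X)^{-1}
W_delta <- as.numeric(h)^2 / as.numeric(H %*% vcov(fit) %*% t(H))

c(W = W, W_delta = W_delta, p_val = p_val)
```

For a genuinely nonlinear model, `X` would be replaced by the matrix of derivatives of the mean function at the estimate.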