## 14.2 Wald test

\begin{aligned} W &= (\hat{\theta}-\theta_0)'[cov(\hat{\theta})]^{-1}(\hat{\theta}-\theta_0) \\ W &\sim \chi_q^2 \end{aligned}

where $$cov(\hat{\theta})$$ is given by the inverse Fisher information matrix evaluated at $$\hat{\theta}$$, and $$q$$ is the rank of $$cov(\hat{\theta})$$, that is, the number of non-redundant parameters in $$\theta$$.

Alternatively,

$t_W=\frac{(\hat{\theta}-\theta_0)^2}{I(\theta_0)^{-1}} \sim \chi^2_{(v)}$

where $$v$$ is the degrees of freedom ($$v = 1$$ for a scalar $$\theta$$).

Equivalently,

$s_W= \frac{\hat{\theta}-\theta_0}{\sqrt{I(\hat{\theta})^{-1}}} \sim Z$

The statistic measures how far, relative to its sampling distribution, the sample estimate lies from the hypothesized population parameter.

For a given null value, what is the probability of obtaining a realization “more extreme” (or “worse”) than the estimate actually obtained?
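As a minimal sketch of the scalar case, the two forms above can be computed by hand. The data, the Poisson model, and the null value `lambda0` are assumptions made for illustration only, and the information is evaluated at the estimate $$\hat{\theta}$$ in both forms.

```r
# Hypothetical data: counts assumed to be i.i.d. Poisson(lambda)
x <- c(3, 5, 2, 4, 6, 3, 4, 5, 2, 6)
n <- length(x)
lambda0 <- 3                       # null value (assumed for illustration)

lambda_hat <- mean(x)              # MLE of the Poisson rate
info_hat <- n / lambda_hat         # Fisher information of the sample at the MLE
var_hat <- 1 / info_hat            # estimated variance of the MLE

# Chi-square form (1 degree of freedom) and standard normal form
W <- (lambda_hat - lambda0)^2 / var_hat
s_W <- (lambda_hat - lambda0) / sqrt(var_hat)

# Both forms give the same two-sided p-value
p_chisq <- pchisq(W, df = 1, lower.tail = FALSE)
p_norm <- 2 * pnorm(abs(s_W), lower.tail = FALSE)
```

Because `s_W^2` equals `W`, the chi-square and normal versions lead to the same decision for a two-sided test.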

Significance Level ($$\alpha$$) and Confidence Level ($$1-\alpha$$)

• The significance level is the threshold probability: if, under the null, the probability of a result at least as extreme as the one observed falls below it, we reject the null
• The confidence level is the probability that sets the bounds on how far the realization of the estimator must be from the null value for us to reject the null.

Test Statistics

• Standardize (transform) the estimator and null value into a test statistic that always has the same distribution
• Test Statistic for the OLS estimator for a single hypothesis

$T = \frac{\sqrt{n}(\hat{\beta}_j-\beta_{j0})}{\sqrt{n}SE(\hat{\beta}_j)} \sim^a N(0,1)$

Equivalently,

$T = \frac{(\hat{\beta}_j-\beta_{j0})}{SE(\hat{\beta}_j)} \sim^a N(0,1)$

The test statistic is itself a random variable: it is a function of the data and the null hypothesis.

• T denotes the random variable test statistic
• t denotes the single realization of the test statistic
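A short sketch of the realized statistic $$t$$ for a single OLS coefficient follows. The `mtcars` data set (shipped with base R), the model, and the null value are illustrative assumptions, not part of the original text.

```r
# Illustrative model on a built-in data set (an assumption for this sketch)
fit <- lm(mpg ~ wt + hp, data = mtcars)

beta_hat <- coef(fit)["wt"]                 # estimate of beta_j
se_hat <- sqrt(diag(vcov(fit)))["wt"]       # SE(beta_hat_j)
beta_null <- 0                              # null value beta_j0

# Realization t of the test statistic T
t_stat <- unname((beta_hat - beta_null) / se_hat)

# summary() reports the same statistic for the null value 0
t_summary <- summary(fit)$coefficients["wt", "t value"]
```

For a null value other than 0, only `beta_null` changes; `summary()` always reports the statistic for a null of 0.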

Evaluating the Test Statistic: determine whether we reject or fail to reject the null hypothesis at a given significance / confidence level

Three equivalent ways

1. Critical Value

2. P-value

3. Confidence Interval

1. Critical Value

A given significance level determines the critical value $$(c)$$

• One-sided: $$H_0: \beta_j \ge \beta_{j0}$$

$P(T<c|H_0)=\alpha$

Reject the null if $$t<c$$

• One-sided: $$H_0: \beta_j \le \beta_{j0}$$

$P(T>c|H_0)=\alpha$

Reject the null if $$t>c$$

• Two-sided: $$H_0: \beta_j = \beta_{j0}$$ (against $$H_1: \beta_j \neq \beta_{j0}$$)

$P(|T|>c|H_0)=\alpha$

Reject the null if $$|t|>c$$
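The three rejection rules can be sketched with the standard normal quantile function. The significance level and the realized statistic `t_obs` are hypothetical values chosen for illustration.

```r
alpha <- 0.05

c_lower <- qnorm(alpha)           # H0: beta_j >= beta_j0, reject if t < c_lower
c_upper <- qnorm(1 - alpha)       # H0: beta_j <= beta_j0, reject if t > c_upper
c_two <- qnorm(1 - alpha / 2)     # H0: beta_j = beta_j0, reject if |t| > c_two

t_obs <- 2.1                      # hypothetical realized test statistic

reject <- c(
  lower = t_obs < c_lower,
  upper = t_obs > c_upper,
  two_sided = abs(t_obs) > c_two
)
```

Under the finite-sample assumptions, `qt()` with `n - k` degrees of freedom would replace `qnorm()`.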

2. p-value

Calculate the probability, under the null, that the test statistic is more extreme than the realization you have

• One-sided: $$H_0: \beta_j \ge \beta_{j0}$$

$\text{p-value} = P(T<t|H_0)$

• One-sided: $$H_0: \beta_j \le \beta_{j0}$$

$\text{p-value} = P(T>t|H_0)$

• Two-sided: $$H_0: \beta_j = \beta_{j0}$$ (against $$H_1: \beta_j \neq \beta_{j0}$$)

$\text{p-value} = P(|T|>|t||H_0)$

Reject the null if p-value $$< \alpha$$
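A minimal sketch of the three p-values under the asymptotic standard normal distribution; `t_obs` and `alpha` are hypothetical values.

```r
t_obs <- 2.1    # hypothetical realized test statistic
alpha <- 0.05

p_lower <- pnorm(t_obs)                              # P(T < t | H0)
p_upper <- pnorm(t_obs, lower.tail = FALSE)          # P(T > t | H0)
p_two <- 2 * pnorm(abs(t_obs), lower.tail = FALSE)   # P(|T| > |t| | H0)

# The p-value rule and the critical-value rule give the same decision
reject_p <- p_two < alpha
reject_c <- abs(t_obs) > qnorm(1 - alpha / 2)
```

The two-sided p-value is twice the smaller one-sided tail because the standard normal distribution is symmetric.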

3. Confidence Interval

Using the critical value associated with a null hypothesis and significance level, create an interval

$CI(\hat{\beta}_j)_{\alpha} = [\hat{\beta}_j-(c \times SE(\hat{\beta}_j)),\hat{\beta}_j+(c \times SE(\hat{\beta}_j))]$

If the null value lies outside the interval, then we reject the null.

• We are not testing whether the true population value is close to the estimate; we are testing, given a fixed true population value of the parameter, how likely it is that we would observe this estimate.
• Can be interpreted as: in repeated samples, $$(1-\alpha)\times 100 \%$$ of intervals constructed this way capture the true parameter value.
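The interval and its link to the two-sided test can be sketched as follows. The `mtcars` model and the null value are illustrative assumptions; `confint.default()` from base R is used only as a cross-check because it also builds a normal-based interval.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # illustrative model (assumption)
alpha <- 0.05

est <- coef(fit)["wt"]
se <- sqrt(diag(vcov(fit)))["wt"]
c_two <- qnorm(1 - alpha / 2)             # two-sided critical value

# Interval: estimate plus or minus c times the standard error
ci <- unname(c(est - c_two * se, est + c_two * se))

# Base R's normal-based interval for comparison
ci_check <- unname(confint.default(fit, "wt", level = 1 - alpha)[1, ])

# The null value lies outside the interval exactly when |t| > c
beta_null <- 0
outside <- beta_null < ci[1] || beta_null > ci[2]
t_stat <- unname((est - beta_null) / se)
```

Calling `confint(fit)` instead would use t-distribution critical values with the residual degrees of freedom, which gives a slightly wider interval in small samples.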

With stronger assumptions (A1-A6), we could consider Finite Sample Properties

$T = \frac{\hat{\beta}_j-\beta_{j0}}{SE(\hat{\beta}_j)} \sim T(n-k)$

• The distributional derivation above depends strongly on A4 and A5
• T has a Student t-distribution because the numerator is normal and the denominator is the square root of an independent $$\chi^2$$ variable divided by its degrees of freedom.
• Critical values and p-values are calculated from the Student t-distribution rather than the standard normal distribution.
• As $$n \to \infty$$, $$T(n-k)$$ is asymptotically standard normal.

Rule of thumb

• if $$n-k>120$$: the critical values and p-values from the t-distribution are (almost) the same as the critical values and p-values from the standard normal distribution.

• if $$n-k<120$$

• if (A1-A6) hold, then the t-test is an exact finite-sample test
• if (A1-A3a, A5) hold, then because the t-distribution is asymptotically normal, computing the critical values from a t-distribution is still a valid asymptotic test (i.e., the critical values and p-values are not quite right, but the difference goes away as $$n \to \infty$$)
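The rule of thumb can be checked numerically with a small sketch that compares two-sided 5% critical values; the degrees of freedom listed are arbitrary illustrative choices.

```r
df <- c(5, 30, 120, 1000)        # illustrative values of n - k

crit_t <- qt(0.975, df = df)     # Student t critical values
crit_z <- qnorm(0.975)           # standard normal critical value

comparison <- data.frame(
  df = df,
  t_critical = round(crit_t, 3),
  gap_to_normal = round(crit_t - crit_z, 3)
)
print(comparison)
```

The gap shrinks monotonically as the degrees of freedom grow, which is why the choice between the two distributions matters little once $$n-k$$ is large.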

### 14.2.1 Multiple Hypothesis

• Test multiple parameters at the same time

• $$H_0: \beta_1 = 0\ \& \ \beta_2 = 0$$
• $$H_0: \beta_1 = 1\ \& \ \beta_2 = 0$$
• Performing a series of simple (single-parameter) hypothesis tests does not answer the question, because the joint test depends on the joint distribution of the estimators, not on the two marginal distributions.

• The test statistic is based on a restriction written in matrix form.

$y=\beta_0+x_1\beta_1 + x_2\beta_2 + x_3\beta_3 + \epsilon$

The null hypothesis $$H_0: \beta_1 = 0$$ & $$\beta_2=0$$ can be rewritten as $$H_0: \mathbf{R}\beta -\mathbf{q}=0$$ where

• $$\mathbf{R}$$ is an $$m \times k$$ matrix, where $$m$$ is the number of restrictions and $$k$$ is the number of parameters. $$\mathbf{q}$$ is an $$m \times 1$$ vector
• $$\mathbf{R}$$ “picks up” the relevant parameters, while $$\mathbf{q}$$ holds the null values of those parameters

$\mathbf{R}= \left( \begin{array}{cccc} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{array} \right), \mathbf{q} = \left( \begin{array}{c} 0 \\ 0 \\ \end{array} \right)$

Test Statistic for OLS estimator for a multiple hypothesis

$F = \frac{(\mathbf{R\hat{\beta}-q})'\hat{\Sigma}^{-1}(\mathbf{R\hat{\beta}-q})}{m} \sim^a F(m,n-k)$

• $$\hat{\Sigma}$$ is the estimator of the asymptotic variance-covariance matrix of $$\mathbf{R\hat{\beta}-q}$$, and $$\hat{\Sigma}^{-1}$$ is its inverse

• If A4 holds, both the homoskedastic and heteroskedastic versions produce valid estimators
• If A4 does not hold, only the heteroskedastic version produces valid estimators.
• When $$m = 1$$, there is only a single restriction, then the $$F$$-statistic is the $$t$$-statistic squared.

• $$F$$ distribution is strictly positive, check F-Distribution for more details.
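The matrix form of the joint test can be sketched by hand in base R. The `mtcars` model is an illustrative assumption, and the homoskedastic version of the variance estimator is used, so the result can be cross-checked against the usual restricted-versus-unrestricted comparison from `anova()`.

```r
# Illustrative model with an intercept and three regressors (assumption)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
b <- coef(fit)                        # order: (Intercept), wt, hp, qsec

# H0: beta_wt = 0 and beta_hp = 0, written as R b - q = 0
R <- rbind(c(0, 1, 0, 0),
           c(0, 0, 1, 0))
q_vec <- c(0, 0)
m <- nrow(R)                          # number of restrictions

d <- R %*% b - q_vec
Sigma_hat <- R %*% vcov(fit) %*% t(R) # homoskedastic version
F_stat <- as.numeric(t(d) %*% solve(Sigma_hat) %*% d) / m
p_val <- pf(F_stat, df1 = m, df2 = df.residual(fit), lower.tail = FALSE)

# Cross-check: compare the restricted and unrestricted fits
fit_r <- lm(mpg ~ qsec, data = mtcars)
F_anova <- anova(fit_r, fit)$F[2]
```

Replacing `vcov(fit)` with a heteroskedasticity-robust covariance estimate would give the robust version of the statistic, which no longer matches the `anova()` value in general.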

### 14.2.2 Linear Combination

Testing multiple parameters at the same time

\begin{aligned} H_0&: \beta_1 -\beta_2 = 0 \\ H_0&: \beta_1 - \beta_2 > 0 \\ H_0&: \beta_1 - 2\times\beta_2 =0 \end{aligned}

Each is a single restriction on a function of the parameters.

Null hypothesis:

$H_0: \beta_1 -\beta_2 = 0$

can be rewritten as

$H_0: \mathbf{R}\beta -\mathbf{q}=0$

where $$\mathbf{R} = (0\ 1\ -1\ 0)$$ for the four-parameter model above, and $$\mathbf{q}=0$$
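With a single restriction, the test reduces to a t-test on $$\mathbf{R}\hat{\beta} - \mathbf{q}$$. The sketch below assumes an illustrative `mtcars` model with an intercept and three regressors, so that `R` has four entries.

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)  # illustrative model (assumption)
V <- vcov(fit)

R <- c(0, 1, -1, 0)      # picks beta_wt - beta_hp
q0 <- 0                  # null value of the linear combination

est <- sum(R * coef(fit)) - q0
se <- sqrt(as.numeric(t(R) %*% V %*% R))        # sqrt of R V R'
t_stat <- est / se
p_val <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

# R V R' expands to var(b1) + var(b2) - 2 cov(b1, b2)
var_expanded <- V["wt", "wt"] + V["hp", "hp"] - 2 * V["wt", "hp"]
```

Changing `R` to `c(0, 1, -2, 0)` would test $$\beta_1 - 2\times\beta_2 = 0$$ with the same code.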

### 14.2.3 Estimate Difference in Coefficients

There is no package to estimate the difference between two coefficients and its CI, but a simple function created by Katherine Zee can be used to calculate this difference. Some modifications might be needed if you don't use a standard lm model in R.

difftest_lm <- function(x1, x2, model) {
    coefs <- summary(model)$coef
    # point estimate of the difference between the two coefficients
    diffest <- coefs[x1, "Estimate"] - coefs[x2, "Estimate"]

    # variance of x1 + variance of x2 - 2*covariance of x1 and x2
    vardiff <- (coefs[x1, "Std. Error"] ^ 2 + coefs[x2, "Std. Error"] ^ 2) -
        (2 * (vcov(model)[x1, x2]))
    diffse <- sqrt(vardiff)
    tdiff <- (diffest) / (diffse)

    # residual degrees of freedom of the lm fit
    df <- model$df.residual
    ptdiff <- 2 * (1 - pt(abs(tdiff), df, lower.tail = TRUE))
    # qt(.975, df) will usually be very close to 1.96
    upr <- diffest + qt(.975, df = df) * diffse
    lwr <- diffest + qt(.025, df = df) * diffse
    return(list(
        est = round(diffest, digits = 2),
        t = round(tdiff, digits = 2),
        p = round(ptdiff, digits = 4),
        lwr = round(lwr, digits = 2),
        upr = round(upr, digits = 2),
        df = df
    ))
}

### 14.2.4 Application

library("car")

# Multiple hypothesis
mod.davis <- lm(weight ~ repwt, data=Davis)
linearHypothesis(mod.davis, c("(Intercept) = 0", "repwt = 1"),white.adjust = TRUE)
#> Linear hypothesis test
#>
#> Hypothesis:
#> (Intercept) = 0
#> repwt = 1
#>
#> Model 1: restricted model
#> Model 2: weight ~ repwt
#>
#> Note: Coefficient covariance matrix supplied.
#>
#>   Res.Df Df      F  Pr(>F)
#> 1    183
#> 2    181  2 3.3896 0.03588 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Linear Combination
mod.duncan <- lm(prestige ~ income + education, data=Duncan)
linearHypothesis(mod.duncan, "1*income - 1*education = 0")
#> Linear hypothesis test
#>
#> Hypothesis:
#> income - education = 0
#>
#> Model 1: restricted model
#> Model 2: prestige ~ income + education
#>
#>   Res.Df    RSS Df Sum of Sq      F Pr(>F)
#> 1     43 7518.9
#> 2     42 7506.7  1    12.195 0.0682 0.7952

### 14.2.5 Nonlinear

Suppose that we have q nonlinear functions of the parameters
$\mathbf{h}(\theta) = \{ h_1 (\theta), ..., h_q (\theta)\}'$

Then the Jacobian matrix $$\mathbf{H}(\theta)$$, of rank $$q$$, is

$\mathbf{H}_{q \times p}(\theta) = \left( \begin{array} {ccc} \frac{\partial h_1(\theta)}{\partial \theta_1} & ... & \frac{\partial h_1(\theta)}{\partial \theta_p} \\ . & . & . \\ \frac{\partial h_q(\theta)}{\partial \theta_1} & ... & \frac{\partial h_q(\theta)}{\partial \theta_p} \end{array} \right)$

where the null hypothesis $$H_0: \mathbf{h} (\theta) = 0$$ can be tested against the 2-sided alternative with the Wald statistic

$W = \frac{\mathbf{h(\hat{\theta})'\{H(\hat{\theta})[F(\hat{\theta})'F(\hat{\theta})]^{-1}H(\hat{\theta})'\}^{-1}h(\hat{\theta})}}{s^2q} \sim F_{q,n-p}$
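A rough sketch of this statistic for one nonlinear restriction follows. It assumes that $$\mathbf{F}(\hat{\theta})$$ denotes the $$n \times p$$ Jacobian of the regression function with respect to $$\theta$$, which for a linear model is simply the design matrix, and that $$s^2$$ is the residual variance estimate. The `mtcars` model, the ratio restriction, and the null value 100 are all hypothetical.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # illustrative model (assumption)
theta <- coef(fit)
X <- model.matrix(fit)                    # plays the role of F(theta_hat) here
n <- nrow(X)
p <- ncol(X)
s2 <- sum(residuals(fit)^2) / (n - p)     # residual variance estimate

# Hypothetical restriction h(theta) = beta_wt / beta_hp - 100 = 0
h <- function(th) unname(th["wt"] / th["hp"] - 100)
# Its 1 x p Jacobian with respect to (Intercept, wt, hp)
H <- function(th) {
  matrix(c(0, 1 / th["hp"], -th["wt"] / th["hp"]^2), nrow = 1)
}

q_dim <- 1                                # number of restrictions
h_hat <- h(theta)
H_hat <- H(theta)

middle <- H_hat %*% solve(crossprod(X)) %*% t(H_hat)
W <- as.numeric(t(h_hat) %*% solve(middle) %*% h_hat) / (s2 * q_dim)
p_val <- pf(W, q_dim, n - p, lower.tail = FALSE)
```

In this linear setting `s2 * solve(crossprod(X))` is the usual OLS covariance matrix, so the statistic is the delta-method Wald test; for a model fitted by nonlinear least squares, `X` would be replaced by the Jacobian of the mean function evaluated at the estimate.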