14.2 Wald test

\[ \begin{aligned} W &= (\hat{\theta}-\theta_0)'[cov(\hat{\theta})]^{-1}(\hat{\theta}-\theta_0) \\ W &\sim \chi_q^2 \end{aligned} \]

where \(cov(\hat{\theta})\) is given by the inverse Fisher information matrix evaluated at \(\hat{\theta}\), and \(q\) is the rank of \(cov(\hat{\theta})\), which is the number of non-redundant parameters in \(\theta\).

Alternatively,

\[ t_W=\frac{(\hat{\theta}-\theta_0)^2}{I(\theta_0)^{-1}} \sim \chi^2_{(v)} \]

where \(v\) is the degrees of freedom.

Equivalently,

\[ s_W= \frac{\hat{\theta}-\theta_0}{\sqrt{I(\hat{\theta})^{-1}}} \sim Z \]

The Wald statistic measures how far, in the sampling distribution, your sample estimate is from the hypothesized population parameter.

For a null value, what is the probability you would have obtained a realization “more extreme” or “worse” than the estimate you actually obtained?
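
As a concrete illustration (a sketch on the built-in mtcars data, not part of the original notes), the quadratic-form statistic above can be computed from any fitted model that provides coef() and vcov():

# Wald test of H0: both slope coefficients are zero (illustrative null values)
fit    <- lm(mpg ~ wt + hp, data = mtcars)
theta  <- coef(fit)[-1]                      # slope estimates (drop the intercept)
theta0 <- c(0, 0)                            # hypothesized values under H0
V      <- vcov(fit)[-1, -1]                  # estimated covariance matrix of the slopes
W      <- t(theta - theta0) %*% solve(V) %*% (theta - theta0)
pchisq(as.numeric(W), df = length(theta), lower.tail = FALSE)  # p-value from chi-square(q)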

Significance Level (\(\alpha\)) and Confidence Level (\(1-\alpha\))

  • The significance level is the benchmark probability: if the probability of a realization at least as extreme as ours under the null falls below it, we reject the null.
  • The confidence level is the probability that sets the bounds on how far away the realization of the estimator would have to be for us to reject the null.

Test Statistics

  • Standardize (transform) the estimator and null value into a test statistic that always has the same distribution
  • Test Statistic for the OLS estimator for a single hypothesis

\[ T = \frac{\sqrt{n}(\hat{\beta}_j-\beta_{j0})}{\sqrt{n}SE(\hat{\beta_j})} \sim^a N(0,1) \]

Equivalently,

\[ T = \frac{(\hat{\beta}_j-\beta_{j0})}{SE(\hat{\beta_j})} \sim^a N(0,1) \]

The test statistic is another random variable that is a function of the data and the null hypothesis (a realization pulled from an lm fit is sketched below).

  • T denotes the random variable test statistic
  • t denotes the single realization of the test statistic
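
For example (a sketch on the built-in mtcars data, not from the original notes), the realization \(t\) for a chosen null value \(\beta_{j0}\) can be computed directly from an lm fit:

m  <- lm(mpg ~ wt + hp, data = mtcars)
b  <- coef(summary(m))["wt", "Estimate"]     # beta_j hat
se <- coef(summary(m))["wt", "Std. Error"]   # SE(beta_j hat)
b0 <- 0                                      # hypothesized null value beta_j0
(b - b0) / se                                # realization t of the test statistic T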

Evaluating the test statistic: determine whether we reject or fail to reject the null hypothesis at a given significance / confidence level.

Three equivalent ways

  1. Critical Value

  2. P-value

  3. Confidence Interval

  1. Critical Value

For a given significance level, we can determine the critical value \((c)\); a numerical sketch follows the three cases below.

  • One-sided: \(H_0: \beta_j \ge \beta_{j0}\)

\[ P(T<c|H_0)=\alpha \]

Reject the null if \(t<c\)

  • One-sided: \(H_0: \beta_j \le \beta_{j0}\)

\[ P(T>c|H_0)=\alpha \]

Reject the null if \(t>c\)

  • Two-sided: \(H_0: \beta_j = \beta_{j0}\)

\[ P(|T|>c|H_0)=\alpha \]

Reject the null if \(|t|>c\)
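
A small numerical sketch (assuming the asymptotic standard normal reference distribution and \(\alpha = 0.05\); not from the original notes) of the three critical values:

alpha <- 0.05
qnorm(alpha)          # one-sided H0: beta_j >= beta_j0; reject if t  <  c (about -1.64)
qnorm(1 - alpha)      # one-sided H0: beta_j <= beta_j0; reject if t  >  c (about  1.64)
qnorm(1 - alpha / 2)  # two-sided  H0: beta_j  = beta_j0; reject if |t| > c (about  1.96)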

  2. P-value

Calculate the probability that the test statistic is "worse" (more extreme) than the realization you actually obtained; a numerical sketch follows the three cases below.

  • One-sided: \(H_0: \beta_j \ge \beta_{j0}\)

\[ \text{p-value} = P(T<t|H_0) \]

  • One-sided: \(H_0: \beta_j \le \beta_{j0}\)

\[ \text{p-value} = P(T>t|H_0) \]

  • Two-sided: \(H_0: \beta_j = \beta_{j0}\)

\[ \text{p-value} = P(|T|>|t| \mid H_0) \]

Reject the null if the p-value \(< \alpha\).
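
A companion sketch (again using the standard normal reference distribution; not from the original notes) of the three p-values for an arbitrary realization:

t_obs <- 2.1                               # some realization of the test statistic
pnorm(t_obs)                               # one-sided H0: beta_j >= beta_j0: P(T < t)
pnorm(t_obs, lower.tail = FALSE)           # one-sided H0: beta_j <= beta_j0: P(T > t)
2 * pnorm(abs(t_obs), lower.tail = FALSE)  # two-sided H0: beta_j = beta_j0: P(|T| > |t|)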

  3. Confidence Interval

Using the critical value associated with the null hypothesis and significance level, create an interval; a sketch built from an lm fit follows the bullets below.

\[ CI(\hat{\beta}_j)_{\alpha} = [\hat{\beta}_j-(c \times SE(\hat{\beta}_j)),\hat{\beta}_j+(c \times SE(\hat{\beta}_j))] \]

If the hypothesized (null) value lies outside the interval, then we reject the null.

  • We are not testing whether the true population value is close to the estimate; we are testing, given a fixed true population value of the parameter, how likely it is that we observed this estimate.
  • Can be interpreted as: we believe that with \((1-\alpha)\times 100 \%\) probability the confidence interval captures the true parameter value.
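
A sketch of the interval (built from an lm fit on the built-in mtcars data and checked against confint(); not from the original notes):

m    <- lm(mpg ~ wt + hp, data = mtcars)
b    <- coef(summary(m))["wt", "Estimate"]
se   <- coef(summary(m))["wt", "Std. Error"]
crit <- qt(0.975, df = df.residual(m))            # critical value for a two-sided 5% test
c(lower = b - crit * se, upper = b + crit * se)
confint(m, "wt", level = 0.95)                    # should give the same interval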

With stronger assumptions (A1-A6), we can consider finite-sample properties:

\[ T = \frac{\hat{\beta}_j-\beta_{j0}}{SE(\hat{\beta}_j)} \sim T(n-k) \]

  • The distributional result above depends strongly on A4 and A5.
  • \(T\) has a Student's t-distribution because the numerator is standard normal and the denominator is the square root of a \(\chi^2\) random variable divided by its degrees of freedom, independent of the numerator.
  • Critical values and p-values will be calculated from the Student's t-distribution rather than the standard normal distribution.
  • As \(n \to \infty\), \(T(n-k)\) is asymptotically standard normal.

Rule of thumb

  • if \(n-k>120\): the critical values and p-values from the t-distribution are (almost) the same as the critical values and p-values from the standard normal distribution.

  • if \(n-k<120\)

    • if (A1-A6) hold, then the t-test is an exact finite-sample test
    • if (A1-A3a, A5) hold, then because the t-distribution is asymptotically normal, computing the critical values from a t-distribution is still a valid asymptotic test (i.e., not quite the right critical values and p-values, but the difference goes away as \(n \to \infty\)); the sketch below compares the two sets of critical values
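
A comparison sketch (not from the original notes) of the two-sided 5% critical values from the t-distribution against the standard normal value for a few values of \(n - k\):

df <- c(10, 30, 120, 1000)
rbind(t_crit = qt(0.975, df),   # critical values from T(n - k)
      z_crit = qnorm(0.975))    # standard normal critical value
# the t critical values shrink toward 1.96 as n - k grows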

14.2.1 Multiple Hypothesis

  • test multiple parameters at the same time

    • \(H_0: \beta_1 = 0\ \& \ \beta_2 = 0\)
    • \(H_0: \beta_1 = 1\ \& \ \beta_2 = 0\)
  • performing a series of simple hypothesis tests does not answer the question (it relies on two marginal distributions rather than the joint distribution).

  • The test statistic is based on a restriction written in matrix form.

\[ y=\beta_0+x_1\beta_1 + x_2\beta_2 + x_3\beta_3 + \epsilon \]

The null hypothesis \(H_0: \beta_1 = 0\ \& \ \beta_2=0\) can be rewritten as \(H_0: \mathbf{R}\beta -\mathbf{q}=0\), where

  • \(\mathbf{R}\) is an \(m \times k\) matrix, where \(m\) is the number of restrictions and \(k\) is the number of parameters, and \(\mathbf{q}\) is an \(m \times 1\) vector
  • \(\mathbf{R}\) “picks up” the relevant parameters, while \(\mathbf{q}\) holds their null values (both are constructed in R code below)

\[ \mathbf{R}= \left( \begin{array}{cccc} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{array} \right), \mathbf{q} = \left( \begin{array}{c} 0 \\ 0 \\ \end{array} \right) \]
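
In R, the restriction matrix and null vector can be written down directly (a sketch, not from the original notes):

R <- rbind(c(0, 1, 0, 0),   # picks out beta_1
           c(0, 0, 1, 0))   # picks out beta_2
q <- c(0, 0)                # null values for the two restrictions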

Test Statistic for OLS estimator for a multiple hypothesis

\[ F = \frac{(\mathbf{R\hat{\beta}-q})'\hat{\Sigma}^{-1}(\mathbf{R\hat{\beta}-q})}{m} \sim^a F(m,n-k) \]

  • \(\hat{\Sigma}\) is the estimated (asymptotic) variance-covariance matrix of \(\mathbf{R}\hat{\beta}\) (a manual computation is sketched after this list)

    • if A4 holds, both the homoskedastic and heteroskedastic versions produce valid estimators
    • if A4 does not hold, only the heteroskedastic version produces valid estimators.
  • When \(m = 1\), there is only a single restriction, and the \(F\)-statistic is the \(t\)-statistic squared.

  • \(F\) distribution is strictly positive, check [F-Distribution] for more details.
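
A manual computation of this F statistic (a sketch on the built-in mtcars data with the homoskedastic variance estimate that is valid under A4; not from the original notes):

m <- lm(mpg ~ wt + hp + qsec, data = mtcars)  # k = 4 parameters
R <- rbind(c(0, 1, 0, 0),                     # H0: beta_wt = 0
           c(0, 0, 1, 0))                     #  &  beta_hp = 0
q <- c(0, 0)
Sigma_hat <- R %*% vcov(m) %*% t(R)           # estimated var-cov matrix of R %*% beta_hat
Fstat <- t(R %*% coef(m) - q) %*% solve(Sigma_hat) %*% (R %*% coef(m) - q) / nrow(R)
pf(as.numeric(Fstat), df1 = nrow(R), df2 = df.residual(m), lower.tail = FALSE)
# matches car::linearHypothesis(m, c("wt = 0", "hp = 0"))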

14.2.2 Linear Combination

Testing a linear combination of multiple parameters at the same time, for example:

\[ \begin{aligned} H_0&: \beta_1 -\beta_2 = 0 \\ H_0&: \beta_1 - \beta_2 > 0 \\ H_0&: \beta_1 - 2\times\beta_2 =0 \end{aligned} \]

Each is a single restriction on a function of the parameters.

Null hypothesis:

\[ H_0: \beta_1 -\beta_2 = 0 \]

can be rewritten as

\[ H_0: \mathbf{R}\beta -\mathbf{q}=0 \]

where \(\mathbf{R} = (0 \quad 1 \quad {-1} \quad 0 \quad 0)\) and \(\mathbf{q}=0\)

14.2.3 Estimate Difference in Coefficients

There is no package that directly estimates the difference between two coefficients and its CI, but a simple function created by Katherine Zee can be used to calculate this difference. Some modifications might be needed if you don’t use a standard lm model in R.

difftest_lm <- function(x1, x2, model) {
    # difference between the two coefficient estimates
    diffest <-
        summary(model)$coef[x1, "Estimate"] - summary(model)$coef[x2, "Estimate"]
    
    # variance of the difference:
    # variance of x1 + variance of x2 - 2*covariance of x1 and x2
    vardiff <- (summary(model)$coef[x1, "Std. Error"] ^ 2 +
                    summary(model)$coef[x2, "Std. Error"] ^ 2) - (2 * (vcov(model)[x1, x2]))
    diffse <- sqrt(vardiff)
    tdiff <- diffest / diffse
    # two-sided p-value from the t-distribution with residual degrees of freedom
    ptdiff <- 2 * pt(abs(tdiff), model$df.residual, lower.tail = FALSE)
    # 95% confidence interval; qt(.975, df) will usually be very close to 1.96
    upr <- diffest + qt(.975, df = model$df.residual) * diffse
    lwr <- diffest + qt(.025, df = model$df.residual) * diffse
    df <- model$df.residual
    return(list(
        est = round(diffest, digits = 2),
        t = round(tdiff, digits = 2),
        p = round(ptdiff, digits = 4),
        lwr = round(lwr, digits = 2),
        upr = round(upr, digits = 2),
        df = df
    ))
}
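
A hypothetical call (not from the original notes), using a standard lm fit on the built-in mtcars data:

mod <- lm(mpg ~ wt + hp + qsec, data = mtcars)
difftest_lm("wt", "hp", mod)  # estimate, t, p-value, and 95% CI for beta_wt - beta_hp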

14.2.4 Application

library("car")

# Multiple hypothesis
mod.davis <- lm(weight ~ repwt, data=Davis)
linearHypothesis(mod.davis, c("(Intercept) = 0", "repwt = 1"),white.adjust = TRUE)
#> Linear hypothesis test
#> 
#> Hypothesis:
#> (Intercept) = 0
#> repwt = 1
#> 
#> Model 1: restricted model
#> Model 2: weight ~ repwt
#> 
#> Note: Coefficient covariance matrix supplied.
#> 
#>   Res.Df Df      F  Pr(>F)  
#> 1    183                    
#> 2    181  2 3.3896 0.03588 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Linear Combination
mod.duncan <- lm(prestige ~ income + education, data=Duncan)
linearHypothesis(mod.duncan, "1*income - 1*education = 0")
#> Linear hypothesis test
#> 
#> Hypothesis:
#> income - education = 0
#> 
#> Model 1: restricted model
#> Model 2: prestige ~ income + education
#> 
#>   Res.Df    RSS Df Sum of Sq      F Pr(>F)
#> 1     43 7518.9                           
#> 2     42 7506.7  1    12.195 0.0682 0.7952

14.2.5 Nonlinear

Suppose that we have q nonlinear functions of the parameters
\[ \mathbf{h}(\theta) = \{ h_1 (\theta), ..., h_q (\theta)\}' \]

Then, the Jacobian matrix (\(\mathbf{H}(\theta)\)), of rank \(q\), is

\[ \mathbf{H}_{q \times p}(\theta) = \left( \begin{array} {ccc} \frac{\partial h_1(\theta)}{\partial \theta_1} & \dots & \frac{\partial h_1(\theta)}{\partial \theta_p} \\ \vdots & \ddots & \vdots \\ \frac{\partial h_q(\theta)}{\partial \theta_1} & \dots & \frac{\partial h_q(\theta)}{\partial \theta_p} \end{array} \right) \]

The null hypothesis \(H_0: \mathbf{h} (\theta) = 0\) can then be tested against the two-sided alternative with the Wald statistic

\[ W = \frac{\mathbf{h(\hat{\theta})'\{H(\hat{\theta})[F(\hat{\theta})'F(\hat{\theta})]^{-1}H(\hat{\theta})'\}^{-1}h(\hat{\theta})}}{s^2q} \sim F_{q,n-p} \]
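
In practice, a Wald-type test of a single nonlinear function of the parameters can also be obtained via the delta method, e.g., with car::deltaMethod (a sketch reusing the Duncan model from the Application section; not from the original notes):

library("car")   # for deltaMethod(); already loaded in the Application section
mod.duncan <- lm(prestige ~ income + education, data = Duncan)
# h(theta) = income / education, a nonlinear function of the coefficients
deltaMethod(mod.duncan, "income / education")
# reject H0: h(theta) = h0 at level alpha if h0 falls outside the reported CI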