14.2 Wald test

\[ \begin{aligned} W &= (\hat{\theta}-\theta_0)'[cov(\hat{\theta})]^{-1}(\hat{\theta}-\theta_0) \\ W &\sim \chi_q^2 \end{aligned} \]

where \(cov(\hat{\theta})\) is the inverse Fisher information matrix evaluated at \(\hat{\theta}\), and \(q\) is the rank of \(cov(\hat{\theta})\), i.e., the number of non-redundant parameters in \(\theta\).

Alternatively, for a scalar parameter,

\[ t_W=\frac{(\hat{\theta}-\theta_0)^2}{I(\hat{\theta})^{-1}} \sim \chi^2_{(v)} \]

where \(v\) is the degrees of freedom (\(v = 1\) for a single parameter).

Equivalently,

\[ s_W= \frac{\hat{\theta}-\theta_0}{\sqrt{I(\hat{\theta})^{-1}}} \sim Z \]

The test statistic measures how far, in terms of its sampling distribution, the sample estimate is from the hypothesized population parameter.

For a given null value, what is the probability of obtaining a realization “more extreme” or “worse” than the estimate actually obtained?
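As a minimal sketch of the scalar Wald statistic, consider hypothetical Bernoulli data (the counts and the null value 0.5 below are made up for illustration). For \(n\) Bernoulli trials the Fisher information is \(n/[p(1-p)]\), here evaluated at the MLE. The chi-square form and the standard normal form should give the same two-sided p-value.

```r
# Hypothetical data: 100 Bernoulli trials with 62 successes
x <- c(rep(1, 62), rep(0, 38))
n <- length(x)
p_hat <- mean(x)   # MLE of the success probability
p_0 <- 0.5         # null value (assumed for illustration)

# Fisher information for n Bernoulli trials, evaluated at the MLE
info_hat <- n / (p_hat * (1 - p_hat))

s_w <- (p_hat - p_0) / sqrt(1 / info_hat)  # standard normal form
t_w <- (p_hat - p_0)^2 / (1 / info_hat)    # chi-square form with 1 df

# Two-sided p-values from both forms
p_norm <- 2 * pnorm(abs(s_w), lower.tail = FALSE)
p_chisq <- pchisq(t_w, df = 1, lower.tail = FALSE)
c(s_w = s_w, t_w = t_w, p_norm = p_norm, p_chisq = p_chisq)
```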

Significance Level (\(\alpha\)) and Confidence Level (\(1-\alpha\))

  • The significance level is the probability threshold below which an outcome is considered so unlikely under the null that we reject the null.
  • The confidence level is the probability that sets the bounds on how far the realization of the estimator must be from the null value before we reject the null.

Test Statistics

  • Standardize (transform) the estimator and null value into a test statistic that always has the same distribution
  • Test Statistic for the OLS estimator for a single hypothesis

\[ T = \frac{\sqrt{n}(\hat{\beta}_j-\beta_{j0})}{\sqrt{n}SE(\hat{\beta}_j)} \sim^a N(0,1) \]

Equivalently,

\[ T = \frac{(\hat{\beta}_j-\beta_{j0})}{SE(\hat{\beta}_j)} \sim^a N(0,1) \]

The test statistic is another random variable that is a function of the data and the null hypothesis.

  • T denotes the random variable test statistic
  • t denotes the single realization of the test statistic
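A short sketch of computing the realization \(t\) for one OLS coefficient. The `mtcars` data, the model, and the null value of -3 are assumptions chosen only for illustration. With a null value of 0, the result should reproduce the `t value` column of `summary()`.

```r
# mtcars ships with base R; the model is purely illustrative
fit <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- summary(fit)$coefficients

beta_hat <- coefs["wt", "Estimate"]
se_hat <- coefs["wt", "Std. Error"]

# H0: beta_wt = 0 reproduces the t value reported by summary()
t_zero <- (beta_hat - 0) / se_hat

# H0: beta_wt = -3, a hypothetical null value
t_alt <- (beta_hat - (-3)) / se_hat
c(t_zero = t_zero, t_alt = t_alt)
```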

Evaluating the Test Statistic: decide whether to reject or fail to reject the null hypothesis at a given significance / confidence level

Three equivalent ways

  1. Critical Value

  2. P-value

  3. Confidence Interval

  1. Critical Value

For a given significance level, determine the critical value \((c)\)

  • One-sided: \(H_0: \beta_j \ge \beta_{j0}\)

\[ P(T<c|H_0)=\alpha \]

Reject the null if \(t<c\)

  • One-sided: \(H_0: \beta_j \le \beta_{j0}\)

\[ P(T>c|H_0)=\alpha \]

Reject the null if \(t>c\)

  • Two-sided: \(H_0: \beta_j = \beta_{j0}\)

\[ P(|T|>c|H_0)=\alpha \]

Reject the null if \(|t|>c\)
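The three rejection rules above can be sketched with standard normal critical values from `qnorm()`. The significance level and the realized statistic `t_obs` are hypothetical values chosen for illustration.

```r
alpha <- 0.05

c_lower <- qnorm(alpha)          # H0: beta_j >= beta_j0, reject if t < c
c_upper <- qnorm(1 - alpha)      # H0: beta_j <= beta_j0, reject if t > c
c_two <- qnorm(1 - alpha / 2)    # H0: beta_j = beta_j0, reject if |t| > c

t_obs <- -2.1  # hypothetical realized test statistic

reject <- c(
    lower = t_obs < c_lower,
    upper = t_obs > c_upper,
    two_sided = abs(t_obs) > c_two
)
reject
```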

  2. p-value

Calculate the probability, under the null, that the test statistic is more extreme than the realization you observed

  • One-sided: \(H_0: \beta_j \ge \beta_{j0}\)

\[ \text{p-value} = P(T<t|H_0) \]

  • One-sided: \(H_0: \beta_j \le \beta_{j0}\)

\[ \text{p-value} = P(T>t|H_0) \]

  • Two-sided: \(H_0: \beta_j = \beta_{j0}\)

\[ \text{p-value} = P(|T|>|t||H_0) \]

Reject the null if p-value \(< \alpha\)
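A minimal sketch of the three p-value calculations under the asymptotic standard normal distribution, using `pnorm()`. The realized statistic `t_obs` is a hypothetical value.

```r
alpha <- 0.05
t_obs <- -2.1  # hypothetical realized test statistic

p_lower <- pnorm(t_obs)                             # H0: beta_j >= beta_j0
p_upper <- pnorm(t_obs, lower.tail = FALSE)         # H0: beta_j <= beta_j0
p_two <- 2 * pnorm(abs(t_obs), lower.tail = FALSE)  # H0: beta_j = beta_j0

p_values <- c(lower = p_lower, upper = p_upper, two_sided = p_two)
p_values < alpha  # TRUE means the null is rejected at level alpha
```

Using `lower.tail = FALSE` instead of `1 - pnorm(...)` avoids loss of precision for very small tail probabilities.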

  3. Confidence Interval

Using the critical value associated with a null hypothesis and significance level, create an interval

\[ CI(\hat{\beta}_j)_{\alpha} = [\hat{\beta}_j-(c \times SE(\hat{\beta}_j)),\hat{\beta}_j+(c \times SE(\hat{\beta}_j))] \]

If the null value lies outside the interval, then we reject the null.

  • We are not testing whether the true population value is close to the estimate; we are testing, given a fixed true population value of the parameter, how likely it is that we would observe this estimate.
  • Can be interpreted as follows: in repeated sampling, \((1-\alpha)\times 100 \%\) of the confidence intervals constructed this way capture the true parameter value.
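The interval above can be built by hand, and its decision compared with the two-sided critical value rule. This is a sketch that assumes the `mtcars` data, an illustrative model, and a null value of 0. It uses the asymptotic normal critical value, whereas `confint()` on an `lm` fit uses the t-distribution, so the two intervals differ slightly in small samples.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- summary(fit)$coefficients

alpha <- 0.05
beta_hat <- coefs["wt", "Estimate"]
se_hat <- coefs["wt", "Std. Error"]
c_two <- qnorm(1 - alpha / 2)  # asymptotic two-sided critical value

ci <- c(lower = beta_hat - c_two * se_hat,
        upper = beta_hat + c_two * se_hat)

beta_0 <- 0  # hypothetical null value
reject_ci <- beta_0 < ci[["lower"]] || beta_0 > ci[["upper"]]
reject_cv <- abs((beta_hat - beta_0) / se_hat) > c_two
c(reject_ci = reject_ci, reject_cv = reject_cv)
```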

With stronger assumptions (A1-A6), we can consider finite-sample properties

\[ T = \frac{\hat{\beta}_j-\beta_{j0}}{SE(\hat{\beta}_j)} \sim T(n-k) \]

  • The distributional derivation above depends strongly on A4 and A5
  • T has a Student t-distribution because the numerator is normal and the denominator is the square root of an independent \(\chi^2\) variable divided by its degrees of freedom.
  • Critical values and p-values are calculated from the Student t-distribution rather than the standard normal distribution.
  • As \(n \to \infty\), \(T(n-k)\) is asymptotically standard normal.

Rule of thumb

  • if \(n-k>120\): the critical values and p-values from the t-distribution are (almost) the same as the critical values and p-values from the standard normal distribution.

  • if \(n-k<120\)

    • if (A1-A6) hold then the t-test is an exact finite distribution test
    • if (A1-A3a, A5) hold, because the t-distribution is asymptotically normal, computing the critical values from a t-distribution is still a valid asymptotic test (i.e., the critical values and p-values are not quite right, but the difference goes away as \(n \to \infty\))
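The rule of thumb can be checked numerically by comparing two-sided 5% critical values from `qt()` and `qnorm()`. The degrees of freedom listed are arbitrary illustrative values.

```r
df_resid <- c(10, 30, 60, 120, 1000)  # illustrative values of n - k

crit_t <- qt(0.975, df = df_resid)    # Student t critical values
crit_z <- qnorm(0.975)                # standard normal critical value

data.frame(df = df_resid, crit_t = crit_t, gap = crit_t - crit_z)
```

The gap shrinks monotonically as the degrees of freedom grow, which is why the choice between the two distributions matters little in large samples.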

14.2.1 Multiple Hypothesis

  • Test multiple parameters at the same time

    • \(H_0: \beta_1 = 0\ \& \ \beta_2 = 0\)
    • \(H_0: \beta_1 = 1\ \& \ \beta_2 = 0\)
  • Performing a series of simple hypothesis tests does not answer the question (joint distribution vs. two marginal distributions).

  • The test statistic is based on a restriction written in matrix form.

\[ y=\beta_0+x_1\beta_1 + x_2\beta_2 + x_3\beta_3 + \epsilon \]

The null hypothesis \(H_0: \beta_1 = 0\) & \(\beta_2=0\) can be rewritten as \(H_0: \mathbf{R}\beta -\mathbf{q}=0\) where

  • \(\mathbf{R}\) is an \(m \times k\) matrix, where \(m\) is the number of restrictions and \(k\) is the number of parameters (here \(m = 2\) and \(k = 4\)). \(\mathbf{q}\) is an \(m \times 1\) vector
  • \(\mathbf{R}\) “picks up” the relevant parameters, while \(\mathbf{q}\) holds the null values of the restrictions

\[ \mathbf{R}= \left( \begin{array}{cccc} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{array} \right), \mathbf{q} = \left( \begin{array}{c} 0 \\ 0 \\ \end{array} \right) \]

Test Statistic for OLS estimator for a multiple hypothesis

\[ F = \frac{(\mathbf{R\hat{\beta}-q})'\hat{\Sigma}^{-1}(\mathbf{R\hat{\beta}-q})}{m} \sim^a F(m,n-k) \]

  • \(\hat{\Sigma}\) is the estimator of the asymptotic variance-covariance matrix of \(\mathbf{R\hat{\beta}-q}\), and \(\hat{\Sigma}^{-1}\) is its inverse

    • If A4 holds, both the homoskedastic and heteroskedastic versions produce valid estimators
    • If A4 does not hold, only the heteroskedastic version produces valid estimators.
  • When \(m = 1\), there is only a single restriction, so the \(F\)-statistic is the \(t\)-statistic squared.

  • The \(F\) distribution takes only positive values, so the test rejects for large values of \(F\); check F-Distribution for more details.
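The statistic can be computed by hand from \(\mathbf{R}\), \(\mathbf{q}\), and the estimated covariance matrix. The sketch below assumes the `mtcars` data and an illustrative model with an intercept and three regressors, and takes \(\hat{\Sigma} = \mathbf{R}\,\widehat{Var}(\hat{\beta})\,\mathbf{R}'\) with the homoskedastic `vcov()`. Under that choice it should match the classical F-test from `anova()` on the restricted and unrestricted models.

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
b <- coef(fit)

# H0: beta_wt = 0 and beta_hp = 0
R <- rbind(c(0, 1, 0, 0),
           c(0, 0, 1, 0))
q_vec <- c(0, 0)
m <- nrow(R)  # number of restrictions

# Estimated covariance of R b - q (homoskedastic version)
Sigma_hat <- R %*% vcov(fit) %*% t(R)
d <- R %*% b - q_vec

F_stat <- drop(t(d) %*% solve(Sigma_hat) %*% d) / m
p_val <- pf(F_stat, df1 = m, df2 = df.residual(fit), lower.tail = FALSE)

# Cross-check against the restricted-vs-unrestricted F-test
fit_r <- lm(mpg ~ qsec, data = mtcars)
F_anova <- anova(fit_r, fit)$F[2]
c(F_stat = F_stat, F_anova = F_anova, p_val = p_val)
```

Replacing `vcov(fit)` with a heteroskedasticity-robust covariance estimate gives the robust version of the test, at which point the match with `anova()` no longer holds exactly.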

14.2.2 Linear Combination

Testing a restriction that involves multiple parameters at the same time

\[ \begin{aligned} H_0&: \beta_1 -\beta_2 = 0 \\ H_0&: \beta_1 - \beta_2 > 0 \\ H_0&: \beta_1 - 2\times\beta_2 =0 \end{aligned} \]

Each is a single restriction on a function of the parameters.

Null hypothesis:

\[ H_0: \beta_1 -\beta_2 = 0 \]

can be rewritten as

\[ H_0: \mathbf{R}\beta -\mathbf{q}=0 \]

where, for a model with an intercept and three regressors, \(\mathbf{R}=(0 \ 1 \ -1 \ 0)\) and \(\mathbf{q}=0\)
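A sketch of testing one linear combination with a restriction vector. The `mtcars` data and model are assumptions for illustration, with four coefficients (intercept plus three regressors), so \(\mathbf{R}\) has four entries here. Since there is a single restriction, the squared t-statistic should equal the F-statistic from comparing against the restricted model.

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# H0: beta_wt - beta_hp = 0
R <- c(0, 1, -1, 0)
q_val <- 0

est <- sum(R * coef(fit)) - q_val
se <- sqrt(drop(t(R) %*% vcov(fit) %*% R))  # sqrt of R V R'
t_stat <- est / se
p_val <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

# Restricted model imposing beta_wt = beta_hp
fit_r <- lm(mpg ~ I(wt + hp) + qsec, data = mtcars)
F_anova <- anova(fit_r, fit)$F[2]
c(t_stat = t_stat, t_squared = t_stat^2, F_anova = F_anova, p_val = p_val)
```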

14.2.3 Estimate Difference in Coefficients

Base R does not provide a dedicated function for estimating the difference between two coefficients and its confidence interval, but a simple function created by Katherine Zee can be used to calculate this difference. Some modifications might be needed if you don’t use a standard lm model in R.

difftest_lm <- function(x1, x2, model) {
    coefs <- summary(model)$coefficients
    df <- df.residual(model)  # residual degrees of freedom (n - k)

    # point estimate of the difference between the two coefficients
    diffest <- coefs[x1, "Estimate"] - coefs[x2, "Estimate"]

    # variance of x1 + variance of x2 - 2 * covariance of x1 and x2
    vardiff <- coefs[x1, "Std. Error"]^2 + coefs[x2, "Std. Error"]^2 -
        2 * vcov(model)[x1, x2]
    diffse <- sqrt(vardiff)

    # two-sided t-test of H0: difference = 0
    tdiff <- diffest / diffse
    ptdiff <- 2 * pt(abs(tdiff), df = df, lower.tail = FALSE)

    # 95% confidence interval; the critical value is usually close to 1.96
    crit <- qt(0.975, df = df)
    lwr <- diffest - crit * diffse
    upr <- diffest + crit * diffse

    return(list(
        est = round(diffest, digits = 2),
        t = round(tdiff, digits = 2),
        p = round(ptdiff, digits = 4),
        lwr = round(lwr, digits = 2),
        upr = round(upr, digits = 2),
        df = df
    ))
}

14.2.4 Application

library("car")

# Multiple hypothesis
mod.davis <- lm(weight ~ repwt, data=Davis)
linearHypothesis(mod.davis, c("(Intercept) = 0", "repwt = 1"),white.adjust = TRUE)
#> Linear hypothesis test
#> 
#> Hypothesis:
#> (Intercept) = 0
#> repwt = 1
#> 
#> Model 1: restricted model
#> Model 2: weight ~ repwt
#> 
#> Note: Coefficient covariance matrix supplied.
#> 
#>   Res.Df Df      F  Pr(>F)  
#> 1    183                    
#> 2    181  2 3.3896 0.03588 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Linear Combination
mod.duncan <- lm(prestige ~ income + education, data=Duncan)
linearHypothesis(mod.duncan, "1*income - 1*education = 0")
#> Linear hypothesis test
#> 
#> Hypothesis:
#> income - education = 0
#> 
#> Model 1: restricted model
#> Model 2: prestige ~ income + education
#> 
#>   Res.Df    RSS Df Sum of Sq      F Pr(>F)
#> 1     43 7518.9                           
#> 2     42 7506.7  1    12.195 0.0682 0.7952

14.2.5 Nonlinear

Suppose that we have q nonlinear functions of the parameters
\[ \mathbf{h}(\theta) = \{ h_1 (\theta), ..., h_q (\theta)\}' \]

Then, the Jacobian matrix (\(\mathbf{H}(\theta)\)) of rank q is

\[ \mathbf{H}_{q \times p}(\theta) = \left( \begin{array} {ccc} \frac{\partial h_1(\theta)}{\partial \theta_1} & \cdots & \frac{\partial h_1(\theta)}{\partial \theta_p} \\ \vdots & \ddots & \vdots \\ \frac{\partial h_q(\theta)}{\partial \theta_1} & \cdots & \frac{\partial h_q(\theta)}{\partial \theta_p} \end{array} \right) \]

where the null hypothesis \(H_0: \mathbf{h} (\theta) = 0\) can be tested against the 2-sided alternative with the Wald statistic

\[ W = \frac{\mathbf{h(\hat{\theta})'\{H(\hat{\theta})[F(\hat{\theta})'F(\hat{\theta})]^{-1}H(\hat{\theta})'\}^{-1}h(\hat{\theta})}}{s^2q} \sim F_{q,n-p} \]
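A minimal sketch of this statistic for one nonlinear restriction on a linear model. Several pieces are assumptions, since the text does not define them here: \(\mathbf{F}(\hat{\theta})\) is taken to be the \(n \times p\) matrix of derivatives of the mean function, which for a linear mean is just the design matrix, and \(s^2\) is taken to be the residual variance estimate. The `mtcars` model and the hypothesized ratio of 50 are purely illustrative.

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
b <- coef(fit)

X <- model.matrix(fit)  # assumed role of F(theta_hat) for a linear mean
s2 <- sigma(fit)^2      # assumed residual variance estimate s^2
n_restr <- 1            # q, the number of restrictions

# H0: beta_wt / beta_hp - 50 = 0 (hypothetical nonlinear restriction)
h_vec <- matrix(b[["wt"]] / b[["hp"]] - 50, ncol = 1)

# Jacobian of h with respect to (intercept, wt, hp, qsec)
H <- matrix(c(0, 1 / b[["hp"]], -b[["wt"]] / b[["hp"]]^2, 0), nrow = 1)

middle <- H %*% solve(crossprod(X)) %*% t(H)
W <- drop(t(h_vec) %*% solve(middle) %*% h_vec) / (s2 * n_restr)
p_val <- pf(W, df1 = n_restr, df2 = df.residual(fit), lower.tail = FALSE)
c(W = W, p_val = p_val)
```

Because `vcov(fit)` equals \(s^2(\mathbf{X}'\mathbf{X})^{-1}\) for an `lm` fit, the same value can be obtained as \(\mathbf{h}'[\mathbf{H}\,\widehat{Var}(\hat{\beta})\,\mathbf{H}']^{-1}\mathbf{h}/q\), which is the familiar delta-method form.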