15.2 Wald test

$$W = (\hat{\theta} - \theta_0)' [\text{cov}(\hat{\theta})]^{-1} (\hat{\theta} - \theta_0), \qquad W \sim \chi^2_q$$

where $\text{cov}(\hat{\theta})$ is given by the inverse of the Fisher information matrix evaluated at $\hat{\theta}$, and $q$ is the rank of $\text{cov}(\hat{\theta})$, which is the number of non-redundant parameters in $\theta$.
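
As a minimal sketch, the quadratic form above can be computed directly from an lm fit. The Davis data from the car package (also used in the Application section below) and the joint null (intercept $= 0$, slope $= 1$) are assumed purely for illustration.

# Sketch: quadratic-form Wald statistic for a joint null on an lm fit
library(car)

mod <- lm(weight ~ repwt, data = Davis)

theta_hat <- coef(mod)            # estimated parameters
theta_0   <- c(0, 1)              # hypothesized values: intercept = 0, slope = 1
V         <- vcov(mod)            # estimated covariance matrix of theta_hat

# W = (theta_hat - theta_0)' V^{-1} (theta_hat - theta_0), chi-squared with q df
W <- t(theta_hat - theta_0) %*% solve(V) %*% (theta_hat - theta_0)
q <- length(theta_hat)
pchisq(W, df = q, lower.tail = FALSE)   # p-value for the joint null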

Alternatively,

$$t_W = \frac{(\hat{\theta} - \theta_0)^2}{I(\theta_0)^{-1}} \sim \chi^2(v)$$

where $v$ is the degrees of freedom.

Equivalently,

$$s_W = \frac{\hat{\theta} - \theta_0}{\sqrt{I(\hat{\theta})^{-1}}} \sim Z$$

The Wald test asks how far away in the distribution your sample estimate is from the hypothesized population parameter.

Given the null value, what is the probability that you would have obtained a realization "more extreme" or "worse" than the estimate you actually obtained?

Significance Level ($\alpha$) and Confidence Level ($1 - \alpha$)

  • The significance level is the benchmark probability below which the estimate is so unlikely under the null that we reject the null.
  • The confidence level is the probability that sets the bounds on how far away a realization of the estimator would have to be for us to reject the null.

Test Statistics

  • Standardize (transform) the estimator and null value into a test statistic that always has the same distribution
  • Test Statistic for the OLS estimator for a single hypothesis

$$T = \frac{\sqrt{n}(\hat{\beta}_j - \beta_{j0})}{\sqrt{n}\, SE(\hat{\beta}_j)} \overset{a}{\sim} N(0,1)$$

Equivalently,

$$T = \frac{\hat{\beta}_j - \beta_{j0}}{SE(\hat{\beta}_j)} \overset{a}{\sim} N(0,1)$$

The test statistic is itself a random variable: a function of the data and the null hypothesis.

  • $T$ denotes the test statistic as a random variable
  • $t$ denotes a single realization of the test statistic
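
As a small sketch of this standardization, assuming the Duncan data from the car package: the hand-computed statistic should match the t value reported by summary() when the null value is 0.

# Sketch: compute the test statistic for a single coefficient by hand
library(car)

mod <- lm(prestige ~ income + education, data = Duncan)
est <- summary(mod)$coef["income", "Estimate"]
se  <- summary(mod)$coef["income", "Std. Error"]

beta_0 <- 0                      # null value
t_stat <- (est - beta_0) / se    # realization t of the test statistic T
t_stat
summary(mod)$coef["income", "t value"]   # should match when beta_0 = 0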

Evaluating the Test Statistic: determine whether we reject or fail to reject the null hypothesis at a given significance / confidence level

Three equivalent ways

  1. Critical Value

  2. P-value

  3. Confidence Interval

  1. Critical Value

For a given significance level, we can determine the critical value ($c$).

  • One-sided: $H_0: \beta_j \ge \beta_{j0}$

$$P(T < c \mid H_0) = \alpha$$

Reject the null if $t < c$

  • One-sided: $H_0: \beta_j \le \beta_{j0}$

$$P(T > c \mid H_0) = \alpha$$

Reject the null if $t > c$

  • Two-sided: $H_0: \beta_j = \beta_{j0}$

$$P(|T| > c \mid H_0) = \alpha$$

Reject the null if $|t| > c$
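
A quick numerical sketch of the three cases, using the asymptotic standard normal reference distribution and $\alpha = 0.05$:

# Sketch: critical values c at significance level alpha = 0.05 (standard normal)
alpha <- 0.05

qnorm(alpha)              # one-sided H0: beta_j >= beta_j0 -> reject if t < c
qnorm(1 - alpha)          # one-sided H0: beta_j <= beta_j0 -> reject if t > c
qnorm(1 - alpha / 2)      # two-sided  H0: beta_j  = beta_j0 -> reject if |t| > c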

  2. P-value

Calculate the probability that the test statistic would be "worse" than the realization you actually obtained.

  • One-sided: $H_0: \beta_j \ge \beta_{j0}$

$$\text{p-value} = P(T < t \mid H_0)$$

  • One-sided: $H_0: \beta_j \le \beta_{j0}$

$$\text{p-value} = P(T > t \mid H_0)$$

  • Two-sided: $H_0: \beta_j = \beta_{j0}$

$$\text{p-value} = P(|T| > |t| \mid H_0)$$

Reject the null if the p-value $< \alpha$.
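
A parallel sketch for p-values, again using the standard normal reference distribution; the realization $t = 2.1$ is an arbitrary example.

# Sketch: p-values for a realized test statistic t (standard normal reference)
t <- 2.1

pnorm(t)                                 # one-sided H0: beta_j >= beta_j0, P(T < t)
pnorm(t, lower.tail = FALSE)             # one-sided H0: beta_j <= beta_j0, P(T > t)
2 * pnorm(abs(t), lower.tail = FALSE)    # two-sided, P(|T| > |t|)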

  3. Confidence Interval

Using the critical value associated with the null hypothesis and significance level, construct an interval

$$CI(\hat{\beta}_j)_{\alpha} = \left[\hat{\beta}_j - c \times SE(\hat{\beta}_j),\; \hat{\beta}_j + c \times SE(\hat{\beta}_j)\right]$$

If the null value lies outside the interval, then we reject the null.

  • We are not testing whether the true population value is close to the estimate; we are testing, given a fixed true population value of the parameter, how likely it is that we observed this estimate.
  • The interval can be interpreted as: we believe with $(1-\alpha)\times 100\%$ probability that the confidence interval captures the true parameter value.
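
A short sketch, assuming the Duncan model from the car package: the interval built from the critical value should match confint().

# Sketch: 95% confidence interval for a single coefficient
library(car)

mod <- lm(prestige ~ income + education, data = Duncan)
est <- summary(mod)$coef["income", "Estimate"]
se  <- summary(mod)$coef["income", "Std. Error"]

c_val <- qt(0.975, df = df.residual(mod))      # two-sided critical value
c(est - c_val * se, est + c_val * se)          # interval built by hand
confint(mod, "income", level = 0.95)           # built-in equivalent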

With stronger assumptions (A1-A6), we can consider finite sample properties:

$$T = \frac{\hat{\beta}_j - \beta_{j0}}{SE(\hat{\beta}_j)} \sim T(n-k)$$

  • The distributional derivation above depends strongly on A4 and A5.
  • $T$ has a Student's t-distribution because the numerator is normal and the denominator is $\chi^2$.
  • Critical values and p-values will be calculated from the Student's t-distribution rather than the standard normal distribution.
  • As $n \to \infty$, $T(n-k)$ is asymptotically standard normal.

Rule of thumb

  • if $n - k > 120$: the critical values and p-values from the t-distribution are (almost) the same as the critical values and p-values from the standard normal distribution.

  • if $n - k < 120$:

    • if (A1-A6) hold, then the t-test is an exact finite sample test
    • if (A1-A3a, A5) hold, then because the t-distribution is asymptotically normal, computing the critical values from a t-distribution is still a valid asymptotic test (i.e., not quite the right critical values and p-values, but the difference goes away as $n \to \infty$)
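
A quick numerical illustration of this rule of thumb, comparing two-sided 5% critical values:

# Sketch: t critical values approach the standard normal value as n - k grows
qnorm(0.975)          # 1.96
qt(0.975, df = 10)    # noticeably larger
qt(0.975, df = 120)   # almost 1.96
qt(0.975, df = 1000)  # essentially 1.96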

15.2.1 Multiple Hypothesis

  • test multiple parameters at the same time

    • $H_0: \beta_1 = 0$ & $\beta_2 = 0$
    • $H_0: \beta_1 = 1$ & $\beta_2 = 0$
  • performing a series of simple hypothesis tests does not answer the question (a joint distribution is not the same as two marginal distributions).

  • The test statistic is based on a restriction written in matrix form.

$$y = \beta_0 + x_1\beta_1 + x_2\beta_2 + x_3\beta_3 + \epsilon$$

The null hypothesis $H_0: \beta_1 = 0$ & $\beta_2 = 0$ can be rewritten as $H_0: R\beta - q = 0$, where

  • $R$ is an $m \times k$ matrix, where $m$ is the number of restrictions and $k$ is the number of parameters; $q$ is an $m \times 1$ vector
  • $R$ "picks out" the relevant parameters, while $q$ holds the null values of those parameters

$$R = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad q = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

Test Statistic for the OLS estimator for a multiple hypothesis

$$F = \frac{(R\hat{\beta} - q)' \hat{\Sigma}^{-1} (R\hat{\beta} - q)}{m} \overset{a}{\sim} F(m, n-k)$$

  • $\hat{\Sigma}$ is the estimator of the asymptotic variance-covariance matrix (of $R\hat{\beta}$)

    • if A4 holds, both the homoskedastic and heteroskedastic versions produce valid estimators
    • if A4 does not hold, only the heteroskedastic version produces valid estimators.
  • When $m = 1$ (a single restriction), the F-statistic is the t-statistic squared.

  • The F-distribution is strictly positive; check [F-Distribution] for more details.
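
As a sketch of the matrix form, assuming the Duncan model from the car package and the joint null $\beta_{income} = 0$ & $\beta_{education} = 0$: the hand-computed F statistic uses the homoskedastic vcov() and should agree with the default car::linearHypothesis output.

# Sketch: F statistic for a joint null computed directly from R, q, and vcov
library(car)

mod <- lm(prestige ~ income + education, data = Duncan)

# H0: beta_income = 0 & beta_education = 0
R <- rbind(c(0, 1, 0),
           c(0, 0, 1))               # m x k restriction matrix
q <- c(0, 0)                         # m x 1 null values
m <- nrow(R)

b     <- coef(mod)
Sigma <- R %*% vcov(mod) %*% t(R)    # estimated covariance of R %*% beta-hat
F_stat <- t(R %*% b - q) %*% solve(Sigma) %*% (R %*% b - q) / m
F_stat
pf(F_stat, df1 = m, df2 = df.residual(mod), lower.tail = FALSE)

# The same test via car::linearHypothesis
linearHypothesis(mod, c("income = 0", "education = 0"))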

15.2.2 Linear Combination

Testing linear combinations of parameters, for example:

$$H_0: \beta_1 - \beta_2 = 0 \qquad H_0: \beta_1 - \beta_2 > 0 \qquad H_0: \beta_1 - 2\beta_2 = 0$$

Each is a single restriction on a function of the parameters.

Null hypothesis:

$$H_0: \beta_1 - \beta_2 = 0$$

can be rewritten as

$$H_0: R\beta - q = 0$$

where $R = (0, 1, -1, 0, 0)$ and $q = 0$.
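
A restriction in this form can also be passed to car::linearHypothesis directly as a hypothesis matrix. A sketch assuming the Duncan model (three coefficients, so $R$ has three columns: intercept, income, education):

# Sketch: H0: beta_income - beta_education = 0 expressed through R and q
library(car)

mod.duncan <- lm(prestige ~ income + education, data = Duncan)

R <- matrix(c(0, 1, -1), nrow = 1)   # picks out income and education
linearHypothesis(mod.duncan, hypothesis.matrix = R, rhs = 0)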

15.2.3 Estimate Difference in Coefficients

There is no package that directly estimates the difference between two coefficients and its CI, but a simple function created by Katherine Zee can be used to calculate this difference. Some modifications might be needed if you do not use a standard lm model in R.

difftest_lm <- function(x1, x2, model) {
    # point estimate of the difference between the two coefficients
    diffest <- summary(model)$coef[x1, "Estimate"] -
        summary(model)$coef[x2, "Estimate"]
    
    # variance of x1 + variance of x2 - 2*covariance of x1 and x2
    vardiff <- summary(model)$coef[x1, "Std. Error"]^2 +
        summary(model)$coef[x2, "Std. Error"]^2 -
        2 * vcov(model)[x1, x2]
    diffse <- sqrt(vardiff)
    
    # t statistic and two-sided p-value on the residual degrees of freedom
    df <- model$df.residual
    tdiff <- diffest / diffse
    ptdiff <- 2 * pt(abs(tdiff), df, lower.tail = FALSE)
    
    # 95% confidence interval; qt(.975, df) will usually be very close to 1.96
    upr <- diffest + qt(.975, df = df) * diffse
    lwr <- diffest + qt(.025, df = df) * diffse
    
    return(list(
        est = round(diffest, digits = 2),
        t   = round(tdiff, digits = 2),
        p   = round(ptdiff, digits = 4),
        lwr = round(lwr, digits = 2),
        upr = round(upr, digits = 2),
        df  = df
    ))
}
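
A usage sketch, assuming the Duncan model that is also fit in the Application section below:

# Usage sketch: difference between the income and education coefficients
library(car)
mod.duncan <- lm(prestige ~ income + education, data = Duncan)
difftest_lm("income", "education", mod.duncan)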

15.2.4 Application

library("car")

# Multiple hypothesis
mod.davis <- lm(weight ~ repwt, data=Davis)
linearHypothesis(mod.davis, c("(Intercept) = 0", "repwt = 1"), white.adjust = TRUE)
#> Linear hypothesis test
#> 
#> Hypothesis:
#> (Intercept) = 0
#> repwt = 1
#> 
#> Model 1: restricted model
#> Model 2: weight ~ repwt
#> 
#> Note: Coefficient covariance matrix supplied.
#> 
#>   Res.Df Df      F  Pr(>F)  
#> 1    183                    
#> 2    181  2 3.3896 0.03588 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Linear Combination
mod.duncan <- lm(prestige ~ income + education, data=Duncan)
linearHypothesis(mod.duncan, "1*income - 1*education = 0")
#> Linear hypothesis test
#> 
#> Hypothesis:
#> income - education = 0
#> 
#> Model 1: restricted model
#> Model 2: prestige ~ income + education
#> 
#>   Res.Df    RSS Df Sum of Sq      F Pr(>F)
#> 1     43 7518.9                           
#> 2     42 7506.7  1    12.195 0.0682 0.7952

15.2.5 Nonlinear

Suppose that we have $q$ nonlinear functions of the parameters:
$$h(\theta) = \{h_1(\theta), \ldots, h_q(\theta)\}$$

Then, the Jacobian matrix $H(\theta)$, of rank $q$, is

$$H_{q \times p}(\theta) = \begin{pmatrix} \frac{\partial h_1(\theta)}{\partial \theta_1} & \dots & \frac{\partial h_1(\theta)}{\partial \theta_p} \\ \vdots & & \vdots \\ \frac{\partial h_q(\theta)}{\partial \theta_1} & \dots & \frac{\partial h_q(\theta)}{\partial \theta_p} \end{pmatrix}$$

The null hypothesis $H_0: h(\theta) = 0$ can then be tested against the two-sided alternative with the Wald statistic

$$W = \frac{h(\hat{\theta})' \left\{ H(\hat{\theta}) \left[ F(\hat{\theta})'F(\hat{\theta}) \right]^{-1} H(\hat{\theta})' \right\}^{-1} h(\hat{\theta})}{s^2 q} \sim F_{q, n-p}$$
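
For a single nonlinear restriction, one practical route is the delta method, which linearizes $h(\theta)$ and yields a Wald-type statistic. A sketch using car::deltaMethod, with the Duncan model and the illustrative restriction $\beta_{income} / \beta_{education} = 1$ (both the model and the restriction are assumptions made only for this example):

# Sketch: Wald-type test of a single nonlinear restriction via the delta method
library(car)

mod.duncan <- lm(prestige ~ income + education, data = Duncan)

dm <- deltaMethod(mod.duncan, "income / education")  # delta-method estimate and SE of the ratio
W  <- ((dm$Estimate - 1) / dm$SE)^2                  # squared Wald statistic for H0: ratio = 1
pchisq(W, df = 1, lower.tail = FALSE)                # asymptotic chi-squared p-value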