15.2 Wald test
$$W = (\hat{\theta} - \theta_0)' [\text{cov}(\hat{\theta})]^{-1} (\hat{\theta} - \theta_0), \quad W \sim \chi^2_q$$
where $\text{cov}(\hat{\theta})$ is given by the inverse Fisher information matrix evaluated at $\hat{\theta}$, and $q$ is the rank of $\text{cov}(\hat{\theta})$, which is the number of non-redundant parameters in $\theta$.
Alternatively,
$$t_W = \frac{(\hat{\theta} - \theta_0)^2}{I(\theta_0)^{-1}} \sim \chi^2_{(v)}$$
where $v$ is the degrees of freedom.
Equivalently,
$$s_W = \frac{\hat{\theta} - \theta_0}{\sqrt{I(\hat{\theta})^{-1}}} \sim Z$$
The Wald statistic measures how far, within the sampling distribution, your sample estimate is from the hypothesized population parameter. For a given null value, what is the probability that you would have obtained a realization "more extreme" or "worse" than the estimate you actually obtained?
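A minimal sketch in R of the scalar forms above (the simulated Poisson model and the null value $\theta_0 = 0$ are assumptions chosen for illustration); the estimated inverse Fisher information is read off vcov():

# Scalar Wald test by hand; the Poisson model and theta0 = 0 are
# illustrative assumptions, not part of the text above.
set.seed(1)
x   <- rnorm(100)
y   <- rpois(100, lambda = exp(0.5 + 0.3 * x))
fit <- glm(y ~ x, family = poisson)

theta_hat <- coef(fit)["x"]      # MLE of the slope
var_hat   <- vcov(fit)["x", "x"] # estimated inverse Fisher information
theta0    <- 0                   # hypothesized null value

s_w <- (theta_hat - theta0) / sqrt(var_hat) # standard normal form
t_w <- s_w^2                                # chi-squared(1) form
pchisq(t_w, df = 1, lower.tail = FALSE)     # Wald p-value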
Significance Level ($\alpha$) and Confidence Level ($1-\alpha$)
- The significance level is the benchmark probability below which a result is considered so unlikely under the null that we reject the null.
- The confidence level is the probability that sets the bounds on how far the realization of the estimator must be from the null value before we reject the null.
Test Statistics
- Standardize (transform) the estimator and null value into a test statistic that always has the same distribution
- Test Statistic for the OLS estimator for a single hypothesis
$$T = \frac{\sqrt{n}(\hat{\beta}_j - \beta_{j0})}{\sqrt{n} \, SE(\hat{\beta}_j)} \overset{a}{\sim} N(0,1)$$
Equivalently,
$$T = \frac{\hat{\beta}_j - \beta_{j0}}{SE(\hat{\beta}_j)} \overset{a}{\sim} N(0,1)$$
The test statistic is another random variable that is a function of the data and the null hypothesis.
- $T$ denotes the random-variable test statistic
- $t$ denotes a single realization of the test statistic
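For instance, the realization $t$ can be recomputed by hand from a fitted model (a hypothetical lm fit on mtcars, with the null value $\beta_{j0} = 0$ that summary() uses):

# Recompute the reported t statistic by hand (illustrative example).
fit   <- lm(mpg ~ wt, data = mtcars)
bhat  <- coef(fit)["wt"]             # estimate
se    <- sqrt(vcov(fit)["wt", "wt"]) # standard error
tstat <- (bhat - 0) / se             # realization of T under beta_j0 = 0
all.equal(unname(tstat), summary(fit)$coef["wt", "t value"])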
Evaluating the Test Statistic: determine whether we reject or fail to reject the null hypothesis at a given significance / confidence level.
There are three equivalent ways (illustrated in the sketch after the confidence-interval discussion below):
- Critical Value
- P-value
- Confidence Interval
Critical Value
For a given significance level, we can determine the critical value $c$:
- One-sided: $H_0: \beta_j \ge \beta_{j0}$
$$P(T < c | H_0) = \alpha$$
Reject the null if $t < c$
- One-sided: $H_0: \beta_j \le \beta_{j0}$
$$P(T > c | H_0) = \alpha$$
Reject the null if $t > c$
- Two-sided: $H_0: \beta_j = \beta_{j0}$
$$P(|T| > c | H_0) = \alpha$$
Reject the null if $|t| > c$
P-value
Calculate the probability that the test statistic is "worse" (more extreme) than the realization you obtained:
- One-sided: $H_0: \beta_j \ge \beta_{j0}$
$$\text{p-value} = P(T < t | H_0)$$
- One-sided: $H_0: \beta_j \le \beta_{j0}$
$$\text{p-value} = P(T > t | H_0)$$
- Two-sided: $H_0: \beta_j = \beta_{j0}$
$$\text{p-value} = P(|T| > |t| \mid H_0)$$
Reject the null if p-value $< \alpha$.
Confidence Interval
Using the critical value associated with the null hypothesis and significance level, create an interval:
$$CI(\hat{\beta}_j)_\alpha = \left[ \hat{\beta}_j - c \times SE(\hat{\beta}_j), \ \hat{\beta}_j + c \times SE(\hat{\beta}_j) \right]$$
If the null value lies outside the interval, then we reject the null.
- We are not testing whether the true population value is close to the estimate; we are testing, given a fixed true population value of the parameter, how likely it is that we observed this estimate.
- It can be interpreted as: we believe that with $(1-\alpha) \times 100\%$ probability the confidence interval captures the true parameter value.
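A minimal sketch of the three equivalent decision rules for a two-sided test at $\alpha = .05$, using the asymptotic normal approximation (the mtcars model and the null value 0 are illustrative assumptions); all three lead to the same decision:

# Three equivalent decision rules for H0: beta_wt = 0 at alpha = .05,
# using the asymptotic standard normal approximation (illustrative).
fit   <- lm(mpg ~ wt, data = mtcars)
bhat  <- coef(fit)["wt"]
se    <- sqrt(vcov(fit)["wt", "wt"])
tstat <- (bhat - 0) / se
alpha <- 0.05
crit  <- qnorm(1 - alpha / 2)                     # two-sided critical value

abs(tstat) > crit                                 # 1. critical value rule
2 * pnorm(abs(tstat), lower.tail = FALSE) < alpha # 2. p-value rule
ci <- bhat + c(-1, 1) * crit * se                 # 3. confidence interval
0 < ci[1] | 0 > ci[2]                             # reject if 0 lies outside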
With stronger assumptions (A1-A6), we can consider finite sample properties:
$$T = \frac{\hat{\beta}_j - \beta_{j0}}{SE(\hat{\beta}_j)} \sim T_{(n-k)}$$
- This distributional derivation depends strongly on A4 and A5.
- $T$ has a Student's t-distribution because the numerator is normal and the denominator is the square root of a $\chi^2$ random variable (divided by its degrees of freedom).
- Critical values and p-values will be calculated from the Student's t-distribution rather than the standard normal distribution.
- As $n \to \infty$, $T_{(n-k)}$ is asymptotically standard normal.
Rule of thumb (see the quick check below):
If $n - k > 120$: the critical values and p-values from the t-distribution are (almost) the same as those from the standard normal distribution.
If $n - k < 120$:
- If (A1-A6) hold, then the t-test is an exact finite-sample test.
- If (A1-A3a, A5) hold, then because the t-distribution is asymptotically normal, computing the critical values from a t-distribution is still a valid asymptotic test (i.e., not quite the right critical values and p-values, but the difference goes away as $n \to \infty$).
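A quick numerical check of the rule of thumb (degrees of freedom chosen for illustration):

# t critical values approach the standard normal 1.96 as df grows.
qt(0.975, df = c(10, 30, 120, 1000)) # 2.23, 2.04, 1.98, 1.96
qnorm(0.975)                         # 1.96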
15.2.1 Multiple Hypotheses
Test multiple parameters at the same time:
- $H_0: \beta_1 = 0$ & $\beta_2 = 0$
- $H_0: \beta_1 = 1$ & $\beta_2 = 0$
Performing a series of simple hypothesis tests does not answer the question (it examines two marginal distributions rather than the joint distribution).
The test statistic is based on a restriction written in matrix form.
$$y = \beta_0 + x_1\beta_1 + x_2\beta_2 + x_3\beta_3 + \epsilon$$
The null hypothesis $H_0: \beta_1 = 0 \ \& \ \beta_2 = 0$ can be rewritten as $H_0: R\beta - q = 0$, where
- $R$ is an $m \times k$ matrix, where $m$ is the number of restrictions and $k$ is the number of parameters; $q$ is an $m \times 1$ vector
- $R$ "picks out" the relevant parameters while $q$ holds the null values of those parameters
$$R = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad q = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
Test statistic for the OLS estimator for a multiple hypothesis:
$$F = \frac{(R\hat{\beta} - q)' \hat{\Sigma}^{-1} (R\hat{\beta} - q)}{m} \overset{a}{\sim} F(m, n-k)$$
where $\hat{\Sigma}$ is the estimator of the asymptotic variance-covariance matrix of $R\hat{\beta}$.
When $m = 1$ (a single restriction), the F-statistic is the t-statistic squared.
The F distribution is strictly positive; check [F-Distribution] for more details.
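As a sketch, the F statistic can be assembled by hand from the fitted coefficient vector and its variance-covariance matrix; the mtcars model and the restrictions are assumptions chosen for illustration, with $\hat{\Sigma} = R \, \widehat{\text{var}}(\hat{\beta}) \, R'$:

# By-hand F statistic for two zero restrictions in an illustrative
# model with k = 4 parameters (intercept + 3 slopes).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
R   <- rbind(c(0, 1, 0, 0), # picks out beta_wt
             c(0, 0, 1, 0)) # picks out beta_hp
q   <- c(0, 0)
m   <- nrow(R)

b     <- coef(fit)
Sigma <- R %*% vcov(fit) %*% t(R) # var-cov of R %*% b
Fstat <- drop(t(R %*% b - q) %*% solve(Sigma) %*% (R %*% b - q)) / m
pf(Fstat, df1 = m, df2 = df.residual(fit), lower.tail = FALSE)
# should agree with car::linearHypothesis(fit, c("wt = 0", "hp = 0"))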
15.2.2 Linear Combination
Testing multiple parameters at the same time, e.g.:
$$H_0: \beta_1 - \beta_2 = 0$$
$$H_0: \beta_1 - \beta_2 > 0$$
$$H_0: \beta_1 - 2\beta_2 = 0$$
Each is a single restriction on a function of the parameters.
The null hypothesis
$$H_0: \beta_1 - \beta_2 = 0$$
can be rewritten as
$$H_0: R\beta - q = 0$$
where $R = (0 \ 1 \ {-1} \ 0)$ (matching the $k = 4$ parameters of the model above) and $q = 0$.
15.2.3 Estimate Difference in Coefficients
There is no package to estimate the difference between two coefficients and its CI, but a simple function created by Katherine Zee can be used to calculate this difference. Some modifications might be needed if you don't use a standard lm model in R.
difftest_lm <- function(x1, x2, model) {
  # Difference between the two coefficient estimates
  diffest <- summary(model)$coef[x1, "Estimate"] -
    summary(model)$coef[x2, "Estimate"]

  # Variance of the difference:
  # var(b1) + var(b2) - 2 * cov(b1, b2)
  vardiff <- summary(model)$coef[x1, "Std. Error"]^2 +
    summary(model)$coef[x2, "Std. Error"]^2 -
    2 * vcov(model)[x1, x2]
  diffse <- sqrt(vardiff)

  # t statistic and two-sided p-value on the residual df
  df <- model$df.residual
  tdiff <- diffest / diffse
  ptdiff <- 2 * pt(abs(tdiff), df, lower.tail = FALSE)

  # 95% CI; for large df, qt(.975, df) is close to 1.96
  upr <- diffest + qt(.975, df = df) * diffse
  lwr <- diffest + qt(.025, df = df) * diffse

  list(
    est = round(diffest, digits = 2),
    t   = round(tdiff, digits = 2),
    p   = round(ptdiff, digits = 4),
    lwr = round(lwr, digits = 2),
    upr = round(upr, digits = 2),
    df  = df
  )
}
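A hypothetical usage example, with the Duncan data from the carData package (also used in the application below):

# Hypothetical usage: difference between the income and education
# coefficients.
mod.duncan <- lm(prestige ~ income + education, data = carData::Duncan)
difftest_lm("income", "education", mod.duncan)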
15.2.4 Application
library("car")
# Multiple hypothesis
mod.davis <- lm(weight ~ repwt, data=Davis)
linearHypothesis(mod.davis, c("(Intercept) = 0", "repwt = 1"), white.adjust = TRUE)
#> Linear hypothesis test
#>
#> Hypothesis:
#> (Intercept) = 0
#> repwt = 1
#>
#> Model 1: restricted model
#> Model 2: weight ~ repwt
#>
#> Note: Coefficient covariance matrix supplied.
#>
#> Res.Df Df F Pr(>F)
#> 1 183
#> 2 181 2 3.3896 0.03588 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Linear Combination
mod.duncan <- lm(prestige ~ income + education, data=Duncan)
linearHypothesis(mod.duncan, "1*income - 1*education = 0")
#> Linear hypothesis test
#>
#> Hypothesis:
#> income - education = 0
#>
#> Model 1: restricted model
#> Model 2: prestige ~ income + education
#>
#> Res.Df RSS Df Sum of Sq F Pr(>F)
#> 1 43 7518.9
#> 2 42 7506.7 1 12.195 0.0682 0.7952
15.2.5 Nonlinear
Suppose that we have $q$ nonlinear functions of the parameters:
$$h(\theta) = \{h_1(\theta), ..., h_q(\theta)\}'$$
Then, the Jacobian matrix $H(\theta)$, of rank $q$, is
$$H_{q \times p}(\theta) = \begin{pmatrix} \frac{\partial h_1(\theta)}{\partial \theta_1} & \cdots & \frac{\partial h_1(\theta)}{\partial \theta_p} \\ \vdots & \ddots & \vdots \\ \frac{\partial h_q(\theta)}{\partial \theta_1} & \cdots & \frac{\partial h_q(\theta)}{\partial \theta_p} \end{pmatrix}$$
where the null hypothesis $H_0: h(\theta) = 0$ can be tested against the two-sided alternative with the Wald statistic
$$W = \frac{h(\hat{\theta})' \left\{ H(\hat{\theta}) [F(\hat{\theta})' F(\hat{\theta})]^{-1} H(\hat{\theta})' \right\}^{-1} h(\hat{\theta})}{s^2 q} \sim F_{q, n-p}$$
where $F(\hat{\theta})$ is the gradient matrix of the nonlinear regression function evaluated at $\hat{\theta}$.
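In practice, one way to operationalize such a test is the delta method, which linearizes $h(\theta)$ at $\hat{\theta}$. A minimal sketch with car::deltaMethod; the restriction income/education $= 1$ is an assumption chosen for illustration:

# Nonlinear Wald test via the delta method; the restriction
# h(theta) = income/education - 1 = 0 is an illustrative assumption.
library(car)
mod <- lm(prestige ~ income + education, data = Duncan)
dm  <- deltaMethod(mod, "income/education - 1") # h(theta_hat) and its SE
W   <- (dm$Estimate / dm$SE)^2                  # chi-squared(1) Wald statistic
pchisq(W, df = 1, lower.tail = FALSE)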