Chapter 2 Quantifying Uncertainty Via the Bootstrap

2.1 Our Goalposts

Recall that the goal of Frequentist inference is to obtain estimators, intervals, and hypothesis tests that have strong properties with respect to the sampling distribution (as opposed to the posterior distribution). Given data $D$, a Frequentist approach might be to construct an interval estimate for a parameter $\psi$ such that $G_\theta\{L(D) \le \psi \le U(D)\} = 1 - \alpha$ for a desired confidence level $1 - \alpha$. Such intervals are often of the form $\hat\psi \pm z_{\alpha/2} s_{\hat\psi}$, where $\hat\psi$ is a point estimate, $s_{\hat\psi}$ is an estimate of the standard deviation of $\hat\psi$, and $z_{\alpha/2}$ is the appropriate quantile of the standard normal distribution. While it is rarely possible, we would like the coverage to hold exactly and without depending on $\theta$.

2.2 The Bootstrap Principle

It is not always possible, given a sample $D \sim G$, to determine the sampling distribution of a statistic $\hat\psi = \hat\psi(D)$. This is because we do not know the distribution $G$; of course, if we knew $G$, we would not need to do any inference. The bootstrap gets around this problem by using the data to estimate $G$ with some $\hat G$. Given $\hat G$, we can compute the sampling distribution of $\hat\psi^\star = \hat\psi(D^\star)$, where $D^\star \sim \hat G$.

The Bootstrap Principle. Suppose that $D \sim G$, that $\psi = \psi(G)$ is some parameter of interest of the distribution $G$, and that $\hat\psi(D)$ is a statistic aimed at estimating $\psi$. Then we can evaluate the sampling distribution of $\hat\psi(D)$ by

  1. estimating $G$ with some $\hat G$; and

  2. using the sampling distribution of $\hat\psi^\star = \hat\psi(D^\star)$, where $D^\star \sim \hat G$, as an estimate of the sampling distribution of $\hat\psi(D)$.

Implementing the bootstrap principle has two minor complications. First, how do we estimate $G$? Second, how do we compute the sampling distribution of $\hat\psi^\star = \hat\psi(D^\star)$?

How we estimate $G$ typically depends on the structure of the problem. Suppose, for example, that $D = (X_1, \ldots, X_N)$, where the $X_i$ are sampled iid from $F$ (so that $G = F^N$). Then a standard choice is the empirical distribution function $\hat F = \mathbb{F}_N = N^{-1} \sum_{i=1}^N \delta_{X_i}$, where $\delta_x$ denotes the point mass at $x$ (so that $\hat G = \hat F^N$); this is referred to as the nonparametric bootstrap because it does not depend on any parametric assumptions about $F$.

In all but the simplest settings, Monte Carlo is used to approximate the sampling distribution of $\hat\psi^\star$. That is, we sample $D^\star_1, \ldots, D^\star_B$ independently from $\hat G$ and take $B^{-1} \sum_{b=1}^B \delta_{\hat\psi^\star_b}$ as our approximation of the sampling distribution of $\hat\psi^\star$, where $\hat\psi^\star_b = \hat\psi(D^\star_b)$.
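To make this concrete, here is a minimal Python sketch of the Monte Carlo approximation for the nonparametric bootstrap. The choice of statistic (the sample median), the data-generating distribution, and the values of $N$ and $B$ are illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data: N iid draws from an (in practice unknown) distribution F.
N = 50
X = rng.exponential(scale=2.0, size=N)

def psi_hat(data):
    # The statistic whose sampling distribution we want; the median is an
    # arbitrary illustrative choice.
    return np.median(data)

# Monte Carlo: draw D*_1, ..., D*_B from G-hat by resampling the observed
# data with replacement, and evaluate the statistic on each draw.
B = 5000
psi_star = np.array([psi_hat(rng.choice(X, size=N, replace=True))
                     for _ in range(B)])

# The empirical distribution of psi_star approximates the sampling
# distribution of psi-hat; its standard deviation, for example, estimates
# the standard error of the sample median.
print("bootstrap SE of the median:", psi_star.std(ddof=1))
```

Resampling the observed data with replacement is exactly sampling from $\hat G = \hat F^N$, since each $X^\star_i$ is an independent draw from $\mathbb{F}_N$.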

Exercise 2.1 Suppose that $X_1, \ldots, X_N \overset{\text{iid}}{\sim} F$ and let $\psi(F)$ denote the population mean of $F$, i.e., $\psi(F) = \mathbb{E}_F(X_i) = \int x \, F(dx)$. We consider bootstrapping the sample mean $\bar X_N = N^{-1} \sum_{i=1}^N X_i$ using the approximation $\hat F = \mathbb{F}_N$. That is, we consider the sampling distribution of $\bar X^\star = N^{-1} \sum_{i=1}^N X^\star_i$, where $X^\star_1, \ldots, X^\star_N$ are sampled independently from $\mathbb{F}_N$.

  1. What is $\psi(\mathbb{F}_N)$?

  2. The actual bias of $\bar X_N$ is $\mathbb{E}_F\{\bar X_N - \psi(F)\} = 0$. What is the bootstrap estimate of the bias, $\mathbb{E}_{\mathbb{F}_N}(\bar X^\star - \bar X_N)$?

  3. The variance of $\bar X_N$ is $\sigma^2_F / N$, where $\sigma^2_F = \operatorname{Var}_F(X_i)$. What is the bootstrap estimate of the variance of $\bar X_N$, $\operatorname{Var}_{\mathbb{F}_N}(\bar X^\star)$?

  4. A parameter $\psi$ is said to be linear if it can be written as $\psi(F) = \int t(x) \, F(dx)$ for some choice of $t(x)$. In this case it is natural to estimate $\psi$ using $\bar T = N^{-1} \sum_i t(X_i)$. Write down the bootstrap estimates of the bias and variance of $\bar T$ in this setting.

Given the sampling distribution of $\hat\psi^\star$, we can do things like construct confidence intervals for $\psi$. For example, it is often the case that $\hat\psi$ is asymptotically normal and centered at $\psi$. We can then use the bootstrap estimate of $\operatorname{Var}(\hat\psi)$ to form the confidence interval $\hat\psi \pm z_{\alpha/2} \sqrt{\operatorname{Var}_{\hat G}(\hat\psi^\star)}$. In this way, the bootstrap gives us a way to estimate $\operatorname{Var}(\hat\psi)$ more-or-less automatically.
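In code, this normal-theory interval is a one-liner once the bootstrap replicates are in hand. The sketch below uses the sample mean as $\hat\psi$; the data and constants are again illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=50)  # illustrative data

# Bootstrap replicates of the sample mean.
B = 5000
psi_star = np.array([rng.choice(X, size=X.size, replace=True).mean()
                     for _ in range(B)])

# Normal-theory interval: psi-hat +/- z_{alpha/2} * (bootstrap SE).
alpha = 0.05
psi_hat = X.mean()
se_boot = psi_star.std(ddof=1)
z = norm.ppf(1 - alpha / 2)
print("95% CI:", (psi_hat - z * se_boot, psi_hat + z * se_boot))
```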

For the next problem, we recall the delta method approach to computing standard errors. Suppose that $\hat\mu$ has mean $\mu$ and variance $\tau^2$, and that we want to approximate the mean and variance of $g(\hat\mu)$. The delta method states that, if $\tau$ is sufficiently small, then $\mathbb{E}\{g(\hat\mu)\} \approx g(\mu)$ and $\operatorname{Var}\{g(\hat\mu)\} \approx g'(\mu)^2 \tau^2$. This is based on the somewhat crude approximation $g(\hat\mu) = g(\mu) + (\hat\mu - \mu) \, g'(\mu) + \text{remainder}$, with the remainder being of order $O(\tau^2)$. The delta method approximation is obtained by ignoring the remainder.
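For example, anticipating Exercise 2.2, take $g(\mu) = e^\mu$ and $\hat\mu = \bar X_n$ with $\tau^2 = 1/n$: the delta method gives $\mathbb{E}(e^{\bar X_n}) \approx e^\mu$ and $\operatorname{Var}(e^{\bar X_n}) \approx e^{2\mu}/n$, so a natural standard error estimate for $\hat\psi = e^{\bar X_n}$ is $\hat s = e^{\bar X_n}/\sqrt{n}$.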

Exercise 2.2 Let $X_1, \ldots, X_n \overset{\text{iid}}{\sim} \text{Normal}(\mu, 1)$, and let $\psi = e^\mu$ and $\hat\psi = e^{\bar X_n}$ be the MLE of $\psi$. Create a dataset using $\mu = 5$ consisting of $n = 20$ observations.

  1. Use the delta method to get the standard error and a 95% confidence interval for $\psi$.

  2. Use the nonparametric bootstrap to get the standard error and a 95% confidence interval for $\psi$.

  3. The parametric bootstrap makes use of the assumption that $F$ (in this case) is a normal distribution. Specifically, we take $\hat F$ equal to its maximum likelihood estimate, the $\text{Normal}(\bar X_n, 1)$ distribution. Using the parametric bootstrap, compute the standard error and a 95% confidence interval for $\psi$.

  4. Plot a histogram of the bootstrap replications for the parametric and nonparametric bootstraps, along with the approximation of the sampling distribution of $\hat\psi$ obtained from the delta method (i.e., $\text{Normal}(\hat\psi, \hat s^2)$). Compare these to the true sampling distribution of $\hat\psi$. Which approximation is closest to the true distribution?

  5. Depending on the random data generated for this exercise, you will most likely find that the sampling distributions of $\hat\psi$ estimated by both the bootstrap and the delta method are not so good; the biggest problem is that the sampling distribution will be location-shifted by $\hat\psi - \psi$. Repeat part (d), but instead compare the sampling distribution of $\hat\psi - \psi$ to the bootstrap estimates obtained by sampling $\hat\psi^\star - \hat\psi$.
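As a starting point for parts (a)-(c), the following Python sketch shows one possible setup; the seed, the number of replicates $B$, and the variable names are all arbitrary choices, and the comparisons in parts (d)-(e) are left to the reader.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, mu = 20, 5.0
X = rng.normal(loc=mu, scale=1.0, size=n)  # the simulated dataset

psi_hat = np.exp(X.mean())  # MLE of psi = e^mu
z = norm.ppf(0.975)

# (a) Delta method with g(mu) = e^mu and tau^2 = 1/n, so se = e^xbar / sqrt(n).
se_delta = psi_hat / np.sqrt(n)
ci_delta = (psi_hat - z * se_delta, psi_hat + z * se_delta)

# (b) Nonparametric bootstrap: resample the observed data with replacement.
B = 5000
np_reps = np.exp([rng.choice(X, size=n, replace=True).mean()
                  for _ in range(B)])
se_np = np_reps.std(ddof=1)

# (c) Parametric bootstrap: draw fresh samples from Normal(xbar, 1).
par_reps = np.exp(rng.normal(loc=X.mean(), scale=1.0, size=(B, n)).mean(axis=1))
se_par = par_reps.std(ddof=1)

print("delta method CI:", ci_delta)
print("bootstrap SEs (nonparametric, parametric):", se_np, se_par)
```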

The lesson of part (e) is that the bootstrap approximation is likely to be best when we apply it to pivotal quantities. A quantity $S(X, \psi)$ (which is allowed to depend on $\psi$) is said to be pivotal if it has a distribution which does not depend on $\psi$. For example, in Exercise 2.2 the statistic $\sqrt{n}(\bar X - \mu)$ is a pivotal quantity, and in general $Z = \sqrt{n}(\bar X - \mu)/s$ is asymptotically pivotal (where $s$ is the sample standard deviation).

Exercise 2.3 While we saw an improved approximation for $\hat\psi - \psi$, argue that this is nevertheless not a pivotal quantity. Propose a pivotal quantity $S(\hat\psi, \psi)$ which is more suitable for bootstrapping.

The intervals computed in the previous exercise rely on asymptotic normality, which we may like to avoid. An alternative approach is to apply the bootstrap to $\zeta = \hat\psi - \psi(F)$ rather than to $\hat\psi$ directly, so that $\psi(F) = \hat\psi - \zeta$. If we knew the $\alpha/2$ and $(1 - \alpha/2)$ quantiles of $\zeta$ (say, $\zeta_{\alpha/2}$ and $\zeta_{1-\alpha/2}$), then we could form a confidence interval satisfying $G_\theta(\hat\psi - \zeta_{1-\alpha/2} \le \psi \le \hat\psi - \zeta_{\alpha/2}) = 1 - \alpha$.

The empirical bootstrap estimates these quantiles from the quantiles of $\zeta^\star = \hat\psi^\star - \psi(\hat F)$, which are computed by simulation. More generally, we could use this approach with any pivotal quantity; for example, since $\xi = \hat\psi / \psi$ is pivotal in Exercise 2.2, we could use $(\hat\psi / \xi_{1-\alpha/2}, \, \hat\psi / \xi_{\alpha/2})$ as our interval.
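Before attempting the next exercise, it may help to see the empirical bootstrap in code with the difference pivot $\zeta$, leaving the ratio pivot $\xi$ for Exercise 2.4. The sketch below is an illustration in the setting of Exercise 2.2, with an arbitrary seed and number of replicates; note that for the nonparametric bootstrap, $\psi(\hat F) = e^{\bar X_n} = \hat\psi$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu = 20, 5.0
X = rng.normal(loc=mu, scale=1.0, size=n)

psi_hat = np.exp(X.mean())  # psi(F-hat), the plug-in estimate

# Replicates of the pivot zeta* = psi-hat* - psi(F-hat).
B = 5000
zeta_star = np.exp([rng.choice(X, size=n, replace=True).mean()
                    for _ in range(B)]) - psi_hat

# Interval: (psi-hat - zeta_{1-alpha/2}, psi-hat - zeta_{alpha/2}).
lo_q, hi_q = np.quantile(zeta_star, [0.025, 0.975])
print("95% empirical bootstrap CI:", (psi_hat - hi_q, psi_hat - lo_q))
```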

Exercise 2.4 Use the nonparametric bootstrap to make a 95% confidence interval using the pivotal quantity $\xi$ described above.