8 Estimating Functions of GWN Model Parameters
Updated: May 9, 2021
Copyright © Eric Zivot 2015, 2016, 2020, 2021
In the previous chapter, we considered estimation of the parameters of the GWN model for asset returns and the construction of estimated standard errors and confidence intervals for the estimated parameters. In this chapter, we consider estimation of functions of the parameters of the GWN model, and the construction of estimated standard errors and confidence intervals for these estimates. This is an important topic as we are often more interested in a function of the GWN model parameters (e.g. VaRα) than the GWN model parameters themselves. In order to do this, we introduce some new statistical tools including the analytic delta method and two computer intensive resampling methods called the jackknife and the bootstrap.
The R packages used in this chapter are boot, bootstrap, car, IntroCompFinR, and PerformanceAnalytics. Have these packages installed and loaded in R before replicating the chapter examples.
suppressPackageStartupMessages(library(boot))
suppressPackageStartupMessages(library(bootstrap))
suppressPackageStartupMessages(library(car))
suppressPackageStartupMessages(library(IntroCompFinR))
suppressPackageStartupMessages(library(PerformanceAnalytics))
options(digits=3)
8.1 Functions of GWN Model Parameters
Consider the GWN model for returns (simple or continuously compounded):
Rit=μi+εit,{εit}Tt=1∼GWN(0,σ2),cov(εit,εjs)={σij0t=st≠s.
Let θ denote a k×1 vector of GWN model parameters from which we want to compute a function of interest (we drop the asset subscript i to simplify notation). Let f:Rk→R be a continuous and differentiable function of θ, and define η=f(θ) as the parameter of interest. To make things concrete, let θ=(μ,σ)′. Consider the following functions of θ:
f1(θ)=μ+σ×qZα=qRαf2(θ)=−W0×(μ+σ×qZα)=W0×f1(θ)=VaRNαf3(θ)=−W0×(exp(μ+σ×qZα)−1)=W0×(exp(f1(θ))−1)=VaRLNαf4(θ)=μ−rfσ=SR
The first function (8.1) is the α-quantile, qRα, of the N(μ,σ2) distribution for simple returns. The second function (8.2) is VaRα of an initial investment of W0 based on the normal distribution for simple returns. The third function (8.3) is VaRα based on the log-normal distribution for simple returns (normal distribution for continuously compounded returns). The fourth function (8.4) is the Sharpe ratio, SR, of an asset, where rf denotes the constant risk-free rate of interest. The first two functions are linear functions of the elements of θ, and second two functions are nonlinear linear functions of the elements of θ.
Given the plug-in estimate of θ, ˆθ=(ˆμ,ˆσ)′, we want to estimate each of the above functions of θ, and construct estimated standard errors and confidence intervals.
For the examples in this chapter, we use data on monthly continuously compounded and simple returns for Microsoft over the period January 1998, through May 2012, from the IntroCompFinR package. The data is the same as that used in Chapter 5 and is constructed as follows:
data(msftDailyPrices)
to.monthly(msftDailyPrices, OHLC=FALSE)
msftPrices = "1998-01::2012-05"
smpl = msftPrices[smpl]
msftPrices = na.omit(Return.calculate(msftPrices, method="discrete"))
msftRetS = log(1 + msftRetS) msftRetC =
The GWN model estimates for μmsft and σmsft, with estimated standard errors, are:
nrow(msftRetC)
n.obs = mean(msftRetC)
muhatC = sd(msftRetC)
sigmahatC = mean(msftRetS)
muhatS = sd(msftRetS)
sigmahatS = c(muhatC, sigmahatC, muhatS, sigmahatS)
estimates = sigmahatC/sqrt(n.obs)
se.muhatC = sigmahatS/sqrt(n.obs)
se.muhatS = sigmahatC/sqrt(2*n.obs)
se.sigmahatC = sigmahatS/sqrt(2*n.obs)
se.sigmahatS = c(se.muhatC, se.sigmahatC, se.muhatS, se.sigmahatS)
stdErrors = rbind(estimates, stdErrors)
ans =colnames(ans) = c("mu.cc", "sigma.cc", "mu.simple", "sigma.simple")
ans
## mu.cc sigma.cc mu.simple sigma.simple
## estimates 0.00413 0.1002 0.00915 0.10150
## stdErrors 0.00764 0.0054 0.00774 0.00547
◼
8.2 Estimation of Functions of GWN Model Parameters
In Chapter 7, we used the plug-in principle to motivate estimators of the GWN model parameters μi, σ2i, σi,j, σij, and ρij. We can also use the plug-in principle to estimate functions of the GWN model parameters.
Definition 8.1 (Plug-in principle for functions of GWN model parameters) Let θ denote a k×1 vector of GWN model parameters, and let η=f(θ) where f:Rk→R is a continuous and differentiable function of θ. Let ˆθ denote the plug-in estimator of θ. Then the plug-in estimator of η is ˆη=f(ˆθ).
Let W0=$100,000, and rf=0.03/12=0.0025. Using the plug-in estimates of the GWN model parameters, the plug-in estimates of the functions (8.1) - (8.4) are:
100000
W0 = 0.03/12
r.f = muhatS + sigmahatS*qnorm(0.05)
f1.hat = -W0*f1.hat
f2.hat = -W0*(exp(muhatC + sigmahatC*qnorm(0.05)) - 1)
f3.hat = (muhatS-r.f)/sigmahatS
f4.hat = cbind(f1.hat, f2.hat, f3.hat, f4.hat)
fhat.vals =colnames(fhat.vals) = c("f1", "f2", "f3", "f4")
rownames(fhat.vals) = "Estimate"
fhat.vals
## f1 f2 f3 f4
## Estimate -0.158 15780 14846 0.0655
◼
Plug-in estimators of functions are random variables and are subject to estimation error. Just as we studied the statistical properties of the plug-in estimators of the GWN model parameters we can study the statistical properties of the plug-in estimators of functions of GWN model parameters.
8.2.1 Bias
As discussed in Chapter 7, a desirable finite sample property of an estimator ˆθ of θ is unbiasedness: on average over many hypothetical samples ˆθ is equal to θ. When estimating f(θ), unbiasedness of ˆθ may or may not carry over to f(ˆθ).
Suppose f(θ) is a linear function of the elements of θ so that
f(θ)=a+b1θ1+b2θ2+⋯+bkθk.
where a,b1,b2,…bk are constants. For example, the functions (8.1) and (8.2) are linear functions of the elements of θ=(μ,σ)′. Let ˆθ denote an estimator of θ and let f(ˆθ) denote the plug-in estimator of f(θ). If E[ˆθ]=θ for all elements of θ (^θi is unbiased for θi, i=1,…,k), then
E[f(^θ)]=a+b1E[^θ1]+b2E[^θ2]+⋯+bkE[^θk]=a+b1θ1+b2θ2+⋯+bkθk=f(θ)
and so f(ˆθ) is unbiased for f(θ).
Now suppose f(θ) is a nonlinear function of the elements of θ. For example, the functions (8.3) and (8.4) are nonlinear functions of the elements of θ=(μ,σ)′. Then, in general, f(ˆθ) is not unbiased for f(θ) even if ˆθ is unbiased for θ. The direction of the bias depends on the properties of f(θ). If f(⋅) is convex at θ or concave at θ then we can say something about the direction of the bias in f(ˆθ).
Definition 8.2 (Convex function) A scalar function f(θ) is convex (in two dimensions) if all points on a straight line connecting any two points on the graph of f(θ) is above or on that graph. More formally, we say f(θ) is convex if for all θ1 and θ2 and for all 0≤α≤1
f(αθ1+(1−α)θ2))≤αf(θ1)+(1−α)f(θ2)For example, the function f(θ)=θ2 is a convex function. Another common convex function is exp(θ).
For example, the function f(θ)=−θ2 is concave. Other common concave functions are log(θ) and √θ.
Jensen’s inequality tells us that if f is convex (concave) and ˆθ is unbiased then f(ˆθ) is positively (negatively) biased. For example, we know from Chapter 7 that ˆσ2 is unbiased for σ2. Now, ˆσ=f(ˆσ2)=√ˆσ2 is a concave function of ˆσ. From Jensen’s inequality, we know that ˆσ is negatively biased (i.e. too small).
◼
8.2.2 Consistency
An important property of the plug-in estimators of the GWN model parameters is that they are consistent: as the sample size gets larger and larger the estimators get closer and closer to their true values and eventually equal their true values for an infinitely large sample.
Consistency also holds for plug-in estimators of functions of GWN model parameters. The justification comes from Slutsky’s Theorem.
Proposition 8.2 (Slutsky’s Theorem) Let ˆθ be consistent for θ, and let f:Rk→R be continuous at θ. Then f(ˆθ) is consistent for f(θ).
The key condition on f(θ) is continuity at θ.39
All of the functions (8.1) - (8.4) are continuous at θ=(μ,σ)′. By Slutsky’s Theorem, all of the plug-in estimators of the functions are also consistent.
◼
8.3 Asymptotic normality
Let θ denote a scalar parameter of the GWN model for returns and let ˆθ denote the plug-in estimator of θ. In Chapter 7 we stated that ˆθ is asymptotically normally distributed
ˆθ∼N(θ,^se(ˆθ)2),
for large enough T, where ^se(ˆθ) denotes the estimated standard error of ˆθ. This result is justified by the CLT. Let f(θ) denote a real-valued continuous and differentiable scalar function of θ. Using a technique called the delta method, we can show that the plug-in estimator f(ˆθ) is also asymptotically normally distributed:
f(ˆθ)∼N(f(θ),^se(f(ˆθ))2),
for large enough T, where ^se(f(ˆθ)) is the estimated standard error for f(ˆθ). The delta method shows us how to compute ^se(f(ˆθ))2:
^se(f(ˆθ))2=f′(ˆθ)2×^se(ˆθ)2,
where f′(ˆθ) denotes the derivative of f(θ) evaluated at ˆθ. Then,
^se(f(ˆθ))=√f′(ˆθ)2×^se(ˆθ)2.
In Chapter 7 we stated that, for large enough T,
ˆσ2∼N(σ2,se(ˆσ2)2), ˆσ∼N(σ,se(ˆσ)2),
where se(σ2)2=2σ4/T and se(σ)2=σ2/2T. The formula for se(ˆσ)2 is the result of the delta method for the function f(σ2)=√σ2=σ applied to the asymptotic distribution of ˆσ2. To see this, first note that by the chain rule
f′(σ2)=12(σ2)−1/2,
so that f′(σ2)2=14(σ2)−1. Then,
se(f(ˆθ))2=f′(σ2)2×se(ˆσ2)2=14(σ2)−1×2σ4/T=12σ2/T=σ2/2T=se(ˆσ)2.
The estimated squared standard error replaces σ2 with its estimate ˆσ2 and is given by ^se(ˆσ2)2=ˆσ2/2T.
◼
Now suppose θ is a k×1 vector of GWN model parameters, and f:Rk→R is a continuous and differentiable function of θ. Define the k×1 gradient function g(θ) as:
g(θ)=∂f(θ)∂θ=(∂f(θ)∂θ1⋮∂f(θ)∂θk).
Assume that ˆθ is asymptotically normally distributed:
ˆθ∼N(θ,^var(ˆθ)),
where ^var(ˆθ) is the k×k estimated variance-covariance matrix of ˆθ. Then the delta method shows us that f(ˆθ) is asymptotically normally distributed:
f(ˆθ)∼N(f(θ),^se(f(ˆθ))2),
where
^se(f(ˆθ))2=g(ˆθ)′^var(ˆθ)g(ˆθ).
Let θ=(μ,σ)′ and consider f1(θ)=qRα=μ+σqZα (normal simple return quantile). The gradient function g1(θ) for f1(θ) is:
g1(θ)=∂f1(θ)∂θ=(∂f1(θ)∂μ∂f1(θ)∂σ)=(1qZα).
Regarding var(ˆθ), recall from Chapter 7, in the GWN model: (ˆμˆσ2)∼N((μσ2),(se(ˆμ)200se(ˆσ2)2))
for large enough T. We show below, using the delta method, that
(ˆμˆσ)∼N((μσ),(se(ˆμ)200se(ˆσ)2))
Hence,
var(ˆθ)=(se(ˆμ)200se(ˆσ)2)=(σ2T00σ22T)
Then,
g1(θ)′var(ˆθ)g1(θ)=(1qZα)(se(ˆμ)200se(ˆσ)2)(1qZα)=se(ˆμ)2+(qZα)2se(ˆσ)2=σ2T+(qZα)2σ22T=σ2T[1+12(qZα)2]=se(ˆqZα)2.
The standard error for ˆqZα is: se(ˆqRα)=√var(ˆqRα)=σ√T[1+12(qZα)2]1/2.
The formula (8.5) shows that se(ˆqRα) increases with σ and qZα, and decreases with T. In addition, for fixed T, se(ˆqRα)→∞ as α→0 or as α→1 because qZα→−∞ as α→0 and qZα→∞ as α→1. Hence, we have very large estimation error in ˆqRα when α is very close to zero or one (for fixed values of σ and T). This makes intuitive sense as we have very few observations in the extreme tails of the empirical distribution of the data for a fixed sample size and so we expect a larger estimation error.
Replacing σ with its estimate ˆσ gives the estimated standard error for ˆqRα:
^se(ˆqRα)=ˆσ√T[1+12(qZα)2]1/2.
Using the above results, the sampling distribution of ˆqRα can be approximated by the normal distribution: ˆqRα∼N(qRα, ^se(ˆqRα)2), for large enough T.
The estimates of qRα, se(ˆqRα), and 95% confidence intervals, for α=0.05,0.01 and 0.001, from the simple monthly returns for Microsoft are:
.05 = muhatS + sigmahatS*qnorm(0.05)
qhat.01 = muhatS + sigmahatS*qnorm(0.01)
qhat.001 = muhatS + sigmahatS*qnorm(0.001)
qhat.05 = (sigmahatS/sqrt(n.obs))*sqrt(1 + 0.5*qnorm(0.05)^2)
seQhat.01 = (sigmahatS/sqrt(n.obs))*sqrt(1 + 0.5*qnorm(0.01)^2)
seQhat.001 = (sigmahatS/sqrt(n.obs))*sqrt(1 + 0.5*qnorm(0.001)^2)
seQhat.05 = qhat.05 - 2*seQhat.05
lowerQhat.05 = qhat.05 + 2*seQhat.05
upperQhat.01 = qhat.01 - 2*seQhat.01
lowerQhat.01 = qhat.01 + 2*seQhat.01
upperQhat.001 = qhat.001 - 2*seQhat.001
lowerQhat.001 = qhat.001 + 2*seQhat.001
upperQhat cbind(c(qhat.05, qhat.01, qhat.001),
ans =c(seQhat.05, seQhat.01, seQhat.001),
c(lowerQhat.05, lowerQhat.01, lowerQhat.001),
c(upperQhat.05, upperQhat.01, upperQhat.001))
colnames(ans) = c("Estimates", "Std Errors", "2.5%", "97.5%")
rownames(ans) = c("q.05", "q.01", "q.001")
ans
## Estimates Std Errors 2.5% 97.5%
## q.05 -0.158 0.0119 -0.182 -0.134
## q.01 -0.227 0.0149 -0.257 -0.197
## q.001 -0.304 0.0186 -0.342 -0.267
For Microsoft the values of ^se(ˆqRα) are about 1.2%, 1.5%, and 1.9% for α=0.05, α=0.01 and α=0.001, respectively. The 95% confidence interval widths increase with decreases in α indicating more uncertainty about the true value of qRα for smaller values of α. For example, the 95% confidence for interval qR0.01 indicates that the true loss with 1% probability (one month out of 100 months) is between 19.7% and 25.7%.
◼
Let θ=(μ,σ)′ and consider the functions f2(θ)=VaRNα=−W0(μ+σqZα) (normal VaR), f3(θ)=VaRLNα=−W0(exp((μ+σqZα))−1) (log-normal VaR), and f4(θ)=SR=μ−rfσ (Sharpe Ratio). It is straightforward to show (see end-of-chapter exercises) that the gradient functions are:
g2(θ)=∂f2(θ)∂θ=(−W0−W0qZα),g3(θ)=∂f3(θ)∂θ=(−W0exp(μ+σqZα)−W0exp(μ+σqZα)qZα),g4(θ)=∂f4(θ)∂θ=(−1σ−(σ)−2(μ−rf)).
The delta method variances are:
g2(θ)′var(ˆθ)g2(θ)=W20se(ˆqRα)2=se(^VaRNα)2g3(θ)′var(ˆθ)g3(θ)=W20exp(qRα)2se(ˆqRα)2=se(^VaRLNα)2g4(θ)′var(ˆθ)g4(θ)=1T(1+12SR2)=se(^SR)2.
The corresponding standard errors are:
se(^VaRNα)=W0se(ˆqRα)se(^VaRLNα)=W0exp(qRα)se(ˆqRα)se(^SR)=√1T(1+12SR2),
where
se(ˆqZα)=√σ2T[1+12(qZα)2].
We make the following remarks:
se(^VaRNα) and se(^VaRLNα) increase with W0, σ, and qZα, and decrease with T.
For fixed values of W0, σ, and T, se(^VaRNα) and se(^VaRLNα) become very large as α approaches zero. That is, for very small loss probabilities we will have very bad estimates of VaRNα and VaRLNα.
se(SR) increases with SR2 and decreases with T.
The practically useful estimated standard errors replace the unknown values of qRα and SR with their estimated values ˆqRα and ^SR and are given by:
^se(^VaRNα)=W0^se(ˆqRα)^se(^VaRLNα)=W0exp(ˆqRα)^se(ˆqRα)^se(^SR)=√1T(1+12^SR2)
◼
Consider a $100,000 investment for one month in Microsoft. The estimates of VaRNα, its standard error, and 95% confidence intervals, for α=0.05 and α=0.01 are:
100000
W0=.05 = muhatS + sigmahatS*qnorm(0.05)
qhat.01 = muhatS + sigmahatS*qnorm(0.01)
qhat.05 = -qhat.05*W0
VaR.N.05 = W0*seQhat.05
seVaR.N.01 = -qhat.01*W0
VaR.N.01 = W0*seQhat.01
seVaR.N.05 = VaR.N.05 - 2*seVaR.N.05
lowerVaR.N.05 = VaR.N.05 + 2*seVaR.N.05
upperVaR.N.01 = VaR.N.01 - 2*seVaR.N.01
lowerVaR.N.01 = VaR.N.01 + 2*seVaR.N.01
upperVaR.N cbind(c(VaR.N.05, VaR.N.01),
ans =c(seVaR.N.05, seVaR.N.01),
c(lowerVaR.N.05, lowerVaR.N.01),
c(upperVaR.N.05,upperVaR.N.01))
colnames(ans) = c("Estimate", "Std Error", "2.5%", "97.5%")
rownames(ans) = c("VaR.N.05", "VaR.N.01")
ans
## Estimate Std Error 2.5% 97.5%
## VaR.N.05 15780 1187 13405 18154
## VaR.N.01 22696 1490 19717 25676
The estimated values of VaRN.05 and VaRN.01 are $15,780 and $22,696, respectively, with estimated standard errors of $1,172 and $1,471. The estimated standard errors are fairly small compared with the estimated VaR values, so we can say that VaR is estimated fairly precisely here. Notice that the estimated standard error for VaR.01 is about 26% larger than the estimated standard error for VaRN.05 indicating we have less precision for estimating VaR for very small values of α.
The estimates of VaRLNα and its standard error for α=0.05 and α=0.01 are:
.05 = muhatC + sigmahatC*qnorm(0.05)
qhat.01 = muhatC + sigmahatC*qnorm(0.01)
qhat.05 = (sigmahatC/sqrt(n.obs))*sqrt(1 + 0.5*qnorm(0.05)^2)
seQhat.01 = (sigmahatC/sqrt(n.obs))*sqrt(1 + 0.5*qnorm(0.01)^2)
seQhat.05 = -W0*(exp(qhat.05)-1)
VaR.LN.05 = W0*exp(qhat.05)*seQhat.05
seVaR.LN.01 = -W0*(exp(qhat.01)-1)
VaR.LN.01 = W0*exp(qhat.01)*seQhat.01
seVaR.LN.05 = VaR.LN.05 - 2*seVaR.LN.05
lowerVaR.LN.05 = VaR.LN.05 + 2*seVaR.LN.05
upperVaR.LN.01 = VaR.LN.01 - 2*seVaR.LN.01
lowerVaR.LN.01 = VaR.LN.01 + 2*seVaR.LN.01
upperVaR.LN cbind(c(VaR.LN.05, VaR.LN.01),
ans =c(seVaR.LN.05, seVaR.LN.01),
c(lowerVaR.LN.05, lowerVaR.LN.01),
c(upperVaR.LN.05,upperVaR.LN.01))
colnames(ans) = c("Estimate", "Std Error", "2.5%", "97.5%")
rownames(ans) = c("VaR.LN.05", "VaR.LN.01")
ans
## Estimate Std Error 2.5% 97.5%
## VaR.LN.05 14846 998 12850 16842
## VaR.LN.01 20467 1170 18128 22807
The estimates of VaRLNα (and its standard error) are slightly smaller than the estimates of VaRNα (and its standard error) due to the positive skewness in the log-normal distribution.
◼
The estimated monthly SR for Microsoft (using rf=0.0025), its standard error and 95% confidence interval are computed using:
(muhatS - r.f)
SRhat = sqrt((1/n.obs)*(1 + 0.5*(SRhat^2)))
seSRhat = SRhat - 2*seSRhat
lowerSR = SRhat + 2*seSRhat
upperSR = c(SRhat, seSRhat, lowerSR, upperSR)
ans =names(ans) = c("Estimate", "Std Error", "2.5%", "97.5%")
ans
## Estimate Std Error 2.5% 97.5%
## 0.00665 0.07625 -0.14585 0.15915
The estimated monthly SR for Microsoft is close to zero, with a large estimated standard error and a 95% confidence interval containing both negative and positive values. Clearly, the SR is not estimated very well.
◼
8.3.1 The numerical delta method
The delta method is an advanced statistical technique and requires a bit of calculus to implement, especially for nonlinear vector-valued functions. It can be easy to make mistakes when working out the math! Fortunately, in R there is an easy way to implement the delta method that does not require you to work out the math. The car function deltaMethod
implements the delta method numerically, utilizing the stats function D()
which implements symbolic derivatives. The arguments expected by deltaMethod
are:
args(car:::deltaMethod.default)
## function (object, g., vcov., func = g., constants, level = 0.95,
## rhs = NULL, ..., envir = parent.frame())
## NULL
Here, object
is the vector of named estimated parameters, ˆθ, g.
is a quoted string that is the function of the parameter estimates to be evaluated, f(θ), and vcov
is the named estimated covariance matrix of the coefficient estimates, ^var(ˆθ). The function returns an object of class “deltaMethod” for which there is a print
method. The “deltaMethod” object is essentially a data.frame
with columns giving the parameter estimate(s), the delta method estimated standard error(s), and lower and upper confidence limits. The following example illustrates how to use deltaMethod
for the example functions.
We first create the named inputs ˆθ and ^var(ˆθ) from the GWN model for simple returns:
c(muhatS, sigmahatS)
thetahatS =names(thetahatS) = c("mu", "sigma")
matrix(c(se.muhatS^2,0,0,se.sigmahatS^2), 2, 2, byrow=TRUE)
var.thetahatS =rownames(var.thetahatS) = colnames(var.thetahatS) = names(thetahatS)
thetahatS
## mu sigma
## 0.00915 0.10150
var.thetahatS
## mu sigma
## mu 5.99e-05 0.00e+00
## sigma 0.00e+00 2.99e-05
We do the same for the inputs from the GWN model for continuously compounded returns:
c(muhatC, sigmahatC)
thetahatC =names(thetahatC) = c("mu", "sigma")
matrix(c(se.muhatC^2,0,0,se.sigmahatC^2), 2, 2, byrow=TRUE)
var.thetahatC =rownames(var.thetahatC) = colnames(var.thetahatC) = names(thetahatC)
It is important to name the elements in thetahat
and var.thetahat
as these names will be used in defining f(θ). To implement the numerical delta method for the normal quantile, we need to supply a quoted string specifying the function f1(θ)=μ+σqZα. Here we use the string "mu+sigma*q.05"
, where q.05 = qnorm(0.05)
. For, α=0.05 the R code is:
.05 = qnorm(0.05)
q deltaMethod(thetahatS, "mu+sigma*q.05", var.thetahatS)
dm1 =class(dm1)
## [1] "deltaMethod" "data.frame"
The returned object dm1
is of class “deltaMethod”, inhereting from “data.frame”, and has named columns:
names(dm1)
## [1] "Estimate" "SE" "2.5 %" "97.5 %"
The print
method shows the results of applying the delta method:
dm1
## Estimate SE 2.5 % 97.5 %
## mu + sigma * q.05 -0.1578 0.0119 -0.1811 -0.13
The results match the analytic calculations performed above.40
The delta method for the normal VaR function f2(θ) is:
deltaMethod(thetahatS, "-W0*(mu+sigma*q.05)", var.thetahatS)
dm2 = dm2
## Estimate SE 2.5 % 97.5 %
## -W0 * (mu + sigma * q.05) 15780 1187 13453 18106
The delta method for the log-normal VaR function f3(θ) is:
deltaMethod(thetahatC, "-W0*(exp(mu+sigma*q.05)-1)", var.thetahatC)
dm3 = dm3
## Estimate SE 2.5 % 97.5 %
## -W0 * (exp(mu + sigma * q.05) - 1) 14846 998 12890 16802
The delta method for the Sharpe ratio function f4(θ) is
deltaMethod(thetahatS, "(mu-r.f)/sigma", var.thetahatS)
dm4 = dm4
## Estimate SE 2.5 % 97.5 %
## (mu - r.f)/sigma 0.0655 0.0763 -0.0841 0.22
◼
8.3.2 the delta method for vector valued functions (advanced)
Let θ denote a k×1 vector of GWN model parameters. For example, θ=(μ,σ2)′. Assume that ˆθ is asymptotically normally distributed: ˆθ∼N(θ,^var(θ)), for large enough T where ^var(θ) is the k×k variance-covariance matrix of ˆθ. Sometimes we need to consider the asymptotic distribution of a vector-valued function of θ. Let f:Rk→Rj where j≤k be a continuous and differentiable function, and define the j×1 vector η=f(θ). It is useful to express η as:
η=(η1η2⋮ηj)=(f1(θ)f2(θ)⋮fj(θ)).
For the first example let θ=(σ21,σ22)′, and define the 2×1 vector of volatilities: η=f(θ)=(σ1,σ2)′. Then, η1=f1(θ)=√σ22 and η2=f2(θ)=√σ22.
For the second example, let θ = (μ1,μ2,σ1,σ2)′ and define the 2×1 vector of Sharpe ratios:
η=f(θ)=(μ1−rfσ1μ1−rfσ1)=(SR1SR2),
where rf denotes the constant risk-free rate. Then η1=f1(θ)=μ1−rfσ1=SR1, η2=f2(θ)=μ2−rfσ2=SR2.
◼
Define the j×k Jacobian matrix: G(θ)′=(∂f1(θ)∂θ1∂f1(θ)∂θ2⋯∂f1(θ)∂θk∂f2(θ)∂θ1∂f2(θ)∂θ2⋯∂f2(θ)∂θk⋮⋮…⋮∂fj(θ)∂θ1∂fj(θ)∂θ2⋯∂fj(θ)∂θk)
Then, by the delta method, the asymptotic normal distribution of ˆη=f(ˆθ) is: ˆη∼N(η,^var(ˆη)), where ^var(ˆη)=G(ˆθ)′^var(ˆθ)G(ˆθ).
Consider the first example function where θ=(σ21,σ22)′ and η=f(θ)=(σ1,σ2)′. Now, using the chain-rule:
∂f1(θ)∂σ21=12σ−11, ∂f1(θ)∂σ22=0∂f2(θ)∂σ21=0, ∂f2(θ)∂σ22=12σ−11
The Jacobian matrix is
G(θ)′=(12σ−110012σ−12)
From Chapter 7, Proposition 7.9,
^var(ˆθ)=1T(2ˆσ412ˆσ2122ˆσ2122ˆσ42)
Then
^var(ˆη)=G(ˆθ)′^var(ˆθ)G(ˆθ)=(12ˆσ−110012ˆσ−12)1T(2ˆσ412ˆσ2122ˆσ2122ˆσ42)(12ˆσ−110012ˆσ−12)=1T(12ˆσ2112ˆσ212ˆσ−11ˆσ−1212ˆσ212ˆσ−11ˆσ−1212ˆσ22)=1T(12ˆσ2112ˆσ12ˆρ1212ˆσ12ˆρ1212ˆσ22).
◼
Consider the second example function where \theta =(\mu_1,\mu_2, \sigma_1, \sigma_2)' and \eta = f(\theta)=(f_1(\theta), f_2(\theta))' where \begin{equation} f_1(\theta) = \frac{\mu_1 - r_f}{\sigma_1} = \mathrm{SR}_1, ~ f_2(\theta) = \frac{\mu_2 - r_f}{\sigma_2} = \mathrm{SR}_2. \end{equation}
Here, we want to use the delta method to get the joint distribution of \eta = (\mathrm{SR}_1, \mathrm{SR}_2)'. Now, using the chain-rule:
\begin{align*} \frac{\partial f_1(\theta)}{\partial \mu_1} & = \sigma_1^{-1}, ~ \frac{\partial f_1(\theta)}{\partial \mu_2} = 0, ~ & \frac{\partial f_1(\theta)}{\partial \sigma_1} = -\sigma_1^{-2}(\mu_1 - r_f), ~ & \frac{\partial f_1(\theta)}{\partial \sigma_2} = 0, \\ \frac{\partial f_2(\theta)}{\partial \mu_1} & = 0, ~ \frac{\partial f_2(\theta)}{\partial \mu_2} = \sigma_2^{-1}, ~ & \frac{\partial f_2(\theta)}{\partial \sigma_1} = 0, ~ & \frac{\partial f_2(\theta)}{\partial \sigma_2} = -\sigma_1^{-2}(\mu_1 - r_f). \end{align*}
Then, using \mathrm{SR}_{i} = (\mu_i - r_f)/\sigma_i ~ (i=1,2), we have:
\begin{equation} \mathbf{G}(\theta)' = \begin{pmatrix} \sigma_1^{-1} & 0 & -\sigma_1^{-1} \mathrm{SR}_1 & 0 \\ 0 & \sigma_2^{-1} & 0 & -\sigma_2^{-1}\mathrm{SR}_2 & 0 \end{pmatrix} \end{equation}
Using Proposition 7.10 and the results from the previous example we have:
\begin{equation} \mathrm{var}(\hat{\theta}) = \frac{1}{T} \begin{pmatrix} \sigma_1^2 & \sigma_{12} & 0 & 0 \\ \sigma_{12} & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \frac{1}{2}\sigma_1^2 & \frac{1}{2} \sigma_{12}^2 \sigma_1^{-1} \sigma_2^{-1} \\ 0 & 0 & \frac{1}{2} \sigma_{12}^2 \sigma_1^{-1} \sigma_2^{-1} & \frac{1}{2}\sigma_2^2 \end{pmatrix}. \end{equation}
Then, after some straightforward matrix algebra calculations (see end-of-chapter exercises) we get:
\begin{align} \widehat{\mathrm{var}}(\hat{\eta}) & = \mathbf{G}(\hat{\theta})'\widehat{\mathrm{var}}(\hat{\theta})\mathbf{G}(\hat{\theta}) \\ &= \begin{pmatrix} \widehat{\mathrm{var}}(\widehat{\mathrm{SR}}_1) & \widehat{\mathrm{cov}}(\widehat{\mathrm{SR}}_1, \widehat{\mathrm{SR}}_2) \\ \widehat{\mathrm{cov}}(\widehat{\mathrm{SR}}_1, \widehat{\mathrm{SR}}_2) & \widehat{\mathrm{var}}(\widehat{\mathrm{SR}}_2) \end{pmatrix} \\ & = \begin{pmatrix} \frac{1}{T}(1+\frac{1}{2}\widehat{\mathrm{SR}}_1^2) & \frac{\hat{\rho}_{12}}{T}(1 + \frac{1}{2}\hat{\rho}_{12}\widehat{\mathrm{SR}}_1 \widehat{\mathrm{SR}}_2) \\ \frac{\hat{\rho}_{12}}{T}(1 + \frac{1}{2}\hat{\rho}_{12} \widehat{\mathrm{SR}}_1\widehat{\mathrm{SR}}_2) & \frac{1}{T}(1+\frac{1}{2}\widehat{\mathrm{SR}}_2^2) \end{pmatrix}. \end{align}
The diagonal elements match the delta method variances for the estimated Sharpe ratio derived earlier. The off-diagonal covariance term shows that the two estimated Sharpe ratios are correlated. We explore this correlation in more depth in the end-of-chapter exercises.
\blacksquare
8.3.3 The delta method explained
The delta method deduces the asymptotic distribution of f(\hat{\theta}) using a first order Taylor series approximation of f(\hat{\theta}) evaluated at the true value \theta. For simplicity, assume that \theta is a scalar parameter. Then, the first order Taylor series expansion gives
\begin{align*} f(\hat{\theta}) & = f(\theta) + f'(\theta)(\hat{\theta} - \theta) + \text{remainder} \\ & = f(\theta) + f'(\theta)\hat{\theta} + f'(\theta)\theta + \text{remainder} \\ & \approx f(\theta) + f'(\theta)\hat{\theta} + f'(\theta)\theta \end{align*}
The first order Taylor series expansion shows that f(\hat{\theta}) is an approximate linear function of the random variable \hat{\theta}. The values \theta, f(\theta), and f'(\theta) are constants. Assuming that, for large enough T,
\begin{equation*} \hat{\theta}\sim N(\theta,\widehat{\mathrm{se}}(\hat{\theta})^{2}), \end{equation*}
it follows that
\begin{equation*} f(\hat{\theta}) \sim N(f(\theta), f'(\theta)^2 \widehat{\mathrm{se}}(\hat{\theta})^{2}) \end{equation*}
Since \hat{\theta} is consistent for \theta we can replace f'(\theta)^2 with f'(\hat{\theta})^2 giving the practically useful result
\begin{equation*} f(\hat{\theta}) \sim N(f(\theta), f'(\hat{\theta})^2 \widehat{\mathrm{se}}(\hat{\theta})^{2}) \end{equation*}
8.4 Resampling
As described in the previous sections, the delta method can be used to produce analytic expressions for standard errors of functions of estimated parameters that are asymptotically normally distributed. The formulas are specific to each function and are approximations that rely on large samples and the validity of the Central Limit Theorem to be accurate.
Alternatives to the analytic asymptotic distributions of estimators and the delta method are computer intensive resampling methods such as the jackknife and the bootstrap. These resampling methods can be used to produce numerical estimated standard errors for estimators and functions of estimators without the use of mathematical formulas. Most importantly, these jackknife and bootstrap standard errors are: (1) easy to compute, and (2) often numerically very close to the analytic standard errors based on CLT approximations. In some cases the bootstrap standard errors can even be more reliable than asymptotic approximations.
There are many advantages of the jackknife and bootstrap over analytic formulas. The most important are the following:
- Fewer assumptions. The jackknife and boostrap procedures typically requires fewer assumptions than are needed to derive analytic formulas. In particular, they do not require the data to be normally distributed or the sample size to be large enough so that the Central Limit Theorem holds.
- Greater accuracy. In small samples the bootstrap is often more reliable than analytic approximations based on the Central Limit Theorem. In large samples when the Central Limit Theorem holds the bootstrap can be even be more accurate.
- Generality. In many situations, the jackknife and bootstrap procedures are applied in the same way regardless of the estimator under consideration. That is, you don’t need a different jackknife or bootstrap procedure for each estimator.
8.5 The Jackknife
Let \theta denote a scalar parameter of interest. It could be a scalar parameter of the GWN model (e.g. \mu or \sigma) or it could be a scalar function of the parameters of the GWN model (e.g. VaR or SR). Let \hat{\theta} denote the plug-in estimator of \theta. The goal is to compute a numerical estimated standard error for \hat{\theta}, \widehat{\mathrm{se}}(\hat{\theta}), using the jackknife resampling method.
The jackknife resampling method utilizes T resamples of the original data \left\{ R_t \right\}_{t=1}^{T}. The resamples are created by leaving out a single observation. For example, the first jackknife resample leaves out the first observation and is given by \left\{ R_t \right\}_{t=2}^{T}. This sample has T-1 observations. In general, the t^{th} jackknife resample leaves out the t^{th} return R_{t} for t \le T. Now, on each of the t=1,\ldots, T jackknife resamples you compute an estimate of \theta. Denote this jackknife estimate \hat{\theta}_{-t} to indicate that \theta is estimated using the resample that omits observation t. The estimator \hat{\theta}_{-t} is called the leave-one-out estimator of \theta. Computing \hat{\theta}_{-t} for t=1,\ldots, T gives T leave-one-out estimators \left\{ \hat{\theta}_{-t} \right\}_{t=1}^{T}.
The jackknife estimator of \widehat{\mathrm{se}}(\hat{\theta}) is the scaled standard deviation of \left\{ \hat{\theta}_{-t} \right\}_{t=1}^{T}:
\begin{equation} \widehat{\mathrm{se}}_{jack}(\hat{\theta}) = \sqrt{ \frac{T-1}{T}\sum_{t=1}^T (\hat{\theta}_{-t} - \bar{\theta})^2 }, \tag{8.10} \end{equation}
where \bar{\theta} = T^{-1}\sum_{t=1}^T \hat{\theta}_{-t} is the sample mean of \left\{ \hat{\theta}_{-t} \right\}_{t=1}^{T}. The estimator (8.10) is easy to compute and does not require any analytic calculations from the asymptotic distribution of \hat{\theta}.
The intuition for the jackknife standard error formula can be illustrated by considering the jackknife standard error estimate for the sample mean \hat{\theta} = \hat{\mu}. We know that the analytic estimated squared standard error for \hat{\mu} is
\begin{equation*} \widehat{\mathrm{se}}(\hat{\mu})^2 = \frac{\hat{\sigma}^2}{T} = \frac{1}{T} \frac{1}{T-1} \sum_{t=1}^T (R_t - \hat{\mu})^2. \end{equation*}
Now consider the jackknife squared standard error for \hat{\mu}:
\begin{equation*} \widehat{\mathrm{se}}_{jack}(\hat{\mu})^2= \frac{T-1}{T}\sum_{t=1}^T (\hat{\mu}_{-t} - \bar{\mu})^2, \end{equation*}
where \bar{\mu} = T^{-1}\sum_{t=1}^T \hat{\mu}_{-t}. Here, we will show that \widehat{\mathrm{se}}(\hat{\mu})^2 = \widehat{\mathrm{se}}_{jack}(\hat{\mu})^2.
First, note that
\begin{align*} \hat{\mu}_{-t} & = \frac{1}{T-1}\sum_{j \ne t} R_j = \frac{1}{T-1} \left( \sum_{j=1}^T R_j - R_t \right) \\ &= \frac{1}{T-1} \left( T \hat{\mu} - R_t \right) \\ &= \frac{T}{T-1}\hat{\mu} - \frac{1}{T-1}R_t. \end{align*}
Next, plugging in the above to the definition of \bar{\mu} we get
\begin{align*} \bar{\mu} &= \frac{1}{T} \sum_{j=1}^T \hat{\mu}_{-t} = \frac{1}{T}\sum_{t=1}^T \left( \frac{T}{T-1}\hat{\mu} - \frac{1}{T-1}R_t \right) \\ &= \frac{1}{T-1} \sum_{t=1}^T \hat{\mu} - \frac{1}{T}\frac{1}{T-1} \sum_{t=1}^T R_t \\ &= \frac{T}{T-1} \hat{\mu} - \frac{1}{T-1} \hat{\mu} \\ &= \hat{\mu} \left( \frac{T}{T-1} - \frac{1}{T-1} \right) \\ &= \hat{\mu}. \end{align*}
Hence, the sample mean of \hat{\mu}_{-t} is \hat{\mu}. Then we can write
\begin{align*} \hat{\mu}_{-t} - \bar{\mu} & = \hat{\mu}_{-t} - \hat{\mu} \\ & = \frac{T}{T-1}\hat{\mu} - \frac{1}{T-1}R_t - \hat{\mu} \\ &= \frac{T}{T-1}\hat{\mu} - \frac{1}{T-1}R_t - \frac{T-1}{T-1}\hat{\mu} \\ &= \frac{1}{T-1}\hat{\mu} - \frac{1}{T-1}R_t \\ &= \frac{1}{T-1}(\hat{\mu} - R_t). \end{align*}
Using the above we can write
\begin{align*} \widehat{\mathrm{se}}_{jack}(\hat{\mu})^2 &= \frac{T-1}{T}\sum_{t=1}^T (\hat{\mu}_{-t} - \bar{\mu})^2 \\ &= \frac{T-1}{T}\left(\frac{1}{T-1}\right)^2 \sum_{t=1}^T (\hat{\mu} - R_t)^2 \\ &= \frac{1}{T}\frac{1}{T-1} \sum_{t=1}^T (\hat{\mu} - R_t)^2 \\ &= \frac{1}{T} \hat{\sigma}^2 \\ &= \widehat{\mathrm{se}}(\hat{\mu})^2. \end{align*}
We can compute the leave-one-out estimates \{\hat{\mu}_{-t}\}_{t=1}^T using a simple for
loop as follows:
length(msftRetS)
n.obs = rep(0, n.obs)
muhat.loo =for(i in 1:n.obs) {
mean(msftRetS[-i])
muhat.loo[i] = }
In the above loop, msftRetS[-i]
extracts all observations except observation i. The mean of the leave-one-out estimates is
mean(muhat.loo)
## [1] 0.00915
which is the sample mean of msftRetS
. It is informative to plot a histogram of the leave-one-out estimates:
The white dotted line is at the point \hat{\mu} = 0.00915. The jackknife estimated standard error is the scaled standard deviation of the T \hat{\mu}_{-t} values:41
sqrt(((n.obs - 1)^2)/n.obs)*sd(muhat.loo)
se.jack.muhat = se.jack.muhat
## [1] 0.00774
This value is exactly the same as the analytic standard error:
se.muhatS
## [1] 0.00774
You can also compute jackknife estimated standard errors using the bootstrap function jackknife
:
library(bootstrap)
jackknife(coredata(msftRetS), mean)
muhat.jack =names(muhat.jack)
## [1] "jack.se" "jack.bias" "jack.values" "call"
By default, jackknife()
takes as input a vector of data and the name of an R function and returns a list with components related to the jackknife procedure. The component jack.values
contains the leave-one-out estimators, and the component jack.se
gives you the jackknife estimated standard error:
$jack.se muhat.jack
## [1] 0.00774
\blacksquare
The fact that the jackknife standard error estimate for \hat{\mu} equals the analytic standard error is main justification for using the jackknife to compute standard errors. For estimated parameters other than \hat{\mu} and general estimated functions the jackknife standard errors are often numerically very close to the delta method standard errors. The following example illustrates this point for the example functions (8.1) - (8.4).
Here, we use the bootstrap function jackknife()
to compute jackknife estimated standard errors for the plug-in estimates of the example functions (8.1) - (8.4) and we compare the jackknife standard errors with the delta method standard errors. To use jackknife()
with user-defined functions involving multiple parameters you must create the function such that the first input is an integer vector of index observations and the second input is the data. For example, a function for computing (8.1) to be passed to jackknife()
is:
function(x, xdata) {
theta.f1 = mean(xdata[x, ])
mu.hat = sd(xdata[x, ])
sigma.hat = mu.hat + sigma.hat*qnorm(0.05)
fun.hat =
fun.hat }
To compute the jackknife estimated standard errors for the estimate of (8.1) use:
jackknife(1:n.obs, theta=theta.f1, coredata(msftRetS))
f1.jack = cbind(f1.jack$jack.se, dm1$SE)
ans =colnames(ans) = c("Jackknife", "Delta Method")
rownames(ans) = "f1"
ans
## Jackknife Delta Method
## f1 0.0133 0.0119
Here, we see that the estimated jackknife standard error for f_1(\hat{\theta}) is very close to the delta method standard error computed earlier.
To compute the jackknife estimated standard error for f_2(\hat{\theta}) use:
function(x, xdata, W0) {
theta.f2 = mean(xdata[x, ])
mu.hat = sd(xdata[x, ])
sigma.hat = -W0*(mu.hat + sigma.hat*qnorm(0.05))
fun.hat =
fun.hat
} jackknife(1:n.obs, theta=theta.f2, coredata(msftRetS), W0)
f2.jack = cbind(f2.jack$jack.se, dm2$SE)
ans =colnames(ans) = c("Jackknife", "Delta Method")
rownames(ans) = "f2"
ans
## Jackknife Delta Method
## f2 1329 1187
Again, the jackknife standard error is close to the delta method standard error.
To compute the jackknife estimated standard error for f_3(\hat{\theta}) use:
function(x, xdata, W0) {
theta.f3 = mean(xdata[x, ])
mu.hat = sd(xdata[x, ])
sigma.hat = -W0*(exp(mu.hat + sigma.hat*qnorm(0.05)) - 1)
fun.hat =
fun.hat
} jackknife(1:n.obs, theta=theta.f3, coredata(msftRetC), W0)
f3.jack = cbind(f3.jack$jack.se, dm3$SE)
ans =colnames(ans) = c("Jackknife", "Delta Method")
rownames(ans) = "f3"
ans
## Jackknife Delta Method
## f3 1315 998
For log-normal VaR, the jackknife standard error is a bit larger than the delta method standard error.
Finally, to compute the jacknife estimated standard for the Sharpe ratio use:
function(x, xdata, r.f) {
theta.f4 = mean(xdata[x, ])
mu.hat = sd(xdata[x, ])
sigma.hat = (mu.hat-r.f)/sigma.hat
fun.hat =
fun.hat
} jackknife(1:n.obs, theta=theta.f4, coredata(msftRetS), r.f)
f4.jack = cbind(f4.jack$jack.se, dm4$SE)
ans =colnames(ans) = c("Jackknife", "Delta Method")
rownames(ans) = "f4"
ans
## Jackknife Delta Method
## f4 0.076 0.0763
Here, the jackknife and delta method standard errors are very close.
\blacksquare
8.5.1 The jackknife for vector-valued estimates
The jackknife can also be used to compute an estimated variance matrix of a vector-valued estimate of parameters. Let \theta denote a k \times 1 vector of GWN model parameters (e.g. \theta=(\mu,\sigma)'), and let \hat{\theta} denote the plug-in estimate of \theta. Let \hat{\theta}_{-t} denote the leave-one-out estimate of \theta, and \bar{\theta}=T^{-1}\sum_{t=1}^T \hat{\theta}_{-t}. Then the jackknife estimate of the k \times k variance matrix \widehat{\mathrm{var}}(\hat{\theta}) is the scaled sample covariance of \{\hat{\theta}_{-t}\}_{t=1}^T:
\begin{equation} \widehat{\mathrm{var}}_{jack}(\hat{\theta}) = \frac{T-1}{T} \sum_{t=1}^T (\hat{\theta}_{-t} - \bar{\theta})(\hat{\theta}_{-t} - \bar{\theta})'. \tag{8.11} \end{equation}
The jackknife estimated standard errors for the elements of \hat{\theta} are given by the square root of the diagonal elements of \widehat{\mathrm{var}}_{jack}(\hat{\theta}).
The same principle works for computing the estimated variance matrix of a vector of functions of GWN model parameters. Let f:R^k \rightarrow R^l with l \le k and define \eta = f(\theta) denote a l \times 1 vector of functions of GWN model parameters. Let \hat{\eta}=f(\hat{\theta}) denote the plug-in estimator of \eta. Let \hat{\eta}_{-t} denote the leave-one-out estimate of \eta, and \bar{\eta}=T^{-1}\sum_{t=1}^T \hat{\eta}_{-t}. Then the jackknife estimate of the l \times l variance matrix \widehat{\mathrm{var}}(\hat{\eta}) is the scaled sample covariance of \{\hat{\eta}_{-t}\}_{t=1}^T:
\begin{equation} \widehat{\mathrm{var}}_{jack}(\hat{\eta}) = \frac{T-1}{T} \sum_{t=1}^T (\hat{\eta}_{-t} - \bar{\eta})(\hat{\eta}_{-t} - \bar{\eta})'. \tag{8.12} \end{equation}
The bootstrap function jackknife()
function only works for scalar-valued parameter estimates, so we have to compute the jackknife variance estimate by hand using a for
loop. The R code is:
length(msftRetS)
n.obs = rep(0, n.obs)
muhat.loo = rep(0, n.obs)
sigmahat.loo =for(i in 1:n.obs) {
mean(msftRetS[-i])
muhat.loo[i] = sd(msftRetS[-i])
sigmahat.loo[i] =
} cbind(muhat.loo, sigmahat.loo)
thetahat.loo =colnames(thetahat.loo) = c("mu", "sigma")
(((n.obs-1)^2)/n.obs)*var(thetahat.loo)
varhat.jack = varhat.jack
## mu sigma
## mu 5.99e-05 1.49e-05
## sigma 1.49e-05 6.12e-05
The jackknife variance estimate is somewhat close to the analytic estimate:
var.thetahatS
## mu sigma
## mu 5.99e-05 0.00e+00
## sigma 0.00e+00 2.99e-05
Since the first element of \hat{\theta} is \hat{\mu}, the (1,1) elements of the two variance matrix estimates match exactly. However, the varhat.jack
has non-zero off diagonal elements and the (2,2) element corresponding to \hat{\sigma} is considerably bigger than (2,2) element of the analytic estimate. Converting var.thetahatS
to an estimated correlation matrix gives:
cov2cor(varhat.jack)
## mu sigma
## mu 1.000 0.246
## sigma 0.246 1.000
Here, we see that the jackknife shows a small positive correlation between \hat{\mu} and \hat{\sigma} whereas the analytic matrix shows a zero correlation.
\blacksquare
8.5.2 Pros and Cons of the Jackknife
The jackknife gets its name because it is a useful statistical device.42 The main advantages (pros) of the jacknife are:
It is a good “quick and dirty” method for computing numerical estimated standard errors of estimates that does not rely on asymptotic approximations.
The jackknife estimated standard errors are often close to the delta method standard errors.
This jackknife, however, has some limitations (cons). The main limitations are:
It can be computationally burdensome for very large samples (e.g., if used with ultra high frequency data).
It does not work well for non-smooth functions such as empirical quantiles.
It is not well suited for computing confidence intervals.
It is not as accurate as the bootstrap.
8.6 The Nonparametric Bootstrap
In this section, we describe the easiest and most common form of the bootstrap: the nonparametric bootstrap. As we shall see, the nonparametric bootstrap procedure is very similar to a Monte Carlo simulation experiment. The main difference is how the random data is generated. In a Monte Carlo experiment, the random data is created from a computer random number generator for a specific probability distribution (e.g., normal distribution). In the nonparametric bootstrap, the random data is created by resampling with replacement from the original data.
The procedure for the nonparametric bootstrap is as follows:
- Resample. Create B bootstrap samples by sampling with replacement from the original data \{r_{1},\ldots,r_{T}\}.43 Each bootstrap sample has T observations (same as the original sample) \begin{eqnarray*} \left\{r_{11}^{*},r_{12}^{*},\ldots,r_{1T}^{*}\right\} & = & \mathrm{1st\,bootstrap\,sample}\\ & \vdots\\ \left\{r_{B1}^{*},r_{B2}^{*},\ldots,r_{BT}^{*}\right\} & = & \mathrm{Bth\,boostrap\,sample} \end{eqnarray*}
- Estimate \theta. From each bootstrap sample estimate \theta and denote the resulting estimate \hat{\theta}^{*}. There will be B values of \hat{\theta}^{*}: \left\{\hat{\theta}_{1}^{*},\ldots,\hat{\theta}_{B}^{*}\right\}.
- Compute statistics. Using \left\{\hat{\theta}_{1}^{*},\ldots,\hat{\theta}_{B}^{*}\right\} compute estimate of bias, standard error, and approximate 95% confidence interval.
This procedure is very similar to the procedure used to perform a Monte Carlo experiment described in Chapter 7. The main difference is how the hypothetical samples are created. With the nonparametric bootstrap, the hypothetical samples are created by resampling with replacement from the original data. In this regard the bootstrap treats the sample as if it were the population. This ensures that the bootstrap samples inherit the same distribution as the original data – whatever that distribution may be. If the original data is normally distributed, then the nonparametric bootstrap samples will also be normally distributed. If the data is Student’s t distributed, then the nonparametric bootstrap samples will be Student’s t distributed. With Monte Carlo simulation, the hypothetical samples are simulated under an assumed model and distribution. This requires one to specify values for the model parameters and the distribution from which to simulate.
8.6.1 Bootstrap bias estimate
The nonparametric bootstrap can be used to estimate the bias of an estimator \hat{\theta} using the bootstrap estimates {\hat{\theta}_{1}^{*},\ldots,\hat{\theta}_{B}^{*}}. The key idea is to treat the empirical distribution (i.e., histogram) of the bootstrap estimates {\hat{\theta}_{1}^{*},\ldots,\hat{\theta}_{B}^{*}} as an approximate to the unknown distribution of \hat{\theta}.
Recall, the bias of an estimator is defined as
\begin{equation*} E[\hat{\theta}]-\theta. \end{equation*}
The bootstrap estimate of bias is given by
\begin{align} \widehat{\mathrm{bias}}_{boot}(\hat{\theta},\theta)= \bar{\theta}^{\ast} - \hat{\theta} = \frac{1}{B}\sum_{j=1}^{\ B}\hat{\theta}_{j}^{\ast}-\hat{\theta.}\tag{8.13}\\ \text{(bootstrap mean - estimate)}\nonumber \end{align}
The bootstrap bias estimate (8.13) is the difference between the mean of the bootstrap estimates of \theta and the sample estimate of \theta. This is similar to the Monte Carlo estimate of bias discussed in Chapter 7. However, the Monte Carlo estimate of bias is the difference between the mean of the Monte Carlo estimates of \theta and the true value of \theta. The bootstrap estimate of bias does not require knowing the true value of \theta. Effectively, the bootstrap treats the sample estimate \hat{\theta} as the population value \theta and the bootstrap mean \bar{\theta}^{\ast} = \frac{1}{B}\sum_{j=1}^{\ B}\hat{\theta}_{j}^{\ast} as an approximation to E[\hat{\theta}]. Here, \widehat{\mathrm{bias}}_{boot}(\hat{\theta},\theta)=0 if the center of the histogram of {\hat{\theta}_{1}^{*},\ldots,\hat{\theta}_{B}^{*}} is at \hat{\theta}.
Given the bootstrap estimate of bias (8.13), we can compute a bias-adjusted estimate:
\begin{equation} \hat{\theta}_{adj} = \hat{\theta} - (\bar{\theta}^{\ast} - \hat{\theta}) = 2\hat{\theta} - \bar{\theta}^{\ast} \tag{8.14} \end{equation}
if the bootstrap estimate of bias is large, it may be tempting to use the bias-adjusted estimate (8.14) in place of the original estimate. This is generally not done in practice because the bias adjustment introduces extra variability into the estimator. So what use is the bootstrap bias estimate? It provides information to you that your estimate contains bias (or not) and this information can influence your decision making based on the estimate.
8.6.2 Bootstrap standard error estimate
for a scalar estimate \hat{\theta}, the bootstrap estimate of \widehat{\mathrm{se}}(\hat{\theta}) is given by the sample standard deviation of the bootstrap estimates {\hat{\theta}_{1}^{*},\ldots,\hat{\theta}_{B}^{*}}:
\begin{equation} \widehat{\mathrm{se}}_{boot}(\hat{\theta})=\sqrt{\frac{1}{B-1}\sum_{j=1}^{B}\left(\hat{\theta}_{j}^{\ast} - \bar{\theta}^{\ast} \right)^{2}}.\tag{8.15} \end{equation}
where \bar{\theta}^{\ast} = \frac{1}{B}\sum_{j=1}^{\ B}\hat{\theta}_{j}^{\ast}. Here, \widehat{\mathrm{se}}_{boot}(\hat{\theta}) is the size of a typical deviation of a bootstrap estimate, \hat{\theta}^{*}, from the mean of the bootstrap estimates (typical deviation from the middle of the histogram of the bootstrap estimates). This is very closely related to the Monte Carlo estimate of \widehat{\mathrm{se}}(\hat{\theta}) , which is the sample standard deviation of the estimates of \theta from the Monte Carlo samples.
For a k \times 1 vector estimate \hat{\theta}, the bootstrap estimate of \widehat{\mathrm{var}}(\hat{\theta}) is
\begin{equation} \widehat{\mathrm{var}}(\hat{\theta}) = \frac{1}{B-1}\sum_{j=1}^B (\hat{\theta}_{j}^{\ast} - \bar{\theta}^{\ast})(\hat{\theta}_{j}^{\ast} - \bar{\theta}^{\ast})'. \end{equation}
8.6.3 Bootstrap confidence intervals
In addition to providing standard error estimates, the bootstrap is commonly used to compute confidence intervals for scalar parameters. There are several ways of computing confidence intervals with the bootstrap and which confidence interval to use in practice depends on the characteristics of the distribution of the bootstrap estimates {\hat{\theta}_{1}^{*},\ldots,\hat{\theta}_{B}^{*}}.
If the bootstrap distribution appears to be normal then you can compute a 95% confidence interval for \theta using two simple methods. The first method mimics the rule-of-thumb that is justified by the CLT to compute a 95% confidence interval but uses the bootstrap standard error estimate (8.15):
\begin{equation} \hat{\theta}\pm2\times\widehat{\mathrm{se}}_{boot}(\hat{\theta}) = [\hat{\theta} - 2\times \widehat{\mathrm{se}}_{boot}(\hat{\theta}), ~ \hat{\theta} + 2\times \widehat{\mathrm{se}}_{boot}(\hat{\theta})].\tag{8.16} \end{equation}
The 95% confidence interval (8.16) is called the normal approximation 95% confidence interval.
The second method directly uses the distribution of the bootstrap estimates {\hat{\theta}_{1}^{*},\ldots,\hat{\theta}_{B}^{*}} to form a 95% confidence interval:
\begin{equation} [\hat{q}_{.025}^{*},\,\hat{q}_{.975}^{*}],\tag{8.17} \end{equation}
where \hat{q}_{.025}^{*} and \hat{q}_{.975}^{*} are the 2.5% and 97.5% empirical quantiles of {\hat{\theta}_{1}^{*},\ldots,\hat{\theta}_{B}^{*}}, respectively. By construction, 95% of the bootstrap estimates lie in the quantile-based interval (8.17). This 95% confidence interval is called the bootstrap percentile 95% confidence interval.
The main advantage of (8.17) over (8.16) is transformation invariance. This means that if (8.17) is a 95% confidence interval for \theta and f(\theta) is a monotonic function of \theta then [f(\hat{q}_{.025}^{*}),\,f(\hat{q}_{.975}^{*})] is a 95% confidence interval for f(\theta). The interval (8.17) can also be asymmetric whereas (8.16) is symmetric.
The bootstrap normal approximation and percentile 95% confidence intervals (8.16) and (8.17) will perform well (i.e., will have approximately correct coverage probability) if the bootstrap distribution looks normally distributed. However, if the bootstrap distribution is asymmetric then these 95% confidence intervals may have coverage probability different from 95%, especially if the asymmetry is substantial. The basic shape of the bootstrap distribution can be visualized by the histogram of {\hat{\theta}_{1}^{*},\ldots,\hat{\theta}_{B}^{*}}, and the normality of the distribution can be visually evaluated using the normal QQ-plot of {\hat{\theta}_{1}^{*},\ldots,\hat{\theta}_{B}^{*}}. If the histogram is asymmetric and/or not bell-shaped or if the normal QQ-plot is not linear then normality is suspect and the confidence intervals (8.17) and (8.16) may be inaccurate.
If the bootstrap distribution is asymmetric then it is recommended to use Efron’s bias and skewness adjusted bootstrap percentile confidence (BCA) interval. The details of how to compute the BCA interval are a bit complicated, and so are omitted, and the interested reader is referred to Chapter 10 of Hansen (2011) for a good explanation. Suffice it to say that the 95% BCA method adjusts the 2.5% and 97.5% quantiles of the bootstrap distribution for bias and skewness. The BCA confidence interval can be computed using the boot function boot.ci()
with optional argument type="bca"
.44
8.6.4 Performing the Nonparametric Bootstrap in R
The nonparametric bootstrap procedure is easy to perform in R. You can implement the procedure by “brute force” in very much the same way as you perform a Monte Carlo experiment. In this approach you program all parts of the bootstrapping procedure. Alternatively, you can use the R package boot which contains functions for automating certain parts of the bootstrapping procedure.
8.6.4.1 Brute force implementation
In the brute force implementation you progam all parts of the bootstrap procedure. This typically involves three steps:
- Sample with replacement from the original data using the R function
sample()
to create \{r_{1}^{*},r_{2}^{*},\ldots,r_{T}^{*}\}. Do this B times. - Compute B values of the statistic of interest \hat{\theta} from each bootstrap sample giving \{\hat{\theta}_{1}^{*},\ldots,\hat{\theta}_{B}^{*}\}. B=1,000 for routine applications of the bootstrap. To minimize simulation noise it may be required to have B=10,000 or higher.
- Compute the bootstrap bias and SE values from \{\hat{\theta}_{1}^{*},\ldots,\hat{\theta}_{B}^{*}\} using the R functions
mean()
andsd()
, respectively. - Compute the histogram and normal QQ-plot of \{\hat{\theta}_{1}^{*},\ldots,\hat{\theta}_{B}^{*}\} using the R functions
hist()
andqqnorm()
, respectively, to see if the bootstrap distribution looks normal.
Typically steps 1 and 2 are performed inside a for
loop or within
the R function apply()
.
In step 1 sampling with replacement for the original data is performed
with the R function sample()
. To illustrate, consider sampling
with replacement from the 5\times1 return vector:
c(0.1,0.05,-0.02,0.03,-0.04)
r = r
## [1] 0.10 0.05 -0.02 0.03 -0.04
Recall, in R we can extract elements from a vector by subsetting
based on their location, or index, in the vector. Since the vector r
has five elements, its index is the integer vector 1:5
. Then
to extract the first, second and fifth elements of r
use
the index vector c(1, 2, 5)
:
c(1, 2, 5)] r[
## [1] 0.10 0.05 -0.04
Using this idea, you can extract a random sample (of any given size)
with replacement from r
by creating a random sample with
replacement of the integers \{1,2,\ldots,5\} and using this set
of integers to extract the sample from r
. The R fucntion
sample()
can be used to do this process. When you pass a
positive integer value n
to sample()
, with the optional
argument replace=TRUE
, it returns a random sample from the
set of integers from 1 to n
. For example, to create a
random sample with replacement of size 5 from the integers\{1,2,\ldots,5\}
use:45
set.seed(123)
sample(5, replace=TRUE)
idx = idx
## [1] 3 3 2 2 3
We can then get a random sample with replacement from the vector r
by subsetting using the index vector idx
:
r[idx]
## [1] -0.02 -0.02 0.05 0.05 -0.02
This two step process is automated when you pass a vector of observations
to sample()
:
set.seed(123)
sample(r, replace=TRUE)
## [1] -0.02 -0.02 0.05 0.05 -0.02
Consider using the nonparametric bootstrap to compute estimates of the bias and standard error for \hat{\mu} in the GWN model for Microsoft. The R code for the brute force “for loop” to implement the nonparametric bootstrap is:
1000
B = rep(0, B)
muhat.boot = nrow(msftRetS)
n.obs =set.seed(123)
for (i in 1:B) {
sample(msftRetS, n.obs, replace=TRUE)
boot.data = mean(boot.data)
muhat.boot[i] = }
The bootstrap bias estimate is:
mean(muhat.boot) - muhatS
## [1] 0.000483
which is very close to zero and confirms that \hat{\mu} is unbiased. The bootstrap standard error estimate is
sd(muhat.boot)
## [1] 0.00796
and is equal (to three decimals) to the analytic standard error estimate computed earlier. This confirms that the nonparametric bootstrap accurately computes \widehat{\mathrm{se}}(\hat{\mu}). Figure 8.2 shows the histogram and normal QQ-plot of the bootstrap estimates created with:
par(mfrow=c(1,2))
hist(muhat.boot, col="cornflowerblue")
abline(v=muhatS, col="white", lwd=2)
qqnorm(muhat.boot, col="cornflowerblue")
qqline(muhat.boot)
The distribution of the bootstrap estimates looks normal. As a result, the bootstrap 95% confidence interval for \mu has the form:
sd(muhat.boot)
se.boot = muhatS + se.boot
lower = muhatS - se.boot
upper = cbind(muhatS, se.boot, lower, upper)
ans =colnames(ans)=c("Estimate", "Std Error", "2.5%", "97.5%")
rownames(ans) = "mu"
ans
## Estimate Std Error 2.5% 97.5%
## mu 0.00915 0.00796 0.0171 0.0012
\blacksquare
It is important to keep in mind that the bootstrap resampling algorithm draws random samples from the observed data. In the example above, the random number generator in R was initially set with set.seed(123)
. If the seed is set to a different integer, say 345, then the B=1,000 bootstrap samples will be different and we will get different numerical results for the bootstrap bias and estimated standard error. How different these results will be depends on number of bootstrap samples B and the statistic \hat{\theta} being bootstrapped. Generally, the smaller is B the more variable the bootstraps results will be. It is good practice to repeat the bootstrap analysis with several different random number seeds to see how variable the results are. If the results vary considerably for different random number seeds then it is advised to increase the number of bootstrap simulations B. It is common to use B=1,000 for routine calculations, and B=10,000 for final results.
8.6.4.2 R package boot
The R package boot implements a variety of bootstrapping
techniques including the basic non-parametric bootstrap described
above. The boot package was written to accompany the textbook Bootstrap Methods and Their Application by (Davison and Hinkley 1997).
The two main functions in boot are boot()
and
boot.ci()
, respectively. The boot()
function implements
the bootstrap for a statistic computed from a user-supplied function.
The boot.ci()
function computes bootstrap confidence intervals
given the output from the boot()
function.
The arguments to boot()
are:
library(boot)
args(boot)
## function (data, statistic, R, sim = "ordinary", stype = c("i",
## "f", "w"), strata = rep(1, n), L = NULL, m = 0, weights = NULL,
## ran.gen = function(d, p) d, mle = NULL, simple = FALSE, ...,
## parallel = c("no", "multicore", "snow"), ncpus = getOption("boot.ncpus",
## 1L), cl = NULL)
## NULL
where data
is a data object (typically a vector or matrix
but can be a time series object too), statistic
is a use-specified
function to compute the statistic of interest, and R
is the
number of bootstrap replications. The remaining arguments are not
important for the basic non-parametric bootstrap. The function assigned
to the argument statistic
has to be written in a specific
form, which is illustrated in the next example. The boot()
function returns an object of class boot
for which
there are print and plot methods.
The arguments to boot.ci()
are
args(boot.ci)
## function (boot.out, conf = 0.95, type = "all", index = 1L:min(2L,
## length(boot.out$t0)), var.t0 = NULL, var.t = NULL, t0 = NULL,
## t = NULL, L = NULL, h = function(t) t, hdot = function(t) rep(1,
## length(t)), hinv = function(t) t, ...)
## NULL
where boot.out
is an object of class boot
,
conf
specifies the confidence level, and type
is
a subset from c("norm", "basic", "stud", "perc", "bca")
indicating the type of confidence interval to compute.
The choices norm
, perc
, and “bca” compute the
normal confidence interval (8.16), the percentile
confidence interval (8.17), and the BCA confidence interval, respectively. The remaining arguments are not important for the computation of these
bootstrap confidence intervals.
To use the boot()
function to implement the bootstrap for
\hat{\mu}, a function must be specified to compute \hat{\mu}
for each bootstrap sample. The function must have two arguments: x
and idx
. Here, x
represents the original data and
idx
represents the random integer index (created internally
by boot()
) to subset x
for each bootstrap sample.
For example, a function to be passed to boot()
for \hat{\mu} is
function(x, idx) {
mean.boot =# arguments:
# x data to be resampled
# idx vector of scrambled indices created by boot() function
# value:
# ans mean value computed using resampled data
mean(x[idx])
ans =
ans }
To implement the nonparametric bootstrap for \hat{\mu} with 999 samples use
set.seed(123)
boot(msftRetS, statistic = mean.boot, R=999)
muhat.boot =class(muhat.boot)
## [1] "boot"
names(muhat.boot)
## [1] "t0" "t" "R" "data" "seed"
## [6] "statistic" "sim" "call" "stype" "strata"
## [11] "weights"
The returned object muhat.boot
is of class boot
.
The component t0
is the sample estimate \hat{\mu}, and
the component t is a 999\times1 matrix containing the bootstrap
estimates \{\hat{\mu}_{1}^{*},\ldots,\hat{\mu}_{999}^{*}\}. The
print method shows the sample estimate, the bootstrap bias and the
bootstrap standard error:
muhat.boot
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = msftRetS, statistic = mean.boot, R = 999)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 0.00915 0.000482 0.00758
These statistics can be computed directly from the components of muhat.boot
c(muhat.boot$t0,mean(muhat.boot$t) - muhat.boot$t0, sd(muhat.boot$t))
## [1] 0.009153 0.000482 0.007584
A histogram and nornal QQ-plot of the bootstrap values, shown in Figure 8.3, can be created using the plot method
plot(muhat.boot)
Normal, percentile, and bias-adjusted (bca) percentile 95% confidence intervals can be computed using
boot.ci(muhat.boot, conf=0.95, type=c("norm", "perc", "bca"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 999 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = muhat.boot, conf = 0.95, type = c("norm",
## "perc", "bca"))
##
## Intervals :
## Level Normal Percentile BCa
## 95% (-0.0062, 0.0235 ) (-0.0057, 0.0245 ) (-0.0062, 0.0239 )
## Calculations and Intervals on Original Scale
Because the bootstrap distribution looks normal, the normal, percentile, and bca confidence intervals are very similar.
The GWN model estimate \hat{\sigma} can be “bootstrapped” in a similar fashion. First, we write a function to compute \hat{\sigma} for each bootstrap sample
function(x, idx) {
sd.boot = sd(x[idx])
ans =
ans }
Then we call boot()
with statistic=sd.boot
set.seed(123)
boot(msftRetS, statistic = sd.boot, R=999)
sigmahat.boot = sigmahat.boot
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = msftRetS, statistic = sd.boot, R = 999)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 0.101 -0.000783 0.00764
The bootstrap distribution, shown in Figure 8.4, looks a bit non-normal. The 95% confidences intervals are computed using
boot.ci(sigmahat.boot, conf=0.95, type=c("norm", "perc", "bca"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 999 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = sigmahat.boot, conf = 0.95, type = c("norm",
## "perc", "bca"))
##
## Intervals :
## Level Normal Percentile BCa
## 95% ( 0.0873, 0.1173 ) ( 0.0864, 0.1162 ) ( 0.0892, 0.1200 )
## Calculations and Intervals on Original Scale
## Some BCa intervals may be unstable
We see that all of these intervals are quite similar.
\blacksquare
The real power of the bootstrap procedure comes when we apply it to
plug-in estimates of functions like (8.1)-(8.4) for which there
are no easy analytic formulas for estimating bias and standard errors. In this situation, the bootstrap procedure easily computes numerical estimates of the bias,
standard error, and 95% confidence interval. All we need is an R function for computing
the plug-in function estimate in a form suitable for boot()
. The R functions for the example functions are:
function(x, idx, alpha=0.05) {
f1.boot = mean(x[idx]) + sd(x[idx])*qnorm(alpha)
q =
q
}
function(x, idx, alpha=0.05, w0=100000) {
f2.boot = mean(x[idx]) + sd(x[idx])*qnorm(alpha)
q = -w0*q
VaR =
VaR
}
function(x, idx, alpha=0.05, w0=100000) {
f3.boot = mean(x[idx]) + sd(x[idx])*qnorm(alpha)
q = -(exp(q) - 1)*w0
VaR =
VaR
}
function(x, idx, r.f=0.0025) {
f4.boot = (mean(x[idx]) - r.f)/sd(x[idx])
SR =
SR }
The functions f_1, f_2, and f_3 have the additional argument alpha
which specifies the tail probability for the quantile, and the functions f_2, and f_3 have the additional argument w0
for the initial wealth invested. The function f_4 has the additional argument r.f
for the risk-free rate.
To compute the nonparametric bootstrap for f_1 with \alpha=0.05 use:
set.seed(123)
boot(msftRetS, statistic = f1.boot, R=999)
f1.bs = f1.bs
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = msftRetS, statistic = f1.boot, R = 999)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* -0.158 0.00177 0.0124
Here, the bootstrap estimate of bias is small and the bootstrap estimated standard error is similar to the delta method and jackknife standard errors.
plot(f1.bs)
The bootstrap distribution looks normal, and the three methods for computing 95% confidence intervals are essentially the same:
boot.ci(f1.bs, type=c("norm", "perc", "bca"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 999 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = f1.bs, type = c("norm", "perc", "bca"))
##
## Intervals :
## Level Normal Percentile BCa
## 95% (-0.184, -0.135 ) (-0.181, -0.132 ) (-0.189, -0.138 )
## Calculations and Intervals on Original Scale
## Some BCa intervals may be unstable
To compute the nonparametric bootstrap for f_2 with \alpha=0.05 and W_0=\$100,000 use:
set.seed(123)
boot(msftRetS, statistic = f2.boot, R=999)
f2.bs = f2.bs
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = msftRetS, statistic = f2.boot, R = 999)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 15780 -177 1244
The bootstrap estimate of bias is small and the bootstrap estimated standard error is similar to the delta method and jackknife standard errors.
plot(f2.bs)
The bootstrap distribution looks normal, and the three bootstrap 95% confidence intervals are all very similar:
boot.ci(f2.bs, type=c("norm", "perc", "bca"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 999 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = f2.bs, type = c("norm", "perc", "bca"))
##
## Intervals :
## Level Normal Percentile BCa
## 95% (13518, 18395 ) (13161, 18089 ) (13801, 18940 )
## Calculations and Intervals on Original Scale
## Some BCa intervals may be unstable
To compute the nonparametric bootstrap for f_3 with \alpha=0.05 and W_0=\$100,000 use:
set.seed(123)
boot(msftRetC, statistic = f3.boot, R=999)
f3.bs = f3.bs
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = msftRetC, statistic = f3.boot, R = 999)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 14846 -182 1223
The bias is small and the bootstrap estimated standard error is larger than the delta method standard error and is closer to the jackknife standard error.
plot(f3.bs)
The bootstrap distribution looks a little asymmetric and the BCA confidence interval is a bit different from the normal and percentile intervals:
boot.ci(f3.bs, type=c("norm", "perc", "bca"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 999 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = f3.bs, type = c("norm", "perc", "bca"))
##
## Intervals :
## Level Normal Percentile BCa
## 95% (12631, 17426 ) (12295, 17155 ) (12994, 18061 )
## Calculations and Intervals on Original Scale
## Some BCa intervals may be unstable
Finally, to compute the bootstrap for f_4 with r_f = 0.0025 use:
set.seed(123)
boot(msftRetS, statistic = f4.boot, R=999)
f4.bs = f4.bs
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = msftRetS, statistic = f4.boot, R = 999)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 0.0655 0.00391 0.0739
The bootstrap estimated bias is small and the standard error is similar to the delta method and the jackknife.
plot(f4.bs)
Here, the bootstrap distribution is slightly asymmetric.
boot.ci(f4.bs, conf=0.95, type=c("norm", "perc", "bca"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 999 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = f4.bs, conf = 0.95, type = c("norm", "perc",
## "bca"))
##
## Intervals :
## Level Normal Percentile BCa
## 95% (-0.0832, 0.2065 ) (-0.0821, 0.2132 ) (-0.0897, 0.2030 )
## Calculations and Intervals on Original Scale
Notice that the Percentile and BCA confidence intervals are a bit different from the normal approximation interval.
\blacksquare
8.6.5 Pros and cons of the nonparametric bootstrap
The nonparametric bootstrap is extremely useful and powerful statistical technique. The main advantages (pros) are:
General procedure to estimate bias and standard errors, and to compute confidence intervals, that does not rely on asymptotic distributions.
Can be used for scalar and vector estimates.
Under general conditions, bootstrap distributions of estimators converge to the asymptotic distribution of the estimators.
In certain situations, bootstrap results can be more accurate than analytic results computed from asymptotic distributions.
There are some things to watch out for when using the bootstrap (cons):
It be computational intensive but this is really not much of an issue with today’s fast computers.
With small B, bootstrap results can vary substantially across simulations with different random number seeds.
There are situations where the bootstrap does not work. A leading case is when the bootstrap is applied to a function that can be become unbounded (e.g. a ratio of means when the denominator mean is close to zero). In these cases, a trimmed version of the bootstrap is recommended. See Hansen (2011) for a discussion of this issue.
You have to be careful applying the bootstrap to time series data. If there is serial correlation in the data then the nonparametric bootstrap may give poor results. There are versions of the bootstrap (e.g. the block bootstrap) that work with serial correlated data. For example, the tseries function
tsbootstrap()
is designed to to work with serially correlated time series data.
8.7 Using Monte Carlo Simulation to Evaluate Delta Method, Jackknife and Bootstrap Standard Errors
we have studied three ways of computing standard errors for functions of GWN model parameters
Give similar resuts for the example functions
What are the preferred methods in practice?
Which methods are most accurate?
Use MC Simulation to evaluate the three methods in stylized model in which the GWN model is true. MC can give the best numerical standard error with which we can compare to the delta method, jackknife and bootstrap standard errors.
We can evaluate the statistical properties of f_j(\hat{\theta}), for j=1,\ldots,4, by Monte Carlo simulation in the same way that we evaluated the statistical properties of \hat{\mu} in Chapter 7.We use the simulation model (7.41) and N=1000 simulated samples of size T=100 to compute the estimates \{\left(f_{j}(\hat{\theta})\right)^{1},\ldots,\left(f_j(\hat{\theta})\right)^{1000}\} The true values for f_j(\theta) are: \begin{align*} f_1(\theta) & = 0.05+(0.10)\times(-1.645)=-0.114,\\ f_2(\theta) & = - \$100,000 \times f_1(\theta) = \$11,449, \\ f_3(\theta) & = \$100,000 \times \left(\exp(f_1(\theta)) -1 \right) = \$10,818, \\ f_4(\theta) & = (0.05 - 0.0025)/0.10 = 0.475. \end{align*}
The R code for the simulation loop is:
0.05
mu = 0.10
sigma = 100000
W0 = 0.0025
r.f = mu + sigma*qnorm(0.05)
f1 = -W0*f1
f2 = -W0*(exp(f1) - 1)
f3 = (mu - r.f)/sigma
f4 = 172
n.obs = 1000
n.sim =set.seed(111)
matrix(0,n.sim,4)
sim.f = matrix(0, n.sim, 4)
dm.f = matrix(0, n.sim, 4)
jack.f = matrix(0, n.sim, 4)
boot.f = matrix(0, n.sim, 4)
boot.bias =colnames(sim.f) = colnames(dm.f) = colnames(jack.f) = c("f1", "f2","f3","f4")
colnames(boot.f) = colnames(boot.bias) = c("f1", "f2","f3","f4")
for (sim in 1:n.sim) {
rnorm(n.obs,mean=mu,sd=sigma)
sim.ret = mean(sim.ret)
muhat = sd(sim.ret)
sigmahat ="f1"] = muhat + sigmahat*qnorm(0.05)
sim.f[sim, "f2"] = -W0*sim.f[sim, "f1"]
sim.f[sim, "f3"] = -W0*(exp(sim.f[sim, "f1"]) - 1)
sim.f[sim, "f4"] = (muhat - r.f)/sigmahat
sim.f[sim, # delta method se
"f1"] = (sigmahat/sqrt(n.obs))*sqrt(1 + 0.5*qnorm(0.05)^2)
dm.f[sim, "f2"] = W0*dm.f[sim, "f1"]
dm.f[sim, "f3"] = W0*exp(sim.f[sim, "f1"])*dm.f[sim, "f1"]
dm.f[sim, "f4"] = sqrt((1/n.obs)*(1 + 0.5*(sim.f[sim, "f4"]^2)))
dm.f[sim, # jackknife se
jackknife(1:n.obs, theta=theta.f1, as.matrix(sim.ret))
f1.jack ="f1"] = f1.jack$jack.se
jack.f[sim, jackknife(1:n.obs, theta=theta.f2, as.matrix(sim.ret), W0)
f2.jack ="f2"] = f2.jack$jack.se
jack.f[sim, jackknife(1:n.obs, theta=theta.f3, as.matrix(sim.ret), W0)
f3.jack ="f3"] = f3.jack$jack.se
jack.f[sim, jackknife(1:n.obs, theta=theta.f4, as.matrix(sim.ret), r.f)
f4.jack ="f4"] = f4.jack$jack.se
jack.f[sim, # bootstrap se
boot(sim.ret, statistic = f1.boot, R=999)
f1.bs ="f1"] = sd(f1.bs$t)
boot.f[sim, boot(sim.ret, statistic = f2.boot, R=999)
f2.bs ="f2"] = sd(f2.bs$t)
boot.f[sim, boot(sim.ret, statistic = f3.boot, R=999)
f3.bs ="f3"] = sd(f3.bs$t)
boot.f[sim, boot(sim.ret, statistic = f4.boot, R=999)
f4.bs ="f4"] = sd(f4.bs$t)
boot.f[sim, # bootstrap bias
"f1"] = summary(f1.bs)$bootBias
boot.bias[sim, "f2"] = summary(f2.bs)$bootBias
boot.bias[sim, "f3"] = summary(f3.bs)$bootBias
boot.bias[sim, "f4"] = summary(f4.bs)$bootBias
boot.bias[sim, }
The histograms of f_j(\hat{\theta}) values are displayed in Figure 8.5. The dotted white lines indicate the true values, f_j(\theta).
The histograms for the f_j(\hat{\theta}) values are bell-shaped for j=1,2,3 and the histogram for j=4 is slightly right skewed. All of the histograms are centered very close to the respective true values. Recall, the standard deviation of the Monte Carlo estimates (spread of the histograms about true value) approximates the true standard error of the plug-in estimates.
The Monte Carlo estimates of bias are:
colMeans(sim.f) - c(f1, f2, f3, f4)
mc.bias =round(mc.bias, digits=3)
## f1 f2 f3 f4
## 0.001 -72.910 -70.920 0.007
which confirm that f_j are essentially unbiased estimators. It is often useful to also look at the bias relative to the true value to eliminate units:
/c(f1, f2, f3, f4) mc.bias
## f1 f2 f3 f4
## -0.00637 -0.00637 -0.00656 0.01454
In relative terms, the largest bias is 1.45%, which occurs for the estimate of the Sharpe ratio.
The average bootstrap estimate of bias is:
colMeans(boot.bias)
boot.bias.vals =round(boot.bias.vals, digits=3)
## f1 f2 f3 f4
## 0.001 -71.824 -70.213 0.003
The bootstrap bias values are very close to the Monte Carlo estimates of bias which indicates that the bootstrap estimate of bias is very accruate in this context.
The sample standard deviations of the Monte Carlo estimates approximate the true standard errors of the estimated functions:
apply(sim.f, 2, sd)
mc.se =round(mc.se, digits=3)
## f1 f2 f3 f4
## 0.011 1147.869 1024.379 0.078
The sample means of the delta method, jackknife, and bootstrap standard errors give the average performance of these estimated standard errors. It is informative to compare these average standard errors to the Monte Carlo standard errors:
colMeans(dm.f)
dm.se = colMeans(jack.f)
jack.se = colMeans(boot.f)
boot.se = rbind(dm.se/mc.se, jack.se/mc.se, boot.se/mc.se)
ans =rownames(ans) = c("Delta Method", "Jackknife", "Bootstrap")
ans
## f1 f2 f3 f4
## Delta Method 1.02 1.02 1.02 1.03
## Jackknife 1.01 1.01 1.01 1.04
## Bootstrap 1.00 1.00 1.00 1.04
In the table above, a value of 1 means the estimated standard error is the same as the Monte Carlo standard error (on average), and a value greater than 1 means that the estimated standard error is larger than the Monte Carlo standard error (on average). Here, we see that all of the estimated standard errors are very close to the Monte Carlo standard errors, and overall the bootstrap estimated standard errors are the closest. The worst performance is for f_4(\hat{\theta}) where the estimated standard errors are about 3% to 4% larger than the Monte Carlo standard errors.
It is also useful to look at the standard deviations of the estimated standard errors over the Monte Carlo simulations. This tells us how variable the estimated standard errors are across different simulations:
apply(dm.f,2,sd)
dm.sd = apply(jack.f,2,sd)
jack.sd = apply(boot.f,2,sd)
boot.sd = rbind(dm.sd, jack.sd, boot.sd)
ans =rownames(ans) = c("Delta Method", "Jackknife", "Bootstrap")
ans
## f1 f2 f3 f4
## Delta Method 0.000654 65.4 49.8 0.00137
## Jackknife 0.001158 115.8 98.7 0.00365
## Bootstrap 0.001163 115.2 98.5 0.00406
Here, we see that, across the simulations, the delta method is the least variable and the jackknife and the bootstrap have similar variability that is about twice as large as the delta method variability. The variability of the bootstrap can be reduced by increasing the number of bootstrap simulations (here it is 1000).
Overall, the Monte Carlo experiment confirms that the delta method, jackknife and bootstrap all produce accurate standard error estimates for the example plug-in functions.
8.8 Further Reading: Delta Method and Resampling
Chernick and LaBudde (2011) give a comprehensive overview of the bootstrap with applications in R.
8.9 Problems: Delta Method and Resampling
- Look at determinants of se of Sharpe ratio. See discussion from Lo’s paper
- Look at the formula for the covariance between two Sharpe ratios
Illustrate the transformation preserving property of percentile interval using the example of computing the confidence interval for \sigma based on the percentile interval for \sigma^2. Show that asymptotic confidence interval is not transformation preserving.
Repeat the bootstrap simulations from Example 8.16 using 5 different random number seeds: 124, 125, 126, 127 and 128. Make a table to compare the bootstrap estimated standard errors for the example functions across the 5 bootstrap simulations. Briefly comment on your results.
Repeat the Monte Carlo simulation in 8.7 using 1,000 simulated samples and 10,000 bootstrap simulations for the bootstrap calculations. That is, set R=10000 in the call to boot()
inside the Monte Carlo simulation loop.
How does the increase in simulations impact the bootstrap bias estimates?
How does the increase in simulations impact the bootstrap standard error estimates?
References
Davison, A., and D. Hinkley. 1997. Bootstrap Methods and Their Application. Cambridge: Cambridge University Press.
Recall, the intuitive definition of continuity is that you can draw the function on paper without lifting up your pen.↩︎
The 95% confidence interval computed in
deltaMethod
uses 1.96 instead of the rule-of-thumb value of 2 used in the analytic calculations.↩︎the scaling of the standard deviation of \hat{\mu}_{-t} is approximately \sqrt{T}.↩︎
A jackknife is also the name of a very useful folding knife.↩︎
We can also bootstrap data from multiple assets by defining r_t as a vector of asset returns.↩︎
The bootstrap percentile-t confidence interval is another bias and skewness adjusted confidence interval.↩︎
We use
set.seed(123)
to initialize the random number generator in R and allow for reproducible results.↩︎