8.3 Asymptotic normality
Let θ denote a scalar parameter of the GWN model for returns and let ˆθ denote the plug-in estimator of θ. In Chapter 7 we stated that ˆθ is asymptotically normally distributed
ˆθ∼N(θ,^se(ˆθ)2),
for large enough T, where ^se(ˆθ) denotes the estimated standard error of ˆθ. This result is justified by the CLT. Let f(θ) denote a real-valued continuous and differentiable scalar function of θ. Using a technique called the delta method, we can show that the plug-in estimator f(ˆθ) is also asymptotically normally distributed:
f(ˆθ)∼N(f(θ),^se(f(ˆθ))2),
for large enough T, where ^se(f(ˆθ)) is the estimated standard error for f(ˆθ). The delta method shows us how to compute ^se(f(ˆθ))2:
^se(f(ˆθ))2=f′(ˆθ)2×^se(ˆθ)2,
where f′(ˆθ) denotes the derivative of f(θ) evaluated at ˆθ. Then,
^se(f(ˆθ))=√f′(ˆθ)2×^se(ˆθ)2.
In Chapter 7 we stated that, for large enough T,
ˆσ2∼N(σ2,se(ˆσ2)2), ˆσ∼N(σ,se(ˆσ)2),
where se(σ2)2=2σ4/T and se(σ)2=σ2/2T. The formula for se(ˆσ)2 is the result of the delta method for the function f(σ2)=√σ2=σ applied to the asymptotic distribution of ˆσ2. To see this, first note that by the chain rule
f′(σ2)=12(σ2)−1/2,
so that f′(σ2)2=14(σ2)−1. Then,
se(f(ˆθ))2=f′(σ2)2×se(ˆσ2)2=14(σ2)−1×2σ4/T=12σ2/T=σ2/2T=se(ˆσ)2.
The estimated squared standard error replaces σ2 with its estimate ˆσ2 and is given by ^se(ˆσ2)2=ˆσ2/2T.
◼
Now suppose θ is a k×1 vector of GWN model parameters, and f:Rk→R is a continuous and differentiable function of θ. Define the k×1 gradient function g(θ) as:
g(θ)=∂f(θ)∂θ=(∂f(θ)∂θ1⋮∂f(θ)∂θk).
Assume that ˆθ is asymptotically normally distributed:
ˆθ∼N(θ,^var(ˆθ)),
where ^var(ˆθ) is the k×k estimated variance-covariance matrix of ˆθ. Then the delta method shows us that f(ˆθ) is asymptotically normally distributed:
f(ˆθ)∼N(f(θ),^se(f(ˆθ))2),
where
^se(f(ˆθ))2=g(ˆθ)′^var(ˆθ)g(ˆθ).
Let θ=(μ,σ)′ and consider f1(θ)=qRα=μ+σqZα (normal simple return quantile). The gradient function g1(θ) for f1(θ) is:
g1(θ)=∂f1(θ)∂θ=(∂f1(θ)∂μ∂f1(θ)∂σ)=(1qZα).
Regarding var(ˆθ), recall from Chapter 7, in the GWN model: (ˆμˆσ2)∼N((μσ2),(se(ˆμ)200se(ˆσ2)2))
for large enough T. We show below, using the delta method, that
(ˆμˆσ)∼N((μσ),(se(ˆμ)200se(ˆσ)2))
Hence,
var(ˆθ)=(se(ˆμ)200se(ˆσ)2)=(σ2T00σ22T)
Then,
g1(θ)′var(ˆθ)g1(θ)=(1qZα)(se(ˆμ)200se(ˆσ)2)(1qZα)=se(ˆμ)2+(qZα)2se(ˆσ)2=σ2T+(qZα)2σ22T=σ2T[1+12(qZα)2]=se(ˆqZα)2.
The standard error for ˆqZα is: se(ˆqRα)=√var(ˆqRα)=σ√T[1+12(qZα)2]1/2.
The formula (8.5) shows that se(ˆqRα) increases with σ and qZα, and decreases with T. In addition, for fixed T, se(ˆqRα)→∞ as α→0 or as α→1 because qZα→−∞ as α→0 and qZα→∞ as α→1. Hence, we have very large estimation error in ˆqRα when α is very close to zero or one (for fixed values of σ and T). This makes intuitive sense as we have very few observations in the extreme tails of the empirical distribution of the data for a fixed sample size and so we expect a larger estimation error.
Replacing σ with its estimate ˆσ gives the estimated standard error for ˆqRα:
^se(ˆqRα)=ˆσ√T[1+12(qZα)2]1/2.
Using the above results, the sampling distribution of ˆqRα can be approximated by the normal distribution: ˆqRα∼N(qRα, ^se(ˆqRα)2), for large enough T.
The estimates of qRα, se(ˆqRα), and 95% confidence intervals, for α=0.05,0.01 and 0.001, from the simple monthly returns for Microsoft are:
.05 = muhatS + sigmahatS*qnorm(0.05)
qhat.01 = muhatS + sigmahatS*qnorm(0.01)
qhat.001 = muhatS + sigmahatS*qnorm(0.001)
qhat.05 = (sigmahatS/sqrt(n.obs))*sqrt(1 + 0.5*qnorm(0.05)^2)
seQhat.01 = (sigmahatS/sqrt(n.obs))*sqrt(1 + 0.5*qnorm(0.01)^2)
seQhat.001 = (sigmahatS/sqrt(n.obs))*sqrt(1 + 0.5*qnorm(0.001)^2)
seQhat.05 = qhat.05 - 2*seQhat.05
lowerQhat.05 = qhat.05 + 2*seQhat.05
upperQhat.01 = qhat.01 - 2*seQhat.01
lowerQhat.01 = qhat.01 + 2*seQhat.01
upperQhat.001 = qhat.001 - 2*seQhat.001
lowerQhat.001 = qhat.001 + 2*seQhat.001
upperQhat cbind(c(qhat.05, qhat.01, qhat.001),
ans =c(seQhat.05, seQhat.01, seQhat.001),
c(lowerQhat.05, lowerQhat.01, lowerQhat.001),
c(upperQhat.05, upperQhat.01, upperQhat.001))
colnames(ans) = c("Estimates", "Std Errors", "2.5%", "97.5%")
rownames(ans) = c("q.05", "q.01", "q.001")
ans
## Estimates Std Errors 2.5% 97.5%
## q.05 -0.158 0.0119 -0.182 -0.134
## q.01 -0.227 0.0149 -0.257 -0.197
## q.001 -0.304 0.0186 -0.342 -0.267
For Microsoft the values of ^se(ˆqRα) are about 1.2%, 1.5%, and 1.9% for α=0.05, α=0.01 and α=0.001, respectively. The 95% confidence interval widths increase with decreases in α indicating more uncertainty about the true value of qRα for smaller values of α. For example, the 95% confidence for interval qR0.01 indicates that the true loss with 1% probability (one month out of 100 months) is between 19.7% and 25.7%.
◼
Let θ=(μ,σ)′ and consider the functions f2(θ)=VaRNα=−W0(μ+σqZα) (normal VaR), f3(θ)=VaRLNα=−W0(exp((μ+σqZα))−1) (log-normal VaR), and f4(θ)=SR=μ−rfσ (Sharpe Ratio). It is straightforward to show (see end-of-chapter exercises) that the gradient functions are:
g2(θ)=∂f2(θ)∂θ=(−W0−W0qZα),g3(θ)=∂f3(θ)∂θ=(−W0exp(μ+σqZα)−W0exp(μ+σqZα)qZα),g4(θ)=∂f4(θ)∂θ=(−1σ−(σ)−2(μ−rf)).
The delta method variances are:
g2(θ)′var(ˆθ)g2(θ)=W20se(ˆqRα)2=se(^VaRNα)2g3(θ)′var(ˆθ)g3(θ)=W20exp(qRα)2se(ˆqRα)2=se(^VaRLNα)2g4(θ)′var(ˆθ)g4(θ)=1T(1+12SR2)=se(^SR)2.
The corresponding standard errors are:
se(^VaRNα)=W0se(ˆqRα)se(^VaRLNα)=W0exp(qRα)se(ˆqRα)se(^SR)=√1T(1+12SR2),
where
se(ˆqZα)=√σ2T[1+12(qZα)2].
We make the following remarks:
se(^VaRNα) and se(^VaRLNα) increase with W0, σ, and qZα, and decrease with T.
For fixed values of W0, σ, and T, se(^VaRNα) and se(^VaRLNα) become very large as α approaches zero. That is, for very small loss probabilities we will have very bad estimates of VaRNα and VaRLNα.
se(SR) increases with SR2 and decreases with T.
The practically useful estimated standard errors replace the unknown values of qRα and SR with their estimated values ˆqRα and ^SR and are given by:
^se(^VaRNα)=W0^se(ˆqRα)^se(^VaRLNα)=W0exp(ˆqRα)^se(ˆqRα)^se(^SR)=√1T(1+12^SR2)
◼
Consider a $100,000 investment for one month in Microsoft. The estimates of VaRNα, its standard error, and 95% confidence intervals, for α=0.05 and α=0.01 are:
100000
W0=.05 = muhatS + sigmahatS*qnorm(0.05)
qhat.01 = muhatS + sigmahatS*qnorm(0.01)
qhat.05 = -qhat.05*W0
VaR.N.05 = W0*seQhat.05
seVaR.N.01 = -qhat.01*W0
VaR.N.01 = W0*seQhat.01
seVaR.N.05 = VaR.N.05 - 2*seVaR.N.05
lowerVaR.N.05 = VaR.N.05 + 2*seVaR.N.05
upperVaR.N.01 = VaR.N.01 - 2*seVaR.N.01
lowerVaR.N.01 = VaR.N.01 + 2*seVaR.N.01
upperVaR.N cbind(c(VaR.N.05, VaR.N.01),
ans =c(seVaR.N.05, seVaR.N.01),
c(lowerVaR.N.05, lowerVaR.N.01),
c(upperVaR.N.05,upperVaR.N.01))
colnames(ans) = c("Estimate", "Std Error", "2.5%", "97.5%")
rownames(ans) = c("VaR.N.05", "VaR.N.01")
ans
## Estimate Std Error 2.5% 97.5%
## VaR.N.05 15780 1187 13405 18154
## VaR.N.01 22696 1490 19717 25676
The estimated values of VaRN.05 and VaRN.01 are $15,780 and $22,696, respectively, with estimated standard errors of $1,172 and $1,471. The estimated standard errors are fairly small compared with the estimated VaR values, so we can say that VaR is estimated fairly precisely here. Notice that the estimated standard error for VaR.01 is about 26% larger than the estimated standard error for VaRN.05 indicating we have less precision for estimating VaR for very small values of α.
The estimates of VaRLNα and its standard error for α=0.05 and α=0.01 are:
.05 = muhatC + sigmahatC*qnorm(0.05)
qhat.01 = muhatC + sigmahatC*qnorm(0.01)
qhat.05 = (sigmahatC/sqrt(n.obs))*sqrt(1 + 0.5*qnorm(0.05)^2)
seQhat.01 = (sigmahatC/sqrt(n.obs))*sqrt(1 + 0.5*qnorm(0.01)^2)
seQhat.05 = -W0*(exp(qhat.05)-1)
VaR.LN.05 = W0*exp(qhat.05)*seQhat.05
seVaR.LN.01 = -W0*(exp(qhat.01)-1)
VaR.LN.01 = W0*exp(qhat.01)*seQhat.01
seVaR.LN.05 = VaR.LN.05 - 2*seVaR.LN.05
lowerVaR.LN.05 = VaR.LN.05 + 2*seVaR.LN.05
upperVaR.LN.01 = VaR.LN.01 - 2*seVaR.LN.01
lowerVaR.LN.01 = VaR.LN.01 + 2*seVaR.LN.01
upperVaR.LN cbind(c(VaR.LN.05, VaR.LN.01),
ans =c(seVaR.LN.05, seVaR.LN.01),
c(lowerVaR.LN.05, lowerVaR.LN.01),
c(upperVaR.LN.05,upperVaR.LN.01))
colnames(ans) = c("Estimate", "Std Error", "2.5%", "97.5%")
rownames(ans) = c("VaR.LN.05", "VaR.LN.01")
ans
## Estimate Std Error 2.5% 97.5%
## VaR.LN.05 14846 998 12850 16842
## VaR.LN.01 20467 1170 18128 22807
The estimates of VaRLNα (and its standard error) are slightly smaller than the estimates of VaRNα (and its standard error) due to the positive skewness in the log-normal distribution.
◼
The estimated monthly SR for Microsoft (using rf=0.0025), its standard error and 95% confidence interval are computed using:
(muhatS - r.f)
SRhat = sqrt((1/n.obs)*(1 + 0.5*(SRhat^2)))
seSRhat = SRhat - 2*seSRhat
lowerSR = SRhat + 2*seSRhat
upperSR = c(SRhat, seSRhat, lowerSR, upperSR)
ans =names(ans) = c("Estimate", "Std Error", "2.5%", "97.5%")
ans
## Estimate Std Error 2.5% 97.5%
## 0.00665 0.07625 -0.14585 0.15915
The estimated monthly SR for Microsoft is close to zero, with a large estimated standard error and a 95% confidence interval containing both negative and positive values. Clearly, the SR is not estimated very well.
◼
8.3.1 The numerical delta method
The delta method is an advanced statistical technique and requires a bit of calculus to implement, especially for nonlinear vector-valued functions. It can be easy to make mistakes when working out the math! Fortunately, in R there is an easy way to implement the delta method that does not require you to work out the math. The car function deltaMethod
implements the delta method numerically, utilizing the stats function D()
which implements symbolic derivatives. The arguments expected by deltaMethod
are:
args(car:::deltaMethod.default)
## function (object, g., vcov., func = g., constants, level = 0.95,
## rhs = NULL, ..., envir = parent.frame())
## NULL
Here, object
is the vector of named estimated parameters, ˆθ, g.
is a quoted string that is the function of the parameter estimates to be evaluated, f(θ), and vcov
is the named estimated covariance matrix of the coefficient estimates, ^var(ˆθ). The function returns an object of class “deltaMethod” for which there is a print
method. The “deltaMethod” object is essentially a data.frame
with columns giving the parameter estimate(s), the delta method estimated standard error(s), and lower and upper confidence limits. The following example illustrates how to use deltaMethod
for the example functions.
We first create the named inputs ˆθ and ^var(ˆθ) from the GWN model for simple returns:
c(muhatS, sigmahatS)
thetahatS =names(thetahatS) = c("mu", "sigma")
matrix(c(se.muhatS^2,0,0,se.sigmahatS^2), 2, 2, byrow=TRUE)
var.thetahatS =rownames(var.thetahatS) = colnames(var.thetahatS) = names(thetahatS)
thetahatS
## mu sigma
## 0.00915 0.10150
var.thetahatS
## mu sigma
## mu 5.99e-05 0.00e+00
## sigma 0.00e+00 2.99e-05
We do the same for the inputs from the GWN model for continuously compounded returns:
c(muhatC, sigmahatC)
thetahatC =names(thetahatC) = c("mu", "sigma")
matrix(c(se.muhatC^2,0,0,se.sigmahatC^2), 2, 2, byrow=TRUE)
var.thetahatC =rownames(var.thetahatC) = colnames(var.thetahatC) = names(thetahatC)
It is important to name the elements in thetahat
and var.thetahat
as these names will be used in defining f(θ). To implement the numerical delta method for the normal quantile, we need to supply a quoted string specifying the function f1(θ)=μ+σqZα. Here we use the string "mu+sigma*q.05"
, where q.05 = qnorm(0.05)
. For, α=0.05 the R code is:
.05 = qnorm(0.05)
q deltaMethod(thetahatS, "mu+sigma*q.05", var.thetahatS)
dm1 =class(dm1)
## [1] "deltaMethod" "data.frame"
The returned object dm1
is of class “deltaMethod”, inhereting from “data.frame”, and has named columns:
names(dm1)
## [1] "Estimate" "SE" "2.5 %" "97.5 %"
The print
method shows the results of applying the delta method:
dm1
## Estimate SE 2.5 % 97.5 %
## mu + sigma * q.05 -0.1578 0.0119 -0.1811 -0.13
The results match the analytic calculations performed above.40
The delta method for the normal VaR function f2(θ) is:
deltaMethod(thetahatS, "-W0*(mu+sigma*q.05)", var.thetahatS)
dm2 = dm2
## Estimate SE 2.5 % 97.5 %
## -W0 * (mu + sigma * q.05) 15780 1187 13453 18106
The delta method for the log-normal VaR function f3(θ) is:
deltaMethod(thetahatC, "-W0*(exp(mu+sigma*q.05)-1)", var.thetahatC)
dm3 = dm3
## Estimate SE 2.5 % 97.5 %
## -W0 * (exp(mu + sigma * q.05) - 1) 14846 998 12890 16802
The delta method for the Sharpe ratio function f4(θ) is
deltaMethod(thetahatS, "(mu-r.f)/sigma", var.thetahatS)
dm4 = dm4
## Estimate SE 2.5 % 97.5 %
## (mu - r.f)/sigma 0.0655 0.0763 -0.0841 0.22
◼
8.3.2 the delta method for vector valued functions (advanced)
Let θ denote a k×1 vector of GWN model parameters. For example, θ=(μ,σ2)′. Assume that ˆθ is asymptotically normally distributed: ˆθ∼N(θ,^var(θ)), for large enough T where ^var(θ) is the k×k variance-covariance matrix of ˆθ. Sometimes we need to consider the asymptotic distribution of a vector-valued function of θ. Let f:Rk→Rj where j≤k be a continuous and differentiable function, and define the j×1 vector η=f(θ). It is useful to express η as:
η=(η1η2⋮ηj)=(f1(θ)f2(θ)⋮fj(θ)).
For the first example let θ=(σ21,σ22)′, and define the 2×1 vector of volatilities: η=f(θ)=(σ1,σ2)′. Then, η1=f1(θ)=√σ22 and η2=f2(θ)=√σ22.
For the second example, let θ = (μ1,μ2,σ1,σ2)′ and define the 2×1 vector of Sharpe ratios:
η=f(θ)=(μ1−rfσ1μ1−rfσ1)=(SR1SR2),
where rf denotes the constant risk-free rate. Then η1=f1(θ)=μ1−rfσ1=SR1, η2=f2(θ)=μ2−rfσ2=SR2.
◼
Define the j×k Jacobian matrix: G(θ)′=(∂f1(θ)∂θ1∂f1(θ)∂θ2⋯∂f1(θ)∂θk∂f2(θ)∂θ1∂f2(θ)∂θ2⋯∂f2(θ)∂θk⋮⋮…⋮∂fj(θ)∂θ1∂fj(θ)∂θ2⋯∂fj(θ)∂θk)
Then, by the delta method, the asymptotic normal distribution of ˆη=f(ˆθ) is: ˆη∼N(η,^var(ˆη)), where ^var(ˆη)=G(ˆθ)′^var(ˆθ)G(ˆθ).
Consider the first example function where θ=(σ21,σ22)′ and η=f(θ)=(σ1,σ2)′. Now, using the chain-rule:
∂f1(θ)∂σ21=12σ−11, ∂f1(θ)∂σ22=0∂f2(θ)∂σ21=0, ∂f2(θ)∂σ22=12σ−11
The Jacobian matrix is
G(θ)′=(12σ−110012σ−12)
From Chapter 7, Proposition 7.9,
^var(ˆθ)=1T(2ˆσ412ˆσ2122ˆσ2122ˆσ42)
Then
^var(ˆη)=G(ˆθ)′^var(ˆθ)G(ˆθ)=(12ˆσ−110012ˆσ−12)1T(2ˆσ412ˆσ2122ˆσ2122ˆσ42)(12ˆσ−110012ˆσ−12)=1T(12ˆσ2112ˆσ212ˆσ−11ˆσ−1212ˆσ212ˆσ−11ˆσ−1212ˆσ22)=1T(12ˆσ2112ˆσ12ˆρ1212ˆσ12ˆρ1212ˆσ22).
◼
Consider the second example function where θ=(μ1,μ2,σ1,σ2)′ and η=f(θ)=(f1(θ),f2(θ))′ where f1(θ)=μ1−rfσ1=SR1, f2(θ)=μ2−rfσ2=SR2.
Here, we want to use the delta method to get the joint distribution of η=(SR1,SR2)′. Now, using the chain-rule:
∂f1(θ)∂μ1=σ−11, ∂f1(θ)∂μ2=0, ∂f1(θ)∂σ1=−σ−21(μ1−rf), ∂f1(θ)∂σ2=0,∂f2(θ)∂μ1=0, ∂f2(θ)∂μ2=σ−12, ∂f2(θ)∂σ1=0, ∂f2(θ)∂σ2=−σ−21(μ1−rf).
Then, using SRi=(μi−rf)/σi (i=1,2), we have:
G(θ)′=(σ−110−σ−11SR100σ−120−σ−12SR20)
Using Proposition 7.10 and the results from the previous example we have:
var(ˆθ)=1T(σ21σ1200σ12σ22000012σ2112σ212σ−11σ−120012σ212σ−11σ−1212σ22).
Then, after some straightforward matrix algebra calculations (see end-of-chapter exercises) we get:
^var(ˆη)=G(ˆθ)′^var(ˆθ)G(ˆθ)=(^var(^SR1)^cov(^SR1,^SR2)^cov(^SR1,^SR2)^var(^SR2))=(1T(1+12^SR21)ˆρ12T(1+12ˆρ12^SR1^SR2)ˆρ12T(1+12ˆρ12^SR1^SR2)1T(1+12^SR22)).
The diagonal elements match the delta method variances for the estimated Sharpe ratio derived earlier. The off-diagonal covariance term shows that the two estimated Sharpe ratios are correlated. We explore this correlation in more depth in the end-of-chapter exercises.
◼
8.3.3 The delta method explained
The delta method deduces the asymptotic distribution of f(ˆθ) using a first order Taylor series approximation of f(ˆθ) evaluated at the true value θ. For simplicity, assume that θ is a scalar parameter. Then, the first order Taylor series expansion gives
f(ˆθ)=f(θ)+f′(θ)(ˆθ−θ)+remainder=f(θ)+f′(θ)ˆθ+f′(θ)θ+remainder≈f(θ)+f′(θ)ˆθ+f′(θ)θ
The first order Taylor series expansion shows that f(ˆθ) is an approximate linear function of the random variable ˆθ. The values θ, f(θ), and f′(θ) are constants. Assuming that, for large enough T,
ˆθ∼N(θ,^se(ˆθ)2),
it follows that
f(ˆθ)∼N(f(θ),f′(θ)2^se(ˆθ)2)
Since ˆθ is consistent for θ we can replace f′(θ)2 with f′(ˆθ)2 giving the practically useful result
f(ˆθ)∼N(f(θ),f′(ˆθ)2^se(ˆθ)2)
The 95% confidence interval computed in
deltaMethod
uses 1.96 instead of the rule-of-thumb value of 2 used in the analytic calculations.↩︎