7.5 Statistical Properties of the GWN Model Estimates

To determine the statistical properties of the plug-in principle estimators $\hat{\mu}_{i}$, $\hat{\sigma}_{i}^{2}$, $\hat{\sigma}_{i}$, $\hat{\sigma}_{ij}$, and $\hat{\rho}_{ij}$ in the GWN model, we treat them as functions of the random variables $\{R_{t}\}_{t=1}^{T}$, where $R_{t}$ is assumed to be generated by the GWN model (7.1).

7.5.1 Bias

Proposition 7.4 (Bias of GWN model estimators) Assume that returns are generated by the GWN model (7.1). Then the estimators $\hat{\mu}_{i}$, $\hat{\sigma}_{i}^{2}$ and $\hat{\sigma}_{ij}$ are unbiased estimators:36
\begin{align*}
E[\hat{\mu}_{i}] & =\mu_{i},\\
E[\hat{\sigma}_{i}^{2}] & =\sigma_{i}^{2},\\
E[\hat{\sigma}_{ij}] & =\sigma_{ij}.
\end{align*}
The estimators $\hat{\sigma}_{i}$ and $\hat{\rho}_{ij}$ are biased estimators:
\begin{align*}
E[\hat{\sigma}_{i}] & \neq\sigma_{i},\\
E[\hat{\rho}_{ij}] & \neq\rho_{ij}.
\end{align*}

It can be shown that the biases in $\hat{\sigma}_{i}$ and $\hat{\rho}_{ij}$ are very small and decreasing in $T$ such that $\mathrm{bias}(\hat{\sigma}_{i},\sigma_{i})\rightarrow0$ and $\mathrm{bias}(\hat{\rho}_{ij},\rho_{ij})\rightarrow0$ as $T\rightarrow\infty$. The proofs of these results are beyond the scope of this book and may be found, for example, in Goldberger (1991). As we shall see, these results about bias can be easily verified using Monte Carlo methods.
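As a preview of the Monte Carlo approach, the following sketch simulates returns from the GWN model many times and compares the averages of the estimates to the true parameters; the values of $\mu$, $\sigma$, $T$, and the number of simulations are illustrative choices, not from the text:

```r
# Monte Carlo check of bias: simulate GWN returns repeatedly and
# compare the average estimates to the true parameter values
set.seed(123)
mu = 0.01; sigma = 0.10; T = 60; nSim = 10000
muhat = rep(0, nSim); sigmahat = rep(0, nSim)
for (sim in 1:nSim) {
  r = rnorm(T, mean = mu, sd = sigma)  # returns from the GWN model
  muhat[sim] = mean(r)
  sigmahat[sim] = sd(r)
}
mean(muhat) - mu        # approximately zero: muhat is unbiased
mean(sigmahat) - sigma  # small and negative: sigmahat is slightly biased
```

The average of the `muhat` draws is essentially equal to `mu`, while the average of the `sigmahat` draws falls slightly below `sigma`, consistent with Proposition 7.4.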

It is instructive to illustrate how to derive the result $E[\hat{\mu}_{i}]=\mu_{i}$. Using $\hat{\mu}_{i}=\frac{1}{T}\sum_{t=1}^{T}R_{it}$ and results about the expectation of a linear combination of random variables, it follows that:
\begin{align*}
E[\hat{\mu}_{i}] & =E\left[\frac{1}{T}\sum_{t=1}^{T}R_{it}\right]\\
 & =\frac{1}{T}\sum_{t=1}^{T}E[R_{it}]\quad(\text{by the linearity of }E[\cdot])\\
 & =\frac{1}{T}\sum_{t=1}^{T}\mu_{i}\quad(\text{since }E[R_{it}]=\mu_{i},~t=1,\ldots,T)\\
 & =\frac{1}{T}T\mu_{i}=\mu_{i}.
\end{align*}
The derivations of the results $E[\hat{\sigma}_{i}^{2}]=\sigma_{i}^{2}$ and $E[\hat{\sigma}_{ij}]=\sigma_{ij}$ are similar but considerably more involved and so are omitted.

7.5.2 Precision

Because the GWN model estimators are either unbiased or the bias is very small, the precision of these estimators is measured by their standard errors. Here, we give the mathematical formulas for their standard errors.

Proposition 7.5 (Standard error for $\hat{\mu}_{i}$) The standard error for $\hat{\mu}_{i}$, $\mathrm{se}(\hat{\mu}_{i})$, can be calculated exactly and is given by:
\begin{equation*}
\mathrm{se}(\hat{\mu}_{i})=\frac{\sigma_{i}}{\sqrt{T}}.
\end{equation*}

The derivation of this result is straightforward. Using the results for the variance of a linear combination of uncorrelated random variables, we have:
\begin{align*}
\mathrm{var}(\hat{\mu}_{i}) & =\mathrm{var}\left(\frac{1}{T}\sum_{t=1}^{T}R_{it}\right)\\
 & =\frac{1}{T^{2}}\sum_{t=1}^{T}\mathrm{var}(R_{it})\quad(\text{since }R_{it}\text{ is independent over time})\\
 & =\frac{1}{T^{2}}\sum_{t=1}^{T}\sigma_{i}^{2}\quad(\text{since }\mathrm{var}(R_{it})=\sigma_{i}^{2},~t=1,\ldots,T)\\
 & =\frac{1}{T^{2}}T\sigma_{i}^{2}=\frac{\sigma_{i}^{2}}{T}.
\end{align*}

Then $\mathrm{se}(\hat{\mu}_{i})=\mathrm{sd}(\hat{\mu}_{i})=\sigma_{i}/\sqrt{T}$. We make the following remarks:

  1. The value of $\mathrm{se}(\hat{\mu}_{i})$ is in the same units as $\hat{\mu}_{i}$ and measures the precision of $\hat{\mu}_{i}$ as an estimate. If $\mathrm{se}(\hat{\mu}_{i})$ is small relative to $\hat{\mu}_{i}$ then $\hat{\mu}_{i}$ is a relatively precise estimate of $\mu_{i}$ because $f(\hat{\mu}_{i})$ will be tightly concentrated around $\mu_{i}$; if $\mathrm{se}(\hat{\mu}_{i})$ is large relative to $\hat{\mu}_{i}$ then $\hat{\mu}_{i}$ is a relatively imprecise estimate of $\mu_{i}$ because $f(\hat{\mu}_{i})$ will be spread out about $\mu_{i}$.
  2. The magnitude of $\mathrm{se}(\hat{\mu}_{i})$ depends positively on the volatility of returns, $\sigma_{i}=\mathrm{sd}(R_{it})$. For a given sample size $T$, assets with higher return volatility have larger values of $\mathrm{se}(\hat{\mu}_{i})$ than assets with lower return volatility. In other words, estimates of expected return for high volatility assets are less precise than estimates of expected return for low volatility assets.
  3. For a given return volatility $\sigma_{i}$, $\mathrm{se}(\hat{\mu}_{i})$ is smaller for larger sample sizes $T$. In other words, $\hat{\mu}_{i}$ is more precisely estimated for larger samples.
  4. $\mathrm{se}(\hat{\mu}_{i})\rightarrow0$ as $T\rightarrow\infty$. This together with $E[\hat{\mu}_{i}]=\mu_{i}$ shows that $\hat{\mu}_{i}$ is consistent for $\mu_{i}$.
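The dependence of $\mathrm{se}(\hat{\mu}_{i})=\sigma_{i}/\sqrt{T}$ on the sample size is easy to tabulate; a quick sketch in R, using an illustrative monthly volatility of 10%:

```r
# se(muhat) = sigma/sqrt(T): halving the standard error requires
# quadrupling the sample size (sigma = 0.10 is an illustrative value)
sigma = 0.10
T = c(12, 60, 240, 960)
round(sigma/sqrt(T), 4)
```

Each quadrupling of $T$ cuts the standard error in half, which is why long samples are needed to estimate expected returns precisely.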

The derivations of the standard errors for $\hat{\sigma}_{i}^{2}$, $\hat{\sigma}_{i}$, $\hat{\sigma}_{ij}$ and $\hat{\rho}_{ij}$ are complicated, and the exact results are extremely messy and hard to work with. However, there are simple approximate formulas for the standard errors of $\hat{\sigma}_{i}^{2}$, $\hat{\sigma}_{i}$, $\hat{\sigma}_{ij}$, and $\hat{\rho}_{ij}$ based on the CLT under the GWN model that are valid if the sample size, $T$, is reasonably large.

Proposition 7.6 (Approximate standard error formulas for $\hat{\sigma}_{i}^{2}$, $\hat{\sigma}_{i}$, $\hat{\sigma}_{ij}$, and $\hat{\rho}_{ij}$) The approximate standard error formulas for $\hat{\sigma}_{i}^{2}$, $\hat{\sigma}_{i}$, $\hat{\sigma}_{ij}$, and $\hat{\rho}_{ij}$ are given by:

\begin{align}
\mathrm{se}(\hat{\sigma}_{i}^{2}) & \approx\frac{\sqrt{2}\,\sigma_{i}^{2}}{\sqrt{T}}=\frac{\sigma_{i}^{2}}{\sqrt{T/2}},\tag{7.20}\\
\mathrm{se}(\hat{\sigma}_{i}) & \approx\frac{\sigma_{i}}{\sqrt{2T}},\tag{7.21}\\
\mathrm{se}(\hat{\sigma}_{ij}) & \approx\sqrt{\frac{\sigma_{i}^{2}\sigma_{j}^{2}+\sigma_{ij}^{2}}{T}}=\sqrt{\frac{\sigma_{i}^{2}\sigma_{j}^{2}(1+\rho_{ij}^{2})}{T}},\tag{7.22}\\
\mathrm{se}(\hat{\rho}_{ij}) & \approx\frac{1-\rho_{ij}^{2}}{\sqrt{T}},\tag{7.23}
\end{align}

where "$\approx$" denotes approximately equal. The approximations are such that the approximation error goes to zero as the sample size $T$ gets very large.

We make the following remarks:

  1. As with the formula for the standard error of the sample mean, the formulas for $\mathrm{se}(\hat{\sigma}_{i}^{2})$ and $\mathrm{se}(\hat{\sigma}_{i})$ depend on $\sigma_{i}^{2}$. Larger values of $\sigma_{i}^{2}$ imply less precise estimates of $\sigma_{i}^{2}$ and $\sigma_{i}$.
  2. The formula for $\mathrm{se}(\hat{\sigma}_{ij})$ depends on $\sigma_{i}^{2}$, $\sigma_{j}^{2}$, and $\rho_{ij}^{2}$. Given $\sigma_{i}^{2}$ and $\sigma_{j}^{2}$, the standard error is smallest when $\rho_{ij}=0$.
  3. The formula for $\mathrm{se}(\hat{\rho}_{ij})$ does not depend on $\sigma_{i}^{2}$ but rather depends on $\rho_{ij}^{2}$, and is smaller the closer $\rho_{ij}^{2}$ is to unity. Intuitively, this makes sense because as $\rho_{ij}^{2}$ approaches one the linear dependence between $R_{it}$ and $R_{jt}$ becomes almost perfect and this will be easily recognizable in the data (the scatterplot will almost follow a straight line).
  4. The formulas for the standard errors above are inversely related to the square root of the sample size, $\sqrt{T}$, which means that larger sample sizes imply smaller values of the standard errors.
  5. Interestingly, $\mathrm{se}(\hat{\sigma}_{i})$ goes to zero the fastest and $\mathrm{se}(\hat{\sigma}_{i}^{2})$ goes to zero the slowest. Hence, for a fixed sample size, these formulas suggest that $\sigma_{i}$ is generally estimated more precisely than $\sigma_{i}^{2}$ and $\rho_{ij}$, and $\rho_{ij}$ is generally estimated more precisely than $\sigma_{i}^{2}$.
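The claim in remark 5 can be checked numerically by comparing the standard errors relative to the size of the quantity being estimated; a sketch with illustrative values of $\sigma_{i}$ and $T$:

```r
# Relative precision: approximate se divided by the quantity estimated
# (sigma and T are illustrative values)
sigma = 0.10; T = 60
relSeSigma2 = sqrt(2/T)    # se(sigma2hat)/sigma2 = sqrt(2/T)
relSeSigma  = 1/sqrt(2*T)  # se(sigmahat)/sigma  = 1/sqrt(2T)
c(relSeSigma2, relSeSigma) # sigma is relatively more precisely estimated
```

For $T=60$ the relative standard error of $\hat{\sigma}_{i}^{2}$ is about 18%, while that of $\hat{\sigma}_{i}$ is about 9%, consistent with the remark.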

The above formulas (7.20) - (7.23) are not practically useful, however, because they depend on the unknown quantities $\sigma_{i}^{2}$, $\sigma_{i}$ and $\rho_{ij}$. Practically useful formulas make use of the plug-in principle and replace $\sigma_{i}^{2}$, $\sigma_{i}$ and $\rho_{ij}$ by the estimates $\hat{\sigma}_{i}^{2}$, $\hat{\sigma}_{i}$ and $\hat{\rho}_{ij}$, and give rise to the estimated standard errors:
\begin{align*}
\widehat{\mathrm{se}}(\hat{\mu}_{i}) & =\frac{\hat{\sigma}_{i}}{\sqrt{T}},\\
\widehat{\mathrm{se}}(\hat{\sigma}_{i}^{2}) & \approx\frac{\hat{\sigma}_{i}^{2}}{\sqrt{T/2}},\\
\widehat{\mathrm{se}}(\hat{\sigma}_{i}) & \approx\frac{\hat{\sigma}_{i}}{\sqrt{2T}},\\
\widehat{\mathrm{se}}(\hat{\sigma}_{ij}) & \approx\sqrt{\frac{\hat{\sigma}_{i}^{2}\hat{\sigma}_{j}^{2}(1+\hat{\rho}_{ij}^{2})}{T}},\\
\widehat{\mathrm{se}}(\hat{\rho}_{ij}) & \approx\frac{1-\hat{\rho}_{ij}^{2}}{\sqrt{T}}.
\end{align*}

It is good practice to report estimates together with their estimated standard errors. In this way the precision of the estimates is transparent to the user. Typically, estimates are reported in a table with the estimates in one column and the estimated standard errors in an adjacent column.
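For a single return series, the plug-in formulas for $\widehat{\mathrm{se}}(\hat{\mu}_{i})$, $\widehat{\mathrm{se}}(\hat{\sigma}_{i}^{2})$, and $\widehat{\mathrm{se}}(\hat{\sigma}_{i})$ can be collected into a small helper function; `gwnSe()` below is a hypothetical name used only for illustration:

```r
# Estimated standard errors for the GWN estimates from a return vector r
# (hypothetical helper; implements the plug-in formulas above)
gwnSe = function(r) {
  T = length(r)
  sigmahat = sd(r)
  c(se.muhat     = sigmahat/sqrt(T),
    se.sigma2hat = sigmahat^2/sqrt(T/2),
    se.sigmahat  = sigmahat/sqrt(2*T))
}
```

For example, `gwnSe(rnorm(60, 0.01, 0.10))` returns the three estimated standard errors for a simulated five-year monthly sample.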

Example 2.15 ($\widehat{\mathrm{se}}(\hat{\mu}_{i})$ values for Microsoft, Starbucks and the S&P 500 index)

For Microsoft, Starbucks and S&P 500, the values of $\widehat{\mathrm{se}}(\hat{\mu}_{i})$ are easily computed in R using:

n.obs = nrow(gwnRetC)
seMuhat = sigmahat/sqrt(n.obs)

The values of $\hat{\mu}_{i}$ and $\widehat{\mathrm{se}}(\hat{\mu}_{i})$ shown together are:

cbind(muhat, seMuhat)
##         muhat seMuhat
## MSFT  0.00413 0.00764
## SBUX  0.01466 0.00851
## SP500 0.00169 0.00370

For Microsoft and Starbucks, the values of $\widehat{\mathrm{se}}(\hat{\mu}_{i})$ are similar because the values of $\hat{\sigma}_{i}$ are similar, and $\widehat{\mathrm{se}}(\hat{\mu}_{i})$ is smallest for the S&P 500 index. This occurs because $\hat{\sigma}_{\textrm{sp500}}$ is much smaller than $\hat{\sigma}_{\textrm{msft}}$ and $\hat{\sigma}_{\textrm{sbux}}$. Hence, $\hat{\mu}_{i}$ is estimated more precisely for the S&P 500 index (a highly diversified portfolio) than it is for Microsoft and Starbucks stock (individual assets).

It is tempting to compare the magnitude of $\hat{\mu}_{i}$ to $\widehat{\mathrm{se}}(\hat{\mu}_{i})$ to evaluate if $\hat{\mu}_{i}$ is a precise estimate. A common way to do this is to compute the so-called t-ratio:

muhat/seMuhat
##  MSFT  SBUX SP500 
## 0.540 1.722 0.457

The t-ratio shows the number of $\widehat{\mathrm{se}}(\hat{\mu}_{i})$ values that $\hat{\mu}_{i}$ is from zero. For example, $\hat{\mu}_{\textrm{msft}}=0.004$ is 0.54 values of $\widehat{\mathrm{se}}(\hat{\mu}_{\textrm{msft}})=0.008$ above zero. This is not very far from zero. In contrast, $\hat{\mu}_{\textrm{sbux}}=0.015$ is 1.72 values of $\widehat{\mathrm{se}}(\hat{\mu}_{\textrm{sbux}})=0.008$ above zero. This is moderately far from zero. Because $\hat{\mu}_{i}$ represents an average rate of return on an asset, it is informative to know how far above zero it is likely to be. The farther $\hat{\mu}_{i}$ is from zero, in units of $\widehat{\mathrm{se}}(\hat{\mu}_{i})$, the more sure we are that the asset has a positive average rate of return.

Example 7.1 (Computing $\widehat{\mathrm{se}}(\hat{\sigma}_{i}^{2})$, $\widehat{\mathrm{se}}(\hat{\sigma}_{i})$, $\widehat{\mathrm{se}}(\hat{\sigma}_{ij})$, and $\widehat{\mathrm{se}}(\hat{\rho}_{ij})$ for Microsoft, Starbucks and the S&P 500)

For Microsoft, Starbucks and S&P 500, the values of $\widehat{\mathrm{se}}(\hat{\sigma}_{i}^{2})$ and $\widehat{\mathrm{se}}(\hat{\sigma}_{i})$ (together with the estimates $\hat{\sigma}_{i}^{2}$ and $\hat{\sigma}_{i}$) are:

seSigma2hat = sigma2hat/sqrt(n.obs/2)
seSigmahat = sigmahat/sqrt(2*n.obs)
cbind(sigma2hat, seSigma2hat, sigmahat, seSigmahat)
##       sigma2hat seSigma2hat sigmahat seSigmahat
## MSFT    0.01004    0.001083   0.1002    0.00540
## SBUX    0.01246    0.001344   0.1116    0.00602
## SP500   0.00235    0.000253   0.0485    0.00261

Notice that $\sigma^{2}$ and $\sigma$ for the S&P 500 index are estimated much more precisely than the values for Microsoft and Starbucks. Also notice that $\sigma_{i}$ is estimated more precisely than $\mu_{i}$ for all assets: the values of $\widehat{\mathrm{se}}(\hat{\sigma}_{i})$ relative to $\hat{\sigma}_{i}$ are much smaller than the values of $\widehat{\mathrm{se}}(\hat{\mu}_{i})$ relative to $\hat{\mu}_{i}$.

The values of $\widehat{\mathrm{se}}(\hat{\sigma}_{ij})$ and $\widehat{\mathrm{se}}(\hat{\rho}_{ij})$ (together with the estimates $\hat{\sigma}_{ij}$ and $\hat{\rho}_{ij}$) are:

seCovhat = sqrt(sigma2hat[c("MSFT","MSFT","SBUX")]*sigma2hat[c("SBUX","SP500","SP500")]*
                (1 + rhohat^2))/sqrt(n.obs)
seRhohat = (1-rhohat^2)/sqrt(n.obs)
cbind(covhat, seCovhat, rhohat, seRhohat)
##             covhat seCovhat rhohat seRhohat
## msft,sbux  0.00381 0.000901  0.341   0.0674
## msft,sp500 0.00300 0.000435  0.617   0.0472
## sbux,sp500 0.00248 0.000454  0.457   0.0603

The values of $\widehat{\mathrm{se}}(\hat{\sigma}_{ij})$ and $\widehat{\mathrm{se}}(\hat{\rho}_{ij})$ are moderate in size (relative to $\hat{\sigma}_{ij}$ and $\hat{\rho}_{ij}$). Notice that $\hat{\rho}_{\textrm{msft,sp500}}$ has the smallest estimated standard error because $\hat{\rho}_{\textrm{msft,sp500}}^{2}$ is closest to one.

7.5.3 Sampling Distributions and Confidence Intervals

7.5.3.1 Sampling Distribution for $\hat{\mu}_{i}$

In the GWN model, $R_{it}\sim iid~N(\mu_{i},\sigma_{i}^{2})$ and since $\hat{\mu}_{i}=\frac{1}{T}\sum_{t=1}^{T}R_{it}$ is an average of these normal random variables, it is also normally distributed. Previously we showed that $E[\hat{\mu}_{i}]=\mu_{i}$ and $\mathrm{var}(\hat{\mu}_{i})=\frac{\sigma_{i}^{2}}{T}$. Therefore, the exact probability distribution of $\hat{\mu}_{i}$, $f(\hat{\mu}_{i})$, for a fixed sample size $T$ is the normal distribution $\hat{\mu}_{i}\sim N\left(\mu_{i},\frac{\sigma_{i}^{2}}{T}\right)$. Hence, we have an exact formula for $f(\hat{\mu}_{i})$:

\begin{equation}
f(\hat{\mu}_{i})=\left(2\pi\frac{\sigma_{i}^{2}}{T}\right)^{-1/2}\exp\left\{ -\frac{1}{2\sigma_{i}^{2}/T}\left(\hat{\mu}_{i}-\mu_{i}\right)^{2}\right\}.\tag{7.29}
\end{equation}

The probability curve $f(\hat{\mu}_{i})$ is centered at the true value $\mu_{i}$, and the spread about $\mu_{i}$ depends on the magnitude of $\sigma_{i}^{2}$, the variability of $R_{it}$, and the sample size, $T$. For a fixed sample size $T$, the uncertainty in $\hat{\mu}_{i}$ is larger (smaller) for larger (smaller) values of $\sigma_{i}^{2}$. Notice that the variance of $\hat{\mu}_{i}$ is inversely related to the sample size $T$. Given $\sigma_{i}^{2}$, $\mathrm{var}(\hat{\mu}_{i})$ is smaller for larger sample sizes than for smaller sample sizes. This makes sense since we expect to have a more precise estimator when we have more data. If the sample size is very large (as $T\rightarrow\infty$) then $\mathrm{var}(\hat{\mu}_{i})$ will be approximately zero and the normal distribution of $\hat{\mu}_{i}$ given by (7.29) will be essentially a spike at $\mu_{i}$. In other words, if the sample size is very large then we essentially know the true value of $\mu_{i}$. Hence, we have established that $\hat{\mu}_{i}$ is a consistent estimator of $\mu_{i}$ as the sample size goes to infinity.

The exact sampling distribution for $\hat{\mu}_{i}$ assumes that $\sigma_{i}^{2}$ is known, which is not practically useful. Below, we provide a practically useful result which holds when we use $\hat{\sigma}_{i}^{2}$.

Example 2.19 (Sampling distribution of $\hat{\mu}$ with different sample sizes)

The distribution of $\hat{\mu}_{i}$, with $\mu_{i}=0$ and $\sigma_{i}^{2}=1$, for various sample sizes is illustrated in Figure 7.4. Notice how fast the distribution collapses at $\mu_{i}=0$ as $T$ increases.


Figure 7.4: $N(0,1/T)$ sampling distributions for $\hat{\mu}$ for $T=1,10$ and $50$.
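A figure like this can be reproduced with base R graphics; a minimal sketch:

```r
# Overlay the N(0, 1/T) densities for T = 1, 10, 50 (cf. Figure 7.4)
x = seq(-3, 3, length.out = 200)
plot(x, dnorm(x, sd = 1), type = "l", ylim = c(0, 3),
     xlab = "muhat", ylab = "density")
lines(x, dnorm(x, sd = 1/sqrt(10)), lty = 2)
lines(x, dnorm(x, sd = 1/sqrt(50)), lty = 3)
legend("topright", legend = c("T=1", "T=10", "T=50"), lty = 1:3)
```

The density for $T=50$ is much taller and narrower than the density for $T=1$, illustrating the collapse of the sampling distribution around $\mu_{i}=0$.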

7.5.3.2 Confidence Intervals for $\mu_{i}$

In practice, the precision of $\hat{\mu}_{i}$ is measured by $\widehat{\mathrm{se}}(\hat{\mu}_{i})$ but is best communicated by computing a confidence interval for the unknown value of $\mu_{i}$. A confidence interval is an interval estimate of $\mu_{i}$ such that we can put an explicit probability statement on the likelihood that the interval covers $\mu_{i}$.

Because we know the exact finite sample distribution of $\hat{\mu}_{i}$ in the GWN return model, we can construct an exact confidence interval for $\mu_{i}$.

Proposition 7.7 (t-ratio for sample mean) Let $\{R_{it}\}_{t=1}^{T}$ be generated from the GWN model (7.1). Define the t-ratio as:
\begin{equation}
t_{i}=\frac{\hat{\mu}_{i}-\mu_{i}}{\widehat{\mathrm{se}}(\hat{\mu}_{i})}=\frac{\hat{\mu}_{i}-\mu_{i}}{\hat{\sigma}_{i}/\sqrt{T}},\tag{7.30}
\end{equation}
Then $t_{i}\sim t_{T-1}$ where $t_{T-1}$ denotes a Student's t random variable with $T-1$ degrees of freedom.

The Student's t distribution with $v>0$ degrees of freedom is a symmetric distribution centered at zero, like the standard normal. The tail-thickness (kurtosis) of the distribution is determined by the degrees of freedom parameter $v$. For small values of $v$, the tails of the Student's t distribution are much fatter than the tails of the standard normal distribution. As $v$ gets large, the Student's t distribution approaches the standard normal distribution.
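The convergence of the Student's t quantiles to the standard normal quantile is easy to see in R:

```r
# 97.5% quantiles of Student's t for increasing degrees of freedom,
# compared with the standard normal quantile
qt(0.975, df = c(5, 10, 30, 60, 120))
qnorm(0.975)
```

The t quantiles decline toward the normal quantile 1.96 as the degrees of freedom grow, which is why the rule of thumb below works for moderate sample sizes.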

For \alpha\in(0,1), we compute a (1-\alpha)\cdot 100\% confidence interval for \mu_{i} using (7.30) and the 1-\alpha/2 quantile (critical value) t_{T-1}(1-\alpha/2) to give: \Pr\left(-t_{T-1}(1-\alpha/2)\leq\frac{\hat{\mu}_{i}-\mu_{i}}{\widehat{\mathrm{se}}(\hat{\mu}_{i})}\leq t_{T-1}(1-\alpha/2)\right)=1-\alpha, which can be rearranged as, \Pr\left(\hat{\mu}_{i}-t_{T-1}(1-\alpha/2)\cdot\widehat{\mathrm{se}}(\hat{\mu}_{i})\leq\mu_{i}\leq\hat{\mu}_{i}+t_{T-1}(1-\alpha/2)\cdot\widehat{\mathrm{se}}(\hat{\mu}_{i})\right)=1-\alpha. Hence, the interval, \begin{align} & [\hat{\mu}_{i}-t_{T-1}(1-\alpha/2)\cdot\widehat{\mathrm{se}}(\hat{\mu}_{i}),~\hat{\mu}_{i}+t_{T-1}(1-\alpha/2)\cdot\widehat{\mathrm{se}}(\hat{\mu}_{i})]\tag{7.31}\\ =& \hat{\mu}_{i}\pm t_{T-1}(1-\alpha/2)\cdot\widehat{\mathrm{se}}(\hat{\mu}_{i})\nonumber \end{align} covers the true unknown value of \mu_{i} with probability 1-\alpha.

Example 2.21 (Computing 95% confidence intervals for \mu_{i})

Suppose we want to compute a 95% confidence interval for \mu_{i}. In this case \alpha=0.05 and 1-\alpha=0.95. Suppose further that T-1=60 (e.g., five years of monthly return data) so that t_{T-1}(1-\alpha/2)=t_{60}(0.975)\approx2. This can be verified in R using the function qt().

Then the 95% confidence for \mu_{i} is given by:

\begin{equation} \hat{\mu}_{i}\pm2\cdot\widehat{\mathrm{se}}(\hat{\mu}_{i}).\tag{7.32} \end{equation}

The above formula for a 95% confidence interval is often used as a rule of thumb for computing an approximate 95% confidence interval for moderate sample sizes. It is easy to remember and does not require the computation of the quantile t_{T-1}(1-\alpha/2) from the Student's t distribution. It is also an approximate 95% confidence interval based on the asymptotic normality of \hat{\mu}_{i}. Recall, for a normal distribution with mean \mu and variance \sigma^{2}, approximately 95% of the probability lies between \mu\pm2\sigma.

\blacksquare

The coverage probability associated with the confidence interval for \mu_{i} is based on the fact that the estimator \hat{\mu}_{i} is a random variable. Since the confidence interval is constructed as \hat{\mu}_{i}\pm t_{T-1}(1-\alpha/2)\cdot\widehat{\mathrm{se}}(\hat{\mu}_{i}), it is also a random variable. An intuitive way to think about the coverage probability associated with the confidence interval is to think about the game of "horseshoes"37. The horseshoe is the confidence interval and the parameter \mu_{i} is the post at which the horseshoe is tossed. Think of playing the game 100 times. If the thrower is 95% accurate (if the coverage probability is 0.95) then 95 of the 100 tosses should ring the post (95 of the constructed confidence intervals should contain the true value \mu_{i}).
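The horseshoes analogy can be checked by simulation: repeatedly generate GWN samples, build the interval (7.31), and count how often it rings the post. The parameter values below are illustrative choices:

```r
# Monte Carlo check of the 95% coverage probability under the GWN model
set.seed(42)
mu = 0.01; sigma = 0.10; T = 60; nSim = 10000
t.crit = qt(0.975, df = T - 1)
covered = logical(nSim)
for (sim in 1:nSim) {
  r = rnorm(T, mean = mu, sd = sigma)
  se = sd(r)/sqrt(T)
  covered[sim] = (mean(r) - t.crit*se <= mu) && (mu <= mean(r) + t.crit*se)
}
mean(covered)  # close to 0.95
```

The fraction of intervals containing the true mean is very close to 0.95, as the exact theory predicts.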

Example 2.22 (95% confidence intervals for \mu_{i} for Microsoft, Starbucks and the S&P 500 index)

Consider computing 95% confidence intervals for \mu_{i} using (7.31) based on the estimated results for the Microsoft, Starbucks and S&P 500 data. The degrees of freedom for the Student's t distribution is T-1=171. The 97.5% quantile, t_{171}(0.975), can be computed using the R function qt():

t.975 = qt(0.975, df=(n.obs-1))
t.975
## [1] 1.97

Notice that this quantile is very close to 2. Then the exact 95% confidence intervals are given by:

lower = muhat - t.975*seMuhat
upper = muhat + t.975*seMuhat
width = upper - lower
ans= cbind(lower, upper, width)
colnames(ans) = c("2.5%", "97.5%", "Width")
ans
##           2.5%   97.5%  Width
## MSFT  -0.01096 0.01921 0.0302
## SBUX  -0.00215 0.03146 0.0336
## SP500 -0.00561 0.00898 0.0146

With probability 0.95, the above intervals will contain the true mean values, assuming the GWN model is valid. The 95% confidence intervals for Microsoft and Starbucks are fairly wide (about 3%) and contain both negative and positive values. The confidence interval for the S&P 500 index is tighter but also contains negative and positive values. For Microsoft, the confidence interval is [-1.1\%,~1.9\%]. This means that with probability 0.95, the true monthly expected return is somewhere between -1.1% and 1.9%. The economic implications of a -1.1% expected monthly return and a 1.9% expected monthly return are vastly different. In contrast, the 95% confidence interval for the S&P 500 is about half the width of the intervals for Microsoft or Starbucks. The lower limit is near -0.5% and the upper limit is near 1%. This result clearly shows that the monthly mean return for the S&P 500 index is estimated much more precisely than the monthly mean returns for Microsoft or Starbucks.

7.5.3.3 Sampling Distributions for \hat{\sigma}_{i}^{2}, \hat{\sigma}_{i}, \hat{\sigma}_{ij}, and \hat{\rho}_{ij}

The exact distributions of \hat{\sigma}_{i}^{2}, \hat{\sigma}_{i}, \hat{\sigma}_{ij}, and \hat{\rho}_{ij} based on a fixed sample size T are not normal distributions and are difficult to derive38. However, approximate normal distributions of the form (7.7) based on the CLT are readily available:

\begin{align} & \hat{\sigma}_{i}^{2}\sim N\left(\sigma_{i}^{2},\widehat{\mathrm{se}}(\hat{\sigma}_{i}^{2})^{2}\right)=N\left(\sigma_{i}^{2},\frac{2\hat{\sigma}_{i}^{4}}{T}\right),\tag{7.33}\\ & \hat{\sigma}_{i}\sim N\left(\sigma_{i},\widehat{\mathrm{se}}(\hat{\sigma}_{i})^{2}\right)=N\left(\sigma_{i},\frac{\hat{\sigma}_{i}^{2}}{2T}\right),\tag{7.34}\\ & \hat{\sigma}_{ij}\sim N\left(\sigma_{ij},\widehat{\mathrm{se}}(\hat{\sigma}_{ij})^{2}\right)=N\left(\sigma_{ij},\frac{\hat{\sigma}_{i}^{2}\hat{\sigma}_{j}^{2}(1+\hat{\rho}_{ij}^{2})}{T}\right),\tag{7.35}\\ & \hat{\rho}_{ij}\sim N\left(\rho_{ij},\widehat{\mathrm{se}}(\hat{\rho}_{ij})^{2}\right)=N\left(\rho_{ij},\frac{(1-\hat{\rho}_{ij}^{2})^{2}}{T}\right).\tag{7.36} \end{align}

These approximate normal distributions can be used to compute approximate confidence intervals for \sigma_{i}^{2}, \sigma_{i}, \sigma_{ij} and \rho_{ij}.
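The accuracy of the CLT-based approximation for the sampling variability of $\hat{\sigma}_{i}$ can be checked by simulation; the parameter values below are illustrative:

```r
# Compare the Monte Carlo standard deviation of sigmahat with the
# approximate formula sigma/sqrt(2T)
set.seed(1)
sigma = 0.10; T = 60; nSim = 10000
sigmahat = replicate(nSim, sd(rnorm(T, mean = 0, sd = sigma)))
sd(sigmahat)     # Monte Carlo estimate of se(sigmahat)
sigma/sqrt(2*T)  # CLT approximation
```

The two numbers agree to roughly three decimal places even at $T=60$, which is why the approximate formulas are useful in practice.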

7.5.3.4 Approximate Confidence Intervals for \sigma_{i}^{2},\sigma_{i}, \sigma_{ij}, and \rho_{ij}

Approximate 95% confidence intervals for \sigma_{i}^{2}, \sigma_{i}, \sigma_{ij}, and \rho_{ij} are given by: \begin{align} \hat{\sigma}_{i}^{2}\pm2\cdot\widehat{\mathrm{se}}(\hat{\sigma}_{i}^{2}) & =\hat{\sigma}_{i}^{2}\pm2\cdot\frac{\hat{\sigma}_{i}^{2}}{\sqrt{T/2}},\tag{7.37}\\ \hat{\sigma}_{i}\pm2\cdot\widehat{\mathrm{se}}(\hat{\sigma}_{i}) & =\hat{\sigma}_{i}\pm2\cdot\frac{\hat{\sigma}_{i}}{\sqrt{2T}},\tag{7.38}\\ \hat{\sigma}_{ij}\pm2\cdot\widehat{\mathrm{se}}(\hat{\sigma}_{ij}) & = \hat{\sigma}_{ij}\pm2\cdot\frac{\sqrt{\hat{\sigma}_i^2 \hat{\sigma}_j^2(1+\hat{\rho}_{ij}^2)}}{\sqrt{T}},\tag{7.39}\\ \hat{\rho}_{ij}\pm2\cdot\widehat{\mathrm{se}}(\hat{\rho}_{ij}) & =\hat{\rho}_{ij}\pm2\cdot\frac{(1-\hat{\rho}_{ij}^{2})}{\sqrt{T}}.\tag{7.40} \end{align}

Example 2.23 (Approximate 95% confidence intervals for \sigma_{i}^{2},\sigma_{i}, \sigma_{ij}, and \rho_{ij} for Microsoft, Starbucks and the S&P 500)

Using (7.37) - (7.38), the approximate 95% confidence intervals for \sigma_{i}^{2} and \sigma_{i} (i= Microsoft, Starbucks, S&P 500) are:

# 95% confidence interval for variance
lowerSigma2 = sigma2hat - 2*seSigma2hat
upperSigma2 = sigma2hat + 2*seSigma2hat
widthSigma2 = upperSigma2 - lowerSigma2
ans = cbind(lowerSigma2, upperSigma2, widthSigma2)
colnames(ans) = c("2.5%", "97.5%", "Width")
ans
##          2.5%   97.5%   Width
## MSFT  0.00788 0.01221 0.00433
## SBUX  0.00978 0.01515 0.00538
## SP500 0.00184 0.00286 0.00101
# 95% confidence interval for volatility
lowerSigma = sigmahat - 2*seSigmahat
upperSigma = sigmahat + 2*seSigmahat
widthSigma = upperSigma - lowerSigma
ans = cbind(lowerSigma, upperSigma, widthSigma)
colnames(ans) = c("2.5%", "97.5%", "Width")
ans
##         2.5%  97.5%  Width
## MSFT  0.0894 0.1110 0.0216
## SBUX  0.0996 0.1237 0.0241
## SP500 0.0432 0.0537 0.0105

The 95% confidence intervals for \sigma and \sigma^{2} are larger for Microsoft and Starbucks than for the S&P 500 index. For all assets, the intervals for \sigma are fairly narrow (2% for Microsoft and Starbucks and 1% for S&P 500 index) indicating that \sigma is precisely estimated.

The approximate 95% confidence intervals for \sigma_{ij} are:

lowerCov = covhat - 2*seCovhat
upperCov = covhat + 2*seCovhat
widthCov = upperCov - lowerCov
ans = cbind(lowerCov, upperCov, widthCov)
colnames(ans) = c("2.5%", "97.5%", "Width")
ans
##               2.5%   97.5%   Width
## msft,sbux  0.00201 0.00562 0.00361
## msft,sp500 0.00213 0.00387 0.00174
## sbux,sp500 0.00157 0.00338 0.00181

The approximate 95% confidence intervals for \rho_{ij} are:

lowerRho = rhohat - 2*seRhohat
upperRho = rhohat + 2*seRhohat
widthRho = upperRho - lowerRho
ans = cbind(lowerRho, upperRho, widthRho)
colnames(ans) = c("2.5%", "97.5%", "Width")
ans
##             2.5% 97.5% Width
## msft,sbux  0.206 0.476 0.270
## msft,sp500 0.523 0.712 0.189
## sbux,sp500 0.337 0.578 0.241

The 95% confidence intervals for \sigma_{ij} and \rho_{ij} are not too wide and all contain just positive values away from zero. The smallest interval is for \rho_{\textrm{msft,sp500}} because \hat{\rho}_{\textrm{msft,sp500}} is closest to 1.

\blacksquare


  1. The matrix sample statistics (7.16) and (7.17) are unbiased estimators of \mu and \Sigma, respectively.↩︎

  2. Horseshoes is a game commonly played at county fairs. See (http://en.wikipedia.org/wiki/Horseshoes) for a complete description of the game.↩︎

  3. For example, the exact sampling distribution of (T-1)\hat{\sigma}_{i}^{2}/\sigma_{i}^{2} is chi-square with T-1 degrees of freedom.↩︎