## 12.6 Computational Problems with Very Large Portfolios

In principle, mean-variance portfolio analysis can be applied when there is a very large number of risky assets (e.g., $$N=5,000$$). In practice, several problems can arise. First, computing efficient portfolios requires inverting the $$N\times N$$ asset return covariance matrix $$\Sigma$$, and when $$N$$ is very large this inversion can be computationally burdensome. Second, applying the theory requires estimating $$\Sigma$$. Recall that $$\Sigma$$ contains $$N$$ variance terms and $$N(N-1)/2$$ unique covariance terms, for a total of $$N(N+1)/2$$ unique elements. When $$N=5,000$$, there are $$12,502,500$$ unique elements of $$\Sigma$$ to estimate. Because each estimated element has estimation error, the estimate of $$\Sigma$$ as a whole contains a tremendous amount of estimation error.

There is an additional problem with estimating $$\Sigma$$ by the sample covariance matrix of asset returns when $$N$$ is very large. If the number of assets, $$N$$, is greater than the number of sample observations, $$T$$, then the $$N\times N$$ sample covariance matrix

$\begin{eqnarray*} \hat{\Sigma} & = & \frac{1}{T-1}\sum_{t=1}^{T}(\mathbf{R}_{t}-\hat{\mu})(\mathbf{R}_{t}-\hat{\mu})^{\prime},\\ \hat{\mu} & = & \frac{1}{T}\sum_{t=1}^{T}\mathbf{R}_{t}, \end{eqnarray*}$

is only positive semi-definite and has rank less than $$N$$. This means that $$\hat{\Sigma}$$ is not invertible, so mean-variance efficient portfolios cannot be uniquely computed.

This problem arises often. For example, suppose $$N=5,000$$. Because subtracting $$\hat{\mu}$$ uses up one degree of freedom, the rank of $$\hat{\Sigma}$$ is at most $$T-1$$, so for the sample covariance matrix to be full rank you need more than $$T=5,000$$ sample observations. For daily data, this means you would need roughly $$5,000/250=20$$ years of daily data (see note 1). For weekly data, you would need roughly $$5,000/52\approx96.2$$ years of weekly data. For monthly data, you would need roughly $$5,000/12\approx417$$ years of monthly data.
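The counts above are easy to verify numerically. The following is a minimal sketch in base R; the variable names (`n_unique`, `obs_per_year`, `years_needed`) are illustrative and not from the text, and the observations-per-year figures are the approximate values used in the paragraph above.

```r
# Number of unique elements of an N x N covariance matrix
N <- 5000
n_var <- N                    # variance terms
n_cov <- N * (N - 1) / 2      # unique covariance terms
n_unique <- n_var + n_cov     # same as N * (N + 1) / 2

# Approximate observations per year (assumed: 250 trading days,
# 52 weeks, 12 months)
obs_per_year <- c(daily = 250, weekly = 52, monthly = 12)

# Approximate years of data needed to collect N observations
years_needed <- N / obs_per_year

format(n_unique, big.mark = ",")
round(years_needed, 1)
```

The first expression reproduces the 12,502,500 figure, and `years_needed` gives the approximate sample lengths quoted for daily, weekly, and monthly data.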

Example 2.31 (Singular sample return covariance matrix)

To illustrate the rank failure of $$\hat{\Sigma}$$ that occurs when the number of assets $$N$$ is greater than the number of data observations $$T$$, consider computing $$\hat{\Sigma}$$ for the six Vanguard mutual funds in the IntroCompFinR data object VanguardPrices using only five monthly observations:

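library(IntroCompFinR)
library(PerformanceAnalytics)  # Return.calculate(); also attaches xts/zoo for index()
data(VanguardPrices)
colnames(VanguardPrices)
##  "vfinx" "veurx" "veiex" "vbltx" "vbisx" "vpacx"
range(index(VanguardPrices))
##  "Jan 1995" "Dec 2014"
# simple monthly returns; na.omit() drops the leading NA row
VanguardRetS = na.omit(Return.calculate(VanguardPrices, method="simple"))
# sample covariance matrix from only the first five observations (T = 5 < N = 6)
covhat = cov(VanguardRetS[1:5, ])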

A quick way to determine whether $$\hat{\Sigma}$$ is full rank (and invertible) is to compute the Cholesky decomposition $$\hat{\Sigma}=\hat{\mathbf{C}}\hat{\mathbf{C}}^{\prime}$$, where $$\hat{\mathbf{C}}$$ is a lower triangular matrix with non-negative diagonal elements. If all of the diagonal elements of $$\hat{\mathbf{C}}$$ are positive, then $$\hat{\Sigma}$$ is positive definite, full rank, and invertible. In R, the Cholesky factor is computed with the function chol(), which returns the upper triangular factor $$\hat{\mathbf{C}}^{\prime}$$:

# chol(covhat) # uncomment this one to see the result

Here, chol() returns an error indicating that $$\hat{\Sigma}$$ is not positive definite and is therefore less than full rank. If we try to invert $$\hat{\Sigma}$$ using solve(), we also get an error indicating that $$\hat{\Sigma}$$ is not invertible (see note 2):

# solve(covhat) # uncomment this one to see the result

$$\blacksquare$$
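The rank deficiency in the example can also be checked without the Vanguard data. The sketch below uses simulated returns as a stand-in (an assumption made only for illustration) and the base R function qr() to obtain a numerical rank. The helper name `sample_cov_rank()` is hypothetical and not part of any package.

```r
set.seed(123)

# Hypothetical helper: numerical rank of the sample covariance matrix
# computed from T_obs simulated observations on N assets
sample_cov_rank <- function(T_obs, N) {
  # simulated returns; a stand-in for real return data
  R <- matrix(rnorm(T_obs * N, mean = 0.01, sd = 0.05),
              nrow = T_obs, ncol = N)
  qr(cov(R))$rank
}

N <- 6
rank_short <- sample_cov_rank(T_obs = 5, N = N)   # fewer observations than assets
rank_long  <- sample_cov_rank(T_obs = 60, N = N)  # many more observations than assets

rank_short  # at most T_obs - 1, so less than N
rank_long   # full rank N
```

With five observations the rank cannot exceed four, so the $$6\times6$$ matrix is singular. With sixty observations the matrix is full rank, which is the condition under which chol() and solve() are expected to succeed.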

Because of these practical problems with using the sample covariance matrix $$\hat{\Sigma}$$ to compute mean-variance efficient portfolios, alternative methods for estimating $$\Sigma$$ are needed when $$N$$ is large. One such method, based on the Single Index Model for returns, is presented in Chapter 16.
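To give a rough sense of why a factor structure helps, the following sketch builds a covariance matrix of the single index form $$\sigma_{M}^{2}\beta\beta^{\prime}+D$$, where $$D$$ is a diagonal matrix of residual variances, from simulated data with $$N>T$$. This is only an illustration under stated assumptions and is not the treatment given in Chapter 16: the data are simulated, the betas are estimated by least squares, and all variable names (`R_m`, `beta_hat`, `cov_si`, and so on) are hypothetical.

```r
set.seed(42)
N <- 20       # number of assets
T_obs <- 10   # number of observations, fewer than N

# Simulated market and asset returns (illustrative stand-ins for real data)
R_m <- rnorm(T_obs, mean = 0.01, sd = 0.04)
beta_true <- runif(N, 0.5, 1.5)
eps <- matrix(rnorm(T_obs * N, sd = 0.03), nrow = T_obs, ncol = N)
R <- outer(R_m, beta_true) + eps

# Least-squares slope and intercept for each asset
beta_hat <- as.numeric(cov(R, R_m)) / var(R_m)
alpha_hat <- colMeans(R) - beta_hat * mean(R_m)

# Residual variances from the fitted regressions
resid <- R - matrix(alpha_hat, T_obs, N, byrow = TRUE) - outer(R_m, beta_hat)
sig2_eps <- colSums(resid^2) / (T_obs - 2)

# Single index form: var(R_m) * beta beta' + diag(residual variances)
cov_si <- var(R_m) * tcrossprod(beta_hat) + diag(sig2_eps)

rank_sample <- qr(cov(R))$rank   # sample covariance: rank below N
chol_si <- chol(cov_si)          # succeeds when cov_si is positive definite
```

Here the sample covariance matrix is rank deficient because $$N>T$$, while the structured estimate is positive definite because it adds a diagonal matrix with positive entries to a positive semi-definite matrix. It also requires estimating on the order of $$2N+1$$ quantities rather than $$N(N+1)/2$$.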

1. Recall that there are approximately 250 trading days per year.

2. The function rankMatrix() in the Matrix package can be used to compute the (numerical) rank of a matrix.