12.6 Computational Problems with Very Large Portfolios

In principle, mean-variance portfolio analysis can be applied when there is a very large number of risky assets (e.g., \(N=5,000\)). In practice, however, several problems can arise. First, computing efficient portfolios requires inverting the \(N\times N\) asset return covariance matrix \(\Sigma\), and when \(N\) is very large this inversion can be computationally burdensome. Second, applying the theory requires estimating \(\Sigma\). Recall that \(\Sigma\) contains \(N\) variance terms and \(N(N-1)/2\) unique covariance terms, for a total of \(N(N+1)/2\) unique elements. When \(N=5,000\), there are \(12,502,500\) unique elements of \(\Sigma\) to estimate. Because each estimated element has estimation error, the estimate of \(\Sigma\) as a whole contains a tremendous amount of estimation error.

There is an additional problem with estimating \(\Sigma\) using the sample covariance matrix of asset returns when \(N\) is very large. If the number of assets, \(N\), is greater than the number of sample observations, \(T\), then the \(N\times N\) sample covariance matrix: \[\begin{eqnarray*} \hat{\Sigma} & = & \frac{1}{T-1}\sum_{t=1}^{T}(\mathbf{R}_{t}-\hat{\mu})(\mathbf{R}_{t}-\hat{\mu})^{\prime},\\ \hat{\mu} & = & \frac{1}{T}\sum_{t=1}^{T}\mathbf{R}_{t}, \end{eqnarray*}\] is only positive semi-definite and has rank less than \(N\). (Because the \(T\) demeaned return vectors \(\mathbf{R}_{t}-\hat{\mu}\) sum to zero, the rank of \(\hat{\Sigma}\) is at most \(T-1\), so the same problem arises when \(N=T\).) This means that \(\hat{\Sigma}\) is not invertible and so mean-variance efficient portfolios cannot be uniquely computed.

This problem can arise often. For example, suppose \(N=5,000\). For the sample covariance matrix to be full rank, you need more than \(T=5,000\) sample observations. For daily data, this means you would need more than \(5,000/250=20\) years of daily data.83 For weekly data, you would need more than \(5,000/52=96.2\) years of weekly data. For monthly data, you would need more than \(5,000/12=417\) years of monthly data.
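The counts quoted above can be reproduced with a few lines of base R. This is a minimal sketch: the object names are illustrative and not from the text, and 250, 52, and 12 are the approximate numbers of daily, weekly, and monthly observations per year assumed in the calculations above.

```r
# Number of risky assets (the value used in the text)
N <- 5000

# Unique elements of an N x N covariance matrix:
# N variances plus N(N-1)/2 distinct covariances
n_unique <- N + N * (N - 1) / 2
n_unique

# Approximate number of observations per year at each sampling frequency
obs_per_year <- c(daily = 250, weekly = 52, monthly = 12)

# Years of data needed to collect T = N observations
years_needed <- N / obs_per_year
round(years_needed, 1)
```

The sketch only restates the arithmetic; it says nothing about whether return data of that length would be available or stationary over such a span.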

Example 2.31 (Singular sample return covariance matrix)

To illustrate the rank failure of \(\hat{\Sigma}\) that occurs when the number of assets \(N\) is greater than the number of data observations \(T\), consider computing \(\hat{\Sigma}\) for the six Vanguard mutual funds in the IntroCompFinR data object VanguardPrices using only five monthly observations:

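library(IntroCompFinR)
library(PerformanceAnalytics)  # Return.calculate(); loads xts/zoo, which supply index()
data(VanguardPrices)
colnames(VanguardPrices)
## [1] "vfinx" "veurx" "veiex" "vbltx" "vbisx" "vpacx"
range(index(VanguardPrices))
## [1] "Jan 1995" "Dec 2014"
# simple monthly returns; the leading NA row is dropped
VanguardRetS = na.omit(Return.calculate(VanguardPrices,
                                        method="simple"))
# sample covariance from only the first five monthly returns (T = 5 < N = 6)
covhat = cov(VanguardRetS[1:5, ])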

A quick way to determine if \(\hat{\Sigma}\) is full rank (and invertible) is to compute the Cholesky decomposition \(\hat{\Sigma}=\hat{\mathbf{C}}\hat{\mathbf{C}}^{\prime}\), where \(\hat{\mathbf{C}}\) is a lower triangular matrix with non-negative diagonal elements. If all of the diagonal elements of \(\hat{\mathbf{C}}\) are positive then \(\hat{\Sigma}\) is positive definite, full rank, and invertible. In R, the function chol() computes this decomposition (it returns the upper triangular factor \(\hat{\mathbf{C}}^{\prime}\)):

# chol(covhat) # uncomment this one to see the result

Here, chol() returns an error indicating that \(\hat{\Sigma}\) is not positive definite and therefore has less than full rank. If we try to invert \(\hat{\Sigma}\) using solve(), we will also get an error indicating that \(\hat{\Sigma}\) is not invertible:84

# solve(covhat) # uncomment this one to see the result
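The same rank failure can be reproduced without the Vanguard data. The sketch below is an illustration only: it uses simulated returns for six hypothetical assets observed over five periods (the seed, the 5% standard deviation, and all object and function names are assumptions, not from the text). It counts the eigenvalues of the sample covariance matrix that are numerically positive, and wraps chol() and solve() in tryCatch() so that the script keeps running if they fail.

```r
set.seed(123)                      # arbitrary seed, for reproducibility only
n_assets <- 6
n_obs <- 5

# Simulated "returns": n_obs rows (periods) by n_assets columns (assets)
sim_ret <- matrix(rnorm(n_obs * n_assets, mean = 0, sd = 0.05),
                  nrow = n_obs, ncol = n_assets)
sim_cov <- cov(sim_ret)

# Numerical rank: count eigenvalues that are not negligible
# relative to the largest eigenvalue
ev <- eigen(sim_cov, symmetric = TRUE, only.values = TRUE)$values
num_rank <- sum(ev > 1e-10 * max(ev))
num_rank                           # at most n_obs - 1, since cov() demeans the data

# Helper (hypothetical name): TRUE if f(x) runs without raising an error
runs_ok <- function(f, x) {
  tryCatch({ f(x); TRUE }, error = function(e) FALSE)
}
chol_ok  <- runs_ok(chol, sim_cov)
solve_ok <- runs_ok(solve, sim_cov)

# Contrast: with many more observations than assets the estimate is full rank
big_cov  <- cov(matrix(rnorm(100 * n_assets, sd = 0.05), nrow = 100))
big_chol <- t(chol(big_cov))       # chol() returns the upper factor; transpose for lower
all(diag(big_chol) > 0)            # positive diagonal implies positive definite
```

Because the near-zero eigenvalues are subject to rounding error, counting eigenvalues above a tolerance is the more direct diagnostic here; chol_ok and solve_ok simply record whether each call raised an error on the simulated matrix.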

\(\blacksquare\)

Because of these practical problems with using the sample covariance matrix \(\hat{\Sigma}\) to compute mean-variance efficient portfolios, alternative methods for estimating \(\Sigma\) are needed when \(N\) is large. One such method, based on the Single Index Model for returns, is presented in Chapter 16.


  83. Recall, there are approximately 250 trading days per year.

  84. The function rankMatrix() from the Matrix package can be used to compute the (numerical) rank of a matrix.