12.6 Computational Problems with Very Large Portfolios

In principle, mean-variance portfolio analysis can be applied when the number of risky assets is very large (e.g., $N=5,000$). In practice, however, several problems can arise. First, computing efficient portfolios requires inverting the $N \times N$ asset return covariance matrix $\Sigma$, and when $N$ is very large this inversion can be computationally burdensome. Second, applying the theory requires an estimate of $\Sigma$. Recall that $\Sigma$ contains $N$ variance terms and $N(N-1)/2$ unique covariance terms. When $N=5,000$, there are $5,000 + 12,497,500 = 12,502,500$ unique elements of $\Sigma$ to estimate. Because each estimated element carries estimation error, the estimate of $\Sigma$ as a whole contains a tremendous amount of estimation error.

There is an additional problem when $\Sigma$ is estimated with the sample covariance matrix of asset returns and $N$ is very large. If the number of assets, $N$, is greater than the number of sample observations, $T$, then the $N \times N$ sample covariance matrix

$$\hat{\Sigma} = \frac{1}{T-1}\sum_{t=1}^{T}(R_t - \hat{\mu})(R_t - \hat{\mu})', \quad \hat{\mu} = \frac{1}{T}\sum_{t=1}^{T} R_t,$$

is only positive semi-definite and has rank less than $N$. This means that $\hat{\Sigma}$ is not invertible, and so mean-variance efficient portfolios cannot be uniquely computed.

This problem arises often. For example, suppose $N=5,000$. For the sample covariance matrix to be full rank, you need roughly $T=5,000$ sample observations (strictly, $T \geq N+1 = 5,001$, because estimating $\hat{\mu}$ uses up one degree of freedom). For daily data, this means you would need about $5,000/250 = 20$ years of daily data.⁸³ For weekly data, you would need $5,000/52 \approx 96.2$ years of weekly data. For monthly data, you would need $5,000/12 \approx 417$ years of monthly data.
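The counts and data requirements quoted above are easy to reproduce, and a small simulation makes the rank deficiency concrete. The following is a minimal sketch in base R. The simulated returns, the dimensions `N_sim` and `T_sim`, and all variable names are illustrative assumptions and are not part of the original example.

```r
# Number of unique elements of an N x N covariance matrix
N <- 5000
n_var <- N                    # variance terms
n_cov <- N * (N - 1) / 2      # unique covariance terms
n_unique <- n_var + n_cov     # 12,502,500, equivalently N * (N + 1) / 2

# Approximate years of data needed to collect T = N observations
obs_per_year <- c(daily = 250, weekly = 52, monthly = 12)
years_needed <- N / obs_per_year   # about 20, 96.2 and 417 years

# Simulated illustration (hypothetical data): more assets than observations
set.seed(1)
N_sim <- 10
T_sim <- 6
R_sim <- matrix(rnorm(T_sim * N_sim), nrow = T_sim, ncol = N_sim)
Sigma_hat <- cov(R_sim)       # cov() uses the 1/(T-1) divisor

# Numerical rank from the QR decomposition; it cannot exceed T_sim - 1
rank_hat <- qr(Sigma_hat)$rank
```

Here `rank_hat` comes out below `N_sim`, so `Sigma_hat` is singular even though every individual variance and covariance can be computed.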

Example 2.31 (Singular sample return covariance matrix)

To illustrate the rank failure of $\hat{\Sigma}$ that occurs when the number of assets $N$ is greater than the number of data observations $T$, consider computing $\hat{\Sigma}$ for the six Vanguard mutual funds in the IntroCompFinR data object VanguardPrices using only five monthly observations:

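library(IntroCompFinR)
library(PerformanceAnalytics)  # Return.calculate(); also attaches xts/zoo for index()
data(VanguardPrices)
colnames(VanguardPrices)
## [1] "vfinx" "veurx" "veiex" "vbltx" "vbisx" "vpacx"
range(index(VanguardPrices))
## [1] "Jan 1995" "Dec 2014"
# simple monthly returns; na.omit() drops the first row, which has no prior price
VanguardRetS = na.omit(Return.calculate(VanguardPrices,
                                        method="simple"))
# sample covariance of N = 6 funds estimated from only T = 5 observations
covhat = cov(VanguardRetS[1:5, ])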

A quick way to determine whether $\hat{\Sigma}$ is full rank (and hence invertible) is to compute the Cholesky decomposition $\hat{\Sigma}=\hat{C}\hat{C}'$, where $\hat{C}$ is a lower triangular matrix with non-negative diagonal elements. If all of the diagonal elements of $\hat{C}$ are positive, then $\hat{\Sigma}$ is positive definite, full rank, and invertible. In R, the decomposition is computed with the function chol(), which returns the upper triangular factor $\hat{C}'$:

# chol(covhat) # uncomment this one to see the result

Here, chol() returns an error indicating that $\hat{\Sigma}$ is not positive definite and therefore has less than full rank. Trying to invert $\hat{\Sigma}$ with solve() also produces an error, indicating that $\hat{\Sigma}$ is not invertible.⁸⁴

# solve(covhat) # uncomment this one to see the result
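Because chol() and solve() can stop with an error on a singular matrix, it may be convenient to wrap the checks so that a script keeps running. The sketch below uses only base R on simulated returns that stand in for VanguardRetS[1:5, ]. The helper name `check_cov()`, its `tol` argument, and the simulated data are assumptions made for illustration and are not part of IntroCompFinR.

```r
# Hypothetical helper: summarize whether a covariance matrix estimate
# is usable for mean-variance calculations.
check_cov <- function(S, tol = 1e-8) {
  ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
  # numerical rank: count eigenvalues above a relative tolerance
  num_rank <- sum(ev > tol * max(ev))
  # try() captures an error instead of stopping the script
  chol_ok  <- !inherits(try(chol(S), silent = TRUE), "try-error")
  solve_ok <- !inherits(try(solve(S), silent = TRUE), "try-error")
  list(rank = num_rank,
       full_rank = (num_rank == ncol(S)),
       min_eigenvalue = min(ev),
       chol_ok = chol_ok,
       solve_ok = solve_ok)
}

# Simulated stand-in data: T = 5 observations on N = 6 assets
set.seed(123)
T_obs <- 5
N <- 6
R_sim <- matrix(rnorm(T_obs * N, mean = 0.01, sd = 0.05),
                nrow = T_obs, ncol = N)
S_sim <- cov(R_sim)

res <- check_cov(S_sim)
str(res)
```

With five observations on six assets the numerical rank is at most four, so `full_rank` is FALSE. Whether chol() or solve() actually raises an error can depend on rounding in the near-zero pivots, which is why the eigenvalue-based rank is reported alongside them.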

Because of these practical problems with using the sample covariance matrix $\hat{\Sigma}$ to compute mean-variance efficient portfolios when $N$ is large, alternative methods for estimating $\Sigma$ are needed. One such method, based on the Single Index Model for returns, is presented in Chapter 16.
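As a rough preview only, the sketch below suggests why imposing single-index structure helps: the implied covariance matrix is the sum of a rank-one term and a diagonal matrix with positive entries, so it is positive definite even when $N$ exceeds $T$. The simulated market return, the betas, the degrees-of-freedom choice, and every variable name here are assumptions for illustration and are not taken from the text; the treatment in Chapter 16 may differ in its details.

```r
set.seed(42)
T_obs <- 5
N <- 6

# Hypothetical data: one market factor plus asset-specific noise
R_m <- rnorm(T_obs, mean = 0.01, sd = 0.04)
beta_true <- runif(N, 0.5, 1.5)
R <- outer(R_m, beta_true) + matrix(rnorm(T_obs * N, sd = 0.02), T_obs, N)

# Least squares estimates of each asset's beta, alpha and residual variance
beta_hat <- as.numeric(cov(R, R_m) / var(R_m))
alpha_hat <- colMeans(R) - beta_hat * mean(R_m)
resid <- R - matrix(alpha_hat, T_obs, N, byrow = TRUE) - outer(R_m, beta_hat)
sig2_eps <- colSums(resid^2) / (T_obs - 2)

# Single-index style covariance: rank-one market part plus diagonal part
Sigma_si <- var(R_m) * tcrossprod(beta_hat) + diag(sig2_eps)

# The Cholesky factorization succeeds here, so the matrix can be inverted
U <- chol(Sigma_si)
Sigma_si_inv <- chol2inv(U)
```

The structured estimate needs only $N$ betas, $N$ residual variances, and one market variance, rather than $N(N+1)/2$ free elements.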


  83. Recall that there are approximately 250 trading days per year.

  84. The function rankMatrix() from the Matrix package can be used to compute the rank of a matrix.