12.6 Computational Problems with Very Large Portfolios

In principle, mean-variance portfolio analysis can be applied when there is a very large number of risky assets (e.g., $N=5,000$). In practice, however, several problems can arise. First, computing efficient portfolios requires inverting the $N\times N$ asset return covariance matrix $\Sigma$. When $N$ is very large, inverting $\Sigma$ can be computationally burdensome. Second, applying the theory requires estimating $\Sigma$. Recall that $\Sigma$ contains $N$ variance terms and $N(N-1)/2$ unique covariance terms, for a total of $N(N+1)/2$ unique elements. When $N=5,000$, there are $12,502,500$ unique elements of $\Sigma$ to estimate. Because each estimated element of $\Sigma$ carries estimation error, the estimate of $\Sigma$ as a whole can contain a tremendous amount of estimation error.

There is an additional problem with estimating $\Sigma$ using the sample covariance matrix of asset returns when $N$ is very large. If the number of assets, $N$, is greater than the number of sample observations, $T$, then the $N\times N$ sample covariance matrix:

$\begin{eqnarray*} \hat{\Sigma} & = & \frac{1}{T-1}\sum_{t=1}^{T}(\mathbf{R}_{t}-\hat{\mu})(\mathbf{R}_{t}-\hat{\mu})^{\prime},\\ \hat{\mu} & = & \frac{1}{T}\sum_{t=1}^{T}\mathbf{R}_{t}, \end{eqnarray*}$

is only positive semi-definite and has rank less than $N$. In fact, because the sample mean $\hat{\mu}$ is subtracted from each observation, the rank of $\hat{\Sigma}$ is at most $T-1$. This means that $\hat{\Sigma}$ is not invertible, and so mean-variance efficient portfolios cannot be uniquely computed.

This problem can arise often. For example, suppose $N=5,000$. For the sample covariance matrix to be full rank, you need more than $N=5,000$ sample observations. For daily data, this means you would need about $5,000/250=20$ years of daily data.⁸³ For weekly data, you would need about $5,000/52=96.2$ years of weekly data. For monthly data, you would need about $5,000/12=417$ years of monthly data.
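The counts and data requirements quoted above can be reproduced with a few lines of R. This is only a minimal sketch: the variable names are illustrative, and the periods-per-year values (250, 52, and 12) are the approximations used in the text.

```r
# Number of risky assets from the example in the text
N <- 5000

# Unique elements of Sigma: N variances plus N(N-1)/2 covariances
n_unique <- N + N * (N - 1) / 2
n_unique

# Approximate years of data needed to collect roughly N observations
# (assumed periods per year: 250 trading days, 52 weeks, 12 months)
periods_per_year <- c(daily = 250, weekly = 52, monthly = 12)
years_needed <- N / periods_per_year
round(years_needed, 1)
```

The count grows with the square of $N$, whereas the number of available observations grows only linearly with the length of the sample, which is why the data requirement becomes unrealistic so quickly.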

Example 2.31 (Singular sample return covariance matrix)

To illustrate the rank failure of $\hat{\Sigma}$ that occurs when the number of assets $N$ is greater than the number of data observations $T$ , consider computing $\hat{\Sigma}$ for the six Vanguard mutual funds in the IntroCompFinR data object VanguardPrices using only five monthly observations:

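library(IntroCompFinR)
library(PerformanceAnalytics)  # provides Return.calculate()
library(xts)                   # attaches zoo, which provides index()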
data(VanguardPrices)
colnames(VanguardPrices)

## [1] "vfinx" "veurx" "veiex" "vbltx" "vbisx" "vpacx"

range(index(VanguardPrices))

## [1] "Jan 1995" "Dec 2014"

VanguardRetS = na.omit(Return.calculate(VanguardPrices, 
                                        method="simple")) 
# Use only the first five monthly returns, so T = 5 < N = 6
covhat = cov(VanguardRetS[1:5, ])

A quick way to determine if $\hat{\Sigma}$ is full rank (and invertible) is to compute the Cholesky decomposition $\hat{\Sigma}=\hat{\mathbf{C}}\hat{\mathbf{C}}^{\prime}$, where $\hat{\mathbf{C}}$ is a lower triangular matrix with non-negative diagonal elements. If all of the diagonal elements of $\hat{\mathbf{C}}$ are positive, then $\hat{\Sigma}$ is positive definite, full rank, and invertible. In R, we compute the decomposition using the function chol(), which returns the upper triangular factor $\hat{\mathbf{C}}^{\prime}$ rather than $\hat{\mathbf{C}}$:

# chol(covhat) # uncomment this one to see the result

Here, chol() returns an error indicating that $\hat{\Sigma}$ is not positive definite and therefore has less than full rank. If we try to invert $\hat{\Sigma}$ using solve(), we also get an error, indicating that $\hat{\Sigma}$ is not invertible:⁸⁴

# solve(covhat) # uncomment this one to see the result
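The rank failure in this example can also be checked without the Vanguard data. The sketch below uses simulated returns (independent normal draws, which are an assumption made purely for illustration) for $N=6$ assets, once with $T=5$ observations and once with $T=60$. Instead of relying on an error from chol(), it reports the numerical rank from the base R function qr() and the ratio of the smallest to the largest eigenvalue. The helper eig_ratio() and all variable names are hypothetical.

```r
set.seed(123)  # arbitrary seed, for reproducibility only
N <- 6         # number of assets, as in the example

# Hypothetical returns: independent normal draws, NOT the Vanguard data
ret_short <- matrix(rnorm(5 * N, sd = 0.05), nrow = 5)    # T = 5  < N
ret_long  <- matrix(rnorm(60 * N, sd = 0.05), nrow = 60)  # T = 60 > N

cov_short <- cov(ret_short)
cov_long  <- cov(ret_long)

# Numerical rank from the QR decomposition (base R)
rank_short <- qr(cov_short)$rank  # cannot exceed T - 1 = 4
rank_long  <- qr(cov_long)$rank   # full rank N = 6 is expected here

# Ratio of smallest to largest eigenvalue: near zero signals singularity
eig_ratio <- function(S) {
  ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
  min(ev) / max(ev)
}
ratio_short <- eig_ratio(cov_short)
ratio_long  <- eig_ratio(cov_long)

# For the full-rank case, recover the lower triangular Cholesky factor;
# chol() returns the upper triangular factor, so transpose it
C_long <- t(chol(cov_long))
all(diag(C_long) > 0)                      # positive diagonal elements
max(abs(C_long %*% t(C_long) - cov_long))  # reconstruction error

c(rank_short = rank_short, rank_long = rank_long)
c(ratio_short = ratio_short, ratio_long = ratio_long)
```

A tolerance-based check like the eigenvalue ratio is used here because, in floating-point arithmetic, a singular matrix typically shows a smallest eigenvalue that is tiny rather than exactly zero.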

$\blacksquare$

Due to these practical problems of using the sample covariance matrix $\hat{\Sigma}$ to compute mean-variance efficient portfolios when $N$ is large, there is a need for alternative methods for estimating $\Sigma$ when $N$ is large. One such method based on the Single Index Model for returns is presented in Chapter 16.

⁸³ Recall that there are approximately 250 trading days per year.
⁸⁴ The function rankMatrix() from the Matrix package can be used to compute the numerical rank of a matrix.