12.6 Computational Problems with Very Large Portfolios
In principle, mean-variance portfolio analysis can be applied in situations in which there is a very large number of risky assets (e.g., \(N=5,000\)). However, a number of practical problems can arise. First, the computation of efficient portfolios requires inverting the \(N\times N\) asset return covariance matrix \(\Sigma\). When \(N\) is very large, inverting \(\Sigma\) can be computationally burdensome.
Second, the practical application of the theory requires the estimation of \(\Sigma\). Recall that there are \(N\) variance terms and \(N(N-1)/2\) unique covariance terms in \(\Sigma\), for a total of \(N(N+1)/2\) unique elements. When \(N=5,000\), there are \(12,502,500\) unique elements of \(\Sigma\) to estimate. Since each estimated element of \(\Sigma\) has estimation error, the estimate of \(\Sigma\) as a whole contains a tremendous amount of estimation error.
There is an additional problem with estimating \(\Sigma\) by the sample covariance matrix of asset returns when \(N\) is very large. The \(N\times N\) sample covariance matrix
\[\begin{eqnarray*} \hat{\Sigma} & = & \frac{1}{T-1}\sum_{t=1}^{T}(\mathbf{R}_{t}-\hat{\mu})(\mathbf{R}_{t}-\hat{\mu})^{\prime},\\ \hat{\mu} & = & \frac{1}{T}\sum_{t=1}^{T}\mathbf{R}_{t}, \end{eqnarray*}\]
is always positive semi-definite, but it is a sum of \(T\) rank-one matrices built from the deviations \(\mathbf{R}_{t}-\hat{\mu}\), and these deviations sum to zero, so the rank of \(\hat{\Sigma}\) is at most \(T-1\). Hence, if the number of assets \(N\) is greater than or equal to the number of sample observations \(T\), then \(\hat{\Sigma}\) is less than full rank \(N\). This means that \(\hat{\Sigma}\) is not invertible, and so mean-variance efficient portfolios cannot be uniquely computed.
This problem arises easily in practice. For example, suppose \(N=5,000\). For the sample covariance matrix to be full rank, you need at least \(T=N+1=5,001\) sample observations. With daily data (about 250 trading days per year), this means you would need roughly \(5,001/250\approx 20\) years of data. With weekly data, you would need \(5,001/52\approx 96.2\) years of data, and with monthly data, \(5,001/12\approx 417\) years of data.
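The element count and data requirements in the preceding discussion are simple arithmetic, sketched below in base R. The helper functions n_unique_elements() and years_needed() are hypothetical names introduced only for illustration. The sketch assumes about 250 trading days, 52 weeks, and 12 months per year, and uses the requirement \(T\geq N+1\), which follows from the fact that the sample covariance matrix has rank at most \(T-1\).

```r
# Number of unique elements of an N x N covariance matrix:
# N variances plus N(N-1)/2 distinct covariances, i.e. N(N+1)/2
n_unique_elements = function(N) N + N * (N - 1) / 2

# Years of data needed for a full-rank sample covariance matrix.
# Assumes rank(Sigma-hat) <= T - 1, so at least T = N + 1 observations.
years_needed = function(N, periods_per_year) (N + 1) / periods_per_year

N = 5000
n_unique_elements(N)   # equals 12,502,500

# Approximate number of return periods per year (assumed values)
freq = c(daily = 250, weekly = 52, monthly = 12)
years_needed(N, freq)
```

The number of required years scales linearly with \(N\), so doubling the number of assets doubles the length of history needed at any sampling frequency.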
To illustrate the rank failure of \(\hat{\Sigma}\) that occurs when the number of assets \(N\) is greater than the number of data observations \(T\), consider computing \(\hat{\Sigma}\) for the six Vanguard mutual funds in the IntroCompFinR data object VanguardPrices using only five monthly observations:
## [1] "vfinx" "veurx" "veiex" "vbltx" "vbisx" "vpacx"
## [1] "Jan 1995" "Dec 2014"
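library(IntroCompFinR)         # provides the VanguardPrices data
library(PerformanceAnalytics)  # provides Return.calculate()
data(VanguardPrices)
VanguardRetS = na.omit(Return.calculate(VanguardPrices,
                                        method="simple"))
covhat = cov(VanguardRetS[1:5, ])  # T = 5 observations on N = 6 funds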
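The rank failure does not depend on the particular funds, so it can be reproduced without IntroCompFinR. The sketch below simulates a hypothetical \(T=5\) by \(N=6\) matrix of monthly returns (the normal distribution, its parameters, and the seed are arbitrary assumptions, not the Vanguard data) and checks that the rank of \(\hat{\Sigma}\) is \(T-1=4\), two short of full rank.

```r
set.seed(123)   # arbitrary seed for reproducibility
n_obs = 5       # T: number of monthly observations
n_assets = 6    # N: number of assets

# Hypothetical simulated returns (not the VanguardPrices data)
ret_sim = matrix(rnorm(n_obs * n_assets, mean = 0.01, sd = 0.05),
                 nrow = n_obs, ncol = n_assets)
covhat_sim = cov(ret_sim)   # 6 x 6 sample covariance matrix

# Numerical rank from the QR decomposition: T - 1 = 4, not N = 6
qr(covhat_sim)$rank

# Eigenvalues: T - 1 = 4 are clearly positive; the remaining
# N - (T - 1) = 2 are zero up to floating-point rounding
eigen(covhat_sim, symmetric = TRUE, only.values = TRUE)$values
```

Counting eigenvalues that are non-negligible relative to the largest one is a more informative diagnostic than a bare error message, because it reports how far from full rank the matrix is.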
A quick way to determine whether \(\hat{\Sigma}\) is full rank (and invertible) is to compute the Cholesky decomposition \(\hat{\Sigma}=\hat{\mathbf{C}}\hat{\mathbf{C}}^{\prime}\), where \(\hat{\mathbf{C}}\) is a lower triangular matrix with non-negative diagonal elements. If all of the diagonal elements of \(\hat{\mathbf{C}}\) are positive, then \(\hat{\Sigma}\) is positive definite, full rank, and invertible. In R, the decomposition is computed with the function chol(), which returns the upper triangular factor \(\hat{\mathbf{C}}^{\prime}\); the lower triangular \(\hat{\mathbf{C}}\) is its transpose.
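The Cholesky check can be sketched in base R as follows. The helper is_pos_def() is a hypothetical name introduced here; it simply reports whether chol() succeeds. The two small matrices are illustrative assumptions: one positive definite, and one of rank one (the outer product of the vector \((2,1)^{\prime}\) with itself).

```r
# Hypothetical helper: TRUE if chol() succeeds (matrix is positive
# definite), FALSE if chol() signals an error
is_pos_def = function(S) {
  tryCatch({ chol(S); TRUE }, error = function(e) FALSE)
}

# A positive definite 2 x 2 covariance matrix (illustrative values)
S_pd = matrix(c(0.04, 0.01,
                0.01, 0.09), nrow = 2)
C = t(chol(S_pd))            # lower triangular factor: S_pd = C %*% t(C)
diag(C)                      # both diagonal elements are positive
all.equal(C %*% t(C), S_pd)  # TRUE

# A singular (rank one) matrix: b %*% t(b) with b = (2, 1)
S_sing = matrix(c(4, 2,
                  2, 1), nrow = 2)
is_pos_def(S_pd)             # TRUE
is_pos_def(S_sing)           # FALSE
```

Note the transpose: because chol() returns the upper triangular factor, t(chol(S)) is needed to match the lower triangular \(\hat{\mathbf{C}}\) used in the text.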
Applied to covhat, chol() returns an error indicating that \(\hat{\Sigma}\) is not positive definite and therefore less than full rank. Trying to invert \(\hat{\Sigma}\) with solve() also produces an error, indicating that \(\hat{\Sigma}\) is not invertible.
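In a script that may encounter a singular covariance matrix, the error from solve() can be caught instead of halting execution. The sketch below wraps solve() in tryCatch() and returns NULL on failure; the helper name try_inverse() and the two test matrices are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: return the inverse of S, or NULL (with a message)
# if solve() reports that S is singular
try_inverse = function(S) {
  tryCatch(solve(S),
           error = function(e) {
             message("solve() failed: ", conditionMessage(e))
             NULL
           })
}

S_pd = matrix(c(0.04, 0.01,
                0.01, 0.09), nrow = 2)   # positive definite
S_sing = matrix(c(4, 2,
                  2, 1), nrow = 2)       # rank one, not invertible

S_pd_inv = try_inverse(S_pd)
S_pd_inv %*% S_pd                        # identity matrix, up to rounding
try_inverse(S_sing)                      # prints a message, returns NULL
```

Returning NULL lets the calling code decide what to do next, for example switching to a different estimator of \(\Sigma\).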
\(\blacksquare\)
Because of these practical problems with using the sample covariance matrix \(\hat{\Sigma}\) to compute mean-variance efficient portfolios, alternative methods for estimating \(\Sigma\) are needed when \(N\) is large. One such method, based on the Single Index Model for returns, is presented in Chapter 16.
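To see why a model-based estimator can sidestep the rank problem, the sketch below uses the simplest form of a single index model, assuming \(R_{it}=\alpha_{i}+\beta_{i}R_{Mt}+\varepsilon_{it}\) with errors uncorrelated across assets, so that \(\Sigma=\sigma_{M}^{2}\beta\beta^{\prime}+\mathbf{D}\), where \(\mathbf{D}\) is diagonal with the error variances on its diagonal. All data are simulated (hypothetical market and asset returns, arbitrary seed), and the plain OLS plug-in estimator used here is an illustrative assumption, not necessarily the exact procedure of Chapter 16. With only \(T=5\) observations on \(N=6\) assets, the implied covariance matrix is still full rank because \(\mathbf{D}\) has positive diagonal elements.

```r
set.seed(42)                  # arbitrary seed
n_obs = 5                     # T
n_assets = 6                  # N > T

# Hypothetical simulated market and asset returns
r_mkt = rnorm(n_obs, mean = 0.01, sd = 0.04)
beta_true = seq(0.5, 1.5, length.out = n_assets)
ret = outer(r_mkt, beta_true) +
  matrix(rnorm(n_obs * n_assets, sd = 0.03), n_obs, n_assets)

# OLS estimates of the single index model, asset by asset
beta_hat = as.vector(cov(ret, r_mkt)) / var(r_mkt)
alpha_hat = colMeans(ret) - beta_hat * mean(r_mkt)
resid = ret - matrix(alpha_hat, n_obs, n_assets, byrow = TRUE) -
  outer(r_mkt, beta_hat)
sig2_eps = colSums(resid^2) / (n_obs - 2)   # residual variances

# Implied covariance matrix: sigma_M^2 * beta beta' + D
sigma_sim = var(r_mkt) * outer(beta_hat, beta_hat) + diag(sig2_eps)

qr(sigma_sim)$rank            # full rank N = 6
qr(cov(ret))$rank             # sample covariance: T - 1 = 4
```

The model imposes structure on \(\Sigma\): it requires estimating only \(3N+1\) parameters (\(\alpha_{i}\), \(\beta_{i}\), \(\sigma_{\varepsilon,i}^{2}\), and \(\sigma_{M}^{2}\)) instead of \(N(N+1)/2\), and the diagonal term keeps the estimate positive definite regardless of how \(N\) compares with \(T\).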