12.1 Introduction
Markowitz’s mean–variance portfolio (Markowitz, 1952) formulates portfolio design as a trade-off between the expected return \(\w^\T\bmu\) and the risk, measured by the variance \(\w^\T\bSigma\w\) (see Chapter 7 for details): \[ \begin{array}{ll} \underset{\w}{\textm{maximize}} & \w^\T\bmu - \frac{\lambda}{2}\w^\T\bSigma\w\\ \textm{subject to} & \w \in \mathcal{W}, \end{array} \] where \(\lambda\) is a hyper-parameter that controls the investor’s risk aversion and \(\mathcal{W}\) denotes an arbitrary constraint set, such as \(\mathcal{W} = \{\w \mid \bm{1}^\T\w=1, \w\ge\bm{0} \}\).
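For concreteness, this problem can be solved numerically with any convex solver; below is a minimal R sketch using the CVXR package for the constraint set \(\mathcal{W} = \{\w \mid \bm{1}^\T\w=1, \w\ge\bm{0}\}\) (the function name design_MVP and the default \(\lambda=1\) are ours):

```r
library(CVXR)

# Minimal sketch: mean-variance portfolio via convex optimization.
# mu (N-vector) and Sigma (N x N matrix) are estimates obtained from
# historical data; lambda is the risk-aversion hyper-parameter.
design_MVP <- function(mu, Sigma, lambda = 1) {
  w <- Variable(length(mu))
  prob <- Problem(Maximize(t(mu) %*% w - (lambda/2) * quad_form(w, Sigma)),
                  constraints = list(sum(w) == 1, w >= 0))
  result <- solve(prob)
  as.vector(result$getValue(w))
}
```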
Unfortunately, Markowitz’s mean–variance portfolio is severely affected by the ever-present estimation errors in the mean vector \(\bmu\) and covariance matrix \(\bSigma\). The question is whether this portfolio design can be improved with knowledge of the graph of assets, that is, by capitalizing on the key connections revealed by the graph connectivity pattern.
Graphs and distance matrices
Graph-based portfolios are constructed from the graph information of the assets, which can be conveniently encoded in the form of a distance matrix containing the distance between each pair of assets. There are numerous methods to acquire this graph information. We start with two simple approaches and then consider two more sophisticated graph estimation methods from Chapter 5.
A popular and simple way to obtain a distance matrix \(\bm{D}\) is directly from the assets’ correlations (Mantegna, 1999) as \[\begin{equation} D_{ij} = \sqrt{\frac{1}{2}(1 - \rho_{ij})}, \tag{12.1} \end{equation}\] where \(\rho_{ij}\) is the correlation between assets \(i\) and \(j\); an alternative is \(D_{ij} = 1 - \rho_{ij}^2\). This is, up to a scaling factor, the Euclidean distance between the standardized columns of the data matrix \(\bm{X}.\) In more detail, denote the demeaned and normalized \(i\)th column of the data matrix by \(\tilde{\bm{x}}_i = (\bm{x}_i - \mu_i\bm{1})/\sigma_i,\) where \(\mu_i\) and \(\sigma_i\) are the mean and volatility of \(\bm{x}_i\), respectively. Then, the empirical correlation can be written as \(\rho_{ij} = \frac{1}{T}\tilde{\bm{x}}_i^\T\tilde{\bm{x}}_j\), where \(T\) is the number of observations, and the normalized squared Euclidean distance between \(\tilde{\bm{x}}_i\) and \(\tilde{\bm{x}}_j\) is \(\frac{1}{T}\|\tilde{\bm{x}}_i - \tilde{\bm{x}}_j\|_2^2 = 2(1 - \rho_{ij}),\) which coincides with \(D_{ij}\) in (12.1) up to a scaling factor. Needless to say, other distance functions could be used to compute the distance matrix; for example, the \(p\)-norm or Minkowski metric \(D_{ij} = \|\tilde{\bm{x}}_i - \tilde{\bm{x}}_j\|_p,\) where \(\|\bm{a}\|_p \triangleq \left(\sum_{t=1}^T |a_t|^p\right)^{1/p}\), which for \(p=1\) becomes the Manhattan distance and for \(p=2\) the Euclidean distance.
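Both constructions take a few lines of R (a sketch; the function names are ours): cor() yields the sample correlation matrix, and dist() computes pairwise distances between rows, hence the transpose.

```r
# Correlation-based distance matrix (12.1); X is a T x N data matrix
# with one column per asset.
corr_distance <- function(X)
  sqrt(0.5 * (1 - cor(X)))

# Minkowski (p-norm) distance between standardized columns; scale()
# demeans and normalizes each column, and dist() operates on rows.
minkowski_distance <- function(X, p = 2)
  as.matrix(dist(t(scale(X)), method = "minkowski", p = p))
```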
A drawback of such a correlation-based distance matrix is that each element only contains information involving two assets, ignoring the rest of the assets. A more holistic definition is to compute a new distance matrix \(\tilde{\bm{D}}\) with elements containing the Euclidean distance between pairs of columns of \(\bm{D}\) (López de Prado, 2016): \[\begin{equation} \tilde{D}_{ij} = \|\bm{d}_i - \bm{d}_j\|_2, \tag{12.2} \end{equation}\] where \(\bm{d}_i\) is the \(i\)th column of \(\bm{D}\). Thus, each element of \(\tilde{\bm{D}}\) is a function of the entire correlation matrix rather than of a single correlation value as in \(\bm{D}\). In other words, \(D_{ij}\) is the distance between two assets, whereas \(\tilde{D}_{ij}\) measures how similarly assets \(i\) and \(j\) relate to the rest of the universe.
Example 12.1 (Distance matrix of a toy example) Consider the following correlation matrix of three variables (López de Prado, 2016): \[ \bm{C}=\begin{bmatrix} 1 & 0.7 & 0.2\\ 0.7 & 1 & -0.2\\ 0.2 & -0.2 & 1 \end{bmatrix}. \] The corresponding correlation-based distance matrix (12.1) is \[ \bm{D}=\begin{bmatrix} 0 & 0.3873 & 0.6325\\ 0.3873 & 0 & 0.7746\\ 0.6325 & 0.7746 & 0 \end{bmatrix} \] and the Euclidean distance matrix of correlation distances (12.2) is \[ \tilde{\bm{D}}=\begin{bmatrix} 0 & 0.5659 & 0.9747\\ 0.5659 & 0 & 1.1225\\ 0.9747 & 1.1225 & 0 \end{bmatrix}. \]
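A quick numerical check of this example in R (dist() computes Euclidean distances between rows, hence the transpose when applying (12.2) to the columns of \(\bm{D}\)):

```r
C <- matrix(c(1.0,  0.7,  0.2,
              0.7,  1.0, -0.2,
              0.2, -0.2,  1.0), nrow = 3)

D <- sqrt(0.5 * (1 - C))          # correlation-based distances (12.1)
D_tilde <- as.matrix(dist(t(D)))  # distances between columns of D (12.2)

round(D, 4)        # matches the matrix D above
round(D_tilde, 4)  # matches the matrix D-tilde above
```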
As an alternative to these heuristic definitions of graph distance matrices, Chapter 5 offers an extensive overview of graphs and presents a wide variety of methods for estimating graphs from data. For financial time series corresponding to \(T\) observations of \(N\) assets, with the observations denoted \(\bm{x}^{(t)} \in \R^N\), for \(t=1,\dots,T,\) the recommended methods in Section 5.6 are:
- Heavy-tailed Markov random field (MRF) with degree control in (5.13) (Cardoso et al., 2021): \[\begin{equation} \begin{array}{ll} \underset{\bm{w}\geq\bm{0}}{\textm{maximize}} & \begin{aligned}[t] \textm{log gdet}(\mathcal{L}(\bm{w})) - \frac{N+\nu}{T}\sum_{t=1}^T \textm{log}\left(\nu + (\bm{x}^{(t)})^\T\mathcal{L}(\bm{w})\bm{x}^{(t)}\right) \end{aligned}\\ \textm{subject to} & \mathfrak{d}(\bm{w}) = \bm{1}, \end{array} \tag{12.3} \end{equation}\] where \(\textm{gdet}(\cdot)\) denotes the generalized determinant of a matrix (defined as the product of its nonzero eigenvalues), the weight vector \(\bm{w}\) is a compact representation of the graph information, \(\mathcal{L}(\bm{w})\) is the Laplacian operator that produces the Laplacian matrix \(\bm{L}\) from the weights, \(\mathfrak{d}(\bm{w})\) is the degree operator that gives the degrees of the nodes, and \(\nu\) is a hyper-parameter that controls the heavy-tailedness (it plays the role of the degrees of freedom in a Student-\(t\) distribution, with smaller values corresponding to heavier tails).
- \(k\)-component heavy-tailed MRF with degree control (Cardoso et al., 2021): \[\begin{equation} \begin{array}{cl} \underset{\bm{w}\geq\bm{0}, \bm{F}\in\R^{N\times k}}{\textm{maximize}} & \begin{aligned}[t] \textm{log gdet}(\mathcal{L}(\bm{w})) - \frac{N+\nu}{T}\sum_{t=1}^T \textm{log}\left(\nu + (\bm{x}^{(t)})^\T\mathcal{L}(\bm{w})\bm{x}^{(t)}\right)\\ - \gamma \textm{Tr}\left(\bm{F}^\T\mathcal{L}(\bm{w})\bm{F}\right) \end{aligned}\\ \textm{subject to} & \mathfrak{d}(\bm{w}) = \bm{1}, \quad \bm{F}^\T\bm{F}=\bm{I}, \end{array} \tag{12.4} \end{equation}\] which incorporates the penalty term \(-\gamma\,\textm{Tr}(\bm{F}^\T\mathcal{L}(\bm{w})\bm{F})\), controlled by the hyper-parameter \(\gamma\), with the additional variable \(\bm{F}\) enforcing the low-rank property in the Laplacian matrix: since the minimum of \(\textm{Tr}(\bm{F}^\T\mathcal{L}(\bm{w})\bm{F})\) over \(\bm{F}^\T\bm{F}=\bm{I}\) equals the sum of the \(k\) smallest eigenvalues of \(\mathcal{L}(\bm{w})\), driving this term to zero enforces \(\bm{L} = \mathcal{L}(\bm{w})\) to be rank \(N-k\), which corresponds to a \(k\)-component graph, i.e., a graph with \(k\) clusters.
After solving these graph learning formulations, the result is contained in the graph Laplacian matrix \(\bm{L}=\mathcal{L}(\bm{w})\), from which a matrix akin to the correlation matrix can be obtained as \[\bm{C} = \textm{abs}(\bm{L}),\] where \(\textm{abs}(\cdot)\) denotes the elementwise absolute value (the diagonal elements already equal 1 owing to the degree constraint \(\mathfrak{d}(\bm{w}) = \bm{1}\), while the negative off-diagonal elements \(-w_{ij}\) become the nonnegative weights \(w_{ij}\)). Then, the distance matrix can be obtained as in (12.1): \(D_{ij} = \sqrt{\frac{1}{2}(1 - C_{ij})}\).
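As a sketch of the full pipeline, the two formulations above can be solved with the fingraph package mentioned in the note at the end of this section; the hyper-parameters are left at their defaults, and the name of the returned Laplacian field is an assumption to be checked against the package documentation.

```r
library(fingraph)

# X: T x N matrix of observations (e.g., log-returns of N assets).
graph <- learn_regular_heavytail_graph(X)         # heavy-tailed MRF (12.3)
# graph <- learn_kcomp_heavytail_graph(X, k = 5)  # k-component MRF (12.4)

L <- graph$laplacian        # assumed field name for the learned L(w)
C <- abs(L)                 # correlation-like matrix C = abs(L)
D <- sqrt(0.5 * (1 - C))    # distance matrix as in (12.1)
```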
The R package fingraph, based on (Cardoso et al., 2021), contains efficient implementations of the two formulations (12.3) and (12.4); namely, the functions learn_regular_heavytail_graph() for the heavy-tailed MRF (12.3) and learn_kcomp_heavytail_graph() for the \(k\)-component heavy-tailed MRF (12.4) (Cardoso and Palomar, 2023b).