9.3 Portfolio Formulations
We now explore different formulations involving high-order moments. Recall that higher-order moments will make the formulations nonconvex, unlike the mean–variance formulation.
As explored in Section 9.2, there are several options for the expressions of the moments \(\phi_1(\w),\) \(\phi_2(\w),\) \(\phi_3(\w),\) and \(\phi_4(\w)\):
- nonparametric moments: as in (9.1);
- factor model structured moments: as in (9.4) for a single factor or (9.5) for multiple factors;
- parametric moments: following the multivariate skewed \(t\) as in (9.6); and
- L-moments: as in (9.10).
9.3.1 MVSK Portfolios
A natural and straightforward way to incorporate higher-order moments in Markowitz’s mean–variance framework is by optimizing a weighted combination of the first four moments, called the mean–variance–skewness–kurtosis (MVSK) portfolio: \[\begin{equation} \begin{array}{ll} \underset{\w}{\textm{minimize}} & - \lambda_1 \phi_1(\w) + \lambda_2 \phi_2(\w) - \lambda_3 \phi_3(\w) + \lambda_4 \phi_4(\w)\\ \textm{subject to} & \w \in \mathcal{W}. \end{array} \tag{9.11} \end{equation}\]
The hyper-parameters \(\lambda_1\), \(\lambda_2\), \(\lambda_3\), and \(\lambda_4\) are chosen, as usual, according to the investor’s risk aversion. Observe that a reasonable investor seeks higher values of the first and third moments (i.e., mean and skewness), \(\phi_1(\w)\) and \(\phi_3(\w)\), and lower values of the second and fourth moments (i.e., variance and kurtosis), \(\phi_2(\w)\) and \(\phi_4(\w)\) (Briec et al., 2007; Martellini and Ziemann, 2010; Scott and Horvath, 1980). An interesting case of (9.11) arises when \(\lambda_1 = 0\), that is, ignoring the mean like in the global minimum variance portfolio (GMVP); see Section 6.5.1 in Chapter 6.
A convenient choice for the hyper-parameters follows from constant relative risk aversion (CRRA) preferences (Martellini and Ziemann, 2010): \[ \begin{aligned} \lambda_1 & = 1,\\ \lambda_2 & = \frac{\gamma}{2},\\ \lambda_3 & = \frac{\gamma(\gamma + 1)}{6},\\ \lambda_4 & = \frac{\gamma(\gamma + 1)(\gamma + 2)}{24}, \end{aligned} \] where \(\gamma\geq0\) is the risk aversion parameter.
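As a minimal sketch of how the objective in (9.11) could be evaluated with these hyper-parameters, the following Python snippet computes the four weights from \(\gamma\) and plugs in sample central moments of the portfolio returns as stand-ins for \(\phi_i(\w)\). The function names and the use of plain sample moments are assumptions made for illustration only; any of the moment expressions listed at the beginning of this section could be substituted.

```python
from statistics import fmean


def crra_lambdas(gamma):
    """Weights (lambda_1, ..., lambda_4) from the risk aversion parameter gamma."""
    return (
        1.0,
        gamma / 2,
        gamma * (gamma + 1) / 6,
        gamma * (gamma + 1) * (gamma + 2) / 24,
    )


def sample_moments(w, returns):
    """Sample mean and 2nd to 4th central moments of the portfolio returns.

    `returns` is a list of T observations, each a list of N asset returns
    (an assumed data layout for this sketch).
    """
    port = [sum(w_i * r_i for w_i, r_i in zip(w, r_t)) for r_t in returns]
    mean = fmean(port)
    central = [fmean([(x - mean) ** k for x in port]) for k in (2, 3, 4)]
    return (mean, *central)


def mvsk_objective(w, returns, gamma):
    """Value of the MVSK objective (to be minimized) at the portfolio w."""
    l1, l2, l3, l4 = crra_lambdas(gamma)
    phi1, phi2, phi3, phi4 = sample_moments(w, returns)
    return -l1 * phi1 + l2 * phi2 - l3 * phi3 + l4 * phi4


# Toy data: T = 4 observations of N = 2 assets, equally weighted portfolio.
toy_returns = [[0.01, 0.02], [-0.02, 0.00], [0.03, -0.01], [0.00, 0.01]]
print(mvsk_objective([0.5, 0.5], toy_returns, gamma=2.0))
```

This only evaluates the objective for a given \(\w\); minimizing it over \(\mathcal{W}\) is a nonconvex problem that requires the dedicated algorithms mentioned later.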
As is customary, this high-order portfolio design could alternatively be formulated with any of the moments as constraints. One example is the feasibility problem \[ \begin{array}{ll} \underset{\w}{\textm{find}} & \w\\ \textm{subject to} & \phi_1(\w) \geq \alpha_1,\\ & \phi_2(\w) \leq \alpha_2,\\ & \phi_3(\w) \geq \alpha_3,\\ & \phi_4(\w) \leq \alpha_4, \end{array} \] where the hyper-parameters \(\alpha_1,\) \(\alpha_2,\) \(\alpha_3,\) and \(\alpha_4\) encode the investor’s preferences.
Efficient numerical algorithms specifically designed to solve the MVSK formulation in (9.11) are discussed in Section 9.4, based on Zhou and Palomar (2021) and X. Wang et al. (2023).
Expected-Utility Approximations
Expected utility theory in the context of portfolio design was explored in Section 7.3 of Chapter 7. The idea is to maximize the expected value of a utility function, \(\E\left[U(\w^\T\bm{r})\right]\), where \(U(\cdot)\) denotes some utility function, instead of the mean–variance objective \(\w^\T\bmu - \frac{\lambda}{2}\w^\T\bSigma\w\).
High-order portfolios were considered in W. E. Young and Trent (1969) using the following approximation for the geometric mean of the returns: \[ \E\left[\textm{log}\left(1 + \w^\T\bm{r}\right)\right] \approx \textm{log}\left(1 + \phi_1(\w)\right) - \frac{\phi_2(\w)}{2\phi_1^2(\w)} + \frac{\phi_3(\w)}{3\phi_1^3(\w)} - \frac{\phi_4(\w)}{4\phi_1^4(\w)}, \] where the approximation up to the first two terms coincides with that in (7.16). High-order expansions were also considered for arbitrary expected utilities (Jean, 1971). More recently, Martellini and Ziemann (2010) considered high-order approximations of expected utilities with structured estimators of the moments as in (9.4).
9.3.2 Making Portfolios Efficient
The shortage function is an important quantity in multi-objective optimization related to the efficient frontier and the Pareto-optimal points (see Section A.7 in Appendix A).
The shortage function measures the distance between the moments of a portfolio and the efficient frontier along a given direction. Based on this concept, given a reference portfolio \(\w^0\) and a direction vector \(\bm{g}\), we can optimize a portfolio by pushing the reference portfolio towards the efficient frontier along that direction (Briec et al., 2007; Jurczenko et al., 2006): \[\begin{equation} \begin{array}{ll} \underset{\w,\delta \geq0}{\textm{maximize}} & \delta\\ \textm{subject to} & \phi_1(\w) \geq \phi_1(\w^0) + \delta g_1,\\ & \phi_2(\w) \leq \phi_2(\w^0) - \delta g_2,\\ & \phi_3(\w) \geq \phi_3(\w^0) + \delta g_3,\\ & \phi_4(\w) \leq \phi_4(\w^0) - \delta g_4. \end{array} \tag{9.12} \end{equation}\]
Observe that this formulation is always feasible, since \(\w=\w^0\) and \(\delta=0\) satisfy all the constraints. If the reference portfolio \(\w^0\) is already on the efficient frontier, then the solution is \(\w=\w^0\) and \(\delta=0\).
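To make the role of \(\delta\) in (9.12) concrete, note that for a fixed candidate portfolio the largest feasible \(\delta\) is available in closed form: it is the smallest of the four improvements over the reference moments, each scaled by the corresponding \(g_i\). The Python sketch below illustrates this under the assumption that all \(g_i\) are strictly positive; the function name and the tuple layout of the moments are hypothetical choices for this example.

```python
def max_delta(phi, phi0, g):
    """Largest delta >= 0 such that a candidate with moments `phi` satisfies
    the four moment constraints relative to the reference moments `phi0`.

    phi, phi0: tuples (phi_1, phi_2, phi_3, phi_4).
    g: direction vector with strictly positive entries (assumption).
    Returns None when the candidate is worse than the reference in some
    moment, in which case no delta >= 0 is feasible.
    """
    # Mean and skewness should increase; variance and kurtosis should decrease.
    signs = (1.0, -1.0, 1.0, -1.0)
    gains = [s * (p - p0) for s, p, p0 in zip(signs, phi, phi0)]
    delta = min(gain / g_i for gain, g_i in zip(gains, g))
    return delta if delta >= 0 else None


phi_ref = (1.0, 2.0, 0.0, 4.0)   # made-up reference moments
phi_new = (1.5, 1.0, 0.25, 3.0)  # made-up candidate moments
print(max_delta(phi_new, phi_ref, g=(1.0, 1.0, 1.0, 1.0)))
```

The outer maximization over \(\w\) remains nonconvex and is not addressed by this helper.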
9.3.3 Portfolio Tilting
The formulation in (9.12) to improve a given reference portfolio \(\w^0\) can be further extended by introducing a measure of portfolio optimality.
Suppose that the reference portfolio \(\w^0\) is obtained as the solution to the minimization of some cost function \(\xi(\cdot)\): \[ \w^0 = \textm{arg min}_{\w\in\mathcal{W}}\; \xi(\w). \] Some illustrative examples of the cost function \(\xi(\cdot)\) are
- the Herfindahl index of the portfolio weights to promote diversity (see Section 7.1.5 in Chapter 7): \[ \xi(\w) = \sum_{i=1}^N w_i^2; \]
- equalization of risk contributions (see the risk parity portfolio in Chapter 11): \[ \xi(\w) = \sum_{i=1}^N \left(\frac{w_i(\bSigma\w)_i}{\w^\T\bSigma\w} - \frac{1}{N}\right)^2; \]
- diversification ratio (see Section 6.5 in Chapter 6): \[ \xi(\w) = -\dfrac{\w^\T\bm{\sigma}}{\sqrt{\w^\T\bSigma\w}}; \]
- tracking error of a benchmark portfolio \(\w^\textm{b}\) (see index tracking in Chapter 13): \[ \xi(\w) = \sqrt{(\w - \w^\textm{b})^\T\bSigma(\w - \w^\textm{b})}. \]
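The four cost functions listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the covariance matrix is passed as a list of rows, \(\bm{\sigma}\) is taken to be the vector of asset volatilities (square roots of the diagonal of \(\bSigma\)), and all function names are hypothetical.

```python
from math import sqrt


def mat_vec(S, u):
    """Matrix-vector product S u, with S given as a list of rows."""
    return [sum(s_ij * u_j for s_ij, u_j in zip(row, u)) for row in S]


def quad_form(S, u):
    """Quadratic form u^T S u."""
    return sum(u_i * v_i for u_i, v_i in zip(u, mat_vec(S, u)))


def herfindahl(w):
    """Herfindahl index of the weights."""
    return sum(w_i ** 2 for w_i in w)


def risk_contribution_gap(w, S):
    """Squared deviation of the relative risk contributions from 1/N."""
    Sw = mat_vec(S, w)
    var = quad_form(S, w)
    n = len(w)
    return sum((w_i * sw_i / var - 1 / n) ** 2 for w_i, sw_i in zip(w, Sw))


def neg_diversification_ratio(w, S):
    """Negative diversification ratio (sigma assumed to be the volatilities)."""
    sigma = [sqrt(S[i][i]) for i in range(len(w))]
    weighted_vol = sum(w_i * s_i for w_i, s_i in zip(w, sigma))
    return -weighted_vol / sqrt(quad_form(S, w))


def tracking_error(w, w_b, S):
    """Tracking error of w with respect to the benchmark w_b."""
    diff = [w_i - b_i for w_i, b_i in zip(w, w_b)]
    return sqrt(quad_form(S, diff))


# Two uncorrelated assets with 20% volatility each, equally weighted.
S_toy = [[0.04, 0.0], [0.0, 0.04]]
w_toy = [0.5, 0.5]
print(herfindahl(w_toy), risk_contribution_gap(w_toy, S_toy))
```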
The so-called MVSK portfolio tilting is formulated (Boudt, Cornilly, Holle, et al., 2020) as \[\begin{equation} \begin{array}{ll} \underset{\w,\delta \geq0}{\textm{maximize}} & \delta\\ \textm{subject to} & \xi(\w) \leq \xi(\w^0) + \kappa,\\ & \phi_1(\w) \geq \phi_1(\w^0) + g_1(\delta),\\ & \phi_2(\w) \leq \phi_2(\w^0) - g_2(\delta),\\ & \phi_3(\w) \geq \phi_3(\w^0) + g_3(\delta),\\ & \phi_4(\w) \leq \phi_4(\w^0) - g_4(\delta), \end{array} \tag{9.13} \end{equation}\] where \(g_i(\delta)\) are increasing functions of \(\delta\), and \(\kappa>0\) is the “sacrifice parameter” to allow for some loss of optimality with respect to the reference portfolio (according to the cost function \(\xi(\cdot)\)) in exchange for getting closer to the efficient frontier.
One simple way to choose the hyper-parameters is proportional to the reference values, for example: \[ \begin{aligned} \kappa & = 0.01\times\xi(\w^0),\\ g_1(\delta) & = \delta \times \phi_1(\w^0),\\ g_2(\delta) & = \delta \times \phi_2(\w^0),\\ g_3(\delta) & = \delta \times \phi_3(\w^0),\\ g_4(\delta) & = \delta \times \phi_4(\w^0). \end{aligned} \]
Efficient numerical algorithms specifically designed to solve the MVSK tilting portfolio formulation were developed in Zhou and Palomar (2021).
9.3.4 Polynomial Goal Programming MVSK Portfolio
Another possible way to obtain a trade-off among the moments can be formulated as the so-called polynomial goal programming into which the investor’s preferences and objectives are incorporated. The formulation is based on minimizing the distance to some reference moments measured with a polynomial (Lai, 1991): \[ \begin{array}{ll} \underset{\w,\bm{d}\ge\bm{0}}{\textm{minimize}} & \left|\frac{d_1}{\phi_1^0}\right|^{\lambda_1} + \left|\frac{d_2}{\phi_2^0}\right|^{\lambda_2} + \left|\frac{d_3}{\phi_3^0}\right|^{\lambda_3} + \left|\frac{d_4}{\phi_4^0}\right|^{\lambda_4}\\ \textm{subject to} & \phi_1(\w) + d_1 \geq \phi_1^0,\\ & \phi_2(\w) - d_2 \leq \phi_2^0,\\ & \phi_3(\w) + d_3 \geq \phi_3^0,\\ & \phi_4(\w) - d_4 \leq \phi_4^0, \end{array} \] where \(\bm{d}\) denotes the deviation from the so-called “aspired levels” of the moments \(\phi_1^0,\) \(\phi_2^0,\) \(\phi_3^0,\) and \(\phi_4^0,\) which can be obtained, for example, as the extreme values \(\phi_i^0 = \textm{max(min)}_{\w\in\mathcal{W}}\ \phi_i(\w)\). Observe that these aspired levels are not jointly achievable by a single portfolio and that is where the vector variable \(\bm{d}\ge\bm{0}\) comes into play to relax the problem. If the aspired levels could be achieved by a portfolio \(\w^0\), then the optimal solution would simply be \(\w=\w^0\) and \(\bm{d}=\bm{0}\).
One particular case of this polynomial goal programming is obtained with the Minkowski distance, which corresponds to a common exponent \(\lambda_i=p\) followed by the outer power \(1/p\) (a monotone transformation that does not change the minimizer): \[ \begin{array}{ll} \underset{\w,\bm{d}\ge\bm{0}}{\textm{minimize}} & \begin{aligned}\left(\sum\limits_{i=1}^{4} \left|\frac{d_i}{\phi_i^0}\right|^{p}\right)^{1/p}\end{aligned}\\ \textm{subject to} & \phi_1(\w) + d_1 \geq \phi_1^0,\\ & \phi_2(\w) - d_2 \leq \phi_2^0,\\ & \phi_3(\w) + d_3 \geq \phi_3^0,\\ & \phi_4(\w) - d_4 \leq \phi_4^0. \end{array} \]
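For a fixed portfolio, the smallest deviations \(\bm{d}\ge\bm{0}\) satisfying the constraints are simply the shortfalls with respect to the aspired levels, so the goal-programming objective can be evaluated directly. The following Python sketch illustrates this; the function names are hypothetical, and the Minkowski variant follows the displayed objective (exponent \(p\) inside the sum, outer power \(1/p\)).

```python
def min_deviations(phi, phi_aspired):
    """Smallest d >= 0 satisfying the four constraints for given moments.

    phi, phi_aspired: tuples (phi_1, ..., phi_4) of the portfolio moments
    and of the aspired levels, respectively.
    """
    # d_1 >= phi_1^0 - phi_1, d_2 >= phi_2 - phi_2^0, and so on.
    signs = (1.0, -1.0, 1.0, -1.0)
    return [max(0.0, s * (a - p)) for s, p, a in zip(signs, phi, phi_aspired)]


def pgp_objective(d, phi_aspired, lambdas):
    """Polynomial objective: sum of |d_i / phi_i^0| ** lambda_i."""
    return sum(abs(d_i / a) ** lam for d_i, a, lam in zip(d, phi_aspired, lambdas))


def minkowski_objective(d, phi_aspired, p):
    """Minkowski distance of the normalized deviations."""
    return pgp_objective(d, phi_aspired, [p] * len(d)) ** (1 / p)


phi_w = (1.0, 3.0, 0.5, 5.0)    # made-up portfolio moments
phi_asp = (2.0, 2.0, 1.0, 4.0)  # made-up aspired levels
d_min = min_deviations(phi_w, phi_asp)
print(d_min, minkowski_objective(d_min, phi_asp, p=2))
```

Since the objective is increasing in each \(d_i\), these minimal deviations are optimal for the given portfolio; the search over \(\w\) itself is left to a nonconvex solver.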
9.3.5 L-Moment Portfolios
We now turn to the L-moments in (9.10) obtained from the sorted portfolio returns \[ \w^\T\bm{r}_{\tau(1)} \leq \w^\T\bm{r}_{\tau(2)} \leq \dots \leq \w^\T\bm{r}_{\tau(T)}, \] where the permutation \(\tau(\cdot)\) is a critical component that makes the problem nonconvex and difficult to handle.
Plugging the expressions of the L-moments in terms of sorted portfolio returns, as in (9.10), into the MVSK portfolio formulation in (9.11) leads to \[ \begin{array}{ll} \underset{\w}{\textm{maximize}} & \sum_{i=1}^T v_i \w^\T\bm{r}_{\tau(i)}\\ \textm{subject to} & \w \in \mathcal{W}, \end{array} \] for properly chosen weights \(v_i\).
The function in the objective \(\sum_{i=1}^T v_i \w^\T\bm{r}_{\tau(i)}\) involving ordered values is called ordered weighted averaging (OWA) and was studied in the 1990s. It turns out that such a problem can be reformulated in terms of auxiliary integer (actually binary) variables, which shows that the problem is in general a nonconvex integer problem (Yager, 1996).
To be specific, the OWA problem \[\begin{equation} \begin{array}{ll} \underset{\w,\{x_t\}}{\textm{maximize}} & \sum_{i=1}^T v_i x_{(i)}\\ \textm{subject to} & x_t = \w^\T\bm{r}_t, \qquad t=1,\dots,T,\\ & \w \in \mathcal{W} \end{array} \tag{9.14} \end{equation}\] is equivalent to the mixed-integer linear program (Yager, 1996) \[ \begin{array}{cl} \underset{\w,\{x_t\},\{y_t\},\{z_{ij}\}}{\textm{maximize}} & \begin{array}[t]{l}\begin{aligned}\sum_{i=1}^T v_i y_i\end{aligned}\end{array}\\ \textm{subject to} & \begin{array}[t]{ll} x_t = \w^\T\bm{r}_t, & t=1,\dots,T,\\ \w \in \mathcal{W},\\ y_1 \leq y_2 \leq \dots \leq y_T,\\ y_i\bm{1} \leq \bm{x} + M\bm{z}_i, & i=1,\dots,T,\\ \bm{1}^\T\bm{z}_i \leq i-1,\\ z_{ij} \in \{0,1\}, \end{array} \end{array} \] where the weights \(v_i\) are assumed to be nonnegative and \(M\) is a sufficiently large constant (much larger than any possible value that any of the \(x_t\) or \(y_t\) can take).
If the weights \(v_i\) are positive and decreasing, it was shown (Ogryczak, 2000) that the OWA objective function is a concave piecewise linear function, \[ \sum_{i=1}^T v_i x_{(i)} = \underset{\tau\in\Pi}{\textm{min}} \left(\sum_{i=1}^T v_{\tau(i)} x_i\right), \] where \(\tau(\cdot)\) is a permutation and \(\Pi\) is the set of all \(T!\) permutations of \(T\) elements. Then the OWA problem (9.14) can be rewritten as the linear program \[ \begin{array}{ll} \underset{\w, s}{\textm{maximize}} & s\\ \textm{subject to} & s \leq \sum_{t=1}^T v_{\tau(t)} \w^\T\bm{r}_t,\quad\textm{ for all }\tau\in\Pi,\\ & \w \in \mathcal{W}. \end{array} \] This formulation, unfortunately, has \(T!\) constraints (one per permutation), which makes it impractical except for very small \(T\). An efficient dual implementation was considered in Ogryczak and Sliwinski (2003). An alternative is based on relaxing the set of permutations to its convex hull, which does not change the minimum value (Chassein and Goerigk, 2015).
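The identity between the OWA objective and the minimum over permutations can be checked numerically by brute force for a small \(T\). The Python sketch below does so, assuming the ordered values \(x_{(i)}\) are sorted in ascending order (as for the sorted portfolio returns); the function names are hypothetical and the enumeration over all \(T!\) permutations is only viable for toy sizes.

```python
from itertools import permutations


def owa(v, x):
    """Ordered weighted average: weights applied to x sorted in ascending order."""
    return sum(v_i * x_i for v_i, x_i in zip(v, sorted(x)))


def owa_min_over_permutations(v, x):
    """Minimum over all permutations tau of sum_i v[tau(i)] * x[i]."""
    return min(
        sum(v[tau_i] * x_i for tau_i, x_i in zip(tau, x))
        for tau in permutations(range(len(x)))
    )


x_toy = [0.03, -0.01, 0.02, 0.00]  # made-up portfolio returns
v_dec = [0.4, 0.3, 0.2, 0.1]       # positive and decreasing weights
print(owa(v_dec, x_toy), owa_min_over_permutations(v_dec, x_toy))
```

With decreasing weights the two values agree; with increasing weights the sorted assignment is no longer the minimizing permutation, so the identity fails.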
Yet another reformulation of the OWA problem (9.14) is in terms of the cumulative ordered values, defined as \(\bar{x}_i = \sum_{j=1}^i x_{(j)}\), which allows us to write \[ \sum_{i=1}^T v_i x_{(i)} = \sum_{i=1}^T v'_i \bar{x}_i, \] where \(v'_i = v_i - v_{i+1}\), for \(i=1,\dots,T-1\), and \(v'_T = v_T\). It was shown (Ogryczak and Sliwinski, 2003) that \[ \bar{x}_i = \textm{max}_{y_i}\left\{i y_i - \bm{1}^\T (y_i \bm{1} - \bm{x})^+\right\}, \] where \((\cdot)^+\) denotes the elementwise positive part and the maximum is attained at \(y_i = x_{(i)}\). Thus, if the weights \(v_i\) are positive and decreasing, then \(v'_i>0\) for \(i=1,\dots,T\) and the OWA formulation can be written as the following problem (Ogryczak and Sliwinski, 2003): \[ \begin{array}{ll} \underset{\w,\bm{x},\bm{y}}{\textm{maximize}} & \begin{aligned}\sum_{i=1}^T v'_i \left(i y_i - \bm{1}^\T (y_i \bm{1} - \bm{x})^+\right)\end{aligned}\\ \textm{subject to} & x_t = \w^\T\bm{r}_t, \qquad t=1,\dots,T,\\ & \w \in \mathcal{W}, \end{array} \] which can be easily rewritten as a linear program: \[ \begin{array}{ll} \underset{\w,\bm{x},\bm{y},\{\bm{s}_i\geq\bm{0}\}}{\textm{maximize}} & \begin{array}[t]{l}\begin{aligned}\sum_{i=1}^T v'_i \left(i y_i - \bm{1}^\T \bm{s}_i\right)\end{aligned}\end{array}\\ \textm{subject to} & \begin{array}[t]{ll} x_t = \w^\T\bm{r}_t, &t=1,\dots,T,\\ \bm{s}_i \geq y_i \bm{1} - \bm{x}, & i=1,\dots,T,\\ \w \in \mathcal{W}. \end{array} \end{array} \]
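The two ingredients of this reformulation, the variational expression of the cumulative ordered values and the change of weights from \(v_i\) to \(v'_i\), can be verified numerically for a fixed vector of returns. The Python sketch below is an illustration under stated assumptions: the ordered values are sorted in ascending order, so that \(\bar{x}_i\) is the sum of the \(i\) smallest entries, and the inner maximization uses the positive part of \(y - x_t\). Because that inner function of \(y\) is concave and piecewise linear with breakpoints at the entries of \(\bm{x}\), it is enough to evaluate it at those points. All function names are hypothetical.

```python
def cumulative_ordered(x, i):
    """Sum of the i smallest entries of x."""
    return sum(sorted(x)[:i])


def cumulative_via_max(x, i):
    """max over y of i*y - sum_t max(y - x_t, 0), evaluated at the breakpoints."""
    return max(i * y - sum(max(y - x_t, 0.0) for x_t in x) for y in x)


def owa_via_cumulative(v, x):
    """OWA value written as sum_i v'_i * xbar_i with v'_i = v_i - v_{i+1}."""
    T = len(x)
    v_diff = [v[i] - v[i + 1] for i in range(T - 1)] + [v[-1]]
    return sum(vd * cumulative_via_max(x, i + 1) for i, vd in enumerate(v_diff))


x_toy = [0.03, -0.01, 0.02, 0.00]  # made-up portfolio returns
v_dec = [0.4, 0.3, 0.2, 0.1]       # positive and decreasing weights
direct = sum(v_i * x_i for v_i, x_i in zip(v_dec, sorted(x_toy)))
print(direct, owa_via_cumulative(v_dec, x_toy))
```

In the full problem \(\bm{x}\) depends on \(\w\), and the maximization over \(y_i\) is carried out jointly with \(\w\) by a linear programming solver rather than by enumeration.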
Nevertheless, if the weights \(v_i\) are not positive and decreasing, then the problem cannot be simplified as above.