\(
\newcommand{\bm}[1]{\boldsymbol{#1}}
\newcommand{\textm}[1]{\textsf{#1}}
\def\T{{\mkern-2mu\raise-1mu\mathsf{T}}}
\newcommand{\R}{\mathbb{R}} % real numbers
\newcommand{\E}{{\rm I\kern-.2em E}}
\newcommand{\w}{\bm{w}} % bold w
\newcommand{\bmu}{\bm{\mu}} % bold mu
\newcommand{\bSigma}{\bm{\Sigma}} % bold Sigma
\newcommand{\bigO}{O} %\mathcal{O}
\renewcommand{\d}[1]{\operatorname{d}\!{#1}}
\)

9.3 Portfolio formulations

We now explore different portfolio formulations involving high-order moments. Recall that, unlike the mean–variance formulation, formulations that include higher-order moments are nonconvex.

As explored in Section 9.2, there are several options for the expressions of the moments \(\phi_1(\w),\) \(\phi_2(\w),\) \(\phi_3(\w),\) and \(\phi_4(\w)\):

  • non-parametric moments: as in (9.1);
  • factor model structured moments: as in (9.4) for a single factor or (9.5) for multiple factors;
  • parametric moments: following the multivariate skewed \(t\) as in (9.6); and
  • L-moments: as in (9.10).

9.3.1 MVSK portfolios

A natural and straightforward way to incorporate higher-order moments into Markowitz’s mean–variance framework is to optimize a weighted combination of the first four moments. This leads to the so-called mean–variance–skewness–kurtosis (MVSK) portfolio: \[\begin{equation} \begin{array}{ll} \underset{\w}{\textm{minimize}} & - \lambda_1 \phi_1(\w) + \lambda_2 \phi_2(\w) - \lambda_3 \phi_3(\w) + \lambda_4 \phi_4(\w)\\ \textm{subject to} & \w \in \mathcal{W}. \end{array} \tag{9.11} \end{equation}\]

The hyper-parameters \(\lambda_1\), \(\lambda_2\), \(\lambda_3\), and \(\lambda_4\) are chosen, as usual, according to the investor’s risk aversion. Observe that a reasonable investor seeks higher values of the first and third moments (i.e., mean and skewness), \(\phi_1(\w)\) and \(\phi_3(\w)\), and lower values of the second and fourth moments (i.e., variance and kurtosis), \(\phi_2(\w)\) and \(\phi_4(\w)\) (Briec et al., 2007; Martellini and Ziemann, 2010; Scott and Horvath, 1980). An interesting special case of (9.11) arises when \(\lambda_1 = 0\), i.e., when the mean is ignored, as in the global minimum variance portfolio (GMVP) (see Section 6.5.1 in Chapter 6).

A convenient choice of the hyper-parameters follows from a fourth-order Taylor expansion of the constant relative risk aversion (CRRA) utility (Martellini and Ziemann, 2010): \[ \begin{aligned} \lambda_1 & = 1\\ \lambda_2 & = \frac{\gamma}{2}\\ \lambda_3 & = \frac{\gamma(\gamma + 1)}{6}\\ \lambda_4 & = \frac{\gamma(\gamma + 1)(\gamma + 2)}{24}, \end{aligned} \] where \(\gamma\geq0\) is the risk aversion parameter.
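To make (9.11) concrete, here is a minimal Python sketch that evaluates the MVSK objective with the CRRA weights above. As a simplification, it assumes that \(\phi_1(\w),\dots,\phi_4(\w)\) are the sample mean and the sample second, third, and fourth central moments of the portfolio returns \(\w^\T\bm{r}_t\). The function names and the toy returns are hypothetical.

```python
# Minimal sketch (assumption): phi_1..phi_4 are the sample mean and the sample
# central moments of x_t = w^T r_t. Function names and data are hypothetical.

def crra_lambdas(gamma):
    """MVSK weights (lambda_1, ..., lambda_4) implied by CRRA with gamma >= 0."""
    return (1.0,
            gamma / 2,
            gamma * (gamma + 1) / 6,
            gamma * (gamma + 1) * (gamma + 2) / 24)


def portfolio_moments(w, returns):
    """Sample mean, variance, and third/fourth central moments of w^T r_t."""
    x = [sum(wi * ri for wi, ri in zip(w, r)) for r in returns]
    T = len(x)
    mean = sum(x) / T
    dev = [xt - mean for xt in x]
    return (mean,
            sum(d ** 2 for d in dev) / T,
            sum(d ** 3 for d in dev) / T,
            sum(d ** 4 for d in dev) / T)


def mvsk_objective(w, returns, lambdas):
    """Objective of (9.11), to be minimized: -l1*phi1 + l2*phi2 - l3*phi3 + l4*phi4."""
    phi1, phi2, phi3, phi4 = portfolio_moments(w, returns)
    l1, l2, l3, l4 = lambdas
    return -l1 * phi1 + l2 * phi2 - l3 * phi3 + l4 * phi4


# Hypothetical toy data: T = 4 periods (rows), N = 2 assets (columns).
returns = [[0.02, 0.01], [-0.01, 0.00], [0.03, 0.02], [-0.02, 0.01]]
lambdas = crra_lambdas(10.0)   # (1.0, 5.0, 18.33..., 55.0)
print(mvsk_objective([0.5, 0.5], returns, lambdas))
```

The sketch only evaluates the objective. Minimizing it over \(\mathcal{W}\) is the nonconvex problem addressed by the algorithms of Section 9.4.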

As is customary, this high-order portfolio design can alternatively be formulated with any of the moments as constraints. For example, the feasibility problem \[ \begin{array}{ll} \underset{\w}{\textm{find}} & \w\\ \textm{subject to} & \phi_1(\w) \geq \alpha_1\\ & \phi_2(\w) \leq \alpha_2\\ & \phi_3(\w) \geq \alpha_3\\ & \phi_4(\w) \leq \alpha_4, \end{array} \] where the hyper-parameters \(\alpha_1,\) \(\alpha_2,\) \(\alpha_3,\) and \(\alpha_4\) express the investor’s preferences.
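With only two assets, the feasibility version can be explored by brute force. The sketch below lists which grid portfolios meet all four targets. It assumes sample moments of the portfolio returns, a long-only budget \(\w=(a,1-a)\), and hypothetical targets \(\alpha_i\). It is illustrative only, since a grid search does not scale with the number of assets.

```python
# Minimal sketch (assumptions): N = 2 long-only assets w = (a, 1 - a), sample
# moments of w^T r_t, hypothetical toy returns and targets alpha.

def portfolio_moments(w, returns):
    """Sample mean, variance, and third/fourth central moments of w^T r_t."""
    x = [sum(wi * ri for wi, ri in zip(w, r)) for r in returns]
    T = len(x)
    mean = sum(x) / T
    dev = [xt - mean for xt in x]
    return (mean, sum(d ** 2 for d in dev) / T,
            sum(d ** 3 for d in dev) / T, sum(d ** 4 for d in dev) / T)


def meets_targets(w, returns, alpha):
    """True if phi1 >= a1, phi2 <= a2, phi3 >= a3, and phi4 <= a4."""
    phi1, phi2, phi3, phi4 = portfolio_moments(w, returns)
    a1, a2, a3, a4 = alpha
    return phi1 >= a1 and phi2 <= a2 and phi3 >= a3 and phi4 <= a4


returns = [[0.02, 0.01], [-0.01, 0.00], [0.03, 0.02], [-0.02, 0.01]]
alpha = (0.005, 2e-4, -1e-6, 1e-7)          # hypothetical investor targets
grid = [k / 100 for k in range(101)]
feasible = [a for a in grid if meets_targets([a, 1 - a], returns, alpha)]
print(f"{len(feasible)} of {len(grid)} grid portfolios meet all four targets")
```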

Efficient numerical algorithms specifically designed to solve the MVSK formulation in (9.11) are discussed in Section 9.4, based on (Zhou and Palomar, 2021) and (X. Wang et al., 2023).

Expected-utility approximations

Expected utility theory in the context of portfolio design was explored in Section 7.3 of Chapter 7. The idea is to maximize the expected value of a utility function \(\E\left[U(\w^\T\bm{r})\right]\), where \(U(\cdot)\) denotes some utility function, instead of the mean–variance objective \(\w^\T\bmu - \frac{\lambda}{2}\w^\T\bSigma\w\).

Already in 1969, high-order portfolios were considered (W. E. Young and Trent, 1969). That work used the following approximation of the expected log-return (i.e., the logarithm of the geometric mean of the gross returns), obtained from a fourth-order Taylor expansion of \(\textm{log}(1 + x)\) around the mean \(\phi_1(\w)\): \[ \E\left[\textm{log}\left(1 + \w^\T\bm{r}\right)\right] \approx \textm{log}\left(1 + \phi_1(\w)\right) - \frac{\phi_2(\w)}{2\left(1 + \phi_1(\w)\right)^2} + \frac{\phi_3(\w)}{3\left(1 + \phi_1(\w)\right)^3} - \frac{\phi_4(\w)}{4\left(1 + \phi_1(\w)\right)^4}. \] Keeping only the first two terms essentially recovers the approximation in (7.16). High-order expansions were also considered for arbitrary expected utilities (Jean, 1971). More recently, (Martellini and Ziemann, 2010) considered high-order approximations of expected utilities with structured estimators of the moments as in (9.4).
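The approximation can be checked numerically. The sketch below implements a Taylor expansion of \(\textm{log}(1+x)\) around the mean \(\phi_1\), so the \(k\)th central moment enters with sign \((-1)^{k+1}\) and is divided by \(k(1+\phi_1)^k\). It compares the truncations with the exact sample average of \(\textm{log}(1+x_t)\) on hypothetical portfolio returns; the function names are hypothetical.

```python
import math

# Minimal sketch: Taylor approximation of E[log(1 + x)] from sample moments,
# versus the exact sample average. Data and function names are hypothetical.

def expected_log_exact(x):
    """Sample average of log(1 + x_t)."""
    return sum(math.log1p(xt) for xt in x) / len(x)


def expected_log_approx(x, terms=4):
    """Expansion around the mean phi1, keeping central moments of order 2..terms."""
    T = len(x)
    phi1 = sum(x) / T
    g = 1 + phi1
    value = math.log1p(phi1)
    for k in range(2, terms + 1):
        phik = sum((xt - phi1) ** k for xt in x) / T     # k-th central moment
        value += (-1) ** (k + 1) * phik / (k * g ** k)
    return value


x = [0.01, -0.02, 0.03, 0.0, -0.01]   # hypothetical portfolio returns w^T r_t
print(expected_log_exact(x), expected_log_approx(x, 2), expected_log_approx(x, 4))
```

For the small returns used here, the four-term value agrees with the exact average to well within \(10^{-6}\).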

9.3.2 Making portfolios efficient

The shortage function is an important quantity in multi-objective optimization related to the efficient frontier and the Pareto-optimal points (see Section A.7 in Appendix A).

The shortage function measures the distance, along a given direction, between the moments of a portfolio and the efficient frontier. Using this concept, given a reference portfolio \(\w^0\) and a direction vector \(\bm{g}\), we can improve the reference portfolio by pushing it towards the efficient frontier along that direction (Briec et al., 2007; Jurczenko et al., 2006): \[\begin{equation} \begin{array}{ll} \underset{\w,\delta \geq0}{\textm{maximize}} & \delta\\ \textm{subject to} & \phi_1(\w) \geq \phi_1(\w^0) + \delta g_1\\ & \phi_2(\w) \leq \phi_2(\w^0) - \delta g_2\\ & \phi_3(\w) \geq \phi_3(\w^0) + \delta g_3\\ & \phi_4(\w) \leq \phi_4(\w^0) - \delta g_4. \end{array} \tag{9.12} \end{equation}\]

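Observe that this formulation is always feasible, since \((\w,\delta)=(\w^0,0)\) satisfies all the constraints. If the reference portfolio \(\w^0\) is already on the efficient frontier, no improvement along \(\bm{g}\) is possible. The optimal value is then \(\delta=0\), attained, for instance, by \(\w=\w^0\).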
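For two assets, (9.12) can be solved by brute force. For a fixed \(\w\), the largest feasible \(\delta\) is the smallest of the four slacks divided by the corresponding \(g_i\), and one keeps the best grid point. The sketch assumes sample moments, a long-only budget \(\w=(a,1-a)\), a direction \(\bm{g}\) with strictly positive entries, and hypothetical toy data and function names.

```python
# Minimal sketch (assumptions): N = 2 long-only assets w = (a, 1 - a), sample
# moments of w^T r_t, direction g with g_i > 0, hypothetical data and names.
# Brute-force grid search: illustrative only, it does not scale with N.

def portfolio_moments(w, returns):
    """Sample mean, variance, and third/fourth central moments of w^T r_t."""
    x = [sum(wi * ri for wi, ri in zip(w, r)) for r in returns]
    T = len(x)
    mean = sum(x) / T
    dev = [xt - mean for xt in x]
    return (mean, sum(d ** 2 for d in dev) / T,
            sum(d ** 3 for d in dev) / T, sum(d ** 4 for d in dev) / T)


def largest_delta(w, w0, returns, g):
    """Largest delta such that (w, delta) satisfies the four constraints of (9.12)."""
    p, p0 = portfolio_moments(w, returns), portfolio_moments(w0, returns)
    slack = (p[0] - p0[0],   # mean may only go up
             p0[1] - p[1],   # variance may only go down
             p[2] - p0[2],   # skewness may only go up
             p0[3] - p[3])   # kurtosis may only go down
    return min(s / gi for s, gi in zip(slack, g))


def shortage_grid_search(w0, returns, g, steps=1000):
    best_w, best_delta = list(w0), 0.0      # (w0, 0) is always feasible
    for k in range(steps + 1):
        w = [k / steps, 1 - k / steps]
        delta = largest_delta(w, w0, returns, g)
        if delta > best_delta:
            best_w, best_delta = w, delta
    return best_w, best_delta


returns = [[0.02, 0.01], [-0.01, 0.00], [0.03, 0.02], [-0.02, 0.01], [0.05, -0.01]]
w0 = [0.2, 0.8]                  # hypothetical reference portfolio
g = (1e-3, 1e-5, 1e-7, 1e-9)     # hypothetical direction, all entries > 0
w_best, delta_best = shortage_grid_search(w0, returns, g)
print(w_best, delta_best)
```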

9.3.3 Portfolio tilting

The formulation in (9.12) to improve a given reference portfolio \(\w^0\) can be further extended by introducing a measure of portfolio optimality.

Suppose that the reference portfolio \(\w^0\) is obtained as the solution to the minimization of some cost function \(\xi(\cdot)\): \[ \w^0 = \textm{arg min}_{\w\in\mathcal{W}}\; \xi(\w). \] Some illustrative examples of the cost function \(\xi(\cdot)\) are

  • the Herfindahl index of the portfolio weights to promote diversity (see Section 7.1.5 in Chapter 7): \[ \xi(\w) = \sum_{i=1}^N w_i^2; \]
  • equalization of risk contributions (see risk parity portfolio in Chapter 11): \[ \xi(\w) = \sum_{i=1}^N \left(\frac{w_i(\bSigma\w)_i}{\w^\T\bSigma\w} - \frac{1}{N}\right)^2; \]
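  • diversification ratio (see Section 6.5 in Chapter 6), where \(\bm{\sigma}\) denotes the vector of asset volatilities (the square roots of the diagonal elements of \(\bSigma\)) and the minus sign turns the maximization of the ratio into a minimization: \[ \xi(\w) = -\dfrac{\w^\T\bm{\sigma}}{\sqrt{\w^\T\bSigma\w}}; \]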
  • tracking error of a benchmark portfolio \(\w^\textm{benchmark}\) (see index tracking in Chapter 13): \[ \xi(\w) = \sqrt{(\w - \w^\textm{benchmark})^\T\bSigma(\w - \w^\textm{benchmark})}. \]
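The four cost functions above can be coded directly. The sketch below uses plain Python lists for \(\w\) and \(\bSigma\), with hypothetical helpers `matvec` and `quad_form` for the matrix–vector product and the quadratic form. It returns the negative diversification ratio so that all four functions are minimized.

```python
import math

# Minimal sketch of the cost functions xi(w); all names are hypothetical and
# S is a covariance matrix given as a list of rows.

def matvec(S, w):
    return [sum(s * wj for s, wj in zip(row, w)) for row in S]


def quad_form(S, w):
    return sum(wi * si for wi, si in zip(w, matvec(S, w)))


def herfindahl(w):
    return sum(wi ** 2 for wi in w)


def risk_contribution_cost(w, S):
    """Squared deviations of relative risk contributions from 1/N."""
    Sw, total, N = matvec(S, w), quad_form(S, w), len(w)
    return sum((wi * swi / total - 1 / N) ** 2 for wi, swi in zip(w, Sw))


def neg_diversification_ratio(w, S):
    sigma = [math.sqrt(S[i][i]) for i in range(len(w))]   # asset volatilities
    return -sum(wi * si for wi, si in zip(w, sigma)) / math.sqrt(quad_form(S, w))


def tracking_error(w, w_bench, S):
    diff = [wi - bi for wi, bi in zip(w, w_bench)]
    return math.sqrt(quad_form(S, diff))


S = [[0.04, 0.01], [0.01, 0.09]]   # hypothetical covariance matrix
w = [0.6, 0.4]
print(herfindahl(w), risk_contribution_cost(w, S),
      neg_diversification_ratio(w, S), tracking_error(w, [0.5, 0.5], S))
```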

The so-called MVSK portfolio tilting is formulated in (Boudt, Cornilly, Holle, et al., 2020) as \[\begin{equation} \begin{array}{ll} \underset{\w,\delta \geq0}{\textm{maximize}} & \delta\\ \textm{subject to} & \xi(\w) \leq \xi(\w^0) + \kappa\\ & \phi_1(\w) \geq \phi_1(\w^0) + g_1(\delta)\\ & \phi_2(\w) \leq \phi_2(\w^0) - g_2(\delta)\\ & \phi_3(\w) \geq \phi_3(\w^0) + g_3(\delta)\\ & \phi_4(\w) \leq \phi_4(\w^0) - g_4(\delta), \end{array} \tag{9.13} \end{equation}\] where \(g_i(\delta)\) are increasing functions of \(\delta\), and \(\kappa>0\) is the “sacrifice parameter” to allow for some loss of optimality with respect to the reference portfolio (according to the cost function \(\xi(\cdot)\)) in exchange for getting closer to the efficient frontier.

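One simple way is to choose the hyper-parameters proportionally to the reference values, for example: \[ \begin{aligned} \kappa & = 0.01\times\xi(\w^0)\\ g_1(\delta) & = \delta \times \phi_1(\w^0)\\ g_2(\delta) & = \delta \times \phi_2(\w^0)\\ g_3(\delta) & = \delta \times \phi_3(\w^0)\\ g_4(\delta) & = \delta \times \phi_4(\w^0). \end{aligned} \] This choice presumes that \(\xi(\w^0)\) and the reference moments are positive. If \(\phi_1(\w^0)\) or \(\phi_3(\w^0)\) is negative, its absolute value can be used instead so that the corresponding \(g_i(\delta)\) remains increasing.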
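The tilting problem (9.13) can also be sketched by brute force for two assets under the proportional choice above. The sketch takes the Herfindahl index as \(\xi(\cdot)\), so the equally weighted portfolio plays the role of \(\w^0\), together with sample moments of the portfolio returns and hypothetical toy data. As an assumption, it uses \(g_i(\delta)=\delta\,|\phi_i(\w^0)|\) so that every \(g_i\) is increasing.

```python
# Minimal sketch (assumptions): N = 2 long-only assets w = (a, 1 - a), sample
# moments of w^T r_t, xi = Herfindahl index, kappa = kappa_frac * xi(w0), and
# g_i(delta) = delta * |phi_i(w0)| (assumed nonzero). Names/data hypothetical.

def portfolio_moments(w, returns):
    """Sample mean, variance, and third/fourth central moments of w^T r_t."""
    x = [sum(wi * ri for wi, ri in zip(w, r)) for r in returns]
    T = len(x)
    mean = sum(x) / T
    dev = [xt - mean for xt in x]
    return (mean, sum(d ** 2 for d in dev) / T,
            sum(d ** 3 for d in dev) / T, sum(d ** 4 for d in dev) / T)


def herfindahl(w):
    return sum(wi ** 2 for wi in w)


def tilt_grid_search(w0, returns, xi=herfindahl, kappa_frac=0.01, steps=1000):
    p0 = portfolio_moments(w0, returns)
    scale = [abs(m) for m in p0]                 # slopes of the g_i
    budget = xi(w0) + kappa_frac * xi(w0)        # xi(w) <= xi(w0) + kappa
    best_w, best_delta = list(w0), 0.0           # (w0, 0) is always feasible
    for k in range(steps + 1):
        w = [k / steps, 1 - k / steps]
        if xi(w) > budget:                       # sacrifice budget exceeded
            continue
        p = portfolio_moments(w, returns)
        slack = (p[0] - p0[0], p0[1] - p[1], p[2] - p0[2], p0[3] - p[3])
        delta = min(s / c for s, c in zip(slack, scale))
        if delta > best_delta:
            best_w, best_delta = w, delta
    return best_w, best_delta


returns = [[0.02, 0.01], [-0.01, 0.00], [0.03, 0.02], [-0.02, 0.01], [0.05, -0.01]]
w0 = [0.5, 0.5]    # minimizer of the Herfindahl index over the long-only set
w_tilt, delta_tilt = tilt_grid_search(w0, returns)
print(w_tilt, delta_tilt)
```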

Efficient numerical algorithms specifically designed to solve the MVSK tilting portfolio formulation were developed in (Zhou and Palomar, 2021).

9.3.4 Polynomial goal programming MVSK portfolio

Another way to trade off the moments is the so-called polynomial goal programming, which incorporates the investor’s preferences and objectives. The formulation minimizes a polynomial measure of the distance to some reference moments (Lai, 1991): \[ \begin{array}{ll} \underset{\w,\bm{d}\ge\bm{0}}{\textm{minimize}} & \left|\frac{d_1}{\phi_1^0}\right|^{\lambda_1} + \left|\frac{d_2}{\phi_2^0}\right|^{\lambda_2} + \left|\frac{d_3}{\phi_3^0}\right|^{\lambda_3} + \left|\frac{d_4}{\phi_4^0}\right|^{\lambda_4}\\ \textm{subject to} & \phi_1(\w) + d_1 \geq \phi_1^0\\ & \phi_2(\w) - d_2 \leq \phi_2^0\\ & \phi_3(\w) + d_3 \geq \phi_3^0\\ & \phi_4(\w) - d_4 \leq \phi_4^0, \end{array} \] where \(\bm{d}\) denotes the deviations from the so-called “aspired levels” of the moments \(\phi_1^0,\) \(\phi_2^0,\) \(\phi_3^0,\) and \(\phi_4^0\). These can be obtained, for example, as the extreme values attainable individually: \(\phi_i^0 = \textm{max}_{\w\in\mathcal{W}}\;\phi_i(\w)\) for the odd moments (\(i=1,3\)) and \(\phi_i^0 = \textm{min}_{\w\in\mathcal{W}}\;\phi_i(\w)\) for the even ones (\(i=2,4\)). Observe that these aspired levels are generally not jointly achievable by a single portfolio. This is where the vector variable \(\bm{d}\ge\bm{0}\) comes into play to relax the problem. If the aspired levels could be achieved by a portfolio \(\w^0\), then the optimal solution would simply be \(\w=\w^0\) and \(\bm{d}=\bm{0}\).

One particular case of this polynomial goal programming is obtained with the Minkowski distance, i.e., by setting all the exponents to \(\lambda_i=p\). The outer power \(1/p\) below does not change the minimizer: \[ \begin{array}{ll} \underset{\w,\bm{d}\ge\bm{0}}{\textm{minimize}} & \begin{aligned}\left(\sum\limits_{i=1}^{4} \left|\frac{d_i}{\phi_i^0}\right|^{p}\right)^{1/p}\end{aligned}\\ \textm{subject to} & \phi_1(\w) + d_1 \geq \phi_1^0\\ & \phi_2(\w) - d_2 \leq \phi_2^0\\ & \phi_3(\w) + d_3 \geq \phi_3^0\\ & \phi_4(\w) - d_4 \leq \phi_4^0. \end{array} \]
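Polynomial goal programming can likewise be illustrated on a two-asset grid. For a fixed \(\w\), the smallest feasible deviations are \(d_1=(\phi_1^0-\phi_1(\w))^+\), \(d_2=(\phi_2(\w)-\phi_2^0)^+\), \(d_3=(\phi_3^0-\phi_3(\w))^+\), and \(d_4=(\phi_4(\w)-\phi_4^0)^+\), so only \(\w\) needs to be searched. The sketch uses the Minkowski form with a hypothetical \(p=2\), sample moments, and aspired levels computed as the extreme values over the same grid (assumed nonzero). The toy returns and function names are hypothetical.

```python
# Minimal sketch (assumptions): N = 2 long-only assets w = (a, 1 - a), sample
# moments of w^T r_t, aspired levels = extreme values over the same grid
# (assumed nonzero), Minkowski objective with exponent p. Names hypothetical.

def portfolio_moments(w, returns):
    """Sample mean, variance, and third/fourth central moments of w^T r_t."""
    x = [sum(wi * ri for wi, ri in zip(w, r)) for r in returns]
    T = len(x)
    mean = sum(x) / T
    dev = [xt - mean for xt in x]
    return (mean, sum(d ** 2 for d in dev) / T,
            sum(d ** 3 for d in dev) / T, sum(d ** 4 for d in dev) / T)


def pgp_grid_search(returns, p=2.0, steps=1000):
    grid = [[k / steps, 1 - k / steps] for k in range(steps + 1)]
    moments = [portfolio_moments(w, returns) for w in grid]
    # Aspired levels: max for mean/skewness, min for variance/kurtosis.
    asp = (max(m[0] for m in moments), min(m[1] for m in moments),
           max(m[2] for m in moments), min(m[3] for m in moments))

    def distance(m):
        # For fixed w, the smallest feasible deviations d_i >= 0.
        d = (max(0.0, asp[0] - m[0]), max(0.0, m[1] - asp[1]),
             max(0.0, asp[2] - m[2]), max(0.0, m[3] - asp[3]))
        return sum(abs(di / ai) ** p for di, ai in zip(d, asp)) ** (1 / p)

    dists = [distance(m) for m in moments]
    k_best = min(range(len(grid)), key=dists.__getitem__)
    return grid[k_best], dists[k_best], asp


returns = [[0.02, 0.01], [-0.01, 0.00], [0.03, 0.02], [-0.02, 0.01], [0.05, -0.01]]
w_pgp, dist, aspired = pgp_grid_search(returns)
print(w_pgp, dist)
```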

9.3.5 L-moment portfolios

We now turn to the L-moments in (9.10), which are obtained from the sorted portfolio returns \[ \w^\T\bm{r}_{\tau(1)} \leq \w^\T\bm{r}_{\tau(2)} \leq \dots \leq \w^\T\bm{r}_{\tau(T)}, \] where the permutation \(\tau(\cdot)\) itself depends on \(\w\). This dependence is the critical component that makes the problem nonconvex and difficult to handle.

Substituting the expressions of the L-moments in terms of the sorted portfolio returns, as in (9.10), into the MVSK portfolio formulation (9.11) leads to \[ \begin{array}{ll} \underset{\w}{\textm{maximize}} & \sum_{i=1}^T v_i \w^\T\bm{r}_{\tau(i)}\\ \textm{subject to} & \w \in \mathcal{W}, \end{array} \] for properly chosen weights \(v_i\). Since each L-moment is a linear combination of the sorted returns, so is any weighted combination of the four.
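The weights \(v_i\) can be made explicit once an L-moment estimator is fixed. The sketch assumes the standard unbiased sample L-moments written through probability-weighted moments \(b_r\), which may differ in detail from (9.10). It combines them with the signs of (9.11), i.e., it maximizes \(\lambda_1\ell_1-\lambda_2\ell_2+\lambda_3\ell_3-\lambda_4\ell_4\). The function names and numbers are hypothetical.

```python
from math import comb

# Minimal sketch (assumption): unbiased sample L-moments via probability-
# weighted moments b_r; requires T >= 4. Names and numbers are hypothetical.

def lmoment_coefficients(T):
    """c[k][i]: weight of the (i+1)-th smallest value in the (k+1)-th L-moment."""
    # b_r = sum_i pwm[r][i] * x_(i+1), with pwm[r][i] = C(i, r) / C(T-1, r) / T.
    pwm = [[comb(i, r) / comb(T - 1, r) / T for i in range(T)] for r in range(4)]
    # l1 = b0, l2 = 2b1 - b0, l3 = 6b2 - 6b1 + b0, l4 = 20b3 - 30b2 + 12b1 - b0.
    mix = [(1, 0, 0, 0), (-1, 2, 0, 0), (1, -6, 6, 0), (-1, 12, -30, 20)]
    return [[sum(a * pwm[r][i] for r, a in enumerate(row)) for i in range(T)]
            for row in mix]


def owa_weights(T, lambdas):
    """v_i for maximizing l1*L1 - l2*L2 + l3*L3 - l4*L4 (signs as in (9.11))."""
    c = lmoment_coefficients(T)
    signs = (1, -1, 1, -1)
    return [sum(s * lam * ck[i] for s, lam, ck in zip(signs, lambdas, c))
            for i in range(T)]


def owa(v, x):
    """Ordered weighted average: sum_i v_i x_(i), with x sorted increasingly."""
    return sum(vi * xi for vi, xi in zip(v, sorted(x)))


x = [0.012, -0.031, 0.004, 0.027, -0.008, 0.015]   # hypothetical w^T r_t
v = owa_weights(len(x), (1.0, 0.5, 0.2, 0.1))       # hypothetical lambdas
print(v, owa(v, x))
```

For the equally spaced sample \(1,\dots,5\), the four coefficient rows give the L-moments \(3, 1, 0, 0\), which is a quick sanity check of the construction.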

The objective \(\sum_{i=1}^T v_i \w^\T\bm{r}_{\tau(i)}\) is a weighted sum of ordered values. Such a function is known as ordered weighted averaging (OWA) and was already studied in the 1990s. It turns out that such a problem can be reformulated with auxiliary integer (in fact, binary) variables, which reveals that it is, in general, a nonconvex combinatorial problem (Yager, 1996).

To be specific, denote by \(x_{(1)} \leq x_{(2)} \leq \dots \leq x_{(T)}\) the values \(x_t\) sorted in increasing order. Then the OWA problem \[\begin{equation} \begin{array}{ll} \underset{\w,\{x_t\}}{\textm{maximize}} & \sum_{i=1}^T v_i x_{(i)}\\ \textm{subject to} & x_t = \w^\T\bm{r}_t, \qquad t=1,\dots,T\\ & \w \in \mathcal{W} \end{array} \tag{9.14} \end{equation}\] is equivalent to the mixed-integer linear program (Yager, 1996) \[ \begin{array}{cl} \underset{\w,\{x_t\},\{y_t\},\{z_{ij}\}}{\textm{maximize}} & \begin{array}[t]{l}\begin{aligned}\sum_{i=1}^T v_i y_i\end{aligned}\end{array}\\ \textm{subject to} & \begin{array}[t]{ll} x_t = \w^\T\bm{r}_t, & t=1,\dots,T\\ \w \in \mathcal{W}\\ y_1 \leq y_2 \leq \dots \leq y_T\\ y_i\bm{1} \leq \bm{x} + M\bm{z}_i, & i=1,\dots,T\\ \bm{1}^\T\bm{z}_i \leq i-1, & i=1,\dots,T\\ z_{ij} \in \{0,1\}, & i,j=1,\dots,T \end{array} \end{array} \] where the weights \(v_i\) are assumed to be nonnegative and \(M\) is a sufficiently large constant (larger than any possible difference between the \(y_i\) and the \(x_t\)). The binary vector \(\bm{z}_i\) switches off at most \(i-1\) of the constraints \(y_i \leq x_t\), so \(y_i\) cannot exceed the \(i\)th smallest value \(x_{(i)}\). Since \(v_i\geq0\), this bound is attained at the optimum.
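The role of the binary variables can be checked by brute force for a fixed vector \(\bm{x}\). When \(z_{it}=1\), a large \(M\) effectively switches off the constraint \(y_i\leq x_t\). If at most \(i-1\) of these constraints may be switched off, the largest admissible \(y_i\) is \(x_{(i)}\). The sketch enumerates those binary patterns explicitly. This is exponential in \(T\) and meant only as an illustration, not as a way to solve the MILP; the function name is hypothetical.

```python
from itertools import combinations

# Minimal sketch: for fixed x, enumerate which constraints y_i <= x_t are
# switched off (z_it = 1, at most i - 1 of them). Exponential in T; the
# function name is hypothetical.

def largest_feasible_y(x, i):
    """Largest y_i with y_i <= x_t for all t outside a set of at most i-1 indices."""
    T = len(x)
    best = float("-inf")
    for k in range(i):                             # k = number of ones in z_i
        for relaxed in combinations(range(T), k):
            binding = [x[t] for t in range(T) if t not in relaxed]
            best = max(best, min(binding))
    return best


x = [0.3, -0.1, 0.2, 0.0]                          # hypothetical values x_t
print([largest_feasible_y(x, i) for i in range(1, len(x) + 1)])  # [-0.1, 0.0, 0.2, 0.3]
```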

If the weights \(v_i\) are positive and decreasing, it was shown in (Ogryczak, 2000) that the OWA objective function is concave and piecewise linear: \[ \sum_{i=1}^T v_i x_{(i)} = \underset{\tau\in\Pi}{\textm{min}} \left(\sum_{i=1}^T v_{\tau(i)} x_i\right), \] where \(\tau(\cdot)\) is a permutation and \(\Pi\) is the set of all \(T!\) permutations of \(\{1,\dots,T\}\). In words, pairing the largest weights with the smallest values gives the smallest possible weighted sum. Then the OWA problem (9.14) can be rewritten as the linear program \[ \begin{array}{ll} \underset{\w, s}{\textm{maximize}} & s\\ \textm{subject to} & s \leq \sum_{t=1}^T v_{\tau(t)} \w^\T\bm{r}_t,\quad\textm{ for all }\tau\in\Pi\\ & \w \in \mathcal{W}. \end{array} \] Unfortunately, this formulation has \(T!\) constraints, one per permutation, which makes it impractical except for very small \(T\). An efficient dual implementation was considered in (Ogryczak and Sliwinski, 2003). An alternative relaxes the set of permutation matrices to its convex hull, the set of doubly stochastic matrices. Since the objective is linear, this relaxation does not change the minimum value (Chassein and Goerigk, 2015).
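The permutation identity is easy to verify numerically for small \(T\). The sketch compares the OWA value with a brute-force minimum over all \(T!\) permutations. It does so both for decreasing weights, where the two coincide, and for increasing weights, where they do not. The data and function names are hypothetical.

```python
from itertools import permutations

# Minimal sketch: brute-force check of sum_i v_i x_(i) = min_tau sum_i v_tau(i) x_i
# for decreasing weights. Data and function names are hypothetical.

def owa(v, x):
    """sum_i v_i x_(i), with x_(1) <= ... <= x_(T)."""
    return sum(vi * xi for vi, xi in zip(v, sorted(x)))


def min_over_permutations(v, x):
    """Minimum over all T! permutations tau of sum_i v_tau(i) x_i."""
    T = len(x)
    return min(sum(v[tau[i]] * x[i] for i in range(T))
               for tau in permutations(range(T)))


x = [0.02, -0.01, 0.03, 0.0]     # hypothetical portfolio returns
v_dec = [0.4, 0.3, 0.2, 0.1]     # positive and decreasing
v_inc = [0.1, 0.2, 0.3, 0.4]     # increasing: the identity fails
print(owa(v_dec, x), min_over_permutations(v_dec, x))   # equal up to rounding
print(owa(v_inc, x), min_over_permutations(v_inc, x))   # the first one is larger
```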

Yet another reformulation of the OWA problem (9.14) is in terms of the cumulative ordered values. These are defined as \(\bar{x}_i = \sum_{j=1}^i x_{(j)}\), the sum of the \(i\) smallest values, and they allow us to write \[ \sum_{i=1}^T v_i x_{(i)} = \sum_{i=1}^T v'_i \bar{x}_i, \] where \(v'_i = v_i - v_{i+1}\), for \(i=1,\dots,T-1\), and \(v'_T = v_T\). It was shown in (Ogryczak and Sliwinski, 2003) that \[ \bar{x}_i = \textm{max}_{y_i}\left\{i y_i - \bm{1}^\T (y_i \bm{1} - \bm{x})^+\right\}, \] where \((\cdot)^+\) denotes the elementwise positive part and the maximum is attained at \(y_i = x_{(i)}\). Thus, if the weights \(v_i\) are positive and decreasing, then \(v'_i>0\) for \(i=1,\dots,T\). The OWA formulation can then be written as the following problem (Ogryczak and Sliwinski, 2003): \[ \begin{array}{ll} \underset{\w,\bm{x},\bm{y}}{\textm{maximize}} & \begin{aligned}\sum_{i=1}^T v'_i \left(i y_i - \bm{1}^\T (y_i \bm{1} - \bm{x})^+\right)\end{aligned}\\ \textm{subject to} & x_t = \w^\T\bm{r}_t, \qquad t=1,\dots,T\\ & \w \in \mathcal{W}. \end{array} \] This can easily be rewritten as the linear program \[ \begin{array}{ll} \underset{\w,\bm{x},\bm{y},\{\bm{s}_i\geq\bm{0}\}}{\textm{maximize}} & \begin{array}[t]{l}\begin{aligned}\sum_{i=1}^T v'_i \left(i y_i - \bm{1}^\T \bm{s}_i\right)\end{aligned}\end{array}\\ \textm{subject to} & \begin{array}[t]{ll} x_t = \w^\T\bm{r}_t, &t=1,\dots,T\\ \bm{s}_i \geq y_i \bm{1} - \bm{x}, & i=1,\dots,T\\ \w \in \mathcal{W}. \end{array} \end{array} \]
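The cumulative reformulation can also be checked numerically. The sketch evaluates \(\bar{x}_i\) through the variational expression \(\max_{y}\{iy-\sum_t (y-x_t)^+\}\), in which the positive part measures how much \(y\) exceeds each \(x_t\). It then verifies that \(\sum_i v'_i\bar{x}_i=\sum_i v_i x_{(i)}\). Only the breakpoints \(y\in\{x_t\}\) are scanned, which suffices because the function of \(y\) is concave and piecewise linear. The data and function names are hypothetical.

```python
# Minimal sketch: cumulative ordered values via the variational formula and the
# v' reformulation of the OWA objective. Data and names are hypothetical.

def owa(v, x):
    """sum_i v_i x_(i), with x_(1) <= ... <= x_(T)."""
    return sum(vi * xi for vi, xi in zip(v, sorted(x)))


def cumulative_variational(x, i):
    """max over y of i*y - sum_t (y - x_t)^+, i.e., the sum of the i smallest x_t.

    As a function of y this is concave and piecewise linear with breakpoints at
    the x_t (slope i > 0 on the far left, i - T <= 0 on the far right), so its
    maximum is attained at one of the x_t and scanning them suffices.
    """
    return max(i * y - sum(max(y - xt, 0.0) for xt in x) for y in x)


def owa_via_cumulative(v, x):
    """sum_i v'_i * xbar_i with v'_i = v_i - v_{i+1} and v'_T = v_T."""
    T = len(v)
    v_diff = [v[i] - v[i + 1] for i in range(T - 1)] + [v[-1]]
    return sum(v_diff[i] * cumulative_variational(x, i + 1) for i in range(T))


x = [0.02, -0.01, 0.03, 0.0]     # hypothetical portfolio returns
v = [0.4, 0.3, 0.2, 0.1]         # positive and decreasing weights
print([cumulative_variational(x, i) for i in range(1, 5)])
print(owa(v, x), owa_via_cumulative(v, x))
```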

If, however, the weights \(v_i\) are not positive and decreasing, the OWA objective is in general no longer concave, and the problem cannot be simplified as above.

References

Boudt, K., Cornilly, D., Holle, F. V., and Willems, J. (2020). Algorithmic portfolio tilting to harvest higher moment gains. Heliyon, 6(3).
Briec, W., Kerstens, K., and Jokung, O. (2007). Mean-variance-skewness portfolio performance gauging: A general shortage function and dual approach. Management Science, 53(1), 135–149.
Chassein, A., and Goerigk, M. (2015). Alternative formulations for the ordered weighted averaging objective. Information Processing Letters, 115, 604–608.
Jean, W. H. (1971). The extension of portfolio analysis to three or more parameters. Journal of Financial and Quantitative Analysis, 6(1), 505–515.
Jurczenko, E., Maillet, B., and Merlin, P. (2006). Hedge fund portfolio selection with higher-order moments: A nonparametric mean–variance–skewness–kurtosis efficient frontier. In E. Jurczenko and B. Maillet, editors, Multi-moment asset allocation and pricing models, pages 51–66. Wiley.
Lai, T.-Y. (1991). Portfolio selection with skewness: A multiple-objective approach. Review of Quantitative Finance and Accounting, 1(3), 293–305.
Martellini, L., and Ziemann, V. (2010). Improved estimates of higher-order comoments and implications for portfolio selection. The Review of Financial Studies, 23(4), 1467–1502.
Ogryczak, W. (2000). Multiple criteria linear programming model for portfolio selection. Annals of Operations Research, 97, 143–162.
Ogryczak, W., and Sliwinski, T. (2003). On solving linear programs with the ordered weighted averaging objective. European Journal of Operational Research, 148, 80–91.
Scott, R. C., and Horvath, P. A. (1980). On the direction of preference for moments of higher order than the variance. The Journal of Finance, 35(4), 915–919.
Wang, X., Zhou, R., Ying, J., and Palomar, D. P. (2023). Efficient and scalable parametric high-order portfolios design via the skew-t distribution. IEEE Transactions on Signal Processing, 71, 3726–3740.
Yager, R. R. (1996). Constrained OWA aggregation. Fuzzy Sets and Systems, 81(1), 89–101.
Young, W. E., and Trent, R. H. (1969). Geometric mean approximations of individual security and portfolio performance. Journal of Financial and Quantitative Analysis, 4(2), 179–199.
Zhou, R., and Palomar, D. P. (2021). Solving high-order portfolios via successive convex approximation algorithms. IEEE Transactions on Signal Processing, 69, 892–904.