7.3 Utility-based portfolios
All the previous portfolio formulations in Sections 7.1 and 7.2 are based on some judicious combination of the mean and variance of the returns \(R^\textm{portf}_t = \w^\T\bm{r}_t\) (here \(\bm{r}_t\) denotes linear returns). However, it is possible to express the interest of the investor in a more general way via utility functions.
7.3.1 Kelly criterion portfolio
In 1956, a scientist working for Bell Labs, John Larry Kelly, Jr., brought together game theory and information theory (Kelly, 1956). He showed that, in order to achieve maximum growth of wealth, a gambler should place bets that maximize the expected value of the logarithm of the capital, a rule usually referred to as the Kelly criterion. The Kelly criterion was applied to portfolio design in (Markowitz, 1959); see also (Thorp, 1971, 1997).35
Recall from (6.8) that, for a given fixed portfolio \(\w\), the returns are \(R^\textm{portf}_t = \w^\T\bm{r}_t\). From the geometric compounding of the portfolio wealth or NAV in (6.6), we can write the wealth accumulated during the periods \(t=1,\dots,T\) as \[ W_T = W_0\prod_{t=1}^T \frac{W_t}{W_{t-1}} = W_0\prod_{t=1}^T \left(1 + \w^\T\bm{r}_t\right), \] where \(W_0\) is the initial wealth and \(W_t\) the wealth at time period \(t\).
It turns out that the wealth grows exponentially as \(W_t\sim e^{t\times G}\) (Cover and Thomas, 1991), where the exponent \(G\) is the exponential rate of growth or, simply, growth rate: \[ G = \underset{T\rightarrow\infty}{\textm{lim}} \;\textm{log}\left(\frac{W_T}{W_0}\right)^{1/T}. \] The growth rate can be estimated asymptotically by the law of large numbers as \[ G = \underset{T\rightarrow\infty}{\textm{lim}} \; \frac{1}{T} \sum_{t=1}^T \textm{log}\left(1 + \w^\T\bm{r}_t\right) = \E\left[\textm{log}\left(1 + \w^\T\bm{r}\right)\right], \] where \(\bm{r}\) is a random variable with the same distribution as each \(\bm{r}_t\).
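As a small numerical illustration of the compounding formula and the growth rate, the following sketch uses synthetic Gaussian returns (the data, the weights, and the variable names are assumptions made for illustration only). It checks that the growth rate computed from the final wealth coincides with the sample average of \(\textm{log}\left(1 + \w^\T\bm{r}_t\right)\).

```python
import numpy as np

# Hypothetical data: T periods of linear returns for N = 2 assets (synthetic).
rng = np.random.default_rng(0)
T = 1000
R = rng.normal(loc=[0.001, 0.0005], scale=[0.02, 0.01], size=(T, 2))

w = np.array([0.6, 0.4])   # fixed portfolio with weights summing to one
W0 = 1.0                   # initial wealth

port_ret = R @ w                           # portfolio returns w^T r_t
wealth = W0 * np.cumprod(1.0 + port_ret)   # W_1, ..., W_T by geometric compounding

# Two equivalent ways of computing the empirical growth rate over T periods.
G_from_wealth = np.log(wealth[-1] / W0) / T
G_from_returns = np.mean(np.log1p(port_ret))
print(G_from_wealth, G_from_returns)
```

Both quantities agree up to floating-point error because the log of a product is the sum of the logs.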
Maximizing the growth rate effectively maximizes the long-term wealth and is a very compelling choice for portfolio design. We call this formulation the Kelly criterion portfolio: \[\begin{equation} \begin{array}{ll} \underset{\w}{\textm{maximize}} & \E\left[\textm{log}\left(1 + \w^\T\bm{r}\right)\right]\\ \textm{subject to} & \bm{1}^\T\w=1, \quad \w\ge\bm{0}. \end{array} \tag{7.13} \end{equation}\] Good and bad properties of the Kelly criterion are discussed in (MacLean et al., 2010).
Problem (7.13) is convex since it maximizes a concave function (the expectation of the log of an affine function of \(\w\)) over a convex set. In practice, however, we need an appropriate way to deal with the expected value in the objective function; we discuss next three options: sample averages, exponential cone programming, and mean–variance approximations.
Solving the Kelly criterion portfolio directly via sample average
In practice, the expectation in problem (7.13) can be approximated by the sample mean: \[ \E\left[\textm{log}\left(1 + \w^\T\bm{r}\right)\right] \approx \frac{1}{T} \sum_{t=1}^T \textm{log}\left(1 + \w^\T\bm{r}_t\right). \] However, finding a solver that can deal directly with the log function may be challenging.
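When no solver with native log support is at hand, a simple first-order method can still be applied to the sample-average problem. The sketch below uses exponentiated-gradient ascent on the simplex; this is a technique swapped in here for illustration, not the method of the cited references. The function names, the synthetic data, and the step size (tuned to returns of a few percent) are all assumptions.

```python
import numpy as np

def growth_rate(w, R):
    """Sample-average objective (1/T) sum_t log(1 + w^T r_t)."""
    return float(np.mean(np.log1p(R @ w)))

def kelly_sample_average(R, n_iter=5000, step=500.0):
    """Exponentiated-gradient ascent over {w >= 0, 1^T w = 1}.

    Illustrative only: the step size is an assumption suited to
    per-period returns of a few percent, not a general-purpose choice.
    """
    T, N = R.shape
    w = np.full(N, 1.0 / N)                # start from the equally weighted portfolio
    for _ in range(n_iter):
        # Gradient of the objective: (1/T) sum_t r_t / (1 + w^T r_t)
        grad = (R / (1.0 + R @ w)[:, None]).mean(axis=0)
        w = w * np.exp(step * grad)        # multiplicative update keeps w >= 0
        w /= w.sum()                       # renormalize so that 1^T w = 1
    return w

# Hypothetical synthetic returns for N = 3 assets.
rng = np.random.default_rng(0)
R = rng.normal(loc=[0.002, 0.001, 0.0005], scale=[0.03, 0.02, 0.01], size=(500, 3))
w_kelly = kelly_sample_average(R)
print(w_kelly, growth_rate(w_kelly, R))
```

The multiplicative update enforces both constraints of (7.13) by construction, so no projection step is needed.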
Solving the Kelly criterion portfolio via exponential cone programming
Interestingly, this problem can be reformulated in terms of the exponential cone (Dany Cajas, 2021b), leading to exponential cone programming for which solvers can be found.
Consider problem (7.13) with the sample average approximation and with the additional slack variables \(q_t\): \[ \begin{array}{ll} \underset{\w, \{q_t\}}{\textm{maximize}} & \frac{1}{T}\sum_{t=1}^T q_t\\ \textm{subject to} & q_t \leq \textm{log}\left(1 + \w^\T\bm{r}_t\right), \quad t=1,\dots,T\\ & \bm{1}^\T\w=1, \quad \w\ge\bm{0}. \end{array} \] The constraints \(q_t \leq \textm{log}\left(1 + \w^\T\bm{r}_t\right)\) can be equivalently written as \(\textm{exp}(q_t) \leq 1 + \w^\T\bm{r}_t\) and the Kelly portfolio can be finally written as the exponential cone program \[\begin{equation} \begin{array}{ll} \underset{\w, \{q_t\}}{\textm{maximize}} & \frac{1}{T}\sum_{t=1}^T q_t\\ \textm{subject to} & (q_t, 1, 1 + \w^\T\bm{r}_t) \in \mathcal{K}_\textm{exp}, \quad t=1,\dots,T\\ & \bm{1}^\T\w=1, \quad \w\ge\bm{0}, \end{array} \tag{7.14} \end{equation}\] where \(\mathcal{K}_\textm{exp}\) is the exponential cone (Chares, 2007) defined as \[ \mathcal{K}_{\textm{exp}} \triangleq \big\{(a,b,c) \mid c\geq b\,e^{a/b}, b>0\big\} \cup \big\{(a,b,c) \mid a\leq0, b=0, c\geq0\big\}. \]
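To make the cone constraint concrete, the following sketch implements a membership test for \(\mathcal{K}_\textm{exp}\) exactly as defined above and compares it with the original constraint \(q_t \leq \textm{log}\left(1 + \w^\T\bm{r}_t\right)\). The function names and the numbers are hypothetical; an actual exponential cone solver would handle these constraints internally.

```python
import math

def in_exp_cone(a, b, c):
    """Membership in K_exp = {c >= b*exp(a/b), b > 0} U {a <= 0, b = 0, c >= 0}."""
    if b > 0:
        return c >= b * math.exp(a / b)
    if b == 0:
        return a <= 0 and c >= 0
    return False

def log_constraint_holds(q, w, r):
    """Direct check of q <= log(1 + w^T r)."""
    x = 1.0 + sum(wi * ri for wi, ri in zip(w, r))
    return x > 0 and q <= math.log(x)

# Hypothetical portfolio and one return observation.
w = [0.5, 0.5]
r = [0.02, -0.01]
x = 1.0 + sum(wi * ri for wi, ri in zip(w, r))   # 1 + w^T r

for q in (0.004, 0.006):
    # The two tests agree: (q, 1, 1 + w^T r) in K_exp  <=>  q <= log(1 + w^T r).
    print(q, in_exp_cone(q, 1.0, x), log_constraint_holds(q, w, r))
```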
Solving the Kelly criterion portfolio via mean–variance approximations
The most common way to deal with the expectation in the objective of problem (7.13) is via a second-order Taylor approximation36 around the point \(\bm{r}=\bm{0}\), neglecting the small term \(\frac{1}{2}(\w^\T\bmu)^2\) (Markowitz, 1959): \[\begin{equation} \E\left[\textm{log}\left(1 + \w^\T\bm{r}\right)\right] \approx \w^\T\bmu - \frac{1}{2}\w^\T\bSigma\w, \tag{7.15} \end{equation}\] which is a surprising and beautiful justification for the mean–variance formulation in (7.1) (with \(\lambda=1\)), i.e., \[ \begin{array}{ll} \underset{\w}{\textm{maximize}} & \w^\T\bmu - \frac{1}{2}\w^\T\bSigma\w\\ \textm{subject to} & \bm{1}^\T\w=1, \quad \w\ge\bm{0}. \end{array} \]
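As a sketch of the steps behind (7.15), assume the expansion of the log around \(x_0=0\) is truncated after the quadratic term and use the identity \(\E\left[(\w^\T\bm{r})^2\right] = \w^\T\bSigma\w + (\w^\T\bmu)^2\): \[ \begin{aligned} \E\left[\textm{log}\left(1 + \w^\T\bm{r}\right)\right] &\approx \E\left[\w^\T\bm{r}\right] - \frac{1}{2}\E\left[(\w^\T\bm{r})^2\right]\\ &= \w^\T\bmu - \frac{1}{2}\left(\w^\T\bSigma\w + (\w^\T\bmu)^2\right)\\ &\approx \w^\T\bmu - \frac{1}{2}\w^\T\bSigma\w, \end{aligned} \] where the last step drops \((\w^\T\bmu)^2\). This is reasonable when the squared mean return is much smaller than the variance, which is an assumption about the data (typical of short-horizon returns) rather than a general fact.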
It is possible to find better Taylor approximations than (7.15) around the point \(\bm{r}=\bmu\), such as (Markowitz, 1959) \[\begin{equation} \E\left[\textm{log}\left(1 + \w^\T\bm{r}\right)\right] \approx \textm{log}\left(1 + \w^\T\bmu\right) - \frac{1}{2}\frac{\w^\T\bSigma\w}{\left(1+\w^\T\bmu\right)^2} \tag{7.16} \end{equation}\] or, further approximated, \[\begin{equation} \E\left[\textm{log}\left(1 + \w^\T\bm{r}\right)\right] \approx \w^\T\bmu - \frac{1}{2}(\w^\T\bmu)^2 - \frac{1}{2}\frac{\w^\T\bSigma\w}{1+2\w^\T\bmu}. \tag{7.17} \end{equation}\]
One can also try an approximation over an interval (as opposed to the Taylor approximations, which focus on a single point) such as the Levy-Markowitz approximation (Levy and Markowitz, 1979): \[\begin{equation} \begin{aligned}[b] \E\left[\textm{log}\left(1 + \w^\T\bm{r}\right)\right] &\approx \frac{1}{2\kappa^2}\textm{log}\left(\left(1+\w^\T\bmu\right)^2 - \kappa^2 \w^\T\bSigma\w\right)\\ &\qquad + \left(1-\frac{1}{\kappa^2}\right)\textm{log}\left(1+\w^\T\bmu\right), \end{aligned} \tag{7.18} \end{equation}\] where \(\kappa\) measures the width of the approximating interval in standard deviations.
However, these other approximations may not bring any benefit in practice (Pulley, 1983) due to their nonconvexity and the fact that the parameters \(\bmu\) and \(\bSigma\) contain estimation errors orders of magnitude larger than the potential benefit of these refined approximations. See (Markowitz, 2014) for a historical perspective on mean–variance approximations. Chapter 9 explores higher order moments, i.e., skewness and kurtosis, to obtain better approximations for portfolio design.
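To get a feel for how close approximations (7.15)–(7.18) are to the sample average, they can be evaluated side by side. The sketch below does so on synthetic Gaussian returns; the data, the portfolio, the choice \(\kappa=1\), and the function name are assumptions for illustration, so the printed errors say nothing about real markets.

```python
import numpy as np

def kelly_approximations(w, mu, Sigma, kappa=1.0):
    """Evaluate the mean-variance approximations (7.15)-(7.18) of E[log(1 + w^T r)]."""
    m = w @ mu            # portfolio mean w^T mu
    v = w @ Sigma @ w     # portfolio variance w^T Sigma w
    return {
        "(7.15)": m - 0.5 * v,
        "(7.16)": np.log1p(m) - 0.5 * v / (1 + m) ** 2,
        "(7.17)": m - 0.5 * m**2 - 0.5 * v / (1 + 2 * m),
        "(7.18)": np.log((1 + m) ** 2 - kappa**2 * v) / (2 * kappa**2)
        + (1 - 1 / kappa**2) * np.log1p(m),
    }

# Hypothetical synthetic returns for N = 3 assets.
rng = np.random.default_rng(42)
R = rng.normal(loc=[0.001, 0.0005, 0.0002], scale=[0.02, 0.015, 0.01], size=(5000, 3))
mu = R.mean(axis=0)
Sigma = np.cov(R, rowvar=False, bias=True)
w = np.array([0.5, 0.3, 0.2])

sample_avg = np.mean(np.log1p(R @ w))   # reference value from the sample average
approx = kelly_approximations(w, mu, Sigma)
for name, value in approx.items():
    print(f"{name}: {value:.6e}  (error {value - sample_avg:+.2e})")
```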
7.3.2 Expected utility theory
The model of rational decision-making in most of economics and statistics is expected utility theory, which was axiomatized by von Neumann and Morgenstern (Neumann and Morgenstern, 1944) and Savage (Savage, 1954).
In the context of portfolio design, the utility \(U(\cdot)\) is a function of the random portfolio return \(\w^\T\bm{r}\) and the objective is the maximization of the expected utility: \[\begin{equation} \begin{array}{ll} \underset{\w}{\textm{maximize}} & \E\left[U(\w^\T\bm{r})\right]\\ \textm{subject to} & \bm{1}^\T\w=1, \quad \w\ge\bm{0}. \end{array} \tag{7.19} \end{equation}\]
The Kelly criterion portfolio in (7.13) is a particular case of the expected utility maximization in (7.19) by choosing the log utility \(U(x) = \textm{log}\left(1 + x\right)\). Some (concave) utilities of interest include:
- \(U(x) = \textm{log}\left(1 + x\right)\);
- \(U(x) = \sqrt{1 + x}\);
- \(U(x) = -\dfrac{1}{x}\);
- \(U(x) = -p\dfrac{1}{x^p}\) for \(p>0\) (as \(p\rightarrow0\) it converges to the log utility);
- \(U(x) = -\dfrac{1}{\sqrt{1 + x}}\); and
- \(U(x) = 1 - \textm{exp}(-\lambda x)\), with \(\lambda>0\) being the risk aversion parameter.
However, this general expected utility theory, while useful in theory, is an elusive concept that may be of little practical help when faced with investment decisions, as opposed to the statistically sound Kelly criterion. As stated in (Roy, 1952):
“In calling in a utility function to our aid, an appearance of generality is achieved at the cost of a loss of practical significance, and applicability in our results. A man who seeks advice about his actions will not be grateful for the suggestion that he maximise expected utility.”
Problem (7.19) is convex as long as the utility \(U(\cdot)\) is a concave function. Similarly to the Kelly criterion portfolio, the expected utility portfolio formulation can be numerically solved in a direct way or via mean–variance approximations as described next.
Solving the expected utility portfolio directly via sample average
In practice, problem (7.19) can be approximated by directly replacing the expectation in the objective function by the sample mean: \[ \E\left[U(\w^\T\bm{r})\right] \approx \frac{1}{T} \sum_{t=1}^T U\left(\w^\T\bm{r}_t\right). \] However, finding a solver that can deal directly with the utility function \(U(\cdot)\) may be challenging, even if it is a concave function.
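For a very small universe, the sample-average objective can even be maximized by brute force. The sketch below evaluates three of the utilities listed earlier on synthetic returns for two assets and searches a grid of long-only, fully invested portfolios. The data, the grid resolution, the value of \(\lambda\), and all names are assumptions; a grid search is used only because it is transparent, not because it scales.

```python
import numpy as np

# Three of the concave utilities listed above (lam is the risk-aversion parameter).
utilities = {
    "log": np.log1p,
    "sqrt": lambda x: np.sqrt(1.0 + x),
    "exponential": lambda x, lam=5.0: 1.0 - np.exp(-lam * x),
}

def expected_utility(w, R, U):
    """Sample-average estimate (1/T) sum_t U(w^T r_t)."""
    return float(np.mean(U(R @ w)))

# Hypothetical synthetic returns for N = 2 assets.
rng = np.random.default_rng(1)
R = rng.normal(loc=[0.01, 0.003], scale=[0.06, 0.02], size=(2000, 2))

# Brute-force search over portfolios w = (a, 1 - a) with a in [0, 1].
grid = np.linspace(0.0, 1.0, 101)
best = {}
for name, U in utilities.items():
    values = [expected_utility(np.array([a, 1.0 - a]), R, U) for a in grid]
    a_star = grid[int(np.argmax(values))]
    best[name] = np.array([a_star, 1.0 - a_star])
    print(name, best[name])
```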
Solving the expected utility portfolio via mean–variance approximations
Similarly to the Kelly criterion portfolio, the most common way to deal with the expectation in the objective of problem (7.19) is via mean–variance approximations similar to that in (7.15) (see (Markowitz, 2014)).
We now consider some Taylor approximations around specific points37 (Markowitz, 1959) and the Levy-Markowitz approximation along an interval (more specifically on three points of the interval) (Levy and Markowitz, 1979).
The second-order Taylor approximation around the point \(\bm{r}=\bm{0}\) is \[\begin{equation} \begin{aligned}[b] \E\left[U(\w^\T\bm{r})\right] & \approx U(0) + U'(0)\;\E\left[\w^\T\bm{r}\right] + \frac{1}{2}U''(0)\;\E\left[(\w^\T\bm{r})^2\right]\\ & = U(0) + U'(0)\;\w^\T\bmu + \frac{1}{2}U''(0)(\w^\T\bSigma\w + (\w^\T\bmu)^2), \end{aligned} \tag{7.20} \end{equation}\] whereas the approximation around the point \(\bm{r}=\bmu\) reads \[\begin{equation} \begin{aligned}[b] \E\left[U(\w^\T\bm{r})\right] & \approx U(\w^\T\bmu) + U'(\w^\T\bmu)\E\left[\w^\T(\bm{r} - \bmu)\right] + \frac{1}{2}U''(\w^\T\bmu)\E\left[(\w^\T(\bm{r} - \bmu))^2\right]\\ & = U(\w^\T\bmu) + \frac{1}{2}U''(\w^\T\bmu)\w^\T\bSigma\w. \end{aligned} \tag{7.21} \end{equation}\]
An approximation that fits the utility \(U(\cdot)\) simultaneously on three points, namely, the mean \(\w^\T\bmu\) (like the previous Taylor approximation) and the two points \(\w^\T\bmu \pm \kappa\sqrt{\w^\T\bSigma\w}\), is (Levy and Markowitz, 1979)38 \[\begin{equation} \begin{aligned}[b] \E\left[U(\w^\T\bm{r})\right] &\approx U(\w^\T\bmu)\\ &+ \frac{U\left(\w^\T\bmu + \kappa\sqrt{\w^\T\bSigma\w}\right) + U\left(\w^\T\bmu - \kappa\sqrt{\w^\T\bSigma\w}\right) - 2U\left(\w^\T\bmu\right)}{2\kappa^2}. \end{aligned} \tag{7.22} \end{equation}\]
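The three approximations (7.20)–(7.22) depend on the portfolio only through its mean \(m = \w^\T\bmu\) and variance \(v = \w^\T\bSigma\w\), so they are easy to code for a generic utility. The following sketch (function names and numbers are hypothetical) implements them and checks that, for the log utility, (7.21) and (7.22) reduce to (7.16) and (7.18), respectively.

```python
import math

def taylor_at_zero(U, dU0, d2U0, m, v):
    """(7.20): expansion around r = 0, with dU0 = U'(0) and d2U0 = U''(0)."""
    return U(0.0) + dU0 * m + 0.5 * d2U0 * (v + m * m)

def taylor_at_mean(U, d2U, m, v):
    """(7.21): expansion around r = mu, with d2U the second derivative U''."""
    return U(m) + 0.5 * d2U(m) * v

def levy_markowitz(U, m, v, kappa=1.0):
    """(7.22): three-point fit at m and m +/- kappa * sqrt(v)."""
    s = kappa * math.sqrt(v)
    return U(m) + (U(m + s) + U(m - s) - 2.0 * U(m)) / (2.0 * kappa**2)

# Log utility U(x) = log(1 + x) and its second derivative.
U = math.log1p
d2U = lambda x: -1.0 / (1.0 + x) ** 2

m, v, kappa = 0.01, 0.0004, 2.0   # hypothetical portfolio mean, variance, and kappa

eq_7_16 = math.log1p(m) - 0.5 * v / (1.0 + m) ** 2
eq_7_18 = (math.log((1.0 + m) ** 2 - kappa**2 * v) / (2.0 * kappa**2)
           + (1.0 - 1.0 / kappa**2) * math.log1p(m))

print(taylor_at_zero(U, 1.0, -1.0, m, v))              # second-order expansion at zero
print(taylor_at_mean(U, d2U, m, v), eq_7_16)           # the two values coincide
print(levy_markowitz(U, m, v, kappa), eq_7_18)         # the two values coincide
```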
Many empirical analyses have concluded that these mean–variance approximations perform well in practice for real data and that the differences among them are negligible (Levy and Markowitz, 1979; Markowitz, 1959, 2014; Pulley, 1983). Chapter 9 explores higher order moments, i.e., skewness and kurtosis, in these approximations for portfolio design.
References
35. Edward O. Thorp is an American math professor, author, and blackjack player who wrote Beat the Dealer (Thorp, 1962), which became a classic and was the first book to prove mathematically that the house advantage in blackjack could be overcome by card counting.
36. The Taylor expansion of the log function around the point \(x=x_0\) is \[ \textm{log}(1 + x) = \textm{log}(1 + x_0) + \frac{1}{1 + x_0}(x - x_0) + \frac{1}{2}\frac{-1}{(1 + x_0)^2}(x - x_0)^2 + \cdots \]
37. The Taylor expansion of a utility function \(U(\cdot)\) around the point \(x=x_0\) is \[ U(x) = U(x_0) + U'(x_0)(x - x_0) + \frac{1}{2}U''(x_0)(x - x_0)^2 + \cdots \]
38. The Levy–Markowitz approximation of an expected utility over an interval spanning \(\kappa\) standard deviations on each side of the mean is \[ \E\left[U(X)\right] \approx U(\mu) + \frac{1}{2\kappa^2}\left[U(\mu+\kappa\sigma) + U(\mu-\kappa\sigma) - 2U(\mu) \right], \] where \(X\) denotes a random variable (with mean \(\mu\) and standard deviation \(\sigma\)) and \(U(\cdot)\) is the utility function.