## 14.2 Robust portfolio optimization

After a brief introduction to robust optimization, we will illustrate its application to portfolio design based on the mean–variance formulation, although the techniques can be applied to other portfolio formulations.

### 14.2.1 Robust optimization

Consider a general mathematical optimization problem with optimization variable \(\bm{x}\) making explicit the dependency of the functions on the parameter \(\bm{\theta}\): \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}[t]{l} f_{0}(\bm{x}; \bm{\theta}) \end{array}\\ \textm{subject to} & \begin{array}[t]{l} f_{i}(\bm{x}; \bm{\theta})\leq0,\\ h_{i}(\bm{x}; \bm{\theta})=0, \end{array} \quad \begin{array}[t]{l} i=1,\ldots,m\\ i=1,\ldots,p, \end{array} \end{array} \]

where \(f_{0}\) is the objective function, \(f_{i},\; i=1,\ldots,m,\) are the inequality constraint functions, and \(h_{i},\; i=1,\ldots,p,\) are the equality constraint functions. The parameter \(\bm{\theta}\) is something given externally and not to be optimized. For example, in the case of the mean–variance portfolio, we have \(\bm{\theta}=(\bmu, \bSigma)\). We denote a solution to this problem by \(\bm{x}^\star(\bm{\theta})\), where the dependency on \(\bm{\theta}\) is again made explicit.

In practice, however, \(\bm{\theta}\) is unknown and has to be estimated as \(\hat{\bm{\theta}}\). The problem, of course, is that the solution obtained by solving the optimization problem using the estimated \(\hat{\bm{\theta}}\), denoted by \(\bm{x}^\star(\hat{\bm{\theta}})\), differs from the desired one: \(\bm{x}^\star(\hat{\bm{\theta}}) \neq \bm{x}^\star(\bm{\theta})\). The question is whether it is still approximately equal, \(\bm{x}^\star(\hat{\bm{\theta}}) \approx \bm{x}^\star(\bm{\theta})\), or totally different. The answer depends on each particular problem formulation; for the mean–variance portfolio formulation it turns out to be quite different (as illustrated in Figure 14.1).

There are several ways to make the problem robust to parameters errors, mainly:

*Stochastic optimization*relies on a probabilistic modeling of the parameter (Birge and Louveaux, 2011; Prekopa, 1995; Ruszczynski and Shapiro, 2003) and may include:- expectations: average behavior;
- chance constraints: probabilistic constraints.

*Worst-case robust optimization*relies on the definition of an uncertainty set for the parameter (Ben-Tal et al., 2009; Ben-Tal and Nemirovski, 2008; D. Bertsimas and Caramanis, 2011; Lobo, 2000).

#### Stochastic optimization

In stochastic robust optimization, the estimated parameter \(\hat{\bm{\theta}}\) is modeled as a random variable that fluctuates around its true value \(\bm{\theta}\). For convenience, instead, the true value is modeled as a random variable around the estimated value (with an additive noise) as \[ \bm{\theta} = \hat{\bm{\theta}} + \bm{\delta}, \] where \(\bm{\delta}\) is the estimation error modeled as a zero-mean random variable with some distribution such as Gaussian. In practice, it is critical to choose properly the covariance matrix (or shape) of the error term.

Then, rather than using a naive and wrong constraint of the form \[ f\big(\bm{x}; \hat{\bm{\theta}}\big) \le \alpha, \] we can use the “average constraint” \[ \E_{\bm{\theta}}\left[f(\bm{x}; \bm{\theta})\right] \le \alpha, \] where the expectation is with respect to the random variable \(\bm{\theta}\). The interpretation is that the constraint will be satisfied on average, which is a relaxation of the true constraint.

It is important to point out that this stochastic average constraint preserves convexity. That is, if \(f(\cdot; \bm{\theta})\) is convex for each \(\bm{\theta}\), then its expected value over \(\bm{\theta}\) is also convex (sum of convex functions preserves convexity, see Appendix A for details).

In practice, the expected value can be implemented in different ways; for example:

*brute-force sampling*: simply sample \(S\) times the random variable \(\bm{\theta}\) and use the constraint \[ \frac{1}{S}\sum_{i=1}^S f(\bm{x}; \bm{\theta}_i) \le \alpha; \]*adaptive sampling*: sample the random variable \(\bm{\theta}\) in a smart and efficient way at each iteration while the problem is being solved;*closed-form expression*: compute the expected value in closed form whenever possible (see Example 14.1);*stochastic programming*: a wide variety of numerical methods have been developed for stochastic optimization under the umbrella of stochastic programming (Birge and Louveaux, 2011; Prekopa, 1995; Ruszczynski and Shapiro, 2003).

**Example 14.1 (Stochastic average constraint in closed form) **Suppose we have the quadratic constraint \(f(\bm{x}; \bm{\theta}) = (\bm{c}^\T\bm{x})^2\), where the parameter is \(\bm{\theta} = \bm{c}\), modeled as \(\bm{c} = \hat{\bm{c}} + \bm{\delta}\), where \(\bm{\delta}\) is zero-mean with covariance matrix \(\bm{Q}\). Then, the expected value is
\[
\begin{aligned}
\E_{\bm{\theta}}\left[f(\bm{x}; \bm{\theta})\right]
&= \E_{\bm{\delta}}\left[\left((\hat{\bm{c}} + \bm{\delta})^\T\bm{x}\right)^2\right]\\
&= \E_{\bm{\delta}}\left[(\hat{\bm{c}}^\T\bm{x})^2 + \bm{x}^\T \bm{\delta} \bm{\delta}^\T \bm{x}\right]\\
&= (\hat{\bm{c}}^\T\bm{x})^2 + \bm{x}^\T \bm{Q} \bm{x},
\end{aligned}
\]
which, interestingly, has the form of the naive version plus a quadratic regularization term.

The problem with expectations and satisfying constraints on average is that there is no control about any specific realization of the estimation error. This relaxed control may not be acceptable in many applications as the constraint will be violated in many instances. The worst-case approach (considered later) tries to deal with this issue, but it may be too conservative. Chance constraints attempt to achieve a compromise between being too relaxed or overly conservative. They are also based on a statistical modeling of the estimation error, but instead of focusing on the average they focus on satisfying the constraint with some high probability (say, 95% of the cases). In this approach, the naive constraint \(f(\bm{x}; \hat{\bm{\theta}}) \le \alpha\) is replaced with \[ \textm{Pr}\left[ f(\bm{x}; \bm{\theta}) \le \alpha \right] \ge \epsilon, \] where \(\epsilon\) is the confidence level, say, \(\epsilon=0.95\) for 95%. Unfortunately, chance or probabilistic constraints are generally very hard to deal with and one typically has to resort to approximations (Ben-Tal et al., 2009; Ben-Tal and Nemirovski, 2008).

#### Worst-case robust optimization

In worst-case robust optimization, the parameter \(\bm{\theta}\) is not characterized statistically. Instead, it is assumed to lie within an uncertainty region near the estimated value: \[ \bm{\theta} \in \mathcal{U}_{\bm{\theta}}, \] where \(\mathcal{U}_{\bm{\theta}}\) is the uncertainty set centered at \(\hat{\bm{\theta}}\).

In practice, it is critical to choose properly the shape and size of the uncertainty set. Typical choices for the shape for a size of \(\epsilon\) are:

*spherical set*: \[\mathcal{U}_{\bm{\theta}} = \left\{\bm{\theta} \mid \|\bm{\theta} - \hat{\bm{\theta}}\|_2 \le \epsilon\right\};\]*box set*: \[\mathcal{U}_{\bm{\theta}} = \left\{\bm{\theta} \mid \|\bm{\theta} - \hat{\bm{\theta}}\|_\infty \le \epsilon\right\};\]*ellipsoidal set*: \[\mathcal{U}_{\bm{\theta}} = \left\{\bm{\theta} \mid \big(\bm{\theta} - \hat{\bm{\theta}}\big)^\T\bm{S}^{-1}\big(\bm{\theta} - \hat{\bm{\theta}}\big) \le \epsilon^2\right\},\] where \(\bm{S}\succ\bm{0}\) defines the shape of the ellipsoid.

Then, rather than using a naive and wrong constraint of the form \[ f\big(\bm{x}; \hat{\bm{\theta}}\big) \le \alpha, \] we can use the worst-case constraint \[ \underset{\bm{\theta} \in \mathcal{U}_{\hat{\bm{\theta}}}}{\textm{sup}} \; f(\bm{x}; \bm{\theta}) \le \alpha. \] The interpretation is that the constraint will be satisfied for any point inside the uncertainty set, which is a conservative approach.

It is important to point out that this worst-case constraint preserves convexity. That is, if \(f(\cdot; \bm{\theta})\) is convex for each \(\bm{\theta}\), then its worst-case over \(\bm{\theta}\) is also convex (pointwise maximum of convex functions preserves convexity, see Appendix A for details).

In practice, the expected value can be implemented in different ways; for example:

*brute-force sampling*: simply sample \(S\) times the uncertainty set \(\mathcal{U}_{\bm{\theta}}\) and use the constraint \[ \underset{i=1,\dots,S}{\textm{max}} \; f(\bm{x}; \bm{\theta}_i) \le \alpha \] or, equivalently, including \(S\) constraints \[ f(\bm{x}; \bm{\theta}_i) \le \alpha, \qquad i=1,\dots,S; \]*adaptive sampling algorithms*: sample the uncertainty set \(\mathcal{U}_{\hat{\bm{\theta}}}\) in a smart and efficient way at each iteration while the problem is being solved;*closed-form expression*: compute the supremum in closed form whenever possible (see Example 14.2);*via Lagrange duality*: in some cases, it is possible to rewrite the worst-case supremum as an infimum that can later be combined with the outer portfolio optimization layer;*saddle-point optimization*: as a last resort, since the worst-case design is expressed in the form of a min-max (minimax) formulation, numerical methods specifically designed for minimax problems or related saddle-point problems can be used (Bertsekas, 1999; Tütüncü and Koenig, 2004).

**Example 14.2 (Worst-case constraint in closed form) **Suppose we have the quadratic constraint \(f(\bm{x}; \bm{\theta}) = (\bm{c}^\T\bm{x})^2\), where the parameter is \(\bm{\theta} = \bm{c}\) and belongs to a spherical uncertainty set
\[\mathcal{U} = \left\{\bm{c} \mid \|\bm{c} - \hat{\bm{c}}\|_2 \le \epsilon\right\}.\]
Then, the worst-case value is
\[
\begin{aligned}
\underset{\bm{c} \in \mathcal{U}}{\textm{sup}} \; |\bm{c}^\T\bm{x}|
&= \underset{\|\bm{\delta}\| \le \epsilon}{\textm{sup}} \; |(\hat{\bm{c}} + \bm{\delta})^\T\bm{x}| \\
&= |\hat{\bm{c}}^\T\bm{x}| + \underset{\|\bm{\delta}\| \le \epsilon}{\textm{sup}} \; |\bm{\delta}^\T\bm{x}| \\
&= |\hat{\bm{c}}^\T\bm{x}| + \epsilon \|\bm{x}\|_2,
\end{aligned}
\]
where we have used the triangle inequality and Cauchy-Schwarz’s inequality (with upper bound achieved by \(\bm{\delta} = \epsilon \; \bm{x}/\|\bm{x}\|_2\)). Again, this expression has the form of the naive version plus a regularization term.

As a final note, it is worth mentioning that worst-case uncertainty philosophy can be applied to probability distributions, refered to as distributional uncertainty models or distributionally robust optimization (D. Bertsimas and Caramanis, 2011).

### 14.2.2 Robust worst-case portfolios

For portfolio design, we will focus on the worst-case robust optimization philosophy, due to its simplicity and convenience (Goldfarb and Iyengar, 2003; Lobo, 2000; Tütüncü and Koenig, 2004); see also textbooks such as (Cornuejols and Tütüncü, 2006, Chapter 20) and (Fabozzi, 2007, Chapter 12).

#### Worst-case mean vector \(\bmu\)

To study the uncertainty in \(\bmu\), we consider the global maximum return portfolio (GMRP) for an estimated mean vector \(\hat{\bmu}\) (see Chapter 6 for details): \[ \begin{array}{ll} \underset{\w}{\textm{maximize}} & \w^\T\hat{\bmu}\\ \textm{subject to} & \bm{1}^\T\w=1, \quad \w\ge\bm{0}, \end{array} \] which is is known to be terribly sensitive to estimation errors.

We will instead use the worst-case formulation: \[ \begin{array}{ll} \underset{\w}{\textm{maximize}} & \textm{inf}_{\bmu \in \mathcal{U}_{\bmu}} \; \w^\T\bmu\\ \textm{subject to} & \bm{1}^\T\w=1, \quad \w\ge\bm{0}, \end{array} \] where typical choices for the uncertainty region \(\mathcal{U}_{\bmu}\) are the ellipsoidal and box sets considered next.

**Lemma 14.1 (Worst-case mean vector under ellipsoidal uncertainty set) **Consider the ellipsoidal uncertainty set for \(\bmu\):
\[
\mathcal{U}_{\bmu} = \left\{\bmu = \hat{\bmu} + \kappa \bm{S}^{1/2}\bm{u} \mid \|\bm{u}\|_2 \le 1\right\},
\]
where \(\bm{S}^{1/2}\) is the symmetric square-root matrix of the shape \(\bm{S}\) (a typical choice is the covariance matrix scaled with the number of observations \(\bm{S}=(1/T)\bSigma\)) and \(\kappa\) determines the size of the ellipsoid. Then, the worst-case value of \(\w^\T\bmu\) is
\[
\begin{aligned}
\underset{\bmu \in \mathcal{U}_{\bmu}}{\textm{inf}} \; \w^\T\bmu
&= \underset{\|\bm{u}\| \le 1}{\textm{inf}} \; \w^\T( \hat{\bmu} + \kappa \bm{S}^{1/2}\bm{u}) \\
&= \w^\T\hat{\bmu} + \kappa\; \underset{\|\bm{u}\| \le 1}{\textm{inf}} \; \w^\T\bm{S}^{1/2}\bm{u} \\
&= \w^\T\hat{\bmu} - \kappa\; \|\bm{S}^{1/2}\w\|_2,
\end{aligned}
\]
where the last equality follows from Cauchy-Schwarz’s inequality with \(\bm{u} = -\bm{S}^{1/2}\w/\|\bm{S}^{1/2}\w\|_2\).

**Lemma 14.2 (Worst-case mean vector under box uncertainty set) **Consider the box uncertainty set for \(\bmu\):
\[
\mathcal{U}_{\bmu} = \left\{\bmu \mid -\bm{\delta} \le \bmu - \hat{\bmu} \le \bm{\delta}\right\},
\]
where \(\bm{\delta}\) is the half-width of the box in all dimensions. Then, the worst-case value of \(\w^\T\bmu\) is
\[
\begin{aligned}
\underset{\bmu \in \mathcal{U}_{\bmu}}{\textm{inf}} \; \w^\T\bmu
&= \w^\T\hat{\bmu} - |\w|^\T\bm{\delta},
\end{aligned}
\]
where \(|\w|\) denotes the elementwise absolute value of \(\w\).

Thus, under an ellipsoidal uncertainty set as in Lemma 14.1, the robust version of the GMRP is \[ \begin{array}{ll} \underset{\w}{\textm{maximize}} & \w^\T\hat{\bmu} - \kappa\; \|\bm{S}^{1/2}\w\|_2\\ \textm{subject to} & \bm{1}^\T\w=1, \quad \w\ge\bm{0}, \end{array} \] which is still a convex problem but now the problem complexity has increased to that of a second-order cone program (from a simple linear program in the naive formulation).

Similarly, under a box uncertainty set as in Lemma 14.2, the robust version of the GMRP is \[ \begin{array}{ll} \underset{\w}{\textm{maximize}} & \w^\T\hat{\bmu} - |\w|^\T\bm{\delta}\\ \textm{subject to} & \bm{1}^\T\w=1, \quad \w\ge\bm{0}, \end{array} \] which is still a convex problem and can be rewritten as a linear program after getting rid of the absolute value (see Appendix A for details). In fact, under the constraints \(\bm{1}^\T\w=1\) and \(\w\ge\bm{0}\), the problem can be reduced to a naive GMRP where \(\hat{\bmu}\) is replaced by \(\hat{\bmu} - \bm{\delta}\).

There are other uncertainty sets that can also be considered, such as the \(\ell_1\)-norm ball. The following example illustates a rather interesting case of how the otherwise heuristic quintile portfolio (see Section 6.4.4 in Chapter 6 for details) can be formally derived as a robust portfolio.

**Example 14.3 (Quintile portfolio as a robust portfolio) **The quintile portfolio is a heuristic portfolio widely used by practitioners (see Section 6.4.4 in Chapter 6 for details). It selects the top fifth of the assets (one could also select a different fraction of the assets) over which the capital is equally allocated. This is a common-sense heuristic portfolio widely used in practice. Interestingly, it can be shown to be the optimal solution to the worst-case GMRP with an \(\ell_1\)-norm ball uncertainty set around the estimated mean vector (Zhou and Palomar, 2020):
\[
\mathcal{U}_{\bmu} = \left\{\hat{\bmu} + \bm{e} \mid \|\bm{e}\|_1 \leq \epsilon\right\}.
\]

#### Worst-case covariance matrix \(\bSigma\)

To study the uncertainty in \(\bSigma\), we consider the global minimum variance portfolio (GMVP) for an estimated mean vector \(\hat{\bSigma}\) (see Chapter 6 for details): \[ \begin{array}{ll} \underset{\w}{\textm{minimize}} & \w^\T\hat{\bSigma}\w\\ \textm{subject to} & \bm{1}^\T\w=1, \quad \w\ge\bm{0}, \end{array} \] which is sensitive to estimation errors (albeit not as much as the previous sensitivity to errors in \(\bmu\)).

We will instead use the worst-case formulation: \[ \begin{array}{ll} \underset{\w}{\textm{minimize}} & \textm{sup}_{\bSigma \in \mathcal{U}_{\bSigma}} \; \w^\T\bSigma\w\\ \textm{subject to} & \bm{1}^\T\w=1, \quad \w\ge\bm{0}, \end{array} \] where typical choices for the uncertainty region \(\mathcal{U}_{\bSigma}\) are the spherical, ellipsoidal, and box sets considered next.

**Lemma 14.3 (Worst-case covariance matrix under a data spherical uncertainty set) **Consider the spherical uncertainty set for the data matrix \(\bm{X}\in\R^{T\times N}\) containing \(T\) observations of the \(N\) assets:
\[
\mathcal{U}_{\bm{X}} = \left\{\bm{X} \mid \|\bm{X} -\hat{\bm{X}}\|_\textm{F} \le \epsilon\right\},
\]
where \(\epsilon\) determines the size of the sphere. Then, the worst-case value of \(\sqrt{\w^\T\bSigma\w}\) under a sample covariance matrix estimation \(\hat{\bSigma} = \frac{1}{T}\hat{\bm{X}}^\T\hat{\bm{X}}\) is
\[
\begin{aligned}
\underset{\bm{X} \in \mathcal{U}_{\bm{X}}}{\textm{sup}} \; \sqrt{\w^\T\left(\frac{1}{T}\bm{X}^\T\bm{X}\right)\w}
&= \underset{\bm{X} \in \mathcal{U}_{\bm{X}}}{\textm{sup}} \; \frac{1}{\sqrt{T}} \|\bm{X}\w\|_2\\
&= \underset{\|\bm{\Delta}\|_\textm{F} \le \epsilon}{\textm{sup}} \; \frac{1}{\sqrt{T}} \left\|(\hat{\bm{X}} + \bm{\Delta})\w\right\|_2\\
&= \frac{1}{\sqrt{T}} \left\|\hat{\bm{X}}\w\right\|_2 +
\underset{\|\bm{\Delta}\|_\textm{F} \le \epsilon}{\textm{sup}} \; \frac{1}{\sqrt{T}} \left\|\bm{\Delta}\w\right\|_2\\
&= \frac{1}{\sqrt{T}} \left\|\hat{\bm{X}}\w\right\|_2 + \frac{1}{\sqrt{T}} \epsilon \|\w\|_2,
\end{aligned}
\]
where the third equality follows from the triangle inequality and it is achieved when \(\hat{\bm{X}}\w\) and \(\bm{\Delta}\w\) are aligned.

**Lemma 14.4 (Worst-case covariance matrix under an ellipsoidal uncertainty set) **Consider the ellipsoidal uncertainty set for \(\bSigma\):
\[
\mathcal{U}_{\bSigma} = \left\{\bSigma \succeq \bm{0} \mid \big(\textm{vec}(\bSigma) - \textm{vec}(\hat{\bSigma})\big)^\T \bm{S}^{-1} \big(\textm{vec}(\bSigma) - \textm{vec}(\hat{\bSigma})\big) \le \epsilon^2\right\},
\]
where \(\textm{vec}(\cdot)\) denotes the vec operator that stacks the matrix argument into a vector, matrix \(\bm{S}\) gives the shape of the ellipsoid, and \(\epsilon\) determines the size. Then, the worst-case value of \(\w^\T\bSigma\w\), given by the convex semidefinite problem
\[
\begin{array}{ll}
\underset{\bSigma}{\textm{maximize}} & \w^\T\bSigma\w\\
\textm{subject to}
& \big(\textm{vec}(\bSigma) - \textm{vec}(\hat{\bSigma})\big)^\T \bm{S}^{-1} \big(\textm{vec}(\bSigma) - \textm{vec}(\hat{\bSigma})\big) \le \epsilon^2\\
& \bSigma \succeq \bm{0},
\end{array}
\]
can be rewritten as the Lagrange dual problem:
\[
\begin{array}{ll}
\underset{\bm{Z}}{\textm{minimize}} &
\textm{Tr}\left(\hat{\bSigma}\left(\w\w^\T+\bm{Z}\right)\right) + \epsilon\left\| \bm{S}^{1/2}\left(\textm{vec}(\w\w^\T)+\textm{vec}(\bm{Z})\right)\right\|_{2}\\
\textm{subject to}
& \bm{Z} \succeq \bm{0},
\end{array}
\]
which is a convex semidefinite problem.

**Lemma 14.5 (Worst-case covariance matrix under a box uncertainty set) **Consider the box uncertainty set for \(\bSigma\):
\[
\mathcal{U}_{\bSigma} = \left\{\bSigma \succeq \bm{0} \mid \underline{\bSigma} \le \bSigma \le \overline{\bSigma}\right\},
\]
where \(\underline{\bSigma}\) and \(\overline{\bSigma}\) denote the elementwise lower- and upper-bounds, respectively. Then, the worst-case value of \(\w^\T\bSigma\w\), given by the convex semidefinite problem
\[
\begin{array}{ll}
\underset{\bSigma}{\textm{maximize}} & \w^\T\bSigma\w\\
\textm{subject to}
& \underline{\bSigma} \le \bSigma \le \overline{\bSigma}\\
& \bSigma \succeq \bm{0},
\end{array}
\]
can be rewritten as the Lagrange dual problem (Lobo, 2000):
\[
\begin{array}{ll}
\underset{\overline{\bm{\Lambda}},\,\underline{\bm{\Lambda}}}{\textm{minimize}} &
\textm{Tr}\big(\overline{\bm{\Lambda}}\,\overline{\bSigma}\big) - \textm{Tr}\big(\underline{\bm{\Lambda}}\,\underline{\bSigma}\big)\\
\textm{subject to}
& \begin{bmatrix}
\overline{\bm{\Lambda}} - \underline{\bm{\Lambda}} & \w\\
\w^\T & 1
\end{bmatrix} \succeq \bm{0}\\
& \overline{\bm{\Lambda}}\geq\bm{0},\quad\underline{\bm{\Lambda}}\geq\bm{0},
\end{array}
\]
which is a convex semidefinite problem.

Thus, under a spherical uncertainty set for the data matrix as in Lemma 14.3, the robust version of the GMVP is \[ \begin{array}{ll} \underset{\w}{\textm{minimize}} & \left(\left\|\hat{\bm{X}}\w\right\|_2 + \epsilon \|\w\|_2\right)^2\\ \textm{subject to} & \bm{1}^\T\w=1, \quad \w\ge\bm{0}, \end{array} \] which is still a convex problem but now the problem complexity has increased to that of a second-order cone program (from a simple quadratic program in the naive formulation).

Interestingly, this problem bears a striking resemblance with a common heuristic called Tikhonov regularization: \[ \begin{array}{ll} \underset{\w}{\textm{minimize}} & \left\|\hat{\bm{X}}\w\right\|_2^2 + \epsilon \|\w\|_2^2\\ \textm{subject to} & \bm{1}^\T\w=1, \quad \w\ge\bm{0}, \end{array} \] where the objective function can be rewritten as \(\w^\T\frac{1}{T}\left(\hat{\bm{X}}^\T\hat{\bm{X}} + \epsilon \bm{I}\right)\w\), which leads to a regularized sample covariance matrix (see Section 3.6.1 in Chapter 3 for details on shrinkage).

Regarding the worst-case formulations that involve the Lagrange dual problem, the outer and inner minimizations can be simply written as a joint minimization. For example, under an ellipsoidal uncertainty set for the covariance matrix as in Lemma 14.4, the robust version of the GMVP is (Feng and Palomar, 2016) \[ \begin{array}{ll} \underset{\w,\bm{W},\bm{Z}}{\textm{minimize}} & \textm{Tr}\left(\hat{\bSigma}\left(\bm{W}+\bm{Z}\right)\right) + \epsilon\left\| \bm{S}^{1/2}\left(\textm{vec}(\bm{W})+\textm{vec}(\bm{Z})\right)\right\|_{2}\\ \textm{subject to} & \begin{bmatrix} \bm{W} & \w\\ \w^\T & 1 \end{bmatrix} \succeq \bm{0}\\ & \bm{1}^\T\w=1, \quad \w\ge\bm{0}\\ & \bm{Z} \succeq \bm{0}, \end{array} \] which is still a convex problem but now the problem complexity has increased to that of a semidefinite program (from a simple quadratic program in the naive formulation). Note that the first matrix inequality is equivalent to \(\bm{W}\succeq\w\w^\T\) and, at an optimal point, it can be shown to be satisfied with equality \(\bm{W}=\w\w^\T\).

#### Worst-case mean vector \(\bmu\) and covariance matrix \(\bSigma\)

The previous worst-case formulations against uncertainty in the mean vector \(\bmu\) and the covariance matrix \(\bSigma\) can be trivially combined under the mean–variance portfolio formulation.

For illustration purposes, we consider the the mean–variance worst-case portfolio formulation under box uncertainty sets for \(\bmu\) and \(\bSigma\), as in Lemmas 14.2 and 14.5, given by \[ \begin{array}{ll} \underset{\w,\,\overline{\bm{\Lambda}},\,\underline{\bm{\Lambda}}}{\textm{maximize}} & \w^\T\hat{\bmu} - |\w|^\T\bm{\delta} - \frac{\lambda}{2} \left( \textm{Tr}\big(\overline{\bm{\Lambda}}\,\overline{\bSigma}\big) - \textm{Tr}\big(\underline{\bm{\Lambda}}\,\underline{\bSigma}\big) \right)\\ \textm{subject to} & \bm{1}^\T\w=1, \quad \w\ge\bm{0},\\ & \begin{bmatrix} \overline{\bm{\Lambda}} - \underline{\bm{\Lambda}} & \w\\ \w^\T & 1 \end{bmatrix} \succeq \bm{0}\\ & \overline{\bm{\Lambda}}\geq\bm{0},\quad\underline{\bm{\Lambda}}\geq\bm{0}, \end{array} \] which is a convex semidefinite problem.

Other more sophisticated uncertainty sets can be considered for the mean vector and the covariance matrix, such as based on factor modeling (Goldfarb and Iyengar, 2003).

#### Worst-case of other performance measures

Apart from the mean–variance framework for portfolio optimization, many other portfolio designs can be formulated based on other objective functions or performance measures (see Chapter 10 for details), as well as higher-order moments (see Chapter 9) or risk-parity portfolios (see Chapter 11). Each of such formulations can be robustified so that the solution becomes more stable and robust against estimation errors in the parameters. For example, the worst-case Sharpe ratio can be easily managed, as well as the the worst-case value-at-risk (VaR) (El Ghaoui et al., 2003).

### 14.2.3 Numerical experiments

The effectiveness of the different robust portfolio formulations depends on the shape as well as the size of the uncertainty region. These are parameters that have to be properly chosen and tuned to the type of data and nature of the financial market. Therefore, it is easy to overfit the model to the training data and extra care has to be taken.

The goal of a robust design should be in making the solution more stable and less sensitive to the error realization, but not necessarily improving the performance. In other words, one should aim at gaining robustness but not the best performance for a given backtest compared to a naive design.

We now evaluate robust portfolios under errors in the mean vector \(\bmu\). Recall that robustness for the covariance matrix \(\bSigma\) is not as critical, because the bottleneck in the estimation part is on the mean vector (see Chapter 3 for details).

#### Sensitivity of robust portfolios

The extreme sensitivity of a naive mean–variance portfolio was shown in Figure 14.1. We now repeat the same numerical experiment with a robust formulation (still \(\lambda=1\)).

Figure 14.2 shows the sensitivity of a robust portfolio under an ellipsoidal uncertainty set for the mean vector \(\bmu\) over six different realizations of the estimation error. Compared to Figure 14.1, it is clearly more stable and less sensitive. Many other variations in the robust formulation can be similarly considered with similar results.

#### Comparison of naive versus robust portfolios

Now that we have empirically observed the improved stability of mean–variance robust designs, we can assess their performance (with a box and ellipsoidal uncertainty sets for the mean vector \(\bmu\)) in comparison with the naive design. Backtests are conducted for 50 randomly chosen stocks from the S&P 500 during 2017-2020.

Figure 14.3 shows the empirical distribution of the achieved mean–variance objective, as well as the Sharpe ratio, calculated over 1,000 Monte Carlo noisy observations. We can see that the robust designs avoid the worst-case realizations (left tail in the distributions) at the expense of not achieving the best-case realizations (right tail); thus, they are more stable and robust as expected.

Figure 14.4 shows the cumulative return and drawdown of the naive and robust mean–variance portfolios for an illustrative backtest. We can observe how the robust portfolios seem to be more robust indeed. However, this is just a single backtest and more exhaustive multiple backtest are necessary.

Finally, multiple backtests are conducted for 200 realizations of 50 randomly chosen stocks from the S&P 500 during random periods along 2015-2020. Figure 14.5 shows the results in terms of Sharpe ratio and drawdown, confirming that the robust portfolios are superior to the naive portfolio.

### References

*Robust optimization*. Princeton University Press.

*Mathematical Programming*,

*112*(1).

*Nonlinear programming*. Athena Scientific.

*Introduction to stochastic programming*. Springer.

*Optimization methods in finance*. Cambridge University Press.

*SIAM Review*,

*53*(3).

*Operations Research*,

*51*(4), 543–556.

*Robust portfolio optimization and management*. Wiley.

*A signal processing perspective on financial engineering*. Foundations and Trends in Signal Processing, Now Publishers.

*Mathematics of Operations Research*,

*28*(1), 1–38.

*Robust and convex optimization with applications in finance*(PhD thesis). Stanford University.

*Stochastic programming*. Kluwer Academic Publishers.

*Stochastic programming*. Elsevier.

*Annals of Operations Research*,

*132*(1), 157–187.

*IEEE Transactions on Signal Processing*,

*68*, 4030–4040.