A.7 Multi-objective Optimization
In Section A.5, we expanded the scope of inequality constraints to include vector values through the use of generalized inequalities, as illustrated in the conic programming (CP) formulation in (A.14). We will now briefly explore the implications of having a vector-valued objective function.
The reader interested in more details is referred to Chapter 2 in Boyd and Vandenberghe (2004) for generalized inequalities and Chapter 4 in Boyd and Vandenberghe (2004) for vector optimization, Pareto optimality, and multi-objective optimization.
A.7.1 Generalized Inequalities
A cone was defined in Section A.2 as a set \(\mathcal{K}\in\R^n\) such that for every \(\bm{x} \in \mathcal{K}\) and \(\theta \ge 0\) we have \(\theta \bm{x} \in \mathcal{K}\). A cone \(\mathcal{K}\) is called a proper cone if it is convex, closed, solid (i.e., has a nonempty interior), and pointed (i.e., it contains no line).
A proper cone \(\mathcal{K}\) can be used to define a generalized inequality, which is a partial ordering on \(\R^{n}\) that has many of the properties of the standard ordering on \(\R\): \[ \bm{x} \preceq_{\mathcal{K}} \bm{y}\quad\Longleftrightarrow\quad \bm{y} - \bm{x}\in\mathcal{K}. \] We also write \(\bm{x} \preceq_{\mathcal{K}} \bm{y}\) as \(\bm{y} \succeq_{\mathcal{K}} \bm{x}\).
Example A.24 (Nonnegative orthant and componentwise inequality) The nonnegative orthant \(\mathcal{K}=\R_+^n\) is a proper cone and its associated generalized inequality \(\preceq_{\mathcal{K}}\) corresponds to the componentwise inequality between vectors: \(\bm{x} \preceq_{\mathcal{K}} \bm{y}\) means \(x_i \leq y_i,\;i=1,\dots,m\).
Example A.25 (Positive semidefinite cone and matrix inequality) The positive semidefinite cone \(\mathbb{S}_{+}^{n}\) is a proper cone in the set of \(n\times n\) symmetric matrices \(\mathbb{S}^{n}\) and its associated generalized inequality \(\preceq_{\mathcal{K}}\) is the usual matrix inequality: \(\bm{X} \preceq_{\mathcal{K}} \bm{Y}\) means \(\bm{Y}-\bm{X}\) is positive semidefinite (typically written as \(\bm{Y} \succeq \bm{X}\)).
The notation of generalized inequality (i.e., \(\preceq_{\mathcal{K}}\)) is meant to suggest the analogy to ordinary inequality on \(\R\) (i.e., \(\leq\)). While many properties of ordinary inequality do hold for generalized inequalities, some important ones do not. The most obvious difference is that \(\leq\) on \(\R\) is a total ordering: any two points are comparable, meaning that we can always write \(x \leq y\) or \(y \leq x\). This property does not hold for other generalized inequalities as they simply define a partial ordering. One implication is that concepts like minimum and maximum are more complicated in the context of generalized inequalities.
We can also use generalized inequalities to extend the definition of a convex function from Section A.3 to the vector case. We say a function \(\bm{f}:\R^{n}\rightarrow\R^{q}\) is \(\mathcal{K}\)-convex if the domain is a convex set and, for all \(\bm{x},\bm{y}\in\operatorname*{dom}\bm{f}\) and \(\theta\in\left[0,1\right]\), \(\bm{f}(\theta\bm{x}+(1-\theta)\bm{y})\preceq_{\mathcal{K}}\theta\bm{f}( \bm{x})+(1-\theta)\bm{f}(\bm{y})\).
A.7.2 Vector Optimization
We denote a general vector optimization problem as \[\begin{equation} \begin{array}[c]{ll} \underset{\bm{x}}{\textm{minimize}}\;(\textm{with respect to }\mathcal{K}) & \begin{array}[c]{c} \bm{f}_{0}(\bm{x}) \end{array} \\ \textm{subject to} & \begin{array}[t]{l} f_{i}(\bm{x})\leq0,\\ h_{i}(\bm{x})=0, \end{array} \quad \begin{array}[t]{l} 1\leq i\leq m,\\ 1\leq i\leq p, \end{array} \end{array} \tag{A.24} \end{equation}\] where \(\mathcal{K}\in\R^q\) is a proper cone, \(\bm{f}_0:\R^n\rightarrow\R^q\) is the vector-valued objective function, \(f_{i}:\R^n\rightarrow\R\) are the inequality constraint functions, and \(h_{i}:\R^n\rightarrow\R\) are the equality constraint functions. The only difference between this problem and the standard optimization problem (A.1) is that here the objective function takes values in \(\R^q\) and the problem specification includes a proper cone \(\mathcal{K}\), which is used to compare objective values.
We say the vector optimization problem (A.24) is a convex vector optimization problem if the objective function \(\bm{f}_0\) is \(\mathcal{K}\)-convex, the inequality constraint functions \(f_1,\dots,f_m\) are convex, and the equality constraint functions \(h_1,\dots,h_p\) are affine (which can then be written as \(\bm{A}\bm{x}=\bm{b}\)).
The interpretation of problem (A.24) is not straightforward because the general inequality \(\preceq_{\mathcal{K}}\), which is used to compare different points based on their objective values \(\bm{f}_{0}(\bm{x})\), is not a total ordering but rather a partial ordering. This means we may encounter two points, \(\bm{x}\) and \(\bm{y}\), that are not comparable; that is, neither \(\bm{f}_{0}(\bm{x}) \preceq_{\mathcal{K}} \bm{f}_{0}(\bm{y})\) nor \(\bm{f}_{0}(\bm{y}) \preceq_{\mathcal{K}} \bm{f}_{0}(\bm{x})\) holds (a situation that cannot occur in the standard optimization problem (A.1)).
A.7.3 Pareto Optimality
Let us start with a special case in which the meaning of the vector optimization problem is clear. Consider the set of achievable objective values, that is, the set of objective values of feasible points: \[ \mathcal{O}=\left\{ \bm{f}_{0}\left(\bm{x}\right)\mid \bm{x}\textm{ is feasible}\right\} . \] We say this set has a minimum element if there is a feasible \(\bm{x}^\star\) such that \(\bm{f}_{0}(\bm{x}^\star) \preceq_{\mathcal{K}} \bm{f}_{0}(\bm{x})\) for all feasible \(\bm{x}\). Then we say that \(\bm{x}^\star\) is optimal for the problem (A.24) and refer to \(\bm{f}_{0}(\bm{x}^\star)\) as the optimal value. Using set notation, a point \(\bm{x}^\star\) is optimal if and only if it is feasible and \(\mathcal{O}\subseteq\bm{f}_{0}(\bm{x}^\star) + \mathcal{K}\). In this case, \(\bm{x}^\star\) is unambiguously a best choice for \(\bm{x}\) among the feasible points. However, most vector optimization problems do not have an optimal point because there may be other points that are not comparable via the cone \(\mathcal{K}\).
We now turn our attention to a more general case, which is common in most vector optimization problems of interest. In this case, the set of achievable objective values, \(\mathcal{O}\), does not have a minimum element. Consequently, the problem does not have an optimal point or optimal value. When \(\mathcal{O}\) lacks a minimum element, we can only discuss the so-called minimal elements. These are the best among the objective values that can be compared, although there are other objective values that cannot be compared. The points \(\bm{x}\) that achieve such minimal elements in the set of objective values \(\mathcal{O}\) are referred to as Pareto optimal points. We say that \(\bm{f}_{0}(\bm{x})\) is a Pareto optimal value for the vector optimization problem (A.24).
Thus, a point \(\bm{x}^\textm{po}\) is Pareto optimal if it is feasible and, for any other feasible \(\bm{x}\), \(\bm{f}_{0}(\bm{x}) \preceq_{\mathcal{K}} \bm{f}_{0}(\bm{x}^\textm{po})\) implies \(\bm{f}_{0}(\bm{x}) = \bm{f}_{0}(\bm{x}^\textm{po})\). Using set notation, a point \(\bm{x}^\textm{po}\) is Pareto optimal if and only if it is feasible and \(\left(\bm{f}_{0}(\bm{x}^\textm{po}) - \mathcal{K}\right) \cap \mathcal{O} = \bm{f}_{0}(\bm{x}^\textm{po})\). In words, \(\bm{x}^\star\) cannot be in the cone of points worse than any other point.
A vector optimization problem usually has many Pareto optimal values (and points), and they lie in the boundary of the set of achievable objective values, usually termed efficient frontier.
A.7.4 Multi-objective Optimization
When a vector optimization problem is based on the cone \(\mathcal{K}=\R_{+}^{q}\), that is, the nonnegative orthant, it is called a multi-objective or multi-criterion optimization problem. The components of the vector objective function \(\bm{f}_{0}\), denoted by \(F_1,\dots,F_q\), can be interpreted as \(q\) different scalar objectives to be minimized. The simplest case of multi-objective optimization is bi-objective or bi-criterion optimization problems, where we just have two objectives \(F_1(\bm{x})\) and \(F_2(\bm{x})\).
A multi-objective optimization problem is convex if the inequality constraint functions \(f_1,\dots,f_m\) are convex, the equality constraint functions \(h_1,\dots,h_p\) are affine (denoted then as \(\bm{A}\bm{x}=\bm{b}\)), and the objectives \(F_1,\dots,F_q\) are convex.
For two feasible points \(\bm{x}\) and \(\bm{y}\), \(F_i(\bm{x}) \leq F_i(\bm{y})\) means that \(\bm{x}\) is at least as good as \(\bm{y}\) and \(F_i(\bm{x}) < F_i(\bm{y})\) means that \(\bm{x}\) is better than \(\bm{y}\), according to the \(i\)th objective. We say that \(\bm{x}\) is better than \(\bm{y}\), or \(\bm{x}\) dominates \(\bm{y}\), if \(F_i(\bm{x}) \leq F_i(\bm{y})\) for \(i = 1,\dots,q\), and for at least one \(j\), \(F_j(\bm{x}) < F_j(\bm{y})\). In words, \(\bm{x}\) is better than \(\bm{y}\) if \(\bm{x}\) meets or beats \(\bm{y}\) on all objectives, and beats it in at least one objective.
In a multi-objective optimization problem, for a point to be considered optimal, \(\bm{x}^\star\), it has to be simultaneously optimal for each of the scalar problems: \[ F_i(\bm{x}^\star) \leq F_i(\bm{y}),\quad i=1,\dots,q. \] In general, however, this cannot happen unless in the trivial case that the objectives are noncompeting, since no compromises have to be made among the objectives. In most practical problems, there is naturally a trade-off among the objectives. Consequently, an optimal solution does not exists and we have to resort to the concept of Pareto optimality. For a point to be considered Pareto optimal, \(\bm{x}^\textm{po}\), it must be such that no objective can be improved without degrading at least one other objective.
The set of Pareto optimal values for a multi-objective problem is called the optimal trade-off surface or, when \(q = 2\), the optimal trade-off curve. In other contexts, this is also referred to as the efficient frontier.
A.7.5 Scalarization
Scalarization is a standard technique for finding Pareto optimal points for a vector optimization problem. Specifically, for a multi-objective optimization problem the scalarized problem is \[\begin{equation} \begin{array}[c]{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}[c]{c} \sum_{i=1}^{q}\lambda_i F_i(\bm{x}) \end{array} \\ \textm{subject to} & \begin{array}[t]{l} f_{i}(\bm{x})\leq0,\\ h_{i}(\bm{x})=0, \end{array} \quad \begin{array}[t]{l} 1\leq i\leq m,\\ 1\leq i\leq p, \end{array} \end{array} \tag{A.25} \end{equation}\] where the \(\lambda_i\)’s are the weights associated to the objectives and must satisfy \(\lambda_i\ge0\).
An optimal point of the scalarized problem is Pareto optimal for the multi-objective optimization problem. By choosing different values for the weights we can obtain different Pareto optimal solutions for the multi-objective optimization problem. However, there may be some Pareto optimal points that cannot be obtained via scalarization.
For convex multi-objective optimization problems, the scalarized problem is also convex and yields all Pareto optimal points for different weights. That is, for every Pareto optimal point, there are some weights such that it is optimal in the scalarized problem.
Example A.26 (Regularized LS) Consider a modified least squares problem where the energy of the solution is also desired to be small. We can then consider a multi-objective optimization problem with two objectives:
- \(F_1(\bm{x}) = \left\Vert \bm{A}\bm{x}-\bm{b} \right\Vert _{2}^{2}\) is a measure of the regression error;
- \(F_2(\bm{x}) = \left\Vert \bm{x} \right\Vert _{2}^{2}\) is a measure of the energy of \(\bm{x}\).
The multi-objective optimization problem formulation is \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}}\;(\textm{with respect to }\R_+^2) & \begin{array}{c} f_0(\bm{x}) =\left(F_1(\bm{x}), F_2(\bm{x})\right) \end{array} \end{array} \] and its corresponding scalarization is \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c} \left\Vert \bm{A}\bm{x}-\bm{b} \right\Vert _{2}^{2} + \gamma\left\Vert \bm{x} \right\Vert _{2}^{2}\end{array}, \end{array} \] where \(\gamma\geq0\) is the weight to indicate the preference in the trade-off.