A.4 Convex optimization problems
We will now delve into the basics of convex optimization problems. For more detailed information, readers are referred to (S. P. Boyd and Vandenberghe, 2004, Chapter 4).
If the objective and inequality constraint functions of the general optimization problem (A.1) are convex and the equality constraint functions are linear (or, more generally, affine), the problem is then a convex optimization problem or convex program.
We can write a convex optimization problem in standard form as \[\begin{equation} \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c}f_{0}(\bm{x})\end{array}\\ \textm{subject to} & \begin{array}[t]{l} f_{i}(\bm{x})\leq0,\quad i=1,\ldots,m\\ \bm{A}\bm{x}=\bm{b}, \end{array} \end{array} \tag{A.6} \end{equation}\] where \(f_{0},\,f_{1},\ldots,f_{m}\) are convex and the \(p\) equality constraints are affine with \(\bm{A} \in \R^{p\times n}\) and \(\bm{b} \in \R^{p}\).
Convex problems enjoy a rich body of theory and availability of algorithms with desirable convergence properties. However, most problems are not convex when naturally formulated. Reformulating a nonconvex problem in convex form may still be possible, but it is an art and there is no systematic way.
A fundamental property of convex optimization problems is that any locally optimal point is also (globally) optimal.
A.4.1 Optimality characterization
Suppose \(f_0\) is differentiable. Then, from the first-order characterization of convexity in Section A.3, we have that \[ f_0(\bm{y})\geq f_0(\bm{x})+\nabla f_0(\bm{x})^\T(\bm{y} - \bm{x}) \] for all \(\bm{x},\bm{y}\in\textm{dom}\,f_0\).
Then, a feasible point \(\bm{x}\) is optimal if and only if \[\begin{equation} \nabla f_0(\bm{x})^\T(\bm{y} - \bm{x}) \geq0 \textm{ for all }y\in\mathcal{X}, \tag{A.7} \end{equation}\] where \(\mathcal{X}\) denotes the feasible set. This is the so-called minimum principle. Geometrically, it means that the gradient \(\nabla f_0(\bm{x})\) defines a supporting hyperplane.
The minimum principle in (A.7) may be difficult to manage in most practical cases. One more convenient characterization of optimality, when the feasible set \(\mathcal{X}\) is explicitly given in terms of constraint functions, is the so-called KKT optimality conditions as elaborated in Section A.6. Two simple illustrative examples are considered next.
Example A.4 (Unconstrained minimization problem) For an unconstrained problem (i.e., \(m = p = 0\) with feasible set \(\mathcal{X}=\R^n\)), the condition (A.7) reduces to the well-known necessary and sufficient condition \[\nabla f_0(\bm{x})=0\] for \(\bm{x}\) to be optimal.
Example A.5 (Minimization over the nonnegative orthant) Consider the problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c} f_0(\bm{x})\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{x} \ge \bm{0}, \end{array} \end{array} \] where the only inequality constraints are nonnegativity constraints on the variables and there are no equality constraints.
The optimality condition (A.7) becomes \[ \bm{x} \geq \bm{0}, \quad \nabla f_0(\bm{x})^\T(\bm{y} - \bm{x}) \geq \bm{0} \textm{ for all }\bm{y} \geq \bm{0}. \]
The term \(\nabla f_0(\bm{x})^\T\bm{y}\) is unbounded below on \(\bm{y} \geq \bm{0}\), unless \(\nabla f_0(\bm{x}) \ge \bm{0}\). The condition reduces to \(-\nabla f_0(\bm{x})^\T\bm{x} \geq \bm{0}\), which further becomes \(\nabla f_0(\bm{x})^\T\bm{x} = \bm{0}\), due to \(\bm{x} \geq \bm{0}\) and \(\nabla f_0(\bm{x}) \ge \bm{0}\), and also elementwise \(\left(\nabla f_0(\bm{x})\right)_i x_i = 0\). This means that if \(x_i>0\), i.e., the \(i\)th component is in the interior of the feasible set, then \(\left(\nabla f_0(\bm{x})\right)_i = 0\). So we can finally write the optimality conditions as \[ \bm{x} \geq \bm{0}, \quad \nabla f_0(\bm{x}) \ge \bm{0}, \quad \left(\nabla f_0(\bm{x})\right)_i x_i = 0, \quad i=1,\dots,n. \] Note that the condition \(\left(\nabla f_0(\bm{x})\right)_i x_i = 0\) implies that the two conditions \(\left(\nabla f_0(\bm{x})\right)_i > 0\) and \(x_i > 0\) cannot both be true. This is known as a complementary condition, which we will revisit in Section A.6.
A.4.2 Equivalent reformulations
As previously mentioned, most problems are not convex when naturally formulated. In some cases, fortunately, there is a hidden convexity that can be unveiled by properly reformulating an equivalent problem. However, there is no systematic way to reformulate a problem in convex form: it is rather an art.
Two problems are considered equivalent if a solution to one can be easily converted into a solution for the other, and vice versa. A stricter form of this equivalence can be established by requiring a mapping between the two problems for every feasible point, not just for the optimal solutions.
Example A.6 Consider the problem \[ \begin{array}{ll} \underset{x}{\textm{minimize}} & \begin{array}{c} 1/\left( 1+x^{2}\right)\end{array}\\ \textm{subject to} & \begin{array}[t]{l} x^{2}\geq1, \end{array} \end{array} \] which is nonconvex (both the cost function and the constraint are nonconvex). It can be rewritten in convex form, after the change of variable \(y=x^{2}\), as \[ \begin{array}{ll} \underset{y}{\textm{minimize}} & \begin{array}{c} 1/\left( 1+y\right)\end{array}\\ \textm{subject to} & \begin{array}[t]{l} y\geq1, \end{array} \end{array} \] and the optimal points \(x\) can be recovered from the optimal \(y\) as \(x=\pm\sqrt{y}\).
Example A.7 This example does not employ a change of variable, but transforms the problematic functions into more convenient ones. Consider the problem \[ \begin{array}{ll} \underset{x_1,x_2}{\textm{minimize}} & \begin{array}{c} x_{1}^{2}+x_{2}^{2}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} x_{1}/\left(1+x_{2}^{2}\right)\leq0\\ \left(x_{1}+x_{2}\right)^{2}=0, \end{array} \end{array} \] which is nonconvex (the inequality constraint function is nonconvex and the equality constraint function is not affine). It can be equivalently rewritten as the convex problem \[ \begin{array}{ll} \underset{x_1,x_2}{\textm{minimize}} & \begin{array}{c} x_{1}^{2}+x_{2}^{2}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} x_{1}\leq0\\ x_{1}=-x_{2}. \end{array} \end{array} \]
Example A.8 The class of geometric problems (GPs) is a very important example of nonconvex problems that can be reformulated in convex form by a change of variable. This is later revisited in Section A.5.
We now explore some of the common tricks to obtain equivalent problems.
Change of variables
Suppose \(\phi\) is a one-to-one mapping from \(\bm{z}\) to \(\bm{x}\), then we can define \(\tilde{f}_i(\bm{z}) = f_i(\phi(\bm{z}))\) and problem (A.6) can be rewritten (ignoring equality constraints) as \[ \begin{array}{ll} \underset{\bm{z}}{\textm{minimize}} & \begin{array}{c}\tilde{f}_{0}(\bm{z})\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \tilde{f}_{i}(\bm{z})\leq0,\quad i=1,\ldots,m. \end{array} \end{array} \]
Convexity may of may not be preserved depending on the mapping \(\phi\) (see Section A.3 for details). With equality constraints, the mapping \(\phi\) has to be affine to preserve convexity of the problem.
Transformation of objective and constraint functions
Suppose \(\psi_i\) are strictly increasing functions satisfying \(\psi_i(0)=0\). Then we can define \(\tilde{f}_i(\bm{x}) = \psi_i\left(f_i(\bm{x})\right)\) and problem (A.6) can be rewritten (ignoring equality constraints) as \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c}\tilde{f}_{0}(\bm{x})\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \tilde{f}_{i}(\bm{x})\leq0,\quad i=1,\ldots,m. \end{array} \end{array} \] For equality constraints, the mapping has to be affine to preserve the convexity of the problem.
The requirements on the mappings can be relaxed; for example, for the inequality constraints we simply need \(\psi_i(u)\leq0\) if and only if \(u\leq0\).
Slack variables
One simple transformation of interest is based on the observation that \(f_i(\bm{x})\leq0\) if and only if there is an \(s_i\ge0\) that satisfies \(f_i(\bm{x}) + s_i = 0\).
By introducing nonnegative slack variables \(s_i\ge0\), we can transform linear (or affine) inequalities \(\bm{a}_{i}^\T\bm{x}\leq b_{i}\) into linear equalities \(\bm{a}_{i}^\T\bm{x} + s_i = b_{i}\).
Eliminating equality constraints
Equality constraints in convexs problem must be affine, i.e., of the form \(\bm{A}\bm{x} = \bm{b}\). From linear algebra, we know that the subspace of points satisfying such affine constraints can be written as \(\bm{x} = \bm{F}\bm{z} + \bm{x}_{0}\), where \(\bm{x}_0\) is any solution to \(\bm{A}\bm{x} = \bm{b}\), \(\bm{F}\) is a matrix whose range is the nullspace of \(\bm{A}\), i.e., \(\bm{A}\bm{F}=\bm{0}\), and \(\bm{z}\) is any vector of appropriate dimensions.
Then, the problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c}f_{0}(\bm{x})\end{array}\\ \textm{subject to} & \begin{array}[t]{l} f_{i}(\bm{x})\leq0,\quad i=1,\ldots,m\\ \bm{A}\bm{x}=\bm{b} \end{array} \end{array} \] is equivalent to \[ \begin{array}{ll} \underset{\bm{z}}{\textm{minimize}} & \begin{array}{c}f_{0}(\bm{F}\bm{z}+\bm{x}_{0})\end{array}\\ \textm{subject to} & \begin{array}[t]{l} f_{i}(\bm{F}\bm{z}+\bm{x}_{0})\leq0,\quad i=1,\ldots,m,\end{array} \end{array} \] with variable \(\bm{z}\). Since the composition of a convex function with an affine function is convex, eliminating equality constraints preserves convexity of a problem.
Introducing equality constraints
We can introduce new variables and equality constraints into a convex optimization problem, provided the equality constraints are linear, and the resulting problem will also be convex.
For example, if an objective or constraint function has the form \(f_i(\bm{A}_i\bm{x} + \bm{b}_i)\), we can introduce a new variable \(\bm{y}_i\), replace \(f_i(\bm{A}_i\bm{x} + \bm{b}_i)\) with \(f_i(\bm{y}_i)\), and add the linear equality constraint \(\bm{y}_i = \bm{A}_i\bm{x} + \bm{b}_i\).
Epigraph problem form
The epigraph form of the convex problem (A.6) is the problem \[\begin{equation} \begin{array}{ll} \underset{t,\bm{x}}{\textm{minimize}} & \begin{array}{c}t\end{array}\\ \textm{subject to} & \begin{array}[t]{l} f_{0}(\bm{x}) - t \leq0\\ f_{i}(\bm{x})\leq0,\qquad i=1,\ldots,m\\ \bm{A}\bm{x}=\bm{b}, \end{array} \end{array} \tag{A.8} \end{equation}\] with variables \(\bm{x}\in\R^n\) and \(t\in\R\). We can easily see that it is equivalent to the original problem: \((\bm{x},t)\) is optimal for (A.8) if and only if \(\bm{x}\) is optimal for (A.6) and \(t = f_0(\bm{x})\).
It is often stated that a linear objective is universal for convex optimization, as any convex optimization problem can be readily transformed into one with a linear objective. This transformation can aid in theoretical analysis and algorithm development.
Minimizing over some variables
We always have \[ \underset{\bm{x},\bm{y}}{\textm{inf}}\;f(\bm{x},\bm{y}) = \underset{\bm{x}}{\textm{inf}}\;\tilde{f}(\bm{x}), \] where \(\tilde{f}(\bm{x}) = \textm{inf}_{\bm{y}}\;f(\bm{x},\bm{y})\). In addition, if \(f(\bm{x},\bm{y})\) is jointly convex in \(\bm{x}\) and \(\bm{y}\), i.e., it is convex in \((\bm{x},\bm{y})\), then \(\tilde{f}(\bm{x})\) is convex.
In plain words, we can always minimize a function by first minimizing over some set of variables, and then minimizing over the remaining ones. Note that this is a nested minimization, in the sense that as \(\bm{x}\) changes, then implicitly the \(\bm{y}\) that minimizes \(f(\bm{x},\bm{y})\) to obtain \(\tilde{f}(\bm{x})\) changes as well. Do not confuse this nested minimization with an alternate minimization method, where one optimizes alternately over \(\bm{x}\) and \(\bm{y}\) until convergence is achieved.
Suppose the variable \(\bm{x}\in\R^n\) is partitioned in two blocks as \(\bm{x}=(\bm{x}_1,\bm{x}_2)\). Then, the convex problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c}f_{0}(\bm{x}_1,\bm{x}_2)\end{array}\\ \textm{subject to} & \begin{array}[t]{l} f_i(\bm{x}_1)\leq0,\quad i=1,\ldots,m_1\\ \tilde{f}_i(\bm{x}_2)\leq0,\quad i=1,\ldots,m_2, \end{array} \end{array} \] in which each of the constraints involves either \(\bm{x}_1\) or \(\bm{x}_2\), is equivalent to the convex problem \[ \begin{array}{ll} \underset{\bm{x}_1}{\textm{minimize}} & \begin{array}{c} \tilde{f}_{0}(\bm{x}_1) \end{array}\\ \textm{subject to} & \begin{array}[t]{l} f_i(\bm{x}_1)\leq0,\quad i=1,\ldots,m_1, \end{array} \end{array} \] where \[ \tilde{f}_{0}(\bm{x}_1) = \textm{inf}_{\bm{z}}\{f_{0}(\bm{x}_1,\bm{z})\mid\tilde{f}_i(\bm{z})\leq0,\quad i=1,\ldots,m_2\}. \]
A.4.3 Approximate reformulations
In many cases, the formulated optimization problem may still remain nonconvex despite all attempts to use some transformation to unveil any possible hidden convexity. In such situations, one can resort to some kind of approximation to form an approximated problem, possibly convex, that is easy to solve: \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c}\tilde{f}_0(\bm{x})\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \tilde{f}_i(\bm{x})\leq0,\quad i=1,\ldots,m\\ \bm{A}\bm{x}=\bm{b}, \end{array} \end{array} \] where \(\tilde{f}_i(\bm{x}) \approx f_i(\bm{x})\).
There are three types of approximations:
Conservative approximation or tightened approximation leading to a tightened formulation: \(\tilde{f}_i(\bm{x}) \geq f_i(\bm{x})\). This approximation defines a feasible set that is a subset of the original feasible set, which guarantees the feasibility of the approximated solution.
Relaxed approximation leading to a relaxed formulation or simply a relaxation: \(\tilde{f}_i(\bm{x}) \leq f_i(\bm{x})\). This approximation defines a feasible set that is a superset of the original feasible set. It does not guarantee the feasibility of the approximated solution and may require an additional step to enforce feasibility.
An approximation without any guarantees: \(\tilde{f}_i(\bm{x}) \approx f_i(\bm{x})\).
Relaxed formulations are very commonly used in practice, often by simply removing some of the constraints (typically the more difficult ones). A notable example of this approach is found in (Bengtsson and Ottersten, 2001) for multiuser beamforming in wireless communications, where a nonconvex rank-one constraint was removed (thus relaxing the problem). However, it was still proven to be an equivalent reformulation and not a relaxation.
Example A.9 Suppose a problem contains the nonconvex constraint \(x^2 = 1\) or, equivalently, \(x\in\{\pm1\}\), which is a nonconvex discrete set. This is typically associated with combinatorial optimization, which has exponential complexity. A relaxation would involve enlarging the feasible set, which could be achieved by using instead the interval \(-1\leq x\leq1\). On the other hand, a tightening would involve reducing the feasible set, which could be accomplished simply by using \(x=1\). Both approximations, \(-1\leq x\leq1\) and \(x=1\), are convex and can be easily handled.
A.4.4 Quasi-convex optimization
A quasiconvex optimization problem has the standard form \[\begin{equation} \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c}f_{0}(\bm{x})\end{array}\\ \textm{subject to} & \begin{array}[t]{l} f_{i}(\bm{x})\leq0,\quad i=1,\ldots,m\\ \bm{A}\bm{x}=\bm{b}, \end{array} \end{array} \tag{A.9} \end{equation}\] where the inequality constraint functions \(f_1,\dots,f_m\) are convex, and the objective \(f_0\) is quasiconvex, unlike in a convex optimization problem where the objective is convex. For details on quasiconvex functions, see Section A.3.5.
The most significant difference between convex and quasiconvex optimization is that a quasiconvex optimization problem can have locally optimal solutions that are not globally optimal. For instance, this can occur when the function becomes flat before reaching the optimal value.
A general approach to quasiconvex optimization relies on the representation of the sublevel sets of a quasiconvex function via a family of convex inequalities: \[ f(\bm{x}) \leq t \quad \Longleftrightarrow \quad \phi_t(\bm{x}) \leq0, \] where \(\phi_t(\bm{x})\) is a family of convex functions in \(\bm{x}\) (indexed by \(t\)).
Let \(p^\star\) denote the optimal value of the quasiconvex optimization problem (A.9). If the (now convex) feasibility problem \[\begin{equation} \begin{array}{ll} \underset{\bm{x}}{\textm{find}} & \begin{array}{c}\bm{x}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \phi_t(\bm{x}) \leq0\\ f_{i}(\bm{x})\leq0,\quad i=1,\ldots,m\\ \bm{A}\bm{x}=\bm{b} \end{array} \end{array} \tag{A.10} \end{equation}\] is feasible, then we have \(p^\star\leq t\). Conversely, if it is infeasible, then \(p^\star > t\).
This observation can serve as the foundation for a simple algorithm to solve quasiconvex optimization problems termed bisection method. This involves solving a sequence of convex feasibility problems, like the one in (A.10), at each step. Suppose that the original problem (A.9) is feasible and that we start with an interval \([l,u]\) known to contain the optimal value \(p^\star\). We can then solve the convex feasibility problem at the interval midpoint \(t=(l+u)/2\), to determine whether the optimal value is in the lower or upper half of this interval, and update the interval accordingly. This produces a new interval, which also contains the optimal value, but has half the width of the initial interval; so the length of the interval after \(k\) iterations is \(2^{-k}(u - l)\), where \((u - l)\) is the length of the initial interval. Therefore, if a tolerance of \(\epsilon\) is desired in the computation of \(p^\star\), the number of iterations is \(\lceil\textm{log}_2\left((u-l)/\epsilon\right)\rceil\), where \(\lceil\cdot\rceil\) denoted the ceiling rounding operation. This procedure is called bisection method (a.k.a. sandwich technique) and is summarized in Algorithm A.1.
Algorithm A.1: Bisection method for the quasiconvex optimization problem in (A.9).
Choose interval \([l,u]\) containing \(p^\star\) and tolerance \(\epsilon>0\);
repeat
- \(t \leftarrow (l+u)/2\);
- Solve the convex feasibility problem (A.10);
- if feasible, then \(u \leftarrow t\) and keep solution \(\bm{x}\); else \(l \leftarrow t\);
until \(u-l \leq \epsilon\);