\( \newcommand{\bm}[1]{\boldsymbol{#1}} \newcommand{\textm}[1]{\textsf{#1}} \def\T{{\mkern-2mu\raise-1mu\mathsf{T}}} \newcommand{\R}{\mathbb{R}} % real numbers \newcommand{\E}{{\rm I\kern-.2em E}} \newcommand{\w}{\bm{w}} % bold w \newcommand{\bmu}{\bm{\mu}} % bold mu \newcommand{\bSigma}{\bm{\Sigma}} % bold Sigma \newcommand{\bigO}{O} %\mathcal{O} \renewcommand{\d}[1]{\operatorname{d}\!{#1}} \)

A.1 Optimization problems

A mathematical optimization problem with arbitrary equality and inequality constraints can always be written in the following standard form (S. P. Boyd and Vandenberghe, 2004, Chapter 1): \[\begin{equation} \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}[t]{l} f_{0}(\bm{x}) \end{array}\\ \textm{subject to} & \begin{array}[t]{l} f_{i}(\bm{x})\leq0,\\ h_{i}(\bm{x})=0, \end{array} \quad \begin{array}[t]{l} i=1,\ldots,m\\ i=1,\ldots,p, \end{array} \end{array} \tag{A.1} \end{equation}\] where \(\bm{x} = \left(x_{1},\ldots,x_{n}\right)\in\R^n\) is the optimization variable, \(f_{0}:\R^n\longrightarrow\R\) is the objective function or cost function, \(f_{i}:\R^n\longrightarrow\R,\; i=1,\ldots,m,\) are the \(m\) inequality constraint functions, and \(h_{i}:\R^n\longrightarrow\R,\; i=1,\ldots,p,\) are the \(p\) equality constraint functions. If there are no constraints, we say that the problem is unconstrained.

The goal is to find an optimal solution \(\bm{x}^{\star}\) that minimizes \(f_{0}\) while satisfying all the constraints.

Examples

Convex optimization is currently used in many different areas, including the following:

  • Circuit design
  • Filter design
  • Communication systems (e.g., transceiver design, multi-antenna beamforming design, maximum likelihood detection)
  • Radar systems
  • Communication networks (e.g., power control in wireless networks, congestion control in the Internet)
  • Financial engineering (e.g., portfolio design, index tracking)
  • Model fitting (e.g., in financial data or recommender systems)
  • Image processing (e.g., deblurring, compressive sensing, blind separation, inpainting)
  • Robust designs under uncertainty
  • Sparse regression
  • Low-rank matrix discovery
  • Machine learning
  • Graph learning from data
  • Biomedical applications (e.g., DNA sequencing, anti-viral vaccine design)

An optimization problem has three basic elements: variables (as opposed to other fixed parameters), constraints, and an objective.

Example A.1 (Device sizing for electronic circuits) In the context of electronic circuit design, the elements in a device sizing optimization problem may be chosen as:

  • Variables: widths and lengths of devices.
  • Constraints: manufacturing limits, timing requirements, or maximum area.
  • Objective: power consumption.

Example A.2 (Portfolio design) In a financial context, the elements in a portfolio optimization problem may be identified as:

  • Variables: amounts invested in different assets.
  • Constraints: budget, maximum investments per asset, or minimum return.
  • Objective: overall risk or return variance.

A.1.1 Definitions

Domain and constraints

The domain of the optimization problem (A.1) is defined as the set of points for which the objective and all constraint functions are defined, i.e., \[ \mathcal{D} = \bigcap_{i=0}^{m}\textm{dom}\,f_{i}\cap\bigcap_{i=1}^{p}\textm{dom} \, h_{i}, \] and it can be interpreted as a set of implicit constraints \(\bm{x} \in \mathcal{D}\), as opposed to the explicit constraints \(f_{i}(\bm{x})\leq0\) and \(h_{i}(\bm{x})=0\) in (A.1).

A problem is unconstrained if it has no explicit constraints. For example, \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \log\left(a - \bm{b}^\T\bm{x}\right) \end{array} \] is an unconstrained problem with implicit constraint \(a > \bm{b}^\T\bm{x}\).

Feasibility

A point \(\bm{x}\in \mathcal{D}\) is feasible if it satisfies all the constraints, \(f_{i}(\bm{x})\leq0\) and \(h_{i}(\bm{x})=0\), and infeasible otherwise. The problem (A.1) is said to be feasible if there exists at least one feasible point and infeasible otherwise.

The optimal value (minimal value) is defined as \[ p^{\star}=\textm{inf}\left\{f_0(\bm{x}) \mid f_i(\bm{x})\leq0,\,i=1,\ldots,m,\;h_i(\bm{x})=0,\,i=1,\ldots,p\right\}. \] If the problem is feasible, then the optimal value may be achieved at an optimal solution \(\bm{x}^{\star}\), i.e., \(f_{0}(\bm{x}^{\star})=p^{\star}\), or the problem may be unbounded below, i.e., \(p^{\star}=-\infty\). Otherwise, if the problem is infeasible, it is commonly denoted by \(p^{\star}=+\infty\).

A feasible point \(\bm{x}\) is optimal if \(f_0(\bm{x})=p^{\star}\). In general, there may be more than one optimal point, and the set of optimal points is denoted by \(\mathcal{X}_{\textm{opt}}\). A feasible point \(\bm{x}\) is locally optimal if it is optimal within a ball or local neighborhood around it.

Example A.3 We illustrate the concepts of optimal value and optimal solution with a few simple unconstrained optimization problems with scalar variable \(x\in\R\):

  • \(f_{0}\left(x\right)=1/x\), \(\textm{dom}\,f_{0}=\R_{++}\): In this case \(p^{\star}=0\), but there is no optimal point since the optimal value cannot be achieved.
  • \(f_0(x)=-\textm{log}\;x\), \(\textm{dom}\,f_{0}=\R_{++}\): This function is unbounded below, with \(p^{\star}=-\infty\).
  • \(f_{0}\left(x\right)=x^{3}-3x\): This is a nonconvex function with \(p^{\star}=-\infty\) and a local optimum at \(x=1\).

If \(\bm{x}\) is feasible and \(f_{i}(\bm{x})=0\), we say the \(i\)th inequality constraint \(f_{i}(\bm{x})\le0\) is active at \(\bm{x}\). If \(f_{i}(\bm{x})<0\), we say the constraint \(f_{i}(\bm{x})\le0\) is inactive. (The equality constraints are active at all feasible points.) We say that a constraint is redundant if deleting it does not change the feasible set.

Feasibility problem

On many occasions, our goal is not necessarily to minimize or maximize any objective, but simply to find a feasible point. This is referred to as a feasibility problem: \[ \begin{array}{ll} \underset{\bm{x}}{\textm{find}} & \begin{array}{c} \bm{x}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} f_{i}(\bm{x})\leq0,\quad i=1,\ldots,m\\ h_{i}(\bm{x})=0,\quad i=1,\ldots,p. \end{array} \end{array} \]

In practice, a feasibility problem can be regarded as a special case of a general optimization problem: \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c} 0\end{array}\\ \textm{subject to} & \begin{array}[t]{l} f_{i}(\bm{x})\leq0,\quad i=1,\ldots,m\\ h_{i}(\bm{x})=0,\quad i=1,\ldots,p, \end{array} \end{array} \] where \(p^{\star}=0\) if the constraints are feasible and \(p^{\star}=\infty\) otherwise.
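As a sketch, checking whether a given candidate point is feasible only requires evaluating the constraint functions (the example constraints below are hypothetical, chosen for illustration):

```python
def is_feasible(x, ineqs, eqs, tol=1e-9):
    """x is feasible iff f_i(x) <= 0 for all inequalities and h_i(x) = 0 for all equalities."""
    return all(f(x) <= tol for f in ineqs) and all(abs(h(x)) <= tol for h in eqs)

ineqs = [lambda x: x[0] + x[1] - 2]   # f_1(x) = x1 + x2 - 2 <= 0
eqs   = [lambda x: x[0] - x[1]]       # h_1(x) = x1 - x2 = 0

assert is_feasible((1.0, 1.0), ineqs, eqs)        # on the equality line, inside the inequality
assert not is_feasible((2.0, 1.0), ineqs, eqs)    # violates both constraints
```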

A.1.2 Solving optimization problems

General optimization problems are typically very difficult to solve: either a long computation time is required to find an optimal solution, or only a sub-optimal solution can be found in a reasonable computation time. Some exceptions include the family of least squares problems, linear programming problems, and convex optimization problems. General nonconvex problems (also known as nonlinear problems, with some abuse of terminology) remain difficult to solve.

Least-squares

Least-squares (LS) problems go back to Gauss in 1795 and are formulated as \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \left\Vert \bm{A}\bm{x} - \bm{b}\right\Vert_2^2. \end{array} \] Solving LS problems is straightforward via the closed-form solution \(\bm{x}^{\star}=\left(\bm{A}^\T\bm{A}\right)^{-1}\bm{A}^\T\bm{b}\) (assuming \(\bm{A}\) has full column rank), for which reliable and efficient algorithms exist. Using LS is considered routine because these problems are easy to identify and to solve.
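A minimal pure-Python sketch of the closed-form solution via the normal equations \(\bm{A}^\T\bm{A}\,\bm{x}=\bm{A}^\T\bm{b}\), with toy numbers of our own (practical code should use a numerical library with a QR or Cholesky factorization rather than Cramer's rule):

```python
# Toy overdetermined system: 3 equations, 2 unknowns
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]

# Form G = A^T A and g = A^T b
G = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]
g = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(2)]

# Solve the 2x2 system G x = g by Cramer's rule
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
x = [(g[0] * G[1][1] - G[0][1] * g[1]) / det,
     (G[0][0] * g[1] - g[0] * G[1][0]) / det]
```

For this particular toy data, \(\bm{b}\) lies in the range of \(\bm{A}\), so the solution \(\bm{x}^{\star}=(1,2)\) attains zero residual.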

Linear programming

A linear program (LP) can be written as \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \bm{c}^\T\bm{x}\\ \textm{subject to} & \bm{a}_i^\T\bm{x}\leq b_i, \quad i=1,\ldots,m. \end{array} \] LPs do not have closed-form solutions in general, but there are reliable and efficient algorithms and software. An LP is not as easy to recognize as an LS problem, but one can easily learn a few standard tricks to convert a variety of problems into LPs.
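For intuition only, a tiny two-variable LP can be solved by brute-force enumeration of vertices (intersections of constraint pairs). This is a pedagogical sketch with made-up data, not how practical solvers such as the simplex or interior-point methods work:

```python
from itertools import combinations

# Toy LP: minimize -x1 - x2 subject to x1 <= 1, x2 <= 1, x1 >= 0, x2 >= 0
c = [-1.0, -1.0]
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b = [1.0, 1.0, 0.0, 0.0]

def solve_lp_by_vertices(c, A, b, tol=1e-9):
    """Enumerate candidate vertices and keep the feasible one with least cost."""
    best_x, best_val = None, float("inf")
    for (a1, b1), (a2, b2) in combinations(zip(A, b), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < tol:
            continue  # parallel constraint boundaries: no unique intersection
        x = ((b1 * a2[1] - b2 * a1[1]) / det,
             (a1[0] * b2 - a2[0] * b1) / det)
        if all(ai[0] * x[0] + ai[1] * x[1] <= bi + tol for ai, bi in zip(A, b)):
            val = c[0] * x[0] + c[1] * x[1]
            if val < best_val:
                best_x, best_val = x, val
    return best_x, best_val

x_opt, val = solve_lp_by_vertices(c, A, b)  # optimum at the vertex (1, 1)
```

An optimal solution of an LP (when one exists) is always attained at a vertex of the feasible polyhedron, which is what makes this enumeration valid for tiny problems; its cost grows combinatorially, hence the need for real algorithms.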

Convex optimization problems

An inequality-constrained convex problem is of the form \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & f_0(\bm{x})\\ \textm{subject to} & f_i(\bm{x})\leq b_i,\quad i=1,\ldots,m, \end{array} \] where all the functions are convex. Convex problems do not have closed-form solutions in general, but there are reliable and efficient algorithms and software. They are often difficult to recognize and there are many tricks for transforming problems into convex form.
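In the simplest one-dimensional case, convexity already makes minimization easy: a convex function is unimodal, so a derivative-free ternary search converges reliably. A small sketch (the test function \(f(x)=e^{x}+e^{-x}\), minimized at \(x=0\), is our own choice):

```python
import math

def ternary_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal (e.g., convex) function on [lo, hi] by ternary search."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2  # minimizer cannot lie in (m2, hi]
        else:
            lo = m1  # minimizer cannot lie in [lo, m1)
    return (lo + hi) / 2

x_star = ternary_min(lambda x: math.exp(x) + math.exp(-x), -5.0, 5.0)
```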

Nonconvex optimization

Nonconvex optimization problems are generally very difficult to solve, although there are some rare exceptions. In general, they require either a long computation time¹ or the compromise of not always finding the optimal solution. This leads to two strategies:

  • Local optimization: This involves fast algorithms, but there’s no guarantee of global optimality. It only provides a local solution around the initial point.

  • Global optimization: While the worst-case complexity increases exponentially with the problem size, this strategy ensures the discovery of a global solution.

Historical snapshot of optimization

The theory of optimization, in particular convex analysis, was extensively developed during 1900–1970. However, the computational aspect, i.e., efficient algorithms, and the applications came later.

The first computationally efficient method was the seminal simplex method developed in 1947 for LPs by Dantzig. It marked the beginning of an era of algorithmic development. However, despite the simplex method being very efficient in practice, it has an exponential worst-case complexity in theory. In the 1970s, the ellipsoid method was proposed with a provable polynomial worst-case complexity, although it could be very slow in practice. Later, in 1984, Karmarkar proposed a polynomial-time interior-point method for LPs that was good not only in theory but also in practice (Karmarkar, 1984). This development was followed by numerous researchers extending interior-point methods to quadratic programming and linear complementarity problems. In 1994, Nesterov and Nemirovskii further advanced the field by developing the theory of self-concordant functions, which facilitated the extension of algorithms based on the log-barrier function to a wider array of convex problems, notably semidefinite programming and second-order cone programming (Nesterov and Nemirovskii, 1994).

In terms of practical applications, following the development of the simplex method, linear programming has been widely used to model a variety of real-life problems, such as allocation issues, since the 1950s. However, during that period, there was little interest in modeling real-life problems as convex problems. It was only after the mid-1990s, with the development of interior-point methods for convex problems, that there was a surge in activity related to modeling applications as convex problems.

A.1.3 Illustrative example

The following lamp illumination problem is a well-known example that illustrates how an engineering problem can be formulated in different ways, leading to more or less sophisticated solutions.

Suppose we have \(m\) lamps illuminating \(n\) small flat patches, located in fixed locations. The overall goal is to achieve a desired illumination \(I_{\textm{des}}\) on all patches by controlling the power of the lamps. The intensity \(I_{k}\) at patch \(k\) depends linearly on the lamp powers \(p_{j}\) as \[ I_{k} = \sum_{j=1}^{m}a_{kj}p_{j}, \] where the coefficients \(a_{kj}\) are given by \(a_{kj}=\cos\theta_{kj}/r_{kj}^{2}\), with \(\theta_{kj}\) and \(r_{kj}\) denoting the angle and distance, respectively, between lamp \(j\) and patch \(k\).
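The intensity model is straightforward to evaluate; a small sketch with hypothetical geometry (all numbers below are made up for illustration):

```python
import math

# Toy setup: m = 2 lamps, n = 2 patches
theta = [[0.0, math.pi / 4], [math.pi / 4, 0.0]]  # theta[k][j]: angle between lamp j and patch k
r     = [[1.0, 2.0], [2.0, 1.0]]                  # r[k][j]: distance between lamp j and patch k
p     = [1.0, 1.0]                                # lamp powers

# Coefficients a_kj = cos(theta_kj) / r_kj^2, then intensities I_k = sum_j a_kj * p_j
a = [[math.cos(theta[k][j]) / r[k][j] ** 2 for j in range(2)] for k in range(2)]
I = [sum(a[k][j] * p[j] for j in range(2)) for k in range(2)]
```

By the symmetry of this toy geometry, both patches receive the same intensity.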

Ideally, we would like to achieve a perfect illumination \(I_{k} = I_{\textm{des}}\) for all the patches, but this is not feasible in practice, and we need to relax the problem to \(I_{k} \approx I_{\textm{des}}\).

There are many different ways to formulate this problem. The main idea is to somehow measure the approximation error \(I_{k} - I_{\textm{des}}\) over all the patches. One possible formulation minimizes the largest of the errors measured on a logarithmic scale (because the eye perceives intensity on a log scale): \[ \begin{array}{cl} \underset{I_{1},\ldots,I_{n},p_{1},\ldots,p_{m}}{\textm{minimize}} & \begin{array}{c} \textm{max}_{k}\left|\textm{log} I_{k} - \textm{log} I_{\textm{des}}\right|\end{array}\\ \textm{subject to} & \begin{array}[t]{l} 0\leq p_{j}\leq p_{\max},\\ I_{k}=\sum_{j=1}^{m}a_{kj}p_{j}, \end{array} \quad \begin{array}[t]{l} j=1,\ldots,m\\ k=1,\ldots,n. \end{array} \end{array} \] This problem appears complex, and we will explore several possible approaches to address it.

  1. If one does not know anything about optimization, then a heuristic guess can be made, such as using a uniform power \(p_{j}=p\), perhaps trying different values of \(p\).

  2. If one knows about least-squares, then the problem formulation could be changed to resemble an LS: \[ \begin{array}{cl} \underset{I_{1},\ldots,I_{n},p_{1},\ldots,p_{m}}{\textm{minimize}} & \begin{array}{c} \sum_{k=1}^{n}\left(I_{k}-I_{\mathsf{des}}\right)^{2}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} I_{k}=\sum_{j=1}^{m}a_{kj}p_{j},\quad k=1,\ldots,n, \end{array} \end{array} \] and then clip \(p_{j}\) if \(p_{j}>p_{\max}\) or \(p_{j}<0\) to make it feasible.

  3. If one knows about linear programming, then the problem could be changed to resemble an LP (basically by removing the logarithmic function): \[ \begin{array}{cl} \underset{I_{1},\ldots,I_{n},p_{1},\ldots,p_{m}}{\textm{minimize}} & \begin{array}{c} \textm{max}_{k}\left|I_{k} - I_{\textm{des}}\right|\end{array}\\ \textm{subject to} & \begin{array}[t]{l} 0\leq p_{j}\leq p_{\max},\\ I_{k}=\sum_{j=1}^{m}a_{kj}p_{j}, \end{array} \quad \begin{array}[t]{l} j=1,\ldots,m\\ k=1,\ldots,n, \end{array} \end{array} \] which may not look like an LP at first sight, but it is one in disguise (revealed after a few simple manipulations).
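     One standard way to reveal the LP (an epigraph reformulation, spelled out here for convenience) is to introduce an auxiliary variable \(t\) bounding all the absolute errors: \[ \begin{array}{cl} \underset{t,I_{1},\ldots,I_{n},p_{1},\ldots,p_{m}}{\textm{minimize}} & t\\ \textm{subject to} & -t\leq I_{k}-I_{\textm{des}}\leq t,\quad k=1,\ldots,n,\\ & 0\leq p_{j}\leq p_{\max},\quad j=1,\ldots,m,\\ & I_{k}=\sum_{j=1}^{m}a_{kj}p_{j},\quad k=1,\ldots,n, \end{array} \] in which the objective and all constraints are linear.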

  4. If one knows about convex optimization, then it turns out that after some smart manipulations, the problem can be equivalently reformulated in convex form as \[ \begin{array}[t]{cl} \underset{I_{1},\ldots,I_{n},p_{1},\ldots,p_{m}}{\textm{minimize}} & \begin{array}{c} \textm{max}_{k}\;h\left(I_{k}/I_{\textm{des}}\right)\end{array}\\ \textm{subject to} & \begin{array}[t]{l} 0\leq p_{j}\leq p_{\max},\\ I_{k}=\sum_{j=1}^{m}a_{kj}p_{j}, \end{array} \quad \begin{array}[t]{l} j=1,\ldots,m\\ k=1,\ldots,n, \end{array} \end{array} \] where \(h\left(u\right) = \textm{max}\left\{ u,1/u\right\}\).
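The equivalence between the log-scale objective and \(h\) follows because the logarithm is monotone and \(\textm{log}\,h(u) = \left|\textm{log}\,u\right|\); this identity can be verified numerically with a quick sketch:

```python
import math

# h(u) = max(u, 1/u) satisfies log h(u) = |log u| for all u > 0,
# so minimizing max_k h(I_k / I_des) minimizes max_k |log I_k - log I_des|.
h = lambda u: max(u, 1.0 / u)
for u in (0.25, 0.5, 1.0, 2.0, 7.5):
    assert math.isclose(math.log(h(u)), abs(math.log(u)))
```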

At this point, one can go further and ask whether additional constraints can be added. For example, the constraint “no more than half of the total power goes into any 10 lamps” looks complicated, but it can actually be written in convex form, so the problem remains computationally tractable. On the other hand, the constraint “no more than half of the lamps are on” may look simple, but it is in fact a combinatorial constraint that makes the worst-case complexity of the resolution exponential. The moral is that untrained intuition does not always work; one needs to obtain the proper background and develop the right intuition to discern between easy and difficult problems.

References

Boyd, S. P., and Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.
Karmarkar, N. (1984). A new polynomial time algorithm for linear programming. Combinatorica, 4(4), 373–395.
Nesterov, Y., and Nemirovskii, A. (1994). Interior-point polynomial algorithms in convex programming. Philadelphia, PA: SIAM.

  1. The computational complexity of an algorithm is measured in the number of operations required to obtain a solution (whereas computation time is measured in time units). This complexity is expressed as a function of the number of variables \(n\) to optimize; the important thing is how it grows with \(n\). Typically, polynomial complexity is considered acceptable in practice (of course, the order of the polynomial also matters), whereas exponential complexity is not, as it quickly explodes.↩︎