\( \newcommand{\bm}[1]{\boldsymbol{#1}} \newcommand{\textm}[1]{\textsf{#1}} \def\T{{\mkern-2mu\raise-1mu\mathsf{T}}} \newcommand{\R}{\mathbb{R}} % real numbers \newcommand{\E}{{\rm I\kern-.2em E}} \newcommand{\w}{\bm{w}} % bold w \newcommand{\bmu}{\bm{\mu}} % bold mu \newcommand{\bSigma}{\bm{\Sigma}} % bold Sigma \newcommand{\bigO}{O} %\mathcal{O} \renewcommand{\d}[1]{\operatorname{d}\!{#1}} \)

A.6 Lagrange duality

Lagrange duality theory is a rich and mature theory that links the original minimization problem (A.1), termed the primal problem, with a maximization problem, termed the dual problem. On some occasions, it is simpler to solve the dual problem than the primal one. A fundamental result in duality theory is given by the Karush-Kuhn-Tucker (KKT) optimality conditions that any primal-dual solution must satisfy. By exploiting the KKT conditions, it is possible in many cases to obtain a closed-form solution to the original problem.

We next overview the basic results on duality theory, including the KKT conditions. For further details the reader is referred to (S. P. Boyd and Vandenberghe, 2004, Chapter 5) and (Bertsekas, 1999; Bertsekas et al., 2003).

A.6.1 Lagrangian

Recall the optimization problem in standard form (A.1) (not necessarily convex): \[\begin{equation} \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}[t]{l} f_{0}(\bm{x}) \end{array}\\ \textm{subject to} & \begin{array}[t]{l} f_{i}\left(\bm{x}\right)\leq0,\\ h_{i}\left(\bm{x}\right)=0, \end{array} \quad \begin{array}[t]{l} i=1,\ldots,m\\ i=1,\ldots,p, \end{array} \end{array} \tag{A.17} \end{equation}\] with variable \(\bm{x}\in\R^{n}\), domain \(\mathcal{D}\), and optimal value \(p^{\star}\).

The basic idea in Lagrange duality is to take the constraints in (A.17) into account by augmenting the objective function with a weighted sum of the constraint functions. The Lagrangian \(L:\R^n\times\R^m\times\R^p\rightarrow\R\) associated with the problem (A.17) is defined as \[ L(\bm{x},\bm{\lambda},\bm{\nu}) = f_{0}(\bm{x})+\sum_{i=1}^{m}\lambda_{i}f_{i}(\bm{x})+\sum_{i=1}^{p}\nu_{i}h_{i}(\bm{x}), \] with \(\textm{dom}\,L=\mathcal{D}\times\R^m\times\R^p\). We refer to \(\lambda_{i}\) and \(\nu_{i}\) as the Lagrange multipliers associated with the \(i\)th inequality constraint \(f_{i}(\bm{x})\leq0\) and with the \(i\)th equality constraint \(h_{i}(\bm{x})=0\), respectively. The vectors \(\bm{\lambda}\) and \(\bm{\nu}\) are then called dual variables or Lagrange multiplier vectors associated with the problem (A.17).

We define the Lagrange dual function (or just dual function) \(g:\R^m\times\R^p\rightarrow\R\) as the minimum value of the Lagrangian over \(\bm{x}\) for a given \((\bm{\lambda},\bm{\nu})\): \[\begin{equation} g(\bm{\lambda},\bm{\nu}) = \underset{\bm{x}\in \mathcal{D}}{\textm{inf}} L(\bm{x},\bm{\lambda},\bm{\nu}) = \underset{\bm{x}\in \mathcal{D}}{\textm{inf}}\left\{ f_{0}(\bm{x})+\sum_{i=1}^{m}\lambda_{i}f_{i}(\bm{x})+\sum_{i=1}^{p}\nu_{i}h_{i}(\bm{x}) \right\}. \tag{A.18} \end{equation}\] Note that the infimum in (A.18) is with respect to all \(\bm{x}\in\mathcal{D}\) (not necessarily feasible points). When the Lagrangian is unbounded below in \(\bm{x}\), the dual function takes on the value \(-\infty\). Since the dual function is the pointwise infimum of a family of affine functions of \((\bm{\lambda},\bm{\nu})\), it is concave, even when the problem (A.17) is not convex.

In contrast to the dual variables and dual function, the original optimization variable \(\bm{x}\) is then called the primal variable, and the original objective function \(f_{0}(\bm{x})\) is referred to as the primal function.

The dual function \(g(\bm{\lambda},\bm{\nu})\) provides lower bounds on the optimal value \(p^\star\) of the problem (A.17). Specifically, for any \(\bm{\lambda}\ge\bm{0}\) and any \(\bm{\nu}\) (referred to as dual feasible) we have \[ g(\bm{\lambda},\bm{\nu}) \leq p^\star, \] which holds even if the original problem is not convex.

This important property can be easily verified through the following inequalities: for any feasible \(\bm{x}\), \[ \begin{aligned} f_{0}(\bm{x}) & \ge f_{0}(\bm{x})+\sum_{i=1}^{m}\lambda_{i}f_{i}(\bm{x})+\sum_{i=1}^{p}\nu_{i}h_{i}(\bm{x})\\ & \ge \underset{\bm{z}\in \mathcal{D}}{\textm{inf}}\left\{ f_{0}(\bm{z})+\sum_{i=1}^{m}\lambda_{i}f_{i}(\bm{z})+\sum_{i=1}^{p}\nu_{i}h_{i}(\bm{z}) \right\}\\ & = g(\bm{\lambda},\bm{\nu}), \end{aligned} \] where the first inequality uses \(\lambda_i\geq0\) together with the fact that \(f_i(\bm{x})\leq0\) and \(h_i(\bm{x})=0\) for any feasible \(\bm{x}\).

As a consequence of the lower-bound property, a primal-dual feasible pair \(\left( \bm{x},\left(\bm{\lambda},\bm{\nu}\right) \right)\) localizes the optimal value of the primal (and dual) problem within an interval: \[\begin{equation} p^{\star}\in\left[ g\left(\bm{\lambda},\bm{\nu}\right) ,f_{0}(\bm{x})\right]. \tag{A.19} \end{equation}\] This can be utilized in optimization algorithms to provide non-heuristic stopping criteria.
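As a quick numerical illustration of the interval (A.19), consider a toy problem (an assumed example, not from the text): minimize \(x^2\) subject to \(x\geq1\), i.e., \(f_1(x)=1-x\leq0\), with \(p^\star=1\). Minimizing the Lagrangian \(L(x,\lambda)=x^2+\lambda(1-x)\) over \(x\) gives \(x=\lambda/2\) and \(g(\lambda)=\lambda-\lambda^2/4\). A minimal sketch:

```python
# Toy illustration of the interval (A.19): minimize x^2 subject to x >= 1
# (hypothetical example with f0(x) = x^2, f1(x) = 1 - x <= 0, p* = 1).

def f0(x):
    return x ** 2

def g(lam):
    # Dual function: inf_x x^2 + lam*(1 - x), attained at x = lam/2.
    return lam - lam ** 2 / 4

p_star = 1.0        # optimal value, attained at x = 1

x_feas = 1.2        # any primal feasible point (x >= 1)
lam_feas = 1.0      # any dual feasible point (lam >= 0)

# p* is localized in the interval [g(lam), f0(x)]:
assert g(lam_feas) <= p_star <= f0(x_feas)
print(g(lam_feas), p_star, f0(x_feas))
```

An algorithm can stop once the width of this interval, \(f_0(x)-g(\lambda)\), falls below a prescribed tolerance.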

A.6.2 Lagrange dual problem

As we have seen, the Lagrange dual function gives us a lower bound on the optimal value of the optimization problem (A.17): \(g(\bm{\lambda},\bm{\nu}) \leq p^\star\). Since the lower bound depends on the choice of \((\bm{\lambda},\bm{\nu})\), a natural question is what would be the tightest lower bound that can be obtained. Precisely, this leads to the Lagrange dual problem associated with the original problem (A.17): \[\begin{equation} \begin{array}[c]{ll} \underset{\bm{\lambda},\bm{\nu}}{\textm{maximize}} & g(\bm{\lambda},\bm{\nu})\\ \textm{subject to} & \bm{\lambda}\geq\bm{0}. \end{array} \tag{A.20} \end{equation}\] This is a convex optimization problem (irrespective of the convexity of the original problem (A.17)) because the objective to be maximized is concave, and the constraints are convex.

In view of the dual problem, the original problem (A.17) is also called the primal problem. In the dual problem (A.20), we say the variables \((\bm{\lambda},\bm{\nu})\) are dual feasible if \(\bm{\lambda}\ge\bm{0}\) and \(g(\bm{\lambda},\bm{\nu})>-\infty\). We refer to \((\bm{\lambda}^\star,\bm{\nu}^\star)\) as dual optimal if they are optimal for the problem (A.20). The optimal value of the dual problem (A.20) is denoted by \(d^\star\), as opposed to the optimal value \(p^\star\) of the primal problem.

Example A.16 (Least-norm solution of linear equations) Consider the problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c} \bm{x}^\T\bm{x}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{A}\bm{x}=\bm{b}.\end{array} \end{array} \]

Its Lagrangian is \[ L(\bm{x},\bm{\nu})=\bm{x}^\T\bm{x}+\bm{\nu}^\T(\bm{A}\bm{x}-\bm{b}). \]

To find the dual function, we need to solve an unconstrained minimization of the Lagrangian. We set the gradient equal to zero: \[ \nabla_{\bm{x}}L(\bm{x},\bm{\nu})=2\bm{x}+\bm{A}^\T\bm{\nu}=\bm{0}\quad\Longrightarrow\quad \bm{x}=-\frac{1}{2}\bm{A}^\T\bm{\nu} \] and we plug the solution in \(L\) to obtain the dual function \(g\): \[ g(\bm{\nu})=L\left(-\frac{1}{2}\bm{A}^\T\bm{\nu},\bm{\nu}\right)=-\frac{1}{4}\bm{\nu}^\T\bm{A}\bm{A}^\T\bm{\nu}-\bm{b}^\T\bm{\nu}, \] which is, as expected, a concave function of \(\bm{\nu}\).

From the lower bound property, we have \[ p^{\star}\geq-\frac{1}{4}\bm{\nu}^\T\bm{A}\bm{A}^\T\bm{\nu}-\bm{b}^\T\bm{\nu}\quad\textm{for all }\bm{\nu}. \]

Finally, the dual problem is the QP: \[ \begin{array}{ll} \underset{\bm{\nu}}{\textm{maximize}} & \begin{array}{c}-\frac{1}{4}\bm{\nu}^\T\bm{A}\bm{A}^\T\bm{\nu}-\bm{b}^\T\bm{\nu}.\end{array} \end{array} \]
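The closed-form expressions of Example A.16 can be checked numerically. Below is a sketch with assumed random data (a fat \(\bm{A}\) with full row rank): the primal solution is \(\bm{x}^\star=\bm{A}^\T(\bm{A}\bm{A}^\T)^{-1}\bm{b}\), and maximizing \(g(\bm{\nu})\) gives \(\bm{\nu}^\star=-2(\bm{A}\bm{A}^\T)^{-1}\bm{b}\):

```python
# Numerical check of the least-norm example: p* = d* (strong duality).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))   # 3 equations, 6 unknowns (assumed data)
b = rng.standard_normal(3)

AAt_inv = np.linalg.inv(A @ A.T)
x_star = A.T @ AAt_inv @ b                 # primal optimal point
p_star = x_star @ x_star                   # p* = b^T (A A^T)^{-1} b

nu_star = -2 * AAt_inv @ b                 # maximizer of the dual function
d_star = -0.25 * nu_star @ A @ A.T @ nu_star - b @ nu_star

assert np.allclose(A @ x_star, b)          # primal feasibility
assert np.isclose(p_star, d_star)          # zero duality gap
```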

Example A.17 (Standard form LP) Consider the problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c}\bm{c}^\T\bm{x}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{A}\bm{x}=\bm{b},\quad \bm{x}\geq\bm{0}.\end{array} \end{array} \]

Its Lagrangian is \[ \begin{aligned} L(\bm{x},\bm{\lambda},\bm{\nu}) & = \bm{c}^\T\bm{x}+\bm{\nu}^\T(\bm{A}\bm{x}-\bm{b})-\bm{\lambda}^\T\bm{x}\\ & = \left(\bm{c}+\bm{A}^\T\bm{\nu}-\bm{\lambda}\right)^\T\bm{x}-\bm{b}^\T\bm{\nu}. \end{aligned} \]

To find the dual function, note that \(L\) is a linear function of \(\bm{x}\), hence unbounded below unless the coefficient of \(\bm{x}\) is zero. Therefore, we can write the dual function as \[ g(\bm{\lambda},\bm{\nu})=\underset{\bm{x}}{\textm{inf}}\;L(\bm{x},\bm{\lambda},\bm{\nu})=\left\{ \begin{array}{cl} -\bm{b}^\T\bm{\nu} & \bm{c}+\bm{A}^\T\bm{\nu}-\bm{\lambda}=\bm{0}\\ -\infty & \textm{otherwise}, \end{array}\right. \] which, as expected, is a concave function of \((\bm{\lambda},\bm{\nu})\) since it is linear on an affine domain.

From the lower bound property, we have \[ p^{\star}\geq-\bm{b}^\T\bm{\nu}\quad\textm{ if }\bm{c}+\bm{A}^\T\bm{\nu}\geq\bm{0}. \]

Finally, the dual problem is the LP \[ \begin{array}{ll} \underset{\bm{\nu}}{\textm{maximize}} & \begin{array}{c}-\bm{b}^\T\bm{\nu}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{c}+\bm{A}^\T\bm{\nu}\geq\bm{0}.\end{array} \end{array} \]
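For LPs, the primal and dual optimal values coincide whenever either problem is feasible (as discussed later under strong duality). This can be verified on a small instance with assumed data, solving both LPs with `scipy.optimize.linprog` (the dual maximization is rewritten as a minimization):

```python
# Verify p* = d* for the standard-form LP and its dual (assumed data).
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0, 3.0])

# Primal: minimize c^T x subject to Ax = b, x >= 0.
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3)

# Dual: maximize -b^T nu s.t. c + A^T nu >= 0
#  <=>  minimize b^T nu s.t. -A^T nu <= c, nu free.
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=[(None, None)])

p_star = primal.fun     # mass on the cheapest entry: p* = 1
d_star = -dual.fun      # nu* = -1, so d* = 1 as well

assert np.isclose(p_star, d_star)
```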

Example A.18 (Two-way partitioning) Consider the problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c}\bm{x}^\T\bm{W}\bm{x}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} x_{i}^{2}=1,\quad i=1,\ldots,n.\end{array} \end{array} \] This is a nonconvex problem because the matrix \(\bm{W}\) is not necessarily positive semidefinite and because of the quadratic equality constraints (the feasible set contains \(2^{n}\) discrete points).

Its Lagrangian is \[ \begin{aligned} L(\bm{x},\bm{\nu}) & = \bm{x}^\T\bm{W}\bm{x}+\sum_{i=1}^{n}\nu_{i}\left(x_{i}^{2}-1\right)\\ & = \bm{x}^\T\left(\bm{W}+\textm{Diag}(\bm{\nu})\right)\bm{x} - \bm{1}^\T\bm{\nu}. \end{aligned} \]

To find the dual function, note that \(L\) is a quadratic function of \(\bm{x}\) and it is unbounded below if the matrix \(\bm{W}+\textm{Diag}(\bm{\nu})\) has a negative eigenvalue. Therefore, we can write the dual function as \[ g(\bm{\nu})=\underset{\bm{x}}{\textm{inf}}\;L(\bm{x},\bm{\nu})=\left\{ \begin{array}{cl} -\bm{1}^\T\bm{\nu} & \bm{W}+\textm{Diag}(\bm{\nu})\succeq\bm{0}\\ -\infty & \textm{otherwise}. \end{array}\right. \]

From the lower bound property, we have \[ p^{\star}\geq-\bm{1}^\T\bm{\nu}\quad\textm{ if }\bm{W}+\textm{Diag}(\bm{\nu})\succeq\bm{0}. \] One particular lower bound is obtained by choosing \(\bm{\nu}=-\lambda_{\textm{min}}(\bm{W})\bm{1}\), which makes \(\bm{W}+\textm{Diag}(\bm{\nu})\succeq\bm{0}\): \[ p^{\star}\geq n\lambda_{\textm{min}}(\bm{W}). \]

Finally, the dual problem is the SDP \[ \begin{array}{ll} \underset{\bm{\nu}}{\textm{maximize}} & \begin{array}{c}-\bm{1}^\T\bm{\nu}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{W}+\textm{Diag}(\bm{\nu})\succeq\bm{0}.\end{array} \end{array} \]
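The eigenvalue bound \(p^{\star}\geq n\lambda_{\textm{min}}(\bm{W})\) can be checked by brute force on a small instance, since the feasible set has only \(2^n\) points. A sketch with an assumed random symmetric \(\bm{W}\) and \(n=8\):

```python
# Check the eigenvalue lower bound of the two-way partitioning problem.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 8
B = rng.standard_normal((n, n))
W = (B + B.T) / 2                           # random symmetric W (assumed)

# Brute-force p* over all 2^n sign vectors x in {-1, +1}^n.
p_star = min(np.array(x) @ W @ np.array(x)
             for x in itertools.product((-1.0, 1.0), repeat=n))

bound = n * np.linalg.eigvalsh(W)[0]        # n * lambda_min(W)

assert bound <= p_star + 1e-9               # the lower bound holds
```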

Equivalent formulations of a problem can lead to different dual problems as the following examples illustrate.

Example A.19 (Introducing new variables) Consider the problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c} \left\Vert \bm{A}\bm{x} - \bm{b}\right\Vert _{2}.\end{array}\end{array} \] Because the problem has no constraints, the dual problem is trivial: it would just be a constant. Instead, we can introduce some dummy variables in the original problem and rewrite it as \[ \begin{array}{ll} \underset{\bm{x},\bm{y}}{\textm{minimize}} & \begin{array}{c}\left\Vert \bm{y}\right\Vert _{2}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{y}=\bm{A}\bm{x}-\bm{b}.\end{array} \end{array} \] Then the dual problem can be derived as \[ \begin{array}{ll} \underset{\bm{\nu}}{\textm{maximize}} & \begin{array}{c}\bm{b}^\T\bm{\nu}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{A}^\T\bm{\nu}=\bm{0},\quad\left\Vert \bm{\nu}\right\Vert _{2}\leq1.\end{array} \end{array} \]

Example A.20 (Implicit constraints) Consider the following LP with box constraints: \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c}\bm{c}^\T\bm{x}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{A}\bm{x}=\bm{b}\\ -\bm{1}\leq \bm{x}\leq\bm{1}. \end{array} \end{array} \] The dual problem is \[ \begin{array}{ll} \underset{\bm{\nu},\bm{\lambda}_{1},\bm{\lambda}_{2}}{\textm{maximize}} & \begin{array}{c} -\bm{b}^\T\bm{\nu}-\bm{1}^\T\bm{\lambda}_{1}-\bm{1}^\T\bm{\lambda}_{2}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{c}+\bm{A}^\T\bm{\nu}+\bm{\lambda}_{1}-\bm{\lambda}_{2}=0\\ \bm{\lambda}_{1}\geq\bm{0},\quad\bm{\lambda}_{2}\geq\bm{0}, \end{array} \end{array} \] which does not give much insight. If, instead, we rewrite the primal problem (after making some explicit constraints implicit) as \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c} f_{0}(\bm{x})=\left\{ \begin{array}{ll} \bm{c}^\T\bm{x}\quad & -\bm{1}\leq \bm{x}\leq\bm{1}\\ \infty & \textm{otherwise} \end{array}\right.\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{A}\bm{x}=\bm{b},\end{array} \end{array} \] then the dual becomes much more insightful: \[ \begin{array}{ll} \underset{\bm{\nu}}{\textm{maximize}} & \begin{array}{c} -\bm{b}^\T\bm{\nu}-\left\Vert \bm{A}^\T\bm{\nu}+\bm{c}\right\Vert _{1}. \end{array}\end{array} \]
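The objective of the second dual, \(-\bm{b}^\T\bm{\nu}-\Vert\bm{A}^\T\bm{\nu}+\bm{c}\Vert_1\), lower-bounds \(p^\star\) for every \(\bm{\nu}\), which is easy to check numerically on an assumed small instance (using `scipy.optimize.linprog` for the primal):

```python
# Check the dual lower bound -b^T nu - ||A^T nu + c||_1 <= p* for any nu.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 4))             # assumed data
x0 = rng.uniform(-1, 1, 4)
b = A @ x0                                  # guarantees primal feasibility
c = rng.standard_normal(4)

# Primal: minimize c^T x subject to Ax = b, -1 <= x <= 1.
res = linprog(c, A_eq=A, b_eq=b, bounds=[(-1, 1)] * 4)
p_star = res.fun

for _ in range(100):                        # every nu gives a lower bound
    nu = rng.standard_normal(2)
    assert -b @ nu - np.abs(A.T @ nu + c).sum() <= p_star + 1e-9
```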

A.6.3 Weak and strong duality

The optimal value \(d^\star\) of the Lagrange dual problem (A.20) is, by definition, the tightest lower bound on \(p^\star\) that can be obtained from the Lagrange dual function. This property is called weak duality: \[\begin{equation} d^\star \le p^\star, \tag{A.21} \end{equation}\] which holds even if the original problem is not convex. The difference \(\Gamma = p^\star - d^\star\) is called the optimal duality gap of the original problem (and is always nonnegative).

The bound in (A.21) can be used to establish a lower bound on the optimal value of a problem that is challenging to solve, since the dual problem (A.20) is always convex.

Of particular interest is when equality is attained in (A.21). This is referred to as strong duality: \[\begin{equation} d^{\star} = p^{\star} \tag{A.22} \end{equation}\] and implies that the duality gap is zero. Strong duality is very desirable and may facilitate the resolution of a difficult problem via the dual.

Unfortunately, strong duality does not hold for general optimization problems. However, if the primal problem (A.17) is convex, we usually (but not always) have strong duality. There are specific conditions under which strong duality holds, and these conditions are referred to as constraint qualifications.

Slater’s condition is a simple constraint qualification that requires the existence of a strictly feasible point, i.e., \(\bm{x}\in\textm{relint}\;\mathcal{D}\) such that \(f_i(\bm{x})<0,\) \(i=1,\dots,m,\) and \(h_i(\bm{x})=0,\) \(i=1,\dots,p.\) Strict feasibility is typically easy to verify in practical problems. Slater’s theorem states that strong duality holds if Slater’s condition is met and the optimization problem is convex.

Example A.21 (Inequality form LP) Consider the problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c}\bm{c}^\T\bm{x}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{A}\bm{x}\leq \bm{b}.\end{array} \end{array} \] Its dual problem is also an LP: \[ \begin{array}{ll} \underset{\bm{\lambda}}{\textm{maximize}} & \begin{array}{c} -\bm{b}^\T\bm{\lambda}\end{array}\\ \textm{subject to} & \begin{array}{l} \bm{A}^\T\bm{\lambda} + \bm{c} = \bm{0},\quad\bm{\lambda}\geq0.\end{array} \end{array} \]

From Slater’s condition, strong duality holds if \(\bm{A}\tilde{\bm{x}}<\bm{b}\) for some \(\tilde{\bm{x}}\), which may be difficult to verify. Interestingly, in this case, we do not need Slater’s condition, as we always have \(p^{\star}=d^{\star}\) (except when both the primal and dual problems are infeasible).

Example A.22 (Convex QP) Consider the problem (with \(\bm{P}\succ\bm{0}\), so that \(\bm{P}^{-1}\) exists) \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \bm{x}^\T\bm{P}\bm{x}\\ \textm{subject to} & \bm{A}\bm{x}\leq \bm{b}. \end{array} \]

Its dual problem is also a QP: \[ \begin{array}{ll} \underset{\bm{\lambda}}{\textm{maximize}} & -\frac{1}{4}\bm{\lambda}^\T\bm{A}\bm{P}^{-1}\bm{A}^\T\bm{\lambda} - \bm{b}^\T\bm{\lambda}\\ \textm{subject to} & \bm{\lambda}\geq\bm{0}. \end{array} \]

From Slater’s condition, strong duality holds if \(\bm{A}\tilde{\bm{x}}<\bm{b}\) for some \(\tilde{\bm{x}}\). However, in this case, we always have \(p^{\star}=d^{\star}\).

Example A.23 (Nonconvex QP) Consider the problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \bm{x}^\T\bm{A}\bm{x} + 2\bm{b}^\T\bm{x}\\ \textm{subject to} & \bm{x}^\T\bm{x}\leq1, \end{array} \] which is nonconvex in general since \(\bm{A}\) need not satisfy \(\bm{A}\succeq\bm{0}\).

Its dual problem can be written as the SDP: \[ \begin{array}{ll} \underset{t,\lambda}{\textm{maximize}} & \begin{array}{c}-t-\lambda\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \begin{bmatrix} \bm{A}+\lambda \bm{I} & \bm{b}\\ \bm{b}^\T & t \end{bmatrix}\succeq0.\end{array} \end{array} \]

In this case, strong duality holds even though the original problem is nonconvex (not trivial to show).

A.6.4 Optimality conditions

In Section A.4, we examined the optimality characterization of a solution through the minimum principle. In the following discussion, we will consider a more explicit characterization that is of great practical interest, as it often enables us to derive solutions to problems.

Complementary slackness

Suppose that strong duality holds so that the primal and dual optimal values are equal \(d^{\star} = p^{\star}\). Let \(\bm{x}^\star\) be a primal optimal point and \((\bm{\lambda}^\star,\bm{\nu}^\star)\) be a dual optimal point. Then, \[ \begin{aligned} f_{0}(\bm{x}^\star) & = g(\bm{\lambda}^\star,\bm{\nu}^\star)\\ & = \underset{\bm{x}\in \mathcal{D}}{\textm{inf}}\left\{ f_{0}(\bm{x})+\sum_{i=1}^{m}\lambda_{i}^{\star}f_{i}(\bm{x})+\sum_{i=1}^{p}\nu_{i}^{\star}h_{i}(\bm{x}) \right\}\\ & \leq f_{0}(\bm{x}^\star)+\sum_{i=1}^{m}\lambda_{i}^{\star}f_{i}(\bm{x}^\star)+\sum_{i=1}^{p}\nu_{i}^{\star}h_{i}(\bm{x}^\star)\\ & \leq f_{0}(\bm{x}^\star), \end{aligned} \] where the first line comes from the zero duality gap, the second line is the definition of the dual function, the third line follows trivially from the definition of infimum, and the fourth line results from feasibility (i.e., \(\lambda_i^\star\geq0\), \(f_{i}(\bm{x}^\star)\leq0\), and \(h_{i}(\bm{x}^\star)=0\)).

We then conclude that the two inequalities in this chain must hold with equality. Equality in the first inequality means that \(\bm{x}^\star\) minimizes \(L(\bm{x},\bm{\lambda}^\star,\bm{\nu}^\star)\) over \(\bm{x}\) (note that there may be other minimizers). Equality in the second inequality implies that \[ \sum_{i=1}^{m}\lambda_{i}^{\star}f_{i}(\bm{x}^\star) = 0, \] which in turn means (because each term is nonpositive) \[ \lambda_{i}^{\star}f_{i}(\bm{x}^\star) = 0,\quad i=1,\dots,m. \] These conditions are referred to as complementary slackness. They hold for any primal optimal \(\bm{x}^\star\) and dual optimal \((\bm{\lambda}^\star,\bm{\nu}^\star)\) (assuming strong duality holds).

The interpretation of complementary slackness provides relevant and practical information: \[ \lambda_{i}^{\star}>0 \quad\Longrightarrow\quad f_{i}\left(\bm{x}^{\star}\right)=0 \] and \[ f_{i}\left(\bm{x}^{\star}\right)<0 \quad\Longrightarrow\quad \lambda_{i}^{\star}=0. \] In simpler terms, if a Lagrange multiplier is positive (i.e., \(\lambda_{i}^{\star}>0\)), the corresponding constraint must be active, that is, the solution lies on the boundary of the set defined by that constraint. Conversely, if a constraint is strictly feasible (i.e., \(f_{i}(\bm{x}^{\star})<0\)), it is inactive and the corresponding Lagrange multiplier is not needed, hence \(\lambda_{i}^{\star}=0\).
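Complementary slackness is easy to see on a toy problem (an assumed example): minimize \(x^2\) subject to \(x\geq1\). Here \(x^\star=1\), and maximizing \(g(\lambda)=\lambda-\lambda^2/4\) gives \(\lambda^\star=2\), so the active constraint carries a positive multiplier; relaxing the constraint to \(x\geq-1\) makes it inactive and forces \(\lambda^\star=0\):

```python
# Complementary slackness on a toy problem: minimize x^2 s.t. 1 - x <= 0.
# Primal optimum x* = 1; dual optimum lam* = 2 (argmax of lam - lam^2/4).
x_star, lam_star = 1.0, 2.0
f1 = 1.0 - x_star                  # constraint value at the solution
assert f1 == 0.0 and lam_star > 0  # active constraint, positive multiplier
assert lam_star * f1 == 0.0        # complementary slackness

# Relaxed constraint x >= -1, i.e., f1(x) = -1 - x <= 0: now x* = 0 is
# interior (strictly feasible), so the multiplier must vanish.
x_star2, lam_star2 = 0.0, 0.0
f1_relaxed = -1.0 - x_star2
assert f1_relaxed < 0 and lam_star2 == 0.0
assert lam_star2 * f1_relaxed == 0.0
```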

Karush-Kuhn-Tucker (KKT) optimality conditions

We are now ready to write the famous Karush-Kuhn-Tucker (KKT) optimality conditions for convex and nonconvex optimization problems. We assume that the functions \(f_0,f_1,\dots,f_m,h_1,\dots,h_p\) are differentiable.

Let \(\bm{x}^\star\) and \((\bm{\lambda}^\star,\bm{\nu}^\star)\) be any primal and dual optimal points with zero duality gap. Since \(\bm{x}^\star\) minimizes \(L(\bm{x},\bm{\lambda}^\star,\bm{\nu}^\star)\) over \(\bm{x}\), it follows that its gradient must vanish, i.e., \[ \nabla f_{0}(\bm{x}^\star)+\sum_{i=1}^{m}\lambda_{i}^{\star}\nabla f_{i}(\bm{x}^\star)+\sum_{i=1}^{p}\nu_{i}^{\star}\nabla h_{i}(\bm{x}^\star) = \bm{0}. \]

Thus, for any optimization problem (not necessarily convex), any pair of primal and dual optimal points, \(\bm{x}^\star\) and \((\bm{\lambda}^\star,\bm{\nu}^\star)\), must satisfy the KKT optimality conditions: \[\begin{equation} \begin{array}{rll} f_{i}(\bm{x}^\star)\leq0, & i=1,\dots,m & \textm{(primal feasibility)}\\ h_{i}(\bm{x}^\star)=0, & i=1,\dots,p & \\ \lambda_{i}^{\star}\ge0, & i=1,\dots,m & \textm{(dual feasibility)}\\ \lambda_{i}^{\star}f_{i}(\bm{x}^\star) = 0, & i=1,\dots,m & \textm{(complementary slackness)}\\ \nabla f_{0}(\bm{x}^\star) + \begin{aligned}\sum\limits_{i=1}^{m}\lambda_{i}^{\star}\nabla f_{i}(\bm{x}^\star)\hspace{-0.5em}\end{aligned} & + \hspace{0.5em}\begin{aligned}\sum\limits_{i=1}^{p}\nu_{i}^{\star}\nabla h_{i}(\bm{x}^\star) = \bm{0}.\end{aligned} & \textm{(zero Lagrangian gradient)}\\ \end{array} \tag{A.23} \end{equation}\]
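Revisiting Example A.16, the KKT conditions can be verified numerically: with assumed random data, \(\bm{x}^\star=\bm{A}^\T(\bm{A}\bm{A}^\T)^{-1}\bm{b}\) and \(\bm{\nu}^\star=-2(\bm{A}\bm{A}^\T)^{-1}\bm{b}\) satisfy primal feasibility and the zero-Lagrangian-gradient condition \(2\bm{x}^\star+\bm{A}^\T\bm{\nu}^\star=\bm{0}\) (there are no inequality constraints, so dual feasibility and complementary slackness are vacuous):

```python
# Verify the KKT conditions for the least-norm problem of Example A.16.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 6))             # assumed data, full row rank
b = rng.standard_normal(3)

AAt_inv = np.linalg.inv(A @ A.T)
x_star = A.T @ AAt_inv @ b
nu_star = -2 * AAt_inv @ b

assert np.allclose(A @ x_star, b)                   # primal feasibility
assert np.allclose(2 * x_star + A.T @ nu_star, 0)   # zero Lagrangian gradient
```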

When the primal problem is convex, the KKT conditions (A.23) are also sufficient for the points to be primal and dual optimal. To see this, note that since the problem is convex, the Lagrangian is convex in \(\bm{x}\). If a point \(\tilde{\bm{x}}\) achieves a zero Lagrangian gradient, then it minimizes the Lagrangian, so \[ \begin{aligned} g(\tilde{\bm{\lambda}},\tilde{\bm{\nu}}) &= L(\tilde{\bm{x}},\tilde{\bm{\lambda}},\tilde{\bm{\nu}})\\ &= f_{0}(\tilde{\bm{x}})+\sum_{i=1}^{m}\tilde{\lambda}_{i}f_{i}(\tilde{\bm{x}})+\sum_{i=1}^{p}\tilde{\nu}_{i}h_{i}(\tilde{\bm{x}})\\ &= f_{0}(\tilde{\bm{x}}), \end{aligned} \] where the last equality follows from complementary slackness, \(\tilde{\lambda}_{i}f_{i}(\tilde{\bm{x}})=0\), and primal feasibility, \(h_{i}(\tilde{\bm{x}})=0\). This shows zero duality gap and, therefore, primal and dual optimality. This is summarized in the next statement.

Theorem A.1 (KKT optimality conditions) Suppose we have an optimization problem with differentiable functions:

  • For any optimization problem (not necessarily convex) with strong duality, the KKT conditions (A.23) are necessary for optimality.

  • For a convex optimization problem satisfying Slater’s condition, strong duality follows and the KKT conditions (A.23) are necessary and sufficient for optimality.

The KKT conditions play a key role in optimization. In some cases, it is possible to characterize analytically the solution in a convenient form. More generally, many algorithms are conceived, or can be interpreted, as methods for solving the KKT conditions.

References

Bertsekas, D. P. (1999). Nonlinear programming. Athena Scientific.
Bertsekas, D. P., Nedić, A., and Ozdaglar, A. E. (2003). Convex analysis and optimization. Belmont, MA, USA: Athena Scientific.
Boyd, S. P., and Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.