\( \newcommand{\bm}[1]{\boldsymbol{#1}} \newcommand{\textm}[1]{\textsf{#1}} \def\T{{\mkern-2mu\raise-1mu\mathsf{T}}} \newcommand{\R}{\mathbb{R}} % real numbers \newcommand{\E}{{\rm I\kern-.2em E}} \newcommand{\w}{\bm{w}} % bold w \newcommand{\bmu}{\bm{\mu}} % bold mu \newcommand{\bSigma}{\bm{\Sigma}} % bold mu \newcommand{\bigO}{O} %\mathcal{O} \renewcommand{\d}[1]{\operatorname{d}\!{#1}} \)

A.3 Convex functions

We now provide a concise overview of convex functions. For more detailed information, the reader is referred to (S. P. Boyd and Vandenberghe, 2004, Chapter 3).

Definition A.2 (Convex function) A function \(f:\R^{n}\rightarrow\R\) is convex if the domain, \(\textm{dom}\,f\), is a convex set and if for all \(\bm{x},\bm{y}\in\textm{dom}\,f\) and \(0\leq\theta\leq1\), we have \[\begin{equation} f(\theta \bm{x} + (1-\theta)\bm{y})\leq\theta f(\bm{x}) + (1-\theta)f(\bm{y}). \tag{A.3} \end{equation}\]

Geometrically, this inequality means that the line segment between \((\bm{x},f(\bm{x}))\) and \((\bm{y},f(\bm{y})),\) which is the chord from \(\bm{x}\) to \(\bm{y}\), lies above the graph of \(f\).

A function \(f\) is strictly convex if strict inequality holds in (A.3) whenever \(\bm{x} \neq \bm{y}\) and \(0 < \theta < 1\). We say \(f\) is concave if \(-f\) is convex, and strictly concave if \(-f\) is strictly convex.

For an affine function we always have equality in (A.3), so all affine (and therefore also linear) functions are both convex and concave. Conversely, any function that is convex and concave is affine.
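As a quick numerical sanity check (not part of the formal development), inequality (A.3) can be tested on random points with NumPy; the choice \(f(\bm{x})=\left\Vert\bm{x}\right\Vert^2\), the sample size, and the tolerance are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return float(x @ x)  # squared Euclidean norm, a convex function

def convexity_gap(f, x, y, theta):
    """rhs - lhs of (A.3); nonnegative whenever f is convex."""
    return theta * f(x) + (1 - theta) * f(y) - f(theta * x + (1 - theta) * y)

gaps = [convexity_gap(f, rng.normal(size=3), rng.normal(size=3), rng.uniform())
        for _ in range(1000)]
holds = min(gaps) >= -1e-12   # small tolerance for floating-point rounding
print(holds)
```

A failed check on even one pair of points would disprove convexity; passing on random samples is of course only suggestive, not a proof.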

A.3.1 Elementary convex and concave functions

Apart from linear and affine functions, which are both convex and concave, it is good to become familiar with some elementary examples.

Let us start with some basic examples on \(\R\):

  • exponential: \(e^{ax}\) is convex on \(\R\) for any \(a\in\R\);
  • powers: \(x^a\) is convex on \(\R_{++}\) when \(a\geq1\) or \(a\leq0\) (e.g., \(x^2\)), and concave for \(0\le a\le1\);
  • powers of absolute value: \(\left|x\right|^{p}\) is convex on \(\R\) for \(p\geq1\) (e.g., \(\left|x\right|\));
  • logarithm: \(\textm{log}\;x\) is concave on \(\R_{++}\);
  • negative entropy: \(x\,\textm{log}\;x\) is convex on \(\R_{++}\).

Now, some interesting examples on \(\R^n\):

  • quadratic function: \(f(\bm{x}) = \bm{x}^\T\bm{P}\bm{x} + 2\bm{q}^\T\bm{x} + r\), with symmetric \(\bm{P}\), is convex on \(\R^n\) if and only if \(\bm{P}\succeq\bm{0}\);
  • norms: every norm \(\left\Vert\bm{x}\right\Vert\) is convex on \(\R^{n}\) (e.g., \(\left\Vert\bm{x}\right\Vert_{\infty}\), \(\left\Vert\bm{x}\right\Vert_{1}\), and \(\left\Vert\bm{x}\right\Vert_{2}\));
  • max function: \(f(\bm{x}) = \textm{max}\{x_1, \dots, x_n\}\) is convex on \(\R^n\);
  • quadratic over linear function: \(f(x,y)=x^{2}/y\) is convex on \(\R\times\R_{++}\);
  • geometric mean: \(f(\bm{x})=\left(\prod_{i=1}^{n}x_{i}\right)^{1/n}\) is concave on \(\R_{++}^{n}\);
  • log-sum-exp function: \(f(\bm{x})=\textm{log}\left(e^{x_1}+\dots+e^{x_n}\right)\) is convex on \(\R^n\) (it can be used to approximate the function \(f(\bm{x}) = \textm{max}\{x_1, \dots, x_n\}\)).
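The quality of the log-sum-exp approximation to the max can be quantified: \(\textm{max}\{x_1,\dots,x_n\} \le \textm{log}\left(e^{x_1}+\dots+e^{x_n}\right) \le \textm{max}\{x_1,\dots,x_n\} + \textm{log}\,n\). A small NumPy sketch (the vector and the scaling parameter are arbitrary choices) checks both bounds:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5)
n = x.size

lse = np.log(np.sum(np.exp(x)))   # log-sum-exp
mx = np.max(x)

# the smooth approximation brackets the max:
#   max(x) <= logsumexp(x) <= max(x) + log(n)
assert mx <= lse <= mx + np.log(n)

# the approximation sharpens with scaling: (1/t)*logsumexp(t*x) -> max(x)
t = 100.0
lse_t = np.log(np.sum(np.exp(t * x))) / t
assert abs(lse_t - mx) <= np.log(n) / t
```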

Finally, some examples on \(\R^{n\times n}\):

  • log-determinant: the function \(f(\bm{X})=\textm{logdet}(\bm{X})\) is concave on \(\mathbb{S}_{++}^{n}=\left\{\bm{X}\in\R^{n\times n} \mid \bm{X}\succ\bm{0}\right\}\);
  • maximum eigenvalue: the function \[ f(\bm{X}) = \lambda_{\textm{max}}(\bm{X}) \triangleq \underset{\bm{y}\neq\bm{0}}{\textm{sup}}\,\frac{\bm{y}^\T\bm{X}\bm{y}}{\bm{y}^\T\bm{y}} \] is convex on \(\mathbb{S}^n\).

A.3.2 Epigraph

So far, we have used the adjective “convex” to describe both sets and functions, even though it refers to two distinct properties. Interestingly, these two can be linked, as we will demonstrate next.

The graph of a function \(f:\R^{n}\rightarrow\R\) is defined as the set \[ \left\{(\bm{x},f(\bm{x}))\in\R^{n+1} \mid \bm{x} \in \textm{dom}\,f\right\}, \] which is a subset of \(\R^{n+1}\).

The epigraph of a function \(f:\R^{n}\rightarrow\R\) is defined as the set \[\begin{equation} \textm{epi}\,f=\left\{(\bm{x},t)\in\R^{n+1} \mid \bm{x}\in\textm{dom}\,f,\,f(\bm{x})\leq t\right\}. \tag{A.4} \end{equation}\] One way to conceptualize the epigraph is to envision pouring a bucket of water over the function and filling it up indefinitely.

The link between convex sets and convex functions is precisely via the epigraph: A function is convex if and only if its epigraph is a convex set, \[ f\;\textm{is convex}\quad\Longleftrightarrow\quad\textm{epi}\,f\;\textm{is convex}. \]

A.3.3 Characterization of convex functions

Apart from the definition of convexity, there are several ways to characterize convex functions, such as the restriction to a line, the first-order condition, and the second-order condition.

Restriction of a convex function to a line

A function is convex if and only if it is convex when restricted to any line that intersects its domain. In other words, \(f:\R^{n}\rightarrow\R\) is convex if and only if the function \(g:\R\rightarrow\R\) defined as \[ g(t) = f(\bm{x} + t\bm{v}) \] is convex on its domain \(\textm{dom}\,g = \{t \mid \bm{x} + t\bm{v} \in\textm{dom}\,f\}\), for any \(\bm{x} \in\textm{dom}\,f\) and \(\bm{v}\in\R^n\).

This property is very useful, since it allows us to check whether a function is convex by restricting it to a line, which is much easier (and can even be plotted in an exploratory analysis).

For example, the proof of the concavity of the log-determinant function \(f(\bm{X})=\textm{logdet}(\bm{X})\) can be reduced to the concavity of the log function by restricting to the line \(\bm{X}+t\bm{V}\), with \(\bm{X}\succ\bm{0}\) and \(\bm{V}\) symmetric: \[ \begin{aligned} g(t)=\textm{logdet}(\bm{X}+t\bm{V}) & = \textm{logdet}(\bm{X}) + \textm{logdet}\left(\bm{I} + t\bm{X}^{-1/2}\bm{V}\bm{X}^{-1/2}\right)\\ & = \textm{logdet}(\bm{X}) + \sum_{i=1}^{n}\textm{log}(1 + t\lambda_i), \end{aligned} \] where the \(\lambda_{i}\)'s are the eigenvalues of \(\bm{X}^{-1/2}\bm{V}\bm{X}^{-1/2}\). Since \(g\) is a sum of concave functions of \(t\) for any choice of \(\bm{X}\) and \(\bm{V}\), the function \(f\) is concave.
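The identity \(\textm{logdet}(\bm{X}+t\bm{V}) = \textm{logdet}(\bm{X}) + \sum_{i}\textm{log}(1+t\lambda_i)\), with the \(\lambda_i\)'s the eigenvalues of \(\bm{X}^{-1/2}\bm{V}\bm{X}^{-1/2}\), can be verified numerically; in this sketch the matrix size, the step \(t\), and the tolerance are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

A = rng.normal(size=(n, n))
X = A @ A.T + n * np.eye(n)          # a positive definite matrix
B = rng.normal(size=(n, n))
V = (B + B.T) / 2                    # a symmetric direction

# eigenvalues of X^{-1/2} V X^{-1/2}
w, U = np.linalg.eigh(X)
X_inv_sqrt = U @ np.diag(1 / np.sqrt(w)) @ U.T
lam = np.linalg.eigvalsh(X_inv_sqrt @ V @ X_inv_sqrt)

t = 0.1   # small enough that X + t*V remains positive definite here
sign, logdet_direct = np.linalg.slogdet(X + t * V)
_, logdet_X = np.linalg.slogdet(X)
logdet_via_eigs = logdet_X + np.sum(np.log(1 + t * lam))

assert sign > 0
assert abs(logdet_direct - logdet_via_eigs) < 1e-8
```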

First-order condition

For a differentiable function \(f\), the gradient \[ \nabla f(\bm{x})=\begin{bmatrix} \frac{\partial f(\bm{x})}{\partial x_{1}} & \cdots & \frac{\partial f(\bm{x})}{\partial x_{n}}\end{bmatrix}^\T\in\R^n \] exists at each point in \(\textm{dom}\,f\), which is open. We can use it to write the first-order Taylor approximation of \(f\) near \(\bm{x}\): \[ f(\bm{y}) \approx f(\bm{x}) + \nabla f(\bm{x})^\T(\bm{y} - \bm{x}). \]

Suppose \(f\) is differentiable. Then \(f\) is convex if and only if \(\textm{dom}\,f\) is convex and \[\begin{equation} f(\bm{y}) \geq f(\bm{x}) + \nabla f(\bm{x})^\T(\bm{y} - \bm{x}) \tag{A.5} \end{equation}\] holds for all \(\bm{x},\bm{y}\in\textm{dom}\,f\).

Geometrically, the inequality (A.5) states that for a convex function, the first-order Taylor approximation is in fact a global underestimator of the function. Conversely, if the first-order Taylor approximation of a function is always a global underestimator of the function, then the function is convex.

The inequality (A.5) shows that from local information about a convex function (i.e., its value and derivative at a point) we can derive global information (i.e., a global underestimator of it). This is a remarkable property of convex functions and it justifies the connection between local optimality and global optimality in convex optimization problems.
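The global-underestimator property (A.5) is easy to probe numerically; a minimal sketch with NumPy, where \(f(\bm{x})=\left\Vert\bm{x}\right\Vert^2\) (so \(\nabla f(\bm{x})=2\bm{x}\)) and the sampling are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    return float(x @ x)      # a convex function

def grad_f(x):
    return 2 * x             # its gradient

# the tangent plane at x should globally underestimate f, as in (A.5)
underestimates = True
for _ in range(1000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    taylor = f(x) + grad_f(x) @ (y - x)
    if f(y) < taylor - 1e-12:
        underestimates = False
print(underestimates)
```

For this particular \(f\), the gap is exactly \(f(\bm{y}) - f(\bm{x}) - \nabla f(\bm{x})^\T(\bm{y}-\bm{x}) = \left\Vert\bm{y}-\bm{x}\right\Vert^2 \ge 0\), so the check must pass up to rounding.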

Second-order condition

For a twice differentiable function \(f\), the Hessian \[ \nabla^{2}f(\bm{x}) = \left(\frac{\partial^{2}f(\bm{x})}{\partial x_{i}\partial x_{j}}\right)_{ij}\in\R^{n\times n} \] exists at each point in \(\textm{dom}\,f\), which is open. The Hessian can be used to write the second-order Taylor approximation of \(f\) near \(\bm{x}\): \[ f(\bm{y}) \approx f(\bm{x}) + \nabla f(\bm{x})^\T(\bm{y} - \bm{x}) + \frac{1}{2}(\bm{y} - \bm{x})^\T\nabla^{2}f(\bm{x})(\bm{y} - \bm{x}). \]

Suppose \(f\) is twice differentiable. Then \(f\) is convex if and only if \(\textm{dom}\,f\) is convex and its Hessian is positive semidefinite: \[ \nabla^{2}f(\bm{x})\succeq\bm{0} \] for all \(\bm{x}\in\textm{dom}\,f\).

For a function on \(\R\), this reduces to the simple condition \(f''(x) \ge 0\) (and \(\textm{dom}\,f\) convex, i.e., an interval), which means that the derivative is nondecreasing. The condition \(\nabla^{2}f(\bm{x})\succeq\bm{0}\) can be interpreted geometrically as the requirement that the graph of the function have nonnegative (upward) curvature at \(\bm{x}\).

Similarly, \(f\) is concave if and only if \(\textm{dom}\,f\) is convex and \(\nabla^{2}f(\bm{x})\preceq\bm{0}\) for all \(\bm{x}\in\textm{dom}\,f\).
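The second-order condition can be checked numerically on the log-sum-exp function, whose Hessian has the known closed form \(\textm{diag}(\bm{p}) - \bm{p}\bm{p}^\T\) with \(\bm{p}\) the softmax of \(\bm{x}\); the dimension and sample count below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)

def lse_hessian(x):
    """Hessian of f(x) = log(sum_i exp(x_i)): diag(p) - p p^T, p = softmax(x)."""
    p = np.exp(x - np.max(x))   # shift for numerical stability
    p /= p.sum()
    return np.diag(p) - np.outer(p, p)

# smallest Hessian eigenvalue over random points: should be >= 0 (PSD),
# consistent with the convexity of log-sum-exp
min_eig = min(np.linalg.eigvalsh(lse_hessian(rng.normal(size=5))).min()
              for _ in range(100))
assert min_eig >= -1e-12
```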

A.3.4 Operations that preserve convexity

Thus far, we have explored four different methods to characterize the convexity of a function: applying the definition directly, restricting the function to a line, using the first-order condition, and employing the second-order condition. However, in most practical scenarios, a more intriguing approach to establish convexity is to demonstrate that the function can be derived from basic convex or concave functions (such as exponentials, powers, and norms) through operations that preserve the convexity of functions. This approach is essentially a calculus of convex functions.

Some simple operations that preserve convexity of functions include: nonnegative weighted sum, composition with affine functions, pointwise maximum, pointwise supremum, certain compositions with nonaffine functions, partial minimization, and perspective.

Nonnegative weighted sum

If \(f_1\) and \(f_2\) are both convex functions, then so is their sum \(f_1 + f_2\). Also, scaling a function \(f\) with a nonnegative number \(\alpha\ge0\) preserves convexity. Combining nonnegative scaling and addition, we get that a nonnegative weighted sum of convex functions, with weights \(w_1,\dots,w_m\ge0\), \[ f = w_1 f_1 + \dots + w_m f_m, \] is convex.

Composition with an affine mapping

Suppose \(h:\R^m\rightarrow\R\), \(\bm{A}\in\R^{m\times n}\), and \(\bm{b}\in\R^m\). Define \(f:\R^n\rightarrow\R\) as the composition of \(h\) with the affine mapping \(\bm{A}\bm{x} + \bm{b}\): \[ f(\bm{x}) = h(\bm{A}\bm{x} + \bm{b}), \] with \(\textm{dom}\,f=\left\{\bm{x} \mid \bm{A}\bm{x} + \bm{b} \in \textm{dom}\,h \right\}\). Then, if \(h\) is convex, so is \(f\); if \(h\) is concave, so is \(f\).

For example, \(f(\bm{x}) = \left\Vert \bm{y} - \bm{A}\bm{x}\right\Vert\) is convex and \(f(\bm{X}) = \textm{logdet}\left(\bm{I} + \bm{H}\bm{X}\bm{H}^\T\right)\) is concave.

Pointwise maximum and supremum

If \(f_1\) and \(f_2\) are convex functions, then their pointwise maximum \(f\), defined as \[ f(\bm{x}) = \textm{max}\left\{f_1(\bm{x}), f_2(\bm{x})\right\}, \] with \(\textm{dom}\,f = \textm{dom}\,f_1 \cap \textm{dom}\,f_2\), is also convex. This property extends to more than two functions. If \(f_1, \dots, f_m\) are convex, then their pointwise maximum \[ f(\bm{x}) = \textm{max}\left\{f_1(\bm{x}), \dots, f_m(\bm{x})\right\} \] is also convex.

For example, the sum of the \(r\) largest components of \(\bm{x}\in\R^n\), \(f(\bm{x}) = x_{\left[1\right]}+x_{\left[2\right]}+\cdots+x_{\left[r\right]}\), where \(x_{\left[i\right]}\) is the \(i\)th largest component of \(\bm{x}\), is convex because it can be written as the pointwise maximum \[ f(\bm{x})=\textm{max}\left\{ x_{i_{1}}+x_{i_{2}}+\cdots+x_{i_{r}}\mid1\leq i_{1}<i_{2}<\cdots<i_{r}\leq n\right\}. \]
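This pointwise-maximum representation can be verified directly by brute force over all index subsets; a small sketch (the dimension \(n=6\) and \(r=3\) are arbitrary choices):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
x = rng.normal(size=6)
r = 3

# direct computation: sum of the r largest components
sum_r_largest = np.sort(x)[::-1][:r].sum()

# as a pointwise maximum over all index subsets of size r
max_over_subsets = max(x[list(idx)].sum()
                       for idx in combinations(range(x.size), r))

assert abs(sum_r_largest - max_over_subsets) < 1e-12
```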

The pointwise maximum property extends to the pointwise supremum over an infinite set of convex functions. If for each \(\bm{y}\in\mathcal{Y}\), \(f(\bm{x},\bm{y})\) is convex in \(\bm{x}\), then the pointwise supremum \(g\), defined as \[ g(\bm{x})=\textm{sup}_{\bm{y}\in\mathcal{Y}}f(\bm{x},\bm{y}), \] is convex in \(\bm{x}\).

One simple example is the distance to the farthest point in a set \(\mathcal{C}\): \[ f(\bm{x})=\textm{sup}_{\bm{y}\in\mathcal{C}}\left\Vert \bm{x} - \bm{y}\right\Vert. \]

Another example is the maximum eigenvalue function of a symmetric matrix, \[ \lambda_{\textm{max}}(\bm{X})=\underset{\bm{y}\neq\bm{0}}{\textm{sup}}\,\frac{\bm{y}^\T\bm{X}\bm{y}}{\bm{y}^\T\bm{y}}, \] which is the pointwise supremum of functions that are linear in \(\bm{X}\) and hence is convex on \(\mathbb{S}^n\).
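The convexity of \(\lambda_{\textm{max}}\) can be spot-checked on random symmetric matrices; matrix size, sample count, and tolerance below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

def lam_max(X):
    return np.linalg.eigvalsh(X)[-1]   # eigvalsh returns ascending order

def random_symmetric(n):
    A = rng.normal(size=(n, n))
    return (A + A.T) / 2

# lam_max(theta*A + (1-theta)*B) <= theta*lam_max(A) + (1-theta)*lam_max(B)
ok = True
for _ in range(200):
    A, B = random_symmetric(4), random_symmetric(4)
    theta = rng.uniform()
    lhs = lam_max(theta * A + (1 - theta) * B)
    rhs = theta * lam_max(A) + (1 - theta) * lam_max(B)
    if lhs > rhs + 1e-10:
        ok = False
print(ok)
```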

Composition with arbitrary functions

We have observed that the composition of a function with an affine mapping preserves convexity. We will now examine the conditions under which this can be generalized to non-affine mappings.

Suppose \(h:\R^m\rightarrow\R\) and \(g:\R^n\rightarrow\R^m\). Their composition \(f=h\circ g:\R^n\rightarrow\R\) is defined as \[ f(\bm{x}) = h(g(\bm{x})) \] with \(\textm{dom}\;f = \left\{\bm{x}\in\textm{dom}\,g \mid g(\bm{x})\in\textm{dom}\,h\right\}\).

Let us start with the composition with a scalar mapping \(m=1\) for simplicity, so \(h:\R\rightarrow\R\) and \(g:\R^n\rightarrow\R\). The function \(f(\bm{x}) = h(g(\bm{x}))\) satisfies \[ f\textm{ is convex if } \left\{ \begin{array}{l} h\textm{ is convex nondecreasing and } g \textm{ is convex}\\ h\textm{ is convex nonincreasing and } g \textm{ is concave} \end{array} \right. \] and \[ f\textm{ is concave if } \left\{ \begin{array}{l} h\textm{ is concave nondecreasing and } g \textm{ is concave}\\ h\textm{ is concave nonincreasing and } g \textm{ is convex.} \end{array} \right. \]

For the case \(n=1\), one can easily derive the previous results from the second derivative of the composition function \(f = h \circ g\) given by \[ f''(x) = h''(g(x))g'(x)^2 + h'(g(x))g''(x). \] So, for example, if \(h\) is convex and nondecreasing, we have \(h''(g(x))\ge0\) and \(h'(g(x))\ge0\), and if \(g\) is convex then \(g''(x)\ge0\), which guarantees that \(f''(x)\ge0\), so \(f\) is convex.

Here are some examples:

  • if \(g\) is convex, then \(\textm{exp}\;g(x)\) is convex;
  • if \(g\) is concave and positive, then \(\textm{log}(g(x))\) is concave;
  • if \(g\) is concave and positive, then \(1/g(x)\) is convex;
  • if \(g\) is convex and nonnegative, then \(g(x)^p\) is convex for \(p\ge1\);
  • if \(g\) is convex, then \(-\textm{log}(-g(x))\) is convex on \(\{x\mid g(x)<0\}\).
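The scalar composition rules above can be spot-checked with midpoint tests; a sketch where the test functions \(g(x)=\sqrt{x}\) and \(g(x)=|x|\) with \(p=3\), the sampling ranges, and the tolerances are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)

def midpoint_concave(f, a, b):
    """True if f at the midpoint dominates the average of endpoint values."""
    return f((a + b) / 2) >= (f(a) + f(b)) / 2 - 1e-12

# g concave and positive => log(g) concave: here g(x) = sqrt(x) on R++
g = np.sqrt
log_g = lambda x: np.log(g(x))
ok = all(midpoint_concave(log_g, *rng.uniform(0.1, 10.0, size=2))
         for _ in range(1000))

# g convex and nonnegative => g**p convex for p >= 1: here g(x) = |x|, p = 3
h = lambda x: abs(x) ** 3
ok = ok and all(h((a + b) / 2) <= (h(a) + h(b)) / 2 + 1e-12
                for a, b in rng.normal(size=(1000, 2)))
print(ok)
```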

Now let us consider the general case of vector composition: \[ f(\bm{x}) = h(g(\bm{x})) = h(g_1(\bm{x}),\dots,g_m(\bm{x})) \] with \(h:\R^m\rightarrow\R\) and \(g:\R^n\rightarrow\R^m\). The rules in this case are generalized to: \[ f\textm{ is convex if } \left\{ \begin{array}{l} h\textm{ is convex, nondecreasing in each argument, and } g_i \textm{ are convex}\\ h\textm{ is convex, nonincreasing in each argument, and } g_i \textm{ are concave} \end{array} \right. \] and \[ f\textm{ is concave if } \left\{ \begin{array}{l} h\textm{ is concave, nondecreasing in each argument, and } g_i \textm{ are concave}\\ h\textm{ is concave, nonincreasing in each argument, and } g_i \textm{ are convex.} \end{array} \right. \]

Partial minimization

We have seen that the maximum or supremum of an arbitrary family of convex functions is convex. It turns out that some special forms of minimization also yield convex functions.

If \(f(\bm{x},\bm{y})\) is convex in \((\bm{x},\bm{y})\) and \(\mathcal{C}\) is a convex set, then the function \[ g(\bm{x})=\underset{\bm{y} \in \mathcal{C}}{\textm{inf}}\,f(\bm{x},\bm{y}) \] is convex in \(\bm{x}\). Note that the requirement here is for joint convexity in \((\bm{x},\bm{y})\), unlike the case of supremum where the requirement is for convexity in \(\bm{x}\) for any given \(\bm{y}\).

One simple example is the distance to a set \(\mathcal{C}\): \[ f(\bm{x})=\textm{inf}_{\bm{y}\in\mathcal{C}}\left\Vert\bm{x} - \bm{y}\right\Vert, \] which is convex if \(\mathcal{C}\) is convex, since \(\left\Vert\bm{x} - \bm{y}\right\Vert\) is jointly convex in \((\bm{x},\bm{y})\).
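For a set with an easy projection, the convexity of the distance function can be checked numerically. A sketch for the unit box \(\mathcal{C}=[0,1]^n\), whose Euclidean projection is simple clipping (the dimension, scale, and tolerance are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)

def dist_to_box(x):
    """Distance from x to the unit box [0,1]^n = inf_{y in C} ||x - y||,
    computed in closed form via Euclidean projection (clipping)."""
    return np.linalg.norm(x - np.clip(x, 0.0, 1.0))

# midpoint convexity of the distance function along random segments
ok = True
for _ in range(1000):
    x, y = rng.normal(scale=3, size=3), rng.normal(scale=3, size=3)
    if dist_to_box((x + y) / 2) > (dist_to_box(x) + dist_to_box(y)) / 2 + 1e-10:
        ok = False
print(ok)
```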

Perspective

Suppose \(f:\R^n\rightarrow\R\), then the perspective of \(f\) is the function \(g:\R^{n+1}\rightarrow\R\) defined as \[g(\bm{x},t) = tf(\bm{x}/t),\] with domain \(\textm{dom}\,g=\left\{(\bm{x},t)\in\R^{n+1}\mid \bm{x}/t\in\textm{dom}\,f,\,t>0\right\}\).

The perspective operation preserves convexity: If \(f\) is a convex function, then so is its perspective function \(g\).

For example, since \(f(\bm{x})=\bm{x}^\T\bm{x}\) is convex, then its perspective \(g(\bm{x},t)=\bm{x}^\T \bm{x}/t\) is convex for \(t>0\).

Also, since the negative logarithm \(f(x)=-\textm{log}\,x\) is convex, its perspective, known as the relative entropy function, \(g(x,t)=t\,\textm{log}\,t - t\,\textm{log}\,x\), is also convex on \(\R_{++}^{2}\).
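Joint convexity of the relative entropy function in \((x,t)\) can also be probed with a midpoint test; the sampling range and tolerance below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(9)

def rel_entropy(x, t):
    """Perspective of -log: g(x, t) = t*log(t) - t*log(x), for x, t > 0."""
    return t * np.log(t) - t * np.log(x)

# midpoint convexity jointly in (x, t) on random positive pairs
ok = True
for _ in range(1000):
    x1, t1, x2, t2 = rng.uniform(0.1, 10.0, size=4)
    mid = rel_entropy((x1 + x2) / 2, (t1 + t2) / 2)
    avg = (rel_entropy(x1, t1) + rel_entropy(x2, t2)) / 2
    if mid > avg + 1e-10:
        ok = False
print(ok)
```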

A.3.5 Quasiconvex functions

The \(\alpha\)-sublevel set of a function \(f:\R^{n}\rightarrow\R\) is defined as \[ \mathcal{S}_{\alpha}=\left\{\bm{x}\in\textm{dom}\,f\mid f(\bm{x})\leq\alpha\right\}. \] Sublevel sets of a convex function are convex for any value of \(\alpha\). The converse is not true: a function can have all its sublevel sets convex, but not be a convex function. For example, \(f(x) = -e^x\) is not convex on \(\R\) (indeed, it is strictly concave) but all its sublevel sets are convex.

A function \(f:\R^{n}\rightarrow\R\) is called quasiconvex if its domain and all its sublevel sets \(\mathcal{S}_{\alpha}\), for all \(\alpha\), are convex. A function \(f\) is quasiconcave if \(-f\) is quasiconvex. A function that is both quasiconvex and quasiconcave is called quasilinear.

For example, for a function on \(\R\) to be quasiconvex, each sublevel set must be an interval. Here are some other illustrative examples:

  • \(\sqrt{\left|x\right|}\) is quasiconvex on \(\R\);
  • \(\textm{ceil}\left(x\right)=\textm{inf}\left\{ z\in\mathbb{Z}\mid z\geq x\right\}\) is quasilinear;
  • \(\textm{log}\;x\) is quasilinear on \(\R_{++}\);
  • \(f\left(x_{1},x_{2}\right)=x_{1}x_{2}\) is quasiconcave on \(\R_{++}^{2}\);
  • the linear-fractional function \[ f(\bm{x})=\frac{\bm{a}^\T\bm{x} + b}{\bm{c}^\T\bm{x} + d},\qquad \textm{dom}\,f=\left\{\bm{x} \mid \bm{c}^\T\bm{x} + d>0\right\} \] is quasilinear.

We can always represent the sublevel sets of a quasiconvex function \(f\) (which are convex) via inequalities of convex functions: \[ f(\bm{x}) \leq t \quad \Longleftrightarrow \quad \phi_t(\bm{x}) \leq0, \] where \(\phi_t(\bm{x})\) is a family of convex functions in \(\bm{x}\) (indexed by \(t\)).

For example, consider a convex-over-concave function \(f(\bm{x}) = p(\bm{x})/q(\bm{x})\), where \(p\) is convex with \(p(\bm{x})\ge0\) and \(q\) is concave with \(q(\bm{x})>0\). The function \(f(\bm{x})\) is not convex in general, but it is quasiconvex: \[ f(\bm{x}) \leq t \quad \Longleftrightarrow \quad p(\bm{x}) - tq(\bm{x}) \leq0, \] so we can take the convex function \(\phi_t(\bm{x}) = p(\bm{x}) - tq(\bm{x})\) for \(t\geq0\).
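This representation is the basis of bisection methods for quasiconvex minimization: at each candidate level \(t\), one checks feasibility of the convex inequality \(\phi_t(\bm{x})\le0\) and halves the bracket accordingly. A one-dimensional sketch for \(f(x)=(x^2+1)/x\) on \(x>0\) (whose minimum is \(2\), at \(x=1\)), where a grid search stands in for a proper convex feasibility solver:

```python
import numpy as np

# minimize f(x) = (x**2 + 1) / x over x > 0 by bisection on t:
#   f(x) <= t  <=>  phi_t(x) = x**2 + 1 - t*x <= 0  (convex in x for each t)
grid = np.linspace(0.01, 10.0, 100_000)   # crude stand-in for a convex solver

def feasible(t):
    return np.any(grid**2 + 1 - t * grid <= 0)

lo, hi = 0.0, 100.0   # bracket on the optimal value
for _ in range(60):
    mid = (lo + hi) / 2
    if feasible(mid):
        hi = mid          # optimal value is at most mid
    else:
        lo = mid          # optimal value exceeds mid
print(hi)  # true minimum is 2, attained at x = 1
```

Each feasibility check here involves only a convex function of \(x\), which is what makes the scheme practical when a real convex solver replaces the grid.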

References

Boyd, S. P., and Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.