B.1 Solvers
A solver, also referred to as an optimizer, is an engine designed to solve specific classes of mathematical problems. Most mainstream programming languages, including R, Python, Matlab, Julia, Rust, C, and C++, provide access to a wide range of solvers. Each solver is typically capable of handling only a certain category of problems, such as LP, QP, QCQP, SOCP, or SDP. For a detailed classification of optimization problems, refer to Section A.5.
Some popular solvers
Many of the currently popular solvers were originally developed in a specific programming language, such as Fortran, C, C++, or Matlab. However, over time, they have been adapted and integrated into a wide range of other programming languages.
For the purpose of illustration, some popular solvers (among many others) include:
GLPK (GNU Linear Programming Kit)78: intended for large-scale LP including mixed-integer variables, written in C.
quadprog79: very popular open-source QP solver originally written in Fortran by Berwin Turlach in the late 1980s, and now accessible from most other programming languages.
MOSEK80: proprietary LP, QP, SOCP, SDP solver including mixed-integer variables established in 1997 by Erling Andersen (free licence available for academia); specialized in large-scale problems and very fast, robust, and reliable.
SeDuMi81: open-source LP, QP, SOCP, SDP solver originally developed by Sturm in 1999 for Matlab (Sturm, 1999).
SDPT382: open-source LP, QP, SOCP, SDP solver originally developed in 1999 for Matlab (Tütüncü et al., 2003).
Gurobi83: proprietary LP, QP, and SOCP solver including mixed-integer variables (free licence available for academia).
Embedded COnic Solver (ECOS)84: SOCP solver originally written in C.
CPLEX85: proprietary LP and QP solver that also handles mixed-integer variables (free licence available for academia).
Complexity of interior-point methods
Internally, solvers can be based on different types of algorithms like the simplex method for LP (Nocedal and Wright, 2006), cutting plane methods (Bertsekas, 1999), interior-point methods (S. P. Boyd and Vandenberghe, 2004; Nemirovski, 2001; Nesterov, 2018; Nesterov and Nemirovskii, 1994; Nocedal and Wright, 2006), or even more specific algorithms tailored to narrower classes of problems, such as \(\ell_2\)-norm regression problems with an \(\ell_1\)-norm regularization term (used to promote sparsity) (Tibshirani, 1996).
In general, the complexity of interior-point methods for LP, QP, QCQP, SOCP, and SDP is \(\bigO\left(n^3L\right)\),86 where \(n\) is the number of variables and \(L\) is the number of accuracy digits of the solution. To further discern the differences in complexity, we need to take into account the number of constraints, \(m\), and the dimensionality of the cones, \(k\) (Nemirovski, 2001):
LP: complexity of \(\bigO\left((m + n)^{3/2}n^2L\right)\), later improved to \(\bigO\left(((m+n)n^2+(m+n)^{1.5}n)L\right)\), and even to \(\bigO\left(\left(n^3/\log(n)\right)L\right)\) when \(m\) and \(n\) are of the same order;
QCQP: complexity of \(\bigO\left(\sqrt{m}(m + n)n^2L\right)\);
SOCP: complexity of \(\bigO\left(\sqrt{m + 1}\;n (n^2 + m + (m+1)k^2)L\right)\); and
SDP: complexity of \(\bigO\left(\sqrt{1 + mk}\;n (n^2 + nmk^2 + mk^3)L\right)\), where \(k\) denotes the dimension of the \(k\times k\) matrix variables.
For example, consider an SOCP where the number of constraints and the cone dimension are both of the same order as the number of variables (i.e., \(m=\bigO(n)\) and \(k=\bigO(n)\)); then the complexity order is \(\bigO\left(n^{4.5}L\right)\). Consider now an SDP where the cone dimension is of the same order as the number of variables (i.e., \(k=\bigO(n)\)); then the complexity order is \(\bigO\left(n^{4.5}L\right)\); if, in addition, the number of constraints also grows with the number of variables (i.e., \(m=\bigO(n)\)), then the complexity order becomes \(\bigO\left(n^6L\right)\). We can clearly see that the complexity of solving an SOCP is much higher than that of an LP, QP, or QCQP; and it only gets worse for an SDP.
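To make these orders explicit, substitute into the bounds above (treating all hidden constants as fixed): \[ \begin{aligned} \textm{SOCP } (m,k=\bigO(n)):\; & \sqrt{m+1}\;n\left(n^{2}+m+(m+1)k^{2}\right)L \sim \sqrt{n}\,n\,n^{3}L=\bigO\left(n^{4.5}L\right),\\ \textm{SDP } (m,k=\bigO(n)):\; & \sqrt{1+mk}\;n\left(n^{2}+nmk^{2}+mk^{3}\right)L \sim n\,n\,n^{4}L=\bigO\left(n^{6}L\right). \end{aligned} \]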
Interface with solvers
Solvers require that problems be expressed in a standard form; that is, the arguments to be passed to the solvers must be formatted in a very specific way. However, most formulated problems do not immediately present themselves in a standard form and they must be transformed. This process is time-consuming and, more importantly, it is susceptible to human transcription errors, as illustrated by the following examples.
Example B.1 (Norm approximation problem) Consider the problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \left\Vert \bm{A}\bm{x} - \bm{b} \right\Vert, \end{array} \] whose solution depends on the choice of norm, as does the process of conversion to standard form.
- If we choose the Euclidean or \(\ell_{2}\)-norm, \(\left\Vert \bm{A}\bm{x} - \bm{b} \right\Vert_2\), then the problem is just a least squares (LS) problem with the analytic solution \(\bm{x}^{\star}=(\bm{A}^\T\bm{A})^{-1}\bm{A}^\T\bm{b}\), as illustrated in the code below.
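For instance, a one-line Matlab sketch (the backslash operator computes the LS solution of an overdetermined system via a QR-based method, which is numerically preferable to explicitly forming \(\bm{A}^\T\bm{A}\)):

x = A \ b;   % least squares solution of the overdetermined system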
- If we choose the Chebyshev or \(\ell_{\infty}\)-norm, \(\left\Vert \bm{A}\bm{x} - \bm{b} \right\Vert_{\infty}\), then the problem can be rewritten as the LP \[ \begin{array}{ll} \underset{\bm{x},t}{\textm{minimize}} & \begin{array}{c} t\end{array}\\ \textm{subject to} & \begin{array}[t]{l} -t\bm{1}\leq\bm{A}\bm{x}-\bm{b}\leq t\bm{1}\end{array} \end{array} \] or, equivalently, \[ \begin{array}{ll} \underset{\bm{x},t}{\textm{minimize}} & \begin{array}{c} \begin{bmatrix} \bm{0}^\T & 1\end{bmatrix} \begin{bmatrix} \bm{x}\\ t \end{bmatrix} \end{array}\\ \textm{subject to} & \begin{array}[t]{l} \begin{bmatrix} \bm{A} & -\bm{1}\\ -\bm{A} & -\bm{1} \end{bmatrix} \begin{bmatrix} \bm{x}\\ t \end{bmatrix}\leq\begin{bmatrix} \bm{b}\\ -\bm{b} \end{bmatrix} \end{array}, \end{array} \] which finally can be written as Matlab code along the following lines.
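A minimal sketch, assuming Matlab's linprog (Optimization Toolbox) and that A and b are defined in the workspace; the stacked variable is \([\bm{x};t]\):

[m, n] = size(A);                       % problem dimensions
f = [zeros(n,1); 1];                    % cost selects the epigraph variable t
Aineq = [ A, -ones(m,1);
         -A, -ones(m,1)];
bineq = [b; -b];
xt = linprog(f, Aineq, bineq);          % solve the LP in standard form
x = xt(1:n);                            % recover the original variable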
- If we choose the Manhattan or \(\ell_{1}\)-norm, \(\left\Vert \bm{A}\bm{x} - \bm{b} \right\Vert_{1}\), then the problem can be rewritten as the LP \[ \begin{array}{ll} \underset{\bm{x},\bm{t}}{\textm{minimize}} & \begin{array}{c} \bm{1}^\T\bm{t}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} -\bm{t}\leq\bm{A}\bm{x}-\bm{b}\leq\bm{t}\end{array} \end{array} \] or, equivalently, \[ \begin{array}{ll} \underset{\bm{x},\bm{t}}{\textm{minimize}} & \begin{array}{c} \begin{bmatrix} \bm{0}^\T & \bm{1}^\T\end{bmatrix} \begin{bmatrix} \bm{x}\\ \bm{t} \end{bmatrix}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \begin{bmatrix} \bm{A} & -\bm{I}\\ -\bm{A} & -\bm{I} \end{bmatrix}\begin{bmatrix} \bm{x}\\ \bm{t} \end{bmatrix}\leq\begin{bmatrix} \bm{b}\\ -\bm{b} \end{bmatrix}\end{array}, \end{array} \] which finally can be written as Matlab code along the following lines.
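Again a minimal sketch with linprog; here \(\bm{t}\) has one entry per residual, so the stacked variable \([\bm{x};\bm{t}]\) has length \(n+m\):

[m, n] = size(A);                       % problem dimensions
f = [zeros(n,1); ones(m,1)];            % cost is 1'*t
Aineq = [ A, -eye(m);
         -A, -eye(m)];
bineq = [b; -b];
xt = linprog(f, Aineq, bineq);          % solve the LP in standard form
x = xt(1:n);                            % recover the original variable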
The previous example was as simple as it gets. We now explore a slightly more complicated problem with linear constraints and observe how quickly the code becomes complicated and prone to errors.
Example B.2 (Euclidean norm approximation problem with linear constraints) Consider the problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c} \left\Vert \bm{A}\bm{x}-\bm{b}\right\Vert _{2}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{C}\bm{x}=\bm{d}\\ \bm{l}\leq\bm{x}\leq\bm{u}. \end{array} \end{array} \] This can be expressed as \[ \begin{array}{ll} \underset{\bm{x},\bm{y},t,\bm{s}_{l},\bm{s}_{u}}{\textm{minimize}} & \begin{array}{c} t\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{A}\bm{x}-\bm{b}=\bm{y}\\ \bm{C}\bm{x}=\bm{d}\\ \bm{x}-\bm{s}_{l}=\bm{l}\\ \bm{x}+\bm{s}_{u}=\bm{u}\\ \bm{s}_{l},\bm{s}_{u}\geq\bm{0}\\ \left\Vert \bm{y}\right\Vert _{2}\leq t \end{array} \end{array} \] or, equivalently, in terms of the stacked variable \(\bar{\bm{x}}=\left(\bm{x},\bm{s}_{l},\bm{s}_{u},t,\bm{y}\right)\), \[ \begin{array}{ll} \underset{\bar{\bm{x}}}{\textm{minimize}} & \begin{array}{c} \begin{bmatrix} \bm{0}^\T & \bm{0}^\T & \bm{0}^\T & 1 & \bm{0}^\T\end{bmatrix}\bar{\bm{x}}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \begin{bmatrix} \bm{A} & \bm{0} & \bm{0} & \bm{0} & -\bm{I}\\ \bm{C} & \bm{0} & \bm{0} & \bm{0} & \bm{0}\\ \bm{I} & -\bm{I} & \bm{0} & \bm{0} & \bm{0}\\ \bm{I} & \bm{0} & \bm{I} & \bm{0} & \bm{0} \end{bmatrix}\bar{\bm{x}}=\begin{bmatrix} \bm{b}\\ \bm{d}\\ \bm{l}\\ \bm{u} \end{bmatrix}\\ \bar{\bm{x}}\in\bm{R}^{n}\times\bm{R}_{+}^{n}\times\bm{R}_{+}^{n}\times\bm{Q}^{m+1}, \end{array} \end{array} \] where \(\bm{Q}^{m+1}=\left\{(t,\bm{y})\in\bm{R}\times\bm{R}^{m} : \left\Vert\bm{y}\right\Vert_2\leq t\right\}\) denotes the second-order cone, whose first component bounds the norm of the rest (following the SeDuMi convention). This finally can be written as the Matlab code
% SeDuMi standard form: minimize cc'*xbar subject to AA*xbar = bb, xbar in K,
% with xbar = [x; s_l; s_u; t; y] ordered as free, nonnegative, second-order cone
AA = [ A,      zeros(m,n), zeros(m,n), zeros(m,1), -eye(m);
       C,      zeros(p,n), zeros(p,n), zeros(p,1), zeros(p,m);
       eye(n), -eye(n),    zeros(n,n), zeros(n,1), zeros(n,m);
       eye(n), zeros(n,n), eye(n),     zeros(n,1), zeros(n,m) ];
bb = [ b; d; l; u ];
cc = [ zeros(3*n,1); 1; zeros(m,1) ];  % cost picks out t
K.f = n;      % n free variables: x
K.l = 2*n;    % 2n nonnegative variables: s_l and s_u
K.q = m + 1;  % one second-order cone of dimension m+1: t >= norm(y)
xbar = sedumi(AA, bb, cc, K);
x = xbar(1:n);
Modeling frameworks
A modeling framework simplifies the use of solvers by shielding the user from the solver-specific argument formatting; it effectively acts as an interface between the user and the solver. In fact, a modeling framework can typically interface with a variety of solvers, which the user can choose depending on the type of problem. A modeling framework is a fantastic tool for the rapid prototyping of models while avoiding transcription errors in the code. Nevertheless, if high speed is required, one should consider calling the solver directly, bypassing the convenient (but slower) interface provided by a modeling framework.
Arguably, the two most successful examples (both freely available) are YALMIP87 (for Matlab) (Löfberg, 2004) and CVX88. CVX was initially released in 2005 for Matlab and is now also available in Python, R, and Julia (Fu et al., 2020; Grant and Boyd, 2008, 2014).
CVX is based on the paradigm of “disciplined convex programming” and is a convenient and powerful tool for the rapid prototyping of models and algorithms involving convex optimization (it also allows integer constraints). It interfaces with many solvers, such as the free solvers SeDuMi and SDPT3, as well as the commercial solvers Gurobi and MOSEK. Internally, CVX knows the elementary convex and concave functions as well as the composition rules that preserve convexity; this way, it can determine whether a problem is convex or not. Its usage is extremely simple and very convenient for prototyping while avoiding transcription errors.
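As a small illustration of these composition rules, consider the following sketch in Matlab's CVX syntax (assuming A, b, and n are defined): the two objectives are mathematically identical, yet only the first complies with the disciplined ruleset, because composing the concave function sqrt with a convex argument does not certify convexity.

cvx_begin
    variable x(n)
    minimize( norm(A*x - b, 2) )               % accepted: norm is a known convex atom
    % minimize( sqrt(sum(square(A*x - b))) )   % rejected by CVX: sqrt(convex) violates the composition rules
cvx_end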
Example B.3 (Constrained Euclidean norm approximation in CVX) Consider the problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c} \left\Vert \bm{A}\bm{x}-\bm{b}\right\Vert _{2}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{C}\bm{x}=\bm{d}\\ \bm{l}\leq\bm{x}\leq\bm{u}. \end{array} \end{array} \] The corresponding code in different popular languages (e.g., Matlab, R, and Python) is almost identical and very simple.
- Matlab:
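A minimal sketch (assuming A, b, C, d, l, u, and n are already defined in the workspace):

cvx_begin
    variable x(n)
    minimize( norm(A*x - b, 2) )
    subject to
        C*x == d;
        x >= l;
        x <= u;
cvx_end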
- R:
library(CVXR)

x <- Variable(n)
prob <- Problem(Minimize(cvxr_norm(A %*% x - b, 2)),
list(C %*% x == d,
l <= x,
x <= u))
solve(prob)
- Python:
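A minimal sketch using the CVXPY package (assuming A, b, C, d, l, u are NumPy arrays of compatible dimensions):

import cvxpy as cp

n = A.shape[1]                     # number of variables
x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, 2)),
                  [C @ x == d, x >= l, x <= u])
prob.solve()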
References
quadprog R version: https://cran.r-project.org/package=quadprog
MOSEK: https://www.mosek.com/
SeDuMi: https://sedumi.ie.lehigh.edu/; Matlab version at https://github.com/sqlp/sedumi
SDPT3: https://blog.nus.edu.sg/mattohkc/softwares/sdpt3/, https://github.com/Kim-ChuanToh/SDPT3, https://github.com/sqlp/sdpt3
Gurobi optimizer: https://www.gurobi.com/
IBM CPLEX optimizer: https://www.ibm.com/analytics/cplex-optimizer
The “big O” notation, \(\bigO(\cdot)\), measures the order of complexity. To be specific, we say that the complexity is \(\bigO\left(g(N)\right)\), as \(N\rightarrow\infty\), if there exist a positive real number \(M\) and an \(N_0\) such that the complexity is upper-bounded by \(Mg(N)\) for all \(N\geq N_0\).
YALMIP: https://yalmip.github.io/
CVX: http://cvxr.com; CVX in R: https://cvxr.rbind.io/