\( \newcommand{\bm}[1]{\boldsymbol{#1}} \newcommand{\textm}[1]{\textsf{#1}} \def\T{{\mkern-2mu\raise-1mu\mathsf{T}}} \newcommand{\R}{\mathbb{R}} % real numbers \newcommand{\E}{{\rm I\kern-.2em E}} \newcommand{\w}{\bm{w}} % bold w \newcommand{\bmu}{\bm{\mu}} % bold mu \newcommand{\bSigma}{\bm{\Sigma}} % bold mu \newcommand{\bigO}{O} %\mathcal{O} \renewcommand{\d}[1]{\operatorname{d}\!{#1}} \)

B.1 Solvers

A solver, also referred to as an optimizer, is an engine designed to solve specific types of mathematical problems. Every programming language, including R, Python, Matlab, Julia, Rust, C, and C++, offers a comprehensive list of available solvers. Each of these solvers is typically capable of handling only a certain category of problems, such as LP, QP, QCQP, SOCP, or SDP. For a detailed classification of optimization problems, refer to Section A.5.

Complexity of interior-point methods

Internally, solvers can be based on different types of algorithms like the simplex method for LP (Nocedal and Wright, 2006), cutting plane methods (Bertsekas, 1999), interior-point methods (S. P. Boyd and Vandenberghe, 2004; Nemirovski, 2001; Nesterov, 2018; Nesterov and Nemirovskii, 1994; Nocedal and Wright, 2006), or even more specific algorithms tailored to a more narrow type of problems such as \(\ell_2\)-norm regression problems with an \(\ell_1\)-norm regularization term (used to promote sparsity) (Tibshirani, 1996).

In general, the complexity of interior-point methods for LP, QP, QCQP, SOCP, and SDP is \(\bigO\left(n^3L\right)\),86 where \(n\) is the number of variables and \(L\) is the number of accuracy digits of the solution. To further discern the difference in the complexity, we need to take into account the number of constraints, \(m\), and dimensionality of the different cones, \(k\) (Nemirovski, 2001):

  • LP: complexity of \(\bigO\left((m + n)^{3/2}n^2L\right)\), later improved to \(\bigO\left(((m+ n) n^2+(m+ n)^{1.5} n) L\right)\), and even to \(\bigO\left(\left(n^3/\textm{log}(n)\right)L\right)\) when \(m\) and \(n\) are of the same order;

  • QCQP: complexity of \(\bigO\left(\sqrt{m}(m + n)n^2L\right)\);

  • SOCP: complexity of \(\bigO\left(\sqrt{m + 1}\;n (n^2 + m + (m+1)k^2)L\right)\); and

  • SDP: complexity of \(\bigO\left(\sqrt{1 + mk}\;n (n^2 + nmk^2 + mk^3)L\right)\), where \(k\) means that the matrices are of dimension \(k\times k\).

For example, consider an SOCP where the number of constraints and cone dimension are both of the same order as the number of variables (i.e., \(m=\bigO(n)\) and \(k=\bigO(n)\)), then the complexity order is \(\bigO(n^{4.5}L)\). Consider now an SDP where the cone dimension is of the same order as the number of variables (i.e., \(k=\bigO(n)\)), then the complexity order is to \(\bigO(n^4)\); if, in addition, the number of constraints also grows with the number of variables (i.e., \(m=\bigO(n)\)), then the complexity order becomes \(\bigO\left(n^6L\right)\). We can clearly see that the complexity for solving SOCP is much higher than that of LP, QP, and QCQP; and it only gets worse for SDP.

Interface with solvers

Solvers require that problems be expressed in a standard form; that is, the arguments to be passed to the solvers must be formatted in a very specific way. However, most formulated problems do not immediately present themselves in a standard form and they must be transformed. This process is time-consuming and, more importantly, it is susceptible to human transcription errors, as illustrated by the following examples.

Example B.1 (Norm approximation problem) Consider the problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \left\Vert \bm{A}\bm{x} - \bm{b} \right\Vert, \end{array} \] whose solution depends on the choice of the norm and so does the process of conversion to standard form.

  • If we choose the Euclidean or \(\ell_{2}\)-norm, \(\left\Vert \bm{A}\bm{x} - \bm{b} \right\Vert_2\), then the problem is just a least squares (LS) with analytic solution \(\bm{x}^{\star}=(\bm{A}^\T\bm{A})^{-1}\bm{A}^\T\bm{b}\).

  • If we choose the Chebyshev or \(\ell_{\infty}\)-norm, \(\left\Vert \bm{A}\bm{x} - \bm{b} \right\Vert_{\infty}\), then the problem can be rewritten as the LP \[ \begin{array}{ll} \underset{\bm{x},t}{\textm{minimize}} & \begin{array}{c} t\end{array}\\ \textm{subject to} & \begin{array}[t]{l} -t\bm{1}\leq\bm{A}\bm{x}-\bm{b}\leq t\bm{1}\end{array} \end{array} \] or, equivalently, \[ \begin{array}{ll} \underset{\bm{x},t}{\textm{minimize}} & \begin{array}{c} \begin{bmatrix} \bm{0}^\T & 1\end{bmatrix} \begin{bmatrix} \bm{x}\\ t \end{bmatrix} \end{array}\\ \textm{subject to} & \begin{array}[t]{l} \begin{bmatrix} \bm{A} & -\bm{1}\\ -\bm{A} & -\bm{1} \end{bmatrix} \begin{bmatrix} \bm{x}\\ t \end{bmatrix}\leq\begin{bmatrix} \bm{b}\\ -\bm{b} \end{bmatrix} \end{array}, \end{array} \] which finally can be written as the Matlab code

xt = linprog( [zeros(n,1); 1],
              [A,-ones(m,1); -A,-ones(m,1)],
              [b; -b] )
x = xt(1:n)              
  • If we choose the Manhattan or \(\ell_{1}\)-norm, \(\left\Vert \bm{A}\bm{x} - \bm{b} \right\Vert_{1}\), then the problem can be rewritten as the LP \[ \begin{array}{ll} \underset{\bm{x},\bm{t}}{\textm{minimize}} & \begin{array}{c} \bm{1}^\T\bm{t}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} -\bm{t}\leq\bm{A}\bm{x}-\bm{b}\leq\bm{t}\end{array} \end{array} \] or, equivalently, \[ \begin{array}{ll} \underset{\bm{x},t}{\textm{minimize}} & \begin{array}{c} \begin{bmatrix} \bm{0}^\T & \bm{1}^\T\end{bmatrix} \begin{bmatrix} \bm{x}\\ \bm{t} \end{bmatrix}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \begin{bmatrix} \bm{A} & -\bm{I}\\ -\bm{A} & -\bm{I} \end{bmatrix}\begin{bmatrix} \bm{x}\\ \bm{t} \end{bmatrix}\leq\begin{bmatrix} \bm{b}\\ -\bm{b} \end{bmatrix}\end{array}, \end{array} \] which finally can be written as the Matlab code
xt = linprog( [zeros(n,1); ones(n,1)],
              [A,-eye(m,1); -A,-eye(m,1)],
              [b; -b] )
x = xt(1:n)              

The previous example was as simple as it can get. We explore now a slightly more complicated problem with linear constraints and observe how quickly the code becomes complicated and prone to errors.

Example B.2 (Euclidean norm approximation problem with linear constraints) Consider the problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c} \left\Vert \bm{A}\bm{x}-\bm{b}\right\Vert _{2}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{C}\bm{x}=\bm{d}\\ \bm{l}\leq\bm{x}\leq\bm{u}. \end{array} \end{array} \] This can be expressed as \[ \begin{array}{ll} \underset{\bm{x},\bm{y},t,\bm{s}_{l},\bm{s}_{u}}{\textm{minimize}} & \begin{array}{c} t\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{A}\bm{x}-\bm{b}=\bm{y}\\ \bm{C}\bm{x}=\bm{d}\\ \bm{x}-\bm{s}_{l}=\bm{l}\\ \bm{x}+\bm{s}_{u}=\bm{u}\\ \bm{s}_{l},\bm{s}_{u}\geq\bm{0}\\ \left\Vert \bm{y}\right\Vert _{2}\leq t \end{array} \end{array} \] or, equivalently, \[ \qquad\qquad\begin{array}{ll} \underset{\bm{x},\bm{y},t,\bm{s}_{l},\bm{s}_{u}}{\textm{minimize}} & \begin{array}{c} \begin{bmatrix} \bm{0}^\T & \bm{0}^\T & \bm{0}^\T & \bm{0}^\T & 1\end{bmatrix}\bar{\bm{x}}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \begin{bmatrix} \bm{A} & & & -\bm{I}\\ \bm{C}\\ \bm{I} & -\bm{I}\\ \bm{I} & & \bm{I} \end{bmatrix}\bar{\bm{x}}\leq\begin{bmatrix} \bm{b}\\ \bm{d}\\ \bm{l}\\ \bm{u} \end{bmatrix}\\ \bar{\bm{x}}\in\bm{R}^{n}\times\bm{R}_{+}^{n}\times\bm{R}_{+}^{n}\times\bm{Q}^{m}, \end{array} \end{array} \] which finally can be written as the Matlab code

AA = [ A, zeros(m,n), zeros(m,n),   -eye(m),    0;
       C, zeros(p,n), zeros(p,n),   zeros(p,n), 0;
       eye(n), -eye(n), zeros(n,n), zeros(n,n), 0;
       eye(n), zeros(n,n), eye(n),  zeros(n,n), 0 ]
bb = [ b; d; l; u ]
cc = [ zeros(3*n + m, 1); 1 ]
K.f = n; K.l = 2*n; K.q = m + 1
xsyz = sedumi( AA, bb, cc, K )
x = xsyz(1:n)            

Modeling frameworks

A modeling framework simplifies the use of solvers by shielding the user from the specific details of the solver argument formatting. It effectively acts as an interface between the user and the solver. In fact, a modeling framework can typically interface with a variety of solvers that the user can choose depending on the type of problem. A modeling framework is a fantastic tool for rapid prototyping of models while avoiding transcription errors when writing the code. Nevertheless, if high speed is desired, then one should consider calling the solver directly while avoiding the convenient interface provided by a modeling framework.

Arguably, the two most successful examples (both of which are freely available) are YALMIP87 (for Matlab) (Löfberg, 2004) and CVX88. CVX was initially released in 2005 for Matlab and now available in Python, R, and Julia (Fu et al., 2020; Grant and Boyd, 2008, 2014).

CVX stands for “convex disciplined programming” and it is a convenient and powerful tool for the rapid prototyping of models and algorithms incorporating convex optimization (also allows integer constraints). It interfaces with many solvers such as the free solvers SeDuMi and SDPT3, as well as the commercial solvers Gurobi and MOSEK. Internally, CVX knows the elementary convex and concave functions as well as the rules of compositions that preserve convexity. This way, it can determine whether the problem is convex or not. The usage of CVX is extremely simple and it is very convenient for prototyping while avoiding transcript errors.

Example B.3 (Constrained Euclidean norm approximation in CVX) Consider the problem \[ \begin{array}{ll} \underset{\bm{x}}{\textm{minimize}} & \begin{array}{c} \left\Vert \bm{A}\bm{x}-\bm{b}\right\Vert _{2}\end{array}\\ \textm{subject to} & \begin{array}[t]{l} \bm{C}\bm{x}=\bm{d}\\ \bm{l}\leq\bm{x}\leq\bm{u}. \end{array} \end{array} \] The corresponding code in different popular languages (e.g., Matlab, R, and Python) is almost identical and very simple.

  • Matlab:
cvx_begin
    variable x(n)
    minimize(norm(A * x - b, 2))
    subject to
        C * x == d
        l <= x
        x <= u
cvx_end         
  • R:
x <- Variable(n)
prob <- Problem(Minimize(cvxr_norm(A %*% x - b, 2)),
                list(C %*% x == d,
                     l <= x,
                     x <= u))
solve(prob)
  • Python:
x = cvxpy.Variable(n)
prob = cvxp.Minimize(cvxpy.norm(A @ x - b, 2),
                     [C @ x == d,
                      l <= x,
                      x <= u])
prob.solve()

References

Bertsekas, D. P. (1999). Nonlinear programming. Athena Scientific.
Boyd, S. P., and Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.
Fu, A., Narasimhan, B., and Boyd, S. (2020). CVXR: An R package for disciplined convex optimization. Journal of Statistical Software, 94(14), 1–34.
Grant, M., and Boyd, S. (2008). Graph implementations for nonsmooth convex programs. In V. Blondel, S. Boyd, and H. Kimura, editors, Recent advances in learning and control, pages 95–110. Springer-Verlag.
Grant, M., and Boyd, S. (2014). CVX: Matlab software for disciplined convex programming.
Löfberg, J. (2004). YALMIP: A toolbox for modeling and optimization in MATLAB. In Proceedings of the CACSD conference. Taipei, Taiwan.
Nemirovski, A. (2001). Lectures on modern convex optimization. In Society for industrial and applied mathematics (SIAM).
Nesterov, Y. (2018). Lectures on convex optimization. Springer.
Nesterov, Y., and Nemirovskii, A. (1994). Interior-point polynomial algorithms in convex programming. Philadelphia, PA: SIAM.
Nocedal, J., and Wright, S. J. (2006). Numerical optimization. Springer Verlag.
Sturm, J. F. (1999). Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11(1-4), 625–653.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 58(1), 267–288.
Tütüncü, R. H., Toh, K. C., and Todd, M. J. (2003). Solving semidefinite-quadratic-linear programs using SDPT3. Mathematical Programming. Series B, 95, 189–217.

  1. GLPK: https://www.gnu.org/software/glpk/↩︎

  2. quadprog R version: https://cran.r-project.org/package=quadprog↩︎

  3. MOSEK: https://www.mosek.com/↩︎

  4. SeDuMi: https://sedumi.ie.lehigh.edu/; Matlab version at https://github.com/sqlp/sedumi↩︎

  5. SDPT3: https://blog.nus.edu.sg/mattohkc/softwares/sdpt3/, https://github.com/Kim-ChuanToh/SDPT3, https://github.com/sqlp/sdpt3↩︎

  6. Gurobi optimizer: https://www.gurobi.com/↩︎

  7. ECOS: https://github.com/embotech/ecos↩︎

  8. IBM CPLEX optimizer: https://www.ibm.com/analytics/cplex-optimizer↩︎

  9. The “big O” notation, \(\bigO(\cdot)\), measures the order of complexity. To be specific, we say that the complexity is \(\bigO\left(g(N)\right)\), as \(N\rightarrow\infty\), if there exists a positive real number \(M\) and \(N_0\) such that the complexity is upper-bounded by \(Mg(N)\) for all \(N\geq N_0\).↩︎

  10. YALMIP: https://yalmip.github.io/↩︎

  11. CVX: http://cvxr.com; CVX in R: https://cvxr.rbind.io/↩︎