18 References

Abatzoglou, T., and B. O’Donnell. 1982. “Minimization by Coordinate Descent.” Journal of Optimization Theory and Applications 36: 163–74.

Auslender, A. 1970. “Une Méthode Générale pour la Décomposition et la Minimisation de Fonctions non Differentiables.” Comptes Rendus Académie Sciences Paris 271: 1078–81.

———. 1971. “Méthodes Numériques Pour La Décomposition et La Minimisation de Fonctions Non Differentiables.” Numerische Mathematik 18: 213–23.

Axelsson, O. 2010. “Milestones in the Development of Iterative Solution Methods.” Journal of Electrical and Computer Engineering. http://www.hindawi.com/journals/jece/2010/972794/.

Beck, A., and L. Tetruashvili. 2013. “On the Convergence of Block Coordinate Descent Type Methods.” SIAM Journal on Optimization 23 (4): 2037–60.

Benzi, M. 2009. “The Early History of Matrix Iterations: with a Focus on the Italian Contribution.” https://www.siam.org/meetings/la09/talks/benzi.pdf.

Berinde, V. 2007. Iterative Approximation of Fixed Points. Second Edition. Springer.

Bezdek, J. C., R. J. Hathaway, R. E. Howard, C. A. Wilson, and M. P. Windham. 1987. “Local Convergence Analysis of a Grouped Variable Version of Coordinate Descent.” Journal of Optimization Theory and Applications 54: 471–77.

Böhning, D., and B.G. Lindsay. 1988. “Monotonicity of Quadratic-approximation Algorithms.” Annals of the Institute of Statistical Mathematics 40 (4): 641–63.

Breiman, L., and J. H. Friedman. 1985. “Estimating Optimal Transformations for Multiple Regression and Correlation.” Journal of the American Statistical Association 80: 580–619.

Browne, M.W. 1987. “The Young-Householder Algorithm and the Least Squares Multidimensional Scaling of Squared Distances.” Journal of Classification 4: 175–90.

Bunch, J.R., and C.P. Nielsen. 1978. “Updating the Singular Value Decomposition.” Numerische Mathematik 31: 111–29.

Bunch, J.R., C.P. Nielsen, and D.C. Sorensen. 1978. “Rank-one Modification of the Symmetric Eigenproblem.” Numerische Mathematik 31: 31–48.

Céa, J. 1968. “Les Méthodes de ‘Descente’ dans la Theorie de l’Optimisation.” Revue Francaise d’Automatique, d’Informatique Et De Recherche Opérationelle 2: 79–102.

———. 1970. “Recherche Numérique d’un Optimum dans un Espace Produit.” In Colloquium on Methods of Optimization. Berlin, Germany: Springer-Verlag.

Céa, J., and R. Glowinski. 1973. “Sur les Méthodes d’Optimisation par Rélaxation.” Revue Francaise d’Automatique, d’Informatique Et De Recherche Opérationelle 7: 5–32.

Dax, A. 2003. “The Adventures of a Simple Algorithm.” Linear Algebra and Its Applications 361: 41–61.

De Leeuw, J., and K. Sorenson. 2012. “Derivatives of the Procrustus Transformation with Applications.” http://www.stat.ucla.edu/~deleeuw/janspubs/2012/notes/deleeuw_sorenson_U_12b.pdf.

De Leeuw, J. 1968. “Nonmetric Discriminant Analysis.” Research Note 06-68. Department of Data Theory, University of Leiden. http://www.stat.ucla.edu/~deleeuw/janspubs/1968/reports/deleeuw_R_68d.pdf.

De Leeuw, J. 1975. “An Alternating Least Squares Approach to Squared Distance Scaling.” Department of Data Theory FSW/RUL.

———. 1977. “Applications of Convex Analysis to Multidimensional Scaling.” In Recent Developments in Statistics, edited by J.R. Barra, F. Brodeau, G. Romier, and B. Van Cutsem, 133–45. Amsterdam, The Netherlands: North Holland Publishing Company. http://www.stat.ucla.edu/~deleeuw/janspubs/1977/chapters/deleeuw_C_77.pdf.

———. 1982. “Generalized Eigenvalue Problems with Positive Semidefinite Matrices.” Psychometrika 47: 87–94. http://www.stat.ucla.edu/~deleeuw/janspubs/1982/articles/deleeuw_A_82b.pdf.

———. 1988. “Multivariate Analysis with Linearizable Regressions.” Psychometrika 53: 437–54. http://www.stat.ucla.edu/~deleeuw/janspubs/1988/articles/deleeuw_A_88a.pdf.

———. 1994. “Block Relaxation Algorithms in Statistics.” In Information Systems and Data Analysis, edited by H.H. Bock, W. Lenski, and M.M. Richter, 308–24. Berlin: Springer Verlag. http://www.stat.ucla.edu/~deleeuw/janspubs/1994/chapters/deleeuw_C_94c.pdf.

———. 2004. “Least Squares Optimal Scaling of Partially Observed Linear Systems.” In Recent Developments in Structural Equation Models, edited by K. van Montfort, J. Oud, and A. Satorra. Dordrecht, Netherlands: Kluwer Academic Publishers. http://www.stat.ucla.edu/~deleeuw/janspubs/2004/chapters/deleeuw_C_04a.pdf.

———. 2007a. “Derivatives of Generalized Eigen Systems with Applications.” Preprint Series 528. Los Angeles, CA: UCLA Department of Statistics. http://www.stat.ucla.edu/~deleeuw/janspubs/2007/reports/deleeuw_R_07c.pdf.

———. 2007b. “Minimizing the Cartesian Folium.” http://www.stat.ucla.edu/~deleeuw/janspubs/2007/notes/deleeuw_U_07e.pdf.

———. 2008. “Derivatives of Fixed-Rank Approximations.” Preprint Series 547. Los Angeles, CA: UCLA Department of Statistics. http://www.stat.ucla.edu/~deleeuw/janspubs/2008/reports/deleeuw_R_08b.pdf.

De Leeuw, J., and K. Lange. 2009. “Sharp Quadratic Majorization in One Dimension.” Computational Statistics and Data Analysis 53: 2471–84. http://www.stat.ucla.edu/~deleeuw/janspubs/2009/articles/deleeuw_lange_A_09.pdf.

De Leeuw, J., and G. Liu. 1993. “Majorization Algorithms for Mixed Model Analysis.” Preprint 115. Los Angeles, CA: UCLA Statistics. http://www.stat.ucla.edu/~deleeuw/janspubs/1993/reports/deleeuw_liu_R_93.pdf.

Delfour, M.C. 2012. Introduction to Optimization and Semidifferential Calculus. SIAM.

Dempster, A.P., N.M. Laird, and D.B. Rubin. 1977. “Maximum Likelihood from Incomplete Data via the EM algorithm (with Discussion).” Journal of the Royal Statistical Society Series B 39: 1–38.

Demyanov, V.F. 2010. “Nonsmooth Optimization.” In Nonlinear Optimization. Lectures Given at the C.I.M.E. Summer School Held in Cetraro, Italy, July 1-7, 2007, edited by G. Di Pillo and F. Schoen, 55–163. Lecture Notes in Mathematics 1989. Springer.

Dinkelbach, W. 1967. “On Nonlinear Fractional Programming.” Management Science 13: 492–98.

Dontchev, A.L., and R.T. Rockafellar. 2014. Implicit Functions and Solution Mappings. Second Edition. Springer.

D’Esopo, D. A. 1959. “A Convex Programming Procedure.” Naval Research Logistics Quarterly 6: 33–42.

Huard, P., ed. 1979. Point-to-set Maps and Mathematical Programming. Amsterdam, Netherlands: North Holland Publishing Company.

Elkin, R. M. 1968. “Convergence Theorems for Gauss-Seidel and Other Minimization Algorithms.” Technical Report 68-59. College Park, MD: Computer Sciences Center, University of Maryland.

Fiorot, J. Ch., and P. Huard. 1979. “Composition and Union of General Algorithms of Optimization.” Mathematical Programming Study 10: 69–85.

Floudas, C.A., and P.M. Pardalos, eds. 2009. “Dini and Hadamard Derivatives in Optimization.” In Encyclopedia of Optimization, Revised and expanded edition. Springer.

Forsythe, G.E. 1950. “Translation of C. F. Gauss, ‘Brief an Gerling vom 26 Dec. 1823’.” MTAC 5: 255–58.

———. 1953. “Solving Linear Algebraic Equations Can Be Interesting.” Bulletin of the American Mathematical Society 59 (4): 299–329.

Forsythe, G.E., and G.H. Golub. 1965. “On the Stationary Values of a Second Degree Polynomial on the Unit Sphere.” Journal of the Society for Industrial and Applied Mathematics 13: 1050–68.

Gander, W. 1981. “Least Squares with a Quadratic Constraint.” Numerische Mathematik 36: 291–307.

Gifi, A. 1990. Nonlinear Multivariate Analysis. New York, N.Y.: Wiley.

Golub, G.H. 1973. “Some Modified Matrix Eigenvalue Problems.” SIAM Review 15: 318–34.

Groenen, P.J.F., P. Giaquinto, and H.A.L. Kiers. 2003. “Weighted Majorization Algorithms for Weighted Least Squares Decomposition Models.” Econometric Institute Report EI 2003-09. Econometric Institute, Erasmus University Rotterdam. http://repub.eur.nl/pub/1700/.

Harman, H.H., and W.H. Jones. 1966. “Factor Analysis by Minimizing Residuals (MINRES).” Psychometrika 31: 351–68.

Hastie, T., and R. Tibshirani. 1990. Generalized Additive Models. London: Chapman and Hall.

Heiser, W.J. 1986. “A Majorization Algorithm for the Reciprocal Location Problem.” RR-86-12. Department of Data Theory, University of Leiden.

———. 1995. “Convergent Computing by Iterative Majorization: Theory and Applications in Multidimensional Data Analysis.” In Recent Advances in Descriptive Multivariate Analysis, edited by W.J. Krzanowski, 157–89. Oxford: Clarendon Press.

Hildreth, C. 1957. “A Quadratic Programming Procedure.” Naval Research Logistics Quarterly 4: 79–85.

Hunter, D. R., and R. Li. 2005. “Variable Selection Using MM Algorithms.” The Annals of Statistics 33: 1617–42.

Jaakkola, T.S., and M. I. Jordan. 2000. “Bayesian Parameter Estimation via Variational Methods.” Statistics and Computing 10: 25–37.

Jacobi, C.G.J. 1845. “Über eine neue Auflösungsart der bei der Methode der kleinsten Quadrate vorkommenden linearen Gleichungen.” Astronomische Nachrichten 22: 297–306.

Jensen, S. T., S. Johansen, and S. L. Lauritzen. 1991. “Globally Convergent Algorithms for Maximizing a Likelihood Function.” Biometrika 78: 867–77.

Kato, T. 1976. Perturbation Theory for Linear Operators. Second Edition. Springer.

Kiers, H. 1990. “Majorization as a Tool for Optimizing a Class of Matrix Functions.” Psychometrika 55: 417–28.

Krantz, S.G., and H.R. Parks. 2003. The Implicit Function Theorem. Birkhäuser.

Kruskal, J.B. 1964a. “Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis.” Psychometrika 29: 1–27.

———. 1964b. “Nonmetric Multidimensional Scaling: a Numerical Method.” Psychometrika 29: 115–29.

———. 1965. “Analysis of Factorial Experiments by Estimating Monotone Transformations of the Data.” Journal of the Royal Statistical Society B27: 251–63.

Lange, K. 2013. Optimization. Second Edition. Springer Verlag.

———. 2016 (in press). MM Optimization Algorithms.

Lange, K., D.R. Hunter, and I. Yang. 2000. “Optimization Transfer Using Surrogate Objective Functions.” Journal of Computational and Graphical Statistics 9: 1–20.

Lawson, C.L., and R.J. Hanson. 1974. Solving Least Squares Problems. Prentice Hall.

Lipp, T., and S. Boyd. 2015. “Variations and Extension of the Convex–concave Procedure.” Optimization and Engineering, 1–25.

Magnus, J.R., and H. Neudecker. 1999. Matrix Differential Calculus with Applications in Statistics and Econometrics. Revised Edition. Wiley.

Mair, P., and J. De Leeuw. 2010. “A General Framework for Multivariate Analysis with Optimal Scaling: The R Package Aspect.” Journal of Statistical Software 32 (9): 1–23. http://www.stat.ucla.edu/~deleeuw/janspubs/2010/articles/mair_deleeuw_A_10.pdf.

Martinet, B., and A. Auslender. 1974. “Méthodes de Décomposition Pour La Minimisation d’une Fonction Sur Un Espace Produit.” SIAM Journal on Control 12: 635–42.

Melman, A. 1995. “Numerical Solution of a Secular Equation.” Numerische Mathematik 69: 483–93.

———. 1997. “A Unifying Convergence Analysis of Second-Order Methods for Secular Equations.” Mathematics of Computation 66: 333–44.

———. 1998. “Analysis of Third-order Methods for Secular Equations.” Mathematics of Computation 67: 271–86.

Meng, X.L., and D.B. Rubin. 1993. “Maximum Likelihood Estimation via the ECM Algorithm: A General Framework.” Biometrika 80: 267–78.

Meyer, G. G. L. 1975. “A Systematic Approach to the Synthesis of Algorithms.” Numerische Mathematik 24: 277–89.

Meyer, R. R. 1976. “Sufficient Conditions for the Convergence of Monotonic Mathematical Programming Algorithms.” Journal of Computer and System Sciences 12: 108–21.

Mönnigmann, M. 2011. “Fast Calculation of Spectral Bounds for Hessian Matrices on Hyperrectangles.” SIAM Journal on Matrix Analysis and Applications 32: 1351–66.

Nesterov, Y., and B.T. Polyak. 2006. “Cubic Regularization of Newton Method and Its Global Performance.” Mathematical Programming A108: 177–205.

Oberhofer, W., and J. Kmenta. 1974. “A General Procedure for Obtaining Maximum Likelihood Estimates in Generalized Regression Models.” Econometrica 42: 579–90.

Ortega, J. M., and W. C. Rheinboldt. 1967. “Monotone Iterations for Nonlinear Equations with Application to Gauss-Seidel Methods.” SIAM Journal on Numerical Analysis 4: 171–90.

———. 1970a. Iterative Solution of Nonlinear Equations in Several Variables. New York, N.Y.: Academic Press.

———. 1970b. “Local and Global Convergence of Generalized Linear Iterations.” In Numerical Solution of Nonlinear Problems, edited by J. M. Ortega and W. C. Rheinboldt. Philadelphia, PA: Society for Industrial and Applied Mathematics.

Ostrowski, A. M. 1966. Solution of Equations and Systems of Equations. New York, N.Y.: Academic Press.

Penot, J.-P. 2013. Calculus Without Derivatives. Springer.

Polak, E. 1969. “On the Convergence of Optimization Algorithms.” Revue Francaise d’Automatique, d’Informatique Et De Recherche Opérationelle 3: 17–34.

Powell, M. J. D. 1973. “On Search Directions for Minimization Algorithms.” Mathematical Programming 4: 193–201.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org.

Rockafellar, R.T. 1970. Convex Analysis. Princeton University Press.

Saad, Y., and H.A. Van der Vorst. 2000. “Iterative Solution of Linear Systems in the 20th Century.” Journal of Computational and Applied Mathematics 123: 1–33.

Saha, A., and A. Tewari. 2013. “On the Nonasymptotic Convergence of Cyclic Coordinate Descent Methods.” SIAM Journal on Optimization 23: 576–601.

Schechter, S. 1962. “Iteration Methods for Nonlinear Problems.” Transactions American Mathematical Society 104: 179–89.

———. 1968. “Relaxation Methods for Convex Problems.” SIAM Journal on Numerical Analysis 5: 601–12.

———. 1970. “Minimization of a Convex Function by Relaxation.” In Integer and Nonlinear Programming, edited by J. Abadie. Amsterdam, Netherlands: North Holland Publishing Company.

Schirotzek, W. 2007. Nonsmooth Analysis. Springer.

Seidel, L. 1874. “Über ein Verfahren, die Gleichungen, auf welche die Methode der kleinsten Quadrate führt, sowie lineäre Gleichungen ueberhaupt, durch successive Annäherung aufzulösen.” Abhandlungen Der Mathematisch-Physikalischen Klasse Der Königlich Bayerischen Akademie Der Wissenschaften 11, III Abtheilung: 81–108.

Smart, D.R. 1974. Fixed Point Theorems. Cambridge Tracts in Mathematics 66. Cambridge University Press.

Spall, J.C. 2012. “Cyclic Seesaw Process for Optimization and Identification.” Journal of Optimization Theory and Applications 154: 187–208.

Spivak, M. 1965. Calculus on Manifolds. Westview Press.

Spjøtvoll, E. 1972. “A Note on a Theorem by Forsythe and Golub.” SIAM Journal on Applied Mathematics 23: 307–11.

Sriperumbudur, B.K., and G.R.G. Lanckriet. 2012. “A Proof of Convergence of the Concave-Convex Procedure Using Zangwill’s Theory.” Neural Computation 24: 1391–1407.

Takane, Y. 1977. “On the Relations among Four Methods of Multidimensional Scaling.” Behaviormetrika 4: 29–42.

Takane, Y., F.W. Young, and J. De Leeuw. 1977. “Nonmetric Individual Differences in Multidimensional Scaling: An Alternating Least Squares Method with Optimal Scaling Features.” Psychometrika 42: 7–67. http://www.stat.ucla.edu/~deleeuw/janspubs/1977/articles/takane_young_deleeuw_A_77.pdf.

Thomson, G.H. 1934. “Hotelling’s Method Modified to Give Spearman’s g.” Journal of Educational Psychology 25: 366–74.

Van der Burg, E., and J. De Leeuw. 1983. “Non-Linear Canonical Correlation.” British Journal of Mathematical and Statistical Psychology 36: 54–80. http://www.stat.ucla.edu/~deleeuw/janspubs/1983/articles/vanderburg_deleeuw_A_83.pdf.

Van der Heijden, P.G.M., and K. Sijtsma. 1996. “Fifty Years of Measurement and Scaling in the Dutch Social Sciences.” Statistica Neerlandica 50: 111–35.

Van Ruitenburg, J. 2005. “Algorithms for Parameter Estimation in the Rasch Model.” Measurement and Research Department Reports 2005-04. Arnhem, Netherlands: CITO.

Varga, R.S. 1962. Matrix Iterative Analysis. Englewood Cliffs: Prentice Hall.

Von Mises, R., and H. Pollaczek-Geiringer. 1929. “Praktische Verfahren der Gleichungsauflösung.” Zeitschrift Für Angewandte Mathematik Und Mechanik 9: 58–79 and 152–64.

Voss, H., and U. Eckhardt. 1980. “Linear Convergence of Generalized Weiszfeld’s Method.” Computing 25: 243–51.

Wainer, H., A. Morgan, and J.E. Gustafsson. 1980. “A Review of Estimation Procedures for the Rasch Model with an Eye toward Longish Tests.” Journal of Educational Statistics 5: 35–64.

Weiszfeld, E. 1937. “Sur le Point par lequel la Somme des Distances de n Points Donnés est Minimum.” Tohoku Mathematics Journal 43: 355–86.

Weiszfeld, E., and F. Plastria. 2009. “On the Point for Which the Sum of the Distances to N Given Points Is Minimum.” Annals of Operations Research 167: 7–41.

Wilkinson, J.H. 1965. The Algebraic Eigenvalue Problem. Clarendon Press.

Wright, S. 2015. “Coordinate Descent Algorithms.” Mathematical Programming, Series B 151: 3–34.

Xie, Y. 2015. Dynamic Documents with R and knitr. Second Edition. CRC Press.

Yates, F. 1933. “The Analysis of Replicated Experiments when the Field Results are Incomplete.” Empire Journal of Experimental Agriculture 1: 129–42.

Yen, E.-H., N. Peng, P.-W. Wang, and S.-D. Lin. 2012. “On Convergence Rate of Concave-Convex Procedure.” In Paper Presented at 5th NIPS Workshop on Optimization for Machine Learning, Lake Tahoe, December 8 2012. http://opt-ml.org/oldopt/papers/opt2012_paper_10.pdf.

Young, D.M. 1971. Iterative Solution of Large Linear Systems. Academic Press.

———. 1990. “A Historical Review of Iterative Methods.” In A History of Scientific Computing, edited by S.G. Nash, 180–94. Addison-Wesley.

Young, F.W., J. De Leeuw, and Y. Takane. 1980. “Quantifying Qualitative Data.” In Similarity and Choice. Papers in Honor of Clyde Coombs, edited by E.D. Lantermann and H. Feger. Bern: Hans Huber. http://www.stat.ucla.edu/~deleeuw/janspubs/1980/chapters/young_deleeuw_takane_C_80.pdf.

Yuille, A.L., and A. Rangarajan. 2003. “The Concave-Convex Procedure.” Neural Computation 15: 915–36.

Zangwill, W. I. 1969. Nonlinear Programming: a Unified Approach. Englewood-Cliffs, N.J.: Prentice-Hall.