# Multivariate Analysis with Optimal Scaling

*Jan de Leeuw*

*Patrick Mair*

*Patrick Groenen*

*Version 0.078, September 26 2016*

# 1 Preface

**Note: The directory gifi.stat.ucla.edu/gifi has the complete Rmd file of this book with all code chunks. It also has pdf and epub versions, and R and C files with the code. The book and the files that go with it are in the public domain. Suggestions for improvement are always welcome.**

In 1980 members of the Department of Data Theory at the University of Leiden taught a post-doctoral course in Nonlinear Multivariate Analysis. The course content was sort-of-published, in Dutch, as Gifi (1980). The course was repeated in 1981, and this time the sort-of-published version (Gifi (1981)) was in English.

The preface gives some details about the author.

The text is the joint product of the members of the Department of Data Theory of the Faculty of Social Sciences, University of Leiden. ‘Albert Gifi’ is their ‘nom de plume’. The portrait, however, of Albert Gifi shown here, is that of the real Albert Gifi to whose memory this book is dedicated, as a far too late recompense for his loyalty and devotion, during so any years, to the Cause he served.

Roughly ten years later a revised version of these course notes came out as an actual book in the *Wiley Series in Probabilty and Mathematical Statistics* (Gifi (1990)). This despite the fact that the contents of the book had very little to do with either probability or mathematical statistics. The book is organized around a series of computer programs for correspondence analysis, principal component analysis, and canonical analysis. The programs, written in FORTRAN, are called HOMALS, PRINCALS, PRIMALS, CRIMINALS, CANALS, OVERALS because they combine classical linear multivariate analysis with optimal transformation of the variables, using alternating least squares (or ALS). It serves, to some extent, as a manual for the programs, but it also discusses the properties of the techniques implemented in the programs, and it presents many detailed applications of these techniques.

Reviewers generally had some difficulties separating the wheat from the chaff.

As the spirit of Albert Gifi has faded away, so has his whimsical approach to publishing, and his latest book is an idiosyncratic account of multivariate methods developed by the Leiden group during the 1970s. The names of their computer programs are distinguished by the ending ~ALS, thus we have OVERALS, PRINCALS, HOMALS, CANALS, MORALS, MANOVALS, CRIMINALS, PARTALS and PATHALS. Perhaps if you have a warped mind like this reviewer, you will turn rapidly to CRIMINALS. What can it be ? Surely it must give some illicit view of the truth about the world, a vision of the underworld of multivariate analysis ? Alas no ! It turns out only to be a synonym of Canonical Variate Analysis, sometimes known as Multiple Discriminant Analysis. Likewise HOMALS turns out to be Reciprocal Averaging, otherwise known as Correspondence Analysis. (Hill (1990))

This ambiguity and confusion are not too surprising. The Gifi book was a summary of the work of a large number of people, over a period of almost 20 years. Nevertheless, and perhaps because of this, it is somewhat of a *camel*, which we define for our purposes as a *horse designed by a committee*. Different chapters had different authors, and the common ideas behind the various techniques were not always clearly explained.

In Gifi’s MVA the criterion called “meet” loss plays a central role. Although the adoption of this criterion is one of the most important contributions of Gifi, the book would have been much more readable if this criterion had been introduced right at the outset and was followed throughout the rest of the book. (Takane (1992))

Nevertheless there is much original material in Gifi (1990), and the book has early applications of alternating least squares, majorization, coordinate descent, the delta method, and the bootstrap. And it emphasizes throughout the idea that statistics is about techniques, not about models. But, yes, the organization leaves much to be desired. An on demand printing of the first and only edition is now available on Amazon for $ 492 – although of course used versions go for much less.

The book was published by a prestiguous publisher in a prestiguous series, but it is fair to say it never really caught on. It is not hard to understand why. The content, and the style, are unfamiliar to statisticians and mathematicians. There is no inference, no probability, and very little rigor. The content is in multivariate data analysis, which would be most at home these days, if anywhere, in a computer science department. The Gifi group did not have the resources of, say, Benzécri in France or Hayashi in Japan. The members were mostly active in psychometrics, a small and insular field, and they were from The Netherlands, a small country prone to overestimate its importance (Marvell (1653)). They also did not have the evangelical zeal necessary for creating and sustaining a large impact.

There have been some other major publication events in the Gifi saga. Around the same time as the Wiley book there was the publication of SPSS (1989). Starting in the late seventies the Gifi FORTRAN programs had been embedded in the SPSS system. The *SPSS Categories* manual was updated many times, in fact every time SPSS or IBM SPSS had a new release. Over the years other programs produced by the Department of Data Theory were added. A recent version is, for example, Meulman and Heiser (2012), corresponding to IBM SPSS 21. It acknowledges the contributions of some of the members of the Gifi team – but in IBM (2015), the version for IBM SPSS 23, these acknowledgements and the names of the authors have disappeared. Sic transit gloria mundi.

Michailidis and De Leeuw (1998) made an attempt to make the Gifi material somewhat more accessible by publishing a review article in a widely read mainstream statistical journal. Another such attempt is De Leeuw and Mair (2009a), in which the homals package for R is introduced. The homals package is basically a single monolithic R function that can do everything the Gifi programs can do, and then some. In both cases, however, the problem remained that the techniques, and the software, were too convoluted and too different from what both statisticians and users were accustomed to.

Van der Heijden and Van Buuren (1916) give an excellent, though somewhat wistful, historic overview of the Gifi project. It is too early for eulogies, however, and we refuse to give up. This book is yet another reorganization of the Gifi material, with many extensions. Right from the outset, and throughout the book, it uses the *aspect* approach to multivariate analysis with optimal scaling, which was presented first by De Leeuw (1988b) and subsequently implemented in R by Mair and De Leeuw (2010). The algorithms are generally different from the homals algorithm in De Leeuw and Mair (2009a), because they are based on majorization (De Leeuw (1994), De Leeuw (2015a), Lange (2016)) and not on alternating least squares. A particular aspect, first defined in De Leeuw (2004), is used to connect our work with the earlier Gifi project. We separate the basic computational engine from its various applications that define the techniques of *Multivariate Analysis with Optimal Scaling (MVAOS)*. Hiding the core makes it possible to make the programs behave in much the same way as traditional MVA programs. The software is written in R (R Core Team (2016)), with some parts of the computational engine written in C.

The book itself is written in Rmarkdown, using bookdown (Xie (2016)) and knitr (Xie (2015)) to embed the computations and graphics, and to produce html and pdf versions that are completely reproducible. The book and all the files that go with it are in the public domain.

We would like to acknowledge those who have made substantial contributions to the Gifi project (and its immediate ancestors and offspring) over the years. Some of them are lost in the mists of time, some of them are no longer on this earth. They are, in alphabetical order, Bert Bettonvil, Jason Bond, Catrien Bijleveld, Frank Busing, Jacques Commandeur, Henny Coolen, Steef de Bie, Jan de Leeuw, John Gower, Patrick Groenen, Chris Haveman, Willem Heiser, Abby Israels, Judy Knip, Jan Koster, Pieter Kroonenberg, Patrick Mair, Adriaan Meester, Jacqueline Meulman, George Michailidis, Peter Neufeglise, Dré Nierop, Ineke Stoop, Yoshio Takane, Stef van Buuren, John van de Geer, Gerda van den Berg, Eeke van der Burg, Peter van der Heijden, Anita van der Kooij, Ivo van der Lans, Rien van der Leeden, Jan van Rijckevorsel, Renée Verdegaal, Peter Verboon, Susañña Verdel, and Forrest Young.

### References

Gifi, A. 1980. *Niet-Lineaire Multivariate Analyse [Nonlinear Multivariate Analysis]*. Leiden, The Netherlands: Department of Data Theory FSW/RUL. http://www.stat.ucla.edu/~deleeuw/janspubs/1980/books/gifi_B_80.pdf.

Gifi, A. 1981. *Nonlinear Multivariate Analysis*. Leiden, The Netherlands: Department of Data Theory FSW/RUL. http://www.stat.ucla.edu/~deleeuw/janspubs/1981/books/gifi_B_81.pdf.

Gifi, A. 1990. *Nonlinear Multivariate Analysis*. New York, N.Y.: Wiley.

Hill, M.O. 1974. “Correspondence Analysis: a Neglected Multivariate Method.” *Applied Statistics* 23: 340–54.

*Journal of Ecology*78 (4): 1148–9.

Takane, Y. 1992. “Review of Albert Gifi, Nonlinear Multivariate Analysis.” *Journal of the American Statistical Association* 87: 587–88.

Marvell, A. 1653. “The Character of Holland.” https://ebooks.adelaide.edu.au/m/marvell/andrew/poems/poem58.html.

SPSS. 1989. *SPSS Categories*. SPSS Inc.

Meulman, J.M., and W.J. Heiser. 2012. *IBM Spss Categories 21*. IBM Corporation.

IBM. 2015. *IBM Spss Categories 23*. IBM Corporation.

Michailidis, G., and J. De Leeuw. 1998. “The Gifi System for Descriptive Multivariate Analysis.” *Statistical Science* 13: 307–36. http://www.stat.ucla.edu/~deleeuw/janspubs/1998/articles/michailidis_deleeuw_A_98.pdf.

De Leeuw, J., and P. Mair. 2009a. “Homogeneity Analysis in R: the Package `homals`

.” *Journal of Statistical Software* 31 (4): 1–21. http://www.stat.ucla.edu/~deleeuw/janspubs/2009/articles/deleeuw_mair_A_09a.pdf.

Van der Heijden, P.G.M., and S. Van Buuren. 1916. “Looking Back at the Gifi System of Nonlinear Multivariate Analysis.” *Journal of Statistical Software* 73 (4).

De Leeuw, J. 1988b. “Multivariate Analysis with Optimal Scaling.” In *Proceedings of the International Conference on Advances in Multivariate Statistical Analysis*, edited by S. Das Gupta and J.K. Ghosh, 127–60. Calcutta, India: Indian Statistical Institute. http://www.stat.ucla.edu/~deleeuw/janspubs/1988/chapters/deleeuw_C_88b.pdf.

Mair, P., and J. De Leeuw. 2010. “A General Framework for Multivariate Analysis with Optimal Scaling: The R Package aspect.” *Journal of Statistical Software* 32 (9): 1–23. http://www.stat.ucla.edu/~deleeuw/janspubs/2010/articles/mair_deleeuw_A_10.pdf.

De Leeuw, J. 1994. “Block Relaxation Algorithms in Statistics.” In *Information Systems and Data Analysis*, edited by H.H. Bock, W. Lenski, and M.M. Richter, 308–24. Berlin: Springer Verlag. http://www.stat.ucla.edu/~deleeuw/janspubs/1994/chapters/deleeuw_C_94c.pdf.

De Leeuw, J. 2015a. *Block Relaxation Algorithms in Statistics*. Bookdown. https://bookdown.org/jandeleeuw6/bras/.

Lange, K. 2016. *MM Optimization Algorithms*. SIAM.

De Leeuw, J. 2004. “Least Squares Optimal Scaling of Partially Observed Linear Systems.” In *Recent Developments in Structural Equation Models*, edited by K. van Montfort, J. Oud, and A. Satorra. Dordrecht, Netherlands: Kluwer Academic Publishers. http://www.stat.ucla.edu/~deleeuw/janspubs/2004/chapters/deleeuw_C_04a.pdf.

R Core Team. 2016. *R: A Language and Environment for Statistical Computing*. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Xie, Y. 2016. *Bookdown: Authoring Books with R Markdown.* https://github.com/rstudio/bookdown.

Xie, Y. 2015. *Dynamic Documents with R and Knitr*. Second Edition. CRC Press.