6 Algorithm

6.1 Block Relaxation

Our task is to minimize $\sigma(H,A)$ over $H$ and $A$, suitably constrained. Write the constraints as $H\in\mathcal{H}$ and $A\in\mathcal{A}$. The strategy we use is block relaxation (De Leeuw 2015a). Thus we iterate as follows.

  1. Set $k=0$ and start with some $H^{(0)}$.
  2. $A^{(k)}\in\mathop{\text{argmin}}_{A\in\mathcal{A}}\ \sigma(H^{(k)},A)$.
  3. $H^{(k+1)}\in\mathop{\text{argmin}}_{H\in\mathcal{H}}\ \sigma(H,A^{(k)})$.
  4. If converged stop. Else $k\leftarrow k+1$ and go to step 2.

It is assumed that step 2, updating $A$ for given $H$, can be carried out simply by some form of linear least squares. We assume that for each $\ell$ there is at least one $j$ such that $A_{j\ell}=I$. Note that this is the case for MLR, PCA, EFA, and for all Gifi Systems.

Step 3 is somewhat more intricate, because of the cone restrictions. In partitioned form we can write the loss function as
$$\sigma(H,A)=\sum_{i=1}^m\sum_{j=1}^m\mathbf{tr}\ H_i'H_jB_{ij}(A),$$
where
$$B_{ij}(A)=\sum_{\ell=1}^L A_{j\ell}A_{i\ell}'.$$
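
To make the block relaxation scheme concrete, here is a small self-contained R illustration on a toy problem, not the MVAOS loss: we minimize $\mathbf{SSQ}(Z-HA)$ over an unconstrained $H$ and $A$, so that each block update is an ordinary linear least squares problem.

    ## Toy illustration of block relaxation (not the MVAOS loss):
    ## minimize SSQ(Z - H A) by alternating least squares over H and A.
    set.seed(12345)
    Z <- matrix(rnorm(100 * 6), 100, 6)
    r <- 2
    H <- matrix(rnorm(100 * r), 100, r)          # step 1: arbitrary start
    old <- Inf
    repeat {
      A <- solve(crossprod(H), crossprod(H, Z))  # step 2: update A for given H
      H <- Z %*% t(A) %*% solve(tcrossprod(A))   # step 3: update H for given A
      loss <- sum((Z - H %*% A) ^ 2)
      if (old - loss < 1e-10) break              # step 4: convergence test
      old <- loss
    }

Each substep decreases the loss, which is the basic argument for the convergence of block relaxation methods.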

6.2 Majorization

If $G$ is positive semi-definite we have
$$\mathbf{tr}\ H'HG=\mathbf{tr}\ (\tilde H+(H-\tilde H))'(\tilde H+(H-\tilde H))G\geq\mathbf{tr}\ \tilde H'\tilde HG+2\,\mathbf{tr}\ \tilde H'(H-\tilde H)G.$$
The right-hand side depends on $H$ only through the linear term $\mathbf{tr}\ H'\tilde HG$, evaluated at the current solution $\tilde H$.
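
As a quick numerical sanity check of this bound (our own illustration, not part of the package), the following R snippet generates random $H$, $\tilde H$, and a positive semi-definite $G$ and verifies the inequality.

    ## Check tr H'HG >= tr Ht'HtG + 2 tr Ht'(H - Ht)G for psd G (Ht is H-tilde)
    set.seed(54321)
    H  <- matrix(rnorm(10 * 3), 10, 3)
    Ht <- matrix(rnorm(10 * 3), 10, 3)
    G  <- crossprod(matrix(rnorm(9), 3, 3))      # positive semi-definite
    lhs <- sum(diag(crossprod(H) %*% G))
    rhs <- sum(diag(crossprod(Ht) %*% G)) +
      2 * sum(diag(t(Ht) %*% (H - Ht) %*% G))
    lhs >= rhs                                   # TRUE: the linear expansion is a lower bound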

6.3 Alternating Least Squares

The standard way to minimize loss function (???) is implemented in the OVERALS program (Van der Burg, De Leeuw, and Verdegaal 1988; Meulman and Heiser 2012). It is also the one used in the homals package (De Leeuw and Mair 2009a).

In this paper the algorithm is different because we use the loss function (???). We still use ALS, which means in this case that we cycle through three substeps in each iteration: we update $A$ for given $X$ and $H$, we then update $X$ for given $H$ and $A$, and finally we update $H$ for given $X$ and $A$. Algorithm A goes as follows.

  1. Set $k=0$ and start with some $X^{(0)},H^{(0)},A^{(0)}$.
  2. $X^{(k+1)}=\mathbf{ortho}(\mathbf{center}(H^{(k)}A^{(k)}))$.
  3. For $j=1,\ldots,m$ compute $A_j^{(k+1)}=\{H_j^{(k)}\}^+X^{(k+1)}$.
  4. For $j=1,\ldots,m$ and $s=1,\ldots,p_j$ compute $h_{js}^{(k+1)}=\mathbf{proj}_{\mathcal{K}_{js}\cap\mathcal{S}}\left(\left(X^{(k+1)}-\sum_{t<s}h_{jt}^{(k+1)}\{a_{jt}^{(k+1)}\}'-\sum_{t>s}h_{jt}^{(k)}\{a_{jt}^{(k+1)}\}'\right)a_{js}^{(k+1)}\right)$.
  5. If converged stop. Else $k\leftarrow k+1$ and go to step 2.

In step 2 the center operator does column centering, and the ortho operator finds an orthonormal basis for the column space of its argument. In step 3 we use the superscript $+$ for the Moore-Penrose inverse.

The complicated part is step 4, the optimal scaling, i.e. the updating of $H_j$ for given $X$ and $A_j$. We cycle through the variables in the block, each time projecting a single column on the cone of admissible transformations of the variable, and then normalizing the projection to length one. The target, i.e. the vector we are projecting, is complicated, because the other variables in the same block must be taken into account.
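
For a monotone transformation the projection itself is simple. The sketch below, our own illustration rather than the package code, projects a target vector onto the cone of transformations that are weakly increasing in the raw data, using isotonic regression via stats::isoreg, and then normalizes the projection to length one; ties in the raw data are handled naively.

    ## Hedged sketch: optimal scaling of a single variable with a monotone cone.
    monotoneScale <- function(target, rawdata) {
      o <- order(rawdata)
      fit <- isoreg(rawdata[o], target[o])   # isotonic (PAVA) regression
      h <- numeric(length(target))
      h[o] <- fit$yf                         # fitted values back in data order
      h / sqrt(sum(h ^ 2))                   # normalize the projection to length one
    }

    ## small artificial example
    set.seed(1)
    rawdata <- sample(1:5, 20, replace = TRUE)
    target <- rnorm(20)
    h <- monotoneScale(target, rawdata)
    c(sum(h ^ 2), all(diff(h[order(rawdata)]) >= -1e-12))  # unit length, monotone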

In order to simplify the optimal scaling computations within an iteration we can use majorization (De Leeuw 1994; Heiser 1995; Lange, Hunter, and Yang 2000; De Leeuw 2015a). This has the additional benefit that the optimal scaling step becomes embarrassingly parallel. We expand the loss for block $j$ around a previous solution $\tilde H_j$:
$$\mathbf{SSQ}(X-H_jA_j)=\mathbf{SSQ}(X-\tilde H_jA_j)-2\,\mathbf{tr}\ (H_j-\tilde H_j)'(X-\tilde H_jA_j)A_j'+\mathbf{tr}\ A_j'(H_j-\tilde H_j)'(H_j-\tilde H_j)A_j.$$
Now
$$\mathbf{tr}\ (H_j-\tilde H_j)A_jA_j'(H_j-\tilde H_j)'\leq\kappa_j\,\mathbf{tr}\ (H_j-\tilde H_j)'(H_j-\tilde H_j),$$
where $\kappa_j$ is the largest eigenvalue of $A_jA_j'$. Thus
$$\mathbf{SSQ}(X-H_jA_j)\leq\mathbf{SSQ}(X-\tilde H_jA_j)+\kappa_j\,\mathbf{SSQ}(H_j-U_j)-\frac{1}{\kappa_j}\mathbf{SSQ}((X-\tilde H_jA_j)A_j'),$$
where $U_j$ is the target
$$U_j=\tilde H_j+\frac{1}{\kappa_j}(X-\tilde H_jA_j)A_j'.$$
It follows that we can update the optimal scaling of the variables by projecting the columns of $U_j$ on their respective cones and then normalizing. See De Leeuw (1975) for results on normalized cone regression. This can be done for all variables in the block separately, without taking any of the other variables in the block (or in any of the other blocks) into account. Thus the optimal scaling is easy to parallelize. The resulting algorithm B is as follows.

  1. Set $k=0$ and start with some $X^{(0)},H^{(0)},A^{(0)}$.
  2. $X^{(k+1)}=\mathbf{ortho}(\mathbf{center}(H^{(k)}A^{(k)}))$.
  3. For $j=1,\ldots,m$ compute $A_j^{(k+1)}=\{H_j^{(k)}\}^+X^{(k+1)}$.
  4. For $j=1,\ldots,m$ compute $U_j^{(k+1)}=H_j^{(k)}+\frac{1}{\kappa_j}(X^{(k+1)}-H_j^{(k)}A_j^{(k+1)})\{A_j^{(k+1)}\}'$ and for $s=1,\ldots,p_j$ compute $h_{js}^{(k+1)}=\mathbf{proj}_{\mathcal{K}_{js}\cap\mathcal{S}}(u_{js}^{(k+1)})$.
  5. If converged stop. Else $k\leftarrow k+1$ and go to step 2.
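
The following self-contained R sketch shows what one iteration of algorithm B could look like for purely numerical data with monotone cones for all variables. It is our own illustration of the structure of the computations, with hypothetical names, and not the gifiEngine() code: H and raw are lists of $n\times p_j$ matrices (transformed and raw variables per block), A is a list of $p_j\times r$ weight matrices, and X is the $n\times r$ matrix of object scores.

    ## Hedged sketch of one iteration of algorithm B (not the gifiEngine() code).
    center <- function(z) sweep(z, 2, colMeans(z))   # column centering
    ortho  <- function(z) qr.Q(qr(z))                # orthonormal basis of column space
    coneProject <- function(target, rawcol) {        # monotone cone, then normalize
      o <- order(rawcol)
      h <- numeric(length(target))
      h[o] <- isoreg(rawcol[o], target[o])$yf
      h / sqrt(sum(h ^ 2))
    }

    gifiIterationB <- function(X, H, A, raw) {
      m <- length(H)
      X <- ortho(center(Reduce(`+`, Map(`%*%`, H, A))))       # step 2
      for (j in 1:m) {
        A[[j]] <- MASS::ginv(H[[j]]) %*% X                    # step 3
        kappa <- max(eigen(tcrossprod(A[[j]]), symmetric = TRUE,
                           only.values = TRUE)$values)        # largest eigenvalue of A_j A_j'
        U <- H[[j]] + (X - H[[j]] %*% A[[j]]) %*% t(A[[j]]) / kappa
        for (s in 1:ncol(H[[j]]))                             # step 4
          H[[j]][, s] <- coneProject(U[, s], raw[[j]][, s])
      }
      list(X = X, H = H, A = A)
    }

In the actual program the cone projection depends on the basis and inequality restrictions of each variable; here it is hard-coded as monotone regression purely for illustration, and the loops over blocks and variables in steps 3 and 4 could run in parallel.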

6.4 Implementation Details

If we follow the ALS strategy strictly the ortho() operator should be implemented using Procrustes rotation (Gibson 1962). Thus if the singular value decomposition of $Z$ is $Z=K\Lambda L'$, then $\mathbf{ortho}(Z)=KL'$. Note, however, that any other basis for the column space of $Z$ merely differs from the Procrustes basis by a rotation. This rotation matrix carries over unmodified into the update of $A_j$ in step 3 of the algorithm, and thus after steps 2 and 3 the loss will be the same, no matter which rotation we select. In our algorithm we use the QR decomposition to find the basis, using the Gram-Schmidt code from De Leeuw (2015c).
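
A quick numerical check of this invariance (our own illustration) compares the Procrustes basis from the SVD with the QR basis and shows that they differ by an orthonormal rotation.

    ## The SVD (Procrustes) and QR bases for the column space of Z differ by a rotation.
    set.seed(3)
    Z <- matrix(rnorm(50 * 2), 50, 2)
    s <- svd(Z)
    Qsvd <- s$u %*% t(s$v)                  # ortho(Z) via Procrustes rotation
    Qqr  <- qr.Q(qr(Z))                     # ortho(Z) via QR / Gram-Schmidt
    rot <- crossprod(Qsvd, Qqr)             # the connecting rotation
    max(abs(crossprod(rot) - diag(2)))      # close to zero: rot is orthonormal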

In actual computation we column-center the basis and compute a full rank QR decomposition, using the code in De Leeuw (2015c). Thus $G=QR$, with $Q$ orthonormal and $R$ nonsingular and upper triangular.
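
In base R the same computation can be sketched with qr(), although the package uses the Gram-Schmidt code of De Leeuw (2015c).

    ## Column-center a basis and compute its QR decomposition (illustration only)
    set.seed(2)
    G <- matrix(rnorm(20 * 3), 20, 3)
    G <- sweep(G, 2, colMeans(G))           # column centering
    decomposition <- qr(G)
    Q <- qr.Q(decomposition)                # orthonormal basis of the column space
    R <- qr.R(decomposition)                # nonsingular upper triangular factor
    max(abs(G - Q %*% R))                   # close to zero: G = QR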

We implement the cone restrictions by the constraints $h_{js}=G_{js}z_{js}$ in combination with $T_{js}h_{js}\geq 0$. Thus the transformed variables must be in the intersection of the subspace spanned by the columns of the transformation basis $G_{js}$ and the polyhedral convex cone of all vectors $h$ such that $T_{js}h\geq 0$. We suppose that all columns of the $G_{js}$ add up to zero, and we require, in addition, the normalization $\mathbf{SSQ}(h_{js})=1$.
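
For a weakly increasing transformation, for example, $T_{js}$ can be taken to be a matrix of successive differences in the order of the raw data. The sketch below, an illustration and not package code, builds such a matrix and checks the inequalities for a monotone vector.

    ## A difference matrix Tmat with Tmat %*% h >= 0 iff h is weakly increasing
    ## in the order of the raw data (illustration only, no ties).
    makeMonotoneT <- function(rawdata) {
      n <- length(rawdata)
      o <- order(rawdata)
      Tmat <- matrix(0, n - 1, n)
      for (i in 1:(n - 1)) {
        Tmat[i, o[i]]     <- -1
        Tmat[i, o[i + 1]] <-  1
      }
      Tmat
    }

    rawdata <- c(3, 1, 2, 5, 4)
    h <- rawdata / sqrt(sum(rawdata ^ 2))   # a monotone, normalized transformation
    all(makeMonotoneT(rawdata) %*% h >= 0)  # TRUE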

We use the code described in De Leeuw (2015b) to generate B-spline bases. Note that for coding purposes binary indicators are B-splines of degree zero, while polynomials are B-splines without interior knots. We include utility functions to generate lists of knots. There is knotsQ() for knots at the quantiles, knotsR() for knots equally spaced on the range, knotsD() for knots at the data points, and knotsE() for no interior knots. Also note that binary indicators can be created for qualitative non-numerical variables, for which B-splines are not defined. We have added the option of using degree -1 to bypass the B-spline code and generate an indicator matrix, using the utility makeIndicator(). Note that makeIndicator(foo) is equivalent to bsplineBasis(foo, degree = 0, innerknots = sort(unique(foo))). Throughout we first orthonormalize the basis matrices $G_{js}$, using the Gram-Schmidt code from De Leeuw (2015c).
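
For readers who do not have that code at hand, a minimal stand-in for an indicator basis is easy to write; the function name and the example data below are hypothetical.

    ## Minimal stand-in for an indicator (dummy) basis of a categorical variable
    makeIndicatorSketch <- function(x) {
      lev <- sort(unique(x))
      outer(x, lev, "==") * 1               # n x (number of categories) 0/1 matrix
    }
    makeIndicatorSketch(c("a", "c", "b", "a"))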

The matrices $T_{js}$ in the homogeneous linear inequality restrictions that define the cones $\mathcal{K}_{js}$ can be used to impose monotonicity or convexity of the resulting transformations. In the current implementation we merely allow for monotonicity, which means the $T_{js}$ do not have to be stored. The transformations for each variable can be restricted to be increasing, or they can be unrestricted. By using splines without interior knots we also allow for polynomial transformations, which again can be restricted to be either monotonic or not. Note that it is somewhat misleading to say we are fitting monotone splines or polynomials; we are merely requiring monotonicity at the data points.

If there are multiple copies of a variable in a block then requiring the transformation to be ordinal means that we want the transformation of the first copy to be monotonic. The transformations of the other copies are not constrained to be monotonic. If you want all copies to be transformed monotonically, you have to explicitly introduce them as separate variables.

For variables with copies there is yet another complication. For copies we have $H_jA_j=G_j(Z_jA_j)=G_jY_j$. If we require monotonicity in MVAOS we constrain a column of $H_j$ (in fact, the first one) to be monotonic. In classic Gifi, in which the $G_j$ are binary indicators, we constrain the first column of $Y_j$, which automatically implies the first column of $G_jY_j$ is monotonic as well. In previous Gifi work with B-splines, we also constrained the first column of $Y_j$, which again implied the first column of $G_jY_j$ was monotonic as well. But in our current MVAOS implementation monotonicity of the first column of $H_j$ does not imply monotonicity of the first column of $H_jA_j$, even if the basis $G_j$ is a binary indicator. This discrepancy between the old and the new Gifi only comes into play for ordinal variables with multiple copies.

Missing data are incorporated in the definition of the cones of transformations by using a $G_{js}$ which is the direct sum of a spline basis for the non-missing and an identity matrix for the missing data. This is called missing data multiple in Gifi (1990). There are no linear inequality restrictions on the quantifications of the missing data.
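
The structure of such a basis is easy to illustrate. In the sketch below, our own illustration, a plain indicator basis stands in for the spline basis of the non-missing observations, and the centering and orthonormalization discussed above are ignored.

    ## Direct sum of a basis for the non-missing data and an identity matrix
    ## for the missing data ("missing data multiple"), illustration only.
    makeMissingBasis <- function(x, basisFun) {
      n <- length(x)
      observed <- !is.na(x)
      B <- basisFun(x[observed])                       # basis for the non-missing part
      G <- matrix(0, n, ncol(B) + sum(!observed))
      G[observed, seq_len(ncol(B))] <- B
      G[!observed, ncol(B) + seq_len(sum(!observed))] <- diag(sum(!observed))
      G
    }

    x <- c(1, NA, 2, 2, NA, 1)
    makeMissingBasis(x, function(v) outer(v, sort(unique(v)), "==") * 1)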

6.5 Wrappers

The homals() implementation in De Leeuw and Mair (2009a) is a single monolithic program in R, which specializes to the various MVAOS techniques by a suitable choice of its parameters. This approach has some disadvantages. If we want principal component analysis, we already know all blocks are singletons. If we want multiple correspondence analysis we know each variable has $p$ copies. If we want multiple regression, we know there are two blocks, and one is a singleton. So it is somewhat tedious to specify all parameters all of the time. Also, some of the output, graphical and otherwise, is specific to a particular technique. For regression we want residuals and fitted values, in canonical analysis we want block scores and loadings. And, more generally, we may want the output in a form familiar from the classical MVA techniques. It is indeed possible to transform the homals() output to more familiar forms (De Leeuw 2009), but this requires some extra effort.

In this book we go back to the original approach of Gifi (1990) and write separate programs for nonlinear versions of principal component analysis, multiple regression, canonical analysis, discriminant analysis, and so on.

These programs, now written in R and no longer in FORTRAN, are wrappers for the main computational core, the program gifiEngine(). The wrappers, which have the familiar names morals(), corals(), princals(), homals(), criminals(), overals(), primals(), and canals(), create a gifi object from the data and parameters, and then pass this to gifiEngine(). Computations are iterated to convergence, and the results are stored in an xGifi object. The output is then transformed to a format familiar from the corresponding technique in classical MVA. Each wrapper foo returns a structure of class foo.

This modular approach saves code, because both makeGifi() and gifiEngine() are common to all programs. It also makes it comparatively easy to add new wrappers not currently included, possibly even contributed by others.

Although we like the above quotation from Hill (1990), it is not quite accurate. Our current generation of wrappers can use B-spline bases and an arbitrary number of copies of a variable, and each copy can be either categorical, ordinal, polynomial, or splinical. Thus, even more so than the original Gifi programs, we have a substantial generalization of the classical techniques, not merely a sequence of synonyms.

6.6 Structures

The computations are controlled by the arguments to the wrappers. These arguments are used to construct three structures: the gifi, the gifiBlock, and the gifiVariable. A gifi is just a list of gifiBlocks, and a gifiBlock is a list of gifiVariables. This reflects the partitioning of the variables into blocks. A gifiVariable contains a great deal of information about the variable. The function makeGifiVariable() is a constructor that returns a structure of class gifiVariable. The contents of a gifiVariable remain the same throughout the computations.

    return (structure (
      list (
        data = data,
        basis = basis,
        qr = qr,
        copies = copies,
        degree = degree,
        ties = ties,
        missing = missing,
        ordinal = ordinal,
        active = active,
        name = name,
        type = type
      ),
      class = "gifiVariable"
    ))
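
The nesting can be mimicked directly with R lists. The toy construction below, with made-up field values and only a few of the fields shown above, illustrates how a gifi reflects the partitioning of the variables into blocks.

    ## Toy illustration of the nesting: a gifi is a list of gifiBlocks,
    ## and a gifiBlock is a list of gifiVariables (field values are made up).
    v1 <- structure(list(name = "age", degree = 2, ordinal = TRUE),
                    class = "gifiVariable")
    v2 <- structure(list(name = "income", degree = 2, ordinal = TRUE),
                    class = "gifiVariable")
    v3 <- structure(list(name = "region", degree = -1, ordinal = FALSE),
                    class = "gifiVariable")
    block1 <- structure(list(v1, v2), class = "gifiBlock")
    block2 <- structure(list(v3), class = "gifiBlock")
    g <- structure(list(block1, block2), class = "gifi")
    c(length(g), length(g[[1]]))            # two blocks, two variables in block one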

There are three corresponding structures containing initial and intermediate results, and eventually output, the xGifi, xGifiBlock, and xGifiVariable. Again, an xGifi is a list of xGifiBlocks, and an xGifiBlock is a list of xGifiVariables. The constructor for an xGifiVariable returns an object of class xGifiVariable, which contains the elements that are updated in each iteration during the computations. There is an xGifiVariable for both active and passive variables.

    return (structure (
      list (
        transform = transform,
        weights = weights,
        scores = scores,
        quantifications = quantifications
      ),
      class = "xGifiVariable"
    ))

References

De Leeuw, J. 2015a. Block Relaxation Algorithms in Statistics. Bookdown. https://bookdown.org/jandeleeuw6/bras/.

Van der Burg, E., J. De Leeuw, and R. Verdegaal. 1988. “Homogeneity Analysis with K Sets of Variables: An Alternating Least Squares Approach with Optimal Scaling Features.” Psychometrika 53: 177–97. http://www.stat.ucla.edu/~deleeuw/janspubs/1988/articles/vanderburg_deleeuw_verdegaal_A_88.pdf.

Meulman, J.M., and W.J. Heiser. 2012. IBM SPSS Categories 21. IBM Corporation.

De Leeuw, J., and P. Mair. 2009a. “Homogeneity Analysis in R: the Package homals.” Journal of Statistical Software 31 (4): 1–21. http://www.stat.ucla.edu/~deleeuw/janspubs/2009/articles/deleeuw_mair_A_09a.pdf.

De Leeuw, J. 1994. “Block Relaxation Algorithms in Statistics.” In Information Systems and Data Analysis, edited by H.H. Bock, W. Lenski, and M.M. Richter, 308–24. Berlin: Springer Verlag. http://www.stat.ucla.edu/~deleeuw/janspubs/1994/chapters/deleeuw_C_94c.pdf.

Heiser, W.J. 1995. “Convergent Computing by Iterative Majorization: Theory and Applications in Multidimensional Data Analysis.” In Recent Advances in Descriptive Multivariate Analysis, edited by W.J. Krzanowski, 157–89. Oxford: Clarendon Press.

Lange, K., D.R. Hunter, and I. Yang. 2000. “Optimization Transfer Using Surrogate Objective Functions.” Journal of Computational and Graphical Statistics 9: 1–20.

De Leeuw, J. 1975. “A Normalized Cone Regression Approach to Alternating Least Squares Algorithms.” Department of Data Theory FSW/RUL. http://www.stat.ucla.edu/~deleeuw/janspubs/1975/notes/deleeuw_U_75a.pdf.

Gibson, W.A. 1962. “On the Least Squares Orthogonalization of an Oblique Transformation.” Psychometrika 27: 193–95.

De Leeuw, J. 2015c. “Exceedingly Simple Gram-Schmidt Code.” http://rpubs.com/deleeuw/84866.

De Leeuw, J. 2015b. “Exceedingly Simple B-spline Code.” http://rpubs.com/deleeuw/79161.

Gifi, A. 1990. Nonlinear Multivariate Analysis. New York, N.Y.: Wiley.

De Leeuw, J. 2009. “Regression, Discriminant Analysis, and Canonical Analysis with homals.” Preprint Series 562. Los Angeles, CA: UCLA Department of Statistics. http://www.stat.ucla.edu/~deleeuw/janspubs/2009/reports/deleeuw_R_09c.pdf.

Hill, M.O. 1974. “Correspondence Analysis: a Neglected Multivariate Method.” Applied Statistics 23: 340–54.

Hill, M.O. 1990. “Review of A. Gifi, Multivariate Analysis.” Journal of Ecology 78 (4): 1148–9.