Section 9 Canonical Correlation Analysis

We saw in the Linear Predictors Section that the ability of a linear predictor to explain the variation in a response variable is measured by the coefficient of determination. The best linear predictor explains the most variation in the response \(Y\) and maximises the absolute value of its correlation coefficient with \(Y\). The upper limit on the correlation that can be achieved is the Multiple Correlation Coefficient \(\rho_{Y(Z)}\).
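As a reminder, in notation analogous to that section (the symbols \(\sigma_{ZY}=Cov(Z,Y)\), \(\sigma_{YY}=Var(Y)\) and \(\Sigma_{ZZ}=Cov(Z)\) for an explanatory vector \(Z\) are assumed here rather than quoted), the Multiple Correlation Coefficient is the maximum correlation attainable by any linear combination of \(Z\):

\[\rho_{Y(Z)}=\underset{a}{\max}\,Corr(Y,a^´Z)=\sqrt{\frac{\sigma_{ZY}^´\Sigma_{ZZ}^{-1}\sigma_{ZY}}{\sigma_{YY}}}\]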

Canonical Correlation Analysis is an extension of this approach. A key difference is that it takes linear combinations of both the “response” \(\underset{(p \times 1)}{X^{(1)}}\) and “explanatory” \(\underset{(q \times 1)}{X^{(2)}}\) sets of variables. In doing so, it finds transformed co-ordinates in which the correlations are at least as high as any individual correlation implied by \(\underset{(p \times q)}{\Sigma_{12}}\). It helps answer questions such as: what is the maximum proportion of variation in data set (1) that can be explained by data set (2), and vice versa?

Linear Analysis Approach

The key result is stated below (see Johnson, Wichern, and others (2014), Section 9.3, page 490):

Proposition 9.1 (Identifying Canonical Correlates) Suppose \(p \leq q\) and let the random vectors \(\underset{(p \times 1)}{X^{(1)}}\) and \(\underset{(q \times 1)}{X^{(2)}}\) have \(Cov(X^{(1)})=\underset{(p \times p)}{\Sigma_{11}}\), \(Cov(X^{(2)})=\underset{(q \times q)}{\Sigma_{22}}\) and \(Cov(X^{(1)}, X^{(2)})=\underset{(p \times q)}{\Sigma_{12}}\), where \(\Sigma\) has full rank. For coefficient vectors \(\underset{(p \times 1)}{a}\) and \(\underset{(q \times 1)}{b}\), form the linear combinations \(U=a^´X^{(1)}\) and \(V=b^´X^{(2)}\). Then:

\[\underset{a,b}{\max}\,Corr(U,V)=\rho_{1}^*\]

is attained by the linear combinations (first canonical variate pair):

\[\begin{align} U_{1}&=e_{1}^´\Sigma_{11}^{-0.5}X^{(1)}=a_{1}^´X^{(1)}\\ V_{1}&=f_{1}^´\Sigma_{22}^{-0.5}X^{(2)}=b_{1}^´X^{(2)} \end{align}\]

where \(a_{1}^´:=e_{1}^´\Sigma_{11}^{-0.5}\) and \(b_{1}^´:=f_{1}^´\Sigma_{22}^{-0.5}\).

The \(k\)th pair of canonical variates, \(k=2,3,\ldots,p\):

\[\begin{align} U_{k}&=e_{k}^´\Sigma_{11}^{-0.5}X^{(1)}=a_{k}^´X^{(1)}\\ V_{k}&=f_{k}^´\Sigma_{22}^{-0.5}X^{(2)}=b_{k}^´X^{(2)} \end{align}\]

maximises \(Corr(U_{k},V_{k})=\rho_{k}^*\) among those linear combinations uncorrelated with the preceding \(1,2,\ldots,k-1\) canonical variables.

Here \(\rho_{1}^{*2} \geq \rho_{2}^{*2} \geq ... \geq \rho_{p}^{*2}\) are the eigenvalues of \(\Sigma_{11}^{-0.5}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-0.5}\) (that is, the squared canonical correlations), and \(e_{1},...,e_{p}\) are the associated \((p \times 1)\) eigenvectors. The quantities \(\rho_{1}^{*2} \geq \rho_{2}^{*2} \geq ... \geq \rho_{p}^{*2}\) are also the \(p\) largest eigenvalues of the matrix \(\Sigma_{22}^{-0.5}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-0.5}\) with corresponding \((q \times 1)\) eigenvectors \(f_{1},...,f_{p}\), each \(f_{k}\) being proportional to \(\Sigma_{22}^{-0.5}\Sigma_{21}\Sigma_{11}^{-0.5}e_{k}\).

\(\square\)
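To make Proposition 9.1 concrete, the sketch below computes the canonical correlations and the coefficient matrix \(A\) directly from the eigen-decomposition it describes. This is a minimal numerical illustration only: the simulated covariance matrix and the helper name inv_sqrt are assumptions of the sketch, not part of the text.

```python
# Minimal sketch of Proposition 9.1 with simulated data (illustrative only).
import numpy as np

def inv_sqrt(S):
    """Sigma^{-1/2} via the spectral decomposition P Lambda^{-1/2} P'."""
    lam, P = np.linalg.eigh(S)
    return P @ np.diag(lam ** -0.5) @ P.T

rng = np.random.default_rng(0)
p, q, n = 2, 3, 500
Z = rng.standard_normal((n, p + q)) @ rng.standard_normal((p + q, p + q))
Sigma = np.cov(Z, rowvar=False)                 # illustrative full-rank covariance

S11, S12 = Sigma[:p, :p], Sigma[:p, p:]
S21, S22 = Sigma[p:, :p], Sigma[p:, p:]

S11_ih = inv_sqrt(S11)
M = S11_ih @ S12 @ np.linalg.inv(S22) @ S21 @ S11_ih   # eigenvalues are rho_k*^2

lam, E = np.linalg.eigh(M)                      # eigenvalues in ascending order
order = np.argsort(lam)[::-1]                   # largest first
rho_star = np.sqrt(np.clip(lam[order], 0.0, None))     # rho_1* >= ... >= rho_p*
E = E[:, order]                                 # columns e_1, ..., e_p
A = E.T @ S11_ih                                # rows a_k', so U_k = a_k' X^(1)
print("canonical correlations:", rho_star)
```

The coefficient matrix \(B\) for the \(V_{k}\) could be obtained in the same way from \(\Sigma_{22}^{-0.5}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-0.5}\), whose leading eigenvectors are \(f_{1},...,f_{p}\).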

Geometrical Approach

The geometric insight (see Johnson, Wichern, and others (2014), page 549) is that transforming to canonical co-ordinates equates to:

  1. A transformation of \(X^{(1)}\) to uncorrelated and standardised principal components
  2. A rigid (orthogonal) rotation \(P_{1}\) determined by \(\Sigma_{11}\)
  3. Another transformation \(E^´\) determined by the full covariance matrix \(\Sigma\)

The same argument applies to \(X^{(2)}\).

Proposition 9.2 (Geometrical Description of Canonical Variables) Let \(\underset{(p \times 1)}{U}=AX^{(1)}\) denote the transformation to canonical co-ordinates, where the \(k\)th row of \(A\) is \(a_{k}^´=e_{k}^´\Sigma_{11}^{-0.5}\) and \(e_{k}\) is the \(k\)th eigenvector of \(\Sigma_{11}^{-0.5}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-0.5}\). In matrix notation:

\[U=AX^{(1)}=E^´\Sigma_{11}^{-0.5}X^{(1)}=E^´P_{1} \Lambda_{11}^{-0.5} P_{1}^´X^{(1)}\]

where \(\Sigma_{11}^{-0.5}=P_{1} \Lambda_{11}^{-0.5} P_{1}^´\) is defined as per Theorem 8.1.

\(\Lambda_{11}^{-0.5} P_{1}^´X^{(1)}\) is a transformation to uncorrelated and standardised principal components (Step 1). Pre-multiplication by \(P_{1}\) equates to a rigid (orthogonal) rotation (Step 2). Pre-multiplication by \(E^´\) is a transformation determined by the full covariance matrix \(\Sigma\) (Step 3).

\(\square\)
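Continuing the sketch above (same simulated \(\Sigma\) and the same array names), the three-step geometric description can be checked numerically: composing the standardised principal component transformation with the rotation \(P_{1}\) and then the transformation \(E^´\) reproduces the canonical coefficient matrix \(A\).

```python
# Check of Proposition 9.2 using the objects from the previous sketch.
lam11, P1 = np.linalg.eigh(S11)        # spectral decomposition of Sigma_11
step1 = np.diag(lam11 ** -0.5) @ P1.T  # to standardised principal components (Step 1)
step2 = P1 @ step1                     # rigid (orthogonal) rotation P_1 (Step 2)
step3 = E.T @ step2                    # transformation E' from the full Sigma (Step 3)

print(np.allclose(step3, A))           # True: E' P_1 Lambda_11^{-0.5} P_1' = A
```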

References

Johnson, Richard Arnold, Dean W. Wichern, and others. 2014. Applied Multivariate Statistical Analysis. Vol. 4. Prentice-Hall, New Jersey.