Section 9 Canonical Correlation Analysis
We saw in the Linear Predictors Section that the ability of a linear predictor to explain the variation in a response variable is measured by the coefficient of determination. The best linear predictor explains the most variation in the response Y and maximises the absolute value of its correlation coefficient with Y. The upper limit on the correlation that can be achieved is the Multiple Correlation Coefficient \rho_{Y(Z)}.
Canonical Correlation Analysis is an extension of this approach. A key difference is that it takes linear combinations of both the “response” X^{(1)} (p \times 1) and “explanatory” X^{(2)} (q \times 1) sets of variables. In doing so, it finds transformed co-ordinates in which the correlation is at least as high as that between any individual pair of variables from the two sets, i.e. higher than anything visible in \Sigma_{12} (p \times q) alone. It helps answer questions such as: what is the maximum proportion of variation in data set (1) that can be explained by data set (2), and vice versa?
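To make the notation concrete, here is a minimal sketch in Python/NumPy (simulated data; the dimensions, seed and variable names are purely illustrative) that builds two sets of variables observed on the same subjects and extracts the blocks \Sigma_{11}, \Sigma_{22} and \Sigma_{12} from the joint sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two sets of variables observed on the same n subjects:
# X1 plays the role of X^(1) (p columns), X2 the role of X^(2) (q columns).
n, p, q = 200, 2, 3
X2 = rng.normal(size=(n, q))
X1 = X2[:, :p] + 0.5 * rng.normal(size=(n, p))   # induce some cross-correlation

# Partition the joint sample covariance matrix into the blocks used in the text.
S = np.cov(np.hstack([X1, X2]), rowvar=False)
S11 = S[:p, :p]   # estimate of Cov(X^(1)),         (p x p)
S22 = S[p:, p:]   # estimate of Cov(X^(2)),         (q x q)
S12 = S[:p, p:]   # estimate of Cov(X^(1), X^(2)),  (p x q)
```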
Linear Analysis Approach
The key result is given below (see Johnson, Wichern, and others (2014), Section 9.3, page 490):
Proposition 9.1 (Identifying Canonical Correlates) Suppose p \leq q and let the random vectors X^{(1)} (p \times 1) and X^{(2)} (q \times 1) have \text{Cov}(X^{(1)})=\Sigma_{11} (p \times p), \text{Cov}(X^{(2)})=\Sigma_{22} (q \times q) and \text{Cov}(X^{(1)},X^{(2)})=\Sigma_{12} (p \times q), where \Sigma has full rank. For coefficient vectors a (p \times 1) and b (q \times 1), form the linear combinations U=a'X^{(1)} and V=b'X^{(2)}. Then:
\underset{a,b}{\max}\,\text{Corr}(U,V)=\rho_{1}^*
is attained by the linear combinations (first canonical variate pair):
\begin{align} U_{1}&=e_{1}'\Sigma_{11}^{-0.5}X^{(1)}=a_{1}'X^{(1)}\\ V_{1}&=f_{1}'\Sigma_{22}^{-0.5}X^{(2)}=b_{1}'X^{(2)} \end{align} where a_{1}':=e_{1}'\Sigma_{11}^{-0.5} and b_{1}':=f_{1}'\Sigma_{22}^{-0.5}.
The kth pair of canonical variates k=2,3,…,p:
\begin{align} U_{k}&=e_{k}'\Sigma_{11}^{-0.5}X^{(1)}=a_{k}'X^{(1)}\\ V_{k}&=f_{k}'\Sigma_{22}^{-0.5}X^{(2)}=b_{k}'X^{(2)} \end{align} maximises \text{Corr}(U_{k},V_{k})=\rho_{k}^* among those linear combinations uncorrelated with the preceding 1,2,\ldots,k-1 canonical variates.
Here the squared canonical correlations \rho_{1}^{*2} \geq \rho_{2}^{*2} \geq \ldots \geq \rho_{p}^{*2} are the eigenvalues of \Sigma_{11}^{-0.5}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-0.5} and e_{1},\ldots,e_{p} are the associated (p \times 1) eigenvectors. The quantities \rho_{1}^{*2} \geq \rho_{2}^{*2} \geq \ldots \geq \rho_{p}^{*2} are also the p largest eigenvalues of the matrix \Sigma_{22}^{-0.5}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-0.5} with corresponding (q \times 1) eigenvectors f_{1},\ldots,f_{p}.
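As a concrete illustration of Proposition 9.1 (a sketch, not a library routine; the helper names and covariance values below are made up), the following code computes the canonical correlations as square roots of the eigenvalues of \Sigma_{11}^{-0.5}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-0.5}, forms the coefficient vectors a_{k} and b_{k}, and checks numerically that \text{Corr}(U_{1},V_{1})=\rho_{1}^*. The same function could equally be applied to the sample blocks from the earlier sketch.

```python
import numpy as np

def inv_sqrt(S):
    """Matrix inverse square root via the spectral decomposition."""
    lam, P = np.linalg.eigh(S)
    return P @ np.diag(lam ** -0.5) @ P.T

def canonical_pairs(S11, S12, S22):
    """Canonical correlations and coefficient vectors following Proposition 9.1 (sketch)."""
    S11_inv_sqrt = inv_sqrt(S11)

    # Eigen-decomposition of Sigma11^{-1/2} Sigma12 Sigma22^{-1} Sigma21 Sigma11^{-1/2}:
    # its eigenvalues are the squared canonical correlations rho*_k^2.
    M = S11_inv_sqrt @ S12 @ np.linalg.inv(S22) @ S12.T @ S11_inv_sqrt
    eigvals, E = np.linalg.eigh(M)                       # ascending order, columns = e_k
    order = np.argsort(eigvals)[::-1]
    rho = np.sqrt(np.clip(eigvals[order], 0.0, None))    # rho*_1 >= ... >= rho*_p
    A = E[:, order].T @ S11_inv_sqrt                     # rows a_k' = e_k' Sigma11^{-1/2}

    # b_k is proportional to Sigma22^{-1} Sigma21 a_k, scaled so that Var(V_k) = 1
    # (this assumes all canonical correlations are non-zero).
    B = (np.linalg.inv(S22) @ S12.T @ A.T).T
    B /= np.sqrt(np.einsum('ki,ij,kj->k', B, S22, B))[:, None]
    return rho, A, B

# Small illustrative (positive definite) joint covariance matrix with p = q = 2.
S11 = np.array([[1.0, 0.4], [0.4, 1.0]])
S22 = np.array([[1.0, 0.2], [0.2, 1.0]])
S12 = np.array([[0.5, 0.3], [0.3, 0.4]])

rho, A, B = canonical_pairs(S11, S12, S22)
a1, b1 = A[0], B[0]
# Check: Corr(U1, V1) = a1' Sigma12 b1 / sqrt(a1' Sigma11 a1 * b1' Sigma22 b1) = rho*_1.
print(rho[0], (a1 @ S12 @ b1) / np.sqrt((a1 @ S11 @ a1) * (b1 @ S22 @ b1)))
```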
Geometrical Approach
The geometric insight (see Johnson, Wichern, and others (2014), page 549) is that transforming to canonical co-ordinates equates to:
- A transformation of X^{(1)} to uncorrelated and standardised principal components
- A rigid (orthogonal) rotation P_{1} determined by \Sigma_{11}
- Another transformation E' determined by the full covariance matrix \Sigma
The same argument applies to X^{(2)}.
Proposition 9.2 (Geometrical Description of Canonical Variables) Let \underset{(p \times 1)}{U}=AX^{(1)} denote the transformation to canonical co-ordinates, where the kth row of A is a_{k}'=e_{k}'\Sigma_{11}^{-0.5} and e_{k} is the kth eigenvector of \Sigma_{11}^{-0.5}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-0.5}. In matrix notation:
U=AX^{(1)}=E'\Sigma_{11}^{-0.5}X^{(1)}=E'P_{1} \Lambda_{11}^{-0.5} P_{1}'X^{(1)}
where \Sigma_{11}^{-0.5}=P_{1} \Lambda_{11}^{-0.5} P_{1}' is defined as per Theorem 8.1.
\Lambda_{11}^{-0.5} P_{1}'X^{(1)} is a transformation to uncorrelated and standardised principal components (Step 1). Pre-multiplication by P_{1} equates to a rigid (orthogonal) rotation (Step 2). Pre-multiplication by E' is a transformation determined by the full covariance matrix \Sigma (Step 3).
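A quick numerical check of this decomposition (NumPy; \Sigma_{11}, the stand-in orthogonal matrix E and the simulated sample are all made up for illustration) confirms that P_{1} is orthogonal, that \Sigma_{11}^{-0.5}=P_{1}\Lambda_{11}^{-0.5}P_{1}', and that applying the three steps in sequence reproduces U=E'\Sigma_{11}^{-0.5}X^{(1)}.

```python
import numpy as np

rng = np.random.default_rng(1)

# Spectral decomposition Sigma11 = P1 Lambda11 P1' (Theorem 8.1 style).
S11 = np.array([[1.0, 0.4], [0.4, 1.0]])
lam, P1 = np.linalg.eigh(S11)
Lambda11_inv_sqrt = np.diag(lam ** -0.5)

# P1 is orthogonal, so pre-multiplying by it is a rigid rotation.
print(np.allclose(P1 @ P1.T, np.eye(2)))                 # True

# Sigma11^{-1/2} coincides with P1 Lambda11^{-1/2} P1'.
S11_inv_sqrt = P1 @ Lambda11_inv_sqrt @ P1.T

# Stand-in orthogonal E (in Proposition 9.2 its columns are the eigenvectors e_k).
E, _ = np.linalg.qr(rng.normal(size=(2, 2)))

# Apply the three steps to a sample of x^(1) vectors (columns of X1) and compare
# with the direct form U = E' Sigma11^{-1/2} x^(1).
X1 = rng.multivariate_normal(mean=np.zeros(2), cov=S11, size=100).T   # 2 x 100
step1 = Lambda11_inv_sqrt @ (P1.T @ X1)   # standardised principal components
step2 = P1 @ step1                        # rigid rotation
U = E.T @ step2                           # final transformation determined by Sigma
print(np.allclose(U, E.T @ S11_inv_sqrt @ X1))           # True
```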
References
Johnson, Richard Arnold, Dean W. Wichern, and others. 2014. Applied Multivariate Statistical Analysis. Vol. 4. Prentice-Hall, New Jersey.