Section 9 Canonical Correlation Analysis
We saw in the Linear Predictors Section that the ability of a linear predictor to explain the variation in a response variable is measured by the coefficient of determination. The best linear predictor explains the most variation in the response Y and maximises the absolute value of its correlation coefficient with Y. The upper limit on the correlation that can be achieved is the Multiple Correlation Coefficient \rho_{Y(Z)}.
Canonical Correlation Analysis is an extension of this approach. A key difference is that it takes linear combinations of both the “response” X^{(1)} (p \times 1) and “explanatory” X^{(2)} (q \times 1) sets of variables. In doing so, it finds transformed co-ordinates in which the correlation is at least as high as that between any individual pair of variables from the two sets, i.e. higher than anything visible in \Sigma_{12} (p \times q) alone. It helps answer questions such as: what is the maximum proportion of variation in data set (1) that can be explained by data set (2), and vice versa?
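To make the notation concrete, here is a minimal sketch in Python/NumPy (simulated data; the dimensions, seed and variable names are purely illustrative) that builds two sets of variables observed on the same subjects and extracts the blocks \Sigma_{11}, \Sigma_{22} and \Sigma_{12} from the joint sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two sets of variables observed on the same n subjects:
# X1 plays the role of X^(1) (p columns), X2 the role of X^(2) (q columns).
n, p, q = 200, 2, 3
X2 = rng.normal(size=(n, q))
X1 = X2[:, :p] + 0.5 * rng.normal(size=(n, p))   # induce some cross-correlation

# Partition the joint sample covariance matrix into the blocks used in the text.
S = np.cov(np.hstack([X1, X2]), rowvar=False)
S11 = S[:p, :p]   # estimate of Cov(X^(1)),         (p x p)
S22 = S[p:, p:]   # estimate of Cov(X^(2)),         (q x q)
S12 = S[:p, p:]   # estimate of Cov(X^(1), X^(2)),  (p x q)
```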
Linear Analysis Approach
The key result is given below (see Johnson, Wichern, and others (2014), Section 9.3, page 490):
Proposition 9.1 (Identifying Canonical Correlates) Suppose p \leq q and let the random vectors X^{(1)} (p \times 1) and X^{(2)} (q \times 1) have \text{Cov}(X^{(1)})=\Sigma_{11} (p \times p), \text{Cov}(X^{(2)})=\Sigma_{22} (q \times q) and \text{Cov}(X^{(1)},X^{(2)})=\Sigma_{12} (p \times q), where \Sigma has full rank. For coefficient vectors a (p \times 1) and b (q \times 1), form the linear combinations U=a'X^{(1)} and V=b'X^{(2)}. Then:
\underset{a,b}{\max}\,\text{Corr}(U,V)=\rho_{1}^*
is attained by the linear combinations (first canonical variate pair):
\begin{align} U_{1}&=e_{1}'\Sigma_{11}^{-0.5}X^{(1)}=a_{1}'X^{(1)}\\ V_{1}&=f_{1}'\Sigma_{22}^{-0.5}X^{(2)}=b_{1}'X^{(2)} \end{align} where a_{1}':=e_{1}'\Sigma_{11}^{-0.5} and b_{1}':=f_{1}'\Sigma_{22}^{-0.5}.
The kth pair of canonical variates k=2,3,…,p:
\begin{align} U_{k}&=e_{k}'\Sigma_{11}^{-0.5}X^{(1)}=a_{k}'X^{(1)}\\ V_{k}&=f_{k}'\Sigma_{22}^{-0.5}X^{(2)}=b_{k}'X^{(2)} \end{align} maximises \text{Corr}(U_{k},V_{k})=\rho_{k}^* among those linear combinations uncorrelated with the preceding 1,2,\ldots,k-1 canonical variates.
Here the squared canonical correlations \rho_{1}^{*2} \geq \rho_{2}^{*2} \geq \ldots \geq \rho_{p}^{*2} are the eigenvalues of \Sigma_{11}^{-0.5}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-0.5} and e_{1},\ldots,e_{p} are the associated (p \times 1) eigenvectors. The quantities \rho_{1}^{*2} \geq \rho_{2}^{*2} \geq \ldots \geq \rho_{p}^{*2} are also the p largest eigenvalues of the matrix \Sigma_{22}^{-0.5}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-0.5} with corresponding (q \times 1) eigenvectors f_{1},\ldots,f_{p}.
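As a concrete illustration of Proposition 9.1 (a sketch, not a library routine; the helper names and covariance values below are made up), the following code computes the canonical correlations as square roots of the eigenvalues of \Sigma_{11}^{-0.5}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-0.5}, forms the coefficient vectors a_{k} and b_{k}, and checks numerically that \text{Corr}(U_{1},V_{1})=\rho_{1}^*. The same function could equally be applied to the sample blocks from the earlier sketch.

```python
import numpy as np

def inv_sqrt(S):
    """Matrix inverse square root via the spectral decomposition."""
    lam, P = np.linalg.eigh(S)
    return P @ np.diag(lam ** -0.5) @ P.T

def canonical_pairs(S11, S12, S22):
    """Canonical correlations and coefficient vectors following Proposition 9.1 (sketch)."""
    S11_inv_sqrt = inv_sqrt(S11)

    # Eigen-decomposition of Sigma11^{-1/2} Sigma12 Sigma22^{-1} Sigma21 Sigma11^{-1/2}:
    # its eigenvalues are the squared canonical correlations rho*_k^2.
    M = S11_inv_sqrt @ S12 @ np.linalg.inv(S22) @ S12.T @ S11_inv_sqrt
    eigvals, E = np.linalg.eigh(M)                       # ascending order, columns = e_k
    order = np.argsort(eigvals)[::-1]
    rho = np.sqrt(np.clip(eigvals[order], 0.0, None))    # rho*_1 >= ... >= rho*_p
    A = E[:, order].T @ S11_inv_sqrt                     # rows a_k' = e_k' Sigma11^{-1/2}

    # b_k is proportional to Sigma22^{-1} Sigma21 a_k, scaled so that Var(V_k) = 1
    # (this assumes all canonical correlations are non-zero).
    B = (np.linalg.inv(S22) @ S12.T @ A.T).T
    B /= np.sqrt(np.einsum('ki,ij,kj->k', B, S22, B))[:, None]
    return rho, A, B

# Small illustrative (positive definite) joint covariance matrix with p = q = 2.
S11 = np.array([[1.0, 0.4], [0.4, 1.0]])
S22 = np.array([[1.0, 0.2], [0.2, 1.0]])
S12 = np.array([[0.5, 0.3], [0.3, 0.4]])

rho, A, B = canonical_pairs(S11, S12, S22)
a1, b1 = A[0], B[0]
# Check: Corr(U1, V1) = a1' Sigma12 b1 / sqrt(a1' Sigma11 a1 * b1' Sigma22 b1) = rho*_1.
print(rho[0], (a1 @ S12 @ b1) / np.sqrt((a1 @ S11 @ a1) * (b1 @ S22 @ b1)))
```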
Geometrical Approach
The geometric insight (see Johnson, Wichern, and others (2014), page 549) is that transforming to canonical co-ordinates equates to:
- A transformation of X^{(1)} to uncorrelated and standardised principal components
- A rigid (orthogonal) rotation P_{1} determined by \Sigma_{11}
- Another transformation E' determined by the full covariance matrix \Sigma
The same argument applies to X^{(2)}.
Proposition 9.2 (Geometrical Description of Canonical Variables) Let \underset{(p \times 1)}{U}=AX^{(1)} denote the transformation to canonical co-ordinates, where the kth row of A is a_{k}'=e_{k}'\Sigma_{11}^{-0.5} and e_{k} is the kth eigenvector of \Sigma_{11}^{-0.5}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-0.5}. In matrix notation:
U=AX^{(1)}=E'\Sigma_{11}^{-0.5}X^{(1)}=E'P_{1} \Lambda_{11}^{-0.5} P_{1}'X^{(1)}
where \Sigma_{11}^{-0.5}=P_{1} \Lambda_{11}^{-0.5} P_{1}' is defined as per Theorem 8.1.
\Lambda_{11}^{-0.5} P_{1}'X^{(1)} is a transformation to uncorrelated and standardised principal components (Step 1). Pre-multiplication by P_{1} equates to a rigid (orthogonal) rotation (Step 2). Pre-multiplication by E' is a transformation determined by the full covariance matrix \Sigma (Step 3).
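A quick numerical check of this decomposition (NumPy; \Sigma_{11}, the stand-in orthogonal matrix E and the simulated sample are all made up for illustration) confirms that P_{1} is orthogonal, that \Sigma_{11}^{-0.5}=P_{1}\Lambda_{11}^{-0.5}P_{1}', and that applying the three steps in sequence reproduces U=E'\Sigma_{11}^{-0.5}X^{(1)}.

```python
import numpy as np

rng = np.random.default_rng(1)

# Spectral decomposition Sigma11 = P1 Lambda11 P1' (Theorem 8.1 style).
S11 = np.array([[1.0, 0.4], [0.4, 1.0]])
lam, P1 = np.linalg.eigh(S11)
Lambda11_inv_sqrt = np.diag(lam ** -0.5)

# P1 is orthogonal, so pre-multiplying by it is a rigid rotation.
print(np.allclose(P1 @ P1.T, np.eye(2)))                 # True

# Sigma11^{-1/2} coincides with P1 Lambda11^{-1/2} P1'.
S11_inv_sqrt = P1 @ Lambda11_inv_sqrt @ P1.T

# Stand-in orthogonal E (in Proposition 9.2 its columns are the eigenvectors e_k).
E, _ = np.linalg.qr(rng.normal(size=(2, 2)))

# Apply the three steps to a sample of x^(1) vectors (columns of X1) and compare
# with the direct form U = E' Sigma11^{-1/2} x^(1).
X1 = rng.multivariate_normal(mean=np.zeros(2), cov=S11, size=100).T   # 2 x 100
step1 = Lambda11_inv_sqrt @ (P1.T @ X1)   # standardised principal components
step2 = P1 @ step1                        # rigid rotation
U = E.T @ step2                           # final transformation determined by Sigma
print(np.allclose(U, E.T @ S11_inv_sqrt @ X1))           # True
```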
References
Johnson, Richard Arnold, Dean W. Wichern, and others. 2014. Applied Multivariate Statistical Analysis. Vol. 4. Prentice-Hall, New Jersey.