# Chapter 6 Functional Principal Components Analysis

## 6.1 Overview

In Chapter 5 we considered two constructions of a random function. The first was as a random element of a Hilbert space, and the second was as a second-order mean-square continuous stochastic process. Under both constructions we were able to express the respective random variable as a series in terms of a basis expansion using the eigenfunctions of the associated covariance operator, where random coefficients captured the stochasticity. Specifically, for a random function $$\chi$$ taking values in a Hilbert space $$\mathbb{H}$$ with well-defined mean element $$m$$ and covariance operator $$\mathscr{K}$$, we saw in Equation (5.2) that we could express $$\chi$$ as, \begin{align*} \chi = m + \sum_{j=1}^\infty Z_j e_j, \end{align*} where the $$e_j$$ are the eigenfunctions of $$\mathscr{K}$$ and $$Z_j := \langle \chi - m, e_j \rangle$$. On the other hand, for a mean-square continuous stochastic process $$X = \{X_t\mid t \in E\}$$, we found in Section 5.3 that we could express $$X$$ as, $$$X_t = m_t + \sum_{j=1}^\infty Z_j e_j(t), \tag{6.1}$$$ where the $$Z_j$$ now represent the random variables $$\mathscr{I}_X(e_j)$$ and the $$e_j$$ are the eigenfunctions of the integral operator $$\mathscr{K}$$ defined by the kernel $$K(s,t) = \text{Cov}(X_s, X_t)$$. Finally, in Section 5.4 we saw that under the assumption of joint measurability, these two decompositions coincide, so that the path realizations of the stochastic process can be viewed as randomly sampled elements of the Hilbert space $$\mathbb{H}$$.
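To make the truncated version of this expansion concrete, consider standard Brownian motion on $$[0,1]$$, whose covariance kernel $$K(s,t) = \min(s,t)$$ has the known eigenpairs $$\lambda_j = \{(j - 1/2)\pi\}^{-2}$$ and $$e_j(t) = \sqrt{2}\sin\{(j - 1/2)\pi t\}$$. The sketch below, written in Python with numpy purely for illustration (the truncation level, grid, and seed are arbitrary choices, not part of the notes), simulates one path by truncating the series at $$J$$ terms and drawing $$Z_j \sim N(0, \lambda_j)$$ independently.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 201)
J = 50  # truncation level (arbitrary)

# Karhunen-Loeve ingredients for standard Brownian motion on [0, 1]:
# eigenpairs of the covariance kernel K(s, t) = min(s, t)
j = np.arange(1, J + 1)
lam = 1.0 / ((j - 0.5) * np.pi) ** 2                     # eigenvalues
E = np.sqrt(2) * np.sin(np.outer(t, (j - 0.5) * np.pi))  # eigenfunctions on the grid

# random coefficients Z_j ~ N(0, lambda_j), independent across j
Z = rng.normal(scale=np.sqrt(lam), size=J)

# truncated series X_t = m_t + sum_j Z_j e_j(t), with m = 0 here
X = E @ Z
print(X.shape)
```

Note that the truncated eigenvalue sum is already close to the total variance $$\int_0^1 K(t,t)\,dt = 1/2$$, which foreshadows the dimension-reduction potential of these expansions.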

Consider the special case in which we choose the Hilbert space $$\mathbb{H}$$ to be $$\mathbb{R}^p$$. We then get that the $$Z_j$$ are given by $$e_j^\mathrm{\scriptscriptstyle T}(X-m)$$, and therefore correspond to the usual multivariate principal components. Hence, when the space is instead an infinite-dimensional Hilbert space, specifically when its elements can be regarded as functions in the usual sense (e.g. when the space is $$\mathbb{L}^2[0,1]$$ or similar), we call the $$Z_j$$ functional principal components. Since the space of our observations in this setting is naturally infinite-dimensional, the analysis of functional principal components, which retain their interpretation as modes of variation, offers even more potential for dimension reduction than in the multivariate case, and hence serves a role at least as important as in that setting. In the rest of this chapter, we will consider the challenges posed by estimation of principal components in this context.
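In the $$\mathbb{H} = \mathbb{R}^p$$ special case, the $$Z_j$$ come directly from the eigendecomposition of the sample covariance matrix. The following sketch (Python with numpy; the simulated data are illustrative) computes the scores $$Z_j = e_j^{\mathrm{\scriptscriptstyle T}}(X - m)$$ for every observation and checks that they are empirically uncorrelated with variances equal to the eigenvalues, the defining property of principal components.

```python
import numpy as np

rng = np.random.default_rng(1)

# n observations of a random vector in R^p (the H = R^p special case)
n, p = 500, 4
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))

m = X.mean(axis=0)            # sample mean element
K = np.cov(X, rowvar=False)   # sample covariance (p x p matrix)
lam, E = np.linalg.eigh(K)    # eigh returns eigenvalues in ascending order
lam, E = lam[::-1], E[:, ::-1]  # reorder so lambda_1 >= lambda_2 >= ...

# principal component scores Z_j = e_j^T (X - m), one column per component
Z = (X - m) @ E

# scores are (empirically) uncorrelated with variances given by the eigenvalues
print(np.allclose(np.cov(Z, rowvar=False), np.diag(lam)))
```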

Throughout the rest of the notes, we will use $$X$$ to represent a stochastic process that is also a random element of a Hilbert space.

## 6.2 Estimation of Principal Components

Let $$X$$ be a random function taking values in $$\mathbb{L}^2(\mathscr{T})$$, where $$\mathscr{T}$$ is some closed and bounded interval of $$\mathbb{R}$$, and suppose $$S = (x_1, \ldots, x_n)$$ is a sample of observed paths known to be generated by $$X$$. Our goal is to use these sample paths to estimate the eigenpairs of the associated covariance operator $$\mathscr{K}$$. Recall that the eigenproblem takes the form, $$$(\mathscr{K}e)(t) = \int_{\mathscr{T}} K(t,s) e(s) \,d\mu(s) = \lambda e(t). \tag{6.2}$$$ In practice, three important challenges must be addressed if one is to estimate eigenfunctions from observed curves.
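Before turning to those challenges, it may help to see the most naive attack on Equation (6.2): replace the integral with a quadrature sum on a grid, turning the eigenproblem into a matrix eigenproblem (a Nyström-type discretization). The sketch below (Python with numpy; the Brownian-motion kernel, midpoint grid, and grid size are illustrative choices, not from the notes) recovers the leading eigenvalue of $$K(s,t) = \min(s,t)$$, known in closed form to be $$4/\pi^2$$.

```python
import numpy as np

# Nystrom-type sketch: discretize (K e)(t) = int K(t, s) e(s) ds = lambda e(t)
# on a midpoint grid, turning it into a matrix eigenproblem (illustrative only)
m = 200
t = (np.arange(m) + 0.5) / m      # midpoint grid on [0, 1]
h = 1.0 / m                       # quadrature weight
K_grid = np.minimum.outer(t, t)   # Brownian-motion kernel min(s, t)

# eigenvalues of the quadrature matrix approximate those of the operator
lam, E = np.linalg.eigh(K_grid * h)
lam, E = lam[::-1], E[:, ::-1]    # descending order

# exact leading eigenvalue of min(s, t) is {(1/2) pi}^(-2) = 4 / pi^2
print(lam[0])
```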

First, there is a clear dimensionality problem, which arises from the infinite-dimensional nature of the chosen function space $$\mathbb{L}^2(\mathscr{T})$$. Specifically, as seen in Equation (6.1), $$X_t$$, and therefore also $$K(s,t)$$, are expanded as infinite series in terms of the eigenfunctions. Of course, it is impossible to work with an infinite series: it is not possible to directly observe infinitely many terms, nor is it computationally feasible to estimate them.

Second, each sample curve in $$S$$ is typically not fully observed, but is instead recorded at a discrete subset of points in $$\mathscr{T}$$. This poses difficulties because the eigenfunctions $$e_j$$, which we hope to estimate, are assumed to be continuous functions. The question is then how to accurately infer the continuous structure of these objects from discrete observations, which may carry very little information about the overall continuous path trajectories. This problem is further complicated when the sample points are sparse in $$\mathscr{T}$$, or when each curve is sampled at different points in $$\mathscr{T}$$, possibly with a differing number of points per curve.

Finally, the eigenproblem presented in Equation (6.2) involves an integral and two unknown quantities. Additionally, even once the eigenfunctions are estimated, i.e. once we have $$\hat{e}_j$$, the scores associated with each observed curve $$x_i$$ are computed as $$\langle x_i, \hat{e}_j \rangle$$ and therefore also require integration. Hence, the estimation of the eigenfunctions and the associated scores can be computationally prohibitive, and assumptions are often made to render these integrals more tractable.
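To see the integration burden concretely, here is a minimal sketch (Python with numpy; the particular curve and eigenfunction are hypothetical, not from any data set in the notes) that approximates a single score $$\langle x_i, \hat{e}_j \rangle$$ with the trapezoidal rule on a grid. Every curve-eigenfunction pair requires such a quadrature.

```python
import numpy as np

# hypothetical inputs: a centered curve and an estimated eigenfunction,
# both available only on a discrete grid over [0, 1]
t = np.linspace(0, 1, 101)
x = t.copy()                        # hypothetical centered curve x_i(t) = t
e = np.sqrt(2) * np.sin(np.pi * t)  # hypothetical estimated eigenfunction

# trapezoidal rule for the score <x_i, e_j> = int x_i(t) e_j(t) dt
f = x * e
score = np.sum(f[:-1] + f[1:]) * (t[1] - t[0]) / 2

# for this particular choice the integral has closed form sqrt(2) / pi
print(score)
```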

Many different approaches exist for tackling the problem of eigenfunction estimation, distinguished from one another by their assumptions regarding the nature of the observed data (e.g. sparsely or densely sampled), the formulation of the model (e.g. parametric or non-parametric), whether the observations are subject to measurement error, and so on. Due to the brief nature of this course, we will not have the opportunity to investigate all of the interesting estimation options that currently exist. Instead, we will focus on a few particular cases, while directing the reader to alternative approaches with references where appropriate.

## 6.3 Finite Basis Expansion

One common modelling assumption is best referred to as finite basis expansion, which is a fairly classical approach. Early uses are found in (I will add some of these references before next class), and its general motivation is as follows: in theory, the observed paths of a random function $$X$$ arise as infinite series, but it is unlikely that the paths will have significant/nonzero contributions from every one of the countably many basis functions. This is supported by earlier theorems, which tell us that at most countably many of the eigenvalues are nonzero, that only finitely many of the eigenvalues exceed any given threshold, and that the paths live in the image of the covariance operator. In such scenarios, only finitely many basis functions would be required to span, or in the worst case approximately span, the space containing the observed curves. As a result, a common practice in FDA is to assume that the image of the random function $$X$$ is a finite-dimensional subspace of $$\mathbb{L}^2(\mathscr{T})$$.

Denote this finite-dimensional subspace by $$\mathbb{H}(\mathscr{T})$$, let $$\Phi = \{\phi_1, \ldots, \phi_b\}$$ be a basis for this space, and let $$\Phi(t) = \big(\phi_1(t), \ldots, \phi_b(t) \big)$$ denote the vector of basis functions evaluated at $$t \in \mathscr{T}$$. In practice, the basis $$\Phi$$ is usually chosen heuristically by the practitioner on a case-by-case basis, depending on the assumed behaviours and patterns of the observed curves. Unlike in our theoretical exposition, $$\Phi$$ may not be composed of orthonormal elements. We will discuss basis choice in a bit more detail later; for now, assume a suitable $$\Phi$$ has been chosen. Then, each observation $$x_i$$ lives in $$\mathbb{H}(\mathscr{T})$$ and therefore has the following functional form, \begin{align*} x_i(t) = \sum_{j=1}^b a_{ij} \phi_j(t) = a_i^\mathrm{\scriptscriptstyle T}\Phi(t), \end{align*} where we have defined $$a_i = (a_{i1}, \ldots, a_{ib})$$ as the vector of coefficients for the $$i$$th observation. The associated estimated mean element is, \begin{align*} \hat{m}(t) &= \frac{1}{n} \sum_{i=1}^n x_i(t) \\ &= \frac{1}{n} \sum_{i=1}^n a_i^\mathrm{\scriptscriptstyle T}\Phi(t) \\ &= \frac{1}{n} (1, \ldots, 1) \begin{bmatrix} a_1^\mathrm{\scriptscriptstyle T} \\ \vdots \\ a_n^\mathrm{\scriptscriptstyle T} \end{bmatrix} \begin{bmatrix} \phi_1(t) \\ \vdots \\ \phi_b(t) \end{bmatrix} \\ &:= \frac{1}{n}\mathbf{1}_n^\mathrm{\scriptscriptstyle T} A \Phi(t). \end{align*}
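Obtaining the coefficient matrix $$A$$ from discretely observed curves is typically a least-squares problem: regress each curve's sampled values on the basis functions evaluated at the sampling points. The sketch below (Python with numpy; the Fourier-type basis, common grid, sample sizes, and noise level are all illustrative assumptions) recovers the rows $$a_i$$ and the mean coefficients $$\frac{1}{n}\mathbf{1}_n^{\mathrm{\scriptscriptstyle T}}A$$.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical setup: n curves sampled on a common grid, fit to a small
# Fourier-type basis by least squares (basis choice is illustrative)
n, b = 20, 5
t = np.linspace(0, 1, 50)

# Phi_mat[k, j] = phi_j(t_k): constant plus two sine/cosine pairs
Phi_mat = np.column_stack([
    np.ones_like(t),
    np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),
    np.sin(4 * np.pi * t), np.cos(4 * np.pi * t),
])

# simulate discrete observations that truly lie (near) the span of the basis
A_true = rng.normal(size=(n, b))
Y = A_true @ Phi_mat.T + rng.normal(scale=0.05, size=(n, len(t)))  # small noise

# least-squares projection of each observed curve onto the basis:
# each row of A holds the fitted coefficient vector a_i
A, *_ = np.linalg.lstsq(Phi_mat, Y.T, rcond=None)
A = A.T

# estimated mean element's coefficient vector: (1/n) 1^T A
mean_coef = A.mean(axis=0)
print(A.shape)  # (20, 5)
```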
The estimate for the covariance function is, \begin{align*} \hat{K}(s,t) = \frac{1}{n-1} \sum_{i=1}^n \big\{x_i(s) - \hat{m}(s)\big\}\big\{x_i(t) - \hat{m}(t)\big\}. \end{align*} Using the same matrix notation used for expressing the $$x_i$$ and the mean element, the vector with $$i$$th entry $$x_i(t) - \hat{m}(t)$$ can be written as $$A\Phi(t) - \mathbf{1}_n \frac{1}{n}\mathbf{1}_n^\mathrm{\scriptscriptstyle T} A \Phi(t) = \tilde{A}\Phi(t)$$, where $$\tilde{A} := (I_n - n^{-1}\mathbf{1}_n\mathbf{1}_n^\mathrm{\scriptscriptstyle T})A$$ is the row-centered coefficient matrix. Hence we can write the covariance estimate as, \begin{align*} \hat{K}(s,t) &= \frac{1}{n-1} \left\{\tilde{A}\Phi(s)\right\}^\mathrm{\scriptscriptstyle T}\left\{\tilde{A}\Phi(t)\right\} \\ &= \frac{1}{n-1}\Phi(s)^\mathrm{\scriptscriptstyle T}\tilde{A}^\mathrm{\scriptscriptstyle T}\tilde{A}\Phi(t). \end{align*}
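As a quick numerical check of this identity, the following sketch (Python with numpy; the monomial basis and simulated coefficients are illustrative) confirms that $$\Phi(s)^{\mathrm{\scriptscriptstyle T}}\tilde{A}^{\mathrm{\scriptscriptstyle T}}\tilde{A}\Phi(t)/(n-1)$$, evaluated on a grid, matches the direct empirical covariance of the reconstructed curves.

```python
import numpy as np

rng = np.random.default_rng(3)

# sketch: verify Phi(s)^T A~^T A~ Phi(t) / (n-1) against the direct
# empirical covariance of the reconstructed curves (illustrative basis)
n, b = 30, 3
t = np.linspace(0, 1, 40)
Phi_mat = np.column_stack([np.ones_like(t), t, t ** 2])  # phi_j on the grid
A = rng.normal(size=(n, b))                              # coefficient matrix

X = A @ Phi_mat.T             # x_i(t_k) for every curve and grid point
A_tilde = A - A.mean(axis=0)  # centered coefficients: (I - 11^T / n) A

K_basis = Phi_mat @ A_tilde.T @ A_tilde @ Phi_mat.T / (n - 1)  # K-hat on grid
K_direct = np.cov(X, rowvar=False)                             # direct estimate

print(np.allclose(K_basis, K_direct))
```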

It can easily be seen that under the assumption of a finite-dimensional image space $$\mathbb{H}(\mathscr{T})$$, all of the challenges discussed in the previous section are addressed. Explicitly, the dimensionality problem is resolved because we can now represent all quantities of interest as a sum of finitely many terms, working in the $$b$$-dimensional subspace $$\mathbb{H}(\mathscr{T})$$ rather than the entire infinite-dimensional space $$\mathbb{L}^2(\mathscr{T})$$. Second, each sample curve is represented in terms of the known basis functions $$\Phi$$. All we need is a regression or interpolation procedure to project the observations onto this basis, after which we will have returned our discrete observations to functional form. Finally, it can be shown that the eigenproblem simplifies to the following multivariate PCA problem, $$$(n-1)^{-1}W^{1/2}\tilde{A}^\mathrm{\scriptscriptstyle T}\tilde{A}W^{1/2}u_j = \lambda_j u_j, \tag{6.3}$$$ where we have defined $$W$$ as the square matrix of inner products between the basis functions, i.e., \begin{align*} W_{ij} = \langle \phi_i, \phi_j \rangle = \int_{\mathscr{T}} \phi_i(t)\phi_j(t) \, d\mu(t), \end{align*} and $$(\lambda_j, u_j)$$ is the $$j$$th eigenpair of the matrix $$(n-1)^{-1}W^{1/2}\tilde{A}^\mathrm{\scriptscriptstyle T}\tilde{A}W^{1/2}$$. Thus, the problem has been converted to one for which we already have efficient algorithms. The coefficients for the $$j$$th eigenfunction are then given by $$b_j = W^{-1/2}u_j$$ and we then have, \begin{align*} e_j(t) = b_j^\mathrm{\scriptscriptstyle T}\Phi(t). \end{align*} Additionally, the scores can be found as, \begin{align*} \langle x_i(t) - \hat{m}(t), e_j(t) \rangle = \tilde{a}_i^\mathrm{\scriptscriptstyle T}Wb_j, \end{align*} where we have defined $$\tilde{a}_i$$ to be the $$i$$th row of $$\tilde{A}$$.
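The whole pipeline of Equation (6.3) fits in a few lines of linear algebra. In the sketch below (Python with numpy; the monomial basis $$\{1, t, t^2\}$$ and simulated coefficients are illustrative), the Gram matrix $$W$$ is available in closed form, $$W^{1/2}$$ is obtained from the spectral decomposition of $$W$$, the recovered coefficients $$b_j = W^{-1/2}u_j$$ satisfy the $$\mathbb{L}^2$$ orthonormality condition $$b_j^{\mathrm{\scriptscriptstyle T}}Wb_k = \delta_{jk}$$, and the scores $$\tilde{A}Wb_j$$ have sample variances $$\lambda_j$$.

```python
import numpy as np

rng = np.random.default_rng(4)

# sketch of the reduction in Equation (6.3) with the non-orthogonal
# monomial basis {1, t, t^2} on [0, 1]; all inputs are illustrative
n, b = 40, 3
A = rng.normal(size=(n, b))   # coefficient matrix
A_tilde = A - A.mean(axis=0)  # centered coefficients

# Gram matrix of the basis: W_ij = int_0^1 t^(i+j) dt = 1 / (i + j + 1)
i = np.arange(b)
W = 1.0 / (i[:, None] + i[None, :] + 1)

# symmetric square root W^(1/2) via the spectral decomposition of W
d, V = np.linalg.eigh(W)
W_half = V @ np.diag(np.sqrt(d)) @ V.T
W_half_inv = V @ np.diag(1 / np.sqrt(d)) @ V.T

# multivariate eigenproblem (6.3): (n-1)^(-1) W^(1/2) A~^T A~ W^(1/2) u = lambda u
M = W_half @ A_tilde.T @ A_tilde @ W_half / (n - 1)
lam, U = np.linalg.eigh(M)
lam, U = lam[::-1], U[:, ::-1]  # descending eigenvalues

B = W_half_inv @ U       # columns are the coefficient vectors b_j of e_j
scores = A_tilde @ W @ B  # scores <x_i - m-hat, e_j> for all i, j

# the recovered eigenfunctions are orthonormal in L^2: B^T W B = I
print(np.allclose(B.T @ W @ B, np.eye(b)))
```

A useful by-product of the derivation is that the sample covariance of the score matrix is exactly $$\mathrm{diag}(\lambda_1, \ldots, \lambda_b)$$, which gives a cheap internal consistency check for any implementation.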

## 6.4 Example: Wafer Data

In class example: see R script in Canvas.

## 6.5 Exercises

### 6.5.1 Testing Understanding

Exercise 6.1 (Verify Derivation) Verify Equation (6.3). That is, plug $$\hat{K}(s,t)$$ into the eigenproblem and show that it reduces to Equation (6.3). Additionally, verify that $$\langle x_i(t) - \hat{m}(t), e_j(t) \rangle = \tilde{a}_i^\mathrm{\scriptscriptstyle T}Wb_j$$.

Exercise 6.2 (Verify pca.fd Function) In the R script FPCAex_waferdata.R, compute Equation (6.3) directly by performing PCA on the appropriate matrix. For this you will probably need the inprod function in the fda package. Report on your findings: do your results match those reported by the pca.fd function?