Chapter 5 Function-Valued Random Variables

In this chapter, we finally characterize function-valued random variables, i.e., random functions. As we have been alluding to in previous chapters, our function space will be a separable Hilbert space \mathbb{H}, almost always taking the form \mathbb{L}^2(E, \mathscr{B}, \mu), the space of square-integrable functions over a compact metric space E with respect to a measure \mu. A function-valued random variable, denoted in (Hsing and Eubank 2015) by \chi, is a measurable function from some underlying probability space (\Omega, \mathscr{F}, \mathbb{P}) to \mathbb{H}. Specifically, \chi: \omega \mapsto f, for \omega \in \Omega and f \in \mathbb{L}^2(E). For this function to be measurable, the preimage of each set B \in \mathscr{B}(\mathbb{H}) under \chi must be in \mathscr{F}. In other words, \chi^{-1}(B) := \{\omega \mid \chi(\omega) \in B\} is an element of \mathscr{F}. Characterization of this measurability is the subject of our first section.

5.1 Probability Measures on a Hilbert Space

This section is concerned with the property of measurability, hence we consider a general separable Hilbert space \mathbb{H} and let \mathscr{B}(\mathbb{H}) denote the associated Borel \sigma-algebra. For a particular f \in \mathbb{H} and open B \subset \mathbb{R}, let M_{f,B} be the subset of \mathbb{H} defined by M_{f,B} := \{g \mid \langle g, f \rangle \in B\}. Let \mathcal{M} be the class of all such sets, i.e. \mathcal{M} = \{M_{f,B} \mid f \in \mathbb{H},\ B \subset \mathbb{R} \text{ open}\}, and define \sigma(\mathcal{M}) as the smallest \sigma-algebra containing \mathcal{M}. Our first result is a lemma relating this \sigma-algebra \sigma(\mathcal{M}) and \mathscr{B}(\mathbb{H}), which we will need for proving the main result of this section.

Lemma 5.1 (Identity of Sigma-Algebras) The \sigma-algebras \sigma(\mathcal{M}) and \mathscr{B}(\mathbb{H}) are identical.

Proof (sketch). We proceed in the typical fashion of showing \sigma(\mathcal{M}) \subseteq \mathscr{B}(\mathbb{H}) and \mathscr{B}(\mathbb{H}) \subseteq \sigma(\mathcal{M}).

Since the inner product is continuous, each set contained in \mathcal{M} is the preimage of an open subset of \mathbb{R} under a continuous map and is therefore open, hence \mathcal{M} \subseteq \mathscr{B}(\mathbb{H}).

Since \mathscr{B}(\mathbb{H}) is the smallest \sigma-algebra containing the open sets, it is sufficient to show that each open set of \mathbb{H} is contained in \sigma(\mathcal{M}). This is done by writing an arbitrary open set as a countable union of balls, which is possible by separability, and expressing those balls in terms of sets in \mathcal{M}; the result then follows.
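
To fill in this step, here is a sketch of the countable construction, which is exactly where separability enters: let \{f_k\}_{k \in \mathbb{N}} be a countable dense subset of \mathbb{H} \setminus \{0\} and set u_k := f_k / \lVert f_k \rVert, so that \lVert v \rVert = \sup_k \langle v, u_k \rangle for any v \in \mathbb{H}. Then, for any h \in \mathbb{H} and r > 0, \begin{align*} \{g \mid \lVert g - h \rVert \leq r\} &= \bigcap_{k \in \mathbb{N}} \bigcap_{\ell \in \mathbb{N}} \left\{ g \;\middle|\; \langle g, u_k \rangle \in \left(-\infty,\ \langle h, u_k \rangle + r + \tfrac{1}{\ell}\right) \right\} \in \sigma(\mathcal{M}), \\ \{g \mid \lVert g - h \rVert < r\} &= \bigcup_{m \in \mathbb{N}} \left\{g \;\middle|\; \lVert g - h \rVert \leq r - \tfrac{1}{m}\right\} \in \sigma(\mathcal{M}), \end{align*} and every open set in the separable space \mathbb{H} is a countable union of open balls with centers in a countable dense set, hence also an element of \sigma(\mathcal{M}).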

Why might we be interested in making a connection between these two \sigma-algebras? Normally, measurability is defined with respect to the \sigma-algebra \mathscr{B}(\mathbb{H}); however, by the previous lemma, we have an equivalence between \mathscr{B}(\mathbb{H}) and \sigma(\mathcal{M}). Hence, every open set generating \mathscr{B}(\mathbb{H}) can be constructed from projections of elements of \mathbb{H} onto others. In such a circumstance, we can equivalently characterize measurability in terms of these sets of projections, which turns out to be quite useful, as the next result demonstrates.

Theorem 5.1 (Measurability in a Separable Hilbert Space) Let \chi be a mapping from some probability space (\Omega, \mathscr{F}, \mathbb{P}) to (\mathbb{H}, \mathscr{B}(\mathbb{H})). Then, \chi is measurable if and only if \langle \chi, f \rangle is measurable for all f \in \mathbb{H}. Further, if \chi is measurable, then its distribution is uniquely determined by the (marginal) distributions of \langle \chi, f \rangle over f \in \mathbb{H}.

There are two typos in the proof of this theorem in (Hsing and Eubank 2015), so we provide the proof here with those typos fixed for the purpose of clarity.

Proof (Part i). (\chi measurable \Rightarrow \langle \chi, f \rangle measurable.) An immediate consequence of the continuity of \langle \cdot, f \rangle for each f \in \mathbb{H}.

(\langle \chi, f \rangle measurable \Rightarrow \chi measurable.) If \langle \chi, f \rangle is measurable for all f, then for any open B \subset \mathbb{R}, the set \{g \in \mathbb{H} \mid \langle g, f \rangle \in B\} \in \mathcal{M} \subseteq \mathscr{B}(\mathbb{H}) satisfies \chi^{-1}(\{g \mid \langle g, f \rangle \in B\}) = \{\omega \in \Omega \mid \langle \chi(\omega), f \rangle \in B\} \in \mathscr{F}, where the equality holds by definition, and inclusion in \mathscr{F} holds by the measurability of \langle \chi, f \rangle for all f \in \mathbb{H}. This proves the result since \mathcal{M} generates \mathscr{B}(\mathbb{H}) by Lemma 5.1.

Proof (Part ii, sketch). Define \check{\mathcal{M}} as the class containing finite intersections of sets in \mathcal{M} and note that \sigma(\check{\mathcal{M}}) = \sigma(\mathcal{M}). Using the \pi-\lambda theorem, we can show that this is a determining class, so that if two measures agree on \check{\mathcal{M}} then they agree on \mathscr{B}(\mathbb{H}).

Let M_i, i \in I, be a finite collection of sets in \mathcal{M}. Then, \cap_{i} M_i \in \check{\mathcal{M}} and \mathbb{P} \circ \chi^{-1}\left( \cap_i M_i \right) = \mathbb{P}\left(\langle \chi, f_i \rangle \in B_i,\ i \in I\right), which is a joint probability on the \langle \chi, f_i \rangle. Since \check{\mathcal{M}} is a determining class, these probabilities uniquely determine \mathbb{P} \circ \chi^{-1}.

5.2 Mean and Covariance of a Random Element of a Hilbert Space

In this section, we let \chi be a measurable map from a probability space (\Omega, \mathscr{F}, \mathbb{P}) to a separable Hilbert space \mathbb{H}. As such, \chi is a random function. We will define and discuss the notions of expected value and covariance, which generalize the same notions from the multivariate context. Recall that \mathbb{E}\lVert \chi \rVert, which appears in the definition below, is the Lebesgue integral of the real-valued random variable \lVert \chi \rVert over \Omega.

Definition 5.1 (Mean of Random Function) If \mathbb{E}\lVert \chi \rVert < \infty, the mean element, or mean, of \chi is defined as the Bochner integral, \begin{align*} m = \mathbb{E}(\chi) := \int_\Omega \chi d \mathbb{P}. \end{align*}

Recall from Theorem 3.8 that the condition \mathbb{E}\lVert \chi \rVert < \infty guarantees Bochner integrability of \chi, and hence the mean element is well-defined. Also notice that by Theorem 4.2 and, more explicitly, Example 4.2, we have \begin{align*} \mathbb{E}\langle \chi, f \rangle = \langle \mathbb{E}(\chi),f \rangle = \langle m, f \rangle, \end{align*} so that m can also be thought of as the representer of the linear functional f \mapsto \mathbb{E}\langle \chi,f \rangle. Additionally, again using Theorem 4.2, we can show that \mathbb{E}\lVert \chi - m \rVert^2 = \mathbb{E}\lVert \chi \rVert^2 - \lVert m \rVert^2, when \mathbb{E}\lVert \chi \rVert^2 exists and is finite.
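
As a quick numerical illustration of the identity \mathbb{E}\langle \chi, f \rangle = \langle m, f \rangle, the following sketch simulates Gaussian process paths on a grid using the mvtnorm package (the same package used in the plotting code later in this chapter); the mean function m(t) = \sin(2\pi t), the covariance kernel, and the test function f(t) = t are arbitrary choices made only for this illustration.

library(mvtnorm)

set.seed(1)
nsteps = 200
t = seq(0, 1, length.out = nsteps)
dt = t[2] - t[1]

m = sin(2 * pi * t)  # a hypothetical mean function, chosen for illustration
K = outer( t,t, \(s,u){ exp(- (s - u)^2 / (2 * 0.25^2)) } )  # a smooth covariance kernel

# Draw N sample paths of chi (rows) from a Gaussian process with mean function m
N = 2000
chi = rmvnorm( n = N, mean = m, sigma = K + 1e-8 * diag(nsteps) )

f = t  # a fixed element of L^2(E)

lhs = mean( (chi %*% f) * dt )  # Monte Carlo estimate of E<chi, f>
rhs = sum( m * f ) * dt         # Riemann sum for <m, f>
c(lhs = lhs, rhs = rhs)         # both approximately -1/(2*pi)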

In the same way that Definition 5.1 extends the usual understanding of the mean of a random variable to Hilbert space-valued random variables, our definition for the covariance operator seeks to extend the usual notion of covariance for finite-dimensional random variables.

Definition 5.2 (Covariance Operator of a Random Function) Suppose \mathbb{E}\lVert \chi \rVert^2 < \infty. Then the covariance operator for \chi is the element of \mathfrak{B}_{HS}(\mathbb{H}) given by the Bochner integral, \begin{align*} \mathscr{K} = \mathbb{E}\big[(\chi - m) \otimes (\chi - m)\big] := \int_\Omega (\chi - m) \otimes (\chi - m) d \mathbb{P}. \end{align*}

Here, the condition \mathbb{E}\lVert \chi \rVert^2 < \infty and the fact that \mathscr{K} is an element of \mathfrak{B}_{HS}(\mathbb{H}) also come from an application of Theorem 4.2 and Theorem 3.8, as discussed in Example 4.4. That is, for each \omega \in \Omega one can define the operator T_\omega := (\chi(\omega) -m) \otimes (\chi(\omega) -m) and show that the norm of this operator can be specified as \lVert T_\omega \rVert_{HS} = \lVert \chi(\omega) - m \rVert^2. We know this norm is finite whenever \lVert \chi(\omega) \rVert^2 is finite. Hence, if we have \mathbb{E}\lVert \chi \rVert^2 <\infty, then (\chi - m) \otimes (\chi-m) is a random element of \mathfrak{B}_{HS}(\mathbb{H}) and it is Bochner integrable because its Hilbert-Schmidt norm is integrable, i.e., \mathbb{E}\lVert \chi - m \rVert^2 < \infty. It is precisely in such a case that \mathscr{K} is well-defined. Finally, using the properties of \otimes and another application of Example 4.4 we can see that \mathbb{E}[(\chi - m) \otimes (\chi - m)]= \mathbb{E}(\chi \otimes \chi) - m\otimes m.
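
The identity \lVert T_\omega \rVert_{HS} = \lVert \chi(\omega) - m \rVert^2 (see also Exercise 5.1 below) can be checked numerically on a grid, where the rank-one operator x \otimes x is represented by the kernel x(s)x(t) and the \mathbb{L}^2 integrals become Riemann sums. A minimal sketch with an arbitrary stand-in for a centered path:

nsteps = 200
t = seq(0, 1, length.out = nsteps)
dt = t[2] - t[1]

x = sin(2 * pi * t) + t^2  # stands in for a centered path chi(omega) - m

Txx = outer(x, x)  # kernel x(s)x(t) of the rank-one operator x tensor x

hs_norm = sqrt( sum(Txx^2) * dt^2 )  # discretized Hilbert-Schmidt norm
sq_norm = sum(x^2) * dt              # discretized squared L^2 norm of x
c(hs = hs_norm, sq = sq_norm)        # equal: the double sum factors as (sum x^2 dt)^2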

We now derive some properties of the covariance operator \mathscr{K}. Details can be found in (Hsing and Eubank 2015). A notable, although minor, difference is that we include the mean element m in these results, while (Hsing and Eubank 2015) assumes, without loss of generality, that m=0. We choose to include it simply to make its role in these results explicit.

Theorem 5.2 (Properties of the Covariance Operator) Suppose \mathbb{E}\lVert \chi \rVert^2 < \infty. For f,g in \mathbb{H},

  1. \langle \mathscr{K}f, g\rangle = \mathbb{E}\big[ \langle \chi - m,f \rangle \langle \chi - m, g \rangle\big]
  2. \mathscr{K} is a nonnegative-definite, trace-class operator with \lVert \mathscr{K} \rVert_{TR} = \mathbb{E}\lVert \chi - m \rVert^2, and
  3. \mathbb{P}\big( \chi \in \overline{\text{Im}(\mathscr{K})} \big) = 1.

Since \mathscr{K} is a compact, self-adjoint operator, it admits an eigen decomposition as specified in Theorem 4.12, \begin{equation} \mathscr{K} = \sum_{j=1}^\infty \lambda_j e_j \otimes e_j, \tag{5.1} \end{equation} where the eigenfunctions e_j form a basis for \overline{\text{Im}(\mathscr{K})}, and the eigenvalues are nonnegative since \mathscr{K} is also a nonnegative operator. Using the fact that \mathbb{P}\big( \chi \in \overline{\text{Im}(\mathscr{K})} \big) = 1 and Corollary 3.1, we also find that the random function \chi can be written as \begin{equation} \chi = m+ \sum_{j=1}^\infty \langle \chi - m, e_j \rangle e_j = \sum_{j=1}^\infty \langle \chi, e_j \rangle e_j, \tag{5.2} \end{equation} with probability 1. This also makes it clear that the one-dimensional projections \langle \chi, e_j \rangle determine the distribution of \chi. Additionally, this expansion is optimal in terms of the mean squared error: for any natural number n, let \chi_n = m + \sum_{j=1}^n \langle \chi - m, e_j \rangle e_j be the random function defined as the sum of the first n terms in the expansion of \chi. Then, \begin{equation} \mathbb{E}\left[ \lVert \chi - \chi_n \rVert^2 \right] \leq \mathbb{E}\left[ \lVert (\chi - m) - \sum_{j=1}^n \langle \chi-m,f_j \rangle f_j \rVert^2 \right] \tag{5.3} \end{equation} where the f_j form any other orthonormal set in \mathbb{H}.
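
To see the optimality in Equation (5.3) empirically, here is a minimal sketch (assuming the mvtnorm package used elsewhere in this chapter; the paths are simulated with m = 0 and an arbitrary smooth kernel). It estimates \mathscr{K} on a grid, extracts the leading eigenfunctions, and compares the truncation error of the eigenbasis against that of a Fourier sine basis.

library(mvtnorm)

set.seed(2)
nsteps = 200
t = seq(0, 1, length.out = nsteps)
dt = t[2] - t[1]
K = outer( t,t, \(s,u){ exp(- (s - u)^2 / (2 * 0.2^2)) } )

N = 500
chi = rmvnorm( n = N, sigma = K + 1e-8 * diag(nsteps) )  # centered paths (m = 0), one per row

# Discretize the covariance operator and take its leading eigenfunctions e_j
Khat = crossprod(chi) / N                 # empirical K(s,t) on the grid
eig = eigen(Khat * dt, symmetric = TRUE)  # Khat * dt represents the integral operator
efuns = eig$vectors[, 1:5] / sqrt(dt)     # eigenfunctions, orthonormal in L^2

# An alternative orthonormal set: the first 5 Fourier sine functions
ffuns = sapply(1:5, \(j){ sqrt(2) * sin(j * pi * t) })

trunc_mse = function(B) {       # estimates E||chi - chi_n||^2 for a basis matrix B
  scores = (chi %*% B) * dt     # <chi_i, b_j>
  resid = chi - scores %*% t(B) # chi_i minus its projection onto span(B)
  mean( rowSums(resid^2) * dt )
}
c( eigen = trunc_mse(efuns), fourier = trunc_mse(ffuns) )  # eigenbasis error is smaller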

5.3 Mean-Square Continuous Processes and the Karhunen-Loeve Theorem

In the course introduction it was mentioned that we would view observed curve data as having arisen as realized paths of stochastic processes. In this section, the promised stochastic process viewpoint finally makes its appearance. We first describe in relatively general terms how we can view functional data as having arisen from stochastic processes. We then develop the Karhunen-Loeve (KL) expansion, which is analogous to Equation (5.2) in this new context.

As before, we let (\Omega, \mathscr{F}, \mathbb{P}) be a probability space and (E, \mathscr{B}, \mu) be a compact metric space with Borel \sigma-algebra \mathscr{B} and finite measure \mu.

Definition 5.3 (Stochastic Process) A stochastic process X: E \times \Omega \rightarrow \mathbb{R} is a collection of random variables, X_t, indexed by the elements t of a compact metric space E, each defined on a common probability space (\Omega, \mathscr{F}, \mathbb{P}). We write X = \{X_t(\cdot) \mid t \in E\} where, for each fixed t \in E, X_t(\cdot) is an \mathscr{F}-measurable random variable, and for each fixed \omega, \{X_t(\omega) \mid t\in E\} is a path of X.

It's important to remember that we have three spaces floating around here: the probability space (\Omega, \mathscr{F}, \mathbb{P}) on which the random variables are defined, the compact metric space (E, \mathscr{B}, \mu) which serves as the index space for the process, and (\mathbb{R}, \mathscr{B}(\mathbb{R})) which serves as the space in which the random variables (and hence, for us, the resulting functions of interest) take values. Keeping these spaces and their roles in order is crucial for understanding these processes.

For a given stochastic process X, if the expectations \mathbb{E}(X_t) and \mathbb{E}(X_sX_t) exist for each s,t in E, then we can immediately define the mean and covariance functions of X as m(t) = \mathbb{E}(X_t) and K(s,t) = \text{Cov}(X_s, X_t) respectively. When this holds for a particular stochastic process, we say that it is a second-order process. We will be interested in second-order processes with the additional property of mean-square continuity, which we now define.

Definition 5.4 (Mean-Square Continuity) A stochastic process X is said to be mean-square continuous if, for any t \in E and any sequence t_n converging to t we have that \lim_{n \rightarrow \infty} \mathbb{E}\left[ (X_{t_n} - X_t)^2 \right] =0.

The property of mean-square continuity can be understood as a condition on the continuity of the process in the average sense, rather than on the continuity of any particular path. Intuitively, mean-square continuity tells us that any two points of a path X(\omega) that have index values which are close with respect to the metric on E will have values that are close with respect to the metric on \mathbb{R} as well, on average. This enforces smoothness in the sense that we will usually observe paths which do not have extreme oscillatory behaviours over small regions of the index space.

library(mvtnorm)
library(tibble)
library(ggplot2)
library(ggpubr)

# Preliminaries
nsteps = 500
sig = 0.5

t = seq(0, 1, length.out = nsteps)
Kstgp = outer( t,t,  \(s,t){ exp(- (s - t)^2 / (2 * sig^2)) } ) #K(s,t) for Gaussian process
Kstbm = outer( t,t, Vectorize( \(s,t){ min(s,t) } ) ) #K(s,t) for Brownian motion


# Generate sample paths
set.seed( 90053 )
num_samples = 50
nugget = 1e-8 * diag(nsteps) # small jitter so rmvnorm sees a positive-definite sigma
samples_gp = rmvnorm( n = num_samples, mean = rep(0, length(t)), sigma = Kstgp + nugget) |> t()
samples_bm = rmvnorm( n = num_samples, mean = rep(0, length(t)), sigma = Kstbm + nugget) |> t()
samples_wn = rmvnorm( n = num_samples, mean = rep(0, length(t)), sigma = diag(nsteps)) |> t()

#Alternatively, generate the values together, stack them in df, use facet_wrap on wn, bm label
list( 
  "time"  = rep(t, times=num_samples),
  "gp_values" = as.numeric(samples_gp),
  "bm_values" = as.numeric( samples_bm ),
  "wn_values" = as.numeric( samples_wn ),
  "group" = rep(1:num_samples, each=length(t)) |> factor()
) |>
  as_tibble() -> pathsdf

list(
  "time" = t,
  "ymingp" = -2 * sqrt( diag(Kstgp) ), 
  "ymaxgp" = 2 * sqrt( diag(Kstgp) ), 
  "yminbm" = -2 * sqrt( diag(Kstbm) ), 
  "ymaxbm" = 2 * sqrt( diag(Kstbm) ),
  "yminwn" = rep(-2, nsteps) , 
  "ymaxwn" = rep(2, nsteps)
) |> 
  as_tibble() -> ci_bounds

# Plotting
gpplot = ggplot() +
  geom_ribbon(data=ci_bounds, aes( x=time, ymin=ymingp, ymax=ymaxgp ), fill="#929292", color="#000000", lty=2, alpha=0.2) +
  geom_line(data=pathsdf, aes(x=time, y=gp_values, group=group, color=group), lwd=0.5) +
  geom_hline(yintercept=0, linetype="solid", color="#000000", lwd=1) +
  labs(title = "",
       x="", y="Gaussian Process Path Value") +
  theme_classic() +
  theme(legend.position = "none")

bmplot = ggplot() +
  geom_ribbon(data=ci_bounds, aes(x=time, ymin=yminbm, ymax=ymaxbm), fill="#929292", color="#000000", lty=2, alpha=0.2) +
  geom_line(data=pathsdf, aes(x=time, y=bm_values, group=group, color=group), lwd=0.5) +
  geom_hline(yintercept=0, linetype="solid", color="#000000", lwd=1) +
  labs(title = "",
       x="", y="Brownian Motion Path Value") +
  theme_classic() +
  theme(legend.position = "none")

wnplot = ggplot() +
  geom_ribbon(data=ci_bounds, aes(x=time, ymin=yminwn, ymax=ymaxwn), fill="#929292", color="#000000", lty=2, alpha=0.2) +
  geom_line(data=pathsdf, aes(x=time, y=wn_values, group=group, color=group), lwd=0.5) +
  geom_hline(yintercept=0, linetype="solid", color="#000000", lwd=1) +
  labs(title = "",
       x="", y="White Noise Path Value") +
  theme_classic() +
  theme(legend.position = "none")

#ggsave("msc_plot.png",ggarrange(gpplot, bmplot, wnplot, ncol=3, nrow=1),width=10, height=5)
ggarrange(gpplot, bmplot, wnplot, ncol=3, nrow=1)

Figure 5.1: A plot showing paths for mean-square continuous processes (left, middle), and a process that is not mean-square continuous (right).

Assuming a process is mean-square continuous is equivalent to assuming its mean function and covariance function are continuous (see Theorem 7.3.2 in (Hsing and Eubank 2015)). Despite this, we still cannot guarantee that the resulting paths of a mean-square continuous process X live in the Hilbert space \mathbb{L}^2(E), and as a result, we cannot make use of our results from earlier sections without further assumptions or conditions.

In a first step to merging these two viewpoints, we notice that the covariance function K has all the makings of an integral operator kernel. We have already seen that K is symmetric and uniformly continuous on E \times E, but we can also show that it is nonnegative in the sense of Definition 4.11. To do this, let n be any natural number, and T_n = \{t_i\}_{i=1}^n an arbitrary subset of n elements in E. We define \mathbf{K} as the matrix with entries \mathbf{K}_{ij} = K(t_i, t_j). Then, for any vector a in \mathbb{R}^n we have \text{Var}(\langle a,X_{T_n} \rangle) = \langle a , \mathbf{K}a \rangle \geq 0, where with a slight abuse of notation we've used X_{T_n} to denote the vector of random variables (X_{t_1},\ldots, X_{t_n}). Then, we know by Mercer's Theorem that the integral operator on \mathbb{L}^2(E) defined by (\mathscr{K}f)(\cdot) = \int_E K(\cdot, s)f(s) d\mu(s) has an associated sequence of eigenpairs (\lambda_j, e_j)_{j \in \mathbb{N}} which we can use to write K(s,t)=\sum_{j\in \mathbb{N}}\lambda_je_j(s)e_j(t), where the sum converges absolutely and uniformly on the support of \mu.
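
Mercer's expansion is easy to check numerically: discretize K on a grid, eigendecompose the matrix representing the integral operator, and rebuild the kernel from finitely many eigenpairs. A small sketch, reusing the Gaussian kernel from the plotting code above:

nsteps = 200
t = seq(0, 1, length.out = nsteps)
dt = t[2] - t[1]
Kst = outer( t,t, \(s,u){ exp(- (s - u)^2 / (2 * 0.5^2)) } )

eig = eigen(Kst * dt, symmetric = TRUE)  # Kst * dt discretizes the integral operator
lambda = eig$values                      # approximate eigenvalues lambda_j
efuns = eig$vectors / sqrt(dt)           # approximate eigenfunctions e_j

# Rebuild K(s,t) ~ sum_{j=1}^J lambda_j e_j(s) e_j(t) from the first J eigenpairs
J = 20
Krec = efuns[, 1:J] %*% diag(lambda[1:J]) %*% t(efuns[, 1:J])
max(abs(Kst - Krec))  # uniformly small, as Mercer's Theorem predicts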

Expansion of the covariance function K in terms of the eigenpairs of the associated integral operator \mathscr{K} is the first piece we need for stating and proving the KL expansion. The next is the notion of an \mathbb{L}^2 stochastic integral of a mean-square continuous process, X, which we now define.

First, recall that since E is a compact metric space, by Theorem 2.1 it is totally bounded. Hence, we can define a function D: \mathbb{N} \rightarrow \mathbb{N} such that, for each n \in \mathbb{N}, we can find a partition \{E_i\}_{i=1}^{D(n)} of E where each E_i is in \mathscr{B} and has diameter less than n^{-1}. Here, the diameter of a set E_i is \sup\{d(s,t) \mid s,t \in E_i\}. In such a case, we can choose an arbitrary element t_i in each E_i and define the random function G_n(t) = \sum_{i=1}^{D(n)} (X_{t_i}-m_{t_i})I\{t \in E_i\}, which takes the value (X_{t_i}-m_{t_i}) on the partition set E_i for each i. Then, for any f in \mathbb{L}^2(E), we can define the following stochastic integral, \begin{align*} \mathscr{I}_X(f; G_n) &= \int_E G_n(s)f(s) d\mu(s) \\ &= \int_E \sum_{i=1}^{D(n)} (X_{t_i}-m_{t_i})I\{s \in E_i\}f(s) d\mu(s) \\ &= \sum_{i=1}^{D(n)} (X_{t_i}-m_{t_i})\int_E I\{s \in E_i\}f(s) d\mu(s) \\ &= \sum_{i=1}^{D(n)} (X_{t_i}-m_{t_i})\int_{E_i}f(s) d\mu(s), \end{align*} where we refer to the integral as stochastic due to the randomness of the value X_{t_i} for each t_i. By this definition, \mathscr{I}_X(f;G_n) is a random variable, and because X is a second-order process, \mathscr{I}_X(f; G_n) is square-integrable and is therefore an element of \mathbb{L}^2(\Omega, \mathscr{F}, \mathbb{P}) for each f in \mathbb{L}^2(E) and each choice of the partition/point-representative collection, \{(E_i, t_i)\}_{i=1}^{D(n)}.
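
To see the construction in action, the following sketch approximates \mathscr{I}_X(f; G_n) for a single simulated Brownian motion path (so E = [0,1], \mu is Lebesgue measure, and m_t = 0), using dyadic partitions with left endpoints as the representatives t_i; the partition sums settle down as the mesh shrinks, previewing the convergence argument below.

set.seed(3)
nfine = 2^12
X = c(0, cumsum( rnorm(nfine, sd = sqrt(1 / nfine)) ))  # one Brownian path on a fine grid
f = function(s){ cos(2 * pi * s) }

# I_X(f; G_n) with E_i = ((i-1)/Dn, i/Dn], t_i the left endpoint, and m_t = 0
I_n = function(Dn) {
  left = (0:(Dn - 1)) / Dn
  Xti = X[ round(left * nfine) + 1 ]  # X at the representatives t_i
  w = sapply( seq_len(Dn), \(i){ integrate(f, (i - 1) / Dn, i / Dn)$value } )
  sum( Xti * w )                      # sum_i X_{t_i} * int_{E_i} f dmu
}

sapply( 2^(2:8), I_n )  # partition sums stabilizing as D(n) grows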

Our next goal is to show that the sequence of random variables \mathscr{I}_X(f; G_n) is Cauchy in \mathbb{L}^2(\Omega) and hence converges to a particular random variable, which we will denote by \mathscr{I}_X(f). To do this, we let \{E_i\}_{i=1}^{D(n)} and \{E_j^\prime\}_{j=1}^{D(n^\prime)} be two partitions. Then, the sequence is Cauchy if \mathbb{E}[\{\mathscr{I}_X(f; G_n) - \mathscr{I}_X(f; G_{n^\prime}) \}^2] goes to 0 as n, n^\prime approach \infty. Expanding we get, \begin{align*} \mathbb{E}[\{\mathscr{I}_X(f; G_n) &- \mathscr{I}_X(f; G_{n^\prime}) \}^2] =\\ &\begin{aligned} &\sum_{i=1}^{D(n)}\sum_{j=1}^{D(n)} K(t_i,t_j)\mathbb{E}[f; E_i] \mathbb{E}[f;E_j] + \\ &\sum_{i=1}^{D(n^\prime)}\sum_{j=1}^{D(n^\prime)} K(t_i^\prime,t_j^\prime)\mathbb{E}[f; E_i^\prime] \mathbb{E}[f;E_j^\prime] - \\ &2\sum_{i=1}^{D(n)}\sum_{j=1}^{D(n^\prime)} K(t_i,t_j^\prime)\mathbb{E}[f; E_i] \mathbb{E}[f;E_j^\prime], \end{aligned} \end{align*} where \mathbb{E}[f; E_i] := \int_{E_i} f(s) d\mu(s). Each of the three double sums is the integral of a simple function over the product space E \times E, approximating \iint_{E\times E} K(s,t)f(s)f(t) d(\mu\times\mu)(s,t). As n and n^\prime increase, the meshes of the partitions shrink and, by the uniform continuity of K, each double sum approaches this same limit integral. For example, considering the first sum, \sum_{i=1}^{D(n)}\sum_{j=1}^{D(n)} K(t_i,t_j)\mathbb{E}[f; E_i] \mathbb{E}[f;E_j], for any \epsilon > 0, we can choose n large enough, i.e. a mesh small enough, such that \lvert K(s, t) - K(t_i, t_j)\rvert < \epsilon for all s \in E_i and t \in E_j. Similar arguments apply to the other sums, and as a result the entire expression tends to 0 in the limit.

We now derive some important properties of the limiting random variable \mathscr{I}_X(f).

Theorem 5.3 (Properties of \mathscr{I}_X) Let X = \{X_t \mid t \in E\} be a mean-square continuous stochastic process. Then, for any f,g in \mathbb{L}^2(E, \mathscr{B}, \mu),

  1. \mathbb{E}[\mathscr{I}_X(f)] = 0
  2. \mathbb{E}[\mathscr{I}_X(f)(X_t-m_t)] = \int_E K(s,t) f(s) d\mu(s) for any t \in E, and
  3. \mathbb{E}[\mathscr{I}_X(f)\mathscr{I}_X(g)] = \iint_{E \times E} K(s,t)f(s)g(t)d(\mu \times \mu)(s,t).

Proof. We prove the first statement. Proofs for the other two are similar.

Since \mathbb{E}[\mathscr{I}_X(f;G_n)] = 0 for any n, we can write, \begin{align*} \lvert \mathbb{E}[\mathscr{I}_X(f)] \rvert &= \lvert \mathbb{E}[\mathscr{I}_X(f) - \mathscr{I}_X(f;G_n)] \rvert \\ &= \big(\{ \mathbb{E}[\mathscr{I}_X(f) - \mathscr{I}_X(f;G_n)] \}^2 \big)^{1/2} \\ &\underset{\text{Jensen's}}{\leq} \big( \mathbb{E}\{ [\mathscr{I}_X(f) - \mathscr{I}_X(f;G_n)]^2 \} \big)^{1/2}, \end{align*} which goes to 0 in the limit; since the left-hand side does not depend on n, it follows that \lvert \mathbb{E}[\mathscr{I}_X(f)] \rvert = 0 and the result is proved.

An important corollary of Theorem 5.3, which we will use in the proof of the KL expansion theorem, is that when f=e_i and g=e_j are two eigenfunctions of \mathscr{K}, part 3 of the theorem tells us that \mathbb{E}[\mathscr{I}_X(e_i)\mathscr{I}_X(e_j)] = \delta_{ij}\lambda_i.

Theorem 5.4 (Karhunen-Loeve Expansion) Let \{X_t \mid t \in E\} be a mean-square continuous stochastic process. Then, \begin{align*} \lim_{n\rightarrow \infty} \sup_{t \in E} \mathbb{E}\{ [X_t - X_{n,t}]^2 \} =0, \end{align*} where X_{n,t} := m_t + \sum_{j=1}^n \mathscr{I}_X(e_j)e_j(t).

Proof. \begin{align*} \mathbb{E}\{ [X_t - X_{n,t}]^2 \} &= \mathbb{E}[(X_t - m_t)^2] + \mathbb{E}\left[ \left\{\sum_{j=1}^n \mathscr{I}_X(e_j)e_j(t)\right\}^2 \right] - 2 \mathbb{E}\left[\sum_{j=1}^n(X_t -m_t) \mathscr{I}_X(e_j)e_j(t) \right] \\ &= K(t,t) + \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}\left\{\mathscr{I}_X(e_i)\mathscr{I}_X(e_j)\right\}e_i(t)e_j(t) - 2\sum_{j=1}^n \mathbb{E}[(X_t - m_t) \mathscr{I}_X(e_j)]e_j(t) \\ &= \sum_{j=1}^\infty \lambda_j e_j(t)^2 + \sum_{j=1}^n \lambda_j e_j(t)^2 - 2\sum_{j=1}^n \lambda_j e_j(t)^2 \\ &= \sum_{j=n+1}^\infty \lambda_j e_j(t)^2, \end{align*} where the last two equalities use Mercer's expansion of K(t,t), the corollary above, and part 2 of Theorem 5.3 together with \int_E K(s,t)e_j(s) d\mu(s) = \lambda_j e_j(t). The final tail sum tends to zero uniformly on E by Mercer's Theorem.
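
For Brownian motion on [0,1] the eigenpairs of \mathscr{K} are available in closed form, \lambda_j = ((j - 1/2)\pi)^{-2} and e_j(t) = \sqrt{2}\sin((j - 1/2)\pi t), so the KL expansion can be checked directly. The sketch below computes the scores \mathscr{I}_X(e_j) = \langle X, e_j \rangle from simulated paths (for which m = 0) and compares their sample variances and covariances with \delta_{ij}\lambda_i from the corollary above.

set.seed(4)
nsteps = 1000
N = 2000
dt = 1 / nsteps
tt = (1:nsteps) * dt  # grid chosen so that column i of X below is exactly B(t_i)

# N Brownian motion paths (rows) via cumulative sums of independent increments
X = t( apply( matrix( rnorm(N * nsteps, sd = sqrt(dt)), N, nsteps ), 1, cumsum ) )

J = 4
lambda = ( (1:J - 0.5) * pi )^(-2)                                 # known eigenvalues
efuns = sapply( 1:J, \(j){ sqrt(2) * sin((j - 0.5) * pi * tt) } )  # known eigenfunctions

scores = (X %*% efuns) * dt  # scores I_X(e_j) ~ <X, e_j>, one row per path
rbind( sample_var = apply(scores, 2, var), lambda = lambda )  # rows should agree
round( cov(scores), 4 )  # off-diagonals near zero, as the corollary predicts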

This result has obvious parallels with Equation (5.2), but of course the context is slightly different, which is important. In the next section, we discuss the additional conditions we need to bring these two perspectives together.

5.4 Mean-Square Continuous Process in a Hilbert Space \mathbb{L}^2

The purpose of this section is to bring together the more abstract random element of a Hilbert space concept, which we built up in the previous chapters and the earlier sections of this chapter, and the stochastic process viewpoint we introduced in the previous section. The idea is to have the stochastic process produce paths which live in a Hilbert space almost surely, thereby allowing us to apply our earlier results in this context.

Definition 5.5 (Joint Measurability) A stochastic process X is said to be jointly measurable if the map (t, \omega) \mapsto X_t(\omega) is measurable with respect to the product \sigma-algebra, \mathscr{B} \times \mathscr{F}. In particular, X_{\cdot}(\omega) is then a \mathscr{B}-measurable function on E for all \omega, and X_t(\cdot) is \mathscr{F}-measurable on \Omega for each fixed t in E.

By assuming this additional property, we get the result we are after, as we now show.

Theorem 5.5 (Random Process in L2) Suppose X is jointly measurable and X_{\cdot}(\omega) is in \mathbb{H} = \mathbb{L}^2(E, \mathscr{B}, \mu) for each \omega \in \Omega. Then the mapping \omega \mapsto X_{\cdot}(\omega) is measurable from \Omega to \mathbb{H}, so that X is a random element of \mathbb{H}.

Proof (Sketch). Use joint measurability to prove that \langle X_{\cdot}(\omega), f \rangle is a measurable function from \Omega to \mathbb{R} for each f in \mathbb{L}^2(E). This can be done using Fubini's Theorem. Then appeal to Theorem 5.1.

What is an easily understood condition that implies joint measurability? Consider the next theorem.

Theorem 5.6 (Path Continuity Gives Joint Measurability) Assume that for each t, X_{t}(\cdot) is measurable and that X_{\cdot}(\omega) is continuous for each \omega in \Omega. Then, X is jointly measurable and hence a random element of \mathbb{L}^2(E, \mathscr{B}, \mu). In this case, the distribution of X is uniquely determined by the (finite-dimensional) distributions of \big(X_{t_1}(\cdot), \ldots, X_{t_n}(\cdot)\big) for all t_1, \ldots, t_n in E, and for all natural numbers n.

(Hsing and Eubank 2015) note that there are many ways to verify sample path continuity, e.g. Kolmogorov's criterion, which by extension gives us established ways to check for joint measurability. Finally, we give a theorem dedicated to the important properties of a mean-square continuous process which is also a random element of \mathbb{L}^2(E, \mathscr{B}, \mu).

Theorem 5.7 (Properties of M-S.C. Process in L2) Let X = \{ X_t \mid t \in E\} be a mean-square continuous process that is jointly measurable. Then,

  1. the mean function m of the stochastic process belongs to \mathbb{L}^2(E) and coincides with the mean element of X in \mathbb{L}^2(E) from Definition 5.1,
  2. the covariance operator \mathbb{E}[(X - m)\otimes (X-m)] of Definition 5.2 is well-defined, and coincides with the operator \mathscr{K} defined as (\mathscr{K}f)(\cdot) = \int_E K(\cdot, s)f(s) d\mu(s), where K is the covariance function of the process, and,
  3. for any f in \mathbb{L}^2(E), \mathscr{I}_X(f) = \int_E (X_t - m_t)f(t) d\mu(t) = \langle X - m, f \rangle.

As a consequence, the KL scores \mathscr{I}_X(e_j) are exactly the projections \langle X - m, e_j \rangle of the centered process onto the eigenfunctions, just as in Equation (5.2).
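
As a final numerical check tying the two viewpoints together (part 2 of Theorem 5.7), the sketch below compares the empirical covariance kernel of simulated Brownian motion paths, i.e. the grid version of \mathbb{E}[(X - m) \otimes (X - m)] with m = 0, against the covariance function K(s,t) = \min(s,t) that defines the integral operator:

set.seed(5)
nsteps = 200
N = 5000
dt = 1 / nsteps
tt = (1:nsteps) * dt
X = t( apply( matrix( rnorm(N * nsteps, sd = sqrt(dt)), N, nsteps ), 1, cumsum ) )

Khat = crossprod(X) / N  # empirical kernel: average of X(s)X(t) across paths
Ktrue = outer( tt, tt, Vectorize( \(s,u){ min(s,u) } ) )  # K(s,t) for Brownian motion
max(abs(Khat - Ktrue))   # small, and shrinks at the Monte Carlo rate 1/sqrt(N)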

5.5 Exercises

5.5.1 Testing Understanding

Exercise 5.1 (Norm of Covariance Component Operator) Verify that the operator (\chi(\omega) - m) \otimes (\chi(\omega) - m) has norm given by \lVert \chi(\omega) - m \rVert^2.

Exercise 5.2 (Mean-Square Continuity) Confirm the claims of the example code, that the Gaussian process with kernel \exp\{-(s-t)^2/(2\sigma^2)\} and Brownian motion are mean-square continuous processes, and show that the white noise process is not mean-square continuous. Do this directly using the definition of mean-square continuity, but also use the equivalent characterization that the mean and covariance functions must be continuous.

References

Hsing, Tailen, and Randall Eubank. 2015. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley. https://www.oreilly.com/library/view/theoretical-foundations-of/9780470016916/.