# Chapter 4 Linear Operators

In this chapter we look at linear operators, a special kind of linear map between normed vector spaces. Our motivation is to extend the notion of covariance to random functions, where it will materialize as a particular kind of linear operator. We will further use results from this chapter to show that this operator has properties that are natural extensions of positive-definiteness and of possessing an eigendecomposition.

## 4.1 Operators

We begin with a definition of linearity for functions between vector spaces. The text calls these linear transformations, but in other references you may see them called linear maps as well. We will use both names interchangeably.

Definition 4.1 (Linear Transformation) Let $$\mathbb{V}_{1}$$, $$\mathbb{V}_{2}$$ be vector spaces. A transformation (or map) $$T$$ from $$\mathbb{V}_{1}$$ into $$\mathbb{V}_{2}$$ is said to be linear if it has the following properties:

• Additivity: for all $$v_1, v_2 \in \mathbb{V}_1$$, $$T(v_{1} + v_{2}) = Tv_{1} + Tv_{2}$$, and
• Homogeneity: for all $$a \in \mathbb{R}$$ and $$v_1 \in \mathbb{V}_1$$, $$T(av_1) = aTv_1$$.

Notice in Definition 4.1 that the application of $$T$$ to a vector is written without parentheses. This is standard notation for linear transformations, and it likely comes from the fact that the most ubiquitous linear transformations are those between finite-dimensional vector spaces, which can be represented by matrices acting on vectors.

Additionally, although linear maps are most generally defined on vector spaces, as in Definition 4.1, we will be specifically interested in their application to normed vector spaces, as mentioned in the introduction. Before we get into the results, we need to establish notation for spaces associated with linear maps which will be important moving forward.

Definition 4.2 (Transformation Spaces) Suppose $$T:\mathbb{V}_1 \rightarrow \mathbb{V}_2$$ is a linear transformation. We then define,

• Domain: Denoted by $$\text{Dom}(T)$$, this is the subset of $$\mathbb{V}_1$$ on which $$T$$ is defined.
• Image: Denoted by $$\text{Im}(T)$$, the image is the set of values in $$\mathbb{V}_2$$ which are mapped to by $$T$$. Formally, $$\text{Im}(T) = \{v_2 \in \mathbb{V}_2 \mid v_2 = Tv_1 \text{ for some } v_1 \in \text{Dom}(T) \}$$.
• Kernel: Denoted by $$\text{Ker}(T)$$, the kernel of a linear map is the set of all elements of $$\text{Dom}(T)$$ that $$T$$ maps to the zero element of $$\mathbb{V}_2$$. It is sometimes also called the null space.
• Rank: Denoted by $$\text{rank}(T)$$, the rank of a linear map is the dimension of its image.
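In finite dimensions these spaces can be computed directly. As an illustration of our own (the matrix below is an arbitrary choice, not from the text), this Python sketch computes $$\text{rank}(T)$$ for a matrix by Gaussian elimination in exact rational arithmetic and then reads off $$\dim \text{Ker}(T)$$ from the rank-nullity theorem, which applies to linear maps on finite-dimensional spaces.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows), via Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivot rows found so far
    for c in range(n_cols):
        # Look for a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from every other row.
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r

# T: R^3 -> R^3, whose third column is the sum of the first two.
T = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 2]]
rank_T = rank(T)                 # dim Im(T)
nullity_T = len(T[0]) - rank_T   # dim Ker(T), by rank-nullity
print(rank_T, nullity_T)         # 2 1
```

Here $$\text{Ker}(T)$$ is spanned by $$(1, 1, -1)$$, which matches the nullity of one.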

The first property of linear transformations that we are interested in is boundedness.

Definition 4.3 (Bounded) Suppose $$\mathbb{X}_1$$ and $$\mathbb{X}_2$$ are normed vector spaces. A linear transformation $$T: \mathbb{X}_1 \rightarrow \mathbb{X}_2$$ is said to be bounded if there exists a finite constant $$C >0$$ such that, \begin{align*} \lVert Tx \rVert_2 \leq C\lVert x \rVert_1 \quad \text{for all } x \in \mathbb{X}_1, \end{align*} where $$\lVert \cdot \rVert_i$$ is the norm of $$\mathbb{X}_i$$, $$i=1,2$$.

One way to think about boundedness is that it means the linear transformation takes bounded sets in its domain to bounded sets in its image. Here, a bounded set is one which can be completely contained within an open ball centered at some point in the space. Another equivalent interpretation is that bounded linear transformations are restricted in how much they can stretch elements under their mapping, and this restriction is uniform over its entire domain (because $$C$$ doesn’t depend on $$x$$).

For two normed vector spaces $$\mathbb{X}_1$$ and $$\mathbb{X}_2$$, we denote the space of all bounded linear transformations from $$\mathbb{X}_1$$ to $$\mathbb{X}_2$$ as $$\mathfrak{B}(\mathbb{X}_1, \mathbb{X}_2)$$. We also give elements of this space the special name linear operators. When $$\mathbb{X}_1 = \mathbb{X}_2 = \mathbb{X}$$ we simplify the notation for the space of associated operators to $$\mathfrak{B}(\mathbb{X})$$. The next theorem illustrates one of the most useful consequences of boundedness.

Theorem 4.1 (Bounded iff Uniformly Continuous) A linear transformation $$T$$ between two normed vector spaces $$(\mathbb{X}_1, \lVert \cdot \rVert_1)$$ and $$(\mathbb{X}_2, \lVert \cdot \rVert_2)$$ is uniformly continuous if and only if it is bounded.

Proof. $$(\text{Uniform Continuity } \implies \text{ Boundedness})$$ By the definition of uniform continuity we know that, for any $$\epsilon >0$$ we can find an associated $$\delta >0$$ such that when $$\lVert x-y \rVert_1 \leq \delta$$ we also have $$\lVert Tx - Ty \rVert_2 \leq \epsilon$$. By selecting $$y=0$$ and $$\epsilon=1$$ we see that there exists a $$\delta_1$$ such that whenever $$\lVert x \rVert_1 \leq \delta_1$$ we have $$\lVert Tx \rVert_2 \leq 1$$. It follows that for any $$x \neq 0$$ we can write, \begin{align*} \lVert Tx \rVert_2 &= \left\lVert \frac{\delta_1 \lVert x \rVert_1}{\delta_1 \lVert x \rVert_1} Tx \right\rVert_2 \\ &= \frac{\lVert x \rVert_1}{\delta_1} \left\lVert T \left(\frac{\delta_1 x}{\lVert x \rVert_1}\right) \right\rVert_2 \\ &\leq \delta_1^{-1} \lVert x \rVert_1 \end{align*} where the last line holds because $$\lVert \delta_1 x/ \lVert x \rVert_1 \rVert_1 = \delta_1$$. Since the inequality holds trivially for $$x = 0$$, $$T$$ is bounded with $$C = \delta_1^{-1}$$.

$$(\text{Boundedness } \implies \text{ Uniform Continuity})$$ Suppose $$\lVert Tx \rVert_2 \leq C\lVert x \rVert_1$$ for all $$x \in \mathbb{X}_1$$. For any $$x, y \in \mathbb{X}_1$$, linearity gives $$\lVert Tx - Ty \rVert_2 = \lVert T(x - y) \rVert_2 \leq C \lVert x - y \rVert_1$$. Given $$\epsilon > 0$$, set $$\delta = \epsilon/C$$; then $$\lVert x - y \rVert_1 < \delta$$ implies $$\lVert Tx - Ty \rVert_2 < \epsilon$$. Since $$\delta$$ depends only on $$\epsilon$$ and not on $$x$$ or $$y$$, $$T$$ is uniformly continuous (and in particular continuous).

As a result of Theorem 4.1 the space $$\mathfrak{B}(\mathbb{X}_1, \mathbb{X}_2)$$ is also the space of uniformly continuous linear transformations from $$\mathbb{X}_1$$ to $$\mathbb{X}_2$$. By the properties of linearity, $$\mathfrak{B}(\mathbb{X}_1, \mathbb{X}_2)$$ is a vector space and if we define, \begin{align*} \lVert T \rVert = \sup_{x \in \mathbb{X}_1, \lVert x \rVert_1=1}\lVert Tx \rVert_2, \end{align*} then it is a normed vector space.

Why this particular norm? With the operator norm defined this way, for any $$x \neq 0$$ we have \begin{align*} \lVert Tx \rVert_2 &= \left\lVert T\frac{x}{\lVert x \rVert_1} \right\rVert_2 \lVert x \rVert_1 \leq \lVert T \rVert \lVert x \rVert_1, \end{align*} so $$\lVert T \rVert$$ is an admissible constant $$C$$ in Definition 4.3, and in fact the smallest one. In words, the operator norm is the largest factor by which $$T$$ can stretch a unit vector. Additionally, $$\mathfrak{B}(\mathbb{X}_1,\mathbb{X}_2)$$ is a Banach space under this norm whenever $$\mathbb{X}_2$$ is complete (cf. Theorem 3.1.3 in the text for details).
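For a matrix acting between Euclidean spaces, the operator norm can be checked numerically. The sketch below is a finite-dimensional illustration under our own assumptions (the matrix `T` and the helper names are hypothetical): it approximates $$\sup_{\lVert x \rVert=1}\lVert Tx \rVert$$ by sampling unit vectors in $$\mathbb{R}^2$$ and compares the result with the standard closed form for the Euclidean operator norm, the square root of the largest eigenvalue of $$T^\ast T$$ (for a real matrix, $$T^\ast$$ is the transpose).

```python
import math

def apply(T, x):
    """Matrix-vector product Tx, with T stored as a list of rows."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def norm(x):
    """Euclidean norm."""
    return math.sqrt(sum(v * v for v in x))

def operator_norm_estimate(T, n_angles=20_000):
    """Approximate sup_{||x||=1} ||Tx|| for T: R^2 -> R^m by sampling the unit circle."""
    best = 0.0
    for k in range(n_angles):
        theta = 2 * math.pi * k / n_angles
        x = [math.cos(theta), math.sin(theta)]  # a unit vector in R^2
        best = max(best, norm(apply(T, x)))
    return best

T = [[3.0, 1.0],
     [0.0, 2.0]]

# Closed form: ||T|| is the square root of the largest eigenvalue of T^T T = [[a, b], [b, d]].
a = T[0][0] ** 2 + T[1][0] ** 2
b = T[0][0] * T[0][1] + T[1][0] * T[1][1]
d = T[0][1] ** 2 + T[1][1] ** 2
exact = math.sqrt((a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b ** 2))

estimate = operator_norm_estimate(T)
print(estimate, exact)  # both approx. 3.2566

# ||Tx|| <= ||T|| ||x|| also holds for a non-unit x.
x = [-2.0, 5.0]
print(norm(apply(T, x)) <= exact * norm(x))  # True
```

Sampling only works in very low dimension; it is meant to make the supremum in the definition tangible, not to be an efficient algorithm.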

Example 4.1 (Integral Operators on L2) Consider $$\mathbb{L}^2[0,1]$$ and a function $$K$$ on $$[0,1] \times [0,1]$$ that is square-integrable. We then define a linear transformation $$T$$ of the form, \begin{align*} (Tf)(\cdot) = \int_{[0,1]} K(\cdot, u)f(u) \,du. \end{align*} In general, $$T$$ is called an integral transform and $$K$$ is known as the kernel of $$T$$ (not to be confused with $$\text{Ker}(T)$$ from Definition 4.2). We now show that this particular $$T$$ is an operator. Linearity follows from the linearity of the integral. For boundedness, first use the Cauchy-Schwarz inequality to write, \begin{align*} \lvert(Tf)(t) \rvert^2 &= \left\lvert \int_{[0,1]} K(t, u)f(u) \,du \right\rvert^2 \\ &= \lvert \langle K(t, \cdot) , f \rangle \rvert^2 \\ &\leq \lVert K(t,\cdot) \rVert^2 \lVert f \rVert^2, \end{align*} where the norm is that of $$\mathbb{L}^2[0,1]$$. Integrating over $$t$$, it follows that, \begin{align*} \int _{[0,1]} \lvert(Tf)(t) \rvert^2 \,dt &\leq \lVert f \rVert^2 \int_{[0,1]} \int_{[0,1]} K(t,u)^2 \,du \,dt < \infty. \end{align*} Hence $$Tf$$ is in $$\mathbb{L}^2[0,1]$$ for all $$f$$, with $$\lVert Tf \rVert \leq \lVert K \rVert_{\mathbb{L}^2([0,1]^2)} \lVert f \rVert$$, where $$\lVert K \rVert_{\mathbb{L}^2([0,1]^2)}^2 = \int_{[0,1]} \int_{[0,1]} K(t,u)^2 \,du \,dt$$. Therefore $$T$$ is an element of $$\mathfrak{B}\big(\mathbb{L}^2[0,1] \big)$$ with $$\lVert T \rVert \leq \lVert K \rVert_{\mathbb{L}^2([0,1]^2)}$$.
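To check the inequality derived above numerically, the following sketch discretizes $$T$$ with the midpoint rule. The kernel $$K(t,u) = \min(t,u)$$ and the input $$f \equiv 1$$ are illustrative choices of ours; for them one can compute by hand that $$(Tf)(t) = t - t^2/2$$, so $$\lVert Tf \rVert = \sqrt{2/15}$$, and $$\lVert K \rVert_{\mathbb{L}^2([0,1]^2)} = \sqrt{1/6}$$, which lets the discretized values be compared against exact ones.

```python
import math

def K(t, u):
    # Illustrative square-integrable kernel: min(t, u), the covariance of Brownian motion.
    return min(t, u)

def f(u):
    return 1.0  # a simple element of L^2[0,1]

n = 400
h = 1.0 / n
grid = [(i + 0.5) * h for i in range(n)]  # midpoints of n equal subintervals of [0, 1]

def l2_norm(values):
    """Midpoint-rule approximation of the L^2[0,1] norm of values sampled on the grid."""
    return math.sqrt(sum(v * v for v in values) * h)

# (Tf)(t) = int_0^1 K(t, u) f(u) du, approximated by the midpoint rule at each grid point.
Tf = [sum(K(t, u) * f(u) for u in grid) * h for t in grid]

norm_Tf = l2_norm(Tf)
norm_f = l2_norm([f(u) for u in grid])
# ||K|| in L^2([0,1]^2), again by the midpoint rule.
norm_K = math.sqrt(sum(K(t, u) ** 2 for t in grid for u in grid) * h * h)

print(norm_Tf, norm_K * norm_f)  # approx. 0.3651 = sqrt(2/15) and 0.4082 = sqrt(1/6)
```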

Next we introduce a very useful theorem which clarifies how linear operators between Banach spaces and the Bochner integral interact.

Theorem 4.2 (Bounded Operators and Bochner Integral Commute) If $$\mathbb{X}_1$$ and $$\mathbb{X}_2$$ are Banach spaces, $$f$$ is a Bochner integrable function from a measure space $$(E, \mu)$$ to $$\mathbb{X}_1$$ and $$T \in \mathfrak{B}(\mathbb{X}_1, \mathbb{X}_2)$$, then the function $$Tf: E \rightarrow \mathbb{X}_2$$ is Bochner integrable and its integral is, \begin{align*} \int_E Tf \, d\mu = T\left( \int_E f \, d\mu\right). \end{align*}

Proof. Let $$\{f_n\}$$ be a sequence of simple functions such that $$\int_E \lVert f_n -f \rVert d\mu \rightarrow 0$$ as $$n \rightarrow \infty$$. Since $$T$$ is linear, $$Tf_n$$ is also simple for each $$n$$ and $$\int_E Tf_n \,d\mu = T \int_E f_n \,d\mu$$. Moreover, $$\left\lVert \int_E f_n \,d\mu - \int_E f \,d\mu \right\rVert \leq \int_E \lVert f_n - f \rVert \,d\mu \rightarrow 0$$, so by the continuity of $$T$$ we get $$T \int_E f_n \,d\mu \rightarrow T \int_E f \,d\mu$$, and hence $$\int_E Tf_n \,d\mu \rightarrow T\int_E f \,d\mu$$ as well.

Additionally, using the boundedness of $$T$$ we have, \begin{align*} \int_E \lVert Tf - Tf_n \rVert \, d\mu \leq \lVert T \rVert \int_E \lVert f - f_n \rVert \, d\mu \rightarrow 0, \end{align*} by the assumptions on $$f_n$$. Hence, $$Tf$$ is Bochner integrable and $$\int_E Tf_n \, d\mu \rightarrow \int_E Tf \, d\mu$$, which finishes the proof.
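A toy case makes Theorem 4.2 concrete. Assume, for illustration only, a finite measure space $$E$$ with four points carrying the weights in `weights`, so that every $$f: E \rightarrow \mathbb{R}^3$$ is simple and its Bochner integral is just a weighted sum; $$T$$ is a hypothetical $$2 \times 3$$ matrix. In this setting the theorem reduces to the linearity of $$T$$, but the sketch computes the two sides separately.

```python
def apply(T, x):
    """Matrix-vector product Tx, with T stored as a list of rows."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def integrate(values, weights):
    """Bochner integral over a finite measure space: the weighted sum of the values."""
    dim = len(values[0])
    return [sum(w * v[k] for w, v in zip(weights, values)) for k in range(dim)]

# Hypothetical measure space E = {0, 1, 2, 3} with point masses mu({i}) = weights[i].
weights = [0.5, 1.0, 0.25, 2.0]
# f: E -> X_1 = R^3, listed pointwise.
f_values = [[1.0, 0.0, 2.0], [0.0, -1.0, 1.0], [4.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
# T: X_1 = R^3 -> X_2 = R^2.
T = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]

lhs = integrate([apply(T, v) for v in f_values], weights)  # integral of Tf
rhs = apply(T, integrate(f_values, weights))               # T applied to the integral of f
print(lhs, rhs)  # [6.5, -2.5] [6.5, -2.5]
```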

## 4.2 Linear Functionals

For bounded linear operators between normed vector spaces, a special case worth investigating is when the codomain is the real line, $$\mathbb{X}_2 = \mathbb{R}$$. In this case, $$\mathfrak{B}(\mathbb{X}, \mathbb{R})$$ is called the dual space of $$\mathbb{X}$$, and the elements of the dual space are referred to as linear functionals.

Our first result regarding linear functionals is the well-known Riesz Representation Theorem, which specifies a special form for the elements of the dual space when $$\mathbb{X} = \mathbb{H}$$ is a Hilbert space.

Theorem 4.3 (Riesz Representation Theorem) Let $$\mathbb{H}$$ be a Hilbert space with inner-product $$\langle \cdot,\cdot \rangle$$ and norm $$\lVert \cdot \rVert$$. For $$T \in \mathfrak{B}(\mathbb{H}, \mathbb{R})$$ there is a unique element $$e_T \in \mathbb{H}$$, called the representer of $$T$$, with the property that, \begin{align*} Tx = \langle x, e_T \rangle,\quad \text{and} \quad \lVert T \rVert = \lVert e_T \rVert, \end{align*} for all $$x \in \mathbb{H}$$.

Besides the useful representation, which gives the theorem its name, we also get as a corollary that the dual space of $$\mathbb{H}$$ is isomorphic to $$\mathbb{H}$$; in fact, it is isometrically isomorphic. This means that if $$S$$ is the isomorphism that takes an element $$x$$ of $$\mathbb{H}$$ to its dual element $$T_x$$, the functional with representer $$x$$ (i.e., $$T_x y = \langle y, x \rangle$$), then $$S$$ is also an isometry, so that $$\lVert Sx \rVert = \lVert x \rVert$$. We can confirm this by writing $$\lVert Sx \rVert = \lVert T_x \rVert = \lVert x \rVert$$, where the first equality holds by definition of $$S$$ and the second holds by the theorem. Hence, both the algebraic and the geometric structure of $$\mathbb{H}$$ are preserved under its dual mapping.
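In $$\mathbb{R}^n$$ with the usual inner product, the representer can be read off from an orthonormal basis $$\{u_i\}$$ as $$e_T = \sum_i (Tu_i)\, u_i$$, since $$Tx = \sum_i \langle x, u_i \rangle Tu_i$$. The functional below is an arbitrary example of ours; the sketch checks $$Tx = \langle x, e_T\rangle$$ and that $$\lvert Tx \rvert$$ reaches $$\lVert e_T\rVert$$ at the unit vector $$x = e_T/\lVert e_T \rVert$$, so that $$\lVert T \rVert = \lVert e_T \rVert$$.

```python
import math

def T(x):
    """An illustrative bounded linear functional on R^3."""
    return 2.0 * x[0] - x[1] + 0.5 * x[2]

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

n = 3
basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # orthonormal basis of R^3
e_T = [T(u) for u in basis]  # coordinates of the representer: e_T = sum_i T(u_i) u_i
print(e_T)  # [2.0, -1.0, 0.5]

x = [0.3, -1.2, 4.0]
print(T(x), inner(x, e_T))  # Tx = <x, e_T>, equal up to rounding

# ||T|| = ||e_T||: |Tx| <= ||e_T|| ||x|| by Cauchy-Schwarz, with equality at x = e_T / ||e_T||.
norm_e = math.sqrt(inner(e_T, e_T))
print(abs(T([v / norm_e for v in e_T])), norm_e)  # both approx. 2.2913
```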

Example 4.2 Referring back to Theorem 4.2, set $$\mathbb{X}_1$$ to be a Hilbert space with inner product $$\langle \cdot, \cdot \rangle$$ and set $$\mathbb{X}_2$$ to be $$\mathbb{R}$$. Then the linear map $$T$$ in the theorem is an element of $$\mathfrak{B}(\mathbb{X}_1, \mathbb{R})$$ and hence a linear functional. Letting $$e_T$$ be its representer, we have \begin{align*} \int_E \langle f,e_T \rangle \, d\mu = \int_E Tf \, d\mu = T\left( \int_E f \, d\mu\right) = \left\langle \int_E f \, d\mu, e_T \right\rangle. \end{align*} Since every $$y \in \mathbb{X}_1$$ is the representer of the bounded linear functional $$x \mapsto \langle x, y \rangle$$, this identity holds with $$e_T$$ replaced by any element $$y$$ of $$\mathbb{X}_1$$; in other words, the Bochner integral commutes with taking inner products.

The dual space $$\mathfrak{B}(\mathbb{X}, \mathbb{R})$$ induces a topology which is different from the topology defined by the norm on $$\mathbb{X}$$, which we make formal in the next definition.

Definition 4.4 (Weak Convergence) A sequence $$\{x_n\}$$ in a Banach space $$\mathbb{X}$$ converges weakly to $$x$$ if $$\ell(x_n) \rightarrow \ell(x)$$ for every $$\ell$$ in $$\mathfrak{B}(\mathbb{X}, \mathbb{R})$$.

This differs from the topology defined by the norm in the sense that convergence in norm, also called strong convergence, implies weak convergence, but the converse does not hold in general. For Hilbert spaces, the Riesz Representation Theorem shows that weak convergence can be characterized as $$\langle x_n, y \rangle \rightarrow \langle x,y \rangle$$ for every $$y \in \mathbb{H}$$. Note that this does not follow from our earlier proof of the continuity of the inner product, since we have not assumed $$x_n \rightarrow x$$ here. Another way to describe the weak topology is as the coarsest topology on $$\mathbb{X}$$ for which every element of $$\mathfrak{B}(\mathbb{X}, \mathbb{R})$$ is a continuous function.
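A standard example of weak convergence without strong convergence is the orthonormal sequence $$\{e_n\}$$ in $$\ell^2$$, the space of square-summable sequences: for every $$y \in \ell^2$$ we have $$\langle e_n, y \rangle = y_n \rightarrow 0$$ because $$\sum_n y_n^2 < \infty$$, yet $$\lVert e_n - 0 \rVert = 1$$ for all $$n$$. The sketch below illustrates this for one particular $$y$$, with $$\ell^2$$ truncated to finitely many coordinates purely so it can run; the truncation length `N` is our assumption.

```python
import math

N = 10_000  # truncation of l^2 to its first N coordinates, only so the example can run

def e(n):
    """The n-th standard basis vector (1-indexed) of the truncated l^2."""
    v = [0.0] * N
    v[n - 1] = 1.0
    return v

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

y = [1.0 / k for k in range(1, N + 1)]  # square-summable, since sum 1/k^2 < infinity

for n in (1, 10, 100, 1000):
    en = e(n)
    # <e_n, y> = 1/n -> 0, while ||e_n - 0|| = 1 for every n.
    print(n, inner(en, y), math.sqrt(inner(en, en)))
```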

## 4.3 Special Operators on Hilbert Spaces

In this section we will restrict our attention to operators acting between Hilbert spaces. This continues our development of operator properties that we will eventually need for describing the covariance of a function-valued random variable. Many of these concepts have natural analogs among commonly used concepts in multivariate analysis, such as nonnegative definiteness, square roots, projections and adjoints. We begin with the last of these.

Theorem 4.4 (Adjoint Operator) For Hilbert spaces, $$(\mathbb{H}_1, \langle\cdot ,\cdot \rangle_1)$$ and $$(\mathbb{H}_2, \langle \cdot , \cdot \rangle_2)$$, every element $$T$$ of $$\mathfrak{B}(\mathbb{H}_1, \mathbb{H}_2)$$ has a unique corresponding element of $$\mathfrak{B}(\mathbb{H}_2, \mathbb{H}_1)$$ called the adjoint of $$T$$ and denoted by $$T^\ast$$ which is determined by, \begin{align*} \langle Tx_1,x_2 \rangle_2 = \langle x_1, T^\ast x_2 \rangle_1, \end{align*} for all $$x_1 \in \mathbb{H}_1$$ and $$x_2 \in \mathbb{H}_2$$.

Proof. Fix $$x_2 \in \mathbb{H}_2$$ and consider the function $$f: \mathbb{H}_1 \rightarrow \mathbb{R}$$ defined by $$f(x_1) = \langle Tx_1,x_2 \rangle_2$$. By the linearity of $$T$$ and of the inner product, $$f$$ is linear, and by the Cauchy-Schwarz inequality $$\lvert f(x_1) \rvert \leq \lVert T \rVert \lVert x_2 \rVert_2 \lVert x_1 \rVert_1$$, so $$f$$ is bounded. Hence, the Riesz Representation Theorem guarantees the existence of a unique $$y$$ in $$\mathbb{H}_1$$ such that $$f(x_1) = \langle x_1, y\rangle_1$$ for all $$x_1$$. We then define $$T^\ast x_2=y$$, which can be shown to be a linear map using the linearity of the inner product and the uniqueness of the representer. To show $$T^\ast$$ belongs to $$\mathfrak{B}(\mathbb{H}_2, \mathbb{H}_1)$$ we write, \begin{align*} \lVert T^\ast x_2 \rVert^2_1 &= \lvert \langle T^\ast x_2,T^\ast x_2 \rangle_1 \rvert \\ &= \lvert \langle T T^\ast x_2, x_2 \rangle_2 \rvert \\ &\overset{\text{C-S}}{\leq} \lVert TT^\ast x_2 \rVert_2 \lVert x_2 \rVert_2 \\ &\leq \lVert T \rVert \lVert T^\ast x_2 \rVert_1 \lVert x_2 \rVert_2. \end{align*} Dividing by $$\lVert T^\ast x_2 \rVert_1$$ when it is nonzero (the case $$T^\ast x_2 = 0$$ is trivial), we get $$\lVert T^\ast x_2 \rVert_1 \leq \lVert T \rVert \lVert x_2 \rVert_2$$, so that $$T^\ast$$ is bounded.

When $$T \in \mathfrak{B}(\mathbb{H})$$ and $$T^\ast = T$$, we call $$T$$ self-adjoint. The defining identity in Theorem 4.4, together with the uniqueness of the adjoint, also gives $$(T^\ast)^\ast = T$$. We collect some of the most useful properties of the adjoint in the next theorem.

Theorem 4.5 (Adjoint Properties) Let $$T$$ be a bounded linear operator between Hilbert spaces $$\mathbb{H}_1$$ and $$\mathbb{H}_2$$. Then,

1. $$(T^\ast)^\ast = T$$,
2. $$\lVert T^\ast \rVert = \lVert T \rVert$$,
3. $$\lVert T^\ast T \rVert = \lVert T \rVert^2$$,
4. $$\text{rank}(T^\ast) = \text{rank}(T)$$,
5. $$\text{Ker}(T^\ast T) = \text{Ker}(T)$$ and $$\overline{\text{Im}(T^\ast T)} = \overline{\text{Im}(T^\ast)}$$, and
6. $$\mathbb{H}_1 = \text{Ker}(T) \oplus \overline{\text{Im}(T^\ast)} = \text{Ker}(T^\ast T) \oplus \overline{\text{Im}(T^\ast T)}$$

We omit the proof and take these as given. Interested readers may see the proof of Theorem 3.3.7 on page 73 in for additional details.

Example 4.3 (Adjoint of Integral Operator) We build on our discussion of integral operators in Example 4.1. If $$g$$ is another element of $$\mathbb{L}^2[0,1]$$ then, \begin{align*} \langle Tf,g \rangle &= \int_{[0,1]} \left( \int_{[0,1]}K(t,s)f(s)\,ds \right) g(t)\,dt \\ &= \int_{[0,1]} \int_{[0,1]} K(t,s)f(s)g(t)\, ds\,dt \\ &\overset{\text{Fubini}}{=} \int_{[0,1]} f(s) \int_{[0,1]}K(t,s)g(t)\,dt\,ds \\ &= \left\langle f, \int_{[0,1]}K(t,\cdot)g(t)\,dt \right\rangle. \end{align*} We therefore have $$(T^\ast g)(t) = \int_{[0,1]} K(s,t)g(s)\,ds$$; that is, the adjoint is the integral operator whose kernel has its arguments swapped. It follows that $$T$$ is self-adjoint if $$K$$ is symmetric, i.e., $$K(s,t) = K(t,s)$$.
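Discretizing on a midpoint grid turns $$T$$ into a matrix, and the formula for $$T^\ast$$ in Example 4.3 becomes the matrix transpose. The sketch below uses a non-symmetric kernel of our own choosing, $$K(t,s) = e^t s + t^2$$, together with arbitrary test functions $$f(t) = \sin(3t)$$ and $$g(t) = 1 + t$$, and checks that $$\langle Tf, g\rangle = \langle f, T^\ast g\rangle$$ up to rounding while $$Tf \neq T^\ast f$$.

```python
import math

def K(t, s):
    # Illustrative non-symmetric kernel on [0,1] x [0,1].
    return math.exp(t) * s + t * t

n = 200
h = 1.0 / n
grid = [(i + 0.5) * h for i in range(n)]  # midpoint grid on [0, 1]

def T(f_vals):
    """(Tf)(t) = int K(t, s) f(s) ds, by the midpoint rule."""
    return [sum(K(t, s) * fs for s, fs in zip(grid, f_vals)) * h for t in grid]

def T_adj(g_vals):
    """(T* g)(t) = int K(s, t) g(s) ds: the kernel with its arguments swapped."""
    return [sum(K(s, t) * gs for s, gs in zip(grid, g_vals)) * h for t in grid]

def inner(a, b):
    """Midpoint-rule approximation of the L^2[0,1] inner product."""
    return sum(x * y for x, y in zip(a, b)) * h

f_vals = [math.sin(3 * t) for t in grid]
g_vals = [1.0 + t for t in grid]
print(inner(T(f_vals), g_vals), inner(f_vals, T_adj(g_vals)))  # agree up to rounding

# K is not symmetric, so T is not self-adjoint: Tf and T*f differ.
print(max(abs(p - q) for p, q in zip(T(f_vals), T_adj(f_vals))))
```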

We now introduce the concepts of nonnegative definiteness and positive definiteness as they arise in the context of linear operators.

Definition 4.5 (Nonnegative and Positive Definite Operators) Let $$\mathbb{H}$$ be a Hilbert space and consider $$T \in \mathfrak{B}(\mathbb{H})$$. Then $$T$$ is said to be nonnegative definite if it is self-adjoint and $$\langle Tx,x \rangle \geq 0$$ for all $$x \in \mathbb{H}$$. If $$T$$ is self-adjoint and instead $$\langle Tx,x \rangle > 0$$ for all nonzero $$x \in \mathbb{H}$$, then $$T$$ is said to be positive definite.

Notice that this definition requires $$T$$ to be self-adjoint. When the space of interest is a complex Hilbert space, this criterion turns out to be redundant: the condition $$\langle Tx,x \rangle \geq 0$$ implies self-adjointness in that context. For real Hilbert spaces, however, this implication no longer holds. One need look no further than finite-dimensional Hilbert spaces (e.g. $$\text{dim}(\mathbb{H}) = 2$$) to find counterexamples. So, since we are working with real Hilbert spaces, why has this condition been included? My best guess is that many of the properties we are interested in for nonnegative operators also require self-adjointness, and the authors built it into the definition so that results assuming both properties are simpler to state.
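A concrete counterexample in $$\mathbb{R}^2$$ is rotation by $$90^\circ$$, $$T(x_1, x_2) = (-x_2, x_1)$$. Then $$\langle Tx, x \rangle = -x_2x_1 + x_1x_2 = 0 \geq 0$$ for every $$x$$, yet the adjoint, which for a real matrix is its transpose, differs from $$T$$. The short sketch below checks both facts for a few sample vectors.

```python
def apply(T, x):
    """Matrix-vector product Tx, with T stored as a list of rows."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def transpose(T):
    """Matrix of the adjoint T* with respect to the standard inner product."""
    return [list(col) for col in zip(*T)]

# Rotation of R^2 by 90 degrees: T(x1, x2) = (-x2, x1).
T = [[0.0, -1.0],
     [1.0, 0.0]]

for x in ([1.0, 0.0], [2.0, -3.0], [0.7, 0.4]):
    print(inner(apply(T, x), x))  # 0.0 each time, so <Tx, x> >= 0 holds

print(transpose(T) == T)  # False: T* != T, so T is not self-adjoint
```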

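An important example of a nonnegative definite operator, pointed out in , is $$T^\ast T$$ for any $$T \in \mathfrak{B}(\mathbb{H}_1, \mathbb{H}_2)$$. Indeed, $$(T^\ast T)^\ast = T^\ast (T^\ast)^\ast = T^\ast T$$, so this operator is self-adjoint, and $$\langle T^\ast Tx,x \rangle_1 = \langle Tx, Tx \rangle_2 = \lVert Tx \rVert_2^2 \geq 0$$.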

Analogous to the finite-dimensional case, we can associate a special operator, called the square-root operator, with each nonnegative definite operator.

Theorem 4.6 (Square-Root Operator) Let $$\mathbb{H}$$ be a Hilbert space and suppose $$T \in \mathfrak{B}(\mathbb{H})$$ is nonnegative definite. Then there is a unique nonnegative definite operator in $$\mathfrak{B}(\mathbb{H})$$, denoted by $$T^{1/2}$$, such that $$T = T^{1/2}T^{1/2}$$ and that commutes with every operator that commutes with $$T$$.

One can find the same result for the finite-dimensional case in as definition 7.33 and result 7.36. Uniqueness here comes from the condition that the square-root operator also be a nonnegative operator.
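In finite dimensions, $$T^{1/2}$$ can be built from an eigendecomposition: if $$T = \sum_j \lambda_j e_j e_j^\top$$ with orthonormal $$e_j$$ and $$\lambda_j \geq 0$$, then $$T^{1/2} = \sum_j \sqrt{\lambda_j}\, e_j e_j^\top$$. The sketch below does this for a symmetric $$2 \times 2$$ matrix of our choosing, using the closed-form eigenpairs of a symmetric $$2\times 2$$ matrix; `eigh_2x2` and `sqrt_psd_2x2` are hypothetical helper names.

```python
import math

def eigh_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], largest eigenvalue first."""
    if b == 0:
        return sorted([(a, [1.0, 0.0]), (d, [0.0, 1.0])], key=lambda p: p[0], reverse=True)
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    pairs = []
    for lam in (mean + radius, mean - radius):
        v = [b, lam - a]  # solves (A - lam I) v = 0 when b != 0
        length = math.hypot(v[0], v[1])
        pairs.append((lam, [v[0] / length, v[1] / length]))
    return pairs

def sqrt_psd_2x2(a, b, d):
    """T^{1/2} = sum_j sqrt(lambda_j) e_j e_j^T for nonnegative definite T = [[a, b], [b, d]]."""
    root = [[0.0, 0.0], [0.0, 0.0]]
    for lam, v in eigh_2x2(a, b, d):
        s = math.sqrt(max(lam, 0.0))  # clip tiny negative values caused by rounding
        for i in range(2):
            for j in range(2):
                root[i][j] += s * v[i] * v[j]
    return root

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

T = [[5.0, 2.0],
     [2.0, 2.0]]  # symmetric with eigenvalues 6 and 1, hence nonnegative definite
R = sqrt_psd_2x2(5.0, 2.0, 2.0)
print(R)
print(matmul(R, R))  # recovers T up to rounding
```

The matrix `R` is symmetric with eigenvalues $$\sqrt{6}$$ and $$1$$, so it is the nonnegative definite root; a matrix such as $$-R$$ also squares to $$T$$ but is not nonnegative definite, which is why that condition pins down uniqueness.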

Now, recall Corollary 3.2 which said that for a Hilbert space $$\mathbb{H}$$ and a closed subspace $$\mathbb{M}$$, for each element $$x \in \mathbb{H}$$ there is a unique element $$\hat{x} \in \mathbb{M}$$ that is closest to $$x$$ in the sense of minimizing the distance to $$x$$ among all elements of $$\mathbb{M}$$. Consider the function $$P_{\mathbb{M}}$$ defined as the function which takes $$x$$ to its closest neighbour in $$\mathbb{M}$$, $$\hat{x} = P_{\mathbb{M}}x$$. We now show that $$P_{\mathbb{M}}$$ is a linear operator, and derive a few of its attributes.

Theorem 4.7 (Projection Operator) Let $$\mathbb{M}$$ be a closed subspace of a Hilbert space $$\mathbb{H}$$. Then the projection $$P_{\mathbb{M}}$$ associated with $$\mathbb{M}$$ is such that,

1. $$P_{\mathbb{M}}$$ is an element of $$\mathfrak{B}(\mathbb{H})$$,
2. $$\langle P_{\mathbb{M}} x,y \rangle = \langle x,P_{\mathbb{M}}y \rangle$$, that is, $$P_{\mathbb{M}}$$ is self-adjoint,
3. $$P_{\mathbb{M}}P_{\mathbb{M}} = P_{\mathbb{M}}$$, that is, $$P_{\mathbb{M}}$$ is idempotent, and
4. $$\lVert P_{\mathbb{M}} \rVert =1$$ (provided $$\mathbb{M} \neq \{0\}$$).

Proof. To see that $$P_{\mathbb{M}}$$ is linear, take $$a_1, a_2$$ in $$\mathbb{R}$$, $$x_1, x_2$$ in $$\mathbb{H}$$ and $$y$$ in $$\mathbb{M}$$, and write, \begin{align*} \langle a_1 P_{\mathbb{M}}x_1 + a_2 P_{\mathbb{M}}x_2,y \rangle &= a_1\langle P_{\mathbb{M}}x_1 ,y \rangle + a_2\langle P_{\mathbb{M}}x_2 ,y \rangle \\ &= a_1\langle x_1 ,y \rangle + a_2\langle x_2 ,y \rangle \\ &= \langle a_1x_1 + a_2x_2,y \rangle, \end{align*} where we have used the fact that $$\langle \hat{x}, y \rangle = \langle x,y \rangle$$ for all $$y \in \mathbb{M}$$ from Corollary 3.2. On the other hand, $$\langle P_{\mathbb{M}}(a_1x_1 + a_2x_2), y \rangle = \langle a_1x_1 + a_2x_2, y \rangle$$ by the same result. Both $$a_1 P_{\mathbb{M}}x_1 + a_2 P_{\mathbb{M}}x_2$$ and $$P_{\mathbb{M}}(a_1x_1 + a_2x_2)$$ lie in $$\mathbb{M}$$, so their difference lies in $$\mathbb{M}$$ and is orthogonal to every element of $$\mathbb{M}$$, including itself; hence the difference is zero and $$P_{\mathbb{M}}$$ is linear.

To show that it is self-adjoint, \begin{align*} \langle P_{\mathbb{M}}x_1, x_2 \rangle &= \langle \hat{x}_1, x_2 \rangle \\ &= \langle \hat{x}_1, \hat{x}_2 \rangle \\ &= \langle x_1, \hat{x}_2 \rangle \\ &= \langle x_1, P_{\mathbb{M}}x_2 \rangle. \end{align*}

Now, since $$P_{\mathbb{M}}x$$ is in $$\mathbb{M}$$ for all $$x \in \mathbb{H}$$, the minimization attribute of the projection means that $$P_{\mathbb{M}}\hat{x} = \hat{x}$$ because this is the element in $$\mathbb{M}$$ which is closest to $$\hat{x}$$, and hence $$P_{\mathbb{M}}P_{\mathbb{M}}x = P_{\mathbb{M}}x$$ and $$P_{\mathbb{M}}$$ is idempotent.

To show boundedness, \begin{align*} \lVert P_{\mathbb{M}}x \rVert^2 &= \langle P_{\mathbb{M}}x , P_{\mathbb{M}}x \rangle \\ &= \langle x , P_{\mathbb{M}}P_{\mathbb{M}}x \rangle \\ &= \langle x , P_{\mathbb{M}}x \rangle \\ &\leq \lVert P_{\mathbb{M}}x \rVert \lVert x \rVert, \end{align*} so that $$\lVert P_{\mathbb{M}}x \rVert \leq \lVert x \rVert$$.

Finally, provided $$\mathbb{M} \neq \{0\}$$, pick $$x$$ with $$P_{\mathbb{M}}x \neq 0$$. Then $$\lVert P_{\mathbb{M}}x \rVert = \lVert P_{\mathbb{M}}P_{\mathbb{M}}x \rVert \leq \lVert P_{\mathbb{M}} \rVert \lVert P_{\mathbb{M}}x \rVert$$, so that $$\lVert P_{\mathbb{M}} \rVert \geq 1$$. However, we have also already shown that $$\lVert P_{\mathbb{M}}x \rVert \leq \lVert x \rVert$$ for all $$x$$, and hence $$\lVert P_{\mathbb{M}} \rVert \leq 1$$, from which we get $$\lVert P_{\mathbb{M}} \rVert = 1$$.

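The operator $$P_{\mathbb{M}}$$ is called the projection operator for the subspace $$\mathbb{M}$$. This operator is also nonnegative definite, since it is self-adjoint and, by idempotency and self-adjointness, $$\langle P_{\mathbb{M}}x, x \rangle = \langle P_{\mathbb{M}}P_{\mathbb{M}}x, x \rangle = \langle P_{\mathbb{M}}x,P_{\mathbb{M}}x \rangle \geq 0$$.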

Definition 4.6 (Tensor Product) Let $$x_1, x_2$$ be elements of Hilbert spaces $$\mathbb{H}_1$$ and $$\mathbb{H}_2$$ respectively. The tensor product operator $$(x_1 \otimes_1 x_2): \mathbb{H}_1 \rightarrow \mathbb{H}_2$$ is defined by, \begin{align*} (x_1 \otimes_1 x_2)y = \langle x_1, y \rangle_1 x_2, \end{align*} for $$y$$ in $$\mathbb{H}_1$$. When $$\mathbb{H}_1 = \mathbb{H}_2$$, we drop the subscript and simply write $$\otimes$$.

We can use the tensor product to give an explicit formula for a projection operator when the subspace $$\mathbb{M}$$ is a line. Suppose that $$\mathbb{M}$$ is a subspace spanned by a single vector $$e \in \mathbb{H}$$, and without loss of generality, let $$\lVert e \rVert =1$$. Then, we can write the projection operator $$P_{\mathbb{M}}$$ as $$e \otimes e$$ so that for any $$x \in \mathbb{H}$$ we have $$\hat{x} = P_{\mathbb{M}}x = (e \otimes e)x = \langle x,e \rangle e$$.
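The formula $$P_{\mathbb{M}} = e \otimes e$$ is easy to check numerically against Theorem 4.7. The sketch below, with an arbitrary unit vector `e` and test vectors of our choosing in $$\mathbb{R}^3$$, verifies idempotency, self-adjointness, that the residual $$x - P_{\mathbb{M}}x$$ is orthogonal to $$e$$, and that $$\lVert P_{\mathbb{M}}x\rVert \leq \lVert x\rVert$$.

```python
import math

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def project_onto_line(e, x):
    """(e tensor e)x = <x, e> e, the projection onto span{e} for a unit vector e."""
    c = inner(x, e)
    return [c * ei for ei in e]

e = [1 / math.sqrt(3)] * 3  # unit vector spanning the line M
x = [3.0, -1.0, 2.0]
y = [0.5, 4.0, -2.0]

px = project_onto_line(e, x)
print(px)                        # approx. [1.3333, 1.3333, 1.3333]
print(project_onto_line(e, px))  # same as px: idempotent
print(inner(px, y), inner(x, project_onto_line(e, y)))  # equal up to rounding: self-adjoint
print(inner([xi - pxi for xi, pxi in zip(x, px)], e))   # approx. 0: residual is orthogonal to M
print(inner(px, px) <= inner(x, x))                     # True: ||P x|| <= ||x||
```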

Finally, like all functions, when linear operators are one-to-one and onto (i.e. bijective) they admit an inverse. We state this formally as the next result.

Theorem 4.8 (Inverse Operator) Let $$\mathbb{X}_1$$ and $$\mathbb{X}_2$$ be Banach spaces and $$T$$ a bounded linear operator between them. If $$T$$ is bijective, then the inverse of $$T$$, denoted by $$T^{-1}$$, is an element of $$\mathfrak{B}(\mathbb{X}_2, \mathbb{X}_1)$$ such that $$T^{-1}T$$ is the identity operator on $$\mathbb{X}_1$$ and $$T T^{-1}$$ is the identity operator on $$\mathbb{X}_2$$.

The important takeaway here is that the inverse not only exists but is itself a bounded operator; this result is known as the bounded inverse theorem. Curious readers can find more details in Chapter 3.5 of .

## 4.4 Compact Operators

We next turn our attention to a class of operators known as compact operators. It is within this class that we will be able to derive a form of eigendecomposition, and this property will be the key that lets us extend multivariate principal component analysis to functional principal component analysis.

Definition 4.7 (Compact Operator) Let $$\mathbb{X}_1$$ and $$\mathbb{X}_2$$ be normed vector spaces and $$T$$ a linear transformation from $$\mathbb{X}_1$$ to $$\mathbb{X}_2$$. Then $$T$$ is said to be compact if for any bounded sequence $$\{x_n\}$$ the associated sequence $$\{Tx_n\}$$ contains a convergent subsequence in $$\mathbb{X}_2$$.

Notice that this definition starts from an arbitrary linear transformation rather than an operator. This is because boundedness follows from compactness. Specifically, if $$T$$ is compact but not bounded, then for each $$n$$ there is an $$x_n \in \mathbb{X}_1$$ with $$\lVert x_n \rVert_1 = 1$$ and $$\lVert Tx_n \rVert_2 \geq n$$. The sequence $$\{x_n\}$$ is bounded, but every subsequence of $$\{Tx_n\}$$ is unbounded and therefore not convergent, which is a contradiction. Thus, we are justified in calling these transformations compact operators.

Not all operators are compact, however; one need look no further than the identity operator. For example, consider an infinite-dimensional Hilbert space $$\mathbb{H}$$ containing an orthonormal sequence $$\{e_j\}$$ (e.g. a basis when $$\mathbb{H}$$ is separable), and let $$I$$ be the identity operator. The sequence $$\{e_j\}$$ is bounded, but $$\lVert Ie_i - Ie_j \rVert = \lVert e_i - e_j \rVert = \sqrt{2}$$ for $$i \neq j$$, so no subsequence of $$\{Ie_j\}$$ is Cauchy. It follows that $$\{Ie_j\}$$ cannot contain a convergent subsequence, and therefore the identity operator is not compact. This result is extended in to identity operators on infinite-dimensional normed spaces (theorem 4.1.2, page 92).

To help us better understand compact operators, we now formally state some of their basic properties which hold quite generally.

Theorem 4.9 (Properties of Compact Operators) Let $$\mathbb{X}_1$$ and $$\mathbb{X}_2$$ be normed vector spaces. Then the following properties hold:

1. The closure of the image of any compact operator is separable.
2. Bounded operators with finite rank are compact.
3. The composition of two bounded operators is compact if either operator is compact.
4. If $$\mathbb{X}_2$$ is a Banach space, the set of compact operators in $$\mathfrak{B}(\mathbb{X}_1, \mathbb{X}_2)$$ is closed in the operator norm.

Using these results, we can extend our investigation of the non-compactness of the identity operator to a more general class of operators.

Theorem 4.10 (Bijective Operators are not Compact in Infinite Dimensions) Let $$\mathbb{X}_1$$ and $$\mathbb{X}_2$$ be infinite-dimensional Banach spaces and $$T \in \mathfrak{B}(\mathbb{X}_1, \mathbb{X}_2)$$ bijective. Then $$T$$ is not compact.

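Proof. By Theorem 4.8, $$T^{-1}$$ is an element of $$\mathfrak{B}(\mathbb{X}_2, \mathbb{X}_1)$$. Suppose $$T$$ is a compact operator. Then, by part 3 of Theorem 4.9, $$T^{-1}T = I$$, the identity operator on $$\mathbb{X}_1$$, is compact. This contradicts the fact that the identity operator on an infinite-dimensional normed space is not compact.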

Normally, this is where we would give an intuitive explanation of why bijective operators fail to be compact, since the proof itself is not especially enlightening. It turns out that we can give a fairly concrete reason, but we need a few more results in order to develop it.

Theorem 4.11 (Characterization of Compact Operators on Hilbert Spaces) Let $$\mathbb{H}_1$$ and $$\mathbb{H}_2$$ be Hilbert spaces and $$T: \mathbb{H}_1 \rightarrow \mathbb{H}_2$$ a bounded operator. Then,

1. $$T$$ is compact if and only if there exists a sequence $$\{ T_n\}$$ of finite-rank operators such that $$\lVert T_n - T \rVert\rightarrow 0$$ as $$n\rightarrow \infty$$, and
2. $$T$$ is compact if and only if $$T^\ast$$ is compact.
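Part 1 can be illustrated with a diagonal operator on $$\ell^2$$, $$Te_j = e_j/j$$ (an example of ours, not from the text). Its truncations $$T_n$$, which keep the first $$n$$ diagonal entries, have rank $$n$$, and since the norm of a diagonal operator is the supremum of the absolute values of its diagonal entries, $$\lVert T - T_n \rVert = 1/(n+1) \rightarrow 0$$, so $$T$$ is compact. The same truncations of the identity always leave an error of norm $$1$$. The sketch truncates $$\ell^2$$ to `N` coordinates (our assumption) so it can run.

```python
N = 1000  # keep only the first N coordinates of l^2 so the example can run
diag = [1.0 / j for j in range(1, N + 1)]  # T e_j = e_j / j

def truncation_error(n):
    """||T - T_n||, where T_n keeps the first n diagonal entries of T and zeroes the rest.

    T - T_n is diagonal with entries 1/j for j > n, and the norm of a diagonal
    operator is the largest absolute value of its diagonal entries.
    """
    return max(abs(dj) for dj in diag[n:])

for n in (1, 10, 100):
    print(n, truncation_error(n))  # 1/(n+1), which tends to 0

# The identity has every diagonal entry equal to 1, so the same truncations
# leave an error of norm 1 no matter how large n is.
identity_diag = [1.0] * N
print(max(identity_diag[100:]))  # 1.0
```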

The authors of the textbook finish this section by noting that part 2 of Theorem 4.11 and part 1 of Theorem 4.9 imply that $$\overline{\text{Im}(T^\ast)}$$ is separable. They then claim that, since $$\mathbb{H}_1 = \text{Ker}(T) \oplus \overline{\text{Im}(T^\ast)}$$, we can assume without loss of generality that both $$\mathbb{H}_1$$ and $$\mathbb{H}_2$$ are separable when considering compact operators. On the surface this is a somewhat mysterious conclusion, since we cannot say anything here about the space $$\text{Ker}(T)$$. Upon further reflection, however, I believe the authors are suggesting that this space is inconsequential. Indeed, the decomposition above says that $$\overline{\text{Im}(T^\ast)}$$ is the orthogonal complement $$\text{Ker}(T)^\perp$$, and the same argument applied to $$T^\ast$$ (using $$(T^\ast)^\ast = T$$) gives $$\overline{\text{Im}(T)} = \text{Ker}(T^\ast)^\perp$$. So the subspaces on which $$T$$ and $$T^\ast$$ act “non-trivially”, namely the orthogonal complements of their kernels, coincide with the closures of the images of $$T^\ast$$ and $$T$$, and these are separable. On $$\text{Ker}(T)$$ the operator simply returns zero, so assuming $$\mathbb{H}_1$$ and $$\mathbb{H}_2$$ are both separable comes with no real loss of generality: the structure of the kernels matters far less than that of the images.

Accordingly, in the remaining sections the Hilbert space $$\mathbb{H}$$ we are working with will be assumed to be separable, unless otherwise specified. The purpose of this will be to utilize Theorem 3.5, which gives the existence of a basis in this case.

## 4.5 Eigenvalues of Compact Operators

The main reason for our study of compact operators was to obtain the results that we now describe in this section. In short, we find that compact self-adjoint operators on Hilbert spaces admit an eigendecomposition and, more generally, that compact operators admit a singular value decomposition. From these results, we will be able to build up a notion of principal component analysis for random functions.

We begin with an appropriate definition of eigen-pair in this context.

Definition 4.8 (Eigen-pair) Let $$T \in \mathfrak{B}(\mathbb{H})$$ and suppose that there exists a $$\lambda \in \mathbb{R}$$ and a nonzero element $$e \in \mathbb{H}$$ such that, \begin{align*} T e = \lambda e. \end{align*} Then $$\lambda$$ is called an eigenvalue of $$T$$, and $$e$$ is an associated eigenvector (or eigenfunction when $$\mathbb{H}$$ is a function space) of $$T$$. We call $$(\lambda, e)$$ an eigen-pair of $$T$$.

We can succinctly identify the eigenvectors associated with a particular eigenvalue $$\lambda$$, together with the zero element, as the eigenspace $$\text{Ker}(T - \lambda I) = \{f \mid Tf = \lambda f\}$$. Eigenvectors corresponding to distinct eigenvalues are necessarily linearly independent, and if the operator $$T$$ is self-adjoint, they are mutually orthogonal. This is Theorem 4.2.2 in .

When strictly considering compact operators, we can make a number of simplifying statements. Specifically, when $$T$$ is compact, $$\text{Ker}(T - \lambda I)$$ is finite-dimensional for any scalar $$\lambda\neq 0$$ and the set of nonzero eigenvalues of $$T$$ is countable. This is Theorem 4.2.3 in . We refine these properties a bit in the next theorem.

Theorem 4.12 (Eigen-Decomposition of Self-Adjoint Compact Operator) Let $$T$$ be a compact and self-adjoint operator on a Hilbert space $$\mathbb{H}$$. The set of nonzero eigenvalues of $$T$$ is either finite or consists of a sequence which tends to zero. Each nonzero eigenvalue has finite multiplicity and eigenvectors corresponding to different eigenvalues are orthogonal. Let $$\{\lambda_j\}$$ be the eigenvalues ordered by decreasing magnitude and $$\{e_j\}$$ the associated orthonormal eigenvectors. Then, $$\{ e_j\}$$ is a basis for $$\overline{\text{Im}(T)}$$ and, \begin{align*} T = \sum_{j=1}^{\infty} \lambda_j e_j \otimes e_j; \end{align*} i.e., for every $$x \in \mathbb{H}$$, \begin{align*} Tx = \sum_{j=1}^\infty \lambda_j \langle x, e_j \rangle e_j. \end{align*}
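For a symmetric matrix, Theorem 4.12 is the familiar spectral theorem. The sketch below computes the eigenpairs of a $$3 \times 3$$ symmetric matrix with the classical cyclic Jacobi rotation method (our choice of algorithm, not one used in the text) and checks the expansion $$Tx = \sum_j \lambda_j \langle x, e_j\rangle e_j$$ and the orthonormality of the eigenvectors. The matrix was chosen so that its eigenvalues are known exactly: $$3+\sqrt{3}$$, $$3$$ and $$3-\sqrt{3}$$.

```python
import math

def jacobi_eigh(A, tol=1e-20, max_sweeps=50):
    """Eigenpairs of a real symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors), where eigenvectors[k] is a unit vector for
    eigenvalues[k], ordered by decreasing magnitude as in Theorem 4.12.
    """
    n = len(A)
    a = [list(map(float, row)) for row in A]                   # driven towards a diagonal matrix
    v = [[float(i == j) for j in range(n)] for i in range(n)]  # product of all rotations so far
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: a <- a J, v <- v J
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rows p and q: a <- J^T a
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    pairs = sorted(((a[k][k], [v[i][k] for i in range(n)]) for k in range(n)),
                   key=lambda pair: abs(pair[0]), reverse=True)
    return [lam for lam, _ in pairs], [vec for _, vec in pairs]

def inner(x, y):
    return sum(p * q for p, q in zip(x, y))

T = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]  # symmetric, i.e. self-adjoint on R^3
lams, vecs = jacobi_eigh(T)
print(lams)  # approx. [4.7321, 3.0, 1.2679], i.e. 3 + sqrt(3), 3, 3 - sqrt(3)

x = [1.0, -2.0, 0.5]
Tx = [inner(row, x) for row in T]
expansion = [0.0, 0.0, 0.0]  # builds sum_j lambda_j <x, e_j> e_j
for lam, e in zip(lams, vecs):
    coef = lam * inner(x, e)
    expansion = [acc + coef * ei for acc, ei in zip(expansion, e)]
print(Tx, expansion)  # agree up to rounding
```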

This result pertains to existence, but it does not directly inform us about how to actually find these eigen-pairs. The next results give the eigenvalues as the solutions of variational problems associated with differing assumptions on the operator $$T$$.

Theorem 4.13 (Eigenvalues of Nonnegative Definite Compact Operator) Let $$T$$ be a compact and nonnegative definite operator with associated eigensequence $$\{ (\lambda_j, e_j)\}$$, ordered so that $$\lambda_1 \geq \lambda_2 \geq \cdots$$. Then, \begin{align*} \lambda_k = \max_{e \in \text{span}\{ e_1, \ldots, e_{k-1}\}^\perp, \, e \neq 0 } \frac{\langle Te, e \rangle}{\lVert e \rVert^2}, \end{align*} for all $$k$$, where $$\text{span}\{ e_1, \ldots, e_{k-1}\}^\perp$$ is the entirety of $$\mathbb{H}$$ when $$k=1$$.

Theorem 4.14 (Eigenvalues of Compact Self-Adjoint Operator) Let $$T$$ be a compact, self-adjoint operator with associated eigensequence $$\{ (\lambda_j, e_j)\}$$, ordered by decreasing magnitude. Then, \begin{align*} \lvert\lambda_k\rvert = \max_{e \in \text{span}\{ e_1, \ldots, e_{k-1}\}^\perp,\; e \neq 0} \frac{\lVert Te\rVert}{\lVert e \rVert}. \end{align*}

Setting $$k=1$$ in this result, we find \begin{align*} \lvert\lambda_1\rvert = \max_{e \in \mathbb{H},\; \lVert e \rVert=1 } \lVert Te \rVert, \end{align*} which is the operator norm $$\lVert T \rVert$$ of $$T$$.
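As a quick numerical check of the identity $$\lVert T \rVert = \lvert \lambda_1 \rvert$$ for self-adjoint $$T$$, here is a small NumPy sketch. The $$3 \times 3$$ symmetric matrix is a hypothetical example chosen so that the eigenvalue of largest magnitude is negative, which shows why the absolute value matters; `np.linalg.norm(T, 2)` is NumPy's spectral (operator) norm.

```python
import numpy as np

# Symmetric matrix whose largest-magnitude eigenvalue (about -4.02) is negative.
T = np.array([[1.0, 2.0, 0.0],
              [2.0, -3.0, 1.0],
              [0.0, 1.0, 0.5]])

lam = np.linalg.eigvalsh(T)            # ascending order
largest_magnitude = np.max(np.abs(lam))

# Operator (spectral) norm: the sup of ||Tx|| over unit vectors x.
op_norm = np.linalg.norm(T, 2)

# Crude check of the sup: random unit vectors never exceed |lambda_1|.
rng = np.random.default_rng(2)
X = rng.standard_normal((3, 10000))
X /= np.linalg.norm(X, axis=0)
best_sampled = np.max(np.linalg.norm(T @ X, axis=0))

print(np.isclose(op_norm, largest_magnitude), best_sampled <= op_norm + 1e-12)  # True True
```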

We can generalize the eigendecomposition to compact operators between two Hilbert spaces; this generalization is called the singular value expansion, or singular value decomposition. The derivation depends on the fact that, if $$T \in \mathfrak{B}(\mathbb{H}_1, \mathbb{H}_2)$$ is compact, then $$T^\ast T$$ is both compact and self-adjoint. It follows from Theorem 4.12 that $$T^\ast T$$ admits an eigendecomposition, and the same holds for $$T T^\ast$$. The key observation is that if $$f_{1j}$$ is a unit eigenfunction of $$T^\ast T$$ with eigenvalue $$\lambda_j^2 > 0$$, then $$Tf_{1j}$$ is an eigenfunction of $$T T^\ast$$ with the same eigenvalue, since $$TT^\ast(Tf_{1j}) = T(T^\ast T f_{1j}) = \lambda_j^2\, Tf_{1j}$$ and $$\lVert Tf_{1j}\rVert_2^2 = \langle T^\ast T f_{1j}, f_{1j}\rangle_1 = \lambda_j^2 > 0$$.

Theorem 4.15 (Singular Value Decomposition (SVD)) Let $$\mathbb{H}_1$$ and $$\mathbb{H}_2$$ be Hilbert spaces and $$T \in \mathfrak{B}(\mathbb{H}_1, \mathbb{H}_2)$$ compact. Let $$\{(\lambda_j^2, f_{1j})\}_{j \in \mathbb{N}}$$ be the sequence of eigenpairs associated with the nonzero eigenvalues of $$T^\ast T$$, with $$\lambda_j > 0$$, and $$\{(\lambda_j^2, f_{2j})\}_{j \in \mathbb{N}}$$ the sequence of eigenpairs associated with $$T T^\ast$$, with $$f_{2j} = \lambda_j^{-1}Tf_{1j}$$. Then, \begin{align*} T = \sum_{j=1}^\infty \lambda_j (f_{1j} \otimes_1 f_{2j}), \end{align*} so that $$Tx = \sum_{j=1}^\infty \lambda_j \langle x, f_{1j} \rangle_1 f_{2j}$$ for any $$x \in \mathbb{H}_1$$.
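The construction behind Theorem 4.15 can be followed step by step for a matrix, viewed as an operator from $$\mathbb{H}_1 = \mathbb{R}^3$$ to $$\mathbb{H}_2 = \mathbb{R}^4$$: diagonalize $$T^\ast T$$, set $$f_{2j} = \lambda_j^{-1} T f_{1j}$$, and rebuild $$T$$. This is a minimal NumPy sketch under the assumption that the random matrix has full column rank (so every $$\lambda_j > 0$$); the result is compared against `np.linalg.svd`.

```python
import numpy as np

# A 4x3 matrix: an operator from H1 = R^3 to H2 = R^4.
rng = np.random.default_rng(3)
T = rng.standard_normal((4, 3))

# Eigenpairs (lambda_j^2, f_1j) of T* T; for real matrices T* is the transpose.
sq, F1 = np.linalg.eigh(T.T @ T)
order = np.argsort(sq)[::-1]            # decreasing order
sq, F1 = sq[order], F1[:, order]
lam = np.sqrt(np.clip(sq, 0.0, None))   # singular values lambda_j

# f_2j = lambda_j^{-1} T f_1j (all lambda_j > 0 because T has full column rank).
F2 = (T @ F1) / lam

# Each f_2j is a unit eigenvector of T T* with the same eigenvalue lambda_j^2.
assert np.allclose(T @ T.T @ F2, F2 * sq)
assert np.allclose(F2.T @ F2, np.eye(3))

# T x = sum_j lambda_j <x, f_1j> f_2j, i.e. T = F2 diag(lambda) F1^T.
T_rebuilt = F2 @ np.diag(lam) @ F1.T
print(np.allclose(T, T_rebuilt))                               # True
print(np.allclose(lam, np.linalg.svd(T, compute_uv=False)))    # True
```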

Details regarding the derivation of this decomposition can be found in Section 4.3 of the textbook. We mention it here so that we can also state the following characterization of compact operators.

Theorem 4.16 (Characterization of Compact Operators) An operator $$T \in \mathfrak{B}(\mathbb{H}_1, \mathbb{H}_2)$$ is compact if and only if the singular value decomposition of Theorem 4.15 holds.

## 4.6 Hilbert-Schmidt Operators

Definition 4.9 (Hilbert-Schmidt Operator) Let $$T \in \mathfrak{B}(\mathbb{H}_1, \mathbb{H}_2)$$ and $$\{e_i\}$$ an orthonormal basis for $$\mathbb{H}_1$$. Then $$T$$ is called a Hilbert-Schmidt operator if $$\sum_{i=1}^\infty \lVert Te_i \rVert_2^2$$ converges.

The finiteness of the sum $$\sum_{i=1}^\infty \lVert Te_i \rVert_2^2$$ means that the norms $$\lVert Te_i \rVert_2$$ must decrease fast enough for their squared sum to converge; in particular, $$\lVert Te_i \rVert_2 \to 0$$. With this in mind, we can interpret the action of a Hilbert-Schmidt operator as a kind of compression of the space: in infinite dimensions, it is impossible for every basis vector to maintain (or exceed) its unit norm under a HS operator. Instead, the norms must diminish in such a way that their collective magnitude, in the sense of the sum of squares, remains bounded.

It is straightforward to show that Hilbert-Schmidt operators are a subclass of compact operators. This is done in the usual way: by constructing a sequence of finite-rank operators that converges (in operator norm) to an arbitrary Hilbert-Schmidt operator. Additionally, Hilbert-Schmidt operators are closed under addition. In fact, the space of all Hilbert-Schmidt operators, denoted by $$\mathfrak{B}_{HS}(\mathbb{H}_1, \mathbb{H}_2)$$, is a Hilbert space under the inner product \begin{align*} \langle T_1, T_2 \rangle_{HS} = \sum_{j=1}^\infty \langle T_1e_j, T_2 e_j \rangle_2, \end{align*} where $$\{e_j\}$$ is an orthonormal basis for $$\mathbb{H}_1$$ (the value does not depend on this choice). The associated norm is $$\lVert T \rVert_{HS}^2 = \sum_{j=1}^\infty \lVert Te_j \rVert_2^2$$, which can also be written as $$\lVert T \rVert_{HS}^2 = \sum_{j=1}^\infty \lambda_j^2$$, the sum of the squared singular values of $$T$$. We formalize this result in the next theorem.
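For matrices, the HS norm is the Frobenius norm, which makes the basis independence of $$\sum_j \lVert Te_j \rVert_2^2$$ and its equality with $$\sum_j \lambda_j^2$$ easy to check. A minimal NumPy sketch; the helper `hs_norm_sq` and the random matrices are hypothetical illustrations.

```python
import numpy as np

rng = np.random.default_rng(4)
T = rng.standard_normal((4, 6))    # an operator from H1 = R^6 to H2 = R^4
T2 = rng.standard_normal((4, 6))

def hs_norm_sq(op, basis):
    """Sum of ||op e_i||^2 over the columns e_i of an orthonormal basis."""
    return sum(np.linalg.norm(op @ basis[:, i]) ** 2 for i in range(basis.shape[1]))

standard = np.eye(6)
other_basis, _ = np.linalg.qr(rng.standard_normal((6, 6)))   # a second orthonormal basis

a = hs_norm_sq(T, standard)
b = hs_norm_sq(T, other_basis)
c = np.sum(np.linalg.svd(T, compute_uv=False) ** 2)   # sum of squared singular values
d = np.linalg.norm(T, "fro") ** 2                      # Frobenius norm squared
print(np.allclose([a, b, c], d))  # True

# HS inner product sum_j <T e_j, T2 e_j>, which for matrices equals trace(T^T T2).
ip = sum((T @ other_basis[:, j]) @ (T2 @ other_basis[:, j]) for j in range(6))
print(np.isclose(ip, np.trace(T.T @ T2)))  # True
```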

Theorem 4.17 (Space of Hilbert-Schmidt Operators is Separable Hilbert Space) Let $$\mathfrak{B}_{\text{HS}}(\mathbb{H}_1, \mathbb{H}_2)$$ be the space of Hilbert-Schmidt operators between separable Hilbert spaces $$\mathbb{H}_1$$ and $$\mathbb{H}_2$$. Then $$\mathfrak{B}_{\text{HS}}(\mathbb{H}_1, \mathbb{H}_2)$$ is a separable Hilbert space when equipped with the HS inner product. Additionally, for any choice of orthonormal bases $$\{e_{1i}\}$$ for $$\mathbb{H}_1$$ and $$\{e_{2j}\}$$ for $$\mathbb{H}_2$$, $$\{e_{1i} \otimes e_{2j}\}$$ is an orthonormal basis for $$\mathfrak{B}_{HS}(\mathbb{H}_1, \mathbb{H}_2)$$.

One key property of Hilbert-Schmidt operators is that the best approximation to a particular HS operator is its truncated singular value expansion. Specifically, for any natural number $$n \geq 1$$, the $$n$$-term singular value expansion of a HS operator $$T$$ is the best approximation in terms of the HS norm among approximations of that form: with $$\lambda_j, f_{1j}, f_{2j}$$ as in Theorem 4.15, \begin{align*} \left\lVert T - \sum_{j=1}^n x_j \otimes_1 y_j \right\rVert_{HS} \geq \left\lVert T - \sum_{j=1}^n \lambda_j (f_{1j} \otimes_1 f_{2j}) \right\rVert_{HS} = \Big( \sum_{j=n+1}^\infty \lambda_j^2 \Big)^{1/2} \end{align*} for all $$x_1, \ldots, x_n \in \mathbb{H}_1$$ and $$y_1, \ldots, y_n \in \mathbb{H}_2$$.
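The optimality of the truncated expansion (the Eckart-Young property in the HS norm) can be illustrated for a matrix: the rank-$$n$$ truncated SVD attains error $$(\sum_{j>n} \lambda_j^2)^{1/2}$$, and randomly drawn rank-$$n$$ competitors $$\sum_{j=1}^n x_j \otimes y_j$$ (represented as $$Y X^\top$$, assuming $$(x \otimes y)z = \langle z, x \rangle y$$) do no better. This NumPy sketch is an illustration, not a proof; the matrix size, $$n = 2$$, and the 1000 random trials are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
T = rng.standard_normal((8, 6))                 # operator from R^6 to R^8
U, s, Vt = np.linalg.svd(T, full_matrices=False)

n = 2
# Truncated SVD: sum_{j<=n} lambda_j (f_1j ⊗ f_2j) = U_n diag(s_n) V_n^T.
T_n = U[:, :n] @ np.diag(s[:n]) @ Vt[:n, :]
best_err = np.linalg.norm(T - T_n, "fro")        # HS (Frobenius) error

# Its error is the square root of the sum of the discarded squared singular values.
assert np.isclose(best_err, np.sqrt(np.sum(s[n:] ** 2)))

# Random rank-n operators sum_j x_j ⊗ y_j, written as Y X^T, never do better.
for _ in range(1000):
    X = rng.standard_normal((6, n))
    Y = rng.standard_normal((8, n))
    assert np.linalg.norm(T - Y @ X.T, "fro") >= best_err

print(best_err)
```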

Example 4.4 (Bochner Integral of Hilbert-Schmidt Mapping) Let $$(E, \mathscr{B}, \mu)$$ be a measure space, let $$\mathbb{H}_1$$ and $$\mathbb{H}_2$$ be separable Hilbert spaces, and let $$\mathscr{G}: E \rightarrow \mathfrak{B}_{HS}(\mathbb{H}_1, \mathbb{H}_2)$$ be a measurable map from $$E$$ into the space of Hilbert-Schmidt operators from $$\mathbb{H}_1$$ to $$\mathbb{H}_2$$.

Then, following from Theorem 3.8, $$\mathscr{G}$$ is Bochner integrable if $$\int_E \lVert \mathscr{G} \rVert_{HS} \,d\mu$$ is finite. In this example, our goal is to show that \begin{align*} \int_E(\mathscr{G}f)\,d\mu = \left( \int_E \mathscr{G} \,d\mu \right) f, \end{align*} for all $$f \in \mathbb{H}_1$$. To do this, we define a new mapping $$\mathscr{H}_f$$ which takes an element $$T \in \mathfrak{B}_{HS}(\mathbb{H}_1, \mathbb{H}_2)$$ to the element $$Tf$$, for a fixed $$f$$ in $$\mathbb{H}_1$$. It is clear that $$\mathscr{H}_f$$ is linear. To show that it is bounded, we use the fact that the operator norm is dominated by the Hilbert-Schmidt norm, $$\lVert T \rVert \leq \lVert T \rVert_{HS}$$: for a unit vector $$x \in \mathbb{H}_1$$, extend $$\{x\}$$ to an orthonormal basis $$\{e_i\}$$ with $$e_1 = x$$, so that $$\lVert Tx \rVert_2^2 \leq \sum_i \lVert Te_i \rVert_2^2 = \lVert T \rVert_{HS}^2$$. The norm of $$\mathscr{H}_f$$ then satisfies \begin{align*} \lVert \mathscr{H}_f \rVert &= \sup_{\substack{T \in \mathfrak{B}_{HS}(\mathbb{H}_1, \mathbb{H}_2) \\ \lVert T \rVert_{HS} =1}} \lVert \mathscr{H}_f T \rVert_2 \\ &= \sup_{\substack{T \in \mathfrak{B}_{HS}(\mathbb{H}_1, \mathbb{H}_2) \\ \lVert T \rVert_{HS} =1}} \lVert Tf \rVert_2 \\ & \leq \sup_{\substack{T \in \mathfrak{B}_{HS}(\mathbb{H}_1, \mathbb{H}_2) \\ \lVert T \rVert_{HS} =1}} \lVert T \rVert \lVert f \rVert_1 \leq \lVert f \rVert_1. \end{align*} Hence, $$\mathscr{H}_f$$ is a bounded linear operator, and therefore is an element of $$\mathfrak{B}\big( \mathfrak{B}_{HS}(\mathbb{H}_1, \mathbb{H}_2), \mathbb{H}_2 \big)$$. Then, we can use Theorem 4.2 to write \begin{align*} \int_E \mathscr{H}_f(\mathscr{G}) \, d\mu = \mathscr{H}_f \left(\int_E \mathscr{G} \, d\mu\right). \end{align*} Since $$\mathscr{H}_f(\mathscr{G}) = \mathscr{G}f$$ pointwise on $$E$$, the left-hand side is $$\int_E (\mathscr{G}f)\,d\mu$$ and the right-hand side is $$\left(\int_E \mathscr{G}\,d\mu\right) f$$. The result follows by noting that $$f$$ was arbitrary.
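A finite-dimensional analogue of Example 4.4: when $$E = [0,1]$$ with Lebesgue measure and $$\mathscr{G}(s)$$ is a matrix-valued function, integrating $$\mathscr{G}f$$ and applying $$\int_E \mathscr{G}\,d\mu$$ to $$f$$ give the same result. This NumPy sketch is only an illustration; the map `G`, the matrices `A0` and `A1`, and the trapezoid-rule approximation of the integral are assumptions made for the example.

```python
import numpy as np

# Hypothetical operator-valued map on E = [0, 1]: G(s) is a 3x4 matrix,
# i.e. an operator from R^4 to R^3 (every such matrix is Hilbert-Schmidt).
A0 = np.arange(12.0).reshape(3, 4)
A1 = np.eye(3, 4)

def G(s):
    return np.cos(s) * A0 + s ** 2 * A1

# Composite trapezoid rule on [0, 1] as a stand-in for the integral over E.
s_grid = np.linspace(0.0, 1.0, 1001)
w = np.full(s_grid.size, s_grid[1] - s_grid[0])
w[[0, -1]] /= 2

f = np.array([1.0, -2.0, 0.5, 3.0])

int_G_f = sum(wi * (G(si) @ f) for wi, si in zip(w, s_grid))   # ∫ (G f) dμ
int_G = sum(wi * G(si) for wi, si in zip(w, s_grid))           # ∫ G dμ
print(np.allclose(int_G_f, int_G @ f))  # True

# Exact integral for comparison: ∫ cos = sin(1) and ∫ s^2 = 1/3 over [0, 1].
exact = np.sin(1.0) * A0 + A1 / 3
print(np.allclose(int_G, exact, atol=1e-6))  # True
```

Because a quadrature rule is a finite weighted sum, the interchange holds exactly for it; Example 4.4 extends this to the Bochner integral.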

## 4.7 Trace Class Operators

Trace class operators form another important subclass of compact operators.

Definition 4.10 (Trace Class and Trace Norm) Let $$T$$ be a bounded operator taking elements of a Hilbert space $$\mathbb{H}_1$$ to another Hilbert space $$\mathbb{H}_2$$. Then $$T$$ is a trace class operator, or trace class, if for some orthonormal basis $$\{ e_j\}$$ of $$\mathbb{H}_1$$ the sum \begin{align*} \lVert T \rVert_{TR} := \sum_{j=1}^\infty \langle (T^\ast T)^{1/2}e_j,e_j \rangle_1 \end{align*} converges. In such a case, we call this quantity the trace norm of $$T$$.

Although we stated that trace class operators are a subclass of compact operators, this qualifier is noticeably absent from Definition 4.10. This is because compactness follows from the definition, so it would be redundant to include it. Briefly, since $$\lVert T \rVert_{TR} = \sum_{j} \lVert (T^\ast T)^{1/4}e_j \rVert_1^2 = \lVert (T^\ast T)^{1/4} \rVert_{HS}^2$$, the operator $$(T^\ast T)^{1/4}$$ is Hilbert-Schmidt and hence compact. It follows that $$T^\ast T = \big((T^\ast T)^{1/4}\big)^4$$ is compact, so $$T$$ has a singular value decomposition and is therefore also compact. In fact, more is true: trace class operators are also Hilbert-Schmidt (see Section 4.5 of the textbook for details).

With compactness, one can use the eigenvalue expansion of $$T^\ast T$$ to show that the trace norm of $$T$$ is equal to the sum of its singular values, $$\lVert T \rVert_{TR} = \sum_{j=1}^\infty \lambda_j$$. When $$T \in \mathfrak{B}(\mathbb{H})$$ is additionally self-adjoint, it admits an eigenvalue expansion (Theorem 4.12) and its singular values are the absolute values of its eigenvalues, so the trace norm is equal to the sum of the absolute values of the eigenvalues, $$\lVert T \rVert_{TR} = \sum_{j=1}^\infty \lvert\lambda_j \rvert$$. In this second case, the absolute convergence of the eigenvalue sum allows us to define the familiar trace operation as $$\text{trace}(T) = \sum_{j=1}^\infty \lambda_j = \sum_{j=1}^\infty \langle T e_j, e_j \rangle$$, where $$\{e_j\}$$ is any orthonormal basis of the underlying Hilbert space. Naturally, when none of the eigenvalues are negative, i.e., when the operator is nonnegative, these two notions coincide: $$\lVert T \rVert_{TR} = \text{trace}(T)$$.
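The relationships between the trace norm, the singular values, the eigenvalues, and the trace can all be checked on a symmetric matrix. In this NumPy sketch the random matrix is an arbitrary choice; $$(T^\ast T)^{1/2}$$ is formed from the eigendecomposition of $$T^\top T$$, and `np.linalg.norm(..., "nuc")` is NumPy's nuclear norm, i.e. the sum of singular values.

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((5, 5))
T = (B + B.T) / 2                    # self-adjoint, generally with eigenvalues of both signs

# Definition 4.10: sum_j <(T* T)^{1/2} e_j, e_j>, computed in the standard basis.
w, V = np.linalg.eigh(T.T @ T)
root = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T   # (T* T)^{1/2}
trace_norm = np.trace(root)

lam = np.linalg.eigvalsh(T)
s = np.linalg.svd(T, compute_uv=False)

# Trace norm = sum of singular values = sum of |eigenvalues| (T is self-adjoint).
print(np.isclose(trace_norm, s.sum()), np.isclose(trace_norm, np.abs(lam).sum()))

# trace(T) = sum_j lambda_j = sum_j <T q_j, q_j> for another orthonormal basis {q_j}.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
print(np.isclose(lam.sum(), sum(Q[:, j] @ T @ Q[:, j] for j in range(5))))

# For a nonnegative operator such as T* T, the trace norm equals the trace.
P = T.T @ T
print(np.isclose(np.linalg.norm(P, "nuc"), np.trace(P)))
```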

## 4.8 Integral Operators and Mercer’s Theorem

Let $$(E, \mathscr{B}, \mu)$$ be a measure space, where $$E$$ is a compact metric space, $$\mathscr{B}$$ is the associated Borel $$\sigma$$-algebra, and $$\mu$$ is a finite measure with support $$E$$. Our interest in this section will be with integral operators, which we introduced previously in Example 4.1. For us, integral operators are operators $$\mathscr{K}: \mathbb{L}^2(E, \mathscr{B}, \mu) \rightarrow \mathbb{L}^2(E, \mathscr{B}, \mu)$$ of the form \begin{align*} (\mathscr{K} f)(\cdot) := \int_{E} K(s, \cdot) f(s)\, d\mu(s), \end{align*} where the kernel $$K: E\times E \rightarrow \mathbb{R}$$ is square-integrable. Here $$f$$ is not part of the operator itself; it is an arbitrary element of $$\mathbb{L}^2(E, \mathscr{B}, \mu)$$ that serves as a placeholder to show how $$\mathscr{K}$$ acts on the space.
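Numerically, an integral operator is often approximated by replacing the integral with a quadrature rule (a Nyström-type discretization). The NumPy sketch below assumes $$E = [0,1]$$ with Lebesgue measure and the illustrative continuous kernel $$K(s,t) = \min(s,t)$$ (the covariance of Brownian motion), neither of which comes from the text. For $$f \equiv 1$$ the exact answer is $$(\mathscr{K}f)(t) = \int_0^1 \min(s,t)\,ds = t - t^2/2$$.

```python
import numpy as np

# Illustrative continuous kernel on E = [0, 1] x [0, 1].
def K(s, t):
    return np.minimum(s, t)

n = 1000
h = 1.0 / n
grid = (np.arange(n) + 0.5) * h          # midpoints of n equal cells

# Midpoint rule: (K f)(t_i) ≈ sum_k K(s_k, t_i) f(s_k) h.
Kmat = K(grid[:, None], grid[None, :])   # Kmat[k, i] = K(s_k, t_i)

def apply_integral_operator(f_vals):
    return h * (Kmat.T @ f_vals)

# For f ≡ 1: (K f)(t) = t - t^2 / 2.
approx = apply_integral_operator(np.ones(n))
exact = grid - grid ** 2 / 2
print(np.max(np.abs(approx - exact)))   # small, on the order of h^2
```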

We will make the additional assumption that $$K$$ is continuous; because $$E \times E$$ is compact, $$K$$ is then uniformly continuous. This leads to uniform continuity of $$\mathscr{K}$$ in the following sense: for each $$f$$ in $$\mathbb{L}^2(E, \mathscr{B}, \mu)$$, $$\mathscr{K}f$$ is a uniformly continuous function. To see this, apply the Cauchy-Schwarz inequality: \begin{align*} \lvert \mathscr{K}f(s_2) - \mathscr{K}f(s_1) \rvert = \left\lvert \int_E [K(s,s_2) - K(s,s_1)]f(s)\,d\mu(s)\right\rvert \leq \sup_{s \in E}\lvert K(s,s_2) - K(s,s_1)\rvert \, \mu(E)^{1/2} \lVert f \rVert. \end{align*} Given $$\epsilon > 0$$, the uniform continuity of $$K$$ lets us choose $$\delta > 0$$ so that the supremum is less than $$\epsilon$$ whenever $$s_1, s_2$$ are within distance $$\delta$$ of each other in $$E$$, which bounds the left-hand side by $$\epsilon\, \mu(E)^{1/2}\lVert f \rVert$$.

Recall Theorem 4.12, which states that a compact and self-adjoint operator can be decomposed into a weighted sum of eigenvector tensor products. In the context of integral operators, we saw in Example 4.3 that if the kernel is a symmetric function, then the operator is self-adjoint. To show that integral operators are compact, we can use the tried and true method of finding a sequence of finite-rank operators $$\mathscr{K}_n$$ which converge to the integral operator $$\mathscr{K}$$. This construction again relies on the continuity of the kernel $$K$$ (Theorem 4.6.2, page 117 in the textbook). Hence, for an integral operator $$\mathscr{K}$$ with continuous and symmetric kernel $$K$$, there is an associated eigensystem $$\{(\lambda_j, e_j)\}$$ such that $$\mathscr{K} = \sum_{j=1}^\infty \lambda_j e_j \otimes e_j$$.
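The eigensystem of an integral operator with continuous symmetric kernel can be approximated by diagonalizing its discretization. This NumPy sketch assumes the illustrative kernel $$K(s,t) = \min(s,t)$$ on $$[0,1]$$ with Lebesgue measure, for which the eigenvalues are known to be $$\lambda_j = 1/((j - 1/2)^2\pi^2)$$ with eigenfunctions $$e_j(t) = \sqrt{2}\sin((j-1/2)\pi t)$$; the midpoint-rule discretization is one convenient choice among several.

```python
import numpy as np

n = 1000
h = 1.0 / n
grid = (np.arange(n) + 0.5) * h
Kmat = np.minimum(grid[:, None], grid[None, :])   # symmetric, continuous kernel min(s, t)

# With equal weights h, the discretized operator h * Kmat is a symmetric matrix,
# so it has a real eigendecomposition mirroring K = sum_j lambda_j e_j ⊗ e_j.
lam, V = np.linalg.eigh(h * Kmat)
lam, V = lam[::-1], V[:, ::-1]        # decreasing order

# Known eigenvalues for min(s, t) on [0, 1]: 1 / ((j - 1/2)^2 pi^2).
j = np.arange(1, 6)
theory = 1.0 / ((j - 0.5) ** 2 * np.pi ** 2)
print(np.allclose(lam[:5], theory, rtol=1e-3))   # True

# Rescale the leading eigenvector to unit L^2 norm and compare with sqrt(2) sin(pi t / 2).
e1 = V[:, 0] / np.sqrt(h)
e1 *= np.sign(e1[-1])                  # eigenvectors are only defined up to sign
print(np.max(np.abs(e1 - np.sqrt(2) * np.sin(0.5 * np.pi * grid))))
```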

So far, we have derived useful properties of integral operators by first assuming a corresponding property of the associated kernel. This trend continues as we now introduce nonnegative kernels.

Definition 4.11 (Nonnegative Kernel) Let $$K: E \times E \rightarrow \mathbb{R}$$ be a symmetric kernel. For any natural number $$n$$ and any set of points $$S_n = \{s_1,\ldots, s_n\} \subset E$$, construct the $$n \times n$$ matrix $$\mathbf{K}$$ with entries $$\mathbf{K}_{ij} = K(s_i, s_j)$$. If $$\mathbf{K} \succeq 0$$ for all $$n \in \mathbb{N}$$ and all $$S_n \subset E$$, then we say $$K$$ is a nonnegative definite, or simply nonnegative, kernel.
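Definition 4.11 suggests a simple numerical probe: build the matrix $$\mathbf{K}$$ on some points and inspect its smallest eigenvalue. A finite check like this can only refute nonnegativity, never prove it, since the definition quantifies over all point sets. In this NumPy sketch the helpers `gram` and `looks_nonnegative`, the tolerance, and the two example kernels ($$\min(s,t)$$, which is nonnegative, and $$-\lvert s-t \rvert$$, which is symmetric but not nonnegative) are illustrative assumptions.

```python
import numpy as np

def gram(kernel, points):
    """Matrix with entries K(s_i, s_j) for a finite set of points."""
    return kernel(points[:, None], points[None, :])

def looks_nonnegative(kernel, points, tol=1e-10):
    """True if the Gram matrix on these points has no eigenvalue below -tol."""
    return np.linalg.eigvalsh(gram(kernel, points)).min() >= -tol

def brownian(s, t):
    return np.minimum(s, t)          # a nonnegative kernel on [0, 1]

def neg_distance(s, t):
    return -np.abs(s - t)            # symmetric, zero diagonal, not nonnegative

rng = np.random.default_rng(7)
pts = rng.uniform(0.0, 1.0, size=50)

print(looks_nonnegative(brownian, pts))      # True
print(looks_nonnegative(neg_distance, pts))  # False
```

The second kernel fails because its matrix is nonzero with zero trace, so it must have a negative eigenvalue.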

In other references, you may also see such kernels called positive semi-definite; this is simply another name for the kernels we have called nonnegative.

As you might have guessed, when we use a nonnegative kernel to formulate an integral operator, we imbue that operator with an additional useful property. In this case, besides being compact, self-adjoint, and uniformly continuous in the sense above, the operator is now also nonnegative. As the textbook notes, the converse also holds, so that an integral operator with continuous kernel is nonnegative definite if and only if its kernel is nonnegative definite (Theorem 4.6.4, page 119).

We now have all of the requisite material to introduce Mercer’s theorem. This theorem plays a key role in classical functional data analysis, as it is used to justify a spectral decomposition of the covariance function, the analogue of the covariance matrix for function-valued random variables.

Theorem 4.18 (Mercer's Theorem (1909)) Let $$K$$ be a continuous, symmetric, nonnegative kernel, and $$\mathscr{K}$$ the associated integral operator. If $$\{(\lambda_j, e_j)\}$$ is the eigensystem associated with $$\mathscr{K}$$, then the kernel $$K$$ can be written in terms of this eigensystem as \begin{align*} K(s,t) = \sum_{j=1}^\infty \lambda_j e_j(s)e_j(t), \end{align*} for all $$s,t \in E$$, with the sum converging absolutely and uniformly.
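The uniform convergence in Mercer's theorem can be observed for the illustrative kernel $$K(s,t) = \min(s,t)$$ on $$[0,1]$$ with Lebesgue measure, whose eigensystem is known in closed form: $$\lambda_j = 1/((j-1/2)^2\pi^2)$$ and $$e_j(t) = \sqrt{2}\sin((j-1/2)\pi t)$$. This NumPy sketch assumes that example (it is not from the text) and measures the sup-norm error of the truncated expansion on a grid; the helper `mercer_partial_sum` is hypothetical.

```python
import numpy as np

# Closed-form eigensystem of K(s, t) = min(s, t) on [0, 1] (Lebesgue measure).
def mercer_partial_sum(s, t, J):
    """Hypothetical helper: sum_{j<=J} lambda_j e_j(s) e_j(t) on a grid."""
    j = np.arange(1, J + 1)
    w = (j - 0.5) * np.pi
    lam = 1.0 / w ** 2
    e_s = np.sqrt(2) * np.sin(np.outer(s, w))    # shape (len(s), J)
    e_t = np.sqrt(2) * np.sin(np.outer(t, w))
    return (e_s * lam) @ e_t.T

grid = np.linspace(0.0, 1.0, 201)
K_true = np.minimum(grid[:, None], grid[None, :])

errors = {}
for J in (5, 50, 500):
    errors[J] = np.max(np.abs(mercer_partial_sum(grid, grid, J) - K_true))
    print(J, errors[J])   # the sup-norm error shrinks as J grows
```

Here the largest error sits at $$s = t = 1$$, where it equals the eigenvalue tail $$2\sum_{j > J}\lambda_j$$, roughly $$2/(\pi^2 J)$$.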

## 4.9 Exercises

### 4.9.1 Reinforcing Concepts

Exercise 4.1 (Boundedness) In many other contexts, a function $$f$$ is called bounded if there exists $$C \geq0$$ such that $$\sup_{x \in \text{Im}(f)} \lVert x \rVert \leq C$$. What does the space of linear transformations satisfying this notion of boundedness look like?

Exercise 4.2 (Kernel and Image) Let $$T \in \mathfrak{B}(\mathbb{H}_1, \mathbb{H}_2)$$ and $$T^\ast$$ the adjoint operator of $$T$$. Show that $$T^\ast Tx=0$$ implies that $$Tx=0$$.

Exercise 4.3 (Operators Preserve Weak Topology) Let $$T$$ be an element of $$\mathfrak{B}(\mathbb{H})$$ for a Hilbert space $$\mathbb{H}$$. Prove that $$T$$ preserves the weak topology on $$\mathbb{H}$$.

Exercise 4.4 (Nonnegative but not Self-Adjoint) Provide an example of an operator $$T$$ that is not self-adjoint but for which $$\langle Tx,x \rangle \geq 0$$ holds for all $$x \in \mathbb{H}$$.

Exercise 4.5 (Symmetric Matrices) Let $$T \in \mathfrak{B}(\mathbb{R}^p)$$. Show that $$T$$ is self-adjoint if its matrix is symmetric.

Exercise 4.6 (Projection Check) Let $$e$$ be a unit vector and $$\mathbb{M} = \text{span}\{e\}$$. Show that the operator $$P_{\mathbb{M}} = (e \otimes e)$$ satisfies the conditions of Theorem 3.2 and Theorem 4.7.

### 4.9.2 Testing Understanding

Exercise 4.7 (Kernels) Consider the Hilbert space $$\mathbb{L}^2[0,1]$$ and let $$c_1, c_2$$ be nonnegative constants, $$\phi$$ a continuous function on $$[0,1]$$, and $$K_1, K_2$$ kernels on $$[0,1] \times [0,1]$$ satisfying the conditions of Mercer’s Theorem. Show that the following are also kernels satisfying the conditions of Mercer’s Theorem:

1. $$c_1$$
2. $$c_1K_1 + c_2 K_2$$
3. $$K_1 \cdot K_2$$
4. $$\phi(s)\cdot \phi(t)$$

Exercise 4.8 (Trace) Suppose $$K$$ is a kernel satisfying the conditions of Mercer’s Theorem, so that it defines an operator $$\mathscr{K}$$. Show that, \begin{align*} \text{trace}(\mathscr{K}) = \int_E K(t,t)\,d\mu(t). \end{align*}

Exercise 4.9 (Inner Products as Kernels) Show that an inner product $$\langle \cdot,\cdot \rangle$$ (for example, on the reals) satisfies the kernel conditions of Mercer’s Theorem.

### 4.9.3 Enrichment

Exercise 4.10 (Self-Adjoint Norm) Suppose that $$T$$ is a self-adjoint operator on a Hilbert space $$\mathbb{H}$$. Show that $$\lVert T \rVert = \sup_{\lVert x \rVert=1} \langle Tx ,x \rangle$$.

### References

Axler, Sheldon. 2024. Linear Algebra Done Right. 4th ed. Undergraduate Texts in Mathematics. Springer. https://doi.org/10.1007/978-3-031-41026-0.
Hsing, Tailen, and Randall Eubank. 2015. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley. https://www.oreilly.com/library/view/theoretical-foundations-of/9780470016916/.