Chapter 3 Hilbert Spaces and Vector Integrals

3.1 Inner-Product Spaces and Hilbert Spaces

With the metric/norm combo we have a way to measure the “distance” between elements of our space and the “size” of elements in our space, but these tools are not sufficient for building notions such as orthogonality and projection, which are commonly used concepts in multivariate analysis. For this purpose we introduce another tool: the inner product.

Definition 3.1 (Inner Product) A function \(\langle \cdot, \cdot \rangle\) on a vector space \(\mathbb{V}\) is called an inner product if it satisfies

  1. \(\langle v,v \rangle \geq 0\),
  2. \(\langle v, v \rangle = 0\) iff \(v = 0\),
  3. \(\langle a_1v_1 + a_2v_2, v \rangle = a_1 \langle v_1,v \rangle + a_2\langle v_2,v \rangle\), and
  4. \(\langle v_1, v_2 \rangle = \langle v_2, v_1 \rangle\),

for every \(v, v_1, v_2 \in \mathbb{V}\) and \(a_1, a_2 \in \mathbb{R}\).

Hsing and Eubank (2015) note that an inner product is a generalization of the dot product in Euclidean space. We call a vector space \(\mathbb{V}\) endowed with an inner product \(\langle \cdot,\cdot \rangle\) an inner product space.
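To make the axioms concrete, here is a minimal numerical sketch checking each property for the familiar dot product on \(\mathbb{R}^3\). Here and in the illustrations that follow we use Python with numpy; the language and the specific vectors are our own choices for illustration, not something prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
v, v1, v2 = rng.normal(size=(3, 3))   # three random vectors in R^3
a1, a2 = 2.0, -0.5

assert np.dot(v, v) >= 0                            # property 1: nonnegativity
assert np.dot(np.zeros(3), np.zeros(3)) == 0        # property 2, at v = 0
lhs = np.dot(a1 * v1 + a2 * v2, v)
rhs = a1 * np.dot(v1, v) + a2 * np.dot(v2, v)
assert np.isclose(lhs, rhs)                         # property 3: linearity
assert np.isclose(np.dot(v1, v2), np.dot(v2, v1))   # property 4: symmetry
print("all four inner-product properties hold for these vectors")
```

Of course, checking finitely many random vectors only illustrates the axioms; it does not prove them.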

Theorem 3.1 (Inner Products Define Norms) An inner product \(\langle \cdot, \cdot \rangle\) on a vector space \(\mathbb{V}\) produces a norm \(\lVert \cdot \rVert\) defined by \(\lVert v \rVert = \langle v,v \rangle^{1/2}\) for \(v \in \mathbb{V}\). The inner product then satisfies the Cauchy-Schwarz inequality, \[\begin{align*} \lvert \langle v_1,v_2 \rangle \rvert \leq \lVert v_1 \rVert \lVert v_2 \rVert, \end{align*}\] for \(v_1, v_2 \in \mathbb{V}\), with equality holding if and only if \(v_1\) and \(v_2\) are linearly dependent, e.g., \(v_1 = a v_2\) for some \(a \in \mathbb{R}\).

When we have an inner product on our space, it can therefore also be used to define a norm, and hence a metric. This is quite fortuitous because it means that once we define an inner product on our space, we naturally generate a norm and a metric along with it, so that all the results we’ve discussed up to this point carry over to inner product spaces. To avoid continual reference to “the norm induced by the inner product” or “the metric induced by the norm induced by the inner product” we simply call the induced norm the canonical norm and the associated metric the canonical metric.

\[\begin{align*} \langle \cdot, \cdot \rangle \xrightarrow{\text{induces norm}} \lVert v \rVert = \langle v,v \rangle^{1/2} \xrightarrow{\text{induces metric}} d(u, v) = \lVert u - v \rVert \end{align*}\]

Based on the assertion that inner products define norms, one is naturally led to wonder if/when the converse is true. That is, if we have a norm, is there a way to ascertain whether or not there exists an inner product from which the given norm can be derived? Thankfully, the answer to this question is yes. It turns out that if a norm satisfies \(\lVert x + y \rVert^2 + \lVert x - y \rVert^2 = 2\lVert x \rVert^2 + 2 \lVert y \rVert^2\), known as the parallelogram law, then there is an inner product associated with the norm.
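The parallelogram law is easy to probe numerically. In the sketch below, the Euclidean norm on \(\mathbb{R}^4\) satisfies the identity up to floating-point error, while the sup norm generally violates it, so the sup norm cannot be derived from any inner product.

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=(2, 4))

def parallelogram_gap(norm):
    # lhs - rhs of ||x+y||^2 + ||x-y||^2 = 2||x||^2 + 2||y||^2
    lhs = norm(x + y) ** 2 + norm(x - y) ** 2
    rhs = 2 * norm(x) ** 2 + 2 * norm(y) ** 2
    return lhs - rhs

print(parallelogram_gap(lambda v: np.linalg.norm(v, 2)))       # ~ 0
print(parallelogram_gap(lambda v: np.linalg.norm(v, np.inf)))  # generally != 0
```

Strictly speaking, one violating pair \((x, y)\) is enough to rule out an inner product, while confirming the law requires checking all pairs (see Exercises 3.1 and 3.5).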

Additionally, Hsing and Eubank (2015) mention that the inner product is a continuous function under the “norm induced topology.” This just means that it is continuous when the open sets of our space are the ones generated by the canonical metric. We now state and prove this result formally.

Theorem 3.2 (The Inner Product is Continuous Under the Topology Induced by the Canonical Norm) Let \(\{v_{1n}\}\) and \(\{v_{2n}\}\) be sequences, and \(v_1\), \(v_2\) elements of an inner product space \(\mathbb{V}\), with associated inner product \(\langle \cdot,\cdot \rangle\) and induced norm \(\lVert \cdot \rVert\). If both sequences converge under the (canonical) norm, i.e. \(\lVert v_{in} - v_i \rVert \rightarrow 0\) for \(i=1,2\), then \(\langle v_{1n}, v_{2n} \rangle \rightarrow \langle v_1, v_2 \rangle\).

Proof. We have, \[\begin{align*} \lvert \langle v_{1n}, v_{2n} \rangle - \langle v_1, v_2\rangle \rvert &\leq \lvert \langle v_{1n} - v_1, v_{2n} \rangle \rvert + \lvert \langle v_1, v_{2n} - v_2 \rangle \rvert \\ &\leq \lVert v_{1n} - v_1 \rVert \lVert v_{2n} \rVert + \lVert v_1 \rVert \lVert v_{2n} - v_2 \rVert, \end{align*}\] where both terms in the sum go to zero as \(n\) goes to \(\infty\): the second by the assumed convergence of \(\{v_{2n}\}\), and the first because \(\lVert v_{1n} - v_1 \rVert \rightarrow 0\) while \(\lVert v_{2n} \rVert\) remains bounded (convergent sequences are bounded).
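A quick numerical illustration of Theorem 3.2: perturb two fixed vectors in \(\mathbb{R}^5\) by shrinking amounts and watch the dot products converge (the vectors and the \(1/n\) perturbation scheme are arbitrary choices for this sketch).

```python
import numpy as np

rng = np.random.default_rng(5)
v1, v2, w1, w2 = rng.normal(size=(4, 5))  # fixed vectors and perturbations

for n in (1, 10, 100, 1000):
    v1n = v1 + w1 / n   # ||v1n - v1|| -> 0
    v2n = v2 + w2 / n   # ||v2n - v2|| -> 0
    print(n, abs(np.dot(v1n, v2n) - np.dot(v1, v2)))  # -> 0, roughly like 1/n
```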

As always, completeness is a property we covet. This leads us to the definition of a Hilbert space.

Definition 3.2 (Hilbert Space) A complete inner-product space is called a Hilbert space.

To be clear, completeness is still related to the convergence of Cauchy sequences, and hence to a metric. What we mean by a complete inner-product space, is that the space is complete under the canonical metric. We can therefore also characterize Hilbert spaces as inner product spaces that are also Banach spaces under the canonical norm.

In the introduction to this section, we mentioned that our motivation for introducing inner products was related to orthogonality and projection. We now introduce the first of those concepts.

Definition 3.3 (Orthogonality and Orthonormality) Two elements \(x_1\) and \(x_2\) of an inner-product space \(\mathbb{X}\) are said to be orthogonal if \(\langle x_1,x_2 \rangle=0\). A countable collection of elements \(\{e_1, e_2,\ldots\}\) is said to be an orthonormal sequence if \(\lVert e_j \rVert=1\) for all \(j\) and the \(e_j\) are pairwise orthogonal.

In a general inner product space, we can still think of orthogonal vectors as simply not sharing any of the same directions or dimensions. The element \(x_1\) from the definition does not contribute anything in the direction of \(x_2\) and hence they are, intuitively, as linearly independent as possible. Of course, the usual notion of projection also applies here.

Our next goal is to use this notion of orthogonality, and the notion of a span to build a definition of a basis that is suitable for infinite-dimensional spaces. The next result is another step in that direction.

Theorem 3.3 (Bessel's Inequality) Let \(\{e_1, e_2, \ldots\}\) be an orthonormal sequence in an inner-product space \(\mathbb{X}\). For any \(x \in \mathbb{X}\), \(\sum_{j=1}^\infty \langle x,e_j \rangle^2 \leq \lVert x \rVert^2\), and therefore, when \(\mathbb{X}\) is complete (i.e., a Hilbert space), \(\sum_{j=1}^\infty \langle x,e_j \rangle e_j\) converges in \(\mathbb{X}\).

Proof. Expanding the square and using orthonormality, we have, \[\begin{align*} \left\lVert x - \sum_{i=1}^n \langle x, e_i \rangle e_i \right\rVert^2 = \lVert x \rVert^2 - \sum_{i=1}^n \langle x,e_i \rangle^2. \end{align*}\] Since the left-hand side is nonnegative, the sequence of partial sums \(\left( \sum_{i=1}^n \langle x,e_i \rangle^2 \right)_{n \in \mathbb{N}}\) is bounded above by \(\lVert x \rVert^2\). Since the sequence is also increasing, it converges, which gives the inequality. For the convergence claim, note that for \(m < n\), \(\left\lVert \sum_{i=m+1}^n \langle x,e_i \rangle e_i \right\rVert^2 = \sum_{i=m+1}^n \langle x,e_i \rangle^2 \rightarrow 0\) as \(m, n \rightarrow \infty\), so the partial sums of \(\sum_{j=1}^\infty \langle x,e_j \rangle e_j\) form a Cauchy sequence, which converges by completeness.
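The identity at the heart of the proof, and Bessel's inequality itself, can be checked numerically. In the sketch below we generate \(k\) orthonormal vectors in \(\mathbb{R}^n\) via a QR factorization (a convenience of the illustration, not part of the theorem).

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3
Q, _ = np.linalg.qr(rng.normal(size=(n, k)))  # columns: orthonormal e_1..e_k
x = rng.normal(size=n)

coef = Q.T @ x             # the coefficients <x, e_i>
residual = x - Q @ coef    # x - sum_i <x, e_i> e_i

# the identity from the proof: ||residual||^2 = ||x||^2 - sum_i <x, e_i>^2
print(np.isclose(residual @ residual, x @ x - coef @ coef))  # True
print(coef @ coef <= x @ x)                                  # Bessel: True
```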

Theorem 3.3 is a nice result because it shows that we can construct elements of an inner-product space \(\mathbb{X}\) using an orthonormal sequence \(\{e_j\}\), which is reminiscent of the decomposition of elements in finite-dimensional vector spaces in terms of a basis. However, there are two issues here. First, although we have that the object \(\sum_{j=1}^\infty \langle x, e_j \rangle e_j\) is an element of the space \(\mathbb{X}\), we don’t really know what element it is. Specifically, since we are generating it from \(x\), we would like some way to relate it back to \(x\). Second, since \(\sum_{j=1}^\infty \langle x, e_j \rangle e_j\) is an element of \(\mathbb{X}\), it immediately becomes clear that some elements of our infinite-dimensional inner product space take the form of an infinite linear combination, and therefore we will not find them among the members of the set \(\text{span}\{e_j\}\), whose elements are by definition finite linear combinations. That means we can no longer rely on the span to recover the whole space as we did in the finite-dimensional case. Hence we need a little something extra when defining a basis in this context. The next definition is a proposed amendment to the definition of a basis which can be extended to the infinite-dimensional context.

Definition 3.4 (Basis in Hilbert Space) An orthonormal sequence \(\{e_j\}\) in a Hilbert space \(\mathbb{H}\) is called an orthonormal basis if \(\overline{\text{span}\{e_j\}} = \mathbb{H}\).

The idea behind this definition is that by including the limit points of \(\text{span}\{e_j\}\), we are now guaranteed to also include elements of the form \(\sum_{j=1}^\infty \langle x, e_j \rangle e_j\), which we know from our previous discussion should be included. Also, from here on out we will almost exclusively be interested in orthonormal bases, so I will just refer to them as a basis. In a case where we are talking about a basis that is not orthonormal, I will make that condition explicit.

Theorem 3.4 (Orthonormal Sequence to Basis) An orthonormal sequence \(\{e_j\}\) in a Hilbert space \(\mathbb{H}\) is an orthonormal basis if \(\langle x, e_j \rangle = 0\) for all \(j\) implies that \(x=0\).

Proof. The proof is assigned as exercise 3.4.

This result then allows us to solve both of the problems we mentioned above, which we state formally as a corollary.

Corollary 3.1 (Fourier Expansion and Parseval's Relation) Every element \(x\) of a Hilbert space \(\mathbb{H}\) with basis \(\{e_j\}\) can be expressed as, \[\begin{equation} x = \sum_{j=1}^\infty \langle x,e_j \rangle e_j, \tag{3.1} \end{equation}\] which we call the Fourier expansion with \(\langle x,e_j \rangle\) the Fourier coefficients. Additionally, we have, \[\begin{equation} \lVert x \rVert^2 = \sum_{j=1}^\infty \langle x,e_j \rangle^2, \tag{3.2} \end{equation}\] which is known as Parseval’s relation.
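In a finite-dimensional Hilbert space the Fourier expansion and Parseval's relation can be verified exactly, which the sketch below does with a random orthonormal basis of \(\mathbb{R}^8\); the infinite-dimensional statements in Corollary 3.1 are the limiting versions of these finite checks.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))  # columns: orthonormal basis of R^n
x = rng.normal(size=n)

coef = Q.T @ x          # Fourier coefficients <x, e_j>
x_rebuilt = Q @ coef    # Fourier expansion: sum_j <x, e_j> e_j

print(np.allclose(x, x_rebuilt))       # True: Equation (3.1)
print(np.isclose(x @ x, coef @ coef))  # True: Parseval, Equation (3.2)
```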

Now that we know how to describe a basis, and how to decompose elements in the Hilbert space in terms of the basis, a natural next question is, when does a basis actually exist? It was mentioned earlier that the notion of separability would play a key role related to the existence of bases for Hilbert spaces. The next result delivers on that promise.

Theorem 3.5 (Separable Hilbert Space has Basis) A Hilbert space \(\mathbb{H}\) is separable if and only if it has an orthonormal basis.

Proof (sketch). If we have an orthonormal basis, then \(\overline{\text{span}\{e_j\}}=\mathbb{H}\) and we can take as a countable dense subset, the elements of \(\text{span}\{e_j\}\) which have rational coefficients. If we have a countable dense subset, i.e., when \(\mathbb{H}\) is separable, we can use the Gram-Schmidt procedure to produce an orthonormal sequence whose closed span matches the closed span of the countable dense subset. It follows that this orthonormal sequence forms a basis for \(\mathbb{H}\).
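The Gram-Schmidt procedure invoked in this proof sketch is simple enough to write out. The implementation below is generic over the inner product; the usage example orthonormalizes the monomials \(1, t, t^2\) under a Riemann-sum approximation of the \(\mathbb{L}^2[0,1]\) inner product (the grid size and tolerance are arbitrary illustration choices).

```python
import numpy as np

def gram_schmidt(vectors, inner, tol=1e-12):
    """Orthonormalize `vectors` with respect to the inner product `inner`,
    dropping any vector numerically in the span of its predecessors."""
    basis = []
    for v in vectors:
        u = v - sum(inner(v, e) * e for e in basis)  # subtract projections
        nrm = np.sqrt(inner(u, u))
        if nrm > tol:
            basis.append(u / nrm)
    return basis

# usage: orthonormalize 1, t, t^2 under (an approximation of) <f,g> = ∫ f g dt
t = np.linspace(0.0, 1.0, 2001)
inner = lambda f, g: np.dot(f, g) * (t[1] - t[0])
ortho = gram_schmidt([t**0, t, t**2], inner)
print([[round(inner(e, f), 6) for f in ortho] for e in ortho])  # ~ identity
```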

When discussing bases in the finite-dimensional case in the previous section, it was said that the notion of basis for infinite-dimensional spaces is not a particularly simple concept to nail down. Our short derivation of a basis for separable Hilbert spaces here makes that statement seem perhaps overly dramatic. However, this is because we have focused on a very particular kind of space, for which this notion works out nicely. In particular, the basis we’ve developed here is a special case of a Schauder basis, while the usual notion of a basis in finite dimensions is sometimes called a Hamel basis. For more general spaces, such as non-separable Hilbert spaces, an associated basis need not be countable. Additionally, in the most general case, Schauder bases are actually ordered sets in the sense that the vector representation \(x = \sum_{j=1}^\infty \langle x,e_j \rangle e_j\) may not converge unconditionally. That is, the expression may fail to hold under arbitrary reorderings of the summands. Further, if we drop back to a more general space like a Banach space, even if it is separable, it may not necessarily admit a Schauder basis. These subtleties will not be important in this course, as our focus will usually be on separable Hilbert spaces, however it is good that you are at least aware that the notion of a basis in larger spaces can indeed become quite complex.

We have finally arrived at the fundamental space of interest for functional data analysis. In functional data analysis, separable Hilbert spaces are among the most commonly chosen sample spaces for function-valued random variables. Chief among these is \(\mathbb{L}^2(E, \mathscr{B}, \mu)\), and in particular \(\mathbb{L}^2[0,1]\). The associated inner product is the natural extension of the Euclidean dot product to this infinite-dimensional context, \(\langle f,g \rangle := \int_E fg \,d\mu\). This will be made formal in our section building up function-valued random variables.
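As a preview of that formality, here is how this inner product can be approximated numerically on \(\mathbb{L}^2[0,1]\) (a Riemann-sum sketch; the grid size and the particular functions are arbitrary choices):

```python
import numpy as np

# Riemann-sum approximation of <f, g> = ∫_0^1 f(t) g(t) dt
t = np.linspace(0.0, 1.0, 10_001)
inner = lambda f, g: np.dot(f(t), g(t)) * (t[1] - t[0])

f = lambda s: np.sin(2 * np.pi * s)
g = lambda s: np.cos(2 * np.pi * s)

print(inner(f, g))  # ~ 0: f and g are orthogonal in L^2[0,1]
print(inner(f, f))  # ~ 0.5: so ||f|| = <f,f>^{1/2} ~ 1/sqrt(2)
```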

3.2 Orthogonal Decomposition

In the previous section our discussion was focused on results derived from orthogonality. In this section, we begin with some discussion of how projection plays a role, and then see how that leads to more geometric properties related to orthogonality.

Theorem 3.6 (Hilbert Space Best Approximation) Let \(\mathbb{H}\) be a Hilbert space and \(\mathbb{M}\) a closed convex subset of \(\mathbb{H}\). For any \(x \in \mathbb{H}\) there exists a unique element \(\hat{x}\) of \(\mathbb{M}\) such that, \[\begin{align*} \lVert x-\hat{x} \rVert = \inf_{y \in \mathbb{M}} \lVert x-y \rVert. \end{align*}\] The minimizer \(\hat{x}\) satisfies, \[\begin{equation} \langle x - \hat{x}, y-\hat{x} \rangle \leq 0 \tag{3.3} \end{equation}\] for all \(y \in \mathbb{M}\).

Proof (sketch). For existence, take a sequence \(\{y_n\}\) in \(\mathbb{M}\) with \(\lVert x - y_n \rVert\) converging to the infimum; the parallelogram law together with convexity of \(\mathbb{M}\) shows the sequence is Cauchy, and completeness plus closedness of \(\mathbb{M}\) gives a limit \(\hat{x} \in \mathbb{M}\). Uniqueness and the variational inequality (3.3) then follow by contradiction.

The condition in Equation (3.3) may seem a bit mysterious at first, but we can understand it intuitively using known properties of the inner product. In short, it expresses the fact that every point \(y\) in \(\mathbb{M}\) either lies on the hyperplane through \(\hat{x}\) that is orthogonal to \(x - \hat{x}\), or lies on the side of that hyperplane opposite \(x\). For us, Theorem 3.6 is more of a lemma, since we want to use it to prove our next result. However, since it is a result which is very important and useful in many other contexts, we leave it here as a theorem as well.

Corollary 3.2 (Projection Theorem) Let \(\mathbb{H}\) be a Hilbert space and \(\mathbb{M}\) a closed linear subspace of \(\mathbb{H}\). For any element \(x\) of \(\mathbb{H}\), there exists a unique element \(\hat{x}\) of \(\mathbb{M}\) that minimizes \(\lVert x-y \rVert\) on \(\mathbb{M}\). The minimizer is uniquely determined by the condition \(\langle \hat{x},y \rangle = \langle x,y \rangle\) for all \(y \in \mathbb{M}\).

Proof. Since \(\mathbb{M}\) is a subspace, it contains both \(y=0\) and \(y=2\hat{x}\). Substituting these into Equation (3.3) from Theorem 3.6 we get \(\langle x- \hat{x}, 2\hat{x}-\hat{x} \rangle = \langle x- \hat{x}, \hat{x} \rangle \leq 0\) and \(\langle x- \hat{x}, 0-\hat{x} \rangle = -\langle x- \hat{x}, \hat{x} \rangle \leq 0\). Hence \(\langle x - \hat{x}, \hat{x} \rangle = 0\).

Using this result, we can expand Equation (3.3) and cancel the term \(\langle x - \hat{x}, \hat{x} \rangle\) to get that \(\langle x-\hat{x}, y \rangle \leq 0\) for all \(y \in \mathbb{M}\). Notably, since \(-y \in \mathbb{M}\) as well, this implies \(\langle x-\hat{x}, -y \rangle \leq 0\) and hence \(\langle x - \hat{x}, y \rangle = 0\) for all \(y \in \mathbb{M}\), which is equivalent to the stated condition \(\langle \hat{x},y \rangle = \langle x,y \rangle\).

In words, this theorem is saying that the residual of the projection is orthogonal to the subspace \(\mathbb{M}\), meaning it is orthogonal to every element in \(\mathbb{M}\). You’ve definitely seen something like this before, either in a regression course, where \(x\) is the observed outcome vector and \(\hat{x}\) is its projection onto the subspace spanned by the features, or in a linear algebra course.
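The regression connection is easy to demonstrate numerically: project a vector onto the column span of a feature matrix via least squares and check that the residual is orthogonal to the subspace (the dimensions and random data are arbitrary choices for this sketch).

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 20, 3
X = rng.normal(size=(n, p))   # columns span the closed subspace M of H = R^n
x = rng.normal(size=n)        # an arbitrary element of H

beta, *_ = np.linalg.lstsq(X, x, rcond=None)
x_hat = X @ beta              # the unique minimizer of ||x - y|| over M

# residual orthogonal to M: checking the spanning columns of X suffices
print(np.round(X.T @ (x - x_hat), 10))  # ~ the zero vector
```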

There is another subtle detail we can glean from the projection theorem. In particular, since \(x\) was an arbitrary element of \(\mathbb{H}\), we can always write it as \(x = \hat{x} + (x - \hat{x})\), where \(\hat{x}\) lies in the subspace \(\mathbb{M}\) and \(x - \hat{x}\) lies in the set of vectors orthogonal to \(\mathbb{M}\), called the orthogonal complement, which we denote by \(\mathbb{M}^{\perp}\).

Theorem 3.7 (Properties of Orthogonal Complement) Let \(\mathbb{H}\) be a Hilbert space and \(\mathbb{M}\) a subset of \(\mathbb{H}\). Then,

  1. \(\mathbb{M}^{\perp}\) is a closed subspace,
  2. \(\mathbb{M} \subset (\mathbb{M}^{\perp})^{\perp}\).

If \(\mathbb{M}\) is a subspace, we further have that,

  1. \((\mathbb{M}^{\perp})^{\perp} = \overline{\mathbb{M}}\), and,
  2. \(\mathbb{H} = \overline{\mathbb{M}} \oplus \mathbb{M}^{\perp}\), which reduces to \(\mathbb{H} = \mathbb{M} \oplus \mathbb{M}^{\perp}\) when \(\mathbb{M}\) is closed,

where \(A \oplus B = \{ a + b \mid a \in A, b\in B\}\) for orthogonal subspaces \(A\) and \(B\).

We take these as given since they are quite natural, but for those curious the proof can be found in the text on page \(40\).

3.3 The Bochner Integral

The Lebesgue integral plays a major role in probability theory, particularly in defining the expected value of real-valued random variables. It is specifically needed to overcome the limitations of the Riemann integral, which it does by effectively handling a broader class of functions, allowing for the integration of functions whose behaviour makes Riemann integration impossible. However, in functional data analysis we are interested in random variables which take values in a Hilbert space. In this context, neither the Lebesgue nor Riemann integral can be applied as is. Yet, many interesting properties related to random variables, such as the expected value, require a notion of integration. In this section we give a brief overview of how the notion of the Lebesgue integral can be extended to this domain. The resulting integral is known as the Bochner integral. The Bochner integral retains many of the essential properties of the Lebesgue integral, such as linearity and the ability to interchange limits and integrals under certain conditions, but it is applicable to functions that take values in a Banach space, with Hilbert spaces being a notable example.

The construction of the Bochner integral is reminiscent of that of the Lebesgue integral, hence our exposition will be brief, relying on your knowledge of the Lebesgue integral to fill in the gaps. The ultimate takeaway of this section should be that we can define integration on functions from measure spaces to Banach (and hence Hilbert) spaces, and that we can characterize when such a function’s Bochner integral will exist. Our setting is that of functions from a measure space \((E, \mathscr{B}, \mu)\) to a Banach space \((\mathbb{X}, \lVert \cdot \rVert)\).

Definition 3.5 (Simple Function) A function \(f: E \rightarrow \mathbb{X}\) is called simple if it can be represented as, \[\begin{align*} f(\omega) = \sum_{i=1}^k \mathbb{I}_{E_i}(\omega)x_i, \end{align*}\] for some positive integer \(k\), disjoint sets \(E_i \in \mathscr{B}\), and \(x_i \in \mathbb{X}\).

Usually we will be interested in cases where the \(E_i\) form a partition of \(E\).

Definition 3.6 (Bochner Integral of Simple Function) Any simple function \(f(\omega) = \sum_{i=1}^k \mathbb{I}_{E_i}(\omega)x_i\) with \(\mu(E_i)\) finite for all \(i\) is said to be integrable and its Bochner integral is defined as, \[\begin{align*} \int_E f\, d\mu = \sum_{i=1}^k \mu\left(E_i\right) x_i. \end{align*}\]
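For concreteness, here is the definition carried out for a simple function from \([0,3)\), with \(\mu\) taken to be Lebesgue measure, into \(\mathbb{X} = \mathbb{R}^2\); the intervals and values are made up for the illustration.

```python
import numpy as np

# a simple function taking the value x_i on the interval E_i, so that
# mu(E_i) is just the interval's length under Lebesgue measure
intervals = [(0.0, 1.0), (1.0, 2.5), (2.5, 3.0)]   # the disjoint E_i
values = [np.array([1.0, 0.0]),                    # the x_i in X = R^2
          np.array([0.0, 2.0]),
          np.array([-1.0, 1.0])]

# Bochner integral of a simple function: sum_i mu(E_i) x_i
integral = sum((b - a) * x for (a, b), x in zip(intervals, values))
print(integral)  # [0.5, 3.5]
```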

We now extend this notion to the case of measurable functions between \(E\) and \(\mathbb{X}\).

Definition 3.7 (Bochner Integral of Measurable Function) A measurable function \(f\) is said to be Bochner integrable if there exists a sequence of simple and Bochner integrable functions \(f_n\) such that, \[\begin{align*} \lim_{n \rightarrow \infty} \int_E \lVert f_n - f \rVert\, d\mu = 0. \end{align*}\] When this holds the Bochner integral of \(f\) is defined as \[\begin{align*} \int_E f\, d\mu = \lim_{n \rightarrow \infty} \int_E f_n\, d\mu. \end{align*}\]

We can justify this definition through the following logic. Letting \(f_n\) be any simple function we can write, \[\begin{align*} \left\lVert \int_E f_n(\omega) \,d\mu(\omega) \right\rVert &= \left\lVert \int_E \sum_{i=1}^k \mathbb{I}_{E_i}(\omega) x_i \, d\mu(\omega) \right\rVert \\ &= \left\lVert \sum_{i=1}^k x_i \mu(E_i) \right\rVert \\ &\leq \sum_{i=1}^k \lVert x_i \rVert \mu(E_i) \\ &=\int_E \sum_{i=1}^k \lVert x_i \rVert \mathbb{I}_{E_i}(\omega) \,d\mu(\omega) \\ &=\int_E \lVert f_n(\omega) \rVert \,d\mu(\omega), \end{align*}\]
where I dropped the dependence of \(k\) and the \(E_i\) on \(n\) for simplicity. Since \(f_n - f_m\) is also a simple function, we can use this derivation to write, \[\begin{align*} \left\lVert \int_E f_n(\omega) \,d\mu(\omega) - \int_E f_m(\omega) \,d\mu(\omega)\right\rVert \leq \int_E \lVert f_n(\omega) - f_m(\omega) \rVert \,d\mu(\omega), \end{align*}\] so that we’ve bounded the norm of the difference of Bochner integrals by a Lebesgue integral of norms. We can then use the triangle inequality to see that, \[\begin{align*} \int_E \lVert f_n(\omega) - f_m(\omega) \rVert \,d\mu(\omega) \leq \int_E \lVert f_n(\omega) - f(\omega) \rVert \,d\mu(\omega) + \int_E \lVert f_m(\omega) - f(\omega) \rVert \,d\mu(\omega), \end{align*}\] which converges to \(0\) by our original assumption regarding \(f_n\). We therefore have that the sequence of integrals \(\left(\int_E f_n d\mu \right)\) is a Cauchy sequence in \(\mathbb{X}\), which converges by completeness.

Of course, this definition is nice, but the issue is that it relies on finding an approximating sequence of simple functions. In practice, it is not feasible to expect one to find, or determine the existence of, such a sequence for any particular function \(f: E \rightarrow \mathbb{X}\) of interest. The next result gives us guarantees on the existence of such a sequence in the special case that our space of interest is a separable Hilbert space.

Theorem 3.8 (Bochner Integrability for Hilbert Space-valued Functions) Suppose \(\mathbb{X}\) is a separable Hilbert space and \(f\) is a measurable function from \(E\) to \(\mathbb{X}\) with \(\int_E \lVert f \rVert \,d\mu < \infty\). Then \(f\) is Bochner integrable.
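Theorem 3.8 guarantees that an approximating sequence exists; for nice functions we can construct one by hand. The sketch below views \(f(t) = (t, t^2)\) as a map from \([0,1]\) into the separable Hilbert space \(\mathbb{R}^2\), approximates it by simple functions constant on \(n\) equal subintervals, and shows the resulting Bochner integrals converging to the exact value \((1/2, 1/3)\).

```python
import numpy as np

# f maps [0, 1] into the separable Hilbert space R^2
f = lambda t: np.array([t, t**2])

for n in (4, 16, 64, 256):
    left = np.arange(n) / n                       # left endpoints of the E_i
    approx = sum(f(a) * (1.0 / n) for a in left)  # sum_i mu(E_i) x_i
    print(n, approx)                              # -> [0.5, 0.3333...]
# the error decays like 1/n, consistent with ∫ ||f_n - f|| dμ -> 0
# for this choice of simple functions f_n
```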

The Bochner integral also has some properties reminiscent of the Lebesgue integral, namely, a dominated convergence theorem and a monotonicity property. See Theorems 2.6.6 and 2.6.7 in Hsing and Eubank (2015) for details.

3.4 Exercises

3.4.1 Reinforcing Concepts

Exercise 3.1 (Parallelogram Law) Show that any norm generated by an inner product satisfies the parallelogram law, \(\lVert x + y \rVert^2 + \lVert x - y \rVert^2 = 2\lVert x \rVert^2 + 2 \lVert y \rVert^2\).

3.4.2 Testing Understanding

Exercise 3.2 (Uncountable Orthogonal Set) Consider the space of real-valued functions on \(\mathbb{R}\) which take nonzero values at only countably many \(x\) in \(\mathbb{R}\) and for which \(\sum_{x \in \mathbb{R}} f(x)^2 < \infty\). Define the inner product on this space as \(\langle f,g \rangle = \sum_{x \in \mathbb{R}} f(x)g(x)\). This inner product is well-defined because each function is nonzero at only countably many points. The associated space is denoted by \(\ell^2(\mathbb{R})\) and you can assume it is a Hilbert space.

  1. Show that this space contains an uncountable orthogonal set.
  2. Use the properties of this set to also conclude that this space is not separable.

Exercise 3.3 (Lp Inner-Product Spaces) Show that \(\mathbb{L}^2(E, \mathscr{B}, \mu)\) is the only \(\mathbb{L}^p\) space whose norm can be generated by an inner product.

Exercise 3.4 (Orthonormal Sequence to Basis) Prove Theorem 3.4. Hints are provided in the textbook if needed. (pg. 34, under theorem 2.4.12).

3.4.3 Enrichment

Exercise 3.5 (When a Norm Defines an Inner Product) Let \((\mathbb{V}, \lVert \cdot \rVert)\) be a normed vector space, and suppose the norm satisfies the parallelogram law, \(\lVert x + y \rVert^2 + \lVert x - y \rVert^2 = 2\lVert x \rVert^2 + 2 \lVert y \rVert^2\). Show that, \[\begin{align*} \langle x,y \rangle = \frac{ \lVert x +y \rVert^2 - \lVert x - y \rVert^2 }{4}, \end{align*}\] defines an inner product.

References

Hsing, Tailen, and Randall Eubank. 2015. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley. https://www.oreilly.com/library/view/theoretical-foundations-of/9780470016916/.