1.2 Dot products, norms and angles

When working with vectors we often want to know their length and the angle between different vectors. In two or three dimensions we can work geometrically to determine these, but in order to define these concepts more generally in \(\mathbb{R}^n\) we first define another way of combining two vectors, namely the dot product.

Definition 1.7: (Dot product)
Let \(x,y\in\mathbb{R}^n\), then the dot product of \(x\) and \(y\) is defined by \[x\cdot y:=x_1y_1+x_2y_2+\cdots+x_ny_n=\sum_{i=1}^n x_iy_i .\]

This is sometimes known as the scalar product, as it takes two vectors and outputs a scalar.
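The definition translates directly into code. The following Python sketch (the function name `dot` is our own choice, not standard notation) computes the dot product of two vectors represented as lists of numbers.

```python
# A direct implementation of Definition 1.7: the dot product of two
# vectors in R^n, represented here as equal-length lists of numbers.
def dot(x, y):
    assert len(x) == len(y), "vectors must live in the same R^n"
    return sum(xi * yi for xi, yi in zip(x, y))

# x . y = 1*4 + 2*5 + 3*6 = 32
print(dot([1, 2, 3], [4, 5, 6]))  # → 32
```

Note that the output is a single number, in line with the name "scalar product".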

The following theorem gives us some key properties satisfied by the dot product.

Theorem 1.8:

For all \(x,y,v,w\in\mathbb{R}^n\) and \(\lambda\in\mathbb{R}\) we have that:

  1. \(x\cdot y=y\cdot x\).

  2. \(x\cdot(v+w)=x\cdot v+x\cdot w\) and \((x+y)\cdot v=x\cdot v+y\cdot v\).

  3. \((\lambda x)\cdot y=\lambda (x\cdot y)\) and \(x\cdot(\lambda y)=\lambda(x\cdot y)\).

  4. \(x\cdot x\geq 0\) and \(x\cdot x=0\) is equivalent to \(x=\mathbf{0}\).
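These properties can be spot-checked numerically. The sketch below (with `dot` implemented straight from Definition 1.7) verifies each of the four properties on some sample vectors; of course this only checks examples and is no substitute for the proof.

```python
# Numerical spot-check of Theorem 1.8 on sample vectors.
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

x, y, v, w, lam = [1, 2], [3, -1], [0, 5], [2, 2], 3.0

# 1. symmetry
assert dot(x, y) == dot(y, x)
# 2. distributivity over vector addition
assert dot(x, [vi + wi for vi, wi in zip(v, w)]) == dot(x, v) + dot(x, w)
# 3. compatibility with scalar multiplication
assert dot([lam * xi for xi in x], y) == lam * dot(x, y)
# 4. positivity, and x . x = 0 exactly for the zero vector
assert dot(x, x) >= 0 and dot([0, 0], [0, 0]) == 0
print("all four properties hold on these examples")
```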

Proof.

All these properties follow directly from the definition. So we leave most of them as an exercise, and just prove (2) and (4).

To prove (2) we use the definition \[x\cdot(v+w)=\sum_{i=1}^n x_i(v_i+w_i) =\sum_{i=1}^n (x_iv_i+x_iw_i)=\sum_{i=1}^n x_iv_i+\sum_{i=1}^n x_iw_i =x\cdot v+x\cdot w,\] and the second identity in (2) is proved in the same way.

For (4), we notice that \[x\cdot x=\sum_{i=1}^nx_i^2\] is a sum of squares, i.e. no term in the sum can be negative. Therefore, if the sum is \(0\), all terms in the sum must be \(0\), i.e., \(x_i=0\) for all \(i\), which means that \(x=\mathbf{0}\). Conversely, if \(x=\mathbf{0}\) then it immediately follows from the definition that \(x\cdot x=\mathbf{0}\cdot \mathbf{0}=0.\)

We can now use this to define the norm of a vector, which is a measure of its length.

Definition 1.9: (Norm)
The norm of a vector in \(\mathbb{R}^n\) is defined as \[\lVert x\rVert:=\sqrt{x\cdot x}=\bigg(\sum_{i=1}^nx_i^2\bigg)^{\frac{1}{2}} .\]

A vector which has a norm of 1 is called a unit vector. For a vector \(x\in \mathbb{R}^n\) we use the notation \(\hat{x}\) to represent the unit vector in the direction of the vector \(x,\) and we have that \(\hat{x}=\dfrac{1}{\lVert x\rVert}x.\)

If \(v, w\in \mathbb{R}^n\) then \(\lVert v-w\rVert\) is the distance between the points \(v\) and \(w\).

Example 1.10:
The norm of the vector \(v=\begin{pmatrix}5\\ 0\end{pmatrix}\) is \(\lVert v\rVert=5\), the norm of \(w=\begin{pmatrix}3\\-1\end{pmatrix}\) is \(\lVert w\rVert=\sqrt{9+1}=\sqrt{10}\), and the distance between \(v\) and \(w\) is \(\lVert v-w\rVert=\sqrt{4+1}=\sqrt{5}\).
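Definition 1.9 and the computations in Example 1.10 are easy to reproduce in code. In the sketch below, the helper names `norm`, `unit` and `distance` are our own choices; the square root is taken with Python's standard `math.sqrt`.

```python
import math

# Definition 1.9: the norm is the square root of the dot product of a
# vector with itself.
def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

# The unit vector in the direction of x: scale x by 1/||x||.
def unit(x):
    n = norm(x)
    return [xi / n for xi in x]

# The distance between the points v and w is ||v - w||.
def distance(v, w):
    return norm([vi - wi for vi, wi in zip(v, w)])

v, w = [5, 0], [3, -1]
print(norm(v))                                    # → 5.0
print(math.isclose(norm(w), math.sqrt(10)))       # → True
print(math.isclose(distance(v, w), math.sqrt(5))) # → True
print(unit([3, 4]))                               # → [0.6, 0.8]
```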

Note that in \(\mathbb{R}^2\) the norm is just the geometric distance between the point in the plane with coordinates \((v_1,v_2)\) and the origin \(\mathbf{0}\), which can be found using Pythagoras’ Theorem. In fact this can be extended to a version of Pythagoras’ Theorem in \(\mathbb{R}^n\). In order to state this, we must first extend the concept of two vectors being at right angles, or orthogonal, to one another.

Definition 1.11: (Orthogonal)
The vectors \(x,y\in\mathbb{R}^n\) are called orthogonal if \(x\cdot y=0\). We often write \(x\perp y\) to indicate that \(x\cdot y=0\) holds.

For example, if \(x=(1,1)\) and \(y=(1,-1)\) then \(x\cdot y=1\cdot 1 + 1 \cdot(-1)=0\), so \(x\) and \(y\) are orthogonal.
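Since orthogonality is just a condition on the dot product, it is a one-line check in code (the name `orthogonal` is our own; `dot` is as in Definition 1.7).

```python
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

# Definition 1.11: x and y are orthogonal exactly when x . y = 0.
def orthogonal(x, y):
    return dot(x, y) == 0

print(orthogonal([1, 1], [1, -1]))  # → True
print(orthogonal([1, 1], [1, 2]))   # → False
```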

We can now consider our more general version of Pythagoras’ Theorem.

Theorem 1.12: (Pythagoras’ Theorem)

Let \(x, y\in \mathbb{R}^n\). We have \(x\cdot y=0\) if and only if \[\lVert x+y\rVert^2=\lVert x\rVert^2+\lVert y\rVert^2 .\]

The proof is left as an exercise.
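As a quick numerical illustration (not a proof), the identity can be checked on the orthogonal pair \(x=(1,1)\), \(y=(1,-1)\) from above; the helper `norm` implements Definition 1.9.

```python
import math

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

x, y = [1, 1], [1, -1]   # orthogonal: x . y = 1 - 1 = 0
lhs = norm([xi + yi for xi, yi in zip(x, y)]) ** 2  # ||x + y||^2
rhs = norm(x) ** 2 + norm(y) ** 2                   # ||x||^2 + ||y||^2
print(math.isclose(lhs, rhs))  # → True
```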

A fundamental property of the dot product is the Cauchy–Schwarz inequality, which relates the dot product of two vectors to their norms.

Theorem 1.13: (Cauchy–Schwarz inequality)

For any \(x,y\in\mathbb{R}^n\) \[\lvert x\cdot y\rvert\leq \lVert x\rVert\lVert y\rVert.\label{csnd}\]

Proof.

If \(y=\mathbf{0}\) the inequality is true, so we assume \(y\neq \mathbf{0}\). Notice that \(v\cdot v\geq 0\) for any \(v\in\mathbb{R}^n\); let us apply this inequality to \(v=x-ty\), where \(t\) is a real number which we will choose later. First we get \[0\leq (x-ty)\cdot(x-ty)=x\cdot x-2t\,x\cdot y+t^2y\cdot y,\] in which the dot products and norms appearing in the Cauchy–Schwarz inequality are already visible. Now we have to make a clever choice for \(t\): let us try \[t=\frac{x\cdot y}{y\cdot y} ,\] which is in fact the value of \(t\) for which the right hand side becomes minimal. With this choice we obtain \[0\leq \lVert x\rVert^2-\frac{(x\cdot y)^2}{\lVert y\rVert^2},\] and so \((x\cdot y)^2\leq \lVert x\rVert^2\lVert y\rVert^2\), which after taking the square root gives the desired result.
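The inequality can also be sanity-checked numerically on random vectors. The sketch below tests it on a thousand random pairs in \(\mathbb{R}^5\); a small tolerance absorbs floating-point rounding. Again, this checks examples only and proves nothing.

```python
import math
import random

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

random.seed(0)  # reproducible samples
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(5)]
    y = [random.uniform(-1, 1) for _ in range(5)]
    # |x . y| <= ||x|| ||y||, with a tiny slack for rounding
    assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-12
print("Cauchy-Schwarz held on all random samples")
```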

To make sure that we have defined the norm in a sensible way, we consider the properties that we would expect it to have and confirm that these do hold.

Exercise 1.14:
The norm of a vector is a measure of its length. Take some time to consider what properties we would want this definition to have for it to be a sensible definition.

We prove some of these properties below.

Theorem 1.15:

The norm satisfies

  1. \(\lVert v\rVert\geq 0\) for all \(v\in\mathbb{R}^n\) and \(\lVert v\rVert=0\) if and only if \(v=\mathbf{0}\).

  2. \(\lVert\lambda v\rVert=\lvert\lambda\rvert\lVert v\rVert\) for all \(\lambda\in\mathbb{R},v\in\mathbb{R}^n\).

  3. \(\lVert v+w\rVert\leq \lVert v\rVert+\lVert w\rVert\) for all \(v,w\in\mathbb{R}^n\) (this is known as the triangle inequality).

Proof.

  1. This follows from the properties of the dot product in Theorem 1.8.

  2. This follows from a direct computation: \[\begin{aligned} \lVert\lambda v\rVert&=\sqrt{(\lambda v_1)^2+(\lambda v_2)^2 +\dots+ (\lambda v_n)^2}\\ & =\sqrt{\lambda^2(v_1^2+v_2^2+\dots+ v_n^2)}\\&=\sqrt{\lambda^2} \sqrt{v_1^2+v_2^2+\dots+ v_n^2}=\lvert\lambda\rvert\lVert v\rVert . \end{aligned}\]

  3. We consider \[\lVert v+w\rVert^2=(v+w)\cdot(v+w)=v\cdot v+2v\cdot w+w\cdot w=\lVert v\rVert^2+2v\cdot w+\lVert w\rVert^2 ,\] and now applying the Cauchy–Schwarz inequality in the form \(v\cdot w\leq \lVert v\rVert\lVert w\rVert\) to the right hand side gives \[\lVert v+w\rVert^2\leq \lVert v\rVert^2+2\lVert v\rVert\lVert w\rVert+\lVert w\rVert^2 =(\lVert v\rVert+\lVert w\rVert)^2 ,\] and taking the square root gives the triangle inequality.
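Properties 2 and 3 of Theorem 1.15 can likewise be spot-checked on random vectors (an illustration, not a proof; `norm` is as in Definition 1.9).

```python
import math
import random

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

random.seed(1)  # reproducible samples
for _ in range(1000):
    v = [random.uniform(-10, 10) for _ in range(4)]
    w = [random.uniform(-10, 10) for _ in range(4)]
    lam = random.uniform(-10, 10)
    # property 2: ||lam v|| = |lam| ||v||
    assert math.isclose(norm([lam * vi for vi in v]), abs(lam) * norm(v))
    # property 3 (triangle inequality), with a tiny slack for rounding
    s = [vi + wi for vi, wi in zip(v, w)]
    assert norm(s) <= norm(v) + norm(w) + 1e-12
print("homogeneity and the triangle inequality held on all samples")
```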


We can also use the dot product to define the angle between two vectors.

Definition 1.16: (Angle between vectors)
Let \(x,y\in \mathbb{R}^n\) with \(x\neq 0\) and \(y\neq 0\). Then the angle \(\theta\) between the two vectors is defined by \[\cos\theta=\frac{x\cdot y}{\lVert x\rVert\lVert y\rVert} .\]

Notice that this definition makes sense because the Cauchy–Schwarz inequality holds, namely Cauchy–Schwarz gives us \[-1\leq \frac{x\cdot y}{\lVert x\rVert\lVert y\rVert} \leq 1\] and therefore there exists a \(\theta\in [0,\pi]\) such that \[\cos \theta=\frac{x\cdot y}{\lVert x\rVert\lVert y\rVert} .\]

Notice that if our vectors are orthogonal as defined in Definition 1.11 this corresponds to \(\cos \theta= 0\), i.e. \(\theta=\frac{\pi}{2}\), as expected.

We can now use this definition to compute the angles between vectors.

Example 1.17:
If \(v=(-1,7)\) and \(w=(2,1)\), then we find \(v\cdot w=5\), \(\lVert v\rVert=\sqrt{50}\) and \(\lVert w\rVert=\sqrt{5}\), hence \(\cos \theta= 5/\sqrt{250}=1/\sqrt{10}\).
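The computation in Example 1.17 can be reproduced with an `angle` helper built from Definition 1.16 (the helper names are our own; `math.acos` inverts the cosine on \([0,\pi]\), matching the range of \(\theta\)).

```python
import math

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

# Definition 1.16: the angle theta in [0, pi] between nonzero x and y.
def angle(x, y):
    return math.acos(dot(x, y) / (norm(x) * norm(y)))

v, w = [-1, 7], [2, 1]
theta = angle(v, w)
# cos(theta) should equal 5/sqrt(250) = 1/sqrt(10), as in Example 1.17
print(math.isclose(math.cos(theta), 1 / math.sqrt(10)))  # → True
```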

If we are working in \(\mathbb{R}^2\) we can prove that this definition of the angle using the dot product does indeed coincide with the geometric method of finding the angle. The proof of this is left as an exercise.