6.4 Composition and inverses of functions

We now turn our attention on how to apply several functions in a row.

Definition 6.16:
Consider \(f:X\to Y\) and \(g:Y\to Z\). We define the composition of \(g\) with \(f\), denoted by \(g\circ f\), by \[(g\circ f)(x)=g(f(x)), \text{ for all }x\in X.\]
Etymology:

Composition has the same roots as components and composite. It comes from co meaning “together with” and ponere meaning “to put”. So something composite is something “put together” from different components. The composition of two functions is the result of putting two functions together.

Since \(f\) assigns to \(x\in X\) exactly one value \(f(x)\in Y\), and \(g\) assigns to \(f(x)\in Y\) exactly one value in \(Z\), we have that \(g\circ f\) is a function from \(X\) to \(Z\), that is \(g\circ f:X\to Z\).

Remark:
The ordering of \(f\) and \(g\) is important, as \(g\circ f\) and \(f\circ g\) can be different. In fact, unless \(Z=X\), then \(f \circ g\) probably does not make sense.

We have that composition of functions is associative. That is:

Proposition 6.17:

Consider \(f:X\to Y\), \(g:Y\to Z\) and \(h:Z\to W\). Then \(h\circ(g\circ f)=(h\circ g)\circ f\).

Proof.

By Theorem 6.4, to show \(h\circ(g\circ f)=(h\circ g)\circ f\), we need to show that \(\forall\,x\in X\), we have \((h\circ(g\circ f))(x)=((h\circ g)\circ f)(x)\). Let \(x\in X\). Then \[(h\circ(g\circ f))(x)=h((g\circ f)(x))=h(g(f(x))), \forall\, x\in X\] and \[((h\circ g)\circ f)(x)=(h\circ g)(f(x))=h(g(f(x))), \forall\, x\in X.\] Thus, \(h\circ(g\circ f)=(h\circ g)\circ f\).

This argument holds for all \(x\in X\), hence \(h\circ(g\circ f)=(h\circ g)\circ f\).

The above theorem tells us that the notation \(h\circ g\circ f\) is unambiguous.

Example:

Let \(a,b,c,d\in\mathbb{R}\) be such that \(a<b\) and \(c<d\). Recall that \([a,b]\) denotes the closed interval from \(a\) to \(b\), that is \[[a,b]=\{x\in\mathbb{R}:\ a\le x\le b\ \}.\]

We define the following three functions:

  • \(f_1:[a,b]\to[0,b-a]\) defined by \(f_1(x)=x-a\);

  • \(f_2:[0,b-a]\to[0,d-c]\) defined by \(f_2(x)=x\frac{d-c}{b-a}\);

  • \(f_3:[0,d-c]\to[c,d]\) defined by \(f_3(x) = x+c\).

We briefly check that \(f_2\) is indeed a function from \([0,b-a]\) to \([0,d-c]\) by checking \(f_2([0,b-a])\subseteq [0,d-c]\) (in theory one should also do the same with \(f_1\) and \(f_3\), but these two cases are clearer). Pick \(x\in[0,b-a]\), so \(0\leq x\leq b-a\) by definition. Then since \(b-a>0\) we have \(0\leq \frac{x}{b-a}\leq 1\), and since \(d-c>0\) we have \(0\leq x\frac{d-c}{b-a}\leq d-c\). So \(f_2(x) \in [0,d-c]\). As this is true for all \(x\in [0,b-a]\), we indeed have \(f_2\) is a function with co-domain \([0,d-c]\).

We can construct \(f:[a,b]\to [c,d]\) by composing \(f_3\) with \(f_2\) with \(f_1\), i.e., \(f = f_3\circ f_2 \circ f_1\) is defined by \(f(x) = c + \frac{(x-a)(d-c)}{b-a}\).

Properties of functions carry through composition.

Theorem 6.18:

Consider \(f:X\to Y\) and \(g:Y\to Z\). Then:

  1. if \(f\) and \(g\) are injective, we have that \(g\circ f\) is injective.

  2. if \(f\) and \(g\) are surjective, we have that \(g\circ f\) is surjective.

Proof.

We prove a. and leave b. as an exercise.

Suppose that \(f,g\) are injective, and suppose that \(x_1,x_2\in X\) are such that \(x_1\not=x_2\). Since \(f\) is injective, we have that \(f(x_1)\not=f(x_2)\). Now, set \(y_1=f(x_1)\) and \(y_2=f(x_2)\). Then \(y_1,y_2\in Y\) with \(y_1\not=y_2\). Further, since \(g\) is injective, we have \(g(y_1)\not=g(y_2)\). It follows that \[(g\circ f)(x_1)=g(f(x_1))=g(y_1)\not=g(y_2)=g(f(x_2))=(g\circ f)(x_2).\] To conclude, for any \(x_1,x_2\in X\) with \(x_1\not=x_2\), we have \((g\circ f)(x_1)\not=(g\circ f)(x_2).\) Hence \(g\circ f\) is injective.

Corollary 6.19:

Consider \(f:X\to Y\) and \(g:Y\to Z\) be bijective, then \(g\circ f:X\to Z\) is also bijective.

Definition 6.20:

We say that a function \(f:X\to Y\) is invertible if there exists a function \(g:Y\to X\) such that:

  • \(g\circ f\) is the identity function on \(X\) (that is, \(\forall\,x\in X\), \((g\circ f)(x)=x\)), and

  • \(f\circ g\) is the identity function on \(Y\) (that is, \(\forall\,y\in Y\), \((f\circ g)(y)=y\)).

We say \(g\) is an inverse of \(f\).

If \(g\) is an inverse for \(f\), then we also have that \(f\) is an inverse for \(g\).
Etymology:

We’ve already seen that inverse comes from in and vertere which is the verb “to turn”. The inverse of an function, is one that turns the original function inside out to give back what was originally put in.

Example:
Define \(f:\mathbb{R}\to\mathbb{R}\) by \(f(x)=2x+3\), and define \(g:\mathbb{R}\to\mathbb{R}\) by \(g(x)=\frac{x-3}{2}\). Then \(\forall x\in\mathbb{R}\), we have \[(g\circ f)(x)=g(f(x))=\frac{f(x)-3}{2}=\frac{(2x+3)-3}{2}=x\] and \[(f\circ g)(x)=f(g(x))=2\cdot g(x)+3=2\left(\frac{x-3}{2}\right)+3=x.\] Hence \(g\circ f\) is the identity function on \(\mathbb{R}\), and \(f\circ g\) is the identity function on \(\mathbb{R}\). It follows that \(f\) is invertible where \(g\) is an inverse of \(f\).
Theorem 6.21:

Suppose \(f:X\to Y\), and suppose that \(g:Y\to X\) and \(h:Y\to X\) are inverses of \(f\). Then \(g=h\), that is, if \(f\) has an inverse then its inverse is unique.

Proof.

Exercise.

Remark:

Since the inverse is unique, the inverse of \(f:X\to Y\) is often denoted by \(f^{-1}:Y\to X\).

Note the difference between \(f^{-1}[ ]\) which is about pre-images (also known as inverse images), and \(f^{-1}( )\) which is about the inverse function. In particular, when using \(f^{-1}[ ]\) (i.e., calculating pre-images), we are not making any claim that \(f\) is invertible or indeed that \(f^{-1}:Y\to X\) is a function that exists. However, when using \(f^{-1}( )\), we are at the same time claiming that \(f\) is invertible and hence \(f^{-1}\) is a function.

It is important to note that we have two conditions to prove \(f:X\to Y\) is invertible. As an exercise, one can find examples of \(f:X\to Y\) and \(g:Y\to X\) such that \(f\circ g\) is the identity, but \(g\circ f\) is not. However, if we have some constraints on \(f\) or \(g\) then only one condition needs to be met.

Proposition 6.22:

Suppose that \(f:X\to Y\) and \(g:Y\to X\) are such that \(g\circ f\) is the identity function on \(X\).

  1. If \(g\) is injective, then \(f\circ g\) is the identity map on \(Y\) (and hence \(g=f^{-1}\)).

  2. If \(f\) is surjective, then \(f\circ g\) is the identity map on \(Y\) (and hence \(g=f^{-1}\)).

Proof.

Exercise.

This next theorem gives another way to test if a function is invertible or not.

Theorem 6.23:

Suppose that \(f:X\to Y\). Then \(f\) is invertible if and only if \(f\) is bijective.

Proof.

We prove both implications separately.

\(\Rightarrow\)). First, we will show that if \(f\) is invertible, then \(f\) is bijective. So suppose \(f\) is invertible, and let \(g:Y\to X\) denote the inverse of \(f\). Then \(g\circ f\) is the identity function on \(X\), and \(f\circ g\) is the identity function on \(Y\). We first show \(f\) is surjective. Take \(y_1 \in Y\), then \(y_1=f\circ g(y_1)=f(g(y_1))\). So set \(x_1=g(y_1)\). Then \(x_1 \in X\) and \(f(x_1)=f(g(y_1))=y_1\). So for all \(y\in Y\), there exists \(x\in X\) such that \(f(x)=y\).

We prove that \(f\) is also injective using the contrapositive. Suppose we have \(x_1,x_2 \in X\) with \(f(x_1)=f(x_2)=y\). Then since \(g\circ f\) is the identity on \(X\), we have \(x_1 = g(f(x_1))=y=g(f(x_2))=x_2\). Hence \(f\) is injective and surjective, so it is bijective.

\(\Leftarrow\)). Second, we will show that if \(f\) is bijective, then \(f\) is invertible. So suppose that \(f\) is bijective. We set \[g=\{(y,x)\in Y\times X: (x,y)\in f \}\subseteq Y\times X.\] Since \(f\) is bijective, by Corollary 6.10 we have that for all \(y\in Y\), there is a unique \(x\in X\) such that \(f(x)=y\). Hence, \(g\) is a function, that is \(g:Y\to X\).

We will show that \(g\) is the inverse of \(f\). For this, we first take \(x\in X\) and set \(y\in Y\) to be \(y=f(x)\). Then by the definition of \(g\), we have that \(g(y)=x\), so \((g\circ f)(x)=g(y)=x\). Since \(x\) was arbitrary, this shows that \(g\circ f\) is the identity function on \(X\). Now, choose any \(y\in Y\). Since \(f\) is bijective then for all \(y\in Y\), there exists a unique \(x\in X\) such that \(f(x)=y\). Further, since \(g\) is a function, we must have that \(g(y)=x\), and hence \((f\circ g)(y)=f(x)=y\). Since \(y\) was arbitrary, this shows that \(f\circ g\) is the identity function on \(Y\). Hence if \(f\) is bijective, we have that \(f\) is invertible.

It follows that \(f\) is invertible if and only if \(f\) is bijective.

Suppose that \(f:X\to Y\) is bijective. In the above proof, we found a way of defining \(f^{-1}:Y\to X\). That is, \(\forall\,y\in Y\), we can write \(f^{-1}(y)=x\) where \(x\in X\) such that \(f(x)=y.\)

Example:

Let \(a,b,c,d\in\mathbb{R}\) be such that \(a<b\) and \(c<d\). In the previous example, we defined \(f:[a,b]\to[c,d]\) as the composition of three functions and to be \(f(x) = c+\frac{(x-a)(d-c)}{b-a}\). We want to prove \(f\) is bijective. One method would be to show all three functions that made up its composition are bijective (and this is relatively easy to do so). But we will instead show \(f\) is invertible by constructing the inverse.

By symmetry (replacing \(a\) with \(c\), \(b\) with \(d\) etc), let us define \(g:[c,d]\to[a,b]\) via \(g(x)=a+\frac{(x-c)(b-a)}{d-c}\). We have \[\begin{align*} (g\circ f)(x)&=g(f(x))\\ &=a+(f(x)-c)\left(\frac{b-a}{d-c}\right)\\ &=a+\left[c+(x-a)\left(\frac{d-c}{b-a}\right)-c\right]\left(\frac{b-a}{d-c}\right)\\ &=a+(x-a)\\ &=x, \end{align*}\] for all \(x\in[a,b].\) Similarly, \[\begin{align*} (f\circ g)(x)&=f(g(x))\\ &=c+(g(x)-a)\left(\frac{d-c}{b-a}\right)\\ &=c+\left[a+(x-c)\frac{(b-a)}{(d-c)}\right]\left(\frac{d-c}{b-a}\right)\\ &=x, \end{align*}\] for all \(x\in[c,d]\). It follows that \(g:[c,d]\to[a,b]\) is the inverse of \(f\). Therefore, \(f\) is bijective.
Theorem 6.24:

Suppose \(f:X\to Y\) and \(g:Y\to Z\) are bijective. Then \((g\circ f)^{-1}=f^{-1}\circ g^{-1}\).

Proof.

Exercise.