6.4 Composition and inverses of functions

We now turn our attention on how to apply several functions in a row.

Definition 6.16:
Consider f:X\to Y and g:Y\to Z. We define the composition of g with f, denoted by g\circ f, by (g\circ f)(x)=g(f(x)), \text{ for all }x\in X.
Etymology:

Composition has the same roots as components and composite. It comes from co meaning “together with” and ponere meaning “to put”. So something composite is something “put together” from different components. The composition of two functions is the result of putting two functions together.

Since f assigns to x\in X exactly one value f(x)\in Y, and g assigns to f(x)\in Y exactly one value in Z, we have that g\circ f is a function from X to Z, that is g\circ f:X\to Z.

Remark:
The ordering of f and g is important, as g\circ f and f\circ g can be different. In fact, unless Z=X, then f \circ g probably does not make sense.

We have that composition of functions is associative. That is:

Proposition 6.17:

Consider f:X\to Y, g:Y\to Z and h:Z\to W. Then h\circ(g\circ f)=(h\circ g)\circ f.

Proof.

By Theorem 6.4, to show h\circ(g\circ f)=(h\circ g)\circ f, we need to show that \forall\,x\in X, we have (h\circ(g\circ f))(x)=((h\circ g)\circ f)(x). Let x\in X. Then (h\circ(g\circ f))(x)=h((g\circ f)(x))=h(g(f(x))), \forall\, x\in X and ((h\circ g)\circ f)(x)=(h\circ g)(f(x))=h(g(f(x))), \forall\, x\in X. Thus, h\circ(g\circ f)=(h\circ g)\circ f.

This argument holds for all x\in X, hence h\circ(g\circ f)=(h\circ g)\circ f.

The above theorem tells us that the notation h\circ g\circ f is unambiguous.

Example:

Let a,b,c,d\in\mathbb{R} be such that a<b and c<d. Recall that [a,b] denotes the closed interval from a to b, that is [a,b]=\{x\in\mathbb{R}:\ a\le x\le b\ \}.

We define the following three functions:

  • f_1:[a,b]\to[0,b-a] defined by f_1(x)=x-a;

  • f_2:[0,b-a]\to[0,d-c] defined by f_2(x)=x\frac{d-c}{b-a};

  • f_3:[0,d-c]\to[c,d] defined by f_3(x) = x+c.

We briefly check that f_2 is indeed a function from [0,b-a] to [0,d-c] by checking f_2([0,b-a])\subseteq [0,d-c] (in theory one should also do the same with f_1 and f_3, but these two cases are clearer). Pick x\in[0,b-a], so 0\leq x\leq b-a by definition. Then since b-a>0 we have 0\leq \frac{x}{b-a}\leq 1, and since d-c>0 we have 0\leq x\frac{d-c}{b-a}\leq d-c. So f_2(x) \in [0,d-c]. As this is true for all x\in [0,b-a], we indeed have f_2 is a function with co-domain [0,d-c].

We can construct f:[a,b]\to [c,d] by composing f_3 with f_2 with f_1, i.e., f = f_3\circ f_2 \circ f_1 is defined by f(x) = c + \frac{(x-a)(d-c)}{b-a}.

Properties of functions carry through composition.

Theorem 6.18:

Consider f:X\to Y and g:Y\to Z. Then:

  1. if f and g are injective, we have that g\circ f is injective.

  2. if f and g are surjective, we have that g\circ f is surjective.

Proof.

We prove a. and leave b. as an exercise.

Suppose that f,g are injective, and suppose that x_1,x_2\in X are such that x_1\not=x_2. Since f is injective, we have that f(x_1)\not=f(x_2). Now, set y_1=f(x_1) and y_2=f(x_2). Then y_1,y_2\in Y with y_1\not=y_2. Further, since g is injective, we have g(y_1)\not=g(y_2). It follows that (g\circ f)(x_1)=g(f(x_1))=g(y_1)\not=g(y_2)=g(f(x_2))=(g\circ f)(x_2). To conclude, for any x_1,x_2\in X with x_1\not=x_2, we have (g\circ f)(x_1)\not=(g\circ f)(x_2). Hence g\circ f is injective.

Corollary 6.19:

Consider f:X\to Y and g:Y\to Z be bijective, then g\circ f:X\to Z is also bijective.

Definition 6.20:

We say that a function f:X\to Y is invertible if there exists a function g:Y\to X such that:

  • g\circ f is the identity function on X (that is, \forall\,x\in X, (g\circ f)(x)=x), and

  • f\circ g is the identity function on Y (that is, \forall\,y\in Y, (f\circ g)(y)=y).

We say g is an inverse of f.

If g is an inverse for f, then we also have that f is an inverse for g.
Etymology:

We’ve already seen that inverse comes from in and vertere which is the verb “to turn”. The inverse of an function, is one that turns the original function inside out to give back what was originally put in.

Example:
Define f:\mathbb{R}\to\mathbb{R} by f(x)=2x+3, and define g:\mathbb{R}\to\mathbb{R} by g(x)=\frac{x-3}{2}. Then \forall x\in\mathbb{R}, we have (g\circ f)(x)=g(f(x))=\frac{f(x)-3}{2}=\frac{(2x+3)-3}{2}=x and (f\circ g)(x)=f(g(x))=2\cdot g(x)+3=2\left(\frac{x-3}{2}\right)+3=x. Hence g\circ f is the identity function on \mathbb{R}, and f\circ g is the identity function on \mathbb{R}. It follows that f is invertible where g is an inverse of f.
Theorem 6.21:

Suppose f:X\to Y, and suppose that g:Y\to X and h:Y\to X are inverses of f. Then g=h, that is, if f has an inverse then its inverse is unique.

Proof.

Exercise.

Remark:

Since the inverse is unique, the inverse of f:X\to Y is often denoted by f^{-1}:Y\to X.

Note the difference between f^{-1}[ ] which is about pre-images (also known as inverse images), and f^{-1}( ) which is about the inverse function. In particular, when using f^{-1}[ ] (i.e., calculating pre-images), we are not making any claim that f is invertible or indeed that f^{-1}:Y\to X is a function that exists. However, when using f^{-1}( ), we are at the same time claiming that f is invertible and hence f^{-1} is a function.

It is important to note that we have two conditions to prove f:X\to Y is invertible. As an exercise, one can find examples of f:X\to Y and g:Y\to X such that f\circ g is the identity, but g\circ f is not. However, if we have some constraints on f or g then only one condition needs to be met.

Proposition 6.22:

Suppose that f:X\to Y and g:Y\to X are such that g\circ f is the identity function on X.

  1. If g is injective, then f\circ g is the identity map on Y (and hence g=f^{-1}).

  2. If f is surjective, then f\circ g is the identity map on Y (and hence g=f^{-1}).

Proof.

Exercise.

This next theorem gives another way to test if a function is invertible or not.

Theorem 6.23:

Suppose that f:X\to Y. Then f is invertible if and only if f is bijective.

Proof.

We prove both implications separately.

\Rightarrow). First, we will show that if f is invertible, then f is bijective. So suppose f is invertible, and let g:Y\to X denote the inverse of f. Then g\circ f is the identity function on X, and f\circ g is the identity function on Y. We first show f is surjective. Take y_1 \in Y, then y_1=f\circ g(y_1)=f(g(y_1)). So set x_1=g(y_1). Then x_1 \in X and f(x_1)=f(g(y_1))=y_1. So for all y\in Y, there exists x\in X such that f(x)=y.

We prove that f is also injective using the contrapositive. Suppose we have x_1,x_2 \in X with f(x_1)=f(x_2)=y. Then since g\circ f is the identity on X, we have x_1 = g(f(x_1))=y=g(f(x_2))=x_2. Hence f is injective and surjective, so it is bijective.

\Leftarrow). Second, we will show that if f is bijective, then f is invertible. So suppose that f is bijective. We set g=\{(y,x)\in Y\times X: (x,y)\in f \}\subseteq Y\times X. Since f is bijective, by Corollary 6.10 we have that for all y\in Y, there is a unique x\in X such that f(x)=y. Hence, g is a function, that is g:Y\to X.

We will show that g is the inverse of f. For this, we first take x\in X and set y\in Y to be y=f(x). Then by the definition of g, we have that g(y)=x, so (g\circ f)(x)=g(y)=x. Since x was arbitrary, this shows that g\circ f is the identity function on X. Now, choose any y\in Y. Since f is bijective then for all y\in Y, there exists a unique x\in X such that f(x)=y. Further, since g is a function, we must have that g(y)=x, and hence (f\circ g)(y)=f(x)=y. Since y was arbitrary, this shows that f\circ g is the identity function on Y. Hence if f is bijective, we have that f is invertible.

It follows that f is invertible if and only if f is bijective.

Suppose that f:X\to Y is bijective. In the above proof, we found a way of defining f^{-1}:Y\to X. That is, \forall\,y\in Y, we can write f^{-1}(y)=x where x\in X such that f(x)=y.

Example:

Let a,b,c,d\in\mathbb{R} be such that a<b and c<d. In the previous example, we defined f:[a,b]\to[c,d] as the composition of three functions and to be f(x) = c+\frac{(x-a)(d-c)}{b-a}. We want to prove f is bijective. One method would be to show all three functions that made up its composition are bijective (and this is relatively easy to do so). But we will instead show f is invertible by constructing the inverse.

By symmetry (replacing a with c, b with d etc), let us define g:[c,d]\to[a,b] via g(x)=a+\frac{(x-c)(b-a)}{d-c}. We have \begin{align*} (g\circ f)(x)&=g(f(x))\\ &=a+(f(x)-c)\left(\frac{b-a}{d-c}\right)\\ &=a+\left[c+(x-a)\left(\frac{d-c}{b-a}\right)-c\right]\left(\frac{b-a}{d-c}\right)\\ &=a+(x-a)\\ &=x, \end{align*} for all x\in[a,b]. Similarly, \begin{align*} (f\circ g)(x)&=f(g(x))\\ &=c+(g(x)-a)\left(\frac{d-c}{b-a}\right)\\ &=c+\left[a+(x-c)\frac{(b-a)}{(d-c)}\right]\left(\frac{d-c}{b-a}\right)\\ &=x, \end{align*} for all x\in[c,d]. It follows that g:[c,d]\to[a,b] is the inverse of f. Therefore, f is bijective.
Theorem 6.24:

Suppose f:X\to Y and g:Y\to Z are bijective. Then (g\circ f)^{-1}=f^{-1}\circ g^{-1}.

Proof.

Exercise.