1 Introduction

A mathematical theory is not to be considered complete until you have made it so clear that you can explain it to the first man whom you meet on the street. – David Hilbert –

Proofs are at the heart of mathematics, distinguishing what we know is mathematically true (or untrue) from what we still don't know. The first aim of this course is to see different types of proofs and mathematical reasoning, and to learn how to make your own mathematical arguments. Reading proofs will develop our critical thinking, e.g., how does this sentence follow from the previous one; how have we used all our assumptions; what happens if we tweak our assumptions; etc. Writing proofs will develop our creative and communication skills, e.g., how do we put together various ideas to come up with a new one; how do we explain our argument to someone else; etc.

As we cannot study mathematical arguments without mathematical content, the second aim of this course is to give you a strong foundation in pure mathematics - a very broad branch of mathematics.

These notes are based on the notes from a previous Introduction to Proofs and Group Theory course (Steffi Zegowitz; Lynne Walling; Jos Gunns; Jeremy Rickard; John Mackay), and the early chapters from a previous Analysis course (Thomas Jordan; Ivor McGillivray; Oleksiy Klurman).

1.1 How to use these notes

These notes are colour coded to help you identify the important bits.

Definition 1.1:
Definitions and Notations will be with this background colour. It is important to know and understand them in order to follow the course.
Theorem 1.2:

Results from the course, which will either be theorems, lemmas, propositions or corollaries, will have this background colour. It is important to know them so as to be able to apply them to different situations.

Proof.

After most results, there will be a proof with this background colour. It is important to understand each proof as similar techniques can be applied in different mathematical situations. Note that many proofs are left as exercises so that one can practice coming up with and writing proofs.

Remark:
Remarks and proof techniques will have this background colour. They are statements that are noteworthy.
Example:
Examples will not have a background colour, however there is a line so that one can tell when an example starts and when it finishes. When possible, lectures and problem classes will have different examples from the notes. This is so that students can be shown a wider range of examples.
Interest:

Interest, History and Etymological (origin of words) notes will not have a background colour, however there is a line to show the start and the end of the note. These notes are there for general interest and are not examinable content.

Most of the Etymology notes come from the book "The Words of Mathematics" by Steven Schwartzman (The Mathematical Association of America, 1994).

Most of the history of the development of Group Theory comes from the article “The Evolution of Group Theory: A Brief Survey” by Israel Kleiner (Mathematics Magazine, Vol 59, No 4, 192-215, 1986).

2 The building blocks of pure mathematics - sets and logic

To be able to prove mathematical results, we need two ingredients - the setting in which we are doing maths; and the logical reasoning we use to do maths.

2.1 Sets

Before we can start to do maths, we need to know the setting in which we are doing maths. For this reason, sets can be seen as the building block of all maths.

Definition 2.1:
A set is a collection of objects, where we ignore repeated elements and the order they appear in.
Notation:

We use curly brackets { and } to denote sets.

We use the symbol ∈ to say an element is in a set. We draw a line through it to negate it, i.e. we use ∉ to say an element is not in a set.

We use the symbol < to mean "(strictly) less than" and ≤ to mean "less than or equal to". Similarly we use > to mean "greater than" and ≥ to mean "greater than or equal to".

Example:

The set of trigonometric functions: {sin(x),cos(x),tan(x)}={cos(x),tan(x),cos(x),sin(x)}.

The set of integers between 2 and 6: {2,3,4,5,6}={6,4,2,2,3,4,6,5}.

Since 2≤6 and 6≤6, we have 6∈{2,3,4,5,6}, however 15∉{2,3,4,5,6} as 15>6.
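As an informal aside (Python is not part of these notes), the behaviour in Definition 2.1 is exactly how Python's built-in set type works: repeats and ordering are ignored, and membership tests mirror the symbols ∈ and ∉.

```python
# Sets ignore repeated elements and the order they appear in (Definition 2.1).
a = {2, 3, 4, 5, 6}
b = {6, 4, 2, 2, 3, 4, 6, 5}  # repeats and a different order
assert a == b

# Membership tests mirror "is an element of" / "is not an element of".
assert 6 in a        # 6 is an element, since 2 <= 6 and 6 <= 6
assert 15 not in a   # 15 is not, since 15 > 6
```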
In mathematics, there are certain sets which are used so often that we have abbreviated notations for them. We start with two of them and we’ll build up to see more.
Notation:

  • If a set does not have any elements, we call it the empty set and use ∅ (or { }).

  • Z is the set of integers, that is Z={…,−3,−2,−1,0,1,2,3,…}.

Etymology:

The symbol Z comes from the German "Zahlen" - which means numbers. It was first used by David Hilbert (German mathematician, 1862 - 1943), and popularised in Europe by Nicolas Bourbaki (a collective of mainly French mathematicians, 1935 - ) in their 1947 book "Algèbre". Integer comes from the Latin in, which means "not", and tangere, which means "to touch". An integer is a number that has been "untouched", i.e. "intact" or "whole".

The set of integers is equipped with the operations of addition + and multiplication ⋅ that satisfy the following 10 arithmetic properties (5 relating to addition; a Distributive Law; and 4 relating to multiplication):

(A1) - Closure under addition For all x,y∈Z we have x+y∈Z.

(A2) - Associativity under addition For all x,y,z∈Z we have x+(y+z)=(x+y)+z.

(A3) - Commutativity of addition For all x,y∈Z we have x+y=y+x.

(A4) - Additive identity For all x∈Z we have x+0=x.

(A5) - Additive inverse For all x∈Z we have −x∈Z and x+(−x)=0.

(A6) - Distributive Law For all x,y,z∈Z we have x⋅(y+z)=x⋅y+x⋅z.

(A7) - Closure under multiplication For all x,y∈Z we have x⋅y∈Z.

(A8) - Associativity under multiplication For all x,y,z∈Z we have x⋅(y⋅z)=(x⋅y)⋅z.

(A9) - Commutativity of multiplication For all x,y∈Z we have x⋅y=y⋅x.

(A10) - Multiplicative identity For all x∈Z we have 1⋅x=x.

Formally speaking, this is saying that Z with + and ⋅ is a ring. The notion of a ring is explored in more detail in later units such as the second year unit Algebra 2. Properties (A1) to (A5) tell us that Z with + is an abelian group. We will explore the notion of groups and abelian groups later in this unit.

On top of these 10 arithmetic properties, Z is ordered, i.e., it comes with 4 order properties:

(O1) - trichotomy For all x,y∈Z either x<y, x=y or x>y.

(O2) - transitivity For all x,y,z∈Z, if x<y and y<z then x<z.

(O3) - compatibility with addition For all x,y,z∈Z, if x<y then x+z<y+z.

(O4) - compatibility with multiplication For all x,y,z∈Z, if x<y and z>0 then zx<zy.
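The arithmetic and order properties can be spot-checked mechanically on a small sample of integers. This is only an illustration, not a proof, since the axioms quantify over all of Z; the sample range below is an arbitrary choice.

```python
from itertools import product

# Spot-check (A1)-(A10) and (O1)-(O4) on a small finite sample of integers.
sample = range(-4, 5)
for x, y, z in product(sample, repeat=3):
    assert isinstance(x + y, int)              # (A1) closure under +
    assert x + (y + z) == (x + y) + z          # (A2) associativity of +
    assert x + y == y + x                      # (A3) commutativity of +
    assert x + 0 == x                          # (A4) additive identity
    assert x + (-x) == 0                       # (A5) additive inverse
    assert x * (y + z) == x * y + x * z        # (A6) distributive law
    assert isinstance(x * y, int)              # (A7) closure under *
    assert x * (y * z) == (x * y) * z          # (A8) associativity of *
    assert x * y == y * x                      # (A9) commutativity of *
    assert 1 * x == x                          # (A10) multiplicative identity

    assert (x < y) + (x == y) + (x > y) == 1   # (O1) exactly one holds
    if x < y and y < z:
        assert x < z                           # (O2) transitivity
    if x < y:
        assert x + z < y + z                   # (O3) compatibility with +
    if x < y and z > 0:
        assert z * x < z * y                   # (O4) compatibility with *
```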

Etymology:

There are many words above that seem to have come from nowhere, but can be related back to words used in everyday English.

Associative comes from the Latin ad meaning “to” and socius meaning “partner, companion”. An associate is someone who is a companion to you. The property x+(y+z)=(x+y)+z shows that it doesn’t matter who y “keeps company with”, the result is still the same.

Commutative comes from the Latin co meaning "with" and mutare meaning "to move". To commute is "to change, to exchange, to move". In everyday situations, a commute is the journey (i.e., moving) from home to work. The property x+y=y+x shows that we can move/exchange x and y and the result is the same.

Identity comes from the Latin idem meaning “same”. The additive identity is the element which keeps other elements the same when added to it. The multiplicative identity is the element which keeps other elements the same when multiplied by it.

Inverse comes from in and vertere which is the verb “to turn”. The additive inverse of x is the quantity that turns back “adding x”.

Trichotomy comes from the Greek trikha meaning “in three parts” and temnein meaning “to cut”. If you pick xZ, then you can cut Z into three parts, the integers less than x, the integers equal to x and the integers greater than x.

Transitive comes from the Latin trans meaning “across, beyond” and the verb itus/ire meaning “to go”. Knowing x<y and y<z allows us to go across/beyond y to conclude x<z.

As seen from the properties above, we often need to quantify objects in mathematics, that is we need to distinguish between a criterion always being met ("for all"), and the existence of a case where a criterion is met ("there exists"). Sometimes we also need to distinguish whether there is a unique case where a criterion is met.

Notation:

We have the following symbolic notation:

  • The symbol ∀ denotes for all, or equivalently, for every.

  • The symbol ∃ denotes there exists.

  • We use ∃! to denote there exists a unique, or equivalently there exists one and only one.

A note on the usage of these symbols. Often we use these symbols as a shortcut when discussing maths verbally and writing down ideas (for example: in lectures; discussing mathematics with colleagues). However, in formal text (for example: lecture notes; articles submitted to journals), we often avoid these symbols and use words. In formal text, these symbols tend to be reserved for when doing formal logic (which we will see later) or within set notation (which we will see below).

Proof techniques:
To show that something is unique, we first show that one such case exists, and then proceed to show that if another case exists, it is equal to the first case.

We can use Z as a starting point to construct different sets.

Notation:

Within the curly brackets of a set, we use a colon, :, to mean "such that".

Example:

Returning to the previous example, the set of integers between 2 and 6 can be written as {x∈Z : 2≤x≤6}.

Notation:

  • We use + to denote the positive numbers in a set. I.e., Z+={x∈Z : x>0}={1,2,3,…} denotes the set of positive integers.

  • Similarly, Z−={x∈Z : x<0}={−1,−2,−3,…} denotes the set of negative integers.

  • We denote the set of non-negative integers by Z≥0={x∈Z : x≥0}={0,1,2,…}.

  • If n∈Z, we write nZ={nx : x∈Z}={y∈Z : ∃x∈Z with y=xn}={…,−3n,−2n,−n,0,n,2n,3n,…}.
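Set-builder notation translates directly into set comprehensions, which may help make the notation above concrete (an informal aside; a finite range stands in for the infinite set Z):

```python
# {x in Z : condition} becomes a comprehension over a finite stand-in for Z.
Z = range(-12, 13)

Z_plus = {x for x in Z if x > 0}       # the positive integers
Z_geq0 = {x for x in Z if x >= 0}      # the non-negative integers
three_Z = {3 * x for x in Z}           # nZ with n = 3: multiples of 3

assert {x for x in Z if 2 <= x <= 6} == {2, 3, 4, 5, 6}
assert 9 in three_Z and 10 not in three_Z
assert Z_geq0 == Z_plus | {0}
```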

You may have also heard of the natural numbers, denoted N. However, some sources consider 0 to be a natural number (so N=Z≥0) and others consider 0 not to be a natural number (so N=Z+). To avoid any confusion (and because we will need Z≥0 sometimes and Z+ at other times), we will not be using N in this course.

Notation:

Q is the set of rational numbers, that is Q={a/b : a∈Z, b∈Z+}.

We will later see how Q can be constructed from Z and how this leads to a "natural" way to write each rational number.

Etymology:

The symbol Q stands for the word "quotient", which is Latin for "how often/how many", i.e., the quotient a/b is "how many times does b fit in a". Surprisingly, the word "rational" to describe some numbers came after the use of "irrational" numbers. Ratio is Latin for "thinking/reasoning". When the Pythagorean school in Ancient Greece realised some numbers could not be expressed as the quotient of two whole numbers (such as √2, which we will prove later), they called those "irrational", i.e. numbers that should not be thought about. "Rational" numbers were numbers that were not "irrational", i.e., one could think about them.

We extend the operations of addition and multiplication as well as the order relation for the integers to the rational numbers. Let a/b, c/d∈Q, then:

a/b + c/d = (ad+bc)/(bd) and (a/b)⋅(c/d) = (ac)/(bd). Similarly to Z, we have that Q with + and ⋅ satisfies the properties (A1) to (A10) as well as (O1) to (O4). It also satisfies the extra arithmetic property:

(A11) - multiplicative inverse For all x∈Q with x≠0, we have x⁻¹=1/x∈Q and x⁻¹⋅x=1.

Notice that (A11) is similar to (A5) but for multiplication. As we will see later in the course, another way of saying (A7) to (A11) is that Q without 0 under ⋅ is an abelian group. As you will see in Linear Algebra, the arithmetic properties of Q ((A1) to (A11)) come from the fact that Q is a field.

Similarly to Z, using Q we can construct the sets Q+, Q−, Q≥0 etc.
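The addition and multiplication formulas for rationals, and the multiplicative inverse property (A11), match exact rational arithmetic in Python's standard fractions module. A quick informal check (the values of a, b, c, d are arbitrary choices):

```python
from fractions import Fraction

# a/b + c/d = (ad+bc)/(bd) and (a/b)(c/d) = (ac)/(bd), checked exactly.
a, b, c, d = 2, 3, -5, 7
x, y = Fraction(a, b), Fraction(c, d)

assert x + y == Fraction(a * d + b * c, b * d)  # addition formula
assert x * y == Fraction(a * c, b * d)          # multiplication formula
assert (1 / x) * x == 1                         # (A11): every x != 0 has an inverse
```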

2.2 Truth table

Now that we have some objects to work with, we want to know what we can do with them. In a mathematical system, statements are either true or false, but not both at once. We sometimes say a statement P holds to mean it is true. The label ‘true’ or ‘false’ associated to a given statement is its truth value.

Example:
The statement "4∈Z" holds and the statement "1/2∈Z" is false. However, the statement "x∈Z" could be true or false depending on the value of x, but it cannot be both true and false at the same time.

Definitions (e.g., the definition of Z) and axioms (e.g., (A1)-(A11)) are statements we take to be true, while propositions, theorems and lemmas are statements that we want to prove are true and often consist of smaller statements linked together. While we often don't write statements symbolically, looking at truth tables and statements helps us understand the fundamentals of how a proof works. We first introduce the four building blocks of statements.

Definition 2.2:

We use the symbol ¬ to mean not. The truth table below shows the value ¬P takes depending on the truth value of P.

P ¬P
T F
F T

We will see concrete examples of how to negate statements later in this chapter.

Definition 2.3:

We use the symbol ∧ to mean and. Let P and Q be two statements; we have that P∧Q is true exactly when both P and Q are true. The corresponding truth table is as follows:

P Q P∧Q
T T T
T F F
F T F
F F F
Example:
Let x∈Z, P be the statement x≥5 and Q be the statement x≤10. Then P∧Q is the statement x≥5 and x≤10, i.e., 5≤x≤10.
Definition 2.4:

We use the symbol ∨ to mean or. Let P and Q be two statements; we have that P∨Q is true exactly when at least one of P or Q is true. The corresponding truth table is as follows.

P Q P∨Q
T T T
T F T
F T T
F F F
Example:
Let x∈Z, P be the statement x≤5 and Q be the statement x≥10. Then P∨Q is the statement x≤5 or x≥10. (Do not write 5≥x≥10 as this makes no sense since 5<10.)
Definition 2.5:

We use the symbol ⇒ to mean implies. Let P and Q be two statements. "P⇒Q" is the same as "If P, then Q", or "for P we have Q". The corresponding truth table is as follows.

P Q P⇒Q
T T T
T F F
F T T
F F T

The above truth table can seem confusing, but consider the following example.

Example:

Recall (A1) - "For all x,y∈Z we have x+y∈Z". Let P be the statement x,y∈Z and Q be the statement x+y∈Z, then (A1) can be written symbolically as ∀x,y, P⇒Q. This statement is true, regardless of what the values of x and y are. But let us look at the truth values of P and Q with different x and y.

x y P Q
0 1 T T
0 3/2 F F
1/2 1 F F
1/2 3/2 F T
But no matter what x and y we pick, we will not get that P is true while Q is false.
Proof techniques:

Many theorems are of the type "If P then Q". A common method to prove such a statement is to start with the assumption P (i.e., assume P is true) and use logical steps to arrive at Q. These are often referred to as "direct proofs".

The above example also shows that you cannot start a proof with what you want to prove, as you could start with something false and end up with something true.
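The four truth tables above can be tabulated mechanically. As an informal aside, here is a sketch that reproduces them by listing all truth assignments; the helper function implies encodes the table of Definition 2.5 (false only when P is true and Q is false):

```python
from itertools import product

def implies(p, q):
    # Material implication: P => Q is false only when P is true and Q false.
    return (not p) or q

# Reproduce the truth tables of Definitions 2.2-2.5, row by row.
rows = list(product([True, False], repeat=2))
for p, q in rows:
    print(p, q, not p, p and q, p or q, implies(p, q))

# The implication column reads T, F, T, T: false only in the second row.
assert [implies(p, q) for p, q in rows] == [True, False, True, True]
```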

Turning back to sets, we look at examples of direct proofs.

Definition 2.6:

A set A is a subset of a set B, denoted by A⊆B, if every element of A is also an element of B (Symbolically: ∀x∈A, we have x∈B).

We write A⊈B when A is not a subset of B, so there is at least one element of A which is not an element of B (Symbolically: ∃x∈A such that x∉B).

A set A is a proper subset of a set B, denoted by A⊊B, if A is a subset of B but A≠B.
Remark:
We have that ∅ is a subset of every set. We also have that ∅ is the only subset of ∅.
Example:

Let A=4Z={4n : n∈Z} and B=2Z={2n : n∈Z}. We will prove A⊆B using a direct proof. [If we let P be the statement x∈A and Q be the statement x∈B, note that "∀x∈A we have x∈B" translates symbolically to ∀x, P⇒Q.]

Let x∈A [i.e., suppose P is true]. Then there exists n∈Z such that x=4n. Hence, x=4n=2(2n)=2m, for some m∈Z, i.e. there exists m∈Z such that x=2m. Hence, x∈B [i.e., Q is true]. Since this argument is true for any x∈A, we have that for all x∈A, x∈B, hence A⊆B.

We will now prove that B⊈A by showing there is an element of B which is not an element of A. Take x=10. Then x∈B [as x=5⋅2] but x∉A [as x=(5/2)⋅4 but 5/2∉Z].

Combining the two statements above, it follows that A⊊B.
Proof techniques:

To prove something is true for all x∈X, we "let x∈X" or "suppose x∈X" with no further conditions. Whatever we conclude about x is true for all x∈X. This is the technique we used in the first part of the example above.

Suppose A and B are sets. Showing that A=B is the same as showing that A⊆B and B⊆A.

Example:

We show that 2Q=Q, by showing 2Q⊆Q and Q⊆2Q.

Let x∈2Q. Then there exists y∈Q such that x=2y. By (A7), since 2,y∈Q, we have x=2y∈Q. As this is true for all x∈2Q we have 2Q⊆Q.

Let x∈Q. Let y=x/2. Note that 1/2∈Q so by (A7), since 1/2,x∈Q, we have y=x/2∈Q. Hence, there exists y∈Q such that x=2y, so x∈2Q. As this is true for all x∈Q we have Q⊆2Q.

We can combine the basic symbols together to make more complicated statements, and use truth tables to determine their truth values based on the truth values of P and Q.

Example:

Let P and Q be two statements. The corresponding truth table for (¬P)∨Q is as follows.

P Q ¬P (¬P)∨Q
T T F T
T F F F
F T T T
F F T T
Example:

Let P and Q be two statements. The corresponding truth table for (¬Q)⇒(¬P) is as follows.

P Q ¬P ¬Q (¬Q)⇒(¬P)
T T F F T
T F F T F
F T T F T
F F T T T

2.3 Logical Equivalence

As well as combining statements together to make new statements, we also want to know whether two statements are equivalent, that is, whether they always have the same truth value.

Definition 2.7:

We use the symbol ⇔ to mean if and only if. For two statements P and Q, "P⇔Q" means P⇒Q and Q⇒P. In this case we say "P and Q are equivalent". The corresponding truth table is as follows.

P Q P⇔Q
T T T
T F F
F T F
F F T

If we take two statements which are logically equivalent, say P is equivalent to Q, then proving P to be true is equivalent to proving Q to be true. Similarly, proving Q to be true is equivalent to proving P to be true. We can use truth tables to prove whether two (abstract) statements are equivalent. This will prove to be useful later on when we turn a statement we want to prove is true into another equivalent statement that may be easier to prove.

Theorem 2.8:

Let P and Q be two statements.

  • P⇒Q is equivalent to (¬Q)⇒(¬P).

  • P⇒Q is equivalent to (¬P)∨Q.

Proof.

Using the last two examples and the truth table for P⇒Q, we have the following truth table.

P Q P⇒Q (¬Q)⇒(¬P) (¬P)∨Q
T T T T T
T F F F F
F T T T T
F F T T T

Hence, for all truth values of P and Q, we have that P⇒Q and (¬Q)⇒(¬P) have the same truth values. Therefore, P⇒Q is equivalent to (¬Q)⇒(¬P).

Similarly, for all truth values of P and Q, we have that P⇒Q and (¬P)∨Q have the same truth values. Therefore, P⇒Q is equivalent to (¬P)∨Q.

Proof techniques:

Notice that the above proof has several sentences to explain to the reader what is going on. It is made up of full sentences with a clear conclusion on what the calculation (in this case the truth table) shows. A proof should communicate clearly to the reader why the statement (be that a theorem, proposition or lemma) is true.

How much detail you put in a proof will be influenced by who your target audience is - this is a skill you will develop over your time as a student.
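The two equivalences of Theorem 2.8 can also be confirmed by brute force over all four truth assignments (an informal check mirroring the truth-table proof, not a replacement for it):

```python
from itertools import product

def implies(p, q):
    # Material implication, as in the truth table for "P implies Q".
    return (not p) or q

# Theorem 2.8: P => Q is equivalent to its contrapositive, and to (not P) or Q.
for p, q in product([True, False], repeat=2):
    assert implies(p, q) == implies(not q, not p)  # contrapositive
    assert implies(p, q) == ((not p) or q)
```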

We leave the following proposition as an exercise.
Proposition 2.9:

Suppose P and Q are two statements. Then:

  • P⇔(¬(¬P)).

  • (P∨Q)⇔((¬P)⇒Q).

Proof.

Exercise.

The next proposition shows that ∧ and ∨ are associative, that is, P∧Q∧R and P∨Q∨R are statements that are clear without parentheses (and therefore do not require parentheses).

Proposition 2.10:

Suppose P,Q,R are statements.

  1. ((P∧Q)∧R)⇔(P∧(Q∧R)).

  2. ((P∨Q)∨R)⇔(P∨(Q∨R)).

Proof.

We prove part a. and leave the proof of part b. as an exercise. We have the following truth table.

P Q R P∧Q (P∧Q)∧R Q∧R P∧(Q∧R)
T T T T T T T
T T F T F F F
T F T F F F F
T F F F F F F
F T T F F T F
F T F F F F F
F F T F F F F
F F F F F F F
Hence, for any truth values of P,Q,R, the truth table shows that ((P∧Q)∧R)⇔(P∧(Q∧R)).

Proposition 2.11:

Let P,Q,R be statements. Then P⇒(Q⇒R) and (P⇒Q)⇒R are not equivalent.

Proof.

We have the following truth table.

P Q R Q⇒R P⇒(Q⇒R) P⇒Q (P⇒Q)⇒R
T T T T T T T
T T F F F T F
T F T T T F T
T F F T T F T
F T T T T T T
F T F F T T F
F F T T T T T
F F F T T T F

From the above truth table, we can see that when P, Q and R are all false, the truth value of P⇒(Q⇒R) (which is true) is different from the truth value of (P⇒Q)⇒R (which is false). Hence P⇒(Q⇒R) and (P⇒Q)⇒R are not equivalent.

The above proposition shows that the statement P⇒Q⇒R therefore has no clear meaning without parentheses. Similarly, there is an exercise to show that P⇔(Q⇒R) and (P⇔Q)⇒R are not equivalent, so P⇔Q⇒R is likewise not clear. Hence the meaning of assertions such as P⇒Q⇒R⇒S is undefined (unless one puts in parentheses).
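The failure of associativity for implication can also be found by brute force: search all eight truth assignments for ones where the two groupings disagree (an informal check of Proposition 2.11):

```python
from itertools import product

def implies(p, q):
    # Material implication, as in the truth table for "P implies Q".
    return (not p) or q

# Collect every assignment where P => (Q => R) and (P => Q) => R disagree.
diffs = [(p, q, r)
         for p, q, r in product([True, False], repeat=3)
         if implies(p, implies(q, r)) != implies(implies(p, q), r)]

# The counterexample used in the proof: P, Q, R all false.
assert (False, False, False) in diffs
```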

As an exercise, one may also prove the following sometimes useful equivalences.

Proposition 2.12:

Let P,Q,R be statements. Then:

  • (P⇒(Q∧R))⇔((P⇒Q)∧(P⇒R));

  • (P⇒(Q∨R))⇔((P⇒Q)∨(P⇒R)).

Proof.

Exercise.

The next proposition shows that ∧ and ∨ are distributive.
Proposition 2.13:

Let P,Q,R be statements. Then

  1. (P∧(Q∨R))⇔((P∧Q)∨(P∧R)).

  2. (P∨(Q∧R))⇔((P∨Q)∧(P∨R)).

Proof.

We will prove part a. and leave the proof of part b. as an exercise. We have the following truth table.

P Q R Q∨R P∧(Q∨R) P∧Q P∧R (P∧Q)∨(P∧R)
T T T T T T T T
T T F T T T F T
T F T T T F T T
T F F F F F F F
F T T T F T F F
F T F T F F F F
F F T T F F F F
F F F F F F F F
Hence, for any truth values of P,Q,R, the above truth table shows that (P∧(Q∨R))⇔((P∧Q)∨(P∧R)).
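The distributive laws, and the way implication distributes over "and" and "or", can be checked exhaustively over all eight truth assignments. A quick informal verification:

```python
from itertools import product

def implies(p, q):
    # Material implication, as in the truth table for "P implies Q".
    return (not p) or q

for p, q, r in product([True, False], repeat=3):
    # Distributivity of "and" over "or", and of "or" over "and".
    assert (p and (q or r)) == ((p and q) or (p and r))
    assert (p or (q and r)) == ((p or q) and (p or r))
    # How implication distributes over "and" and "or".
    assert implies(p, q and r) == (implies(p, q) and implies(p, r))
    assert implies(p, q or r) == (implies(p, q) or implies(p, r))
```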

Let us return to sets to see how logic may be applied to prove statements.

Definition 2.14:

Suppose that A and B are subsets of some set X.

  • A∪B denotes the union of A and B, that is A∪B={x∈X : x∈A or x∈B}.

  • A∩B denotes the intersection of A and B, that is A∩B={x∈X : x∈A and x∈B}.

When A∩B=∅, we say that A and B are disjoint.

Example:

We have Z≥0∪Z−=Z. We also see that Z≥0∩Z−=∅, hence they are disjoint.
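Python's set operators | (union) and & (intersection) behave exactly as in Definition 2.14, so the example above can be checked informally on a finite stand-in for Z:

```python
# Finite stand-in for Z; A and B play the roles of Z>=0 and Z-.
X = set(range(-5, 6))
A = {x for x in X if x >= 0}   # non-negative integers in the range
B = {x for x in X if x < 0}    # negative integers in the range

assert A | B == X              # union: every x lies in A or in B
assert A & B == set()          # empty intersection: A and B are disjoint
```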

Etymology:

The word union comes from the Latin unio meaning "a one-ness". The union of two sets is a set that lists every element in each set just once (even if the element appears in both sets). While the symbol ∪ looks like a "U" (the first letter of union), this is a coincidence. While the symbol was first used by Hermann Grassmann (Polish/German mathematician, 1809 - 1877) in 1844, Giuseppe Peano (Italian mathematician, 1858 - 1932) used it to represent the union of two sets in 1888 in his article Calcolo geometrico secondo Ausdehnungslehre di H. Grassmann. However, at the time the union was referred to as the disjunction of two sets.

The word intersect comes from the Latin inter meaning "within, in between" and sectus meaning "to cut". The intersection of two curves is the place they cut each other; the intersection of two sets is the "place" where the two sets overlap. While the symbol ∩ was first used by Gottfried Leibniz (German mathematician, 1646 - 1716), he also used it to represent regular multiplication (there are some links between the two ideas). Again, ∩ was used by Giuseppe Peano in 1888 to refer to intersection only.

The word disjoint comes from the Latin dis meaning “away, in two parts” and the word joint. Two sets are disjoint if they are apart from each other without any joints between them. (Compare this to disjunction, which has the same roots but is used to mean joining two things that are apart).

Lemma 2.15:

Let X be a set, and for x∈X, let P(x) be the statement that x satisfies the criterion P, and let Q(x) be the statement that x satisfies the criterion Q. Set A={x∈X : P(x)} and B={x∈X : Q(x)}. Then A∪B={x∈X : P(x)∨Q(x)}, A∩B={x∈X : P(x)∧Q(x)}.

Proof.

For x∈X, we have that x∈A if and only if the statement P(x) holds. Similarly, we have that x∈B if and only if the statement Q(x) holds. Then A∪B={x∈X : (x∈A)∨(x∈B)}={x∈X : P(x)∨Q(x)} and A∩B={x∈X : (x∈A)∧(x∈B)}={x∈X : P(x)∧Q(x)}.

We can use our work on logical equivalence to show that ∪ and ∩ are associative.

Proposition 2.16:

Suppose A,B,C are subsets of a set X. Then

  1. A∩(B∩C)=(A∩B)∩C.

  2. A∪(B∪C)=(A∪B)∪C.

Proof.

We will prove part a. and leave part b. as an exercise.

Suppose x∈X. Let P be the statement that x∈A, let Q be the statement that x∈B, and let R be the statement that x∈C. Recall from Proposition 2.10 that P∧(Q∧R)⇔(P∧Q)∧R. Then x∈A∩(B∩C) ⇔ (x∈A)∧(x∈B∩C) ⇔ (x∈A)∧((x∈B)∧(x∈C)) ⇔ ((x∈A)∧(x∈B))∧(x∈C) ⇔ (x∈A∩B)∧(x∈C) ⇔ x∈(A∩B)∩C. Hence, we have that x∈A∩(B∩C) if and only if x∈(A∩B)∩C. It follows that A∩(B∩C)=(A∩B)∩C.

Proof techniques:
In the above proof, we used ⇔ in each line so that the proof works both ways (and we concluded A∩(B∩C)⊆(A∩B)∩C at the same time as (A∩B)∩C⊆A∩(B∩C)). When using ⇔, one needs to be very careful that the implication indeed works both ways, as it is very easy to make a mistake along the way. For this reason, many proofs instead show P⇒Q and Q⇒P as two separate arguments (within the same proof).

Similarly, we have that ∪ and ∩ are distributive.

Proposition 2.17:

Let A,B,C be subsets of a set X. Then

  1. A∩(B∪C)=(A∩B)∪(A∩C).

  2. A∪(B∩C)=(A∪B)∩(A∪C).

Proof.

We will prove part a. and leave part b. as an exercise.

Suppose x∈X. Let P be the statement that x∈A, let Q be the statement that x∈B, and let R be the statement that x∈C. Recall from Proposition 2.13 that P∧(Q∨R)⇔(P∧Q)∨(P∧R). Then x∈A∩(B∪C) ⇔ (x∈A)∧(x∈B∪C) ⇔ (x∈A)∧((x∈B)∨(x∈C)) ⇔ ((x∈A)∧(x∈B))∨((x∈A)∧(x∈C)) ⇔ (x∈A∩B)∨(x∈A∩C) ⇔ x∈(A∩B)∪(A∩C). Hence, we have that x∈A∩(B∪C) if and only if x∈(A∩B)∪(A∩C). It follows that A∩(B∪C)=(A∩B)∪(A∩C).
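The associativity and distributivity laws for ∪ and ∩ can be illustrated on concrete finite sets (a single example does not prove the laws, but it is a useful sanity check; the three sets below are arbitrary choices):

```python
# Three arbitrarily chosen finite sets.
A, B, C = {1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 7}

assert A & (B & C) == (A & B) & C          # intersection is associative
assert A | (B | C) == (A | B) | C          # union is associative
assert A & (B | C) == (A & B) | (A & C)    # intersection distributes over union
assert A | (B & C) == (A | B) & (A | C)    # union distributes over intersection
```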

2.4 Negations

Being able to negate statements is important for two reasons:

  • Instead of proving P is true, it might be easier to prove that ¬P is false (proof by contradiction).

  • Instead of proving P⇒Q it might be easier (by Theorem 2.8) to prove ¬Q⇒¬P (proof by contrapositive).

We will expand on these two points later. We already know how to negate most simple statements, for example:

  • the negation of x=5 is x≠5.

  • the negation of x>5 is x≤5 (notice the strict inequality became non-strict).

  • the negation of x∈X is x∉X.

To negate simple statements that have been strung together, we use the following theorem.

Theorem 2.18:

Suppose P,Q are two statements. Then

  1. ¬(P∧Q)⇔((¬P)∨(¬Q)).

  2. ¬(P∨Q)⇔((¬P)∧(¬Q)).

  3. ¬(P⇒Q)⇔(P∧(¬Q)).

Proof.

We will prove part a. and leave parts b. and c. as exercises. We have the following truth table.

P Q ¬P ¬Q P∧Q ¬(P∧Q) (¬P)∨(¬Q)
T T F F T F F
T F F T F T T
F T T F F T T
F F T T F T T

Thus, for any truth values of P and Q, the truth values of ¬(P∧Q) and (¬P)∨(¬Q) are the same.

Example:

If x is not between 5 and 10, then x is either less than 5 or more than 10. Formally speaking, ¬(5≤x≤10) if and only if ¬(5≤x) or ¬(x≤10), if and only if x<5 or 10<x.
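Both the abstract laws of Theorem 2.18 and the concrete inequality example can be verified exhaustively (an informal check over a finite range of x and over all truth assignments):

```python
# The inequality example: not (5 <= x <= 10) iff (x < 5 or 10 < x).
for x in range(-20, 21):
    assert (not (5 <= x <= 10)) == (x < 5 or 10 < x)

# De Morgan's laws for statements (Theorem 2.18 a. and b.).
for p in (True, False):
    for q in (True, False):
        assert (not (p and q)) == ((not p) or (not q))   # part a.
        assert (not (p or q)) == ((not p) and (not q))   # part b.
```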

For statements that involve quantifiers (i.e., ∀, ∃), we use the following theorem.

Theorem 2.19:

Let X be a set, and suppose that P(x) is a statement involving x∈X. Then

¬(∀x∈X, P(x)) ⇔ ∃x∈X such that ¬P(x).

We also have

¬(∃x∈X such that P(x)) ⇔ ∀x∈X, ¬P(x).

Proof.

Suppose that "∀x∈X, P(x)" is a false statement. Then there must be at least one x∈X such that P(x) does not hold. That is, (2.1) ¬(∀x∈X, P(x)) ⇒ ∃x∈X such that ¬P(x).

Conversely, suppose that "∃x∈X such that ¬P(x)" is a true statement. Then it is not the case that P(x) holds for all x∈X, that is (2.2) ∃x∈X such that ¬P(x) ⇒ ¬(∀x∈X, P(x)).

By (2.1) and (2.2), it follows that ¬(∀x∈X, P(x)) ⇔ ∃x∈X such that ¬P(x).

Equivalently, we have ¬(¬(∀x∈X, P(x))) ⇔ ¬(∃x∈X such that ¬P(x)), that is, ∀x∈X, P(x) ⇔ ¬(∃x∈X such that ¬P(x)). Setting Q(x)=¬P(x), we have ¬(∃x∈X such that Q(x)) ⇔ ∀x∈X, ¬Q(x).

Example:

Let X be a set and let us negate the statement "∀x∈X, P(x)⇒Q(x)", by writing equivalent statements using Theorem 2.18 and Theorem 2.19.

¬(∀x∈X, P(x)⇒Q(x)) ⇔ ∃x∈X such that ¬(P(x)⇒Q(x)) ⇔ ∃x∈X such that (P(x)∧¬Q(x)).

To make this more concrete, let P(x) be the statement x∈2Z and Q(x) the statement x∈4Z. We have already shown that 2Z⊈4Z, i.e. "∀x∈Z, P(x)⇒Q(x)" is false. We did this by showing that when x=10, P(x) is true while Q(x) is false, i.e., ∃x∈Z such that P(x) is true and Q(x) is false.

Negating statements is also very useful to see when an object does not satisfy a definition. We will see more examples of this later in the course, but for the moment here is an example.

Example:

Using (O2) as an example, a set X satisfies transitivity (with respect to <) if for all x,y,z∈X, if x<y and y<z then x<z. Let us see what it means for X not to satisfy (O2). First we turn the definition into symbolic language: ∀x,y,z∈X, (x<y)∧(y<z)⇒(x<z). We then negate this:

¬(∀x,y,z∈X, (x<y)∧(y<z)⇒(x<z)) ⇔ ∃x,y,z∈X such that ¬((x<y)∧(y<z)⇒(x<z)) ⇔ ∃x,y,z∈X such that ((x<y)∧(y<z))∧¬(x<z) ⇔ ∃x,y,z∈X such that (x<y)∧(y<z)∧(x≥z).

Proof techniques:
We have inserted phrases like “such that” to make our sentences more readable without changing their meanings.

Remark:

To negate a statement with the quantifier ∃!, it is useful to first translate this notation so as not to include "∃!".

Suppose X is a set, and P(x) is a proposition dependent on x∈X. When we say there is a unique x∈X so that P(x) holds, we mean first that:

  1. there is some x∈X so that P(x) is true, and

  2. if we have x1,x2∈X with P(x1) and P(x2) both true, then x1=x2.

More symbolically, we have [∃!x∈X, P(x)] ⇔ [(∃x∈X, P(x)) ∧ (∀x1,x2∈X, (P(x1)∧P(x2))⇒x1=x2)]. So negating, we get: ¬[∃!x∈X, P(x)] ⇔ ¬[(∃x∈X, P(x)) ∧ (∀x1,x2∈X, (P(x1)∧P(x2))⇒x1=x2)] ⇔ [¬(∃x∈X, P(x)) ∨ ¬(∀x1,x2∈X, (P(x1)∧P(x2))⇒x1=x2)] ⇔ [(∀x∈X, ¬P(x)) ∨ (∃x1,x2∈X, ¬((P(x1)∧P(x2))⇒x1=x2))] ⇔ [(∀x∈X, ¬P(x)) ∨ (∃x1,x2∈X, (P(x1)∧P(x2)∧x1≠x2))].

Thus the negation of "there exists exactly one x such that P(x) holds" is "either there exists no x such that P(x) holds, or there exist more than one x such that P(x) holds".

2.5 Contradiction and the contrapositive

As mentioned earlier, negating statements is useful when trying to prove statements using different methods. First we look at proof by contradiction.

Proof techniques:
A proof by contradiction uses the following logic. Instead of trying to prove a statement P is true, we assume that ¬P is true. If we end up with a contradiction, either to something we already knew is true or to our original assumption, we must deduce that ¬P is false. This allows us to conclude that P is true.

While we will use this method more extensively later, let us see an easy example.

Example:

Statement: Let x,y∈Q. If x<y then −x>−y.

Proof: For the sake of a contradiction, let us assume that x<y and −x≤−y. (Recall the negation of P⇒Q is P∧¬Q.) Since x<y we have x−x<y−x [by (O3)], i.e. 0<y−x. Since −x≤−y, then −x+y≤−y+y, i.e. y−x≤0. So 0<y−x and y−x≤0, i.e. [by (O1)] 0<0, which is a contradiction. Hence "x<y and −x≤−y" is false, so if x<y then −x>−y.

Another technique is a proof by contrapositive.

Definition 2.20:
Let P and Q be two statements. The contrapositive of "P⇒Q" is "¬Q⇒¬P".
Example:

The contrapositive of (A1), "for all x,y, if x,y∈Z then x+y∈Z", is "for all x,y, if x+y∉Z then x∉Z or y∉Z".

The contrapositive of (O3), "for all x,y,z∈Q, if x<y then x+z<y+z", is "for all x,y,z∈Q, if x+z≥y+z then x≥y".

By Theorem 2.8, we know that P⇒Q is equivalent to its contrapositive. Sometimes, proving the contrapositive is easier than proving P⇒Q (we will see more examples of this later).

Example:

Statement: Let x,y∈Q. If x<y then −x>−y.

Proof: We prove the contrapositive. The contrapositive is "if −x≤−y then x≥y". Suppose −x≤−y, then −x+(x+y)≤−y+(x+y) [since x+y∈Q by (A1) and using (O3)*]. In other words y≤x, i.e. x≥y.

Note that we have proven "x<y ⇒ −x>−y" by contradiction and by contrapositive. It is also worth noting that we could have proven it directly. This is meant to show that often there are numerous ways to prove the same thing.

Note that the contrapositive doesn't just reverse the implication symbol, it also negates P and Q. The contrapositive can often be confused with the converse:

Definition 2.21:
Let P and Q be two statements. The converse of "P⇒Q" is "Q⇒P".

Note that P⇒Q is not equivalent to its converse. Be careful when using a theorem/lemma/proposition that you are not using the converse by accident (which may not be true).

Example:
The converse of (A1), "for all x,y∈Q, if x,y∈Z then x+y∈Z", is "for all x,y∈Q, if x+y∈Z then x,y∈Z", which we know is false (e.g., by taking x=y=1/2).

If the converse of P⇒Q is true, then we can deduce that P⇔Q.

Example:

The converse of "If x<y then −x>−y" is "if −x>−y then x<y". We can show this is true: suppose −x>−y, then −x+(x+y)>−y+(x+y), i.e. y>x, i.e. x<y.

Therefore we conclude, for all x,y∈Q, x<y if and only if −x>−y.
Etymology:

Contradiction comes from the Latin contra which means “against” and dict which is a conjugation of the verb “to say, tell”. A contradiction is a statement that speaks against another statement.

Contrapositive also comes from the Latin contra and the Latin positus which is a conjugation of the verb “to put”. The contrapositive of a statement is a statement “put against” the original statement: we have negated both parts and reversed the order. Furthermore it is “positive” as it has the same truth value as the original statement.

On the other hand, while sounding similar, converse comes from the Latin con which means “together with” and vergere which means “to turn”. The converse turns the order of P and Q.

2.6 Set complement

We finish this section by looking at the complement of sets.

Definition 2.22:

Suppose that A and B are subsets of some set X.

  • A∖B denotes the relative complement of A with respect to B, that is A∖B = {x ∈ X : x ∈ A and x ∉ B}.

  • Aᶜ denotes the complement of A, that is Aᶜ = {x ∈ X : x ∉ A}.

Example:
Let X={1,2,3,4,5,6,7,8,9,10}, A={2,4,6,8} and B={4,5,6,7}. Then A∖B={2,8}, B∖A={5,7} and Aᶜ={1,3,5,7,9,10}.
Example:
We have Z∖Z<0 = Z≥0 and Z≥0∖{0} = Z+.
Proposition 2.23:

Suppose A,B are subsets of a set X. Then

  1. A∖B = A∩Bᶜ.
  2. (A∖B)ᶜ = Aᶜ∪B.

Proof.

We will prove part a. and leave part b. as an exercise.

Let x ∈ X. Then x ∈ A∖B ⇔ (x ∈ A)∧(x ∉ B) ⇔ (x ∈ A)∧(x ∈ Bᶜ) ⇔ x ∈ A∩Bᶜ. Hence, we have that x ∈ A∖B is equivalent to x ∈ A∩Bᶜ. It follows that A∖B = A∩Bᶜ.

Theorem 2.24:

Suppose that A,B,C are subsets of a set X. Then

  1. A∖(B∩C) = (A∖B)∪(A∖C).

  2. A∖(B∪C) = (A∖B)∩(A∖C).

  3. (A∪B)ᶜ = Aᶜ∩Bᶜ.

  4. (A∩B)ᶜ = Aᶜ∪Bᶜ.

Proof.

We will prove parts a. and d. and leave parts b. and c. as exercises.

Proving a.) Let x ∈ X. Recall that for statements P,Q,R, we have that P∧(Q∨R) ⇔ (P∧Q)∨(P∧R). Then x ∈ A∖(B∩C) ⇔ (x ∈ A)∧(x ∉ B∩C) ⇔ (x ∈ A)∧(¬((x ∈ B)∧(x ∈ C))) ⇔ (x ∈ A)∧((x ∉ B)∨(x ∉ C)) ⇔ ((x ∈ A)∧(x ∉ B))∨((x ∈ A)∧(x ∉ C)) ⇔ (x ∈ A∖B)∨(x ∈ A∖C) ⇔ x ∈ (A∖B)∪(A∖C).

Hence, we have that x ∈ A∖(B∩C) if and only if x ∈ (A∖B)∪(A∖C). It follows that A∖(B∩C) = (A∖B)∪(A∖C).

Proving d.) Let x ∈ X. Then x ∈ (A∩B)ᶜ ⇔ ¬(x ∈ A∩B) ⇔ ¬((x ∈ A)∧(x ∈ B)) ⇔ (¬(x ∈ A))∨(¬(x ∈ B)) ⇔ (x ∈ Aᶜ)∨(x ∈ Bᶜ) ⇔ x ∈ Aᶜ∪Bᶜ.

Hence, we have that x ∈ (A∩B)ᶜ if and only if x ∈ Aᶜ∪Bᶜ. It follows that (A∩B)ᶜ = Aᶜ∪Bᶜ.
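Since these are equalities of finite sets, they can be spot-checked directly. A quick sketch using Python's built-in sets; X, A and B reuse the earlier example, while C is an arbitrary extra set:

```python
X = set(range(1, 11))                       # ambient set
A, B, C = {2, 4, 6, 8}, {4, 5, 6, 7}, {1, 2, 3}

def comp(S):
    """Complement of S within the ambient set X."""
    return X - S

# Proposition 2.23: A \ B = A ∩ B^c
assert A - B == A & comp(B)
# Theorem 2.24 parts 3. and 4. (De Morgan's Laws)
assert comp(A | B) == comp(A) & comp(B)
assert comp(A & B) == comp(A) | comp(B)
# Theorem 2.24 parts 1. and 2.
assert A - (B & C) == (A - B) | (A - C)
assert A - (B | C) == (A - B) & (A - C)
```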

History:

The above theorem is often known as De Morgan’s Laws, or De Morgan’s Theorem. Augustus De Morgan (English Mathematician, 1806 - 1871) was the first to write this theorem using formal logic (the one we are currently seeing). However this result was known and used by mathematicians and logicians since Aristotle (Greek philosopher, 384BC - 322BC), and can be found in the medieval texts by William of Ockham (English philosopher, 1287 - 1347) or Jean Buridan (French philosopher, 1301 - 1362).

Notation:

It is often convenient to denote the elements of a set using indices. For example, suppose A is a set with 5 elements. Then we can denote these elements as a₁,a₂,a₃,a₄,a₅. So we can write A = {a_i : i ∈ I}, where I = {1,2,3,4,5}. The set I is called the indexing set.

Let {A_i}_{i∈I} be a collection of subsets of a set X where I is an indexing set. Then we write ⋃_{i∈I} A_i to denote the union of all the sets A_i, for i ∈ I. That is, ⋃_{i∈I} A_i = {x ∈ X : there exists i ∈ I such that x ∈ A_i}. Furthermore, we write ⋂_{i∈I} A_i to denote the intersection of all the sets A_i, for i ∈ I. That is, ⋂_{i∈I} A_i = {x ∈ X : for all i ∈ I, x ∈ A_i}.
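For a finite indexing set these definitions translate directly into code, with “there exists” becoming `any` and “for all” becoming `all`. A sketch, where the family A_i and the index set I are illustrative choices:

```python
I = {1, 2, 3}
A = {1: {1, 2, 3}, 2: {2, 3, 4}, 3: {3, 4, 5}}   # the sets A_i, for i in I
X = set(range(10))                               # ambient set

# union: x is in some A_i; intersection: x is in every A_i
union = {x for x in X if any(x in A[i] for i in I)}
inter = {x for x in X if all(x in A[i] for i in I)}

assert union == {1, 2, 3, 4, 5}
assert inter == {3}
```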

Proposition 2.25:

Let X be a set, let A be a subset of X, and let {B_i}_{i∈I} be an indexed collection of subsets of X, where I is an indexing set. Then we have

  1. A ∩ ⋃_{i∈I} B_i = ⋃_{i∈I} (A∩B_i).

  2. A ∪ ⋂_{i∈I} B_i = ⋂_{i∈I} (A∪B_i).

Proof.

We will prove part a. and leave part b. as an exercise.

We know that x ∈ ⋂_{i∈I} B_i if and only if we have that x ∈ B_i, for all i ∈ I. Similarly, x ∈ ⋃_{i∈I} B_i if and only if there exists an i ∈ I such that x ∈ B_i. Now, suppose that x ∈ A ∩ ⋃_{i∈I} B_i. Then x ∈ A, and for some i ∈ I, we have that x ∈ B_i. Hence, for some i ∈ I, we have that x ∈ A∩B_i. Then x ∈ ⋃_{i∈I} (A∩B_i), which shows that A ∩ ⋃_{i∈I} B_i ⊆ ⋃_{i∈I} (A∩B_i).

Now, suppose that x ∈ ⋃_{i∈I} (A∩B_i). Hence, for some i ∈ I, we have that x ∈ A∩B_i. Then for some i ∈ I, we have that x ∈ A and x ∈ B_i. Since there exists some i ∈ I such that x ∈ B_i, we have x ∈ ⋃_{i∈I} B_i. Then x ∈ A ∩ ⋃_{i∈I} B_i, which shows that ⋃_{i∈I} (A∩B_i) ⊆ A ∩ ⋃_{i∈I} B_i. Summarising the above, we have that A ∩ ⋃_{i∈I} B_i = ⋃_{i∈I} (A∩B_i).

Interest:

We finish this section with a brief side-note. How does one differentiate whether a statement is a definition, a theorem, a proposition, a lemma etc? The team at Chalkdust Magazine made the following flowchart which, while not meant to be serious, does reflect quite well how one can classify different statements. That is: a definition is a statement taken to be true; roughly speaking a proposition is an interesting but non-important result; while a theorem is an interesting, important main result; and a lemma is there to build up to a theorem. The original figure can be found at https://chalkdustmagazine.com/regulars/flowchart/which-type-of-statement-are-you/


Figure 2.1: What statement are you? Copyright Chalkdust Magazine, Issue 17.

3 The rationals are not enough

Now that we have a solid foundation of logic and abstract set notation, let us explore sets within Q. That is, let us look at subsets of the rationals. This will lead us to notice that irrational numbers exist, and hence to explore a new set called the reals, R.

3.1 The absolute value

Before we look at subsets of Q, we introduce the notion of absolute value.

Definition 3.1:

For x ∈ Q, the absolute value or modulus of x, denoted |x|, is defined by |x| := x if x ≥ 0, and |x| := −x if x < 0.

It is often helpful to think of the absolute value |x| as the distance between the point x and the origin 0. Likewise, |x−y| is the distance between the points x and y.

Proposition 3.2:

For any x,y ∈ Q:

  1. |x| ≥ 0, with |x| = 0 if and only if x = 0;

  2. |xy| = |x||y|;

  3. |x²| = |x|² = x².

Proof.

Exercise

Example:

Statement: Show that a²+b² ≥ 2ab for any a,b ∈ Q.

Solution: Let a,b ∈ Q. We have that (a−b)² ≥ 0, so [expanding the bracket] a²−2ab+b² ≥ 0. Rearranging, this gives a²+b² ≥ 2ab.

Proposition 3.3: (Triangle Inequality)

For all x,yQ we have |x+y||x|+|y|.

Proof.

We prove this by case by case analysis. First note that for all x ∈ Q we have x ≤ |x| and −x ≤ |x|. Let x,y ∈ Q.

Case 1: Suppose x ≥ −y. Then x+y ≥ 0 and so |x+y| = x+y ≤ |x|+|y|.

Case 2: Suppose x < −y. Then x+y < 0 and so |x+y| = −(x+y) = (−x)+(−y) ≤ |x|+|y|.
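Statements like Propositions 3.2 and 3.3 can be spot-checked over a grid of exact rationals before (or after) proving them. A small sketch using Python's Fraction type, with an arbitrary choice of sample values:

```python
from fractions import Fraction
from itertools import product

# a grid of exact rational sample values, positive and negative
samples = [Fraction(p, q) for p in range(-4, 5) for q in (1, 2, 3)]

for x, y in product(samples, repeat=2):
    assert abs(x + y) <= abs(x) + abs(y)     # Proposition 3.3 (triangle inequality)
    assert abs(x * y) == abs(x) * abs(y)     # Proposition 3.2, part 2
```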

3.2 Bounds for sets

With this notion of absolute value, we can start asking whether a subset of Q contains arbitrarily large or small elements.

Definition 3.4:

Let A ⊆ Q be non-empty. We say that A is:

  • bounded above (in Q) by α ∈ Q if for all x ∈ A, x ≤ α;

  • bounded below (in Q) by α ∈ Q if for all x ∈ A, x ≥ α;

  • bounded if it is bounded above and below.

If A is bounded above by α and below by β, then by setting γ = max{|α|,|β|} we have that A is bounded by γ, i.e., for all x ∈ A, we have |x| ≤ γ.

Note that α is far from unique. For example, take the set A = {1/n : n ∈ Z+}. Then we can see that A is bounded above by 1, but it is also bounded above by 2 and by 100 etc.

Definition 3.5:

Let A ⊆ Q be non-empty. The least (or smallest) upper bound of A (in Q) is α ∈ Q such that:

  • α is an upper bound, i.e. for all x ∈ A, x ≤ α;

  • any rational number β less than α is not an upper bound, i.e. for all β ∈ Q with β < α, there exists x ∈ A with β < x.

The greatest (or largest) lower bound of A (in Q) is α ∈ Q such that:

  • α is a lower bound, i.e. for all x ∈ A, x ≥ α;

  • any rational number β greater than α is not a lower bound, i.e. for all β ∈ Q with β > α, there exists x ∈ A with β > x.

Remark:

We use our work on negating statements to negate the above definition and say

α ∈ Q is not the least upper bound of A if:

  • there exists x ∈ A such that x > α (α is not an upper bound); or

  • there exists β ∈ Q with β < α such that for all x ∈ A we have x ≤ β (there is an upper bound lower than α).

Similarly, α ∈ Q is not the greatest lower bound of A if:

  • there exists x ∈ A such that x < α (α is not a lower bound); or

  • there exists β ∈ Q with β > α such that for all x ∈ A we have x ≥ β (there is a lower bound greater than α).

Example:

Let A = {1/n : n ∈ Z+}. We show that 1 is the least upper bound of A. As remarked before, we have that 1 is an upper bound since if we take x ∈ A then x = 1/n with n ∈ Z+. In particular n ≥ 1, so x = 1/n ≤ 1/1 = 1.

We now show that 1 is the least upper bound by showing any number less than 1 is not an upper bound. Let β < 1. By taking n = 1 ∈ Z+, we see that 1 = 1/1 ∈ A, hence β < 1 means β is not an upper bound. Hence 1 is the least upper bound.

We show that 0 is the greatest lower bound. First we show 0 is a lower bound. Let x ∈ A; then x = 1/n with n ∈ Z+. In particular n > 0, so x = 1/n > 0.

We now show that 0 is the greatest lower bound by showing any number greater than 0 is not a lower bound. Let β = a/b > 0 [so a,b ∈ Z+]. Set n = b+1 ∈ Z+, so 1/n ∈ A. Then 1/n = 1/(b+1) < 1/b ≤ a/b = β [since a ≥ 1].

So we have found x ∈ A such that x < β, so β is not a lower bound.

Proof techniques:

The argument for β not being a lower bound above seems to come from nowhere. Sometimes it is hard to see where to start a proof, so mathematicians first do scratch work. This is the rough working we do as we explore different avenues and arguments, but that is not included in the final proof (so as to keep the proof clean and easy to understand). The scratch work for the above proof might have been along the lines:

We want to find n ∈ Z+ such that β = a/b > 1/n. Rearranging, this gives an > b (as both n and b are positive), so n > b/a. Since a ≥ 1, we have b/a ≤ b. Picking n = b+1 would satisfy n = b+1 > b ≥ b/a.

Note that in the example above the greatest lower bound of A is not in A itself [if 0 ∈ A, then there exists n ∈ Z+ such that 0 = 1/n, i.e. 0 = 1, which is a contradiction]. If a set is a bounded subset of Z, then we get a different story.

Theorem 3.6: (Well Ordering Principle)

Any non-empty subset of Z≥0 contains a minimal element.

Proof.

Exercise to be done after we have introduced the completeness axiom.

Corollary 3.7:

Let A ⊆ Z be non-empty. If A is bounded below, it contains a minimal element (i.e. its greatest lower bound is in A). If it is bounded above, it contains a maximal element (i.e., its least upper bound is in A).

As we saw above with the set A = {1/n : n ∈ Z+}, this is not the case for general subsets of Q. However, it is even worse, because a general subset of Q might be bounded and yet not have a greatest lower bound or least upper bound in Q, as we will see in the next section.

3.3 The irrationals and the reals

We first show that there exist irrational numbers (i.e., numbers that are not rational) and show that this means that there exist subsets of Q whose least upper bound is not rational (and hence we need a bigger number system).

Theorem 3.8:

There does not exist x ∈ Q such that x² = 2.

Proof.

For the sake of a contradiction, suppose there exists x = a/b ∈ Q such that x² = 2. Without loss of generality, since x² = (−x)² and 0² = 0, we can assume x > 0, i.e. a,b ∈ Z+. Furthermore, if 0 < x ≤ 1, then x² ≤ 1 < 2, and if x ≥ 2 then x² ≥ 4 > 2. So we assume that 1 < x < 2.

Let A = {r ∈ Z+ : rx ∈ Z} ⊆ Z. Note that A is non-empty since bx = a ∈ Z+, so b ∈ A. We have that A is bounded below by 0, so by the Well Ordering Principle, A contains a minimal element, call it m. We will prove that m is not minimal by finding 0 < m₁ < m with m₁ ∈ A. This will be a contradiction to the Well Ordering Principle.

Define m₁ = m(x−1) = mx−m ∈ Z. Since 1 < x, we have x−1 > 0 so m₁ = m(x−1) > 0. Similarly, since x < 2 we have x−1 < 1 so m₁ = m(x−1) < m. Hence 0 < m₁ < m. Now m₁x = m(x−1)x = mx²−mx = 2m−mx ∈ Z. Hence m₁ ∈ A and 0 < m₁ < m, which is a contradiction.

Since irrational numbers exist, if we restrict ourselves to only using rational numbers then there are many unanswerable questions: from simple geometrical problems (what is the length of the diagonal of a square with side length 1?), to the fact that there are some bounded sets of rationals which do not have a rational least upper bound.

Example:

Consider the set A = {x ∈ Q : x² < 2}. We have that A is bounded above (for example, by 2 or 10), but we show it does not have a least upper bound in the rationals.

Let α = a/b ∈ Q be the least upper bound of A and note we can assume α > 0. We either have α² < 2, α² = 2 or α² > 2. We will show that all three of these cases lead to a contradiction.

Case 1: α² < 2. We show that in this case α is not an upper bound by finding x ∈ A such that α < x.

[Scratch work: We look for c ∈ Z+ such that (α + 1/(bc))² = ((ac+1)/(bc))² < 2. Rearranging, we get a²c²+2ac+1 < 2b²c², i.e. 2ac+1 < c²(2b²−a²). Since α² < 2, we know a² < 2b², so 2b²−a² > 0. Since 2b²−a² ∈ Z, we have 2b²−a² ≥ 1, so c²(2b²−a²) ≥ c². So it is enough to find c such that 2ac+1 < c², i.e. 0 < c²−2ac−1, i.e. by completing the square 0 < (c−a)²−(a²+1). To simplify our life, let us take c to be a multiple of a, say ka; then we are looking for 0 < (k−1)²a²−a²−1 = ((k−1)²−1)a²−1, i.e. ((k−1)²−1)a² > 1, so k = 3 should work, i.e. c = 3a.]

Let x = α + 1/(3ab) > α. We prove that x ∈ A (and hence α is not an upper bound) by showing x² < 2. We have x² = ((3a²+1)/(3ab))² = (9a⁴+6a²+1)/(9a²b²) < (9a⁴+9a²)/(9a²b²) (since 6a²+1 < 9a²) = (a²+1)/b² ≤ (a²+(2b²−a²))/b² (since a² < 2b², so 2b²−a² ≥ 1) = 2.

Case 2: α² = 2. This is a contradiction to Theorem 3.8.

Case 3: α² > 2. We leave this as an exercise. [Hint: Find an appropriate c so that x = α − 1/(bc) ∈ Q is such that x² > 2. Argue that x is an upper bound for A and x < α to conclude α is not the least upper bound.]

Proof techniques:

Notice that in the above we made sure to have x > α by setting x = α + ϵ where ϵ > 0. By doing so, we reduced the number of properties we needed x to have.

We use this as a motivation to introduce the real numbers.

Definition 3.9:

The set of real numbers, denoted R, equipped with addition +, multiplication · and the order relation <, satisfies axioms (A1) to (A11), (O1) to (O4) and the Completeness Axiom.

Completeness Axiom: Every non-empty subset A of R which is bounded above has a least upper bound.

Interest:

It can be shown that there is exactly one quadruple (R; +; ·; <) which satisfies these properties - up to isomorphism. We do not discuss the notion of isomorphism in this context here (although we will later look at it in the context of groups) save to remark that any two real number systems are in bijection (we will define this later) in a way which preserves certain properties. This allows us to speak of the real numbers.

There are several ways of constructing the real number system from the rational numbers. One option is to use Dedekind cuts. Another is to define the real numbers as equivalence classes of Cauchy sequences of rational numbers (you will explore Cauchy sequences in Analysis).

You may continue to imagine the real numbers as a number line as you did pre-university.

The definition of the absolute value is the same for real numbers as for the rational numbers, as is the notion of bounded sets (and therefore all the results in the previous section still hold for R).

Interest:

We can use the absolute value to define a distance or metric on R. To do so we define d(x,y) = |x−y| for any two points x,y ∈ R. This distance has the following properties, for any x,y,z ∈ R:

  • d(x,y) ≥ 0, and d(x,y) = 0 if and only if x = y;

  • d(x,y) = d(y,x);

  • d(x,y) ≤ d(x,z) + d(z,y).
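The three listed properties can be checked on sample points. A minimal sketch, with floats standing in for real numbers and an arbitrary handful of sample values:

```python
def d(x, y):
    """The distance d(x, y) = |x - y| on R."""
    return abs(x - y)

pts = [-2.5, -1.0, 0.0, 0.5, 3.0]
for x in pts:
    for y in pts:
        assert d(x, y) >= 0 and (d(x, y) == 0) == (x == y)   # positivity
        assert d(x, y) == d(y, x)                            # symmetry
        for z in pts:
            assert d(x, y) <= d(x, z) + d(z, y)              # triangle inequality
```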

You can explore whether other distances/metrics can be defined on R or other sets in the 2nd year unit Metric Spaces. You can explore how we construct other sets from Q that satisfy the completeness axiom by looking up “p-adic numbers”.

We deduce two important results about R.

Proposition 3.10:

Every non-empty subset A of R bounded below has a greatest lower bound.

Proof.

Let A ⊆ R be non-empty and bounded below. Let c ∈ R be a lower bound. Define the set B = {−x : x ∈ A}. Let −x ∈ B, so x ∈ A. Since c is a lower bound, x ≥ c, i.e. −x ≤ −c. So −c is an upper bound for B, and B is non-empty [since A is non-empty]. By the Completeness Axiom B has a least upper bound, u ∈ R. Let ℓ = −u. We prove that ℓ is the greatest lower bound for A.

We first show ℓ is a lower bound. Let x ∈ A; then −x ∈ B so −x ≤ u. Hence x ≥ −u = ℓ.

We now show ℓ is the greatest lower bound by showing any real number bigger than ℓ is not a lower bound. Let y ∈ R be such that y > ℓ, so −y < −ℓ = u. Now −y is not an upper bound for B since u is the least upper bound of B. So by definition, there exists b ∈ B such that b > −y, i.e. −b < y. Since b ∈ B, we have −b ∈ A. Hence y is not a lower bound for A.

So ℓ is the greatest lower bound for A.

Theorem 3.11: (Archimedean Property)

For any x ∈ R there exists n ∈ Z+ such that x ≤ n.

Proof.

We prove this by contradiction.

[¬(for all x ∈ R, there exists n ∈ Z+ such that x ≤ n) ⇔ there exists x ∈ R such that for all n ∈ Z+, x > n.]

Suppose there exists x ∈ R such that n < x for all n ∈ Z+. In particular, this means Z+ ⊆ R is bounded above. So by the completeness axiom, Z+ has a least upper bound α [in R]. Since α is the least upper bound, α − 1/2 is not an upper bound, i.e. there exists b ∈ Z+ such that α − 1/2 < b ≤ α. But then b+1 ∈ Z+ and α < α + 1/2 < b+1. This contradicts the fact that α is an upper bound for Z+.

History:

The above property is named after Archimedes of Syracuse (Sicilian/Italian mathematician, 287BC - 212BC), although when Archimedes wrote down this theorem, he credited Eudoxus of Cnidus (Turkish mathematician and astronomer, 408BC - 355BC). It was Otto Stolz (Austrian mathematician, 1842 - 1905) who coined the name - partly because he studied fields where this property is not true and therefore needed a term to distinguish between what are now known as Archimedean fields and non-Archimedean fields.

We finish this section by introducing notation for some common subsets of R.

Notation:

Let a,b ∈ R be such that a ≤ b. We denote:

  • the open interval of a,b by (a,b) = {x ∈ R : a < x < b};

  • the closed interval of a,b by [a,b] = {x ∈ R : a ≤ x ≤ b};

  • [a,b) = {x ∈ R : a ≤ x < b}; (a,b] = {x ∈ R : a < x ≤ b};

  • (a,∞) = {x ∈ R : a < x}; [a,∞) = {x ∈ R : a ≤ x};

  • (−∞,b) = {x ∈ R : x < b}; (−∞,b] = {x ∈ R : x ≤ b}.

By convention we have (a,a) = [a,a) = (a,a] = ∅, while [a,a] = {a}.

3.4 The supremum and infimum of a set.

Since R is complete, i.e., every non-empty bounded set has a least upper bound and a greatest lower bound, we introduce the notion of the supremum and infimum of a set.

Definition 3.12:

Let A ⊆ R. We define the supremum of A, denoted sup(A), as follows:

  • If A = ∅, then sup(A) = −∞.

  • If A is non-empty and is bounded above [i.e., there exists α ∈ R such that for all x ∈ A, x ≤ α] then sup(A) is the least upper bound of A (which we know exists by the Completeness Axiom).

  • If A is non-empty and is not bounded above [i.e., for all α ∈ R, there exists x ∈ A such that x > α] then sup(A) = +∞.

Definition 3.13:

Let A ⊆ R. We define the infimum of A, denoted inf(A), as follows:

  • If A = ∅, then inf(A) = +∞.

  • If A is non-empty and is bounded below [i.e., there exists α ∈ R such that for all x ∈ A, x ≥ α] then inf(A) is the greatest lower bound of A (which we know exists by the Completeness Axiom).

  • If A is non-empty and is not bounded below [i.e., for all α ∈ R, there exists x ∈ A such that x < α] then inf(A) = −∞.

Etymology:

Supremum comes from the Latin super meaning “over, above” while infimum comes from the Latin inferus meaning “below, underneath, lower” (these words gave rise to words like superior and inferior).

Example:

Let A = {1/n : n ∈ Z+}. We have already seen that sup(A) = 1 (in A) and inf(A) = 0 (not in A).

Proposition 3.14:

Let a,b ∈ R with a < b. Then

  1. sup((a,b)) = sup((a,b]) = sup([a,b)) = sup([a,b]) = b;

  2. inf((a,b)) = inf((a,b]) = inf([a,b)) = inf([a,b]) = a;

  3. sup((a,∞)) = sup([a,∞)) = +∞;

  4. inf((−∞,a)) = inf((−∞,a]) = −∞;

  5. sup((−∞,a)) = sup((−∞,a]) = inf([a,∞)) = inf((a,∞)) = a.

Proof.

We will only prove sup((a,b)) = b and sup((a,∞)) = +∞, and leave the rest as exercises, as the arguments are very similar.

Let a,b ∈ R with a < b; we will show sup((a,b)) = b. [Note that (a,b) is non-empty (as a < b) and it is bounded above.]

First we show that b is an upper bound. Indeed, let x ∈ (a,b); then by definition a < x < b, so x ≤ b as required.

Next we show that b is the least upper bound [by showing any real number less than b is not an upper bound]. Let y ∈ R with y < b. Suppose a < y (i.e. y ∈ (a,b)) and let x = (b+y)/2 = y + (b−y)/2 = b − (b−y)/2. Note that x < b and x > y > a, so x ∈ (a,b). Since x > y, we have that y is not an upper bound. Suppose y ≤ a (i.e. y ∉ (a,b)) and let x = (a+b)/2 = a + (b−a)/2 = b − (b−a)/2. Then x < b and x > a ≥ y, so x ∈ (a,b). Since x > y, we have that y is not an upper bound. In either case, y is not an upper bound, so b is the least upper bound.

Let a ∈ R; we show that sup((a,∞)) = +∞. [Note that (a,∞) is non-empty, so we want to show it is not bounded above.]

Suppose for contradiction that (a,∞) is bounded above [i.e., sup((a,∞)) ≠ +∞]. Let u ∈ R be an upper bound for (a,∞). Set x = |a|+|u|+1 ∈ R. Note that x > |a| ≥ a, so x ∈ (a,∞). Hence u being an upper bound means x ≤ u; however x > |u| ≥ u. This is a contradiction.

Example:

Problem: Let A = {(n²+1)/|n+1/2| : n ∈ Z}. Show that sup(A) = +∞ and inf(A) = 4/3.

Solution: Let us first look at the supremum. [The question is asking us to show A is not bounded above.] Let x ∈ R. By the Archimedean Principle, choose n ∈ Z+ ⊆ Z such that n > 2x [Scratch work missing to work out why we choose this particular n]. Define a = (n²+1)/(n+1/2) ∈ A. Then a = (n²+1)/(n+1/2) ≥ n²/(n+1/2) ≥ n²/(2n) [as n+1/2 ≤ 2n] = n/2 > x. So we have found a ∈ A such that a > x, so A is not bounded above. Hence sup(A) = +∞.

We now look at the infimum. We first show that 4/3 is a lower bound. Let a = (n²+1)/|n+1/2| for some n ∈ Z, so a ∈ A. Consider n²+1−(4/3)|n+1/2| ≥ n²+1−(4/3)(|n|+1/2) [by the triangle inequality] = n²−(4/3)|n|+1/3 = (|n|−2/3)²−1/9 [by completing the square] ≥ 0, since (|n|−2/3)² ≥ 1/9 for all n ∈ Z. Therefore n²+1 ≥ (4/3)|n+1/2|, i.e. a = (n²+1)/|n+1/2| ≥ 4/3 as required.

We next show that 4/3 is the greatest lower bound for A [by showing any number greater than it is not a lower bound]. First note that by setting n = 1, we have (n²+1)/|n+1/2| = 2/(3/2) = 4/3. So 4/3 ∈ A. Thus, no value y > 4/3 can be a lower bound for A. This shows that 4/3 is the greatest lower bound. Hence inf(A) = 4/3.
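The computations above can be cross-checked numerically. A sketch with exact rationals, scanning a finite window of integers (the window is an arbitrary choice, so this illustrates rather than proves the claim):

```python
from fractions import Fraction

def a(n):
    """The element of A indexed by n: (n^2 + 1) / |n + 1/2|."""
    return Fraction(n * n + 1) / abs(Fraction(n) + Fraction(1, 2))

values = [a(n) for n in range(-50, 51)]
assert min(values) == Fraction(4, 3)   # the smallest value in the window, attained at n = 1
assert a(1) == Fraction(4, 3)
assert a(50) > 25                      # values grow without bound, consistent with sup(A) = +inf
```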

The next example is more theoretical.

Example:

Problem: Let A and B be bounded non-empty subsets of R. Define the sum set as A+B = {a+b : a ∈ A, b ∈ B}. Show that sup(A+B) = sup(A)+sup(B).

Solution: Let α = sup(A) and β = sup(B). Note that α,β ∈ R as both A and B are bounded. We show α+β = sup(A+B).

We first show α+β is an upper bound for A+B. Let c ∈ A+B; by definition, there exists a ∈ A and b ∈ B such that c = a+b. Now a ≤ α and b ≤ β, so c = a+b ≤ α+β.

We now show α+β is the least upper bound for A+B. Let ϵ > 0; we show that α+β−ϵ is not an upper bound. Since α is the least upper bound of A, α−ϵ/2 is not an upper bound, so there exists a ∈ A such that a > α−ϵ/2. Similarly, there exists b ∈ B such that b > β−ϵ/2. Define c = a+b ∈ A+B. Then c = a+b > (α−ϵ/2)+(β−ϵ/2) = α+β−ϵ. This shows for any ϵ > 0, α+β−ϵ is not an upper bound for A+B. So α+β is the least upper bound for A+B, hence the supremum of A+B.
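For finite sets the supremum is just the maximum, so the identity can be sanity-checked directly. A sketch with arbitrarily chosen finite A and B:

```python
A = {-1.5, 0.0, 2.0}
B = {1.0, 4.5}

sum_set = {x + y for x in A for y in B}   # the sum set A + B
assert max(sum_set) == max(A) + max(B)    # sup(A+B) = sup(A) + sup(B)
assert min(sum_set) == min(A) + min(B)    # the analogous identity for infima
```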

Proof techniques:

Sometimes, instead of showing something is true for all y > x, it is easier to show it is true for all x+ϵ where ϵ > 0. We can do this because if y > x, then setting ϵ = y−x > 0, we see that y = x+ϵ.

Definition 3.15:

Let A ⊆ R. We say that A has a maximum if sup(A) ∈ A. In this case we write max(A) to stand for the element a ∈ A with a = sup(A).

Similarly we say that A has a minimum if inf(A) ∈ A. In this case we write min(A) to stand for the element a ∈ A with a = inf(A).

Note that a set may not have a minimum or a maximum.

Example:

Let A = {(n²+1)/|n+1/2| : n ∈ Z}. We have seen inf(A) = 4/3 ∈ A, so min(A) = 4/3. However A does not have a maximum as sup(A) = +∞ ∉ A (as +∞ ∉ R and A ⊆ R).

Example:

Let A = {1/n : n ∈ Z+}. Then we have seen that sup(A) = 1 and 1 ∈ A, so max(A) = 1. However it does not have a minimum as we have seen inf(A) = 0 and 0 ∉ A.

4 Proof by induction

We have seen several types of proofs so far: direct proofs, proofs by contradiction, and proofs by contrapositive.

In this chapter, we introduce a new type of proof, called proof by induction.

Proof techniques:

Let n₀ ∈ Z+ and let P(n) be a statement for n ≥ n₀. A proof by induction is where one proves that:

  • P(n₀) is true, and

  • P(n) ⇒ P(n+1) for all n ≥ n₀.

Combining these two statements, by the principle of induction, we deduce that P(n) is true for all n ≥ n₀.

Etymology:

The word induction comes from the Latin in and ductus meaning “to lead”. An inductive proof is one where a starting case leads into the next case and so on.

(In contrast, deduction has the prefix de meaning “down from”. When we do a proof by deduction, we start from certain rules and truths that “lead down” to specific things that must follow as a consequence.)

Example:

Statement: Show that, for every n ∈ Z+, we have 1+2+3+⋯+n = ∑_{i=1}^{n} i = n(n+1)/2.

Proof: For n ∈ Z+, let P(n) be the following statement: 1+2+3+⋯+n = n(n+1)/2. We will show that P(n) is a true statement, for all n ∈ Z+, by giving a proof by induction.

First, let us consider P(1). We have 1 = 1(1+1)/2; hence, P(1) holds.

Now, suppose that P(k) is true for some positive integer k, that is 1+2+3+⋯+k = k(k+1)/2.
Then 1+2+3+⋯+k+(k+1) = k(k+1)/2 + (k+1) = k(k+1)/2 + 2(k+1)/2 = (k²+3k+2)/2 = (k+1)(k+2)/2. This shows that if P(k) is true then P(k+1) is true. By the Principle of Mathematical Induction, it follows that P(n) is true for all natural numbers n.
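Induction proves the formula for every n at once, but it is good practice to test such identities on small cases as well. A one-loop sketch:

```python
# check 1 + 2 + ... + n == n(n+1)/2 for the first few hundred values of n
for n in range(1, 200):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
```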

It is not enough to show P(n) ⇒ P(n+1); we need to also show that P(n₀) holds. As we saw in the above example, we often have that n₀ = 1. However, this is not always the case. The following example highlights these two points.

Example:

Let P(n) be the statement n² ≤ 2^(n−1). To highlight the importance of the base case, let us first show that for n ≥ 3 we have P(n) ⇒ P(n+1).

Suppose that P(k) is a true statement for some natural number k ≥ 3, that is k² ≤ 2^(k−1).

Then (k+1)² = k²+2k+1 ≤ k²+k² [since 2k+1 ≤ k² for k ≥ 3] = 2k² ≤ 2·2^(k−1) = 2^k, that is P(k+1) holds.

If we set n₀ = 1, we do have that P(1) holds (since 1 ≤ 1), but we have not proven P(1) ⇒ P(2) (since 1 < 3). In fact, we can not prove P(1) ⇒ P(2) since P(2) is false (2² = 4 > 2¹ = 2).

Since we have shown that for n ≥ 3 we have P(n) ⇒ P(n+1), we might want to set n₀ = 3. However, P(3) is also false since 3² = 9 > 2² = 4. In fact, we can check that P(3), P(4), P(5) and P(6) are all false. However, P(7) is true since we have that 7² = 49 ≤ 64 = 2⁶.

Therefore, we have n₀ = 7 and by the Principle of Mathematical Induction, we have shown that n² ≤ 2^(n−1) for all n ∈ Z+ such that n ≥ 7.
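The pattern of base cases described above (P(1) true, P(2) to P(6) false, P(n) true from n = 7 onwards) is easy to tabulate. A sketch:

```python
def P(n):
    """The statement n^2 <= 2^(n-1)."""
    return n * n <= 2 ** (n - 1)

assert P(1)                                  # 1 <= 2**0 = 1
assert not any(P(n) for n in range(2, 7))    # P(2), ..., P(6) are all false
assert all(P(n) for n in range(7, 100))      # P(n) holds from n = 7 onwards
```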

Sometimes, the principle of induction is not strong enough, either because we need more than one base case, or because to prove P(n) is true we need to know P(m) is true for some unknown m<n. This is where we can use the strong principle of mathematical induction.

Proof techniques:

Let n₀ ∈ Z+ and let P(n) be a statement for n ≥ n₀. A proof by strong induction is where one proves that:

  • P(n₀), P(n₀+1), …, P(n₀+k) are true (for some k ≥ 0), and

  • for all n ≥ n₀+k, [P(i) is true for all i ≤ n] ⇒ P(n+1) is true.

Combining these two statements, by the strong principle of induction, we deduce that P(n) is true for all n ≥ n₀.

Example:

Statement: Suppose that x₁ = 3 and x₂ = 5 and for n ≥ 3, define x_n = 3x_{n−1} − 2x_{n−2}. Show that x_n = 2^n + 1, for all n ∈ Z+.

Proof: Let P(n) be the statement “x_n = 2^n + 1”. We will show that P(n) is a true statement, for all n ∈ Z+, by giving a proof by strong induction. First, we consider our base cases n₀ = 1 and n₀+1 = 2. We have the given initial conditions x₁ = 3 and x₂ = 5. Using the formula x_n = 2^n + 1, we indeed have x₁ = 2¹+1 = 3 and x₂ = 2²+1 = 5. Therefore, P(1) and P(2) hold.

Let n ≥ 2 and suppose for all i ∈ Z+ such that i ≤ n we have that P(i) holds. Then by our assumption x_{n−1} = 2^(n−1) + 1 and x_n = 2^n + 1. We have: x_{n+1} = 3x_n − 2x_{n−1} = 3(2^n + 1) − 2(2^(n−1) + 1) = 3·2^n + 3 − 2^n − 2 = 2·2^n + 1 = 2^(n+1) + 1. This shows that if P(i) is true, for 1 ≤ i ≤ n, then P(n+1) is true. By the Strong Principle of Mathematical Induction, it follows that P(n) is true for all natural numbers n.
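The recurrence and the closed form can be cross-checked programmatically. A sketch (x_sequence is a hypothetical helper name, not from the notes):

```python
def x_sequence(N):
    """First N terms x_1, ..., x_N of x_n = 3*x_{n-1} - 2*x_{n-2}, x_1 = 3, x_2 = 5."""
    xs = [3, 5]
    while len(xs) < N:
        xs.append(3 * xs[-1] - 2 * xs[-2])
    return xs[:N]

# compare against the closed form x_n = 2**n + 1
assert x_sequence(20) == [2 ** n + 1 for n in range(1, 21)]
```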
Example:

Statement: Show that every natural number can be written as the sum of distinct powers of 2. That is, for all n ∈ Z+, we can write n = 2^(a₁) + 2^(a₂) + ⋯ + 2^(a_r) with a_i ∈ Z, a_i ≥ 0, and a_i ≠ a_j if i ≠ j.

Proof: Let P(n) be the statement “there exist a₁,…,a_r ∈ Z such that a_i ≥ 0, a_i ≠ a_j if i ≠ j, and n = 2^(a₁) + 2^(a₂) + ⋯ + 2^(a_r)”. We check that our base case n₀ = 1 is true. Indeed 1 = 2⁰, so P(1) holds.

Let n ∈ Z+ and suppose for all i ∈ Z+ such that i ≤ n we have that P(i) holds. Consider A = {n+1−2^ℓ : ℓ ∈ Z≥0, n+1−2^ℓ ≥ 0}. We note that A ⊆ Z≥0 by definition. Taking ℓ = 0, we see that n = n+1−2⁰ ∈ A, so A ≠ ∅. So, by the Well Ordering Principle, A has a minimal element, call it m, say m = n+1−2^ℓ. If m = 0 then n+1 = 2^ℓ and P(n+1) holds. If m ≠ 0, then since P(m) holds [as m ≤ n], there exist a₁,…,a_r ∈ Z such that a_i ≥ 0, a_i ≠ a_j if i ≠ j, and m = 2^(a₁) + 2^(a₂) + ⋯ + 2^(a_r). Then n+1 = 2^ℓ + 2^(a₁) + 2^(a₂) + ⋯ + 2^(a_r), so we just need to show ℓ ≠ a_i for all i. For a contradiction, suppose ℓ = a_i for some i. Then n+1 ≥ 2^ℓ + 2^ℓ = 2^(ℓ+1), so 0 ≤ n+1−2^(ℓ+1) < m with n+1−2^(ℓ+1) ∈ A, which contradicts the minimality of m. Therefore the powers are distinct and P(n+1) holds.

Therefore, by the principle of strong induction, we have shown that every natural number can be written as the sum of distinct powers of 2.
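The statement is exactly the existence of binary representations. A sketch that extracts the distinct powers of 2 from the usual binary expansion, so it illustrates the result rather than the inductive proof (powers_of_two is a hypothetical helper name):

```python
def powers_of_two(n):
    """Distinct exponents a_i with n = sum of 2**a_i, read off the binary expansion."""
    return [i for i, bit in enumerate(reversed(bin(n)[2:])) if bit == "1"]

for n in range(1, 500):
    exps = powers_of_two(n)
    assert len(set(exps)) == len(exps)        # the exponents are distinct
    assert sum(2 ** a for a in exps) == n     # and they sum back to n
```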

Remark:

A proof by induction is a special case of a proof by strong induction (taking k = 0)! We present these two ideas separately as it is easier to understand induction before understanding strong induction. However, most mathematicians will say “induction” to mean both induction and strong induction and do not distinguish between the two (so feel free to also not distinguish between the two).

5 Studying the integers

The integers, Z, form an interesting set as, unlike in Q or R, we can not always divide. This restriction brings a lot of interesting properties that we will now study in more detail. (The Well Ordering Principle also sets Z apart from Q and R, and we’ll also make use of that principle.)

5.1 Greatest common divisor

While we can not do division in general, there are cases when we can divide b ∈ Z by a ∈ Z.

Definition 5.1:
For a,b ∈ Z, we say that a divides b, denoted by a∣b, if there exists c ∈ Z such that b = ac. That is, b ∈ aZ = {ax : x ∈ Z}. In this case we say a is a divisor of b.
Remark:
We negate the above definition to say that a does not divide b (or a is not a divisor of b), denoted by a∤b, if for all c ∈ Z we have b ≠ ac. That is, b ∉ aZ.
Interest:

Notice that if we tried to extend this definition into Q, we would find that for all a ∈ Q∖{0} and b ∈ Q we have a∣b.

Note that if b ≠ 0 then a∣b implies |a| ≤ |b|.

Theorem 5.2: (Division Theorem)

Let a,b ∈ Z with a ≠ 0. Then there exist unique q,r ∈ Z such that b = aq+r and 0 ≤ r < |a|.

We call q the quotient and r the remainder.

Proof.

Let A = {b−ak : k ∈ Z, b−ak ∈ Z≥0}. By definition A ⊆ Z≥0. If b ≥ 0, taking k = 0 gives b ∈ A. If b < 0, taking k = ab gives b−a²b = b(1−a²) ∈ A, since b < 0 and 1−a² ≤ 0. So A is non-empty, and by the Well Ordering Principle it contains a least element. Let this element be r and the corresponding k be q. Since r ∈ A, we know r ≥ 0. We show r < |a| by contradiction. Suppose r ≥ |a| and consider r > r−|a| ≥ 0. Note that r−|a| = (b−ak)−|a| = b−(k±1)a ∈ A, which contradicts the minimality of r. Hence r < |a| as required.

It remains to show that r and q are unique. For the sake of a contradiction, suppose there exist q₁,q₂,r₁,r₂ ∈ Z with 0 ≤ r₁ < r₂ < |a| (without loss of generality, we assume r₁ < r₂ as they are not equal) and b = q_i a + r_i for i = 1,2. Then q₁a+r₁ = q₂a+r₂ means r₂−r₁ = q₁a−q₂a = a(q₁−q₂). Since q₁,q₂ ∈ Z, we have q₁−q₂ ∈ Z, so a∣(r₂−r₁), i.e. (r₂−r₁) ∈ aZ. But 0 < r₂−r₁ < |a| and there is no integer strictly between 0 and |a| which is divisible by a. This is a contradiction to our assumption. Hence r₁ = r₂, and we deduce that q₁ = q₂.

Note that the Division Theorem shows that a∣b if and only if the remainder r is equal to 0.
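Python's built-in divmod computes exactly this quotient and remainder when the divisor is positive; for a negative divisor, Python's convention gives a remainder with the sign of the divisor, so a small adjustment is needed to land in [0, |a|). A sketch:

```python
def division(b, a):
    """Return (q, r) with b = a*q + r and 0 <= r < |a|, for a != 0."""
    q, r = divmod(b, a)
    if r < 0:                  # only happens when a < 0 under Python's convention
        q, r = q + 1, r - a
    return q, r

# exhaustively check the defining properties on a window of values
for b in range(-20, 21):
    for a in list(range(-7, 0)) + list(range(1, 8)):
        q, r = division(b, a)
        assert b == a * q + r and 0 <= r < abs(a)
```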

Because we have a notion of division, the following definition is natural.

Definition 5.3:
Let a,b,c ∈ Z. We say that c is a common divisor of a and b if c∣a and c∣b.

Note that 1 is always a common divisor of a and b, and if a ≠ 0, no integer larger than |a| can be a common divisor of a and b.

Definition 5.4:
Let a,b ∈ Z with a,b not both equal to 0. We define the greatest common divisor of a and b, denoted by gcd(a,b) (or hcf(a,b) for highest common factor), to be the largest positive integer that divides both a and b.
Remark:
Note that gcd(0,0) does not exist since every integer is a divisor of 0. However, for a ∈ Z with a ≠ 0, we have that gcd(a,0) = |a|.
Definition 5.5:

Let a₁,a₂,…,a_n ∈ Z with not all of the a_i’s being equal to 0. We define the greatest common divisor of a₁ to a_n, denoted by gcd(a₁,…,a_n), to be the largest positive integer that divides a_i for all 1 ≤ i ≤ n.

The following lemma showcases different properties of the gcd.

Lemma 5.6:

Suppose a,b ∈ Z with a and b not both 0.

  1. We have gcd(a,b) = gcd(|a|,b) = gcd(a,|b|) = gcd(|a|,|b|).

  2. Set c = gcd(a,b), and take x,y ∈ Z so that a = cx, b = cy; then gcd(x,y) = 1.

  3. For all x ∈ Z we have gcd(a,b) = gcd(a,ax+b).

  4. For all a₁,a₂,a₃ ∈ Z, not all equal to 0, we have gcd(a₁,a₂,a₃) = gcd(gcd(a₁,a₂),a₃).

Proof.

Exercise.

Theorem 5.7:

Let a,b ∈ Z with a,b not both equal to 0, and let gcd(a,b) = c. Then there exist s,t ∈ Z such that c = as+bt.

Proof.

Let A be the set A = {as+bt : s,t ∈ Z and as+bt > 0}.

By taking s = a, t = b we see that a²+b² ∈ A, hence A is a non-empty subset of Z+. By the Well Ordering Principle, it has a minimal value. We denote this minimum value by d and take s,t ∈ Z such that d = as+bt. Note that since c∣a and c∣b, we have c∣(as+bt), so c∣d. Hence, c ≤ d.

Now, using Theorem 5.2 on a and d, let q,r ∈ Z be such that a = dq+r with 0 ≤ r < d. Then r = a−dq = a−(as+bt)q = a(1−sq)+b(−tq). If r > 0, then r ∈ A with r < d, contrary to how we chose d. Hence, we must have that r = 0, which means d∣a. Similarly, we can show that d∣b, so d is a common divisor of a and b. Since c = gcd(a,b), we have that d ≤ c.

Since c ≤ d and d ≤ c, it follows that c = d. Hence, gcd(a,b) = c = d = as+bt.

Proof techniques:
Note here that we proved x = y by showing x ≤ y and y ≤ x. This trick is often exploited when trying to prove two numbers are the same.

Remark:
Note that s and t are not unique at all. Suppose that s,t ∈ Z are such that gcd(a,b) = as+bt; then using the fact that ab+b(−a) = 0, we see that gcd(a,b) = as+bt+(ab+b(−a)) = a(s+b)+b(t−a), i.e., s+b, t−a ∈ Z also satisfy Theorem 5.7.

This theorem has several important corollaries (and one generalisation).

Corollary 5.8:

Let a,b,c ∈ Z with c ≠ 0.

  • If c|ab and gcd(b,c) = 1, then c|a.

  • If c|a and c|b then c|gcd(a,b).

Proof.

Exercise.

Corollary 5.9:

Let a1,…,an ∈ Z, not all ai’s equal to 0. Then there exist s1,…,sn ∈ Z such that gcd(a1,…,an) = a1s1 + ⋯ + ansn.

Proof.

Exercise.

Note that the proof of Theorem 5.7 only shows the existence of s and t but doesn’t give us a way to find/calculate the values of s and t (that is, the proof is not constructive). To find s and t we need to use an algorithm. An algorithm is a logical step-by-step procedure for solving a problem in a finite number of steps. Many algorithms are recursive, meaning that after one or more initial steps, a general method is given for determining each subsequent step on the basis of steps already taken.

Etymology:

The word algorithm comes from Abu Ja’far Muhammad ibn Musa Al-Khwarizmi (Arabic Mathematician, 780-850). Al-Khwarizmi (meaning “from Khwarazm”) wrote a book detailing how to use Hindu-Arabic numerals. When this book was translated for Europeans, it was given the Latin name Liber Algorismi meaning “Book of al-Khowarazmi”. As a consequence, any manipulation of Arabic numerals (which are the ones we use nowadays) was known as an algorism. The current form algorithm is due to a “pseudo-etymological perversion” where algorism was confused with the word arithmetic to give the current spelling of algorithm.

It is interesting to note that Al-Khwarizmi also wrote the book Hisab al-jabr w’al-muqabala. The term al-jabr comes from the Arabic al- meaning “the” and jabara meaning “to reunite”, so his book was on “the reunion of broken parts”, i.e. how to solve equations with unknowns. When the book made its way to Europe, Europeans shortened the Arabic title to algeber. Over time, this mutated to the word algebra which is currently used.

Algorithm 5.10: (Extended Euclidean algorithm)

Input: Integers a and b such that a>0.

Output: Integers s, t and gcd(a,b) such that gcd(a,b)=as+bt.

Algorithm:

  • Step 0: Set s0 = 0, t0 = 1 and r0 = a.

  • Step 1: Set s1 = 1 and t1 = 0. Find unique q1, r1 ∈ Z such that b = a·q1 + r1 and 0 ≤ r1 < a. If r1 = 0 then proceed to the final step, else go to Step 2.

  • Step 2: Set s2 = s0 − q1·s1 and t2 = t0 − q1·t1. Find unique q2, r2 ∈ Z such that a = r1·q2 + r2 and 0 ≤ r2 < r1. If r2 = 0 then proceed to the final step, else go to Step 3.

  • Step k (for k ≥ 3): Set s_k = s_{k−2} − q_{k−1}·s_{k−1} and t_k = t_{k−2} − q_{k−1}·t_{k−1}. Find unique q_k, r_k ∈ Z such that r_{k−2} = r_{k−1}·q_k + r_k and 0 ≤ r_k < r_{k−1}. If r_k = 0 then proceed to the final step, else go to Step k+1.

  • Final Step: If the last step was Step n (with n ≥ 1) then output r_{n−1} = gcd(a,b), s_n = s and t_n = t.
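Algorithm 5.10 translates directly into code. The following Python sketch mirrors the steps above, keeping only the two most recent values of s_k, t_k and r_k at each stage; the function name `extended_gcd` is our own.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) = a*s + b*t, assuming a > 0.

    Mirrors Algorithm 5.10: r runs through the remainders, while s and t
    are updated by s_k = s_{k-2} - q_{k-1} * s_{k-1} (likewise for t).
    """
    # Step 0 and Step 1: (s0, t0, r0) = (0, 1, a), (s1, t1) = (1, 0),
    # and b = a*q1 + r1.
    prev_s, prev_t, prev_r = 0, 1, a
    s, t = 1, 0
    q, r = divmod(b, a)
    while r != 0:
        # Step k: form s_k, t_k from the two previous rows, then divide
        # the previous remainder by the current one.
        prev_s, s = s, prev_s - q * s
        prev_t, t = t, prev_t - q * t
        prev_r, q, r = r, *divmod(prev_r, r)
    return prev_r, s, t    # r_{n-1} = gcd(a, b), s_n, t_n

g, s, t = extended_gcd(323, 1451)
print(g, s, t)  # → 1 292 -65
```

Running it on the pair 323, 1451 reproduces the worked example later in this section: 1 = 323·292 + 1451·(−65).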

Proposition 5.11:

The Extended Euclidean algorithm terminates and is correct.

Proof.

Notice that after k steps, we have a > r1 > r2 > ⋯ > r_k ≥ 0. Hence, the algorithm must terminate after at most a steps.

If it terminates after Step 1 (i.e. r1 = 0), then gcd(a,b) = a = r0. Furthermore a = a·1 + b·0 = a·s1 + b·t1.

So suppose the algorithm terminates after n steps, for n > 1. We note that r1 = b − a·q1 and r2 = a − r1·q2, and for 3 ≤ k < n, we have r_k = r_{k−2} − r_{k−1}·q_k. Then by Lemma 5.6, we have gcd(a,b) = gcd(a,r1) = gcd(r1,r2) = ⋯ = gcd(r_{n−1}, r_n) = gcd(r_{n−1}, 0) = r_{n−1}.

Finally, we prove by (strong) induction that a·s_n + b·t_n = r_{n−1} at every stage of the algorithm. So let P(n) be the statement that a·s_n + b·t_n = r_{n−1}. We have seen above that P(1) holds. We also have s2 = −q1 and t2 = 1. So r1 = b − a·q1 = a(−q1) + b·1 = a·s2 + b·t2. So P(2) holds. Assume P(n) holds for all n < k; then we have a·s_k + b·t_k = a(s_{k−2} − q_{k−1}·s_{k−1}) + b(t_{k−2} − q_{k−1}·t_{k−1}) = (a·s_{k−2} + b·t_{k−2}) − q_{k−1}(a·s_{k−1} + b·t_{k−1}) = r_{k−3} − r_{k−2}·q_{k−1} = r_{k−1}. Hence P(k) holds, so by strong induction P(n) holds for all n.

Notice that if a < 0, then apply the above algorithm to |a| and b, then use −s instead of s.

History:

Theorem 5.7 is often known as Bezout’s identity, named after Étienne Bézout (French mathematician, 1730–1783), while the Extended Euclidean algorithm is named after Euclid of Alexandria (Greek mathematician, 325BC-265BC). One might wonder at the gap between those two mathematicians given Bezout’s identity follows immediately from the algorithm.

In his series of books The Elements, Euclid gives an algorithm to find the gcd of two numbers. The algorithm uses the fact that gcd(a,b) = gcd(a, b−a), and Euclid uses repeated subtractions. This was improved to give the above algorithm without the s_k and t_k. Around 1000 years later, Aryabhata (Indian mathematician, 476-550) wrote the book Aryabhatiya which contained a version of the Extended Euclidean Algorithm. It is not clear when the algorithm was known in Europe, but it was used by Claude Gaspar Bachet de Méziriac (French mathematician, 1581-1638), who was alive a full 100 years before Bezout. What Bezout did was to show that a version of the Extended Euclidean Algorithm could be used for polynomials, and that a version of Bezout’s identity existed for polynomials (you can find out how polynomial rings are similar to Z in units like Algebra 2).

Example:

We want to compute gcd(323,1451) and find s,t ∈ Z such that gcd(1451,323) = 323s + 1451t. We go through the algorithm using the following table:

k  s_k                  t_k                 Calculation          q_k  r_k
0  0                    1                   −                    −    −
1  1                    0                   1451 = 323·4 + 159   4    159
2  0 − 1·4 = −4         1 − 0·4 = 1         323 = 159·2 + 5      2    5
3  1 − (−4)·2 = 9       0 − 1·2 = −2        159 = 5·31 + 4       31   4
4  −4 − 9·31 = −283     1 − (−2)·31 = 63    5 = 4·1 + 1          1    1
5  9 − (−283)·1 = 292   −2 − 63·1 = −65     4 = 1·4 + 0          4    0

Hence gcd(323,1451) = 1 = 323·292 + 1451·(−65).

5.2 Primes and the Fundamental Theorem of Arithmetic

We now move on to look at a fundamental property of Z. First we need some more definitions.

Definition 5.12:

We say an integer p > 1 is prime if for all a,b ∈ Z, if p|ab then p|a or p|b.

Equivalently, if ab ∈ pZ then a ∈ pZ or b ∈ pZ.
Remark:
We negate the above statement to say an integer p > 1 is not a prime (often called a composite) if there exist a,b ∈ Z such that p|ab, p∤a and p∤b.
Theorem 5.13:

An integer p>1 is a prime if and only if the only positive divisors of p are 1 and p.

Proof.

We prove both directions separately.

⇒). For a proof by contradiction, suppose p is prime but there exists a positive divisor a which is not 1 or p. Then we have 1 < a < p and there exists b ∈ Z such that ab = p. Since p > 1 and a > 1, we have 1 < b < p. Now p = ab means p|ab. However, p∤a and p∤b since 1 < a,b < p. Hence p is not prime, which is a contradiction. So if p is prime, then the only positive divisors of p are 1 and p.

⇐). Suppose p > 1 is an integer whose only positive divisors are 1 and p. Let a,b ∈ Z be such that p|ab. If p|a then we satisfy the definition of p being prime. So suppose p∤a; then gcd(p,a) = 1 (since p has no other positive divisors). By Corollary 5.8, we have that p|b, hence satisfying the definition of being prime.

Remark:
The above theorem means that if an integer n > 1 is composite, it has a positive divisor which is not 1 or n. I.e., there exists a ∈ Z such that a|n and 1 < a < n.
Interest:

Most students are used to Theorem 5.13 being the definition of a prime number. In fact, Euclid introduced prime numbers as the ones that satisfied that definition (he used the Greek word protos meaning first, as prime numbers are the first numbers from which every other number arises. The word prime comes from Latin “primus” which also means first). Around the 19th century, it was discovered that there are some rings in which elements satisfying Definition 5.12 and elements satisfying Theorem 5.13 are different. Following this, mathematicians decided that elements satisfying Theorem 5.13 should be called irreducible (they can not be reduced into smaller elements). You can learn more about irreducible elements in general rings in a unit like Algebra 2.

Theorem 5.14: (Fundamental Theorem of Arithmetic)

Let n ∈ Z+ be such that n > 1. Then there exist primes p1,…,pr such that n = p1·p2⋯pr.

Furthermore, this factorisation is unique up to re-arranging, that is, if n = p1⋯pr = q1⋯qt where q1,…,qt are also primes, then r = t and, if we order the pi and qi such that p_i ≤ p_{i+1} and q_i ≤ q_{i+1} for all i, then pi = qi for all i.

Proof.

We use a proof by induction to show the existence of the prime divisors of n. Let P(n) be the statement that “there exist some primes p1,…,pr such that n = p1⋯pr”. We note that 2 is prime, hence P(2) holds. Assume P(k) holds for all k ≤ n and consider P(n+1).

If n+1 is prime, then P(n+1) holds. If n+1 is composite, then it has a non-trivial divisor, say a. So n+1 = ab. Note as before that 1 < a,b < n+1. So by the induction hypothesis, both a and b can be written as a product of primes. Hence n+1 can be written as a product of primes and P(n+1) holds. So by the principle of (strong) induction, all integers n ≥ 2 can be written as a product of primes.

We use a proof by contradiction to show the uniqueness (up to re-ordering) of the prime divisors of n. Let S be the subset of Z+ containing the integers whose factorisation is not unique. For a contradiction, we assume S is non-empty; hence by the Well Ordering Principle, it has a least element, call it n. Suppose n = p1⋯pr = q1⋯qt are two distinct factorisations, where the pi and qi are primes.

Note that since p1|n we have p1 | q1⋯qt = q1(q2⋯qt). If p1∤q1, then by definition p1 | (q2⋯qt). In this case, if p1∤q2 then by definition p1 | (q3⋯qt). Continuing this argument, we see that there exists i such that p1|qi. Without loss of generality, re-order the qi’s so that p1|q1. Since q1 is prime, it has two positive divisors, 1 and q1. Since p1 > 1 (as it is prime) and a divisor of q1, we deduce p1 = q1. Hence p1⋯pr = q1⋯qt implies p2⋯pr = q2⋯qt = a, say. Since p2⋯pr and q2⋯qt are two distinct prime factorisations of a, we have a ∈ S. However, a < n, contradicting the minimality of n. Hence S must be empty, so every n > 1 has a unique factorisation.

Corollary 5.15:

There are infinitely many prime numbers in Z+.

Proof.

To the contrary, suppose that there are only finitely many primes, say p1,p2,…,pm for some m ∈ Z+. Set n = p1·p2⋯pm + 1. Clearly m ≥ 2, as 2 is a prime. Hence n > 1. By the Fundamental Theorem of Arithmetic, n can be factored as a product of primes. So we let q be some prime dividing n. Then n = qk for some k ∈ Z+. Since there are only finitely many primes, we must have that q = pj for some j ∈ Z+, 1 ≤ j ≤ m. Hence, n = pj·k = p1·p2⋯pm + 1, so 1 = n − p1·p2⋯pm = pj(k − p1·p2⋯pm/pj). Hence the prime pj divides 1, which is a contradiction. The result follows.

History:

The above proof is often called Euclid’s proof as it can be found in his books The Elements. The proof directly follows his proof that every number can be factorised as a product of primes. Euclid did not prove the uniqueness of such a factorisation; instead, a proof of the uniqueness of such a factorisation can be found in Diophantus of Alexandria’s (Greek mathematician, c. AD 200 - 284) book Arithmetica (the title presumably linked to the Greek word arithmos meaning “number”).

The Fundamental Theorem of Arithmetic can be used to find solutions to equations such as the one below.

Example:

Problem: Find all square numbers n² ∈ Z+ such that there exists a prime p with n² = 5p + 9.

Solution: First, we suppose we have a prime p such that 5p + 9 = n² for some n ∈ Z+. We want to find constraints on p. We have that 5p = n² − 9 = (n+3)(n−3). By the Fundamental Theorem of Arithmetic, the only positive factors of 5p are 1, 5, p and 5p. That is n+3 ∈ {1, 5, p, 5p}, so let us analyse these four cases.

  • Suppose n+3 = 1. Then n−3 = −5, that is 5p = (n+3)(n−3) = −5. But this implies that p = −1, which is not prime. It follows that n+3 ≠ 1.

  • Suppose n+3 = 5. Then n−3 = −1, so 5p = (n+3)(n−3) = −5 and hence p = −1, which is a contradiction. Hence, n+3 ≠ 5.

  • Suppose n+3 = p. Then n−3 = p−6, so 5p = p(p−6). Hence 5 = p−6, so p = 11, which is prime.

  • Suppose n+3 = 5p. Then 5p = (n+3)(n−3) = 5p(n−3). Hence n−3 = 1, and so n = 4. Then 5p = (n+3)(n−3) = 7. But this is not possible since 5∤7. Hence, n+3 ≠ 5p.

Summarising all of the above cases, we have that if p is a prime such that 5p + 9 = n², for some n ∈ Z+, then p = 11.

On the other hand, let p = 11. Then 5p + 9 = 55 + 9 = 64 = 8². Hence, the only square n² ∈ Z+ such that there exists a prime p with 5p + 9 = n² is n² = 64 (with n = 8 and p = 11).
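A brute-force search agrees with the case analysis above: among small primes, only p = 11 makes 5p + 9 a perfect square. This sketch (with a naive trial-division primality test of our own) is a check, not a proof.

```python
from math import isqrt

def is_prime(n):
    # Naive trial division, adequate for a small search range.
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))

# Find every prime p below 10000 for which 5p + 9 is a perfect square.
hits = [p for p in range(2, 10000)
        if is_prime(p) and isqrt(5 * p + 9) ** 2 == 5 * p + 9]
print(hits)  # → [11]
```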
Remark:

Two things to note:

  • One can use Theorem 5.14 to show that for x ∈ Q+, there are a,b ∈ Z+ so that gcd(a,b) = 1 and x = a/b. Furthermore, one can show that such a,b are unique. We leave this as an exercise.

  • Every n ∈ Z with n > 1 can be expressed uniquely as n = ∏_{i=1}^{r} p_i^{n_i} where the p_i are distinct primes and n_i ∈ Z_{>0}. We can extend this representation to include 1 by saying 1 is equal to the empty product.
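The prime-power representation n = ∏ p_i^{n_i} can be computed by trial division; the sketch below (the function name is our own) returns the exponents as a dictionary.

```python
def prime_factorisation(n):
    """Return {p: e} with n equal to the product of p**e, for n > 1.

    Trial division: repeatedly divide out the smallest remaining
    divisor, which is necessarily prime.
    """
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:          # whatever is left over is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors

print(prime_factorisation(360))  # → {2: 3, 3: 2, 5: 1}, i.e. 360 = 2^3·3^2·5
```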

With the above representation of integers, we can revisit the greatest common divisor.

Lemma 5.16:

Let a,b ∈ Z+ and write a = ∏_{i=1}^{r} p_i^{a_i} and b = ∏_{i=1}^{r} p_i^{b_i} where a_i,b_i ∈ Z and the p_i are the prime divisors of a or b (and hence a_i,b_i can be 0). Then a|b if and only if a_i ≤ b_i for all i.

Proof.

We prove both directions separately.

⇒). For a contradiction, suppose a|b and that there exists i such that a_i > b_i. We can write a = p_i^{a_i}·A where gcd(p_i, A) = 1, and we can write b = p_i^{b_i}·B where gcd(p_i, B) = 1. Since a|b, there exists c ∈ Z+ such that ac = b, that is p_i^{a_i}·A·c = p_i^{b_i}·B. Rearranging, we have p_i^{a_i − b_i}·A·c = B. Now since a_i − b_i > 0, we have p_i | B, which is a contradiction since gcd(B, p_i) = 1. Hence, for all i, a_i ≤ b_i.

⇐). If a_i ≤ b_i for all i then b = (∏_{i=1}^{r} p_i^{a_i})(∏_{i=1}^{r} p_i^{b_i − a_i}) = ac. Hence a|b.

Corollary 5.17:

Let a,b ∈ Z such that neither is 0 and write |a| = ∏_{i=1}^{r} p_i^{a_i} and |b| = ∏_{i=1}^{r} p_i^{b_i} where a_i,b_i ∈ Z and the p_i are the prime divisors of |a| or |b| (and hence a_i,b_i can be 0). Then gcd(a,b) = gcd(|a|,|b|) = ∏_{i=1}^{r} p_i^{min(a_i,b_i)}.

Proof.

The greatest common divisor is a divisor of both, and therefore the prime factorisation of the gcd can contain only primes that appear in both |a| and |b|. Each of those primes can appear with at most the minimum of the powers with which it appears in |a| and |b|. The greatest common divisor is then obtained by choosing as many primes as possible, each with that minimum power.

Definition 5.18:
We say that a,b are co-prime if gcd(a,b)=1.

The word co-prime demonstrates that there are no prime factors in common between a and b (they are prime with respect to each other).

Remark:
We say a,b are not co-prime if there exists c > 1 such that c|a and c|b.
Definition 5.19:
Let a,b,c ∈ Z. Then c is a common multiple of a and b if both a|c and b|c.
Definition 5.20:
Let a,b ∈ Z such that neither is 0. Then the lowest common multiple of a and b, denoted by lcm(a,b), is the smallest positive integer which is divisible by both a and b.
Lemma 5.21:

Let a,b ∈ Z such that neither is 0 and write |a| = ∏_{i=1}^{r} p_i^{a_i} and |b| = ∏_{i=1}^{r} p_i^{b_i} where a_i,b_i ∈ Z and the p_i are the prime divisors of |a| or |b| (and hence a_i,b_i can be 0). Then lcm(a,b) = lcm(|a|,|b|) = ∏_{i=1}^{r} p_i^{max(a_i,b_i)}.

Proof.

The lowest common multiple is a multiple of both, and therefore the prime factorisation of the lcm must contain all the primes that appear in either |a| or |b|. Each of those primes must appear with at least the maximum of the powers with which it appears in |a| and |b|. The lowest common multiple is then obtained by choosing as few primes as possible, each with that maximum power.

Theorem 5.22:

Let a,b ∈ Z such that neither is 0. Then lcm(a,b)·gcd(a,b) = |ab|.

Proof.

We first note that for any two numbers a_i, b_i we have min(a_i,b_i) + max(a_i,b_i) = a_i + b_i. Write |a| = ∏_{i=1}^{r} p_i^{a_i} and |b| = ∏_{i=1}^{r} p_i^{b_i}. Then: lcm(a,b)·gcd(a,b) = (∏_{i=1}^{r} p_i^{max(a_i,b_i)})(∏_{i=1}^{r} p_i^{min(a_i,b_i)}) = ∏_{i=1}^{r} p_i^{min(a_i,b_i)+max(a_i,b_i)} = ∏_{i=1}^{r} p_i^{a_i+b_i} = (∏_{i=1}^{r} p_i^{a_i})(∏_{i=1}^{r} p_i^{b_i}) = |a|·|b| = |ab|.

Corollary 5.23:

Let a,b ∈ Z such that neither is 0. If gcd(a,b) = 1, then lcm(a,b) = |ab|.
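Lemma 5.16 through Theorem 5.22 can be checked numerically: taking the minimum or maximum of the exponents in the factorisations gives the gcd and lcm, and the two multiply to |ab|. A sketch for positive integers, reusing a trial-division factoriser; all function names are our own.

```python
from math import prod

def factorise(n):
    # Trial-division prime factorisation of n > 1 as a {prime: exponent} dict.
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def gcd_lcm_from_factorisations(a, b):
    # Corollary 5.17 / Lemma 5.21: min exponents give gcd, max give lcm.
    fa, fb = factorise(a), factorise(b)
    primes = set(fa) | set(fb)
    g = prod(p ** min(fa.get(p, 0), fb.get(p, 0)) for p in primes)
    l = prod(p ** max(fa.get(p, 0), fb.get(p, 0)) for p in primes)
    return g, l

g, l = gcd_lcm_from_factorisations(12, 18)
print(g, l)               # → 6 36
assert g * l == 12 * 18   # Theorem 5.22: lcm(a,b)·gcd(a,b) = |ab|
```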

Interest:

If there’s time, we’ll talk more about polynomial rings, famous open problems in Number Theory, or the fundamental theorem of algebra.

6 Moving from one set to another - Functions

In mathematics, we are very often concerned with functions (also called maps). Some functions model the behaviour of complex systems, while other functions allow us to compare two sets. Loosely speaking, a function is a black box which takes an input from one set and gives an output in another. Every input must have an output, and there is no randomness, so the same input always gives the same output. To give a formal definition of functions, we need to go back to sets.

6.1 Definitions

Before we can start working with functions, we need to give a precise mathematical definition.

Definition 6.1:
Given two sets X,Y, we define the Cartesian product of X and Y, denoted by X×Y, by X×Y = {(x,y) : x ∈ X, y ∈ Y}.
Remark:
Note that if X = ∅ or Y = ∅, then X×Y = ∅.
Example:
We have that R×R = {(x,y) : x,y ∈ R}. So R×R is the Cartesian plane.
Example:

Let X={1,2,3} and Y={4,5,6}. Then X×Y={(1,4),(1,5),(1,6),(2,4),(2,5),(2,6),(3,4),(3,5),(3,6)}, Y×X={(4,1),(4,2),(4,3),(5,1),(5,2),(5,3),(6,1),(6,2),(6,3)}.

Note that X×Y ≠ Y×X.
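For finite sets the Cartesian product can be formed directly with Python's `itertools.product`; a quick sketch reproducing the example above:

```python
from itertools import product

X, Y = {1, 2, 3}, {4, 5, 6}
XY = set(product(X, Y))  # all pairs (x, y) with x in X and y in Y
YX = set(product(Y, X))

assert len(XY) == len(X) * len(Y) == 9
assert (1, 4) in XY and (4, 1) not in XY
assert XY != YX  # taking the product in the other order gives a different set
```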
Definition 6.2:

Let X,Y be non-empty sets. A function f from X into Y, denoted by f : X → Y, is a set f ⊆ X×Y which satisfies that for each element x ∈ X there exists exactly one element f(x) ∈ Y such that (x,f(x)) ∈ f. That is, f = {(x,f(x)) : x ∈ X} ⊆ X×Y.

Symbolically, we have f ⊆ X×Y is a function if ∀x ∈ X, ∃! y ∈ Y such that (x,y) ∈ f (or y = f(x)).
Remark:

We negate the above statement to say that, symbolically, f ⊆ X×Y is not a function if ∃x ∈ X such that ∀y ∈ Y, (x,y) ∉ f, or, ∃x ∈ X such that ∃y1,y2 ∈ Y such that (x,y1) ∈ f, (x,y2) ∈ f and y1 ≠ y2.

In other words, f ⊆ X×Y is not a function if there is x ∈ X such that either (x,y) ∉ f for all y ∈ Y, or there exist two distinct y1,y2 ∈ Y with (x,y1),(x,y2) ∈ f.
Example:
Let X = {1,2,3} and Y = {4,5,6}. Let f = {(1,4),(2,5),(3,4)}, g = {(1,4),(1,5),(3,6)}. Then f is a function from X into Y since, for each x ∈ X, there is exactly one y ∈ Y such that (x,y) ∈ f. However, g is not a function from X into Y for two different reasons. One reason is that 2 ∈ X but there is no value of y ∈ Y such that (2,y) ∈ g. The other is that 1 ∈ X, but there are two different values of y ∈ Y (namely y = 4 and y = 5) such that (1,y) ∈ g.
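Definition 6.2 can be checked mechanically for finite sets: a subset of X×Y is a function exactly when every x in X appears as a first coordinate exactly once. A sketch (the helper name is our own):

```python
def is_function(pairs, X, Y):
    """Return True iff `pairs`, a subset of X x Y, is a function from X into Y."""
    if not all(x in X and y in Y for x, y in pairs):
        return False
    # Each x in X must appear as a first coordinate exactly once.
    firsts = [x for x, _ in pairs]
    return all(firsts.count(x) == 1 for x in X)

X, Y = {1, 2, 3}, {4, 5, 6}
f = {(1, 4), (2, 5), (3, 4)}
g = {(1, 4), (1, 5), (3, 6)}
print(is_function(f, X, Y), is_function(g, X, Y))  # → True False
```

Here g fails for both reasons given in the example: 2 never appears as a first coordinate, and 1 appears twice.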
Notation:

Consider f : X → Y and let A ⊆ X. Then, the image of A under f is denoted by f[A] = {f(x) : x ∈ A} = {y ∈ Y : ∃x ∈ A with y = f(x)}.

We have f[∅] = ∅.

Remark:
In this introductory course we use the notation f[] to distinguish when we expect the input and the output to be a set (as opposed to f() which expects an element as input and output). However, many sources make no such distinction, and will use f() regardless of whether the input is a set or an element. The key thing to note is that the output is an element of Y if the input is an element of X, and a subset of Y if the input is a subset of X.
Definition 6.3:
Consider f : X → Y. We say that X is the domain of f and Y is the co-domain of f. The image (or range) of f is the image of X under f, i.e., the set f[X].
Etymology:

The word function comes from the Latin functus, a form of the verb meaning “to perform”. A function can be thought of as a set of operations that are performed on each value that is input. Interestingly, the word function (which was first introduced by Gottfried Leibniz (German mathematician, 1646 - 1716)) was originally used for maps that could return more than one output for the same input. It was only in the later half of the 20th century that a function was restricted to “exactly one output for each input”.

The word domain comes from the Latin domus which means “house, home”. The domain is the place where all the inputs of f live. Another way to look at it is: the word domus gave the word dominus which means “lord, master”, the person that ruled the house (i.e., a domain is the property owned by a Lord). The domain of a function is the set of all the x that f has “control over”.

Example:

Define f : Z×Z → Z by f((m,n)) = n², for all m,n ∈ Z. We have Z×Z is the domain of f, while Z is the co-domain. The image of f is {n² : n ∈ Z} (i.e., the square numbers).

Now, let A = {(m,n) : m,n ∈ Z, n = 2m}. Note that A = {(m,2m) : m ∈ Z}. The image of A under f is f[A] = {f((m,n)) : (m,n) ∈ A} = {f((m,2m)) : m ∈ Z} = {(2m)² : m ∈ Z} = {4m² : m ∈ Z}.
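The image computation in this example can be mirrored in code on a finite slice of the domain (the slice and helper name are our own choices):

```python
def image(f, A):
    # f[A] = { f(x) : x in A }
    return {f(x) for x in A}

f = lambda mn: mn[1] ** 2                  # f((m, n)) = n^2 on Z x Z
A = {(m, 2 * m) for m in range(-5, 6)}     # pairs (m, 2m), a finite slice

# Squares of even numbers, i.e. 4m^2 for the chosen values of m.
assert image(f, A) == {0, 4, 16, 36, 64, 100}
```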
Theorem 6.4:

Consider f : X → Y and g : X → Y. Then f = g if and only if f(x) = g(x) for all x ∈ X.

Proof.

We prove both implications separately.

⇒). First, suppose that f = g. Take (arbitrary) x ∈ X and choose (the unique) y ∈ Y such that (x,y) ∈ f. Then (x,y) ∈ g (since f = g) and by definition f(x) = y = g(x). This is true for all x ∈ X, so it follows that f(x) = g(x) for every x ∈ X.

⇐). Second, suppose that f(x) = g(x) for all x ∈ X. Then f = {(x,f(x)) : x ∈ X} = {(x,g(x)) : x ∈ X} = g.

It follows that f = g if and only if f(x) = g(x), for all x ∈ X.

Proof techniques:

Notice in this proof (and all others) that the correct punctuation follows each equation, whether it is in-line or on its own (displayed). Maths should be read as part of a grammatically correct sentence. For example, “Consider f : X → Y and g : X → Y” reads “Consider f a function from X to Y and g a function from X to Y”, and “then f = {(x,f(x)) : x ∈ X} = {(x,g(x)) : x ∈ X} = g.” reads as “then f is equal to the set of pairs (x,f(x)) where x is in X, which is equal to….”.

It is for this reason that we should not start a sentence with a maths symbol.

6.2 Injective, surjective and bijective

We now introduce three important properties that a function may or may not have.

Definition 6.5:

We say a function f : X → Y is injective (or one-to-one, or an injection) if for all x1,x2 ∈ X, we have that if x1 ≠ x2 then f(x1) ≠ f(x2).

Using the contrapositive, an alternative definition is that a function f : X → Y is injective if for all x1,x2 ∈ X, we have that if f(x1) = f(x2) then x1 = x2.
Remark:
We negate the above statement to say that f : X → Y is not injective if there exist two distinct x1,x2 ∈ X such that f(x1) = f(x2).
Definition 6.6:
We say a function f : X → Y is surjective (or onto, or a surjection) if for all y ∈ Y, there exists x ∈ X such that f(x) = y.
Remark:
We negate the above statement to say that f : X → Y is not surjective if there exists y ∈ Y such that for all x ∈ X, f(x) ≠ y.

Figure 6.1: An injective function from X to Y: there is at most one arrow going to any point in Y. However, the function is not surjective as there are some points in Y with no arrow going to them. Figure made with Geogebra.


Figure 6.2: A surjective function from X to Y: every point in Y has at least one arrow going to it. However, the function is not injective as there are some points in Y with at least two arrows going to them. Figure made with Geogebra.

Definition 6.7:
A function f : X → Y is called bijective if it is both injective and surjective.
Remark:
We negate the above statement to say that f : X → Y is not bijective if it is not injective or it is not surjective.
Example:

Define f : R×R → R×R by f((m,n)) = (m+n, m−n), for all m,n ∈ R. We want to show that f is bijective.

We first show it is injective by using the contrapositive. Let (m1,n1),(m2,n2) ∈ R×R (the domain) be such that f((m1,n1)) = f((m2,n2)). That is (m1+n1, m1−n1) = (m2+n2, m2−n2), i.e. m1+n1 = m2+n2 and m1−n1 = m2−n2. Therefore, 2m1 = (m1+n1) + (m1−n1) = (m2+n2) + (m2−n2) = 2m2. Since 2 ≠ 0, we can divide by 2 to deduce m1 = m2. Hence m1+n1 = m2+n2 = m1+n2 means n1 = n2. Therefore (m1,n1) = (m2,n2) and f is injective.

We next show that f is surjective. We begin by choosing (u,v) ∈ R×R [the co-domain].

[Scratch work: We want to find (m,n) ∈ R×R such that f(m,n) = (u,v). So we need to find (m,n) ∈ R×R such that m+n = u and m−n = v. Hence, we must have m = u−n and m = v+n. Then u−n = v+n, or equivalently, u−v = 2n, or equivalently, (u−v)/2 = n. If we have (u−v)/2 = n and m = u−n, then m = u − (u−v)/2 = (u+v)/2.]

We set m = (u+v)/2 and n = (u−v)/2. Then (m,n) ∈ R×R [the domain], and f(m,n) = (m+n, m−n) = ((u+v)/2 + (u−v)/2, (u+v)/2 − (u−v)/2) = (u,v). It follows that f is surjective.

Since f is injective and surjective, it is bijective.
Remark:
Note that a function consists of three pieces of information: the domain; the co-domain; and how f(x) is defined. Changing any of these three things can change the behaviour of f : X → Y.
Example:

Let f1 : Z+ → Z+ be defined by f1(n) = n² for all n ∈ Z+. We claim that f1 is injective but not surjective.

We first show f1 is injective. Let n1,n2 ∈ Z+ be such that n1 ≠ n2. Without loss of generality, we assume n1 < n2. Note that n1,n2 > 0 by definition, so n1 < n2 means n1² < n2·n1 and similarly, n1·n2 < n2². Combining these two inequalities together, we get n1² < n1·n2 < n2², i.e. f1(n1) < f1(n2) so f1(n1) ≠ f1(n2). So f1 is injective.

We next show f1 is not surjective. We take 2 ∈ Z+ and claim that for all n ∈ Z+ we have f1(n) = n² ≠ 2. For a contradiction, suppose such an n exists, i.e. n² = 2. Note that n > 1 since 1·1 = 1 < 2. However, if n ≥ 2, then n² ≥ 4 > 2, which is not possible. Hence 1 < n < 2, but there are no natural numbers between 1 and 2. Hence n does not exist and f1 is not surjective.
Example:
Let A = {n² : n ∈ Z+} and let f2 : Z → A be defined by f2(n) = n² for all n ∈ Z. We claim that f2 is surjective but not injective. Indeed, we have that 1,−1 ∈ Z with 1 ≠ −1 but f2(−1) = 1 = f2(1). Hence f2 is not injective. Next, take m ∈ A. By definition of A, there exists n ∈ Z+ such that m = n². Since Z+ ⊆ Z, we have n ∈ Z and f2(n) = n² = m as required.
Example:
Let f3 : Z+ → A be defined by f3(n) = n² for all n ∈ Z+. We leave it as an exercise to show that f3 is bijective.
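For functions on finite sets, injectivity and surjectivity can be tested directly from Definitions 6.5 and 6.6. The sketch below applies the tests to finite stand-ins for the squaring examples above (the ranges chosen are our own).

```python
def is_injective(f, X):
    # No two distinct inputs share an output.
    return len({f(x) for x in X}) == len(set(X))

def is_surjective(f, X, Y):
    # Every element of the co-domain is hit.
    return {f(x) for x in X} == set(Y)

sq = lambda n: n * n
pos = range(1, 11)                 # finite stand-in for Z+
squares = {n * n for n in pos}     # finite stand-in for A

# f1: injective but not surjective onto {1, ..., 100}.
assert is_injective(sq, pos) and not is_surjective(sq, pos, range(1, 101))
# f2: surjective onto the squares (with 0) but not injective on negatives too.
assert is_surjective(sq, range(-10, 11), squares | {0})
assert not is_injective(sq, range(-10, 11))
# f3: both, i.e. bijective, when the domain is Z+ and the co-domain is A.
assert is_injective(sq, pos) and is_surjective(sq, pos, squares)
```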
Proposition 6.8:

Consider f : X → Y. Then we have that f is injective if and only if for all y ∈ f[X], there exists a unique x ∈ X such that f(x) = y.

Proof.

We show the two implications separately.

⇒). Suppose that f is injective, and take y ∈ f[X]. Hence, ∃x1 ∈ X such that f(x1) = y. Now, suppose x2 ∈ X is such that x2 ≠ x1. Since f is injective, we have that f(x2) ≠ f(x1) = y. Hence, ∀y ∈ f[X], ∃! x ∈ X such that f(x) = y.

⇐). Suppose that ∀y ∈ f[X], ∃! x ∈ X such that f(x) = y. Take x1,x2 ∈ X such that x1 ≠ x2 and let y = f(x1). By assumption, x1 is the only element of X that f maps to y. Then y ≠ f(x2), so f(x1) ≠ f(x2). Hence, we have shown that for x1,x2 ∈ X with x1 ≠ x2, we have that f(x1) ≠ f(x2). Therefore, f is injective.

It follows that f is injective if and only if ∀y ∈ f[X], ∃! x ∈ X such that f(x) = y.

Remark:

Caution! Note that the converse of the above definition is, “for all x ∈ X, there exists a unique y ∈ Y such that f(x) = y”, which is the definition of a function.

So do not confuse the definition of injectivity with the definition of a function. In particular, one definition can be true without the other being true.

For example, consider f = {(y², y) : y ∈ Z} ⊆ Z×Z. We have that for each y ∈ Z, there exists a unique x ∈ Z such that (x,y) ∈ f (namely x = y²), so f satisfies the condition in the definition of being injective. However f is not a function since, for example, (4,2),(4,−2) ∈ f (so it doesn’t make sense to talk about whether f is injective or not).
Proposition 6.9:

Consider f : X → Y. Then f is surjective if and only if f[X] = Y.

Proof.

We show the two implications separately.

⇒). Suppose f is surjective; we will show f[X] = Y by showing f[X] ⊆ Y and Y ⊆ f[X]. Note that by definition f[X] = {y ∈ Y : ∃x ∈ X with f(x) = y} ⊆ Y.

Take y0 ∈ Y. Since f : X → Y is surjective, there exists x0 ∈ X so that f(x0) = y0. Thus y0 ∈ f[X]. As y0 was arbitrary, we have ∀y0 ∈ Y, y0 ∈ f[X]. Hence Y ⊆ f[X].

⇐). Suppose f[X] = Y. Choose y0 ∈ Y = f[X]. By definition of f[X], there exists x0 ∈ X such that y0 = f(x0). This argument holds for all y0 ∈ Y, hence f is surjective.

Corollary 6.10:

Consider f : X → Y. Then we have that f is bijective if and only if ∀y ∈ Y, ∃! x ∈ X such that f(x) = y.

Proof.

We show the two implications separately.

⇒). Suppose f is bijective. Then it is surjective, so by Proposition 6.9, Y = f[X]. We also have that f is injective, so by Proposition 6.8 we have ∀y ∈ f[X] = Y, ∃! x ∈ X such that f(x) = y, as required.

⇐). Suppose that ∀y ∈ Y, ∃! x ∈ X such that f(x) = y. In particular, ∀y ∈ Y, ∃x ∈ X such that f(x) = y, i.e. by definition f is surjective. By Proposition 6.9, Y = f[X], so our assumption becomes ∀y ∈ Y = f[X], ∃! x ∈ X such that f(x) = y. Hence by Proposition 6.8, f is injective. Since we have shown f is surjective and injective, we deduce f is bijective.

Etymology:

The words injective, surjective and bijective all have the same ending “-jective”. This comes from the Latin verb jacere which means “to throw”.

Injective has the prefix in, a Latin word meaning “in, into”. This gave rise to the medical term “injection” where we throw some medicine into the body. An injective function is one that throws the set X into the set Y (every element of X stays distinct once it is in Y).

Surjective has the prefix sur a French word meaning “over”, itself coming from the Latin super which means “(from) above”. A surjective function is one that covers the whole of Y (in the sense X is thrown from above to cover Y).

Bijective has the prefix bi, a Latin word meaning “two”. Once we talk about inverse functions, it will become clear that a bijection is a function that works in two directions.

Theorem 6.11:

Consider f : X → Y and let X = U ∪ V. Then

  1. f[X] = f[U] ∪ f[V].

  2. if f is injective and U ∩ V = ∅, then f[U] ∩ f[V] = ∅.

Proof.

a.) Since U,V ⊆ X, we have that f[U],f[V] ⊆ f[X], so f[U] ∪ f[V] ⊆ f[X]. On the other hand, take x ∈ X. Then x ∈ U or x ∈ V, so f(x) ∈ f[U] or f(x) ∈ f[V]. Therefore f(x) ∈ f[U] ∪ f[V]. Since this holds for all x ∈ X, we have f[X] ⊆ f[U] ∪ f[V]. Hence, f[X] = f[U] ∪ f[V].

b.) Suppose that f is injective and U ∩ V = ∅. To the contrary, suppose that there exists some y ∈ f[U] ∩ f[V]. Hence, there exists some u ∈ U such that y = f(u), and there exists some v ∈ V such that y = f(v). It follows that f(u) = y = f(v). Since f is injective, we have that u = v. Hence, u ∈ U ∩ V, which contradicts our initial assumption. It follows that f[U] ∩ f[V] = ∅.

Corollary 6.12:

Suppose that f : X → Y is bijective, and let A ⊆ X and B = X∖A. Then f[A] ∩ f[B] = ∅ and f[A] ∪ f[B] = Y.

Proof.

Since f is bijective it is surjective, and Y = f[X]. We also note that A ∪ B = X, so we satisfy the hypothesis of the above theorem and deduce f[A] ∪ f[B] = f[X] = Y.

Furthermore, since f is bijective, it is injective, and we have A ∩ B = ∅. Hence we satisfy the hypothesis of part b.) of the above theorem and can deduce that f[A] ∩ f[B] = ∅.

6.3 Pre-images

As well as looking at the images of various subsets of X under the function f : X → Y, we are often interested in looking at where certain subsets of Y come from.

Definition 6.13:

Consider f : X → Y and let V ⊆ Y. We define the pre-image (or inverse image) of V under f by f⁻¹[V] = {x ∈ X : f(x) ∈ V}.

We have f⁻¹[∅] = ∅.
Example:

Let X = {1,2,3} and Y = {4,5,6}. Define the function f = {(1,5),(2,5),(3,4)} ⊆ X×Y. Then:

  • f⁻¹[{4,6}] = {3},

  • f⁻¹[{4,5}] = {1,2,3} = X,

  • f⁻¹[{6}] = ∅,

  • f⁻¹[{5}] = {1,2}.
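Pre-images of finite examples can be computed by filtering the domain; a sketch reproducing the example above (the helper name is our own):

```python
def preimage(f, X, V):
    # f^{-1}[V] = { x in X : f(x) in V }
    return {x for x in X if f(x) in V}

X, Y = {1, 2, 3}, {4, 5, 6}
f = {1: 5, 2: 5, 3: 4}   # the function {(1,5), (2,5), (3,4)} as a dict

assert preimage(f.get, X, {4, 6}) == {3}
assert preimage(f.get, X, {4, 5}) == {1, 2, 3}
assert preimage(f.get, X, {6}) == set()
assert preimage(f.get, X, {5}) == {1, 2}
```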

Example:

Let f : R×R → R be defined by f((x,y)) = 2x − 5y. We will find f⁻¹[{0}] and f⁻¹[(0,1)] (recall that the open interval (0,1) = {x ∈ R : 0 < x < 1}).

f⁻¹[{0}] = {(x,y) ∈ R×R : f((x,y)) ∈ {0}} = {(x,y) ∈ R×R : 2x − 5y = 0} = {(x,y) ∈ R×R : y = 2x/5} = {(x, 2x/5) : x ∈ R}.

f⁻¹[(0,1)] = {(x,y) ∈ R×R : f((x,y)) ∈ (0,1)} = {(x,y) ∈ R×R : 2x − 5y ∈ (0,1)} = {(x,y) ∈ R×R : 0 < 2x − 5y < 1} = {(x,y) ∈ R×R : (5/2)y < x < (5/2)y + 1/2}.

We can also rearrange the last line to say:

f⁻¹[(0,1)] = {((5/2)y + ε, y) : ε,y ∈ R, 0 < ε < 1/2}.
Example:

Define g : R → R by g(x) = x². Take V = [4,∞) = {x ∈ R : 4 ≤ x}. Then

g⁻¹[V] = {x ∈ R : g(x) ∈ V} = {x ∈ R : x² ∈ [4,∞)} = {x ∈ R : (x ≤ −2) ∨ (x ≥ 2)} = (−∞,−2] ∪ [2,∞).

We have the nice result that unions and intersections behave as expected.

Theorem 6.14:

Consider f : X → Y and let U,V ⊆ Y. Then

  1. f⁻¹[U ∪ V] = f⁻¹[U] ∪ f⁻¹[V];

  2. f⁻¹[U ∩ V] = f⁻¹[U] ∩ f⁻¹[V].

Proof.

We prove a. and leave b. as an exercise.

We have f⁻¹[U ∪ V] = {x ∈ X : f(x) ∈ U ∪ V} = {x ∈ X : f(x) ∈ U ∨ f(x) ∈ V} = {x ∈ X : f(x) ∈ U} ∪ {x ∈ X : f(x) ∈ V} = f⁻¹[U] ∪ f⁻¹[V].

Therefore the two sets are equal, i.e. f⁻¹[U ∪ V] = f⁻¹[U] ∪ f⁻¹[V].
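Theorem 6.14 is easy to spot-check on a finite example; the sketch below (an illustration, not a proof) verifies both parts for the squaring map on a small slice of Z.

```python
def preimage(f, X, V):
    # f^{-1}[V] = { x in X : f(x) in V }
    return {x for x in X if f(x) in V}

X = range(-10, 11)
f = lambda x: x * x
U, V = {0, 1, 4}, {4, 9, 25}

# Pre-image turns unions into unions and intersections into intersections.
assert preimage(f, X, U | V) == preimage(f, X, U) | preimage(f, X, V)
assert preimage(f, X, U & V) == preimage(f, X, U) & preimage(f, X, V)
```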

The link between images and pre-images is explored in the following theorem.

Theorem 6.15:

Consider f : X → Y, and let U ⊆ X and V ⊆ Y. Then

  1. f[f⁻¹[V]] ⊆ V, and for f surjective, we have f[f⁻¹[V]] = V.

  2. U ⊆ f⁻¹[f[U]], and for f injective, we have U = f⁻¹[f[U]].

Proof.

We prove a. and leave b. as an exercise.

If V = ∅, then f⁻¹[V] = ∅ and f[f⁻¹[V]] = ∅ = V. So we assume V ≠ ∅. Similarly, as V ≠ ∅, we have that if f[f⁻¹[V]] = ∅ then f[f⁻¹[V]] ⊆ V. So we also assume f[f⁻¹[V]] ≠ ∅.

Suppose that y ∈ f[f⁻¹[V]]. Then y = f(w) for some w ∈ f⁻¹[V]. By the definition of f⁻¹[V], we have f(w) ∈ V. Hence, y = f(w) ∈ V. Since y is arbitrary, this shows that every element of f[f⁻¹[V]] lies in V, that is f[f⁻¹[V]] ⊆ V.

Now suppose that f is surjective. We need to show that V ⊆ f[f⁻¹[V]]. Suppose that v ∈ V. Since f is surjective, there exists an x ∈ X such that f(x) = v. Then f(x) ∈ V, and it follows that x ∈ f⁻¹[V]. Hence, v = f(x) ∈ f[f⁻¹[V]]. Since v was arbitrary, this shows that V ⊆ f[f⁻¹[V]]. Summarising, we have that f[f⁻¹[V]] = V whenever f is surjective.

6.4 Composition and inverses of functions

We now turn our attention to how to apply several functions in a row.

Definition 6.16:
Consider f : X → Y and g : Y → Z. We define the composition of g with f, denoted by g∘f, by (g∘f)(x) = g(f(x)), for all x ∈ X.
Etymology:

Composition has the same roots as components and composite. It comes from co meaning “together with” and ponere meaning “to put”. So something composite is something “put together” from different components. The composition of two functions is the result of putting two functions together.

Since f assigns to x ∈ X exactly one value f(x) ∈ Y, and g assigns to f(x) ∈ Y exactly one value in Z, we have that g∘f is a function from X to Z, that is g∘f : X → Z.

Remark:
The ordering of f and g is important, as g∘f and f∘g can be different. In fact, unless Z=X, the composition f∘g is not even defined.

We have that composition of functions is associative. That is:

Proposition 6.17:

Consider f:X→Y, g:Y→Z and h:Z→W. Then h∘(g∘f)=(h∘g)∘f.

Proof.

By Theorem 6.4, to show h∘(g∘f)=(h∘g)∘f, we need to show that for all x∈X, we have (h∘(g∘f))(x)=((h∘g)∘f)(x). Let x∈X. Then (h∘(g∘f))(x)=h((g∘f)(x))=h(g(f(x))) and ((h∘g)∘f)(x)=(h∘g)(f(x))=h(g(f(x))).

This argument holds for all x∈X, hence h∘(g∘f)=(h∘g)∘f.

The above proposition tells us that the notation h∘g∘f is unambiguous.

Example:

Let a,b,c,d∈R be such that a<b and c<d. Recall that [a,b] denotes the closed interval from a to b, that is [a,b]={x∈R : a≤x≤b}.

We define the following three functions:

  • f1:[a,b]→[0,b−a] defined by f1(x)=x−a;

  • f2:[0,b−a]→[0,d−c] defined by f2(x)=x(d−c)/(b−a);

  • f3:[0,d−c]→[c,d] defined by f3(x)=x+c.

We briefly check that f2 is indeed a function from [0,b−a] to [0,d−c] by checking f2([0,b−a])⊆[0,d−c] (in theory one should also do the same with f1 and f3, but these two cases are clearer). Pick x∈[0,b−a], so 0≤x≤b−a by definition. Then since b−a>0 we have 0≤x/(b−a)≤1, and since d−c>0 we have 0≤x(d−c)/(b−a)≤d−c. So f2(x)∈[0,d−c]. As this is true for all x∈[0,b−a], we indeed have that f2 is a function with co-domain [0,d−c].

We can construct f:[a,b]→[c,d] by composing f3 with f2 with f1, i.e., f=f3∘f2∘f1 is defined by f(x)=c+(x−a)(d−c)/(b−a).
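As a numerical sanity check (not a proof), one can compare the composition f3∘f2∘f1 with the closed formula at a few sample points; the values of a, b, c, d below are arbitrary choices satisfying a<b and c<d.

```python
# Sanity check that f3∘f2∘f1 agrees with f(x) = c + (x-a)(d-c)/(b-a).
a, b, c, d = 1.0, 3.0, 10.0, 14.0

f1 = lambda x: x - a                    # f1 : [a,b] -> [0, b-a]
f2 = lambda x: x * (d - c) / (b - a)    # f2 : [0, b-a] -> [0, d-c]
f3 = lambda x: x + c                    # f3 : [0, d-c] -> [c, d]
f  = lambda x: c + (x - a) * (d - c) / (b - a)

for x in [1.0, 1.5, 2.0, 2.7, 3.0]:    # sample points of [a, b]
    assert abs(f3(f2(f1(x))) - f(x)) < 1e-12
    assert c <= f(x) <= d               # f(x) lands in [c, d]
```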

Properties of functions carry through composition.

Theorem 6.18:

Consider f:X→Y and g:Y→Z. Then:

  1. if f and g are injective, we have that g∘f is injective.

  2. if f and g are surjective, we have that g∘f is surjective.

Proof.

We prove part 1. and leave part 2. as an exercise.

Suppose that f,g are injective, and suppose that x1,x2∈X are such that x1≠x2. Since f is injective, we have that f(x1)≠f(x2). Now, set y1=f(x1) and y2=f(x2). Then y1,y2∈Y with y1≠y2. Further, since g is injective, we have g(y1)≠g(y2). It follows that (g∘f)(x1)=g(f(x1))=g(y1)≠g(y2)=g(f(x2))=(g∘f)(x2). To conclude, for any x1,x2∈X with x1≠x2, we have (g∘f)(x1)≠(g∘f)(x2). Hence g∘f is injective.

Corollary 6.19:

Let f:X→Y and g:Y→Z be bijective. Then g∘f:X→Z is also bijective.

Definition 6.20:

We say that a function f:X→Y is invertible if there exists a function g:Y→X such that:

  • g∘f is the identity function on X (that is, for all x∈X, (g∘f)(x)=x), and

  • f∘g is the identity function on Y (that is, for all y∈Y, (f∘g)(y)=y).

We say g is an inverse of f.

If g is an inverse for f, then we also have that f is an inverse for g.
Etymology:

We’ve already seen that inverse comes from in and vertere which is the verb “to turn”. The inverse of a function is one that turns the original function inside out to give back what was originally put in.

Example:
Define f:R→R by f(x)=2x+3, and define g:R→R by g(x)=(x−3)/2. Then for all x∈R, we have (g∘f)(x)=g(f(x))=(f(x)−3)/2=((2x+3)−3)/2=x and (f∘g)(x)=f(g(x))=2g(x)+3=2((x−3)/2)+3=x. Hence g∘f is the identity function on R, and f∘g is the identity function on R. It follows that f is invertible, with g an inverse of f.
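The algebra in this example can be spot-checked numerically; the sample points below are arbitrary and a few checks are of course no substitute for the calculation above.

```python
# Checking the worked example: f(x) = 2x+3 and g(x) = (x-3)/2 invert each other.
f = lambda x: 2 * x + 3
g = lambda x: (x - 3) / 2

for x in [-2.0, 0.0, 0.5, 7.0]:
    assert g(f(x)) == x     # g∘f acts as the identity at x
    assert f(g(x)) == x     # f∘g acts as the identity at x
```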
Theorem 6.21:

Suppose f:X→Y, and suppose that g:Y→X and h:Y→X are inverses of f. Then g=h; that is, if f has an inverse then its inverse is unique.

Proof.

Exercise.

Remark:

Since the inverse is unique, the inverse of f:X→Y is often denoted by f⁻¹:Y→X.

Note the difference between f⁻¹[·], which is about pre-images (also known as inverse images), and f⁻¹(·), which is about the inverse function. In particular, when using f⁻¹[·] (i.e., calculating pre-images), we are not making any claim that f is invertible or indeed that f⁻¹:Y→X is a function that exists. However, when using f⁻¹(·), we are at the same time claiming that f is invertible and hence that f⁻¹ is a function.

It is important to note that we have two conditions to prove f:X→Y is invertible. As an exercise, one can find examples of f:X→Y and g:Y→X such that f∘g is the identity, but g∘f is not. However, if we have some constraints on f or g, then only one condition needs to be met.

Proposition 6.22:

Suppose that f:X→Y and g:Y→X are such that g∘f is the identity function on X.

  1. If g is injective, then f∘g is the identity map on Y (and hence g=f⁻¹).

  2. If f is surjective, then f∘g is the identity map on Y (and hence g=f⁻¹).

Proof.

Exercise.

This next theorem gives another way to test if a function is invertible or not.

Theorem 6.23:

Suppose that f:X→Y. Then f is invertible if and only if f is bijective.

Proof.

We prove both implications separately.

(⇒). First, we will show that if f is invertible, then f is bijective. So suppose f is invertible, and let g:Y→X denote the inverse of f. Then g∘f is the identity function on X, and f∘g is the identity function on Y. We first show f is surjective. Take y1∈Y; then y1=(f∘g)(y1)=f(g(y1)). So set x1=g(y1). Then x1∈X and f(x1)=f(g(y1))=y1. So for all y∈Y, there exists x∈X such that f(x)=y.

We prove that f is also injective using the contrapositive. Suppose we have x1,x2∈X with f(x1)=f(x2)=y. Then since g∘f is the identity on X, we have x1=g(f(x1))=g(y)=g(f(x2))=x2. Hence f is injective and surjective, so it is bijective.

(⇐). Second, we will show that if f is bijective, then f is invertible. So suppose that f is bijective. We set g={(y,x)∈Y×X : (x,y)∈f}⊆Y×X. Since f is bijective, by Corollary 6.10 we have that for all y∈Y, there is a unique x∈X such that f(x)=y. Hence, g is a function, that is g:Y→X.

We will show that g is the inverse of f. For this, we first take x∈X and set y∈Y to be y=f(x). Then by the definition of g, we have that g(y)=x, so (g∘f)(x)=g(y)=x. Since x was arbitrary, this shows that g∘f is the identity function on X. Now, choose any y∈Y. Since f is bijective, there exists a unique x∈X such that f(x)=y. Further, since g is a function, we must have that g(y)=x, and hence (f∘g)(y)=f(x)=y. Since y was arbitrary, this shows that f∘g is the identity function on Y. Hence if f is bijective, we have that f is invertible.

It follows that f is invertible if and only if f is bijective.

Suppose that f:X→Y is bijective. In the above proof, we found a way of defining f⁻¹:Y→X. That is, for all y∈Y, we can write f⁻¹(y)=x, where x∈X is such that f(x)=y.
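For a finite bijection stored as a Python dict, the proof's recipe for f⁻¹ (send y to the unique x with f(x)=y) amounts to swapping each pair; the dict below is an illustrative example of mine.

```python
# Inverting a finite bijection by swapping each pair (x, f(x)).
f = {1: 'b', 2: 'c', 3: 'a'}                  # a bijection {1,2,3} -> {'a','b','c'}
f_inv = {y: x for x, y in f.items()}          # f^{-1}(y) = the unique x with f(x)=y

assert f_inv == {'b': 1, 'c': 2, 'a': 3}
assert all(f_inv[f[x]] == x for x in f)       # f^{-1}∘f is the identity on X
assert all(f[f_inv[y]] == y for y in f_inv)   # f∘f^{-1} is the identity on Y
```

Note this only produces a function because f is bijective: if f repeated a value, the comprehension would silently overwrite entries.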

Example:

Let a,b,c,d∈R be such that a<b and c<d. In the previous example, we defined f:[a,b]→[c,d] as the composition of three functions, given by f(x)=c+(x−a)(d−c)/(b−a). We want to prove f is bijective. One method would be to show that all three functions making up the composition are bijective (and this is relatively easy to do). But we will instead show f is invertible by constructing the inverse.

By symmetry (replacing a with c, b with d, etc.), let us define g:[c,d]→[a,b] via g(x)=a+(x−c)(b−a)/(d−c). We have (g∘f)(x)=g(f(x))=a+(f(x)−c)(b−a)/(d−c)=a+[c+(x−a)(d−c)/(b−a)−c](b−a)/(d−c)=a+(x−a)=x, for all x∈[a,b]. Similarly, (f∘g)(x)=f(g(x))=c+(g(x)−a)(d−c)/(b−a)=c+[a+(x−c)(b−a)/(d−c)−a](d−c)/(b−a)=x, for all x∈[c,d]. It follows that g:[c,d]→[a,b] is the inverse of f. Therefore, f is bijective.
Theorem 6.24:

Suppose f:X→Y and g:Y→Z are bijective. Then (g∘f)⁻¹=f⁻¹∘g⁻¹.

Proof.

Exercise.

7 Cardinality

Now that we know about functions, we return to sets, in particular “measuring their size”. One reason sets were introduced was to “tame” and understand infinity. Indeed, as we will see at the end of course, the study of sets leads us to realise that there are different kinds of infinities.

Bijective functions allowed us to construct a “dictionary” between two sets: every element of one set is in one-to-one correspondence with an element of the other. This suggests that there is a fundamental property shared by these two sets. This is the motivation of this section, but first some notation.

Notation:

Let A be a set; we denote by |A| the cardinality of A.

The above notation doesn’t actually define what we mean by the cardinality of a set.

Definition 7.1:

We build our notion of the cardinality of different sets as follows.

  • We define |∅|=0.

  • For n∈Z+ we define |{1,2,…,n}|=n.

  • If A and B are two non-empty sets, we say |A|=|B| if and only if there exists a bijective f:A→B.

Again, for an individual infinite set A this does not define what we mean by |A|, but it does define what we mean by |A|=|B| for two infinite sets A and B.

Remarks 7.2:

We need to ensure that our last definition makes sense by checking the following:

  • For any non-empty set A, we expect |A|=|A|. We verify this is true by noting that f:A→A defined by f(a)=a is bijective.

  • For any non-empty sets A, B, we expect that if |A|=|B| then |B|=|A|. Let us verify this. Suppose |A|=|B|; then there exists a bijective f:A→B. Since f is bijective, it is invertible, so there exists f⁻¹:B→A. Since f⁻¹ is invertible, it is bijective, hence |B|=|A| by definition.

  • For any non-empty sets A, B, C, we expect that if |A|=|B| and |B|=|C| then |A|=|C|. Let us verify this. Suppose |A|=|B| and |B|=|C|; then there exist bijective f:A→B and g:B→C. Consider g∘f:A→C, which is bijective by Corollary 6.19, hence |A|=|C| as required.

Looking forward, we will revisit the three properties above when we look at equivalence relations.

By combining the last two definitions together, we see that if f:{1,2,…,n}→A is bijective, then |A|=n and we say A has n elements. Note that in this case we can enumerate the elements of A as a1,…,an (where ai=f(i)).

Definition 7.3:

We have two cases

  • When |A|∈Z≥0, then we say A is a finite set.

  • If A is not a finite set, we say A is an infinite set.

We take as a fact that Z+ is infinite (which makes sense since by the Archimedean Property, Z+ is not bounded above).

Etymology:

Finite comes from the Latin finis meaning “end”. A finite set is one that ends. While infinite has the prefix in- meaning “not”, so an infinite set is one that has no end.

Cardinality comes from the Latin cardin meaning “hinges” (which are used to pivot doors). This was used to talk about the “Cardinal virtues”, i.e., the pivotal virtues, the most important ones. From there, the word developed to mean “principal”. This was used by mathematicians to name the set of whole numbers {0,1,2,3,…} the Cardinals, as they are the principal numbers (compared to fractions, reals, etc.). From there, the cardinality of a set referred to the whole number/the cardinal associated to the set.

Lemma 7.4:

Suppose f:X→Y is injective and A⊆X. Then |A|=|f[A]|.

Proof.

Let B=f[A]. Restrict f to A by defining g:A→B via g(a)=f(a), for all a∈A. By the definition of B, we have that B=f[A]=g[A], so g is surjective. Now, suppose that a1,a2∈A are such that g(a1)=g(a2). Then f(a1)=f(a2), and since f is injective, this means that a1=a2. Hence, g is injective. It follows that g is bijective, so |A|=|g[A]|. We also know that g[A]=B=f[A], so |A|=|g[A]|=|f[A]|.

Theorem 7.5:

Let A, B be two finite sets with |A|=n and |B|=m. If there exists an injective f:A→B, then n≤m.

Proof.

Let C=f[A]; then by the previous lemma we have |C|=|A|=n, and we know C⊆B. Hence B has at least n elements, so m=|B|≥n.

Remark:

The contrapositive of the theorem is “If |A|>|B| then for any f:A→B there exist distinct a1,a2∈A such that f(a1)=f(a2)”. Theorem 7.5 is often known as the Pigeonhole Principle, as it can be understood as “if you have more pigeons than pigeonholes, then two pigeons will need to share a pigeonhole”.
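For small sets, the Pigeonhole Principle can be verified exhaustively: every one of the 3⁴ functions from a 4-element set to a 3-element set must repeat a value. The sets below are illustrative choices.

```python
# Pigeonhole, checked exhaustively for |A| = 4 pigeons and |B| = 3 holes.
from itertools import product

pigeons, holes = [1, 2, 3, 4], ['a', 'b', 'c']
for values in product(holes, repeat=len(pigeons)):   # all 3**4 functions A -> B
    f = dict(zip(pigeons, values))
    # An injective f would need 4 distinct values, but only 3 are available.
    assert len(set(f.values())) < len(pigeons)
```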

We extend the above principle to all (finite or infinite) sets.

Definition 7.6:
For any two sets A and B, if there exists an injective f:A→B, then we say |A|≤|B|. We say |A|<|B| if |A|≤|B| and |A|≠|B|.

Note that in particular, for any two sets A and B such that B⊆A, since we can define an injective function f:B→A via f(b)=b for all b∈B, we have |B|≤|A|.

Lemma 7.7:

Let A and B be two sets such that B⊆A. Then

  1. If A is finite then B is finite.

  2. If B is infinite then A is infinite.

Proof.

For part a., assume A is finite with |A|=n. Since B⊆A, we have |B|≤|A|=n, i.e. |B|=m for some integer 0≤m≤n. By definition, B is finite.

Note that part b. is the contrapositive of part a.

Theorem 7.8:

If X,Y are sets with |X|≤|Y| and |Y|≤|X|, then |X|=|Y|.

In particular, if there exist injective f:X→Y and injective g:Y→X, then there exists a bijective h:X→Y. (Note: we are not claiming f or g is bijective.)

Proof.

Non-examinable. It is beyond the scope of this course.

Remark:
While we have the tools to prove the theorem in the case when X or Y is finite, the proof for when both X and Y are infinite is beyond the scope of this course (but see Set Theory in third year). There is an interesting proof of the theorem in the book Algebra by Pierre Grillet, which is available as an electronic book from the University of Bristol library (and in which the theorem is called the Cantor-Bernstein Theorem).
History:

Theorem 7.8 has many names: Cantor-Schröder-Bernstein Theorem; Schröder–Bernstein theorem or (as above) the Cantor-Bernstein theorem. Cantor himself suggested that it should be called the equivalence theorem.

The reason for so many names is that Georg Cantor (German mathematician, born in Russia, 1845-1918) first stated the theorem in 1887, but did not prove it. In 1897, Felix Bernstein (German mathematician, 1878-1956) and Ernst Schröder (German mathematician, 1841-1902) independently published different proofs. However, five years later, in 1902, Alwin Korselt (German mathematician, 1864-1947) found a flaw in Schröder’s proof (which was confirmed by Schröder just before he died).

Proposition 7.9:

Let X be a set and let A,B⊆X be disjoint (A∩B=∅) finite sets. Then |A∪B|=|A|+|B|.

Proof.

Let m,n∈Z+ be such that |A|=m and |B|=n. Then A={a1,a2,…,am} where ai=aj only if i=j, for integers 1≤i,j≤m. Similarly, we have B={b1,b2,…,bn} where bi=bj only if i=j, for integers 1≤i,j≤n. Since A∩B=∅, we know that for any integers i,j with 1≤i≤m and 1≤j≤n, we have ai≠bj. Now, define f:{1,2,…,m+n}→A∪B by f(k)=a_k if 1≤k≤m, and f(k)=b_{k−m} if m<k≤m+n. We leave it as an exercise to show that f is bijective.

Corollary 7.10:

Let X be a set and let A,B⊆X be finite sets. Then |A∪B|=|A|+|B|−|A∩B|.

Proof.

Exercise.
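Both counting formulas can be checked on concrete finite sets; the sets below are arbitrary choices of mine.

```python
# Inclusion-exclusion (Corollary 7.10) on concrete finite sets.
A = {1, 2, 3, 4}
B = {3, 4, 5}
assert len(A | B) == len(A) + len(B) - len(A & B)   # 5 == 4 + 3 - 2

# The disjoint case (Proposition 7.9) is the special case A & C == set():
C = {9, 10}
assert A & C == set()
assert len(A | C) == len(A) + len(C)
```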

We will return to the idea of cardinality in Chapter 11, where we see there are different infinite cardinalities.

8 Sets with structure - Groups

Having looked at sets and functions between sets, we now turn our attention to “groups”. Formally these are sets with extra structure given by a binary operation (e.g. R with +), but they arose historically (and crop up across maths today) as a way to understand symmetry.

8.1 Motivational examples - Symmetries

Before even giving the definition of a group, we’ll look at several examples of symmetries of objects, so that the formal definition will make more sense when we come to it.

Exactly what we mean by a “symmetry” will vary from example to example, and sometimes there’s more than one sensible notion for the same object, so rather than giving a general definition, we’ll clarify what we mean in each example, and the common features should become clear.

8.1.1 Permutations of a set

Let us consider a set of three elements. Imagine them arranged in a line:

three points on a line

Figure 8.1: three points on a line

We can rearrange them, putting them in a different order. For example, we could swap the two elements furthest to the right:

the three points above but with the second and third swapped

Figure 8.2: the three points above but with the second and third swapped

This leaves us with exactly the same picture, so we’ll regard this as a “symmetry” in this context.

Now let’s label the elements and put both pictures next to each other to see how the elements have been permuted.

showing how we permuted the elements

Figure 8.3: showing how we permuted the elements

This picture (Figure 8.3) brings to mind the function f:{1,2,3}→{1,2,3} defined by {(1,1),(2,3),(3,2)} (i.e. f(1)=1, f(2)=3 and f(3)=2).

Or if we cyclically permuted the elements as follows:
a cyclic permutation

Figure 8.4: a cyclic permutation

this (Figure 8.4) would correspond to the function f:{1,2,3}→{1,2,3} defined by {(1,2),(2,3),(3,1)} (i.e. f(1)=2, f(2)=3 and f(3)=1).

Remark:
We could alternatively think of the function where f(x) tells us which element has moved to position x, in which case, in the last example we’d have f(1)=3, f(2)=1 and f(3)=2. There’s nothing wrong with doing this, but note that it gives a different function, and to avoid confusion we’ll stick to the first convention.
If this is to count as a “symmetry” of the original position, then we don’t want to move two elements to the same place, or to leave a place unoccupied. For example:
not a permutation

Figure 8.5: not a permutation

gives a different pattern. Therefore we need the function f to be bijective.

Definition 8.1:

Let X be a set. A permutation of the set X is a bijection f:X→X.

Let’s look in more detail at the permutations of a set X={1,2,3} with three elements. It’s easy to check that there are just six of these, which we’ll denote by e,f,g,h,i,j: see Figure 8.6, where we just show the position the three elements end up.

the six permutations of three elements

Figure 8.6: the six permutations of three elements

Notice that the function e is the identity function, and likewise it is called the identity permutation.

Now we can look at what happens when we perform one permutation followed by another. For example, if we do f (swap the two elements furthest to the right) and then g (swap the two elements that are now furthest to the left), then we are just taking the composition g∘f of the two functions. This gives g(f(1))=g(1)=2, g(f(2))=g(3)=3 and g(f(3))=g(2)=1, so g∘f=i.

For simplicity, we’ll leave out the composition symbol and write gf instead of g∘f. Let’s see what fg is: f(g(1))=f(2)=3, f(g(2))=f(1)=1 and f(g(3))=f(3)=2, so fg=h.

Remark:
When composing permutations or other symmetries, the order does matter in general. gf will always mean “do f first and then g” as in the notation for functions (g(f(x)) is what we get when we apply the function f and then the function g), rather than reading from left to right.

When we come to define groups, we’ll think of “composition” as an operation to combine permutations or other kinds of symmetry (if f and g are symmetries of an object, then gf is the symmetry “do f and then g”), much as “multiplication” is an operation to combine numbers. In fact we’re using pretty much the same notation: compare gf for permutations with xy for numbers. Let’s explore this analogy further with permutations of a set X.

  • Note that since the composition of two bijections is a bijection, we have that the composition of two permutations is a permutation. This is similar to axioms (A1) and (A7).

  • We have seen that composition of functions is associative (see Proposition 6.17), hence if a,b,c are permutations, we have a(bc)=(ab)c. This is similar to axioms (A2) and (A8).

  • We always have an identity permutation, e:X→X defined by e(x)=x for all x∈X. Let f be any other permutation of X; then for all x∈X we have ef(x)=e(f(x))=f(x)=f(e(x))=fe(x), i.e. ef=f=fe. This is similar to axioms (A4) and (A10).

  • We know that a bijective function is invertible, so for every permutation f:X→X, there exists f⁻¹ such that ff⁻¹=e=f⁻¹f. This is similar to axioms (A5) and (A11).

[You can check in the permutations of {1,2,3} that e⁻¹=e, f⁻¹=f, g⁻¹=g, h⁻¹=i, i⁻¹=h and j⁻¹=j.]

Note that we can’t have anything similar to axioms (A3) or (A9), as we have seen an example of two permutations f,g of {1,2,3} such that fg≠gf.
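The computations with f and g above can be replayed in Python, representing each permutation of {1,2,3} as a dict; the expected values below are exactly the ones worked out in the text (gf=i and fg=h).

```python
# The permutations f and g of {1,2,3} from the text, as dicts.
f = {1: 1, 2: 3, 3: 2}   # swap the two rightmost elements
g = {1: 2, 2: 1, 3: 3}   # swap the two leftmost elements

def compose(p, q):
    """(p∘q)(x) = p(q(x)): do q first, then p."""
    return {x: p[q[x]] for x in q}

assert compose(g, f) == {1: 2, 2: 3, 3: 1}   # gf = i, as computed in the text
assert compose(f, g) == {1: 3, 2: 1, 3: 2}   # fg = h
assert compose(g, f) != compose(f, g)        # composition does not commute here
```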

8.1.2 Symmetries of polygons

Consider a regular n-sided polygon (for example, if n=3 an equilateral triangle, or if n=4 a square). Examples of symmetries are:

  • A rotation through an angle of 2π/n (an nth of a revolution).

  • A reflection in a line that goes through the centre of the polygon and one of its vertices.

There are many ways to make precise what we mean by a symmetry. For example, considering the polygon as a subset X of R², we could look at bijections f:X→X that preserve distance: i.e., such that the distance between f(x) and f(y) is the same as the distance between x and y. It’s not hard to see that this implies that f must send vertices to vertices, and moreover must send adjacent vertices to adjacent vertices. To keep things simple, we’ll use this as our definition of a symmetry:

Definition 8.2:
A symmetry of a regular n-sided polygon is a permutation f of the set of n vertices that preserves adjacency: i.e., so that for vertices u and v, u and v are adjacent if and only if f(u) and f(v) are adjacent.

Note that for n=3, every pair of vertices is adjacent, so in that case every permutation of the vertices is a symmetry, and so we are just looking at permutations of a set of three elements as in the last section.

In the case n=4, we have a square. Let’s label the vertices as follows:
the four vertices labelled

Figure 8.7: the four vertices labelled

Then if we rotate it anticlockwise through an angle of π/2 (let’s call this symmetry f) we get (Figure 8.8):
rotating the square

Figure 8.8: rotating the square

While if we reflect in the line of symmetry going from top right to bottom left (let’s call this symmetry g) we get (Figure 8.9):

reflecting the square

Figure 8.9: reflecting the square

However the following does not represent a symmetry as vertices 1 and 2 have not remained adjacent (Figure 8.10):

not a symmetry

Figure 8.10: not a symmetry

As with permutations, we can compose symmetries together and get a new symmetry. In fact, the analogues of (A1)/(A7), (A2)/(A8), (A4)/(A10) and (A5)/(A11) all hold (Prove that the composition of two bijections that preserves adjacency is a bijection that preserves adjacency, and that the inverse of a function that preserve adjacency is a function that preserves adjacency.)

Let’s look at what happens when we compose the symmetries f and g (Figure 8.11).
composing the two previous symmetries

Figure 8.11: composing the two previous symmetries

We see that gf is a reflection in the horizontal line of symmetry while fg is a reflection in the vertical line of symmetry. So here again we have gf≠fg.

8.1.3 Symmetries of a circle

We can look at bijections from a circle to itself that preserve distances between points. It’s not too hard to see that these symmetries are either rotations (through an arbitrary angle) or reflections (in an arbitrary line through the centre).

Again, if f and g are symmetries, then usually fg≠gf.

8.1.4 Symmetries of a cube

We can look at the eight vertices of a cube, and define a symmetry to be a permutation of the set of vertices that preserves adjacency (as we did with a polygon). There are 48 symmetries: most are either rotations or reflections, but there is also the symmetry that takes each vertex to the opposite vertex.

8.1.5 Rubik’s Cube

A Rubik’s cube has 54 coloured stickers (9 on each face), and we could define a “symmetry” to be a permutation of the stickers that we can achieve by repeatedly turning faces (as one does with a Rubik’s cube). It turns out that there are 43,252,003,274,489,856,000 symmetries. Again we can compose symmetries (do one sequence of moves and then another), and every symmetry has an inverse symmetry.

8.2 Formal definition

We want to formalize the structure of symmetries of the kind we looked at in the last section. First of all, notice that in all the cases we looked at, we combined two permutations/symmetries to get a third. We can formalize this as follows.

Definition 8.3:
A binary operation ∗ on a set G is a function ∗:G×G→G.
Etymology:

The word “binary” refers to the fact that the function takes two inputs x and y from G. (Technically speaking, it takes one input from G×G.)

Notation:

We’ll usually write x∗y instead of ∗(x,y).

Example:
  • Let X be a non-empty set. Composition is a binary operation on the set G of permutations of a set X.

  • Addition + is a binary operation on the set R (or on Z, or on Q).

  • Multiplication and subtraction are also binary operations on R, Q or Z.

  • Intersection ∩ and union ∪ are binary operations on the set of subsets of R.

Note that in the definition of a binary operation, the function maps to G, so if we have a definition of x∗y such that x∗y is not always in G, then ∗ is not a binary operation on G (we say that G is not closed under ∗). Also, the domain of ∗ is G×G, so x∗y needs to be defined for all pairs of elements x,y∈G.

Example:
  • Subtraction is not a binary operation on Z+ since, for example, 4−7=−3∉Z+. So Z+ is not closed under subtraction (it is closed under addition).

  • Division is not a binary operation on Z since, for example, 1/2∉Z.

  • Division is not a binary operation on Q since x/y is not defined when y=0. (But division is a binary operation on the set Q∖{0} of non-zero rational numbers.)

For a general binary operation ∗, the order of the elements x,y matters: x∗y is not necessarily equal to y∗x.

Definition 8.4:
A binary operation ∗ on a set G is called commutative if x∗y=y∗x for all x,y∈G.
Example:

Some examples and non-examples of commutative binary operations:

  • Addition and multiplication are commutative binary operations on R (axioms (A3) and (A9) ).

  • Subtraction is not commutative on R since, for example, 2−1=1 but 1−2=−1.

  • Composition is not a commutative operation on the set of permutations of the set {1,2,3}.

Bearing in mind the analogies we drew between some of the axioms of R and the compositions of permutations/symmetries, we’ll now define a group.

Definition 8.5:

A group (G,∗) is a set G together with a binary operation ∗:G×G→G satisfying the following properties (or “axioms”):

  • (Associativity) For all x,y,z∈G, (x∗y)∗z=x∗(y∗z).

  • (Existence of an identity element) There is an element e∈G (called the identity element of the group) such that, for every x∈G, x∗e=x=e∗x.

  • (Existence of inverses) For every x∈G, there is an element x⁻¹∈G (called the inverse of x) such that x∗x⁻¹=e=x⁻¹∗x.

Strictly speaking, the group consists of both the set G and the operation ∗, but we’ll often talk about “the group G” if it’s clear what operation we mean, or say “G is a group under the operation ∗”. But the same set G can have several different group operations, so we need to be careful.

Example:

If X is a set, S(X) is the set of all permutations of X (i.e., bijective functions X→X), and f∘g is the composition of bijections f and g, then (S(X),∘) is a group.

Note that in this example, there are two sets involved (X and the set S(X) of permutations). It is the set S(X) that is the group, not X (we haven’t defined a binary operation on X).

The set of all symmetries of a regular n-sided polygon is a group under composition, as is the set of all symmetries of a cube, or a Rubik’s cube.
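For X={1,2,3}, the group axioms for (S(X),∘) can be checked by brute force over all six permutations; this is a finite computation, of course, not a proof for a general set X.

```python
# Brute-force check that the six permutations of {1,2,3} form a group
# under composition: closure, associativity, identity, inverses.
from itertools import permutations

X = (1, 2, 3)
S = [dict(zip(X, p)) for p in permutations(X)]        # all 6 permutations

def comp(p, q):                                       # (p∘q)(x) = p(q(x))
    return {x: p[q[x]] for x in X}

e = {x: x for x in X}                                 # identity permutation
assert all(comp(p, q) in S for p in S for q in S)     # closure
assert all(comp(p, comp(q, r)) == comp(comp(p, q), r)
           for p in S for q in S for r in S)          # associativity
assert all(comp(p, e) == p == comp(e, p) for p in S)  # identity element
assert all(any(comp(p, q) == e == comp(q, p) for q in S)
           for p in S)                                # every p has an inverse
```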

These examples, and similar ones, of symmetry groups, are the motivation for the definition of a group, but there are some other familiar examples of sets of numbers with arithmetical operations that fit the definition.

Example:

Here we explore some sets and associated operations and check if they are a group or not.

  • We have that (R,+) is a group [by axioms (A2), (A4), (A5)]. Similarly, (Z,+) and (Q,+) are groups.

  • The set Z+ is not a group under addition, since it doesn’t have an identity element [0∉Z+]. Note Z≥0 is still not a group (under addition), as although it has the identity element 0, no integer n>0 has an inverse (since for any n∈Z+, −n∉Z≥0).

  • We have that (R∖{0},×) is a group [by axioms (A8), (A10) and (A11)]. Similarly, (Q∖{0},×) is a group.

  • We have that (R,×) is not a group, since 0 does not have an inverse.

  • We have that R is not a group under subtraction, since associativity fails: (x−y)−z=x−y−z, but x−(y−z)=x−y+z, and so these are different whenever z≠0.

Matrices are another source of examples.

Example:

Let Mn(R) be the set of n×n matrices with real entries. Then Mn(R) is a group under (matrix) addition.

Mn(R) is not a group under matrix multiplication, since not every matrix has an inverse. However, the set of invertible n×n matrices is a group under multiplication.

We’ve stressed that xy and yx are typically different in symmetry groups. But in the examples coming from addition and multiplication of integers, they are the same.

Definition 8.6:
A group (G,∗) is called abelian if x∗y=y∗x for all x,y∈G.
History:

The word “abelian” was introduced in the 1880s by German mathematician Heinrich Weber (1842 - 1913). He derived the name from the Norwegian mathematician Niels Henrik Abel (1802 - 1829) who studied the permutation of solutions of polynomials. Very loosely speaking he showed in 1824 that for a given polynomial if the permutation of the roots form an Abelian group then the polynomial can be solved in radicals (i.e. involving +,,,÷ and n-th roots only). You can learn more about this in the fourth year course Galois Theory (named after French mathematician Évariste Galois, 1811 - 1832).

Even in a non-abelian group, xy=yx may hold for some elements x and y, in which case we say that x and y commute. For example, the identity commutes with every other element. However for the group to be abelian it must hold for all elements.

Example:
  • The group S(X) of permutations of a set X is non-abelian if X has at least three elements, since if x,y,z∈X are three distinct elements, f is the permutation that swaps x and y (fixing all other elements), and g is the permutation that swaps y and z, then fg≠gf.

  • More generally, symmetry groups are typically non-abelian (although sometimes they are abelian). In particular, the symmetry group of a regular n-sided polygon (where n≥3) is non-abelian.

  • (R,+) is abelian by axiom (A3).

  • (R∖{0},×) is abelian by axiom (A9).

  • Mn(R) is an abelian group under matrix addition.

  • If n≥2, then the set of invertible n×n real matrices is a non-abelian group under matrix multiplication since, for example (writing each matrix row by row), (1 1; 0 1)(0 1; 1 0) = (1 1; 1 0) but (0 1; 1 0)(1 1; 0 1) = (0 1; 1 1).
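The 2×2 computation in the last bullet can be verified with a few lines of Python, using plain lists of rows (no external libraries):

```python
# Verifying that the two invertible 2x2 matrices above do not commute.
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 1], [0, 1]]
B = [[0, 1], [1, 0]]
assert matmul(A, B) == [[1, 1], [1, 0]]
assert matmul(B, A) == [[0, 1], [1, 1]]
assert matmul(A, B) != matmul(B, A)   # so the group is non-abelian
```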

Notation:

Often, especially when we’re dealing with abstract properties of general groups, we’ll simplify the notation by writing xy instead of xy, as though we’re multiplying. In this case we’ll say, for example, “Let G be a multiplicatively-written group”.

Note that this is purely a matter of the notation we use for the group operation: any group can be “multiplicatively-written” if we choose to use that notation, so if we prove anything about a general multiplicatively-written group, it will apply to all groups, no matter what the group operation is.

Of course, if we’re looking at something like the group of real numbers under addition, it would be incredibly confusing to write this multiplicatively, so in cases like that, where multiplication already has some other meaning, or where there’s already another standard notation for the group operation, we’ll tend not to use multiplicative notation.

Notice that in a multiplicatively-written group, the associativity axiom says (xy)z=x(yz), and from this it follows easily (by induction, for example) that any product such as wxyz has a unique meaning: we could bracket it as (wx)(yz) or (w(xy))z or w((xy)z) or any of the other possible ways, and because of associativity they would all give the same element.

Remark:
Because examples such as (R,+) are abelian, which is not typical for a group, it can be misleading to try to get your intuition for how groups work from examples like this. A much better simple example to use is the group of permutations of {1,2,3} or, if you like thinking more geometrically, the group of symmetries of an equilateral triangle (actually, since all vertices of a triangle are adjacent, this is really just the same as the group of permutations of the set of vertices), or of a square.

8.3 Elementary consequences of the definition

Throughout this section, G will be a multiplicatively written group. Remember that being “multiplicatively written” is purely a matter of notation, so everything will apply to any group.

A familiar idea from elementary algebra is that if you have some equation (say involving a,b,x∈R) such as ax=bx then, as long as x≠0, you can “cancel” the x to deduce that a=b. Formally, we multiply both sides by x⁻¹ (which exists by axiom (A11)).

A similar principle applies in a group, because of the fact that elements all have inverses, with one complication caused by the fact that the group operation may not be commutative.

Since multiplying on the left and multiplying on the right are in general different, we need to decide which is appropriate.

Proposition 8.7: (Right cancellation)

Let a,b,x be elements of a multiplicatively written group G. If ax=bx, then a=b.

Proof.

Multiply on the right by x^(-1): ax = bx ⟹ (ax)x^(-1) = (bx)x^(-1) ⟹ a(xx^(-1)) = b(xx^(-1)) ⟹ ae = be ⟹ a = b.

Proposition 8.8: (Left cancellation)

Let a,b,x be elements of a multiplicatively written group G. If xa=xb, then a=b.

Proof.

Multiply on the left by x^(-1): xa = xb ⟹ x^(-1)(xa) = x^(-1)(xb) ⟹ (x^(-1)x)a = (x^(-1)x)b ⟹ ea = eb ⟹ a = b.

Remark:
Warning: If ax = xb, then in a non-abelian group it is not necessarily true that a = b, since to "cancel" x from both sides of the equation we need to multiply the left hand side by x^(-1) on the right, but multiply the right hand side by x^(-1) on the left, and these are different operations.

This simple principle has some nice consequences that make studying groups easier. One is that the defining property of the identity element e is enough to identify it: no other element has the same property.

Proposition 8.9: (Uniqueness of the identity)

Let a,x be elements of a multiplicatively written group. If ax=a then x=e. Similarly, if xa=a then x=e.

Proof.

If ax=a then ax=ae. By “left cancellation”, we can cancel a to deduce x=e. Similarly, if xa=a then xa=ea, so by “right cancellation” we can cancel a to deduce x=e.

A similar proof shows that an element of a group can only have one inverse.

Proposition 8.10: (Uniqueness of inverses)

Let x,y be elements of a multiplicatively written group. If xy = e then x = y^(-1) and y = x^(-1).

Proof.

If xy = e then xy = xx^(-1), and so, by left cancellation, y = x^(-1). Similarly, if xy = e then xy = y^(-1)y, and so, by right cancellation, x = y^(-1).

This means that, to prove that one element x of a group is the inverse of another element y, we just need to check that their product (either way round: xy or yx) is equal to the identity. Here are some examples of useful facts that we can prove like this:

Proposition 8.11:

Let x be an element of a multiplicatively written group. Then the inverse of x^(-1) is x: (x^(-1))^(-1) = x.

Proof.

By uniqueness of inverses, we just need to check that x x^(-1) = e, which is true.

Proposition 8.12:

Let x,y be elements of a multiplicatively written group. Then the inverse of xy is (xy)^(-1) = y^(-1)x^(-1).

Proof.

By uniqueness of inverses, we just need to check that (xy)(y^(-1)x^(-1)) = e. But (xy)(y^(-1)x^(-1)) = ((xy)y^(-1))x^(-1) = (x(yy^(-1)))x^(-1) = (xe)x^(-1) = xx^(-1) = e.

Make sure you understand how each step of the previous proof follows from the definition of a group, and in particular how we have used the associative property. The notes will be less explicit about this in future proofs, leaving out brackets.

Remark:
Warning: Note that in a non-abelian group it is not in general true that (xy)^(-1) = x^(-1)y^(-1), since x^(-1)y^(-1) ≠ y^(-1)x^(-1) in general.
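These facts are easy to check computationally in a concrete non-abelian group. The following sketch (our own illustration, not part of the formal development) uses permutations of {1,2,3}, represented as tuples with p[i-1] = p(i):

```python
def compose(p, q):
    # (pq)(i) = p(q(i)): apply q first, then p
    return tuple(p[q[i] - 1] for i in range(len(q)))

def inverse(p):
    # build the inverse map: if p sends i to p(i), the inverse sends p(i) to i
    inv = [0] * len(p)
    for i, pi in enumerate(p, start=1):
        inv[pi - 1] = i
    return tuple(inv)

x = (2, 3, 1)  # a 3-cycle: 1 -> 2 -> 3 -> 1
y = (2, 1, 3)  # a transposition swapping 1 and 2

# Proposition 8.12: (xy)^(-1) = y^(-1) x^(-1)
assert inverse(compose(x, y)) == compose(inverse(y), inverse(x))
# ...but, as the warning says, (xy)^(-1) != x^(-1) y^(-1) here:
assert inverse(compose(x, y)) != compose(inverse(x), inverse(y))
```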
Remark:
Compare the above proposition (and the remark) with Theorem 6.24 about the inverse of the composition of two bijective functions, and with the similar fact for inverting the product of two matrices.
Notation:

Next, some notation. If x ∈ G, we write x^2 for the product xx of x with itself, x^3 for the product x(x^2) (which is the same as (x^2)x by associativity), and so on.

Note that for n = -1 we also have a meaning for x^n, since x^(-1) is the notation we use for the inverse of x.

We extend this even further by defining x^(-n) to be (x^n)^(-1) if n > 0 (which gives us a meaning for x^n for any non-zero integer n, positive or negative) and defining x^0 to be the identity element e (so that we now have a meaning for x^n for every n ∈ Z).

We call x^n the nth power of x.

Remark:
If G = (R\{0},×), then this meaning of x^n is the same as the meaning we’re used to.

To justify why this is a sensible notation, let us see what happens when we multiply powers. First:

Lemma 8.13:

If n > 0 then x^(-n) = (x^(-1))^n.

Proof.

By definition, x^(-n) = (x^n)^(-1). To prove this is the same as (x^(-1))^n we just have to show (x^(-1))^n x^n = e, by uniqueness of inverses. But (x^(-1))^n x^n = x^(-1)⋯x^(-1)x^(-1)x x⋯x = x^(-1)⋯x^(-1) e x⋯x = x^(-1)⋯x^(-1)x⋯x = ⋯ = e, cancelling each x^(-1) with an x.

Proposition 8.14:

If x is an element of a multiplicatively written group G, and m and n are integers, then (x^m)(x^n) = x^(m+n).

Proof.

Let us fix n ∈ Z and prove the result for all m ∈ Z.

We first prove this by induction when m ≥ 0. It is true when m = 0, since (x^0)(x^n) = e(x^n) = x^n. Suppose it is true for m = k-1 [that is, (x^(k-1))(x^n) = x^(k-1+n)]. We show it is true for m = k. We have (x^k)(x^n) = (x(x^(k-1)))(x^n) = x((x^(k-1))(x^n)) = x(x^(k+n-1)) = x^(k+n). So by induction it is true for all m ≥ 0.

If m < 0, and y = x^(-1), then by the lemma (x^m)(x^n) = (y^(-m))(y^(-n)), which is equal to y^(-m-n) = y^(-(m+n)) = x^(m+n) by applying what we’ve already proved with y in place of x (note -m > 0).
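As an informal check of this power law, we can compute x^m x^n and x^(m+n) in the group of permutations of {1,2,3} for a range of integers m and n (the tuple representation and helper names here are our own):

```python
def compose(p, q):
    # (pq)(i) = p(q(i)): apply q first, then p
    return tuple(p[q[i] - 1] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p, start=1):
        inv[pi - 1] = i
    return tuple(inv)

def power(x, n):
    # x^n for any integer n, with x^0 = e and x^(-n) = (x^n)^(-1)
    e = tuple(range(1, len(x) + 1))
    if n < 0:
        return inverse(power(x, -n))
    result = e
    for _ in range(n):
        result = compose(result, x)
    return result

x = (2, 3, 1)  # a 3-cycle in S_3
for m in range(-5, 6):
    for n in range(-5, 6):
        assert compose(power(x, m), power(x, n)) == power(x, m + n)
```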

We’ve already proved the formula (xy)^(-1) = y^(-1)x^(-1). What about (xy)^n for other values of n? In a non-abelian group, there is no simple formula. In particular:

Remark:
Warning: If x,y are elements of a non-abelian group, then in general (xy)^n ≠ x^n y^n. The point is that (for n > 0, say) (xy)^n = xyxy⋯xy, and unless the group is abelian we can not rearrange the terms to get xx⋯x yy⋯y = x^n y^n.

8.4 Dihedral Groups

As we study group theory, it will be useful to have a supply of examples of groups to think about. Of course, no single example will be ideal for everything, but some are more helpful than others. Some of the more "arithmetic" groups, such as (Z,+) and (R\{0},×), have the advantage of being very familiar, but the disadvantage of being rather untypical because they’re abelian. If you have a "standard example" that you like to think about, then it will be much less misleading if it’s non-abelian.

A good first family of examples are the “dihedral groups”, which are the symmetry groups of regular polygons, and are non-abelian, but still fairly uncomplicated. In the next section we will look at a second important family of examples, the “symmetric groups”.

Let X be a regular n-sided polygon, with vertices labelled anticlockwise 1,2,,n. Recall that by a symmetry of X we mean a permutation of the vertices that takes adjacent vertices to adjacent vertices.

For example, we can send vertex 1 to any other vertex by an appropriate rotation. If f is a symmetry sending vertex 1 to vertex i, then, since it preserves adjacency, it must send vertex 2 to one of the two vertices adjacent to vertex i, and once we know f(1) and f(2), then f(3) is determined as the other vertex adjacent to f(2), and so on around the polygon for f(4),,f(n).

So the total number of symmetries is 2n (since there are n choices for f(1) and for each of these there are two choices for f(2)). This explains the following choice of notation.

Definition 8.15:
The dihedral group D2n of order 2n is the group of symmetries of a regular n-sided polygon.
Remark:
Some books use the symbol Dn where we use D2n (i.e., they label the group with the size of the polygon rather than the size of the group), although at least the D is fairly standard.

Let’s fix some notation for two particular symmetries.

Notation:

We set a ∈ D2n to be the rotation anticlockwise through an angle of 2π/n. We set b ∈ D2n to be the reflection in the line through vertex 1 and the centre of the polygon.

So a(1) = 2, a(2) = 3, …, a(n-1) = n, a(n) = 1, and b(1) = 1, b(2) = n, b(3) = n-1, …, b(n-1) = 3, b(n) = 2.


Figure 8.12: The symmetries a and b on D8


Figure 8.13: The symmetries a and b on D10

Now consider the symmetries a^i and a^i b for 0 ≤ i < n. These both send vertex 1 to vertex 1+i, but a^i sends vertices 2, 3, …, n to the vertices following anticlockwise around the polygon, whereas a^i b sends them to the vertices following clockwise around the polygon. So all of these symmetries are different, and every symmetry is of this form, and so every element of D2n can be written in terms of a and b.

Proposition 8.16:

We have D2n = {e, a, a^2, …, a^(n-1), b, ab, a^2b, …, a^(n-1)b}.

Let G be a group with a finite number of elements: its multiplication table (often also called the Cayley table) has one row and one column for each element of the group, and the entry in row x and column y is the product xy. So the table displays the group operation.

Example:

The Cayley table for D6 = {e, a, a^2, b, ab, a^2b} is as follows:

      e     a      a^2      b       ab      a^2b
e     e     a      a^2      b       ab      a^2b
a     a     a^2    a^3      ab      a^2b    a^3b
a^2   a^2   a^3    a^4      a^2b    a^3b    a^4b
b     b     ba     ba^2     b^2     bab     ba^2b
ab    ab    aba    aba^2    ab^2    abab    aba^2b
a^2b  a^2b  a^2ba  a^2ba^2  a^2b^2  a^2bab  a^2ba^2b
History:

Arthur Cayley was an English mathematician (1821-1895) who gave the first abstract definition of a finite group in 1854. It is around then that he showed that a group is determined by its multiplication table (i.e., its Cayley table), and gave several examples of groups that were not symmetric groups. However, at the time the mathematical community did not pursue this further, preferring to stick to studying the concrete symmetric groups. It wasn’t until 1878, when he re-introduced his definition, that other mathematicians refined it and an agreed formal definition of a group emerged.

As we can see in the Cayley table above, expressions such as ba or a^(-1) appear, but they are not in our list of elements in standard form. So, we must work out how to rewrite elements such as ba or a^(-1) in standard form.

There are three basic rules that let us easily do calculations.

Proposition 8.17:

We have a^n = e.

This is clearly true, as it just says that if we repeat a rotation through an angle of 2π/n a total of n times, then this is the same as the identity.

Note that this means that a(a^(n-1)) = e, and so by uniqueness of inverses a^(-1) = a^(n-1), which allows us to write any power (positive or negative) of a as one of e, a, …, a^(n-1).

Proposition 8.18:

We have b^2 = e.

This is clearly true, as repeating a reflection twice gives the identity.

This implies, by uniqueness of inverses again, that b^(-1) = b, and so any power of b is one of e or b.

What about ba? Well, a(1) = 2 and b(2) = n, so ba(1) = n; similarly ba(2) = n-1, and so the other vertices follow clockwise. This is the same as a^(n-1)b, or a^(-1)b.

Proposition 8.19:

We have ba = a^(n-1)b = a^(-1)b.

We’ll see that these three rules allow us to simplify any expression.

For example, ba^(-1) = ba^(n-1) = ba·a^(n-2) = a^(-1)b·a^(n-2) = a^(-1)ba·a^(n-3) = a^(-1)a^(-1)b·a^(n-3) = ⋯ = a^(-n+1)b = ab, where we repeatedly "swap" a b with an a or a^(-1), changing the a into a^(-1) or the a^(-1) into a.

We get the following rules for multiplying expressions in standard form.

Theorem 8.20:

Fix n ∈ Z. In D2n, for 0 ≤ i, j < n,

a^i a^j = a^(i+j) if i+j < n, and a^(i+j-n) if i+j ≥ n;
a^i (a^j b) = a^(i+j) b if i+j < n, and a^(i+j-n) b if i+j ≥ n;
(a^i b) a^j = a^(i-j) b if i-j ≥ 0, and a^(i-j+n) b if i-j < 0;
(a^i b)(a^j b) = a^(i-j) if i-j ≥ 0, and a^(i-j+n) if i-j < 0.

You are encouraged to practice some calculations with the dihedral group, especially with n=3 and n=4, as we’ll frequently be using these as examples.
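One way to practise (and sanity-check) these rules is to implement them and compare against the actual vertex maps. In this sketch (the encodings are our own), a^i b^r is stored as a pair (i, r), and, matching the convention ba(1) = b(a(1)) used above, a product xy acts by applying y first:

```python
from itertools import product

def mult(x, y, n):
    # product (a^i b^r)(a^j b^s) in D2n via the rules above, in standard form
    (i, r), (j, s) = x, y
    return ((i + j) % n if r == 0 else (i - j) % n, (r + s) % 2)

def as_perm(i, r, n):
    # the symmetry a^i b^r as a vertex map: apply b first, then rotate i times
    def a(v):
        return v % n + 1
    def b(v):
        return 1 if v == 1 else n + 2 - v
    def image(v):
        if r:
            v = b(v)
        for _ in range(i):
            v = a(v)
        return v
    return tuple(image(v) for v in range(1, n + 1))

for n in (3, 4, 5, 6):
    elems = [(i, r) for i in range(n) for r in (0, 1)]
    for x, y in product(elems, repeat=2):
        # the product xy acts by applying y first, then x
        px, py = as_perm(*x, n), as_perm(*y, n)
        composed = tuple(px[py[v] - 1] for v in range(n))
        assert as_perm(*mult(x, y, n), n) == composed
```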

Example:

We rewrite the Cayley table for D6 = {e, a, a^2, b, ab, a^2b} in standard form (check the calculations yourself):

      e     a     a^2   b     ab    a^2b
e     e     a     a^2   b     ab    a^2b
a     a     a^2   e     ab    a^2b  b
a^2   a^2   e     a     a^2b  b     ab
b     b     a^2b  ab    e     a^2   a
ab    ab    b     a^2b  a     e     a^2
a^2b  a^2b  ab    b     a^2   a     e
Remark:

The cancellation properties which we previously found have a nice interpretation in terms of Cayley tables:

The left cancellation proposition just says that all the entries in each row are different: if two entries xy and xz in row x are the same, then xy=xz and so y=z. Similarly the right cancellation property says that all the entries in each column are different. This can be a useful method for deducing information about a group from a partial multiplication table.

Observe that these hold with the Cayley table above.
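This Latin-square property can also be verified mechanically. Here is a small sketch (encoding of our choosing) that rebuilds the Cayley table of D6 from the multiplication rules and checks that every row and every column is a rearrangement of the group:

```python
n = 3
elems = [(i, r) for r in (0, 1) for i in range(n)]  # a^i b^r encoded as (i, r)

def mult(x, y):
    # multiplication in D6 (Theorem 8.20 with n = 3)
    (i, r), (j, s) = x, y
    return ((i + j) % n if r == 0 else (i - j) % n, (r + s) % 2)

table = {x: {y: mult(x, y) for y in elems} for x in elems}

for x in elems:
    row = [table[x][y] for y in elems]   # entries of row x
    col = [table[y][x] for y in elems]   # entries of column x
    assert sorted(row) == sorted(elems)  # left cancellation: all row entries differ
    assert sorted(col) == sorted(elems)  # right cancellation: all column entries differ
```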
Remark:
When n=3, all vertices of a regular n-sided polygon (i.e., an equilateral triangle) are adjacent to one another, so D6 contains all permutations of {1,2,3}. But this doesn’t apply for larger n.

8.5 Symmetric Groups and Cycles

We have already met symmetric groups, but not with that name. A symmetric group is just the group of all permutations of a set. We’ll only really consider finite sets, although the definition makes sense for any set.

Definition 8.21:
  • Let X be a set. The symmetric group on X is the group S(X) of all permutations of X (i.e., bijective functions f: X → X), with composition as the group operation.

  • The symmetric group Sn of degree n is the group of all permutations of the set {1,2,,n} of n elements.

Let’s think of ways of describing permutations. Of course, we could just say what f(i) is for each value of i, and so, for example, refer to the element f of S6 with f(1)=3, f(2)=6, f(3)=1, f(4)=4, f(5)=2 and f(6)=5.

This is quite hard to read, and one easy way to set out the information in a more easily readable form is to use a “double row” notation with 1,2,,n along the top row, and f(1),f(2),,f(n) along the bottom row.

For example, if f is the element of S6 described above, then we could write

f = ( 1 2 3 4 5 6
      3 6 1 4 2 5 ).

But there’s an even more compact method, that is also much more convenient for understanding group theoretic aspects of permutations.

Definition 8.22:

Let k and n be positive integers with k ≤ n. A k-cycle f in Sn is a permutation of the following form. There are k distinct elements i_1, i_2, …, i_k ∈ {1, 2, …, n} such that f(i_1) = i_2, f(i_2) = i_3, …, f(i_(k-1)) = i_k, f(i_k) = i_1, and f(i) = i for i ∉ {i_1, i_2, …, i_k}. (So f "cycles" the elements i_1, i_2, …, i_k and leaves the other elements unmoved.)

Such an f is denoted by f = (i_1, i_2, …, i_k).

We call k the length of this cycle.
Example:

In S8, the 6-cycle g = (2,7,8,5,6,3) has g(2) = 7, g(7) = 8, g(8) = 5, g(5) = 6, g(6) = 3, g(3) = 2, g(1) = 1 and g(4) = 4, or, written in double row notation,

g = ( 1 2 3 4 5 6 7 8
      1 7 2 4 6 3 8 5 ).

The notation (2,7,8,5,6,3) would also denote an element of S9 with g(9) = 9, so this notation doesn’t specify which symmetric group we’re looking at, although that is rarely a problem.

Note that the 6-cycle (7,8,5,6,3,2) is exactly the same permutation as g, so the same cycle can be written in different ways (in fact, a k-cycle can be written in k different ways, as we can choose to start the cycle with any of the k elements i_1, i_2, …, i_k).
Remark:
We’ll allow k = 1, but every 1-cycle (i_1) is just the identity permutation, since it sends i_1 to i_1 and every i ≠ i_1 to i, so we’ll rarely bother to write down 1-cycles.
Definition 8.23:
A transposition is another name for a 2-cycle. So this is just a permutation that swaps two elements and leaves all other elements unmoved.
Etymology:

Transposition comes from the Latin trans meaning "across, beyond" and positus meaning "to put". A transposition takes two elements and puts them across each other, i.e. swaps their positions.

Of course, not every permutation is a cycle. For example, the permutation

f = ( 1 2 3 4 5 6
      3 6 1 4 2 5 )

that we used as an example earlier is not. However, we can write it as a composition f = (1,3)(2,6,5) of two cycles. These two cycles are "disjoint" in the sense that they move the disjoint sets of elements {1,3} and {2,5,6}, and this means that it makes no difference which of the two cycles we apply first. So also f = (2,6,5)(1,3).

Definition 8.24:

A set of cycles in Sn is disjoint if no element of {1,2,,n} is moved by more than one of the cycles.

Theorem 8.25:

Every element of Sn may be written as a product of disjoint cycles.

Proof.

We give a constructive proof, that is we give a method which will write any fSn as such a product.

We’ll write down a set of disjoint cycles by considering repeatedly applying f to elements of {1,2,,n}, starting with the smallest elements.

So we start with 1, and consider f(1), f^2(1), … until we reach an element f^k(1) that we have already reached. The first time this happens must be with f^k(1) = 1, since if f^k(1) = f^l(1) for 0 < l < k then f^(k-1)(1) = f^(l-1)(1), and so we’d have repeated the element f^(l-1)(1) earlier. So we have a cycle (1, f(1), f^2(1), …, f^(k-1)(1)).

Now we start again with the smallest number i that is not involved in any of the cycles we’ve already written down, and get another cycle (i, f(i), …, f^s(i)) for some s.

We keep repeating until we’ve dealt with all the elements of {1,2,,n}.

This will probably be clearer if we look at an example.

Example:

Consider the element

f = ( 1 2 3 4 5 6 7 8 9
      2 5 1 9 7 6 3 4 8 )

of S9 written in double row notation. To write this as a product of disjoint cycles, we start with 1, and calculate f(1) = 2, f(2) = 5, f(5) = 7, f(7) = 3, f(3) = 1, so we have a 5-cycle (1,2,5,7,3).

The smallest number we haven’t yet dealt with is 4, and f(4)=9,f(9)=8,f(8)=4, so we have a 3-cycle (4,9,8).

The only number we haven’t dealt with is 6, and f(6)=6, so finally we have a 1-cycle (6).

So f=(1,2,5,7,3)(4,9,8)(6) as a product of disjoint cycles. Since 1-cycles are just the identity permutation, we can (and usually will) leave them out, and so we can write f=(1,2,5,7,3)(4,9,8).

Since the order in which we apply disjoint permutations doesn’t matter, we could write the cycles in a different order, or we could start each cycle at a different point. So, for example, f=(9,8,4)(5,7,3,1,2) is another way of writing the same permutation as a product of disjoint cycles.
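The constructive method from the proof of Theorem 8.25 translates directly into code. In this sketch (representation of our choosing), a permutation is a dict {i: f(i)}:

```python
def disjoint_cycles(f):
    # follow the method in the proof of Theorem 8.25: repeatedly start from
    # the smallest element not yet used, and trace its cycle under f
    seen, cycles = set(), []
    for start in sorted(f):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = f[i]
        if len(cycle) > 1:          # omit 1-cycles, as in the notes
            cycles.append(tuple(cycle))
    return cycles

# the element of S_9 from the example, read off its double row form
f = dict(zip(range(1, 10), [2, 5, 1, 9, 7, 6, 3, 4, 8]))
print(disjoint_cycles(f))  # [(1, 2, 5, 7, 3), (4, 9, 8)]
```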

It’s very easy to read off the product of permutations written in disjoint cycle notation, just by using the method in the proof of Theorem 8.25.

Example:

Let f=(1,5,3,4)(2,6,7) and g=(3,4,8,1,2,5) be elements of S8 written in disjoint cycle notation. Let’s calculate fg and gf. fg=(1,5,3,4)(2,6,7)(3,4,8,1,2,5), but of course these are not disjoint cycles. So we start with 1 and calculate where it is sent by performing the permutation fg repeatedly:

  • 1 is sent to 2 by (3,4,8,1,2,5), which is sent to 6 by (2,6,7), which is not moved by (1,5,3,4). So fg(1)=6.
  • 6 is not moved by (3,4,8,1,2,5). It is sent to 7 by (2,6,7), which is not moved by (1,5,3,4). So fg(6)=7.
  • 7 is not moved by (3,4,8,1,2,5). It is sent to 2 by (2,6,7), which is not moved by (1,5,3,4). So fg(7)=2.
  • 2 is sent to 5 by (3,4,8,1,2,5), which is not moved by (2,6,7) and is sent to 3 by (1,5,3,4). So fg(2)=3.
  • 3 is sent to 4 by (3,4,8,1,2,5), which is not moved by (2,6,7), and sent to 1 by (1,5,3,4). So fg(3)=1.
  • So we’ve completed our first cycle (1,6,7,2,3).
  • 4 is sent to 8 by (3,4,8,1,2,5), which is not moved by (2,6,7) or (1,5,3,4). So fg(4)=8.
  • 8 is sent to 1 by (3,4,8,1,2,5), which is not moved by (2,6,7) and sent to 5 by (1,5,3,4). So fg(8)=5.
  • 5 is sent to 3 by (3,4,8,1,2,5), which is not moved by (2,6,7) and sent to 4 by (1,5,3,4). So fg(5)=4.
  • So we’ve completed another cycle (4,8,5).
  • We’ve now dealt with all the numbers from 1 to 8, so fg=(1,6,7,2,3)(4,8,5) as a product of disjoint cycles.
Similarly, gf=(3,4,8,1,2,5)(1,5,3,4)(2,6,7)=(1,3,8)(2,6,7,5,4) as a product of disjoint cycles.
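The same bookkeeping can be automated. This sketch (helper names our own) multiplies cycles right-to-left, as in the worked example, and then re-extracts the disjoint cycles:

```python
def cycle_map(cycle, n):
    # the cycle as an element of S_n: each entry is sent to the next one
    m = {i: i for i in range(1, n + 1)}
    for k, i in enumerate(cycle):
        m[i] = cycle[(k + 1) % len(cycle)]
    return m

def compose(cycles, n):
    # product of a list of cycles, applying the rightmost cycle first
    maps = [cycle_map(c, n) for c in cycles]
    def apply(i):
        for m in reversed(maps):
            i = m[i]
        return i
    return {i: apply(i) for i in range(1, n + 1)}

def disjoint_cycles(f):
    # the method of Theorem 8.25, omitting 1-cycles
    seen, cycles = set(), []
    for start in sorted(f):
        if start not in seen:
            cycle, i = [], start
            while i not in seen:
                seen.add(i)
                cycle.append(i)
                i = f[i]
            if len(cycle) > 1:
                cycles.append(tuple(cycle))
    return cycles

f = [(1, 5, 3, 4), (2, 6, 7)]
g = [(3, 4, 8, 1, 2, 5)]
print(disjoint_cycles(compose(f + g, 8)))  # [(1, 6, 7, 2, 3), (4, 8, 5)]
print(disjoint_cycles(compose(g + f, 8)))  # [(1, 3, 8), (2, 6, 7, 5, 4)]
```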
Example:
It is easy to write down the inverse of a permutation written as a product of disjoint cycles. Just write the permutation backwards. For example, if f=(1,4,3,5,7)(2,6,8) then f1=(8,6,2)(7,5,3,4,1). Of course, we have a choice of which order to take the cycles, and where to start each cycle, and if we carried out the method in the proof of Theorem 8.25 then we’d get the alternative representation f1=(1,7,5,3,4)(2,8,6).
History:

Symmetric groups were among the first types of groups to emerge, as mathematicians studied the permutations of roots of equations in the 1770s. This was before the formal abstract definition of a group was introduced, around 100 years later in the 1870s.

8.6 Order of a group and of elements

With those two main examples in mind, we now explore some more properties of groups.

Definition 8.26:

The order of a group G, denoted |G|, is the cardinality of the set G.

Similarly, we say a group G is finite if the set G is finite, and infinite otherwise.
Example:
  • We have seen that |D2n|=2n (justifying our notation).

  • The group (Z,+) is infinite.

  • We have |({1,-1},×)| = 2.

Proposition 8.27:

We have |Sn|=n!.

Proof.

We’ll count the permutations of {1, 2, …, n}. Let f ∈ Sn. Since f(1) can be any of the elements 1, 2, …, n, there are n possibilities for f(1). For each of these, there are n-1 possibilities for f(2), since it can be any of the elements 1, 2, …, n except for f(1). Given f(1), f(2), there are n-2 possibilities for f(3), since it can be any of the elements 1, 2, …, n except for f(1) and f(2). And so on. So in total there are n(n-1)(n-2)⋯2·1 = n! permutations of {1, 2, …, n}.
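For small n, this count can be confirmed by brute force, e.g. with Python’s standard library:

```python
from itertools import permutations
from math import factorial

# enumerate all permutations of {1, ..., n} directly and compare with n!
for n in range(1, 7):
    assert len(list(permutations(range(1, n + 1)))) == factorial(n)
```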

Definition 8.28:

Let x be an element of a multiplicatively-written group G. Then:

  • If x^n = e for some n ∈ Z+, then the order of x, denoted ord(x) (or ordG(x) if it may be unclear which group G we’re referring to), is ord(x) = min{n ∈ Z+ : x^n = e}, i.e. the least positive integer n such that x^n = e. [Note this is well defined by the Well Ordering Principle.]

  • If there is no n ∈ Z+ such that x^n = e, then we say that x has infinite order and write ord(x) = ∞ (or ordG(x) = ∞).

Remark:
If you are asked to prove that ord(x) = n, then as well as proving that x^n = e, remember that you also have to prove that n is the least positive integer for which this is true: in other words, prove that if 0 < m < n then x^m ≠ e.
Remark:
We used the same word for the order of a group and the order of an element. Although the two meanings are different, we’ll see later that there is a very close relationship between them.
Example:
In any group G with identity element e, ord(e)=1 and x=e is the only element with ord(x)=1.
Example:
In the dihedral group D2n, with the same notation as before, ord(a) = n and ord(b) = 2.
Example:
In the group (R\{0},×) we have ord(1) = 1 and ord(-1) = 2 (since (-1)^2 = 1 but (-1)^1 ≠ 1), and for any other x, ord(x) = ∞ (since either |x| < 1, in which case |x^n| < 1 for all integers n > 0 and so x^n ≠ 1, or |x| > 1, in which case |x^n| > 1 for all integers n > 0 and so x^n ≠ 1).
Example:
In the group (Z,+), ord(0) = 1, but ord(s) = ∞ for any other s ∈ Z. Note that when the group operation is addition and the identity element is 0, as here, for an element s with finite order n we would require s + s + ⋯ + s (n times) to be 0.
Proposition 8.29:

The order of a k-cycle in the symmetric group Sn is k.

Proof.

It is clear that if we repeat a k-cycle (i_1, i_2, …, i_k) k times, each element i_j is sent back to itself (and if we repeat it l < k times, then i_1 is sent to i_(l+1) ≠ i_1).

In fact, one benefit of disjoint cycle notation in Sn is that it makes it easy to calculate the order of a permutation.

Theorem 8.30:

If f is the product of disjoint cycles of lengths k_1, k_2, …, k_r, then ord(f) = lcm(k_1, k_2, …, k_r).

Proof.

Consider when f^k is the identity permutation. For f^k(j) = j when j is one of the numbers involved in the k_i-cycle, we need k_i to divide k. But if k_i divides k for all i, then this applies to all the numbers moved by f. So the smallest such k is the lowest common multiple of k_1, …, k_r.

Example:
The order of the permutation (1,6,7,2,3)(4,8,5) from a previous example is lcm(5,3)=15. Notice that if we write this permutation as (1,5,3,4)(2,6,7)(3,8,1,2,5)(4,3), using cycles that are not disjoint, it is much less clear what the order is, and it is definitely not the lowest common multiple of the cycle lengths.
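Theorem 8.30 is easy to confirm by direct computation. This sketch (representation our own) finds the order of (1,6,7,2,3)(4,8,5) by composing it with itself until it reaches the identity:

```python
from math import lcm

def order(f):
    # order of a permutation f, given as a dict {i: f(i)}:
    # compose f with itself until the identity is reached
    g, k = dict(f), 1
    while any(g[i] != i for i in g):
        g = {i: f[g[i]] for i in g}
        k += 1
    return k

# the permutation (1,6,7,2,3)(4,8,5) of {1,...,8}
f = {1: 6, 6: 7, 7: 2, 2: 3, 3: 1, 4: 8, 8: 5, 5: 4}
assert order(f) == lcm(5, 3) == 15
```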

We’ll now see what we can say about the powers of an element of a group if we know its order, starting with elements of infinite order.

Proposition 8.31:

Let x be an element of a group G with ord(x) = ∞. Then the powers x^i of x are all distinct: i.e., x^i ≠ x^j whenever i ≠ j are integers.

Proof.

Suppose x^i = x^j. Without loss of generality we’ll assume i ≥ j. Then for n = i-j ≥ 0, x^n = x^(i-j) = x^i (x^j)^(-1) = (x^i)(x^i)^(-1) = e. But since ord(x) = ∞ there is no positive integer n with x^n = e, so we must have n = 0 and so i = j.

Corollary 8.32:

If x is an element of a finite group G, then ord(x) < ∞.

Proof.

If ord(x) = ∞ then the previous proposition tells us that the elements x^i (for i ∈ Z) are all different, and so G must have infinitely many elements.

In fact, we’ll see later that the order |G| of a finite group severely restricts the possible (finite) orders of its elements. Next we consider elements of finite order.

Proposition 8.33:

Let x be an element of a group G with ord(x) = n < ∞.

  1. For an integer i, x^i = e if and only if i is divisible by n.

  2. For integers i, j, x^i = x^j if and only if i-j is divisible by n.

  3. x^(-1) = x^(n-1).

  4. The distinct powers of x are e, x, x^2, …, x^(n-1).

Proof.

  1. Firstly, if i is divisible by n, so that i = nk for some integer k, then x^i = x^(nk) = (x^n)^k = e^k = e, since x^n = e.

Conversely, suppose that x^i = e. We can write i = nq + r with q, r ∈ Z and 0 ≤ r < n (by Theorem 5.2). Then e = x^i = x^(nq+r) = (x^n)^q x^r = e^q x^r = x^r. But n is the least positive integer with x^n = e, and x^r = e with 0 ≤ r < n, so r can’t be positive, and we must have r = 0. So i is divisible by n.

  2. x^i = x^j if and only if x^(i-j) = x^i (x^j)^(-1) = e, which by 1. is the case if and only if i-j is divisible by n.

  3. Take i = n-1, j = -1. Then i-j = n is divisible by n, and so x^(n-1) = x^(-1) by 2.

  4. For any integer i, write i = nq + r for integers q, r with 0 ≤ r < n. Then i-r is divisible by n, and so x^i = x^r by 2. So every power of x is equal to one of e = x^0, x = x^1, x^2, …, x^(n-1). Conversely, if i, j ∈ {0, 1, …, n-1} and i-j is divisible by n, then i = j, and so by 2. the elements e, x, …, x^(n-1) are all different.

If we know the order of an element of a group, then we can work out the order of any power of that element.

Proposition 8.34:

Let x be an element of a group G, and i an integer.

  • If ord(x) = ∞ and i ≠ 0, then ord(x^i) = ∞. (If i = 0, then x^i = e, and so ord(x^i) = 1.)

  • If ord(x) = n < ∞, then ord(x^i) = n/gcd(n,i).

Proof.

  • Suppose i > 0. If ord(x^i) = m < ∞, then x^(im) = (x^i)^m = e with im a positive integer, contradicting ord(x) = ∞. Similarly, if i < 0 then x^(-im) = ((x^i)^m)^(-1) = e with -im a positive integer, again giving a contradiction. So in either case ord(x^i) = ∞.

  • Since gcd(n,i) divides i, n divides ni/gcd(n,i), and so (x^i)^(n/gcd(n,i)) = x^(ni/gcd(n,i)) = e.

If 0 < m and (x^i)^m = x^(im) = e, then n divides im, so n/gcd(n,i) divides m·(i/gcd(n,i)); since n/gcd(n,i) and i/gcd(n,i) are coprime, n/gcd(n,i) divides m, and in particular n/gcd(n,i) ≤ m. So n/gcd(n,i) is the smallest positive exponent d such that (x^i)^d = e, and is therefore the order of x^i.

Example:
In the dihedral group D12, ord(a) = 6. So the Proposition gives ord(a^4) = 6/gcd(6,4) = 3, since gcd(6,4) = 2.
Example:
Taking i = -1, the Proposition gives ord(x^(-1)) = ord(x), since gcd(n,-1) = 1 for any n. [But this case is very easy to check directly, since (x^(-1))^d = (x^d)^(-1), and so x^d = e if and only if (x^(-1))^d = e.]
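The formula ord(x^i) = n/gcd(n,i) can be tested in a concrete cyclic group. This sketch (an illustration of ours, assuming familiarity with addition modulo n) works in Z_12 under addition mod 12, where x = 1 has order 12 and x^i corresponds to the element i:

```python
from math import gcd

n = 12  # the cyclic group Z_12 under addition mod 12; the element 1 has order 12
for i in range(1, n + 1):
    # ord(x^i) found directly: the least m > 0 with m*i = 0 (mod n)
    m = 1
    while (m * i) % n != 0:
        m += 1
    assert m == n // gcd(n, i)
```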

9 Linking groups together

We linked sets to each other by considering subsets, functions between sets, and Cartesian products of sets. In a related way, in this section we look at appropriate ideas of “subgroups”, functions between groups and direct products of groups.

9.1 Subgroups

 

Definition 9.1:
A subgroup of a group G is a subset H ⊆ G that is itself a group with the same operation as G. We sometimes write H ≤ G.
Remark:
It is important that the group operation is the same. For example, R\{0} is a subset of R, but we do not regard (R\{0},×) as a subgroup of (R,+), since the group operations are different.
Example:
For any group G, the trivial subgroup {e} and G itself are subgroups of G.
Definition 9.2:
We call a subgroup not equal to {e} a non-trivial subgroup, and a subgroup not equal to G a proper subgroup of G.
Example:
The group (Q,+) is a proper subgroup of (R,+), and (Z,+) is a proper subgroup of both (Q,+) and (R,+).
Example:

The group (2Z,+) is a subgroup of (Z,+).

If n is a positive integer, then the group (nZ,+) is a subgroup of (Z,+).
Example:
The group of rotations of a regular n-sided polygon (i.e., {e, a, a^2, …, a^(n-1)}) is a subgroup of the dihedral group D2n.

Here’s a simple description of what needs to be checked for a subset to be a subgroup.

Theorem 9.3:

Let G be a multiplicatively written group and let H ⊆ G be a subset of G. Then H is a subgroup of G if and only if the following conditions are satisfied.

  • (Closure) If x, y ∈ H then xy ∈ H.

  • (Identity) e ∈ H.

  • (Inverses) If x ∈ H then x^(-1) ∈ H.

Proof.

We don’t need to check associativity, since x(yz)=(xy)z is true for all elements of G, so is certainly true for all elements of H. So the conditions imply H is a group with the same operation as G.

If H is a group, then (Closure) must hold. By uniqueness of identity and inverses, the identity of H must be the same as that of G, and the inverse of xH is the same in H as in G, so (Identity) and (Inverses) must hold.

Proposition 9.4:

If H, K are two subgroups of a group G, then H ∩ K is also a subgroup of G.

Proof.

We check the three properties in the theorem.

  • If x, y ∈ H ∩ K then xy ∈ H by closure of H, and xy ∈ K by closure of K, and so xy ∈ H ∩ K.

  • e ∈ H and e ∈ K, so e ∈ H ∩ K.

  • If x ∈ H ∩ K, then x^(-1) ∈ H since H has inverses, and x^(-1) ∈ K since K has inverses. So x^(-1) ∈ H ∩ K.

9.2 Cyclic groups and cyclic subgroups

An important family of (sub)groups are those arising as collections of powers of a single element.

Notation:

Given a (multiplicatively-written) group G and an element x ∈ G, we’ll define ⟨x⟩ to be the subset {x^i : i ∈ Z} of G consisting of all powers of x.

It is easy to check that:

Proposition 9.5:

The set ⟨x⟩ is a subgroup of G.

Proof.

We need to check the conditions of Theorem 9.3.

  • If x^i, x^j are powers of x, then x^i x^j = x^(i+j) is a power of x, so ⟨x⟩ is closed.

  • We have e = x^0 is a power of x, so e ∈ ⟨x⟩.

  • If x^i ∈ ⟨x⟩, then (x^i)^(-1) = x^(-i) ∈ ⟨x⟩, so ⟨x⟩ is closed under taking inverses.

Definition 9.6:
If x ∈ G, then ⟨x⟩ is called the cyclic subgroup of G generated by x.

The following explicit description of ⟨x⟩ follows immediately from Propositions 8.31 and 8.33.

Proposition 9.7:

If ord(x) = ∞ then ⟨x⟩ is infinite, with x^i ≠ x^j unless i = j.

If ord(x) = n then ⟨x⟩ = {e, x, …, x^(n-1)} is finite, with n distinct elements.

Remark:
This justifies using the word "order" in two different ways. The order (as an element) of x is the same as the order (as a group) of ⟨x⟩.
Example:
If G = D2n is the dihedral group of order 2n, then ⟨a⟩ is the group {e, a, …, a^(n-1)} of rotations, and ⟨b⟩ = {e, b}.
Example:
Let G = (R\{0},×). Then ⟨-1⟩ = {1, -1}, and ⟨2⟩ = {…, 1/4, 1/2, 1, 2, 4, …} consists of all powers of 2 (and of 1/2, since we include negative powers of 2).
Example:
Let G = (R,+). Since the operation is now addition, ⟨x⟩ consists of all multiples of x. So, for example, ⟨1⟩ = Z and ⟨3⟩ = 3Z = {3n : n ∈ Z}.
Definition 9.8:
A group G is called cyclic if G = ⟨x⟩ for some x ∈ G. Such an element x is called a generator of G.
Example:
Let G = (Z,+); then G is cyclic, since Z = ⟨1⟩ = ⟨-1⟩. Hence 1 and -1 are generators.

This example shows that there may be more than one possible choice of generator. However, not every element is a generator, since, for example, ⟨0⟩ = {0} and ⟨2⟩ = 2Z ≠ Z.

Example:
Let H be the cyclic subgroup of D12 generated by a. Then a and a^(-1) are generators of H, but a^2 is not, since ⟨a^2⟩ = {e, a^2, a^4} contains only the even powers of a.
Example:
Let H be the cyclic subgroup of D14 generated by a. Then a^2 is a generator of H, since ⟨a^2⟩ = {e, a^2, a^4, a^6, a^8 = a, a^10 = a^3, a^12 = a^5} contains all the powers of a. In fact, all elements of H except e are generators.
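The two examples above can be checked computationally. Working additively modulo n (which models the rotation subgroup ⟨a⟩ of D2n), this sketch (our own) generates ⟨i⟩ by repeated addition and identifies the generators:

```python
def generated(i, n):
    # the cyclic subgroup <i> of (Z_n, +): all multiples of i, reduced mod n
    subgroup, x = {0}, i % n
    while x not in subgroup:
        subgroup.add(x)
        x = (x + i) % n
    return subgroup

# in Z_7 (modelling the rotations in D14), every non-zero element generates
assert all(generated(i, 7) == set(range(7)) for i in range(1, 7))
# in Z_6 (modelling the rotations in D12), only 1 and 5 (i.e. -1) generate
assert [i for i in range(1, 6) if generated(i, 6) == set(range(6))] == [1, 5]
```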
Proposition 9.9:

Every cyclic group is abelian.

Proof.

Suppose G = ⟨x⟩ is cyclic with generator x. Then if g, h ∈ G, we have g = x^i and h = x^j for some integers i and j. So gh = x^i x^j = x^(i+j) = x^j x^i = hg, and so G is abelian.

Of course, this means that not every group is cyclic, since no non-abelian group is. But there are also abelian groups, even finite ones, that are not cyclic.

Proposition 9.10:

Let G be a finite group with |G| = n. Then G is cyclic if and only if it has an element of order n. An element x ∈ G is a generator if and only if ord(x) = n.

Proof.

Suppose x ∈ G. Then |⟨x⟩| = ord(x), and since ⟨x⟩ ⊆ G, we have ⟨x⟩ = G if and only if ord(x) = |⟨x⟩| = |G| = n.

Example:
Let G = D8 and let H = {e, a^2, b, a^2b}. Then H is an abelian subgroup of G (check this), but it is not cyclic, since |H| = 4 but ord(a^2) = ord(b) = ord(a^2b) = 2 and ord(e) = 1, so H has no element of order 4.
Theorem 9.11:

Every subgroup of a cyclic group is cyclic.

Proof.

Let G = ⟨x⟩ be a cyclic group with generator x, and let H ⊆ G be a subgroup.

If H = {e} is the trivial subgroup, then H = ⟨e⟩ is cyclic. Otherwise, x^i ∈ H for some i ≠ 0, and since also x^(-i) ∈ H (as H is closed under taking inverses), we can assume i > 0.

Let m be the smallest positive integer such that x^m ∈ H. We shall show that H = ⟨x^m⟩, and so H is cyclic.

Certainly ⟨x^m⟩ ⊆ H, since every power of x^m is in H. Suppose x^k ∈ H and write k = mq + r where q, r ∈ Z and 0 ≤ r < m. Then x^r = x^(k-mq) = x^k (x^m)^(-q) ∈ H, since x^k ∈ H and x^m ∈ H. But 0 ≤ r < m, and m is the smallest positive integer with x^m ∈ H, so r = 0 or we have a contradiction. So x^k = (x^m)^q ∈ ⟨x^m⟩. Since x^k was an arbitrary element of H, H ⊆ ⟨x^m⟩.

9.3 Group homomorphism and Isomorphism

After studying sets, we looked at how functions allow us to go from one set to another. A group is a set with some extra structure, and so to learn about groups we want to consider functions between groups which take account of this structure somehow.

A “group homomorphism” is a function between two groups that links the two group operations in the following way.

Definition 9.12:
Let (G, ∗) and (H, ∘) be groups. A group homomorphism is a function φ: G → H such that φ(x ∗ y) = φ(x) ∘ φ(y) for all x, y ∈ G.
Remark:
If the groups G and H are written multiplicatively, then φ: G → H is a homomorphism if φ(xy) = φ(x)φ(y) for all x, y ∈ G. But it should be noted that on the left hand side of that equation we multiply x and y in G, whereas on the right hand side we multiply φ(x) and φ(y) in H, so this still links the group operations of the two different groups.
Remark:
If you are doing Linear Algebra, then you may find it helpful to note a similarity between the definitions of a group homomorphism and a linear map. These are both functions that “commute with the operations” in the sense that the definition of a homomorphism says that multiplying two elements and then applying the homomorphism gives the same as applying the homomorphism to the two elements and then multiplying the resulting elements, and the definition of a linear map says the same for the operations of addition and multiplication by scalars in place of the group operation. In fact, many of the basic facts about homomorphisms are very similar to basic facts about linear maps, usually with pretty much the same proof.
Example:
If G and H are groups and e_H is the identity element of H, the function φ:G→H given by φ(x)=e_H for all x∈G is a homomorphism (the trivial homomorphism).
Example:
If H≤G are groups, then the inclusion map i:H→G given by i(x)=x for all x∈H is a homomorphism. This is injective but not surjective (unless H=G).

If we have a group homomorphism, there are certain things we can deduce:

Lemma 9.13:

Let φ:G→H be a homomorphism, let e_G and e_H be the identity elements of G and H respectively, and let x∈G. Then

  1. φ(e_G)=e_H,

  2. φ(x^(-1))=φ(x)^(-1),

  3. φ(x^i)=φ(x)^i for any i∈Z.

Proof.

We go through all three statements.

  1. We have φ(e_G)=φ(e_G·e_G)=φ(e_G)φ(e_G), so multiplying both sides by φ(e_G)^(-1) gives e_H=φ(e_G).

  2. We have e_H=φ(e_G)=φ(x·x^(-1))=φ(x)φ(x^(-1)), so by uniqueness of inverses φ(x^(-1))=φ(x)^(-1).

  3. This is true for positive i by a simple induction (it is true for i=1, and if it is true for i=k then φ(x^(k+1))=φ(x^k)φ(x)=φ(x)^k·φ(x)=φ(x)^(k+1), so it is true for i=k+1). Then by 2. it is true for negative i, and by 1. it is true for i=0.
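As a quick numerical illustration of Lemma 9.13 (a sketch; the map and modulus are our own choices), take the reduction-mod-n homomorphism from (Z,+) to (Z/7Z,+). Written additively, "powers" become integer multiples:

```python
n = 7

def phi(x):
    # Reduction mod n is a homomorphism from (Z, +) to (Z/7Z, +):
    # phi(x + y) = phi(x) + phi(y) in Z/7Z.
    return x % n

# 1. The identity 0 of Z is sent to the identity 0 of Z/7Z.
assert phi(0) == 0

# 2. Inverses go to inverses: phi(-x) is the additive inverse of phi(x).
for x in range(-20, 21):
    assert (phi(-x) + phi(x)) % n == 0

# 3. Written additively, "powers" are multiples: phi(i*x) = i*phi(x).
for x in range(-10, 11):
    for i in range(-5, 6):
        assert phi(i * x) == (i * phi(x)) % n
```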

As we have seen, bijective functions allow us to treat two sets of the same cardinality as essentially the same sets, just changing the names of the elements. A similar story occurs with groups. Suppose G=⟨x⟩={e,x,x^2} and H=⟨y⟩={e,y,y^2} are two cyclic groups of order 3. Strictly speaking they are different groups, since (for example) x is an element of G but not of H. But clearly they are “really the same” in some sense: the only differences are the names of the elements, and the “abstract structure” of the two groups is the same. To make this idea of two groups being abstractly the same precise, we introduce the idea of an isomorphism of groups.

Definition 9.14:
Let (G,∗) and (H,∘) be groups. An isomorphism from G to H is a bijective homomorphism. That is, a bijective function φ:G→H such that φ(x∗y)=φ(x)∘φ(y) for all elements x,y∈G.
Remark:
As when we defined homomorphisms, we used different symbols for the two group operations to point out that the isomorphism links the two different operations. As ever, we’ll usually write the groups multiplicatively, in which case the defining property of an isomorphism becomes φ(xy)=φ(x)φ(y), but again it should be stressed that this involves two different kinds of “multiplication”: on the left hand side of the equation we are multiplying in G, but on the right hand side in H.
Etymology:

Homomorphism comes from the Greek homo to mean “same” and morph to mean “shape”. Isomorphism comes from the Greek iso to mean “equal” (and morph to mean “shape”).

A homomorphism is a map between two groups that keeps the same shape, i.e.  preserves the group operation (but not necessarily anything else), while an isomorphism is a map between two groups that have equal shape (it preserves the group operation and many other useful properties as we will see below).

Example:
Let G=⟨x⟩ and H=⟨y⟩ be two cyclic groups of the same order. Then φ:G→H defined by φ(x^i)=y^i for every i∈Z is an isomorphism, since it is a bijection and φ(x^(i+j))=y^(i+j)=y^i·y^j=φ(x^i)φ(x^j) for all i,j.
Remark:
Since φ is a bijection, it pairs off elements of G with elements of H, and then the defining property of an isomorphism says that we can use φ and its inverse φ^(-1) as a dictionary to translate between elements of G and elements of H without messing up the group operation. If we take the multiplication (Cayley) table of G and apply φ to all the entries, we get the multiplication table of H: the groups G and H are “really the same” apart from the names of the individual elements.

Next we’ll prove some of the easy consequences of the definition.

Proposition 9.15:

Let φ:G→H be an isomorphism between (multiplicatively-written) groups. Then φ^(-1):H→G is also an isomorphism.

Proof.

Since φ is a bijection, it has an inverse function φ^(-1) that is also a bijection.

Let u,v∈H. Since φ is a bijection, there are unique elements x,y∈G with u=φ(x) and v=φ(y). Then φ^(-1)(uv)=φ^(-1)(φ(x)φ(y))=φ^(-1)(φ(xy))=xy=φ^(-1)(u)φ^(-1)(v), and so φ^(-1) is an isomorphism.

Because of this Proposition, the following definition makes sense, since it tells us that there is an isomorphism from G to H if and only if there is one from H to G.

Definition 9.16:
Two groups G and H are said to be isomorphic, or we say G is isomorphic to H, if there is an isomorphism φ:G→H, and then we write G≅H, or φ:G≅H if we want to specify an isomorphism.
Proposition 9.17:

Let G,H,K be three groups. If G is isomorphic to H and H is isomorphic to K, then G is isomorphic to K.

Proof.

Let φ:G→H and θ:H→K be isomorphisms. Then the composition θφ:G→K is a bijection, and if x,y∈G then θ(φ(xy))=θ(φ(x)φ(y))=θ(φ(x))·θ(φ(y)), and so θφ is an isomorphism.

Proposition 9.18:

Let φ:G→H be an isomorphism between (multiplicatively-written) groups, let e_G and e_H be the identity elements of G and H respectively, and let x∈G. Then

  1. φ(e_G)=e_H,

  2. φ(x^(-1))=φ(x)^(-1),

  3. φ(x^i)=φ(x)^i for every i∈Z,

  4. ord_G(x)=ord_H(φ(x)).

Proof.

The first three statements follow directly from Lemma 9.13, since an isomorphism is a homomorphism.

For the last statement, by 1. and 3., and since φ is injective, we have x^i=e_G if and only if φ(x)^i=e_H, and so ord_G(x)=ord_H(φ(x)).

To prove that two groups are isomorphic usually requires finding an explicit isomorphism. Proving that two groups are not isomorphic is often easier, as if we can find an “abstract property” that distinguishes them, then this is enough, since isomorphic groups have the same “abstract properties”. We’ll make this precise, and prove it, with some typical properties, after which you should be able to see how to give similar proofs for other properties, just using an isomorphism to translate between properties of G and of H.

Proposition 9.19:

Let G and H be isomorphic groups. Then |G|=|H|.

Proof.

This follows just from the fact that an isomorphism is a bijection φ:G→H.

Proposition 9.20:

Let G and H be isomorphic groups. If H is abelian then so is G.

Proof.

Suppose that φ:G→H is an isomorphism and that H is abelian. Let x,y∈G. Then φ(xy)=φ(x)φ(y)=φ(y)φ(x)=φ(yx), since φ(x),φ(y)∈H, which is abelian. Since φ is a bijection (and in particular injective) it follows that xy=yx. So G is abelian.

Proposition 9.21:

Let G and H be isomorphic groups. If H is cyclic then so is G.

Proof.

Suppose φ:G→H is an isomorphism and H=⟨y⟩ is cyclic. So every element of H is a power of y. So if g∈G then φ(g)=y^i for some i∈Z, and so g=φ^(-1)(y^i)=φ^(-1)(y)^i. So if x=φ^(-1)(y) then every element of G is a power of x, and so G=⟨x⟩.

Notation:

If n∈Z+, we write Cn to mean a (multiplicatively-written) cyclic group of order n.

This notation is sensible since a cyclic group can only be isomorphic to another cyclic group of the same order, and since two cyclic groups of the same order are isomorphic.

Remark:

We can use these propositions to show that there exist groups of the same order which are not isomorphic. Consider the groups C8 and D8: both have order 8, but we have seen that D8 is not abelian (and hence not cyclic), whereas C8 is both.

This again highlights that a group is a set with a binary operation: a bijection exists between the sets C8 and D8, but it does not preserve the group operations.
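The order-counting argument for C8 versus D8 can be verified computationally. In this sketch (our own encodings of the two groups), C8 is modelled as (Z/8Z,+) and D8 as pairs (rotation, reflection bit) composed by the dihedral rule:

```python
from math import gcd
from collections import Counter

# C8 modelled as (Z/8Z, +): the order of x is 8 // gcd(8, x) (and ord(0) = 1).
c8_orders = Counter(8 // gcd(8, x) for x in range(8))

# D8 modelled as pairs (r, s): rotation r mod 4, reflection bit s,
# composed by the dihedral rule (r1,s1)(r2,s2) = (r1 + (-1)^s1 r2, s1+s2).
def mult(g, h):
    (r1, s1), (r2, s2) = g, h
    return ((r1 + (r2 if s1 == 0 else -r2)) % 4, (s1 + s2) % 2)

def order(g):
    x, k = g, 1
    while x != (0, 0):
        x, k = mult(x, g), k + 1
    return k

d8_orders = Counter(order((r, s)) for r in range(4) for s in range(2))

# C8 has elements of order 8 but D8 has none, so C8 and D8 are not isomorphic.
assert c8_orders[8] == 4 and d8_orders[8] == 0
```

The two order profiles differ (D8 has five elements of order 2, C8 only one), which is exactly the kind of "abstract property" that rules out an isomorphism.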
Proposition 9.22:

Let G and H be isomorphic groups and n∈Z+∪{∞}. Then G and H have the same number of elements of order n.

Proof.

By Proposition 9.18, an isomorphism φ:G→H induces a bijection between the set of elements of G with order n and the set of elements of H with order n.

The idea of isomorphism gives an important tool for understanding unfamiliar or difficult groups. If we can prove that such a group is isomorphic to a group that we understand well, then this is a huge step forward.

The following example of a group isomorphism was used for very practical purposes in the past.

Example:

The logarithm function log10:(R+,×)→(R,+) is an isomorphism from the group of positive real numbers under multiplication to the group of real numbers under addition (of course we could use any base for the logarithms). It is a bijection since the function y↦10^y is the inverse function, and the group isomorphism property is the familiar property of logarithms that log10(ab)=log10(a)+log10(b). Now, if you don’t have a calculator, then addition is much easier to do by hand than multiplication, and people used to use “log tables” to make multiplication easier. If they wanted to multiply two numbers, they would look up the logarithms, add them and then look up the “antilogarithm”.

In group theoretic language they were exploiting the fact that there is an isomorphism between the “difficult” group (R+,) and the “easy” group (R,+).
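The “log tables” trick can be demonstrated directly; this short check (with arbitrary sample values of our own) verifies that log10 turns products into sums, up to floating-point rounding:

```python
import math

a, b = 123.0, 456.0

# log10 : (R+, ×) → (R, +) turns multiplication into addition.
assert math.isclose(math.log10(a * b), math.log10(a) + math.log10(b))

# "Log table" multiplication: add the logs, then take the antilogarithm 10^y.
product = 10 ** (math.log10(a) + math.log10(b))
assert math.isclose(product, a * b)
```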

9.4 Direct Product

In this section we’ll study a simple way of combining two groups to build a new, larger, group. We will do that by generalising the idea of the Cartesian product of two sets (Definition 6.1).

Definition 9.23:
Let H and K be (multiplicatively-written) groups. The direct product H×K of H and K is the Cartesian product of the sets H and K, with the binary operation (x,y)(x′,y′)=(xx′,yy′) for x,x′∈H and y,y′∈K.
Proposition 9.24:

The direct product H×K of groups is itself a group.

Proof.

Associativity of H×K follows from associativity of H and K, since if x,x′,x″∈H and y,y′,y″∈K, then ((x,y)(x′,y′))(x″,y″)=(xx′,yy′)(x″,y″)=(xx′x″,yy′y″)=(x,y)(x′x″,y′y″)=(x,y)((x′,y′)(x″,y″)).

If e_H and e_K are the identity elements of H and K, then (e_H,e_K) is the identity element of H×K, since if x∈H and y∈K, then (e_H,e_K)(x,y)=(e_H·x,e_K·y)=(x,y)=(x·e_H,y·e_K)=(x,y)(e_H,e_K).

The inverse of (x,y)∈H×K is (x^(-1),y^(-1)), since (x,y)(x^(-1),y^(-1))=(xx^(-1),yy^(-1))=(e_H,e_K)=(x^(-1)x,y^(-1)y)=(x^(-1),y^(-1))(x,y).

Notice that for all aspects of the group structure, we simply apply the corresponding idea in the first coordinate for H and in the second coordinate for K. This is generally how we understand H×K: we consider the two coordinates separately, and if we understand H and K, then H×K is easy to understand. For example, it is easy to see that, for any i∈Z and any (x,y)∈H×K, (x,y)^i=(x^i,y^i), so taking powers in a direct product is just a matter of taking powers of the coordinates separately.
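To illustrate the coordinate-wise structure, here is a small check (our own additive model) in Z/4Z × Z/6Z that multiples (the additive analogue of powers) are computed coordinate-wise:

```python
n, m = 4, 6  # the direct product Z/4Z × Z/6Z, written additively

def add(g, h):
    # The operation acts coordinate-wise.
    return ((g[0] + h[0]) % n, (g[1] + h[1]) % m)

def multiple(i, g):
    # i·g computed by repeated addition (i >= 0); additive analogue of g^i.
    result = (0, 0)
    for _ in range(i):
        result = add(result, g)
    return result

# Multiples in the product are just coordinate-wise multiples.
g = (3, 5)
for i in range(25):
    assert multiple(i, g) == ((i * g[0]) % n, (i * g[1]) % m)
```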

Here are some very easy consequences of the definition.

Proposition 9.25:

Let H and K be (multiplicatively-written) groups, and let G=H×K be the direct product.

  1. G is finite if and only if both H and K are finite, in which case |G|=|H||K|.

  2. G is abelian if and only if both H and K are abelian.

  3. If G is cyclic then both H and K are cyclic.

Proof.

We go through all three statements.

  1. This is just a familiar property of Cartesian products of sets (that was left as an exercise).

  2. Suppose H and K are abelian, and let (x,y),(x′,y′)∈G. Then (x,y)(x′,y′)=(xx′,yy′)=(x′x,y′y)=(x′,y′)(x,y), and so G is abelian.

Conversely, suppose G is abelian, and let x,x′∈H. Then (xx′,e_K)=(x,e_K)(x′,e_K)=(x′,e_K)(x,e_K)=(x′x,e_K), and considering the first coordinates, xx′=x′x, and so H is abelian. Similarly K is abelian.

  3. Suppose G is cyclic, and (x,y) is a generator, so that every element of G is a power of (x,y). Let x′∈H. Then (x′,e_K)∈G, so (x′,e_K)=(x,y)^i=(x^i,y^i) for some i∈Z, and so x′=x^i. So every element of H is a power of x, so H=⟨x⟩ is cyclic. Similarly K is cyclic.

Remark:
The converse of 3. is not true in general: the direct product of cyclic groups may not be cyclic. For example, if H=K=C2, then (x,y)^2=(e_H,e_K) for every (x,y)∈C2×C2, so C2×C2 has no element of order |H×K|=4, and so can’t be cyclic.
Proposition 9.26:

Let H and K be (multiplicatively-written) groups, and x∈H, y∈K elements of finite order. Then (x,y)∈H×K has finite order equal to the lowest common multiple lcm(ord_H(x),ord_K(y)).

Proof.

Let i∈Z. Then (x,y)^i=(e_H,e_K) if and only if x^i=e_H and y^i=e_K, which is the case if and only if i is divisible both by ord_H(x) and by ord_K(y). The least positive such i is lcm(ord_H(x),ord_K(y)), and so this is the order of (x,y).
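Proposition 9.26 is easy to test exhaustively in a small example. This sketch (our own additive model of Z/6Z × Z/10Z) compares the order of each pair, computed directly, with the lcm formula:

```python
from math import gcd

n, m = 6, 10  # work in Z/6Z × Z/10Z, written additively

def lcm(a, b):
    return a * b // gcd(a, b)

def order_mod(x, k):
    # Order of x in (Z/kZ, +); note gcd(k, 0) = k gives ord(0) = 1.
    return k // gcd(k, x)

def order_pair(x, y):
    # Order of (x, y) found by direct repeated addition.
    current, i = (x % n, y % m), 1
    while current != (0, 0):
        current = ((current[0] + x) % n, (current[1] + y) % m)
        i += 1
    return i

# The order of each pair equals the lcm of the coordinate orders.
for x in range(n):
    for y in range(m):
        assert order_pair(x, y) == lcm(order_mod(x, n), order_mod(y, m))
```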

We can now decide precisely when the direct product of cyclic groups is cyclic.

Theorem 9.27:

Let H=Cn and K=Cm be finite cyclic groups. Then Cn×Cm is cyclic if and only if gcd(n,m)=1.

Proof.

Let (x,y)∈H×K. Then ord_H(x)≤|H| and ord_K(y)≤|K|. So by Proposition 9.26, ord_(H×K)((x,y))=lcm(ord_H(x),ord_K(y))≤ord_H(x)·ord_K(y)≤|H||K|, where the first inequality is an equality if and only if ord_H(x) and ord_K(y) are coprime, and the second inequality is an equality if and only if H=⟨x⟩ and K=⟨y⟩.

Since |H×K|=|H||K|, H×K is cyclic if and only if it has an element of order |H||K|, which by the argument above happens if and only if there are x,y with H=⟨x⟩, K=⟨y⟩ and gcd(ord_H(x),ord_K(y))=gcd(n,m)=1.

Since cyclic groups of the same order are isomorphic, the last theorem says that Cm×Cn≅Cmn if and only if gcd(m,n)=1.

Remark:
While the above proof tells us that an isomorphism between Cm×Cn and Cmn exists, it does not tell us explicitly what the isomorphism is. However, we can use Theorem 10.14 to construct one. Let Cm=⟨x⟩, Cn=⟨y⟩, Cmn=⟨z⟩ and define φ:Cm×Cn→Cmn by φ((x^i,y^j))=z^k where, by Theorem 10.14, k is such that k≡i (mod m) and k≡j (mod n).
Example:
C2×C2 and C4×C2 are not cyclic.
Remark:
C2×C2 is an abelian group of order 4 such that every element apart from the identity has order 2. It is easy to check that if G={e,a,b,c} is any group with these properties, then ab=c=ba, ac=b=ca and bc=a=cb: i.e., the product of any two of the three non-identity elements is the other non-identity element. This means that there is only one possible multiplication table for such a group, and so any two groups with these properties are isomorphic.
Definition 9.28:
A Klein 4-group is a group of order 4 such that every element except the identity has order 2.
History:

Felix Klein, German mathematician (1849–1925), applied ideas from group theory to geometry, which led to the emergence of transformation groups. These were some of the first examples of infinite groups (beyond the groups (R,+) and (R∖{0},×), which weren’t really studied as groups). Klein wrote about the Klein 4-group, although he called it the Vierergruppe (meaning “four-group”), as it is the smallest group which is not cyclic.

Example:
  • C2×C3≅C6

  • C2×C9≅C18

  • C90≅C2×C45≅C2×(C9×C5)

We can clearly extend the definition of a direct product of two groups to three, four or more groups and make sense of things like G×H×K, which would be a group whose elements are the ordered triples (x,y,z) with xG, yH and zK.

We can then write the last example as C90≅C2×C9×C5.

More generally, if n=p1^r1·p2^r2⋯pk^rk, where p1,p2,…,pk are distinct primes, then Cn≅C_(p1^r1)×C_(p2^r2)×⋯×C_(pk^rk), so that every finite cyclic group is isomorphic to a direct product of cyclic groups of prime-power order.

We’ll finish this section with a different example of a familiar group that is isomorphic to a direct product.

Proposition 9.29:

(R∖{0},×)≅(R>0,×)×({1,-1},×).

Proof.

Define a map φ:R>0×{1,-1}→R∖{0} by φ(x,ε)=εx. This is clearly a bijection, and φ((x,ε)(x′,ε′))=φ(xx′,εε′)=εε′xx′=(εx)(ε′x′)=φ(x,ε)φ(x′,ε′), so that φ is an isomorphism.

10 Modular arithmetic and Lagrange

We finish this course by linking several threads together. We will explore Lagrange’s theorem, an important theorem of group theory (which was actually first formulated long before group theory was developed). We will then see how this theorem has applications to the study of the integers (which was one of our earlier threads). First though, we need to take a step back and explore the concept of partitioning a set, and of saying when objects are “basically the same”.

10.1 Equivalence relations and partition of sets

There are many situations in maths where we have different objects that we want to “consider the same”, often because they share a desired property. The requirement that two objects be exactly equal is often too restrictive, so we generalise the concept of equality to something broader (this theory is applied in many branches of mathematics).

Definition 10.1:

A relation on a nonempty set X is a subset R⊆X×X. We say x is related to y, denoted by x∼y, when (x,y)∈R.

  • A relation is reflexive if for all x∈X, x∼x.

  • A relation is symmetric if for all x,y∈X we have x∼y implies y∼x.

  • A relation is transitive if for all x,y,z∈X, if x∼y and y∼z then x∼z.

Remark:

We use the notation x≁y to say x is not related to y.

We negate the above three properties to say that:

  • A relation is not reflexive if there exists x∈X such that x≁x.

  • A relation is not symmetric if there exist x,y∈X such that x∼y and y≁x.

  • A relation is not transitive if there exist x,y,z∈X such that x∼y and y∼z and x≁z.

Definition 10.2:

A relation ∼ on a nonempty set X is an equivalence relation if it is reflexive, symmetric and transitive.

In such a case we read x∼y as “x is equivalent to y”.
Etymology:

Reflexive comes from the Latin re meaning “back” and flectere meaning “to bend”. This gave rise to words like “reflect” (where an object is bent back through a mirror) and “reflection”. A relationship is reflexive if it reflects every element (i.e., every element is in relation with itself).

Symmetric comes from the Greek sun meaning “together with” and metron meaning “a measure”. Given a point “together with a distance (from a line of symmetry)”, then you have found the image of the point using that symmetry.

We’ve seen the word transitive before ( (O2) ) - transitive comes from the Latin trans meaning “across, beyond” and the verb itus/ire meaning “to go”. A transitive relationship is one where knowing x∼y and y∼z allows us to go from x beyond y all the way to z.

Equivalence comes from the Latin aequus meaning “even, level” and valere meaning “to be strong, to have value”. Two elements are equivalent if they are level (i.e., equal) in their value. One can be substituted for the other (notice this has the same roots as the word equal).

Example:

Since we claim that equivalence relations are a generalisation of =, let us show that = is indeed an equivalence relation. Define a relation ∼ on R via x∼y if and only if x=y. We show ∼ is an equivalence relation.

Let x∈R. Since x=x, we have x∼x. This is true for all x∈R, hence ∼ is reflexive.

Let x,y∈R be such that x∼y. By definition, this means x=y, which means y=x, i.e. y∼x. This is true for all x,y∈R, hence ∼ is symmetric.

Let x,y,z∈R be such that x∼y and y∼z. This means x=y and y=z, so x=y=z, i.e. x=z. So x∼z. This is true for all x,y,z∈R, hence ∼ is transitive.

Since ∼ is reflexive, symmetric and transitive, we have that ∼ is an equivalence relation.
Example:
Define a relation ∼ on R via x∼y if and only if x≤y. We have that ∼ is not an equivalence relation because it is not symmetric. Indeed, let x=1 and y=2: we have 1≤2, so 1∼2, but 2≰1, so 2≁1. (The relation is however reflexive and transitive.)
Example:
We define a relation ∼ on the set of all subsets of R via X∼Y if and only if there exists a bijection f:X→Y. Remark 7.2 proved that ∼ is reflexive, symmetric and transitive, hence ∼ is an equivalence relation (and this is the reason why we could define cardinality in a way that makes sense).
Example:
Define a relation ∼ on the set of groups via G∼H if and only if G is isomorphic to H. This is reflexive (since G is isomorphic to itself via the identity function), Proposition 9.15 shows that ∼ is symmetric, and Proposition 9.17 proved that ∼ is transitive. So ∼ is an equivalence relation (and hence the rationale for why we care about groups “up to isomorphism”).
Definition 10.3:
Suppose ∼ is an equivalence relation on a (nonempty) set X. For x∈X, we define the equivalence class of x, denoted by [x], to be [x]={y∈X : y∼x}.
Example:
Let X={-1,0,1,2} and define a relation ∼ on X via x∼y if and only if either x=y or xy>0. One can check this is an equivalence relation (it takes a bit of work to show ∼ is transitive). We have the following equivalence classes: [-1]={-1}, [0]={0} and [1]={1,2}=[2]. Hence, we have three distinct equivalence classes, namely [-1], [0] and [1].
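The equivalence classes in this example can be computed mechanically from Definition 10.3; the following sketch (our own encoding of the relation) collects the class of each element of X:

```python
X = [-1, 0, 1, 2]

def related(x, y):
    # x ~ y iff x = y or xy > 0.
    return x == y or x * y > 0

def eq_class(x):
    # [x] = {y in X : y ~ x}.
    return frozenset(y for y in X if related(y, x))

classes = {eq_class(x) for x in X}

# Three distinct classes: [-1] = {-1}, [0] = {0}, and [1] = [2] = {1, 2}.
assert classes == {frozenset({-1}), frozenset({0}), frozenset({1, 2})}
assert eq_class(1) == eq_class(2)
```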

Notice how in the above example the distinct equivalence classes do not intersect each other. This is true in general.

Proposition 10.4:

Suppose ∼ is an equivalence relation on a (nonempty) set X. Then for any x,y∈X, [x]≠[y] if and only if [x]∩[y]=∅.

Proof.

Take x,y∈X.

(⇒). First, we will show that if [x]≠[y] then [x]∩[y]=∅ by using the contrapositive. That is, we will prove that if [x]∩[y]≠∅, then [x]=[y].

So suppose that [x]∩[y]≠∅. Then there exists some z∈[x]∩[y]. Hence, z∈[x], so z∼x. Similarly, z∈[y], so z∼y. Since ∼ is symmetric, we have that x∼z. Since ∼ is transitive, we have that x∼y. Now, choose w∈[x]. Then w∼x, and since x∼y and ∼ is transitive, we have that w∼y. Hence, w∈[y]. Since w∈[x] is arbitrary, we have that [x]⊆[y]. Similarly, we can show that, for any w∈[y], we have w∈[x], so [y]⊆[x]. Hence [x]=[y].

(⇐). Second, we will show that if [x]∩[y]=∅ then [x]≠[y], again by using the contrapositive. So we will show that if [x]=[y], then [x]∩[y]≠∅.

So suppose that [x]=[y]. Since ∼ is reflexive, we know that x∈[x]. Hence, x∈[x]=[x]∩[y], so [x]∩[y]≠∅.

Summarising, we have that [x]≠[y] if and only if [x]∩[y]=∅.

The above set-up leads us to the following definition which breaks a set into “parts”.

Definition 10.5:

A partition of a non-empty set X is a collection {Ai : i∈I} of non-empty subsets of X such that

  1. ∀x∈X, ∃i∈I such that x∈Ai, and

  2. ∀x∈X and i,j∈I, if x∈Ai∩Aj, then Ai=Aj.

Remark:

The above two conditions can be rephrased as follows:

  1. X=⋃_(i∈I) Ai, and

  2. Ai∩Aj≠∅ if and only if Ai=Aj.

The above remark and Proposition 10.4 suggests a strong link between equivalence relations and partitions of sets, as we explore in the next two theorems.

Theorem 10.6:

Suppose ∼ is an equivalence relation on a (non-empty) set X. Then Π={[x] : x∈X} is a partition of X.

Proof.

Take a∈X. Then a∈[a]∈Π. Hence, every element of X is in one of the equivalence classes in Π. Furthermore, by Proposition 10.4 we have that [x]≠[y] if and only if [x]∩[y]=∅, i.e. (by the contrapositive) [x]∩[y]≠∅ if and only if [x]=[y], as required.

Furthermore, given a partition of a set, we can construct an equivalence relation.

Theorem 10.7:

Suppose Π={Ai : i∈I} is a partition of a (nonempty) set X, for some indexing set I. For x,y∈X, define x∼y if and only if there exists an i∈I such that x,y∈Ai. Then ∼ is an equivalence relation on X.

Proof.

We show ∼ is reflexive. Take x∈X. Since Π is a partition of X, there exists some i∈I such that x∈Ai. Hence, x∼x.

We show ∼ is symmetric. Suppose x,y∈X are such that x∼y. Then there exists some i∈I such that x,y∈Ai. It follows that y,x∈Ai, so y∼x.

We show ∼ is transitive. Suppose that x,y,z∈X are such that x∼y and y∼z. Then there exists some i∈I such that x,y∈Ai and there exists some j∈I such that y,z∈Aj. Hence, we have that y∈Ai and y∈Aj. Since Π is a partition, we must have that Ai=Aj. It follows that x,z∈Ai, so x∼z.

Summarising, we have that ∼ is an equivalence relation on X.

Example:

Let us take the previous example further. Define a relation ∼ on Z via x∼y if and only if either x=y or xy>0. This is an equivalence relation. We have that ∼ partitions Z into three sets: [-1]=Z-, [0]={0} and [1]=Z+.

Example:

We will construct Q from Z by partitioning the set Z×Z+.

We define the relation ∼ on Z×Z+ via (a,b)∼(c,d) if and only if ad=bc. We check this is an equivalence relation:

Let (a,b)∈Z×Z+. Since ab=ba, we have that (a,b)∼(a,b), and ∼ is reflexive.

Let (a,b),(c,d)∈Z×Z+ be such that (a,b)∼(c,d). Then we have ad=bc, which can be re-written as cb=da, so (c,d)∼(a,b) and ∼ is symmetric.

Let (a,b),(c,d),(f,g)∈Z×Z+ be such that (a,b)∼(c,d) and (c,d)∼(f,g). Then we have ad=bc and cg=df. Multiplying the first equation by g we get adg=bcg, and using the second equation we can write this as adg=bdf. Since d≠0 (as d∈Z+) we can divide through by d to give ag=bf, which means that (a,b)∼(f,g) and ∼ is transitive.

Therefore ∼ is reflexive, symmetric and transitive, and so it is an equivalence relation.

We define elements of Q, which we denote a/b, to be the equivalence classes [(a,b)]={(a,b),(2a,2b),…}.
History:

We will not try to summarise the interesting 2019 article by Amir Asghari on the history of equivalence relation (separating the definition from when the notion was first used), but we would recommend it as something worth reading just to showcase how ideas can be used before they are formalised, and how the mathematics we do today is different from how mathematics used to be done. The full reference is Asghari, A. Equivalence: an attempt at a history of the idea. Synthese 196, 4657–4677 (2019). https://doi.org/10.1007/s11229-018-1674-2

10.2 Congruences and modular arithmetic

We now use our work on equivalence classes to partition Z. Recall that for any two integers a,n∈Z with n≠0, there exist a quotient q and a remainder r such that a=nq+r. Sometimes we only care about the remainder r, which leads us to the following definition.

Definition 10.8:
Let n∈Z+. For a,b∈Z, we say that a is congruent to b modulo n, denoted by a≡b (mod n), if n|(a-b), i.e. if a-b∈nZ.
Theorem 10.9:

Let n∈Z+. Define a relation ∼ on Z via a∼b if and only if a≡b (mod n). Then ∼ is an equivalence relation.

Proof.

Exercise.

Using Theorem 5.2, we can see that, given an integer n, every integer a is congruent modulo n to a unique r such that 0≤r<n. That is, we partition Z into exactly n congruence classes modulo n, given by [0],[1],…,[n-1] where [r]={a∈Z : a≡r (mod n)}=r+nZ.

Notation:

We write Z/nZ for the set of congruence classes modulo n.

If we’re clear that we’re considering [r] as an element of Z/nZ we’ll often simply write r.

Both addition and multiplication make sense modulo n, by the following theorem, which is very useful in computations.

Theorem 10.10:

Fix n∈Z+. Suppose that a,b,c,d∈Z are such that a≡c (mod n) and b≡d (mod n). Then a+b≡c+d (mod n) and ab≡cd (mod n).

Proof.

By assumption, we have that n∣(a-c) and n∣(b-d). Hence, for some x,y∈Z, we have that a-c=nx and b-d=ny. It follows that (a+b)-(c+d)=(a-c)+(b-d)=nx+ny=n(x+y). Since x+y∈Z, this means that n∣((a+b)-(c+d)), so a+b≡c+d (mod n). Further, since a=c+nx and b=d+ny, we have that ab=(c+nx)(d+ny)=cd+n(cy+dx+nxy), so ab-cd=n(cy+dx+nxy). Since cy+dx+nxy is an integer, we have that n∣(ab-cd), so ab≡cd (mod n).

Example:
We want to compute 3^5+2^8 (mod 7). We have 3^2=9≡2 (mod 7). So 3^4=3^2·3^2≡2·2≡4 (mod 7). Hence 3^5=3·3^4≡3·4=12≡5 (mod 7). Further, we have 2^3=8≡1 (mod 7), so 2^6=2^3·2^3≡1·1≡1 (mod 7). Hence, we have that 2^8=2^6·2^2≡1·4≡4 (mod 7). It follows that 3^5+2^8≡5+4=9≡2 (mod 7).
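The same computation can be checked with Python's built-in modular exponentiation (three-argument pow), which reduces modulo 7 at every step just as the hand calculation does:

```python
# pow(base, exp, mod) reduces modulo 7 at every step, mirroring the
# hand computation above.
assert pow(3, 5, 7) == 5        # 3^5 = 243 ≡ 5 (mod 7)
assert pow(2, 8, 7) == 4        # 2^8 = 256 ≡ 4 (mod 7)
assert (pow(3, 5, 7) + pow(2, 8, 7)) % 7 == 2   # 5 + 4 = 9 ≡ 2 (mod 7)
```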

Another consequence of Theorem 10.10 is that we can “shift” the congruence classes by a given constant c∈Z, i.e. the congruence classes can be given by [c],[c+1],…,[c+n-1].

Example:
For example Z/7Z={[0],[1],[2],[3],[4],[5],[6]}={[-3],[-2],[-1],[0],[1],[2],[3]}={[5],[6],[7],[8],[9],[10],[11]}.
Theorem 10.11:

We have (Z/nZ,+) is an abelian group.

Proof.

Addition on Z/nZ is a well-defined binary operation by Theorem 10.10. Addition (mod n) is associative, since ([a]+[b])+[c]=[a+b+c]=[a]+([b]+[c]). The identity element is [0] since [a]+[0]=[a]=[0]+[a] for any a∈Z. The inverse of [a] is [-a], since [a]+[-a]=[0]=[-a]+[a]. So (Z/nZ,+) is a group. It is abelian since [a]+[b]=[a+b]=[b+a]=[b]+[a] for all a and b.

In fact, it is a cyclic group, since the following is clear.

Proposition 10.12:

(Z/nZ,+)=⟨[1]⟩.

Remark:

Since (Z/nZ,+) is cyclic, it is isomorphic to Cn=⟨x⟩. The isomorphism is φ:(Z/nZ,+)→Cn defined by φ([i])=x^i.

This means that cyclic groups are particularly simple to understand if we know a generator, as the group operation is just addition of exponents: in a cyclic group G=⟨x⟩, x^i·x^j=x^(i+j), so the group operation in an infinite cyclic group is “just like” addition of integers, and the group operation in a finite cyclic group of order n is “just like” addition of integers (mod n).

However, (Z/nZ,×) is not a group, since although it is associative and has an identity element [1], not every element has an inverse. For example, [0] never has an inverse, and in Z/4Z, [2] does not have a multiplicative inverse.

Proposition 10.13:

In Z/nZ, [a] has a multiplicative inverse if and only if gcd(a,n)=1.

Proof.

If gcd(a,n)=1 then by Theorem 5.7 there exist s,t∈Z such that as+nt=1. So as=1-nt≡1 (mod n), so [a][s]=[1] in Z/nZ, and so [s] is a multiplicative inverse of [a].

Conversely, if gcd(a,n)=h>1, and if as≡1 (mod n), then 1=as+nq for some q∈Z. But h divides both a and n, so it divides as+nq=1. But no integer h>1 divides 1. So there is no s such that [s] is a multiplicative inverse of [a].

Note that the first part of the proof is constructive and allows us to solve some congruence equations.

Example:

Find x∈Z such that 4x≡3 (mod 7).

We find the inverse of [4] in Z/7Z (which exists since gcd(4,7)=1). We first find s,t∈Z such that 4s+7t=1. We have that

k sk tk Calculation qk rk
0 0 1 - - -
1 1 0 7=4·1+3 1 3
2 0-1·1=-1 1-0·1=1 4=3·1+1 1 1
3 1-(-1)·1=2 0-1·1=-1 3=1·3+0 3 0

so s=2 and t=-1. So [2] is the inverse of [4] in Z/7Z.

Multiplying both sides of 4x≡3 (mod 7) by 2, we get x≡2·4x≡2·3≡6 (mod 7).
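The table above is one bookkeeping style for the extended Euclidean algorithm; the recursive version below (a standard formulation, not the course's notation) finds s and t, hence the inverse of [4] in Z/7Z, and solves the congruence:

```python
def extended_gcd(a, b):
    # Return (g, s, t) with g = gcd(a, b) and g = a*s + b*t.
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t

g, s, t = extended_gcd(4, 7)
assert g == 1 and 4 * s + 7 * t == 1    # here s = 2, t = -1

inverse = s % 7                          # the inverse of [4] in Z/7Z
assert inverse == 2 and (4 * inverse) % 7 == 1

# Solve 4x ≡ 3 (mod 7) by multiplying both sides by the inverse of [4].
x = (inverse * 3) % 7
assert x == 6 and (4 * x) % 7 == 3
```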

We can also look at solving a system of simultaneous congruence equations.

Theorem 10.14:

Let m,n∈Z+ be co-prime. For any a,b∈Z, there exists x∈Z such that x≡a (mod m) and x≡b (mod n). Furthermore, this x is unique modulo mn. That is, for x′∈Z, we have that x′≡a (mod m) and x′≡b (mod n) if and only if x′≡x (mod mn).

Proof.

We will prove the existence of x and leave the uniqueness of x modulo mn as an exercise.

Since gcd(m,n)=1, by Theorem 5.7 we know that there exist s,t∈Z such that ms+nt=1. Then 1=ms+nt≡nt (mod m) and 1=ms+nt≡ms (mod n). So let x=msb+nta∈Z. Then x=msb+nta≡nta≡1·a≡a (mod m) and x=msb+nta≡msb≡1·b≡b (mod n).

History:

The above theorem is often known as the Chinese Remainder Theorem. An example of this theorem was first discussed by Sun Zi (Chinese mathematician, around 400-460) in his text Sunzi suanjing (Sun Zi’s Mathematical Manual). While Sun Zi came to the correct solution of the problem he posed in his text, using the methods we would use nowadays, there is no sign that he developed a general method. Instead it was Qin Jiushao (Chinese mathematician, 1202-1261) who wrote in his text “Shushu Jiuzhang” how to solve simultaneous linear congruence equations. The origin of the name “Chinese Remainder Theorem” is unclear; it arose in the west some time after an 1852 article by Wylie on the history of Chinese mathematics (however, he did not use this term).

Note that the above proof is a constructive proof, that is, it gives a way to find x.

Example:

We want to find x∈Z such that x≡4 (mod 5) and x≡7 (mod 8).

We have that gcd(5,8)=1, so we set x=msb+nta, where a=4, m=5, b=7, n=8 and s,t are such that 1=gcd(5,8)=5s+8t. We have that

k sk tk Calculation qk rk
0 0 1 - - -
1 1 0 8=5·1+3 1 3
2 0-1·1=-1 1-0·1=1 5=3·1+2 1 2
3 1-(-1)·1=2 0-1·1=-1 3=2·1+1 1 1
4 -1-2·1=-3 1-(-1)·1=2 2=1·2+0 2 0
so s=-3 and t=2. Hence, we set x=msb+nta=5·(-3)·7+8·2·4=-105+64=-41. We double check that x indeed satisfies the given congruences: -41=(-9)·5+4, so -41≡4 (mod 5), and -41=(-6)·8+7, so -41≡7 (mod 8). Note that the x′ such that x′≡x (mod 40), i.e. x′=-1,39,79,…, are also solutions.

One can use induction and the Fundamental Theorem of Arithmetic to prove the following generalisation of Theorem 10.14:

Let r∈Z+ with r≥2, and let m1,…,mr∈Z+ be pairwise co-prime, that is gcd(mi,mj)=1 for i,j∈Z with 1≤i≤r, 1≤j≤r and i≠j. Then for any a1,…,ar∈Z, there exists some x∈Z such that x≡ai (mod mi) for all i∈Z with 1≤i≤r. Further, this x is unique modulo m1·m2⋯mr.

One can also think about combining all these methods to solve simultaneous congruence equations where the coefficient in front of x is not 1.

10.3 Lagrange’s theorem

Let G=D2n be the dihedral group of order 2n, and consider the orders of elements. Every reflection has order 2, which divides |G|. The rotation a has order n, which divides |G|. Every other rotation a^i has order n/gcd(n,i), which divides |G|. In this section we’ll show that this is a general phenomenon for finite groups. In fact, we’ll prove a more general fact about orders of subgroups (of course, this includes the case of the order of an element x, since ord(x) is equal to the order of the cyclic subgroup ⟨x⟩ generated by x).

History:

The theorem in question is named after Joseph-Louis Lagrange, an Italian mathematician (1736-1813), who proved a special case of the theorem in 1770 (long before abstract group theory existed).

Theorem 10.15: (Lagrange’s Theorem)

Let G be a finite group, and H ≤ G a subgroup. Then |H| divides |G|.

The idea of the proof is to partition the set G into subsets, each with the same number of elements as H, so that |G| is just the number of these subsets times |H|.

Definition 10.16:
For (possibly infinite) groups H ≤ G, and x ∈ G, the left coset xH is the subset xH = {xh ∈ G : h ∈ H}.
Remark:
This is a subset of G, but not usually a subgroup, since the identity element e is only in xH if e = xh for some h ∈ H, in which case x = h⁻¹ and so x ∈ H. So xH is only a subgroup if x ∈ H, in which case xH = H.
Remark:
We could also define a right coset Hx in the obvious way. In general this may be different from xH, but it would make little difference to what follows if we used right cosets instead of left cosets.

The set of left cosets partitions G.

Lemma 10.17:

For any group G and subgroup H ≤ G, {xH : x ∈ G} is a partition of the set G.

Proof.

Clearly property 1 is satisfied: G = ⋃_{x∈G} xH because any x = xe ∈ xH. So we just need to check property 2, that for x, y ∈ G we have xH ∩ yH ≠ ∅ if and only if xH = yH.

Suppose xH ∩ yH ≠ ∅, and choose g ∈ xH ∩ yH. Then g = xa = yb for some a, b ∈ H, so that x = yba⁻¹. If h ∈ H then xh = y(ba⁻¹h) ∈ yH since ba⁻¹h ∈ H. So every element of xH is in yH. Similarly every element of yH is in xH, so that xH = yH.

Interest:

Given that we have a partition of the set G, we can construct an equivalence relation on the set G and say x ∼_H y if and only if there exists z such that x, y ∈ zH. With a bit of work, we can show that this is the same as saying x, y ∈ G are equivalent (with respect to/modulo H) if and only if y⁻¹x ∈ H [you can check this is an equivalence relation].

Compare this with the (infinite) group G = (Z, +) and (infinite) subgroup H = (nZ, +). We partitioned Z using nZ and defined an equivalence relation a ≡ b (mod n) if and only if a - b ∈ nZ.

Next we verify that each left coset has the same cardinality.

Lemma 10.18:

Let H ≤ G and x ∈ G. Then there is a bijection α : H → xH, so that |xH| = |H|.

Proof.

Define α by α(h) = xh. Then α is surjective, since by definition every element of xH is of the form xh = α(h) for some h ∈ H. But also α is injective, since if h, h' ∈ H then α(h) = α(h') ⟹ xh = xh' ⟹ h = h'.

Everything so far works for possibly infinite H,G, but now for finite G (and hence H) we can put everything together to prove Lagrange’s Theorem.

Proof (of Theorem 10.15).

Suppose that k is the number of distinct left cosets xH. By the two previous lemmas, the cosets partition G and each coset contains |H| elements. So the number of elements of G is k|H|.

Example:
Let G = D6 be the dihedral group of order 6 and let H = ⟨a⟩ = {e, a, a²} be the cyclic subgroup generated by a. If x ∈ H, then xH = H. If x ∉ H, so x = ba^i for some i, then xH = {ba^i, ba^(i+1), ba^(i+2)} = bH. So there are two left cosets, {e, a, a²} and {b, ba, ba²}. [In this case the right cosets are the same as the left cosets.]
Example:

Let G = D6 again, but let H = ⟨b⟩ = {e, b} be the cyclic subgroup generated by b. Then

  • eH = {e, b} = bH;

  • aH = {a, ab} = abH;

  • a²H = {a², a²b} = a²bH,

so there are three left cosets. [In this case the right cosets are different, since for example Ha = {a, ba} = {a, a²b}, which is not a left coset.]
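The coset computations in the last two examples can be verified by machine. The sketch below (the encoding is my own choice, not from the notes) stores the element a^i b^j of D6 as the pair (i, j); the relation ba = a⁻¹b gives the multiplication rule.

```python
from itertools import product

N = 3                                    # D6, the dihedral group of order 6
G = list(product(range(N), range(2)))    # (i, j) stands for a^i b^j

def mul(x, y):
    # a^i b^j * a^k b^l = a^(i + (-1)^j k) b^(j + l), since b a = a^(-1) b
    (i, j), (k, l) = x, y
    return ((i + (-1) ** j * k) % N, (j + l) % 2)

H = [(0, 0), (0, 1)]                     # the subgroup <b> = {e, b}
left = {frozenset(mul(x, h) for h in H) for x in G}
right = {frozenset(mul(h, x) for h in H) for x in G}
print(len(left), len(right))             # 3 3: three cosets of each kind
print(left == right)                     # False: left and right cosets differ
```

Counting also confirms Lagrange’s Theorem here: |G| = 6 = 3 × 2 = (number of left cosets) × |H|.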
Definition 10.19:
Let H ≤ G. Then the index |G : H| is the number (possibly infinite if G is infinite) of left cosets xH in G.
Remark:

You may have seen the word index (Latin for “pointer”) in the context of summation (e.g., ∑_{i=1}^{3} i², where i is the index). In group theory, there are many times where one might want to do ∑_{i=1}^{|G:H|}. Here, the index is telling us how many terms are in our sum.

Remark:
So the proof of Lagrange’s Theorem shows that |G|=|H||G:H| if G is finite.
Remark:
Even if G is infinite, the index |G : H| may be finite. For example, let G = (Z, +) and let H = (nZ, +) for some n ∈ Z+. Then we have seen that |Z : nZ| = n, with the left cosets being the congruence classes (mod n).
Corollary 10.20: (Lagrange for orders of elements)

Let G be a finite group with |G| = n. Then for any x ∈ G, the order of x divides n, and so x^n = e.

Proof.

By Lagrange’s Theorem, the order of the cyclic subgroup ⟨x⟩ divides n. But the order of this subgroup is just ord(x), so ord(x) divides n, and so x^n = e.

10.4 Some applications of Lagrange’s theorem

Lagrange’s Theorem gives most information about a group when the order of the group has relatively few factors, as then it puts more restrictions on possible orders of subgroups and elements.

Let’s consider the extreme case, when the order of the group is a prime p, and so the only factors are 1 and p.

Theorem 10.21:

Let p be a prime and G a group with |G|=p. Then

  1. G is cyclic.

  2. Every element of G except the identity has order p and generates G.

  3. The only subgroups of G are the trivial subgroup {e} and G itself.

Proof.

Let x ∈ G with x ≠ e. Then ord(x) divides |G| = p by Corollary 10.20, and ord(x) ≠ 1 since x ≠ e. So ord(x) = p. So the cyclic subgroup ⟨x⟩ generated by x has order p, and so must be the whole of G. This proves 1. and 2.

Let H ≤ G. Then by Theorem 10.15, |H| divides |G| = p, and so either |H| = 1, in which case H is the trivial subgroup {e}, or |H| = p, in which case H = G.

Remark:
In particular this shows that if p is prime then all groups of order p are isomorphic.
Corollary 10.22:

If p is prime and P and Q are two subgroups of a group G with |P| = p = |Q|, then either P = Q or P ∩ Q = {e}.

Proof.

If P ∩ Q ≠ {e} then choose x ∈ P ∩ Q with x ≠ e. By the previous theorem, x generates both P and Q, so P = ⟨x⟩ = Q.

Now some other simple general consequences of Lagrange’s Theorem.

Proposition 10.23:

Let G be a group and H, K two finite subgroups of G with |H| = m, |K| = n and gcd(m, n) = 1. Then H ∩ K = {e}.

Proof.

Recall that the intersection of two subgroups is itself a subgroup, so that I = H ∩ K is a subgroup both of H and of K. Since it’s a subgroup of H, Lagrange’s Theorem implies that |I| divides m = |H|. But similarly |I| divides n = |K|. So since gcd(m, n) = 1, we get |I| = 1 and so I = {e}.

Theorem 10.24:

Let G be a group with |G|=4. Then either G is cyclic or G is isomorphic to the Klein 4-group C2×C2. In particular there are just two non-isomorphic groups of order 4, both abelian.

Proof.

By Corollary 10.20 the order of any element of G divides 4, and so must be 1, 2 or 4.

If G has an element of order 4 then it is cyclic.

If not, it must have one element (the identity e) of order 1 and three elements a, b, c of order 2. So a⁻¹ = a, b⁻¹ = b and c⁻¹ = c.

Consider which element ab is. If ab = e then b = a⁻¹ = a, which is false. If ab = a then b = e, which is also false. If ab = b then a = e, which is also false. So ab = c.

Similarly ba=c, ac=b=ca and bc=a=cb, and G is isomorphic to the Klein 4-group.

We’ll finish this section with some other results about groups of small order that we won’t prove. These are easier to prove with a bit more theory, and they are all proved in the third-year Group Theory unit.

Theorem 10.25:

Let p be an odd prime. Then every group of order 2p is either cyclic or isomorphic to the dihedral group D2p.

Theorem 10.26:

Let p be a prime. Every group of order p² is either cyclic or isomorphic to Cp × Cp (and so in particular is abelian).

However there are non-abelian groups of order p³ for every prime p. The dihedral group D8 is one example, for p = 2.

Theorem 10.27:

There are five groups of order 8 up to isomorphism. Three, C8, C4×C2 and C2×C2×C2, are abelian, and two, the dihedral group D8 and another group Q8 called the quaternion group are non-abelian.

The first few orders not dealt with by the general theorems above are 12, 15, 16 and 18. It turns out that there are five non-isomorphic groups of order 12, every group of order 15 is cyclic, there are fourteen non-isomorphic groups of order 16, and five of order 18.

The number of non-isomorphic groups of order 2^n grows very quickly with n. There are 49,487,365,422 (nearly fifty billion) non-isomorphic groups of order 1024 = 2^10.

10.5 Applications to number theory

We now use some of the tools from group theory to look back at questions linked to modular arithmetic.

History:

It is worth noting that much of the following theory predates and motivated the development of abstract group theory, in particular the study of Abelian groups (from 1800). Alongside Symmetry groups and Transformation groups (which we did not explore), Abelian groups were concrete examples of groups that drove this development.

First, recall that (Z/nZ, ×) is not a group as some elements (such as [0]) do not have a multiplicative inverse (recall Proposition 10.13). Let us define the following group.

Definition 10.28:
Un is the subset {[a] : a ∈ Z and gcd(a, n) = 1} of Z/nZ.
Remark:
If gcd(a, n) = 1 then gcd(a + nt, n) = 1 for any t ∈ Z, and so it makes no difference which element of [a] we use to check the condition for [a] ∈ Un. Since every congruence class [a] contains an element a′ with 0 ≤ a′ < n, we’ll usually use these elements.
Theorem 10.29:

(Un, ×) is an abelian group.

Proof.

Suppose [a], [b] ∈ Un, so gcd(a, n) = 1 = gcd(b, n). Then gcd(ab, n) = 1 and so [ab] ∈ Un and Un is closed under multiplication.

Since ([a][b])[c] = [abc] = [a]([b][c]), multiplication on Un is associative.

[1] is an identity element, since [1][a]=[a]=[a][1] for any a.

If [a] ∈ Un, so gcd(a, n) = 1, then as + nt = 1 for some integers s, t, and gcd(s, n) = 1. So [s] ∈ Un is an inverse of [a].

So Un is a group under multiplication. It is abelian since [a][b] = [ab] = [b][a] for all a, b ∈ Z.
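The inverse produced in the proof can be computed with the extended Euclidean algorithm, exactly as in the earlier worked examples. A sketch (function name mine):

```python
def inverse_mod(a, n):
    """Find s with [s][a] = [1] in Un, using as + nt = gcd(a, n) = 1."""
    old_r, r = a, n
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r   # Euclidean algorithm on the remainders
        old_s, s = s, old_s - q * s   # running coefficient of a
    assert old_r == 1, "[a] is only invertible when gcd(a, n) = 1"
    return old_s % n

print(inverse_mod(7, 30))   # 13, since 7 * 13 = 91 = 1 (mod 30)
```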

Interest:

The notation U stands for “units”. In Algebra 2, you’ll see that a unit is an element which has a multiplicative inverse, so Un is the group of units of Z/nZ.

Similar links can be made with (R, +) and (R∖{0}, ×), or with (Z, +) and ({1, -1}, ×).

Example:
If p is a prime, then Up = {[1], [2], …, [p-1]} has order p - 1.
Example:
U4 = {[1], [3]} and U6 = {[1], [5]} both have order 2. U8 = {[1], [3], [5], [7]} and U10 = {[1], [3], [7], [9]} have order 4.

Unlike (Z/nZ,+), (Un,×) is not always cyclic (although it sometimes is).

Example:
U10 is cyclic, with generator [7]: [7]⁰ = [1], [7]¹ = [7], [7]² = [49] = [9], [7]³ = [7][9] = [63] = [3], so ⟨[7]⟩ = {[1], [3], [7], [9]} = U10.
Example:
U8 is not cyclic. [3]² = [9] = [1], [5]² = [25] = [1] and [7]² = [49] = [1], so every element has order 2 apart from [1], which has order 1. In particular there are no elements of order |U8| = 4, so the group is not cyclic, and is a Klein 4-group.
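Both examples can be checked by listing element orders. A sketch (helper names mine):

```python
from math import gcd

def U(n):
    """Representatives of Un with 0 < a < n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def order(a, n):
    """The order of [a] in Un: least k >= 1 with a^k = 1 (mod n)."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

print([order(a, 10) for a in U(10)])   # [1, 4, 4, 2]
print([order(a, 8) for a in U(8)])     # [1, 2, 2, 2]
```

U10 contains elements of order |U10| = 4 (namely [3] and [7]), so it is cyclic; U8 has no element of order 4, so it is not.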
Remark:
In fact, Un is cyclic if and only if n = 2, n = 4, n = p^r or n = 2p^r for p an odd prime and r ≥ 1, but this is beyond the scope of this unit.

We can now apply Lagrange’s theorem to Un to get an important number theory result.

Theorem 10.30: (Fermat’s Little Theorem)

Let p be a prime and a an integer not divisible by p. Then a^(p-1) ≡ 1 (mod p).

Proof.

We’ll apply Corollary 10.20 to the group G = Up. Since p is prime, Up = {[1], [2], …, [p-1]} has order p - 1, and if a is not divisible by p then [a] ∈ Up. So by Corollary 10.20, [a]^(p-1) = [1] in Up, or in other words a^(p-1) ≡ 1 (mod p).

This simplifies calculating powers of integers modulo a prime:

Example:
Suppose we want to calculate 7^100 (mod 31). The straightforward way to do this would involve multiplying by seven 99 times (although even without Fermat’s Little Theorem a more intelligent approach would use many fewer multiplications). But by Fermat’s Little Theorem with p = 31, 7^30 ≡ 1 (mod 31), and so 7^100 = (7^30)^3 × 7^10 ≡ 1^3 × 7^10 ≡ 7^10 (mod 31). So without much calculation at all we reduce the problem to calculating a tenth power instead of a hundredth power. And then, to finish, 7^2 = 49 ≡ 18 (mod 31), so 7^3 ≡ 7 × 18 = 126 ≡ 2 (mod 31) and 7^10 = (7^3)^3 × 7 ≡ 2^3 × 7 = 56 ≡ 25 (mod 31).
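Python’s built-in three-argument pow does modular exponentiation efficiently, so we can confirm the hand calculation:

```python
print(pow(7, 30, 31))    # 1, as Fermat's Little Theorem predicts
print(pow(7, 100, 31))   # 25, matching the calculation above
```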
History:

Pierre de Fermat, French lawyer and government official (1601-1665), worked on several number theory problems alongside his job. He wrote down the above theorem; however, he did not write down a proof for fear it would be too long. He certainly was not aware of the group theory tools that would have provided a short proof.

There is a generalization to powers modulo m where m is not prime, but the statement is a bit more complicated, since Um is not just {[1], [2], …, [m-1]}. So first, let’s introduce some standard notation for the order of this group.

Definition 10.31:
Euler’s phi function is the function φ from the positive integers to the positive integers, with φ(m) equal to the number of integers a such that 0 < a ≤ m and gcd(a, m) = 1.
Remark:
This is precisely the order of Um, since Um={[a]:gcd(a,m)=1}.
Example:
If p is prime, then φ(p) = p - 1.
Theorem 10.32: (Fermat-Euler Theorem)

Let m > 0 and a be integers with gcd(a, m) = 1. Then a^φ(m) ≡ 1 (mod m).

Proof.

Apply Corollary 10.20 to the group G = Um. The condition gcd(a, m) = 1 means that [a] ∈ Um, and |Um| = φ(m), so the corollary gives [a]^φ(m) = [1] in Um, or in other words a^φ(m) ≡ 1 (mod m).

History:

Leonhard Euler, Swiss mathematician (1707-1783), wrote down a proof of Fermat’s Little Theorem in 1736 (nearly 100 years after Fermat wrote down the statement). He then extended Fermat’s theorem to all cases. While he used modular arithmetic as we do today, it wasn’t until 1801 that Gauss revisited the proof and gave a shorter and simpler version using group theory ideas.

Many of the groups Un are direct products in disguise.

Proposition 10.33:

Let m, n be positive integers with gcd(m, n) = 1. Then Umn ≅ Um × Un.

Proof.

Define a map φ : Umn → Um × Un by φ([a]_mn) = ([a]_m, [a]_n) (where we use subscripts such as [a]_m to indicate a congruence class (mod m)). This makes sense, since if gcd(a, mn) = 1, then gcd(a, m) = 1 and gcd(a, n) = 1.

By Theorem 10.14, we know that given a ∈ Z/mZ and b ∈ Z/nZ, there exists x ∈ Z/mnZ such that φ(x) = (a, b). Furthermore, x is unique in Z/mnZ. Hence φ is bijective [for all ([a]_m, [b]_n) ∈ Um × Un, there exists a unique [x]_mn ∈ Umn such that φ([x]_mn) = ([a]_m, [b]_n)].

Since φ is bijective, and φ([a]_mn [a′]_mn) = ([aa′]_m, [aa′]_n) = ([a]_m, [a]_n)([a′]_m, [a′]_n) = φ([a]_mn) φ([a′]_mn), φ is an isomorphism.
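For small m and n the map in the proof can be tabulated directly. The sketch below (notation mine) checks that [a]_15 ↦ ([a]_3, [a]_5) is a bijection from U15 to U3 × U5 that respects multiplication.

```python
from math import gcd

def U(n):
    return [a for a in range(1, n) if gcd(a, n) == 1]

m, n = 3, 5
phi = {a: (a % m, a % n) for a in U(m * n)}   # [a]_mn -> ([a]_m, [a]_n)

# phi hits every element of Um x Un exactly once...
print(sorted(phi.values()) == sorted((x, y) for x in U(m) for y in U(n)))
# ...and respects multiplication coordinate-wise
print(all(phi[(a * b) % (m * n)] == ((a * b) % m, (a * b) % n)
          for a in U(m * n) for b in U(m * n)))
```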

Remark:
The conclusion of the above is not true if gcd(m, n) ≠ 1. For example, |U4| = 2, but |U16| = 8, so U16 can’t be isomorphic to U4 × U4.

We have an immediate corollary

Corollary 10.34:

If gcd(m,n)=1 then φ(mn)=φ(m)φ(n).

Example:
We calculate 7^100 (mod 30). Since 30 = 2 × 3 × 5 and gcd(2,3) = 1, gcd(6,5) = 1, we have φ(30) = φ(2)φ(3)φ(5) = 1 × 2 × 4 = 8. So as gcd(7,30) = 1, by Fermat-Euler we have 7^8 ≡ 1 (mod 30). Hence 7^100 = 7^96 × 7^4 ≡ (7^8)^12 × (7^2)^2 ≡ 1^12 × 19^2 ≡ (-11)^2 ≡ 121 ≡ 1 (mod 30).
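Euler’s phi function can be computed straight from Definition 10.31, which lets us check both the value φ(30) = 8 and the final congruence:

```python
from math import gcd

def phi(m):
    """Euler's phi function, directly from the definition."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)

print(phi(30))                      # 8
print(phi(6) * phi(5) == phi(30))   # True: multiplicativity for co-prime 6, 5
print(pow(7, 100, 30))              # 1, as computed above
```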

11 Taming infinity

In Chapter 7 we saw the definition of cardinality; to be precise, for sets A and B, we defined |A| = |B|, |A| ≤ |B| and |A| < |B|, with a focus on finite sets. In this chapter, we will see that there are different sizes of infinity.

11.1 Countable

We first notice that there are some infinite sets where we can “count” the elements.

Definition 11.1:
We say a set X is countable if there exists a bijection f : Z+ → X, or equivalently, if there is a bijection g : X → Z+.
Remark:

In essence a countable set is an infinite set where we can enumerate the elements, x_1, x_2, x_3, …, where x_i = f(i) for i ∈ Z+.

Note that some texts in the literature say that a set is countable if it is finite or if there exists a bijective function f : Z+ → X, and when there exists a bijective function f : Z+ → X, these texts say X is countably infinite. However, in this course, a countable set must be infinite (it makes some theorems easier to state).

Combining the definition of countability with the definition of cardinality, we have that X is a countable set if and only if |X|=|Z+|.
Example:

We show that the set of positive even integers is countable.

Let 2Z+ = {2x : x ∈ Z+} and define f : Z+ → 2Z+ by f(x) = 2x, for all x ∈ Z+. To see that f is injective, suppose that x, y ∈ Z+ are such that f(x) = f(y). Then 2x = 2y, so x = y, showing that f is injective.

To see that f is surjective, take a ∈ 2Z+. Then a = 2x, for some x ∈ Z+, and hence a = 2x = f(x). Hence, f is surjective. This shows that f is bijective. Therefore, 2Z+ is countable.

Similarly, the set of odd positive integers, {2x - 1 : x ∈ Z+}, can be shown to be countable.
Proposition 11.2:

Let X and Y be two sets such that |X|=|Y|. If X is countable, then Y is countable.

Proof.

Let |X| = |Y| and suppose that X is countable. Then |X| = |Z+|, so |Y| = |X| = |Z+|. Hence Y is countable.

Theorem 11.3:

Every infinite set contains a countable subset.

Proof.

Non-examinable. It is beyond the scope of this course.

Remark:

While the statement seems intuitive, proving this result is outside the scope of this course. It involves using the Axiom of Choice, which says given a countable number of non-empty sets, we can choose one element from each set.

Interestingly, some research is done into how many theorems remain true independently of the Axiom of Choice (and therefore would still be true if we did not assume the Axiom of Choice).
Proposition 11.4:

Let X be a subset of Z+. Then X is either finite or countable.

Proof.

If X is finite then we are done. So suppose X is infinite. We have an injective map g : X → Z+ given by g(x) = x, so |X| ≤ |Z+|. On the other hand, we know that X contains a countable subset A. Hence, there exists a bijective map h : Z+ → A. Now, define f : Z+ → X by f(n) = h(n), for all n ∈ Z+. Then f is an injective map from Z+ into X. Hence, |Z+| ≤ |X|. So by Theorem 7.8, we have that |X| = |Z+|. It follows that X is countable.

Example:
Since there exist infinitely many primes, the set of primes is countable.
The next result is very useful when proving that a set is countable.
Corollary 11.5:

Suppose X is an infinite set. Then X is countable if and only if there exists an injective f : X → Z+.

Proof.

We prove both directions separately.

(⇒) First, suppose that X is countable. Then there exists a bijection f : X → Z+. Since f is bijective, it is injective.

(⇐) Second, suppose there exists such an injective f : X → Z+. Then |X| = |f[X]|, and f[X] ⊆ Z+. Since X is not finite and |X| = |f[X]|, f[X] is not finite. Hence, f[X] is countable by Proposition 11.4. Therefore, by Proposition 11.2, we must have that X is countable.

Summarising, X is countable if and only if there exists an injective f : X → Z+.

Theorem 11.6:

Let X be a set and let A, B ⊆ X. Suppose that A is a countable set and that B is a nonempty finite set disjoint from A (A ∩ B = ∅). Then A ∪ B is countable.

Proof.

Since A is countable, there exists an injective map f : A → Z+. Since B is finite we have that B = {b_1, …, b_m}, for some m ∈ Z+, where |B| = m. Now, define g : A ∪ B → Z+ by g(x) = i if x = b_i, and g(x) = f(x) + m if x ∈ A. We claim that g is injective. To see this, take x, y ∈ A ∪ B such that x ≠ y. If x, y ∈ B, then x = b_i and y = b_j for some i, j ∈ Z+ such that i ≤ m, j ≤ m with i ≠ j. Hence, we have that g(x) = i ≠ j = g(y). If x ∈ B and y ∈ A, then x = b_i for some i ∈ Z+ with i ≤ m. Hence, we have that g(x) = i ≤ m < m + 1 ≤ f(y) + m = g(y). If x, y ∈ A, then f(x) ≠ f(y) since x ≠ y and f is injective. So g(x) = f(x) + m ≠ f(y) + m = g(y). It follows that g is injective.

Since A ⊆ A ∪ B and A is infinite, A ∪ B is infinite. Since g : A ∪ B → Z+ is injective and A ∪ B is infinite, A ∪ B is countable.

History:

The above proof was popularised by David Hilbert (Prussian mathematician, 1862-1943) with what is now called the “Hilbert hotel”, where there is always room for another guest:
The Hilbert hotel has countably many rooms, labelled 1,2,3, (so for each number in Z+, there is a room with that number). One night, the hotel is full, and another potential guest arrives at the hotel looking for a room. The manager says, no problem! Then the manager announces to the guests that every guest is to move to the next room (so the guests in room n move to room n+1). Thus all the guests still have rooms, and room 1 has been made available to the new arrival.

Theorem 11.7:

We have that Z+×Z+ is countable.

Proof.

First we note that Z+ × Z+ is infinite, as Z+ × {1} = {(n, 1) : n ∈ Z+} ⊆ Z+ × Z+, and Z+ × {1} is countable (using the bijection f : Z+ → Z+ × {1} defined by f(n) = (n, 1)).

We define f : Z+ × Z+ → Z+ by f((a, b)) = 2^a 3^b ∈ Z+. Suppose f((a_1, b_1)) = f((a_2, b_2)); then 2^{a_1} 3^{b_1} = 2^{a_2} 3^{b_2} = n. Note that n ≥ 2 (since a_1 ≥ 1), so by the Fundamental Theorem of Arithmetic, n is expressed uniquely as a product of primes. That is, a_1 = a_2 and b_1 = b_2, so (a_1, b_1) = (a_2, b_2). Hence f is injective and Z+ × Z+ is infinite, so Z+ × Z+ is countable.
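The injectivity of f((a, b)) = 2^a 3^b can be spot-checked on a finite corner of Z+ × Z+; by the Fundamental Theorem of Arithmetic no collisions can occur.

```python
# 400 distinct pairs should give 400 distinct values of 2^a * 3^b
values = [2 ** a * 3 ** b for a in range(1, 21) for b in range(1, 21)]
print(len(values), len(set(values)))   # 400 400: no collisions
```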

There are many proofs of this theorem. Above we presented one that uses the Fundamental Theorem of Arithmetic. Below we outline the idea of another proof that doesn’t use the Fundamental Theorem of Arithmetic. We leave the rigorous details as an exercise.

We arrange the elements of Z+ × Z+ in a grid:

(1,1) (1,2) (1,3) (1,4) …
(2,1) (2,2) (2,3) (2,4) …
(3,1) (3,2) (3,3) (3,4) …
(4,1) (4,2) (4,3) (4,4) …
  ⋮

We order the elements of this grid along the cross-diagonals: (1,1); (1,2), (2,1); (1,3), (2,2), (3,1); … So we want to construct a function f : Z+ × Z+ → Z+ such that f((1,1)) = 1, f((1,2)) = 2, f((2,1)) = 3, …. We leave it as an exercise to find a general formula for f((n, m)) and show that f is injective.
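If you want to check your answer to this exercise, one standard formula runs as follows: (n, m) sits on cross-diagonal d = n + m - 1, the earlier diagonals contain 1 + 2 + ⋯ + (d - 1) pairs, and (n, m) is the n-th pair on its own diagonal.

```python
def f(n, m):
    # position of (n, m) in the cross-diagonal ordering described above
    d = n + m - 1
    return (d - 1) * d // 2 + n

print([f(n, m) for n, m in [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]])
# [1, 2, 3, 4, 5, 6]
```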

The following lemma lists the basic properties of countable sets.

Lemma 11.8:

Suppose that X is a countable set.

  • if A ⊆ X, then A is either finite or countable.

  • if A ⊆ X and A is finite, then X∖A is countable.

  • there exists B ⊆ X such that B and X∖B are countable.

  • if f : C → X is injective, then C is either a finite set or a countable set.

Proof.

Exercise.

Using the above lemma, and the fact that Z+×Z+ is countable, we can deduce:

Corollary 11.9:

We have that Q+, Q and Z are all countable. In particular |Z+|=|Z|=|Q|.

Corollary 11.10:

Let X and Y be countable sets. Then

  • X×Y is countable.

  • if X ∩ Y = ∅, then X ∪ Y is countable.

Proof.

Exercise.

Note that by repeated application of the above corollary, we can see that for countable X and for any n ∈ Z+, we have that X^n = X × X × ⋯ × X (the Cartesian product of n copies of X) is countable.

Corollary 11.11:

Let {A_n : n ∈ Z+} be a (countable) collection of countable sets that are pairwise disjoint (i.e. A_i ∩ A_j = ∅ for all i ≠ j). Then ⋃_{n∈Z+} A_n is countable.

Proof.

First, note that since A_1 is countable, it is infinite. Since A_1 ⊆ ⋃_{n∈Z+} A_n, we know that ⋃_{n∈Z+} A_n is also infinite. We now construct an injective g : ⋃_{n∈Z+} A_n → Z+ × Z+.

For each n ∈ Z+, enumerate the elements of A_n as a_{n,1}, a_{n,2}, a_{n,3}, …. Now, define g : ⋃_{n=1}^∞ A_n → Z+ × Z+ by g(a_{m,k}) = (m, k), for all a_{m,k} ∈ ⋃_{n=1}^∞ A_n. To see that g is injective, suppose that g(a_{m,k}) = g(a_{s,t}) for some m, k, s, t ∈ Z+. Then (m, k) = (s, t), so m = s, k = t, and hence a_{m,k} = a_{s,t}. It follows that g is injective. Hence, we have an injective function from ⋃_{n=1}^∞ A_n into a countable set. Since ⋃_{n=1}^∞ A_n is infinite, it follows that ⋃_{n=1}^∞ A_n is countable.

Remark:
The result of Corollary 11.11 still holds in the case when {A_n : n ∈ Z+} is a (countable) collection of countable sets which are not pairwise disjoint. In this case, one shows that there exists a (countable) collection {B_n : n ∈ Z+} of countable sets which are pairwise disjoint such that ⋃_{n∈Z+} B_n = ⋃_{n∈Z+} A_n.

11.2 Uncountable

Having explored the countable sets, we look at whether there exist sets that are not finite and not countable.

Definition 11.12:
A set X is uncountable if it is infinite but not countable.
Remark:

An uncountable set is one where we can not enumerate its elements.

We note that the cardinality of Z+ separates finite, countable and uncountable sets. Indeed we leave it as an exercise to show that:

  • X is finite if and only if |X|<|Z+|;

  • X is countable if and only if |X|=|Z+|;

  • X is uncountable if and only if |X|>|Z+|.

We will prove that the set (0,1) = {x ∈ R : 0 < x < 1} is uncountable. First we need to set up some notation. Let us assume that every real number between 0 and 1 has a decimal expansion of the form 0.a_1 a_2 a_3 … = ∑_{k∈Z+} a_k 10^{-k}, with 0 ≤ a_k ≤ 9 for each k ∈ Z+.

Note however that this representation is not unique. Indeed, we have 0.99999… = 1, and in general let 0 < α < 1 have decimal expansion 0.a_1 a_2 a_3 … a_N 99999…; that is, there exists N ∈ Z+ such that a_N ≠ 9 and a_n = 9 for all n > N. Then, using the result ∑_{k∈Z+} 10^{-k} = (1/10)/(1 - 1/10) = 1/9, we have: 0.a_1 a_2 a_3 … a_N 999… = ∑_{k∈Z+} a_k 10^{-k} = ∑_{k=1}^{N} a_k 10^{-k} + ∑_{k>N} 9 · 10^{-k} = ∑_{k=1}^{N} a_k 10^{-k} + 9 · 10^{-N} ∑_{k∈Z+} 10^{-k} = ∑_{k=1}^{N} a_k 10^{-k} + 10^{-N} = 0.a_1 a_2 a_3 … a_{N-1} b_N, where b_N = a_N + 1.

Therefore, we will take for granted that every α ∈ R such that 0 < α < 1 can be uniquely expressed as 0.a_1 a_2 a_3 … with 0 ≤ a_k ≤ 9 and ¬(∃N ∈ Z+ so that ∀n ∈ Z+ with n > N, a_n = 9) (i.e., ∀N ∈ Z+, ∃n ∈ Z+ with n > N so that a_n ≠ 9).

Theorem 11.13:

The interval (0,1)={xR:0<x<1} is uncountable.

Proof.

We know the interval (0,1) is infinite, since f : Z+ → (0,1) defined by f(k) = 10^{-k} is easily shown to be injective.

For the sake of contradiction, suppose (0,1) is countable. Thus we can enumerate the elements of (0,1) as α_1, α_2, α_3, …. Write each α_k as a decimal expansion as described above: α_k = 0.a_{k1} a_{k2} a_{k3} …, where 0 ≤ a_{ki} ≤ 9 and the expansion does not end in an infinite sequence of 9s. For each k ∈ Z+, set b_k = 1 if a_{kk} ≠ 1, and b_k = 2 if a_{kk} = 1. Set β = 0.b_1 b_2 b_3 …. Thus β ∈ R with 0 < β < 1, and since every digit of β is 1 or 2, its expansion does not end in an infinite sequence of 9s. Hence by assumption, β = α_m for some m ∈ Z+. But b_m ≠ a_{mm}, contradicting the uniqueness of the representation of β as a decimal expansion not ending in an infinite sequence of 9s. Thus the assumption that the interval (0,1) is countable leads to a contradiction, so (0,1) must be uncountable.
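The diagonal construction can be mimicked on any finite list of digit strings: build a string whose k-th digit differs from the k-th digit of the k-th string, exactly as in the proof.

```python
def diagonal(expansions):
    """Return a digit string differing from the k-th string in digit k."""
    return "".join("1" if row[k] != "1" else "2"
                   for k, row in enumerate(expansions))

alphas = ["500000", "111111", "718281", "999999", "123456", "101010"]
beta = diagonal(alphas)
print(beta)   # 121111: differs from every alpha in the diagonal digit
```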

History:

The above theorem is sometimes referred to as Cantor’s diagonalisation argument. While it was not the first proof that Cantor published to show (0,1) is uncountable, it showcases a technique that was used subsequently in many other proofs by other mathematicians.

Corollary 11.14:

We have that R is uncountable.

Proof.

Since (0,1) ⊆ R, we have |(0,1)| ≤ |R|. Since (0,1) is uncountable, we have |Z+| < |(0,1)| ≤ |R|. Hence R is uncountable.

While we have shown |(0,1)| ≤ |R|, the following theorem shows that in fact |(0,1)| = |R|.

Theorem 11.15:

There is a bijection between the interval (0,1) and R.

Proof.

Exercise.

In a way, this highlights another difference between R, Q and Z. In R we can have a bounded subset which is uncountable. However, in Q a bounded subset is either finite or countable, while in Z a bounded subset must be finite.

Recall that we have shown that given any a, b, c, d ∈ R such that a < b and c < d, there exists a bijection f : (a,b) → (c,d). In particular, if we take c = 0, d = 1 we have that |(0,1)| = |(a,b)| for all a, b ∈ R such that a < b. No matter how “small” (length-wise) we take (a,b) to be, as a set it is extremely large, i.e. uncountable. If we combine this with the fact that if X is countable then X^n is countable for all n ∈ Z+, we have strange results like “for any n ∈ Z+, |Q^n| < |(0, 1/n)|”.

Interest:

Since we have shown |Z| < |R|, a natural question is “does there exist a set A such that |Z| < |A| < |R|?”. This was one of the 23 problems set by Hilbert at the turn of the 20th century, and is known as the continuum hypothesis. It turns out that this is a subtle question without a yes/no answer: Gödel and Cohen showed that the two statements “the continuum hypothesis is true” and “the continuum hypothesis is false” are both consistent with the standard (ZFC) axioms of mathematics.

11.3 Power sets

We finish this section by showing that there are infinitely many different types of infinities.

Definition 11.16:
Let A be a set. We define the power set of A, denoted by P(A), to be the set P(A) = {C : C ⊆ A}.
Example:

We have the following examples:

  • P(∅) = {∅}, so |P(∅)| = 1.

  • P({1,2}) = {∅, {1}, {2}, {1,2}}, so |P({1,2})| = 4 = 2².

  • P({1,2,3}) = {∅, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}}. So |P({1,2,3})| = 8 = 2³.

Note that, for any nonempty set X, we know that ∅ and X are distinct subsets of X. Hence, we have that |P(X)| ≥ 2. We have the following two results, whose proofs are left as an exercise.

Lemma 11.17:

Suppose that A is a finite set with |A| = n for some n ∈ Z_{≥0}. Then |P(A)| = 2^n.

Proof.

Exercise.
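For finite sets the power set, and hence the count 2^n, can be generated directly (a sketch using Python’s itertools):

```python
from itertools import combinations

def power_set(A):
    """All subsets of A, as a list of frozensets."""
    A = list(A)
    return [frozenset(c) for r in range(len(A) + 1)
            for c in combinations(A, r)]

print([len(power_set(A)) for A in ([], [1, 2], [1, 2, 3])])   # [1, 4, 8]
```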

Proposition 11.18:

Let A,B be sets. Then

  • A ⊆ B if and only if P(A) ⊆ P(B).

  • P(A) ∪ P(B) ⊆ P(A ∪ B).

  • P(A) ∩ P(B) = P(A ∩ B).

Proof.

Exercise.

Theorem 11.19:

Let X be a set. Then |X|<|P(X)|.

Proof.

Suppose X = ∅. Then |X| = 0 < 1 = |P(X)|.

So suppose X is nonempty, and define f : X → P(X) by f(x) = {x}, for all x ∈ X. We show that f is injective. Let x_1, x_2 ∈ X be such that f(x_1) = f(x_2). Then {x_1} = {x_2}. Hence, we have that x_1 = x_2. Therefore, f is injective, so |X| ≤ |P(X)|.

Next, we use contradiction to show that there exists no bijection between X and P(X). Suppose there exists such a bijection g : X → P(X). Note that for every x ∈ X, g(x) ∈ P(X) means g(x) ⊆ X. Define A = {x ∈ X : x ∉ g(x)}. Then A is a subset of X, so A ∈ P(X). Furthermore, since g is bijective, there exists some z ∈ X such that g(z) = A. By the definition of A, we have z ∈ A if and only if z ∉ g(z) = A, which is a contradiction (namely z ∈ A ⟺ z ∉ A). Therefore, our assumption that there exists a bijection g : X → P(X) is false. So |X| ≠ |P(X)|.

It follows that |X|<|P(X)|.
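For a small finite set the argument can even be checked exhaustively: for every one of the 8³ = 512 functions g : X → P(X), the diagonal set A = {x ∈ X : x ∉ g(x)} is missed by g.

```python
from itertools import combinations, product

X = [0, 1, 2]
subsets = [frozenset(c) for r in range(len(X) + 1)
           for c in combinations(X, r)]        # P(X) has 2^3 = 8 elements

missed = True
for choice in product(subsets, repeat=len(X)):  # every g : X -> P(X)
    g = dict(zip(X, choice))
    A = frozenset(x for x in X if x not in g[x])
    if A in choice:                             # would mean A = g(z) for some z
        missed = False
print(missed)   # True: no g is surjective, so |X| < |P(X)|
```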

History:

The above theorem is sometimes known as Cantor’s theorem. For a finite set the result is clear, and Cantor re-used the diagonalisation method to prove the above for infinite sets.

Corollary 11.20:

We have:

  • P(Z+) is uncountable.

  • |P(R)| > |R|, i.e. there are different types of uncountability.

We can extend our last bullet point to see that |R| < |P(R)| < |P(P(R))| < |P(P(P(R)))| < ⋯, i.e. there are infinitely many different types of infinities.