1 Introduction

A mathematical theory is not to be considered complete until you have made it so clear that you can explain it to the first man whom you meet on the street. – David Hilbert –

Proofs are at the heart of mathematics, distinguishing what we know is mathematically true (or untrue) from what we still don't know. The first aim of this course is to see different types of proofs and mathematical reasoning, and to learn how to make your own mathematical arguments. Reading proofs will develop our critical thinking, e.g., how does this sentence follow from the previous one; how have we used all our assumptions; what happens if we tweak our assumptions; etc. Writing proofs will develop our creative and communication skills, e.g., how do we put together various ideas to come up with a new one; how do we explain our argument to someone else; etc.

As we cannot study mathematical arguments without mathematical content, the second aim of this course is to give you a strong foundation in pure mathematics - a very broad branch of mathematics.

These notes are based on the notes from a previous Introduction to Proofs and Group Theory course (Steffi Zegowitz; Lynne Walling; Jos Gunns; Jeremy Rickard; John Mackay), and the early chapters from a previous Analysis course (Thomas Jordan; Ivor McGillivray; Oleksiy Klurman).

1.1 How to use these notes

These notes are colour coded to help you identify the important bits.

Definition 1.1:
Definitions and Notations will be with this background colour. It is important to know and understand them in order to follow the course.
Theorem 1.2:

Results from the course, which will either be theorems, lemmas, propositions or corollaries, will have this background colour. It is important to know them so as to be able to apply them to different situations.

Proof.

After most results, there will be a proof with this background colour. It is important to understand each proof as similar techniques can be applied in different mathematical situations. Note that many proofs are left as exercises so that one can practice coming up with and writing proofs.

Remark:
Remarks and proof techniques will have this background colour. They are statements that are noteworthy.
Example:
Examples will not have a background colour, however there is a line so that one can tell when an example starts and when it finishes. When possible, lectures and problem classes will have different examples from the notes. This is so that students can be shown a wider range of examples.
Interest:

Interest, History and Etymological (origin of words) notes will not have a background colour, however there is a line to show the start and the end of the note. These notes are there for general interest and are not examinable content.

Most of the Etymology notes come from the book "The Words of Mathematics" by Steven Schwartzman (The Mathematical Association of America, 1994).

Most of the history of the development of Group Theory comes from the article “The Evolution of Group Theory: A Brief Survey” by Israel Kleiner (Mathematics Magazine, Vol 59, No 4, 192-215, 1986).

2 The building blocks of pure mathematics - sets and logic

To be able to prove mathematical results, we need two ingredients - the setting in which we are doing maths; and the logical reasoning we use to do maths.

2.1 Sets

Before we can start to do maths, we need to know the setting in which we are doing maths. For this reason, sets can be seen as the building block of all maths.

Definition 2.1:
A set is a collection of objects, where we ignore repeated elements and the order they appear in.
Notation:

We use curly brackets { and } to denote sets.

We use the symbol ∈ to say an element is in a set. We draw a line through it to negate it, i.e. we use ∉ to say an element is not in a set.

We use the symbol < to mean "(strictly) less than" and ≤ to mean "less than or equal to". Similarly we use > to mean "greater than" and ≥ to mean "greater than or equal to".

Example:

The set of trigonometric functions: {sin(x),cos(x),tan(x)}={cos(x),tan(x),cos(x),sin(x)}.

The set of integers between 2 and 6: {2,3,4,5,6}={6,4,2,2,3,4,6,5}.

Since 2≤6 and 6≤6, we have 6∈{2,3,4,5,6}, however 15∉{2,3,4,5,6} as 15>6.
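As an informal aside (Python is not part of these notes), the behaviour in Definition 2.1 is exactly how Python's built-in set type works: repeats and ordering are ignored, and membership tests mirror the symbols ∈ and ∉.

```python
# Sets ignore repeated elements and the order they appear in (Definition 2.1).
a = {2, 3, 4, 5, 6}
b = {6, 4, 2, 2, 3, 4, 6, 5}  # repeats and a different order
assert a == b

# Membership tests mirror "is an element of" / "is not an element of".
assert 6 in a        # 6 is an element, since 2 <= 6 and 6 <= 6
assert 15 not in a   # 15 is not, since 15 > 6
```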
In mathematics, there are certain sets which are used so often that we have abbreviated notations for them. We start with two of them and we’ll build up to see more.
Notation:

  • If a set does not have any elements, we call it the empty set and use ∅ (or { }).

  • Z is the set of integers, that is Z={…,−3,−2,−1,0,1,2,3,…}.

Etymology:

The symbol Z comes from the German "Zahlen" - which means numbers. It was first used by David Hilbert (German mathematician, 1862 - 1943), and popularised in Europe by Nicolas Bourbaki (a collective of mainly French mathematicians, 1935 - ) in their 1947 book "Algèbre". Integer comes from the Latin in, which means "not", and tangere, which means "to touch". An integer is a number that has been "untouched", i.e. "intact" or "whole".

The set of integers is equipped with the operations of addition + and multiplication ⋅ that satisfy the following 10 arithmetic properties (5 relating to addition; a Distributive Law; and 4 relating to multiplication):

(A1) - Closure under addition For all x,y∈Z we have x+y∈Z.

(A2) - Associativity under addition For all x,y,z∈Z we have x+(y+z)=(x+y)+z.

(A3) - Commutativity of addition For all x,y∈Z we have x+y=y+x.

(A4) - Additive identity For all x∈Z we have x+0=x.

(A5) - Additive inverse For all x∈Z we have −x∈Z and x+(−x)=0.

(A6) - Distributive Law For all x,y,z∈Z we have x⋅(y+z)=x⋅y+x⋅z.

(A7) - Closure under multiplication For all x,y∈Z we have x⋅y∈Z.

(A8) - Associativity under multiplication For all x,y,z∈Z we have x⋅(y⋅z)=(x⋅y)⋅z.

(A9) - Commutativity of multiplication For all x,y∈Z we have x⋅y=y⋅x.

(A10) - Multiplicative identity For all x∈Z we have 1⋅x=x.

Formally speaking, this is saying that Z with + and ⋅ is a ring. The notion of a ring is explored in more detail in later units such as the second year unit Algebra 2. Properties (A1) to (A5) tell us that Z with + is an abelian group. We will explore the notion of groups and abelian groups later in this unit.

On top of these 10 arithmetic properties, Z is ordered, i.e., it comes with 4 order properties:

(O1) - trichotomy For all x,y∈Z either x<y, x=y or x>y.

(O2) - transitivity For all x,y,z∈Z, if x<y and y<z then x<z.

(O3) - compatibility with addition For all x,y,z∈Z, if x<y then x+z<y+z.

(O4) - compatibility with multiplication For all x,y,z∈Z, if x<y and z>0 then zx<zy.
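The arithmetic and order properties can be spot-checked mechanically on a small sample of integers. This is only an illustration, not a proof, since the axioms quantify over all of Z; the sample range below is an arbitrary choice.

```python
from itertools import product

# Spot-check (A1)-(A10) and (O1)-(O4) on a small finite sample of integers.
sample = range(-4, 5)
for x, y, z in product(sample, repeat=3):
    assert isinstance(x + y, int)              # (A1) closure under +
    assert x + (y + z) == (x + y) + z          # (A2) associativity of +
    assert x + y == y + x                      # (A3) commutativity of +
    assert x + 0 == x                          # (A4) additive identity
    assert x + (-x) == 0                       # (A5) additive inverse
    assert x * (y + z) == x * y + x * z        # (A6) distributive law
    assert isinstance(x * y, int)              # (A7) closure under *
    assert x * (y * z) == (x * y) * z          # (A8) associativity of *
    assert x * y == y * x                      # (A9) commutativity of *
    assert 1 * x == x                          # (A10) multiplicative identity

    assert (x < y) + (x == y) + (x > y) == 1   # (O1) exactly one holds
    if x < y and y < z:
        assert x < z                           # (O2) transitivity
    if x < y:
        assert x + z < y + z                   # (O3) compatibility with +
    if x < y and z > 0:
        assert z * x < z * y                   # (O4) compatibility with *
```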

Etymology:

There are many words above that seem to have come from nowhere, but can be related back to words used in everyday English.

Associative comes from the Latin ad meaning “to” and socius meaning “partner, companion”. An associate is someone who is a companion to you. The property x+(y+z)=(x+y)+z shows that it doesn’t matter who y “keeps company with”, the result is still the same.

Commutative comes from the Latin co meaning "with" and mutare meaning "to move". To commute is "to change, to exchange, to move". In everyday situations, a commute is the journey (i.e., moving) from home to work. The property x+y=y+x shows that we can move/exchange x and y and the result is the same.

Identity comes from the Latin idem meaning “same”. The additive identity is the element which keeps other elements the same when added to it. The multiplicative identity is the element which keeps other elements the same when multiplied by it.

Inverse comes from in and vertere which is the verb “to turn”. The additive inverse of x is the quantity that turns back “adding x”.

Trichotomy comes from the Greek trikha meaning “in three parts” and temnein meaning “to cut”. If you pick xZ, then you can cut Z into three parts, the integers less than x, the integers equal to x and the integers greater than x.

Transitive comes from the Latin trans meaning “across, beyond” and the verb itus/ire meaning “to go”. Knowing x<y and y<z allows us to go across/beyond y to conclude x<z.

As seen from the properties above, we often need to quantify objects in mathematics, that is we need to distinguish between a criterion always being met ("for all"), and the existence of a case where a criterion is met ("there exists"). Sometimes we also need to distinguish whether there is a unique case where a criterion is met.

Notation:

We have the following symbolic notation:

  • The symbol ∀ denotes for all, or equivalently, for every.

  • The symbol ∃ denotes there exists.

  • We use ∃! to denote there exists a unique, or equivalently there exists one and only one.

A note on the usage of these symbols. Often we use these symbols as a shortcut when discussing maths verbally and writing down ideas (for example: in lectures; discussing mathematics with colleagues). However, in formal text (for example: lecture notes; articles submitted to journals), we often avoid these symbols and use words. In formal text, these symbols tend to be reserved for when doing formal logic (which we will see later) or within set notation (which we will see below).

Proof techniques:
To show that something is unique, we first show that one such case exists, and then proceed to show that if another case exists, it is equal to the first case.

We can use Z as a starting point to construct different sets.

Notation:

Within the curly brackets of a set, we use a colon, :, to mean "such that".

Example:

Returning to the previous example, the set of integers between 2 and 6 can be written as {x∈Z : 2≤x≤6}.

Notation:

  • We use + to denote the positive numbers in a set. I.e., Z+={x∈Z : x>0}={1,2,3,…} denotes the set of positive integers.

  • Similarly, Z−={x∈Z : x<0}={−1,−2,−3,…} denotes the set of negative integers.

  • We denote the set of non-negative integers by Z≥0={x∈Z : x≥0}={0,1,2,…}.

  • If n∈Z, we write nZ={nx : x∈Z}={y∈Z : ∃x∈Z with y=xn}={…,−3n,−2n,−n,0,n,2n,3n,…}.
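Set-builder notation translates directly into set comprehensions, which may help make the notation above concrete (an informal aside; a finite range stands in for the infinite set Z):

```python
# {x in Z : condition} becomes a comprehension over a finite stand-in for Z.
Z = range(-12, 13)

Z_plus = {x for x in Z if x > 0}       # the positive integers
Z_geq0 = {x for x in Z if x >= 0}      # the non-negative integers
three_Z = {3 * x for x in Z}           # nZ with n = 3: multiples of 3

assert {x for x in Z if 2 <= x <= 6} == {2, 3, 4, 5, 6}
assert 9 in three_Z and 10 not in three_Z
assert Z_geq0 == Z_plus | {0}
```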

You may have also heard of the natural numbers, denoted N. However, some sources consider 0 to be a natural number (so N=Z≥0) and others consider 0 not to be a natural number (so N=Z+). To avoid any confusion (and because we will need Z≥0 sometimes and Z+ at other times), we will not be using N in this course.

Notation:

Q is the set of rational numbers, that is Q={a/b : a∈Z, b∈Z+}.

We will later see how Q can be constructed from Z and how this leads to a "natural" way to write each rational number.

Etymology:

The symbol Q stands for the word "quotient", which is Latin for "how often/how many", i.e., the quotient a/b is "how many times does b fit in a". Surprisingly, the word "rational" to describe some numbers came after the use of "irrational" numbers. Ratio is Latin for "thinking/reasoning". When the Pythagorean school in Ancient Greece realised some numbers could not be expressed as the quotient of two whole numbers (such as √2, which we will prove later), they called those "irrational", i.e. numbers that should not be thought about. "Rational" numbers were numbers that were not "irrational", i.e., one could think about them.

We extend the operations of addition and multiplication as well as the order relation for the integers to the rational numbers. Let a/b, c/d∈Q, then:

a/b + c/d = (ad+bc)/(bd) and (a/b)⋅(c/d) = (ac)/(bd). Similarly to Z, we have that Q with + and ⋅ satisfies the properties (A1) to (A10) as well as (O1) to (O4). It also satisfies the extra arithmetic property:

(A11) - multiplicative inverse For all x∈Q with x≠0, we have x⁻¹=1/x∈Q and x⁻¹⋅x=1.

Notice that (A11) is similar to (A5) but for multiplication. As we will see later in the course, another way of saying (A7) to (A11) is that Q without 0 under ⋅ is an abelian group. As you will see in Linear Algebra, the arithmetic properties of Q ((A1) to (A11)) come from the fact that Q is a field.

Similarly to Z, using Q we can construct the sets Q+, Q−, Q≥0 etc.
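The addition and multiplication formulas for rationals, and the multiplicative inverse property (A11), match exact rational arithmetic in Python's standard fractions module. A quick informal check (the values of a, b, c, d are arbitrary choices):

```python
from fractions import Fraction

# a/b + c/d = (ad+bc)/(bd) and (a/b)(c/d) = (ac)/(bd), checked exactly.
a, b, c, d = 2, 3, -5, 7
x, y = Fraction(a, b), Fraction(c, d)

assert x + y == Fraction(a * d + b * c, b * d)  # addition formula
assert x * y == Fraction(a * c, b * d)          # multiplication formula
assert (1 / x) * x == 1                         # (A11): every x != 0 has an inverse
```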

2.2 Truth table

Now that we have some objects to work with, we want to know what we can do with them. In a mathematical system, statements are either true or false, but not both at once. We sometimes say a statement P holds to mean it is true. The label ‘true’ or ‘false’ associated to a given statement is its truth value.

Example:
The statement "4∈Z" holds and the statement "1/2∈Z" is false. However, the statement "x∈Z" could be true or false depending on the value of x, but it cannot be both true and false at the same time.

Definitions (e.g., the definition of Z) and axioms (e.g., (A1)-(A11)) are statements we take to be true, while propositions, theorems and lemmas are statements that we want to prove are true and often consist of smaller statements linked together. While we often don't write statements symbolically, looking at truth tables and statements helps us understand the fundamentals of how a proof works. We first introduce the four building blocks of statements.

Definition 2.2:

We use the symbol ¬ to mean not. The truth table below shows the value ¬P takes depending on the truth value of P.

P ¬P
T F
F T

We will see concrete examples of how to negate statements later in this chapter.

Definition 2.3:

We use the symbol ∧ to mean and. Let P and Q be two statements; we have that P∧Q is true exactly when both P and Q are true. The corresponding truth table is as follows:

P Q P∧Q
T T T
T F F
F T F
F F F
Example:
Let x∈Z, P be the statement x≥5 and Q be the statement x≤10. Then P∧Q is the statement x≥5 and x≤10, i.e., 5≤x≤10.
Definition 2.4:

We use the symbol ∨ to mean or. Let P and Q be two statements; we have that P∨Q is true exactly when at least one of P or Q is true. The corresponding truth table is as follows.

P Q P∨Q
T T T
T F T
F T T
F F F
Example:
Let x∈Z, P be the statement x≤5 and Q be the statement x≥10. Then P∨Q is the statement x≤5 or x≥10. (Do not write 5≥x≥10 as this makes no sense since 5<10.)
Definition 2.5:

We use the symbol ⇒ to mean implies. Let P and Q be two statements. "P⇒Q" is the same as "If P, then Q", or "for P we have Q". The corresponding truth table is as follows.

P Q P⇒Q
T T T
T F F
F T T
F F T

The above truth table can seem confusing, but consider the following example.

Example:

Recall (A1) - "For all x,y∈Z we have x+y∈Z". Let P be the statement x,y∈Z and Q be the statement x+y∈Z, then (A1) can be written symbolically as ∀x,y, P⇒Q. This statement is true, regardless of what the values of x and y are. But let us look at the truth values of P and Q with different x and y.

x y P Q
0 1 T T
0 3/2 F F
1/2 1 F F
1/2 3/2 F T
But no matter what x and y we pick, we will not get that P is true while Q is false.
Proof techniques:

Many theorems are of the type "If P then Q". A common method to prove such a statement is to start with the assumption P (i.e., assume P is true) and use logical steps to arrive at Q. These are often referred to as "direct proofs".

The above example also shows that you cannot start a proof with what you want to prove, as you could start with something false and end up with something true.
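The four truth tables above can be tabulated mechanically. As an informal aside, here is a sketch that reproduces them by listing all truth assignments; the helper function implies encodes the table of Definition 2.5 (false only when P is true and Q is false):

```python
from itertools import product

def implies(p, q):
    # Material implication: P => Q is false only when P is true and Q false.
    return (not p) or q

# Reproduce the truth tables of Definitions 2.2-2.5, row by row.
rows = list(product([True, False], repeat=2))
for p, q in rows:
    print(p, q, not p, p and q, p or q, implies(p, q))

# The implication column reads T, F, T, T: false only in the second row.
assert [implies(p, q) for p, q in rows] == [True, False, True, True]
```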

Turning back to sets, we look at examples of direct proofs.

Definition 2.6:

A set A is a subset of a set B, denoted by A⊆B, if every element of A is also an element of B (Symbolically: ∀x∈A, we have x∈B).

We write A⊈B when A is not a subset of B, so there is at least one element of A which is not an element of B (Symbolically: ∃x∈A such that x∉B).

A set A is a proper subset of a set B, denoted by A⊊B, if A is a subset of B but A≠B.
Remark:
We have that ∅ is a subset of every set. We also have that ∅ is the only subset of ∅.
Example:

Let A=4Z={4n : n∈Z} and B=2Z={2n : n∈Z}. We will prove A⊆B using a direct proof. [If we let P be the statement x∈A and Q be the statement x∈B, note that "∀x∈A we have x∈B" translates symbolically to ∀x, P⇒Q.]

Let x∈A [i.e., suppose P is true]. Then there exists n∈Z such that x=4n. Hence, x=4n=2(2n)=2m, for some m∈Z, i.e. there exists m∈Z such that x=2m. Hence, x∈B [i.e., Q is true]. Since this argument is true for any x∈A, we have that for all x∈A, x∈B, hence A⊆B.

We will now prove that B⊈A by showing there is an element of B which is not an element of A. Take x=10. Then x∈B [as x=5⋅2] but x∉A [as x=(5/2)⋅4 but 5/2∉Z].

Combining the two statements above, it follows that A⊊B.
Proof techniques:

To prove something is true for all x∈X, we "let x∈X" or "suppose x∈X" with no further conditions. Whatever we conclude about x is true for all x∈X. This is the technique we used in the first part of the example above.

Suppose A and B are sets. Showing that A=B is the same as showing that A⊆B and B⊆A.

Example:

We show that 2Q=Q, by showing 2Q⊆Q and Q⊆2Q.

Let x∈2Q. Then there exists y∈Q such that x=2y. By (A7), since 2,y∈Q, we have x=2y∈Q. As this is true for all x∈2Q we have 2Q⊆Q.

Let x∈Q. Let y=x/2. Note that 1/2∈Q so by (A7), since 1/2,x∈Q, we have y=x/2∈Q. Hence, there exists y∈Q such that x=2y, so x∈2Q. As this is true for all x∈Q we have Q⊆2Q.

We can combine the basic symbols together to make more complicated statements, and use truth tables to determine their truth values based on the truth values of P and Q.

Example:

Let P and Q be two statements. The corresponding truth table for (¬P)∨Q is as follows.

P Q ¬P (¬P)∨Q
T T F T
T F F F
F T T T
F F T T
Example:

Let P and Q be two statements. The corresponding truth table for (¬Q)⇒(¬P) is as follows.

P Q ¬P ¬Q (¬Q)⇒(¬P)
T T F F T
T F F T F
F T T F T
F F T T T

2.3 Logical Equivalence

As well as combining statements together to make new statements, we also want to know whether two statements are equivalent, that is, whether they always have the same truth value.

Definition 2.7:

We use the symbol ⇔ to mean if and only if. For two statements P and Q, "P⇔Q" means P⇒Q and Q⇒P. In this case we say "P and Q are equivalent". The corresponding truth table is as follows.

P Q P⇔Q
T T T
T F F
F T F
F F T

If we take two statements which are logically equivalent, say P is equivalent to Q, then proving P to be true is equivalent to proving Q to be true. Similarly, proving Q to be true is equivalent to proving P to be true. We can use truth tables to prove whether two (abstract) statements are equivalent. This will prove to be useful later on when we turn a statement we want to prove is true into another equivalent statement that may be easier to prove.

Theorem 2.8:

Let P and Q be two statements.

  • P⇒Q is equivalent to (¬Q)⇒(¬P).

  • P⇒Q is equivalent to (¬P)∨Q.

Proof.

Using the last two examples and the truth table for P⇒Q, we have the following truth table.

P Q P⇒Q (¬Q)⇒(¬P) (¬P)∨Q
T T T T T
T F F F F
F T T T T
F F T T T

Hence, for all truth values of P and Q, we have that P⇒Q and (¬Q)⇒(¬P) have the same truth values. Therefore, P⇒Q is equivalent to (¬Q)⇒(¬P).

Similarly, for all truth values of P and Q, we have that P⇒Q and (¬P)∨Q have the same truth values. Therefore, P⇒Q is equivalent to (¬P)∨Q.

Proof techniques:

Notice that the above proof has several sentences to explain to the reader what is going on. It is made up of full sentences with a clear conclusion on what the calculation (in this case the truth table) shows. A proof should communicate clearly to the reader why the statement (be that a theorem, proposition or lemma) is true.

How much detail you put in a proof will be influenced by who your target audience is - this is a skill you will develop over your time as a student.
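The two equivalences of Theorem 2.8 can also be confirmed by brute force over all four truth assignments (an informal check mirroring the truth-table proof, not a replacement for it):

```python
from itertools import product

def implies(p, q):
    # Material implication, as in the truth table for "P implies Q".
    return (not p) or q

# Theorem 2.8: P => Q is equivalent to its contrapositive, and to (not P) or Q.
for p, q in product([True, False], repeat=2):
    assert implies(p, q) == implies(not q, not p)  # contrapositive
    assert implies(p, q) == ((not p) or q)
```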

We leave the following proposition as an exercise.
Proposition 2.9:

Suppose P and Q are two statements. Then:

  • P⇔(¬(¬P)).

  • (P∨Q)⇔((¬P)⇒Q).

Proof.

Exercise.

The next proposition shows that ∧ and ∨ are associative, that is, P∧Q∧R and P∨Q∨R are statements that are clear without parentheses (and therefore do not require parentheses).

Proposition 2.10:

Suppose P,Q,R are statements.

  1. ((P∧Q)∧R)⇔(P∧(Q∧R)).

  2. ((P∨Q)∨R)⇔(P∨(Q∨R)).

Proof.

We prove part a. and leave the proof of part b. as an exercise. We have the following truth table.

P Q R P∧Q (P∧Q)∧R Q∧R P∧(Q∧R)
T T T T T T T
T T F T F F F
T F T F F F F
T F F F F F F
F T T F F T F
F T F F F F F
F F T F F F F
F F F F F F F
Hence, for any truth values of P,Q,R, the truth table shows that ((P∧Q)∧R)⇔(P∧(Q∧R)).

Proposition 2.11:

Let P,Q,R be statements. Then P⇒(Q⇒R) and (P⇒Q)⇒R are not equivalent.

Proof.

We have the following truth table.

P Q R Q⇒R P⇒(Q⇒R) P⇒Q (P⇒Q)⇒R
T T T T T T T
T T F F F T F
T F T T T F T
T F F T T F T
F T T T T T T
F T F F T T F
F F T T T T T
F F F T T T F

From the above truth table, we can see that when P, Q and R are all false, the truth value of P⇒(Q⇒R) (which is true) is different from the truth value of (P⇒Q)⇒R (which is false). Hence P⇒(Q⇒R) and (P⇒Q)⇒R are not equivalent.

The above proposition shows that the statement P⇒Q⇒R therefore has no clear meaning without parentheses. Similarly, there is an exercise to show that P⇔(Q⇒R) and (P⇔Q)⇒R are not equivalent, so P⇔Q⇒R is likewise not clear. Hence the meaning of assertions such as P⇒Q⇒R⇒S is undefined (unless one puts in parentheses).
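The failure of associativity for implication can also be found by brute force: search all eight truth assignments for ones where the two groupings disagree (an informal check of Proposition 2.11):

```python
from itertools import product

def implies(p, q):
    # Material implication, as in the truth table for "P implies Q".
    return (not p) or q

# Collect every assignment where P => (Q => R) and (P => Q) => R disagree.
diffs = [(p, q, r)
         for p, q, r in product([True, False], repeat=3)
         if implies(p, implies(q, r)) != implies(implies(p, q), r)]

# The counterexample used in the proof: P, Q, R all false.
assert (False, False, False) in diffs
```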

As an exercise, one may also prove the following sometimes useful equivalences.

Proposition 2.12:

Let P,Q,R be statements. Then:

  • (P⇒(Q∧R))⇔((P⇒Q)∧(P⇒R));

  • (P⇒(Q∨R))⇔((P⇒Q)∨(P⇒R)).

Proof.

Exercise.

The next proposition shows that ∧ and ∨ are distributive.
Proposition 2.13:

Let P,Q,R be statements. Then

  1. (P∧(Q∨R))⇔((P∧Q)∨(P∧R)).

  2. (P∨(Q∧R))⇔((P∨Q)∧(P∨R)).

Proof.

We will prove part a. and leave the proof of part b. as an exercise. We have the following truth table.

P Q R Q∨R P∧(Q∨R) P∧Q P∧R (P∧Q)∨(P∧R)
T T T T T T T T
T T F T T T F T
T F T T T F T T
T F F F F F F F
F T T T F T F F
F T F T F F F F
F F T T F F F F
F F F F F F F F
Hence, for any truth values of P,Q,R, the above truth table shows that (P∧(Q∨R))⇔((P∧Q)∨(P∧R)).
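The distributive laws, and the way implication distributes over "and" and "or", can be checked exhaustively over all eight truth assignments. A quick informal verification:

```python
from itertools import product

def implies(p, q):
    # Material implication, as in the truth table for "P implies Q".
    return (not p) or q

for p, q, r in product([True, False], repeat=3):
    # Distributivity of "and" over "or", and of "or" over "and".
    assert (p and (q or r)) == ((p and q) or (p and r))
    assert (p or (q and r)) == ((p or q) and (p or r))
    # How implication distributes over "and" and "or".
    assert implies(p, q and r) == (implies(p, q) and implies(p, r))
    assert implies(p, q or r) == (implies(p, q) or implies(p, r))
```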

Let us return to sets to see how logic may be applied to prove statements.

Definition 2.14:

Suppose that A and B are subsets of some set X.

  • A∪B denotes the union of A and B, that is A∪B={x∈X : x∈A or x∈B}.

  • A∩B denotes the intersection of A and B, that is A∩B={x∈X : x∈A and x∈B}.

When A∩B=∅, we say that A and B are disjoint.

Example:

We have Z≥0∪Z−=Z. We also see that Z≥0∩Z−=∅, hence they are disjoint.
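Python's set operators | (union) and & (intersection) behave exactly as in Definition 2.14, so the example above can be checked informally on a finite stand-in for Z:

```python
# Finite stand-in for Z; A and B play the roles of Z>=0 and Z-.
X = set(range(-5, 6))
A = {x for x in X if x >= 0}   # non-negative integers in the range
B = {x for x in X if x < 0}    # negative integers in the range

assert A | B == X              # union: every x lies in A or in B
assert A & B == set()          # empty intersection: A and B are disjoint
```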

Etymology:

The word union comes from the Latin unio meaning "a one-ness". The union of two sets is a set that lists every element in each set just once (even if the element appears in both sets). While the symbol ∪ looks like a "U" (the first letter of union), this is a coincidence. While the symbol was first used by Hermann Grassmann (Polish/German mathematician, 1809 - 1877) in 1844, Giuseppe Peano (Italian mathematician, 1858 - 1932) used it to represent the union of two sets in 1888 in his article Calcolo geometrico secondo Ausdehnungslehre di H. Grassmann. However, at the time the union was referred to as the disjunction of two sets.

The word intersect comes from the Latin inter meaning "within, in between" and sectus meaning "to cut". The intersection of two curves is the place they cut each other; the intersection of two sets is the "place" where the two sets overlap. While the symbol ∩ was first used by Gottfried Leibniz (German mathematician, 1646 - 1716), he also used it to represent regular multiplication (there are some links between the two ideas). Again, ∩ was used by Giuseppe Peano in 1888 to refer to intersection only.

The word disjoint comes from the Latin dis meaning “away, in two parts” and the word joint. Two sets are disjoint if they are apart from each other without any joints between them. (Compare this to disjunction, which has the same roots but is used to mean joining two things that are apart).

Lemma 2.15:

Let X be a set, and for x∈X, let P(x) be the statement that x satisfies the criterion P, and let Q(x) be the statement that x satisfies the criterion Q. Set A={x∈X : P(x)} and B={x∈X : Q(x)}. Then A∪B={x∈X : P(x)∨Q(x)}, A∩B={x∈X : P(x)∧Q(x)}.

Proof.

For x∈X, we have that x∈A if and only if the statement P(x) holds. Similarly, we have that x∈B if and only if the statement Q(x) holds. Then A∪B={x∈X : (x∈A)∨(x∈B)}={x∈X : P(x)∨Q(x)} and A∩B={x∈X : (x∈A)∧(x∈B)}={x∈X : P(x)∧Q(x)}.

We can use our work on logical equivalence to show that ∪ and ∩ are associative.

Proposition 2.16:

Suppose A,B,C are subsets of a set X. Then

  1. A∩(B∩C)=(A∩B)∩C.

  2. A∪(B∪C)=(A∪B)∪C.

Proof.

We will prove part a. and leave part b. as an exercise.

Suppose x∈X. Let P be the statement that x∈A, let Q be the statement that x∈B, and let R be the statement that x∈C. Recall from Proposition 2.10 that P∧(Q∧R)⇔(P∧Q)∧R. Then x∈A∩(B∩C) ⇔ (x∈A)∧(x∈B∩C) ⇔ (x∈A)∧((x∈B)∧(x∈C)) ⇔ ((x∈A)∧(x∈B))∧(x∈C) ⇔ (x∈A∩B)∧(x∈C) ⇔ x∈(A∩B)∩C. Hence, we have that x∈A∩(B∩C) if and only if x∈(A∩B)∩C. It follows that A∩(B∩C)=(A∩B)∩C.

Proof techniques:
In the above proof, we used ⇔ in each line so that the proof works both ways (and we concluded A∩(B∩C)⊆(A∩B)∩C at the same time as (A∩B)∩C⊆A∩(B∩C)). When using ⇔, one needs to be very careful that the implication indeed works both ways, as it is very easy to make a mistake along the way. For this reason, many proofs instead show P⇒Q and Q⇒P as two separate arguments (within the same proof).

Similarly, we have that ∪ and ∩ are distributive.

Proposition 2.17:

Let A,B,C be subsets of a set X. Then

  1. A∩(B∪C)=(A∩B)∪(A∩C).

  2. A∪(B∩C)=(A∪B)∩(A∪C).

Proof.

We will prove part a. and leave part b. as an exercise.

Suppose x∈X. Let P be the statement that x∈A, let Q be the statement that x∈B, and let R be the statement that x∈C. Recall from Proposition 2.13 that P∧(Q∨R)⇔(P∧Q)∨(P∧R). Then x∈A∩(B∪C) ⇔ (x∈A)∧(x∈B∪C) ⇔ (x∈A)∧((x∈B)∨(x∈C)) ⇔ ((x∈A)∧(x∈B))∨((x∈A)∧(x∈C)) ⇔ (x∈A∩B)∨(x∈A∩C) ⇔ x∈(A∩B)∪(A∩C). Hence, we have that x∈A∩(B∪C) if and only if x∈(A∩B)∪(A∩C). It follows that A∩(B∪C)=(A∩B)∪(A∩C).
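The associativity and distributivity laws for ∪ and ∩ can be illustrated on concrete finite sets (a single example does not prove the laws, but it is a useful sanity check; the three sets below are arbitrary choices):

```python
# Three arbitrarily chosen finite sets.
A, B, C = {1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 7}

assert A & (B & C) == (A & B) & C          # intersection is associative
assert A | (B | C) == (A | B) | C          # union is associative
assert A & (B | C) == (A & B) | (A & C)    # intersection distributes over union
assert A | (B & C) == (A | B) & (A | C)    # union distributes over intersection
```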

2.4 Negations

Being able to negate statements is important for two reasons:

  • Instead of proving P is true, it might be easier to prove that ¬P is false (proof by contradiction).

  • Instead of proving P⇒Q it might be easier (by Theorem 2.8) to prove ¬Q⇒¬P (proof by contrapositive).

We will expand on these two points later. We already know how to negate most simple statements, for example:

  • the negation of x=5 is x≠5.

  • the negation of x>5 is x≤5 (notice the strict inequality became non-strict).

  • the negation of x∈X is x∉X.

To negate simple statements that have been strung together, we use the following theorem.

Theorem 2.18:

Suppose P,Q are two statements. Then

  1. ¬(P∧Q)⇔((¬P)∨(¬Q)).

  2. ¬(P∨Q)⇔((¬P)∧(¬Q)).

  3. ¬(P⇒Q)⇔(P∧(¬Q)).

Proof.

We will prove part a. and leave parts b. and c. as exercises. We have the following truth table.

P Q ¬P ¬Q P∧Q ¬(P∧Q) (¬P)∨(¬Q)
T T F F T F F
T F F T F T T
F T T F F T T
F F T T F T T

Thus, for any truth values of P and Q, the truth values of ¬(P∧Q) and (¬P)∨(¬Q) are the same.

Example:

If x is not between 5 and 10, then x is either less than 5 or more than 10. Formally speaking, ¬(5≤x≤10) if and only if ¬(5≤x) or ¬(x≤10), if and only if x<5 or 10<x.
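Both the abstract laws of Theorem 2.18 and the concrete inequality example can be verified exhaustively (an informal check over a finite range of x and over all truth assignments):

```python
# The inequality example: not (5 <= x <= 10) iff (x < 5 or 10 < x).
for x in range(-20, 21):
    assert (not (5 <= x <= 10)) == (x < 5 or 10 < x)

# De Morgan's laws for statements (Theorem 2.18 a. and b.).
for p in (True, False):
    for q in (True, False):
        assert (not (p and q)) == ((not p) or (not q))   # part a.
        assert (not (p or q)) == ((not p) and (not q))   # part b.
```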

For statements that involve quantifiers (i.e., ∀, ∃), we use the following theorem.

Theorem 2.19:

Let X be a set, and suppose that P(x) is a statement involving x∈X. Then

¬(∀x∈X, P(x)) ⇔ ∃x∈X such that ¬P(x).

We also have

¬(∃x∈X such that P(x)) ⇔ ∀x∈X, ¬P(x).

Proof.

Suppose that "∀x∈X, P(x)" is a false statement. Then there must be at least one x∈X such that P(x) does not hold. That is, (2.1) ¬(∀x∈X, P(x)) ⇒ ∃x∈X such that ¬P(x).

Conversely, suppose that "∃x∈X such that ¬P(x)" is a true statement. Then it is not the case that P(x) holds for all x∈X, that is (2.2) ∃x∈X such that ¬P(x) ⇒ ¬(∀x∈X, P(x)).

By (2.1) and (2.2), it follows that ¬(∀x∈X, P(x)) ⇔ ∃x∈X such that ¬P(x).

Equivalently, we have ¬(¬(∀x∈X, P(x))) ⇔ ¬(∃x∈X such that ¬P(x)), that is, ∀x∈X, P(x) ⇔ ¬(∃x∈X such that ¬P(x)). Setting Q(x)=¬P(x), we have ¬(∃x∈X such that Q(x)) ⇔ ∀x∈X, ¬Q(x).

Example:

Let X be a set and let us negate the statement "∀x∈X, P(x)⇒Q(x)", by writing equivalent statements using Theorem 2.18 and Theorem 2.19.

¬(∀x∈X, P(x)⇒Q(x)) ⇔ ∃x∈X such that ¬(P(x)⇒Q(x)) ⇔ ∃x∈X such that (P(x)∧¬Q(x)).

To make this more concrete, let P(x) be the statement x∈2Z and Q(x) the statement x∈4Z. We have already shown that 2Z⊈4Z, i.e. "∀x∈Z, P(x)⇒Q(x)" is false. We did this by showing that when x=10, P(x) is true while Q(x) is false, i.e., ∃x∈Z such that P(x) is true and Q(x) is false.

Negating statements is also very useful to see when an object does not satisfy a definition. We will see more examples of this later in the course, but for the moment here is an example.

Example:

Using (O2) as an example, a set X satisfies transitivity (with respect to <) if for all x,y,z∈X, if x<y and y<z then x<z. Let us see what it means for X not to satisfy (O2). First we turn the definition into symbolic language: ∀x,y,z∈X, (x<y)∧(y<z)⇒(x<z). We then negate this:

¬(∀x,y,z∈X, (x<y)∧(y<z)⇒(x<z)) ⇔ ∃x,y,z∈X such that ¬((x<y)∧(y<z)⇒(x<z)) ⇔ ∃x,y,z∈X such that ((x<y)∧(y<z))∧¬(x<z) ⇔ ∃x,y,z∈X such that (x<y)∧(y<z)∧(x≥z).

Proof techniques:
We have inserted phrases like “such that” to make our sentences more readable without changing their meanings.

Remark:

To negate a statement with the quantifier ∃!, it is useful to first translate this notation so as not to include "∃!".

Suppose X is a set, and P(x) is a proposition dependent on x∈X. When we say there is a unique x∈X so that P(x) holds, we mean first that:

  1. there is some x∈X so that P(x) is true, and

  2. if we have x1,x2∈X with P(x1) and P(x2) both true, then x1=x2.

More symbolically, we have [∃!x∈X, P(x)] ⇔ [(∃x∈X, P(x)) ∧ (∀x1,x2∈X, (P(x1)∧P(x2))⇒x1=x2)]. So negating, we get: ¬[∃!x∈X, P(x)] ⇔ ¬[(∃x∈X, P(x)) ∧ (∀x1,x2∈X, (P(x1)∧P(x2))⇒x1=x2)] ⇔ [¬(∃x∈X, P(x)) ∨ ¬(∀x1,x2∈X, (P(x1)∧P(x2))⇒x1=x2)] ⇔ [(∀x∈X, ¬P(x)) ∨ (∃x1,x2∈X, ¬((P(x1)∧P(x2))⇒x1=x2))] ⇔ [(∀x∈X, ¬P(x)) ∨ (∃x1,x2∈X, (P(x1)∧P(x2)∧x1≠x2))].

Thus the negation of "there exists exactly one x such that P(x) holds" is "either there exists no x such that P(x) holds, or there exist more than one x such that P(x) holds".

2.5 Contradiction and the contrapositive

As mentioned earlier, negating statements is useful when trying to prove statements using different methods. First we look at proof by contradiction.

Proof techniques:
A proof by contradiction uses the following logic. Instead of trying to prove a statement P is true, we assume that ¬P is true. If we end up with a contradiction, either to something we already knew is true or to our original assumption, we must deduce that ¬P is false. This allows us to conclude that P is true.

While we will use this method more extensively later, let us see an easy example.

Example:

Statement: Let x,y∈Q. If x<y then −x>−y.

Proof: For the sake of a contradiction, let us assume that x<y and −x≤−y. (Recall the negation of P⇒Q is P∧¬Q.) Since x<y we have x−x<y−x [by (O3)], i.e. 0<y−x. Since −x≤−y, then −x+y≤−y+y, i.e. y−x≤0. So 0<y−x and y−x≤0, i.e. [by (O1)] 0<0, which is a contradiction. Hence "x<y and −x≤−y" is false, so if x<y then −x>−y.

Another technique is a proof by contrapositive.

Definition 2.20:
Let P and Q be two statements. The contrapositive of "P⇒Q" is "¬Q⇒¬P".
Example:

The contrapositive of (A1), "for all x,y, if x,y∈Z then x+y∈Z", is "for all x,y, if x+y∉Z then x∉Z or y∉Z".

The contrapositive of (O3), "for all x,y,z∈Q, if x<y then x+z<y+z", is "for all x,y,z∈Q, if x+z≥y+z then x≥y".

By Theorem 2.8, we know that P⇒Q is equivalent to its contrapositive. Sometimes, proving the contrapositive is easier than proving P⇒Q (we will see more examples of this later).

Example:

Statement: Let x,y∈Q. If x<y then −x>−y.

Proof: We prove the contrapositive. The contrapositive is "if −x≤−y then x≥y". Suppose −x≤−y, then −x+(x+y)≤−y+(x+y) [since x+y∈Q by (A1) and using (O3)*]. In other words y≤x, i.e. x≥y.

Note that we have proven "x<y ⇒ −x>−y" by contradiction and by contrapositive. It is also worth noting that we could have proven it directly. This is meant to show that often there are numerous ways to prove the same thing.

Note that the contrapositive doesn't just reverse the implication symbol, it also negates P and Q. The contrapositive can often be confused with the converse:

Definition 2.21:
Let P and Q be two statements. The converse of "P⇒Q" is "Q⇒P".

Note that P⇒Q is not equivalent to its converse. Be careful when using a theorem/lemma/proposition that you are not using the converse by accident (which may not be true).

Example:
The converse of (A1), "for all x,y∈Q, if x,y∈Z then x+y∈Z", is "for all x,y∈Q, if x+y∈Z then x,y∈Z", which we know is false (e.g., by taking x=y=1/2).

If the converse of P⇒Q is true, then we can deduce that P⇔Q.

Example:

The converse of "If x<y then −x>−y" is "if −x>−y then x<y". We can show this is true: suppose −x>−y, then −x+(x+y)>−y+(x+y), i.e. y>x, i.e. x<y.

Therefore we conclude, for all x,y∈Q, x<y if and only if −x>−y.
Etymology:

Contradiction comes from the Latin contra which means “against” and dict which is a conjugation of the verb “to say, tell”. A contradiction is a statement that speaks against another statement.

Contrapositive also comes from the Latin contra and the Latin positus which is a conjugation of the verb “to put”. The contrapositive of a statement is a statement “put against” the original statement: we have negated both parts and reversed the order. Furthermore it is “positive” as it has the same truth value as the original statement.

On the other hand, while sounding similar, converse comes from the Latin con which means “together with” and vergere which means “to turn”. The converse turns the order of P and Q.

2.6 Set complement

We finish this section by looking at the complement of sets.

Definition 2.22:

Suppose that A and B are subsets of some set X.

  • A∖B denotes the relative complement of A with respect to B, that is A∖B = {x ∈ X : x ∈ A and x ∉ B}.

  • Aᶜ denotes the complement of A, that is Aᶜ = {x ∈ X : x ∉ A}.

Example:
Let X={1,2,3,4,5,6,7,8,9,10}, A={2,4,6,8} and B={4,5,6,7}. Then A∖B={2,8}, B∖A={5,7} and Aᶜ={1,3,5,7,9,10}.
Example:
We have Z∖Z<0 = Z≥0 and Z≥0∖{0} = Z+.
Proposition 2.23:

Suppose A,B are subsets of a set X. Then

  1. A∖B = A∩Bᶜ.
  2. (A∖B)ᶜ = Aᶜ∪B.

Proof.

We will prove part a. and leave part b. as an exercise.

Let x ∈ X. Then x ∈ A∖B ⇔ (x ∈ A)∧(x ∉ B) ⇔ (x ∈ A)∧(x ∈ Bᶜ) ⇔ x ∈ A∩Bᶜ. Hence, we have that x ∈ A∖B is equivalent to x ∈ A∩Bᶜ. It follows that A∖B = A∩Bᶜ.

Theorem 2.24:

Suppose that A,B,C are subsets of a set X. Then

  1. A∖(B∩C) = (A∖B)∪(A∖C).

  2. A∖(B∪C) = (A∖B)∩(A∖C).

  3. (A∪B)ᶜ = Aᶜ∩Bᶜ.

  4. (A∩B)ᶜ = Aᶜ∪Bᶜ.

Proof.

We will prove parts a. and d. and leave parts b. and c. as exercises.

Proving a.) Let x ∈ X. Recall that for statements P,Q,R, we have that P∧(Q∨R) ⇔ (P∧Q)∨(P∧R). Then x ∈ A∖(B∩C) ⇔ (x ∈ A)∧(x ∉ B∩C) ⇔ (x ∈ A)∧(¬((x ∈ B)∧(x ∈ C))) ⇔ (x ∈ A)∧((x ∉ B)∨(x ∉ C)) ⇔ ((x ∈ A)∧(x ∉ B))∨((x ∈ A)∧(x ∉ C)) ⇔ (x ∈ A∖B)∨(x ∈ A∖C) ⇔ x ∈ (A∖B)∪(A∖C).

Hence, we have that x ∈ A∖(B∩C) if and only if x ∈ (A∖B)∪(A∖C). It follows that A∖(B∩C) = (A∖B)∪(A∖C).

Proving d.) Let x ∈ X. Then x ∈ (A∩B)ᶜ ⇔ ¬(x ∈ A∩B) ⇔ ¬((x ∈ A)∧(x ∈ B)) ⇔ (¬(x ∈ A))∨(¬(x ∈ B)) ⇔ (x ∈ Aᶜ)∨(x ∈ Bᶜ) ⇔ x ∈ Aᶜ∪Bᶜ.

Hence, we have that x ∈ (A∩B)ᶜ if and only if x ∈ Aᶜ∪Bᶜ. It follows that (A∩B)ᶜ = Aᶜ∪Bᶜ.
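Since these are equalities of finite sets, they can be spot-checked directly. A quick sketch using Python's built-in sets; X, A and B reuse the earlier example, while C is an arbitrary extra set:

```python
X = set(range(1, 11))                       # ambient set
A, B, C = {2, 4, 6, 8}, {4, 5, 6, 7}, {1, 2, 3}

def comp(S):
    """Complement of S within the ambient set X."""
    return X - S

# Proposition 2.23: A \ B = A ∩ B^c
assert A - B == A & comp(B)
# Theorem 2.24 parts 3. and 4. (De Morgan's Laws)
assert comp(A | B) == comp(A) & comp(B)
assert comp(A & B) == comp(A) | comp(B)
# Theorem 2.24 parts 1. and 2.
assert A - (B & C) == (A - B) | (A - C)
assert A - (B | C) == (A - B) & (A - C)
```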

History:

The above theorem is often known as De Morgan’s Laws, or De Morgan’s Theorem. Augustus De Morgan (English Mathematician, 1806 - 1871) was the first to write this theorem using formal logic (the one we are currently seeing). However this result was known and used by mathematicians and logicians since Aristotle (Greek philosopher, 384BC - 322BC), and can be found in the medieval texts by William of Ockham (English philosopher, 1287 - 1347) or Jean Buridan (French philosopher, 1301 - 1362).

Notation:

It is often convenient to denote the elements of a set using indices. For example, suppose A is a set with 5 elements. Then we can denote these elements as a₁,a₂,a₃,a₄,a₅. So we can write A = {a_i : i ∈ I}, where I = {1,2,3,4,5}. The set I is called the indexing set.

Let {A_i}_{i∈I} be a collection of subsets of a set X where I is an indexing set. Then we write ⋃_{i∈I} A_i to denote the union of all the sets A_i, for i ∈ I. That is, ⋃_{i∈I} A_i = {x ∈ X : there exists i ∈ I such that x ∈ A_i}. Furthermore, we write ⋂_{i∈I} A_i to denote the intersection of all the sets A_i, for i ∈ I. That is, ⋂_{i∈I} A_i = {x ∈ X : for all i ∈ I, x ∈ A_i}.
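For a finite indexing set these definitions translate directly into code, with “there exists” becoming `any` and “for all” becoming `all`. A sketch, where the family A_i and the index set I are illustrative choices:

```python
I = {1, 2, 3}
A = {1: {1, 2, 3}, 2: {2, 3, 4}, 3: {3, 4, 5}}   # the sets A_i, for i in I
X = set(range(10))                               # ambient set

# union: x is in some A_i; intersection: x is in every A_i
union = {x for x in X if any(x in A[i] for i in I)}
inter = {x for x in X if all(x in A[i] for i in I)}

assert union == {1, 2, 3, 4, 5}
assert inter == {3}
```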

Proposition 2.25:

Let X be a set, let A be a subset of X, and let {B_i}_{i∈I} be an indexed collection of subsets of X, where I is an indexing set. Then we have

  1. A ∩ ⋃_{i∈I} B_i = ⋃_{i∈I} (A∩B_i).

  2. A ∪ ⋂_{i∈I} B_i = ⋂_{i∈I} (A∪B_i).

Proof.

We will prove part a. and leave part b. as an exercise.

We know that x ∈ ⋂_{i∈I} B_i if and only if we have that x ∈ B_i, for all i ∈ I. Similarly, x ∈ ⋃_{i∈I} B_i if and only if there exists an i ∈ I such that x ∈ B_i. Now, suppose that x ∈ A ∩ ⋃_{i∈I} B_i. Then x ∈ A, and for some i ∈ I, we have that x ∈ B_i. Hence, for some i ∈ I, we have that x ∈ A∩B_i. Then x ∈ ⋃_{i∈I} (A∩B_i), which shows that A ∩ ⋃_{i∈I} B_i ⊆ ⋃_{i∈I} (A∩B_i).

Now, suppose that x ∈ ⋃_{i∈I} (A∩B_i). Hence, for some i ∈ I, we have that x ∈ A∩B_i. Then for some i ∈ I, we have that x ∈ A and x ∈ B_i. Since there exists some i ∈ I such that x ∈ B_i, we have x ∈ ⋃_{i∈I} B_i. Then x ∈ A ∩ ⋃_{i∈I} B_i, which shows that ⋃_{i∈I} (A∩B_i) ⊆ A ∩ ⋃_{i∈I} B_i. Summarising the above, we have that A ∩ ⋃_{i∈I} B_i = ⋃_{i∈I} (A∩B_i).

Interest:

We finish this section with a brief side-note. How does one differentiate whether a statement is a definition, a theorem, a proposition, a lemma etc? The team at Chalkdust Magazine made the following flowchart which, while not meant to be serious, does reflect quite well how one can classify different statements. That is: a definition is a statement taken to be true; roughly speaking a proposition is an interesting but non-important result; while a theorem is an interesting, important main result; and a lemma is there to build up to a theorem. The original figure can be found at https://chalkdustmagazine.com/regulars/flowchart/which-type-of-statement-are-you/


Figure 2.1: What statement are you? Copyright Chalkdust Magazine, Issue 17.

3 The rationals are not enough

Now that we have a solid foundation of logic and abstract set notation, let us explore sets within Q. That is, let us look at subsets of the rationals. This will lead us to notice that irrational numbers exist, and hence to explore a new set called the reals, R.

3.1 The absolute value

Before we look at subsets of Q, we introduce the notion of absolute value.

Definition 3.1:

For x ∈ Q, the absolute value or modulus of x, denoted |x|, is defined by |x| := x if x ≥ 0, and |x| := −x if x < 0.

It is often helpful to think of the absolute value |x| as the distance between the point x and the origin 0. Likewise, |x−y| is the distance between the points x and y.

Proposition 3.2:

For any x,y ∈ Q:

  1. |x| ≥ 0, with |x| = 0 if and only if x = 0;

  2. |xy| = |x||y|;

  3. |x²| = |x|² = x².

Proof.

Exercise

Example:

Statement: Show that a²+b² ≥ 2ab for any a,b ∈ Q.

Solution: Let a,b ∈ Q. We have that (a−b)² ≥ 0, so [expanding the bracket] a²−2ab+b² ≥ 0. Rearranging, this gives a²+b² ≥ 2ab.

Proposition 3.3: (Triangle Inequality)

For all x,yQ we have |x+y||x|+|y|.

Proof.

We prove this by case by case analysis. First note that for all x ∈ Q we have x ≤ |x| and −x ≤ |x|. Let x,y ∈ Q.

Case 1: Suppose x ≥ −y. Then x+y ≥ 0 and so |x+y| = x+y ≤ |x|+|y|.

Case 2: Suppose x < −y. Then x+y < 0 and so |x+y| = −(x+y) = (−x)+(−y) ≤ |x|+|y|.
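Statements like Propositions 3.2 and 3.3 can be spot-checked over a grid of exact rationals before (or after) proving them. A small sketch using Python's Fraction type, with an arbitrary choice of sample values:

```python
from fractions import Fraction
from itertools import product

# a grid of exact rational sample values, positive and negative
samples = [Fraction(p, q) for p in range(-4, 5) for q in (1, 2, 3)]

for x, y in product(samples, repeat=2):
    assert abs(x + y) <= abs(x) + abs(y)     # Proposition 3.3 (triangle inequality)
    assert abs(x * y) == abs(x) * abs(y)     # Proposition 3.2, part 2
```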

3.2 Bounds for sets

With this notion of absolute value, we can start asking whether a subset of Q contains arbitrarily large or small elements.

Definition 3.4:

Let A ⊆ Q be non-empty. We say that A is:

  • bounded above (in Q) by α ∈ Q if for all x ∈ A, x ≤ α;

  • bounded below (in Q) by α ∈ Q if for all x ∈ A, x ≥ α;

  • bounded if it is bounded above and below.

If A is bounded above by α and below by β, then by setting γ = max{|α|,|β|} we have that A is bounded by γ, i.e., for all x ∈ A, we have |x| ≤ γ.

Note that α is far from unique. For example, take the set A = {1/n : n ∈ Z+}. Then we can see that A is bounded above by 1, but it is also bounded above by 2 and by 100 etc.

Definition 3.5:

Let A ⊆ Q be non-empty. The least (or smallest) upper bound of A (in Q) is α ∈ Q such that:

  • α is an upper bound, i.e. for all x ∈ A, x ≤ α;

  • any rational number β less than α is not an upper bound, i.e. for all β ∈ Q with β < α, there exists x ∈ A with β < x.

The greatest (or largest) lower bound of A (in Q) is α ∈ Q such that:

  • α is a lower bound, i.e. for all x ∈ A, x ≥ α;

  • any rational number β greater than α is not a lower bound, i.e. for all β ∈ Q with β > α, there exists x ∈ A with β > x.

Remark:

We use our work on negating statements to negate the above definition and say

α ∈ Q is not the least upper bound of A if:

  • there exists x ∈ A such that x > α (α is not an upper bound); or

  • there exists β ∈ Q with β < α such that for all x ∈ A we have x ≤ β (there is an upper bound lower than α).

Similarly, α ∈ Q is not the greatest lower bound of A if:

  • there exists x ∈ A such that x < α (α is not a lower bound); or

  • there exists β ∈ Q with β > α such that for all x ∈ A we have x ≥ β (there is a lower bound greater than α).

Example:

Let A = {1/n : n ∈ Z+}. We show that 1 is the least upper bound of A. As remarked before, we have that 1 is an upper bound since if we take x ∈ A then x = 1/n with n ∈ Z+. In particular n ≥ 1, so x = 1/n ≤ 1/1 = 1.

We now show that 1 is the least upper bound by showing any number less than 1 is not an upper bound. Let β < 1. By taking n = 1 ∈ Z+, we see that 1 = 1/1 ∈ A, hence β < 1 means β is not an upper bound. Hence 1 is the least upper bound.

We show that 0 is the greatest lower bound. First we show 0 is a lower bound. Let x ∈ A; then x = 1/n with n ∈ Z+. In particular n > 0, so x = 1/n > 0.

We now show that 0 is the greatest lower bound by showing any number greater than 0 is not a lower bound. Let β = a/b > 0 [so a,b ∈ Z+]. Set n = b+1 ∈ Z+, so 1/n ∈ A. Then 1/n = 1/(b+1) < 1/b ≤ a/b = β [since a ≥ 1].

So we have found x ∈ A such that x < β, so β is not a lower bound.

Proof techniques:

The argument for β not being a lower bound above seems to come from nowhere. Sometimes it is hard to see where to start a proof, so mathematicians first do scratch work. This is the rough working we do as we explore different avenues and arguments, but that is not included in the final proof (so as to keep the proof clean and easy to understand). The scratch work for the above proof might have been along the lines:

We want to find n ∈ Z+ such that β = a/b > 1/n. Rearranging, this gives an > b (as both n and b are positive), so n > b/a. Since a ≥ 1, we have b/a ≤ b. Picking n = b+1 would satisfy n = b+1 > b ≥ b/a.

Note that in the example above the greatest lower bound of A is not in A itself [if 0 ∈ A, then there exists n ∈ Z+ such that 0 = 1/n, i.e. 0 = 1, which is a contradiction]. If a set is a bounded subset of Z, then we get a different story.

Theorem 3.6: (Well Ordering Principle)

Any non-empty subset of Z≥0 contains a minimal element.

Proof.

Exercise to be done after we have introduced the completeness axiom.

Corollary 3.7:

Let A ⊆ Z be non-empty. If A is bounded below, it contains a minimal element (i.e. its greatest lower bound is in A). If it is bounded above, it contains a maximal element (i.e., its least upper bound is in A).

As we saw above with the set A = {1/n : n ∈ Z+}, this is not the case for general subsets of Q. However, it is even worse, because a general subset of Q might be bounded and yet not have a greatest lower bound or least upper bound in Q, as we will see in the next section.

3.3 The irrationals and the reals

We first show that there exist irrational numbers (i.e., numbers that are not rational) and show that this means that there exist subsets of Q whose least upper bound is not rational (and hence we need a bigger number system).

Theorem 3.8:

There does not exist x ∈ Q such that x² = 2.

Proof.

For the sake of a contradiction, suppose there exists x = a/b ∈ Q such that x² = 2. Without loss of generality, since x² = (−x)² and 0² = 0, we can assume x > 0, i.e. a,b ∈ Z+. Furthermore, if 0 < x ≤ 1, then x² ≤ 1 < 2, and if x ≥ 2 then x² ≥ 4 > 2. So we assume that 1 < x < 2.

Let A = {r ∈ Z+ : rx ∈ Z} ⊆ Z. Note that A is non-empty since bx = a ∈ Z+, so b ∈ A. We have that A is bounded below by 0, so by the Well Ordering Principle, A contains a minimal element, call it m. We will prove that m is not minimal by finding 0 < m₁ < m with m₁ ∈ A. This will be a contradiction to the Well Ordering Principle.

Define m₁ = m(x−1) = mx−m ∈ Z. Since 1 < x, we have x−1 > 0 so m₁ = m(x−1) > 0. Similarly, since x < 2 we have x−1 < 1 so m₁ = m(x−1) < m. Hence 0 < m₁ < m. Now m₁x = m(x−1)x = mx²−mx = 2m−mx ∈ Z. Hence m₁ ∈ A and 0 < m₁ < m, which is a contradiction.

Since irrational numbers exist, if we restrict ourselves to only using rational numbers then there are many unanswerable questions: from simple geometrical problems (what is the length of the diagonal of a square with side length 1?), to the fact that there are some bounded sets of rationals which do not have a rational least upper bound.

Example:

Consider the set A = {x ∈ Q : x² < 2}. We have that A is bounded above (for example, by 2 or 10), but we show it does not have a least upper bound in the rationals.

Let α = a/b ∈ Q be the least upper bound of A and note we can assume α > 0. We either have α² < 2, α² = 2 or α² > 2. We will show that all three of these cases lead to a contradiction.

Case 1: α² < 2. We show that in this case α is not an upper bound by finding x ∈ A such that α < x.

[Scratch work: We look for c ∈ Z+ such that (α + 1/(bc))² = ((ac+1)/(bc))² < 2. Rearranging, we get a²c²+2ac+1 < 2b²c², i.e. 2ac+1 < c²(2b²−a²). Since α² < 2, we know a² < 2b², so 2b²−a² > 0. Since 2b²−a² ∈ Z, we have 2b²−a² ≥ 1, so c²(2b²−a²) ≥ c². So it is enough to find c such that 2ac+1 < c², i.e. 0 < c²−2ac−1, i.e. by completing the square 0 < (c−a)²−(a²+1). To simplify our life, let us take c to be a multiple of a, say ka; then we are looking for 0 < (k−1)²a²−a²−1 = ((k−1)²−1)a²−1, i.e. ((k−1)²−1)a² > 1, so k = 3 should work, i.e. c = 3a.]

Let x = α + 1/(3ab) > α. We prove that x ∈ A (and hence α is not an upper bound) by showing x² < 2. We have x² = ((3a²+1)/(3ab))² = (9a⁴+6a²+1)/(9a²b²) < (9a⁴+9a²)/(9a²b²) (since 6a²+1 < 9a²) = (a²+1)/b² ≤ (a²+(2b²−a²))/b² (since a² < 2b², so 2b²−a² ≥ 1) = 2.

Case 2: α² = 2. This is a contradiction to Theorem 3.8.

Case 3: α² > 2. We leave this as an exercise. [Hint: Find an appropriate c so that x = α − 1/(bc) ∈ Q is such that x² > 2. Argue that x is an upper bound for A and x < α to conclude α is not the least upper bound.]

Proof techniques:

Notice that in the above we made sure to have x > α by setting x = α + ϵ where ϵ > 0. By doing so, we reduced the number of properties we needed x to have.

We use this as a motivation to introduce the real numbers.

Definition 3.9:

The set of real numbers, denoted R, equipped with addition +, multiplication · and the order relation <, satisfies axioms (A1) to (A11), (O1) to (O4) and the Completeness Axiom.

Completeness Axiom: Every non-empty subset A of R which is bounded above has a least upper bound.

Interest:

It can be shown that there is exactly one quadruple (R; +; ·; <) which satisfies these properties - up to isomorphism. We do not discuss the notion of isomorphism in this context here (although we will later look at it in the context of groups) save to remark that any two real number systems are in bijection (we will define this later) in a way which preserves certain properties. This allows us to speak of the real numbers.

There are several ways of constructing the real number system from the rational numbers. One option is to use Dedekind cuts. Another is to define the real numbers as equivalence classes of Cauchy sequences of rational numbers (you will explore Cauchy sequences in Analysis).

You may continue to imagine the real numbers as a number line as you did pre-university.

The definition of the absolute value is the same for real numbers as for the rational numbers, as is the notion of bounded sets (and therefore all the results in the previous section still hold for R).

Interest:

We can use the absolute value to define a distance or metric on R. To do so we define d(x,y) = |x−y| for any two points x,y ∈ R. This distance has the following properties, for any x,y,z ∈ R:

  • d(x,y) ≥ 0, and d(x,y) = 0 if and only if x = y;

  • d(x,y) = d(y,x);

  • d(x,y) ≤ d(x,z) + d(z,y).
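The three listed properties can be checked on sample points. A minimal sketch, with floats standing in for real numbers and an arbitrary handful of sample values:

```python
def d(x, y):
    """The distance d(x, y) = |x - y| on R."""
    return abs(x - y)

pts = [-2.5, -1.0, 0.0, 0.5, 3.0]
for x in pts:
    for y in pts:
        assert d(x, y) >= 0 and (d(x, y) == 0) == (x == y)   # positivity
        assert d(x, y) == d(y, x)                            # symmetry
        for z in pts:
            assert d(x, y) <= d(x, z) + d(z, y)              # triangle inequality
```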

You can explore whether other distances/metrics can be defined on R or other sets in the 2nd year unit Metric Spaces. You can explore how we construct other sets from Q that satisfy the completeness axiom by looking up “p-adic numbers”.

We deduce two important results about R.

Proposition 3.10:

Every non-empty subset A of R bounded below has a greatest lower bound.

Proof.

Let A ⊆ R be non-empty and bounded below. Let c ∈ R be a lower bound. Define the set B = {−x : x ∈ A}. Let −x ∈ B, so x ∈ A. Since c is a lower bound, x ≥ c, i.e. −x ≤ −c. So −c is an upper bound for B, and B is non-empty [since A is non-empty]. By the Completeness Axiom B has a least upper bound, u ∈ R. Let ℓ = −u. We prove that ℓ is the greatest lower bound for A.

We first show ℓ is a lower bound. Let x ∈ A; then −x ∈ B so −x ≤ u. Hence x ≥ −u = ℓ.

We now show ℓ is the greatest lower bound by showing any real number bigger than ℓ is not a lower bound. Let y ∈ R be such that y > ℓ, so −y < −ℓ = u. Now −y is not an upper bound for B since u is the least upper bound of B. So by definition, there exists b ∈ B such that b > −y, i.e. −b < y. Since b ∈ B, we have −b ∈ A. Hence y is not a lower bound for A.

So ℓ is the greatest lower bound for A.

Theorem 3.11: (Archimedean Property)

For any x ∈ R there exists n ∈ Z+ such that x ≤ n.

Proof.

We prove this by contradiction.

[¬(for all x ∈ R, there exists n ∈ Z+ such that x ≤ n) ⇔ there exists x ∈ R such that for all n ∈ Z+, x > n.]

Suppose there exists x ∈ R such that n < x for all n ∈ Z+. In particular, this means Z+ ⊆ R is bounded above. So by the completeness axiom, Z+ has a least upper bound α [in R]. Since α is the least upper bound, α − 1/2 is not an upper bound, i.e. there exists b ∈ Z+ such that α − 1/2 < b ≤ α. But then b+1 ∈ Z+ and α < α + 1/2 < b+1. This contradicts the fact that α is an upper bound for Z+.

History:

The above property is named after Archimedes of Syracuse (Sicilian/Italian mathematician, 287BC - 212BC), although when Archimedes wrote down this theorem, he credited Eudoxus of Cnidus (Turkish mathematician and astronomer, 408BC - 355BC). It was Otto Stolz (Austrian mathematician, 1842 - 1905) who coined the name - partly because he studied fields where this property is not true and therefore needed a term to distinguish between what are now known as Archimedean fields and non-Archimedean fields.

We finish this section by introducing notation for some common subsets of R.

Notation:

Let a,b ∈ R be such that a ≤ b. We denote:

  • the open interval of a,b by (a,b) = {x ∈ R : a < x < b};

  • the closed interval of a,b by [a,b] = {x ∈ R : a ≤ x ≤ b};

  • [a,b) = {x ∈ R : a ≤ x < b}; (a,b] = {x ∈ R : a < x ≤ b};

  • (a,∞) = {x ∈ R : a < x}; [a,∞) = {x ∈ R : a ≤ x};

  • (−∞,b) = {x ∈ R : x < b}; (−∞,b] = {x ∈ R : x ≤ b}.

By convention we have (a,a) = [a,a) = (a,a] = ∅, while [a,a] = {a}.

3.4 The supremum and infimum of a set.

Since R is complete, i.e., every non-empty bounded set has a least upper bound and a greatest lower bound, we introduce the notion of the supremum and infimum of a set.

Definition 3.12:

Let A ⊆ R. We define the supremum of A, denoted sup(A), as follows:

  • If A = ∅, then sup(A) = −∞.

  • If A is non-empty and is bounded above [i.e., there exists α ∈ R such that for all x ∈ A, x ≤ α] then sup(A) is the least upper bound of A (which we know exists by the Completeness Axiom).

  • If A is non-empty and is not bounded above [i.e., for all α ∈ R, there exists x ∈ A such that x > α] then sup(A) = +∞.

Definition 3.13:

Let A ⊆ R. We define the infimum of A, denoted inf(A), as follows:

  • If A = ∅, then inf(A) = +∞.

  • If A is non-empty and is bounded below [i.e., there exists α ∈ R such that for all x ∈ A, x ≥ α] then inf(A) is the greatest lower bound of A (which we know exists by the Completeness Axiom).

  • If A is non-empty and is not bounded below [i.e., for all α ∈ R, there exists x ∈ A such that x < α] then inf(A) = −∞.

Etymology:

Supremum comes from the Latin super meaning “over, above” while infimum comes from the Latin inferus meaning “below, underneath, lower” (these words gave rise to words like superior and inferior).

Example:

Let A = {1/n : n ∈ Z+}. We have already seen that sup(A) = 1 (in A) and inf(A) = 0 (not in A).

Proposition 3.14:

Let a,b ∈ R with a < b. Then

  1. sup((a,b)) = sup((a,b]) = sup([a,b)) = sup([a,b]) = b;

  2. inf((a,b)) = inf((a,b]) = inf([a,b)) = inf([a,b]) = a;

  3. sup((a,∞)) = sup([a,∞)) = +∞;

  4. inf((−∞,a)) = inf((−∞,a]) = −∞;

  5. sup((−∞,a)) = sup((−∞,a]) = inf([a,∞)) = inf((a,∞)) = a.

Proof.

We will only prove sup((a,b)) = b and sup((a,∞)) = +∞, and leave the rest as exercises, as the arguments are very similar.

Let a,b ∈ R with a < b; we will show sup((a,b)) = b. [Note that (a,b) is non-empty (as a < b) and it is bounded above.]

First we show that b is an upper bound. Indeed, let x ∈ (a,b); then by definition a < x < b, so x ≤ b as required.

Next we show that b is the least upper bound [by showing any real number less than b is not an upper bound]. Let y ∈ R with y < b. Suppose a < y (i.e. y ∈ (a,b)) and let x = (b+y)/2 = y + (b−y)/2 = b − (b−y)/2. Note that x < b and x > y > a, so x ∈ (a,b). Since x > y, we have that y is not an upper bound. Suppose y ≤ a (i.e. y ∉ (a,b)) and let x = (a+b)/2 = a + (b−a)/2 = b − (b−a)/2. Then x < b and x > a ≥ y, so x ∈ (a,b). Since x > y, we have that y is not an upper bound. In either case, y is not an upper bound, so b is the least upper bound.

Let a ∈ R; we show that sup((a,∞)) = +∞. [Note that (a,∞) is non-empty, so we want to show it is not bounded above.]

Suppose for contradiction that (a,∞) is bounded above [i.e., sup((a,∞)) ≠ +∞]. Let u ∈ R be an upper bound for (a,∞). Set x = |a|+|u|+1 ∈ R. Note that x > |a| ≥ a, so x ∈ (a,∞). Hence u being an upper bound means x ≤ u; however x > |u| ≥ u. This is a contradiction.

Example:

Problem: Let A = {(n²+1)/|n+1/2| : n ∈ Z}. Show that sup(A) = +∞ and inf(A) = 4/3.

Solution: Let us first look at the supremum. [The question is asking us to show A is not bounded above.] Let x ∈ R. By the Archimedean Principle, choose n ∈ Z+ ⊆ Z such that n > 2x [Scratch work missing to work out why we choose this particular n]. Define a = (n²+1)/(n+1/2) ∈ A. Then a = (n²+1)/(n+1/2) ≥ n²/(n+1/2) ≥ n²/(2n) [as n+1/2 ≤ 2n] = n/2 > x. So we have found a ∈ A such that a > x, so A is not bounded above. Hence sup(A) = +∞.

We now look at the infimum. We first show that 4/3 is a lower bound. Let a = (n²+1)/|n+1/2| for some n ∈ Z, so a ∈ A. Consider n²+1−(4/3)|n+1/2| ≥ n²+1−(4/3)(|n|+1/2) [by the triangle inequality] = n²−(4/3)|n|+1/3 = (|n|−2/3)²−1/9 [by completing the square] ≥ 0, since (|n|−2/3)² ≥ 1/9 for all n ∈ Z. Therefore n²+1 ≥ (4/3)|n+1/2|, i.e. a = (n²+1)/|n+1/2| ≥ 4/3 as required.

We next show that 4/3 is the greatest lower bound for A [by showing any number greater than it is not a lower bound]. First note that by setting n = 1, we have (n²+1)/|n+1/2| = 2/(3/2) = 4/3. So 4/3 ∈ A. Thus, no value y > 4/3 can be a lower bound for A. This shows that 4/3 is the greatest lower bound. Hence inf(A) = 4/3.
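The computations above can be cross-checked numerically. A sketch with exact rationals, scanning a finite window of integers (the window is an arbitrary choice, so this illustrates rather than proves the claim):

```python
from fractions import Fraction

def a(n):
    """The element of A indexed by n: (n^2 + 1) / |n + 1/2|."""
    return Fraction(n * n + 1) / abs(Fraction(n) + Fraction(1, 2))

values = [a(n) for n in range(-50, 51)]
assert min(values) == Fraction(4, 3)   # the smallest value in the window, attained at n = 1
assert a(1) == Fraction(4, 3)
assert a(50) > 25                      # values grow without bound, consistent with sup(A) = +inf
```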

The next example is more theoretical.

Example:

Problem: Let A and B be bounded non-empty subsets of R. Define the sum set as A+B = {a+b : a ∈ A, b ∈ B}. Show that sup(A+B) = sup(A)+sup(B).

Solution: Let α = sup(A) and β = sup(B). Note that α,β ∈ R as both A and B are bounded. We show α+β = sup(A+B).

We first show α+β is an upper bound for A+B. Let c ∈ A+B; by definition, there exists a ∈ A and b ∈ B such that c = a+b. Now a ≤ α and b ≤ β, so c = a+b ≤ α+β.

We now show α+β is the least upper bound for A+B. Let ϵ > 0; we show that α+β−ϵ is not an upper bound. Since α is the least upper bound of A, α−ϵ/2 is not an upper bound, so there exists a ∈ A such that a > α−ϵ/2. Similarly, there exists b ∈ B such that b > β−ϵ/2. Define c = a+b ∈ A+B. Then c = a+b > (α−ϵ/2)+(β−ϵ/2) = α+β−ϵ. This shows for any ϵ > 0, α+β−ϵ is not an upper bound for A+B. So α+β is the least upper bound for A+B, hence the supremum of A+B.
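For finite sets the supremum is just the maximum, so the identity can be sanity-checked directly. A sketch with arbitrarily chosen finite A and B:

```python
A = {-1.5, 0.0, 2.0}
B = {1.0, 4.5}

sum_set = {x + y for x in A for y in B}   # the sum set A + B
assert max(sum_set) == max(A) + max(B)    # sup(A+B) = sup(A) + sup(B)
assert min(sum_set) == min(A) + min(B)    # the analogous identity for infima
```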

Proof techniques:

Sometimes, instead of showing something is true for all y > x, it is easier to show it is true for all x+ϵ where ϵ > 0. We can do this because if y > x, then setting ϵ = y−x > 0, we see that y = x+ϵ.

Definition 3.15:

Let A ⊆ R. We say that A has a maximum if sup(A) ∈ A. In this case we write max(A) to stand for the element a ∈ A with a = sup(A).

Similarly we say that A has a minimum if inf(A) ∈ A. In this case we write min(A) to stand for the element a ∈ A with a = inf(A).

Note that a set may not have a minimum or a maximum.

Example:

Let A = {(n²+1)/|n+1/2| : n ∈ Z}. We have seen inf(A) = 4/3 ∈ A, so min(A) = 4/3. However A does not have a maximum as sup(A) = +∞ ∉ A (as +∞ ∉ R and A ⊆ R).

Example:

Let A = {1/n : n ∈ Z+}. Then we have seen that sup(A) = 1 and 1 ∈ A, so max(A) = 1. However it does not have a minimum as we have seen inf(A) = 0 and 0 ∉ A.

4 Proof by induction

We have seen several types of proofs so far: direct proofs, proofs by contradiction, and proofs by contrapositive.

In this chapter, we introduce a new type of proof, called proof by induction.

Proof techniques:

Let n₀ ∈ Z+ and let P(n) be a statement for n ≥ n₀. A proof by induction is where one proves that:

  • P(n₀) is true, and

  • P(n) ⇒ P(n+1) for all n ≥ n₀.

Combining these two statements, by the principle of induction, we deduce that P(n) is true for all n ≥ n₀.

Etymology:

The word induction comes from the Latin in and ductus meaning “to lead”. An inductive proof is one where a starting case leads into the next case and so on.

(In contrast, deduction has the prefix de meaning “down from”. When we do a proof by deduction, we start from certain rules and truths that “lead down” to specific things that must follow as a consequence.)

Example:

Statement: Show that, for every n ∈ Z+, we have 1+2+3+⋯+n = ∑_{i=1}^{n} i = n(n+1)/2.

Proof: For n ∈ Z+, let P(n) be the following statement: 1+2+3+⋯+n = n(n+1)/2. We will show that P(n) is a true statement, for all n ∈ Z+, by giving a proof by induction.

First, let us consider P(1). We have 1 = 1(1+1)/2; hence, P(1) holds.

Now, suppose that P(k) is true for some positive integer k, that is 1+2+3+⋯+k = k(k+1)/2.
Then 1+2+3+⋯+k+(k+1) = k(k+1)/2 + (k+1) = k(k+1)/2 + 2(k+1)/2 = (k²+3k+2)/2 = (k+1)(k+2)/2. This shows that if P(k) is true then P(k+1) is true. By the Principle of Mathematical Induction, it follows that P(n) is true for all natural numbers n.
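Induction proves the formula for every n at once, but it is good practice to test such identities on small cases as well. A one-loop sketch:

```python
# check 1 + 2 + ... + n == n(n+1)/2 for the first few hundred values of n
for n in range(1, 200):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
```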

It is not enough to show P(n) ⇒ P(n+1); we need to also show that P(n₀) holds. As we saw in the above example, we often have that n₀ = 1. However, this is not always the case. The following example highlights these two points.

Example:

Let P(n) be the statement n² ≤ 2^(n−1). To highlight the importance of the base case, let us first show that for n ≥ 3 we have P(n) ⇒ P(n+1).

Suppose that P(k) is a true statement for some natural number k ≥ 3, that is k² ≤ 2^(k−1).

Then (k+1)² = k²+2k+1 ≤ k²+k² [since 2k+1 ≤ k² for k ≥ 3] = 2k² ≤ 2·2^(k−1) = 2^k, that is P(k+1) holds.

If we set n₀ = 1, we do have that P(1) holds (since 1 ≤ 1), but we have not proven P(1) ⇒ P(2) (since 1 < 3). In fact, we can not prove P(1) ⇒ P(2) since P(2) is false (2² = 4 > 2¹ = 2).

Since we have shown that for n ≥ 3 we have P(n) ⇒ P(n+1), we might want to set n₀ = 3. However, P(3) is also false since 3² = 9 > 2² = 4. In fact, we can check that P(3), P(4), P(5) and P(6) are all false. However, P(7) is true since we have that 7² = 49 ≤ 64 = 2⁶.

Therefore, we have n₀ = 7 and by the Principle of Mathematical Induction, we have shown that n² ≤ 2^(n−1) for all n ∈ Z+ such that n ≥ 7.
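The pattern of base cases described above (P(1) true, P(2) to P(6) false, P(n) true from n = 7 onwards) is easy to tabulate. A sketch:

```python
def P(n):
    """The statement n^2 <= 2^(n-1)."""
    return n * n <= 2 ** (n - 1)

assert P(1)                                  # 1 <= 2**0 = 1
assert not any(P(n) for n in range(2, 7))    # P(2), ..., P(6) are all false
assert all(P(n) for n in range(7, 100))      # P(n) holds from n = 7 onwards
```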

Sometimes, the principle of induction is not strong enough, either because we need more than one base case, or because to prove P(n) is true we need to know P(m) is true for some unknown m<n. This is where we can use the strong principle of mathematical induction.

Proof techniques:

Let n₀ ∈ Z+ and let P(n) be a statement for n ≥ n₀. A proof by strong induction is where one proves that:

  • P(n₀), P(n₀+1), …, P(n₀+k) are true (for some k ≥ 0), and

  • for all n ≥ n₀+k, [P(i) is true for all i ≤ n] ⇒ P(n+1) is true.

Combining these two statements, by the strong principle of induction, we deduce that P(n) is true for all n ≥ n₀.

Example:

Statement: Suppose that x₁ = 3 and x₂ = 5 and for n ≥ 3, define x_n = 3x_{n−1} − 2x_{n−2}. Show that x_n = 2^n + 1, for all n ∈ Z+.

Proof: Let P(n) be the statement “x_n = 2^n + 1”. We will show that P(n) is a true statement, for all n ∈ Z+, by giving a proof by strong induction. First, we consider our base cases n₀ = 1 and n₀+1 = 2. We have the given initial conditions x₁ = 3 and x₂ = 5. Using the formula x_n = 2^n + 1, we indeed have x₁ = 2¹+1 = 3 and x₂ = 2²+1 = 5. Therefore, P(1) and P(2) hold.

Let n ≥ 2 and suppose for all i ∈ Z+ such that i ≤ n we have that P(i) holds. Then by our assumption x_{n−1} = 2^(n−1) + 1 and x_n = 2^n + 1. We have: x_{n+1} = 3x_n − 2x_{n−1} = 3(2^n + 1) − 2(2^(n−1) + 1) = 3·2^n + 3 − 2^n − 2 = 2·2^n + 1 = 2^(n+1) + 1. This shows that if P(i) is true, for 1 ≤ i ≤ n, then P(n+1) is true. By the Strong Principle of Mathematical Induction, it follows that P(n) is true for all natural numbers n.
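The recurrence and the closed form can be cross-checked programmatically. A sketch (x_sequence is a hypothetical helper name, not from the notes):

```python
def x_sequence(N):
    """First N terms x_1, ..., x_N of x_n = 3*x_{n-1} - 2*x_{n-2}, x_1 = 3, x_2 = 5."""
    xs = [3, 5]
    while len(xs) < N:
        xs.append(3 * xs[-1] - 2 * xs[-2])
    return xs[:N]

# compare against the closed form x_n = 2**n + 1
assert x_sequence(20) == [2 ** n + 1 for n in range(1, 21)]
```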
Example:

Statement: Show that every natural number can be written as the sum of distinct powers of 2. That is, for all n ∈ Z+, we can write n = 2^(a₁) + 2^(a₂) + ⋯ + 2^(a_r) with a_i ∈ Z, a_i ≥ 0, and a_i ≠ a_j if i ≠ j.

Proof: Let P(n) be the statement “there exist a₁,…,a_r ∈ Z such that a_i ≥ 0, a_i ≠ a_j if i ≠ j, and n = 2^(a₁) + 2^(a₂) + ⋯ + 2^(a_r)”. We check that our base case n₀ = 1 is true. Indeed 1 = 2⁰, so P(1) holds.

Let n ∈ Z+ and suppose for all i ∈ Z+ such that i ≤ n we have that P(i) holds. Consider A = {n+1−2^ℓ : ℓ ∈ Z≥0, n+1−2^ℓ ≥ 0}. We note that A ⊆ Z≥0 by definition. Taking ℓ = 0, we see that n = n+1−2⁰ ∈ A, so A ≠ ∅. So, by the Well Ordering Principle, A has a minimal element, call it m, say m = n+1−2^ℓ. If m = 0 then n+1 = 2^ℓ and P(n+1) holds. If m ≠ 0, then since P(m) holds [as m ≤ n], there exist a₁,…,a_r ∈ Z such that a_i ≥ 0, a_i ≠ a_j if i ≠ j, and m = 2^(a₁) + 2^(a₂) + ⋯ + 2^(a_r). Then n+1 = 2^ℓ + 2^(a₁) + 2^(a₂) + ⋯ + 2^(a_r), so we just need to show ℓ ≠ a_i for all i. For a contradiction, suppose ℓ = a_i for some i. Then n+1 ≥ 2^ℓ + 2^ℓ = 2^(ℓ+1), so 0 ≤ n+1−2^(ℓ+1) < m with n+1−2^(ℓ+1) ∈ A, which contradicts the minimality of m. Therefore the powers are distinct and P(n+1) holds.

Therefore, by the principle of strong induction, we have shown that every natural number can be written as the sum of distinct powers of 2.
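The statement is exactly the existence of binary representations. A sketch that extracts the distinct powers of 2 from the usual binary expansion, so it illustrates the result rather than the inductive proof (powers_of_two is a hypothetical helper name):

```python
def powers_of_two(n):
    """Distinct exponents a_i with n = sum of 2**a_i, read off the binary expansion."""
    return [i for i, bit in enumerate(reversed(bin(n)[2:])) if bit == "1"]

for n in range(1, 500):
    exps = powers_of_two(n)
    assert len(set(exps)) == len(exps)        # the exponents are distinct
    assert sum(2 ** a for a in exps) == n     # and they sum back to n
```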

Remark:

A proof by induction is a special case of a proof by strong induction (taking k = 0)! We present these two ideas separately as it is easier to understand induction before understanding strong induction. However, most mathematicians will say “induction” to mean both induction and strong induction and do not distinguish between the two (so feel free to also not distinguish between the two).

5 Studying the integers

The integers, Z, form an interesting set as, unlike in Q or R, we can not always divide. This restriction brings a lot of interesting properties that we will now study in more detail. (The Well Ordering Principle also sets Z apart from Q and R, and we’ll also make use of that principle.)

5.1 Greatest common divisor

While we can not do division in general, there are cases when we can divide b ∈ Z by a ∈ Z.

Definition 5.1:
For a,b ∈ Z, we say that a divides b, denoted by a∣b, if there exists c ∈ Z such that b = ac. That is, b ∈ aZ = {ax : x ∈ Z}. In this case we say a is a divisor of b.
Remark:
We negate the above definition to say that a does not divide b (or a is not a divisor of b), denoted by a∤b, if for all c ∈ Z we have b ≠ ac. That is, b ∉ aZ.
Interest:

Notice that if we tried to extend this definition into Q, we would find that for all a ∈ Q∖{0} and b ∈ Q we have a∣b.

Note that if b ≠ 0 then a∣b implies |a| ≤ |b|.

Theorem 5.2: (Division Theorem)

Let a,b ∈ Z with a ≠ 0. Then there exist unique q,r ∈ Z such that b = aq+r and 0 ≤ r < |a|.

We call q the quotient and r the remainder.

Proof.

Let A = {b−ak : k ∈ Z, b−ak ∈ Z≥0}. By definition A ⊆ Z≥0. If b ≥ 0, taking k = 0 gives b ∈ A. If b < 0, taking k = ab gives b−a²b = b(1−a²) ∈ A, since b < 0 and 1−a² ≤ 0. So A is non-empty, and by the Well Ordering Principle it contains a least element. Let this element be r and the corresponding k be q. Since r ∈ A, we know r ≥ 0. We show r < |a| by contradiction. Suppose r ≥ |a| and consider r > r−|a| ≥ 0. Note that r−|a| = (b−ak)−|a| = b−(k±1)a ∈ A, which contradicts the minimality of r. Hence r < |a| as required.

It remains to show that r and q are unique. For the sake of a contradiction, suppose there exist q₁,q₂,r₁,r₂ ∈ Z with 0 ≤ r₁ < r₂ < |a| (without loss of generality, we assume r₁ < r₂ as they are not equal) and b = q_i a + r_i for i = 1,2. Then q₁a+r₁ = q₂a+r₂ means r₂−r₁ = q₁a−q₂a = a(q₁−q₂). Since q₁,q₂ ∈ Z, we have q₁−q₂ ∈ Z, so a∣(r₂−r₁), i.e. (r₂−r₁) ∈ aZ. But 0 < r₂−r₁ < |a| and there is no integer strictly between 0 and |a| which is divisible by a. This is a contradiction to our assumption. Hence r₁ = r₂, and we deduce that q₁ = q₂.

Note that the Division Theorem shows that a∣b if and only if the remainder r is equal to 0.
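Python's built-in divmod computes exactly this quotient and remainder when the divisor is positive; for a negative divisor, Python's convention gives a remainder with the sign of the divisor, so a small adjustment is needed to land in [0, |a|). A sketch:

```python
def division(b, a):
    """Return (q, r) with b = a*q + r and 0 <= r < |a|, for a != 0."""
    q, r = divmod(b, a)
    if r < 0:                  # only happens when a < 0 under Python's convention
        q, r = q + 1, r - a
    return q, r

# exhaustively check the defining properties on a window of values
for b in range(-20, 21):
    for a in list(range(-7, 0)) + list(range(1, 8)):
        q, r = division(b, a)
        assert b == a * q + r and 0 <= r < abs(a)
```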

Because we have a notion of division, the following definition is natural.

Definition 5.3:
Let a,b,c ∈ Z. We say that c is a common divisor of a and b if c∣a and c∣b.

Note that 1 is always a common divisor of a and b, and if a ≠ 0, no integer larger than |a| can be a common divisor of a and b.

Definition 5.4:
Let a,b ∈ Z with a,b not both equal to 0. We define the greatest common divisor of a and b, denoted by gcd(a,b) (or hcf(a,b) for highest common factor), to be the largest positive integer that divides both a and b.
Remark:
Note that gcd(0,0) does not exist since every integer is a divisor of 0. However, for a ∈ Z with a ≠ 0, we have that gcd(a,0) = |a|.
Definition 5.5:

Let a₁,a₂,…,a_n ∈ Z with not all of the a_i’s being equal to 0. We define the greatest common divisor of a₁ to a_n, denoted by gcd(a₁,…,a_n), to be the largest positive integer that divides a_i for all 1 ≤ i ≤ n.

The following lemma showcases different properties of the gcd.

Lemma 5.6:

Suppose a,b ∈ Z with a and b not both 0.

  1. We have gcd(a,b) = gcd(|a|,b) = gcd(a,|b|) = gcd(|a|,|b|).

  2. Set c = gcd(a,b), and take x,y ∈ Z so that a = cx, b = cy; then gcd(x,y) = 1.

  3. For all x ∈ Z we have gcd(a,b) = gcd(a,ax+b).

  4. For all a₁,a₂,a₃ ∈ Z, not all equal to 0, we have gcd(a₁,a₂,a₃) = gcd(gcd(a₁,a₂),a₃).

Proof.

Exercise.

Theorem 5.7:

Let a,b ∈ Z with a,b not both equal to 0, and let gcd(a,b) = c. Then there exist s,t ∈ Z such that c = as+bt.

Proof.

Let A be the set A = {as+bt : s,t ∈ Z and as+bt > 0}.

By taking s = a, t = b we see that a²+b² ∈ A, hence A is a non-empty subset of Z+. By the Well Ordering Principle, it has a minimal value. We denote this minimum value by d and take s,t ∈ Z such that d = as+bt. Note that since c∣a and c∣b, we have c∣(as+bt), so c∣d. Hence, c ≤ d.

Now, using Theorem 5.2 on a and d, let q,r ∈ Z be such that a = dq+r with 0 ≤ r < d. Then r = a−dq = a−(as+bt)q = a(1−sq)+b(−tq). If r > 0, then r ∈ A with r < d, contrary to how we chose d. Hence, we must have that r = 0, which means d∣a. Similarly, we can show that d∣b, so d is a common divisor of a and b. Since c = gcd(a,b), we have that d ≤ c.

Since c ≤ d and d ≤ c, it follows that c = d. Hence, gcd(a,b) = c = d = as+bt.

Proof techniques:
Note here that we proved x = y by showing x ≤ y and y ≤ x. This trick is often exploited when trying to prove two numbers are the same.

Remark:
Note that s and t are not unique at all. Suppose that s,t ∈ Z are such that gcd(a,b) = as+bt; then using the fact that ab+b(−a) = 0, we see that gcd(a,b) = as+bt+(ab+b(−a)) = a(s+b)+b(t−a), i.e., s+b, t−a ∈ Z also satisfy Theorem 5.7.

This theorem has several important corollaries (and one generalisation).

Corollary 5.8:

Let a,b,c ∈ Z with c ≠ 0.

  • If c|ab and gcd(b,c) = 1, then c|a.

  • If c|a and c|b then c|gcd(a,b).

Proof.

Exercise.

Corollary 5.9:

Let a1,…,an ∈ Z, not all ai’s equal to 0. Then there exist s1,…,sn ∈ Z such that gcd(a1,…,an) = a1s1 + ⋯ + ansn.

Proof.

Exercise.

Note that the proof of Theorem 5.7 only shows the existence of s and t but doesn’t give us a way to find/calculate the values of s and t (that is, the proof is not constructive). To find s and t we need to use an algorithm. An algorithm is a logical step-by-step procedure for solving a problem in a finite number of steps. Many algorithms are recursive, meaning that after one or more initial steps, a general method is given for determining each subsequent step on the basis of steps already taken.

Etymology:

The word algorithm comes from Abu Ja’far Muhammad ibn Musa Al-Khwarizmi (Arabic Mathematician, 780-850). Al-Khwarizmi (meaning “from Khwarazm”) wrote a book detailing how to use Hindu-Arabic numerals. When this book was translated for Europeans, it was given the Latin name Liber Algorismi meaning “Book of al-Khowarazmi”. As a consequence, any manipulation of Arabic numerals (which are the ones we use nowadays) was known as an algorism. The current form algorithm is due to a “pseudo-etymological perversion” where algorism was confused with the word arithmetic to give the current spelling of algorithm.

It is interesting to note that Al-Khwarizmi also wrote the book Hisab al-jabr w’al-muqabala. The term al-jabr comes from the Arabic al- meaning “the” and jabara meaning “to reunite”, so his book was on “the reunion of broken parts”, i.e. how to solve equations with unknowns. When the book made its way to Europe, Europeans shortened the Arabic title to algeber. Over time, this mutated to the word algebra which is currently used.

Algorithm 5.10: (Extended Euclidean algorithm)

Input: Integers a and b such that a>0.

Output: Integers s, t and gcd(a,b) such that gcd(a,b)=as+bt.

Algorithm:

  • Step 0: Set s0 = 0, t0 = 1 and r0 = a.

  • Step 1: Set s1 = 1 and t1 = 0. Find unique q1, r1 ∈ Z such that b = a·q1 + r1 and 0 ≤ r1 < a. If r1 = 0 then proceed to the final step, else go to Step 2.

  • Step 2: Set s2 = s0 − q1·s1 and t2 = t0 − q1·t1. Find unique q2, r2 ∈ Z such that a = r1·q2 + r2 and 0 ≤ r2 < r1. If r2 = 0 then proceed to the final step, else go to Step 3.

  • Step k (for k ≥ 3): Set s_k = s_{k−2} − q_{k−1}·s_{k−1} and t_k = t_{k−2} − q_{k−1}·t_{k−1}. Find unique q_k, r_k ∈ Z such that r_{k−2} = r_{k−1}·q_k + r_k and 0 ≤ r_k < r_{k−1}. If r_k = 0 then proceed to the final step, else go to Step k+1.

  • Final Step: If the last step was Step n (with n ≥ 1) then output r_{n−1} = gcd(a,b), s_n = s and t_n = t.
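Algorithm 5.10 translates directly into code. The following Python sketch mirrors the steps above, keeping only the two most recent values of s_k, t_k and r_k at each stage; the function name `extended_gcd` is our own.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) = a*s + b*t, assuming a > 0.

    Mirrors Algorithm 5.10: r runs through the remainders, while s and t
    are updated by s_k = s_{k-2} - q_{k-1} * s_{k-1} (likewise for t).
    """
    # Step 0 and Step 1: (s0, t0, r0) = (0, 1, a), (s1, t1) = (1, 0),
    # and b = a*q1 + r1.
    prev_s, prev_t, prev_r = 0, 1, a
    s, t = 1, 0
    q, r = divmod(b, a)
    while r != 0:
        # Step k: form s_k, t_k from the two previous rows, then divide
        # the previous remainder by the current one.
        prev_s, s = s, prev_s - q * s
        prev_t, t = t, prev_t - q * t
        prev_r, q, r = r, *divmod(prev_r, r)
    return prev_r, s, t    # r_{n-1} = gcd(a, b), s_n, t_n

g, s, t = extended_gcd(323, 1451)
print(g, s, t)  # → 1 292 -65
```

Running it on the pair 323, 1451 reproduces the worked example later in this section: 1 = 323·292 + 1451·(−65).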

Proposition 5.11:

The Extended Euclidean algorithm terminates and is correct.

Proof.

Notice that after k steps, we have a > r1 > r2 > ⋯ > r_k ≥ 0. Hence, the algorithm must terminate after at most a steps.

If it terminates after Step 1 (i.e. r1 = 0), then gcd(a,b) = a = r0. Furthermore a = a·1 + b·0 = a·s1 + b·t1.

So suppose the algorithm terminates after n steps, for n > 1. We note that r1 = b − a·q1 and r2 = a − r1·q2, and for 3 ≤ k < n, we have r_k = r_{k−2} − r_{k−1}·q_k. Then by Lemma 5.6, we have gcd(a,b) = gcd(a,r1) = gcd(r1,r2) = ⋯ = gcd(r_{n−1}, r_n) = gcd(r_{n−1}, 0) = r_{n−1}.

Finally, we prove by (strong) induction that a·s_n + b·t_n = r_{n−1} at every stage of the algorithm. So let P(n) be the statement that a·s_n + b·t_n = r_{n−1}. We have seen above that P(1) holds. We also have s2 = −q1 and t2 = 1. So r1 = b − a·q1 = a(−q1) + b·1 = a·s2 + b·t2. So P(2) holds. Assume P(n) holds for all n < k; then we have a·s_k + b·t_k = a(s_{k−2} − q_{k−1}·s_{k−1}) + b(t_{k−2} − q_{k−1}·t_{k−1}) = (a·s_{k−2} + b·t_{k−2}) − q_{k−1}(a·s_{k−1} + b·t_{k−1}) = r_{k−3} − r_{k−2}·q_{k−1} = r_{k−1}. Hence P(k) holds, so by strong induction P(n) holds for all n.

Notice that if a < 0, then apply the above algorithm to |a| and b, then use −s instead of s.

History:

Theorem 5.7 is often known as Bezout’s identity, named after Étienne Bézout (French mathematician, 1730–1783), while the Extended Euclidean algorithm is named after Euclid of Alexandria (Greek mathematician, 325BC-265BC). One might wonder at the gap between those two mathematicians given Bezout’s identity follows immediately from the algorithm.

In his series of books The Elements, Euclid gives an algorithm to find the gcd of two numbers. The algorithm uses the fact that gcd(a,b) = gcd(a, b−a), and Euclid uses repeated subtractions. This was improved to give the above algorithm without the s_k and t_k. Around 1000 years later, Aryabhata (Indian mathematician, 476-550) wrote the book Aryabhatiya which contained a version of the Extended Euclidean Algorithm. It is not clear when the algorithm was known in Europe, but it was used by Claude Gaspar Bachet de Méziriac (French mathematician, 1581-1638), who was alive a full 100 years before Bezout. What Bezout did was to show that a version of the Extended Euclidean Algorithm could be used for polynomials, and that a version of Bezout’s identity existed for polynomials (you can find out how polynomial rings are similar to Z in units like Algebra 2).

Example:

We want to compute gcd(323,1451) and find s,t ∈ Z such that gcd(1451,323) = 323s + 1451t. We go through the algorithm using the following table:

k  s_k                  t_k                 Calculation          q_k  r_k
0  0                    1                   −                    −    −
1  1                    0                   1451 = 323·4 + 159   4    159
2  0 − 1·4 = −4         1 − 0·4 = 1         323 = 159·2 + 5      2    5
3  1 − (−4)·2 = 9       0 − 1·2 = −2        159 = 5·31 + 4       31   4
4  −4 − 9·31 = −283     1 − (−2)·31 = 63    5 = 4·1 + 1          1    1
5  9 − (−283)·1 = 292   −2 − 63·1 = −65     4 = 1·4 + 0          4    0

Hence gcd(323,1451) = 1 = 323·292 + 1451·(−65).

5.2 Primes and the Fundamental Theorem of Arithmetic

We now move on to look at a fundamental property of Z. First we need some more definitions.

Definition 5.12:

We say an integer p > 1 is prime if for all a,b ∈ Z, if p|ab then p|a or p|b.

Equivalently, if ab ∈ pZ then a ∈ pZ or b ∈ pZ.
Remark:
We negate the above statement to say an integer p > 1 is not a prime (often called a composite) if there exist a,b ∈ Z such that p|ab, p∤a and p∤b.
Theorem 5.13:

An integer p>1 is a prime if and only if the only positive divisors of p are 1 and p.

Proof.

We prove both directions separately.

⇒). For a proof by contradiction, suppose p is prime but there exists a positive divisor a which is not 1 or p. Then we have 1 < a < p and there exists b ∈ Z such that ab = p. Since p > 1 and a > 1, we have 1 < b < p. Now p = ab means p|ab. However, p∤a and p∤b since 1 < a,b < p. Hence p is not prime, which is a contradiction. So if p is prime, then the only positive divisors of p are 1 and p.

⇐). Suppose p > 1 is an integer whose only positive divisors are 1 and p. Let a,b ∈ Z be such that p|ab. If p|a then we satisfy the definition of p being prime. So suppose p∤a; then gcd(p,a) = 1 (since p has no other positive divisors). By Corollary 5.8, we have that p|b, hence satisfying the definition of being prime.

Remark:
The above theorem means that if an integer n > 1 is composite, it has a positive divisor which is not 1 or n. I.e., there exists a ∈ Z such that a|n and 1 < a < n.
Interest:

Most students are used to Theorem 5.13 being the definition of a prime number. In fact, Euclid introduced prime numbers as the ones that satisfied that definition (he used the Greek word protos meaning first, as prime numbers are the first numbers from which every other number arises. The word prime comes from Latin “primus” which also means first). Around the 19th century, it was discovered that there are some rings in which elements satisfying Definition 5.12 and elements satisfying Theorem 5.13 are different. Following this, mathematicians decided that elements satisfying Theorem 5.13 should be called irreducible (they can not be reduced into smaller elements). You can learn more about irreducible elements in general rings in a unit like Algebra 2.

Theorem 5.14: (Fundamental Theorem of Arithmetic)

Let n ∈ Z+ be such that n > 1. Then there exist primes p1,…,pr such that n = p1·p2⋯pr.

Furthermore, this factorisation is unique up to re-arranging, that is, if n = p1⋯pr = q1⋯qt where q1,…,qt are also primes, then r = t and, if we order the pi and qi such that p_i ≤ p_{i+1} and q_i ≤ q_{i+1} for all i, then pi = qi for all i.

Proof.

We use a proof by induction to show the existence of the prime divisors of n. Let P(n) be the statement that “there exist some primes p1,…,pr such that n = p1⋯pr”. We note that 2 is prime, hence P(2) holds. Assume P(k) holds for all k ≤ n and consider P(n+1).

If n+1 is prime, then P(n+1) holds. If n+1 is composite, then it has a non-trivial divisor, say a. So n+1 = ab. Note as before that 1 < a,b < n+1. So by the induction hypothesis, both a and b can be written as a product of primes. Hence n+1 can be written as a product of primes and P(n+1) holds. So by the principle of (strong) induction, all integers n ≥ 2 can be written as a product of primes.

We use a proof by contradiction to show the uniqueness (up to re-ordering) of the prime divisors of n. Let S be the subset of Z+ containing the integers whose factorisation is not unique. For a contradiction, we assume S is non-empty; hence by the Well Ordering Principle, it has a least element, call it n. Suppose n = p1⋯pr = q1⋯qt are two distinct factorisations, where the pi and qi are primes.

Note that since p1|n we have p1 | q1⋯qt = q1(q2⋯qt). If p1∤q1, then by definition p1 | (q2⋯qt). In this case, if p1∤q2 then by definition p1 | (q3⋯qt). Continuing this argument, we see that there exists i such that p1|qi. Without loss of generality, re-order the qi’s so that p1|q1. Since q1 is prime, it has two positive divisors, 1 and q1. Since p1 > 1 (as it is prime) and a divisor of q1, we deduce p1 = q1. Hence p1⋯pr = q1⋯qt implies p2⋯pr = q2⋯qt = a, say. Since p2⋯pr and q2⋯qt are two distinct prime factorisations of a, we have a ∈ S. However, a < n, contradicting the minimality of n. Hence S must be empty, so every n > 1 has a unique factorisation.

Corollary 5.15:

There are infinitely many prime numbers in Z+.

Proof.

To the contrary, suppose that there are only finitely many primes, say p1,p2,…,pm for some m ∈ Z+. Set n = p1·p2⋯pm + 1. Clearly m ≥ 2, as 2 is a prime. Hence n > 1. By the Fundamental Theorem of Arithmetic, n can be factored as a product of primes. So we let q be some prime dividing n. Then n = qk for some k ∈ Z+. Since there are only finitely many primes, we must have that q = pj for some j ∈ Z+, 1 ≤ j ≤ m. Hence, n = pj·k = p1·p2⋯pm + 1, so 1 = n − p1·p2⋯pm = pj(k − p1·p2⋯pm/pj). Hence the prime pj divides 1, which is a contradiction. The result follows.

History:

The above proof is often called Euclid’s proof as it can be found in his books The Elements. The proof directly follows his proof that every number can be factorised as a product of primes. Euclid did not prove the uniqueness of such a factorisation; instead, a proof of the uniqueness of such a factorisation can be found in Diophantus of Alexandria’s (Greek mathematician, c. AD 200 - 284) book Arithmetica (the title presumably linked to the Greek word arithmos meaning “number”).

The Fundamental Theorem of Arithmetic can be used to find solutions to equations such as the one below.

Example:

Problem: Find all square numbers n² ∈ Z+ such that there exists a prime p with n² = 5p + 9.

Solution: First, we suppose we have a prime p such that 5p + 9 = n² for some n ∈ Z+. We want to find constraints on p. We have that 5p = n² − 9 = (n+3)(n−3). By the Fundamental Theorem of Arithmetic, the only positive factors of 5p are 1, 5, p and 5p. That is n+3 ∈ {1, 5, p, 5p}, so let us analyse these four cases.

  • Suppose n+3 = 1. Then n−3 = −5, that is 5p = (n+3)(n−3) = −5. But this implies that p = −1, which is not prime. It follows that n+3 ≠ 1.

  • Suppose n+3 = 5. Then n−3 = −1, so 5p = (n+3)(n−3) = −5 and hence p = −1, which is a contradiction. Hence, n+3 ≠ 5.

  • Suppose n+3 = p. Then n−3 = p−6, so 5p = p(p−6). Hence 5 = p−6, so p = 11, which is prime.

  • Suppose n+3 = 5p. Then 5p = (n+3)(n−3) = 5p(n−3). Hence n−3 = 1, and so n = 4. Then 5p = (n+3)(n−3) = 7. But this is not possible since 5∤7. Hence, n+3 ≠ 5p.

Summarising all of the above cases, we have that if p is a prime such that 5p + 9 = n², for some n ∈ Z+, then p = 11.

On the other hand, let p = 11. Then 5p + 9 = 55 + 9 = 64 = 8². Hence, the only square n² ∈ Z+ such that there exists a prime p with 5p + 9 = n² is n² = 64 (with n = 8 and p = 11).
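A brute-force search agrees with the case analysis above: among small primes, only p = 11 makes 5p + 9 a perfect square. This sketch (with a naive trial-division primality test of our own) is a check, not a proof.

```python
from math import isqrt

def is_prime(n):
    # Naive trial division, adequate for a small search range.
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))

# Find every prime p below 10000 for which 5p + 9 is a perfect square.
hits = [p for p in range(2, 10000)
        if is_prime(p) and isqrt(5 * p + 9) ** 2 == 5 * p + 9]
print(hits)  # → [11]
```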
Remark:

Two things to note:

  • One can use Theorem 5.14 to show that for x ∈ Q+, there are a,b ∈ Z+ so that gcd(a,b) = 1 and x = a/b. Furthermore, one can show that such a,b are unique. We leave this as an exercise.

  • Every n ∈ Z with n > 1 can be expressed uniquely as n = ∏_{i=1}^{r} p_i^{n_i} where the p_i are distinct primes and n_i ∈ Z_{>0}. We can extend this representation to include 1 by saying 1 is equal to the empty product.
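The prime-power representation n = ∏ p_i^{n_i} can be computed by trial division; the sketch below (the function name is our own) returns the exponents as a dictionary.

```python
def prime_factorisation(n):
    """Return {p: e} with n equal to the product of p**e, for n > 1.

    Trial division: repeatedly divide out the smallest remaining
    divisor, which is necessarily prime.
    """
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:          # whatever is left over is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors

print(prime_factorisation(360))  # → {2: 3, 3: 2, 5: 1}, i.e. 360 = 2^3·3^2·5
```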

With the above representation of integers, we can revisit the greatest common divisor.

Lemma 5.16:

Let a,b ∈ Z+ and write a = ∏_{i=1}^{r} p_i^{a_i} and b = ∏_{i=1}^{r} p_i^{b_i} where a_i,b_i ∈ Z and the p_i are the prime divisors of a or b (and hence a_i,b_i can be 0). Then a|b if and only if a_i ≤ b_i for all i.

Proof.

We prove both directions separately.

⇒). For a contradiction, suppose a|b and that there exists i such that a_i > b_i. We can write a = p_i^{a_i}·A where gcd(p_i, A) = 1, and we can write b = p_i^{b_i}·B where gcd(p_i, B) = 1. Since a|b, there exists c ∈ Z+ such that ac = b, that is p_i^{a_i}·A·c = p_i^{b_i}·B. Rearranging, we have p_i^{a_i − b_i}·A·c = B. Now since a_i − b_i > 0, we have p_i | B, which is a contradiction since gcd(B, p_i) = 1. Hence, for all i, a_i ≤ b_i.

⇐). If a_i ≤ b_i for all i then b = (∏_{i=1}^{r} p_i^{a_i})(∏_{i=1}^{r} p_i^{b_i − a_i}) = ac. Hence a|b.

Corollary 5.17:

Let a,b ∈ Z such that neither is 0 and write |a| = ∏_{i=1}^{r} p_i^{a_i} and |b| = ∏_{i=1}^{r} p_i^{b_i} where a_i,b_i ∈ Z and the p_i are the prime divisors of |a| or |b| (and hence a_i,b_i can be 0). Then gcd(a,b) = gcd(|a|,|b|) = ∏_{i=1}^{r} p_i^{min(a_i,b_i)}.

Proof.

The greatest common divisor is a divisor of both, and therefore the prime factorisation of the gcd can contain only primes that appear in both |a| and |b|. Each of those primes can appear with at most the minimum of the powers with which it appears in |a| and |b|. The greatest common divisor is then obtained by choosing as many primes as possible, each with that minimum power.

Definition 5.18:
We say that a,b are co-prime if gcd(a,b)=1.

The word co-prime demonstrates that there are no prime factors in common between a and b (they are prime with respect to each other).

Remark:
We say a,b are not co-prime if there exists c > 1 such that c|a and c|b.
Definition 5.19:
Let a,b,c ∈ Z. Then c is a common multiple of a and b if both a|c and b|c.
Definition 5.20:
Let a,b ∈ Z such that neither is 0. Then the lowest common multiple of a and b, denoted by lcm(a,b), is the smallest positive integer which is divisible by both a and b.
Lemma 5.21:

Let a,b ∈ Z such that neither is 0 and write |a| = ∏_{i=1}^{r} p_i^{a_i} and |b| = ∏_{i=1}^{r} p_i^{b_i} where a_i,b_i ∈ Z and the p_i are the prime divisors of |a| or |b| (and hence a_i,b_i can be 0). Then lcm(a,b) = lcm(|a|,|b|) = ∏_{i=1}^{r} p_i^{max(a_i,b_i)}.

Proof.

The lowest common multiple is a multiple of both, and therefore the prime factorisation of the lcm must contain all the primes that appear in either |a| or |b|. Each of those primes must appear with at least the maximum of the powers with which it appears in |a| and |b|. The lowest common multiple is then obtained by choosing as few primes as possible, each with that maximum power.

Theorem 5.22:

Let a,b ∈ Z such that neither is 0. Then lcm(a,b)·gcd(a,b) = |ab|.

Proof.

We first note that for any two numbers a_i, b_i we have min(a_i,b_i) + max(a_i,b_i) = a_i + b_i. Write |a| = ∏_{i=1}^{r} p_i^{a_i} and |b| = ∏_{i=1}^{r} p_i^{b_i}. Then: lcm(a,b)·gcd(a,b) = (∏_{i=1}^{r} p_i^{max(a_i,b_i)})(∏_{i=1}^{r} p_i^{min(a_i,b_i)}) = ∏_{i=1}^{r} p_i^{min(a_i,b_i)+max(a_i,b_i)} = ∏_{i=1}^{r} p_i^{a_i+b_i} = (∏_{i=1}^{r} p_i^{a_i})(∏_{i=1}^{r} p_i^{b_i}) = |a|·|b| = |ab|.

Corollary 5.23:

Let a,b ∈ Z such that neither is 0. If gcd(a,b) = 1, then lcm(a,b) = |ab|.
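Lemma 5.16 through Theorem 5.22 can be checked numerically: taking the minimum or maximum of the exponents in the factorisations gives the gcd and lcm, and the two multiply to |ab|. A sketch for positive integers, reusing a trial-division factoriser; all function names are our own.

```python
from math import prod

def factorise(n):
    # Trial-division prime factorisation of n > 1 as a {prime: exponent} dict.
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def gcd_lcm_from_factorisations(a, b):
    # Corollary 5.17 / Lemma 5.21: min exponents give gcd, max give lcm.
    fa, fb = factorise(a), factorise(b)
    primes = set(fa) | set(fb)
    g = prod(p ** min(fa.get(p, 0), fb.get(p, 0)) for p in primes)
    l = prod(p ** max(fa.get(p, 0), fb.get(p, 0)) for p in primes)
    return g, l

g, l = gcd_lcm_from_factorisations(12, 18)
print(g, l)               # → 6 36
assert g * l == 12 * 18   # Theorem 5.22: lcm(a,b)·gcd(a,b) = |ab|
```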

Interest:

If there’s time, we’ll talk more about polynomial rings, famous open problems in Number Theory, or the fundamental theorem of algebra.

6 Moving from one set to another - Functions

In mathematics, we are very often concerned with functions (also called maps). Some functions model the behaviour of complex systems, while other functions allow us to compare two sets. Loosely speaking, a function is a black box which takes an input from one set and gives an output in another. Every input must have an output, and there is no randomness, so the same input always gives the same output. To give a formal definition of functions, we need to go back to sets.

6.1 Definitions

Before we can start working with functions, we need to give a precise mathematical definition.

Definition 6.1:
Given two sets X,Y, we define the Cartesian product of X and Y, denoted by X×Y, by X×Y = {(x,y) : x ∈ X, y ∈ Y}.
Remark:
Note that if X = ∅ or Y = ∅, then X×Y = ∅.
Example:
We have that R×R = {(x,y) : x,y ∈ R}. So R×R is the Cartesian plane.
Example:

Let X={1,2,3} and Y={4,5,6}. Then X×Y={(1,4),(1,5),(1,6),(2,4),(2,5),(2,6),(3,4),(3,5),(3,6)}, Y×X={(4,1),(4,2),(4,3),(5,1),(5,2),(5,3),(6,1),(6,2),(6,3)}.

Note that X×Y ≠ Y×X.
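For finite sets the Cartesian product can be formed directly with Python's `itertools.product`; a quick sketch reproducing the example above:

```python
from itertools import product

X, Y = {1, 2, 3}, {4, 5, 6}
XY = set(product(X, Y))  # all pairs (x, y) with x in X and y in Y
YX = set(product(Y, X))

assert len(XY) == len(X) * len(Y) == 9
assert (1, 4) in XY and (4, 1) not in XY
assert XY != YX  # taking the product in the other order gives a different set
```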
Definition 6.2:

Let X,Y be non-empty sets. A function f from X into Y, denoted by f : X → Y, is a set f ⊆ X×Y which satisfies that for each element x ∈ X there exists exactly one element f(x) ∈ Y such that (x,f(x)) ∈ f. That is, f = {(x,f(x)) : x ∈ X} ⊆ X×Y.

Symbolically, we have f ⊆ X×Y is a function if ∀x ∈ X, ∃! y ∈ Y such that (x,y) ∈ f (or y = f(x)).
Remark:

We negate the above statement to say that, symbolically, f ⊆ X×Y is not a function if ∃x ∈ X such that ∀y ∈ Y, (x,y) ∉ f, or, ∃x ∈ X such that ∃y1,y2 ∈ Y such that (x,y1) ∈ f, (x,y2) ∈ f and y1 ≠ y2.

In other words, f ⊆ X×Y is not a function if there is x ∈ X such that either (x,y) ∉ f for all y ∈ Y, or there exist two distinct y1,y2 ∈ Y with (x,y1),(x,y2) ∈ f.
Example:
Let X = {1,2,3} and Y = {4,5,6}. Let f = {(1,4),(2,5),(3,4)}, g = {(1,4),(1,5),(3,6)}. Then f is a function from X into Y since, for each x ∈ X, there is exactly one y ∈ Y such that (x,y) ∈ f. However, g is not a function from X into Y for two different reasons. One reason is that 2 ∈ X but there is no value of y ∈ Y such that (2,y) ∈ g. The other is that 1 ∈ X, but there are two different values of y ∈ Y (namely y = 4 and y = 5) such that (1,y) ∈ g.
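Definition 6.2 can be checked mechanically for finite sets: a subset of X×Y is a function exactly when every x in X appears as a first coordinate exactly once. A sketch (the helper name is our own):

```python
def is_function(pairs, X, Y):
    """Return True iff `pairs`, a subset of X x Y, is a function from X into Y."""
    if not all(x in X and y in Y for x, y in pairs):
        return False
    # Each x in X must appear as a first coordinate exactly once.
    firsts = [x for x, _ in pairs]
    return all(firsts.count(x) == 1 for x in X)

X, Y = {1, 2, 3}, {4, 5, 6}
f = {(1, 4), (2, 5), (3, 4)}
g = {(1, 4), (1, 5), (3, 6)}
print(is_function(f, X, Y), is_function(g, X, Y))  # → True False
```

Here g fails for both reasons given in the example: 2 never appears as a first coordinate, and 1 appears twice.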
Notation:

Consider f : X → Y and let A ⊆ X. Then, the image of A under f is denoted by f[A] = {f(x) : x ∈ A} = {y ∈ Y : ∃x ∈ A with y = f(x)}.

We have f[∅] = ∅.

Remark:
In this introductory course we use the notation f[] to distinguish when we expect the input and the output to be a set (as opposed to f() which expects an element as input and output). However, many sources make no such distinction, and will use f() regardless of whether the input is a set or an element. The key thing to note is that the output is an element of Y if the input is an element of X, and a subset of Y if the input is a subset of X.
Definition 6.3:
Consider f : X → Y. We say that X is the domain of f and Y is the co-domain of f. The image (or range) of f is the image of X under f, i.e., the set f[X].
Etymology:

The word function comes from the Latin functus, a form of the verb meaning “to perform”. A function can be thought of as a set of operations that are performed on each value that is input. Interestingly, the word function (which was first introduced by Gottfried Leibniz (German mathematician, 1646 - 1716)) was originally used for maps that could return more than one output for the same input. It was only in the later half of the 20th century that a function was restricted to “exactly one output for each input”.

The word domain comes from the Latin domus which means “house, home”. The domain is the place where all the inputs of f live. Another way to look at it is: the word domus gave the word dominus which means “lord, master”, the person that ruled the house (i.e., a domain is the property owned by a Lord). The domain of a function is the set of all the x that f has “control over”.

Example:

Define f : Z×Z → Z by f((m,n)) = n², for all m,n ∈ Z. We have Z×Z is the domain of f, while Z is the co-domain. The image of f is {n² : n ∈ Z} (i.e., the square numbers).

Now, let A = {(m,n) : m,n ∈ Z, n = 2m}. Note that A = {(m,2m) : m ∈ Z}. The image of A under f is f[A] = {f((m,n)) : (m,n) ∈ A} = {f((m,2m)) : m ∈ Z} = {(2m)² : m ∈ Z} = {4m² : m ∈ Z}.
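The image computation in this example can be mirrored in code on a finite slice of the domain (the slice and helper name are our own choices):

```python
def image(f, A):
    # f[A] = { f(x) : x in A }
    return {f(x) for x in A}

f = lambda mn: mn[1] ** 2                  # f((m, n)) = n^2 on Z x Z
A = {(m, 2 * m) for m in range(-5, 6)}     # pairs (m, 2m), a finite slice

# Squares of even numbers, i.e. 4m^2 for the chosen values of m.
assert image(f, A) == {0, 4, 16, 36, 64, 100}
```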
Theorem 6.4:

Consider f : X → Y and g : X → Y. Then f = g if and only if f(x) = g(x) for all x ∈ X.

Proof.

We prove both implications separately.

⇒). First, suppose that f = g. Take (arbitrary) x ∈ X and choose (the unique) y ∈ Y such that (x,y) ∈ f. Then (x,y) ∈ g (since f = g) and by definition f(x) = y = g(x). This is true for all x ∈ X, so it follows that f(x) = g(x) for every x ∈ X.

⇐). Second, suppose that f(x) = g(x) for all x ∈ X. Then f = {(x,f(x)) : x ∈ X} = {(x,g(x)) : x ∈ X} = g.

It follows that f = g if and only if f(x) = g(x), for all x ∈ X.

Proof techniques:

Notice in this proof (and all others) that the correct punctuation follows each equation, whether it is in-line or on its own (displayed). Maths should be read as part of a grammatically correct sentence. For example, “Consider f : X → Y and g : X → Y” reads “Consider f a function from X to Y and g a function from X to Y”, and “then f = {(x,f(x)) : x ∈ X} = {(x,g(x)) : x ∈ X} = g.” reads as “then f is equal to the set of pairs (x,f(x)) where x is in X, which is equal to….”.

It is for this reason that we should not start a sentence with a maths symbol.

6.2 Injective, surjective and bijective

We now introduce three important properties that a function may or may not have.

Definition 6.5:

We say a function f : X → Y is injective (or one-to-one, or an injection) if for all x1,x2 ∈ X, we have that if x1 ≠ x2 then f(x1) ≠ f(x2).

Using the contrapositive, an alternative definition is that a function f : X → Y is injective if for all x1,x2 ∈ X, we have that if f(x1) = f(x2) then x1 = x2.
Remark:
We negate the above statement to say that f : X → Y is not injective if there exist two distinct x1,x2 ∈ X such that f(x1) = f(x2).
Definition 6.6:
We say a function f : X → Y is surjective (or onto, or a surjection) if for all y ∈ Y, there exists x ∈ X such that f(x) = y.
Remark:
We negate the above statement to say that f : X → Y is not surjective if there exists y ∈ Y such that for all x ∈ X, f(x) ≠ y.

Figure 6.1: An injective function from X to Y: there is at most one arrow going to any point in Y. However, the function is not surjective as there are some points in Y with no arrow going to them. Figure made with Geogebra.


Figure 6.2: A surjective function from X to Y: every point in Y has at least one arrow going to it. However, the function is not injective as there are some points in Y with at least two arrows going to them. Figure made with Geogebra.

Definition 6.7:
A function f : X → Y is called bijective if it is both injective and surjective.
Remark:
We negate the above statement to say that f : X → Y is not bijective if it is not injective or it is not surjective.
Example:

Define f : R×R → R×R by f((m,n)) = (m+n, m−n), for all m,n ∈ R. We want to show that f is bijective.

We first show it is injective by using the contrapositive. Let (m1,n1),(m2,n2) ∈ R×R (the domain) be such that f((m1,n1)) = f((m2,n2)). That is (m1+n1, m1−n1) = (m2+n2, m2−n2), i.e. m1+n1 = m2+n2 and m1−n1 = m2−n2. Therefore, 2m1 = (m1+n1) + (m1−n1) = (m2+n2) + (m2−n2) = 2m2. Since 2 ≠ 0, we can divide by 2 to deduce m1 = m2. Hence m1+n1 = m2+n2 = m1+n2 means n1 = n2. Therefore (m1,n1) = (m2,n2) and f is injective.

We next show that f is surjective. We begin by choosing (u,v) ∈ R×R [the co-domain].

[Scratch work: We want to find (m,n) ∈ R×R such that f(m,n) = (u,v). So we need to find (m,n) ∈ R×R such that m+n = u and m−n = v. Hence, we must have m = u−n and m = v+n. Then u−n = v+n, or equivalently, u−v = 2n, or equivalently, (u−v)/2 = n. If we have (u−v)/2 = n and m = u−n, then m = u − (u−v)/2 = (u+v)/2.]

We set m = (u+v)/2 and n = (u−v)/2. Then (m,n) ∈ R×R [the domain], and f(m,n) = (m+n, m−n) = ((u+v)/2 + (u−v)/2, (u+v)/2 − (u−v)/2) = (u,v). It follows that f is surjective.

Since f is injective and surjective, it is bijective.
Remark:
Note that a function consists of three pieces of information: the domain; the co-domain; and how f(x) is defined. Changing any of these three things can change the behaviour of f : X → Y.
Example:

Let f1 : Z+ → Z+ be defined by f1(n) = n² for all n ∈ Z+. We claim that f1 is injective but not surjective.

We first show f1 is injective. Let n1,n2 ∈ Z+ be such that n1 ≠ n2. Without loss of generality, we assume n1 < n2. Note that n1,n2 > 0 by definition, so n1 < n2 means n1² < n2·n1 and similarly, n1·n2 < n2². Combining these two inequalities together, we get n1² < n1·n2 < n2², i.e. f1(n1) < f1(n2) so f1(n1) ≠ f1(n2). So f1 is injective.

We next show f1 is not surjective. We take 2 ∈ Z+ and claim that for all n ∈ Z+ we have f1(n) = n² ≠ 2. For a contradiction, suppose such an n exists, i.e. n² = 2. Note that n > 1 since 1·1 = 1 < 2. However, if n ≥ 2, then n² ≥ 4 > 2, which is not possible. Hence 1 < n < 2, but there are no natural numbers between 1 and 2. Hence n does not exist and f1 is not surjective.
Example:
Let A = {n² : n ∈ Z+} and let f2 : Z → A be defined by f2(n) = n² for all n ∈ Z. We claim that f2 is surjective but not injective. Indeed, we have that 1,−1 ∈ Z with 1 ≠ −1 but f2(−1) = 1 = f2(1). Hence f2 is not injective. Next, take m ∈ A. By definition of A, there exists n ∈ Z+ such that m = n². Since Z+ ⊆ Z, we have n ∈ Z and f2(n) = n² = m as required.
Example:
Let f3 : Z+ → A be defined by f3(n) = n² for all n ∈ Z+. We leave it as an exercise to show that f3 is bijective.
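For functions on finite sets, injectivity and surjectivity can be tested directly from Definitions 6.5 and 6.6. The sketch below applies the tests to finite stand-ins for the squaring examples above (the ranges chosen are our own).

```python
def is_injective(f, X):
    # No two distinct inputs share an output.
    return len({f(x) for x in X}) == len(set(X))

def is_surjective(f, X, Y):
    # Every element of the co-domain is hit.
    return {f(x) for x in X} == set(Y)

sq = lambda n: n * n
pos = range(1, 11)                 # finite stand-in for Z+
squares = {n * n for n in pos}     # finite stand-in for A

# f1: injective but not surjective onto {1, ..., 100}.
assert is_injective(sq, pos) and not is_surjective(sq, pos, range(1, 101))
# f2: surjective onto the squares (with 0) but not injective on negatives too.
assert is_surjective(sq, range(-10, 11), squares | {0})
assert not is_injective(sq, range(-10, 11))
# f3: both, i.e. bijective, when the domain is Z+ and the co-domain is A.
assert is_injective(sq, pos) and is_surjective(sq, pos, squares)
```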
Proposition 6.8:

Consider f : X → Y. Then we have that f is injective if and only if for all y ∈ f[X], there exists a unique x ∈ X such that f(x) = y.

Proof.

We show the two implications separately.

⇒). Suppose that f is injective, and take y ∈ f[X]. Hence, ∃x1 ∈ X such that f(x1) = y. Now, suppose x2 ∈ X is such that x2 ≠ x1. Since f is injective, we have that f(x2) ≠ f(x1) = y. Hence, ∀y ∈ f[X], ∃! x ∈ X such that f(x) = y.

⇐). Suppose that ∀y ∈ f[X], ∃! x ∈ X such that f(x) = y. Take x1,x2 ∈ X such that x1 ≠ x2 and let y = f(x1). By assumption, x1 is the only element of X that f maps to y. Then y ≠ f(x2), so f(x1) ≠ f(x2). Hence, we have shown that for x1,x2 ∈ X with x1 ≠ x2, we have that f(x1) ≠ f(x2). Therefore, f is injective.

It follows that f is injective if and only if ∀y ∈ f[X], ∃! x ∈ X such that f(x) = y.

Remark:

Caution! Note that the converse of the above definition is, “for all x ∈ X, there exists a unique y ∈ Y such that f(x) = y”, which is the definition of a function.

So do not confuse the definition of injectivity with the definition of a function. In particular, one definition can be true without the other being true.

For example, consider f = {(y², y) : y ∈ Z} ⊆ Z×Z. We have that for each y ∈ Z, there exists a unique x ∈ Z such that (x,y) ∈ f (namely x = y²), so f satisfies the condition in the definition of being injective. However f is not a function since, for example, (4,2),(4,−2) ∈ f (so it doesn’t make sense to talk about whether f is injective or not).
Proposition 6.9:

Consider f : X → Y. Then f is surjective if and only if f[X] = Y.

Proof.

We show the two implications separately.

⇒). Suppose f is surjective; we will show f[X] = Y by showing f[X] ⊆ Y and Y ⊆ f[X]. Note that by definition f[X] = {y ∈ Y : ∃x ∈ X with f(x) = y} ⊆ Y.

Take y0 ∈ Y. Since f : X → Y is surjective, there exists x0 ∈ X so that f(x0) = y0. Thus y0 ∈ f[X]. As y0 was arbitrary, we have ∀y0 ∈ Y, y0 ∈ f[X]. Hence Y ⊆ f[X].

⇐). Suppose f[X] = Y. Choose y0 ∈ Y = f[X]. By definition of f[X], there exists x0 ∈ X such that y0 = f(x0). This argument holds for all y0 ∈ Y, hence f is surjective.

Corollary 6.10:

Consider f : X → Y. Then we have that f is bijective if and only if ∀y ∈ Y, ∃! x ∈ X such that f(x) = y.

Proof.

We show the two implications separately.

⇒). Suppose f is bijective. Then it is surjective, so by Proposition 6.9, Y = f[X]. We also have that f is injective, so by Proposition 6.8 we have ∀y ∈ f[X] = Y, ∃! x ∈ X such that f(x) = y, as required.

⇐). Suppose that ∀y ∈ Y, ∃! x ∈ X such that f(x) = y. In particular, ∀y ∈ Y, ∃x ∈ X such that f(x) = y, i.e. by definition f is surjective. By Proposition 6.9, Y = f[X], so our assumption becomes ∀y ∈ Y = f[X], ∃! x ∈ X such that f(x) = y. Hence by Proposition 6.8, f is injective. Since we have shown f is surjective and injective, we deduce f is bijective.

Etymology:

The words injective, surjective and bijective all have the same ending “-jective”. This comes from the Latin verb jacere which means “to throw”.

Injective has the prefix in, a Latin word meaning “in, into”. This gave rise to the medical term “injection” where we throw some medicine into the body. An injective function is one that throws the set X into the set Y (every element of X stays distinct once it is in Y).

Surjective has the prefix sur a French word meaning “over”, itself coming from the Latin super which means “(from) above”. A surjective function is one that covers the whole of Y (in the sense X is thrown from above to cover Y).

Bijective has the prefix bi, a Latin word meaning “two”. Once we talk about inverse functions, it will become clear that a bijection is a function that works in two directions.

Theorem 6.11:

Consider f : X → Y and let X = U ∪ V. Then

  1. f[X] = f[U] ∪ f[V].

  2. if f is injective and U ∩ V = ∅, then f[U] ∩ f[V] = ∅.

Proof.

a.) Since U,V ⊆ X, we have that f[U],f[V] ⊆ f[X], so f[U] ∪ f[V] ⊆ f[X]. On the other hand, take x ∈ X. Then x ∈ U or x ∈ V, so f(x) ∈ f[U] or f(x) ∈ f[V]. Therefore f(x) ∈ f[U] ∪ f[V]. Since this holds for all x ∈ X, we have f[X] ⊆ f[U] ∪ f[V]. Hence, f[X] = f[U] ∪ f[V].

b.) Suppose that f is injective and U ∩ V = ∅. To the contrary, suppose that there exists some y ∈ f[U] ∩ f[V]. Hence, there exists some u ∈ U such that y = f(u), and there exists some v ∈ V such that y = f(v). It follows that f(u) = y = f(v). Since f is injective, we have that u = v. Hence, u ∈ U ∩ V, which contradicts our initial assumption. It follows that f[U] ∩ f[V] = ∅.

Corollary 6.12:

Suppose that f : X → Y is bijective, and let A ⊆ X and B = X∖A. Then f[A] ∩ f[B] = ∅ and f[A] ∪ f[B] = Y.

Proof.

Since f is bijective it is surjective, and Y = f[X]. We also note that A ∪ B = X, so we satisfy the hypothesis of the above theorem and deduce f[A] ∪ f[B] = f[X] = Y.

Furthermore, since f is bijective, it is injective, and we have A ∩ B = ∅. Hence we satisfy the hypothesis of part b.) of the above theorem and can deduce that f[A] ∩ f[B] = ∅.

6.3 Pre-images

As well as looking at the images of various subsets of X under the function f : X → Y, we are often interested in looking at where certain subsets of Y come from.

Definition 6.13:

Consider f : X → Y and let V ⊆ Y. We define the pre-image (or inverse image) of V under f by f⁻¹[V] = {x ∈ X : f(x) ∈ V}.

We have f⁻¹[∅] = ∅.
Example:

Let X = {1,2,3} and Y = {4,5,6}. Define the function f = {(1,5),(2,5),(3,4)} ⊆ X×Y. Then:

  • f⁻¹[{4,6}] = {3},

  • f⁻¹[{4,5}] = {1,2,3} = X,

  • f⁻¹[{6}] = ∅,

  • f⁻¹[{5}] = {1,2}.
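Pre-images of finite examples can be computed by filtering the domain; a sketch reproducing the example above (the helper name is our own):

```python
def preimage(f, X, V):
    # f^{-1}[V] = { x in X : f(x) in V }
    return {x for x in X if f(x) in V}

X, Y = {1, 2, 3}, {4, 5, 6}
f = {1: 5, 2: 5, 3: 4}   # the function {(1,5), (2,5), (3,4)} as a dict

assert preimage(f.get, X, {4, 6}) == {3}
assert preimage(f.get, X, {4, 5}) == {1, 2, 3}
assert preimage(f.get, X, {6}) == set()
assert preimage(f.get, X, {5}) == {1, 2}
```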

Example:

Let f : R×R → R be defined by f((x,y)) = 2x − 5y. We will find f⁻¹[{0}] and f⁻¹[(0,1)] (recall that the open interval (0,1) = {x ∈ R : 0 < x < 1}).

f⁻¹[{0}] = {(x,y) ∈ R×R : f((x,y)) ∈ {0}} = {(x,y) ∈ R×R : 2x − 5y = 0} = {(x,y) ∈ R×R : y = 2x/5} = {(x, 2x/5) : x ∈ R}.

f⁻¹[(0,1)] = {(x,y) ∈ R×R : f((x,y)) ∈ (0,1)} = {(x,y) ∈ R×R : 2x − 5y ∈ (0,1)} = {(x,y) ∈ R×R : 0 < 2x − 5y < 1} = {(x,y) ∈ R×R : (5/2)y < x < (5/2)y + 1/2}.

We can also rearrange the last line to say:

f⁻¹[(0,1)] = {((5/2)y + ε, y) : ε,y ∈ R, 0 < ε < 1/2}.
Example:

Define g : R → R by g(x) = x². Take V = [4,∞) = {x ∈ R : 4 ≤ x}. Then

g⁻¹[V] = {x ∈ R : g(x) ∈ V} = {x ∈ R : x² ∈ [4,∞)} = {x ∈ R : (x ≤ −2) ∨ (x ≥ 2)} = (−∞,−2] ∪ [2,∞).

We have the nice result that unions and intersections behave as expected.

Theorem 6.14:

Consider f : X → Y and let U,V ⊆ Y. Then

  1. f⁻¹[U ∪ V] = f⁻¹[U] ∪ f⁻¹[V];

  2. f⁻¹[U ∩ V] = f⁻¹[U] ∩ f⁻¹[V].

Proof.

We prove a. and leave b. as an exercise.

We have f⁻¹[U ∪ V] = {x ∈ X : f(x) ∈ U ∪ V} = {x ∈ X : f(x) ∈ U ∨ f(x) ∈ V} = {x ∈ X : f(x) ∈ U} ∪ {x ∈ X : f(x) ∈ V} = f⁻¹[U] ∪ f⁻¹[V].

Therefore the two sets are equal, i.e. f⁻¹[U ∪ V] = f⁻¹[U] ∪ f⁻¹[V].
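Theorem 6.14 is easy to spot-check on a finite example; the sketch below (an illustration, not a proof) verifies both parts for the squaring map on a small slice of Z.

```python
def preimage(f, X, V):
    # f^{-1}[V] = { x in X : f(x) in V }
    return {x for x in X if f(x) in V}

X = range(-10, 11)
f = lambda x: x * x
U, V = {0, 1, 4}, {4, 9, 25}

# Pre-image turns unions into unions and intersections into intersections.
assert preimage(f, X, U | V) == preimage(f, X, U) | preimage(f, X, V)
assert preimage(f, X, U & V) == preimage(f, X, U) & preimage(f, X, V)
```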

The link between images and pre-images is explored in the following theorem.

Theorem 6.15:

Consider f : X → Y, and let U ⊆ X and V ⊆ Y. Then

  1. f[f⁻¹[V]] ⊆ V, and for f surjective, we have f[f⁻¹[V]] = V.

  2. U ⊆ f⁻¹[f[U]], and for f injective, we have U = f⁻¹[f[U]].

Proof.

We prove a. and leave b. as an exercise.

If V = ∅, then f⁻¹[V] = ∅ and f[f⁻¹[V]] = ∅ = V. So we assume V ≠ ∅. Similarly, as V ≠ ∅, we have that if f[f⁻¹[V]] = ∅ then f[f⁻¹[V]] ⊆ V. So we also assume f[f⁻¹[V]] ≠ ∅.

Suppose that y ∈ f[f⁻¹[V]]. Then y = f(w) for some w ∈ f⁻¹[V]. By the definition of f⁻¹[V], we have f(w) ∈ V. Hence, y = f(w) ∈ V. Since y is arbitrary, this shows that every element of f[f⁻¹[V]] lies in V, that is f[f⁻¹[V]] ⊆ V.

Now suppose that f is surjective. We need to show that V ⊆ f[f⁻¹[V]]. Suppose that v ∈ V. Since f is surjective, there exists an x ∈ X such that f(x) = v. Then f(x) ∈ V, and it follows that x ∈ f⁻¹[V]. Hence, v = f(x) ∈ f[f⁻¹[V]]. Since v was arbitrary, this shows that V ⊆ f[f⁻¹[V]]. Summarising, we have that f[f⁻¹[V]] = V whenever f is surjective.

6.4 Composition and inverses of functions

We now turn our attention to how to apply several functions in a row.

Definition 6.16:
Consider f : X → Y and g : Y → Z. We define the composition of g with f, denoted by g∘f, by (g∘f)(x) = g(f(x)), for all x ∈ X.
Etymology:

Composition has the same roots as components and composite. It comes from co meaning “together with” and ponere meaning “to put”. So something composite is something “put together” from different components. The composition of two functions is the result of putting two functions together.

Since f assigns to x ∈ X exactly one value f(x) ∈ Y, and g assigns to f(x) ∈ Y exactly one value in Z, we have that g∘f is a function from X to Z, that is g∘f : X → Z.

Remark:
The ordering of f and g is important, as g∘f and f∘g can be different. In fact, unless Z=X, the composition f∘g is not even defined.

We have that composition of functions is associative. That is:

Proposition 6.17:

Consider f:X→Y, g:Y→Z and h:Z→W. Then h∘(g∘f)=(h∘g)∘f.

Proof.

By Theorem 6.4, to show h∘(g∘f)=(h∘g)∘f, we need to show that for all x∈X, we have (h∘(g∘f))(x)=((h∘g)∘f)(x). Let x∈X. Then (h∘(g∘f))(x)=h((g∘f)(x))=h(g(f(x))) and ((h∘g)∘f)(x)=(h∘g)(f(x))=h(g(f(x))).

This argument holds for all x∈X, hence h∘(g∘f)=(h∘g)∘f.

The above proposition tells us that the notation h∘g∘f is unambiguous.

Example:

Let a,b,c,d∈R be such that a<b and c<d. Recall that [a,b] denotes the closed interval from a to b, that is [a,b]={x∈R : a≤x≤b}.

We define the following three functions:

  • f1:[a,b]→[0,b−a] defined by f1(x)=x−a;

  • f2:[0,b−a]→[0,d−c] defined by f2(x)=x(d−c)/(b−a);

  • f3:[0,d−c]→[c,d] defined by f3(x)=x+c.

We briefly check that f2 is indeed a function from [0,b−a] to [0,d−c] by checking f2([0,b−a])⊆[0,d−c] (in theory one should also do the same with f1 and f3, but these two cases are clearer). Pick x∈[0,b−a], so 0≤x≤b−a by definition. Then since b−a>0 we have 0≤x/(b−a)≤1, and since d−c>0 we have 0≤x(d−c)/(b−a)≤d−c. So f2(x)∈[0,d−c]. As this is true for all x∈[0,b−a], we indeed have that f2 is a function with co-domain [0,d−c].

We can construct f:[a,b]→[c,d] by composing f3 with f2 with f1, i.e., f=f3∘f2∘f1 is defined by f(x)=c+(x−a)(d−c)/(b−a).
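As a numerical sanity check (not a proof), one can compare the composition f3∘f2∘f1 with the closed formula at a few sample points; the values of a, b, c, d below are arbitrary choices satisfying a<b and c<d.

```python
# Sanity check that f3∘f2∘f1 agrees with f(x) = c + (x-a)(d-c)/(b-a).
a, b, c, d = 1.0, 3.0, 10.0, 14.0

f1 = lambda x: x - a                    # f1 : [a,b] -> [0, b-a]
f2 = lambda x: x * (d - c) / (b - a)    # f2 : [0, b-a] -> [0, d-c]
f3 = lambda x: x + c                    # f3 : [0, d-c] -> [c, d]
f  = lambda x: c + (x - a) * (d - c) / (b - a)

for x in [1.0, 1.5, 2.0, 2.7, 3.0]:    # sample points of [a, b]
    assert abs(f3(f2(f1(x))) - f(x)) < 1e-12
    assert c <= f(x) <= d               # f(x) lands in [c, d]
```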

Properties of functions carry through composition.

Theorem 6.18:

Consider f:X→Y and g:Y→Z. Then:

  1. if f and g are injective, we have that g∘f is injective.

  2. if f and g are surjective, we have that g∘f is surjective.

Proof.

We prove part 1. and leave part 2. as an exercise.

Suppose that f,g are injective, and suppose that x1,x2∈X are such that x1≠x2. Since f is injective, we have that f(x1)≠f(x2). Now, set y1=f(x1) and y2=f(x2). Then y1,y2∈Y with y1≠y2. Further, since g is injective, we have g(y1)≠g(y2). It follows that (g∘f)(x1)=g(f(x1))=g(y1)≠g(y2)=g(f(x2))=(g∘f)(x2). To conclude, for any x1,x2∈X with x1≠x2, we have (g∘f)(x1)≠(g∘f)(x2). Hence g∘f is injective.

Corollary 6.19:

Let f:X→Y and g:Y→Z be bijective. Then g∘f:X→Z is also bijective.

Definition 6.20:

We say that a function f:X→Y is invertible if there exists a function g:Y→X such that:

  • g∘f is the identity function on X (that is, for all x∈X, (g∘f)(x)=x), and

  • f∘g is the identity function on Y (that is, for all y∈Y, (f∘g)(y)=y).

We say g is an inverse of f.

If g is an inverse for f, then we also have that f is an inverse for g.
Etymology:

We’ve already seen that inverse comes from in and vertere which is the verb “to turn”. The inverse of a function is one that turns the original function inside out to give back what was originally put in.

Example:
Define f:R→R by f(x)=2x+3, and define g:R→R by g(x)=(x−3)/2. Then for all x∈R, we have (g∘f)(x)=g(f(x))=(f(x)−3)/2=((2x+3)−3)/2=x and (f∘g)(x)=f(g(x))=2g(x)+3=2((x−3)/2)+3=x. Hence g∘f is the identity function on R, and f∘g is the identity function on R. It follows that f is invertible, with g an inverse of f.
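The algebra in this example can be spot-checked numerically; the sample points below are arbitrary and a few checks are of course no substitute for the calculation above.

```python
# Checking the worked example: f(x) = 2x+3 and g(x) = (x-3)/2 invert each other.
f = lambda x: 2 * x + 3
g = lambda x: (x - 3) / 2

for x in [-2.0, 0.0, 0.5, 7.0]:
    assert g(f(x)) == x     # g∘f acts as the identity at x
    assert f(g(x)) == x     # f∘g acts as the identity at x
```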
Theorem 6.21:

Suppose f:X→Y, and suppose that g:Y→X and h:Y→X are inverses of f. Then g=h; that is, if f has an inverse then its inverse is unique.

Proof.

Exercise.

Remark:

Since the inverse is unique, the inverse of f:X→Y is often denoted by f⁻¹:Y→X.

Note the difference between f⁻¹[·], which is about pre-images (also known as inverse images), and f⁻¹(·), which is about the inverse function. In particular, when using f⁻¹[·] (i.e., calculating pre-images), we are not making any claim that f is invertible or indeed that f⁻¹:Y→X is a function that exists. However, when using f⁻¹(·), we are at the same time claiming that f is invertible and hence that f⁻¹ is a function.

It is important to note that we have two conditions to prove f:X→Y is invertible. As an exercise, one can find examples of f:X→Y and g:Y→X such that f∘g is the identity, but g∘f is not. However, if we have some constraints on f or g, then only one condition needs to be met.

Proposition 6.22:

Suppose that f:X→Y and g:Y→X are such that g∘f is the identity function on X.

  1. If g is injective, then f∘g is the identity map on Y (and hence g=f⁻¹).

  2. If f is surjective, then f∘g is the identity map on Y (and hence g=f⁻¹).

Proof.

Exercise.

This next theorem gives another way to test if a function is invertible or not.

Theorem 6.23:

Suppose that f:X→Y. Then f is invertible if and only if f is bijective.

Proof.

We prove both implications separately.

(⇒). First, we will show that if f is invertible, then f is bijective. So suppose f is invertible, and let g:Y→X denote the inverse of f. Then g∘f is the identity function on X, and f∘g is the identity function on Y. We first show f is surjective. Take y1∈Y; then y1=(f∘g)(y1)=f(g(y1)). So set x1=g(y1). Then x1∈X and f(x1)=f(g(y1))=y1. So for all y∈Y, there exists x∈X such that f(x)=y.

We prove that f is also injective using the contrapositive. Suppose we have x1,x2∈X with f(x1)=f(x2)=y. Then since g∘f is the identity on X, we have x1=g(f(x1))=g(y)=g(f(x2))=x2. Hence f is injective and surjective, so it is bijective.

(⇐). Second, we will show that if f is bijective, then f is invertible. So suppose that f is bijective. We set g={(y,x)∈Y×X : (x,y)∈f}⊆Y×X. Since f is bijective, by Corollary 6.10 we have that for all y∈Y, there is a unique x∈X such that f(x)=y. Hence, g is a function, that is g:Y→X.

We will show that g is the inverse of f. For this, we first take x∈X and set y∈Y to be y=f(x). Then by the definition of g, we have that g(y)=x, so (g∘f)(x)=g(y)=x. Since x was arbitrary, this shows that g∘f is the identity function on X. Now, choose any y∈Y. Since f is bijective, there exists a unique x∈X such that f(x)=y. Further, since g is a function, we must have that g(y)=x, and hence (f∘g)(y)=f(x)=y. Since y was arbitrary, this shows that f∘g is the identity function on Y. Hence if f is bijective, we have that f is invertible.

It follows that f is invertible if and only if f is bijective.

Suppose that f:X→Y is bijective. In the above proof, we found a way of defining f⁻¹:Y→X. That is, for all y∈Y, we can write f⁻¹(y)=x, where x∈X is such that f(x)=y.
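For a finite bijection stored as a Python dict, the proof's recipe for f⁻¹ (send y to the unique x with f(x)=y) amounts to swapping each pair; the dict below is an illustrative example of mine.

```python
# Inverting a finite bijection by swapping each pair (x, f(x)).
f = {1: 'b', 2: 'c', 3: 'a'}                  # a bijection {1,2,3} -> {'a','b','c'}
f_inv = {y: x for x, y in f.items()}          # f^{-1}(y) = the unique x with f(x)=y

assert f_inv == {'b': 1, 'c': 2, 'a': 3}
assert all(f_inv[f[x]] == x for x in f)       # f^{-1}∘f is the identity on X
assert all(f[f_inv[y]] == y for y in f_inv)   # f∘f^{-1} is the identity on Y
```

Note this only produces a function because f is bijective: if f repeated a value, the comprehension would silently overwrite entries.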

Example:

Let a,b,c,d∈R be such that a<b and c<d. In the previous example, we defined f:[a,b]→[c,d] as the composition of three functions, given by f(x)=c+(x−a)(d−c)/(b−a). We want to prove f is bijective. One method would be to show that all three functions making up the composition are bijective (and this is relatively easy to do). But we will instead show f is invertible by constructing the inverse.

By symmetry (replacing a with c, b with d, etc.), let us define g:[c,d]→[a,b] via g(x)=a+(x−c)(b−a)/(d−c). We have (g∘f)(x)=g(f(x))=a+(f(x)−c)(b−a)/(d−c)=a+[c+(x−a)(d−c)/(b−a)−c](b−a)/(d−c)=a+(x−a)=x, for all x∈[a,b]. Similarly, (f∘g)(x)=f(g(x))=c+(g(x)−a)(d−c)/(b−a)=c+[a+(x−c)(b−a)/(d−c)−a](d−c)/(b−a)=x, for all x∈[c,d]. It follows that g:[c,d]→[a,b] is the inverse of f. Therefore, f is bijective.
Theorem 6.24:

Suppose f:X→Y and g:Y→Z are bijective. Then (g∘f)⁻¹=f⁻¹∘g⁻¹.

Proof.

Exercise.

7 Cardinality

Now that we know about functions, we return to sets, in particular “measuring their size”. One reason sets were introduced was to “tame” and understand infinity. Indeed, as we will see at the end of course, the study of sets leads us to realise that there are different kinds of infinities.

Bijective functions allowed us to construct a “dictionary” between two sets: every element of one set is in one-to-one correspondence with an element of the other. This suggests that there is a fundamental property shared by these two sets. This is the motivation of this section, but first some notation.

Notation:

Let A be a set; we denote by |A| the cardinality of A.

The above notation doesn’t actually define what we mean by the cardinality of a set.

Definition 7.1:

We build our notion of the cardinality of different sets as follows.

  • We define |∅|=0.

  • For n∈Z+ we define |{1,2,…,n}|=n.

  • If A and B are two non-empty sets, we say |A|=|B| if and only if there exists a bijective f:A→B.

Again, for an individual infinite set A this does not define what we mean by |A|, but it does define what we mean by |A|=|B| for two infinite sets A and B.

Remarks 7.2:

We need to ensure that our last definition makes sense by checking the following:

  • For any non-empty set A, we expect |A|=|A|. We verify this is true by noting that f:A→A defined by f(a)=a is bijective.

  • For any non-empty sets A, B, we expect that if |A|=|B| then |B|=|A|. Let us verify this. Suppose |A|=|B|; then there exists a bijective f:A→B. Since f is bijective, it is invertible, so there exists f⁻¹:B→A. Since f⁻¹ is invertible, it is bijective, hence |B|=|A| by definition.

  • For any non-empty sets A, B, C, we expect that if |A|=|B| and |B|=|C| then |A|=|C|. Let us verify this. Suppose |A|=|B| and |B|=|C|; then there exist bijective f:A→B and g:B→C. Consider g∘f:A→C, which is bijective by Corollary 6.19, hence |A|=|C| as required.

Looking forward, we will revisit the three properties above when we look at equivalence relations.

By combining the last two definitions together, we see that if f:{1,2,…,n}→A is bijective, then |A|=n and we say A has n elements. Note that in this case we can enumerate the elements of A as a1,…,an (where ai=f(i)).

Definition 7.3:

We have two cases

  • When |A|∈Z≥0, then we say A is a finite set.

  • If A is not a finite set, we say A is an infinite set.

We take as a fact that Z+ is infinite (which makes sense since by the Archimedean Property, Z+ is not bounded above).

Etymology:

Finite comes from the Latin finis meaning “end”. A finite set is one that ends. While infinite has the prefix in- meaning “not”, so an infinite set is one that has no end.

Cardinality comes from the Latin cardin meaning “hinges” (which are used to pivot doors). This was used to talk about the “Cardinal virtues”, i.e., the pivotal virtues, the most important ones. From there, the word developed to mean “principal”. This was used by mathematicians to name the set of whole numbers {0,1,2,3,…} the Cardinals, as they are the principal numbers (compared to fractions, reals, etc.). From there, the cardinality of a set referred to the whole number/the cardinal associated to the set.

Lemma 7.4:

Suppose f:X→Y is injective and A⊆X. Then |A|=|f[A]|.

Proof.

Let B=f[A]. Restrict f to A by defining g:A→B via g(a)=f(a), for all a∈A. By the definition of B, we have that B=f[A]=g[A], so g is surjective. Now, suppose that a1,a2∈A are such that g(a1)=g(a2). Then f(a1)=f(a2), and since f is injective, this means that a1=a2. Hence, g is injective. It follows that g is bijective, so |A|=|g[A]|. We also know that g[A]=B=f[A], so |A|=|g[A]|=|f[A]|.

Theorem 7.5:

Let A, B be two finite sets with |A|=n and |B|=m. If there exists an injective f:A→B, then n≤m.

Proof.

Let C=f[A]; then by the previous lemma we have |C|=|A|=n, and we know C⊆B. Hence B has at least n elements, so m=|B|≥n.

Remark:

The contrapositive of the theorem is “If |A|>|B| then for any f:A→B there exist distinct a1,a2∈A such that f(a1)=f(a2)”. Theorem 7.5 is often known as the Pigeonhole Principle, as it can be understood as “if you have more pigeons than pigeonholes, then two pigeons will need to share a pigeonhole”.
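For small sets, the Pigeonhole Principle can be verified exhaustively: every one of the 3⁴ functions from a 4-element set to a 3-element set must repeat a value. The sets below are illustrative choices.

```python
# Pigeonhole, checked exhaustively for |A| = 4 pigeons and |B| = 3 holes.
from itertools import product

pigeons, holes = [1, 2, 3, 4], ['a', 'b', 'c']
for values in product(holes, repeat=len(pigeons)):   # all 3**4 functions A -> B
    f = dict(zip(pigeons, values))
    # An injective f would need 4 distinct values, but only 3 are available.
    assert len(set(f.values())) < len(pigeons)
```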

We extend the above principle to all (finite or infinite) sets.

Definition 7.6:
For any two sets A and B, if there exists an injective f:A→B, then we say |A|≤|B|. We say |A|<|B| if |A|≤|B| and |A|≠|B|.

Note that in particular, for any two sets A and B such that B⊆A, since we can define an injective function f:B→A via f(b)=b for all b∈B, we have |B|≤|A|.

Lemma 7.7:

Let A and B be two sets such that B⊆A. Then

  1. If A is finite then B is finite.

  2. If B is infinite then A is infinite.

Proof.

For part a., assume A is finite with |A|=n. Since B⊆A, we have |B|≤|A|=n, i.e. |B|=m for some integer 0≤m≤n. By definition, B is finite.

Note that part b. is the contrapositive of part a.

Theorem 7.8:

If X,Y are sets with |X|≤|Y| and |Y|≤|X|, then |X|=|Y|.

In particular, if there exist injective f:X→Y and injective g:Y→X, then there exists a bijective h:X→Y. (Note: we are not claiming f or g is bijective.)

Proof.

Non-examinable. It is beyond the scope of this course.

Remark:
While we have the tools to prove the theorem in the case when X or Y is finite, the proof for when both X and Y are infinite is beyond the scope of this course (but see Set Theory in third year). There is an interesting proof of the theorem in the book Algebra by Pierre Grillet, which is available as an electronic book from the University of Bristol library (and in which the theorem is called the Cantor-Bernstein Theorem).
History:

Theorem 7.8 has many names: Cantor-Schröder-Bernstein Theorem; Schröder–Bernstein theorem or (as above) the Cantor-Bernstein theorem. Cantor himself suggested that it should be called the equivalence theorem.

The reason for so many names is that Georg Cantor (German mathematician, born in Russia, 1845-1918) first stated the theorem in 1887, but did not prove it. In 1897, Felix Bernstein (German mathematician, 1878-1956) and Ernst Schröder (German mathematician, 1841-1902) independently published different proofs. However, five years later, in 1902, Alwin Korselt (German mathematician, 1864-1947) found a flaw in Schröder’s proof (which was confirmed by Schröder just before he died).

Proposition 7.9:

Let X be a set and let A,B⊆X be disjoint (A∩B=∅) finite sets. Then |A∪B|=|A|+|B|.

Proof.

Let m,n∈Z+ be such that |A|=m and |B|=n. Then A={a1,a2,…,am} where ai=aj only if i=j, for integers 1≤i,j≤m. Similarly, we have B={b1,b2,…,bn} where bi=bj only if i=j, for integers 1≤i,j≤n. Since A∩B=∅, we know that for any integers i,j with 1≤i≤m and 1≤j≤n, we have ai≠bj. Now, define f:{1,2,…,m+n}→A∪B by f(k)=a_k if 1≤k≤m, and f(k)=b_{k−m} if m<k≤m+n. We leave it as an exercise to show that f is bijective.

Corollary 7.10:

Let X be a set and let A,B⊆X be finite sets. Then |A∪B|=|A|+|B|−|A∩B|.

Proof.

Exercise.
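Both counting formulas can be checked on concrete finite sets; the sets below are arbitrary choices of mine.

```python
# Inclusion-exclusion (Corollary 7.10) on concrete finite sets.
A = {1, 2, 3, 4}
B = {3, 4, 5}
assert len(A | B) == len(A) + len(B) - len(A & B)   # 5 == 4 + 3 - 2

# The disjoint case (Proposition 7.9) is the special case A & C == set():
C = {9, 10}
assert A & C == set()
assert len(A | C) == len(A) + len(C)
```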

We will return to the idea of cardinality in Chapter 11, where we see there are different infinite cardinalities.

8 Sets with structure - Groups

Having looked at sets and functions between sets, we now turn our attention to “groups”. Formally these are sets with extra structure given by a binary operation (e.g. R with +), but they arose historically (and crop up across maths today) as a way to understand symmetry.

8.1 Motivational examples - Symmetries

Before even giving the definition of a group, we’ll look at several examples of symmetries of objects, so that the formal definition will make more sense when we come to it.

Exactly what we mean by a “symmetry” will vary from example to example, and sometimes there’s more than one sensible notion for the same object, so rather than giving a general definition, we’ll clarify what we mean in each example, and the common features should become clear.

8.1.1 Permutations of a set

Let us consider a set of three elements. Imagine them arranged in a line:

three points on a line

Figure 8.1: three points on a line

We can rearrange them, putting them in a different order. For example, we could swap the two elements furthest to the right:

the three points above but with the second and third swapped

Figure 8.2: the three points above but with the second and third swapped

This leaves us with exactly the same picture, so we’ll regard this as a “symmetry” in this context.

Now let’s label the elements and put both pictures next to each other to see how the elements have been permuted.

showing how we permuted the elements

Figure 8.3: showing how we permuted the elements

This picture (Figure 8.3) brings to mind the function f:{1,2,3}→{1,2,3} defined by {(1,1),(2,3),(3,2)} (i.e. f(1)=1, f(2)=3 and f(3)=2).

Or if we cyclically permuted the elements as follows:
a cyclic permutation

Figure 8.4: a cyclic permutation

this (Figure 8.4) would correspond to the function f:{1,2,3}→{1,2,3} defined by {(1,2),(2,3),(3,1)} (i.e. f(1)=2, f(2)=3 and f(3)=1).

Remark:
We could alternatively think of the function where f(x) tells us which element has moved to position x, in which case, in the last example we’d have f(1)=3, f(2)=1 and f(3)=2. There’s nothing wrong with doing this, but note that it gives a different function, and to avoid confusion we’ll stick to the first convention.
If this is to count as a “symmetry” of the original position, then we don’t want to move two elements to the same place, or to leave a place unoccupied. For example:
not a permutation

Figure 8.5: not a permutation

gives a different pattern. Therefore we need the function f to be bijective.

Definition 8.1:

Let X be a set. A permutation of the set X is a bijection f:X→X.

Let’s look in more detail at the permutations of a set X={1,2,3} with three elements. It’s easy to check that there are just six of these, which we’ll denote by e,f,g,h,i,j: see Figure 8.6, where we just show the position the three elements end up.

the six permutations of three elements

Figure 8.6: the six permutations of three elements

Notice that the function e is the identity function, and likewise it is called the identity permutation.

Now we can look at what happens when we perform one permutation followed by another. For example, if we do f (swap the two elements furthest to the right) and then g (swap the two elements that are now furthest to the left), then we are just taking the composition g∘f of the two functions. This gives g(f(1))=g(1)=2, g(f(2))=g(3)=3 and g(f(3))=g(2)=1, so g∘f=i.

For simplicity, we’ll leave out the composition symbol and write gf instead of g∘f. Let’s see what fg is: f(g(1))=f(2)=3, f(g(2))=f(1)=1 and f(g(3))=f(3)=2, so fg=h.

Remark:
When composing permutations or other symmetries, the order does matter in general. gf will always mean “do f first and then g” as in the notation for functions (g(f(x)) is what we get when we apply the function f and then the function g), rather than reading from left to right.

When we come to define groups, we’ll think of “composition” as an operation to combine permutations or other kinds of symmetry (if f and g are symmetries of an object, then gf is the symmetry “do f and then g”), much as “multiplication” is an operation to combine numbers. In fact we’re using pretty much the same notation: compare gf for permutations with xy for numbers. Let’s explore this analogy further with permutations of a set X.

  • Note that since the composition of two bijections is a bijection, we have that the composition of two permutations is a permutation. This is similar to axioms (A1) and (A7).

  • We have seen that composition of functions is associative (see Proposition 6.17), hence if a,b,c are permutations, we have a(bc)=(ab)c. This is similar to axioms (A2) and (A8).

  • We always have an identity permutation, e:X→X defined by e(x)=x for all x∈X. Let f be any other permutation of X; then for all x∈X we have ef(x)=e(f(x))=f(x)=f(e(x))=fe(x), i.e. ef=f=fe. This is similar to axioms (A4) and (A10).

  • We know that a bijective function is invertible, so for every permutation f:X→X, there exists f⁻¹ such that ff⁻¹=e=f⁻¹f. This is similar to axioms (A5) and (A11).

[You can check in the permutations of {1,2,3} that e⁻¹=e, f⁻¹=f, g⁻¹=g, h⁻¹=i, i⁻¹=h and j⁻¹=j.]

Note that we can’t have anything similar to axioms (A3) or (A9), as we have seen an example of two permutations f,g of {1,2,3} such that fg≠gf.
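The computations with f and g above can be replayed in Python, representing each permutation of {1,2,3} as a dict; the expected values below are exactly the ones worked out in the text (gf=i and fg=h).

```python
# The permutations f and g of {1,2,3} from the text, as dicts.
f = {1: 1, 2: 3, 3: 2}   # swap the two rightmost elements
g = {1: 2, 2: 1, 3: 3}   # swap the two leftmost elements

def compose(p, q):
    """(p∘q)(x) = p(q(x)): do q first, then p."""
    return {x: p[q[x]] for x in q}

assert compose(g, f) == {1: 2, 2: 3, 3: 1}   # gf = i, as computed in the text
assert compose(f, g) == {1: 3, 2: 1, 3: 2}   # fg = h
assert compose(g, f) != compose(f, g)        # composition does not commute here
```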

8.1.2 Symmetries of polygons

Consider a regular n-sided polygon (for example, if n=3 an equilateral triangle, or if n=4 a square). Examples of symmetries are:

  • A rotation through an angle of 2π/n (an nth of a revolution).

  • A reflection in a line that goes through the centre of the polygon and one of its vertices.

There are many ways to make precise what we mean by a symmetry. For example, considering the polygon as a subset X of R², we could look at bijections f:X→X that preserve distance: i.e., such that the distance between f(x) and f(y) is the same as the distance between x and y. It’s not hard to see that this implies that f must send vertices to vertices, and moreover must send adjacent vertices to adjacent vertices. To keep things simple, we’ll use this as our definition of a symmetry:

Definition 8.2:
A symmetry of a regular n-sided polygon is a permutation f of the set of n vertices that preserves adjacency: i.e., so that for vertices u and v, u and v are adjacent if and only if f(u) and f(v) are adjacent.

Note that for n=3, every pair of vertices is adjacent, so in that case every permutation of the vertices is a symmetry, and so we are just looking at permutations of a set of three elements as in the last section.

In the case n=4, we have a square. Let’s label the vertices as follows:
the four vertices labelled

Figure 8.7: the four vertices labelled

Then if we rotate it anticlockwise through an angle of π/2 (let’s call this symmetry f) we get (Figure 8.8):
rotating the square

Figure 8.8: rotating the square

While if we reflect in the line of symmetry going from top right to bottom left (let’s call this symmetry g) we get (Figure 8.9):

reflecting the square

Figure 8.9: reflecting the square

However the following does not represent a symmetry as vertices 1 and 2 have not remained adjacent (Figure 8.10):

not a symmetry

Figure 8.10: not a symmetry

As with permutations, we can compose symmetries together and get a new symmetry. In fact, the analogues of (A1)/(A7), (A2)/(A8), (A4)/(A10) and (A5)/(A11) all hold (Prove that the composition of two bijections that preserves adjacency is a bijection that preserves adjacency, and that the inverse of a function that preserve adjacency is a function that preserves adjacency.)

Let’s look at what happens when we compose the symmetries f and g (Figure 8.11).
composing the two previous symmetries

Figure 8.11: composing the two previous symmetries

We see that gf is a reflection in the horizontal line of symmetry while fg is a reflection in the vertical line of symmetry. So here again we have gf≠fg.

8.1.3 Symmetries of a circle

We can look at bijections from a circle to itself that preserve distances between points. It’s not too hard to see that these symmetries are either rotations (through an arbitrary angle) or reflections (in an arbitrary line through the centre).

Again, if f and g are symmetries, then usually fg≠gf.

8.1.4 Symmetries of a cube

We can look at the eight vertices of a cube, and define a symmetry to be a permutation of the set of vertices that preserves adjacency (as we did with a polygon). There are 48 symmetries: most are either rotations or reflections, but there is also the symmetry that takes each vertex to the opposite vertex.

8.1.5 Rubik’s Cube

A Rubik’s cube has 54 coloured stickers (9 on each face), and we could define a “symmetry” to be a permutation of the stickers that we can achieve by repeatedly turning faces (as one does with a Rubik’s cube). It turns out that there are 43,252,003,274,489,856,000 symmetries. Again we can compose symmetries (do one sequence of moves and then another), and every symmetry has an inverse symmetry.

8.2 Formal definition

We want to formalize the structure of symmetries of the kind we looked at in the last section. First of all, notice that in all the cases we looked at, we combined two permutations/symmetries to get a third. We can formalize this as follows.

Definition 8.3:
A binary operation ∗ on a set G is a function ∗:G×G→G.
Etymology:

The word “binary” refers to the fact that the function takes two inputs x and y from G. (Technically speaking, it takes one input from G×G.)

Notation:

We’ll usually write x∗y instead of ∗(x,y).

Example:
  • Let X be a non-empty set. Composition is a binary operation on the set G of permutations of a set X.

  • Addition + is a binary operation on the set R (or on Z, or on Q).

  • Multiplication and subtraction are also binary operations on R, Q or Z.

  • Intersection ∩ and union ∪ are binary operations on the set of subsets of R.

Note that in the definition of a binary operation, the function maps to G, so if we have a definition of x∗y such that x∗y is not always in G, then ∗ is not a binary operation on G (we say that G is not closed under ∗). Also, the domain of ∗ is G×G, so x∗y needs to be defined for all pairs of elements x,y∈G.

Example:
  • Subtraction is not a binary operation on Z+ since, for example, 4−7=−3∉Z+. So Z+ is not closed under subtraction (it is closed under addition).

  • Division is not a binary operation on Z since, for example, 1/2∉Z.

  • Division is not a binary operation on Q since x/y is not defined when y=0. (But division is a binary operation on the set Q∖{0} of non-zero rational numbers.)

For a general binary operation ∗, the order of the elements x,y matters: x∗y is not necessarily equal to y∗x.

Definition 8.4:
A binary operation ∗ on a set G is called commutative if x∗y=y∗x for all x,y∈G.
Example:

Some examples and non-examples of commutative binary operations:

  • Addition and multiplication are commutative binary operations on R (axioms (A3) and (A9) ).

  • Subtraction is not commutative on R since, for example, 2−1=1 but 1−2=−1.

  • Composition is not a commutative operation on the set of permutations of the set {1,2,3}.

Bearing in mind the analogies we drew between some of the axioms of R and the compositions of permutations/symmetries, we’ll now define a group.

Definition 8.5:

A group (G,∗) is a set G together with a binary operation ∗:G×G→G satisfying the following properties (or “axioms”):

  • (Associativity) For all x,y,z∈G, (x∗y)∗z=x∗(y∗z).

  • (Existence of an identity element) There is an element e∈G (called the identity element of the group) such that, for every x∈G, x∗e=x=e∗x.

  • (Existence of inverses) For every x∈G, there is an element x⁻¹∈G (called the inverse of x) such that x∗x⁻¹=e=x⁻¹∗x.

Strictly speaking, the group consists of both the set G and the operation ∗, but we’ll often talk about “the group G” if it’s clear what operation we mean, or say “G is a group under the operation ∗”. But the same set G can have several different group operations, so we need to be careful.

Example:

If X is a set, S(X) is the set of all permutations of X (i.e., bijective functions X→X), and f∘g is the composition of bijections f and g, then (S(X),∘) is a group.

Note that in this example, there are two sets involved (X and the set S(X) of permutations). It is the set S(X) that is the group, not X (we haven’t defined a binary operation on X).

The set of all symmetries of a regular n-sided polygon is a group under composition, as is the set of all symmetries of a cube, or a Rubik’s cube.
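For X={1,2,3}, the group axioms for (S(X),∘) can be checked by brute force over all six permutations; this is a finite computation, of course, not a proof for a general set X.

```python
# Brute-force check that the six permutations of {1,2,3} form a group
# under composition: closure, associativity, identity, inverses.
from itertools import permutations

X = (1, 2, 3)
S = [dict(zip(X, p)) for p in permutations(X)]        # all 6 permutations

def comp(p, q):                                       # (p∘q)(x) = p(q(x))
    return {x: p[q[x]] for x in X}

e = {x: x for x in X}                                 # identity permutation
assert all(comp(p, q) in S for p in S for q in S)     # closure
assert all(comp(p, comp(q, r)) == comp(comp(p, q), r)
           for p in S for q in S for r in S)          # associativity
assert all(comp(p, e) == p == comp(e, p) for p in S)  # identity element
assert all(any(comp(p, q) == e == comp(q, p) for q in S)
           for p in S)                                # every p has an inverse
```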

These examples, and similar ones, of symmetry groups, are the motivation for the definition of a group, but there are some other familiar examples of sets of numbers with arithmetical operations that fit the definition.

Example:

Here we explore some sets and associated operations and check if they are a group or not.

  • We have that (R,+) is a group [by axioms (A2), (A4), (A5)]. Similarly, (Z,+) and (Q,+) are groups.

  • The set Z+ is not a group under addition, since it doesn’t have an identity element [0∉Z+]. Note Z≥0 is still not a group (under addition), as although it has the identity element 0, no integer n>0 has an inverse (since for any n∈Z+, −n∉Z≥0).

  • We have that (R∖{0},×) is a group [by axioms (A8), (A10) and (A11)]. Similarly, (Q∖{0},×) is a group.

  • We have that (R,×) is not a group, since 0 does not have an inverse.

  • We have that R is not a group under subtraction, since associativity fails: (x−y)−z=x−y−z, but x−(y−z)=x−y+z, and so these are different whenever z≠0.

Matrices are another source of examples.

Example:

Let Mn(R) be the set of n×n matrices with real entries. Then Mn(R) is a group under (matrix) addition.

Mn(R) is not a group under matrix multiplication, since not every matrix has an inverse. However, the set of invertible n×n matrices is a group under multiplication.

We’ve stressed that xy and yx are typically different in symmetry groups. But in the examples coming from addition and multiplication of integers, they are the same.

Definition 8.6:
A group (G,∗) is called abelian if x∗y=y∗x for all x,y∈G.
History:

The word “abelian” was introduced in the 1880s by German mathematician Heinrich Weber (1842 - 1913). He derived the name from the Norwegian mathematician Niels Henrik Abel (1802 - 1829) who studied the permutation of solutions of polynomials. Very loosely speaking he showed in 1824 that for a given polynomial if the permutation of the roots form an Abelian group then the polynomial can be solved in radicals (i.e. involving +,,,÷ and n-th roots only). You can learn more about this in the fourth year course Galois Theory (named after French mathematician Évariste Galois, 1811 - 1832).

Even in a non-abelian group, xy=yx may hold for some elements x and y, in which case we say that x and y commute. For example, the identity commutes with every other element. However for the group to be abelian it must hold for all elements.

Example:
  • The group S(X) of permutations of a set X is non-abelian if X has at least three elements, since if x,y,z∈X are three distinct elements, f is the permutation that swaps x and y (fixing all other elements), and g is the permutation that swaps y and z, then fg≠gf.

  • More generally, symmetry groups are typically non-abelian (although sometimes they are abelian). In particular, the symmetry group of a regular n-sided polygon (where n≥3) is non-abelian.

  • (R,+) is abelian by axiom (A3).

  • (R∖{0},×) is abelian by axiom (A9).

  • Mn(R) is an abelian group under matrix addition.

  • If n≥2, then the set of invertible n×n real matrices is a non-abelian group under matrix multiplication since, for example (writing each matrix row by row), (1 1; 0 1)(0 1; 1 0) = (1 1; 1 0) but (0 1; 1 0)(1 1; 0 1) = (0 1; 1 1).
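The 2×2 computation in the last bullet can be verified with a few lines of Python, using plain lists of rows (no external libraries):

```python
# Verifying that the two invertible 2x2 matrices above do not commute.
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 1], [0, 1]]
B = [[0, 1], [1, 0]]
assert matmul(A, B) == [[1, 1], [1, 0]]
assert matmul(B, A) == [[0, 1], [1, 1]]
assert matmul(A, B) != matmul(B, A)   # so the group is non-abelian
```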

Notation:

Often, especially when we’re dealing with abstract properties of general groups, we’ll simplify the notation by writing xy instead of xy, as though we’re multiplying. In this case we’ll say, for example, “Let G be a multiplicatively-written group”.

Note that this is purely a matter of the notation we use for the group operation: any group can be “multiplicatively-written” if we choose to use that notation, so if we prove anything about a general multiplicatively-written group, it will apply to all groups, no matter what the group operation is.

Of course, if we’re looking at something like the group of real numbers under addition, it would be incredibly confusing to write this multiplicatively, so in cases like that, where multiplication already has some other meaning, or where there’s already another standard notation for the group operation, we’ll tend not to use multiplicative notation.

Notice that in a multiplicatively-written group, the associativity axiom says (xy)z=x(yz), and from this it follows easily (by induction, for example) that any product such as wxyz has a unique meaning: we could bracket it as (wx)(yz) or (w(xy))z or w((xy)z) or any of the other possible ways, and because of associativity they would all give the same element.

Remark:
Because examples such as (R,+) are abelian, which is not typical for a group, it can be misleading to try to get your intuition for how groups work from examples like this. A much better simple example to use is the group of permutations of {1,2,3} or, if you like thinking more geometrically, the group of symmetries of an equilateral triangle (actually, since all vertices of a triangle are adjacent, this is really just the same as the group of permutations of the set of vertices), or of a square.

8.3 Elementary consequences of the definition

Throughout this section, G will be a multiplicatively written group. Remember that being “multiplicatively written” is purely a matter of notation, so everything will apply to any group.

A familiar idea from elementary algebra is that if you have some equation (say involving a,b,x∈R) such as ax=bx then, as long as x≠0, you can “cancel” the x to deduce that a=b. Formally, we multiply both sides by x⁻¹ (which exists by axiom (A11)).

A similar principle applies in a group, because of the fact that elements all have inverses, with one complication caused by the fact that the group operation may not be commutative.

Since multiplying on the left and multiplying on the right are in general different, we need to decide which is appropriate.

Proposition 8.7: (Right cancellation)

Let a,b,x be elements of a multiplicatively written group G. If ax=bx, then a=b.

Proof.

Multiply on the right by x^(-1): ax = bx ⟹ (ax)x^(-1) = (bx)x^(-1) ⟹ a(xx^(-1)) = b(xx^(-1)) ⟹ ae = be ⟹ a = b.

Proposition 8.8: (Left cancellation)

Let a,b,x be elements of a multiplicatively written group G. If xa=xb, then a=b.

Proof.

Multiply on the left by x^(-1): xa = xb ⟹ x^(-1)(xa) = x^(-1)(xb) ⟹ (x^(-1)x)a = (x^(-1)x)b ⟹ ea = eb ⟹ a = b.

Remark:
Warning: If ax = xb, then in a non-abelian group it is not necessarily true that a = b, since to "cancel" x from both sides of the equation we need to multiply the left hand side by x^(-1) on the right, but multiply the right hand side by x^(-1) on the left, and these are different operations.

This simple principle has some nice consequences that make studying groups easier. One is that the defining property of the identity element e is enough to identify it: no other element has the same property.

Proposition 8.9: (Uniqueness of the identity)

Let a,x be elements of a multiplicatively written group. If ax=a then x=e. Similarly, if xa=a then x=e.

Proof.

If ax=a then ax=ae. By “left cancellation”, we can cancel a to deduce x=e. Similarly, if xa=a then xa=ea, so by “right cancellation” we can cancel a to deduce x=e.

A similar proof shows that an element of a group can only have one inverse.

Proposition 8.10: (Uniqueness of inverses)

Let x,y be elements of a multiplicatively written group. If xy = e then x = y^(-1) and y = x^(-1).

Proof.

If xy = e then xy = xx^(-1), and so, by left cancellation, y = x^(-1). Similarly, if xy = e then xy = y^(-1)y, and so, by right cancellation, x = y^(-1).

This means that, to prove that one element x of a group is the inverse of another element y, we just need to check that their product (either way round: xy or yx) is equal to the identity. Here are some examples of useful facts that we can prove like this:

Proposition 8.11:

Let x be an element of a multiplicatively written group. Then the inverse of x^(-1) is x: (x^(-1))^(-1) = x.

Proof.

By uniqueness of inverses, we just need to check that x x^(-1) = e, which is true.

Proposition 8.12:

Let x,y be elements of a multiplicatively written group. Then the inverse of xy is (xy)^(-1) = y^(-1)x^(-1).

Proof.

By uniqueness of inverses, we just need to check that (xy)(y^(-1)x^(-1)) = e. But (xy)(y^(-1)x^(-1)) = ((xy)y^(-1))x^(-1) = (x(yy^(-1)))x^(-1) = (xe)x^(-1) = xx^(-1) = e.

Make sure you understand how each step of the previous proof follows from the definition of a group, and in particular how we have used the associative property. The notes will be less explicit about this in future proofs, leaving out brackets.

Remark:
Warning: Note that in a non-abelian group it is not in general true that (xy)^(-1) = x^(-1)y^(-1), since x^(-1)y^(-1) ≠ y^(-1)x^(-1) in general.
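These facts are easy to check computationally in a concrete non-abelian group. The following sketch (our own illustration, not part of the formal development) uses permutations of {1,2,3}, represented as tuples with p[i-1] = p(i):

```python
def compose(p, q):
    # (pq)(i) = p(q(i)): apply q first, then p
    return tuple(p[q[i] - 1] for i in range(len(q)))

def inverse(p):
    # build the inverse map: if p sends i to p(i), the inverse sends p(i) to i
    inv = [0] * len(p)
    for i, pi in enumerate(p, start=1):
        inv[pi - 1] = i
    return tuple(inv)

x = (2, 3, 1)  # a 3-cycle: 1 -> 2 -> 3 -> 1
y = (2, 1, 3)  # a transposition swapping 1 and 2

# Proposition 8.12: (xy)^(-1) = y^(-1) x^(-1)
assert inverse(compose(x, y)) == compose(inverse(y), inverse(x))
# ...but, as the warning says, (xy)^(-1) != x^(-1) y^(-1) here:
assert inverse(compose(x, y)) != compose(inverse(x), inverse(y))
```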
Remark:
Compare the above proposition (and the remark) with Theorem 6.24 about the inverse of the composition of two bijective functions, and with the similar fact for inverting the product of two matrices.
Notation:

Next, some notation. If x ∈ G, we write x^2 for the product xx of x with itself, x^3 for the product x(x^2) (which is the same as (x^2)x by associativity), and so on.

Note that for n = -1 we also have a meaning for x^n, since x^(-1) is the notation we use for the inverse of x.

We extend this even further by defining x^(-n) to be (x^n)^(-1) if n > 0 (which gives us a meaning for x^n for any non-zero integer n, positive or negative) and defining x^0 to be the identity element e (so that we now have a meaning for x^n for every n ∈ Z).

We call x^n the nth power of x.

Remark:
If G = (R\{0},×), then this meaning of x^n is the same as the meaning we’re used to.

To justify why this is a sensible notation, let us see what happens when we multiply powers. First:

Lemma 8.13:

If n > 0 then x^(-n) = (x^(-1))^n.

Proof.

By definition, x^(-n) = (x^n)^(-1). To prove this is the same as (x^(-1))^n we just have to show (x^(-1))^n x^n = e, by uniqueness of inverses. But (x^(-1))^n x^n = x^(-1)⋯x^(-1)x^(-1)x x⋯x = x^(-1)⋯x^(-1) e x⋯x = x^(-1)⋯x^(-1)x⋯x = ⋯ = e, cancelling each x^(-1) with an x.

Proposition 8.14:

If x is an element of a multiplicatively written group G, and m and n are integers, then (x^m)(x^n) = x^(m+n).

Proof.

Let us fix n ∈ Z and prove the result for all m ∈ Z.

We first prove this by induction when m ≥ 0. It is true when m = 0, since (x^0)(x^n) = e(x^n) = x^n. Suppose it is true for m = k-1 [that is, (x^(k-1))(x^n) = x^(k-1+n)]. We show it is true for m = k. We have (x^k)(x^n) = (x(x^(k-1)))(x^n) = x((x^(k-1))(x^n)) = x(x^(k+n-1)) = x^(k+n). So by induction it is true for all m ≥ 0.

If m < 0, and y = x^(-1), then by the lemma (x^m)(x^n) = (y^(-m))(y^(-n)), which is equal to y^(-m-n) = y^(-(m+n)) = x^(m+n) by applying what we’ve already proved with y in place of x (note -m > 0).
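As an informal check of this power law, we can compute x^m x^n and x^(m+n) in the group of permutations of {1,2,3} for a range of integers m and n (the tuple representation and helper names here are our own):

```python
def compose(p, q):
    # (pq)(i) = p(q(i)): apply q first, then p
    return tuple(p[q[i] - 1] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p, start=1):
        inv[pi - 1] = i
    return tuple(inv)

def power(x, n):
    # x^n for any integer n, with x^0 = e and x^(-n) = (x^n)^(-1)
    e = tuple(range(1, len(x) + 1))
    if n < 0:
        return inverse(power(x, -n))
    result = e
    for _ in range(n):
        result = compose(result, x)
    return result

x = (2, 3, 1)  # a 3-cycle in S_3
for m in range(-5, 6):
    for n in range(-5, 6):
        assert compose(power(x, m), power(x, n)) == power(x, m + n)
```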

We’ve already proved the formula (xy)^(-1) = y^(-1)x^(-1). What about (xy)^n for other values of n? In a non-abelian group, there is no simple formula. In particular:

Remark:
Warning: If x,y are elements of a non-abelian group, then in general (xy)^n ≠ x^n y^n. The point is that (for n > 0, say) (xy)^n = xyxy⋯xy, and unless the group is abelian we can not rearrange the terms to get xx⋯x yy⋯y = x^n y^n.

8.4 Dihedral Groups

As we study group theory, it will be useful to have a supply of examples of groups to think about. Of course, no single example will be ideal for everything, but some are more helpful than others. Some of the more "arithmetic" groups, such as (Z,+) and (R\{0},×), have the advantage of being very familiar, but the disadvantage of being rather untypical because they’re abelian. If you have a "standard example" that you like to think about, then it will be much less misleading if it’s non-abelian.

A good first family of examples are the “dihedral groups”, which are the symmetry groups of regular polygons, and are non-abelian, but still fairly uncomplicated. In the next section we will look at a second important family of examples, the “symmetric groups”.

Let X be a regular n-sided polygon, with vertices labelled anticlockwise 1,2,,n. Recall that by a symmetry of X we mean a permutation of the vertices that takes adjacent vertices to adjacent vertices.

For example, we can send vertex 1 to any other vertex by an appropriate rotation. If f is a symmetry sending vertex 1 to vertex i, then, since it preserves adjacency, it must send vertex 2 to one of the two vertices adjacent to vertex i, and once we know f(1) and f(2), then f(3) is determined as the other vertex adjacent to f(2), and so on around the polygon for f(4),,f(n).

So the total number of symmetries is 2n (since there are n choices for f(1) and for each of these there are two choices for f(2)). This explains the following choice of notation.

Definition 8.15:
The dihedral group D2n of order 2n is the group of symmetries of a regular n-sided polygon.
Remark:
Some books use the symbol Dn where we use D2n (i.e., they label the group with the size of the polygon rather than the size of the group), although at least the D is fairly standard.

Let’s fix some notation for two particular symmetries.

Notation:

We set a ∈ D2n to be the rotation anticlockwise through an angle of 2π/n. We set b ∈ D2n to be the reflection in the line through vertex 1 and the centre of the polygon.

So a(1) = 2, a(2) = 3, …, a(n-1) = n, a(n) = 1, and b(1) = 1, b(2) = n, b(3) = n-1, …, b(n-1) = 3, b(n) = 2.


Figure 8.12: The symmetries a and b on D8


Figure 8.13: The symmetries a and b on D10

Now consider the symmetries a^i and a^i b for 0 ≤ i < n. These both send vertex 1 to vertex 1+i, but a^i sends vertices 2, 3, …, n to the vertices following anticlockwise around the polygon, whereas a^i b sends them to the vertices following clockwise around the polygon. So all of these symmetries are different, and every symmetry is of this form, and so every element of D2n can be written in terms of a and b.

Proposition 8.16:

We have D2n = {e, a, a^2, …, a^(n-1), b, ab, a^2b, …, a^(n-1)b}.

Let G be a group with a finite number of elements: its multiplication table (often also called the Cayley table) has one row and one column for each element of the group, and the entry in row x and column y is the product xy. So the table displays the group operation.

Example:

The Cayley table for D6 = {e, a, a^2, b, ab, a^2b} is as follows:

      e     a      a^2      b       ab      a^2b
e     e     a      a^2      b       ab      a^2b
a     a     a^2    a^3      ab      a^2b    a^3b
a^2   a^2   a^3    a^4      a^2b    a^3b    a^4b
b     b     ba     ba^2     b^2     bab     ba^2b
ab    ab    aba    aba^2    ab^2    abab    aba^2b
a^2b  a^2b  a^2ba  a^2ba^2  a^2b^2  a^2bab  a^2ba^2b
History:

Arthur Cayley was an English mathematician (1821-1895) who gave the first abstract definition of a finite group in 1854. It is around then that he showed that a group is determined by its multiplication table (i.e., its Cayley table), and gave several examples of groups that were not symmetric groups. However, at the time the mathematical community did not pursue this further, preferring to stick to studying the concrete symmetric groups. It wasn’t until 1878, when he re-introduced his definition, that other mathematicians refined it and an agreed formal definition of a group emerged.

As we can see in the Cayley table above, expressions such as ba or a^(-1) appear, but they are not in our list of elements in standard form. So, we must work out how to rewrite elements such as ba or a^(-1) in standard form.

There are three basic rules that let us easily do calculations.

Proposition 8.17:

We have a^n = e.

This is clearly true, as it just says that if we repeat a rotation through an angle of 2π/n a total of n times, then this is the same as the identity.

Note that this means that a(a^(n-1)) = e, and so by uniqueness of inverses a^(-1) = a^(n-1), which allows us to write any power (positive or negative) of a as one of e, a, …, a^(n-1).

Proposition 8.18:

We have b^2 = e.

This is clearly true, as repeating a reflection twice gives the identity.

This implies, by uniqueness of inverses again, that b^(-1) = b, and so any power of b is one of e or b.

What about ba? Well, a(1) = 2 and b(2) = n, so ba(1) = n; similarly ba(2) = n-1, and so the other vertices follow clockwise. This is the same as a^(n-1)b, or a^(-1)b.

Proposition 8.19:

We have ba = a^(n-1)b = a^(-1)b.

We’ll see that these three rules allow us to simplify any expression.

For example, ba^(-1) = ba^(n-1) = ba·a^(n-2) = a^(-1)b·a^(n-2) = a^(-1)ba·a^(n-3) = a^(-1)a^(-1)b·a^(n-3) = ⋯ = a^(-n+1)b = ab, where we repeatedly "swap" a b with an a or a^(-1), changing the a into a^(-1) or the a^(-1) into a.

We get the following rules for multiplying expressions in standard form.

Theorem 8.20:

Fix n ∈ Z. In D2n, for 0 ≤ i, j < n,

a^i a^j = a^(i+j) if i+j < n, and a^(i+j-n) if i+j ≥ n;
a^i (a^j b) = a^(i+j) b if i+j < n, and a^(i+j-n) b if i+j ≥ n;
(a^i b) a^j = a^(i-j) b if i-j ≥ 0, and a^(i-j+n) b if i-j < 0;
(a^i b)(a^j b) = a^(i-j) if i-j ≥ 0, and a^(i-j+n) if i-j < 0.

You are encouraged to practice some calculations with the dihedral group, especially with n=3 and n=4, as we’ll frequently be using these as examples.
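One way to practise (and sanity-check) these rules is to implement them and compare against the actual vertex maps. In this sketch (the encodings are our own), a^i b^r is stored as a pair (i, r), and, matching the convention ba(1) = b(a(1)) used above, a product xy acts by applying y first:

```python
from itertools import product

def mult(x, y, n):
    # product (a^i b^r)(a^j b^s) in D2n via the rules above, in standard form
    (i, r), (j, s) = x, y
    return ((i + j) % n if r == 0 else (i - j) % n, (r + s) % 2)

def as_perm(i, r, n):
    # the symmetry a^i b^r as a vertex map: apply b first, then rotate i times
    def a(v):
        return v % n + 1
    def b(v):
        return 1 if v == 1 else n + 2 - v
    def image(v):
        if r:
            v = b(v)
        for _ in range(i):
            v = a(v)
        return v
    return tuple(image(v) for v in range(1, n + 1))

for n in (3, 4, 5, 6):
    elems = [(i, r) for i in range(n) for r in (0, 1)]
    for x, y in product(elems, repeat=2):
        # the product xy acts by applying y first, then x
        px, py = as_perm(*x, n), as_perm(*y, n)
        composed = tuple(px[py[v] - 1] for v in range(n))
        assert as_perm(*mult(x, y, n), n) == composed
```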

Example:

We rewrite the Cayley table for D6 = {e, a, a^2, b, ab, a^2b} in standard form (check the calculations yourself):

      e     a     a^2   b     ab    a^2b
e     e     a     a^2   b     ab    a^2b
a     a     a^2   e     ab    a^2b  b
a^2   a^2   e     a     a^2b  b     ab
b     b     a^2b  ab    e     a^2   a
ab    ab    b     a^2b  a     e     a^2
a^2b  a^2b  ab    b     a^2   a     e
Remark:

The cancellation properties which we previously found have a nice interpretation in terms of Cayley tables:

The left cancellation proposition just says that all the entries in each row are different: if two entries xy and xz in row x are the same, then xy=xz and so y=z. Similarly the right cancellation property says that all the entries in each column are different. This can be a useful method for deducing information about a group from a partial multiplication table.

Observe that these hold with the Cayley table above.
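This Latin-square property can also be verified mechanically. Here is a small sketch (encoding of our choosing) that rebuilds the Cayley table of D6 from the multiplication rules and checks that every row and every column is a rearrangement of the group:

```python
n = 3
elems = [(i, r) for r in (0, 1) for i in range(n)]  # a^i b^r encoded as (i, r)

def mult(x, y):
    # multiplication in D6 (Theorem 8.20 with n = 3)
    (i, r), (j, s) = x, y
    return ((i + j) % n if r == 0 else (i - j) % n, (r + s) % 2)

table = {x: {y: mult(x, y) for y in elems} for x in elems}

for x in elems:
    row = [table[x][y] for y in elems]   # entries of row x
    col = [table[y][x] for y in elems]   # entries of column x
    assert sorted(row) == sorted(elems)  # left cancellation: all row entries differ
    assert sorted(col) == sorted(elems)  # right cancellation: all column entries differ
```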
Remark:
When n=3, all vertices of a regular n-sided polygon (i.e., an equilateral triangle) are adjacent to one another, so D6 contains all permutations of {1,2,3}. But this doesn’t apply for larger n.

8.5 Symmetric Groups and Cycles

We have already met symmetric groups, but not with that name. A symmetric group is just the group of all permutations of a set. We’ll only really consider finite sets, although the definition makes sense for any set.

Definition 8.21:
  • Let X be a set. The symmetric group on X is the group S(X) of all permutations of X (i.e., bijective functions f: X → X), with composition as the group operation.

  • The symmetric group Sn of degree n is the group of all permutations of the set {1,2,,n} of n elements.

Let’s think of ways of describing permutations. Of course, we could just say what f(i) is for each value of i, and so, for example, refer to the element f of S6 with f(1)=3, f(2)=6, f(3)=1, f(4)=4, f(5)=2 and f(6)=5.

This is quite hard to read, and one easy way to set out the information in a more easily readable form is to use a “double row” notation with 1,2,,n along the top row, and f(1),f(2),,f(n) along the bottom row.

For example, if f is the element of S6 described above, then we could write

f = ( 1 2 3 4 5 6
      3 6 1 4 2 5 ).

But there’s an even more compact method, that is also much more convenient for understanding group theoretic aspects of permutations.

Definition 8.22:

Let k and n be positive integers with k ≤ n. A k-cycle f in Sn is a permutation of the following form. There are k distinct elements i_1, i_2, …, i_k ∈ {1, 2, …, n} such that f(i_1) = i_2, f(i_2) = i_3, …, f(i_(k-1)) = i_k, f(i_k) = i_1, and f(i) = i for i ∉ {i_1, i_2, …, i_k}. (So f "cycles" the elements i_1, i_2, …, i_k and leaves the other elements unmoved.)

Such an f is denoted by f = (i_1, i_2, …, i_k).

We call k the length of this cycle.
Example:

In S8, the 6-cycle g = (2,7,8,5,6,3) has g(2) = 7, g(7) = 8, g(8) = 5, g(5) = 6, g(6) = 3, g(3) = 2, g(1) = 1 and g(4) = 4, or, written in double row notation,

g = ( 1 2 3 4 5 6 7 8
      1 7 2 4 6 3 8 5 ).

The notation (2,7,8,5,6,3) would also denote an element of S9 with g(9) = 9, so this notation doesn’t specify which symmetric group we’re looking at, although that is rarely a problem.

Note that the 6-cycle (7,8,5,6,3,2) is exactly the same permutation as g, so the same cycle can be written in different ways (in fact, a k-cycle can be written in k different ways, as we can choose to start the cycle with any of the k elements i_1, i_2, …, i_k).
Remark:
We’ll allow k = 1, but every 1-cycle (i_1) is just the identity permutation, since it sends i_1 to i_1 and every i ≠ i_1 to i, so we’ll rarely bother to write down 1-cycles.
Definition 8.23:
A transposition is another name for a 2-cycle. So this is just a permutation that swaps two elements and leaves all other elements unmoved.
Etymology:

Transposition comes from the Latin trans meaning "across, beyond" and positus meaning "to put". A transposition takes two elements and puts them across each other, i.e. swaps their positions.

Of course, not every permutation is a cycle. For example, the permutation

f = ( 1 2 3 4 5 6
      3 6 1 4 2 5 )

that we used as an example earlier is not. However, we can write it as a composition f = (1,3)(2,6,5) of two cycles. These two cycles are "disjoint" in the sense that they move the disjoint sets of elements {1,3} and {2,5,6}, and this means that it makes no difference which of the two cycles we apply first. So also f = (2,6,5)(1,3).

Definition 8.24:

A set of cycles in Sn is disjoint if no element of {1,2,,n} is moved by more than one of the cycles.

Theorem 8.25:

Every element of Sn may be written as a product of disjoint cycles.

Proof.

We give a constructive proof, that is we give a method which will write any fSn as such a product.

We’ll write down a set of disjoint cycles by considering repeatedly applying f to elements of {1,2,,n}, starting with the smallest elements.

So we start with 1, and consider f(1), f^2(1), … until we reach an element f^k(1) that we have already reached. The first time this happens must be with f^k(1) = 1, since if f^k(1) = f^l(1) for 0 < l < k then f^(k-1)(1) = f^(l-1)(1), and so we’d have repeated the element f^(l-1)(1) earlier. So we have a cycle (1, f(1), f^2(1), …, f^(k-1)(1)).

Now we start again with the smallest number i that is not involved in any of the cycles we’ve already written down, and get another cycle (i, f(i), …, f^s(i)) for some s.

We keep repeating until we’ve dealt with all the elements of {1,2,,n}.

This will probably be clearer if we look at an example.

Example:

Consider the element

f = ( 1 2 3 4 5 6 7 8 9
      2 5 1 9 7 6 3 4 8 )

of S9 written in double row notation. To write this as a product of disjoint cycles, we start with 1, and calculate f(1) = 2, f(2) = 5, f(5) = 7, f(7) = 3, f(3) = 1, so we have a 5-cycle (1,2,5,7,3).

The smallest number we haven’t yet dealt with is 4, and f(4)=9,f(9)=8,f(8)=4, so we have a 3-cycle (4,9,8).

The only number we haven’t dealt with is 6, and f(6)=6, so finally we have a 1-cycle (6).

So f=(1,2,5,7,3)(4,9,8)(6) as a product of disjoint cycles. Since 1-cycles are just the identity permutation, we can (and usually will) leave them out, and so we can write f=(1,2,5,7,3)(4,9,8).

Since the order in which we apply disjoint permutations doesn’t matter, we could write the cycles in a different order, or we could start each cycle at a different point. So, for example, f=(9,8,4)(5,7,3,1,2) is another way of writing the same permutation as a product of disjoint cycles.
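The constructive method from the proof of Theorem 8.25 translates directly into code. In this sketch (representation of our choosing), a permutation is a dict {i: f(i)}:

```python
def disjoint_cycles(f):
    # follow the method in the proof of Theorem 8.25: repeatedly start from
    # the smallest element not yet used, and trace its cycle under f
    seen, cycles = set(), []
    for start in sorted(f):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = f[i]
        if len(cycle) > 1:          # omit 1-cycles, as in the notes
            cycles.append(tuple(cycle))
    return cycles

# the element of S_9 from the example, read off its double row form
f = dict(zip(range(1, 10), [2, 5, 1, 9, 7, 6, 3, 4, 8]))
print(disjoint_cycles(f))  # [(1, 2, 5, 7, 3), (4, 9, 8)]
```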

It’s very easy to read off the product of permutations written in disjoint cycle notation, just by using the method in the proof of Theorem 8.25.

Example:

Let f=(1,5,3,4)(2,6,7) and g=(3,4,8,1,2,5) be elements of S8 written in disjoint cycle notation. Let’s calculate fg and gf. fg=(1,5,3,4)(2,6,7)(3,4,8,1,2,5), but of course these are not disjoint cycles. So we start with 1 and calculate where it is sent by performing the permutation fg repeatedly:

  • 1 is sent to 2 by (3,4,8,1,2,5), which is sent to 6 by (2,6,7), which is not moved by (1,5,3,4). So fg(1)=6.
  • 6 is not moved by (3,4,8,1,2,5). It is sent to 7 by (2,6,7), which is not moved by (1,5,3,4). So fg(6)=7.
  • 7 is not moved by (3,4,8,1,2,5). It is sent to 2 by (2,6,7), which is not moved by (1,5,3,4). So fg(7)=2.
  • 2 is sent to 5 by (3,4,8,1,2,5), which is not moved by (2,6,7) and is sent to 3 by (1,5,3,4). So fg(2)=3.
  • 3 is sent to 4 by (3,4,8,1,2,5), which is not moved by (2,6,7), and sent to 1 by (1,5,3,4). So fg(3)=1.
  • So we’ve completed our first cycle (1,6,7,2,3).
  • 4 is sent to 8 by (3,4,8,1,2,5), which is not moved by (2,6,7) or (1,5,3,4). So fg(4)=8.
  • 8 is sent to 1 by (3,4,8,1,2,5), which is not moved by (2,6,7) and sent to 5 by (1,5,3,4). So fg(8)=5.
  • 5 is sent to 3 by (3,4,8,1,2,5), which is not moved by (2,6,7) and sent to 4 by (1,5,3,4). So fg(5)=4.
  • So we’ve completed another cycle (4,8,5).
  • We’ve now dealt with all the numbers from 1 to 8, so fg=(1,6,7,2,3)(4,8,5) as a product of disjoint cycles.
Similarly, gf=(3,4,8,1,2,5)(1,5,3,4)(2,6,7)=(1,3,8)(2,6,7,5,4) as a product of disjoint cycles.
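The same bookkeeping can be automated. This sketch (helper names our own) multiplies cycles right-to-left, as in the worked example, and then re-extracts the disjoint cycles:

```python
def cycle_map(cycle, n):
    # the cycle as an element of S_n: each entry is sent to the next one
    m = {i: i for i in range(1, n + 1)}
    for k, i in enumerate(cycle):
        m[i] = cycle[(k + 1) % len(cycle)]
    return m

def compose(cycles, n):
    # product of a list of cycles, applying the rightmost cycle first
    maps = [cycle_map(c, n) for c in cycles]
    def apply(i):
        for m in reversed(maps):
            i = m[i]
        return i
    return {i: apply(i) for i in range(1, n + 1)}

def disjoint_cycles(f):
    # the method of Theorem 8.25, omitting 1-cycles
    seen, cycles = set(), []
    for start in sorted(f):
        if start not in seen:
            cycle, i = [], start
            while i not in seen:
                seen.add(i)
                cycle.append(i)
                i = f[i]
            if len(cycle) > 1:
                cycles.append(tuple(cycle))
    return cycles

f = [(1, 5, 3, 4), (2, 6, 7)]
g = [(3, 4, 8, 1, 2, 5)]
print(disjoint_cycles(compose(f + g, 8)))  # [(1, 6, 7, 2, 3), (4, 8, 5)]
print(disjoint_cycles(compose(g + f, 8)))  # [(1, 3, 8), (2, 6, 7, 5, 4)]
```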
Example:
It is easy to write down the inverse of a permutation written as a product of disjoint cycles. Just write the permutation backwards. For example, if f=(1,4,3,5,7)(2,6,8) then f1=(8,6,2)(7,5,3,4,1). Of course, we have a choice of which order to take the cycles, and where to start each cycle, and if we carried out the method in the proof of Theorem 8.25 then we’d get the alternative representation f1=(1,7,5,3,4)(2,8,6).
History:

Symmetric groups were among the first types of groups to emerge, as mathematicians studied the permutations of roots of equations in the 1770s. This was before the formal abstract definition of a group was introduced, around 100 years later in the 1870s.

8.6 Order of a group and of elements

With those two main examples in mind, we now explore some more properties of groups.

Definition 8.26:

The order of a group G, denoted |G|, is the cardinality of the set G.

Similarly, we say a group G is finite if the set G is finite, and infinite otherwise.
Example:
  • We have seen that |D2n|=2n (justifying our notation).

  • The group (Z,+) is infinite.

  • We have |({1,-1},×)| = 2.

Proposition 8.27:

We have |Sn|=n!.

Proof.

We’ll count the permutations of {1, 2, …, n}. Let f ∈ Sn. Since f(1) can be any of the elements 1, 2, …, n, there are n possibilities for f(1). For each of these, there are n-1 possibilities for f(2), since it can be any of the elements 1, 2, …, n except for f(1). Given f(1), f(2), there are n-2 possibilities for f(3), since it can be any of the elements 1, 2, …, n except for f(1) and f(2). And so on. So in total there are n(n-1)(n-2)⋯2·1 = n! permutations of {1, 2, …, n}.
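For small n, this count can be confirmed by brute force, e.g. with Python’s standard library:

```python
from itertools import permutations
from math import factorial

# enumerate all permutations of {1, ..., n} directly and compare with n!
for n in range(1, 7):
    assert len(list(permutations(range(1, n + 1)))) == factorial(n)
```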

Definition 8.28:

Let x be an element of a multiplicatively-written group G. Then:

  • If x^n = e for some n ∈ Z+, then the order of x, denoted ord(x) (or ordG(x) if it may be unclear which group G we’re referring to), is ord(x) = min{n ∈ Z+ : x^n = e}, i.e. the least positive integer n such that x^n = e. [Note this is well defined by the Well Ordering Principle.]

  • If there is no n ∈ Z+ such that x^n = e, then we say that x has infinite order and write ord(x) = ∞ (or ordG(x) = ∞).

Remark:
If you are asked to prove that ord(x) = n, then as well as proving that x^n = e, remember that you also have to prove that n is the least positive integer for which this is true: in other words, prove that if 0 < m < n then x^m ≠ e.
Remark:
We used the same word for the order of a group and the order of an element. Although the two meanings are different, we’ll see later that there is a very close relationship between them.
Example:
In any group G with identity element e, ord(e)=1 and x=e is the only element with ord(x)=1.
Example:
In the dihedral group D2n, with the same notation as before, ord(a) = n and ord(b) = 2.
Example:
In the group (R\{0},×) we have ord(1) = 1 and ord(-1) = 2 (since (-1)^2 = 1 but (-1)^1 ≠ 1), and for any other x, ord(x) = ∞ (since either |x| < 1, in which case |x^n| < 1 for all integers n > 0 and so x^n ≠ 1, or |x| > 1, in which case |x^n| > 1 for all integers n > 0 and so x^n ≠ 1).
Example:
In the group (Z,+), ord(0) = 1, but ord(s) = ∞ for any other s ∈ Z. Note that when the group operation is addition and the identity element is 0, as here, for an element s with finite order n we would require s + s + ⋯ + s (n times) to be 0.
Proposition 8.29:

The order of a k-cycle in the symmetric group Sn is k.

Proof.

It is clear that if we repeat a k-cycle (i_1, i_2, …, i_k) k times, each element i_j is sent back to itself (and if we repeat it l < k times, then i_1 is sent to i_(l+1) ≠ i_1).

In fact, one benefit of disjoint cycle notation in Sn is that it makes it easy to calculate the order of a permutation.

Theorem 8.30:

If f is the product of disjoint cycles of lengths k_1, k_2, …, k_r, then ord(f) = lcm(k_1, k_2, …, k_r).

Proof.

Consider when f^k is the identity permutation. For f^k(j) = j when j is one of the numbers involved in the k_i-cycle, we need k_i to divide k. But if k_i divides k for all i, then this applies to all the numbers moved by f. So the smallest such k is the lowest common multiple of k_1, …, k_r.

Example:
The order of the permutation (1,6,7,2,3)(4,8,5) from a previous example is lcm(5,3)=15. Notice that if we write this permutation as (1,5,3,4)(2,6,7)(3,8,1,2,5)(4,3), using cycles that are not disjoint, it is much less clear what the order is, and it is definitely not the lowest common multiple of the cycle lengths.
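Theorem 8.30 is easy to confirm by direct computation. This sketch (representation our own) finds the order of (1,6,7,2,3)(4,8,5) by composing it with itself until it reaches the identity:

```python
from math import lcm

def order(f):
    # order of a permutation f, given as a dict {i: f(i)}:
    # compose f with itself until the identity is reached
    g, k = dict(f), 1
    while any(g[i] != i for i in g):
        g = {i: f[g[i]] for i in g}
        k += 1
    return k

# the permutation (1,6,7,2,3)(4,8,5) of {1,...,8}
f = {1: 6, 6: 7, 7: 2, 2: 3, 3: 1, 4: 8, 8: 5, 5: 4}
assert order(f) == lcm(5, 3) == 15
```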

We’ll now see what we can say about the powers of an element of a group if we know its order, starting with elements of infinite order.

Proposition 8.31:

Let x be an element of a group G with ord(x) = ∞. Then the powers x^i of x are all distinct: i.e., x^i ≠ x^j whenever i ≠ j are integers.

Proof.

Suppose x^i = x^j. Without loss of generality we’ll assume i ≥ j. Then for n = i-j ≥ 0, x^n = x^(i-j) = x^i (x^j)^(-1) = (x^i)(x^i)^(-1) = e. But since ord(x) = ∞ there is no positive integer n with x^n = e, so we must have n = 0 and so i = j.

Corollary 8.32:

If x is an element of a finite group G, then ord(x) < ∞.

Proof.

If ord(x) = ∞ then the previous proposition tells us that the elements x^i (for i ∈ Z) are all different, and so G must have infinitely many elements.

In fact, we’ll see later that the order |G| of a finite group severely restricts the possible (finite) orders of its elements. Next we consider elements of finite order.

Proposition 8.33:

Let x be an element of a group G with ord(x) = n < ∞.

  1. For an integer i, x^i = e if and only if i is divisible by n.

  2. For integers i, j, x^i = x^j if and only if i-j is divisible by n.

  3. x^(-1) = x^(n-1).

  4. The distinct powers of x are e, x, x^2, …, x^(n-1).

Proof.

  1. Firstly, if i is divisible by n, so that i = nk for some integer k, then x^i = x^(nk) = (x^n)^k = e^k = e, since x^n = e.

Conversely, suppose that x^i = e. We can write i = nq + r with q, r ∈ Z and 0 ≤ r < n (by Theorem 5.2). Then e = x^i = x^(nq+r) = (x^n)^q x^r = e^q x^r = x^r. But n is the least positive integer with x^n = e, and x^r = e with 0 ≤ r < n, so r can’t be positive, and we must have r = 0. So i is divisible by n.

  2. x^i = x^j if and only if x^(i-j) = x^i (x^j)^(-1) = e, which by 1. is the case if and only if i-j is divisible by n.

  3. Take i = n-1, j = -1. Then i-j = n is divisible by n, and so x^(n-1) = x^(-1) by 2.

  4. For any integer i, write i = nq + r for integers q, r with 0 ≤ r < n. Then i-r is divisible by n, and so x^i = x^r by 2. So every power of x is equal to one of e = x^0, x = x^1, x^2, …, x^(n-1). Conversely, if i, j ∈ {0, 1, …, n-1} and i-j is divisible by n, then i = j, and so by 2. the elements e, x, …, x^(n-1) are all different.

If we know the order of an element of a group, then we can work out the order of any power of that element.

Proposition 8.34:

Let x be an element of a group G, and i an integer.

  • If ord(x) = ∞ and i ≠ 0, then ord(x^i) = ∞. (If i = 0, then x^i = e, and so ord(x^i) = 1.)

  • If ord(x) = n < ∞, then ord(x^i) = n/gcd(n,i).

Proof.

  • Suppose i > 0. If ord(x^i) = m < ∞, then x^(im) = (x^i)^m = e with im a positive integer, contradicting ord(x) = ∞. Similarly, if i < 0 then x^(-im) = ((x^i)^m)^(-1) = e with -im a positive integer, again giving a contradiction. So in either case ord(x^i) = ∞.

  • Since gcd(n,i) divides i, n divides ni/gcd(n,i), and so (x^i)^(n/gcd(n,i)) = x^(ni/gcd(n,i)) = e.

If 0 < m and (x^i)^m = x^(im) = e, then n divides im, so n/gcd(n,i) divides m·(i/gcd(n,i)); since n/gcd(n,i) and i/gcd(n,i) are coprime, n/gcd(n,i) divides m, and in particular n/gcd(n,i) ≤ m. So n/gcd(n,i) is the smallest positive exponent d such that (x^i)^d = e, and is therefore the order of x^i.

Example:
In the dihedral group D12, ord(a) = 6. So the Proposition gives ord(a^4) = 6/gcd(6,4) = 3, since gcd(6,4) = 2.
Example:
Taking i = -1, the Proposition gives ord(x^(-1)) = ord(x), since gcd(n,-1) = 1 for any n. [But this case is very easy to check directly, since (x^(-1))^d = (x^d)^(-1), and so x^d = e if and only if (x^(-1))^d = e.]
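The formula ord(x^i) = n/gcd(n,i) can be tested in a concrete cyclic group. This sketch (an illustration of ours, assuming familiarity with addition modulo n) works in Z_12 under addition mod 12, where x = 1 has order 12 and x^i corresponds to the element i:

```python
from math import gcd

n = 12  # the cyclic group Z_12 under addition mod 12; the element 1 has order 12
for i in range(1, n + 1):
    # ord(x^i) found directly: the least m > 0 with m*i = 0 (mod n)
    m = 1
    while (m * i) % n != 0:
        m += 1
    assert m == n // gcd(n, i)
```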

9 Linking groups together

We linked sets to each other by considering subsets, functions between sets, and Cartesian products of sets. In a related way, in this section we look at appropriate ideas of “subgroups”, functions between groups and direct products of groups.

9.1 Subgroups

 

Definition 9.1:
A subgroup of a group G is a subset H ⊆ G that is itself a group with the same operation as G. We sometimes write H ≤ G.
Remark:
It is important that the group operation is the same. For example, R\{0} is a subset of R, but we do not regard (R\{0},×) as a subgroup of (R,+), since the group operations are different.
Example:
For any group G, the trivial subgroup {e} and G itself are subgroups of G.
Definition 9.2:
We call a subgroup not equal to {e} a non-trivial subgroup, and a subgroup not equal to G a proper subgroup of G.
Example:
The group (Q,+) is a proper subgroup of (R,+), and (Z,+) is a proper subgroup of both (Q,+) and (R,+).
Example:

The group (2Z,+) is a subgroup of (Z,+).

If n is a positive integer, then the group (nZ,+) is a subgroup of (Z,+).
Example:
The group of rotations of a regular n-sided polygon (i.e., {e, a, a^2, …, a^(n-1)}) is a subgroup of the dihedral group D2n.

Here’s a simple description of what needs to be checked for a subset to be a subgroup.

Theorem 9.3:

Let G be a multiplicatively written group and let H ⊆ G be a subset of G. Then H is a subgroup of G if and only if the following conditions are satisfied.

  • (Closure) If x, y ∈ H then xy ∈ H.

  • (Identity) e ∈ H.

  • (Inverses) If x ∈ H then x^(-1) ∈ H.

Proof.

We don’t need to check associativity, since x(yz)=(xy)z is true for all elements of G, so is certainly true for all elements of H. So the conditions imply H is a group with the same operation as G.

If H is a group, then (Closure) must hold. By uniqueness of identity and inverses, the identity of H must be the same as that of G, and the inverse of xH is the same in H as in G, so (Identity) and (Inverses) must hold.

Proposition 9.4:

If H, K are two subgroups of a group G, then H ∩ K is also a subgroup of G.

Proof.

We check the three properties in the theorem.

  • If x, y ∈ H ∩ K then xy ∈ H by closure of H, and xy ∈ K by closure of K, and so xy ∈ H ∩ K.

  • e ∈ H and e ∈ K, so e ∈ H ∩ K.

  • If x ∈ H ∩ K, then x^(-1) ∈ H since H has inverses, and x^(-1) ∈ K since K has inverses. So x^(-1) ∈ H ∩ K.

9.2 Cyclic groups and cyclic subgroups

An important family of (sub)groups are those arising as collections of powers of a single element.

Notation:

Given a (multiplicatively-written) group G and an element x ∈ G, we’ll define ⟨x⟩ to be the subset {x^i : i ∈ Z} of G consisting of all powers of x.

It is easy to check that:

Proposition 9.5:

The set ⟨x⟩ is a subgroup of G.

Proof.

We need to check the conditions of Theorem 9.3.

  • If x^i, x^j are powers of x, then x^i x^j = x^(i+j) is a power of x, so ⟨x⟩ is closed.

  • We have e = x^0 is a power of x, so e ∈ ⟨x⟩.

  • If x^i ∈ ⟨x⟩, then (x^i)^(-1) = x^(-i) ∈ ⟨x⟩, so ⟨x⟩ is closed under taking inverses.

Definition 9.6:
If x ∈ G, then ⟨x⟩ is called the cyclic subgroup of G generated by x.

The following explicit description of ⟨x⟩ follows immediately from Propositions 8.31 and 8.33.

Proposition 9.7:

If ord(x) = ∞ then ⟨x⟩ is infinite, with x^i ≠ x^j unless i = j.

If ord(x) = n then ⟨x⟩ = {e, x, …, x^(n-1)} is finite, with n distinct elements.

Remark:
This justifies using the word "order" in two different ways. The order (as an element) of x is the same as the order (as a group) of ⟨x⟩.
Example:
If G = D2n is the dihedral group of order 2n, then ⟨a⟩ is the group {e, a, …, a^(n-1)} of rotations, and ⟨b⟩ = {e, b}.
Example:
Let G = (R\{0},×). Then ⟨-1⟩ = {1, -1}, and ⟨2⟩ = {…, 1/4, 1/2, 1, 2, 4, …} consists of all powers of 2 (and of 1/2, since we include negative powers of 2).
Example:
Let G = (R,+). Since the operation is now addition, ⟨x⟩ consists of all multiples of x. So, for example, ⟨1⟩ = Z and ⟨3⟩ = 3Z = {3n : n ∈ Z}.
Definition 9.8:
A group G is called cyclic if G = ⟨x⟩ for some x ∈ G. Such an element x is called a generator of G.
Example:
Let G = (Z,+); then G is cyclic, since Z = ⟨1⟩ = ⟨-1⟩. Hence 1 and -1 are generators.

This example shows that there may be more than one possible choice of generator. However, not every element is a generator, since, for example, ⟨0⟩ = {0} and ⟨2⟩ = 2Z ≠ Z.

Example:
Let H be the cyclic subgroup of D12 generated by a. Then a and a^(-1) are generators of H, but a^2 is not, since ⟨a^2⟩ = {e, a^2, a^4} contains only the even powers of a.
Example:
Let H be the cyclic subgroup of D14 generated by a. Then a^2 is a generator of H, since ⟨a^2⟩ = {e, a^2, a^4, a^6, a^8 = a, a^10 = a^3, a^12 = a^5} contains all the powers of a. In fact, all elements of H except e are generators.
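The two examples above can be checked computationally. Working additively modulo n (which models the rotation subgroup ⟨a⟩ of D2n), this sketch (our own) generates ⟨i⟩ by repeated addition and identifies the generators:

```python
def generated(i, n):
    # the cyclic subgroup <i> of (Z_n, +): all multiples of i, reduced mod n
    subgroup, x = {0}, i % n
    while x not in subgroup:
        subgroup.add(x)
        x = (x + i) % n
    return subgroup

# in Z_7 (modelling the rotations in D14), every non-zero element generates
assert all(generated(i, 7) == set(range(7)) for i in range(1, 7))
# in Z_6 (modelling the rotations in D12), only 1 and 5 (i.e. -1) generate
assert [i for i in range(1, 6) if generated(i, 6) == set(range(6))] == [1, 5]
```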
Proposition 9.9:

Every cyclic group is abelian.

Proof.

Suppose G = ⟨x⟩ is cyclic with generator x. Then if g, h ∈ G, we have g = x^i and h = x^j for some integers i and j. So gh = x^i x^j = x^(i+j) = x^j x^i = hg, and so G is abelian.

Of course, this means that not every group is cyclic, since no non-abelian group is. But there are also abelian groups, even finite ones, that are not cyclic.

Proposition 9.10:

Let G be a finite group with |G| = n. Then G is cyclic if and only if it has an element of order n. An element x ∈ G is a generator if and only if ord(x) = n.

Proof.

Suppose x ∈ G. Then |⟨x⟩| = ord(x), and since ⟨x⟩ ⊆ G, we have ⟨x⟩ = G if and only if ord(x) = |⟨x⟩| = |G| = n.

Example:
Let G = D8 and let H = {e, a^2, b, a^2b}. Then H is an abelian subgroup of G (check this), but it is not cyclic, since |H| = 4 but ord(a^2) = ord(b) = ord(a^2b) = 2 and ord(e) = 1, so H has no element of order 4.
Theorem 9.11:

Every subgroup of a cyclic group is cyclic.

Proof.

Let G = ⟨x⟩ be a cyclic group with generator x, and let H ⊆ G be a subgroup.

If H = {e} is the trivial subgroup, then H = ⟨e⟩ is cyclic. Otherwise, x^i ∈ H for some i ≠ 0, and since also x^(-i) ∈ H (as H is closed under taking inverses), we can assume i > 0.

Let m be the smallest positive integer such that x^m ∈ H. We shall show that H = ⟨x^m⟩, and so H is cyclic.

Certainly ⟨x^m⟩ ⊆ H, since every power of x^m is in H. Suppose x^k ∈ H and write k = mq + r where q, r ∈ Z and 0 ≤ r < m. Then x^r = x^(k-mq) = x^k (x^m)^(-q) ∈ H, since x^k ∈ H and x^m ∈ H. But 0 ≤ r < m, and m is the smallest positive integer with x^m ∈ H, so r = 0 or we have a contradiction. So x^k = (x^m)^q ∈ ⟨x^m⟩. Since x^k was an arbitrary element of H, H ⊆ ⟨x^m⟩.

9.3 Group homomorphism and Isomorphism

After studying sets, we looked at how functions allow us to go from one set to another. A group is a set with some extra structure, and so to learn about groups we want to consider functions between groups which take account of this structure somehow.

A “group homomorphism” is a function between two groups that links the two group operations in the following way.

Definition 9.12:
Let (G, ∗) and (H, ∘) be groups. A group homomorphism is a function φ: G → H such that φ(x ∗ y) = φ(x) ∘ φ(y) for all x, y ∈ G.
Remark:
If the groups G and H are written multiplicatively, then φ: G → H is a homomorphism if φ(xy) = φ(x)φ(y) for all x, y ∈ G. But it should be noted that on the left hand side of that equation we multiply x and y in G, whereas on the right hand side we multiply φ(x) and φ(y) in H, so this still links the group operations of the two different groups.
Remark:
If you are doing Linear Algebra, then you may find it helpful to note a similarity between the definitions of a group homomorphism and a linear map. These are both functions that “commute with the operations” in the sense that the definition of a homomorphism says that multiplying two elements and then applying the homomorphism gives the same as applying the homomorphism to the two elements and then multiplying the resulting elements, and the definition of a linear map says the same for the operations of addition and multiplication by scalars in place of the group operation. In fact, many of the basic facts about homomorphisms are very similar to basic facts about linear maps, usually with pretty much the same proof.
Example:
If G and H are groups and e_H is the identity element of H, the function φ:G→H given by φ(x)=e_H for all x∈G is a homomorphism (the trivial homomorphism).
Example:
If H≤G are groups, then the inclusion map i:H→G given by i(x)=x for all x∈H is a homomorphism. This is injective but not surjective (unless H=G).

If we have a group homomorphism, there are certain things we can deduce:

Lemma 9.13:

Let φ:G→H be a homomorphism, let e_G and e_H be the identity elements of G and H respectively, and let x∈G. Then

  1. φ(e_G)=e_H,

  2. φ(x^(-1))=φ(x)^(-1),

  3. φ(x^i)=φ(x)^i for any i∈Z.

Proof.

We go through all three statements.

  1. We have φ(e_G)=φ(e_G·e_G)=φ(e_G)φ(e_G), so multiplying both sides by φ(e_G)^(-1) gives e_H=φ(e_G).

  2. We have e_H=φ(e_G)=φ(x·x^(-1))=φ(x)φ(x^(-1)), so by uniqueness of inverses φ(x^(-1))=φ(x)^(-1).

  3. This is true for positive i by a simple induction (it is true for i=1, and if it is true for i=k then φ(x^(k+1))=φ(x^k)φ(x)=φ(x)^k·φ(x)=φ(x)^(k+1), so it is true for i=k+1). Then by 2. it is true for negative i, and by 1. it is true for i=0.
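As a quick numerical illustration of Lemma 9.13 (a sketch; the map and modulus are our own choices), take the reduction-mod-n homomorphism from (Z,+) to (Z/7Z,+). Written additively, "powers" become integer multiples:

```python
n = 7

def phi(x):
    # Reduction mod n is a homomorphism from (Z, +) to (Z/7Z, +):
    # phi(x + y) = phi(x) + phi(y) in Z/7Z.
    return x % n

# 1. The identity 0 of Z is sent to the identity 0 of Z/7Z.
assert phi(0) == 0

# 2. Inverses go to inverses: phi(-x) is the additive inverse of phi(x).
for x in range(-20, 21):
    assert (phi(-x) + phi(x)) % n == 0

# 3. Written additively, "powers" are multiples: phi(i*x) = i*phi(x).
for x in range(-10, 11):
    for i in range(-5, 6):
        assert phi(i * x) == (i * phi(x)) % n
```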

As we have seen, bijective functions allow us to treat two sets of the same cardinality as essentially the same sets, just changing the names of the elements. A similar story occurs with groups. Suppose G=⟨x⟩={e,x,x^2} and H=⟨y⟩={e,y,y^2} are two cyclic groups of order 3. Strictly speaking they are different groups, since (for example) x is an element of G but not of H. But clearly they are “really the same” in some sense: the only differences are the names of the elements, and the “abstract structure” of the two groups is the same. To make this idea of two groups being abstractly the same precise, we introduce the idea of an isomorphism of groups.

Definition 9.14:
Let (G,∗) and (H,∘) be groups. An isomorphism from G to H is a bijective homomorphism. That is, a bijective function φ:G→H such that φ(x∗y)=φ(x)∘φ(y) for all elements x,y∈G.
Remark:
As when we defined homomorphisms, we used different symbols for the two group operations to point out that the isomorphism links the two different operations. As ever, we’ll usually write the groups multiplicatively, in which case the defining property of an isomorphism becomes φ(xy)=φ(x)φ(y), but again it should be stressed that this involves two different kinds of “multiplication”: on the left hand side of the equation we are multiplying in G, but on the right hand side in H.
Etymology:

Homomorphism comes from the Greek homo to mean “same” and morph to mean “shape”. Isomorphism comes from the Greek iso to mean “equal” (and morph to mean “shape”).

A homomorphism is a map between two groups that keeps the same shape, i.e.  preserves the group operation (but not necessarily anything else), while an isomorphism is a map between two groups that have equal shape (it preserves the group operation and many other useful properties as we will see below).

Example:
Let G=⟨x⟩ and H=⟨y⟩ be two cyclic groups of the same order. Then φ:G→H defined by φ(x^i)=y^i for every i∈Z is an isomorphism, since it is a bijection and φ(x^(i+j))=y^(i+j)=y^i·y^j=φ(x^i)φ(x^j) for all i,j.
Remark:
Since φ is a bijection, it pairs off elements of G with elements of H, and then the defining property of an isomorphism says that we can use φ and its inverse φ^(-1) as a dictionary to translate between elements of G and elements of H without messing up the group operation. If we take the multiplication (Cayley) table of G and apply φ to all the entries, we get the multiplication table of H: the groups G and H are “really the same” apart from the names of the individual elements.

Next we’ll prove some of the easy consequences of the definition.

Proposition 9.15:

Let φ:G→H be an isomorphism between (multiplicatively-written) groups. Then φ^(-1):H→G is also an isomorphism.

Proof.

Since φ is a bijection, it has an inverse function φ^(-1) that is also a bijection.

Let u,v∈H. Since φ is a bijection, there are unique elements x,y∈G with u=φ(x) and v=φ(y). Then φ^(-1)(uv)=φ^(-1)(φ(x)φ(y))=φ^(-1)(φ(xy))=xy=φ^(-1)(u)φ^(-1)(v), and so φ^(-1) is an isomorphism.

Because of this Proposition, the following definition makes sense, since it tells us that there is an isomorphism from G to H if and only if there is one from H to G.

Definition 9.16:
Two groups G and H are said to be isomorphic, or we say G is isomorphic to H, if there is an isomorphism φ:G→H, and then we write G≅H, or φ:G≅H if we want to specify an isomorphism.
Proposition 9.17:

Let G,H,K be three groups. If G is isomorphic to H and H is isomorphic to K, then G is isomorphic to K.

Proof.

Let φ:G→H and θ:H→K be isomorphisms. Then the composition θφ:G→K is a bijection, and if x,y∈G then θ(φ(xy))=θ(φ(x)φ(y))=θ(φ(x))·θ(φ(y)), and so θφ is an isomorphism.

Proposition 9.18:

Let φ:G→H be an isomorphism between (multiplicatively-written) groups, let e_G and e_H be the identity elements of G and H respectively, and let x∈G. Then

  1. φ(e_G)=e_H,

  2. φ(x^(-1))=φ(x)^(-1),

  3. φ(x^i)=φ(x)^i for every i∈Z,

  4. ord_G(x)=ord_H(φ(x)).

Proof.

The first three statements follow directly from Lemma 9.13, since an isomorphism is a homomorphism.

For the last statement, by 1. and 3., and since φ is injective, we have x^i=e_G if and only if φ(x)^i=e_H, and so ord_G(x)=ord_H(φ(x)).

To prove that two groups are isomorphic usually requires finding an explicit isomorphism. Proving that two groups are not isomorphic is often easier, as if we can find an “abstract property” that distinguishes them, then this is enough, since isomorphic groups have the same “abstract properties”. We’ll make this precise, and prove it, with some typical properties, after which you should be able to see how to give similar proofs for other properties, just using an isomorphism to translate between properties of G and of H.

Proposition 9.19:

Let G and H be isomorphic groups. Then |G|=|H|.

Proof.

This follows just from the fact that an isomorphism is a bijection φ:G→H.

Proposition 9.20:

Let G and H be isomorphic groups. If H is abelian then so is G.

Proof.

Suppose that φ:G→H is an isomorphism and that H is abelian. Let x,y∈G. Then φ(xy)=φ(x)φ(y)=φ(y)φ(x)=φ(yx), since φ(x),φ(y)∈H, which is abelian. Since φ is a bijection (and in particular injective) it follows that xy=yx. So G is abelian.

Proposition 9.21:

Let G and H be isomorphic groups. If H is cyclic then so is G.

Proof.

Suppose φ:G→H is an isomorphism and H=⟨y⟩ is cyclic. So every element of H is a power of y. So if g∈G then φ(g)=y^i for some i∈Z, and so g=φ^(-1)(y^i)=φ^(-1)(y)^i. So if x=φ^(-1)(y) then every element of G is a power of x, and so G=⟨x⟩.

Notation:

If n∈Z+, we write Cn to mean a (multiplicatively-written) cyclic group of order n.

This notation is sensible since a cyclic group can only be isomorphic to another cyclic group of the same order, and since two cyclic groups of the same order are isomorphic.

Remark:

We can use these propositions to show that there exist groups of the same order which are not isomorphic. Consider the groups C8 and D8: both have order 8, but we have seen that D8 is not abelian (and hence not cyclic), whereas C8 is both.

This again highlights that a group is a set with a binary operation: a bijection exists between the sets C8 and D8, but it does not preserve the group operations.
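The order-counting argument for C8 versus D8 can be verified computationally. In this sketch (our own encodings of the two groups), C8 is modelled as (Z/8Z,+) and D8 as pairs (rotation, reflection bit) composed by the dihedral rule:

```python
from math import gcd
from collections import Counter

# C8 modelled as (Z/8Z, +): the order of x is 8 // gcd(8, x) (and ord(0) = 1).
c8_orders = Counter(8 // gcd(8, x) for x in range(8))

# D8 modelled as pairs (r, s): rotation r mod 4, reflection bit s,
# composed by the dihedral rule (r1,s1)(r2,s2) = (r1 + (-1)^s1 r2, s1+s2).
def mult(g, h):
    (r1, s1), (r2, s2) = g, h
    return ((r1 + (r2 if s1 == 0 else -r2)) % 4, (s1 + s2) % 2)

def order(g):
    x, k = g, 1
    while x != (0, 0):
        x, k = mult(x, g), k + 1
    return k

d8_orders = Counter(order((r, s)) for r in range(4) for s in range(2))

# C8 has elements of order 8 but D8 has none, so C8 and D8 are not isomorphic.
assert c8_orders[8] == 4 and d8_orders[8] == 0
```

The two order profiles differ (D8 has five elements of order 2, C8 only one), which is exactly the kind of "abstract property" that rules out an isomorphism.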
Proposition 9.22:

Let G and H be isomorphic groups and n∈Z+∪{∞}. Then G and H have the same number of elements of order n.

Proof.

By Proposition 9.18, an isomorphism φ:G→H induces a bijection between the set of elements of G with order n and the set of elements of H with order n.

The idea of isomorphism gives an important tool for understanding unfamiliar or difficult groups. If we can prove that such a group is isomorphic to a group that we understand well, then this is a huge step forward.

The following example of a group isomorphism was used for very practical purposes in the past.

Example:

The logarithm function log10:(R+,×)→(R,+) is an isomorphism from the group of positive real numbers under multiplication to the group of real numbers under addition (of course we could use any base for the logarithms). It is a bijection since the function y↦10^y is the inverse function, and the group isomorphism property is the familiar property of logarithms that log10(ab)=log10(a)+log10(b). Now, if you don’t have a calculator, then addition is much easier to do by hand than multiplication, and people used to use “log tables” to make multiplication easier. If they wanted to multiply two numbers, they would look up the logarithms, add them and then look up the “antilogarithm”.

In group theoretic language they were exploiting the fact that there is an isomorphism between the “difficult” group (R+,) and the “easy” group (R,+).
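The “log tables” trick can be demonstrated directly; this short check (with arbitrary sample values of our own) verifies that log10 turns products into sums, up to floating-point rounding:

```python
import math

a, b = 123.0, 456.0

# log10 : (R+, ×) → (R, +) turns multiplication into addition.
assert math.isclose(math.log10(a * b), math.log10(a) + math.log10(b))

# "Log table" multiplication: add the logs, then take the antilogarithm 10^y.
product = 10 ** (math.log10(a) + math.log10(b))
assert math.isclose(product, a * b)
```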

9.4 Direct Product

In this section we’ll study a simple way of combining two groups to build a new, larger, group. We will do that by generalising the idea of the Cartesian product of two sets (Definition 6.1).

Definition 9.23:
Let H and K be (multiplicatively-written) groups. The direct product H×K of H and K is the Cartesian product of the sets H and K, with the binary operation (x,y)(x′,y′)=(xx′,yy′) for x,x′∈H and y,y′∈K.
Proposition 9.24:

The direct product H×K of groups is itself a group.

Proof.

Associativity of H×K follows from associativity of H and K, since if x,x′,x″∈H and y,y′,y″∈K, then ((x,y)(x′,y′))(x″,y″)=(xx′,yy′)(x″,y″)=(xx′x″,yy′y″)=(x,y)(x′x″,y′y″)=(x,y)((x′,y′)(x″,y″)).

If e_H and e_K are the identity elements of H and K, then (e_H,e_K) is the identity element of H×K, since if x∈H and y∈K, then (e_H,e_K)(x,y)=(e_H·x,e_K·y)=(x,y)=(x·e_H,y·e_K)=(x,y)(e_H,e_K).

The inverse of (x,y)∈H×K is (x^(-1),y^(-1)), since (x,y)(x^(-1),y^(-1))=(xx^(-1),yy^(-1))=(e_H,e_K)=(x^(-1)x,y^(-1)y)=(x^(-1),y^(-1))(x,y).

Notice that for all aspects of the group structure, we simply apply the corresponding idea in the first coordinate for H and in the second coordinate for K. This is generally how we understand H×K: we consider the two coordinates separately, and if we understand H and K, then H×K is easy to understand. For example, it is easy to see that, for any i∈Z and any (x,y)∈H×K, (x,y)^i=(x^i,y^i), so taking powers in a direct product is just a matter of taking powers of the coordinates separately.
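To illustrate the coordinate-wise structure, here is a small check (our own additive model) in Z/4Z × Z/6Z that multiples (the additive analogue of powers) are computed coordinate-wise:

```python
n, m = 4, 6  # the direct product Z/4Z × Z/6Z, written additively

def add(g, h):
    # The operation acts coordinate-wise.
    return ((g[0] + h[0]) % n, (g[1] + h[1]) % m)

def multiple(i, g):
    # i·g computed by repeated addition (i >= 0); additive analogue of g^i.
    result = (0, 0)
    for _ in range(i):
        result = add(result, g)
    return result

# Multiples in the product are just coordinate-wise multiples.
g = (3, 5)
for i in range(25):
    assert multiple(i, g) == ((i * g[0]) % n, (i * g[1]) % m)
```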

Here are some very easy consequences of the definition.

Proposition 9.25:

Let H and K be (multiplicatively-written) groups, and let G=H×K be the direct product.

  1. G is finite if and only if both H and K are finite, in which case |G|=|H||K|.

  2. G is abelian if and only if both H and K are abelian.

  3. If G is cyclic then both H and K are cyclic.

Proof.

We go through all three statements.

  1. This is just a familiar property of Cartesian products of sets (that was left as an exercise).

  2. Suppose H and K are abelian, and let (x,y),(x′,y′)∈G. Then (x,y)(x′,y′)=(xx′,yy′)=(x′x,y′y)=(x′,y′)(x,y), and so G is abelian.

Conversely, suppose G is abelian, and let x,x′∈H. Then (xx′,e_K)=(x,e_K)(x′,e_K)=(x′,e_K)(x,e_K)=(x′x,e_K), and considering the first coordinates, xx′=x′x, and so H is abelian. Similarly K is abelian.

  3. Suppose G is cyclic, and (x,y) is a generator, so that every element of G is a power of (x,y). Let x′∈H. Then (x′,e_K)∈G, so (x′,e_K)=(x,y)^i=(x^i,y^i) for some i∈Z, and so x′=x^i. So every element of H is a power of x, so H=⟨x⟩ is cyclic. Similarly K is cyclic.

Remark:
The converse of 3. is not true in general: the direct product of cyclic groups may not be cyclic. For example, if H=K=C2, then (x,y)^2=(e_H,e_K) for every (x,y)∈C2×C2, so C2×C2 has no element of order |H×K|=4, and so can’t be cyclic.
Proposition 9.26:

Let H and K be (multiplicatively-written) groups, and x∈H, y∈K elements of finite order. Then (x,y)∈H×K has finite order equal to the lowest common multiple lcm(ord_H(x),ord_K(y)).

Proof.

Let i∈Z. Then (x,y)^i=(e_H,e_K) if and only if x^i=e_H and y^i=e_K, which is the case if and only if i is divisible both by ord_H(x) and by ord_K(y). The least positive such i is lcm(ord_H(x),ord_K(y)), and so this is the order of (x,y).
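Proposition 9.26 is easy to test exhaustively in a small example. This sketch (our own additive model of Z/6Z × Z/10Z) compares the order of each pair, computed directly, with the lcm formula:

```python
from math import gcd

n, m = 6, 10  # work in Z/6Z × Z/10Z, written additively

def lcm(a, b):
    return a * b // gcd(a, b)

def order_mod(x, k):
    # Order of x in (Z/kZ, +); note gcd(k, 0) = k gives ord(0) = 1.
    return k // gcd(k, x)

def order_pair(x, y):
    # Order of (x, y) found by direct repeated addition.
    current, i = (x % n, y % m), 1
    while current != (0, 0):
        current = ((current[0] + x) % n, (current[1] + y) % m)
        i += 1
    return i

# The order of each pair equals the lcm of the coordinate orders.
for x in range(n):
    for y in range(m):
        assert order_pair(x, y) == lcm(order_mod(x, n), order_mod(y, m))
```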

We can now decide precisely when the direct product of cyclic groups is cyclic.

Theorem 9.27:

Let H=Cn and K=Cm be finite cyclic groups. Then Cn×Cm is cyclic if and only if gcd(n,m)=1.

Proof.

Let (x,y)∈H×K. Then ord_H(x)≤|H| and ord_K(y)≤|K|. So by Proposition 9.26, ord_(H×K)((x,y))=lcm(ord_H(x),ord_K(y))≤ord_H(x)·ord_K(y)≤|H||K|, where the first inequality is an equality if and only if ord_H(x) and ord_K(y) are coprime, and the second inequality is an equality if and only if H=⟨x⟩ and K=⟨y⟩.

Since |H×K|=|H||K|, H×K is cyclic if and only if it has an element of order |H||K|, which by the argument above happens if and only if there are x,y with H=⟨x⟩, K=⟨y⟩ and gcd(ord_H(x),ord_K(y))=gcd(n,m)=1.

Since cyclic groups of the same order are isomorphic, the last theorem says that Cm×Cn≅Cmn if and only if gcd(m,n)=1.

Remark:
While the above proof tells us that an isomorphism between Cm×Cn and Cmn exists, it does not tell us explicitly what the isomorphism is. However, we can use Theorem 10.14 to construct one. Let Cm=⟨x⟩, Cn=⟨y⟩, Cmn=⟨z⟩ and define φ:Cm×Cn→Cmn by φ((x^i,y^j))=z^k where, by Theorem 10.14, k is such that k≡i (mod m) and k≡j (mod n).
Example:
C2×C2 and C4×C2 are not cyclic.
Remark:
C2×C2 is an abelian group of order 4 such that every element apart from the identity has order 2. It is easy to check that if G={e,a,b,c} is any group with these properties, then ab=c=ba, ac=b=ca and bc=a=cb: i.e., the product of any two of the three non-identity elements is the other non-identity element. This means that there is only one possible multiplication table for such a group, and so any two groups with these properties are isomorphic.
Definition 9.28:
A Klein 4-group is a group of order 4 such that every element except the identity has order 2.
History:

Felix Klein, German mathematician (1849–1925), applied ideas from group theory to geometry, which led to the emergence of transformation groups. These were some of the first examples of infinite groups (beyond the groups (R,+) and (R∖{0},×), which weren’t really studied as groups). Klein wrote about the Klein 4-group, although he called it the Vierergruppe (meaning “four-group”), as it is the smallest group which is not cyclic.

Example:
  • C2×C3≅C6

  • C2×C9≅C18

  • C90≅C2×C45≅C2×(C9×C5)

We can clearly extend the definition of a direct product of two groups to three, four or more groups and make sense of things like G×H×K, which would be a group whose elements are the ordered triples (x,y,z) with xG, yH and zK.

We can then write the last example as C90≅C2×C9×C5.

More generally, if n=p1^r1·p2^r2⋯pk^rk, where p1,p2,…,pk are distinct primes, then Cn≅C_(p1^r1)×C_(p2^r2)×⋯×C_(pk^rk), so that every finite cyclic group is isomorphic to a direct product of cyclic groups of prime-power order.

We’ll finish this section with a different example of a familiar group that is isomorphic to a direct product.

Proposition 9.29:

(R∖{0},×)≅(R>0,×)×({1,-1},×).

Proof.

Define a map φ:R>0×{1,-1}→R∖{0} by φ(x,ε)=εx. This is clearly a bijection, and φ((x,ε)(x′,ε′))=φ(xx′,εε′)=εε′xx′=(εx)(ε′x′)=φ(x,ε)φ(x′,ε′), so that φ is an isomorphism.

10 Modular arithmetic and Lagrange

We finish this course by linking several threads together. We will explore Lagrange’s theorem, an important theorem of group theory (which was actually first formulated long before group theory was developed). We will then see how this theorem has applications to the study of the integers (which was one of our earlier threads). First though, we need to take a step back and explore the concept of partitioning a set, and of saying when objects are “basically the same”.

10.1 Equivalence relations and partition of sets

There are many situations in maths where we have different objects that we want to “consider the same”, often because they share a desired property. The requirement that two objects be exactly equal is often too restrictive, so we generalise the concept of equality to something broader (this theory is applied in many branches of mathematics).

Definition 10.1:

A relation on a nonempty set X is a subset R⊆X×X. We say x is related to y, denoted by x∼y, when (x,y)∈R.

  • A relation is reflexive if for all x∈X, x∼x.

  • A relation is symmetric if for all x,y∈X we have x∼y implies y∼x.

  • A relation is transitive if for all x,y,z∈X, if x∼y and y∼z then x∼z.

Remark:

We use the notation x≁y to say x is not related to y.

We negate the above three properties to say that:

  • A relation is not reflexive if there exists x∈X such that x≁x.

  • A relation is not symmetric if there exist x,y∈X such that x∼y and y≁x.

  • A relation is not transitive if there exist x,y,z∈X such that x∼y and y∼z and x≁z.

Definition 10.2:

A relation ∼ on a nonempty set X is an equivalence relation if it is reflexive, symmetric and transitive.

In such a case we read x∼y as “x is equivalent to y”.
Etymology:

Reflexive comes from the Latin re meaning “back” and flectere meaning “to bend”. This gave rise to words like “reflect” (where an object is bent back through a mirror) and “reflection”. A relationship is reflexive if it reflects every element (i.e., every element is in relation with itself).

Symmetric comes from the Greek sun meaning “together with” and metron meaning “a measure”. Given a point “together with a distance (from a line of symmetry)”, then you have found the image of the point using that symmetry.

We’ve seen the word transitive before ( (O2) ) - transitive comes from the Latin trans meaning “across, beyond” and the verb itus/ire meaning “to go”. A transitive relationship is one where knowing x∼y and y∼z allows us to go from x beyond y all the way to z.

Equivalence comes from the Latin aequus meaning “even, level” and valere meaning “to be strong, to have value”. Two elements are equivalent if they are level (i.e., equal) in their value. One can be substituted for the other (notice this has the same roots as the word equal).

Example:

Since we claim that equivalence relations are a generalisation of =, let us show that = is indeed an equivalence relation. Define a relation ∼ on R via x∼y if and only if x=y. We show ∼ is an equivalence relation.

Let x∈R. Since x=x, we have x∼x. This is true for all x∈R, hence ∼ is reflexive.

Let x,y∈R be such that x∼y. By definition, this means x=y, which means y=x, i.e. y∼x. This is true for all x,y∈R, hence ∼ is symmetric.

Let x,y,z∈R be such that x∼y and y∼z. This means x=y and y=z, so x=y=z, i.e. x=z. So x∼z. This is true for all x,y,z∈R, hence ∼ is transitive.

Since ∼ is reflexive, symmetric and transitive, we have that ∼ is an equivalence relation.
Example:
Define a relation ∼ on R via x∼y if and only if x≤y. We have that ∼ is not an equivalence relation because it is not symmetric. Indeed, let x=1 and y=2: we have 1≤2, so 1∼2, but 2≰1, so 2≁1. (The relation is however reflexive and transitive.)
Example:
We define a relation ∼ on the set of all subsets of R via X∼Y if and only if there exists a bijection f:X→Y. Remark 7.2 proved that ∼ is reflexive, symmetric and transitive, hence ∼ is an equivalence relation (and this is the reason why we could define cardinality in a way that makes sense).
Example:
Define a relation ∼ on the set of groups via G∼H if and only if G is isomorphic to H. This is reflexive (since G is isomorphic to itself via the identity function), Proposition 9.15 shows that ∼ is symmetric, and Proposition 9.17 proved that ∼ is transitive. So ∼ is an equivalence relation (and hence the rationale for why we care about groups “up to isomorphism”).
Definition 10.3:
Suppose ∼ is an equivalence relation on a (nonempty) set X. For x∈X, we define the equivalence class of x, denoted by [x], to be [x]={y∈X : y∼x}.
Example:
Let X={-1,0,1,2} and define a relation ∼ on X via x∼y if and only if either x=y or xy>0. One can check this is an equivalence relation (it takes a bit of work to show ∼ is transitive). We have the following equivalence classes: [-1]={-1}, [0]={0} and [1]={1,2}=[2]. Hence, we have three distinct equivalence classes, namely [-1], [0] and [1].
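The equivalence classes in this example can be computed mechanically from Definition 10.3; the following sketch (our own encoding of the relation) collects the class of each element of X:

```python
X = [-1, 0, 1, 2]

def related(x, y):
    # x ~ y iff x = y or xy > 0.
    return x == y or x * y > 0

def eq_class(x):
    # [x] = {y in X : y ~ x}.
    return frozenset(y for y in X if related(y, x))

classes = {eq_class(x) for x in X}

# Three distinct classes: [-1] = {-1}, [0] = {0}, and [1] = [2] = {1, 2}.
assert classes == {frozenset({-1}), frozenset({0}), frozenset({1, 2})}
assert eq_class(1) == eq_class(2)
```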

Notice how in the above example the distinct equivalence classes do not intersect each other. This is true in general.

Proposition 10.4:

Suppose ∼ is an equivalence relation on a (nonempty) set X. Then for any x,y∈X, [x]≠[y] if and only if [x]∩[y]=∅.

Proof.

Take x,y∈X.

(⇒). First, we will show that if [x]≠[y] then [x]∩[y]=∅ by using the contrapositive. That is, we will prove that if [x]∩[y]≠∅, then [x]=[y].

So suppose that [x]∩[y]≠∅. Then there exists some z∈[x]∩[y]. Hence, z∈[x], so z∼x. Similarly, z∈[y], so z∼y. Since ∼ is symmetric, we have that x∼z. Since ∼ is transitive, we have that x∼y. Now, choose w∈[x]. Then w∼x, and since x∼y and ∼ is transitive, we have that w∼y. Hence, w∈[y]. Since w∈[x] is arbitrary, we have that [x]⊆[y]. Similarly, we can show that, for any w∈[y], we have w∈[x], so [y]⊆[x]. Hence [x]=[y].

(⇐). Second, we will show that if [x]∩[y]=∅ then [x]≠[y], again by using the contrapositive. So we will show that if [x]=[y], then [x]∩[y]≠∅.

So suppose that [x]=[y]. Since ∼ is reflexive, we know that x∈[x]. Hence, x∈[x]=[x]∩[y], so [x]∩[y]≠∅.

Summarising, we have that [x]≠[y] if and only if [x]∩[y]=∅.

The above set-up leads us to the following definition which breaks a set into “parts”.

Definition 10.5:

A partition of a non-empty set X is a collection {Ai : i∈I} of non-empty subsets of X such that

  1. ∀x∈X, ∃i∈I such that x∈Ai, and

  2. ∀x∈X and i,j∈I, if x∈Ai∩Aj, then Ai=Aj.

Remark:

The above two conditions can be rephrased as follows:

  1. X=⋃_(i∈I) Ai, and

  2. Ai∩Aj≠∅ if and only if Ai=Aj.

The above remark and Proposition 10.4 suggests a strong link between equivalence relations and partitions of sets, as we explore in the next two theorems.

Theorem 10.6:

Suppose ∼ is an equivalence relation on a (non-empty) set X. Then Π={[x] : x∈X} is a partition of X.

Proof.

Take a∈X. Then a∈[a]∈Π. Hence, every element of X is in one of the equivalence classes in Π. Furthermore, by Proposition 10.4 we have that [x]≠[y] if and only if [x]∩[y]=∅, i.e. (by the contrapositive) [x]∩[y]≠∅ if and only if [x]=[y], as required.

Furthermore, given a partition of a set, we can construct an equivalence relation.

Theorem 10.7:

Suppose Π={Ai : i∈I} is a partition of a (nonempty) set X, for some indexing set I. For x,y∈X, define x∼y if and only if there exists an i∈I such that x,y∈Ai. Then ∼ is an equivalence relation on X.

Proof.

We show ∼ is reflexive. Take x∈X. Since Π is a partition of X, there exists some i∈I such that x∈Ai. Hence, x∼x.

We show ∼ is symmetric. Suppose x,y∈X are such that x∼y. Then there exists some i∈I such that x,y∈Ai. It follows that y,x∈Ai, so y∼x.

We show ∼ is transitive. Suppose that x,y,z∈X are such that x∼y and y∼z. Then there exists some i∈I such that x,y∈Ai and there exists some j∈I such that y,z∈Aj. Hence, we have that y∈Ai and y∈Aj. Since Π is a partition, we must have that Ai=Aj. It follows that x,z∈Ai, so x∼z.

Summarising, we have that ∼ is an equivalence relation on X.

Example:

Let us take the previous example further. Define a relation ∼ on Z via x∼y if and only if either x=y or xy>0. This is an equivalence relation. We have that ∼ partitions Z into three sets: [-1]=Z-, [0]={0} and [1]=Z+.

Example:

We will construct Q from Z by partitioning the set Z×Z+.

We define the relation ∼ on Z×Z+ via (a,b)∼(c,d) if and only if ad=bc. We check this is an equivalence relation:

Let (a,b)∈Z×Z+. Since ab=ba, we have that (a,b)∼(a,b), and ∼ is reflexive.

Let (a,b),(c,d)∈Z×Z+ be such that (a,b)∼(c,d). Then we have ad=bc, which can be re-written as cb=da, so (c,d)∼(a,b) and ∼ is symmetric.

Let (a,b),(c,d),(f,g)∈Z×Z+ be such that (a,b)∼(c,d) and (c,d)∼(f,g). Then we have ad=bc and cg=df. Multiplying the first equation by g we get adg=bcg, and using the second equation we can write this as adg=bdf. Since d≠0 (as d∈Z+) we can divide through by d to give ag=bf, which means that (a,b)∼(f,g) and ∼ is transitive.

Therefore ∼ is reflexive, symmetric and transitive, and so it is an equivalence relation.

We define elements of Q, which we denote a/b, to be the equivalence classes [(a,b)]={(a,b),(2a,2b),…}.
History:

We will not try to summarise the interesting 2019 article by Amir Asghari on the history of equivalence relation (separating the definition from when the notion was first used), but we would recommend it as something worth reading just to showcase how ideas can be used before they are formalised, and how the mathematics we do today is different from how mathematics used to be done. The full reference is Asghari, A. Equivalence: an attempt at a history of the idea. Synthese 196, 4657–4677 (2019). https://doi.org/10.1007/s11229-018-1674-2

10.2 Congruences and modular arithmetic

We now use our work on equivalence classes to partition Z. Recall that for any two integers a,n∈Z with n≠0, there exist a quotient q and a remainder r such that a=nq+r. Sometimes we only care about the remainder r, which leads us to the following definition.

Definition 10.8:
Let n∈Z+. For a,b∈Z, we say that a is congruent to b modulo n, denoted by a≡b (mod n), if n|(a-b), i.e. if a-b∈nZ.
Theorem 10.9:

Let n∈Z+. Define a relation ∼ on Z via a∼b if and only if a≡b (mod n). Then ∼ is an equivalence relation.

Proof.

Exercise.

Using Theorem 5.2, we can see that, given an integer n, every integer a is congruent modulo n to a unique r such that 0≤r<n. That is, we partition Z into exactly n congruence classes modulo n, given by [0],[1],…,[n-1] where [r]={a∈Z : a≡r (mod n)}=r+nZ.

Notation:

We write Z/nZ for the set of congruence classes modulo n.

If we’re clear that we’re considering [r] as an element of Z/nZ we’ll often simply write r.

Both addition and multiplication make sense modulo n, by the following theorem, which is very useful in computations.

Theorem 10.10:

Fix n∈Z+. Suppose that a,b,c,d∈Z are such that a≡c (mod n) and b≡d (mod n). Then a+b≡c+d (mod n) and ab≡cd (mod n).

Proof.

By assumption, we have that n∣(a-c) and n∣(b-d). Hence, for some x,y∈Z, we have that a-c=nx and b-d=ny. It follows that (a+b)-(c+d)=(a-c)+(b-d)=nx+ny=n(x+y). Since x+y∈Z, this means that n∣((a+b)-(c+d)), so a+b≡c+d (mod n). Further, since a=c+nx and b=d+ny, we have that ab=(c+nx)(d+ny)=cd+n(cy+dx+nxy), so ab-cd=n(cy+dx+nxy). Since cy+dx+nxy is an integer, we have that n∣(ab-cd), so ab≡cd (mod n).

Example:
We want to compute 3^5+2^8 (mod 7). We have 3^2=9≡2 (mod 7). So 3^4=3^2·3^2≡2·2≡4 (mod 7). Hence 3^5=3·3^4≡3·4=12≡5 (mod 7). Further, we have 2^3=8≡1 (mod 7), so 2^6=2^3·2^3≡1·1≡1 (mod 7). Hence, we have that 2^8=2^6·2^2≡1·4≡4 (mod 7). It follows that 3^5+2^8≡5+4=9≡2 (mod 7).
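The same computation can be checked with Python's built-in modular exponentiation (three-argument pow), which reduces modulo 7 at every step just as the hand calculation does:

```python
# pow(base, exp, mod) reduces modulo 7 at every step, mirroring the
# hand computation above.
assert pow(3, 5, 7) == 5        # 3^5 = 243 ≡ 5 (mod 7)
assert pow(2, 8, 7) == 4        # 2^8 = 256 ≡ 4 (mod 7)
assert (pow(3, 5, 7) + pow(2, 8, 7)) % 7 == 2   # 5 + 4 = 9 ≡ 2 (mod 7)
```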

Another consequence of Theorem 10.10 is that we can “shift” the congruence classes by a given constant c∈Z, i.e. the congruence classes can be given by [c],[c+1],…,[c+n-1].

Example:
For example Z/7Z={[0],[1],[2],[3],[4],[5],[6]}={[-3],[-2],[-1],[0],[1],[2],[3]}={[5],[6],[7],[8],[9],[10],[11]}.
Theorem 10.11:

We have (Z/nZ,+) is an abelian group.

Proof.

Addition on Z/nZ is a well-defined binary operation by Theorem 10.10. Addition (mod n) is associative, since ([a]+[b])+[c]=[a+b+c]=[a]+([b]+[c]). The identity element is [0] since [a]+[0]=[a]=[0]+[a] for any a∈Z. The inverse of [a] is [-a], since [a]+[-a]=[0]=[-a]+[a]. So (Z/nZ,+) is a group. It is abelian since [a]+[b]=[a+b]=[b+a]=[b]+[a] for all a and b.

In fact, it is a cyclic group, since the following is clear.

Proposition 10.12:

(Z/nZ,+)=⟨[1]⟩.

Remark:

Since (Z/nZ,+) is cyclic, it is isomorphic to Cn=⟨x⟩. The isomorphism is φ:(Z/nZ,+)→Cn defined by φ([i])=x^i.

This means that cyclic groups are particularly simple to understand if we know a generator, as the group operation is just addition of exponents: in a cyclic group G=⟨x⟩, x^i·x^j=x^(i+j), so the group operation in an infinite cyclic group is “just like” addition of integers, and the group operation in a finite cyclic group of order n is “just like” addition of integers (mod n).

However, (Z/nZ,×) is not a group, since although it is associative and has an identity element [1], not every element has an inverse. For example, [0] never has an inverse, and in Z/4Z, [2] does not have a multiplicative inverse.

Proposition 10.13:

In Z/nZ, [a] has a multiplicative inverse if and only if gcd(a,n)=1.

Proof.

If gcd(a,n)=1 then by Theorem 5.7 there exist s,t∈Z such that as+nt=1. So as=1-nt≡1 (mod n), so [a][s]=[1] in Z/nZ, and so [s] is a multiplicative inverse of [a].

Conversely, if gcd(a,n)=h>1, and if as≡1 (mod n), then 1=as+nq for some q∈Z. But h divides both a and n, so it divides as+nq=1. But no integer h>1 divides 1. So there is no s such that [s] is a multiplicative inverse of [a].

Note that the first part of the proof is constructive and allows us to solve some congruence equations.

Example:

Find x∈Z such that 4x≡3 (mod 7).

We find the inverse of [4] in Z/7Z (which exists since gcd(4,7)=1). We first find s,t∈Z such that 4s+7t=1. We have that

k sk tk Calculation qk rk
0 0 1 - - -
1 1 0 7=4·1+3 1 3
2 0-1·1=-1 1-0·1=1 4=3·1+1 1 1
3 1-(-1)·1=2 0-1·1=-1 3=1·3+0 3 0

so s=2 and t=-1. So [2] is the inverse of [4] in Z/7Z.

Multiplying both sides of 4x≡3 (mod 7) by 2, we get x≡2·4x≡2·3≡6 (mod 7).
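The table above is one bookkeeping style for the extended Euclidean algorithm; the recursive version below (a standard formulation, not the course's notation) finds s and t, hence the inverse of [4] in Z/7Z, and solves the congruence:

```python
def extended_gcd(a, b):
    # Return (g, s, t) with g = gcd(a, b) and g = a*s + b*t.
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t

g, s, t = extended_gcd(4, 7)
assert g == 1 and 4 * s + 7 * t == 1    # here s = 2, t = -1

inverse = s % 7                          # the inverse of [4] in Z/7Z
assert inverse == 2 and (4 * inverse) % 7 == 1

# Solve 4x ≡ 3 (mod 7) by multiplying both sides by the inverse of [4].
x = (inverse * 3) % 7
assert x == 6 and (4 * x) % 7 == 3
```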

We can also look at solving a system of simultaneous congruence equations.

Theorem 10.14:

Let m,n∈Z+ be co-prime. For any a,b∈Z, there exists x∈Z such that x≡a (mod m) and x≡b (mod n). Furthermore, this x is unique modulo mn. That is, for x′∈Z, we have that x′≡a (mod m) and x′≡b (mod n) if and only if x′≡x (mod mn).

Proof.

We will prove the existence of x and leave the uniqueness of x modulo mn as an exercise.

Since gcd(m,n)=1, by Theorem 5.7 we know that there exist s,t∈Z such that ms+nt=1. Then 1=ms+nt≡nt (mod m) and 1=ms+nt≡ms (mod n). So let x=msb+nta∈Z. Then x=msb+nta≡nta≡1·a≡a (mod m) and x=msb+nta≡msb≡1·b≡b (mod n).

History:

The above theorem is often known as the Chinese Remainder Theorem. An example of this theorem was first discussed by Sun Zi (Chinese mathematician, around 400-460) in his text Sunzi suanjing (Sun Zi’s Mathematical Manual). While Sun Zi came to the correct solution of the problem he posed in his text, using the methods we would use nowadays, there is no sign that he developed a general method. Instead it was Qin Jiushao (Chinese mathematician, 1202-1261) who wrote in his text “Shushu Jiuzhang” how to solve simultaneous linear congruence equations. The origin of the name “Chinese Remainder Theorem” is unclear; it arose in the west some time after an 1852 article by Wylie on the history of Chinese mathematics (however, he did not use this term).

Note that the above proof is a constructive proof, that is, it gives a way to find x.

Example:

We want to find x∈Z such that x≡4 (mod 5) and x≡7 (mod 8).

We have that gcd(5,8)=1, so we set x=msb+nta, where a=4, m=5, b=7, n=8 and s,t are such that 1=gcd(5,8)=5s+8t. We have that

k sk tk Calculation qk rk
0 0 1 - - -
1 1 0 8=5·1+3 1 3
2 0-1·1=-1 1-0·1=1 5=3·1+2 1 2
3 1-(-1)·1=2 0-1·1=-1 3=2·1+1 1 1
4 -1-2·1=-3 1-(-1)·1=2 2=1·2+0 2 0
so s=-3 and t=2. Hence, we set x=msb+nta=5·(-3)·7+8·2·4=-105+64=-41. We double check that x indeed satisfies the given congruences: -41=(-9)·5+4, so -41≡4 (mod 5), and -41=(-6)·8+7, so -41≡7 (mod 8). Note that the x′ such that x′≡x (mod 40), i.e. x′=-1,39,79,…, are also solutions.

One can use induction and the Fundamental Theorem of Arithmetic to prove the following generalisation of Theorem 10.14:

Let r∈Z+ with r≥2, and let m1,…,mr∈Z+ be pairwise co-prime, that is gcd(mi,mj)=1 for i,j∈Z with 1≤i≤r, 1≤j≤r and i≠j. Then for any a1,…,ar∈Z, there exists some x∈Z such that x≡ai (mod mi) for all i∈Z with 1≤i≤r. Further, this x is unique modulo m1·m2⋯mr.

One can also think about combining all these methods to solve simultaneous congruence equations where the coefficient in front of x is not 1.

10.3 Lagrange’s theorem

Let G=D2n be the dihedral group of order 2n, and consider the orders of elements. Every reflection has order 2, which divides |G|. The rotation a has order n, which divides |G|. Every other rotation a^i has order n/gcd(n,i), which divides |G|. In this section we’ll show that this is a general phenomenon for finite groups. In fact, we’ll prove a more general fact about orders of subgroups (of course, this includes the case of the order of an element x, since ord(x) is equal to the order of the cyclic subgroup ⟨x⟩ generated by x).

History:

The theorem in question is named after Joseph-Louis Lagrange, an Italian mathematician (1736-1813), who proved a special case of the theorem in 1770 (long before abstract group theory existed).

Theorem 10.15: (Lagrange’s Theorem)

Let G be a finite group, and H ≤ G a subgroup. Then |H| divides |G|.

The idea of the proof is to partition the set G into subsets, each with the same number of elements as H, so that |G| is just the number of these subsets times |H|.

Definition 10.16:
For (possibly infinite) groups H ≤ G, and x ∈ G, the left coset xH is the subset xH = {xh ∈ G : h ∈ H}.
Remark:
This is a subset of G, but not usually a subgroup, since the identity element e is only in xH if e = xh for some h ∈ H, in which case x = h⁻¹ and so x ∈ H. So xH is only a subgroup if x ∈ H, in which case xH = H.
Remark:
We could also define a right coset Hx in the obvious way. In general this may be different from xH, but it would make little difference to what follows if we used right cosets instead of left cosets.

The set of left cosets partitions G.

Lemma 10.17:

For any group G and subgroup H ≤ G, {xH : x ∈ G} is a partition of the set G.

Proof.

Clearly property 1 is satisfied: G = ⋃_{x∈G} xH because any x = xe ∈ xH. So we just need to check property 2, that for x, y ∈ G we have xH ∩ yH ≠ ∅ if and only if xH = yH.

Suppose xH ∩ yH ≠ ∅, and choose g ∈ xH ∩ yH. Then g = xa = yb for some a, b ∈ H, so that x = yba⁻¹. If h ∈ H then xh = y(ba⁻¹h) ∈ yH since ba⁻¹h ∈ H. So every element of xH is in yH. Similarly every element of yH is in xH, so that xH = yH.

Interest:

Given that we have a partition of the set G, we can construct an equivalence relation on the set G and say x ∼_H y if and only if there exists z such that x, y ∈ zH. With a bit of work, we can show that this is the same as saying x, y ∈ G are equivalent (with respect to/modulo H) if and only if y⁻¹x ∈ H [you can check this is an equivalence relation].

Compare this with the (infinite) group G = (Z, +) and (infinite) subgroup H = (nZ, +). We partitioned Z using nZ and defined an equivalence relation a ≡ b (mod n) if and only if a - b ∈ nZ.

Next we verify that each left coset has the same cardinality.

Lemma 10.18:

Let H ≤ G and x ∈ G. Then there is a bijection α : H → xH, so that |xH| = |H|.

Proof.

Define α by α(h) = xh. Then α is surjective, since by definition every element of xH is of the form xh = α(h) for some h ∈ H. But also α is injective, since if h, h' ∈ H then α(h) = α(h') ⟹ xh = xh' ⟹ h = h'.

Everything so far works for possibly infinite H,G, but now for finite G (and hence H) we can put everything together to prove Lagrange’s Theorem.

Proof (of Theorem 10.15).

Suppose that k is the number of distinct left cosets xH. By the two previous lemmas, the cosets partition G and each coset contains |H| elements. So the number of elements of G is k|H|.

Example:
Let G = D6 be the dihedral group of order 6 and let H = ⟨a⟩ = {e, a, a²} be the cyclic subgroup generated by a. If x ∈ H, then xH = H. If x ∉ H, so x = ba^i for some i, then xH = {ba^i, ba^(i+1), ba^(i+2)} = bH. So there are two left cosets, {e, a, a²} and {b, ba, ba²}. [In this case the right cosets are the same as the left cosets.]
Example:

Let G = D6 again, but let H = ⟨b⟩ = {e, b} be the cyclic subgroup generated by b. Then

  • eH = {e, b} = bH;

  • aH = {a, ab} = abH;

  • a²H = {a², a²b} = a²bH,

so there are three left cosets. [In this case the right cosets are different, since for example Ha = {a, ba} = {a, a²b}, which is not a left coset.]
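The coset computations in the last two examples can be verified by machine. The sketch below (the encoding is my own choice, not from the notes) stores the element a^i b^j of D6 as the pair (i, j); the relation ba = a⁻¹b gives the multiplication rule.

```python
from itertools import product

N = 3                                    # D6, the dihedral group of order 6
G = list(product(range(N), range(2)))    # (i, j) stands for a^i b^j

def mul(x, y):
    # a^i b^j * a^k b^l = a^(i + (-1)^j k) b^(j + l), since b a = a^(-1) b
    (i, j), (k, l) = x, y
    return ((i + (-1) ** j * k) % N, (j + l) % 2)

H = [(0, 0), (0, 1)]                     # the subgroup <b> = {e, b}
left = {frozenset(mul(x, h) for h in H) for x in G}
right = {frozenset(mul(h, x) for h in H) for x in G}
print(len(left), len(right))             # 3 3: three cosets of each kind
print(left == right)                     # False: left and right cosets differ
```

Counting also confirms Lagrange’s Theorem here: |G| = 6 = 3 × 2 = (number of left cosets) × |H|.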
Definition 10.19:
Let H ≤ G. Then the index |G : H| is the number (possibly infinite if G is infinite) of left cosets xH in G.
Remark:

You may have seen the word index (Latin for “pointer”) in the context of summation (e.g., ∑_{i=1}^{3} i², where i is the index). In group theory, there are many times where one might want to do ∑_{i=1}^{|G:H|}. Here, the index is telling us how many terms are in our sum.

Remark:
So the proof of Lagrange’s Theorem shows that |G|=|H||G:H| if G is finite.
Remark:
Even if G is infinite, the index |G : H| may be finite. For example, let G = (Z, +) and let H = (nZ, +) for some n ∈ Z+. Then we have seen that |Z : nZ| = n, with the left cosets being the congruence classes (mod n).
Corollary 10.20: (Lagrange for orders of elements)

Let G be a finite group with |G| = n. Then for any x ∈ G, the order of x divides n, and so x^n = e.

Proof.

By Lagrange’s Theorem, the order of the cyclic subgroup ⟨x⟩ divides n. But the order of this subgroup is just ord(x), so ord(x) divides n, and so x^n = e.

10.4 Some applications of Lagrange’s theorem

Lagrange’s Theorem gives most information about a group when the order of the group has relatively few factors, as then it puts more restrictions on possible orders of subgroups and elements.

Let’s consider the extreme case, when the order of the group is a prime p, and so the only factors are 1 and p.

Theorem 10.21:

Let p be a prime and G a group with |G|=p. Then

  1. G is cyclic.

  2. Every element of G except the identity has order p and generates G.

  3. The only subgroups of G are the trivial subgroup {e} and G itself.

Proof.

Let x ∈ G with x ≠ e. Then ord(x) divides |G| = p by Corollary 10.20, and ord(x) ≠ 1 since x ≠ e. So ord(x) = p. So the cyclic subgroup ⟨x⟩ generated by x has order p, and so must be the whole of G. This proves 1. and 2.

Let H ≤ G. Then by Theorem 10.15, |H| divides |G| = p, and so either |H| = 1, in which case H is the trivial subgroup {e}, or |H| = p, in which case H = G.

Remark:
In particular this shows that if p is prime then all groups of order p are isomorphic.
Corollary 10.22:

If p is prime and P and Q are two subgroups of a group G with |P| = p = |Q|, then either P = Q or P ∩ Q = {e}.

Proof.

If P ∩ Q ≠ {e} then choose x ∈ P ∩ Q with x ≠ e. By the previous theorem, x generates both P and Q, so P = ⟨x⟩ = Q.

Now some other simple general consequences of Lagrange’s Theorem.

Proposition 10.23:

Let G be a group and H, K two finite subgroups of G with |H| = m, |K| = n and gcd(m, n) = 1. Then H ∩ K = {e}.

Proof.

Recall that the intersection of two subgroups is itself a subgroup, so that I = H ∩ K is a subgroup both of H and of K. Since it’s a subgroup of H, Lagrange’s Theorem implies that |I| divides m = |H|. But similarly |I| divides n = |K|. So since gcd(m, n) = 1, we get |I| = 1 and so I = {e}.

Theorem 10.24:

Let G be a group with |G|=4. Then either G is cyclic or G is isomorphic to the Klein 4-group C2×C2. In particular there are just two non-isomorphic groups of order 4, both abelian.

Proof.

By Corollary 10.20 the order of any element of G divides 4, and so must be 1, 2 or 4.

If G has an element of order 4 then it is cyclic.

If not, it must have one element (the identity e) of order 1 and three elements a, b, c of order 2. So a⁻¹ = a, b⁻¹ = b and c⁻¹ = c.

Consider which element ab is. If ab = e then b = a⁻¹ = a, which is false. If ab = a then b = e, which is also false. If ab = b then a = e, which is also false. So ab = c.

Similarly ba=c, ac=b=ca and bc=a=cb, and G is isomorphic to the Klein 4-group.

We’ll finish this section with some other results about groups of small order that we won’t prove. These are easier to prove with a bit more theory, and they are all proved in the third-year Group Theory unit.

Theorem 10.25:

Let p be an odd prime. Then every group of order 2p is either cyclic or isomorphic to the dihedral group D2p.

Theorem 10.26:

Let p be a prime. Every group of order p² is either cyclic or isomorphic to Cp × Cp (and so in particular is abelian).

However there are non-abelian groups of order p³ for every prime p. The dihedral group D8 is one example, for p = 2.

Theorem 10.27:

There are five groups of order 8 up to isomorphism. Three, C8, C4×C2 and C2×C2×C2, are abelian, and two, the dihedral group D8 and another group Q8 called the quaternion group are non-abelian.

The first few orders not dealt with by the general theorems above are 12, 15, 16 and 18. It turns out that there are five non-isomorphic groups of order 12, every group of order 15 is cyclic, there are fourteen non-isomorphic groups of order 16, and five of order 18.

The number of non-isomorphic groups of order 2^n grows very quickly with n. There are 49,487,365,422 (nearly fifty billion) non-isomorphic groups of order 1024 = 2^10.

10.5 Applications to number theory

We now use some of the tools from group theory to look back at questions linked to modular arithmetic.

History:

It is worth noting that much of the following theory predates and motivated the development of abstract group theory, in particular the study of Abelian groups (from 1800). Alongside Symmetry groups and Transformation groups (which we did not explore), Abelian groups were concrete examples of groups that drove this development.

First, recall that (Z/nZ, ×) is not a group as some elements (such as [0]) do not have a multiplicative inverse (recall Proposition 10.13). Let us define the following group.

Definition 10.28:
Un is the subset {[a] : a ∈ Z and gcd(a, n) = 1} of Z/nZ.
Remark:
If gcd(a, n) = 1 then gcd(a + nt, n) = 1 for any t ∈ Z, and so it makes no difference which element of [a] we use to check the condition for [a] ∈ Un. Since every congruence class [a] contains an element a′ with 0 ≤ a′ < n, we’ll usually use these elements.
Theorem 10.29:

(Un, ×) is an abelian group.

Proof.

Suppose [a], [b] ∈ Un, so gcd(a, n) = 1 = gcd(b, n). Then gcd(ab, n) = 1 and so [ab] ∈ Un and Un is closed under multiplication.

Since ([a][b])[c] = [abc] = [a]([b][c]), multiplication on Un is associative.

[1] is an identity element, since [1][a]=[a]=[a][1] for any a.

If [a] ∈ Un, so gcd(a, n) = 1, then as + nt = 1 for some integers s, t, and gcd(s, n) = 1. So [s] ∈ Un is an inverse of [a].

So Un is a group under multiplication. It is abelian since [a][b] = [ab] = [b][a] for all a, b ∈ Z.
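The inverse produced in the proof can be computed with the extended Euclidean algorithm, exactly as in the earlier worked examples. A sketch (function name mine):

```python
def inverse_mod(a, n):
    """Find s with [s][a] = [1] in Un, using as + nt = gcd(a, n) = 1."""
    old_r, r = a, n
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r   # Euclidean algorithm on the remainders
        old_s, s = s, old_s - q * s   # running coefficient of a
    assert old_r == 1, "[a] is only invertible when gcd(a, n) = 1"
    return old_s % n

print(inverse_mod(7, 30))   # 13, since 7 * 13 = 91 = 1 (mod 30)
```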

Interest:

The notation U stands for “units”. In Algebra 2, you’ll see that a unit is an element which has a multiplicative inverse, so Un is the group of units of Z/nZ.

Similar links can be made with (R, +) and (R∖{0}, ×), or with (Z, +) and ({1, -1}, ×).

Example:
If p is a prime, then Up = {[1], [2], …, [p-1]} has order p - 1.
Example:
U4 = {[1], [3]} and U6 = {[1], [5]} both have order 2. U8 = {[1], [3], [5], [7]} and U10 = {[1], [3], [7], [9]} have order 4.

Unlike (Z/nZ,+), (Un,×) is not always cyclic (although it sometimes is).

Example:
U10 is cyclic, with generator [7]: [7]⁰ = [1], [7]¹ = [7], [7]² = [49] = [9], [7]³ = [7][9] = [63] = [3], so ⟨[7]⟩ = {[1], [3], [7], [9]} = U10.
Example:
U8 is not cyclic. [3]² = [9] = [1], [5]² = [25] = [1] and [7]² = [49] = [1], so every element has order 2 apart from [1], which has order 1. In particular there are no elements of order |U8| = 4, so the group is not cyclic, and is a Klein 4-group.
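Both examples can be checked by listing element orders. A sketch (helper names mine):

```python
from math import gcd

def U(n):
    """Representatives of Un with 0 < a < n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def order(a, n):
    """The order of [a] in Un: least k >= 1 with a^k = 1 (mod n)."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

print([order(a, 10) for a in U(10)])   # [1, 4, 4, 2]
print([order(a, 8) for a in U(8)])     # [1, 2, 2, 2]
```

U10 contains elements of order |U10| = 4 (namely [3] and [7]), so it is cyclic; U8 has no element of order 4, so it is not.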
Remark:
In fact, Un is cyclic if and only if n = 2, n = 4, n = p^r or n = 2p^r for p an odd prime and r ≥ 1, but this is beyond the scope of this unit.

We can now apply Lagrange’s theorem to Un to get an important number theory result.

Theorem 10.30: (Fermat’s Little Theorem)

Let p be a prime and a an integer not divisible by p. Then a^(p-1) ≡ 1 (mod p).

Proof.

We’ll apply Corollary 10.20 to the group G = Up. Since p is prime, Up = {[1], [2], …, [p-1]} has order p - 1, and if a is not divisible by p then [a] ∈ Up. So by Corollary 10.20, [a]^(p-1) = [1] in Up, or in other words a^(p-1) ≡ 1 (mod p).

This simplifies calculating powers of integers modulo a prime:

Example:
Suppose we want to calculate 7^100 (mod 31). The straightforward way to do this would involve multiplying by seven 99 times (although even without Fermat’s Little Theorem a more intelligent approach would use many fewer multiplications). But by Fermat’s Little Theorem with p = 31, 7^30 ≡ 1 (mod 31), and so 7^100 = (7^30)^3 × 7^10 ≡ 1^3 × 7^10 ≡ 7^10 (mod 31). So without much calculation at all we reduce the problem to calculating a tenth power instead of a hundredth power. And then, to finish, 7^2 = 49 ≡ 18 (mod 31), so 7^3 ≡ 7 × 18 = 126 ≡ 2 (mod 31) and 7^10 = (7^3)^3 × 7 ≡ 2^3 × 7 = 56 ≡ 25 (mod 31).
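Python’s built-in three-argument pow does modular exponentiation efficiently, so we can confirm the hand calculation:

```python
print(pow(7, 30, 31))    # 1, as Fermat's Little Theorem predicts
print(pow(7, 100, 31))   # 25, matching the calculation above
```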
History:

Pierre de Fermat, French lawyer and government official (1601-1665), worked on several number theory problems alongside his job. He wrote down the above theorem; however, he did not write down a proof for fear it would be too long. He certainly was not aware of the group theory tools that would have provided a short proof.

There is a generalization to powers modulo m where m is not prime, but the statement is a bit more complicated, since Um is not just {[1], [2], …, [m-1]}. So first, let’s introduce some standard notation for the order of this group.

Definition 10.31:
Euler’s phi function is the function φ from the positive integers to the positive integers, with φ(m) equal to the number of integers a such that 0 < a ≤ m and gcd(a, m) = 1.
Remark:
This is precisely the order of Um, since Um={[a]:gcd(a,m)=1}.
Example:
If p is prime, then φ(p) = p - 1.
Theorem 10.32: (Fermat-Euler Theorem)

Let m > 0 and a be integers with gcd(a, m) = 1. Then a^φ(m) ≡ 1 (mod m).

Proof.

Apply Corollary 10.20 to the group G = Um. The condition gcd(a, m) = 1 means that [a] ∈ Um, and |Um| = φ(m), so the corollary gives [a]^φ(m) = [1] in Um, or in other words a^φ(m) ≡ 1 (mod m).

History:

Leonhard Euler, Swiss mathematician (1707-1783), wrote down a proof of Fermat’s Little Theorem in 1736 (nearly 100 years after Fermat wrote down the statement). He then extended Fermat’s theorem to all cases. While he used modular arithmetic as we do today, it wasn’t until 1801 that Gauss revisited the proof and gave a shorter and simpler version using group theory ideas.

Many of the groups Un are direct products in disguise.

Proposition 10.33:

Let m, n be positive integers with gcd(m, n) = 1. Then Umn ≅ Um × Un.

Proof.

Define a map φ : Umn → Um × Un by φ([a]_mn) = ([a]_m, [a]_n) (where we use subscripts such as [a]_m to indicate a congruence class (mod m)). This makes sense, since if gcd(a, mn) = 1, then gcd(a, m) = 1 and gcd(a, n) = 1.

By Theorem 10.14, we know that given a ∈ Z/mZ and b ∈ Z/nZ, there exists x ∈ Z/mnZ such that φ(x) = (a, b). Furthermore, x is unique in Z/mnZ. Hence φ is bijective [for all ([a]_m, [b]_n) ∈ Um × Un, there exists a unique [x]_mn ∈ Umn such that φ([x]_mn) = ([a]_m, [b]_n)].

Since φ is bijective, and φ([a]_mn [a′]_mn) = ([aa′]_m, [aa′]_n) = ([a]_m, [a]_n)([a′]_m, [a′]_n) = φ([a]_mn) φ([a′]_mn), φ is an isomorphism.
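For small m and n the map in the proof can be tabulated directly. The sketch below (notation mine) checks that [a]_15 ↦ ([a]_3, [a]_5) is a bijection from U15 to U3 × U5 that respects multiplication.

```python
from math import gcd

def U(n):
    return [a for a in range(1, n) if gcd(a, n) == 1]

m, n = 3, 5
phi = {a: (a % m, a % n) for a in U(m * n)}   # [a]_mn -> ([a]_m, [a]_n)

# phi hits every element of Um x Un exactly once...
print(sorted(phi.values()) == sorted((x, y) for x in U(m) for y in U(n)))
# ...and respects multiplication coordinate-wise
print(all(phi[(a * b) % (m * n)] == ((a * b) % m, (a * b) % n)
          for a in U(m * n) for b in U(m * n)))
```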

Remark:
The conclusion of the above is not true if gcd(m, n) ≠ 1. For example, |U4| = 2, but |U16| = 8, so U16 can’t be isomorphic to U4 × U4.

We have an immediate corollary

Corollary 10.34:

If gcd(m,n)=1 then φ(mn)=φ(m)φ(n).

Example:
We calculate 7^100 (mod 30). Since 30 = 2 × 3 × 5 and gcd(2,3) = 1, gcd(6,5) = 1, we have φ(30) = φ(2)φ(3)φ(5) = 1 × 2 × 4 = 8. So as gcd(7,30) = 1, by Fermat-Euler we have 7^8 ≡ 1 (mod 30). Hence 7^100 = 7^96 × 7^4 ≡ (7^8)^12 × (7^2)^2 ≡ 1^12 × 19^2 ≡ (-11)^2 ≡ 121 ≡ 1 (mod 30).
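Euler’s phi function can be computed straight from Definition 10.31, which lets us check both the value φ(30) = 8 and the final congruence:

```python
from math import gcd

def phi(m):
    """Euler's phi function, directly from the definition."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)

print(phi(30))                      # 8
print(phi(6) * phi(5) == phi(30))   # True: multiplicativity for co-prime 6, 5
print(pow(7, 100, 30))              # 1, as computed above
```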

11 Taming infinity

In Chapter 7 we saw the definition of cardinality; to be precise, for sets A and B, we defined |A| = |B|, |A| ≤ |B| and |A| < |B|, with a focus on finite sets. In this chapter, we will see that there are different sizes of infinity.

11.1 Countable

We first notice that there are some infinite sets where we can “count” the elements.

Definition 11.1:
We say a set X is countable if there exists a bijection f : Z+ → X, or equivalently, if there is a bijection g : X → Z+.
Remark:

In essence a countable set is an infinite set where we can enumerate the elements, x_1, x_2, x_3, …, where x_i = f(i) for i ∈ Z+.

Note that some texts in the literature say that a set is countable if it is finite or if there exists a bijective function f : Z+ → X, and when there exists a bijective function f : Z+ → X, these texts say X is countably infinite. However, in this course, a countable set must be infinite (it makes some theorems easier to state).

Combining the definition of countability with the definition of cardinality, we have that X is a countable set if and only if |X|=|Z+|.
Example:

We show that the set of positive even integers is countable.

Let 2Z+ = {2x : x ∈ Z+} and define f : Z+ → 2Z+ by f(x) = 2x, for all x ∈ Z+. To see that f is injective, suppose that x, y ∈ Z+ are such that f(x) = f(y). Then 2x = 2y, so x = y, showing that f is injective.

To see that f is surjective, take a ∈ 2Z+. Then a = 2x, for some x ∈ Z+, and hence a = 2x = f(x). Hence, f is surjective. This shows that f is bijective. Therefore, 2Z+ is countable.

Similarly, the set of odd positive integers, {2x - 1 : x ∈ Z+}, can be shown to be countable.
Proposition 11.2:

Let X and Y be two sets such that |X|=|Y|. If X is countable, then Y is countable.

Proof.

Let |X| = |Y| and suppose that X is countable. Then |X| = |Z+|, so |Y| = |X| = |Z+|. Hence Y is countable.

Theorem 11.3:

Every infinite set contains a countable subset.

Proof.

Non-examinable. It is beyond the scope of this course.

Remark:

While the statement seems intuitive, proving this result is outside the scope of this course. It involves using the Axiom of Choice, which says given a countable number of non-empty sets, we can choose one element from each set.

Interestingly, some research is done into how many theorems remain true independently of the Axiom of Choice (and therefore would still be true if we did not assume the Axiom of Choice).
Proposition 11.4:

Let X be a subset of Z+. Then X is either finite or countable.

Proof.

If X is finite then we are done. So suppose X is infinite. We have an injective map g : X → Z+ given by g(x) = x, so |X| ≤ |Z+|. On the other hand, we know that X contains a countable subset A. Hence, there exists a bijective map h : Z+ → A. Now, define f : Z+ → X by f(n) = h(n), for all n ∈ Z+. Then f is an injective map from Z+ into X. Hence, |Z+| ≤ |X|. So by Theorem 7.8, we have that |X| = |Z+|. It follows that X is countable.

Example:
Since there exist infinitely many primes, the set of primes is countable.
The next result is very useful when proving that a set is countable.
Corollary 11.5:

Suppose X is an infinite set. Then X is countable if and only if there exists an injective f : X → Z+.

Proof.

We prove both directions separately.

(⇒) First, suppose that X is countable. Then there exists a bijection f : X → Z+. Since f is bijective, it is injective.

(⇐) Second, suppose there exists such an injective f : X → Z+. Then |X| = |f[X]|, and f[X] ⊆ Z+. Since X is not finite and |X| = |f[X]|, f[X] is not finite. Hence, f[X] is countable by Proposition 11.4. Therefore, by Proposition 11.2, we must have that X is countable.

Summarising, X is countable if and only if there exists an injective f : X → Z+.

Theorem 11.6:

Let X be a set and let A, B ⊆ X. Suppose that A is a countable set and that B is a nonempty finite set disjoint from A (A ∩ B = ∅). Then A ∪ B is countable.

Proof.

Since A is countable, there exists an injective map f : A → Z+. Since B is finite we have that B = {b_1, …, b_m}, for some m ∈ Z+, where |B| = m. Now, define g : A ∪ B → Z+ by g(x) = i if x = b_i, and g(x) = f(x) + m if x ∈ A. We claim that g is injective. To see this, take x, y ∈ A ∪ B such that x ≠ y. If x, y ∈ B, then x = b_i and y = b_j for some i, j ∈ Z+ such that i ≤ m, j ≤ m with i ≠ j. Hence, we have that g(x) = i ≠ j = g(y). If x ∈ B and y ∈ A, then x = b_i for some i ∈ Z+ with i ≤ m. Hence, we have that g(x) = i ≤ m < m + 1 ≤ f(y) + m = g(y). If x, y ∈ A, then f(x) ≠ f(y) since x ≠ y and f is injective. So g(x) = f(x) + m ≠ f(y) + m = g(y). It follows that g is injective.

Since A ⊆ A ∪ B and A is infinite, A ∪ B is infinite. Since g : A ∪ B → Z+ is injective and A ∪ B is infinite, A ∪ B is countable.

History:

The above proof was popularised by David Hilbert (Prussian mathematician, 1862-1943) with what is now called the “Hilbert hotel”, where there is always room for another guest:
The Hilbert hotel has countably many rooms, labelled 1,2,3, (so for each number in Z+, there is a room with that number). One night, the hotel is full, and another potential guest arrives at the hotel looking for a room. The manager says, no problem! Then the manager announces to the guests that every guest is to move to the next room (so the guests in room n move to room n+1). Thus all the guests still have rooms, and room 1 has been made available to the new arrival.

Theorem 11.7:

We have that Z+×Z+ is countable.

Proof.

First we note that Z+ × Z+ is infinite, as Z+ × {1} = {(n, 1) : n ∈ Z+} ⊆ Z+ × Z+, and Z+ × {1} is countable (using the bijection f : Z+ → Z+ × {1} defined by f(n) = (n, 1)).

We define f : Z+ × Z+ → Z+ by f((a, b)) = 2^a 3^b ∈ Z+. Suppose f((a_1, b_1)) = f((a_2, b_2)); then 2^{a_1} 3^{b_1} = 2^{a_2} 3^{b_2} = n. Note that n ≥ 2 (since a_1 ≥ 1), so by the Fundamental Theorem of Arithmetic, n is expressed uniquely as a product of primes. That is, a_1 = a_2 and b_1 = b_2, so (a_1, b_1) = (a_2, b_2). Hence f is injective and Z+ × Z+ is infinite, so Z+ × Z+ is countable.
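The injectivity of f((a, b)) = 2^a 3^b can be spot-checked on a finite corner of Z+ × Z+; by the Fundamental Theorem of Arithmetic no collisions can occur.

```python
# 400 distinct pairs should give 400 distinct values of 2^a * 3^b
values = [2 ** a * 3 ** b for a in range(1, 21) for b in range(1, 21)]
print(len(values), len(set(values)))   # 400 400: no collisions
```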

There are many proofs of this theorem. Above we presented one that uses the Fundamental Theorem of Arithmetic. Below we outline the idea of another proof that doesn’t use the Fundamental Theorem of Arithmetic. We leave the rigorous details as an exercise.

We arrange the elements of Z+ × Z+ in a grid:

(1,1) (1,2) (1,3) (1,4) …
(2,1) (2,2) (2,3) (2,4) …
(3,1) (3,2) (3,3) (3,4) …
(4,1) (4,2) (4,3) (4,4) …
  ⋮

We order the elements of this grid along the cross-diagonals: (1,1); (1,2), (2,1); (1,3), (2,2), (3,1); … So we want to construct a function f : Z+ × Z+ → Z+ such that f((1,1)) = 1, f((1,2)) = 2, f((2,1)) = 3, …. We leave it as an exercise to find a general formula for f((n, m)) and show that f is injective.
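If you want to check your answer to this exercise, one standard formula runs as follows: (n, m) sits on cross-diagonal d = n + m - 1, the earlier diagonals contain 1 + 2 + ⋯ + (d - 1) pairs, and (n, m) is the n-th pair on its own diagonal.

```python
def f(n, m):
    # position of (n, m) in the cross-diagonal ordering described above
    d = n + m - 1
    return (d - 1) * d // 2 + n

print([f(n, m) for n, m in [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]])
# [1, 2, 3, 4, 5, 6]
```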

The following lemma lists the basic properties of countable sets.

Lemma 11.8:

Suppose that X is a countable set.

  • if A ⊆ X, then A is either finite or countable.

  • if A ⊆ X and A is finite, then X∖A is countable.

  • there exists B ⊆ X such that B and X∖B are countable.

  • if f : C → X is injective, then C is either a finite set or a countable set.

Proof.

Exercise.

Using the above lemma, and the fact that Z+×Z+ is countable, we can deduce:

Corollary 11.9:

We have that Q+, Q and Z are all countable. In particular |Z+|=|Z|=|Q|.

Corollary 11.10:

Let X and Y be countable sets. Then

  • X×Y is countable.

  • if X ∩ Y = ∅, then X ∪ Y is countable.

Proof.

Exercise.

Note that by repeated application of the above corollary, we can see that for countable X and for any n ∈ Z+, we have that X^n = X × X × ⋯ × X (the Cartesian product of n copies of X) is countable.

Corollary 11.11:

Let {A_n : n ∈ Z+} be a (countable) collection of countable sets that are pairwise disjoint (i.e. A_i ∩ A_j = ∅ for all i ≠ j). Then ⋃_{n∈Z+} A_n is countable.

Proof.

First, note that since A_1 is countable, it is infinite. Since A_1 ⊆ ⋃_{n∈Z+} A_n, we know that ⋃_{n∈Z+} A_n is also infinite. We now construct an injective g : ⋃_{n∈Z+} A_n → Z+ × Z+.

For each n ∈ Z+, enumerate the elements of A_n as a_{n,1}, a_{n,2}, a_{n,3}, …. Now, define g : ⋃_{n=1}^∞ A_n → Z+ × Z+ by g(a_{m,k}) = (m, k), for all a_{m,k} ∈ ⋃_{n=1}^∞ A_n. To see that g is injective, suppose that g(a_{m,k}) = g(a_{s,t}) for some m, k, s, t ∈ Z+. Then (m, k) = (s, t), so m = s, k = t, and hence a_{m,k} = a_{s,t}. It follows that g is injective. Hence, we have an injective function from ⋃_{n=1}^∞ A_n into a countable set. Since ⋃_{n=1}^∞ A_n is infinite, it follows that ⋃_{n=1}^∞ A_n is countable.

Remark:
The result of Corollary 11.11 still holds in the case when {A_n : n ∈ Z+} is a (countable) collection of countable sets which are not pairwise disjoint. In this case, one shows that there exists a (countable) collection {B_n : n ∈ Z+} of countable sets which are pairwise disjoint such that ⋃_{n∈Z+} B_n = ⋃_{n∈Z+} A_n.

11.2 Uncountable

Having explored the countable sets, we look at whether there exist sets that are not finite and not countable.

Definition 11.12:
A set X is uncountable if it is infinite but not countable.
Remark:

An uncountable set is one where we can not enumerate its elements.

We note that the cardinality of Z+ separates finite, countable and uncountable sets. Indeed we leave it as an exercise to show that:

  • X is finite if and only if |X|<|Z+|;

  • X is countable if and only if |X|=|Z+|;

  • X is uncountable if and only if |X|>|Z+|.

We will prove that the set (0,1) = {x ∈ R : 0 < x < 1} is uncountable. First we need to set up some notation. Let us assume that every real number between 0 and 1 has a decimal expansion of the form 0.a_1 a_2 a_3 … = ∑_{k∈Z+} a_k 10^{-k}, with 0 ≤ a_k ≤ 9 for each k ∈ Z+.

Note however that this representation is not unique. Indeed, we have 0.99999… = 1, and in general let 0 < α < 1 have decimal expansion 0.a_1 a_2 a_3 … a_N 99999…; that is, there exists N ∈ Z+ such that a_N ≠ 9 and a_n = 9 for all n > N. Then, using the result ∑_{k∈Z+} 10^{-k} = (1/10)/(1 - 1/10) = 1/9, we have: 0.a_1 a_2 a_3 … a_N 999… = ∑_{k∈Z+} a_k 10^{-k} = ∑_{k=1}^{N} a_k 10^{-k} + ∑_{k>N} 9 · 10^{-k} = ∑_{k=1}^{N} a_k 10^{-k} + 9 · 10^{-N} ∑_{k∈Z+} 10^{-k} = ∑_{k=1}^{N} a_k 10^{-k} + 10^{-N} = 0.a_1 a_2 a_3 … a_{N-1} b_N, where b_N = a_N + 1.

Therefore, we will take for granted that every α ∈ R such that 0 < α < 1 can be uniquely expressed as 0.a_1 a_2 a_3 … with 0 ≤ a_k ≤ 9 and ¬(∃N ∈ Z+ so that ∀n ∈ Z+ with n > N, a_n = 9) (i.e., ∀N ∈ Z+, ∃n ∈ Z+ with n > N so that a_n ≠ 9).

Theorem 11.13:

The interval (0,1)={xR:0<x<1} is uncountable.

Proof.

We know the interval (0,1) is infinite, since f : Z+ → (0,1) defined by f(k) = 10^{-k} is easily shown to be injective.

For the sake of contradiction, suppose (0,1) is countable. Thus we can enumerate the elements of (0,1) as α_1, α_2, α_3, …. Write each α_k as a decimal expansion as described above: α_k = 0.a_{k1} a_{k2} a_{k3} …, where 0 ≤ a_{ki} ≤ 9 and the expansion does not end in an infinite sequence of 9s. For each k ∈ Z+, set b_k = 1 if a_{kk} ≠ 1, and b_k = 2 if a_{kk} = 1. Set β = 0.b_1 b_2 b_3 …. Thus β ∈ R with 0 < β < 1, and since every digit of β is 1 or 2, its expansion does not end in an infinite sequence of 9s. Hence by assumption, β = α_m for some m ∈ Z+. But b_m ≠ a_{mm}, contradicting the uniqueness of the representation of β as a decimal expansion not ending in an infinite sequence of 9s. Thus the assumption that the interval (0,1) is countable leads to a contradiction, so (0,1) must be uncountable.
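The diagonal construction can be mimicked on any finite list of digit strings: build a string whose k-th digit differs from the k-th digit of the k-th string, exactly as in the proof.

```python
def diagonal(expansions):
    """Return a digit string differing from the k-th string in digit k."""
    return "".join("1" if row[k] != "1" else "2"
                   for k, row in enumerate(expansions))

alphas = ["500000", "111111", "718281", "999999", "123456", "101010"]
beta = diagonal(alphas)
print(beta)   # 121111: differs from every alpha in the diagonal digit
```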

History:

The above theorem is sometimes referred to as Cantor’s diagonalisation argument. While it was not the first proof that Cantor published to show (0,1) is uncountable, it showcases a technique that was used subsequently in many other proofs by other mathematicians.

Corollary 11.14:

We have that R is uncountable.

Proof.

Since (0,1) ⊆ R, we have |(0,1)| ≤ |R|. Since (0,1) is uncountable, we have |Z+| < |(0,1)| ≤ |R|. Hence R is uncountable.

While we have shown |(0,1)| ≤ |R|, the following theorem shows that in fact |(0,1)| = |R|.

Theorem 11.15:

There is a bijection between the interval (0,1) and R.

Proof.

Exercise.

In a way, this highlights another difference between R, Q and Z. In R we can have a bounded subset which is uncountable. However, in Q a bounded subset is either finite or countable, while in Z a bounded subset must be finite.

Recall that we have shown that given any a, b, c, d ∈ R such that a < b and c < d, there exists a bijection f : (a,b) → (c,d). In particular, if we take c = 0, d = 1 we have that |(0,1)| = |(a,b)| for all a, b ∈ R such that a < b. No matter how “small” (length-wise) we take (a,b) to be, as a set it is extremely large, i.e. uncountable. If we combine this with the fact that if X is countable then X^n is countable for all n ∈ Z+, we have strange results like “for any n ∈ Z+, |Q^n| < |(0, 1/n)|”.

Interest:

Since we have shown |Z| < |R|, a natural question is “does there exist a set A such that |Z| < |A| < |R|?”. This was one of the 23 problems set by Hilbert at the turn of the 20th century, and is known as the continuum hypothesis. It turns out that this is a subtle question without a yes/no answer: Gödel and Cohen showed that the two statements “the continuum hypothesis is true” and “the continuum hypothesis is false” are both consistent with the standard (ZFC) axioms of mathematics.

11.3 Power sets

We finish this section by showing that there are infinitely many different types of infinities.

Definition 11.16:
Let A be a set. We define the power set of A, denoted by P(A), to be the set P(A) = {C : C ⊆ A}.
Example:

We have the following examples:

  • P(∅) = {∅}, so |P(∅)| = 1.

  • P({1,2}) = {∅, {1}, {2}, {1,2}}, so |P({1,2})| = 4 = 2².

  • P({1,2,3}) = {∅, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}}. So |P({1,2,3})| = 8 = 2³.

Note that, for any nonempty set X, we know that ∅ and X are distinct subsets of X. Hence, we have that |P(X)| ≥ 2. We have the following two results, whose proofs are left as an exercise.

Lemma 11.17:

Suppose that A is a finite set with |A| = n for some n ∈ Z_{≥0}. Then |P(A)| = 2^n.

Proof.

Exercise.
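For finite sets the power set, and hence the count 2^n, can be generated directly (a sketch using Python’s itertools):

```python
from itertools import combinations

def power_set(A):
    """All subsets of A, as a list of frozensets."""
    A = list(A)
    return [frozenset(c) for r in range(len(A) + 1)
            for c in combinations(A, r)]

print([len(power_set(A)) for A in ([], [1, 2], [1, 2, 3])])   # [1, 4, 8]
```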

Proposition 11.18:

Let A,B be sets. Then

  • A ⊆ B if and only if P(A) ⊆ P(B).

  • P(A) ∪ P(B) ⊆ P(A ∪ B).

  • P(A) ∩ P(B) = P(A ∩ B).

Proof.

Exercise.

Theorem 11.19:

Let X be a set. Then |X|<|P(X)|.

Proof.

Suppose X = ∅. Then |X| = 0 < 1 = |P(X)|.

So suppose X is nonempty, and define f : X → P(X) by f(x) = {x}, for all x ∈ X. We show that f is injective. Let x_1, x_2 ∈ X be such that f(x_1) = f(x_2). Then {x_1} = {x_2}. Hence, we have that x_1 = x_2. Therefore, f is injective, so |X| ≤ |P(X)|.

Next, we use contradiction to show that there exists no bijection between X and P(X). Suppose there exists such a bijection g : X → P(X). Note that for every x ∈ X, g(x) ∈ P(X) means g(x) ⊆ X. Define A = {x ∈ X : x ∉ g(x)}. Then A is a subset of X, so A ∈ P(X). Furthermore, since g is bijective, there exists some z ∈ X such that g(z) = A. By the definition of A, we have z ∈ A if and only if z ∉ g(z) = A, which is a contradiction (namely z ∈ A ⟺ z ∉ A). Therefore, our assumption that there exists a bijection g : X → P(X) is false. So |X| ≠ |P(X)|.

It follows that |X|<|P(X)|.
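For a small finite set the argument can even be checked exhaustively: for every one of the 8³ = 512 functions g : X → P(X), the diagonal set A = {x ∈ X : x ∉ g(x)} is missed by g.

```python
from itertools import combinations, product

X = [0, 1, 2]
subsets = [frozenset(c) for r in range(len(X) + 1)
           for c in combinations(X, r)]        # P(X) has 2^3 = 8 elements

missed = True
for choice in product(subsets, repeat=len(X)):  # every g : X -> P(X)
    g = dict(zip(X, choice))
    A = frozenset(x for x in X if x not in g[x])
    if A in choice:                             # would mean A = g(z) for some z
        missed = False
print(missed)   # True: no g is surjective, so |X| < |P(X)|
```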

History:

The above theorem is sometimes known as Cantor’s theorem. For a finite set the result is clear, and Cantor re-used the diagonalisation method to prove the above for infinite sets.

Corollary 11.20:

We have:

  • P(Z+) is uncountable.

  • |P(R)| > |R|, i.e. there are different types of uncountability.

We can extend our last bullet point to see that |R| < |P(R)| < |P(P(R))| < |P(P(P(R)))| < ⋯, i.e. there are infinitely many different types of infinities.