The building blocks of pure mathematics - sets and logic
To be able to prove mathematical results, we need two ingredients - the setting in which we are doing maths; and the logical reasoning we use to do maths.
Sets
Before we can start to do maths, we need to know the setting in which we are doing maths. For this reason, sets can be seen as the building block of all maths.
Definition 2.1:
A set is a collection of objects, where we ignore repeated elements and the order they appear in.
Notation:
We use curly brackets { and } to denote sets.
We use the symbol ∈ to say an element is in a set. We draw a line through it to negate it, i.e. we use ∉ to say an element is not in a set.
We use the symbol < to mean “(strictly) less than” and ≤ to mean “less than or equal to”. Similarly, we use > to mean “greater than” and ≥ to mean “greater than or equal to”.
Example:
The set of trigonometric functions: {sin, cos, tan}.
The set of integers between 1 and 3: {1, 2, 3}.
Since {1, 2, 3} = {3, 2, 1} and {1, 2, 2, 3} = {1, 2, 3}, we have {1, 2, 3} = {3, 2, 2, 1}, however {1, 2, 3} ≠ {1, 2} as 3 ∉ {1, 2}.
In mathematics, there are certain sets which are used so often that we have abbreviated notations for them. We start with two of them and we’ll build up to see more.
Notation:
If a set does not have any elements, we call it the empty set and denote it by ∅ (or {}).
ℤ is the set of integers, that is ℤ = {…, −2, −1, 0, 1, 2, …}.
Etymology:
The symbol ℤ comes from the German “Zahlen” - which means numbers. It was first used by David Hilbert (German mathematician, 1862 - 1943), and popularised in
Europe by Nicolas Bourbaki (a collective of mainly French mathematicians, 1935 - ) in their 1947 book “Algèbre”. Integer comes from the Latin in, which means “not”, and
the root tag- of tangere, which means “to touch”. An integer is a number that has been “untouched”, i.e. “intact” or “whole”.
The set of integers is equipped with the operations of addition and multiplication, which satisfy the following 10 arithmetic properties (5 relating to addition; a Distributive Law; and 4 relating to multiplication):
(A1) - Closure under addition For all a, b ∈ ℤ we have a + b ∈ ℤ.
(A2) - Associativity of addition For all a, b, c ∈ ℤ we have (a + b) + c = a + (b + c).
(A3) - Commutativity of addition For all a, b ∈ ℤ we have a + b = b + a.
(A4) - Additive identity For all a ∈ ℤ we have a + 0 = a = 0 + a.
(A5) - Additive inverse For all a ∈ ℤ we have a + (−a) = 0 and (−a) + a = 0.
(A6) - Distributive Law For all a, b, c ∈ ℤ we have a · (b + c) = a · b + a · c.
(A7) - Closure under multiplication For all a, b ∈ ℤ we have a · b ∈ ℤ.
(A8) - Associativity of multiplication For all a, b, c ∈ ℤ we have (a · b) · c = a · (b · c).
(A9) - Commutativity of multiplication For all a, b ∈ ℤ we have a · b = b · a.
(A10) - Multiplicative identity For all a ∈ ℤ we have a · 1 = a = 1 · a.
Formally speaking, this is saying that ℤ with + and · is a ring. The notion of
a ring is explored in more detail in later units such as the second year unit Algebra 2.
Properties (A1) to (A5) tell us that ℤ with + is an abelian group. We will
explore the notion of groups and abelian groups later in this unit.
On top of these 10 arithmetic properties, ℤ is ordered, i.e., it comes with 4 order properties:
(O1) - trichotomy For all a, b ∈ ℤ, exactly one of a < b, a = b or a > b holds.
(O2) - transitivity For all a, b, c ∈ ℤ, if a < b and b < c then a < c.
(O3) - compatibility with addition For all a, b, c ∈ ℤ, if a < b then a + c < b + c.
(O4) - compatibility with multiplication For all a, b, c ∈ ℤ, if a < b and 0 < c then a · c < b · c.
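These axioms can be spot-checked computationally. The following short Python sketch (our own illustration, not part of the formal development) tests (A1)-(A10) and (O1)-(O4) on a finite sample of integers; of course this is evidence, not a proof.

```python
from itertools import product

# Sanity check (not a proof): the arithmetic and order axioms hold
# on every triple drawn from a small sample of integers.
sample = range(-3, 4)

for a, b, c in product(sample, repeat=3):
    assert (a + b) + c == a + (b + c)        # (A2) associativity of +
    assert a + b == b + a                    # (A3) commutativity of +
    assert a + 0 == a and a + (-a) == 0      # (A4) identity, (A5) inverse
    assert a * (b + c) == a * b + a * c      # (A6) distributive law
    assert (a * b) * c == a * (b * c)        # (A8) associativity of ·
    assert a * b == b * a and a * 1 == a     # (A9), (A10)
    assert (a < b) + (a == b) + (a > b) == 1 # (O1) exactly one holds
    if a < b and b < c:
        assert a < c                         # (O2) transitivity
    if a < b:
        assert a + c < b + c                 # (O3)
        if 0 < c:
            assert a * c < b * c             # (O4)

print("all axioms hold on the sample")
```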
Etymology:
There are many words above that seem to have come from nowhere, but can be related back to words used in everyday English.
Associative comes from the Latin ad meaning “to” and socius meaning “partner, companion”. An associate is someone who is a companion to you. The property shows that it doesn’t matter which pair “keeps
company” first, the result is still the same.
Commutative comes from the Latin co meaning “with” and mutare meaning “to move”. To commute is “to change, to exchange, to move”. In everyday situations, a commute is the journey (i.e., moving) from home to work. The
property shows that we can move/exchange a and b and the result is the same.
Identity comes from the Latin idem meaning “same”. The additive identity 0 is the element which keeps other elements the same when added to it. The multiplicative identity 1 is the element which keeps other elements the same
when multiplied by it.
Inverse comes from in and vertere which is the verb “to turn”. The additive inverse −a of a is the quantity that turns back “adding a”.
Trichotomy comes from the Greek trikha meaning “in three parts” and temnein meaning “to cut”. If you pick a ∈ ℤ, then you can cut ℤ into three parts: the integers less than a, the integers equal
to a, and the integers greater than a.
Transitive comes from the Latin trans meaning “across, beyond” and the verb itus/ire meaning “to go”. Knowing a < b and b < c allows us to go across/beyond b to conclude a < c.
As seen from the properties above, we often need to quantify objects in mathematics; that is, we need to
distinguish between a criterion always being met (“for all”) and the existence of a case where a criterion is met
(“there exists”). Sometimes we also need to distinguish whether there is a unique case where a criterion is met.
Notation:
We have the following symbolic notation:
The symbol ∀ denotes for all, or equivalently, for every.
The symbol ∃ denotes there exists.
We use ∃! to denote there exists a unique, or equivalently there exists one and only one.
A note on the usage of these symbols. Often we use these symbols as a shortcut when discussing maths verbally
and writing down ideas quickly (for example: in lectures; discussing mathematics with colleagues). However, in
formal text (for example: lecture notes; articles submitted to journals), we often avoid these symbols and use
words. In formal text, these symbols tend to be reserved for when doing formal logic (which we will see later)
or within set-builder notation (which we will see below).
Proof techniques:
To show that something is unique, we first show that one such case exists, and then proceed to show that if another case exists, it is equal to the first case.
□
We can use ℤ as a starting point to construct different sets.
Notation:
Within the curly brackets of a set, we use a colon, :, to mean “such that”.
Example:
Returning to the previous example, the set of integers between 1 and 3 can be written as {n ∈ ℤ : 1 ≤ n ≤ 3}.
Notation:
We use the subscript “> 0” to denote the positive numbers in a set. I.e., ℤ_{>0} denotes the set of positive integers.
Similarly, ℤ_{<0} denotes the set of negative integers.
We denote the set of non-negative integers by ℤ_{≥0}.
If n ∈ ℤ_{>0}, we write n > 0.
You may have also heard of the natural numbers, denoted ℕ. However, some sources consider 0 as a natural number (so ℕ = ℤ_{≥0}) and others
consider 0 not to be a natural number (so ℕ = ℤ_{>0}). To avoid any confusion (and because we will need ℤ_{≥0} sometimes and ℤ_{>0} at
other times), we will not be using ℕ in this course.
Notation:
ℚ is the set of rational numbers, that is ℚ = {p/q : p, q ∈ ℤ and q ≠ 0}.
We will later see how ℚ can be constructed from ℤ and how this leads to a “natural” way to write each rational number.
Etymology:
The symbol ℚ stands for the word “quotient”, which is Latin for “how often/how many”, i.e., the quotient p/q is “how many times does q fit into p”.
Surprisingly, the word “rational” to describe some numbers came after the use of “irrational” numbers. Ratio is Latin for “thinking/reasoning”. When the Pythagorean school
in Ancient Greece realised some numbers could not be expressed as the quotient of two whole numbers (such as √2, which we will prove later), they called those “irrational”,
i.e. numbers that should not be thought about. “Rational” numbers were numbers that were not “irrational”, i.e., ones that one could think about.
We extend the operations of addition and multiplication as well as the order relation from the integers to the rational numbers. Let p/q, r/s ∈ ℚ (with q, s > 0), then:
p/q + r/s = (ps + qr)/(qs);   (p/q) · (r/s) = (pr)/(qs);   p/q < r/s exactly when ps < qr.
Similarly to ℤ, we have that ℚ with + and · satisfies the properties (A1) to (A10) as well as (O1) to (O4). It also satisfies the extra arithmetic property:
(A11) - multiplicative inverse For all p/q ∈ ℚ with p/q ≠ 0, we have (p/q) · (q/p) = 1 and (q/p) · (p/q) = 1.
Notice that (A11) is similar to (A5) but for multiplication. As we will see later in the course,
another way of saying (A7) to (A11) is that ℚ without 0 under · is an abelian group. As you will see in Linear Algebra, the arithmetic properties of ℚ ((A1) to (A11)) come from the fact that ℚ is a field.
Similarly to ℤ, using ℚ we can construct the sets ℚ_{>0}, ℚ_{≥0}, etc.
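Exact rational arithmetic of this kind is available in Python’s standard library via the `fractions` module, which gives a convenient way to check the formulas above and property (A11) on concrete rationals (an illustration only, not part of the construction of ℚ):

```python
from fractions import Fraction

# Two sample rationals p/q and r/s.
p = Fraction(2, 3)
r = Fraction(-5, 4)

# p/q + r/s = (ps + qr)/(qs) and (p/q)(r/s) = pr/(qs).
assert p + r == Fraction(2 * 4 + 3 * (-5), 3 * 4)
assert p * r == Fraction(2 * (-5), 3 * 4)

# (A11): every non-zero rational has a multiplicative inverse.
assert p * (1 / p) == 1

# The order relation: p/q < r/s exactly when ps < qr (q, s > 0).
assert (p < r) == (2 * 4 < 3 * (-5))
```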
Truth table
Now that we have some objects to work with, we want to know what we can do with them.
In a mathematical system, statements are either true or false, but not both at once. We sometimes say a statement holds to mean it is true. The label ‘true’ or ‘false’
associated to a given statement is its truth value.
Example:
The statement “1 + 1 = 2” holds and the statement “1 + 1 = 3” is false. However, the statement “n < 4” could be true or false depending
on the value of n, but it can not be both true and false at the same time.
Definitions (e.g., Definition 2.1) and axioms (i.e., (A1)-(A11)) are statements we take to be true, while propositions, theorems and lemmas are statements that we want to prove are true, and often consist of smaller statements linked together.
While we often don’t write statements symbolically, looking at truth tables and symbolic statements helps us understand the fundamentals of how a proof works. We first introduce the four building blocks of statements.
Definition 2.2:
We use the symbol ¬ to mean not. The truth table below shows the value ¬P takes depending on the truth value of P.

P | ¬P
T | F
F | T
We will see concrete examples of how to negate statements later in this chapter.
Definition 2.3:
We use the symbol ∧ to mean and. Let P and Q be two statements; we have that P ∧ Q is true exactly when both P and Q are true. The corresponding truth table is as follows:

P | Q | P ∧ Q
T | T | T
T | F | F
F | T | F
F | F | F
Example:
Let n ∈ ℤ, let P be the statement n > 2 and let Q be the statement n < 10. Then P ∧ Q is the statement n > 2 and n < 10, i.e., 2 < n < 10.
Definition 2.4:
We use the symbol ∨ to mean or. Let P and Q be two statements; we have that P ∨ Q is true exactly when at least one of P or Q is true. The corresponding truth table is as follows.

P | Q | P ∨ Q
T | T | T
T | F | T
F | T | T
F | F | F
Example:
Let x ∈ ℤ, let P be the statement x > 1 and let Q be the statement x < −1. Then P ∨ Q is the statement x > 1 or x < −1. (Do not write 1 < x < −1 as this makes no sense since −1 < 1.)
Definition 2.5:
We use the symbol → to mean implies. Let P and Q be two statements; “P → Q” is the same as “If P, then Q”, or “whenever P holds, Q holds”. The corresponding truth table is as follows.

P | Q | P → Q
T | T | T
T | F | F
F | T | T
F | F | T
The above truth table can seem confusing, but consider the following example.
Example:
Recall (A1) - “For all a, b ∈ ℤ we have a + b ∈ ℤ”. Let P be the statement a, b ∈ ℤ and Q be the statement a + b ∈ ℤ, then (A1) can
be written symbolically as P → Q. This statement is true, regardless of what the values of a and b are. But let us look at the truth values of P and Q with different a and b:

a | b | P | Q
1 | 2 | T | T
1/2 | 1/2 | F | T
1/2 | 1/3 | F | F

But no matter what a and b we pick, we will not get that P is true while Q is false.
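The truth table for implication can also be generated mechanically. A small Python sketch (our own illustration; in Python, “P implies Q” can be encoded as “(not P) or Q”):

```python
from itertools import product

def implies(p, q):
    # Truth-functional implication: false only when p is true and q is false.
    return (not p) or q

print("P     Q     P -> Q")
for p, q in product([True, False], repeat=2):
    print(f"{p!s:5} {q!s:5} {implies(p, q)!s}")

# The only row where P -> Q is false is P = True, Q = False.
assert [implies(p, q) for p, q in product([True, False], repeat=2)] == [True, False, True, True]
```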
Proof techniques:
Many theorems are of the type “If P then Q”.
A common method to prove such a statement is to start with the assumption P (i.e., assume P is true) and use logical steps to arrive at Q. These are often referred to as “direct proofs”.
The above example also shows that you can not start a proof with what you want to prove, as you could start with something false and end up with something true.
□
Turning back to sets, we look at examples of direct proofs.
Definition 2.6:
A set X is a subset of a set Y, denoted by X ⊆ Y, if every element of X is also an element of Y (symbolically: ∀x ∈ X, we have x ∈ Y).
We write X ⊈ Y when X is not a subset of Y, so there is at least one element of X which is not an element of Y (symbolically: ∃x ∈ X such that x ∉ Y).
A set X is a proper subset of a set Y, denoted by X ⊊ Y, if X is a subset of Y but X ≠ Y.
Example:
Let A = {4k : k ∈ ℤ} and B = {2k : k ∈ ℤ}. We will prove A ⊊ B using a direct proof.
1. [If we let P be the statement x ∈ A and Q be the statement x ∈ B, note that we have translated A ⊆ B symbolically to P → Q.]
Let x ∈ A [i.e., suppose P is true]. Then there exists k ∈ ℤ such that x = 4k. Hence, x = 2(2k), for some k ∈ ℤ, i.e. there exists m ∈ ℤ (namely m = 2k) such that x = 2m.
Hence, x ∈ B [i.e., Q is true]. Since this argument is true for any x ∈ A, we have that x ∈ B for all x ∈ A, hence A ⊆ B.
2. We will now prove that A ≠ B by showing there is an element of B which is not an element of A. Take x = 2. Then x ∈ B [as 2 = 2 · 1] but x ∉ A [as 2 = 4k would require k = 1/2, but 1/2 ∉ ℤ].
Combining the two statements above, it follows that A ⊊ B.
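A subset relation of this shape can be spot-checked on a finite window of integers. The sketch below (our own illustration, using the multiples of 4 and the multiples of 2 as a concrete pair of sets) is evidence, not a proof:

```python
# Finite-window check: every multiple of 4 between -20 and 20 is a
# multiple of 2, but not conversely.
window = range(-20, 21)
A = {4 * k for k in window if -20 <= 4 * k <= 20}   # multiples of 4
B = {2 * k for k in window if -20 <= 2 * k <= 20}   # multiples of 2

assert A <= B                  # A is a subset of B
assert 2 in B and 2 not in A   # an element of B not in A...
assert A < B                   # ...so the inclusion is proper
```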
Proof techniques:
To prove something is true for all x ∈ X, we “let x ∈ X” or “suppose x ∈ X” with no further conditions. Whatever we conclude about x is true for all x ∈ X. This is the technique we used in Part 1. above.
Suppose X and Y are sets. Showing that X = Y is the same as showing that X ⊆ Y and Y ⊆ X.
□
Example:
We show that {2k : k ∈ ℤ} = {2k + 2 : k ∈ ℤ}, by showing ⊆ and ⊇.
Let x ∈ {2k : k ∈ ℤ}. Then there exists k ∈ ℤ such that x = 2k. By (A6), since 2k = 2(k − 1) + 2, we have x = 2m + 2 with m = k − 1 ∈ ℤ. As this is true for all x, we have {2k : k ∈ ℤ} ⊆ {2k + 2 : k ∈ ℤ}.
Let x ∈ {2k + 2 : k ∈ ℤ}. Then there exists k ∈ ℤ such that x = 2k + 2. Let m = k + 1. Note that m ∈ ℤ, so by (A6), since 2k + 2 = 2(k + 1) = 2m, we have x = 2m. Hence, there exists m ∈ ℤ such that
x = 2m, so x ∈ {2k : k ∈ ℤ}. As this is true for all x, we have {2k + 2 : k ∈ ℤ} ⊆ {2k : k ∈ ℤ}.
We can combine the three basic symbols together to make more complicated statements, and use truth tables to determine their truth values based on the truth values of P and Q.
Example:
Let P and Q be two statements. The corresponding truth table for ¬P ∨ Q is as follows.

P | Q | ¬P | ¬P ∨ Q
T | T | F | T
T | F | F | F
F | T | T | T
F | F | T | T
Example:
Let P and Q be two statements. The corresponding truth table for ¬Q → ¬P is as follows.

P | Q | ¬P | ¬Q | ¬Q → ¬P
T | T | F | F | T
T | F | F | T | F
F | T | T | F | T
F | F | T | T | T
Logical Equivalence
As well as combining statements together to make new statements, we also want to know whether two statements are equivalent, that is, whether they always have the same truth value.
Definition 2.7:
We use the symbol ↔ to mean if and only if. For two statements P and Q, “P ↔ Q” means P → Q and Q → P. In this case we say “P and Q are equivalent”. The corresponding truth table is as follows.

P | Q | P ↔ Q
T | T | T
T | F | F
F | T | F
F | F | T
If we take two statements which are logically equivalent, say P is equivalent to Q, then proving P to be true is equivalent to proving Q to be true. Similarly,
proving P to be false is equivalent to proving Q to be false.
We can use truth tables to prove that two (abstract) statements are equivalent. This will prove to be useful later on when we turn a statement we want to prove is true into another equivalent statement that may be easier
to prove.
Theorem 2.8:
Let P and Q be two statements.
a. P → Q is equivalent to ¬P ∨ Q.
b. P → Q is equivalent to ¬Q → ¬P.
Proof.
Using the last two examples and the truth table for →, we have the following truth table.

P | Q | P → Q | ¬P ∨ Q | ¬Q → ¬P
T | T | T | T | T
T | F | F | F | F
F | T | T | T | T
F | F | T | T | T
Hence, for all truth values of P and Q, we have that P → Q and ¬P ∨ Q have the same truth values. Therefore, P → Q is equivalent to ¬P ∨ Q.
Similarly, for all truth values of P and Q, we have that P → Q and ¬Q → ¬P have the same truth values. Therefore, P → Q is equivalent to ¬Q → ¬P.
□
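A truth-table proof like this can be automated: two compound statements are equivalent exactly when they agree on every assignment of truth values. The following Python sketch (our own illustration) brute-forces Theorem 2.8 this way:

```python
from itertools import product

def implies(p, q):
    # Truth-functional implication, false only for p = True, q = False.
    return (not p) or q

def equivalent(f, g):
    # f and g are equivalent if they agree for every truth assignment.
    return all(f(p, q) == g(p, q) for p, q in product([True, False], repeat=2))

# Part a: P -> Q is equivalent to (not P) or Q.
assert equivalent(implies, lambda p, q: (not p) or q)
# Part b: P -> Q is equivalent to its contrapositive (not Q) -> (not P).
assert equivalent(implies, lambda p, q: implies(not q, not p))
print("Theorem 2.8 verified by truth table")
```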
Proof techniques:
Notice that the above proof has several sentences to explain to the reader what is going on. It is made up of full sentences with a clear conclusion on what the calculation (in this case the truth table) shows. A proof should communicate clearly to the reader why the statement (be that a theorem, proposition or lemma) is true.
How much detail you put in a proof will be influenced by who your target audience is - this is a skill you will develop over your time as a student.
□
We leave the following proposition as an exercise.
Proposition 2.9:
Suppose P and Q are two statements. Then:
The next proposition shows that ∧ and ∨ are associative; that is, P ∧ Q ∧ R and P ∨ Q ∨ R are statements that are unambiguous without parentheses (and therefore do not require parentheses).
Proposition 2.10:
Suppose P, Q, R are statements.
a. (P ∧ Q) ∧ R is equivalent to P ∧ (Q ∧ R).
b. (P ∨ Q) ∨ R is equivalent to P ∨ (Q ∨ R).
Proof.
We prove part a. and leave the proof of part b. as an exercise. We have the following truth table.

P | Q | R | P ∧ Q | (P ∧ Q) ∧ R | Q ∧ R | P ∧ (Q ∧ R)
T | T | T | T | T | T | T
T | T | F | T | F | F | F
T | F | T | F | F | F | F
T | F | F | F | F | F | F
F | T | T | F | F | T | F
F | T | F | F | F | F | F
F | F | T | F | F | F | F
F | F | F | F | F | F | F

Hence, for any truth values of P, Q and R, the truth table shows that (P ∧ Q) ∧ R and P ∧ (Q ∧ R) have the same truth value, so they are equivalent.
□
Proposition 2.11:
Let P, Q, R be statements. Then (P → Q) → R and P → (Q → R) are not equivalent.
Proof.
We have the following truth table.

P | Q | R | P → Q | (P → Q) → R | Q → R | P → (Q → R)
T | T | T | T | T | T | T
T | T | F | T | F | F | F
T | F | T | F | T | T | T
T | F | F | F | T | T | T
F | T | T | T | T | T | T
F | T | F | T | F | F | T
F | F | T | T | T | T | T
F | F | F | T | F | T | T
From the above truth table, we can see that when P, Q and R are all false, the truth value of P → (Q → R) (which is true) is different from
the truth value of (P → Q) → R (which is false). Hence (P → Q) → R and P → (Q → R) are not equivalent.
□
The above proposition shows that the statement P → Q → R therefore has no clear meaning without parentheses. Similarly, there is an exercise to show that
(P ∧ Q) ∨ R and P ∧ (Q ∨ R) are not equivalent, so P ∧ Q ∨ R is likewise not clear.
Hence the meaning of assertions such as P ∧ Q ∨ R is undefined (unless one puts in parentheses).
As an exercise,
one may also prove the following sometimes useful equivalences.
Proposition 2.12:
Let P, Q, R be statements. Then:
The next proposition shows that ∧ and ∨ are distributive over each other.
Proposition 2.13:
Let P, Q, R be statements. Then
a. P ∧ (Q ∨ R) is equivalent to (P ∧ Q) ∨ (P ∧ R).
b. P ∨ (Q ∧ R) is equivalent to (P ∨ Q) ∧ (P ∨ R).
Proof.
We will prove part a. and leave the proof of part b. as an exercise. We have the following truth table.

P | Q | R | Q ∨ R | P ∧ (Q ∨ R) | P ∧ Q | P ∧ R | (P ∧ Q) ∨ (P ∧ R)
T | T | T | T | T | T | T | T
T | T | F | T | T | T | F | T
T | F | T | T | T | F | T | T
T | F | F | F | F | F | F | F
F | T | T | T | F | F | F | F
F | T | F | T | F | F | F | F
F | F | T | T | F | F | F | F
F | F | F | F | F | F | F | F

Hence, for any truth values of P, Q and R, the above truth table shows that P ∧ (Q ∨ R) and (P ∧ Q) ∨ (P ∧ R) have the same truth value, so they are equivalent.
□
Let us return to sets to see how logic may be applied to prove statements.
Definition 2.14:
Suppose that A and B are subsets of some set X.
A ∪ B denotes the union of A and B, that is A ∪ B = {x ∈ X : x ∈ A or x ∈ B}.
A ∩ B denotes the intersection of A and B, that is A ∩ B = {x ∈ X : x ∈ A and x ∈ B}.
When A ∩ B = ∅, we say that A and B are disjoint.
Example:
We have {1, 2, 3} ∪ {3, 4} = {1, 2, 3, 4}. We also see that {1, 2} ∩ {3, 4} = ∅, hence {1, 2} and {3, 4} are disjoint.
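Union, intersection and disjointness are mirrored directly by Python’s built-in sets (an illustration of the definitions, with | for ∪ and & for ∩):

```python
# Union, intersection and disjointness with Python's built-in sets.
A = {1, 2, 3}
B = {3, 4}
C = {5, 6}

assert A | B == {1, 2, 3, 4}   # union
assert A & B == {3}            # intersection
assert A & C == set()          # empty intersection: A and C are disjoint
```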
Etymology:
The word union comes from the Latin unio meaning “a one-ness”. The union of two sets is a set that lists every element in each set just once (even if the element appears in
both sets). While the symbol ∪ looks like a “U” (the first letter of union), this is a coincidence.
While the symbol ∪ was first used by Hermann Grassmann (Polish/German mathematician, 1809 - 1877) in 1844, Giuseppe Peano (Italian mathematician, 1858 - 1932) used it to represent the union of two sets in 1888 in his article Calcolo geometrico secondo Ausdehnungslehre di H. Grassmann. However, at the time the union was referred to as
the disjunction of two sets.
The word intersect comes from the Latin inter meaning “within, in between” and sectus meaning “to cut”. The intersection of two curves is the place they cut each other;
the intersection of two sets is the “place” where the two sets overlap.
While the symbol ∩ was first used by Gottfried Leibniz (German mathematician, 1646 - 1716), he also used it to represent regular multiplication (there are some links
between the two ideas). Again, ∩ was used by Giuseppe Peano in 1888 to refer to intersection only.
The word disjoint comes from the Latin
dis meaning “away, in two parts” and the word
joint. Two sets are disjoint if they are apart from each other without any joints between them. (Compare this to disjunction, which has the same roots but is used to mean joining two things that are apart).
Lemma 2.15:
Let X be a set, and for x ∈ X, let P(x) be the statement that x satisfies the
criterion P, and let Q(x) be the statement that x satisfies the criterion Q. Set A = {x ∈ X : P(x)} and B = {x ∈ X : Q(x)}.
Then A ∪ B = {x ∈ X : P(x) ∨ Q(x)} and A ∩ B = {x ∈ X : P(x) ∧ Q(x)}.
Proof.
For x ∈ X, we have that x ∈ A if and only if the statement P(x) holds. Similarly, we have that x ∈ B if and only if the statement Q(x) holds. Then
A ∪ B = {x ∈ X : x ∈ A or x ∈ B} = {x ∈ X : P(x) ∨ Q(x)}, and A ∩ B = {x ∈ X : x ∈ A and x ∈ B} = {x ∈ X : P(x) ∧ Q(x)}.
□
We can use our work on logical equivalence to show that ∪ and ∩ are associative.
Proposition 2.16:
Suppose A, B, C are subsets of a set X. Then
a. (A ∪ B) ∪ C = A ∪ (B ∪ C).
b. (A ∩ B) ∩ C = A ∩ (B ∩ C).
Proof.
We will prove part a. and leave part b. as an exercise.
Suppose x ∈ X. Let P be the statement that x ∈ A, let Q be the statement that x ∈ B, and let R be the statement that x ∈ C. Recall from Proposition 2.10 that (P ∨ Q) ∨ R is equivalent to P ∨ (Q ∨ R). Then
x ∈ (A ∪ B) ∪ C ⟺ (P ∨ Q) ∨ R ⟺ P ∨ (Q ∨ R) ⟺ x ∈ A ∪ (B ∪ C).
Hence, we have that x ∈ (A ∪ B) ∪ C if and only if x ∈ A ∪ (B ∪ C). It follows that (A ∪ B) ∪ C = A ∪ (B ∪ C).
□
Proof techniques:
In the above proof, we used “if and only if” (⟺) in each line so that the proof works both ways (and we concluded ⊆ at the same time as ⊇).
When using ⟺, one needs to be very careful that the implication indeed works both ways, as it is very easy to make a mistake along the way. For this reason, many proofs
instead show ⊆ and ⊇ as two separate arguments (within the same proof).
□
Similarly, we have that ∪ and ∩ are distributive over each other.
Proposition 2.17:
Let A, B, C be subsets of a set X. Then
a. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
b. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
Proof.
We will prove part a. and leave part b. as an exercise.
Suppose x ∈ X. Let P be the statement that x ∈ A, let Q be the statement that x ∈ B, and let R be the statement that x ∈ C. Recall from Proposition 2.13 that P ∧ (Q ∨ R) is equivalent to (P ∧ Q) ∨ (P ∧ R). Then
x ∈ A ∩ (B ∪ C) ⟺ P ∧ (Q ∨ R) ⟺ (P ∧ Q) ∨ (P ∧ R) ⟺ x ∈ (A ∩ B) ∪ (A ∩ C).
Hence, we have that x ∈ A ∩ (B ∪ C) if and only if x ∈ (A ∩ B) ∪ (A ∩ C). It follows that A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
□
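Propositions 2.16 and 2.17 can be spot-checked on concrete sets with Python’s set operators (an illustration on one triple of sets, not a proof):

```python
# Spot-check of Propositions 2.16 and 2.17 on concrete sets.
A, B, C = {1, 2}, {2, 3}, {3, 4}

assert (A | B) | C == A | (B | C)          # union is associative
assert (A & B) & C == A & (B & C)          # intersection is associative
assert A & (B | C) == (A & B) | (A & C)    # Prop 2.17, part a.
assert A | (B & C) == (A | B) & (A | C)    # Prop 2.17, part b.
```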
Negations
Being able to negate statements is important for two reasons:
Instead of proving P is true, it might be easier to prove that ¬P is false (proof by contradiction).
Instead of proving P → Q, it might be easier (by Theorem 2.8) to prove ¬Q → ¬P (proof by contrapositive).
We will expand on these two points later. We already know how to negate most simple statements; for example, the negation of x < 5 is x ≥ 5, and the negation of x ∈ A is x ∉ A.
To negate simple statements that have been strung together, we use the following theorem.
Theorem 2.18:
Suppose P, Q are two statements. Then
a. ¬(P ∧ Q) is equivalent to ¬P ∨ ¬Q.
b. ¬(P ∨ Q) is equivalent to ¬P ∧ ¬Q.
c. ¬(P → Q) is equivalent to P ∧ ¬Q.
Proof.
We will prove part a. and leave parts b. and c. as exercises. We have the following truth table.

P | Q | ¬P | ¬Q | P ∧ Q | ¬(P ∧ Q) | ¬P ∨ ¬Q
T | T | F | F | T | F | F
T | F | F | T | F | T | T
F | T | T | F | F | T | T
F | F | T | T | F | T | T
Thus, for any truth values of P and Q, the truth values of ¬(P ∧ Q) and ¬P ∨ ¬Q are the same.
□
Example:
If x is not between a and b (inclusive), then x is either less than a or more than b. Formally speaking: ¬(a ≤ x and x ≤ b) if and only if ¬(a ≤ x) or ¬(x ≤ b), if and only if x < a or x > b.
For statements that involve quantifiers (i.e., ∀, ∃), we use the following theorem.
Theorem 2.19:
Let X be a set, and suppose that P(x) is a statement involving x ∈ X. Then
¬(∀x ∈ X, P(x)) is equivalent to ∃x ∈ X such that ¬P(x).
We also have
¬(∃x ∈ X such that P(x)) is equivalent to ∀x ∈ X, ¬P(x).
Proof.
Suppose that “∀x ∈ X, P(x)” is a false statement. Then there must be at least one x ∈ X such that P(x) does not hold. That is,
“∃x ∈ X such that ¬P(x)” is true. (2.1)
Conversely, suppose that “∃x ∈ X such that ¬P(x)” is a true statement. Then it is not the case that P(x) holds for all x ∈ X, that is
“∀x ∈ X, P(x)” is false. (2.2)
By (2.1) and (2.2), it follows that
¬(∀x ∈ X, P(x)) is equivalent to ∃x ∈ X such that ¬P(x).
Equivalently, we have
∀x ∈ X, P(x) is equivalent to ¬(∃x ∈ X such that ¬P(x)).
Setting Q(x) = ¬P(x), we have
¬(∃x ∈ X such that Q(x)) is equivalent to ∀x ∈ X, ¬Q(x).
□
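On a finite set, Theorem 2.19 can be checked directly: Python’s `all` plays the role of ∀ and `any` the role of ∃ (our own illustration, with a sample statement P):

```python
# Theorem 2.19 on a finite set: "not (for all x, P(x))" holds exactly
# when "there exists x with not P(x)" holds, and dually for "exists".
X = range(10)
P = lambda x: x < 7   # a sample statement about x

assert (not all(P(x) for x in X)) == any(not P(x) for x in X)
assert (not any(P(x) for x in X)) == all(not P(x) for x in X)
```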
Example:
Let X be a set and let us negate the statement “∀x ∈ X, P(x) → Q(x)” by writing equivalent statements using Theorem 2.18 and Theorem 2.19:
it is equivalent to ∃x ∈ X such that ¬(P(x) → Q(x)), which is equivalent to ∃x ∈ X such that P(x) ∧ ¬Q(x).
To make this more concrete, let P be the statement a + b ∈ ℤ and Q the statement a, b ∈ ℤ. We have already seen that “for all a and b, P → Q” is false. We did this by showing that when a = b = 1/2, Q is false while P is true, i.e., ∃ a, b such that P is true and Q is false.
Negating statements is also very useful to see when an object does not satisfy a definition. We will see more examples of this later in the course, but for the moment here is an example.
Example:
Using (O2) as an example, a set X satisfies transitivity (with respect to <) if for all a, b, c ∈ X, if a < b and b < c then a < c.
Let us see what it means for X not to satisfy (O2). First we turn the definition into symbolic language: ∀a, b, c ∈ X, (a < b ∧ b < c) → a < c. We then negate this: ∃a, b, c ∈ X such that a < b and b < c and a ≥ c.
Proof techniques:
We have inserted phrases like “such that” to make our sentences more readable without changing their meanings.
□
Contradiction and the contrapositive
As mentioned earlier, negating statements is useful when trying to prove statements using different methods. First we look at proof by contradiction.
Proof techniques:
A proof by contradiction uses the following logic. Instead of trying to prove a statement P is true, we assume that ¬P is true. If we end up with a contradiction,
either to something we already knew to be true or to our original assumption, we must deduce that ¬P is false. This allows us to conclude that P is true.
□
While we will use this method more extensively later, let us see an easy example.
Example:
Statement: Let x ∈ ℤ. If x > 0 then x + 1 > 1.
Proof: For the sake of a contradiction let us assume that x > 0 and x + 1 ≤ 1. (Recall the negation of P → Q is P ∧ ¬Q.) Since x > 0 we have x + 1 > 0 + 1 [by (O3)], i.e. x + 1 > 1. So x + 1 > 1 and x + 1 ≤ 1, i.e. [by (O1)] a contradiction. Hence “x > 0 and x + 1 ≤ 1” is false, so if x > 0 then x + 1 > 1.
Another technique is a proof by contrapositive.
Definition 2.20:
Let P and Q be two statements. The contrapositive of “P → Q” is “¬Q → ¬P”.
Example:
The contrapositive of (A1), “for all a and b, if a, b ∈ ℤ then a + b ∈ ℤ”, is “for all a and b, if a + b ∉ ℤ then a ∉ ℤ or b ∉ ℤ.”
The contrapositive of (O3), “for all a, b, c ∈ ℤ, if a < b then a + c < b + c”, is “for all a, b, c ∈ ℤ, if a + c ≥ b + c then a ≥ b”.
By Theorem 2.8, we know that “P → Q” is equivalent to its contrapositive “¬Q → ¬P”. Sometimes, proving the contrapositive is easier than proving the original statement (we will see more examples of this later).
Example:
Statement: Let x ∈ ℤ. If x > 0 then x + 1 > 1.
Proof: We prove the contrapositive. The contrapositive is “if x + 1 ≤ 1 then x ≤ 0”. Suppose x + 1 ≤ 1, then x + 1 + (−1) ≤ 1 + (−1) [since −1 ∈ ℤ and using the version of (O3) for ≤]. In
other words x + 0 ≤ 0, i.e. x ≤ 0.
Note that we have proven the statement above both by contradiction and by contrapositive. It is also worth noting that we could have proven it directly. This is meant to show that there are often numerous ways to prove the same thing.
Note that the contrapositive doesn’t just change the implication symbol: it swaps P and Q and negates them both. The contrapositive can often be confused with the converse:
Definition 2.21:
Let P and Q be two statements. The converse of “P → Q” is “Q → P”.
Note that “P → Q” is not equivalent to its converse. Be careful when using a theorem/lemma/proposition that you are not using the converse by accident (which may not be true).
Example:
The converse of (A1), “for all a and b, if a, b ∈ ℤ then a + b ∈ ℤ”, is “for all a and b, if a + b ∈ ℤ then a, b ∈ ℤ”, which we know is false (e.g., by taking a = b = 1/2).
If the converse of “P → Q” is true, then we can deduce that P ↔ Q.
Example:
The converse of “If x > 0 then x + 1 > 1” is “if x + 1 > 1 then x > 0”. We can show this is true: suppose x + 1 > 1, then x + 1 + (−1) > 1 + (−1) [by (O3)], i.e. x + 0 > 0, i.e. x > 0.
Therefore we conclude, for all x ∈ ℤ, x > 0 if and only if x + 1 > 1.
Etymology:
Contradiction comes from the Latin contra which means “against” and dict which is a conjugation of the verb “to say, tell”. A contradiction is a statement that speaks against another statement.
Contrapositive also comes from the Latin contra and the Latin positus which is a conjugation of the verb “to put”. The contrapositive of a statement is a statement
“put against” the original statement, we have negated both parts and reversed the order. Furthermore it is “positive” as it has the same truth value as the original statement.
On the other hand, while sounding similar, converse comes from the Latin con which means “together with” and vergere which means “to turn”. The converse turns the order of P and Q.
Set complement
We finish this section by looking at the complement of sets.
Definition 2.22:
Suppose that A and B are subsets of some set X.
B∖A denotes the relative complement of A with respect to B, that is B∖A = {x ∈ B : x ∉ A}.
Aᶜ denotes the complement of A, that is Aᶜ = X∖A = {x ∈ X : x ∉ A}.
Example:
Let X = {1, 2, 3, 4, 5}, A = {1, 2} and B = {2, 3, 4}. Then B∖A = {3, 4}, A∖B = {1} and Aᶜ = {3, 4, 5}.
Proposition 2.23:
Suppose A, B are subsets of a set X. Then
a. B∖A = B ∩ Aᶜ.
b. (Aᶜ)ᶜ = A.
Proof.
We will prove part a. and leave part b. as an exercise.
Let x ∈ X. Then
x ∈ B∖A ⟺ (x ∈ B) ∧ (x ∉ A) ⟺ (x ∈ B) ∧ (x ∈ Aᶜ) ⟺ x ∈ B ∩ Aᶜ.
Hence, we have that x ∈ B∖A is equivalent to x ∈ B ∩ Aᶜ. It follows that B∖A = B ∩ Aᶜ.
□
Theorem 2.24:
Suppose that A, B, C are subsets of a set X. Then
a. (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ.
b. (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.
c. C∖(A ∪ B) = (C∖A) ∩ (C∖B).
d. C∖(A ∩ B) = (C∖A) ∪ (C∖B).
Proof.
We will prove parts a. and d. and leave parts b. and c. as exercises.
Proving a.) Let x ∈ X. Recall that for statements P, Q, we have that ¬(P ∨ Q) is equivalent to ¬P ∧ ¬Q. Then
x ∈ (A ∪ B)ᶜ ⟺ ¬(x ∈ A ∨ x ∈ B) ⟺ (x ∉ A) ∧ (x ∉ B) ⟺ x ∈ Aᶜ ∩ Bᶜ.
Hence, we have that x ∈ (A ∪ B)ᶜ if and only if x ∈ Aᶜ ∩ Bᶜ. It follows that (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ.
Proving d.) Let x ∈ X. Then
x ∈ C∖(A ∩ B) ⟺ (x ∈ C) ∧ ¬(x ∈ A ∧ x ∈ B) ⟺ (x ∈ C) ∧ ((x ∉ A) ∨ (x ∉ B)) ⟺ ((x ∈ C) ∧ (x ∉ A)) ∨ ((x ∈ C) ∧ (x ∉ B)) ⟺ x ∈ (C∖A) ∪ (C∖B).
Hence, we have that x ∈ C∖(A ∩ B) if and only if x ∈ (C∖A) ∪ (C∖B). It follows that C∖(A ∩ B) = (C∖A) ∪ (C∖B).
□
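De Morgan’s Laws for sets can be checked on a small universe with Python sets, using set difference for the complement (our own illustration on one pair of sets):

```python
# De Morgan's laws for sets, checked on a small universe X.
X = set(range(8))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

complement = lambda S: X - S   # complement relative to the universe X

assert complement(A | B) == complement(A) & complement(B)   # part a.
assert complement(A & B) == complement(A) | complement(B)   # part b.
```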
History:
The above theorem is often known as De Morgan’s Laws, or De Morgan’s Theorem. Augustus De Morgan (English Mathematician, 1806 - 1871) was the first to write this
theorem using formal logic (the one we are currently seeing). However this result was known and used by mathematicians and logicians since Aristotle (Greek philosopher,
384BC - 322BC), and can be found in the medieval texts by William of Ockham (English philosopher, 1287 - 1347) or Jean Buridan (French philosopher, 1301 - 1362).
Notation:
It is often convenient to denote the elements of a set using indices. For example,
suppose X is a set with n elements. Then we can denote these elements as x₁, x₂, …, xₙ. So we can write X = {x₁, x₂, …, xₙ} = {xᵢ : i ∈ {1, 2, …, n}}.
The set {1, 2, …, n} is called the indexing set.
Let {Aᵢ : i ∈ I} be a collection of subsets of a set X, where I is an indexing set. Then we write ⋃_{i∈I} Aᵢ to denote the
union of all the sets Aᵢ, for i ∈ I. That is,
⋃_{i∈I} Aᵢ = {x ∈ X : x ∈ Aᵢ for some i ∈ I}.
Furthermore, we write ⋂_{i∈I} Aᵢ to denote the
intersection of all the sets Aᵢ, for i ∈ I. That is,
⋂_{i∈I} Aᵢ = {x ∈ X : x ∈ Aᵢ for all i ∈ I}.
Proposition 2.25:
Let X be a set, let B be a subset of X, and let {Aᵢ : i ∈ I} be an indexed collection of subsets, where I is an indexing set. Then we have
a. B ∩ ⋃_{i∈I} Aᵢ = ⋃_{i∈I} (B ∩ Aᵢ).
b. B ∪ ⋂_{i∈I} Aᵢ = ⋂_{i∈I} (B ∪ Aᵢ).
Proof.
We will prove part a. and leave part b. as an exercise.
We know that x ∈ ⋂_{i∈I} Aᵢ if and only if we have that x ∈ Aᵢ, for all i ∈ I. Then x ∈ ⋃_{i∈I} Aᵢ if and only if there exists an i ∈ I such that x ∈ Aᵢ. Now, suppose that x ∈ B ∩ ⋃_{i∈I} Aᵢ. Then x ∈ B, and for some i ∈ I, we have that x ∈ Aᵢ. Hence, for some i ∈ I, we have that x ∈ B ∩ Aᵢ. Then x ∈ ⋃_{i∈I} (B ∩ Aᵢ), which shows that B ∩ ⋃_{i∈I} Aᵢ ⊆ ⋃_{i∈I} (B ∩ Aᵢ).
Now, suppose that x ∈ ⋃_{i∈I} (B ∩ Aᵢ). Hence, for some i ∈ I, we have that x ∈ B ∩ Aᵢ. Then for some i ∈ I, we have that x ∈ B and x ∈ Aᵢ. Since there exists some i ∈ I such that x ∈ Aᵢ, we have x ∈ ⋃_{i∈I} Aᵢ. Then x ∈ B ∩ ⋃_{i∈I} Aᵢ, which shows that ⋃_{i∈I} (B ∩ Aᵢ) ⊆ B ∩ ⋃_{i∈I} Aᵢ. Summarising the above, we have that B ∩ ⋃_{i∈I} Aᵢ = ⋃_{i∈I} (B ∩ Aᵢ).
□
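An identity of this shape can be spot-checked for an indexed family of sets in Python, representing the family as a dictionary keyed by the indexing set (our own illustration, with an arbitrarily chosen family):

```python
# A Prop 2.25-style identity for an indexed family of subsets of X.
X = set(range(12))
B = {0, 2, 4, 6, 8, 10}
# Indexing set I = {0, 1, 2}; A_i = multiples of (i + 2) inside X.
family = {i: {x for x in X if x % (i + 2) == 0} for i in range(3)}

union_all = set().union(*family.values())   # the union over all i in I
assert B & union_all == set().union(*(B & Ai for Ai in family.values()))
```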
Interest:
We finish this section with a brief side-note. How does one decide whether a statement is a definition, a theorem, a proposition, a lemma etc? The team at Chalkdust
Magazine made the following flowchart which, while not meant to be serious, reflects quite well how one can classify different statements. That is: a definition is
a statement taken to be true; roughly speaking, a proposition is an interesting but non-important result; while a theorem is an interesting, important main result; and a
lemma is there to build up to a theorem.
The original figure can be found at https://chalkdustmagazine.com/regulars/flowchart/which-type-of-statement-are-you/
The rationals are not enough
Now that we have a solid foundation of logic and abstract set notation, let us explore sets within ℚ. That is, let us look at subsets of the rationals.
This will lead us to notice that irrational numbers exist, and hence to explore a new set called the reals, ℝ.
The absolute value
Before we look at subsets of ℚ, we introduce the notion of absolute value.
Definition 3.1:
For x ∈ ℚ, the absolute value or modulus of x, denoted |x|, is defined by
|x| = x if x ≥ 0, and |x| = −x if x < 0.
It is often helpful to think of the absolute value |x| as the distance between the point x and the origin 0. Likewise, |x − y| is the distance between the points x and y.
Proposition 3.2:
For any x, y ∈ ℚ:
|x| ≥ 0, with |x| = 0 if and only if x = 0;
|−x| = |x|;
|xy| = |x| · |y|.
Example:
Statement Show that 2xy ≤ x² + y² for any x, y ∈ ℚ.
Solution Let x, y ∈ ℚ. We have that (x − y)² ≥ 0, so [expanding the bracket] x² − 2xy + y² ≥ 0. Rearranging, this gives 2xy ≤ x² + y².
Proposition 3.3: (Triangle Inequality)
For all x, y ∈ ℚ we have |x + y| ≤ |x| + |y|.
Proof.
We prove this by a case by case analysis. First note that for all x ∈ ℚ we have x ≤ |x| and −x ≤ |x|. Let x, y ∈ ℚ.
Case 1: Suppose x + y ≥ 0, then |x + y| = x + y and so |x + y| = x + y ≤ |x| + |y|.
Case 2: Suppose x + y < 0, then |x + y| = −(x + y) = (−x) + (−y) and so |x + y| = (−x) + (−y) ≤ |x| + |y|.
□
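The triangle inequality can be checked exactly on a grid of rationals using Python’s `fractions` module (an illustration on finitely many values, not a proof):

```python
from fractions import Fraction
from itertools import product

# |x + y| <= |x| + |y| checked exactly on a grid of rationals n/3.
grid = [Fraction(n, 3) for n in range(-6, 7)]
for x, y in product(grid, repeat=2):
    assert abs(x + y) <= abs(x) + abs(y)
print("triangle inequality holds on the grid")
```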
Bounds for sets
With this notion of absolute value, we can start asking whether a subset of ℚ contains arbitrarily large or small elements.
Definition 3.4:
Let S ⊆ ℚ be non-empty. We say that S is:
bounded above (in ℚ) by b ∈ ℚ if for all s ∈ S, s ≤ b;
bounded below (in ℚ) by c ∈ ℚ if for all s ∈ S, c ≤ s;
bounded if it is bounded above and below.
If S is bounded above by b and below by c, then by setting M = max{|b|, |c|} we have S is bounded by M, i.e., for all s ∈ S, we have |s| ≤ M.
Note that b is far from unique. For example, take the set S = {1/n : n ∈ ℤ_{>0}}. Then we can see that S is bounded above by 1, but it is also bounded above by 2 and by 3/2 etc.
Definition 3.5:
Let S ⊆ ℚ be non-empty. The least (or smallest) upper bound of S (in ℚ) is b ∈ ℚ such that:
b is an upper bound, i.e. for all s ∈ S, s ≤ b;
any rational number less than b is not an upper bound, i.e. for all c ∈ ℚ with c < b, there exists s ∈ S with s > c.
The greatest (or largest) lower bound of S (in ℚ) is b ∈ ℚ such that:
b is a lower bound, i.e. for all s ∈ S, s ≥ b;
any rational number greater than b is not a lower bound, i.e. for all c ∈ ℚ with c > b, there exists s ∈ S with s < c.
Example:
Let S = {1/n : n ∈ ℤ_{>0}}. We show that 1 is the least upper bound of S. As remarked before, we have 1 is an upper bound since if we take x ∈ S then x = 1/n with
n ≥ 1. In particular 1/n ≤ 1, so x ≤ 1.
We now show that 1 is the least upper bound by showing any number less than 1 is not an upper bound. Let c < 1. By taking x = 1/1 = 1, we see that x ∈ S, hence c < x means
c is not an upper bound. Hence 1 is the least upper bound.
We show that 0 is the greatest lower bound. First we show 0 is a lower bound. Let x ∈ S, then x = 1/n with n > 0. In particular 1/n > 0, so x ≥ 0.
We now show that 0 is the greatest lower bound by showing any number greater than 0 is not a lower bound. Let c > 0 [so 1/c > 0]. Take n ∈ ℤ_{>0} with n > 1/c, and set y = 1/n, so y ∈ S. Then
y = 1/n < c.
So we have found y ∈ S such that y < c, so c is not a lower bound.
Proof techniques:
The argument for c not being a lower bound above seems to come from nowhere. Sometimes it is hard to see where to start a proof, so mathematicians first do scratch work.
This is the rough working we do as we explore different avenues and arguments, but that is not included in the final proof (to keep the proof clean and easy to understand).
The scratch work for the above proof might have been along the lines:
We want to find y = 1/n ∈ S such that 1/n < c. Rearranging, this gives 1 < cn (as both c and n are positive), so n > 1/c. Since c > 0, we have 1/c > 0. Picking any integer n greater than 1/c would satisfy 1/n < c.
□
Note that in the example above the greatest lower bound of S is not in S itself [if 0 ∈ S, then there exists n ∈ ℤ_{>0} such that 1/n = 0, i.e. 1 = 0 · n = 0, which is a contradiction]. If a set is a
bounded subset of ℤ, then we get a different story.
Theorem 3.6: (Well Ordering Principle)
Any non-empty subset of ℤ_{≥0} contains a minimal element.
Proof.
Exercise to be done after we have introduced the completeness axiom.
□
Corollary 3.7:
Let S ⊆ ℤ be non-empty. If S is bounded below, it contains a minimal element (i.e. its greatest lower bound is in S). If it is bounded above, it contains a maximal element (i.e., its least upper bound is in S).
As we saw above with the set {1/n : n ∈ ℤ_{>0}}, this is not the case for general subsets of ℚ. However, it is even worse, because a general subset of ℚ might be bounded
and not have a greatest lower bound or least upper bound in ℚ, as we will see in the next section.
The irrationals and the reals
We first show that there exist irrational numbers (i.e., numbers that are not rational) and show that this means there exist subsets of whose least upper bound is not rational (and hence we need
a bigger number system).
Theorem 3.8:
There does not exist such that .
Proof.
For the sake of a contradiction, suppose there exists such that . Without loss of generality, since and , we can assume , i.e. . Furthermore, if , then , and if then . So we assume that .
Let . Note that is non-empty since, , so . We have that is bounded below by , so by the
Well Ordering Principle, contains a minimal element, call it . We will prove that is not
minimal by finding with . This will be a contradiction to the Well Ordering Principle.
Define
. Since
, we have
so
. Similarly,
since
we have
so
. Hence
. Now
Hence
and
which is a contradiction.
□
Since irrational numbers exist, if we restrict ourselves to only using rational numbers then there are many unanswerable questions: from simple geometrical problems (what is the length of the diagonal of a square with side length 1?) to the fact that there are some bounded sets of rationals which do not have a rational least upper bound.
Example:
Consider the set We have is bounded above (for example, by or ), but we show it does not have a least upper bound in the rationals.
Let be the least upper bound of and note we can assume . We either have , or . We will show that all three of these cases lead to a contradiction.
Case 1: . We show that in this case is not an upper bound by finding such that .
[Scratch work: We look for such that . Rearranging, we get , then . Since , we know , so . Since , we have , so . So to find such that , i.e. , i.e. by completing the square . To simplify our life, let us take to be a multiple of , say , then we are looking for , i.e. , so should work, i.e. .]
Let . We prove that (and hence is not an upper bound) by showing . We have
Case 2: . This is a contradiction to Theorem 3.8
Case 3: . We leave this as an exercise. [Hint: Find appropriate so that is such that . Argue that is an upper bound for and to conclude is not the least upper bound. ]
Proof techniques:
Notice that in the above we made sure to have by setting where . By doing so, we reduced the number of properties we needed to have.
□
We use this as a motivation to introduce the real numbers.
Definition 3.9:
The set of real numbers, denoted , equipped with addition , multiplication and the order relation satisfies axioms (A1) to (A11), (O1) to (O4) and the Completeness Axiom
Completeness Axiom: Every non-empty subset of which is bounded above has a least upper bound.
Interest:
It can be shown that there is exactly one quadruple which satisfies these properties - up to isomorphism. We do not discuss the notion of isomorphism in this context here (although we will later look at it in the context of groups) save to remark that any two real number systems are in bijection (we will define this later) via a map that preserves certain properties. This allows us to speak of the
real numbers.
There are several ways of constructing the real number system from the rational numbers.
One option is to use
Dedekind cuts. Another is to define the real numbers as equivalence classes of Cauchy sequences of rational numbers (you will explore Cauchy sequences in Analysis).
You may continue to imagine the real numbers as a number line as you did pre-university.
The definition of the absolute value is the same for real numbers as for the rational numbers, as is the notion of bounded sets (and therefore all the results in the previous section still hold for ).
Interest:
We can use the absolute value to define a distance or metric on . To do so we define for any two points . This distance has the following properties, for any :
You can explore whether other distances/metrics can be defined on
or other sets in the 2nd year unit Metric Spaces. You can explore how we construct
other sets from
that satisfy the completeness axiom by looking up “p-adic numbers”.
We deduce two important results about .
Proposition 3.10:
Every non-empty subset of bounded below has a greatest lower bound.
Proof.
Let be non-empty and bounded below. Let be a lower bound. Define the set
Let , so . Since is a lower bound, , i.e. . So is an upper bound for , and is non-empty [since is non-empty]. By the Completeness Axiom has a least upper bound, . Let . We prove that is the greatest lower bound for .
We first show is a lower bound. Let , then so . Hence .
We now show is the greatest lower bound by showing any real number bigger than is not a lower bound. Let be such that , so . Now is not an upper bound for since is the least upper bound of . So by definition, there exists such that , i.e. . Since , we have . Hence is not a lower bound for .
So
is the greatest lower bound for
.
□
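The reflection trick in this proof (pass to the set of negatives, take the least upper bound, and negate) can be sanity-checked numerically. The sketch below is not from the notes: it works with finite sets, where the least upper bound is simply the maximum and the greatest lower bound the minimum.

```python
# Illustrative sketch (finite sets only): inf(S) = -sup(-S),
# mirroring the proof of Proposition 3.10.
def sup_finite(S):
    return max(S)  # for a finite set, the least upper bound is the maximum

def inf_finite(S):
    # reflect the set, take its least upper bound, and reflect back
    return -sup_finite({-s for s in S})

S = {3.5, -2.0, 7.25, 0.0}
print(inf_finite(S))  # agrees with min(S)
```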
Theorem 3.11: (Archimedean Property)
For any there exists such that .
Proof.
We prove this by contradiction.
Suppose there exists
such that
for all
. In particular, this means
is bounded above. So by the completeness axiom,
has a least upper bound
[in
]. Since
is the least upper bound,
is not an upper bound, i.e. there exists
such that
. But then,
and
so
. This contradicts the fact
is an upper bound for
.
□
History:
The above property is named after Archimedes of Syracuse (Sicilian/Italian mathematician, 287BC - 212 BC) although when Archimedes wrote down this theorem, he credited Eudoxus of Cnidus (Turkish mathematician and astronomer,
408BC - 355BC). It was Otto Stolz (Austrian mathematician, 1842 - 1905) who coined this property - partly because he studied fields where this property is not true and therefore needed to coin a term
to distinguish between what is now known as Archimedean fields and non-Archimedean fields.
We finish this section by introducing notation for some common subsets of .
Notation:
Let be such that . We denote:
By convention we have , while .
The supremum and infimum of a set.
Since is complete, i.e., every bounded set has a least upper bound and greatest lower bound, we introduce the notion of supremum and infimum of a set.
Definition 3.12:
Let . We define the supremum of , denoted as follows:
If , then .
If is non-empty and is bounded above [i.e., there exists such that for all , ] then is the least upper bound of (which we know exists by the Completeness Axiom)
If is non-empty and is not bounded above [i.e., for all , there exists such that ] then .
Definition 3.13:
Let . We define the infimum of , denoted as follows:
If , then .
If is non-empty and is bounded below [i.e., there exists such that for all , ] then is the greatest lower bound of (which we know exists by the Completeness Axiom)
If is non-empty and is not bounded below [i.e., for all , there exists such that ] then .
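Definitions 3.12 and 3.13 can be mimicked computationally for finite sets, including the conventions for the empty set (supremum minus infinity, infimum plus infinity). This is an illustrative sketch, not part of the notes; for an infinite set we can only sample finitely many elements.

```python
# Sketch of sup/inf for finite sets, with the empty-set conventions
# sup(∅) = -∞ and inf(∅) = +∞.
def sup(S):
    return max(S) if S else float('-inf')

def inf(S):
    return min(S) if S else float('inf')

# Sampling the (infinite) set {1/n : n a positive integer} already
# suggests that its supremum is 1 and its infimum is 0.
sample = {1 / n for n in range(1, 101)}
print(sup(sample), inf(sample))
```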
Etymology:
Supremum comes from the Latin super meaning “over, above” while infimum comes from the Latin inferus meaning “below, underneath, lower” (these words gave rise to words like superior and inferior).
Example:
Let . We have already seen that (in ) and (not in ).
Proposition 3.14:
Let with . Then
;
;
;
;
.
Proof.
We will only prove and and leave the rest as the arguments are very similar.
Let with , we will show . [Note that is non-empty (as ) and it is bounded above.]
First we show that is an upper bound. Indeed, let then by definition so as required.
Next we show that is the least upper bound [by showing any real number less than is not an upper bound]. Let with . Suppose (i.e. ) and let . Note that and , so . Since , we have is not an upper bound.
Suppose (i.e. ) and let . Then
and , so . Since , we have is not an upper bound. In either case, is not an upper bound, so is the least upper bound.
Let , we show that . [Note that is non-empty, so we want to show it is not bounded above].
Suppose for contradiction that
is bounded above [i.e.,
]. Let
be an upper bound for
. Set
. Note that
, so
. Hence
being an upper bound means
, however
. This is a contradiction.
□
Example:
Problem: Let Show that and .
Solution: Let us first look at the supremum. [The question is asking us to show is not bounded above.] Let . By the Archimedean Property, choose such that [Scratch work missing to work out why we choose this particular ]. Define . Then
So we have found such that , so is not bounded above. Hence .
We now look at the infimum. We first show that is a lower bound. Let for some , so . Consider
Therefore , i.e. as required.
We next show that is the greatest lower bound for [by showing any number greater than it is not a lower bound]. First note that by setting , we have So . Thus, no value can be a lower bound for . This shows that is the greatest lower bound. Hence .
The next example is more theoretical.
Example:
Problem: Let and be bounded non-empty subsets of . Define the sum set as Show that .
Solution: Let and . Note that as both and are bounded. We show .
We first show is an upper bound for . Let , by definition, there exists and such that . Now and , so
We now show is the least upper bound for . Let , we show that is not an upper bound. Since is the least upper bound of , then is not an upper bound, so there exists such that . Similarly, there exists such that . Define . Then
This shows for any , is not an upper bound for . So is the least upper bound for , hence the supremum of .
Proof techniques:
Sometimes, instead of showing something is true for all , it is easier to show it is true for all where . We can do this because if , then setting , we see that .
□
Definition 3.15:
Let . We say that has a maximum if . In this case
we write to stand for the element with .
Similarly we say that
has a minimum if
. In this case we write
to stand for the element
with
.
Note that a set may not have a minimum or a maximum.
Example:
Let We have seen , so . However does not have a maximum as (as and ).
Example:
Let . Then we have seen that and , so . However it does not have a minimum as we have seen and .
Studying the integers
The integers, , form an interesting set as, unlike or , we cannot always divide. This restriction brings a lot of interesting properties that we will now study in more detail. (The Well Ordering Principle also sets apart from and , and we’ll also make use of that principle.)
Greatest common divisor
While we cannot do division in general, there are cases when we can divide by .
Definition 5.1:
For , we say that divides , denoted by , if such that . That is .
In this case we say is a divisor of .
Interest:
Notice that if we tried to extend this definition into , we would find that for all , we have .
Note that if then implies .
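Definition 5.1 translates directly into a divisibility test. The sketch below is illustrative (not from the notes); it uses the convention that 0 divides only 0.

```python
# Sketch of Definition 5.1: a divides b iff there is an integer c with b = a*c.
def divides(a, b):
    if a == 0:
        return b == 0  # 0 | b holds only when b = 0
    return b % a == 0

print(divides(3, 12), divides(5, 12), divides(7, -21))
```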
Theorem 5.2: (Division Theorem)
Let with . Then there exists a unique such that and .
We call
the
quotient and
the
remainder
Proof.
Let . By definition . If , taking gives . If
, taking gives , since and . So is non-empty, and contains a least element. Let this element be and the
corresponding be . Since , we know . We show by contradiction. Suppose and consider . Note that
, which contradicts the minimality of . Hence as required.
It remains to show that
and
are unique. For the sake of a contradiction, suppose there exists
with
(without loss of generality, we assume
as they are not equal) and
for
. Then
means
. Since
, we have
, so
, i.e.
. But
and there are no integers between
and
which is divisible by
. This is a contradiction to our assumption. Hence
, and we deduce that
.
□
Note that the division theorem shows that if and only if the remainder is equal to .
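The Division Theorem can be illustrated computationally. This sketch is not from the notes: Python's built-in `divmod` already produces the quotient and a non-negative remainder when the divisor is positive, and we adjust by one quotient step (as in the proof's minimality argument) when it is negative.

```python
# Sketch of Theorem 5.2: given a and b != 0, return (q, r) with
# a = q*b + r and 0 <= r < |b|.
def division(a, b):
    q, r = divmod(a, b)   # Python's floor division/remainder
    if r < 0:             # only possible when b < 0
        q, r = q + 1, r - b
    return q, r

q, r = division(-17, 5)
print(q, r)  # satisfies -17 = q*5 + r with 0 <= r < 5
```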
Because we have a notion of division, the following definition is natural.
Definition 5.3:
Let . We say that is a common divisor of and if and .
Note that is always a common divisor of and , and if , no integer larger than can be a common divisor of and .
Definition 5.4:
Let with not both equal to . We define the greatest common divisor of and , denoted by (or for highest common factor), to be the largest positive divisor that divides both and .
Definition 5.5:
Let with not all the ’s being equal to . We define the greatest common divisor of to , denoted by
, to be the largest positive divisor that divides for all .
The following lemma showcases different properties of the gcd.
Lemma 5.6:
Suppose with and not both 0.
We have
Set , and take so that , ; then .
For all we have .
For all we have .
Theorem 5.7:
Let with not both equal to , and let Then there exists such that
Proof.
Let be the set
By taking we see that , hence is a non-empty subset of . By the Well Ordering Principle, it has a minimal value. We denote this minimum value by and take such that Note that since and , we have , so . Hence, .
Now, using Theorem 5.2 on and , let be such that with . Then
If , then with , contrary to how we chose . Hence, we must have that , which means . Similarly, we can show that , so is a common divisor of and . Since we have that .
Since
and
, it follows that
. Hence,
□
Proof techniques:
Note here that we proved by showing and . This trick is often exploited when trying to prove two numbers are the same.
□
This theorem has several important corollaries (and one generalisation).
Corollary 5.8:
Let with .
If and , then .
If and then .
Corollary 5.9:
Let , not all ’s equal to . Then there exists such that .
Note that the proof of Theorem 5.7 only shows the existence of and but doesn’t give us a way to find/calculate the value of and (that is, the
proof is not constructive). To find and we need to use an algorithm. An algorithm is a logical step-by-step procedure for solving a problem in a finite number of
steps. Many algorithms are recursive, meaning that after one or more initial steps, a general method is given for determining each subsequent step on the basis of steps
already taken.
Etymology:
The word algorithm comes from Abu Ja’far Muhammad ibn Musa Al-Khwarizmi (Arabic Mathematician, 780-850). Al-Khwarizmi (meaning “from Khwarazm”) wrote a book detailing how
to use Hindu-Arabic numerals. When this book was translated for Europeans, it was given the Latin name Liber Algorismi meaning “Book of al-Khowarazmi”. As a consequence,
any manipulation of Arabic numerals (which are the ones we use nowadays) was known as an algorism. The current form algorithm is due to a “pseudo-etymological perversion”
where algorism was confused with the word arithmetic to give the current spelling of algorithm.
It is interesting to note that Al-Khwarizmi also wrote the book
Hisab al-jabr w’al-muqabala. We have
al-jabr comes from the Arabic
al- meaning “the” and
jabara meaning “to reunite”, so his book was on “the reunion of broken parts”, i.e. how to solve equations with unknowns. When the book made its
way to Europe, Europeans shortened the Arabic title to
algeber. Over time, this mutated to the word
algebra which is currently used.
Algorithm 5.10: (Extended Euclidean algorithm)
Input: Integers and such that .
Output: Integers , and such that .
Algorithm:
Step 0: Set , and
Step 1: Set and . Find unique such that and . If then proceed to the final step, else go to Step 2.
Step 2: Set and . Find unique such that and . If then proceed to the final step, else go to Step 3.
Step (for ): Set and . Find unique such that and . If then proceed to the final step, else go to Step .
Final Step: If the last step was Step (with ) then output , and .
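The steps of Algorithm 5.10 can be sketched in code. This is an illustrative implementation, not the notes' own table-based presentation: it carries the Bézout coefficients along with each remainder, maintaining the invariant that each remainder is an integer combination of the inputs.

```python
# Sketch of the Extended Euclidean algorithm: for a >= b > 0 return
# (d, h, k) with d = gcd(a, b) and d = h*a + k*b.
def extended_euclid(a, b):
    # invariant: r0 = h0*a + k0*b and r1 = h1*a + k1*b throughout
    r0, h0, k0 = a, 1, 0
    r1, h1, k1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1                 # quotient from the Division Theorem
        r0, r1 = r1, r0 - q * r1     # remainders strictly decrease, so we terminate
        h0, h1 = h1, h0 - q * h1
        k0, k1 = k1, k0 - q * k1
    return r0, h0, k0

d, h, k = extended_euclid(1071, 462)
print(d, h, k)  # d = h*1071 + k*462
```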
Proposition 5.11:
The Extended Euclidean algorithm terminates and is correct.
Proof.
Notice that after steps, we have Hence, the algorithm must terminate after at most steps.
If it terminates after Step 1 (i.e. ), then . Furthermore
So suppose the algorithm terminates after steps, for .
We note that and , and for , we have . Then by Lemma 5.6, we have
Finally, we prove by (strong) induction that
at every stage of the algorithm. So let
be the statement that
. We have seen above
that
holds. We also have
and
. So
. So
holds. Assume
holds for all
, then
we have
Hence
holds, so by strong induction
holds.
□
Notice that if , then apply the above algorithm to and then use instead of .
History:
Theorem 5.7 is often known as Bezout’s identity named after Étienne Bézout (French mathematician 1730–1783), while the Extended Euclidean algorithm is
named after Euclid of Alexandria (Greek mathematician, 325BC-265BC). One might wonder at the gap between those two mathematicians given Bezout’s identity follows
immediately from the algorithm.
In his series of books
The Elements, Euclid gives an algorithm to find the gcd of two numbers. The algorithm uses the fact that
and Euclid uses repeated
subtractions. This was improved to give the above algorithm without
and
. Around 1000 years later, Aryabhata (Indian mathematician, 476-550) wrote the book
Aryabhatiya which contained a version of the Extended Euclidean Algorithm.
It is not clear when the algorithm was known in Europe, but it was used by Claude Gaspar Bachet de Méziriac (French mathematician, 1581-1638), who was alive a full 100 years
before Bezout. What Bezout did was to show that a version of the Extended Euclidean Algorithm could be used for polynomials and that a version of Bezout’s identity existed for
polynomials (You can find out how
polynomial rings are similar to
in units like Algebra 2).
Example:
We want to compute and find such that . We go through the algorithm using the following table
Hence .
Primes and the Fundamental Theorem of Arithmetic
We now move on to look at a fundamental property of . First we need some more definitions.
Definition 5.12:
We say an integer is prime if for all if then or .
Equivalently, if
then
or
.
Theorem 5.13:
An integer is a prime if and only if the only positive divisors of are and .
Proof.
We prove both directions separately.
For a proof by contradiction, suppose is prime but there exists a positive divisor which is not or . Then we have and there exists
such that . Since and , we have . Now means . However, and since . Hence is
not prime, which is a contradiction. So if is prime, then the only positive divisors of are and .
Suppose
is an integer whose only positive divisors are
and
. Let
be such that
. If
then we satisfy the definition of
being prime. So suppose
, then
(since
has no other positive divisors). By Corollary
5.8, we have that
, hence satisfying the definition of being prime.
□
Interest:
Most students are used to Theorem 5.13 being the definition of a prime number. In fact, Euclid introduced prime numbers as ones that satisfied that
definition (he used the Greek word protos meaning first, as prime numbers are the first numbers from which every other number arises. The word prime comes from the Latin
“primus” which also means first). Around the 19th century,
it was discovered that there are some rings in which elements satisfying Definition 5.12 and elements satisfying Theorem 5.13 are different.
Following this, mathematicians decided that elements satisfying Theorem 5.13 should be called irreducible (they cannot be reduced into smaller elements).
You can learn more about
irreducible elements in general rings in a unit like Algebra 2.
Theorem 5.14: (Fundamental Theorem of Arithmetic)
Let be such that . Then there exist primes such that .
Furthermore, this factorisation is unique up to re-arranging, that is if
where
are also primes, then
and, if
we order
and
such that
and
for all
, then
for all
.
Proof.
We use a proof by induction to show the existence of the prime divisors of . Let be the statement that “there exists some primes such that
”. We note that is prime, hence holds. Assume holds for all and consider .
If is prime, then holds. If is composite, then it has a non-trivial divisor, say . So . Note as before that . So by the induction
hypothesis, both and can be written as a product of primes. Hence can be written as a product of primes and holds. So by the principle of (strong)
induction, all integers can be written as a product of primes.
We use a proof by contradiction to show the uniqueness (up to re-ordering) of the prime divisors of . Let be the subset of containing the integers
whose factorisation is not unique. For a contradiction, we assume is non-empty, hence by the Well Ordering Principle, it has a least element, call it .
Suppose where and are primes.
Note that since
we have
. If
, by definition
. In this case, if
then by definition
. Continuing this argument, we see that there exists
such that
. Without loss of generality, re-order the
’s so that
.
Since
is prime, it has two positive divisors,
and
. Since
(as it is prime), and a divisor of
we deduce
.
Hence
implies
. Since
gives two distinct prime factorisations of
, we have
.
However,
contradicting the minimality of
. Hence
must be empty, so every
has a unique factorisation.
□
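The existence half of the Fundamental Theorem of Arithmetic corresponds to a simple trial-division procedure; the uniqueness half says the resulting non-decreasing list of primes is the only one possible. The sketch below is illustrative and not part of the notes.

```python
# Sketch of Theorem 5.14: factor n >= 2 into a non-decreasing list of primes.
def prime_factors(n):
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # divide out each prime factor as often as possible
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # whatever remains is itself prime
        factors.append(n)
    return factors

print(prime_factors(360))  # 360 = 2*2*2*3*3*5
```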
Corollary 5.15:
There are infinitely many prime numbers in .
Proof.
To the contrary, suppose that there are only finitely many primes, say for some . Set . Clearly , as is a prime. Hence . By the Fundamental Theorem of Arithmetic, can be factored as a product of primes. So we let be some prime dividing . Then for some . Since there are only finitely many primes, we must have that for some , . Hence,
so
Hence the prime divides , which is a contradiction. The result follows.
□
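Euclid's argument can be run numerically. This sketch (not from the notes) takes the first six primes, forms the product plus one, and exhibits a prime factor outside the original list; note the product plus one need not itself be prime.

```python
# Sketch of Corollary 5.15: if p1,...,pk were all the primes, then
# N = p1*...*pk + 1 leaves remainder 1 on division by each pi, so any
# prime factor of N is a "new" prime.
def smallest_prime_factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n itself is prime

primes = [2, 3, 5, 7, 11, 13]
N = 1
for p in primes:
    N *= p
N += 1                        # here N = 30031 = 59 * 509, not prime
new_prime = smallest_prime_factor(N)
print(N, new_prime)           # the new prime is not in our list
```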
History:
The above proof is often called Euclid’s proof as it can be found in his books The Elements. The proof follows directly from his proof that every number can be factorised
as a product of primes. Euclid did not prove the uniqueness of such factorisations; instead, a proof of uniqueness can be found in Diophantus of
Alexandria’s (Greek mathematician, 200 - 284) book Arithmetica (the title presumably linked to the Greek word arithmos meaning “number”).
The Fundamental Theorem of Arithmetic can be used to find solutions to equations such as the one below.
Example:
Problem: Find all square numbers such that there exists a prime with .
Solution: First, we suppose we have a prime such that for some . We want to find constraints on . We have that
By the Fundamental Theorem of Arithmetic, the only positive factors of are and . That is , so let us analyse these four cases.
Suppose . Then , that is . But this implies that , which is not prime. It follows that .
Suppose . Then , so and hence , which is a contradiction. Hence, .
Suppose . Then , so . Hence , so , which is prime.
Suppose . Then . Hence , and so . Then . But this is not possible since . Hence, .
Summarising all of the above cases, we have that if is a prime such that , for some , then
On the other hand, let
. Then
. Hence, the only squares
such that there exists a prime
with
is
(and
).
With the above representation of integers, we can revisit the greatest common divisor.
Lemma 5.16:
Let and write and where and are the prime divisors of or (and hence can be ). Then if and only if for all .
Proof.
We prove both directions separately.
For a contradiction, suppose and there exists such that . We can write where , and we can
write where . Since , there exists such that , that is . Rearranging, we have . Now since , we have , which is a contradiction since . Hence, for all ,
.
If
for all
then
Hence
.
□
Corollary 5.17:
Let such that neither are and write and where and are the prime divisors of or (and hence can be ). Then
Proof.
The greatest common divisor is a divisor of both and therefore the prime factorization of the gcd can contain only primes that appear in both and . Each of those primes can only appear with at most the minimum power appearing in both. The greatest common divisor will then be obtained by choosing as many primes as possible with the minimum power appearing.
□
Definition 5.18:
We say that are co-prime if .
The word co-prime indicates that there are no prime factors in common between and (they are prime with respect to each other).
Definition 5.19:
Let . Then is a common multiple of and if both and
Definition 5.20:
Let such that neither are . Then the lowest common multiple of and , denoted by , is the smallest positive integer which is divisible by both and .
Lemma 5.21:
Let such that neither are and write and where and are the prime divisors of or (and hence can be ). Then
Proof.
The lowest common multiple is a multiple of both and therefore the prime factorization of
the lcm must contain all the primes that appear in both and . Those primes must appear with at
least the maximum power appearing in each. The lowest common multiple will then be obtained
by choosing as few primes as possible with the maximum power appearing.
□
Theorem 5.22:
Let such that neither are . Then
Proof.
We first note that for any two numbers we have . Write and .
Then:
□
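The min/max description of gcd and lcm on prime exponents explains Theorem 5.22: since min(e, f) + max(e, f) = e + f for each prime, the product of the gcd and lcm equals the product of the two numbers. The sketch below is illustrative (not from the notes) and uses the standard library's gcd.

```python
# Sketch of Theorem 5.22 for positive integers: gcd(a,b) * lcm(a,b) = a * b.
import math

def lcm_via_gcd(a, b):
    return a * b // math.gcd(a, b)

a, b = 360, 84   # 360 = 2^3 * 3^2 * 5 and 84 = 2^2 * 3 * 7
print(math.gcd(a, b), lcm_via_gcd(a, b))
```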
Corollary 5.23:
Let such that neither are . If , then
Interest:
If there’s time, we’ll talk more about polynomial rings, famous open problems in Number Theory, or the fundamental theorem of algebra.
Moving from one set to another - Functions
In mathematics, we are very often concerned with functions (also called maps). Some functions model the behaviour of complex systems, while other functions allow us to compare two sets. Loosely speaking, a function is a black box which takes an input from one set and gives an output in another. Every input must have an output, and there is no randomness, so the
same input always gives the same output. To give a formal definition of functions, we need to go back to sets.
Definitions
Before we can start working with functions, we need to give a precise mathematical definition.
Definition 6.1:
Given two sets , we define the Cartesian product of and , denoted by , by
Example:
We have that
So is the Cartesian plane.
Example:
Let and . Then
Note that
.
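For finite sets the Cartesian product can be enumerated directly. This illustrative sketch (not from the notes) uses the standard library's `itertools.product`; note that the product of an m-element set with an n-element set has m·n elements, and that order matters.

```python
# Sketch of Definition 6.1: A x B is the set of ordered pairs (a, b)
# with a in A and b in B.
from itertools import product

A = {1, 2}
B = {'x', 'y', 'z'}
AxB = set(product(A, B))
print(sorted(AxB))  # 6 pairs, since |A x B| = |A| * |B|
```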
Definition 6.2:
Let be non-empty sets. A function from into , denoted by , is a set which satisfies that for each element there exists exactly one element such that . That is,
Symbolically, we have
is a function if
such that
(or
).
Example:
Let and . Let
Then is a function from into since, for each , there is exactly one such that . However, is not a function from into for
two different reasons. One reason is that but there is no value of such that . The other is that , but there are two different values of (namely and ) such that .
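The "exactly one output for each input" condition of Definition 6.2 can be checked mechanically for finite sets. This sketch is illustrative (not from the notes): it represents a candidate function as a set of ordered pairs and counts how often each domain element appears as a first coordinate.

```python
# Sketch of Definition 6.2: a subset G of A x B is a function from A to B
# exactly when every a in A appears as a first coordinate exactly once.
def is_function(G, A, B):
    if not all(a in A and b in B for (a, b) in G):
        return False
    return all(sum(1 for (a, b) in G if a == x) == 1 for x in A)

A, B = {1, 2, 3}, {4, 5}
f = {(1, 4), (2, 4), (3, 5)}
g = {(1, 4), (1, 5), (2, 4)}   # 1 has two outputs and 3 has none
print(is_function(f, A, B), is_function(g, A, B))
```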
Notation:
Consider and let . Then, the image of under is denoted by
We have
.
Definition 6.3:
Consider . We say that is the domain of and is the co-domain of . The image (or range) of is the image of under , i.e., the set .
Etymology:
The word function comes from the Latin functus which is a form of the verb meaning “to perform”. A function can be thought of as a set of operations that are performed on
each value that is input. Interestingly, the word function (which was first introduced by Gottfried Leibniz (German mathematician, 1646 - 1716)) was used for maps that
returned more than one output for the same input. It was only in the latter half of the 20th century that a function was restricted to “exactly one output for each input”.
The word domain comes from the Latin
domus which means “house, home”. The domain is the place where all the inputs of
live. Another way to look at it is, the word
domus gave the word
dominus which means “lord, master”, the person that ruled the house (i.e., a domain is the property owned by a Lord). The domain of a function is
the set of all the
that
has “control over”.
Example:
Define by , for all . We have is the domain of , while is the codomain. The
image of is (i.e., the square numbers).
Now, let
. Note that
The image of
under
is
Theorem 6.4:
Consider and . Then if and only if for all .
Proof.
We prove both implications separately.
). First, suppose that . Take (arbitrary) and choose (the unique) such that . Then (since ) and
by definition . This is true for all , so it follows that for every .
). Second, suppose that for all . Then
It follows that
if and only if
for all
.
□
Proof techniques:
Notice that in this proof (and all others) the correct punctuation follows each equation, whether in-line or on its own (displayed). Maths should be read as part of a grammatically correct sentence. For example “Consider and ” reads “Consider a function from to and a function from to ”, or “then ” reads as “then is equal to the set of pairs where is in which is equal to….”.
It is for this reason that we should not start a sentence with a maths symbol.
□
Injective, surjective and bijective
We now introduce three important properties that a function may or may not have.
Definition 6.5:
We say a function is injective (or one-to-one, or an injection) if for all , we have that if then .
Using the contrapositive, an alternative definition is that a function
is injective if for all
, we have if
then
.
Definition 6.6:
We say a function is surjective (or onto, or a surjection) if for all , there exists such that .
Definition 6.7:
A function is called bijective if it is both injective and surjective.
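For a function between finite sets, Definitions 6.5 to 6.7 can be tested directly. This sketch is illustrative (not from the notes): it represents a function as a dict from the domain to the codomain.

```python
# Sketch of Definitions 6.5-6.7 for a function on finite sets.
def is_injective(f):
    values = list(f.values())
    return len(values) == len(set(values))   # no two inputs share an output

def is_surjective(f, codomain):
    return set(f.values()) == set(codomain)  # every codomain element is hit

def is_bijective(f, codomain):
    return is_injective(f) and is_surjective(f, codomain)

f = {1: 'a', 2: 'b', 3: 'c'}
g = {1: 'a', 2: 'a', 3: 'b'}   # not injective, but onto {'a', 'b'}
print(is_bijective(f, {'a', 'b', 'c'}), is_injective(g), is_surjective(g, {'a', 'b'}))
```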
Example:
Define by
for all . We want to show that is bijective.
We first show it is injective by using the contrapositive. Let (the domain) be such that . That is
, i.e. and . Therefore, . Since ,
we can divide by to deduce . Hence means . Therefore and is injective.
We next show that is surjective. We begin by choosing [the co-domain].
[Scratch work: We want to find such that . So we need to find such that and . Hence, we must have and . Then , or equivalently, , or equivalently, . If we have and , then ]
We set and . Then [the domain], and
It follows that is surjective.
Since
is injective and surjective, it is bijective.
Example:
Let be defined by for all . We claim that is injective but not surjective.
We first show is injective. Let be such that . Without loss of generality, we assume . Note that by
definition, so means and similarly, . Combining these two inequalities together, we get , i.e.
so . So is injective.
We next show
is not surjective. We take
and claim that for all
we have
. For a contradiction, suppose such
an
exists, i.e.
. Note that
since
. However, if
, then
which is not possible. Hence
, but there are no natural
numbers between
and
. Hence
does not exist and
is not surjective.
Example:
Let and let be defined by for all . We claim that is surjective but not injective.
Indeed, we have that with but . Hence is not injective. Next, take . By definition of , there exists such that . Since , we have and as required.
Example:
Let be defined by for all . We leave it as an exercise to show that is bijective.
Proposition 6.8:
Consider . Then we have that is injective if and only if for all, there exists a unique such that .
Proof.
We show the two implications separately.
). Suppose that is injective, and take . Hence, such that . Now, suppose such that . Since is injective, we have that . Hence, , such that .
). Suppose that , such that . Take such that and let By assumption, is the only element of that maps to . Then so Hence, we have shown that for with , we have that . Therefore, is injective.
It follows that
is injective if and only if
,
such that
.
□
Proposition 6.9:
Consider . Then is surjective if and only if .
Proof.
We show the two implications separately.
). Suppose is surjective, we will show by showing and . Note that by definition
.
Take . Since is surjective, there exists so that . Thus . As was arbitrary, we have ,
. Hence .
). Suppose
. Choose
. By definition of
, there exists
such that
. This argument holds for all
, hence
is surjective.
□
Corollary 6.10:
Consider . Then we have that is bijective if and only if , such that .
Proof.
We show the two implications separately.
). Suppose is bijective, then it is surjective so by Proposition 6.9 . We also have is injective, so by Proposition 6.8
we have , such that as required.
). Suppose that
,
such that
. In particular,
,
such that
, i.e. by
definition
is surjective. By Proposition
6.9 , so our assumption becomes
,
such that
. Hence
by Proposition
6.8,
is injective. Since we have shown
is surjective and injective, we deduce
is bijective.
□
Etymology:
The words injective, surjective and bijective all have the same ending “-jective”. This comes from the Latin verb jacere which means “to throw”.
Injective has the prefix in, a Latin word meaning “in, into”. This gave rise to the medical term “injection” where we throw some medicine into the body. An injective function is one that throws the set into the set (every element of stays distinct once it is in ).
Surjective has the prefix sur, a French word meaning “over”, itself coming from the Latin super which means “(from) above”. A surjective function is one
that covers the whole of Y (in the sense that X is thrown from above to cover Y).
Bijective has the prefix
bi, a Latin word meaning “two”. Once we talk about inverse functions, it will become clear that a bijection is a function that works in two
directions.
Theorem 6.11:
Consider f : X → Y and let A, B ⊆ X. Then
a.) f(A ∪ B) = f(A) ∪ f(B);
b.) if f is injective and A ∩ B = ∅, then f(A) ∩ f(B) = ∅.
Proof.
a.) Since A ⊆ A ∪ B, we have that f(A) ⊆ f(A ∪ B), and similarly f(B) ⊆ f(A ∪ B), so f(A) ∪ f(B) ⊆ f(A ∪ B). On the other hand, take y ∈ f(A ∪ B). Then y = f(x) for some x ∈ A or x ∈ B,
so y ∈ f(A) or y ∈ f(B). Therefore y ∈ f(A) ∪ f(B). Since this holds for all y ∈ f(A ∪ B), we have f(A ∪ B) ⊆ f(A) ∪ f(B). Hence, f(A ∪ B) = f(A) ∪ f(B).
b.) Suppose that f is injective and A ∩ B = ∅. To the contrary, suppose that there exists some y ∈ f(A) ∩ f(B). Hence, there exists some a ∈ A such that f(a) = y, and there exists some b ∈ B such that f(b) = y. It follows that f(a) = f(b). Since f is injective, we have that a = b.
Hence, a ∈ A ∩ B, which contradicts our initial assumption. It follows that f(A) ∩ f(B) = ∅.
□
Corollary 6.12:
Suppose that f : X → Y is bijective, and let A ⊆ X and B = X \ A. Then f(A) ∪ f(B) = Y and f(A) ∩ f(B) = ∅.
Proof.
Since f is bijective it is surjective and f(X) = Y. We also note that A ∪ B = X, so we satisfy the hypothesis of the above theorem and deduce f(A) ∪ f(B) = f(A ∪ B) = f(X) = Y.
Furthermore, since f is bijective, it is injective, and we have A ∩ B = ∅. Hence we satisfy the hypothesis of part b.) of the above
theorem and can deduce that f(A) ∩ f(B) = ∅.
□
Pre-images
As well as looking at the image of various subsets of X under the function f, we are often interested in looking at where certain subsets of Y come from.
Definition 6.13:
Consider f : X → Y and let C ⊆ Y. We define the pre-image (or inverse image) of C under f by
f⁻¹(C) = {x ∈ X : f(x) ∈ C}.
We have f⁻¹(C) ⊆ X.
Example:
Let and . Define the function . Then:
Example:
Let be defined by .
We will find and (recall that the open interval ).
We can also rearrange the last line to say:
We have the nice results that union and intersection behave as expected.
Theorem 6.14:
Consider f : X → Y and let C, D ⊆ Y. Then
a.) f⁻¹(C ∪ D) = f⁻¹(C) ∪ f⁻¹(D);
b.) f⁻¹(C ∩ D) = f⁻¹(C) ∩ f⁻¹(D).
Proof.
We prove a. and leave b. as an exercise.
We have
x ∈ f⁻¹(C ∪ D) ⟺ f(x) ∈ C ∪ D ⟺ f(x) ∈ C or f(x) ∈ D ⟺ x ∈ f⁻¹(C) or x ∈ f⁻¹(D) ⟺ x ∈ f⁻¹(C) ∪ f⁻¹(D).
Therefore the two sets are equal, i.e. f⁻¹(C ∪ D) = f⁻¹(C) ∪ f⁻¹(D).
□
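Theorem 6.14 is easy to check on small finite examples. A minimal sketch in Python, with an arbitrary function and subsets of our choosing:

```python
# Pre-images commute with union and intersection (Theorem 6.14),
# checked on a small example. The function f and sets C, D are ours.
def preimage(f, X, C):
    return {x for x in X if f(x) in C}

X = range(-3, 4)               # X = {-3, ..., 3}
f = lambda x: x * x            # f(x) = x^2

C = {0, 1, 4}
D = {1, 9}

assert preimage(f, X, C | D) == preimage(f, X, C) | preimage(f, X, D)
assert preimage(f, X, C & D) == preimage(f, X, C) & preimage(f, X, D)
```

Note that the analogous statement for images only holds for unions (Theorem 6.11); with intersections, images give a containment rather than an equality.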
The link between images and pre-images is explored in the following theorem.
Theorem 6.15:
Consider f : X → Y, and let A ⊆ X and C ⊆ Y. Then
a.) f(f⁻¹(C)) ⊆ C, and for f surjective, we have f(f⁻¹(C)) = C;
b.) A ⊆ f⁻¹(f(A)), and for f injective, we have A = f⁻¹(f(A)).
Proof.
We prove a. and leave b. as an exercise.
If C = ∅, then f⁻¹(C) = ∅ and f(f⁻¹(C)) = ∅ ⊆ C. So we assume C ≠ ∅. Similarly, as f(∅) = ∅, we have that if f⁻¹(C) = ∅
then f(f⁻¹(C)) = ∅ ⊆ C. So we also assume f⁻¹(C) ≠ ∅.
Suppose that y ∈ f(f⁻¹(C)). Then y = f(x) for some x ∈ f⁻¹(C). By the definition of f⁻¹(C), we have f(x) ∈ C. Hence, y ∈ C. Since y is arbitrary, this shows that every element of f(f⁻¹(C)) lies in C, that is f(f⁻¹(C)) ⊆ C.
Now suppose that f is surjective. We need to show that C ⊆ f(f⁻¹(C)). Suppose that y ∈ C. Since f is surjective, there exists an x ∈ X such that f(x) = y. Then f(x) ∈ C, so x ∈ f⁻¹(C), and it follows that y = f(x) ∈ f(f⁻¹(C)). Since y ∈ C was arbitrary, this shows that C ⊆ f(f⁻¹(C)). Summarising, we have that f(f⁻¹(C)) = C whenever f is surjective.
□
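Both containments in Theorem 6.15, and the fact that they can be strict, can be seen on a tiny example. The function here (absolute value on a small set) is our own illustrative choice:

```python
# Theorem 6.15 on a small example: f(f^{-1}(C)) is contained in C,
# and A is contained in f^{-1}(f(A)); both containments are strict here.
def image(f, A):
    return {f(x) for x in A}

def preimage(f, X, C):
    return {x for x in X if f(x) in C}

X = {-2, -1, 0, 1, 2}
f = abs                        # f(X) = {0, 1, 2}

C = {1, 3}
assert image(f, preimage(f, X, C)) <= C        # part a: always a subset
assert image(f, preimage(f, X, C)) != C        # strict: 3 is never hit

A = {1}
assert A <= preimage(f, X, image(f, A))        # part b: always a subset
assert preimage(f, X, image(f, A)) == {-1, 1}  # strict: f is not injective
```

The two failures of equality match the theorem's hypotheses: f is not surjective onto a set containing 3, and f is not injective.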
Composition and inverses of functions
We now turn our attention to how to apply several functions in a row.
Definition 6.16:
Consider f : X → Y and g : Y → Z. We define the composition of g with f, denoted by g ∘ f, by (g ∘ f)(x) = g(f(x)) for all x ∈ X.
Etymology:
Composition has the same roots as components and composite. It comes from co meaning “together with” and ponere meaning “to put”. So something composite is something
“put together” from different components. The composition of two functions is the result of putting two functions together.
Since f assigns to each x ∈ X exactly one value f(x) ∈ Y, and g assigns to f(x) exactly one value g(f(x)) in Z,
we have that g ∘ f is a function from X to Z, that is g ∘ f : X → Z.
We have that composition of functions is associative. That is:
Proposition 6.17:
Consider f : X → Y, g : Y → Z and h : Z → W. Then
h ∘ (g ∘ f) = (h ∘ g) ∘ f.
Proof.
By Theorem 6.4, to show h ∘ (g ∘ f) = (h ∘ g) ∘ f, we need to show that for all x ∈ X, we have (h ∘ (g ∘ f))(x) = ((h ∘ g) ∘ f)(x). Let x ∈ X. Then
(h ∘ (g ∘ f))(x) = h((g ∘ f)(x)) = h(g(f(x)))
and
((h ∘ g) ∘ f)(x) = (h ∘ g)(f(x)) = h(g(f(x))).
Thus, (h ∘ (g ∘ f))(x) = ((h ∘ g) ∘ f)(x).
This argument holds for all x ∈ X, hence h ∘ (g ∘ f) = (h ∘ g) ∘ f.
□
The above theorem tells us that the notation h ∘ g ∘ f is unambiguous.
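The pointwise argument in the proof of Proposition 6.17 can be checked mechanically. A minimal sketch in Python; the three functions are arbitrary choices of ours:

```python
# Composition of functions is associative (Proposition 6.17):
# both bracketings give the same value at every sampled input.
def compose(g, f):
    return lambda x: g(f(x))

f = lambda x: x + 1
g = lambda x: 2 * x
h = lambda x: x - 3

lhs = compose(h, compose(g, f))    # h o (g o f)
rhs = compose(compose(h, g), f)    # (h o g) o f

assert all(lhs(x) == rhs(x) for x in range(-5, 6))
```

Both sides unfold to h(g(f(x))), exactly as in the proof; the check samples a few inputs rather than proving the identity.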
Example:
Let be such that and . Recall that denotes the closed interval from to , that is
We define the following three functions:
defined by ;
defined by ;
defined by .
We briefly check that is indeed a function from to by checking (in theory one should also do the same with and , but these two cases are clearer). Pick , so by definition. Then since we have ,
and since we have . So . As this is true for all , we indeed have is a function with co-domain .
We can construct
by composing
with
with
, i.e.,
is defined by
.
Properties of functions carry through composition.
Theorem 6.18:
Consider f : X → Y and g : Y → Z. Then:
a.) if f and g are injective, we have that g ∘ f is injective;
b.) if f and g are surjective, we have that g ∘ f is surjective.
Proof.
We prove a. and leave b. as an exercise.
Suppose that f : X → Y and g : Y → Z
are injective, and suppose that x₁, x₂ ∈ X
are such that (g ∘ f)(x₁) = (g ∘ f)(x₂), that is g(f(x₁)) = g(f(x₂)).
Now, set y₁ = f(x₁) and y₂ = f(x₂). Then y₁, y₂ ∈ Y with g(y₁) = g(y₂). Since g is injective,
we have y₁ = y₂. It follows that f(x₁) = f(x₂). Further, since f is injective, we deduce x₁ = x₂.
To conclude, for any x₁, x₂ ∈ X with (g ∘ f)(x₁) = (g ∘ f)(x₂), we have x₁ = x₂. Hence
g ∘ f is injective.
□
Corollary 6.19:
Let f : X → Y and g : Y → Z be bijective. Then g ∘ f is also bijective.
Definition 6.20:
We say that a function f : X → Y is invertible if there exists a function g : Y → X such that:
g ∘ f is the identity function on X (that is, (g ∘ f)(x) = x for all x ∈ X), and
f ∘ g is the identity function on Y (that is, (f ∘ g)(y) = y for all y ∈ Y).
We say g is an inverse of f.
If g is an inverse for f, then we also have that f is an inverse for g.
Etymology:
We’ve already seen that inverse comes from in and vertere, which is the verb “to turn”. The inverse of a function is one that turns the original function around to give back what was originally put in.
Example:
Define by , and define by . Then , we have
and
Hence is the identity function on , and is the identity function on . It follows that is invertible where is an inverse of .
Theorem 6.21:
Suppose f : X → Y, and suppose that g and h are inverses of f. Then g = h; that is, if f has an inverse then its inverse is unique.
It is important to note that we have two conditions to prove f is invertible. As an exercise, one can find examples of f and g such that
g ∘ f is the identity, but f ∘ g is not. However, if we have some constraints on f or g then only one condition needs to be met.
Proposition 6.22:
Suppose that f : X → Y and g : Y → X are such that g ∘ f is the identity function on X.
If g is injective, then f ∘ g is the identity map on Y (and hence g is an inverse of f).
If f is surjective, then f ∘ g is the identity map on Y (and hence g is an inverse of f).
This next theorem gives another way to test if a function is invertible or not.
Theorem 6.23:
Suppose that f : X → Y. Then f is invertible if and only if f is bijective.
Proof.
We prove both implications separately.
(⇒). First, we will show that if f is invertible, then f is bijective. So suppose f is invertible, and let g denote the inverse of f. Then g ∘ f is the identity function on X, and f ∘ g is the identity function on Y. We first show f is surjective. Take y ∈ Y; then
g(y) ∈ X and f(g(y)) = y. So set x = g(y). Then x ∈ X and f(x) = y. So for all y ∈ Y, there exists x ∈ X such that f(x) = y.
We prove that f is also injective. Suppose we have x₁, x₂ ∈ X with f(x₁) = f(x₂). Then since g ∘ f is the identity on X, we have
x₁ = g(f(x₁)) = g(f(x₂)) = x₂. Hence f is injective and surjective, so it is bijective.
(⇐). Second, we will show that if f is bijective, then f is invertible. So suppose that f is bijective. We set g to be the rule that assigns to each y ∈ Y an element x ∈ X with f(x) = y.
Since f is bijective, by Corollary 6.10 we have that for all y ∈ Y, there is a unique x ∈ X such that f(x) = y. Hence, g is a function, that is g : Y → X.
We will show that g is the inverse of f. For this, we first take x ∈ X and set y to be f(x). Then by the definition of g, we have that g(y) = x, so (g ∘ f)(x) = g(y) = x. Since x was arbitrary, this shows that g ∘ f is the identity function on X. Now, choose any y ∈ Y. Since f is bijective, there exists a unique x ∈ X such that f(x) = y. Further, since g is a function, we must have that g(y) = x, and hence (f ∘ g)(y) = f(x) = y. Since y was arbitrary, this shows that f ∘ g is the identity function on Y. Hence if f is bijective, we have that f is invertible.
It follows that f is invertible if and only if f is bijective.
□
Suppose that f : X → Y is bijective. In the above proof, we found a way of defining f⁻¹. That is, for all y ∈ Y, we can write f⁻¹(y) = x, where x ∈ X is the unique element such that f(x) = y.
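For a function between finite sets stored as a table of pairs, the construction in the proof of Theorem 6.23 amounts to reversing the pairs. A minimal sketch in Python, with example data of our own:

```python
# Building the inverse of a bijection by sending each y to the unique
# x with f(x) = y, as in the proof of Theorem 6.23.
f = {1: 'a', 2: 'b', 3: 'c'}           # a bijection {1,2,3} -> {'a','b','c'}

assert len(set(f.values())) == len(f)  # injective, so reversing gives a function
g = {y: x for x, y in f.items()}       # g plays the role of f^{-1}

assert all(g[f[x]] == x for x in f)    # g o f is the identity on X
assert all(f[g[y]] == y for y in g)    # f o g is the identity on Y
```

If f were not injective, the reversed dictionary would silently keep only one pair per value, and the first assertion would fail; that mirrors the fact that only bijections are invertible.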
Example:
Let be such that and . In the previous example, we defined as the composition of three functions and to be
. We want to prove is bijective. One method would be to show all three functions that made up its composition are bijective (and
this is relatively easy to do so). But we will instead show is invertible by constructing the inverse.
By symmetry (replacing
with
,
with
etc), let us define
via
. We have
for all
Similarly,
for all
. It follows that
is the inverse of
. Therefore,
is bijective.
Theorem 6.24:
Suppose f : X → Y and g : Y → Z are bijective. Then (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹.
Cardinality
Now that we know about functions, we return to sets, in particular to “measuring their size”. One reason sets were introduced was to “tame” and understand infinity. Indeed, as we will see at the end of the course, the study of sets leads us to realise that there are different kinds of infinities.
Bijective functions allowed us to construct a “dictionary” from one set to another: every element of one set is in one-to-one correspondence with an element of the other set.
This suggests that there is a fundamental property shared by these two sets. This is the motivation of this section, but first some notation.
Notation:
Let X be a set; we denote by |X| the cardinality of X.
The above notation doesn’t actually define what we mean by the cardinality of a set.
Definition 7.1:
We build our notion of the cardinality of different sets as follows.
Again, for an individual infinite set X this does not define what we mean by |X|, but it does define what we mean by |X| = |Y| for two infinite sets X and Y.
By combining the last two definitions together, we see that if f : {1, 2, …, n} → X is bijective, then |X| = n and we say X has n elements. Note that in this case
we can enumerate the elements of X as x₁, x₂, …, xₙ (where xᵢ = f(i)).
We take as a fact that ℕ is infinite (which makes sense since, by the Archimedean Property, ℕ is not bounded above).
Etymology:
Finite comes from the Latin finis meaning “end”. A finite set is one that ends. While infinite has the prefix in- meaning “not”, so an infinite set is one that has no end.
Cardinality comes from the Latin
cardin meaning “hinges” (which are used to pivot doors). This was used to talk about the “Cardinal virtues”, i.e., the pivotal virtues,
the most important ones. From there, the word developed to mean “principal”. This was used by mathematicians to name the set of whole numbers
the
Cardinals, as they are the principal numbers (compared to fractions, reals etc.). From there, the cardinality of a set referred to the whole number/the cardinal associated
to the set.
Lemma 7.4:
Suppose f : X → Y is injective and A ⊆ X. Then |A| = |f(A)|.
Proof.
Let B = f(A). Restrict f to A by defining g : A → B via g(a) = f(a), for all a ∈ A. By the definition of B, we have that g(A) = B, so g is surjective. Now, suppose that a₁, a₂ ∈ A are such that g(a₁) = g(a₂). Then f(a₁) = f(a₂), and since f is injective, this means that a₁ = a₂. Hence, g is injective. It follows that g is bijective, so |A| = |B|. We also know that B = f(A), so |A| = |f(A)|.
□
Theorem 7.5:
Let X, Y be two finite sets with |X| = n and |Y| = m. If there exists an injective f : X → Y, then n ≤ m.
Proof.
Let A = f(X); then by the previous lemma we have |A| = |X| = n, and we know A ⊆ Y. Hence Y has at least n elements, so n ≤ m.
□
We extend the above principle to all (finite or infinite) sets.
Definition 7.6:
For any two sets X and Y, if there exists injective f : X → Y then we say |X| ≤ |Y|. We say |X| < |Y| if |X| ≤ |Y| and |X| ≠ |Y|.
Note that in particular, for any two sets X and Y such that X ⊆ Y, since we can define an injective function f : X → Y via f(x) = x for all x ∈ X, we have
|X| ≤ |Y|.
Lemma 7.7:
Let X and Y be two sets such that X ⊆ Y. Then
a.) If Y is finite then X is finite.
b.) If X is infinite then Y is infinite.
Proof.
Assume Y is finite. We have X ⊆ Y, so |X| ≤ |Y|, i.e. |X| ≤ n for some n ∈ ℕ.
By definition, X is finite.
Note that part b.) is the contrapositive of part a.)
□
Theorem 7.8:
If X, Y are sets with |X| ≤ |Y| and |Y| ≤ |X|, then |X| = |Y|.
In particular, if there exists injective f : X → Y
and injective g : Y → X,
then there exists bijective h : X → Y. (Note: we are not claiming f
or g
are bijective).
Proof.
Non-examinable. It is beyond the scope of this course.
□
History:
Theorem 7.8 has many names: the Cantor–Schröder–Bernstein theorem, the Schröder–Bernstein theorem or (as above) the Cantor–Bernstein theorem. Cantor himself suggested
that it should be called the equivalence theorem.
The reason for so many names is that Georg Cantor (German mathematician, 1845-1918) first stated the theorem in 1887, but did not prove it. In 1897, Felix Bernstein
(German mathematician, 1878-1956) and Ernst Schröder (German mathematician, 1841-1902) independently published different proofs. However, five years later, in 1902,
Alwin Korselt (German mathematician, 1864-1947) found a flaw in Schröder’s proof (which was confirmed by Schröder just before he died).
Proposition 7.9:
Let X be a set and let A, B ⊆ X be disjoint (A ∩ B = ∅) finite sets. Then |A ∪ B| = |A| + |B|.
Proof.
Let n, m ∈ ℕ be such that |A| = n and |B| = m. Then
A = {a₁, a₂, …, aₙ},
where aᵢ = aⱼ only if i = j, for integers 1 ≤ i, j ≤ n. Similarly, we have
B = {b₁, b₂, …, bₘ},
where bᵢ = bⱼ only if i = j, for integers 1 ≤ i, j ≤ m. Since A ∩ B = ∅, we know that for any integers i, j with 1 ≤ i ≤ n and 1 ≤ j ≤ m, we have aᵢ ≠ bⱼ. Now, define f : {1, 2, …, n + m} → A ∪ B by
f(i) = aᵢ for 1 ≤ i ≤ n, and f(i) = bᵢ₋ₙ for n + 1 ≤ i ≤ n + m.
We leave it as an exercise that f is bijective.
□
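The bijection in the proof of Proposition 7.9 simply lists the elements of A first and then the elements of B. A minimal sketch in Python, with example sets of our own:

```python
# |A u B| = |A| + |B| for disjoint finite sets (Proposition 7.9):
# list A first, then B, giving a bijection from {1, ..., n+m} to A u B.
A = {'p', 'q', 'r'}
B = {10, 20}
assert A & B == set()                    # disjoint, as the proposition requires

listing = list(A) + list(B)              # the bijection i -> listing[i-1]
assert len(listing) == len(A) + len(B)   # no element is listed twice
assert set(listing) == A | B             # every element is listed
assert len(A | B) == len(A) + len(B)
```

If A and B overlapped, `listing` would repeat the shared elements and the second assertion would fail, which is exactly why disjointness is needed (compare Corollary 7.10).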
Corollary 7.10:
Let X be a set and let A, B ⊆ X be finite sets. Then
|A ∪ B| = |A| + |B| − |A ∩ B|.
We will return to the idea of cardinality in Chapter 11, where we see there are different infinite cardinalities.
Sets with structure - Groups
Having looked at sets and functions between sets, we now turn our attention to “groups”. Formally these are sets with extra structure given by a binary operation (e.g. ℤ with +), but they arose historically (and crop up across maths today) as a way to understand symmetry.
Motivational examples - Symmetries
Before even giving the definition of a group, we’ll
look at several examples of symmetries of objects, so that the formal
definition will make more sense when we come to it.
Exactly what we mean by a “symmetry” will vary from example to
example, and sometimes there’s more than one sensible notion for the
same object, so rather than giving a general definition, we’ll clarify
what we mean in each example, and the common features should become
clear.
Permutations of a set
Let us consider the set of three elements. Imagine them arranged in a line:
We can rearrange them, putting them in a different order. For example,
we could swap the two elements furthest to the right:
This leaves us with exactly the same picture, so we’ll
regard this as a “symmetry” in this context.
Now let’s label the elements and put both pictures next to each other to see how the elements have been permuted.
This picture (Figure 8.3) brings to mind the function defined by (i.e. and ).
Or if we cyclically permuted the elements as follows:
this (Figure 8.4) would correspond to the function defined by (i.e. and ).
If this is to count as a “symmetry” of the original position, then
we don’t want to move two elements to the same place, or to leave a
place unoccupied. For example:
gives a different pattern. Therefore we need the function to be bijective.
Definition 8.1:
Let X be a set. A permutation of the set X is a bijection f : X → X.
Let’s look in more detail at the permutations of a set
with three elements. It’s easy to check that there are just six of
these, which we’ll denote by : see Figure 8.6, where we just show the position the three elements end up.
Notice that the function is the identity function, and likewise it is called the identity permutation.
Now we can look at what happens when we perform one permutation
followed by another. For example, if we do (swap the two elements
furthest to the right) and then (swap the two elements that are
now furthest to the left), then we are just taking the composition
of the two functions. This gives , and , so .
For simplicity, we’ll leave out the composition symbol and
write instead of . Let’s see what is: , and , so .
When we come to define groups, we’ll think of “composition” as an
operation to combine permutations or other kinds of symmetry (if f
and g are symmetries of an object, then f ∘ g is the symmetry “do
g and then f”), much as “multiplication” is an operation to
combine numbers. In fact we’re using pretty much the same notation:
compare fg for permutations with xy for numbers. Let’s explore this analogy further with permutations of a set X.
Note that since the composition of two bijections is a bijection, we have that the composition of two permutations is a permutation. This is similar to axioms (A1) and (A7).
We have seen that composition of functions is associative (see Proposition 6.17), hence if f, g, h are permutations, we have f ∘ (g ∘ h) = (f ∘ g) ∘ h. This is similar to axioms (A2) and (A8).
We always have an identity permutation id, defined by id(x) = x for all x ∈ X. Let f be any other permutation of X; then for all x ∈ X we have (f ∘ id)(x) = f(x) = (id ∘ f)(x), i.e. f ∘ id = f = id ∘ f. This is
similar to axioms (A4) and (A10).
We know that a bijective function is invertible, so for every permutation f, there exists f⁻¹ such that f ∘ f⁻¹ = id = f⁻¹ ∘ f. This is similar to axioms (A5) and (A11).
[You can check in the permutations of that and ]
Note that we can’t have anything similar to axioms (A3) or (A9), as we have seen an example of two permutations f, g of a set such that f ∘ g ≠ g ∘ f.
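The failure of commutativity is easy to see concretely. A minimal sketch in Python, representing permutations of {1, 2, 3} as dictionaries (an encoding of our own):

```python
# Permutations of {1, 2, 3} as dicts, composed like functions.
# Composing two different swaps in the two possible orders gives
# different permutations, so composition is not commutative.
def compose(g, f):
    return {x: g[f[x]] for x in f}     # (g o f)(x) = g(f(x))

identity = {1: 1, 2: 2, 3: 3}
swap23 = {1: 1, 2: 3, 3: 2}            # swap the two right-most elements
swap12 = {1: 2, 2: 1, 3: 3}            # swap the two left-most elements

assert compose(swap23, swap23) == identity         # a swap is its own inverse
assert compose(swap12, swap23) != compose(swap23, swap12)
```

This also illustrates the inverse axiom: each swap composed with itself gives the identity permutation.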
Symmetries of polygons
Consider a regular n-sided polygon (for example, if n = 3 an equilateral triangle, or if n = 4 a square). Examples of symmetries are:
There are many ways to make precise what we mean by a symmetry. For
example, considering the polygon as a subset of ℝ², we could
look at bijections f : ℝ² → ℝ² that preserve distance: i.e., such that
the distance between f(x) and f(y) is the same as the distance
between x and y. It’s not hard to see that this implies that f
must send vertices to vertices, and moreover must send adjacent
vertices to adjacent vertices. To keep things simple, we’ll use this
as our definition of a symmetry:
Definition 8.2:
A symmetry of a regular n-sided polygon is a permutation f
of the set of vertices that preserves adjacency: i.e., so that
for vertices u and v, u and v are adjacent if and only if
f(u) and f(v) are adjacent.
Note that for n = 3, every pair of vertices is adjacent, so in that
case every permutation of the vertices is a symmetry, and so we
are just looking at permutations of a set of three elements as in the
last section.
In the case n = 4, we have a square. Let’s label the vertices as follows:
Then if we rotate it anticlockwise through an angle of π/2
(let’s call this symmetry
r) we get (Figure
8.8):
While if we reflect in the line of symmetry going from top right to bottom left (let’s call this symmetry s) we get (Figure 8.9):
However, the following does not represent a symmetry, as two vertices that were adjacent have not remained adjacent (Figure 8.10):
As with permutations, we can compose symmetries together and get a new symmetry. In fact, the analogues of (A1)/(A7), (A2)/(A8), (A4)/(A10) and (A5)/(A11) all hold (Prove that the composition of two bijections that preserves adjacency is a bijection that preserves adjacency, and that the inverse of a function that preserve adjacency is a function that preserves adjacency.)
Let’s look at what happens when we compose the symmetries
r
and
s
(Figure
8.11).
We see that r ∘ s is a reflection in the horizontal line of symmetry while s ∘ r is a reflection in the vertical line of symmetry. So here again we have r ∘ s ≠ s ∘ r.
Symmetries of a circle
We can look at bijections from a circle to itself that preserve
distances between points. It’s not too hard to see that these
symmetries are either rotations (through an arbitrary angle) or
reflections (in an arbitrary line through the centre).
Again, if f and g are symmetries, then usually f ∘ g ≠ g ∘ f.
Symmetries of a cube
We can look at the eight vertices of a cube, and define a symmetry to
be a permutation of the set of vertices that preserves adjacency (as
we did with a polygon). There are 48 symmetries: most are either
rotations or reflections, but there is also the symmetry that takes
each vertex to the opposite vertex.
Rubik’s Cube
A Rubik’s cube has 54 coloured stickers (9 on each face), and we
could define a “symmetry” to be a permutation of the stickers that we
can achieve by repeatedly turning faces (as one does with a Rubik’s
cube). It turns out that there are 43,252,003,274,489,856,000
symmetries. Again we can compose symmetries (do one sequence of moves
and then another), and every symmetry has an inverse symmetry.
Formal definition
We want to formalize the structure of symmetries of the kind we looked
at in the last section. First of all, notice that in all cases we looked at combining two
permutations/symmetries to get a third. We can formalize this as follows.
Definition 8.3:
A binary operation ∗ on a set S is a function ∗ : S × S → S.
Etymology:
The word “binary” refers to the fact that the
function takes two inputs x and y from S. (Technically speaking, it takes one input from S × S.)
Notation:
We’ll usually write x ∗ y instead of ∗(x, y).
Example:
Let X be a non-empty set. Composition is a binary operation on the set of permutations of X.
Addition is a binary operation on the set ℤ (or on ℚ, or on ℝ).
Multiplication and subtraction are also binary operations on ℤ, ℚ or ℝ.
Intersection and union are binary operations on the set of subsets of X.
Note that in the definition of a binary operation, the function
maps to S, so if we have a definition of ∗ so that
x ∗ y is not always in S, then this is not a binary operation
on S (we say that S is not closed under ∗). Also, the
domain of ∗ is S × S, so x ∗ y needs to be defined
for all pairs of elements x, y ∈ S.
Example:
Subtraction is not a binary operation on ℕ since, for example, 1 − 2 = −1 ∉ ℕ. So ℕ is not closed under subtraction (it is closed under addition).
Division is not a binary operation on ℤ since, for example, 1 ÷ 2 = 1/2 ∉ ℤ.
Division is not a binary operation on ℚ since x ÷ y is not defined when y = 0. (But division is a
binary operation on the set of non-zero rational numbers.)
For a general binary operation, the order of the elements
matters: x ∗ y is not necessarily equal to y ∗ x.
Definition 8.4:
A binary operation ∗ on a set S is called commutative if
x ∗ y = y ∗ x for all x, y ∈ S.
Example:
Some examples and non-examples of commutative binary operations:
Addition and multiplication are commutative binary operations on ℤ (axioms (A3) and (A9)).
Subtraction is not commutative on ℤ since, for example, 1 − 2 = −1 but 2 − 1 = 1.
Composition is not a commutative operation on the set of permutations of the set {1, 2, 3}.
Bearing in mind the analogies we drew between some of the axioms of ℤ and the compositions of permutations/symmetries,
we’ll now define a group.
Definition 8.5:
A group is a set G together with a binary
operation ∗ satisfying the following
properties (or “axioms”):
(Associativity) For all x, y, z ∈ G,
(x ∗ y) ∗ z = x ∗ (y ∗ z);
(Existence of an identity element) There is an element e ∈ G
(called the identity element of the group) such that, for
every x ∈ G, x ∗ e = x = e ∗ x;
(Existence of inverses) For every x ∈ G, there is an element x⁻¹ ∈ G
(called the inverse of x) such that x ∗ x⁻¹ = e = x⁻¹ ∗ x.
Strictly speaking, the group consists of both the set G and
the operation ∗, but we’ll often talk about “the group G”
if it’s clear what operation we mean, or say “G is a group under
the operation ∗”. But the same set can have several
different group operations, so we need to be careful.
Example:
If X is a set, S is the set of all permutations of X
(i.e., bijective functions f : X → X), and ∘ is the
composition of bijections, then
(S, ∘) is a group.
Note that in this example, there are two sets involved (X and the
set S of permutations). It is the set S that is the group,
not X (we haven’t defined a binary operation on X).
The set of all symmetries of a regular n-sided polygon is a
group under composition, as is the set of all symmetries of a
cube, or a Rubik’s cube.
These examples, and similar ones, of symmetry groups, are the
motivation for the definition of a group, but there are some other
familiar examples of sets of numbers with arithmetical operations
that fit the definition.
Example:
Here we explore some sets and associated operations and check if they are a group or not.
We have (ℤ, +) is a group [by axioms (A2), (A4), (A5)]. Similarly (ℚ, +) and (ℝ, +) are groups.
The set ℕ = {1, 2, 3, …} is not a group under addition,
since it doesn’t have an identity element [0 ∉ ℕ]. Note
ℕ ∪ {0} is still not a group (under addition), as although it has the identity
element 0, no non-zero element has an inverse (since for any x, y ∈ ℕ, x + y ≠ 0).
We have (ℚ \ {0}, ×) is a group [by axioms (A8), (A10) and (A11)].
Similarly (ℝ \ {0}, ×) is a group.
We have (ℤ \ {0}, ×) is not a group, since 2 does not have an inverse.
We have ℤ is not a group under subtraction, since associativity fails:
(x − y) − z = x − y − z, but x − (y − z) = x − y + z, and so these are different
whenever z ≠ 0.
Matrices are another source of examples.
Example:
Let M be the set of n × n matrices with real
entries. Then M is a group under (matrix) addition.
M is not a group under matrix multiplication, since not
every matrix has an inverse. However, the set of invertible
n × n matrices is a group under multiplication.
We’ve stressed that x ∗ y and y ∗ x are typically
different in symmetry groups. But in the examples coming from
addition and multiplication of integers, they are the same.
Definition 8.6:
A group (G, ∗) is called abelian if
x ∗ y = y ∗ x for all x, y ∈ G.
History:
The word “abelian” was introduced in the 1880s by German mathematician Heinrich Weber (1842 - 1913). He derived the name from
the Norwegian mathematician Niels Henrik Abel (1802 - 1829), who studied the permutations of the roots of polynomials.
Very loosely speaking, he showed in 1824 that, for a given polynomial, if the permutations of the roots form an abelian group then the polynomial can be solved in radicals (i.e. involving +, −, ×, ÷ and n-th roots only).
You can learn more about this in the fourth year course Galois Theory (named after French mathematician Évariste Galois, 1811 - 1832).
Even in a non-abelian group, x ∗ y = y ∗ x may hold for some elements
x and y, in which case we say that x and y commute. For example, the identity
commutes with every other element. However, for the group to be abelian it must hold for all elements.
Example:
The group of permutations of a set X is non-abelian if X
has at least three elements, since if a, b, c ∈ X are three distinct elements, f is the permutation that swaps a and b (fixing all other elements), and g is the permutation that swaps b and c, then f ∘ g ≠ g ∘ f.
More generally, symmetry groups are typically non-abelian
(although sometimes they are). In particular, the symmetry group of
a regular n-sided polygon (where n ≥ 3) is non-abelian.
(ℤ, +) is abelian by axiom (A3).
(ℚ \ {0}, ×) is abelian by axiom (A9).
The set of n × n real matrices is an abelian group under matrix addition.
If n ≥ 2, then the set of invertible n × n real
matrices is a non-abelian group under matrix multiplication, since for example
(1 1; 0 1)(1 0; 1 1) = (2 1; 1 1), but
(1 0; 1 1)(1 1; 0 1) = (1 1; 1 2).
Notation:
Often, especially when we’re dealing with abstract properties of
general groups, we’ll simplify the notation by writing xy instead of
x ∗ y, as though we’re multiplying. In this case we’ll say, for
example, “Let G be a multiplicatively-written group”.
Note
that this is purely a matter of the notation we use for the group
operation: any group can be “multiplicatively-written” if we
choose to use that notation, so if we prove anything about a general
multiplicatively-written group, it will apply to all groups, no matter
what the group operation is.
Of course, if we’re looking at something like the group of real
numbers under addition, it would be incredibly confusing to write this
multiplicatively, so in cases like that, where multiplication already
has some other meaning, or where there’s already another standard
notation for the group operation, we’ll tend not to use multiplicative
notation.
Notice that in a multiplicatively-written group, the associativity axiom says
(xy)z = x(yz),
and from this it follows easily (by induction, for example) that any
product such as wxyz has a unique meaning: we could bracket it as
((wx)y)z or (wx)(yz) or w((xy)z) or any of the other possible
ways, and because of associativity they would all give the same
element.
Elementary consequences of the definition
Throughout this section, will be a multiplicatively written
group. Remember that being “multiplicatively written” is purely a
matter of notation, so everything will apply to any group.
A familiar idea from elementary algebra is that if you have some
equation (say involving real numbers x, y and a) such as
ax = ay,
then, as long as a ≠ 0, you can “cancel” the a to deduce that x = y.
Formally, we multiply both sides by a⁻¹ (which exists by axiom (A11)).
A similar principle applies in a group, because of the fact that
elements all have inverses, with one complication caused by the fact
that the group operation may not be commutative.
Multiplying on the left and on the right are in general
different, so we need to decide which is appropriate.
Proposition 8.7: (Right cancellation)
Let x, y, z be elements of a multiplicatively written group G. If xz = yz, then x = y.
Proof.
Multiply on the right by z⁻¹:
x = x(zz⁻¹) = (xz)z⁻¹ = (yz)z⁻¹ = y(zz⁻¹) = y.
□
Proposition 8.8: (Left cancellation)
Let x, y, z be elements of a multiplicatively written group G. If zx = zy, then x = y.
Proof.
Multiply on the left by z⁻¹:
x = (z⁻¹z)x = z⁻¹(zx) = z⁻¹(zy) = (z⁻¹z)y = y.
□
This simple principle has some nice consequences that
make studying groups easier. One is that the defining property of
the identity element e is enough to identify it: no other element has
the same property.
Proposition 8.9: (Uniqueness of the identity)
Let x, e′ be elements of a multiplicatively written group. If xe′ = x then e′ = e. Similarly, if e′x = x then e′ = e.
Proof.
If xe′ = x then xe′ = xe. By “left cancellation”, we can cancel x to deduce e′ = e. Similarly, if e′x = x then e′x = ex, so by “right cancellation” we can cancel x to deduce e′ = e.
□
A similar proof shows that an element of a group can only have one inverse.
Proposition 8.10: (Uniqueness of inverses)
Let x, y be elements of a multiplicatively written group. If xy = e,
then y = x⁻¹ and x = y⁻¹.
Proof.
If xy = e then xy = xx⁻¹, and so, by left cancellation,
y = x⁻¹. Similarly, if xy = e then xy = y⁻¹y, and so by right
cancellation, x = y⁻¹.
□
This means that, to prove that one element y of a group is the
inverse of another element x, we just need to check that their
product (either way round: xy or yx) is equal to the
identity. Here are some examples of useful facts that we can prove
like this:
Proposition 8.11:
Let x be an element of a multiplicatively written group. Then the
inverse of x⁻¹ is x: (x⁻¹)⁻¹ = x.
Proof.
By uniqueness of inverses, we just need to check that x⁻¹x = e,
which is true.
□
Proposition 8.12:
Let x, y be elements of a multiplicatively written group. Then the
inverse of xy is y⁻¹x⁻¹.
Proof.
By uniqueness of inverses, we just need to check that (xy)(y⁻¹x⁻¹) = e. But
(xy)(y⁻¹x⁻¹) = x(yy⁻¹)x⁻¹ = xex⁻¹ = xx⁻¹ = e.
□
Make sure you understand how each step of the previous proof follows
from the definition of a group, and in particular how we have used the
associative property. The notes will be less explicit about this in future proofs,
leaving out brackets.
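Proposition 8.12 can be checked in a concrete group. A minimal sketch in Python, using permutations of {0, 1, 2} under composition (our own choice of group; a permutation is stored as the tuple of its values):

```python
# Checking (xy)^{-1} = y^{-1} x^{-1} in the group of permutations
# of {0, 1, 2} under composition.
def compose(g, f):
    # (g o f)(i) = g(f(i))
    return tuple(g[f[i]] for i in range(len(f)))

def inverse(f):
    # the inverse permutation sends f(i) back to i
    inv = [0] * len(f)
    for i, j in enumerate(f):
        inv[j] = i
    return tuple(inv)

e = (0, 1, 2)
x = (1, 2, 0)                  # a 3-cycle
y = (1, 0, 2)                  # a transposition

xy = compose(x, y)
assert compose(xy, compose(inverse(y), inverse(x))) == e   # (xy)(y^-1 x^-1) = e
assert inverse(xy) == compose(inverse(y), inverse(x))      # so it IS the inverse
```

The first assertion is exactly the check the proof performs: once the product is the identity, uniqueness of inverses gives the second line for free.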
Notation:
Next, some notation. If x ∈ G, we write x² for the
product xx of x with itself, x³ for the product x(x²) (which
is the same as (x²)x by associativity), and so on.
Note that we also have a meaning for x⁻¹, since x⁻¹
is the notation we use for the inverse of x.
We extend this even further by defining x⁻ⁿ to be (x⁻¹)ⁿ
if n ∈ ℕ (which gives us a meaning for xⁿ for any non-zero integer
n, positive or negative) and defining x⁰ to be the identity
element e (so that we now have a meaning for xⁿ for every n ∈ ℤ).
We call xⁿ the n-th power of x.
To justify why this is a sensible notation, let us see what happens
when we multiply powers. First:
Lemma 8.13:
If x is an element of a multiplicatively written group and n ∈ ℕ, then x⁻ⁿ = (xⁿ)⁻¹.
Proof.
By definition, x⁻ⁿ = (x⁻¹)ⁿ. To prove this is the same as (xⁿ)⁻¹,
we just have to show xⁿ(x⁻¹)ⁿ = e, by uniqueness
of inverses. But
xⁿ(x⁻¹)ⁿ = (x ⋯ x)(x⁻¹ ⋯ x⁻¹) = e,
cancelling each x with an x⁻¹.
□
Proposition 8.14:
If x is an element of a multiplicatively written group G, and
m and n are integers, then xᵐxⁿ = xᵐ⁺ⁿ.
Proof.
Let us fix m and prove it is true for all n.
We first prove this by induction for when n ≥ 0. It is true
when n = 0 since
xᵐx⁰ = xᵐe = xᵐ = xᵐ⁺⁰.
Suppose it is true for n = k [that is, xᵐxᵏ = xᵐ⁺ᵏ]. We show it is true
for n = k + 1. We have
xᵐxᵏ⁺¹ = xᵐ(xᵏx) = (xᵐxᵏ)x = xᵐ⁺ᵏx = xᵐ⁺ᵏ⁺¹.
So by induction it is true for all n ≥ 0.
If m < 0, and n < 0, then by the lemma
xᵐxⁿ = (x⁻¹)⁻ᵐ(x⁻¹)⁻ⁿ,
which is equal to (x⁻¹)⁻ᵐ⁻ⁿ = xᵐ⁺ⁿ
by applying what we’ve already proved with x⁻¹
in place of x.
□
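The power law of Proposition 8.14, including negative exponents, can be exercised in a concrete group. A minimal sketch in Python for powers of a permutation (our own choice of group and encoding):

```python
# x^m x^n = x^(m+n) (Proposition 8.14) for a permutation x, with
# negative powers defined via the inverse, as in the notation above.
def compose(g, f):
    return tuple(g[f[i]] for i in range(len(f)))

def inverse(f):
    inv = [0] * len(f)
    for i, j in enumerate(f):
        inv[j] = i
    return tuple(inv)

def power(x, n):
    e = tuple(range(len(x)))           # x^0 = e
    base = x if n >= 0 else inverse(x) # x^{-n} = (x^{-1})^n
    result = e
    for _ in range(abs(n)):
        result = compose(result, base)
    return result

x = (1, 2, 3, 0)                       # a 4-cycle
assert all(compose(power(x, m), power(x, n)) == power(x, m + n)
           for m in range(-4, 5) for n in range(-4, 5))
```

The exhaustive check over a range of positive and negative exponents covers all the sign cases handled separately in the proof.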
We’ve already proved the formula (xy)⁻¹ = y⁻¹x⁻¹. What about
(xy)ⁿ for other values of n? In a non-abelian group, there is no
simple formula. In particular, (xy)² = xyxy need not equal x²y².
Dihedral Groups
As we study group theory, it will be useful to have a supply of
examples of groups to think about. Of course, no single example will
be ideal for everything, but some are more helpful than others. Some
of the more “arithmetic” groups, such as (ℤ, +) and
(ℚ \ {0}, ×), have the advantage of being very
familiar, but the disadvantage of being rather untypical because
they’re abelian. If you have a “standard example” that you like to
think about, then it will be much less misleading if it’s
non-abelian.
A good first family of examples are the “dihedral groups”, which are the symmetry groups of regular polygons, and are non-abelian, but still fairly uncomplicated.
In the next section we will look at a second important family of examples, the “symmetric groups”.
Let P be a regular n-sided polygon, with vertices labelled
anticlockwise 1, 2, …, n. Recall that by a symmetry of P we
mean a permutation of the vertices that takes adjacent vertices to
adjacent vertices.
For example, we can send vertex 1 to any other vertex by an
appropriate rotation. If f is a symmetry sending vertex 1 to
vertex i, then, since it preserves adjacency, it must send vertex
2 to one of the two vertices adjacent to vertex i, and once we
know f(1) and f(2), then f(3) is determined as the other vertex
adjacent to f(2), and so on around the polygon for
f(4), …, f(n).
So the total number of symmetries is 2n (since there are n choices for f(1) and
for each of these there are two choices for f(2)). This explains the
following choice of notation.
Definition 8.15:
The dihedral group D₂ₙ of order 2n is the group of
symmetries of a regular n-sided polygon.
Let’s fix some notation for two particular symmetries.
Notation:
We set r to be a rotation anticlockwise through an angle of
2π/n.
We set s to be a reflection in the line through vertex 1
and the centre of the polygon.
So r sends vertex i to vertex i + 1 for 1 ≤ i < n and vertex n to vertex 1,
and s fixes vertex 1 and sends vertex i to vertex n + 2 − i for 2 ≤ i ≤ n.
Now consider the symmetries and for . These
both send vertex to vertex , but sends vertices
to the vertices following anticlockwise around the
polygon, whereas sends them to the vertices following clockwise
around the polygon. So all of these symmetries are different, and every
symmetry is of this form, and so every element of
can be written in terms of and .
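The description above can be checked computationally. The following sketch (my own, not from the notes; the choice n = 5 and the names `compose` and `power` are illustrative assumptions) models the symmetries as permutations of the vertices 0, ..., n-1 and confirms that the rotations together with the reflected rotations give 2n distinct elements.

```python
n = 5  # an illustrative regular n-gon

# r = anticlockwise rotation by one step: vertex i goes to i+1 (mod n)
r = tuple((i + 1) % n for i in range(n))
# s = reflection fixing vertex 0: vertex i goes to -i (mod n)
s = tuple((-i) % n for i in range(n))

def compose(f, g):
    """Apply g first, then f (permutations in one-line notation)."""
    return tuple(f[g[i]] for i in range(n))

def power(f, k):
    """k-fold composition of f with itself (k >= 0)."""
    out = tuple(range(n))
    for _ in range(k):
        out = compose(f, out)
    return out

# Every symmetry is a rotation r^i or a reflected rotation s r^i.
elements = ({power(r, i) for i in range(n)}
            | {compose(s, power(r, i)) for i in range(n)})
```

Listing the set shows 2n = 10 distinct symmetries, and one can also verify relations such as s r = r^(n-1) s directly on the tuples.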
Proposition 8.16:
We have .
Let be a group with a finite
number of elements: its multiplication table (often also called the
Cayley table) has one row and one column for each element of the group, and
the entry in row and column is the product . So the table
displays the group operation.
Example:
The Cayley table for is as follows:
History:
Arthur Cayley was an English mathematician (1821 - 1895) who gave the first abstract
definition of a finite group in 1854. It is around then that he showed that a group
is determined by its multiplication table (i.e., its Cayley table), and gave several
examples of groups that were not symmetric groups. However, at the
time the mathematical community did not pursue this further, preferring to stick to studying the
concrete symmetric groups. It wasn’t until 1878 when he re-introduced his definition
that other mathematicians refined it and an agreed formal definition of a group emerged.
As we can see in the Cayley table above, we have expressions such as or
appearing, but they are not in our list of standard forms for elements.
So, we must work out how to write elements such as or
in standard form.
There are three basic rules that let us easily do calculations.
Proposition 8.17:
We have .
This is clearly true, as it just says that if we repeat a rotation
through an angle of times, then this is the same as the
identity.
Note that this means that and so by uniqueness of inverses
which allows us to write any power (positive or negative) of as
one of .
Proposition 8.18:
We have .
This is clearly true, as repeating a reflection twice gives the identity.
This implies, by uniqueness of inverses again, that , and so any power of is one of or .
What about ? Well and , so , similarly
and so the other vertices follow clockwise. This is the same as , or .
Proposition 8.19:
We have .
We’ll see that these three rules allow us to simplify any expression.
For example,
where we repeatedly “swap” a with an or , changing the into
or the into .
We get the following rules for multiplying expressions in standard form.
Theorem 8.20:
Fix . In , for ,
You are encouraged to practice some calculations with the dihedral
group, especially with and , as we’ll frequently be using
these as examples.
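One way to practice is to mechanise the three rules. The sketch below (my own; the function name and the word encoding are assumptions) reduces any word in r and s to the standard form s^j r^i, using r^n = e, s^2 = e, and the swapping rule r^i s = s r^(-i).

```python
def standard_form(word, n):
    """Reduce a word over {'r', 's'} (read left to right) in the
    dihedral group of order 2n to standard form s^j r^i.
    Returns the pair (j, i) with j in {0, 1} and 0 <= i < n."""
    j, i = 0, 0  # the identity element is s^0 r^0
    for letter in word:
        if letter == 'r':
            i = (i + 1) % n              # uses r^n = e
        elif letter == 's':
            # (s^j r^i) s = s^(j+1) r^(-i), using r^i s = s r^(-i), s^2 = e
            j, i = (j + 1) % 2, (-i) % n
        else:
            raise ValueError("word must use only 'r' and 's'")
    return j, i
```

For example, in the group of symmetries of a square (n = 4), the word "rs" reduces to s r^3, reflecting the rule r s = s r^(-1).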
Example:
We rewrite the Cayley table for (check the calculation
yourself):
Symmetric Groups and Cycles
We have already met symmetric groups, but not with that name. A
symmetric group is just the group of all permutations of a set. We’ll
only really consider finite sets, although the definition makes sense
for any set.
Definition 8.21:
Let be a set. The symmetric group on is the
group of all permutations of (i.e., bijective functions
), with composition as the group operation.
The symmetric group of degree is the
group of all permutations of the set of
elements.
Let’s think of ways of describing permutations. Of course, we could
just say what is for each value of , and so, for example,
refer to the element of with , , ,
, and .
This is quite hard to read, and one easy way to set out the
information in a more easily readable form is to use a “double row”
notation with along the top row, and
along the bottom row.
For example, if is the element of described above, then we
could write
But there’s an even more compact method, that is also much more
convenient for understanding group theoretic aspects of permutations.
Definition 8.22:
Let and be positive integers with . A -cycle
in is a permutation of the following
form. There are distinct elements
such that
and for . (So “cycles”
the elements and leaves other elements unmoved.)
Such an is denoted by .
We call the length of this cycle.
Example:
In , the -cycle
has
or, written in double row notation,
The notation
would also denote an element of with ,
so this notation doesn’t specify which symmetric group we’re looking
at, although that is rarely a problem.
Note that the -cycle is exactly the same permutation as , so the same cycle can be written in different ways (in fact, a -cycle can be written in different ways, as we can choose to start the cycle with any of the elements ).
Definition 8.23:
A transposition is another name for a -cycle. So this is
just a permutation that swaps two elements and leaves all other
elements unmoved.
Etymology:
Transposition comes from the Latin trans meaning “across, beyond” and positus meaning “to put”. A transposition takes two elements and put them across each other, i.e. swap position.
Of course, not every permutation is a cycle. For example, the permutation
that we used as an example earlier is not. However, we can write it as
a composition
of two cycles. These two cycles are “disjoint” in the sense that
they move disjoint sets of elements and , and this
means that it makes no difference which of the two cycles we apply
first. So also
Definition 8.24:
A set of cycles in is disjoint if no element of
is moved by more than one of the cycles.
Theorem 8.25:
Every element of may be written as a product of disjoint cycles.
Proof.
We give a constructive proof; that is, we give a method which will write any as such a product.
We’ll write down a set of disjoint cycles by considering repeatedly
applying to elements of , starting with the
smallest elements.
So we start with , and consider until we
reach an element that we have already reached.
The first
time this happens must be with , since if
for then , and so we’d have repeated
the element earlier. So we have a cycle
Now we start again with the smallest number that is not involved
in any of the cycles we’ve already written down, and get another cycle
for some .
We keep repeating until we’ve dealt with all the elements of
.
□
This will probably be clearer if we look at an example.
Example:
Consider the element
of written in double row
notation. To write this as a product of disjoint cycles, we start with
, and calculate
so we have a -cycle .
The smallest number we haven’t yet dealt with is , and
so we have a -cycle .
The only number we haven’t dealt with is , and
so finally we have a -cycle .
So
as a product of disjoint cycles. Since -cycles are just the
identity permutation, we can (and usually will) leave them out, and so
we can write
Since the order in which we apply disjoint permutations doesn’t
matter, we could write the cycles in a different order, or we could
start each cycle at a different point. So, for example,
is another way of writing the same permutation as a product of
disjoint cycles.
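The method in the proof of Theorem 8.25 translates directly into code. Here is a small sketch (my own; the function name and the dict representation of a permutation are assumptions), which starts each new cycle from the smallest element not yet dealt with.

```python
def disjoint_cycles(perm):
    """Write a permutation as a product of disjoint cycles.
    perm: dict mapping each element to its image.
    Returns a list of cycles as tuples (1-cycles are kept)."""
    cycles, seen = [], set()
    for start in sorted(perm):
        if start in seen:
            continue
        # follow start -> perm[start] -> ... until we return to start
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        cycles.append(tuple(cycle))
    return cycles
```

For instance (an illustrative permutation, not the one in the notes), the permutation of {1, ..., 5} sending 1 to 3, 3 to 5, 5 to 4, 4 to 1 and fixing 2 decomposes as the 4-cycle (1 3 5 4) together with the 1-cycle (2), which we would normally leave out.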
It’s very easy to read off the product of permutations written in
disjoint cycle notation, just by using the method in the proof of
Theorem 8.25.
Example:
Let
and
be elements of written in disjoint cycle notation. Let’s calculate and .
but of course these are not disjoint cycles. So we start with
and calculate where it is sent by performing the permutation
repeatedly:
- is sent to by ,
which is sent to by , which is not moved by
. So .
- is not moved by . It is sent to
by , which is not moved by . So .
- is not moved by . It is sent to
by , which is not moved by . So .
- is sent to by ,
which is not moved by and is sent to by
. So .
- is sent to by , which is not moved by
, and sent to by . So
.
- So we’ve completed our first cycle .
- is sent to by
, which is not moved by or . So
.
- is sent to by ,
which is not moved by and sent to by . So
.
- is sent to by ,
which is not moved by and sent to by . So
.
- So we’ve completed another cycle .
- We’ve now dealt with all the numbers from to , so
as a product of disjoint cycles.
Similarly,
as a product of disjoint cycles.
Example:
It is easy to write down the inverse of a permutation written as a
product of disjoint cycles. Just write the permutation
backwards. For example, if
then
Of course, we have a choice of which order to take the cycles, and
where to start each cycle, and if we carried out the method in the
proof of Theorem
8.25 then we’d get the alternative
representation
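The "write it backwards" rule is easy to check by machine. In this sketch (my own helper names; the permutation chosen is illustrative), reversing each cycle of a permutation of {1, ..., 6} produces its inverse.

```python
def cycles_to_perm(cycles, n):
    """Build the dict form of a permutation of {1,...,n} from disjoint cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cyc in cycles:
        # each element of the cycle maps to the next, wrapping around
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            perm[a] = b
    return perm

def inverse_cycles(cycles):
    """The inverse permutation: each cycle written backwards."""
    return [tuple(reversed(cyc)) for cyc in cycles]

n = 6
sigma = [(1, 2, 5), (3, 6)]     # an illustrative permutation of {1,...,6}
tau = inverse_cycles(sigma)     # [(5, 2, 1), (6, 3)]
p, q = cycles_to_perm(sigma, n), cycles_to_perm(tau, n)
```

Composing p with q sends every element back to itself, confirming that q is the inverse of p.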
History:
Symmetric groups were one of the first types of groups to emerge, as mathematicians studied
the permutations of roots of equations in the 1770s. This was before the formal abstract definition
of a group was introduced - around 100 years later in the 1870s.
Order of a group and of elements
With those two main examples in mind, we now explore some more properties of groups.
Definition 8.26:
The order of a group , denoted , is the cardinality of the set .
Similarly, we say a group is finite if the set is finite, and infinite otherwise.
Proposition 8.27:
We have .
Proof.
We’ll count the permutations of . Let . Since can be any of the elements , there
are possibilities for . For each of these, there are
possibilities for , since it can be any of the elements
except for . Given , there are
possibilities for , since it can be any of the elements
except for and . And so on. So in total
there are
permutations of .
□
Definition 8.28:
Let be an element of a multiplicatively-written group . Then:
If for some , then the order of , denoted
(or if it may be unclear which group
we’re referring to), is , i.e.
the least positive integer such that . [Note this is well defined
by the Well Ordering Principle.]
If there is no such that , then we
say that has infinite order and write
(or ).
Example:
In any group with identity element , and is
the only element with .
Example:
In the dihedral group , using the same notation as before, we have and .
Example:
In the group we have ,
(since
but ), and for any other
(since either , in which case for
all integers and so , or , in which case
for all integers and so ).
Example:
In the group , , but for
any other . Note that when the group operation is
addition and the identity element is , as here, for an element
with finite order we would require ( times)
to be .
Proposition 8.29:
The order of a -cycle in the symmetric group is .
Proof.
It is clear that if we repeat a -cycle
times, each element is sent back to itself (and if we repeat it
times, then is sent to ).
□
In fact, one benefit of disjoint cycle notation in is that it makes it easy to
calculate the order of a permutation.
Theorem 8.30:
If is the product of disjoint
cycles of lengths , then
Proof.
Consider when is the identity permutation. For
when is one of the numbers involved in the -cycle, we
need to divide . But if divides for all , then
this applies to all the numbers moved by . So the smallest such
is the lowest common multiple of .
□
Example:
The order of the permutation from a previous
example is . Notice that if we write this permutation as
using cycles that are not disjoint, it is much less clear what the
order is, and it is definitely not the lowest common multiple of the
cycle lengths.
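Theorem 8.30 can be verified numerically. This sketch (my own; the brute-force order computation and the example permutation are assumptions) finds the order of a permutation by repeated composition and compares it with the lcm of the disjoint cycle lengths.

```python
from math import lcm

def perm_order(perm):
    """Smallest k >= 1 with perm^k equal to the identity.
    perm: dict mapping each element to its image."""
    current, k = dict(perm), 1
    while any(current[i] != i for i in current):
        # compose once more with perm, so current becomes perm^(k+1)
        current = {i: perm[current[i]] for i in current}
        k += 1
    return k

# An illustrative permutation of {1,...,5} with cycle type (3, 2):
# the product of the disjoint cycles (1 2 3) and (4 5).
p = {1: 2, 2: 3, 3: 1, 4: 5, 5: 4}
```

The order of p is lcm(3, 2) = 6, as the theorem predicts.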
We’ll now see what we can say about the powers of an element of a
group if we know its order, starting with elements of infinite order.
Proposition 8.31:
Let be an element of a group with . Then
the powers of are all distinct: i.e., if
are integers.
Proof.
Suppose . Without loss of generality we’ll assume . Then for ,
but since there is no positive integer with
, so we must have and so .
□
Corollary 8.32:
If is an element of a finite group , then .
Proof.
If then the previous proposition tells us that the
elements (for ) are all different, and so
must have infinitely many elements.
□
In fact, we’ll see later that the order of a finite group
severely restricts the possible (finite) orders of its elements.
Next we consider elements of finite order.
Proposition 8.33:
Let be an element of a group with .
For an integer , if and only if is divisible by .
For integers , if and only if is divisible by .
.
The distinct powers of are .
Proof.
- Firstly, if is divisible by , so that for some integer , then
since .
Conversely, suppose that . We can write with
and (by Theorem 5.2 ). Then
But is the least positive integer with and with
, so can’t be positive, and we must have . So
is divisible by .
if and only if , which by a.) is the case if and only if is divisible by .
Take , . Then is divisible by , and so by b.).
For any integer , write for integers with
. Then is divisible by , and so by
. So every power of is equal to one of
. Conversely, if
and is divisible by , then ,
and so by b.) the elements are all different.
□
If we know the order of an element of a group, then we can work out
the order of any power of that element.
Proposition 8.34:
Let be an element of a group , and an integer.
Proof.
Suppose . If , then
with a positive integer, contradicting
. Similarly, if then with
a positive integer, again giving a contradiction. So in either
case .
Since divides , divides , and so
If and , then divides , so divides , and in particular . So is the smallest positive exponent such that , and is therefore the order of .
□
Example:
In the dihedral group , . So the Proposition
gives , since .
Example:
Taking , the Proposition gives , since
for any . [But this case is very easy to check
directly, since and so if and only
if .]
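The standard formula suggested by the proof of Proposition 8.34, namely that an element of order n has ord(a^k) = n / gcd(n, k), can be checked numerically. This sketch (my own; it tests the formula in the additive group of integers mod n, where "a^k" corresponds to the element k) is an illustration, not part of the notes.

```python
from math import gcd

def additive_order(k, n):
    """Order of the element k in the group of integers mod n
    under addition: the least m >= 1 with m*k = 0 (mod n)."""
    total, m = k % n, 1
    while total != 0:
        total = (total + k) % n
        m += 1
    return m
```

For example, in the integers mod 12 the element 4 has order 3 = 12 / gcd(12, 4), while 5 has order 12 = 12 / gcd(12, 5) and is therefore a generator.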
Linking groups together
We linked sets to each other by considering subsets, functions between sets, and Cartesian products of sets.
In a related way, in this section we look at appropriate ideas of “subgroups”, functions between groups and direct products of groups.
Subgroups
Definition 9.1:
A subgroup of a group is a subset that is
itself a group with the same operation as .
We sometimes write .
Example:
For any group , the trivial subgroup and itself are subgroups of .
Definition 9.2:
We call a subgroup not equal to a non-trivial subgroup,
and a subgroup not equal to a proper subgroup of .
Example:
We have is a proper subgroup of . We have
is a proper subgroup of both and .
Example:
The group is a subgroup of .
If is a positive integer, then the group is a subgroup of .
Example:
The group of rotations of a regular -sided polygon (i.e., ) is a subgroup of the dihedral group .
Here’s a simple description of what needs to be checked for a subset to be a subgroup.
Theorem 9.3:
Let be a multiplicatively-written group and let be a subset of . Then is a subgroup of if and only if the following conditions are satisfied.
(Closure) If then .
(Identity) .
(Inverses) If then .
Proof.
We don’t need to check associativity, since is
true for all elements of , so is certainly true for all elements
of . So the conditions imply is a group with the same
operation as .
If is a group, then (Closure) must hold. By uniqueness of identity and inverses, the identity of must be the same as that of , and the inverse of is the same in as in , so (Identity) and (Inverses) must hold.
□
Proposition 9.4:
If are two subgroups of a group , then is also a subgroup of .
Proof.
We check the three properties in the theorem.
If then by closure of , and by closure of , and so .
and , so .
If , then since has inverses, and
since has inverses. So .
□
Cyclic groups and cyclic subgroups
An important family of (sub)groups are those arising as collections of powers of a single element.
Notation:
Given a (multiplicatively-written) group and an element ,
we’ll define to be the subset of consisting of
all powers of .
It is easy to check that:
Proposition 9.5:
The set is a subgroup of .
Proof.
We need to check the conditions of Theorem 9.3.
If are powers of , then is a power of
, so is closed.
We have is a power of , so .
If , then , so is closed under taking inverses.
□
Definition 9.6:
If , then is called the cyclic subgroup of generated by .
The following explicit description of follows
immediately from Propositions 8.31 and
8.33.
Proposition 9.7:
If then is infinite with unless .
If then is finite, with distinct elements.
Example:
If is the dihedral group of order , then is the group of rotations, and
.
Example:
Let . Then and consists of all powers of
(and of , since we include negative powers of ).
Example:
Let . Since the operation is now addition, consists of all multiples of . So, for example,
and .
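Cyclic subgroups are easy to compute in a finite example. This sketch (my own; it works in the additively-written group of integers mod n, so "powers of a" become multiples of a) accumulates multiples until they repeat, as in Proposition 9.7.

```python
def cyclic_subgroup(a, n):
    """The cyclic subgroup generated by a in the integers mod n
    under addition: all multiples of a, reduced mod n."""
    elements, x = set(), 0
    while x not in elements:
        elements.add(x)
        x = (x + a) % n
    return elements
```

In the integers mod 12, the element 4 generates the three-element subgroup {0, 4, 8}, while 5 generates the whole group, matching the fact that its order is 12.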
Definition 9.8:
A group is called cyclic if for some
. Such an element is called a generator of .
Example:
Let , then is cyclic, since .
Hence and are generators.
This example shows that there may be more than one possible choice of
generator. However, not every element is a generator, since, for
example, and .
Example:
Let be the cyclic subgroup of generated by . Then
and are generators of , but is not, since
contains only the even powers of
.
Example:
Let be the cyclic subgroup of generated by . Then
is a generator of , since
contains all the powers of . In fact, all elements of except
are generators.
Proposition 9.9:
Every cyclic group is abelian.
Proof.
Suppose is cyclic with a generator . Then if
, and for some integers and . So
and so is abelian.
□
Of course, this means that not every group is cyclic, since no
non-abelian group is. But there are also abelian groups, even finite
ones, that are not cyclic.
Proposition 9.10:
Let be a finite group with . Then is cyclic if and
only if it has an element of order . An element is a
generator if and only if .
Proof.
Suppose . Then , and since
, if and only if
.
□
Example:
Let and let . Then is an abelian
subgroup of (check this), but it is not cyclic, since
but and , so has no
element of order .
Theorem 9.11:
Every subgroup of a cyclic group is cyclic.
Proof.
Let be a cyclic group with generator , and
let be a subgroup.
If is the trivial subgroup,
then is cyclic. Otherwise, for some
, and since also since is closed under
taking inverses, we can assume .
Let be the smallest positive integer such that . We
shall show that , and so is cyclic.
Certainly , since every power of is in . Suppose and write where and . Then since and . But , and is the smallest positive integer with , so or we have a contradiction. So . Since was an arbitrary element of , .
□
Group homomorphism and Isomorphism
After studying sets, we looked at how functions allow us to go from one set
to another. A group is a set with some extra structure, and so to learn
about groups we want to consider functions between groups which take
account of this structure somehow.
A “group homomorphism” is a function between two groups that links the
two group operations in the following way.
Definition 9.12:
Let and be groups. A group homomorphism
is a function such that
for all .
Example:
If and are groups and is the identity element of ,
the function given by
for all is a homomorphism (the trivial homomorphism).
Example:
If are groups, then the inclusion map given by
for all is a homomorphism. This is injective but not
surjective (unless ).
If we have a group homomorphism, there are certain things we can deduce:
Lemma 9.13:
Let be a homomorphism, let and be the
identity elements of and respectively, and let . Then
,
,
for any .
Proof.
We go through all three statements.
We have , so by
uniqueness of the identity .
We have ,
so by uniqueness of inverses .
This is true for positive by a simple induction (it is true for , and if it is true for then
so it is true for ). Then by b.) it is true for negative ,
and by a.) it is true for .
□
As we have seen, bijective functions allow us to treat two sets of the same
cardinality as essentially the same set, just changing the names of the elements.
A similar story occurs with groups.
Suppose and are two cyclic groups of order . Strictly
speaking they are different groups, since (for example) is an
element of but not of . But clearly they are “really the
same” in some sense: the only differences are the names of the
elements, and the “abstract structure” of the two groups is the
same. To make this idea, of two groups being abstractly the same,
precise, we introduce the idea of an isomorphism of groups.
Definition 9.14:
Let and be groups. An isomorphism
from to is a bijective homomorphism. That is,
a bijective function such that
for all elements .
Etymology:
Homomorphism comes from the Greek homo to mean “same” and morph to mean “shape”.
Isomorphism comes from the Greek iso to mean “equal” (and morph to mean “shape”).
A homomorphism is a map between two groups that keeps the
same shape, i.e.
preserves the group operation (but not necessarily anything else), while an isomorphism
is a map between two groups that have equal shape (it preserves the group operation
and many other useful properties as we will see below).
Example:
Let and be two cyclic
groups of the same order. Then defined by
for every is an isomorphism,
since it is a bijection and
for all .
Next we’ll prove some of the easy consequences of the definition.
Proposition 9.15:
Let be an isomorphism between
(multiplicatively-written) groups. Then is
also an isomorphism.
Proof.
Since is a bijection, it has an inverse function
that is also a bijection.
Let . Since is a bijection, there are unique elements with and . Then and so is an isomorphism.
□
Because of this Proposition, the following definition makes sense,
since it tells us that there is an isomorphism from to if and
only if there is one from to .
Definition 9.16:
Two groups and are said to be isomorphic, or we say
is isomorphic to , if there is an isomorphism
, and then we write , or if we want to specify an isomorphism.
Proposition 9.17:
Let be three groups. If is isomorphic to and is
isomorphic to , then is isomorphic to .
Proof.
Let and be isomorphisms. Then the
composition is a bijection and if then
and so
is an isomorphism.
□
Proposition 9.18:
Let be an isomorphism between
(multiplicatively-written) groups, let and be the
identity elements of and respectively, and let .
Then
,
.
for every .
.
Proof.
The first three statements follow directly from Lemma 9.13,
since an isomorphism is a homomorphism.
For the last statement, by a.) and c.), we have
, and so
.
□
To prove that two groups are isomorphic usually requires finding an
explicit isomorphism. Proving that two groups are not isomorphic
is often easier, as if we can find an “abstract property” that
distinguishes them, then this is enough, since isomorphic groups have
the same “abstract properties”. We’ll make this precise, and prove
it, with some typical properties, after which you should be able to
see how to give similar proofs for other properties, just using an
isomorphism to translate between properties of and of .
Proposition 9.19:
Let and be isomorphic groups. Then .
Proof.
This follows just from the fact that an isomorphism is a bijection
.
□
Proposition 9.20:
Let and be isomorphic groups. If is abelian then so is .
Proof.
Suppose that is an isomorphism and that is
abelian. Let . Then
since , which is abelian. Since
is a bijection (and in particular injective) it follows that
. So is abelian.
□
Proposition 9.21:
Let and be isomorphic groups. If is cyclic then so is .
Proof.
Suppose is an isomorphism and
is cyclic. So every element of is a power of . So if
then for some , and so
. So if then every element
of is a power of , and so .
□
Notation:
If , we write to mean a (multiplicatively-written)
cyclic group of order .
This notation is sensible since a cyclic group can only be isomorphic to another cyclic group of the same
order, and since two cyclic groups of the same order are isomorphic.
Proposition 9.22:
Let and be isomorphic groups and .
Then and have the same number of elements of
order .
Proof.
By Proposition 9.18, an isomorphism induces a
bijection between the set of elements of with order and the
set of elements of with order .
□
The idea of isomorphism gives an important tool for understanding
unfamiliar or difficult groups. If we can prove that such a group is
isomorphic to a group that we understand well, then this is a huge
step forward.
The following example of a group isomorphism was used for very
practical purposes in the past.
Example:
The logarithm function
is an isomorphism from the group of positive real numbers under
multiplication to the group of real numbers under addition (of course
we could use any base for the logarithms). It is a bijection since the
function is the inverse function, and the group
isomorphism property is the familiar property of logarithms that
Now, if you don’t have a calculator, then addition is much easier to
do by hand than multiplication, and people used to use “log tables”
to make multiplication easier. If they wanted to multiply two numbers,
they would look up the logarithms, add them and then look up the
“antilogarithm”.
In group theoretic language they were exploiting the fact that there
is an isomorphism between the “difficult” group
and the “easy” group
.
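The log-table trick is a one-liner in code. This sketch (my own, purely illustrative) multiplies two positive reals by adding their logarithms and then exponentiating, exactly the isomorphism described above.

```python
import math

def multiply_via_logs(x, y):
    """Multiply two positive reals using only log, addition and exp:
    exploiting the isomorphism log(x*y) = log(x) + log(y)."""
    return math.exp(math.log(x) + math.log(y))
```

Up to floating-point rounding, multiply_via_logs(12, 25) recovers 300, just as a log-table user would have computed it.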
Direct Product
In this section we’ll study a simple way of combining two groups to
build a new, larger, group. We will do that by generalising the idea of
the Cartesian product of two sets (Definition 6.1).
Definition 9.23:
Let and be (multiplicatively-written) groups. The direct product
of and is the Cartesian
product of the sets and , with the binary operation
for and .
Proposition 9.24:
The direct product of groups is itself a group.
Proof.
Associativity of follows from associativity of and
, since if and , then
If and are the identity elements of and , then
is the identity element of , since if
and , then
The inverse of
is
, since
□
Notice that for all aspects of the group structure, we simply apply
the corresponding idea in the first coordinate for and in the
second coordinate for . This is generally how we understand
, by considering the two coordinates separately, and if we
understand and , then is easy to understand. For
example, it is easy to see that, for any and any ,
so also taking powers in a direct product is just a matter of taking
powers of the coordinates separately.
Here are some very easy consequences of the definition.
Proposition 9.25:
Let and be (multiplicatively-written) groups, and let
be the direct product.
is finite if and only if both and are finite, in
which case .
is abelian if and only if both and are abelian.
If is cyclic then both and are cyclic.
Proof.
We go through all three statements.
This is just a familiar property of Cartesian products of sets (that was left as an exercise).
Suppose and are abelian, and let . Then
and so is abelian.
Suppose is abelian, and , then
and considering the first coordinates, , and so is
abelian. Similarly is abelian.
- Suppose is cyclic, and is a generator, so that every
element of is a power of . Let . Then
, so for some ,
and so . So every element of is a power of , so
is cyclic. Similarly is cyclic.
□
Proposition 9.26:
Let and be (multiplicatively-written) groups, and ,
elements with finite order. Then has
finite order equal to the lowest common multiple
Proof.
Let . Then if and only if and
, which is the case if and only if is divisible by both
and by . The least positive such is
, and so this is the order of
.
□
We can now decide precisely when the direct product of cyclic groups
is cyclic.
Theorem 9.27:
Let and be finite cyclic groups. Then is cyclic
if and only if .
Proof.
Let . Then and
. So by Proposition 9.26,
where the first inequality is an
equality if and only if and are coprime, and
the second inequality is an equality if and only if and .
Since , is cyclic if and only if it has an element of order , which by the argument above is true if and only if .
□
Since cyclic groups of the same order are isomorphic, the last theorem says that
if and only if .
Example:
and are not cyclic.
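Both Proposition 9.26 and Theorem 9.27 can be checked in small additive examples. This sketch (my own; it works in the direct product of the integers mod m and mod n, written additively) computes the order of a pair by repeated addition.

```python
from math import gcd, lcm

def order_in_product(a, b, m, n):
    """Order of the element (a, b) in the direct product of the
    integers mod m and mod n under coordinate-wise addition."""
    x, y, k = a % m, b % n, 1
    while (x, y) != (0, 0):
        x, y, k = (x + a) % m, (y + b) % n, k + 1
    return k
```

With m = 2 and n = 3 (coprime), the element (1, 1) has order lcm(2, 3) = 6, so the product is cyclic; with m = n = 2 every non-identity element has order 2, giving a Klein 4-group with no generator.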
Definition 9.28:
A Klein 4-group is a group of order such that every
element except the identity has order .
History:
Felix Klein, German mathematician (1849 - 1925) applied ideas from group theory
to geometry, which led to the emergence of transformation groups. These were some
of the first examples of infinite groups (beyond the groups , ,
which weren’t really studied as groups).
Klein wrote about the Klein 4-group, although he called it Vierergruppe
(meaning “four-group”), as it is the smallest group which is not cyclic.
We can clearly extend the definition of a direct product of two groups
to three, four or more groups and make sense of things like , which would be a group whose elements are the ordered
triples with , and .
We can then write the last example as
More generally, if , where
are distinct primes, then
so that every finite cyclic group is isomorphic to a direct product of
groups with order powers of primes.
We’ll finish this section with a different example of a
familiar group that is isomorphic to a direct product.
Proof.
Define a map
by . This is clearly a bijection, and
so that is an isomorphism.
□
Modular arithmetic and Lagrange
We finish this course by linking several threads together. We will explore Lagrange’s theorem, an important theorem
of group theory (which was actually first formulated long before group theory was developed). We will then see how this theorem has applications to the
study of the integers (which was one of our earlier threads).
First though, we need to take a step back and explore the concept of partitioning a set, and saying when objects are “basically the same”.
Equivalence relations and partitions of sets
There are many situations in maths where we have different objects that we want to “consider the same”, often because they share a desired property. The requirement
that two objects are exactly equal is often too restrictive, so we generalise this concept to be broader (this theory is applied to many branches of mathematics).
Definition 10.1:
A relation on a nonempty set is a subset . We say is related to , denoted by , when .
A relation is reflexive if for all , .
A relation is symmetric if for all we have implies .
A relation is transitive if for all , if and then .
Definition 10.2:
A relation on a nonempty set is an equivalence relation if it is reflexive, symmetric and transitive.
In such a case we read as “ is equivalent to ”.
Etymology:
Reflexive comes from the Latin re meaning “back” and flectere meaning “to bend”. This gave rise to words like “reflect” (where an object is bent back through a mirror) and “reflection”. A relationship is reflexive if it reflects every element (i.e., every element is in relation with itself).
Symmetric comes from the Greek sun meaning “together with” and metron meaning “a measure”. Given a point “together with a distance (from a line of symmetry)”, then you
have found the image of the point using that symmetry.
We’ve seen the word transitive before ( (O2) ) - transitive comes from the Latin trans meaning “across, beyond” and the verb itus/ire meaning “to go”. A transitive relationship is one where knowing and
allows us to go across, from all the way to .
Equivalence comes from the Latin aequus meaning “even, level” and
valere meaning “to be strong, to have value”. Two elements are equivalent if they are level (i.e.,
equal) in their value. One can be substituted for the other (notice this has the same roots as the word equal).
Example:
Since we claim that equivalence relations are a generalisation of , let us show that is indeed an equivalence relation.
Define a relation on via if and only if . We show is an equivalence relation.
Let . Since , we have . This is true for all , hence is reflexive.
Let be such that . By definition, this means , which means , i.e. . This is true for all , hence is symmetric.
Let be such that and . This means and , so , i.e. . So . This is true for all , hence is transitive.
Since
is reflexive, symmetric and transitive, we have that
is an equivalence relation.
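When the relation is given explicitly as a set of pairs on a finite set, the three defining properties can be checked mechanically. This sketch (my own; the function name is an assumption) follows Definition 10.2 directly.

```python
def is_equivalence(X, R):
    """Check whether R (a set of ordered pairs on the finite set X)
    is reflexive, symmetric and transitive."""
    reflexive = all((a, a) in R for a in X)
    symmetric = all((b, a) in R for (a, b) in R)
    transitive = all((a, d) in R
                     for (a, b) in R for (c, d) in R if b == c)
    return reflexive and symmetric and transitive
```

Equality on a finite set passes all three checks, whereas the relation "less than or equal to" fails symmetry, matching the examples above.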
Example:
Define a relation on via if and only if . We have that is not an equivalence relation because it is not symmetric.
Indeed, taking and , we have , so , but , so . (The relation is, however, reflexive and transitive.)
Example:
We define a relation on the set of all subsets of via if and only if there exists a bijection . Remark 7.2 proved that is reflexive, symmetric and transitive. Hence is an equivalence relation (and this is the reason why we could define cardinality in a way that makes sense).
Example:
Define a relation on the set of groups via if and only if is isomorphic to . This is reflexive (since is isomorphic to itself via the identity function), Proposition 9.15 shows that is symmetric, and Proposition 9.17 proved that is transitive. So is an equivalence relation (and hence the rationale for why we care about groups “up to isomorphism”).
Definition 10.3:
Suppose ∼ is an equivalence relation on a (nonempty) set A. For a ∈ A, we define the equivalence class of a, denoted by [a], to be [a] = {x ∈ A : x ∼ a}.
Example:
Let and define a relation on via if and only if either or . One can check this is an equivalence relation
(it takes a bit of work to show is transitive). We have the following equivalence classes:
Hence, we have three distinct equivalence classes, namely , and .
Notice how in the above example the distinct equivalence classes did not intersect each other. This is true in general.
Proposition 10.4:
Suppose ∼ is an equivalence relation on a (nonempty) set A. Then for any a, b ∈ A, [a] ≠ [b] if and only if [a] ∩ [b] = ∅.
Proof.
Take a, b ∈ A.
(⇒). First, we will show that if [a] ≠ [b] then [a] ∩ [b] = ∅, by using the contrapositive. That is, we will prove that if [a] ∩ [b] ≠ ∅, then [a] = [b].
So suppose that [a] ∩ [b] ≠ ∅. Then there exists some x ∈ [a] ∩ [b]. Hence, x ∈ [a], so x ∼ a. Similarly, x ∈ [b], so x ∼ b. Since ∼ is symmetric, we have that a ∼ x.
Since ∼ is transitive, we have that a ∼ b. Now, choose y ∈ [a]. Then y ∼ a, and since a ∼ b and ∼ is transitive, we have that y ∼ b. Hence, y ∈ [b]. Since y ∈ [a] is arbitrary, we have that [a] ⊆ [b]. Similarly, we can show that, for any z ∈ [b], we have z ∈ [a], so [b] ⊆ [a]. Hence [a] = [b].
(⇐). Second, we will show that if [a] ∩ [b] = ∅ then [a] ≠ [b], by using the contrapositive. So we will show that if [a] = [b], then [a] ∩ [b] ≠ ∅.
So suppose that [a] = [b]. Since ∼ is reflexive, we know that a ∼ a. Hence, a ∈ [a] = [b], so a ∈ [a] ∩ [b] and [a] ∩ [b] ≠ ∅.
Summarising, we have that
[a] ≠ [b] if and only if [a] ∩ [b] = ∅.
□
The above set-up leads us to the following definition which breaks a set into “parts”.
Definition 10.5:
A partition of a non-empty set A is a collection {A_i : i ∈ I} of non-empty subsets of A such that
for all x ∈ A, there exists i ∈ I such that x ∈ A_i, and
for all i, j ∈ I, if A_i ≠ A_j, then A_i ∩ A_j = ∅.
The above remark and Proposition 10.4 suggest a strong link between equivalence relations and partitions of sets, as we explore in the next two theorems.
Theorem 10.6:
Suppose ∼ is an equivalence relation on a (non-empty) set A. Then
{[a] : a ∈ A} is a partition of A.
Proof.
Take a ∈ A. Then a ∈ [a]. Hence, every element of A is in one of the equivalence classes in {[a] : a ∈ A}. Furthermore, by Proposition 10.4 we have
that [a] ≠ [b] if and only if [a] ∩ [b] = ∅, i.e. (by the contrapositive) [a] = [b] if and only if [a] ∩ [b] ≠ ∅, as required.
□
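For a finite set, Theorem 10.6 can be seen in action: grouping elements by an equivalence relation produces classes that cover the set and are pairwise disjoint. A sketch in Python, using congruence modulo 3 as an illustrative relation (our own choice of example):

```python
def equivalence_classes(elements, related):
    """Group a finite set into equivalence classes of `related`
    (assumed to be an equivalence relation)."""
    classes = []
    for x in elements:
        for cls in classes:
            if related(x, cls[0]):   # compare with a class representative
                cls.append(x)
                break
        else:
            classes.append([x])      # x starts a new class
    return classes

# Congruence modulo 3 on {0, ..., 8} gives three classes:
classes = equivalence_classes(range(9), lambda x, y: (x - y) % 3 == 0)
print(classes)  # [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
# The classes cover the set and are pairwise disjoint -- a partition:
print(sorted(sum(classes, [])) == list(range(9)))  # True
```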
Furthermore, given a partition of a set, we can construct an equivalence relation.
Theorem 10.7:
Suppose {A_i : i ∈ I} is a partition of a (nonempty) set A, for some indexing set I. For a, b ∈ A, define a ∼ b if and only if there exists an i ∈ I such that a, b ∈ A_i. Then ∼ is an equivalence relation on A.
Proof.
We show ∼ is reflexive. Take a ∈ A. Since {A_i : i ∈ I} is a partition of A, there exists some i ∈ I such that a ∈ A_i. Hence, a ∼ a.
We show ∼ is symmetric. Suppose a, b ∈ A are such that a ∼ b. Then there exists some i ∈ I such that a, b ∈ A_i. It follows that b, a ∈ A_i, so b ∼ a.
We show ∼ is transitive. Suppose that a, b, c ∈ A are such that
a ∼ b and b ∼ c. Then there exists some i ∈ I such that a, b ∈ A_i,
and there exists some j ∈ I such that b, c ∈ A_j. Hence, we have that b ∈ A_i and b ∈ A_j. Since {A_i : i ∈ I} is a partition, we must have that A_i = A_j. It follows that a, c ∈ A_i, so a ∼ c.
Summarising, we have that
∼ is an equivalence relation on
A.
□
Example:
Let us take the previous example further. Define a relation on via if and only if either or . This is an equivalence relation.
We have that partitions into three sets, , and .
Example:
We will construct ℚ from ℤ by partitioning the set ℤ × (ℤ \ {0}).
We define the relation ∼ on ℤ × (ℤ \ {0}) via (a, b) ∼ (c, d) if and only if ad = bc. We check this is an equivalence relation:
Let (a, b) ∈ ℤ × (ℤ \ {0}). Since ab = ba, we have that (a, b) ∼ (a, b) and ∼ is reflexive.
Let (a, b), (c, d) be such that (a, b) ∼ (c, d). Then we have ad = bc, which can be re-written as cb = da, so (c, d) ∼ (a, b) and ∼ is symmetric.
Let (a, b), (c, d), (e, f) be such that (a, b) ∼ (c, d) and (c, d) ∼ (e, f). Then we have ad = bc and cf = de. Multiplying the first equation by f we get adf = bcf, and using the second equation we can write it as adf = bde. Since we have d ≠ 0 (as (c, d) ∈ ℤ × (ℤ \ {0})) we can divide through by d to give af = be, which means that (a, b) ∼ (e, f) and ∼ is transitive.
Therefore ∼ is reflexive, symmetric and transitive, and so it is an equivalence relation.
We define elements of ℚ, which we denote a/b, to be the equivalence classes [(a, b)].
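The relation (a, b) ∼ (c, d) iff ad = bc can be tried out directly; Python's `Fraction` type from the standard library stores one canonical representative of each class, which is one way to see the construction in practice. A small sketch (illustrative pairs of our own choosing):

```python
from fractions import Fraction

# Pairs (a, b) with b != 0 stand for fractions a/b.
def related(p, q):
    """(a, b) ~ (c, d) if and only if a*d == b*c."""
    (a, b), (c, d) = p, q
    return a * d == b * c

# (1, 2), (2, 4) and (-3, -6) all represent the rational number 1/2:
print(related((1, 2), (2, 4)))     # True
print(related((1, 2), (-3, -6)))   # True
print(related((1, 2), (2, 3)))     # False

# Fraction reduces every pair to a canonical class representative:
print(Fraction(2, 4) == Fraction(1, 2))  # True
```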
History:
We will not try to summarise the interesting 2019 article by Amir Asghari on the history of equivalence relation (separating the definition from when the notion was first used), but we would recommend it as something worth reading just to showcase how ideas can be used before they are formalised, and how the mathematics we do today is different from how mathematics used to be done. The full reference is Asghari, A. Equivalence: an attempt at a history of the idea. Synthese 196, 4657–4677 (2019). https://doi.org/10.1007/s11229-018-1674-2
Congruences and modular arithmetic
We now use our work on equivalence classes to partition ℤ.
Recall that for any two integers a and n with n > 0, there exist a quotient q and a remainder r such that a = qn + r with 0 ≤ r < n.
Sometimes we only care about the remainder r, which leads us into the following definition.
Definition 10.8:
Let n ∈ ℕ. For a, b ∈ ℤ, we say that a is congruent to b modulo n, denoted by a ≡ b (mod n), if n | (a − b), i.e. if a − b = kn for some k ∈ ℤ.
Theorem 10.9:
Let n ∈ ℕ. Define a relation ∼ on ℤ via a ∼ b if and only if a ≡ b (mod n). Then ∼ is an equivalence relation.
Using Theorem 5.2, we can see that, given an integer n ≥ 1, every integer is congruent modulo n to a unique r such that 0 ≤ r ≤ n − 1. I.e., we partition ℤ into exactly n
congruence classes modulo n given by [0], [1], …, [n − 1], where [r] = {a ∈ ℤ : a ≡ r (mod n)}.
Notation:
We write ℤ_n for the set of congruence classes modulo n.
If we’re clear that we’re considering a as an element of ℤ_n, we’ll often simply write a for [a].
Both addition and multiplication make sense modulo n, by the following theorem, which is very useful in computations.
Theorem 10.10:
Fix n ∈ ℕ. Suppose that a, a′, b, b′ ∈ ℤ are such that a ≡ a′ (mod n) and b ≡ b′ (mod n). Then a + b ≡ a′ + b′ (mod n) and ab ≡ a′b′ (mod n).
Proof.
By assumption, we have that n | (a − a′) and n | (b − b′). Hence, for some k, l ∈ ℤ, we have that a − a′ = kn and b − b′ = ln. It follows that
(a + b) − (a′ + b′) = (a − a′) + (b − b′) = (k + l)n.
Since k + l ∈ ℤ, this means that n | ((a + b) − (a′ + b′)), so a + b ≡ a′ + b′ (mod n). Further, since a = a′ + kn and b = b′ + ln, we have that
ab = (a′ + kn)(b′ + ln) = a′b′ + (a′l + b′k + kln)n,
so ab − a′b′ = (a′l + b′k + kln)n. Since a′l + b′k + kln is an integer, we have that n | (ab − a′b′), so ab ≡ a′b′ (mod n).
□
Example:
We want to compute . We have . So . Hence
Further, we have , so Hence, we have that
It follows that
Another consequence of Theorem 10.10 is that we can “shift” the congruence classes by a given constant c, i.e. the congruence classes can be given by [c], [c + 1], …, [c + n − 1].
Theorem 10.11:
We have that (ℤ_n, +) is an abelian group.
Proof.
Addition on ℤ_n is a well-defined binary operation by Theorem 10.10. Addition is associative, since
([a] + [b]) + [c] = [(a + b) + c] = [a + (b + c)] = [a] + ([b] + [c]).
The identity element is [0] since
[a] + [0] = [a] = [0] + [a]
for any [a] ∈ ℤ_n. The inverse of [a] is [−a], since
[a] + [−a] = [0] = [−a] + [a].
So (ℤ_n, +) is a group. It is abelian since
[a] + [b] = [a + b] = [b + a] = [b] + [a]
for all [a] and [b] ∈ ℤ_n.
□
In fact, it is a cyclic group, since it is clear that [1] generates ℤ_n.
However, (ℤ_n, ×) is not a group, since although multiplication is
associative and has an identity element [1], not every element has
an inverse. For example, [0] never has an inverse, and in
ℤ_4, [2] does not have a multiplicative inverse.
Proposition 10.13:
In ℤ_n, [a] has a multiplicative inverse if and only if gcd(a, n) = 1.
Proof.
If gcd(a, n) = 1 then by Theorem 5.7 there exist u, v ∈ ℤ such that au + nv = 1. So
au = 1 − nv ≡ 1 (mod n),
so [a][u] = [1] in ℤ_n, and so [u] is a multiplicative
inverse of [a].
Conversely, if gcd(a, n) = d > 1, and if [a][u] = [1], then au ≡ 1 (mod n), so au − 1 = kn for some k ∈ ℤ. But d divides both a and n, so it divides au − kn = 1. But no integer d > 1 divides 1. So there is no [u] ∈ ℤ_n such that [u] is a multiplicative inverse of [a].
□
Note that the first part of the proof is constructive and allows us to solve
some congruence equations.
Example:
Find such that .
We find the inverse of in (which exists since ).
We find such that .
We have that
so and . So is the inverse of in .
Multiply both side of by , we get .
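The constructive step (writing au + nv = 1 via the extended Euclidean algorithm) can be sketched in Python; the specific values 3 and 7 below are our own illustrative choices, and recent Python versions also expose the inverse directly as `pow(a, -1, n)`:

```python
def extended_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) and a*u + b*v = g."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v

def inverse_mod(a, n):
    """Multiplicative inverse of a modulo n (requires gcd(a, n) = 1)."""
    g, u, _ = extended_gcd(a % n, n)
    if g != 1:
        raise ValueError("a is not invertible modulo n")
    return u % n

# 3 * 5 = 15 ≡ 1 (mod 7), so 5 is the inverse of 3 in Z_7:
print(inverse_mod(3, 7))  # 5
# Python's built-in pow gives the same answer:
print(pow(3, -1, 7))      # 5
```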
We can also look at solving a system of simultaneous congruence equations.
Theorem 10.14:
Let m, n ∈ ℕ be co-prime. For any a, b ∈ ℤ, there exists x ∈ ℤ such that
x ≡ a (mod m) and x ≡ b (mod n).
Furthermore, this x is unique modulo mn. That is, for y ∈ ℤ, we have that
y ≡ a (mod m) and y ≡ b (mod n)
if and only if
y ≡ x (mod mn).
Proof.
We will prove the existence of x and leave the uniqueness of x modulo mn as an exercise.
Since gcd(m, n) = 1, by Theorem
5.7 we know that there exist u, v ∈ ℤ
such that mu + nv = 1. Then
nv ≡ 1 (mod m) and mu ≡ 1 (mod n).
So let x = anv + bmu. Then
x ≡ anv ≡ a (mod m) and x ≡ bmu ≡ b (mod n).
□
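The construction in the proof translates directly into code. A sketch in Python following the same steps, with illustrative moduli 3 and 5 of our own choosing:

```python
def extended_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) and a*u + b*v = g."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v

def crt(a, m, b, n):
    """Solve x ≡ a (mod m), x ≡ b (mod n) for coprime m, n, following
    the constructive proof: from m*u + n*v = 1, take x = a*n*v + b*m*u."""
    g, u, v = extended_gcd(m, n)
    if g != 1:
        raise ValueError("m and n must be coprime")
    return (a * n * v + b * m * u) % (m * n)

x = crt(2, 3, 3, 5)       # x ≡ 2 (mod 3) and x ≡ 3 (mod 5)
print(x, x % 3, x % 5)    # 8 2 3
```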
History:
The above theorem is often known as the Chinese Remainder Theorem.
An example of this theorem was first discussed by Sun Zi
(Chinese mathematician, around 400-460) in his text Sunzi suanjing (Sun Zi’s Mathematical Manual). While Sun Zi
came to the correct solution of the problem he posed in his text, using methods similar to those we would use nowadays,
there is no sign that he developed a general method. Instead it was Qin Jiushao (Chinese mathematician, 1202-1261)
who wrote in his text “Shushu Jiuzhang” how to solve simultaneous linear congruence equations.
The origin of the name “Chinese Remainder Theorem” is unclear; it arose in the west some time after an 1852 article by Wylie on the history of Chinese mathematics (however, he did not use this term).
Note that the above proof is a constructive proof, that is, it gives a way to find x.
Example:
We want to find such that and .
We have that , so we set , where and are such that . We have that
so
and
. Hence, we set
We double check that
indeed satisfies the given congruences:
Note that
such that
are also solutions.
One can use induction and the Fundamental Theorem of Arithmetic to prove the following generalisation of Theorem 10.14:
Let k ∈ ℕ with k ≥ 2, and let n_1, …, n_k ∈ ℕ be pairwise co-prime, that is, gcd(n_i, n_j) = 1 for i, j with 1 ≤ i, j ≤ k and i ≠ j. Then for any a_1, …, a_k ∈ ℤ, there exists some x ∈ ℤ such that
x ≡ a_i (mod n_i)
for all i with 1 ≤ i ≤ k. Further, this x is unique modulo n_1 n_2 ⋯ n_k.
One can also think about combining all these methods together to solve simultaneous
congruence equations where the coefficient in front of x is not 1.
Lagrange’s theorem
Let D_n be the dihedral group of order 2n, and consider the
orders of elements. Every reflection has order 2, which divides
2n. The rotation r has order n, which divides 2n. Every
other rotation has order dividing n, which divides 2n. In
this section we’ll show that this is a general phenomenon for finite
groups. In fact, we’ll prove a more general fact about orders of
subgroups (of course, this includes the case of the order of an
element g, since the order of g is equal to the order of the cyclic
subgroup generated by g).
History:
The theorem in question is named after Joseph-Louis Lagrange,
an Italian mathematician ( 1736 - 1813 ), who proved a special case
of the theorem in 1770 (long before abstract group theory existed).
Theorem 10.15: (Lagrange’s Theorem)
Let G be a finite group, and H a subgroup of G. Then
|H| divides |G|.
The idea of the proof is to partition the set G into subsets, each with the same number of elements as H, so that |G| is just the number of these subsets times |H|.
Definition 10.16:
For a (possibly infinite) group G, a subgroup H of G, and g ∈ G,
the left coset gH is the subset
gH = {gh : h ∈ H}.
The set of left cosets partition G.
Lemma 10.17:
For any group G and subgroup H, {gH : g ∈ G} is a partition of the set G.
Proof.
Clearly property 1
is satisfied: G = ⋃_{g ∈ G} gH because any g = ge ∈ gH.
So we just need to check property 2, that for x, y ∈ G we have xH ≠ yH if and only if xH ∩ yH = ∅; equivalently, that if xH ∩ yH ≠ ∅ then xH = yH.
Suppose xH ∩ yH ≠ ∅, and choose z ∈ xH ∩ yH. Then
z = xh = yh′
for some h, h′ ∈ H, so that
x = yh′h⁻¹. If
w = xh″ ∈ xH
then
w = yh′h⁻¹h″ ∈ yH
since h′h⁻¹h″ ∈ H. So every element
of xH
is in yH. Similarly every element of
yH is in xH, so
that
xH = yH.
□
Interest:
Given that we have a partition on the set G, we can construct an equivalence
relation on the set G and say x ∼ y if and only if there exists g ∈ G such
that x, y ∈ gH. With a bit of work, we can show that this is the same as
saying x, y are equivalent (with respect to/modulo H) if and only
if x⁻¹y ∈ H [you can check
this is an equivalence relation].
Compare this with the (infinite) group ℤ and (infinite)
subgroup nℤ. We partitioned ℤ
using the congruence classes modulo n,
and defined an equivalence relation
x ∼ y if and only if
x − y ∈ nℤ.
Next we verify that each left coset has the same cardinality.
Lemma 10.18:
Let H ≤ G and g ∈ G. Then there is a bijection f : H → gH, so that |gH| = |H|.
Proof.
Define f : H → gH by f(h) = gh. Then f is surjective,
since by definition every element of gH is of the form
gh for some h ∈ H. But also f is injective,
since if gh = gh′ then
h = g⁻¹gh = g⁻¹gh′ = h′.
□
Everything so far works for possibly infinite G, but now for finite G (and hence finite H) we can put everything together to prove Lagrange’s Theorem.
Proof (Proof of 10.15).
Suppose that k is the number of distinct left cosets gH.
By the two previous lemmas, the cosets partition G and each coset contains |H| elements. So the number of elements of G is |G| = k|H|.
□
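For a concrete finite group the whole argument can be watched happening. Below is a sketch in Python using the additive group ℤ₁₂ and the subgroup {0, 4, 8} generated by 4 (an illustrative choice of ours): the cosets partition the group, each has |H| elements, and |G| = k·|H|.

```python
n = 12
G = set(range(n))    # the additive group Z_12
H = {0, 4, 8}        # the subgroup generated by 4

# Left cosets g + H (written additively):
cosets = {frozenset((g + h) % n for h in H) for g in G}
print(sorted(sorted(c) for c in cosets))
# [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]

# Each coset has |H| elements, and |G| = (number of cosets) * |H|:
print(all(len(c) == len(H) for c in cosets))  # True
print(len(G) == len(cosets) * len(H))         # True
```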
Example:
Let G = D₃ be the dihedral group of order 6 and let H = ⟨r⟩ = {e, r, r²} be the cyclic subgroup generated by the rotation r. If
g ∈ H, then gH = H. If g ∉ H, so g = rⁱs for some i,
then gH = {s, rs, r²s}. So there are two left
cosets, H and sH. [In this case the right
cosets are the same as the left cosets.]
Example:
Let G = D₃ again, but let H = ⟨s⟩ = {e, s} be the
cyclic subgroup generated by the reflection s. Then
|G| = 6 = 3 × |H|,
so there are three left cosets. [In this case the right cosets are
different, since for example
Hr = {r, sr} = {r, r²s}, which is not
a left coset.]
Definition 10.19:
Let H ≤ G. Then the index [G : H] is the number (possibly
infinite if G is infinite) of left cosets of H in G.
Corollary 10.20: (Lagrange for orders of elements)
Let G be a finite group with |G| = n. Then for any g ∈ G, the
order of g divides n, and so gⁿ = e.
Proof.
By Lagrange’s Theorem, the order of the cyclic subgroup ⟨g⟩ generated by g divides n. But the order of this subgroup is just
the order of g, so the order of g divides n, and so gⁿ = e.
□
Some applications of Lagrange’s theorem
Lagrange’s Theorem gives most information about a group when the order
of the group has relatively few factors, as then it puts more
restrictions on possible orders of subgroups and elements.
Let’s consider the extreme case, when the order of the group is a
prime p, and so the only factors are 1 and p.
Theorem 10.21:
Let p be a prime and G a group with |G| = p. Then
G is cyclic.
Every element of G except the identity has order p and
generates G.
The only subgroups of G are the trivial subgroup {e} and
G itself.
Proof.
Let g ∈ G with g ≠ e. Then the order of g divides p by
Corollary 10.20, and the order is not 1 since g ≠ e. So the order of g is p. So the cyclic subgroup
generated by g has order p, and so must be the whole of
G. This proves a.) and b.).
Let H ≤ G. Then by Theorem
10.15, |H| divides
p = |G|,
and so either
|H| = 1, in which case H
is the trivial subgroup
{e},
or
|H| = p, in which case
H = G.
□
Corollary 10.22:
If p is prime and H and K are two subgroups of a group
G with |H| = |K| = p, then either H = K or H ∩ K = {e}.
Proof.
If H ∩ K ≠ {e} then choose g ∈ H ∩ K with g ≠ e. By
the previous theorem, g generates both H and K, so H = K.
□
Now some other simple general consequences of Lagrange’s Theorem.
Proposition 10.23:
Let G be a group and H, K two finite subgroups of G with
|H| = m, |K| = n, and gcd(m, n) = 1. Then H ∩ K = {e}.
Proof.
Recall that the intersection of two subgroups is itself a subgroup,
so that H ∩ K is a subgroup both of H and of K. Since it’s
a subgroup of H, Lagrange’s Theorem implies that |H ∩ K| divides
m. But similarly |H ∩ K| divides n. So since
gcd(m, n) = 1, |H ∩ K| = 1, and so H ∩ K = {e}.
□
Theorem 10.24:
Let G be a group with |G| = 4. Then either G is cyclic or G is
isomorphic to the Klein 4-group V. In particular
there are just two non-isomorphic groups of order 4, both abelian.
Proof.
By Corollary 10.20 the order of any element of G
divides 4, and so must be 1, 2 or 4.
If G has an element of order 4 then it is cyclic.
If not, it must have one element (the identity e) of order 1 and
three elements of order 2. So G = {e, a, b, c}, with
a² = b² = c² = e.
Consider which element is ab. If ab = e then b = a⁻¹ = a, which is
false, since a ≠ b. If ab = a then b = e, which is also false. If
ab = b then a = e, which is also false. So ab = c.
Similarly
ba = c,
ac = ca = b, and
bc = cb = a, and G
is isomorphic to
the Klein
4-group.
□
We’ll finish this section with some other results about groups of
small order that we won’t prove. These are easier to prove with a bit
more theory, which are all proved in the third year Group Theory
unit.
Theorem 10.25:
Let p be an odd prime. Then every group of order 2p is either
cyclic or isomorphic to the dihedral group D_p.
Theorem 10.26:
Let p be a prime. Every group of order p² is either cyclic or
isomorphic to ℤ_p × ℤ_p (and so in particular is abelian).
However there are non-abelian groups of order p³ for every prime
p. The dihedral group D₄ is one example for p = 2.
Theorem 10.27:
There are five groups of order 8 up to isomorphism. Three, ℤ₈,
ℤ₄ × ℤ₂ and ℤ₂ × ℤ₂ × ℤ₂, are abelian, and two,
the dihedral group D₄ and another group called the quaternion group Q₈, are non-abelian.
The first few orders not dealt with by the general theorems above are
12, 15, 16 and 20. It turns out that there are five non-isomorphic
groups of order 12, every group of order 15 is cyclic, there are
fourteen non-isomorphic groups of order 16, and five of order 20.
The number of non-isomorphic groups of order n grows very quickly
with n. There are 49,487,365,422 (nearly fifty billion)
non-isomorphic groups of order 1024.
Applications to number theory
We now use some of the tools from group theory to look back at questions
linked to modular arithmetic.
History:
It is worth noting that much of the following theory predates and motivated the development of abstract group theory, in particular the study of Abelian groups (from 1800). Alongside Symmetry groups and
Transformation groups (which we did not explore), Abelian groups
were concrete examples of groups that drove this development.
First, recall that (ℤ_n, ×)
is not a group as some elements (such as [0]) do not have a multiplicative
inverse (recall Proposition 10.13). Let us define the following group.
Definition 10.28:
ℤ_n^* is the subset {[a] ∈ ℤ_n : gcd(a, n) = 1} of ℤ_n.
Theorem 10.29:
(ℤ_n^*, ×) is an abelian group.
Proof.
Suppose [a], [b] ∈ ℤ_n^*, so gcd(a, n) = gcd(b, n) = 1. Then
gcd(ab, n) = 1,
and so [a][b] = [ab] ∈ ℤ_n^* and ℤ_n^* is closed under
multiplication.
Since
multiplication on ℤ_n is associative, multiplication on ℤ_n^* is associative.
[1] is an identity element, since [a][1] = [a] = [1][a] for any [a] ∈ ℤ_n^*.
If [a] ∈ ℤ_n^*, so gcd(a, n) = 1, then for integers u, v we have
au + nv = 1, and gcd(u, n) = 1. So [u] ∈ ℤ_n^* and [a][u] = [au] = [1], so [u] is an inverse of [a].
So
ℤ_n^* is a group under multiplication. It is abelian since
[a][b] = [ab] = [ba] = [b][a]
for all [a], [b] ∈ ℤ_n^*.
□
Interest:
The notation ℤ_n^* stands for “units”. In Algebra 2, you’ll see that a unit is
an element which has a multiplicative inverse, so ℤ_n^* is the group of units of
ℤ_n.
Similar links can be made with
ℚ and ℚ \ {0},
or with
ℝ and ℝ \ {0}.
Example:
If p is a prime, then ℤ_p^* = {[1], [2], …, [p − 1]} has order p − 1.
Example:
ℤ₅^* and ℤ₈^* both have order
4. ℤ₇^* and ℤ₉^* have
order 6.
Unlike (ℤ_n, +), the group ℤ_n^* is not always cyclic (although it sometimes is).
Example:
ℤ₅^* is cyclic, with generator 2: 2¹ = 2, 2² = 4, 2³ = 3, 2⁴ = 1, so ℤ₅^* = ⟨2⟩.
Example:
ℤ₈^* is not cyclic. We have 3² = 1, 5² = 1 and
7² = 1, so every element has order 2 apart from 1,
which has order 1. In particular there are no elements of order
4, so the group is not cyclic, and ℤ₈^* is a Klein 4-group.
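The orders of elements in these unit groups are easy to tabulate by brute force. A small sketch in Python (direct computation from the definitions; the moduli 5 and 8 match the examples above):

```python
from math import gcd

def units(n):
    """Elements of Z_n^*: residues a with gcd(a, n) = 1."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def order(a, n):
    """Multiplicative order of a in Z_n^*."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

print(units(5), units(8))               # [1, 2, 3, 4] [1, 3, 5, 7]
# Z_5^* is cyclic: 2 has order 4 = |Z_5^*|.
print(order(2, 5))                      # 4
# Z_8^* is not cyclic: no element has order 4.
print([order(a, 8) for a in units(8)])  # [1, 2, 2, 2]
```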
We can now apply Lagrange’s theorem to ℤ_p^* to get an important number
theory result.
Theorem 10.30: (Fermat’s Little Theorem)
Let p be a prime and a an integer not divisible by p. Then
a^(p−1) ≡ 1 (mod p).
Proof.
We’ll apply Corollary 10.20 to the group
ℤ_p^*. Since p is prime, ℤ_p^* has order
p − 1, and if a is not divisible by p then [a] ∈ ℤ_p^*. So by
Corollary 10.20,
[a]^(p−1) = [1] in ℤ_p^*, or in other words a^(p−1) ≡ 1 (mod p).
□
This simplifies calculating powers of integers modulo a prime:
Example:
Suppose we want to calculate . The straightforward
way to do this would involve multiplying by seven times
(although even without Fermat’s Little Theorem a more intelligent
approach would use many fewer multiplications). But by Fermat’s
Little Theorem with ,
and so
So without much calculation at all we reduce the problem to
calculating a tenth power instead of a hundredth power. And then, to
finish,
, so and
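The exponent-reduction trick can be checked numerically. A sketch in Python with the prime p = 11 and base 7 (illustrative numbers of our own choosing, since the specific values in the worked example above are not recorded here):

```python
p = 11  # an illustrative prime

# Fermat's Little Theorem: a^(p-1) ≡ 1 (mod p) for all a not divisible by p.
for a in range(1, p):
    assert pow(a, p - 1, p) == 1

# Exponents can therefore be reduced modulo p - 1 before computing:
a, e = 7, 100
print(pow(a, e, p) == pow(a, e % (p - 1), p))  # True
print(pow(7, 100, 11))  # 1, since 7^100 = (7^10)^10 ≡ 1^10 (mod 11)
```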
History:
Pierre de Fermat, French lawyer and government official (1601 - 1665), worked
on several number theory problems alongside his job. He stated the above theorem,
but did not write down a proof, claiming it would be too long
- he certainly was not aware of the group theory tools that would have provided a
short proof.
There is a generalization to powers modulo n where n is not prime,
but the statement is a bit more complicated, since |ℤ_n^*| is not just
n − 1. So first, let’s introduce some standard
notation for the order of this group.
Definition 10.31:
Euler’s phi function φ is the function from positive
integers to positive integers with φ(n) equal to the number
of integers a such that 1 ≤ a ≤ n and gcd(a, n) = 1.
Example:
If p is prime, then φ(p) = p − 1.
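The definition translates directly into a (slow but transparent) computation. A sketch in Python:

```python
from math import gcd

def phi(n):
    """Euler's phi function, computed directly from the definition:
    count 1 <= a <= n with gcd(a, n) = 1."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

print([phi(n) for n in range(1, 11)])  # [1, 1, 2, 2, 4, 2, 6, 4, 6, 4]
print(phi(7) == 7 - 1)                 # True: phi(p) = p - 1 for prime p
```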
Theorem 10.32: (Fermat-Euler Theorem)
Let a and n be integers with gcd(a, n) = 1. Then
a^φ(n) ≡ 1 (mod n).
Proof.
Apply Corollary 10.20 to the group ℤ_n^*. The
condition gcd(a, n) = 1 means that [a] ∈ ℤ_n^*, and
|ℤ_n^*| = φ(n), so the corollary gives
[a]^φ(n) = [1]
in ℤ_n^*, or in other words a^φ(n) ≡ 1 (mod n).
□
History:
Leonhard Euler, Swiss mathematician (1707 - 1783), wrote down a proof of
Fermat’s Little Theorem in 1736 (nearly 100 years after Fermat wrote down
the statement). He then extended Fermat’s theorem to all cases. While he
used modular arithmetic notation as we do today, it wasn’t until 1801 that
Gauss revisited the proof and gave a shorter and simpler version using group
theory ideas.
Many of the groups ℤ_n^* are direct products in disguise.
Proposition 10.33:
Let m, n be positive integers with gcd(m, n) = 1. Then ℤ_mn^* ≅ ℤ_m^* × ℤ_n^*.
Proof.
Define a map
f : ℤ_mn^* → ℤ_m^* × ℤ_n^*
by f([a]_mn) = ([a]_m, [a]_n) (where we use subscripts such as
[a]_m to indicate a congruence class modulo m). This makes sense,
since if gcd(a, mn) = 1, then gcd(a, m) = 1 and gcd(a, n) = 1.
By Theorem 10.14, we know given
[b]_m and [c]_n there exists
a ∈ ℤ such that a ≡ b (mod m) and a ≡ c (mod n). Furthermore, a is unique modulo mn.
Hence f is bijective [for all ([b]_m, [c]_n),
there exists a unique [a]_mn such that f([a]_mn) = ([b]_m, [c]_n)].
Since
f is bijective, and f([a]_mn [b]_mn) = f([a]_mn) f([b]_mn), f
is an isomorphism.
□
We have an immediate corollary
Corollary 10.34:
If gcd(m, n) = 1 then φ(mn) = φ(m)φ(n).
Example:
We calculate . Since and ,
we have .
So as by Fermat-Euler we have .
Hence .
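Both the multiplicativity of φ and the Fermat-Euler congruence are cheap to verify numerically. A sketch in Python, with the coprime pair 5 and 8 as our own illustrative choice:

```python
from math import gcd

def phi(n):
    """Euler's phi function, directly from the definition."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

# phi is multiplicative on coprime arguments:
print(phi(5 * 8) == phi(5) * phi(8))  # True: phi(40) = 4 * 4 = 16

# Fermat-Euler: a^phi(n) ≡ 1 (mod n) whenever gcd(a, n) = 1.
n = 40
for a in range(1, n):
    if gcd(a, n) == 1:
        assert pow(a, phi(n), n) == 1
print("Fermat-Euler verified for n =", n)
```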
Taming infinity
In Chapter 7 we saw the definition of cardinality; to be precise, for sets A and B, we defined |A| = |B| and |A| ≤ |B|, with a focus on finite sets. In this chapter, we will see that there are different sizes of infinity.
Countable
We first notice that there are some infinite sets where we can “count” the elements.
Definition 11.1:
We say a set A is countable if there exists a bijection f : ℕ → A, or equivalently, if there is a bijection g : A → ℕ.
Example:
We show that the set of positive even integers is countable.
Let E = {2k : k ∈ ℕ} and define f : ℕ → E by f(n) = 2n, for all n ∈ ℕ. To see that f is injective, suppose that n, m ∈ ℕ are such that f(n) = f(m). Then 2n = 2m, so n = m, showing that f is injective.
To see that f is surjective, take y ∈ E. Then y = 2k, for some k ∈ ℕ, and hence y = f(k). Hence, f is surjective. This shows that f is bijective. Therefore, E is countable.
Similarly, the set of odd positive integers,
{2k − 1 : k ∈ ℕ}, can be shown to be countable.
Proposition 11.2:
Let A and B be two sets such that |A| = |B|. If A is countable, then B is countable.
Proof.
Let |A| = |B| and suppose that A is countable. Then |ℕ| = |A|, so |ℕ| = |B|. Hence B is countable.
□
Theorem 11.3:
Every infinite set contains a countable subset.
Proof.
Non-examinable. It is beyond the scope of this course.
□
Proposition 11.4:
Let A be a subset of ℕ. Then A is either finite or countable.
Proof.
If A is finite then we are done. So suppose A is infinite. We have an injective map f : A → ℕ given by f(a) = a, so |A| ≤ |ℕ|.
On the other hand, we know that A contains a countable subset B. Hence, there exists a bijective map g : ℕ → B. Now, define h : ℕ → A by h(n) = g(n), for all n ∈ ℕ. Then h is an injective map from ℕ into A. Hence, |ℕ| ≤ |A|. So by Theorem 7.8, we have that |A| = |ℕ|. It follows that A is countable.
□
Example:
Since there exist infinitely many primes, the set of primes is countable.
The next result is very useful when proving that a set is countable.
Corollary 11.5:
Suppose A is an infinite set. Then A is countable if and only if there exists an injective f : A → ℕ.
Proof.
We prove both directions separately.
(⇒) First, suppose that A is countable. Then there exists a bijection f : A → ℕ. Since f is bijective, it is injective.
(⇐) Second, suppose there exists such an injective f : A → ℕ. Then |A| = |f(A)|, and f(A) ⊆ ℕ. Since A is not finite and |A| = |f(A)|, f(A) is not finite. Hence, f(A) is countable. Therefore, we must have that A is countable.
Summarising,
A is countable if and only if there exists an injective
f : A → ℕ.
□
Theorem 11.6:
Let A and B be sets. Suppose that A is a countable set and that B is a nonempty finite set disjoint from A (A ∩ B = ∅). Then A ∪ B is countable.
Proof.
Since A is countable, there exists an injective map f : A → ℕ. Since B is finite we have that B = {b₁, …, b_k}, for some k ∈ ℕ, where the bᵢ are distinct. Now, define g : A ∪ B → ℕ by
g(x) = i if x = bᵢ for some 1 ≤ i ≤ k, and g(x) = f(x) + k if x ∈ A.
We claim that g is injective. To see this, take x, y ∈ A ∪ B such that g(x) = g(y). If x, y ∈ B, then x = bᵢ and y = b_j for some i, j such that 1 ≤ i, j ≤ k, with i = g(x) = g(y) = j. Hence, we have that x = y. If x ∈ B and y ∈ A, then g(x) ≤ k < f(y) + k = g(y), so this case cannot occur. If x, y ∈ A, then f(x) = f(y), so x = y since f is injective. It follows that g is injective.
Since
A ⊆ A ∪ B
and A
is infinite, A ∪ B
is infinite. Since
g is injective and
A ∪ B is infinite,
A ∪ B is countable.
□
History:
The above proof was popularised by David Hilbert (Prussian mathematician, 1862-1943) with what is now called the “Hilbert hotel”, where there is always room for another guest:
The Hilbert hotel has countably many rooms,
labelled 1, 2, 3, … (so for each number in ℕ, there is a room with that number). One night, the hotel is full,
and another potential guest arrives at the hotel looking for a room. The manager says, no problem! Then the manager announces
to the guests that every guest is to move to the next room (so the guests in room n move to room n + 1). Thus all the guests
still have rooms, and room 1 has been made available to the new arrival.
Theorem 11.7:
We have that ℕ × ℕ is countable.
Proof.
First we note that ℕ × ℕ is infinite as ℕ × {1} ⊆ ℕ × ℕ, and
ℕ × {1} is countable (using the bijection f : ℕ → ℕ × {1} defined by f(n) = (n, 1)).
We define
g : ℕ × ℕ → ℕ
by
g(n, m) = 2ⁿ3ᵐ. Suppose
g(n, m) = g(n′, m′), then
2ⁿ3ᵐ = 2^(n′) 3^(m′).
Note that
2ⁿ3ᵐ > 1
(since
n, m ≥ 1), so by the Fundamental Theorem of Arithmetic,
2ⁿ3ᵐ is expressed uniquely as a product of primes. That is
n = n′
and
m = m′, so
(n, m) = (n′, m′). Hence
g is injective and
ℕ × ℕ is infinite, so
ℕ × ℕ is countable.
□
There are many proofs of this theorem. Above we presented one that uses the Fundamental Theorem of Arithmetic. Below we outline the idea of another proof that doesn’t use
the Fundamental Theorem of Arithmetic. We leave the rigorous details as an exercise.
We arrange the elements of in a grid:
We order the elements of this grid along the cross-diagonals:
So we want to construct a function h : ℕ × ℕ → ℕ such that h(n, m) is the position of (n, m) in this ordering. We leave it as an exercise to find a
general formula for h and show that h is injective.
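One standard formula for the cross-diagonal ordering (a solution to the exercise under the assumption that ℕ starts at 1; other indexing conventions shift the formula slightly) can be checked in Python:

```python
def pair(m, n):
    """Position of (m, n) when N x N (N starting at 1) is listed along
    cross-diagonals: (1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ..."""
    d = m + n - 2                  # index of the diagonal containing (m, n)
    return d * (d + 1) // 2 + m    # earlier diagonals hold 1 + 2 + ... + d entries

print(pair(1, 1), pair(1, 2), pair(2, 1), pair(1, 3), pair(2, 2), pair(3, 1))
# 1 2 3 4 5 6

# pair is injective on a large finite grid:
values = [pair(m, n) for m in range(1, 50) for n in range(1, 50)]
print(len(values) == len(set(values)))  # True
```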
The following lemma lists the basic properties of countable sets.
Lemma 11.8:
Suppose that A is a countable set.
if B ⊆ A, then B is either finite or countable.
if B is a finite set, then A ∪ B is countable.
there exist A₁, A₂ ⊆ A such that A₁ ∩ A₂ = ∅ and A₁ and A₂ are countable.
if f : B → A is injective, then B is either a finite set or a countable set.
Using the above lemma, and the fact that is countable, we can deduce:
Corollary 11.9:
We have that ℕ ∪ {0}, ℤ and ℚ are all countable. In particular |ℕ| = |ℤ| = |ℚ|.
Corollary 11.10:
Let A and B be countable sets. Then
A × B is countable.
if A ∩ B = ∅, then A ∪ B is countable.
Note that by repeated application of the above lemma, we can see that for countable A and for any k ∈ ℕ, we have Aᵏ (the Cartesian product of k copies of A) is countable.
Corollary 11.11:
Let {A_n : n ∈ ℕ} be a (countable) collection of countable sets that are pairwise disjoint (i.e. A_n ∩ A_m = ∅ for all n ≠ m).
Then A = ⋃_{n ∈ ℕ} A_n is countable.
Proof.
First, note that since A₁ is countable, we have that it is infinite. Since A₁ ⊆ A, we know that A is also infinite. We now construct an injective f : A → ℕ × ℕ.
For each
n ∈ ℕ, enumerate the elements of A_n
as
a_{n,1}, a_{n,2}, a_{n,3}, …. Now, define
f
by
f(a_{n,m}) = (n, m)
for all
n, m ∈ ℕ. To see that
f
is injective, suppose that
f(a_{n,m}) = f(a_{n′,m′})
for some
n, m, n′, m′ ∈ ℕ. Then
(n, m) = (n′, m′), so
n = n′,
m = m′, and hence
a_{n,m} = a_{n′,m′}. It follows that
f
is injective. Hence, we have an injective function from
A
into a countable set. Since
A is infinite, it follows that
A is countable.
□
Uncountable
Having explored countable sets, we look at whether there exist sets that are neither finite nor countable.
Definition 11.12:
A set is uncountable if it is infinite but not countable.
We will prove that the set [0, 1) is uncountable. First we need to set up some notation. Let us assume that every real number x between 0 and 1
has a decimal expansion of the form
x = 0.a₁a₂a₃…
with aᵢ ∈ {0, 1, …, 9} for each i.
Note however that this representation is not unique. Indeed, we have 0.1000… = 0.0999…, and in general let x have a decimal expansion ending in recurring 9s.
That is, there exists N such that a_N ≠ 9 and aᵢ = 9 for all i > N. Then, using the results on geometric series,
we have:
x = 0.a₁a₂…a_{N−1}b000…,
where b = a_N + 1.
Therefore, we will take for granted that every x such that 0 ≤ x < 1 can be uniquely expressed as x = 0.a₁a₂a₃… with aᵢ ∈ {0, 1, …, 9} and
the expansion not ending in an infinite sequence of 9s
(i.e., there is no N such that aᵢ = 9 for all i > N).
Theorem 11.13:
The interval [0, 1) is uncountable.
Proof.
We know the interval [0, 1) is infinite, since
f : ℕ → [0, 1) defined by f(n) = 1/(n + 1) is easily shown to be injective.
For the sake of contradiction,
suppose
[0, 1)
is countable. Thus we can enumerate the elements of
[0, 1)
as
x₁, x₂, x₃, …. Write each
xₙ
as a decimal expansion as described above:
xₙ = 0.a_{n,1}a_{n,2}a_{n,3}…,
where
a_{n,i} ∈ {0, 1, …, 9}
and the expansion does not end in an infinite sequence of 9s.
For each
n ∈ ℕ, set
bₙ = 5 if a_{n,n} ≠ 5, and bₙ = 4 if a_{n,n} = 5.
Set
y = 0.b₁b₂b₃…. Thus
y ∈ [0, 1)
with
bᵢ ∈ {4, 5} for all i,
and the expansion of y does not end in an infinite sequence of 9s.
Hence by assumption,
y = xₘ
for some
m ∈ ℕ. But
bₘ ≠ a_{m,m}, contradicting the uniqueness of the representation
of y
as a decimal expansion not ending in an infinite sequence of 9s.
Thus the assumption that the interval
[0, 1)
is countable leads to a contradiction, so
[0, 1)
must be uncountable.
□
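The diagonal construction can be watched on a finite list of expansions. A sketch in Python (the digit strings below are arbitrary illustrative data; the digits 4 and 5 are used exactly as in the proof, so the result never ends in recurring 9s):

```python
def diagonal_missing(rows):
    """Given a finite list of digit strings, build a string that differs
    from the k-th string in its k-th digit -- the diagonal construction."""
    return "".join("5" if row[k] != "5" else "4"
                   for k, row in enumerate(rows))

rows = ["1415926", "7182818", "4142135", "5772156"]
d = diagonal_missing(rows)
print(d)  # 5555 -- differs from rows[k] in position k for every k
print(all(d[k] != rows[k][k] for k in range(len(rows))))  # True
```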
History:
The above theorem is sometimes referred to as Cantor’s diagonalisation argument.
While it was not the first proof that Cantor published to show is uncountable,
it showcases a technique that was used subsequently in many other proofs by other mathematicians.
Corollary 11.14:
We have that ℝ is uncountable.
Proof.
Since [0, 1) ⊆ ℝ, we have |[0, 1)| ≤ |ℝ|. Since [0, 1) is uncountable, we have |ℕ| < |[0, 1)| ≤ |ℝ|. Hence ℝ is
uncountable.
□
While we have shown |ℕ| < |ℝ|, the following theorem shows that in fact |(0, 1)| = |ℝ|.
Theorem 11.15:
There is a bijection between the interval (0, 1) and ℝ.
In a way, this highlights another difference between ℤ, ℚ and ℝ. In ℝ we can have a bounded subset which is uncountable. However in ℚ
a bounded subset is either finite or countable, while in ℤ a bounded subset must be finite.
Recall that we have shown that given any a, b ∈ ℝ such that a < b there exists a bijection f : (a, b) → (0, 1). In particular
we have that |(a, b)| = |(0, 1)| for all a, b such that a < b. No matter how “small” (length-wise) we take (a, b) to be, as a set it is extremely large, i.e.
uncountable.
If we combine this with the fact that if A is countable then A ∩ (a, b) is countable for all a < b, we have strange results like “for any a < b,
the set (a, b) \ ℚ is uncountable”.
Interest:
Since we have shown |ℕ| < |ℝ|, a natural question is “does there exist a set A such that |ℕ| < |A| < |ℝ|?”. This was one of the 23
problems set by Hilbert at the turn of the 20th century, and is known as the continuum hypothesis. It turns out that this is a subtle question without a yes/no answer: Gödel and Cohen showed that the two statements “the continuum hypothesis is true” and “the continuum hypothesis is false” are both consistent with the standard (ZFC) axioms of mathematics.
Power sets
We finish this section by showing that there are infinitely many different types of infinities.
Definition 11.16:
Let A be a set. We define the power set of A, denoted by P(A), to be the set
P(A) = {B : B ⊆ A}.
Example:
We have the following examples:
Note that, for any nonempty set A, we know that ∅ and A are distinct subsets of A. Hence, we have that |P(A)| ≥ 2. We have the following two results, whose proofs are left as an exercise.
Lemma 11.17:
Suppose that A is a finite set with |A| = n for some n ∈ ℕ ∪ {0}. Then |P(A)| = 2ⁿ.
Proposition 11.18:
Let be sets. Then
Theorem 11.19:
Let A be a set. Then |A| < |P(A)|.
Proof.
Suppose A = ∅. Then P(A) = {∅}, so |A| = 0 < 1 = |P(A)|.
So suppose A is nonempty, and define f : A → P(A) by f(a) = {a}, for all a ∈ A. We show that f is injective. Let a, b ∈ A be such that f(a) = f(b). Then {a} = {b}. Hence, we have that a = b. Therefore, f is injective, so |A| ≤ |P(A)|.
Next, we use contradiction to show that there exists no bijection between A and P(A). Suppose there exists such a bijection g : A → P(A). Note that
for every a ∈ A, g(a) ∈ P(A) means g(a) ⊆ A. Define
B = {a ∈ A : a ∉ g(a)}.
Then B is a subset of A, so B ∈ P(A). Furthermore, since g is bijective, there exists some b ∈ A such that g(b) = B.
By the definition of B, we have b ∈ B if and only if b ∉ g(b), which is a contradiction (namely b ∈ B if and only if b ∉ B). Therefore, our assumption that there exists a bijection g : A → P(A) is false. So |A| ≠ |P(A)|.
It follows that |A| < |P(A)|.
□
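For finite sets both Lemma 11.17 and the diagonal set B from the proof can be computed explicitly. A sketch in Python (the map g below is an arbitrary example of ours; the proof shows the same B is missed whatever g is):

```python
from itertools import chain, combinations

def power_set(A):
    """All subsets of the finite set A."""
    A = list(A)
    return [set(c) for c in chain.from_iterable(
        combinations(A, r) for r in range(len(A) + 1))]

A = {0, 1, 2}
P = power_set(A)
print(len(P) == 2 ** len(A))  # True: |P(A)| = 2^|A| for finite A

# The diagonal set B = {a : a not in g(a)} is missed by the map g:
g = {0: {0, 1}, 1: set(), 2: {0, 2}}   # an example map A -> P(A)
B = {a for a in A if a not in g[a]}
print(B, all(g[a] != B for a in A))    # {1} True -- B is not in the image of g
```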
History:
The above theorem is sometimes known as Cantor’s theorem. For a finite set, the result is clear, and Cantor re-used the diagonalisation method to prove the above for infinite
sets.
Corollary 11.20:
We have:
We can extend our last bullet point to see that |ℕ| < |P(ℕ)| < |P(P(ℕ))| < ⋯, i.e. there are infinitely many different types of infinities.