3 Week 3: Coding in R: An introduction
3.1 Overview
This week you are going to learn about coding simple things using R. We are not going to go into too much detail, because the tidyverse actually negates the need to use a lot of typical R coding. As such, the relevant book chapter for this week doesn’t go into a lot of detail, so I will provide some additional material for you to work through too.
The chapter guides you through some very basic elements of coding, introducing things such as objects (also known as “variables”, a term I actually prefer but won’t use here…), naming conventions, and some simple functions.
In the additional material I’ve provided, I show you a few other things you can use R for, even though we may not use these on the course (because the tidyverse bypasses the need to use them). I just think they are useful things to know about to add to your growing repertoire of R skills.
Warning. The chapter mentions in passing that it’s important to adhere to certain style conventions when writing R code. Sometimes it’s easy to be a little lazy and not bother writing code in the proper style. This is understandable, because the code runs fine even if you don’t use the correct style.
However, I strongly urge you to pay attention to style. As Hadley Wickham says in his book:
Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.
Remember that the purpose of learning R is to enhance reproducibility of your data processing, analysis, and visualisation; if your code is difficult and/or impossible to read then reproducibility is impaired.
3.2 Reading
- All sections from Chapter 4 of R4DS.
- This will introduce you to some coding basics in R.
- Note that this section doesn’t go into a lot of detail, so there will be some additional things you need to go through written below…
3.2.1 Arithmetic operators
In the chapter you saw that R can be used for arithmetic. Let’s explore this a little further. Most of this should be intuitive:
In R, you can do addition…
14 + 32
## [1] 46
you can also do subtraction…
9 - 7
## [1] 2
…multiplication…
2 * 8
## [1] 16
…division…
19 / 2
## [1] 9.5
…and any and all of the above:
(3 * 3) + (4 - 1) / 2
## [1] 10.5
Be sure to remember the order of operations when you have multiple operations in your code. For example, the two entries below produce different output, even though the numbers are the same:
(3 * 3) + (4 - 1) / 2
## [1] 10.5
3 * 3 + 4 - 1 / 2
## [1] 12.5
You can also raise numbers to a certain power. For example, if you want to know what 3 cubed is (\(3 ^ {3}\), which is 3 \(\times\) 3 \(\times\) 3), type
3 ^ 3
## [1] 27
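Powers also have a place in the order of operations: R evaluates them before multiplication and division. The short sketch below uses made-up numbers purely for illustration, and shows how brackets change the result:

```r
# Powers are evaluated before multiplication
2 * 3 ^ 2      # same as 2 * (3 ^ 2)
## [1] 18

# Brackets force the multiplication to happen first
(2 * 3) ^ 2
## [1] 36
```

If you are ever unsure which part of a calculation R will do first, adding brackets costs nothing and makes your intention clear to anyone reading the code.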
3.2.2 Logical operators
Sometimes it is useful to compare values by testing whether a number is smaller than / larger than / equal to / not equal to another number. The examples that follow appear too simple to be needed in practical work, but the use of these so-called logical operators becomes very handy later in the book.
We can test whether 4 is lower than 10 by typing
4 < 10
## [1] TRUE
Note that the output is not a number like before, but a word indicating that the comparison is TRUE. In the above, we are asking R “Is 4 lower than 10?”, and R responds yes, that is TRUE. Let’s see R reply FALSE:
4 < 2
## [1] FALSE
We can test whether a number is greater than another number:
10 > 4
## [1] TRUE
We can also test whether a number is less than or equal to another number. Here are two examples to demonstrate the importance of the or equal component:
4 <= 5
## [1] TRUE
4 <= 4
## [1] TRUE
Can you guess what we would type to test whether a number is greater than or equal to another number? You guessed it!
10 >= 9
## [1] TRUE
10 >= 10
## [1] TRUE
Sometimes you just want to test whether two numbers are exactly equal to each other. You would think that you use the = symbol, but this is one of those quirks in R I referred to in the introduction. Instead, you use two equals symbols:
4 == 4
## [1] TRUE
4 == 12
## [1] FALSE
If you want to check whether two numbers are not equal to each other, just use !=:
4 != 3
## [1] TRUE
4 != 1000
## [1] TRUE
4 != 4
## [1] FALSE
3.2.3 Objects
The chapter introduced objects, and these are incredibly important in R.
An object allows us to store some information in such a way that we can refer to it using a useful name. Think of it as a storage space that has a named location. We can then refer to what is in that storage space by using the name of the space. For example, if we want to assign the value 4 to the object called x, we use the <- symbol in the following way (just type the “less than” sign, <, and the “minus” sign, -, with no space between them):
x <- 4
This can be read as “x is assigned the value 4”.
Tip. In the R4DS book, Hadley urges you to use <- for assigning values to an object rather than =, despite the fact that both work. This is something that divides the R community. But take it from me, you will save yourself some major headaches in the future if you stick to using <-, so just get used to it now.
Note that R produced no output when you did this assignment. That’s because you didn’t perform an operation, you just stored something away in an object. If you want to see what is stored in an object, just type the name into the console:
x
## [1] 4
If the object contains a number, you can perform arithmetic and logical operations using that object name. For example, try the following in your console:
x + 1
x * 4
x / 2
x < 5
x > 5
x == 4
x * x
x ^ x
Objects can be used to store the result of an operation. For example, if you want to store the result of 12 divided by 4 in an object called my_result, type
my_result <- 12 / 4
my_result
## [1] 3
They can also be used to store logical values and text:
logical_test <- 4 > 0
logical_test
## [1] TRUE
text_test <- "hello!"
text_test
## [1] "hello!"
Objects can be given almost any name, but the name must start with a letter and cannot contain spaces. If you want to have multiple words in your object name, best practice is to join them with an underscore (_). For example:
long_object_name <- 4 * 12
long_object_name
## [1] 48
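To make the naming rules concrete, here is a minimal sketch. All of the object names and values are invented for illustration. The two commented-out lines break the rules and would produce an error if you ran them. The last part shows that R treats upper- and lower-case letters as different, so it pays to be consistent:

```r
# Valid names: start with a letter, no spaces, words joined by underscores
reaction_time <- 450
reaction_time_2 <- 520

# These would cause an error if uncommented:
# 2nd_reaction_time <- 520   # starts with a number
# reaction time <- 450       # contains a space

# Names are case sensitive, so these are two different objects
score <- 10
Score <- 20
score == Score
## [1] FALSE
```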
Remember the names can be used to perform operations:
object_a <- 12 * 13
object_b <- 30
object_a / object_b
## [1] 5.2
3.2.4 Vectors
So far, we have looked at how R handles single numbers. But this is often of little use when we are doing statistics, where lots of numbers are involved. In this book, we will encounter vectors frequently. Don’t be put off by the term; it (basically) refers to a collection of objects (i.e., numbers, characters etc.). Vectors group objects together into one location.
For example, let’s say you want to record the shoe sizes of all of your family members (don’t ask me why; some people might want this information). These data could be stored as individual objects as we have already seen:
me <- 11
brother <- 10
mum <- 6
dad <- 9
However, we can also group these together and think of them as one “set” of data: the shoe sizes are 11, 10, 6, and 9. If we want to store this information in one object, we use a vector. In R, we can combine data into one object using the c() function, which stands for combine. We put the data we want into the brackets, separated by commas. So, our data can now be stored like this:
shoe_sizes <- c(11, 10, 6, 9)
shoe_sizes
## [1] 11 10  6  9
Each value in the vector can be referred to as an element. A neat property of numeric vectors, like the one above, is that we can perform arithmetic and logical operations on the whole vector at once, rather than having to do the operation once for each element in the vector.
For example, let’s say we want to multiply the shoe sizes by 2 (don’t ask me why!). Before, we would have had to do this operation on each object representing each element:
me * 2
## [1] 22
brother * 2
## [1] 20
mum * 2
## [1] 12
dad * 2
## [1] 18
Instead, we can perform this multiplication on all elements in the shoe_sizes vector at once:
shoe_sizes * 2
## [1] 22 20 12 18
How might we store this result in a new object?
new_shoe_sizes <- shoe_sizes * 2
new_shoe_sizes
## [1] 22 20 12 18
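Logical operations work on whole vectors in the same way as arithmetic. As a small sketch using the same shoe sizes (recreated here so the example runs on its own), each comparison is applied to every element in turn and returns one TRUE or FALSE per element:

```r
shoe_sizes <- c(11, 10, 6, 9)

# Which shoe sizes are larger than 9?
shoe_sizes > 9
## [1]  TRUE  TRUE FALSE FALSE

# Which shoe sizes are exactly equal to 10?
shoe_sizes == 10
## [1] FALSE  TRUE FALSE FALSE
```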
There’s a whole rabbit-hole we could go down right now regarding vector arithmetic in R. This won’t be useful for this module (and if you exclusively use the tidyverse for your analysis, you will likely never need to know about it). I just wanted to mention it in case you come across it; if you do, run!
We can also store characters in vectors. Let’s store the names of who the shoe sizes belong to:
names <- c("me", "brother", "mum", "dad")
names
## [1] "me"      "brother" "mum"     "dad"
You can’t have different “types” of object in the same vector. Note that you cannot combine numbers and characters in the same vector. R needs to know what “type” of object is contained within the vector, and needs to keep this consistent. For example, if you try the following:
mixed_vector <- c(1, 2, 3, 4, "one", "two", "three", "four")
mixed_vector
## [1] "1"     "2"     "3"     "4"     "one"   "two"   "three" "four"
you will note that R has turned the whole vector into characters. Bear this in mind when constructing vectors.
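If you are ever unsure what “type” a vector holds, one way to check is the built-in class() function. This is a small sketch rather than something the chapter covers; numeric_vector is a name made up for the comparison:

```r
numeric_vector <- c(1, 2, 3, 4)
mixed_vector <- c(1, 2, 3, 4, "one", "two", "three", "four")

# A vector of numbers only is stored as numeric
class(numeric_vector)
## [1] "numeric"

# Adding any text turns every element into a character
class(mixed_vector)
## [1] "character"
```

This matters in practice because arithmetic does not work on characters: once a single piece of text has sneaked into a vector of numbers, the numbers in it are stored as text too.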
3.3 Workshop Exercises
As before, we will start by covering the exercises contained in the R4DS chapter.
- Why does this code not work?
my_variable <- 10
my_var1able
- Tweak each of the following R commands so that they run correctly:
library(tidyverse)
ggplot(dota = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
fliter(mpg, cyl = 8)
filter(diamond, carat > 3)
- Press Alt + Shift + K. What happens? How can you get to the same place using the menus?
- Does the following code work for you? Is there anything wrong with it?
my_first_variable = 12
my_second_variable = 32
my_first_variable * my_second_variable
The following questions ask you to explore some functions built into R. Functions are the bread & butter of data processing and analysis, so it’s important we get used to using them here. We will only explore very basic functions here.
There are multiple ways to get help if you become a bit lost when using a function. If you are stuck using a function but you know its name, you can access the help file for that function by typing a question mark followed by the function name into the R console (no spaces):
?mean
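The question mark is a shortcut for the built-in help() function, so the two lines in the sketch below open the same help file. The third line uses args(), another built-in function, which prints only the arguments a function accepts; this can be handy when you just need a quick reminder:

```r
?mean          # opens the help file for the mean function
help("mean")   # does the same thing, written as an ordinary function call
args(mean)     # prints the arguments that mean accepts
```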
There are too many built-in functions in R to go through them now (let alone all of the functions included in any packages you install). No need to spend any time on this, but for a quick glance, a list of all functions included in R can be seen here: Link.
I told you there are loads!
- Create an object that holds 6 evenly spaced numbers starting at 2 and ending at 12. Create this manually (i.e., don’t use any functions).
- Repeat this step but now use a function to achieve the same result. (Tip: see the seq function.)
- You run an experiment testing the response time of 12 students. The mean response time for each participant is as follows (all in milliseconds): 597, 763, 614, 705, 523, 703, 858, 775, 759, 520, 505, 680. What was the mean response time of your sample? (Tip: You may want to first create an object to hold your data…)
- What was the median response time of your sample?
- Was the mean response time of your sample smaller than the median response time of your sample? How might you check this using logical operators?
- Run the following code. (Don’t worry what it does in detail, but basically, it creates a vector of 100 random numbers drawn from a normal distribution with a mean of 100 and a standard deviation of 20. If you aren’t sure what these terms are, especially if you are from a non-science background, don’t worry! We are just going to use the numbers it creates.) How might you find the minimum and the maximum value in the created vector?
random_numbers <- rnorm(n = 100, mean = 100, sd = 20)
- Run the summary function on your random numbers. What does this function return?
That’s probably enough for now! If you are thirsty for more, find new functions to play around with from the list above in the “reading” section and try them on new data!