Chapter 5 Vectors
5.1 Numeric Data
Let’s imagine that someone in the audience works for R, likes the look of this workbook, and decides to sign me up to write textbooks for them. Imagine that they then start to send me my sales data monthly. Let’s suppose I have 50 sales in June, 20 in July, 10 in August, and 70 in September, but then no other sales for the rest of the year.
Task: I want to create a variable, called monthly.sales, that stores this data. The first number should be 50, the second 20, and so on. We again want to use the combine function c() to help us do this. To create our vector, we should write:
monthly.sales <- c(50, 20, 10, 70, 0, 0, 0)
monthly.sales
## [1] 50 20 10 70 0 0 0
To summarise, we have created a single variable called monthly.sales, and this variable is a vector with 7 elements.
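If you want to check this for yourself, a minimal sketch along these lines should do it. It only uses base R functions and recreates the vector first so that it runs on its own.

```r
# Recreate the vector from the text
monthly.sales <- c(50, 20, 10, 70, 0, 0, 0)

length(monthly.sales)      # how many elements does the vector have? 7
class(monthly.sales)       # what kind of data does it hold? "numeric"
is.vector(monthly.sales)   # is it really a vector? TRUE
```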
So, now that we have our vector, how do we get information out of it? What if I wanted to know how many sales I made in August, for example? Since we started in June, August was the 3rd month of sales, so let’s try:
monthly.sales[3]
## [1] 10
It turns out that the numbers I received for the August sales were wrong, and I actually sold 100, not 10! How can I fix this in my monthly.sales variable? I could make the whole vector again, but that’s a lot of wasteful typing, given that I only need to change one value.
We can just tell R to change that one specific value:
monthly.sales[3] <- 100
monthly.sales
## [1] 50 20 100 70 0 0 0
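The same bracket notation can, as far as I know, replace several elements in one go. The sketch below works on a copy (the name sales.copy and the replacement numbers are made up purely for illustration), so the real sales data stays as it is.

```r
# The corrected vector from the text
monthly.sales <- c(50, 20, 100, 70, 0, 0, 0)

# Work on a copy; sales.copy and the new values are hypothetical
sales.copy <- monthly.sales
sales.copy[c(1, 2)] <- c(55, 25)   # overwrite the 1st and 2nd elements together

sales.copy      # the copy now starts with 55 and 25
monthly.sales   # the original is unchanged
```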
You could also use the edit() and fix() functions, but we won’t be covering these in this session. You should check them out in your own time.
You can also ask R to return multiple values at once by indexing. The simplest way to ask for elements is to provide their numeric positions in the structure (vector, list…) in a set of square brackets [ ] at the end of the object name. For example, say I wanted to know how many books I sold in each month from July (2nd element) to October (5th element). I would ask R:
monthly.sales[2:5]
## [1] 20 100 70 0
# equivalent to
monthly.sales[c(2, 3, 4, 5)]
## [1] 20 100 70 0
Notice that the order matters here. If I asked for the elements in reverse order, then R would output the data in reverse too.
monthly.sales[5:2]
## [1] 0 70 100 20
# equivalent to
monthly.sales[c(5, 4, 3, 2)]
## [1] 0 70 100 20
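Once you have pulled out a range of elements, you can hand it straight to another function. As a rough sketch, here is one way to total the sales from July to October using sum(); the variable name july.to.october is just one I made up for this example.

```r
# The corrected vector from the text
monthly.sales <- c(50, 20, 100, 70, 0, 0, 0)

# Pull out elements 2 to 5 (July to October) and store them
july.to.october <- monthly.sales[2:5]

sum(july.to.october)      # 20 + 100 + 70 + 0 = 190
sum(monthly.sales[5:2])   # the order changes the output, but not the total: 190
```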
Next I want to figure out how much money I’ll be making each month (given that the end of the year isn’t looking too good, I hope the next few months are!). Since I earn £5 per book, I can just multiply each element of monthly.sales by 5. Sounds pretty easy, and it is!
monthly.sales * 5
## [1] 250 100 500 350 0 0 0
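The result of that multiplication is itself a vector, so you can store it and index it like any other. A small sketch, assuming the £5 per book from above (the name monthly.earnings is my own choice):

```r
# The corrected vector from the text
monthly.sales <- c(50, 20, 100, 70, 0, 0, 0)

# Multiply every element by 5 and keep the result
monthly.earnings <- monthly.sales * 5

monthly.earnings[3]     # earnings for August: 500
sum(monthly.earnings)   # total for the year: 250 + 100 + 500 + 350 = 1200
```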
5.2 Text/Character Data
Although you will mostly be dealing with numeric data, this isn’t always the case. Sometimes, you’ll use text. Let’s create a simple variable:
greeting <- "hello"
greeting
## [1] "hello"
It is important to note the use of quotation marks here. They tell R to treat this as a “character” value: a string of characters, no matter how long. It can be a single letter, 'g', but it can equally well be a sentence, "Descriptive statistics can be like online dating profiles: technically accurate and yet pretty darn misleading."
Back to my R book example, I might want to create a variable that includes the names of the months. To do so, I could tell R:
months <- c("June", "July", "August", "September", "October", "November", "December")
In simple terms, you have now created a character vector containing 7 elements, each of which is the name of a month. Let’s say I wanted to know what the 5th month was. What would I type?
months[5]
## [1] "October"
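Everything we did with the numeric vector should carry over to character vectors too. Here is a short sketch using only base R functions: range indexing, length(), and nchar(), which counts the characters in each element.

```r
# The character vector from the text
months <- c("June", "July", "August", "September", "October", "November", "December")

months[2:4]      # "July" "August" "September"
length(months)   # 7
nchar(months)    # letters in each month name: 4 4 6 9 7 8 8
```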
5.3 Logical Data
A logical element can take one of two values, TRUE or FALSE. Logicals are usually the output of logical operations (anything that can be phrased as a yes/no question, e.g., is x equal to y?). In formal logic, TRUE is represented as 1 and FALSE as 0. R follows the same convention whenever a logical value is used as a number.
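You can see the 1 and 0 behind TRUE and FALSE by asking R to treat them as numbers. A quick sketch:

```r
as.numeric(TRUE)    # 1
as.numeric(FALSE)   # 0

# Arithmetic on logicals converts them to numbers first
TRUE + TRUE                 # 2
sum(c(TRUE, FALSE, TRUE))   # counts the TRUE values: 2
```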
If we ask R to calculate 2 + 2, it will always give the same answer:
2+2
## [1] 4
If we want R to judge whether something is a TRUE statement, we have to explicitly ask. For example:
2+2 == 4
## [1] TRUE
Using the equality operator == forces R to make a TRUE or FALSE judgement.
2+2 == 3
## [1] FALSE
What if we try to force R to believe some fake news (aka incorrect truths)?
2+2 = 3
## Error in 2 + 2 = 3: target of assignment expands to non-language object
R cannot be convinced that easily. A single = is an assignment operator, so R reads this line as an attempt to store the value 3 in 2+2. It understands that 2+2 is not a variable (“non-language object”), and it won’t let you change what 2+2 is. In other words, it won’t let you change the ‘definition’ of what 2+2 evaluates to.
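To make the difference between = and == concrete, here is a small sketch using a throwaway variable x (my own name, not one from the workbook). A single = stores a value, while == asks a question.

```r
x = 3     # single =: assignment, x now holds 3 (same as x <- 3)
x == 3    # double ==: comparison, TRUE
x == 4    # FALSE, and x is still 3

x <- 4    # the usual assignment arrow changes the value
x == 4    # TRUE
```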
There are several other logical operators that you can use, some of which are detailed in the table below.
Operation | R code | Example Input | Example Output |
---|---|---|---|
Less than | < | 1 < 2 | TRUE |
Greater than | > | 1 > 2 | FALSE |
Less than or equal to | <= | 1 <= 2 | TRUE |
Greater than or equal to | >= | 1 >= 2 | FALSE |
Equal to | == | 1 == 2 | FALSE |
Not equal to | != | 1 != 2 | TRUE |
Not | ! | !(1==1) | FALSE |
Or | \| | (1==1) \| (1==2) | TRUE |
And | & | (1==1) & (1==2) | FALSE |
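If you would like to try these operators yourself, the sketch below runs one small comparison per operator. The comments show what I would expect R to return for each.

```r
1 < 2                 # less than: TRUE
1 > 2                 # greater than: FALSE
1 <= 2                # less than or equal to: TRUE
1 >= 2                # greater than or equal to: FALSE
1 == 2                # equal to: FALSE
1 != 2                # not equal to: TRUE
!(1 == 1)             # not: flips TRUE to FALSE
(1 == 1) | (1 == 2)   # or: TRUE if at least one side is TRUE
(1 == 1) & (1 == 2)   # and: TRUE only if both sides are TRUE, so FALSE here
```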
Let’s apply some of these logical operators to our vectors. Let’s use our monthly.sales vector, and ask R when I actually sold a book:
monthly.sales > 0
## [1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE
I can then store this result in a new vector:
any.sales <- monthly.sales > 0
any.sales
## [1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE
To summarise, we have created a new logical vector called any.sales, whose elements are TRUE only if the sales for the corresponding month are > 0.
But this output isn’t very helpful, as a big list of TRUE and FALSE values doesn’t give me much insight into which months I’ve sold my book in.
We can use logical indexing to ask for the names of the months where sales are > 0. Since any.sales is already a logical vector, it can go straight inside the square brackets. Ask R:
months[any.sales]
## [1] "June" "July" "August" "September"
You can apply the same logic to find the actual sales numbers for these months too:
monthly.sales[monthly.sales > 0]
## [1] 50 20 100 70
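Two related tricks are worth a quick sketch here. The base R function which() returns the positions of the TRUE values, and the condition inside the brackets can be built from one vector while you index a different one, as long as both have the same length. The threshold of 70 below is just an arbitrary example.

```r
# The vectors from the text
monthly.sales <- c(50, 20, 100, 70, 0, 0, 0)
months <- c("June", "July", "August", "September", "October", "November", "December")

which(monthly.sales > 0)      # positions with any sales: 1 2 3 4
months[monthly.sales >= 70]   # months with at least 70 sales: "August" "September"
```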
You could also do the same thing with text. It turns out that the one store that sold my R book didn’t always have books in stock. Let’s create a variable called stock.levels to have a look at this:
stock.levels <- c("high", "high", "low", "high", "low", "out", "out")
stock.levels
## [1] "high" "high" "low" "high" "low" "out" "out"
Now, apply the same logical indexing trick, but with the character vector instead, to see when the book was not in stock.
months[stock.levels == "out"]
## [1] "November" "December"
That explains the lack of sales anyway! But what if I wanted to know when the shop had either low or no copies? You could ask R one of two things:
months[stock.levels == "out" | stock.levels == "low"]
## [1] "August" "October" "November" "December"
# Alternatively
months[stock.levels != "high"]
## [1] "August" "October" "November" "December"
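A third option, which I find handy when the list of values to match gets longer, is the base R operator %in%. It checks each element against a set of values, so the sketch below should give the same months as the two approaches above.

```r
# The vectors from the text
months <- c("June", "July", "August", "September", "October", "November", "December")
stock.levels <- c("high", "high", "low", "high", "low", "out", "out")

# TRUE wherever the stock level is one of the listed values
months[stock.levels %in% c("low", "out")]
# "August" "October" "November" "December"
```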
5.3.1 Exercise
Try to create a few variables of your own, and ask R to return you specific elements that are [blah].