Chapter 5 Vectors

5.1 Numeric Data

Let’s imagine that someone in the audience works for R, likes the look of this workbook, and decides to sign me up to write textbooks for them. Imagine that they then start to send me my sales data monthly. Let’s suppose I have 50 sales in June, 20 in July, 10 in August, and 70 in September, but then no other sales for the rest of the year.

Task: I want to create a variable called monthly.sales that stores this data. The first number should be 50, the second 20, and so on. Once again, we use the combine function c() to do this. To create our vector, we write:

monthly.sales <- c(50, 20, 10, 70, 0, 0, 0)
monthly.sales
## [1] 50 20 10 70  0  0  0

To summarise, we have created a single variable called monthly.sales, and this variable is a vector with 7 elements.
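Base R's length() function gives a quick way to confirm that monthly.sales really has 7 elements. Here is a minimal sketch that recreates the vector so it runs on its own:

```r
# Recreate the sales vector from the text (June to December)
monthly.sales <- c(50, 20, 10, 70, 0, 0, 0)

# length() returns the number of elements in a vector
length(monthly.sales)
## [1] 7
```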

So, now that we have our vector, how do we get information out of it? What if I wanted to know how many sales I made in August, for example? Since we started in June, August is the 3rd month of sales, so let's try:

monthly.sales[3]
## [1] 10
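Positions in R start at 1, not 0. The first element is therefore at position 1 and the last is at position length(). The short sketch below also shows what happens if you ask for a position past the end of the vector: R returns NA (missing) rather than an error.

```r
monthly.sales <- c(50, 20, 10, 70, 0, 0, 0)

# The first element (June) is at position 1
monthly.sales[1]
## [1] 50

# The last element (December) is at position length(monthly.sales)
monthly.sales[length(monthly.sales)]
## [1] 0

# There is no 8th month, so R returns NA
monthly.sales[8]
## [1] NA
```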

It turns out that the numbers I received for the August sales were wrong: I actually sold 100, not 10! How can I fix this in my monthly.sales variable? I could create the whole vector again, but that's a lot of typing and wasteful, given that I only need to change one value.

We can just tell R to change that one specific value:

monthly.sales[3] <- 100
monthly.sales
## [1]  50  20 100  70   0   0   0
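The same square-bracket assignment works for several positions at once. In this sketch the replacement figures for November and December are purely hypothetical. The name sales.copy is also just an illustrative choice: we work on a copy so that the monthly.sales used in the rest of the chapter stays unchanged.

```r
monthly.sales <- c(50, 20, 100, 70, 0, 0, 0)

# Work on a copy so the original vector is left alone
sales.copy <- monthly.sales

# Hypothetical corrected figures for November (6th) and December (7th)
sales.copy[c(6, 7)] <- c(15, 25)
sales.copy
## [1]  50  20 100  70   0  15  25
```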

You could also use the edit() and fix() functions, but we won’t be covering these in this session. You should check them out in your own time.

You can also ask R to return multiple values at once by indexing. The basic way to ask for elements is to put their numeric positions in the structure (vector, list…) inside a set of square brackets [ ] at the end of the object name. For example, say I wanted to know how many books I sold between July (2nd element) and October (5th element). I would ask R:

monthly.sales[2:5]
## [1]  20 100  70   0
# equivalent to
monthly.sales[c(2, 3, 4, 5)]
## [1]  20 100  70   0
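The colon operator is shorthand for building a vector of consecutive whole numbers, which is why the two lines above return the same result. You can check this directly:

```r
# 2:5 builds the index vector 2, 3, 4, 5
2:5
## [1] 2 3 4 5

# Compare element by element with the explicit vector
# (== is used rather than identical() because 2:5 is stored as
# integers, while c(2, 3, 4, 5) is stored as doubles)
all(2:5 == c(2, 3, 4, 5))
## [1] TRUE
```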

Notice that the order matters here. If I asked for the elements in reverse order, R would return them in reverse order too.

monthly.sales[5:2]
## [1]   0  70 100  20
# equivalent to
monthly.sales[c(5, 4, 3, 2)]
## [1]   0  70 100  20
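Base R's rev() function reverses a vector. Asking for positions 5:2 therefore gives the same result as reversing the 2:5 selection. A small sketch:

```r
monthly.sales <- c(50, 20, 100, 70, 0, 0, 0)

rev(monthly.sales[2:5])
## [1]   0  70 100  20

# Same values in the same order as monthly.sales[5:2]
all(rev(monthly.sales[2:5]) == monthly.sales[5:2])
## [1] TRUE
```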

Next, I want to figure out how much money I’ll be making each month (the end of the year isn’t looking too good, so I hope the next few months are better!). Since I earn £5 per book, I can simply multiply each element of monthly.sales by 5. Sounds pretty easy, and it is! R applies the multiplication to every element of the vector at once:

monthly.sales * 5
## [1] 250 100 500 350   0   0   0
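The result of vector arithmetic is itself a vector, so you can store it and summarise it. In this minimal sketch, the name monthly.earnings is our own choice, and the £5 per book comes from the text above. sum() adds up every element to give the total earnings for the period.

```r
monthly.sales <- c(50, 20, 100, 70, 0, 0, 0)

# Earnings per month at £5 per book
monthly.earnings <- monthly.sales * 5

# sum() adds up all the elements: total earnings from June to December
sum(monthly.earnings)
## [1] 1200
```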

5.2 Text/Character Data

Although you will mostly be dealing with numeric data, this isn’t always the case. Sometimes, you’ll use text. Let’s create a simple variable:

greeting <- "hello"
greeting
## [1] "hello"

It is important to note the use of quotation marks here. They tell R that this is a “character” value: a string of characters, no matter how long. It can be a single letter, 'g', but it can equally well be a sentence, "Descriptive statistics can be like online dating profiles: technically accurate and yet pretty darn misleading."
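You can ask R what type of data a variable holds with class(), and count the characters in a string with nchar(). Both are base R functions. A short sketch:

```r
greeting <- "hello"

# class() reports the type of data stored in the variable
class(greeting)
## [1] "character"

# nchar() counts the characters in the string
nchar(greeting)
## [1] 5

# A single letter is still character data
class("g")
## [1] "character"
```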

Going back to my R book example, I might want to create a variable that holds the names of the months. To do so, I could tell R:

months <- c("June", "July", "August", "September", "October", "November", "December")

In simple terms, you have now created a character vector containing 7 elements, each of which is the name of a month. Let’s say I wanted to know what the 5th month was. What would I type?

months[5]
## [1] "October"
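Character vectors are indexed in exactly the same way as numeric ones, so you can request several positions at once. The sketch below also checks that months and monthly.sales have the same length. That matters later in this chapter, when one vector is used to pick elements from the other.

```r
months <- c("June", "July", "August", "September", "October", "November", "December")
monthly.sales <- c(50, 20, 100, 70, 0, 0, 0)

# First and last months
months[c(1, 7)]
## [1] "June"     "December"

# Both vectors have 7 elements, so their positions line up
length(months) == length(monthly.sales)
## [1] TRUE
```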

5.3 Logical Data

A logical element can take one of two values, TRUE or FALSE. Logicals are usually the output of logical operations (anything that can be phrased as a yes/no question, e.g., is x equal to y?). By convention, TRUE is represented as 1 and FALSE as 0, and R follows this convention: when logical values are used in arithmetic, TRUE counts as 1 and FALSE as 0.
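The 1/0 convention is easy to see in practice. This sketch converts logicals explicitly with as.numeric(), lets arithmetic convert them automatically, and uses sum() to count how many values are TRUE:

```r
# Explicit conversion
as.numeric(c(TRUE, FALSE))
## [1] 1 0

# Arithmetic converts logicals automatically
TRUE + TRUE
## [1] 2

# So sum() counts the TRUE values
sum(c(TRUE, FALSE, TRUE))
## [1] 2
```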

If we ask R to calculate 2 + 2, it will always give the same answer:

2+2
## [1] 4

If we want R to judge whether something is a TRUE statement, we have to explicitly ask. For example:

2+2 == 4
## [1] TRUE

The equality operator == forces R to make a TRUE or FALSE judgement. If the statement is false, R tells us so:

2+2 == 3
## [1] FALSE

What if we try to force R to believe some fake news (i.e., a false statement) by using a single = sign?

2+2 = 3
## Error in 2 + 2 = 3: target of assignment expands to non-language object

R cannot be convinced that easily. A single = is an assignment operator (like <-), so R reads this line as an attempt to store the value 3 in 2+2. Since 2+2 is not a variable name, the assignment fails (hence the error about a “non-language object”). In other words, R won’t let you redefine what 2+2 is.
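The single = sign works fine for assignment when there is an actual variable name on its left. The sketch below contrasts it with the double == used for comparison. Here x is just an illustrative variable name:

```r
# A single = assigns, just like <-
x = 2 + 2
x
## [1] 4

# A double == compares and returns TRUE or FALSE
x == 4
## [1] TRUE
```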

There are several other logical operators that you can use, some of which are listed in the table below.

Operation                 R code  Example input        Example output
Less than                 <       1 < 2                TRUE
Greater than              >       1 > 2                FALSE
Less than or equal to     <=      1 <= 2               TRUE
Greater than or equal to  >=      1 >= 2               FALSE
Equal to                  ==      1 == 2               FALSE
Not equal to              !=      1 != 2               TRUE
Not                       !       !(1 == 1)            FALSE
Or                        |       (1 == 1) | (1 == 2)  TRUE
And                       &       (1 == 1) & (1 == 2)  FALSE
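You can check each row of the operator table by typing its example into the console. The comment at the end of each line gives the value R prints:

```r
1 < 2                  # TRUE
1 > 2                  # FALSE
1 <= 2                 # TRUE
1 >= 2                 # FALSE
1 == 2                 # FALSE
1 != 2                 # TRUE
!(1 == 1)              # FALSE: ! flips TRUE to FALSE
(1 == 1) | (1 == 2)    # TRUE: at least one side is TRUE
(1 == 1) & (1 == 2)    # FALSE: both sides must be TRUE
```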

Let’s apply some of these logical operators to our vectors. Using our monthly.sales vector, let’s ask R in which months I actually sold a book:

monthly.sales > 0
## [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

I can then store this result in a new variable:

any.sales <- monthly.sales > 0
any.sales 
## [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

To summarise, we have created a new logical vector called any.sales, whose elements are TRUE wherever the corresponding monthly sales figure is greater than 0, and FALSE otherwise.
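TRUE counts as 1 and FALSE as 0, so sum() on a logical vector counts its TRUE elements. Here is a small sketch that counts how many months had any sales at all:

```r
monthly.sales <- c(50, 20, 100, 70, 0, 0, 0)
any.sales <- monthly.sales > 0

# Number of months with at least one sale
sum(any.sales)
## [1] 4
```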

But this output isn’t very helpful: a long list of TRUE and FALSE values doesn’t give me much insight into which months I sold my book in.

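We can use logical indexing to ask for the names of the months where sales were greater than 0. When you index a vector with a logical vector of the same length, R keeps only the elements in the positions where the index is TRUE. Since any.sales is already a logical vector, we can use it directly. Ask R:

months[any.sales]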
## [1] "June"      "July"      "August"    "September"
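Sometimes you want the positions of the TRUE elements rather than the elements themselves. Base R's which() returns those indices. A minimal sketch:

```r
monthly.sales <- c(50, 20, 100, 70, 0, 0, 0)
any.sales <- monthly.sales > 0

# Positions of the months with sales
which(any.sales)
## [1] 1 2 3 4
```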

You can apply the same logic to find the actual sales numbers for these months too:

monthly.sales[monthly.sales > 0]
## [1]  50  20 100  70
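The result of logical indexing is an ordinary numeric vector, so you can pass it straight to other functions. The sketch below, for example, finds the average number of sales in the months where I sold anything. The name sales.only is just an illustrative choice.

```r
monthly.sales <- c(50, 20, 100, 70, 0, 0, 0)

# Keep only the months with sales, then average them
sales.only <- monthly.sales[monthly.sales > 0]
mean(sales.only)
## [1] 60
```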

You could also do the same thing with text. It turns out that the one store that sold my R book didn’t always have books in stock. Let’s create a variable called stock.levels to have a look at this:

stock.levels <- c("high", "high", "low", "high", "low", "out", "out")
stock.levels
## [1] "high" "high" "low"  "high" "low"  "out"  "out"

Now apply the same logical indexing trick, but with the character vector, to see in which months the book was out of stock.

months[stock.levels == "out"]
## [1] "November" "December"
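You can also combine conditions on different vectors with &, as long as the vectors line up position by position. This sketch asks for the months where the book was in stock but still didn't sell:

```r
months <- c("June", "July", "August", "September", "October", "November", "December")
monthly.sales <- c(50, 20, 100, 70, 0, 0, 0)
stock.levels <- c("high", "high", "low", "high", "low", "out", "out")

# In stock (not "out") AND no sales: both conditions must be TRUE
months[stock.levels != "out" & monthly.sales == 0]
## [1] "October"
```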

That explains the lack of sales in November and December, at least! But what if I wanted to know when the shop had either low stock or no copies at all? You could ask R in either of two ways:

months[stock.levels == "out" | stock.levels == "low"]
## [1] "August"   "October"  "November" "December"
# Alternatively
months[stock.levels != "high"]
## [1] "August"   "October"  "November" "December"
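A third option uses R's %in% operator. It returns TRUE for each element that matches any value in a set, which avoids chaining several == tests with |. A short sketch:

```r
months <- c("June", "July", "August", "September", "October", "November", "December")
stock.levels <- c("high", "high", "low", "high", "low", "out", "out")

# TRUE wherever the stock level is either "low" or "out"
months[stock.levels %in% c("low", "out")]
## [1] "August"   "October"  "November" "December"
```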

5.3.1 Exercise

Try to create a few variables of your own, and ask R to return specific elements, both by position and by using logical conditions of your choosing.