6 Tutorial 6: Control structures in R

After working through Tutorial 6, you’ll…

understand how to write your own control structures

Data

For this tutorial, we’ll use a new data set “data_tutorial6.txt” (via Moodle/Data for R).

The data set consists of data that is completely made up - a survey with 1000 citizens in Europe.

The data file “data_tutorial6.txt” is structured as follows:

Each row contains the answer for a single citizen.
Each column contains all values given by citizens for a single variable.

The five variables included here are:

country: the country in which each citizen was living at the time of the survey (France/Germany/Italy/Switzerland)
date: the date on which each citizen was surveyed (from 2021-09-20 to 2021-10-03)
gender: each citizen’s gender (female/male/NA)
trust_politics: how much each citizen trusts the political system (from 1 = no trust at all to 4 = a lot of trust)
trust_news_media: how much each citizen trusts the news media (from 1 = no trust at all to 4 = a lot of trust)

Read in the data set:

data <- read.csv2("data_tutorial6.txt")

This is what the data looks like in R:

head(data)

##       country       date gender trust_politics trust_news_media
## 1     Germany 2021-09-20 female              3                1
## 2 Switzerland 2021-10-02   male              2                1
## 3      France 2021-09-21   <NA>              1                3
## 4       Italy 2021-10-03   male              2                2
## 5     Germany 2021-09-21 female              3                1
## 6 Switzerland 2021-09-20   male              1                2
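
If you do not have the file at hand, the six rows printed above can be rebuilt by hand to follow along. This is only a sketch: data_demo is a hypothetical name, and the snippet covers just these six rows, not the full 1000 observations. It also assumes that the text columns are stored as character and the trust columns as integer.

```r
# Rebuild the six rows shown by head(data); data_demo is a hypothetical stand-in
data_demo <- data.frame(
  country = c("Germany", "Switzerland", "France", "Italy", "Germany", "Switzerland"),
  date = c("2021-09-20", "2021-10-02", "2021-09-21", "2021-10-03", "2021-09-21", "2021-09-20"),
  gender = c("female", "male", NA, "male", "female", "male"),
  trust_politics = c(3L, 2L, 1L, 2L, 3L, 1L),
  trust_news_media = c(1L, 1L, 3L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Check the type of every column
sapply(data_demo, class)
```

Checking the column types early is useful here, because the if/else examples later in this tutorial depend on how each variable is stored.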

6.1 Control structures

What are control structures?

Sometimes, you may want R to not execute all of your code but only chunks of it.

To control the flow of the program, especially which chunks of code should be run or how often/for which objects code should be run, we can use so-called control structures, including:

if/else conditions: R executes functions only if specific conditions are fulfilled.
loops: We often need R to execute functions in an iterative way, i.e., repeatedly apply the same function to different objects.

6.1.1 if/else conditions

If/else conditions are helpful for running code only if specific conditions are fulfilled (and for specifying which other code should be executed otherwise). In if/else conditions, you have to specify at least one if condition (or several), while the command for which code should be executed otherwise (i.e., else) is optional.

Image: if/else conditions

if(condition){
  # function that should be executed
  # if condition true
} else {
  # function that should be executed
  # if condition not true
}
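
Several conditions can be chained with else if. The following is a minimal sketch, assuming a single made-up answer on the 1 to 4 trust scale; the object names trust and label as well as the three labels are invented for illustration.

```r
# Hypothetical single answer on the 1-4 trust scale
trust <- 3

if (trust >= 3) {
  # runs if the first condition is TRUE
  label <- "high trust"
} else if (trust == 2) {
  # only checked if the first condition is FALSE
  label <- "medium trust"
} else {
  # runs if none of the conditions above is TRUE
  label <- "low trust"
}

print(label)
```

R checks the conditions from top to bottom and runs only the first block whose condition is TRUE.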

Let’s start with an example: You want to check whether the variable date has been imported in character format or in another format, and you want R to tell you which is the case.

In short:

if the variable date is saved in character format, R should print: “Variable is saved in character format”
else, R should print: “Variable is saved in a different format”

In code, this would look something like this:

First if condition:

if(is.character(data$date) == TRUE){
  print("Variable is saved in character format")
}

## [1] "Variable is saved in character format"

Here, R prints out the sentence “Variable is saved in character format” because the variable is indeed saved in character format (and, thus, the condition is TRUE).

If/else condition

Let’s bring this condition together with the else command telling R what to do if the if condition is not true.

if(is.character(data$date) == TRUE){
  print("Variable is saved in character format")
} else {
  print("Variable is saved in a different format")
}

## [1] "Variable is saved in character format"

Bringing these functions together, we see that the if condition - the variable date being saved in character format - is true. Thus, only the first block of code is run while the else block is skipped.

If we run the same command with a variable we know to be saved in a non-character format - for instance the variable trust_politics - we get a different result (since the if condition is not true and, thus, the code after else is run):

if(is.character(data$trust_politics) == TRUE){
  print("Variable is saved in character format")
} else {
  print("Variable is saved in a different format")
}

## [1] "Variable is saved in a different format"
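
Because is.character() already returns TRUE or FALSE, the comparison with == TRUE can be dropped without changing the result. The sketch below shows this shorter form wrapped in a small helper. describe_format is a hypothetical name, and the two short vectors merely stand in for date and trust_politics.

```r
# Hypothetical helper: returns a message describing how x is stored
describe_format <- function(x) {
  if (is.character(x)) {
    "Variable is saved in character format"
  } else {
    "Variable is saved in a different format"
  }
}

describe_format(c("2021-09-20", "2021-10-02"))  # stand-in for data$date
describe_format(c(3, 2, 1))                     # stand-in for data$trust_politics
```

Putting the condition into a function means the same check can be reused for any variable instead of being copied for each one.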

Another example:

The following if/else condition will only print one of the two statements:

if((1 > 2) == TRUE){
  print("1 is bigger than 2")
} else {
  print("1 is smaller than 2")
}

## [1] "1 is smaller than 2"

What is the reason for that?

Since the if condition - 1 > 2 - is not true, the code in the if block is skipped. Instead, R executes the code in the else block.
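
Note that if expects a single TRUE or FALSE. To apply a condition to every value of a vector, such as a whole column, base R offers the vectorised function ifelse(). A minimal sketch, with the cut-off of 3 and the two labels chosen arbitrarily for illustration:

```r
# First six values of trust_politics, as shown by head(data)
trust_politics <- c(3, 2, 1, 2, 3, 1)

# ifelse() checks the condition for each value separately
trust_label <- ifelse(trust_politics >= 3, "higher trust", "lower trust")

print(trust_label)
```

The result is a vector of the same length as the input, with one label per value.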

6.1.2 Loops

In R, you may often have to execute functions in an iterative way, i.e., repeatedly apply the same function to different objects. For loops do just that.

This is what for loops look like:

for (i in objects) {
  # function that should be executed
}

What does this for loop indicate?

The first row describes for which object i in a sequence of objects, here called objects, a command should be executed. The loop takes every object i out of objects in turn, including values that occur more than once.
It then applies the subsequent function defined between the curly brackets {} to this object i.

In short: For every object i in objects, the function between the curly brackets is executed.
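
The objects in a loop do not have to be numbers. As a small sketch, the loop below runs over a made-up character vector of country names; countries and n_visited are hypothetical names used only for this illustration.

```r
# Made-up vector in which "Germany" occurs twice
countries <- c("Germany", "Switzerland", "France", "Germany")

n_visited <- 0  # counts how often the loop body runs
for (i in countries) {
  print(i)
  n_visited <- n_visited + 1
}

n_visited
```

The loop body runs once per element and in the order of the vector, so a value that occurs twice, like Germany here, is also printed twice.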

Image: Loop

For example: The following loop takes every single number i out of the sequence of numbers 1:10. R then prints all these numbers - i.e., every i in 1:10.

for (i in 1:10){
  print(i)
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

Later on, you will mostly use objects defined outside of the loop, for instance like so:

numbers <- 1:10
for (i in numbers){
  print(i)
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

Wow - feels like we have advanced to hacker-levels already! In fact, this is exactly the same as the loop before - we have just defined our object, numbers, outside of the loop, which makes the loop look much more abstract.
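
Loops and if/else conditions can also be combined. The following sketch, which assumes a tiny made-up data frame called data_demo (a hypothetical stand-in for data with only two columns), checks the format of each variable in turn:

```r
# Tiny stand-in for the survey data; data_demo is a hypothetical name
data_demo <- data.frame(
  date = c("2021-09-20", "2021-10-02"),
  trust_politics = c(3, 2),
  stringsAsFactors = FALSE
)

# Go through the column names one by one and check each column's format
for (variable in names(data_demo)) {
  if (is.character(data_demo[[variable]])) {
    print(paste(variable, "is saved in character format"))
  } else {
    print(paste(variable, "is saved in a different format"))
  }
}
```

Here the loop decides which variable is looked at, while the if/else condition decides which message is printed for it.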

You won’t use loops in too many cases (and better alternatives, such as the purrr package belonging to the tidyverse, exist). However, it is useful to understand how they work - which is why you should know about their general structure.
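
To make the comparison between loops and such alternatives more concrete, the sketch below squares the numbers 1 to 10 twice: once with a for loop that stores its results, and once with base R's vapply(), used here so that the example runs without installing any package. The map functions in purrr follow the same idea. The names squares_loop and squares_apply are made up for illustration.

```r
numbers <- 1:10

# Loop version: create an empty result vector first, then fill it step by step
squares_loop <- numeric(length(numbers))
for (i in seq_along(numbers)) {
  squares_loop[i] <- numbers[i]^2
}

# Functional version: vapply() applies the function to every element
# and expects one numeric value per element
squares_apply <- vapply(numbers, function(x) x^2, numeric(1))

# Both approaches give the same result
identical(squares_loop, squares_apply)
```

Unlike print(i), this loop keeps its results in a vector, so they can be used again after the loop has finished.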

6.2 Take Aways

if/else conditions: allow you to execute chunks of code only if specific conditions are fulfilled
loops: allow you to execute code repeatedly, i.e., apply the same function to different objects

6.3 More tutorials on this

You still have questions? The following tutorials & papers can help you with that:

6.4 Test your knowledge

You’ve worked through all the material of Tutorial 6? Let’s see it - the following task will test your knowledge.

Import the same data as used in the tutorial, the new data set “data_tutorial6.txt” (via Moodle/Data for R).

Remember, the five variables included here are:

country: the country in which each citizen was living at the time of the survey (France/Germany/Italy/Switzerland)
date: the date on which each citizen was surveyed (from 2021-09-20 to 2021-10-03)
gender: each citizen’s gender (female/male/NA)
trust_politics: how much each citizen trusts the political system (from 1 = no trust at all to 4 = a lot of trust)
trust_news_media: how much each citizen trusts the news media (from 1 = no trust at all to 4 = a lot of trust)

6.4.1 Task 6.1

Writing the corresponding R code,

add the value of the previous observation in the data set to every value belonging to the variable trust_politics. Save the resulting vector in a variable called trust_politics_new.

In this case, the first observation should be coded as NA, since it has no previous observation. The second observation should include the sum of the first and the second observation, the third observation should include the sum of the second and third observation, etc.

The result should then look something like this (see the old values on the left and the new values on the right):

head(data[c("trust_politics", "trust_politics_new")])

##   trust_politics trust_politics_new
## 1              3                 NA
## 2              2                  5
## 3              1                  3
## 4              2                  3
## 5              3                  5
## 6              1                  4

Thus:

the first row is set to NA.
the second row contains the sum of the first and second observation (3 + 2 = 5).
the third row contains the sum of the second and third observation (2 + 1 = 3).
etc.

This is where you’ll find solutions for tutorial 6.

Let’s keep going: [Tutorial 7: Testing Intercoder Reliability]