Chapter 6 Optional: Advanced Use

The content of this optional chapter will not be part of the seminar sessions but you can use it if you are interested in some of the functionalities that set R apart from conventional statistics software. Here, we will have a look at the basic programming tools you need to write your own functions.

6.1 Programming basics

So far we have used R mainly as software for statistical analysis, but it is in fact a fully-fledged programming language. Learning the basic structures you need for programming your own functions is not very hard, so we will show the basic building blocks here.

6.1.1 Defining a function

Apart from using already existing functions in R, you can write your own function if you don’t find one that does exactly what you need. For demonstration purposes, let’s define a function mySum() that takes two single numbers firstNumber and secondNumber as input and computes the sum of these numbers:

mySum <- function(firstNumber, secondNumber){
    result <- firstNumber + secondNumber
    result
}

In this block of code a function is defined and given the name mySum using the assignment operator <-.

The definition of a function always comes in the form function(<arguments>){<body>}. <arguments> is a comma-separated list of the input data you need for your computation and <body> describes the operations that need to be done for the computation. For better readability, we usually enter the <body> over several lines enclosed by {}.

So mySum() expects two input objects, firstNumber and secondNumber. In the body, these two are added and the sum is assigned the name result. In the next line, result is evaluated as the last expression of the body, which makes its value the return value of the function; this is why the value is printed when you call the function. The body then closes with }.

After defining the function we can use it:

mySum(3,4)
[1] 7

When you execute this line of code, the following happens:

  1. R looks up the function that is saved under mySum.
  2. The value 3 is assigned to the internal variable firstNumber and the value 4 is assigned to the internal variable secondNumber.
  3. firstNumber + secondNumber is executed and the result 7 is assigned to the internal variable result.
  4. result is called at the end of the body to make sure its value is returned to the “outside”.
  5. Everything that is not explicitly called in the last line of the body stays inside the function. This means that neither result nor firstNumber nor secondNumber can be called outside of the function, as the following line shows:
firstNumber
Error in eval(expr, envir, enclos): object 'firstNumber' not found
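
The steps above can be checked directly. The following minimal sketch repeats the definition of mySum() so that it runs on its own, stores the returned value under a new name (total, a name chosen here purely for illustration), and uses the base R function exists() to confirm that the internal variable never reaches your workspace. It assumes a fresh R session in which no object called firstNumber has been created.

```r
# Same definition as above, repeated so the example is self-contained
mySum <- function(firstNumber, secondNumber){
    result <- firstNumber + secondNumber
    result
}

# The returned value can be stored under a name of your choice
total <- mySum(3, 4)
total

# The internal variables are gone once the function has finished;
# in a fresh session this returns FALSE
exists("firstNumber")
```

Storing the returned value like this is what you will usually do in practice, because it lets you reuse the result in later computations.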

As you know from the functions you have used already, it is also possible to assign default values to some of the arguments. The following function has a default of 10 for secondNumber:

mySum2 <- function(firstNumber, secondNumber=10){
    result <- firstNumber + secondNumber
    result
}

This means if you omit secondNumber in the function call, it is assumed to be 10:

mySum2(5)
[1] 15

But you can overwrite the default:

mySum2(5,2)
[1] 7
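
Arguments can also be passed by name instead of by position, just as with the built-in functions you have used so far. The sketch below repeats the definition of mySum2() so that it is self-contained and shows three calls that should all give the same result:

```r
# Same definition as above, repeated so the example is self-contained
mySum2 <- function(firstNumber, secondNumber=10){
    result <- firstNumber + secondNumber
    result
}

# Arguments matched by position
mySum2(5, 2)

# The same call with the arguments matched by name
mySum2(firstNumber = 5, secondNumber = 2)

# With names, the order of the arguments no longer matters
mySum2(secondNumber = 2, firstNumber = 5)
```

Naming the arguments makes a call longer, but it is easier to read when a function has many arguments with default values.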

You can also call other functions inside your function. For example you can write a function that computes the mean difference of two vectors:

meandiff <- function(x,y){
    result <- mean(x) - mean(y) 
    result
}

v1<-c(1,2,3)
v2<-c(10,20,30)

meandiff(v1,v2)
[1] -18

6.1.2 Conditional statements

Sometimes you want your code to do one thing in one case and another thing in another case. For example, you could write some code that tests whether a person has a fever:

bodytemp <- 38

if(bodytemp>=38){
    "fever"
}
[1] "fever"

You can change the value of bodytemp to different values to see how the conditional statement works. In the condition part if(<logical statement>) you test a logical condition of the kind you have learned about in the first chapter. Then follows the body {<what to do>} that specifies the code you want to execute if the condition evaluates to TRUE.

In the above code nothing happens if the condition is not met. If you want your code to return a "no fever" for cases where bodytemp < 38, you can extend the statement by an else part:

bodytemp <- 37

if(bodytemp>=38){
    "fever"
}else{
    "no fever"
}
[1] "no fever"

Now, if the condition evaluates to TRUE, the block in the first {} is executed; if the condition evaluates to FALSE, the block in the second {} is executed.

Of course you can wrap this in a function to make it easier to use repeatedly:

hasFever <- function(bodytemp){
    
    if(bodytemp>=38){
        status<-"fever"
    }else{
        status<-"no fever"
    }
    
    status
}

And try it out with different values:

hasFever(36.2)
[1] "no fever"
hasFever(40)
[1] "fever"
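
Note that if() tests a single TRUE or FALSE, so hasFever() is meant for one temperature at a time. If you want to classify a whole vector of temperatures in one go, one option is the base R function ifelse(), which applies the test to every element. The following is only a sketch of that alternative; the names hasFeverVec and temps are made up for this illustration.

```r
# Hypothetical vectorised variant of hasFever(), based on ifelse()
hasFeverVec <- function(bodytemp){
    ifelse(bodytemp >= 38, "fever", "no fever")
}

# Example measurements (invented values)
temps <- c(36.2, 40, 38)

hasFeverVec(temps)
# returns "no fever" "fever" "fever"
```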

You can also check different conditions in a row using else if in between. The line breaks are just for readability but make sure you keep track of all the opening and closing brackets!

tempChecker <- function(bodytemp){
    
    if(bodytemp<36){
        
        status <- "too cold"
    
    }else if(bodytemp>=38){
    
        status <- "too hot"
    
    }else{
    
        status <- "normal"
    }
    
    status
}

Try it out with different numbers:

tempChecker(35)
[1] "too cold"
tempChecker(39)
[1] "too hot"
tempChecker(37)
[1] "normal"

In this code, the conditions are checked in the order in which they appear. If the first condition applies, the first block of code is executed and the rest of the if/else statement is ignored. If the first condition is not met, the second condition is evaluated. If it is TRUE, the following code block is executed and the rest of the statement is ignored. When all of the conditions have been tested and have evaluated to FALSE, the last code block from the else part is executed.
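
Because only the first matching block is executed, the order of overlapping conditions matters. The sketch below illustrates this with two hypothetical functions, feverLevelWrong() and feverLevel(); the additional category "high fever" and its threshold of 40 are assumptions made up for this example, not medical advice.

```r
# Overlapping conditions in the wrong order: every value >= 40 is also
# >= 38, so the second block can never be reached
feverLevelWrong <- function(bodytemp){
    if(bodytemp >= 38){
        status <- "fever"
    }else if(bodytemp >= 40){
        status <- "high fever"
    }else{
        status <- "no fever"
    }
    status
}

# The stricter condition is tested first, so all three blocks are reachable
feverLevel <- function(bodytemp){
    if(bodytemp >= 40){
        status <- "high fever"
    }else if(bodytemp >= 38){
        status <- "fever"
    }else{
        status <- "no fever"
    }
    status
}

feverLevelWrong(41)   # returns "fever"
feverLevel(41)        # returns "high fever"
```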

6.1.3 Loops

The final structure is the loop: A loop allows you to assign repetitive tasks to your computer instead of doing them yourself. The first kind of loop you will learn about is the for loop. In this loop you specify the number of repetitions for a task explicitly. The following loop prints the numbers from 1 to 5:

for (i in 1:5) {
    
    print(i)
    
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

In the () part you define the counting variable, which is often called i (but can have any other name too) and we define the values this counting variable should take (the values 1 to 5 in our case). In the {} part we then define the task for every iteration. print(i) simply tells R to print the value of i into the console. So the above loop has 5 iterations in each of which the current value of i is printed to the console.
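
The values of the counting variable do not have to be a sequence like 1:5; any vector will do. As a small sketch (the vector temps and its values are invented for this illustration), the first loop below steps through the values of a vector directly, while the second one uses the base R function seq_along() to step through the positions 1, 2, 3 and uses them as an index:

```r
# Example measurements (invented values)
temps <- c(36.5, 38.2, 39.1)

# The counting variable takes each value of the vector in turn
for (temp in temps) {
    print(temp)
}

# seq_along(temps) produces 1, 2, 3, so i can be used as an index
for (i in seq_along(temps)) {
    print(paste("Measurement", i, "is", temps[i]))
}
```

The second form is useful whenever you need to know the position of the current element, for example to store a result at the same position in another vector.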

Of course we can also have proper computations. For example, we can add up all the numbers from 1 to 1000 with this code:

result <- 0

for(i in 1:1000){
    
    result <- result + i
    
}

result
[1] 500500

In the above code the value of result is 0 to begin with. Then the loop enters its first round and the value of result is updated to the current value of result plus the current value of i, so 0 + 1 = 1. Then the second iteration starts and the same happens again: The current value of result is updated by adding the current value of i to it, so result is now 1 + 2 = 3 etc.
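
To watch this updating step by step, you can shorten the loop and print the intermediate values. The following sketch runs only the first three iterations and then checks the total of the full loop against the built-in function sum():

```r
result <- 0

for(i in 1:3){
    result <- result + i
    # Show the state after each iteration
    print(paste("i =", i, "result =", result))
}
# prints "i = 1 result = 1", then "i = 2 result = 3", then "i = 3 result = 6"

# The full sum from 1 to 1000 without a loop
sum(1:1000)
```

For a simple sum like this, sum() is of course the more convenient choice; the loop is shown here because the same pattern also works for updates that have no ready-made function.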

Sometimes a repetitive task has to be done until a certain condition is met, but we cannot tell beforehand how many iterations it is going to take. In these cases, we can use the while loop. For example you can count how often you have to add 0.6 until you get to a number that is greater than 1000:

x <- 0 
counter <- 0
while(x <= 1000){
    x <- x + 0.6
    counter <- counter + 1
}
counter
[1] 1667

Before the loop starts, both x and counter have the value 0. Then, in every iteration, x grows by 0.6 and counter grows by 1 to count the number of iterations. As soon as the condition in () is no longer met (i.e. when x is greater than 1000), the loop stops. As you can see, it takes 1667 iterations to make x greater than 1000.

The previous examples are of course just toy examples to demonstrate the basic functionality of loops. In practice, we can use a loop for more useful tasks, for example to create the same kind of graphic for a large number of variables.
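
As a rough sketch of such a practical task, the loop below draws one histogram for each of several variables. It uses the mtcars data set that ships with R; the choice of variables, the object names vars and plotFile, and the decision to collect the plots in a PDF file in the temporary directory are all assumptions made for this illustration.

```r
# Variables from the built-in mtcars data set to be plotted
vars <- c("mpg", "hp", "wt")

# Collect all plots in one PDF file (one page per histogram)
plotFile <- file.path(tempdir(), "histograms.pdf")
pdf(plotFile)

for (v in vars) {
    # mtcars[[v]] extracts the column whose name is stored in v
    hist(mtcars[[v]], main = paste("Histogram of", v), xlab = v)
}

# Close the PDF file
dev.off()
```

If you leave out the pdf() and dev.off() lines in an interactive session, the histograms are drawn one after another in the plot window instead.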