Chapter 6 Optional: Advanced Use
The content of this optional chapter will not be part of the seminar sessions, but you can use it if you are interested in some of the functionalities that set R apart from conventional statistics software. Here, we will have a look at the basic programming tools you need to write your own functions.
6.1 Programming basics
So far we have used R mainly as software for statistical analysis, but it is in fact a fully-fledged programming language. Learning the basic structures you need for programming your own functions is not very hard, so we will show the basic building blocks here.
6.1.1 Defining a function
Apart from using already existing functions in R, you can write your own function if you don't find one that does exactly what you need. For demonstration purposes, let's define a function mySum() that takes two single numbers, firstNumber and secondNumber, as input and computes the sum of these numbers:
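Written out, such a definition could look like the following sketch. The names mySum, firstNumber, secondNumber and result are the ones used in this section. The layout, with the body spread over several lines, is one common way to format it.

```r
# Define a function with two arguments and store it under the name mySum
mySum <- function(firstNumber, secondNumber){
  # add the two inputs and store the sum internally as result
  result <- firstNumber + secondNumber
  # the last expression of the body is the value the function returns
  result
}
```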
Here, a function is defined and given the name mySum using the assignment operator <-.
The definition of a function always has the form function(<arguments>){<body>}. <arguments> is a comma-separated list of the input data you need for your computation, and <body> describes the operations that need to be done for the computation. For better readability, we usually write the <body> over several lines enclosed by {}.
So mySum() expects two input objects, firstNumber and secondNumber. In the body, these two are added and the result is assigned the name result. In the next line, result is evaluated as the last expression of the body. This makes its value the value the function returns, which is also what gets printed when you call the function in the console. Then the body closes with }.
After defining the function we can use it, for example to add 3 and 4 with mySum(3, 4):
[1] 7
When you execute this line of code, the following happens:
- R looks up the function that is saved under mySum.
- The value 3 is assigned to the internal variable firstNumber and the value 4 is assigned to the internal variable secondNumber.
- firstNumber + secondNumber is executed, and the result 7 is assigned to the internal variable result.
- result is evaluated at the end of the body, so its value is returned to the “outside”.

Only the value of the last expression in the body leaves the function. The variables created inside it stay inside. This means that neither result nor firstNumber nor secondNumber can be accessed outside of the function. Trying to print firstNumber after the call, for example, gives:
Error in eval(expr, envir, enclos): object 'firstNumber' not found
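If you want to keep a function's result, assign the returned value to a name outside the function. The sketch below stores the result of mySum(3, 4) under total, a hypothetical name chosen for illustration. It then uses base R's exists() to confirm that the internal variable firstNumber was never created in your workspace. This assumes a fresh session in which you have not defined firstNumber yourself.

```r
mySum <- function(firstNumber, secondNumber){
  result <- firstNumber + secondNumber
  result
}

# the returned value can be stored under a new name outside the function
total <- mySum(3, 4)
total
# [1] 7

# internal variables of mySum() do not exist in the workspace
# (assumes a fresh session without a global variable of that name)
exists("firstNumber")
# [1] FALSE
```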
As you know from the functions you have used already, it is also possible to assign default values to some of the arguments. The following function has a default of 10 for secondNumber:
This means that if you omit secondNumber in the function call, it is assumed to be 10:
[1] 15
But you can overwrite the default:
[1] 7
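A function with such a default might look like the sketch below. The name mySumDefault is hypothetical. The inputs 5, and 3 and 4, are assumptions that reproduce the two results printed above.

```r
# hypothetical name; secondNumber falls back to 10 when it is omitted
mySumDefault <- function(firstNumber, secondNumber = 10){
  result <- firstNumber + secondNumber
  result
}

mySumDefault(5)     # secondNumber is taken to be 10
# [1] 15
mySumDefault(3, 4)  # the default is overwritten by 4
# [1] 7
```

Arguments without a default, like firstNumber here, still have to be supplied in every call.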
You can also call other functions inside your function. For example, you can write a function that computes the difference between the means of two vectors:
meandiff <- function(x, y){
  # difference between the mean of x and the mean of y
  result <- mean(x) - mean(y)
  result
}
v1 <- c(1, 2, 3)
v2 <- c(10, 20, 30)
meandiff(v1, v2)
[1] -18
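mean() is applied to each vector separately, so the two inputs do not need to have the same length. Here is a quick check with illustrative values:

```r
meandiff <- function(x, y){
  result <- mean(x) - mean(y)
  result
}

# mean(c(1, 2, 3)) is 2 and mean(c(10, 20)) is 15
meandiff(c(1, 2, 3), c(10, 20))
# [1] -13
```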
6.1.2 Conditional statements
Sometimes you want your code to do one thing in one case and something else in another. For example, you could write some code that tests whether a person has a fever:
[1] "fever"
You can change the value of bodytemp to see how the conditional statement works. In the condition part, if(<logical statement>), you test a logical condition of the kind you learned about in the first chapter. It is followed by the body, {<what to do>}, which specifies the code you want to execute if the condition evaluates to TRUE.
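A minimal version of such a check could look like the sketch below. It assumes the threshold of 38 degrees used later in this section and a hypothetical body temperature of 38.5.

```r
bodytemp <- 38.5  # hypothetical example value

# the body in {} only runs if the condition is TRUE
if(bodytemp >= 38){
  "fever"
}
# [1] "fever"
```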
In the above code, nothing happens if the condition is not met. If you want your code to return "no fever" for cases where bodytemp < 38, you can extend the statement by an else part:
[1] "no fever"
Now, if the condition evaluates to TRUE, the block in the first {} is executed. If it evaluates to FALSE, the block in the second {} is executed.
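As a sketch, here is the same check with an else part and a hypothetical temperature of 37.2. Because the condition is FALSE, the second block runs:

```r
bodytemp <- 37.2  # hypothetical example value

if(bodytemp >= 38){
  "fever"      # runs if the condition is TRUE
}else{
  "no fever"   # runs if the condition is FALSE
}
# [1] "no fever"
```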
Of course you can wrap this in a function to make it easier to use repeatedly:
hasFever <- function(bodytemp){
  if(bodytemp >= 38){
    status <- "fever"
  }else{
    status <- "no fever"
  }
  status
}
And try it out with different values:
[1] "no fever"
[1] "fever"
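With hypothetical temperatures of 36.8 and 39.1, calls producing these two results could look like the sketch below. The third call shows that 38 itself already counts as fever, because the condition uses >=.

```r
hasFever <- function(bodytemp){
  if(bodytemp >= 38){
    status <- "fever"
  }else{
    status <- "no fever"
  }
  status
}

hasFever(36.8)  # hypothetical value below the threshold
# [1] "no fever"
hasFever(39.1)  # hypothetical value above the threshold
# [1] "fever"
hasFever(38)    # boundary value: >= includes 38
# [1] "fever"
```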
You can also check several conditions in a row by using else if in between. The line breaks are just for readability, but make sure you keep track of all the opening and closing curly brackets!
tempChecker <- function(bodytemp){
  if(bodytemp < 36){
    status <- "too cold"
  }else if(bodytemp >= 38){
    status <- "too hot"
  }else{
    status <- "normal"
  }
  status
}
Try it out with different numbers:
[1] "too cold"
[1] "too hot"
[1] "normal"
In this code, the conditions are checked in the order in which they appear. If the first condition applies, the first block of code is executed and the rest of the if/else statement is ignored. If the first condition is not met, the second condition is evaluated. If it is TRUE, the following code block is executed and the rest of the statement is ignored. When all of the conditions have been tested and evaluated to FALSE, the code block from the final else part is executed.
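Boundary values are a good way to check which part of the statement handles a given input. In the sketch below, the test values are chosen for illustration. The value 36 fails the first condition (bodytemp < 36) and also the second, so it ends up in the else part. The value 38 fails the first condition and passes the second.

```r
tempChecker <- function(bodytemp){
  if(bodytemp < 36){
    status <- "too cold"
  }else if(bodytemp >= 38){
    status <- "too hot"
  }else{
    status <- "normal"
  }
  status
}

tempChecker(35.9)  # first condition TRUE
# [1] "too cold"
tempChecker(36)    # both conditions FALSE, so the else part runs
# [1] "normal"
tempChecker(38)    # first condition FALSE, second TRUE
# [1] "too hot"
```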
6.1.3 Loops
The final structure is the loop: a loop lets you hand repetitive tasks over to your computer instead of doing them yourself. The first kind of loop you will learn about is the for loop, in which you specify the number of repetitions for a task explicitly. The following loop prints the numbers from 1 to 5:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
In the () part we define the counting variable, which is often called i (but can have any other name too), and the values this counting variable should take (the values 1 to 5 in our case). In the {} part we then define the task for every iteration. print(i) simply tells R to print the value of i to the console. So the above loop has 5 iterations, in each of which the current value of i is printed to the console.
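A loop like the one described could be written as follows. The second loop illustrates two points made above. The counting variable may have another name (here the hypothetical k), and it can run through any vector of values, not just 1 to 5.

```r
# print the numbers 1 to 5, one per iteration
for(i in 1:5){
  print(i)
}

# any name works for the counting variable, and any vector of values
for(k in c(2, 4, 8)){
  print(k)
}
```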
Of course we can also do proper computations. For example, we can add up all the numbers from 1 to 1000 with this code:
[1] 500500
In the above code, the value of result is 0 to begin with. Then the loop enters its first round, and result is updated to its current value plus the current value of i, so 0 + 1 = 1. Then the second iteration starts and the same happens again: the current value of i is added to the current value of result, which is now 1 + 2 = 3, and so on.
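The sketch below shows this loop, with the built-in sum() as a cross-check. For a task this simple, sum(1:1000) is the idiomatic R solution; the loop is there to show the mechanics.

```r
result <- 0             # start value before the loop
for(i in 1:1000){
  result <- result + i  # add the current i to the running total
}
result
# [1] 500500

sum(1:1000)             # vectorized built-in gives the same answer
# [1] 500500
```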
Sometimes a repetitive task has to be done until a certain condition is met, but we cannot tell beforehand how many iterations it is going to take. In these cases, we can use the while loop. For example, you can count how often you have to add 0.6 until you get to a number that is greater than 1000:
[1] 1667
Before the loop starts, both x and counter have the value 0. Then, in every iteration, x grows by 0.6 and counter grows by 1 to count the number of iterations. As soon as the condition in () is no longer met (i.e. when x is greater than 1000), the loop stops. As you can see, it takes 1667 iterations to make x greater than 1000.
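A loop matching this description might look like the sketch below. The condition x <= 1000 is an assumption that follows from "greater than 1000": the loop keeps running as long as x has not yet passed 1000.

```r
x <- 0        # running total
counter <- 0  # number of iterations so far
while(x <= 1000){
  x <- x + 0.6
  counter <- counter + 1
}
counter
# [1] 1667
```

As a sanity check, 1666 * 0.6 = 999.6 is not yet greater than 1000, while 1667 * 0.6 = 1000.2 is.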
The previous examples are of course just toy examples to demonstrate the basic functionality of loops. In reality we can use a loop for more practical tasks, for example to create the same kind of graphic for a large number of variables.
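As a sketch of such a task, the loop below draws a histogram for each of three variables of mtcars, a dataset that ships with R. The choice of dataset, variables and titles is only an illustration.

```r
vars <- c("mpg", "hp", "wt")  # columns of the built-in mtcars data

for(v in vars){
  # mtcars[[v]] extracts the column whose name is stored in v
  hist(mtcars[[v]], main = paste("Histogram of", v), xlab = v)
}
```

Adding a fourth variable only means extending vars, while the plotting code stays the same.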