C Programming Concepts

Although programming is not the focus of this book, there are several basic programming concepts that are useful for working with data in R. These concepts will be demonstrated in this short appendix.

C.1 Conditional Statements

In general terms, conditional statements control the flow of a program depending on whether specified conditions are met. In this section we will explore two of the more common constructs: for loops, which repeat a set of commands, and if/else statements, which choose between blocks of code. Although we present this material in the context of the R language, these concepts apply to most other languages as well (such as Python).

C.1.1 for Loops

A loop allows for a set of commands to be repeated under a specific set of conditions; the for loop is one of the several types of loops available in R. A for loop has the following basic structure:

for(counter in values){

  instructions

}

The commands that you write in the instructions section of the for loop are executed once for each value that counter takes. In the code chunk below, the index variable \(i\) takes on the values 1 through 5, so the body of the for loop (the command to print the number \(i\)) is executed 5 times. Note that we can use the colon notation a:b to get all integers from a to b (inclusive).
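A loop matching this description could be written as follows. This is a minimal sketch: it assumes print() is used to display each value, which is consistent with the output shown next.

```r
# Repeat the body once for each value of i in 1, 2, ..., 5
for (i in 1:5) {
  print(i)  # display the current value of i
}
```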

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

Now we’ll try something a little more complicated: we’ll use a for loop to sum all of the integers from 1 to 100. To start, we need to initialize the sum variable so that it equals 0. We will then use a for loop with 100 iterations to calculate a running sum of the integers from 1 to 100.
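One way to write this calculation is sketched below. The variable name sum is taken from the text; the call to print() at the end is an assumption made so that the result is displayed.

```r
# Initialize the running total
sum <- 0

# Add each integer from 1 to 100 to the running total
for (i in 1:100) {
  sum <- sum + i
}

# Display the final total
print(sum)
```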

## [1] 5050

This works according to the following logic:

  1. Initialize sum so that it starts equal to 0.
  2. Enter the for loop; in the first iteration \(i\) is equal to 1.
  3. Re-assign sum so that it equals its current value plus the value of \(i\). This means sum now equals 0 + 1 = 1.
  4. Start the next iteration of the for loop, where \(i\) equals 2.
  5. Re-assign sum so that it equals its current value plus the value of \(i\). This means sum now equals 1 + 2 = 3.
  6. Continue this until the 100th iteration of the loop.

In the above for loop, we did not store the result at each iteration of the loop. This means that we cannot go back and access the sum at, say, the 40th iteration; we can only see the sum after all 100 iterations of the loop are finished. For some types of problems, we want to be able to go back and access the results at all iterations of the loop.

To show this, let’s calculate the squares of all integers from 1 to 5. This time we start by initializing an empty vector called squares, which is done using c(). This empty vector will store the results from each iteration of the loop. The append() function is used to add each result to the end of this vector as we iterate over the values one through five.
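A sketch of this approach, using the c() and append() functions named above (the final print() call is an assumption made so that the stored results are displayed):

```r
# Start with an empty vector to hold the results
squares <- c()

# Append the square of each integer from 1 to 5
for (i in 1:5) {
  squares <- append(squares, i^2)
}

# Display every stored result, not just the last one
print(squares)
```

Because each result is kept, any single iteration can be looked up afterwards; for instance, squares[3] holds the value computed in the third iteration.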

## [1]  1  4  9 16 25

This works according to the following logic:

  1. Initialize squares so that it is an empty vector.
  2. Enter the for loop; in the first iteration \(i\) is equal to 1.
  3. Calculate 1 squared (with the command “i^2”) and append it to squares.
  4. Start the next iteration of the for loop, where \(i\) equals 2.
  5. Calculate 2 squared (with the command “i^2”) and append it to squares.
  6. Continue this until the 5th iteration of the loop.

C.1.2 if/else Statements

Often we encounter situations where we would like to run some code if a condition is TRUE, or run different code if the condition is FALSE. For these situations we need to use if/else statements, which take the general form:

if(condition1){

  code block 1

} else if (condition2){

  code block 2

} else{

  code block 3

}

If condition1 is true, then R runs the code in code block 1 and ignores code block 2 and code block 3. If condition1 is not true and condition2 is true, R skips code block 1 and code block 3 and runs code block 2. Finally, if neither condition1 nor condition2 is true, R runs code block 3. For example:
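As an illustration, the sketch below classifies each number in a small vector as negative, zero, or positive. The input vector values and the loop variable x are assumptions, chosen to be consistent with the output that follows.

```r
# Example inputs (assumed for illustration)
values <- c(-2, -1, 0, 1)

for (x in values) {
  if (x < 0) {
    print("Negative")   # code block 1: runs when x is below zero
  } else if (x == 0) {
    print("Zero")       # code block 2: runs when x is exactly zero
  } else {
    print("Positive")   # code block 3: runs in all remaining cases
  }
}
```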

## [1] "Negative"
## [1] "Negative"
## [1] "Zero"
## [1] "Positive"

Note that you do not need to include an else if statement if you only have one condition to evaluate:
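A two-branch version might look like the sketch below. As before, the input vector values and the loop variable x are assumptions chosen to be consistent with the output that follows.

```r
# Example inputs (assumed for illustration)
values <- c(-2, -1, 0, 1)

for (x in values) {
  if (x < 0) {
    print("Negative")      # runs when x is below zero
  } else {
    print("Not Negative")  # runs for zero and for positive values
  }
}
```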

## [1] "Negative"
## [1] "Negative"
## [1] "Not Negative"
## [1] "Not Negative"

C.2 Functions

Throughout the book, we have seen many examples of built-in R functions. However, we can also define our own functions! Imagine we wanted to calculate the compound interest on an investment with the following formula:

\[A = P\left(1 + \frac{r}{n}\right)^{nt}\]

\(A =\) final amount
\(P =\) principal balance
\(r =\) interest rate per time period (expressed as a decimal)
\(n =\) number of times interest is applied per time period
\(t =\) number of time periods

Of course, we could write out the formula arithmetically every time we wanted to calculate compound interest. Let’s say \(P = \$10,000\), \(r = 0.10\), \(n = 12\), and \(t = 5\). Using the formula above:
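Written out in R, the calculation could look like the sketch below. Storing the inputs in variables named after the symbols in the formula is an illustrative choice; the numbers could equally be typed directly into the expression.

```r
# Inputs from the example above
P <- 10000  # principal balance
r <- 0.10   # interest rate
n <- 12     # number of times interest is applied per time period
t <- 5      # number of time periods

# Compound interest formula: A = P * (1 + r/n)^(n*t)
A <- P * (1 + r / n)^(n * t)
print(A)
```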

## [1] 16453.09

Now imagine we wanted to calculate compound interest on many different investments and compare them. We could copy-and-paste the code above, each time changing the values of \(P\), \(r\), \(n\), and \(t\). However, imagine after doing this we realized that there was a mistake in our original formula. We would then need to go back and fix that mistake in every line of code that we copy-and-pasted from the original. We can prevent this headache by defining our own function to calculate compound interest, and then applying that function many times. If we notice a mistake in our formula, we simply need to fix it in the definition of the function and not in every single line of code.

If you find yourself repeatedly copy-and-pasting a chunk of code, this is a good sign that you should define a function.

How can we define our own functions in R? Function definitions take the following form:

function_name <- function(arg1, arg2, ...){

    ...code block...

    return(result)

}

  • Required
    • function_name: The name of our new function. Function names follow the same basic naming rules that we saw in Section 2.
    • ...code block...: The code we want the function to apply. This is where we will write the compound interest formula.
  • Optional
    • arg1, arg2, ...: Any arguments we want the function to accept. In our compound interest example, we want the function to accept the arguments \(P\), \(r\), \(n\), and \(t\). Note that arguments are optional, so a function can take no inputs.
    • return(result): Any values or objects we want the function to return. In our compound interest example, we will return the result of the compound interest calculation. Note that an explicit return() is optional: without one, an R function returns the value of the last expression it evaluates.

Now let’s create a function called compound_interest(). Following the syntax shown above, we can define this function as follows:
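A definition along these lines would work. This is a sketch: the argument names mirror the symbols in the formula, and the intermediate variable A is an illustrative choice.

```r
# Compute the final amount A of an investment with compound interest
#   P: principal balance
#   r: interest rate
#   n: number of times interest is applied per time period
#   t: number of time periods
compound_interest <- function(P, r, n, t) {
  A <- P * (1 + r / n)^(n * t)
  return(A)
}
```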

We can then apply our new function to calculate compound interest:
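For example, using the same values as before. Passing the arguments by name and storing the answer in a variable called result are illustrative choices; the compact definition is repeated only so that the snippet runs on its own.

```r
# Same formula as compound_interest() described above, repeated for self-containment
compound_interest <- function(P, r, n, t) {
  return(P * (1 + r / n)^(n * t))
}

# P = $10,000, r = 0.10, n = 12, t = 5
result <- compound_interest(P = 10000, r = 0.10, n = 12, t = 5)
print(result)
```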

## [1] 16453.09
