Chapter 17 Loops


Figure 17.1: Loops in R can be fun. Just…you know…don’t screw it up.

One of the golden rules of programming is D.R.Y.: “Don’t repeat yourself.” Why? Not because you can’t, but because it’s almost certainly a waste of time. You see, while computers are still much, much worse than humans at some tasks (like recognizing faces), they are much, much better than humans at a few key things, like doing the same thing over…and over…and over. To tell R to do something over and over, we use a loop. Loops are absolutely critical in conducting many analyses because they allow you to write code once but evaluate it tens, hundreds, thousands, or millions of times without ever repeating yourself.

For example, imagine that you conduct a survey of 50 people containing 100 yes/no questions. Question 1 might be “Do you ever pick your nose?” and Question 2 might be “No seriously, do you ever pick your nose?!” When you finish the survey, you could store the data as a dataframe with 50 rows (one row for each person surveyed) and 100 columns (one column for each question). Now, because every question should have a yes or no answer, the only values in the dataframe should be “Y” (yes) or “N” (no). Unfortunately, as is the case with all real world data collection, you will likely get some invalid responses – like “Maybe” or “What be yee phone number?!”. For this reason, you’d like to go through all the data and recode any invalid response as NA (aka, missing). To do this one column at a time, you’d have to write the following 100 lines of code…

# SLOW way to convert any values that aren't equal to "Y" or "N" to NA
survey.df$q.1[(survey.df$q.1 %in% c("Y", "N")) == FALSE] <- NA
survey.df$q.2[(survey.df$q.2 %in% c("Y", "N")) == FALSE] <- NA
# . ... Wait...I have to type this 98 more times?!
# .
# . ... My god this is boring...
# .
survey.df$q.100[(survey.df$q.100 %in% c("Y", "N")) == FALSE] <- NA

Pretty brutal, right? Now imagine you have a huge dataset with 1,000 columns: that’s a whole lot of typing. Thankfully, with a loop you can take care of this in no time. Check out the following code chunk, which uses a loop to convert the data for all 100 columns in our survey dataframe.

# FAST way to convert values that aren't "Y" or "N" to NA

for (i in 1:100) { # Loop over all 100 columns

  temp <- survey.df[, i]  # Get data for ith column and save in a new temporary object temp

  temp[(temp %in% c("Y", "N")) == FALSE] <- NA # Convert invalid values in temp to NA

  survey.df[, i] <- temp # Assign temp back to survey.df!

} # Close loop!

Done. All 100 columns. Take a look at the code and see if you can understand the general idea. But if not, no worries. By the end of this chapter, you’ll know all the basics of how to construct loops like this one.
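If you want to watch the loop work before reading on, here is a minimal sketch you can run yourself. It assumes a made-up toy version of survey.df with only 5 respondents and 3 questions (the values below are invented for illustration, not real survey data), and it loops over however many columns the dataframe happens to have rather than a hard-coded 100.

```r
# Hypothetical toy survey: 5 respondents, 3 questions.
# (The survey described in the text has 50 rows and 100 columns.)
survey.df <- data.frame(
  q.1 = c("Y", "N", "Maybe", "Y", "N"),
  q.2 = c("N", "N", "Y", "What be yee phone number?!", "Y"),
  q.3 = c("Y", "y", "N", "N", ""),
  stringsAsFactors = FALSE
)

# Same loop as in the text, but looping over every column that exists
for (i in seq_len(ncol(survey.df))) {

  temp <- survey.df[, i]                        # Get data for ith column

  temp[(temp %in% c("Y", "N")) == FALSE] <- NA  # Convert invalid values to NA

  survey.df[, i] <- temp                        # Assign temp back to survey.df

}

# "Maybe", the phone number question, the lowercase "y", and the
# empty string have all been replaced with NA
survey.df
```

Using seq_len(ncol(survey.df)) instead of 1:100 is just one reasonable choice here: it lets the same loop work on a dataframe of any width, so you don't have to remember how many questions you asked.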