8.6 Combining slicing with functions

Now that you know how to slice and dice dataframes using indexing and subset(), you can easily combine slicing and dicing with statistical functions to calculate summary statistics on groups of data. For example, the following code will calculate the mean tooth length of guinea pigs with the OJ supplement using the subset() function:

# What is the mean tooth length of guinea pigs given OJ?

# Step 1: Create a subsetted dataframe called oj

oj <- subset(x = ToothGrowth,
             subset = supp == "OJ")

# Step 2: Calculate the mean of the len column from
#  the new subsetted dataset

mean(oj$len)
## [1] 21

We can also get the same solution using logical indexing:

# Step 1: Create a subsetted dataframe called oj
oj <- ToothGrowth[ToothGrowth$supp == "OJ",]

# Step 2: Calculate the mean of the len column from
#  the new subsetted dataset
mean(oj$len)
## [1] 21

Or heck, we can do it all in one line by only referring to column vectors:

mean(ToothGrowth$len[ToothGrowth$supp == "OJ"])
## [1] 21

As you can see, R allows for many methods to accomplish the same task. The choice is up to you.
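As a quick check, the sketch below confirms that the three approaches return the same value, and then extends the one-line version to a second condition. It assumes the built-in ToothGrowth dataset (which also contains a dose column); the object names oj_subset, oj_index, oj_vector and oj_dose2 are made up for illustration.

```r
# Three ways to get the mean tooth length for the OJ group
oj_subset <- mean(subset(x = ToothGrowth, subset = supp == "OJ")$len)
oj_index  <- mean(ToothGrowth[ToothGrowth$supp == "OJ", ]$len)
oj_vector <- mean(ToothGrowth$len[ToothGrowth$supp == "OJ"])

# Check that all three approaches agree
all.equal(oj_subset, oj_index)
all.equal(oj_index, oj_vector)

# Combine two conditions with & to slice more finely:
# mean tooth length of guinea pigs given OJ at a dose of 2
oj_dose2 <- mean(ToothGrowth$len[ToothGrowth$supp == "OJ" &
                                 ToothGrowth$dose == 2])
oj_dose2
```

The one-line version scales most easily to extra conditions, because each condition is just another logical vector joined with &.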

8.6.1 with()

The with() function saves you some typing when you are using multiple columns from a dataframe. Specifically, it lets you name a dataframe (or a similar object, such as a list) once at the beginning of a line. Then, for every name you refer to in the expression that follows, R will first look for a column with that name in the dataframe.

For example, let’s create a dataframe called health with some health information:

health <- data.frame("age" = c(32, 24, 43, 19, 43),
                     "height" = c(1.75, 1.65, 1.50, 1.92, 1.80),
                     "weight" = c(70, 65, 62, 79, 85))

health
##   age height weight
## 1  32    1.8     70
## 2  24    1.6     65
## 3  43    1.5     62
## 4  19    1.9     79
## 5  43    1.8     85

Now let’s say we want to add a new column called bmi which represents a person’s body mass index (BMI). The formula for bmi is \(bmi = \frac{weight}{height^{2}}\), where height is in meters and weight is in kilograms. If we wanted to calculate the bmi of each person, we’d need to write health$weight / health$height ^ 2:

# Calculate bmi
health$weight / health$height ^ 2
## [1] 23 24 28 21 26

As you can see, we have to retype the name of the dataframe for each column. However, using the with() function, we can make it a bit easier by typing the name of the dataframe only once.

# Save typing by using with()
with(health, weight / height ^ 2)
## [1] 23 24 28 21 26

As you can see, the results are identical. In this case, we didn’t save much typing. But if you are doing many calculations, then with() can save you a lot of typing. For example, contrast these two lines of code that perform identical calculations:

# Long code
health$weight + health$height / health$age + 2 * health$height
## [1] 74 68 65 83 89

# Short code that does the same thing
with(health, weight + height / age + 2 * height)
## [1] 74 68 65 83 89
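Because with() simply returns the result of the expression, you can assign that result to a new column, or combine with() with the slicing from earlier in this section. Here is a minimal sketch, assuming the health dataframe defined above (recreated here so the code runs on its own) and the built-in ToothGrowth dataset. The name oj_mean is made up for illustration.

```r
# Recreate the health dataframe from above
health <- data.frame("age" = c(32, 24, 43, 19, 43),
                     "height" = c(1.75, 1.65, 1.50, 1.92, 1.80),
                     "weight" = c(70, 65, 62, 79, 85))

# Store the result of with() as a new column called bmi
health$bmi <- with(health, weight / height ^ 2)

# with() also works with logical indexing:
# mean tooth length of guinea pigs given OJ
oj_mean <- with(ToothGrowth, mean(len[supp == "OJ"]))
```

Note that with() itself never changes the dataframe. The bmi column only appears in health because the result was assigned to health$bmi.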