10.5 Additional aggregation functions

There are many, many other aggregation functions that I haven’t covered in this chapter, mainly because I rarely use them. In fact, that’s a good reminder of a peculiarity of R: there are many ways to achieve the same result, and your choice of method will often come down to which one you like the most.

10.5.1 rowMeans(), colMeans()

To easily calculate means (or sums) across all rows or columns in a matrix or data frame, use rowMeans(), colMeans(), rowSums() or colSums().

For example, imagine we have the following data frame representing scores from an exam with 5 questions, where each row represents a student and each column represents a question. Each value is either 1 (correct) or 0 (incorrect).

Table 10.2: Scores from an exam.
q1 q2 q3 q4 q5
1 1 1 1 1
0 0 0 1 0
0 1 1 1 0
0 1 0 1 1
0 0 0 1 1

# Some exam scores
exam <- data.frame("q1" = c(1, 0, 0, 0, 0),
                   "q2" = c(1, 0, 1, 1, 0),
                   "q3" = c(1, 0, 1, 0, 0),
                   "q4" = c(1, 1, 1, 1, 1),
                   "q5" = c(1, 0, 0, 1, 1))

Let’s use rowMeans() to get the average scores for each student:

# What proportion of questions did each student get correct?
rowMeans(exam)
## [1] 1.0 0.2 0.6 0.6 0.4

Now let’s use colMeans() to get the average scores for each question:

# What proportion of students got each question correct?
colMeans(exam)
##  q1  q2  q3  q4  q5 
## 0.2 0.6 0.4 1.0 0.6
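rowSums() and colSums() work the same way but return totals instead of averages. Here is a minimal sketch that rebuilds the same exam data frame so the snippet runs on its own; the names student_totals and question_totals are just illustrative.

```r
# Rebuild the exam data frame from above
exam <- data.frame("q1" = c(1, 0, 0, 0, 0),
                   "q2" = c(1, 0, 1, 1, 0),
                   "q3" = c(1, 0, 1, 0, 0),
                   "q4" = c(1, 1, 1, 1, 1),
                   "q5" = c(1, 0, 0, 1, 1))

# How many questions did each student get correct?
student_totals <- rowSums(exam)
student_totals    # values: 5 1 3 3 2

# How many students got each question correct?
question_totals <- colSums(exam)
question_totals   # values: q1 = 1, q2 = 3, q3 = 2, q4 = 5, q5 = 3
```

Because every score is 0 or 1, dividing student_totals by the number of questions, ncol(exam), gives the same values as rowMeans(exam).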

Warning: rowMeans() and colMeans() only work on numeric columns. If you try to apply them to non-numeric data, you’ll receive an error.
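One way around this is to keep only the numeric columns before calling rowMeans(). The sketch below uses a small made-up data frame, exam_named, with a hypothetical student column added purely for illustration; it is not part of the exam data above.

```r
# Hypothetical data: a character column alongside the numeric scores
exam_named <- data.frame("student" = c("ann", "bob", "cat"),
                         "q1" = c(1, 0, 1),
                         "q2" = c(1, 1, 0))

# Calling rowMeans() on the whole data frame fails because of 'student'
failed <- tryCatch(rowMeans(exam_named),
                   error = function(e) "rowMeans() failed")
failed

# Find the numeric columns, then average over just those
numeric_cols <- sapply(exam_named, is.numeric)
student_means <- rowMeans(exam_named[, numeric_cols, drop = FALSE])
student_means   # values: 1.0 0.5 0.5
```

The drop = FALSE argument keeps the result a data frame even if only one numeric column is left, which rowMeans() needs.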

10.5.2 apply family

There is an entire family of apply functions in R that apply a function to pieces of your data. The best known are tapply(), sapply() and lapply(). Of these, tapply() is the closest to aggregate(): it splits one vector into groups defined by a second vector and applies a function to each group. For example, you can calculate the average length of movies by genre with tapply() as follows.

with(movies, tapply(X = time,        # DV is time
                    INDEX = genre,   # IV is genre
                    FUN = mean,      # function is mean
                    na.rm = TRUE))   # Ignore missing
##              Action           Adventure        Black Comedy 
##                 113                 106                 113 
##              Comedy Concert/Performance         Documentary 
##                  99                  78                  69 
##               Drama              Horror     Multiple Genres 
##                 116                  99                 114 
##             Musical             Reality     Romantic Comedy 
##                 113                  44                 107 
##   Thriller/Suspense             Western 
##                 112                 121

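tapply(), sapply(), and lapply() all follow the same basic pattern of applying a function to each piece of your data; their main difference is in the structure of their output. For example, lapply() always returns a list (we’ll cover lists in a future chapter), while sapply() tries to simplify its result to a vector.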
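To see the difference in output structure, here is a small sketch on made-up data. The films data frame and its values are hypothetical stand-ins so the example runs without the movies dataset, and I use split() to cut the times into one piece per genre before handing them to lapply() and sapply().

```r
# Hypothetical mini dataset (not the real movies data)
films <- data.frame("genre" = c("Action", "Action", "Comedy", "Comedy", "Drama"),
                    "time" = c(120, 110, 95, NA, 130))

# tapply() does the grouping itself and returns a named array
by_tapply <- tapply(X = films$time, INDEX = films$genre,
                    FUN = mean, na.rm = TRUE)

# lapply() and sapply() work on pieces you supply, here one per genre
times_by_genre <- split(films$time, films$genre)
by_lapply <- lapply(times_by_genre, mean, na.rm = TRUE)  # a list
by_sapply <- sapply(times_by_genre, mean, na.rm = TRUE)  # a named vector

is.list(by_lapply)   # TRUE
is.list(by_sapply)   # FALSE
by_sapply            # Action = 115, Comedy = 95, Drama = 130
```

All three calls compute the same three group means; only the container that holds them changes.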