13.12 group_by(): Applying functions across groups

  • We can apply the above functions to subgroups within the dataset, e.g., observations that have certain values on a variable, individuals with different levels of education, or individuals belonging to different countries
  • dplyr lets you use the group_by() function to describe how to break a dataset down into groups of rows
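  • dplyr functions recognize when a data frame has been grouped with group_by() and then operate within each group
  • Combined with summarise(), this can be used to aggregate data, i.e., to collapse each group into a single row of summary statistics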
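
The points above can be sketched on a small data set. This is only an illustrative sketch: the data frame toy and its columns country and age are hypothetical and not part of the ESS file used below.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration, not the ESS data)
toy <- data.frame(
  country = c("DE", "DE", "FR", "FR", "FR"),
  age     = c(30, 40, 20, NA, 60),
  stringsAsFactors = FALSE
)

# Describe how to break the data frame down into groups of rows
by_country <- group_by(toy, country)

# summarise() recognizes the grouping and returns one row per group
toy.agg <- summarise(by_country,
  n     = n(),                      # number of rows in the group (NAs included)
  age.m = mean(age, na.rm = TRUE))  # group mean, ignoring missing values

toy.agg
```

Note that n() counts all rows in a group, including those where age is missing, whereas mean(age, na.rm = TRUE) drops the missing values before averaging.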


13.12.1 Example: Applying dplyr functions across groups (aggregation)

# See http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html for this example

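library(foreign) # provides read.dta()
library(dplyr)   # provides group_by(), summarise(), n()
essdata <- read.dta("./Material/ESS4e04_de.dta", convert.factors = FALSE) # adapt the file path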


View(essdata)
nrow(essdata) # Check number of rows
names(essdata)

# edulvl measures education levels; we group by the character variable edulvl_str
by_edulvl <- group_by(essdata, edulvl_str) # convert the data frame into a
                                           # grouped data frame and save it in an object
by_edulvl # printing shows the grouping variable and the dimensions
class(by_edulvl) # the object now also has the class "grouped_df"
essdata.agg <- summarise(by_edulvl, # summarise collapses the data frame to one row per group
  n = n(), # number of observations in the group
  age.m = mean(age, na.rm = TRUE), # mean age in the group
  hheinkommen.m = mean(hheinkommen, na.rm = TRUE)) # mean household income in the group
View(essdata.agg)


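# Base R alternative: aggregate() applies mean() to every column,
# so non-numeric columns yield NA (with warnings)
old.way <- aggregate(essdata, by = list(essdata$edulvl), mean, na.rm = TRUE)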
View(old.way)
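
To compare the two approaches without the ESS file, the following sketch uses a hypothetical stand-in data frame (toy, with made-up values; only the column names are borrowed from the example above). It restricts aggregate() to the numeric columns of interest and checks that the result matches the dplyr version.

```r
library(dplyr)

# Hypothetical stand-in for essdata (values are invented for illustration)
toy <- data.frame(
  edulvl_str  = c("low", "low", "high", "high", "high"),
  age         = c(30, 40, 20, NA, 60),
  hheinkommen = c(1000, 2000, 3000, 5000, NA),
  stringsAsFactors = FALSE
)

# Base R: aggregate only the selected numeric columns by group
base.way <- aggregate(toy[c("age", "hheinkommen")],
                      by  = list(edulvl_str = toy$edulvl_str),
                      FUN = mean, na.rm = TRUE)

# dplyr: the same aggregation, written with the pipe operator
dplyr.way <- toy %>%
  group_by(edulvl_str) %>%
  summarise(age         = mean(age, na.rm = TRUE),
            hheinkommen = mean(hheinkommen, na.rm = TRUE))

# Both give the same group means
all.equal(base.way$age, dplyr.way$age)
all.equal(base.way$hheinkommen, dplyr.way$hheinkommen)
```

Selecting the columns before calling aggregate() avoids the warnings that arise when mean() is applied to character columns.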


13.12.2 Exercise: Applying dplyr functions across groups (aggregation)

  1. Execute the following code: library(foreign) and essdata <- read.dta("./Material/ESS4e04_de.dta", convert.factors=F). Adapt your file path!
  2. The variable religion_str contains the religious affiliation of respondents. Using functions from the dplyr package, aggregate the data set so that you obtain the averages of the variables polinteresse and trustparties for each religious affiliation, as well as a variable with the number of observations in each group.


13.12.3 Solution: Applying dplyr functions across groups (aggregation)