## 13.12 group_by(): Applying functions across groups

• We can apply the functions above to subgroups within the dataset, e.g.:
  • Observations that have certain values on a variable
  • Individuals with different levels of education
  • Individuals belonging to different countries
• dplyr provides the group_by() function to describe how to break a dataset down into groups of rows
• dplyr functions recognize when a data frame has been grouped with group_by() and then operate on each group separately
• Combined with summarise(), this can be used to aggregate data, i.e. to collapse each group into a single row
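
The points above can be sketched with a small made-up data frame. This is only an illustration: the data frame toy and its columns education and age are invented here and are not part of the ESS data used below.

```r
library(dplyr)

# Hypothetical toy data (not the ESS data): one row per individual
toy <- data.frame(
  education = c("low", "low", "high", "high", "high"),
  age       = c(30, 50, 40, NA, 60)
)

# group_by() does not change the rows; it only records the grouping
toy_grouped <- group_by(toy, education)
class(toy_grouped) # includes "grouped_df"

# summarise() recognizes the grouping and returns one row per group
toy_agg <- summarise(toy_grouped,
                     n     = n(),                     # observations per group
                     age.m = mean(age, na.rm = TRUE)) # group mean, ignoring NA
toy_agg
```

Here toy_agg should contain two rows, one per education level, with the group size in n and the mean age in age.m.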

### 13.12.1 Example: Applying dplyr functions across groups (aggregation)

# See http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html for this example
library(dplyr) # Provides group_by(), summarise() and n()

View(essdata) # Inspect the data in the viewer
nrow(essdata) # Check number of rows
names(essdata) # Check variable names

# edulvl_str measures education levels (character variable used for grouping)
by_edulvl <- group_by(essdata, edulvl_str) # Convert the data frame into a grouped data frame and save it in an object
by_edulvl # We can see the grouping variable and the dimensions
class(by_edulvl) # We can see the new class "grouped_df"
essdata.agg <- summarise(by_edulvl, # summarise() collapses the data frame to one row per group
  n = n(), # Variable with the number of observations in each group
  age.m = mean(age, na.rm = TRUE), # Variable containing the group mean of age
  hheinkommen.m = mean(hheinkommen, na.rm = TRUE)) # Variable containing the group mean of hheinkommen
View(essdata.agg)

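# Base R alternative: aggregate() applies mean() to every column,
# so non-numeric columns yield NA (with warnings)
old.way <- aggregate(essdata, by = list(essdata$edulvl), mean, na.rm = TRUE)
View(old.way)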
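
To make the comparison between the two approaches concrete, here is a minimal sketch on invented data. The data frame toy and its columns are hypothetical and not part of the ESS data; the pipe operator %>% is used as an alternative way of chaining the dplyr calls, and aggregate() is restricted to the numeric columns to avoid warnings.

```r
library(dplyr)

# Hypothetical toy data (not the ESS data)
toy <- data.frame(
  education = c("low", "low", "high", "high", "high"),
  age       = c(30, 50, 40, NA, 60),
  income    = c(1000, 2000, 3000, 5000, NA)
)

# dplyr: group, then summarise, chained with the pipe
new_way <- toy %>%
  group_by(education) %>%
  summarise(age.m    = mean(age, na.rm = TRUE),
            income.m = mean(income, na.rm = TRUE))

# Base R: aggregate only the numeric columns, grouped by education
old_way <- aggregate(toy[, c("age", "income")],
                     by = list(education = toy$education),
                     FUN = mean, na.rm = TRUE)

# Both approaches should give the same group means
all.equal(new_way$age.m, old_way$age)
all.equal(new_way$income.m, old_way$income)
```

The dplyr version names each aggregated variable explicitly, whereas aggregate() applies the same function to every column it is given.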

### 13.12.2 Exercise: Applying dplyr functions across groups (aggregation)

1. Execute the following code: library(foreign) and essdata <- read.dta("./Material/ESS4e04_de.dta", convert.factors = FALSE). Adapt the file path to your system.
2. The variable religion_str contains the religious affiliation of respondents. Using functions from the dplyr package, aggregate the data set so that you obtain, for each religious affiliation, the averages of the variables polinteresse and trustparties as well as a variable with the number of observations in each group.