## 10.3 `aggregate()`

: Grouped aggregation

| Argument | Description |
|---|---|
| `formula` | A formula of the form `y ~ x1 + x2 + ...`, where `y` is the dependent variable and `x1`, `x2`, ... are the independent variables. For example, `salary ~ sex + age` will aggregate a `salary` column at every unique combination of `sex` and `age`. Pass the formula as the first argument without naming it: the argument was called `formula` before R 4.2.0 and is called `x` from R 4.2.0 onward, so unnamed code works in both. |
| `FUN` | A function that you want to apply to `y` at every level of the independent variables, e.g. `mean` or `max`. |
| `data` | The dataframe containing the variables in `formula`. |
| `subset` | A logical expression selecting the rows of `data` to analyze. For example, `subset = sex == "f" & age > 20` would restrict the analysis to females older than 20. You can leave this argument out to use all of the data. |

The first aggregation function we’ll cover is `aggregate()`. `aggregate()` lets you easily answer questions of the form: “What is the value of the function `FUN` applied to a dependent variable `dv` at each level of one (or more) independent variable(s) `iv`?”

```
# General structure of aggregate()
# (the formula goes first and is left unnamed, which works across R versions)
aggregate(dv ~ iv,     # dv is the variable to aggregate, iv is the grouping variable
          FUN = fun,   # The function you want to apply
          data = df)   # The dataframe object containing dv and iv
```
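Before moving on to real data, here is a minimal sketch of that template filled in with a tiny made-up dataframe. The `crew` dataframe, its `ship` and `gold` columns, and the name `crew_means` are all invented for illustration; they are not part of any dataset used in this chapter.

```r
# Hypothetical toy data, invented purely for illustration
crew <- data.frame(
  ship = c("a", "a", "b", "b", "b"),
  gold = c(10, 20, 5, 15, 40)
)

# Mean gold for each ship
crew_means <- aggregate(gold ~ ship,   # dv is gold, iv is ship
                        FUN = mean,    # Calculate the mean of each group
                        data = crew)   # dataframe is crew

crew_means
##   ship gold
## 1    a   15
## 2    b   20
```

Ship `a` has two rows (10 and 20) and ship `b` has three (5, 15 and 40), so the group means are 15 and 20.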

Let’s give `aggregate()` a whirl. No…not a whirl…we’ll give it a spin. Definitely a spin. We’ll use `aggregate()` on the `ChickWeight` dataset to answer the question “What is the mean weight for each diet?”

If we wanted to answer this question using basic R functions, we’d have to write a separate command for each diet like this:

```
# The WRONG way to do grouped aggregation.
# We should be using aggregate() instead!
mean(ChickWeight$weight[ChickWeight$Diet == 1])
## [1] 103
mean(ChickWeight$weight[ChickWeight$Diet == 2])
## [1] 123
mean(ChickWeight$weight[ChickWeight$Diet == 3])
## [1] 143
mean(ChickWeight$weight[ChickWeight$Diet == 4])
## [1] 135
```

If you are ever writing code like this, there is almost always a simpler way to do it. Let’s replace this code with a much more elegant solution using `aggregate()`. For this question, we’ll set the dependent variable `y` to `weight`, `x1` to `Diet`, and `FUN` to `mean`:

```
# Calculate the mean weight for each value of Diet
aggregate(weight ~ Diet,        # DV is weight, IV is Diet (formula goes first, unnamed)
          FUN = mean,           # Calculate the mean of each group
          data = ChickWeight)   # dataframe is ChickWeight
##   Diet weight
## 1    1    103
## 2    2    123
## 3    3    143
## 4    4    135
```

As you can see, the `aggregate()` function has returned a dataframe with a column for the independent variable `Diet`, and a column for the results of the function `mean` applied to each level of the independent variable. The result is the same as what we got from manually indexing each level of `Diet` individually, but of course, this code is much simpler and more elegant!
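Because the result is an ordinary dataframe, you can store it and index it like any other. The following is a small sketch of that idea; the object name `diet_means` is my own choice for illustration.

```r
# Store the aggregated result (diet_means is a hypothetical name)
diet_means <- aggregate(weight ~ Diet,
                        FUN = mean,
                        data = ChickWeight)

# The result is a dataframe with one row per diet
is.data.frame(diet_means)
nrow(diet_means)

# Pull out the mean weight for Diet 3 only
diet_means$weight[diet_means$Diet == 3]

# Which diet has the highest mean weight?
diet_means$Diet[which.max(diet_means$weight)]

# Sort the rows from heaviest to lightest mean weight
diet_means[order(diet_means$weight, decreasing = TRUE), ]
```

Storing the result this way means you only aggregate once, and can then look up, sort, or plot the group values as needed.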

You can also include a `subset` argument within an `aggregate()` call to apply the function to a subset of the original data. For example, if I wanted to calculate the mean chick weight for each diet, but only when the chicks are less than 10 days old (`Time` in `ChickWeight` is measured in days since birth), I would do the following:

```
# Calculate the mean weight for each value of Diet,
# but only when chicks are less than 10 days old
aggregate(weight ~ Diet,        # DV is weight, IV is Diet (formula goes first, unnamed)
          FUN = mean,           # Calculate the mean of each group
          subset = Time < 10,   # Only when chicks are less than 10 days old
          data = ChickWeight)   # dataframe is ChickWeight
##   Diet weight
## 1    1     58
## 2    2     63
## 3    3     66
## 4    4     69
```
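If it helps to see what `subset` is doing, the sketch below compares it with filtering the rows yourself before aggregating. The object names `with_subset`, `young_chicks` and `pre_filtered` are hypothetical, chosen only for this illustration.

```r
# Option 1: let aggregate() do the filtering via subset
with_subset <- aggregate(weight ~ Diet,
                         FUN = mean,
                         subset = Time < 10,
                         data = ChickWeight)

# Option 2: filter the rows first, then aggregate the smaller dataframe
young_chicks <- ChickWeight[ChickWeight$Time < 10, ]
pre_filtered <- aggregate(weight ~ Diet,
                          FUN = mean,
                          data = young_chicks)

# Both approaches should give the same group means
all.equal(with_subset$weight, pre_filtered$weight)
```

The `subset` argument is simply the more compact of the two: the condition is evaluated using the columns of `data`, so you can write `Time < 10` rather than `ChickWeight$Time < 10`.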

You can also include multiple independent variables in the formula argument to `aggregate()`. For example, let’s use `aggregate()` to get the mean weight of the chicks for all combinations of both `Diet` and `Time`, but now only for days 0, 2, and 4:

```
# Calculate the mean weight for each value of Diet and Time,
# but only when chicks are 0, 2 or 4 days old
aggregate(weight ~ Diet + Time,            # DV is weight, IVs are Diet and Time
          FUN = mean,                      # Calculate the mean of each group
          subset = Time %in% c(0, 2, 4),   # Only when chicks are 0, 2, or 4 days old
          data = ChickWeight)              # dataframe is ChickWeight
##    Diet Time weight
## 1     1    0     41
## 2     2    0     41
## 3     3    0     41
## 4     4    0     41
## 5     1    2     47
## 6     2    2     49
## 7     3    2     50
## 8     4    2     52
## 9     1    4     56
## 10    2    4     60
## 11    3    4     62
## 12    4    4     64
```
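With two independent variables the result has one row per combination, so it can be handy to store it and look up individual cells. The sketch below does that, and also swaps `mean` for `max` to show that only the `FUN` argument needs to change. The object names `diet_time_means` and `diet_time_max` are hypothetical.

```r
# Store the result for all Diet x Time combinations at times 0, 2 and 4
diet_time_means <- aggregate(weight ~ Diet + Time,
                             FUN = mean,
                             subset = Time %in% c(0, 2, 4),
                             data = ChickWeight)

# 4 diets x 3 time points = 12 rows
nrow(diet_time_means)

# Look up a single cell: mean weight for Diet 4 at Time 2
diet_time_means$weight[diet_time_means$Diet == 4 &
                         diet_time_means$Time == 2]

# Same grouping, different function: the heaviest chick in each group
diet_time_max <- aggregate(weight ~ Diet + Time,
                           FUN = max,
                           subset = Time %in% c(0, 2, 4),
                           data = ChickWeight)
diet_time_max
```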