6 Summary Statistics
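There are two functions in base R that I use to quickly calculate summary statistics. The first is summary(), which, for a numeric vector, reports the minimum, the quartiles, the mean, the maximum, and the number of missing values (NA's). Here it is applied to the departure delays in the flights data:
library(nycflights13)  # provides the flights data frame
summary(flights$dep_delay)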
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -43.00 -5.00 -2.00 12.64 11.00 1301.00 8255
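To see exactly where those numbers come from, the sketch below recomputes summary() by hand on a small made-up vector. `x` is an illustrative stand-in for a delay column, not data from flights. The sketch assumes R's default quantile method, which is what summary() uses for numeric vectors.

```r
# Toy vector standing in for a delay column (illustrative values, not from flights)
x <- c(-5, -2, 0, 3, 10, 60, NA, NA)

# summary() reports min, quartiles, mean, max, and the count of NAs
s <- summary(x)
print(s)

# The same six statistics computed by hand; na.rm = TRUE drops the missing values
by_hand <- c(quantile(x, c(0, 0.25, 0.5), na.rm = TRUE),
             mean(x, na.rm = TRUE),
             quantile(x, c(0.75, 1), na.rm = TRUE))
print(by_hand)  # -5, -1.5, 1.5, 11, 8.25, 60

# The "NA's" entry is simply the number of missing values
n_missing <- sum(is.na(x))  # 2
```

Note that the mean (11) sits well above the median (1.5) because of the single large value, which is the same pattern the flights delays show (mean 12.64, median -2).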
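The other function is table(), which creates a basic frequency table: a count of how many times each distinct value appears. Here it counts flights by carrier:
table(flights$carrier)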
##
## 9E AA AS B6 DL EV F9 FL HA MQ OO UA US VX WN YV
## 18460 32729 714 54635 48110 54173 685 3260 342 26397 32 58665 20536 5162 12275 601
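A few base R companions make table() output easier to read. The sketch below uses a made-up vector of carrier codes (illustrative only, not the real flights data) to show that table() drops NA by default, how `useNA = "ifany"` counts it, and how prop.table() and sort() turn counts into ordered proportions.

```r
# Toy vector of carrier codes (illustrative, not the real flights data)
carriers <- c("UA", "AA", "UA", NA, "DL", "UA", "AA")

counts    <- table(carriers)                     # NA is silently dropped
counts_na <- table(carriers, useNA = "ifany")    # adds an <NA> cell
props     <- prop.table(counts)                  # each count divided by the total

print(counts)
print(counts_na)
print(sort(counts, decreasing = TRUE))           # most frequent value first
print(round(props, 2))
```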
6.1 The tidyverse approach: summarize
The tidyverse approach to calculating summary statistics is a bit more involved, but it offers a lot of flexibility. The key function is summarize(), which collapses all the rows of your dataset into a single row and creates new “variables” that are functions of the whole column. For example, here I calculate the mean departure delay and name the result delay:
library(dplyr)  # provides summarize() and the %>% pipe
flights %>% summarize(delay = mean(dep_delay, na.rm = TRUE))
## # A tibble: 1 x 1
## delay
## <dbl>
## 1 12.6
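Because summarize() accepts any number of name = value pairs, several statistics can be computed in one call. Here I add the standard deviation and the median:
flights %>% summarize(dep.delay     = mean(dep_delay, na.rm = TRUE),
                      dep.delay.sd  = sd(dep_delay, na.rm = TRUE),
                      dep.delay.med = median(dep_delay, na.rm = TRUE))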
## # A tibble: 1 x 3
## dep.delay dep.delay.sd dep.delay.med
## <dbl> <dbl> <dbl>
## 1 12.6 40.2 -2
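The na.rm = TRUE argument matters: a single missing value makes mean() return NA. The sketch below shows this on a small made-up data frame (`toy` is a hypothetical stand-in for flights) and also uses dplyr's n() and sum(is.na()) to report how many rows and missing values went into the summary. It assumes dplyr is installed.

```r
library(dplyr)

# Toy data frame standing in for flights (illustrative values only)
toy <- data.frame(dep_delay = c(-3, 0, 12, NA, 45))

out <- toy %>%
  summarize(
    n            = n(),                          # rows in the data, NAs included
    n_missing    = sum(is.na(dep_delay)),        # how many delays are missing
    mean_keep_na = mean(dep_delay),              # NA: one missing value poisons the mean
    mean_drop_na = mean(dep_delay, na.rm = TRUE) # mean of the 4 observed values: 13.5
  )
print(out)  # a data frame with exactly one row
```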