4.6 Summarise
We may wish to compute summary statistics for a particular variable ourselves, rather than relying on summary(). This is a job for summarise(). Let’s investigate some summary statistics for the TB cure rate.
Here we introduce another way to say !is.na(): complete.cases(). Applied to a single column, it returns TRUE for every row where that column is not missing. There is often more than one way to do something in R.
#--- Get summary statistics for TB cure rate
sdg %>%
  filter(complete.cases(tb.cure)) %>%
  summarise(mean   = mean(tb.cure),
            median = median(tb.cure),
            sd     = sd(tb.cure),
            min    = min(tb.cure),
            max    = max(tb.cure))
##     mean median       sd min max
## 1 78.925     83 15.30328   0 100
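To make the equivalence between complete.cases() and !is.na() concrete, here is a minimal sketch on a small made-up data frame. The `toy` data frame and its `rate` column are hypothetical and are not part of the sdg data. The last line shows a third option that is sometimes convenient: keeping all rows and asking the summary function itself to skip missing values with na.rm = TRUE.

```r
library(dplyr)

# Hypothetical toy data (not the sdg dataset): one column with missing values
toy <- data.frame(id = 1:5, rate = c(80, NA, 65, NA, 90))

# Two equivalent ways to drop the rows where `rate` is missing
a <- toy %>% filter(complete.cases(rate))
b <- toy %>% filter(!is.na(rate))

# Check that both approaches keep exactly the same rows
identical(a, b)

# A third option: keep every row and let mean() ignore the NAs
toy %>% summarise(mean = mean(rate, na.rm = TRUE))
```

Filtering first has the advantage that every statistic in a summarise() call sees the same rows, without repeating na.rm = TRUE in each function.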
#--- Get same summary statistics grouped by region
sdg %>%
  filter(complete.cases(tb.cure)) %>%
  group_by(reg) %>%
  summarise(mean   = mean(tb.cure),
            median = median(tb.cure),
            sd     = sd(tb.cure),
            min    = min(tb.cure),
            max    = max(tb.cure))
## # A tibble: 6 x 6
##   reg    mean median    sd   min   max
##   <fct> <dbl>  <dbl> <dbl> <int> <int>
## 1 AFR    80.0     83  11.1    34    94
## 2 AMR    76.0     80  21.9     0   100
## 3 EMR    81.9     86  13.1    44    97
## 4 EUR    78.1     81  11.5    45   100
## 5 SEA    81.5     84  15.8    37    93
## 6 WPR    79.4     86  17.3    20   100
EXERCISE: Variance is equal to the square of the standard deviation. Get the variance of maternal mortality by region.
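As a hint rather than a solution, the relationship between variance and standard deviation can be checked on a small made-up vector. The vector `x` below is hypothetical and only serves to illustrate the identity; applying the idea to the sdg data is left to the exercise.

```r
# Hypothetical toy vector, used only to illustrate the identity
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# var() and sd() both use the sample (n - 1) denominator,
# so squaring sd() recovers var() up to floating-point error
all.equal(var(x), sd(x)^2)
```

Either var() or sd()^2 could therefore be used inside summarise().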
EXERCISE: By country GDP descriptor (lmic), calculate the mean proportion of individuals living without adequate sanitation.
With just a handful of verbs from the tidyverse, we have seen how to perform a number of different data manipulations that can inform us about our data.