4.6 Summarise

We may wish to generate summary statistics for a particular variable ourselves, rather than relying on summary(). This is a job for summarise(), which reduces the data to a single row containing the statistics we ask for. Let’s investigate some summary statistics for TB cure rate.

Here we introduce another way to say !is.na(): complete.cases(). Inside filter(), either one keeps only the rows where tb.cure is not missing. There is often more than one way to do something in R.
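As a quick check of that equivalence, here is a minimal sketch in base R. The vector tb_cure_toy and the data frame toy_df are made-up stand-ins, not part of the sdg data.

```r
# Toy vector standing in for sdg$tb.cure (values are invented for illustration)
tb_cure_toy <- c(85, NA, 70, NA, 92)

keep_complete <- complete.cases(tb_cure_toy)  # TRUE where the value is present
keep_not_na   <- !is.na(tb_cure_toy)          # the same test, written the other way

# For a single variable the two logical vectors are identical
same_result <- identical(keep_complete, keep_not_na)

# Given a whole data frame, complete.cases() checks every column at once:
# a row counts as complete only if none of its values are missing
toy_df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
rows_complete <- complete.cases(toy_df)
```

The difference only matters when more than one column is passed: complete.cases(toy_df) flags a row as incomplete if any column is missing, whereas !is.na() applied to one column looks at that column alone.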

#--- Get summary statistics for TB cure rate
sdg %>%
  filter(complete.cases(tb.cure)) %>%
  summarise(mean   = mean(tb.cure),
            median = median(tb.cure),
            sd     = sd(tb.cure),
            min    = min(tb.cure),
            max    = max(tb.cure))

##     mean median       sd min max
## 1 78.925     83 15.30328   0 100
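In the spirit of there being more than one way to do things, the missing values could also be dropped inside each summary function with na.rm = TRUE instead of filtering first. The sketch below compares the two approaches on a made-up data frame called toy, which is an assumption for illustration and not the sdg data.

```r
library(dplyr)

# Toy data with missing values (invented for illustration)
toy <- data.frame(tb.cure = c(85, NA, 70, NA, 92))

# Approach used in the text: remove incomplete rows, then summarise
by_filter <- toy %>%
  filter(complete.cases(tb.cure)) %>%
  summarise(mean   = mean(tb.cure),
            median = median(tb.cure))

# Alternative: keep all rows and ask each function to ignore NA
by_na_rm <- toy %>%
  summarise(mean   = mean(tb.cure, na.rm = TRUE),
            median = median(tb.cure, na.rm = TRUE))
```

Both give the same statistics here. One practical difference is that after filter() a row count such as n() reflects only the non-missing rows, while with na.rm = TRUE it would still count every row.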

#--- Get same summary statistics grouped by region
sdg %>%
  filter(complete.cases(tb.cure)) %>%
  group_by(reg) %>%
  summarise(mean   = mean(tb.cure),
            median = median(tb.cure),
            sd     = sd(tb.cure),
            min    = min(tb.cure),
            max    = max(tb.cure))

## # A tibble: 6 x 6
##   reg    mean median    sd   min   max
##   <fct> <dbl>  <dbl> <dbl> <int> <int>
## 1 AFR    80.0     83  11.1    34    94
## 2 AMR    76.0     80  21.9     0   100
## 3 EMR    81.9     86  13.1    44    97
## 4 EUR    78.1     81  11.5    45   100
## 5 SEA    81.5     84  15.8    37    93
## 6 WPR    79.4     86  17.3    20   100
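To see what group_by() changes without needing the sdg data, here is a small self-contained sketch. The data frame toy and its values are invented, and the extra n() column is an addition of ours that counts how many non-missing rows feed each group's statistics.

```r
library(dplyr)

# Toy data: two regions, one missing cure rate (values invented for illustration)
toy <- data.frame(
  reg     = c("AFR", "AFR", "AFR", "EUR", "EUR", "EUR"),
  tb.cure = c(80, NA, 90, 70, 75, 80)
)

toy_summary <- toy %>%
  filter(complete.cases(tb.cure)) %>%   # drop the row with a missing cure rate
  group_by(reg) %>%                     # one set of statistics per region
  summarise(n    = n(),                 # rows remaining in each group
            mean = mean(tb.cure),
            sd   = sd(tb.cure))
```

Without group_by() the result would be a single row; with it, summarise() returns one row per region. Reporting n alongside the mean and standard deviation can help when judging how much weight a group's statistics deserve.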

EXERCISE: Variance is equal to the square of the standard deviation. Get the variance of maternal mortality by region.
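The relationship between variance and standard deviation can be checked on a small made-up vector before tackling the exercise. The name x_toy is hypothetical and the values are unrelated to the sdg data.

```r
# Toy values (invented); their mean is 5 and the squared deviations sum to 36
x_toy <- c(2, 4, 4, 5, 10)

var_direct  <- var(x_toy)     # sample variance: 36 / (5 - 1)
var_from_sd <- sd(x_toy)^2    # square of the sample standard deviation

# all.equal() compares with a small numerical tolerance
variances_match <- isTRUE(all.equal(var_direct, var_from_sd))
```

Either form can be used inside summarise().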

EXERCISE: Calculate the mean proportion of individuals living without adequate sanitation for each level of the country GDP descriptor (lmic).

With just a handful of verbs from the tidyverse, we have seen how to perform a number of different data manipulations that can inform us about our data.