Chapter 5 Numeric Summary

5.1 tally() - for summarizing categorical data

Let’s use the KidsFeet data to demonstrate the tally() function. KidsFeet contains 39 observations of fourth-grade students. Its variables include the month of birth, the sex of the child, and the dominant hand.

library(mosaic)
library(mosaicData)
str(KidsFeet)
## 'data.frame':    39 obs. of  8 variables:
##  $ name      : Factor w/ 36 levels "Abby","Alisha",..: 10 24 36 20 23 34 13 4 14 8 ...
##  $ birthmonth: int  5 10 12 1 2 3 2 6 5 9 ...
##  $ birthyear : int  88 87 87 88 88 88 88 88 88 88 ...
##  $ length    : num  24.4 25.4 24.5 25.2 25.1 25.7 26.1 23 23.6 22.9 ...
##  $ width     : num  8.4 8.8 9.7 9.8 8.9 9.7 9.6 8.8 9.3 8.8 ...
##  $ sex       : Factor w/ 2 levels "B","G": 1 1 1 1 1 1 1 2 2 1 ...
##  $ biggerfoot: Factor w/ 2 levels "L","R": 1 1 2 1 1 2 1 1 2 2 ...
##  $ domhand   : Factor w/ 2 levels "L","R": 2 1 2 2 2 2 2 2 2 1 ...
tally(~sex, data=KidsFeet)
## sex
##  B  G 
## 20 19

There are 20 boys and 19 girls in the dataset.
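To get relative frequencies instead of counts, tally() also accepts format="proportion". The sketch below assumes mosaic and mosaicData are installed and loaded as above. The name props is only illustrative.

```r
library(mosaic)
library(mosaicData)

# Each count divided by the total number of children (39)
props <- tally(~sex, data = KidsFeet, format = "proportion")
props
```

The proportions sum to 1. For boys the value is 20/39, roughly 0.51.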

tally(~sex + domhand, data=KidsFeet, format="percent")
##    domhand
## sex         L         R
##   B 12.820513 38.461538
##   G  7.692308 41.025641

Adding the format="percent" argument displays each count as a percentage of all 39 children. We can see that about 41% of the children are right-handed girls.
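The percentages above are taken over the whole table. You may instead want percentages within each sex, such as the share of girls who are right-handed. The sketch below does this with base R's table() and prop.table(). The object names counts, overall, and within_sex are hypothetical. We believe mosaic can give the same within-group view by conditioning in the formula, for example tally(~domhand | sex, data=KidsFeet, format="percent"), but the sketch sticks to base R.

```r
library(mosaicData)

# Two-way table of counts: rows are sex, columns are dominant hand
counts <- table(sex = KidsFeet$sex, domhand = KidsFeet$domhand)

# Percent of all 39 children (same numbers as format = "percent")
overall <- 100 * prop.table(counts)

# Percent within each sex: every row sums to 100
within_sex <- 100 * prop.table(counts, margin = 1)
round(within_sex, 1)
```

Within each sex, 15 of 20 boys (75%) and 16 of 19 girls (about 84%) are right-handed.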

tally(~sex + domhand, data=KidsFeet, margins=TRUE)
##        domhand
## sex      L  R Total
##   B      5 15    20
##   G      3 16    19
##   Total  8 31    39

Setting margins=TRUE adds a Total row and a Total column that contain the marginal totals.
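For comparison, base R's addmargins() adds the same totals to an ordinary table(). This is an alternative, not a description of how tally() works internally. Base R labels the totals Sum rather than Total. The names counts and with_totals are illustrative.

```r
library(mosaicData)

# Two-way table of counts, then append row and column totals
counts <- table(sex = KidsFeet$sex, domhand = KidsFeet$domhand)
with_totals <- addmargins(counts)
with_totals
```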

5.2 favstats() - for summarizing numeric data

To summarize numeric data, we compute statistics such as the mean, median, and standard deviation. We can compute them one at a time …

mean(~length, data=KidsFeet, na.rm=TRUE)
## [1] 24.72308
median(~length, data=KidsFeet, na.rm=TRUE)
## [1] 24.5
sd(~length, data=KidsFeet, na.rm=TRUE)
## [1] 1.317586
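KidsFeet has no missing values, so na.rm=TRUE has no visible effect above. The sketch below appends one hypothetical missing value to the foot lengths to show what the argument does. The vector x is made up for illustration and is not part of the dataset.

```r
library(mosaic)
library(mosaicData)

# Hypothetical: the 39 foot lengths plus one missing measurement
x <- c(KidsFeet$length, NA)

mean(x)                # NA: a single missing value propagates
mean(x, na.rm = TRUE)  # drops the NA, so this equals the mean of the 39 lengths
```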

… or we can do it all at once with favstats():

favstats(~length, data=KidsFeet)
##   min Q1 median   Q3  max     mean       sd  n missing
##  21.6 24   24.5 25.6 27.5 24.72308 1.317586 39       0
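Because favstats() returns a data frame, you can pull out individual statistics and reuse them. The sketch below computes the range and the interquartile range from its columns. The name fs is illustrative.

```r
library(mosaic)
library(mosaicData)

fs <- favstats(~length, data = KidsFeet)
is.data.frame(fs)      # favstats() returns a one-row data frame

fs$max - fs$min        # range of foot lengths
fs$Q3 - fs$Q1          # interquartile range
```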

By adding a conditioning variable after | in the formula, we can break out the statistics by a categorical variable:

favstats(~length | sex, data=KidsFeet)
##   sex  min    Q1 median   Q3  max     mean       sd  n missing
## 1   B 22.9 24.35  24.95 25.8 27.5 25.10500 1.216758 20       0
## 2   G 21.6 23.65  24.20 25.1 26.7 24.32105 1.330238 19       0

Now we can compare the boys and girls (sex) separately. For example, the mean foot length is about 25.1 for boys and 24.3 for girls.
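If you only need one statistic per group, mosaic's mean() also accepts a formula of the form response ~ group. The sketch below checks the result against base R's tapply(). It also uses diffmean(), which we understand reports the second group's mean minus the first's (girls minus boys here). The names group_means and d are illustrative.

```r
library(mosaic)
library(mosaicData)

# Mean foot length for each sex via the formula interface
group_means <- mean(length ~ sex, data = KidsFeet)
group_means

# The same computation in base R
tapply(KidsFeet$length, KidsFeet$sex, mean)

# Difference in means: G minus B
d <- diffmean(length ~ sex, data = KidsFeet)
d
```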