Chapter 5 Numeric Summary
5.1 tally() - for summarizing categorical data
Let’s use the KidsFeet data to demonstrate the tally() function. KidsFeet contains 39 observations of fourth-grade students, with variables such as the child's birth month, sex, and dominant hand.
library(mosaic)
library(mosaicData)
str(KidsFeet)
## 'data.frame': 39 obs. of 8 variables:
## $ name : Factor w/ 36 levels "Abby","Alisha",..: 10 24 36 20 23 34 13 4 14 8 ...
## $ birthmonth: int 5 10 12 1 2 3 2 6 5 9 ...
## $ birthyear : int 88 87 87 88 88 88 88 88 88 88 ...
## $ length : num 24.4 25.4 24.5 25.2 25.1 25.7 26.1 23 23.6 22.9 ...
## $ width : num 8.4 8.8 9.7 9.8 8.9 9.7 9.6 8.8 9.3 8.8 ...
## $ sex : Factor w/ 2 levels "B","G": 1 1 1 1 1 1 1 2 2 1 ...
## $ biggerfoot: Factor w/ 2 levels "L","R": 1 1 2 1 1 2 1 1 2 2 ...
## $ domhand : Factor w/ 2 levels "L","R": 2 1 2 2 2 2 2 2 2 1 ...
tally(~sex, data=KidsFeet)
## sex
## B G
## 20 19
There are 20 boys and 19 girls in the dataset.
tally(~sex + domhand, data=KidsFeet, format="percent")
## domhand
## sex L R
## B 12.820513 38.461538
## G 7.692308 41.025641
Adding the format="percent" argument displays each cell as a percentage of all 39 children instead of as a count. We can see that about 41% of the children are right-handed girls.
tally(~sex + domhand, data=KidsFeet, margins=TRUE)
## domhand
## sex L R Total
## B 5 15 20
## G 3 16 19
## Total 8 31 39
Setting margins=TRUE adds the marginal totals: the row totals, the column totals, and the overall count of 39.
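The arithmetic behind these tables can be reproduced with base R. The following is a minimal sketch, not how tally() is implemented: it assumes the four cell counts are typed in by hand from the table above, and the object names (counts, pct_total, pct_by_sex, with_totals) are made up for illustration.

```r
# Cell counts copied by hand from the sex-by-domhand table above
counts <- matrix(
  c(5, 3, 15, 16), nrow = 2,
  dimnames = list(sex = c("B", "G"), domhand = c("L", "R"))
)

# Percent of the grand total (39 children), as with format="percent"
pct_total <- prop.table(counts) * 100

# Percent within each sex, so that each row sums to 100
pct_by_sex <- prop.table(counts, margin = 1) * 100

# Row and column totals, similar to margins=TRUE (labelled "Sum" in base R)
with_totals <- addmargins(counts)

round(pct_total, 1)
round(pct_by_sex, 1)
with_totals
```

The row-wise version answers a different question from the grand-total version: it shows the share of right-handers among boys and among girls, rather than the share of each sex-by-hand combination among all children.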
5.2 favstats() - for summarizing numeric data
To summarize numeric data, we compute statistics such as the mean, median, and standard deviation. We can do that one at a time …
mean(~length, data=KidsFeet, na.rm=TRUE)
## [1] 24.72308
median(~length, data=KidsFeet, na.rm=TRUE)
## [1] 24.5
sd(~length, data=KidsFeet, na.rm=TRUE)
## [1] 1.317586
… or we can do it all at once with favstats():
favstats(~length, data=KidsFeet)
## min Q1 median Q3 max mean sd n missing
## 21.6 24 24.5 25.6 27.5 24.72308 1.317586 39 0
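To see what each column of this output represents, here is a rough base R sketch that produces a one-row summary in the same layout. It is an illustration only: summary_stats() is a hypothetical helper, not part of mosaic, and the data are just the first ten foot lengths shown in the str(KidsFeet) output, so the numbers will not match the full-data results above.

```r
# First ten foot lengths, copied from the str(KidsFeet) output
len <- c(24.4, 25.4, 24.5, 25.2, 25.1, 25.7, 26.1, 23, 23.6, 22.9)

# Hypothetical helper that mimics the column layout of favstats()
summary_stats <- function(x) {
  q <- quantile(x, na.rm = TRUE)  # min, Q1, median, Q3, max
  data.frame(
    min = q[[1]], Q1 = q[[2]], median = q[[3]], Q3 = q[[4]], max = q[[5]],
    mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    n = sum(!is.na(x)),        # number of non-missing values
    missing = sum(is.na(x))    # number of missing values
  )
}

stats <- summary_stats(len)
stats
```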
We can also add a conditioning variable to the formula to compute the statistics separately for each level of a categorical variable:
favstats(~length | sex, data=KidsFeet)
## sex min Q1 median Q3 max mean sd n missing
## 1 B 22.9 24.35 24.95 25.8 27.5 25.10500 1.216758 20 0
## 2 G 21.6 23.65 24.20 25.1 26.7 24.32105 1.330238 19 0
The statistics are now reported separately for boys and girls: the mean foot length is about 25.1 for boys and about 24.3 for girls.
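The same idea of splitting a summary by group can be sketched in base R with aggregate(). This is an illustration under stated assumptions: the small data frame feet is built by hand from the first ten length and sex values in the str(KidsFeet) output, so its group means differ from the full-data means above.

```r
# First ten rows of length and sex, copied from the str(KidsFeet) output
feet <- data.frame(
  length = c(24.4, 25.4, 24.5, 25.2, 25.1, 25.7, 26.1, 23, 23.6, 22.9),
  sex = factor(c("B", "B", "B", "B", "B", "B", "B", "G", "G", "B"))
)

# Mean foot length for each level of sex
group_means <- aggregate(length ~ sex, data = feet, FUN = mean)
group_means
```

Passing a different function as FUN, such as median or sd, gives the corresponding per-group statistic.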