5.1 Summarizing categorical data
To summarize a categorical variable, compute the frequency (N) and proportion (%) of each value of that variable, along with the number of missing values. For example, summarize the variable income.
# Frequency (N)
# Include useNA = "ifany" to show the number of missing values
table(mydat$income, useNA = "ifany")
##
##            < $25,000 $25,000 to < $55,000             $55,000+                 <NA>
##                   76                   86                   67                   21
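# Proportion (%), with missing values included in the denominator
prop.table(table(mydat$income, useNA = "ifany"))
##
##            < $25,000 $25,000 to < $55,000             $55,000+                 <NA>
##                0.304                0.344                0.268                0.084
# Proportion (%), with missing values excluded from the denominator
prop.table(table(mydat$income))
##
##            < $25,000 $25,000 to < $55,000             $55,000+
##               0.3319               0.3755               0.2926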
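The difference between the two denominators is easier to see on a very small example. The sketch below uses a made-up vector (income_toy and its values are hypothetical, not taken from mydat) with five observed values and one missing value.

```r
# Hypothetical toy data: 5 observed values and 1 missing value
income_toy <- factor(c("low", "high", "low", NA, "mid", "low"),
                     levels = c("low", "mid", "high"))

# Frequencies, with missing values counted in their own cell
n_all <- table(income_toy, useNA = "ifany")

# Proportions out of all 6 observations (missing values included)
p_all <- prop.table(n_all)

# Proportions out of the 5 non-missing observations only
p_obs <- prop.table(table(income_toy))

n_all
p_all
p_obs
```

With missing values included, "low" accounts for 3 of 6 observations; with missing values excluded, it accounts for 3 of 5. Which denominator is appropriate depends on whether the proportions are meant to describe the whole sample or only those with observed values.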
It would be nice to have all this information in one summary. Let’s write a function that does that.
myfun_cat <- function(x) {
  # Count the number of missing values
  nmiss <- sum(is.na(x))
  # Frequency (missing values excluded)
  n <- table(x)
  # Proportion (out of non-missing values)
  p <- prop.table(n)
  # Putting it together
  OUT <- cbind(n, p)
  # Add nmiss, but first pad with NA to have the right number of rows
  nmiss <- c(nmiss, rep(NA, nrow(OUT) - 1))
  OUT <- cbind(OUT, nmiss)
  return(OUT)
}
myfun_cat(mydat$income)
##                       n      p nmiss
## < $25,000            76 0.3319    21
## $25,000 to < $55,000 86 0.3755    NA
## $55,000+             67 0.2926    NA
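Because myfun_cat() takes a single vector, it can be applied to several categorical variables at once with lapply(). The following is a minimal sketch, assuming a hypothetical data frame toy with made-up columns smoker and region (neither is part of mydat). The function is repeated here only so the block runs on its own.

```r
# Copy of myfun_cat() from the text so this block is self-contained
myfun_cat <- function(x) {
  nmiss <- sum(is.na(x))
  n     <- table(x)
  p     <- prop.table(n)
  OUT   <- cbind(n, p)
  nmiss <- c(nmiss, rep(NA, nrow(OUT) - 1))
  cbind(OUT, nmiss)
}

# Hypothetical toy data frame (not the mydat used in the text)
toy <- data.frame(
  smoker = factor(c("yes", "no", "no", NA, "no")),
  region = factor(c("east", "west", NA, NA, "east"))
)

# Summarize every column; the result is a named list of matrices
summaries <- lapply(toy, myfun_cat)

summaries$smoker
summaries$region
```

Each list element has the same layout as the income summary: one row per level, with the number of missing values reported in the first row only. Note that the padding step assumes the variable has at least one level, so a variable with no levels would need separate handling.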