Chapter 4 Intro Stat Functions

Packages used: psych.

Many basic statistical functions are intuitively named: mean(), median(), min(), max(), and sd() (standard deviation), for example. You can call each of these on a single column of a data frame:

#Assign data
df <- women

#Mean
mean(df$height)
## [1] 65
#Median
median(df$height)
## [1] 65
#Minimum
min(df$height)
## [1] 58
#Maximum
max(df$height)
## [1] 72
#Standard Deviation
sd(df$height)
## [1] 4.472136
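One thing to watch for: these functions return NA if the column contains any missing values. The women data has none, so the sketch below uses a small made-up vector (x is hypothetical, not part of the dataset) to show the na.rm argument, which drops missing values before calculating.

```r
# Hypothetical vector with one missing value (not from the women dataset)
x <- c(60, 62, NA, 66)

# Without na.rm, the missing value propagates to the result
mean_with_na <- mean(x)
mean_with_na

# With na.rm = TRUE, the NA is dropped before calculating
mean_no_na <- mean(x, na.rm = TRUE)
mean_no_na

# median(), min(), max(), and sd() accept the same argument
sd_no_na <- sd(x, na.rm = TRUE)
sd_no_na
```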

You might run these to check your data, to verify the coding of a new column, or simply because you need the value for your analysis. We can also assign the output to an object for later use, rather than just printing it to the console.

m <- mean(df$height)
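As a rough illustration of what "later use" might look like, the sketch below reuses the stored mean to center the heights and to pick out the taller-than-average rows. The object names dev and tall are made up for this example.

```r
# Assign data and store the mean, as above
df <- women
m <- mean(df$height)

# Deviation of each height from the mean
dev <- df$height - m
dev

# Rows where height is above the mean
tall <- df[df$height > m, ]
tall
```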

We could also get most of this information by using the summary() function. It takes your data frame as input and returns a table with the minimum (Min.), 1st quartile (1st Qu.), Median, Mean, 3rd quartile (3rd Qu.), and maximum (Max.) for each numeric column. Note that it does not report the standard deviation.

summary(df)
##      height         weight     
##  Min.   :58.0   Min.   :115.0  
##  1st Qu.:61.5   1st Qu.:124.5  
##  Median :65.0   Median :135.0  
##  Mean   :65.0   Mean   :136.7  
##  3rd Qu.:68.5   3rd Qu.:148.0  
##  Max.   :72.0   Max.   :164.0
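If you only need the quartiles, or only one column, a minimal sketch using base R's quantile() and IQR() is shown below. The object names q and iqr are made up for this example; the values should agree with the 1st Qu. and 3rd Qu. entries that summary() prints for height.

```r
# Assign data
df <- women

# summary() also works on a single column
summary(df$height)

# 1st and 3rd quartiles only
q <- quantile(df$height, probs = c(0.25, 0.75))
q

# Interquartile range (3rd quartile minus 1st quartile)
iqr <- IQR(df$height)
iqr
```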

4.1 Mode

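Mode, unfortunately, is less easily obtained, because base R has no built-in function for the statistical mode. Should you need it, you will have to run the code below to define a mode function, then use mode() on your column. Note that base R already has a function called mode(), but it reports how an object is stored (for example "numeric"), not the most frequent value; the definition below replaces it for the rest of your session. Once you have run the definition in your R session, you will not have to run it again. However, if you switch machines, or close R and reopen it, you will have to re-run it prior to use. A bit annoying...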

#Run this first!
mode <- function(x) {
  u <- unique(x)
  tab <- tabulate(match(x, u))
  u[tab == max(tab)]
}

#Then you can use it as a function
mode(df$height)
##  [1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72

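You’ll notice that since there are no repeated values in the height column of the women dataset (which we assigned to df), every value ties for most frequent, so there are as many modes as there are entries. Switching to the mtcars dataset (assigned to cars below), which does have repeated values, we get: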

#Load a different dataset
cars <- mtcars

#Try the mode function again
mode(cars$cyl)
## [1] 8
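To see where that answer comes from, it can help to look at the counts directly. The sketch below tabulates cyl with table(), then traces the steps of the mode function on a small made-up vector (x is hypothetical) that contains a tie, which is the situation where more than one value is returned.

```r
# Frequency of each cylinder count in the built-in mtcars data
counts <- table(mtcars$cyl)
counts

# Hypothetical small vector with a tie, to trace the steps of the function
x <- c(2, 3, 3, 5, 5)
u <- unique(x)                    # distinct values: 2 3 5
tab <- tabulate(match(x, u))      # count for each distinct value: 1 2 2
tied_modes <- u[tab == max(tab)]  # every value with the highest count
tied_modes
```

Here 8 cylinders is the most common value in mtcars, while the made-up vector has two modes, 3 and 5.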

4.2 Psych Package for Data Summary

There is also a package called psych that contains many useful functions, one of which is describe(). Similar to summary(), it reports a set of descriptive statistics for each column, but it provides many more of them: item name, item number (vars), number of valid cases (n), mean, standard deviation (sd), median, trimmed mean, median absolute deviation (mad), minimum, maximum, range, skew, kurtosis, and standard error (se).

#Call the package
library(psych)  

describe(df)
##        vars  n   mean    sd median trimmed   mad min max range skew kurtosis   se
## height    1 15  65.00  4.47     65   65.00  5.93  58  72    14 0.00    -1.44 1.15
## weight    2 15 136.73 15.50    135  136.31 17.79 115 164    49 0.23    -1.34 4.00
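To make a few of the less familiar columns concrete, here is a rough base R sketch that reproduces se, trimmed, mad, and range for height. It assumes describe() is using its default settings (a 10% trimmed mean and the standard scaled median absolute deviation); the object names are made up for this example.

```r
# Heights from the built-in women dataset
x <- women$height

# Number of valid (non-missing) cases
n <- sum(!is.na(x))

# Standard error of the mean: sd divided by the square root of n
se <- sd(x) / sqrt(n)

# Trimmed mean: drop 10% of values from each end, then average
trimmed <- mean(x, trim = 0.1)

# Median absolute deviation (scaled, as in base R's mad())
mad_x <- mad(x)

# Range as a single number: maximum minus minimum
rng <- max(x) - min(x)

# Round to two decimals to compare with the describe() output
round(c(se = se, trimmed = trimmed, mad = mad_x, range = rng), 2)
```

Rounded to two decimals, these match the height row above: se 1.15, trimmed 65, mad 5.93, and range 14.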