Chapter 4 Intro Stat Functions
Packages used: psych
Many basic functions are intuitively named: mean(), median(), min(), max(), and sd() (standard deviation), for example. You can call each of these functions individually on a single column of a data frame:
## [1] 65
## [1] 65
## [1] 58
## [1] 72
## [1] 4.472136
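The five values above are consistent with calls like the following. This is a minimal sketch that assumes df holds R's built-in women dataset (which this chapter assigns to df) and that height is the column being summarized; the original calls are not shown here, so treat the exact form as an assumption.

```r
# Assumption: df is the built-in women dataset (15 rows: height, weight)
df <- women

height_mean   <- mean(df$height)    # arithmetic mean
height_median <- median(df$height)  # middle value
height_min    <- min(df$height)     # smallest value
height_max    <- max(df$height)     # largest value
height_sd     <- sd(df$height)      # sample standard deviation

print(height_mean)
print(height_median)
print(height_min)
print(height_max)
print(height_sd)
```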
You might run these to check your data, to verify the coding of a new column, or because you just need that value for your analysis. We can also assign the output to an object for later use, rather than just printing it to the console.
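As a rough illustration of storing a result for later use, the sketch below saves the mean and standard deviation of height and reuses them to standardize the column. The object names (mean_height, sd_height, z_height) are made up for this example, and df is assumed to be the built-in women dataset.

```r
# Assumption: df is the built-in women dataset
df <- women

# Store the results instead of printing them
mean_height <- mean(df$height)
sd_height   <- sd(df$height)

# Reuse the stored values later, here to compute z-scores
z_height <- (df$height - mean_height) / sd_height

print(head(z_height))
```

Typing the object's name on its own (for example, mean_height) prints the stored value whenever you need to see it.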
We could also get all of this information by using the summary() function. It takes your data frame as input and returns a table with the minimum (Min.), 1st quartile (1st Qu.), Median, Mean, 3rd quartile (3rd Qu.), and maximum (Max.) for each numeric column.
##      height         weight
##  Min.   :58.0   Min.   :115.0
##  1st Qu.:61.5   1st Qu.:124.5
##  Median :65.0   Median :135.0
##  Mean   :65.0   Mean   :136.7
##  3rd Qu.:68.5   3rd Qu.:148.0
##  Max.   :72.0   Max.   :164.0
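A table like the one above can be produced with a single call. The sketch below assumes df is the built-in women dataset; it also shows that summary() works on a single column and that individual values can be pulled out by name. The object name height_summary is hypothetical.

```r
# Assumption: df is the built-in women dataset
df <- women

# Summary of every column in the data frame
print(summary(df))

# Summary of a single column, stored for later use
height_summary <- summary(df$height)

# Extract individual values by their labels
height_median <- height_summary[["Median"]]
height_q1     <- height_summary[["1st Qu."]]

print(height_median)
print(height_q1)
```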
4.1 Mode
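Mode, unfortunately, is less easily obtained. Base R has no built-in function for the statistical mode; its existing mode() reports an object's storage type (such as "numeric") instead. Should you need the statistical mode, you will have to run the function definition below first, then use mode() on your column. Note that this definition masks the built-in mode() for the rest of your session. Once you have run the definition in your R session, you will not have to run it again. However, if you switch machines, or close R and reopen it, you will have to re-run it prior to use. A bit annoying...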
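# Run this first! (It masks base R's mode() for the rest of the session.)
mode <- function(x) {
  u <- unique(x)                # distinct values in x
  tab <- tabulate(match(x, u))  # how often each distinct value occurs
  u[tab == max(tab)]            # every value tied for the highest count
}
# Then you can use it as a function
mode(df$height)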
## [1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
You’ll notice that since there are no repeated numbers in the women dataset (what we have assigned to df), there are as many modes as there are entries. Going to the cars dataset, we get:
## [1] 8
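To see how this kind of function behaves with a single mode, with ties, and with no repeats, here is a small sketch on made-up vectors. It uses the same logic under a hypothetical name, stat_mode, chosen for this example so that base R's own mode() stays available; the input vectors are invented for illustration.

```r
# Hypothetical name: stat_mode avoids masking base R's mode()
stat_mode <- function(x) {
  u <- unique(x)                # distinct values in x
  tab <- tabulate(match(x, u))  # how often each distinct value occurs
  u[tab == max(tab)]            # every value tied for the highest count
}

one_mode   <- stat_mode(c(2, 3, 3, 5, 3, 2))  # 3 occurs most often
tied_modes <- stat_mode(c(1, 1, 2, 2, 3))     # 1 and 2 are tied
no_repeats <- stat_mode(c(10, 20, 30))        # every value is a mode

print(one_mode)
print(tied_modes)
print(no_repeats)

# Base R's mode() is untouched and still reports the storage type
print(mode(c(1, 2, 3)))
```

Because ties return several values, the result can be longer than one element, which is worth keeping in mind before using it in further calculations.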
4.2 Psych Package for Data Summary
There is also a package called psych that contains many useful functions, one of which is describe(). Like summary(), it reports descriptive statistics for each column, but it provides many more of them: item name, item number (vars), number of valid cases (n), mean, standard deviation (sd), median, trimmed mean, median absolute deviation (mad), minimum, maximum, range, skew, kurtosis, and standard error (se).
##        vars  n   mean    sd median trimmed   mad min max range skew kurtosis   se
## height    1 15  65.00  4.47     65   65.00  5.93  58  72    14 0.00    -1.44 1.15
## weight    2 15 136.73 15.50    135  136.31 17.79 115 164    49 0.23    -1.34 4.00
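A table like this can be requested with describe() once psych is installed. The sketch below assumes df is the built-in women dataset and only calls psych if it is available. It also recomputes a few of the height values in base R to show what those columns mean; the object names are made up for this example, and the 10% trim is an assumption based on describe()'s default setting.

```r
# Assumption: df is the built-in women dataset
df <- women
n  <- nrow(df)

# Base R equivalents for some of the reported columns (height only)
range_height   <- max(df$height) - min(df$height)  # range
se_height      <- sd(df$height) / sqrt(n)          # standard error of the mean
trimmed_height <- mean(df$height, trim = 0.1)      # mean after trimming 10% from each end
mad_height     <- mad(df$height)                   # scaled median absolute deviation

print(round(c(range = range_height, se = se_height,
              trimmed = trimmed_height, mad = mad_height), 2))

# Full table, if the psych package is installed
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(df))
}
```

If psych is not installed yet, install.packages("psych") adds it; after library(psych), the call can be shortened to describe(df).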