Chapter 10 Measures of central tendency
Measures of central tendency enable us to summarise our data (a “global view”) and give preliminary indications of the nature of the variable(s) involved.
10.0.1 Mean
There are several types of mean: arithmetic, geometric and harmonic. For \(n\) data points \(x_{i}\), \(i = 1, \dots, n\), the arithmetic mean is calculated by dividing the sum of the \(x_{i}\) by \(n\). This can be expressed as: \[ \bar{x} = \frac{\sum_{i=1}^{n} x_{i}}{n}. \]
Using our dataset, we can use the following R code to find the mean of chocolate chips before the new factory policy.
before = thebiscuits[which(thebiscuits$timeframe=="2017 - before no eating policy"),]
round(mean(before$chocolate_chips),0)
## [1] 10
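The three types of mean named above can be compared on a small example. The following is a minimal sketch on a hypothetical toy vector `x` (not taken from the biscuit dataset). It assumes all values are strictly positive, which the geometric and harmonic means require. The geometric mean is the \(n\)-th root of the product of the values, and the harmonic mean is \(n\) divided by the sum of the reciprocals.

```r
# Hypothetical toy data, chosen only for illustration
x <- c(2, 4, 8)

# Arithmetic mean: sum of the values divided by how many there are
arithmetic_mean <- sum(x) / length(x)      # same result as mean(x)

# Geometric mean: n-th root of the product, computed on the log scale
geometric_mean <- exp(mean(log(x)))

# Harmonic mean: n divided by the sum of the reciprocals
harmonic_mean <- length(x) / sum(1 / x)

c(arithmetic = arithmetic_mean,
  geometric  = geometric_mean,
  harmonic   = harmonic_mean)
```

For this vector the arithmetic mean is the largest of the three and the harmonic mean is the smallest.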
10.0.2 Median
To obtain the median, the data needs to be ordered from the smallest value to the largest. The data point in the “middle” of the ordered points is the median. In the case where the number of data points is an even number, the median is obtained by taking the mean of the two middle points (after the data is ordered).
Using our dataset, we can use the following R code to find the median of chocolate chips before the new factory policy.
before = thebiscuits[which(thebiscuits$timeframe=="2017 - before no eating policy"),]
median(before$chocolate_chips)
## [1] 10
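The rule for odd and even numbers of data points can be checked by hand. The sketch below uses two hypothetical toy vectors (not from the biscuit dataset) and compares the manual calculation with R's built-in `median()`.

```r
# Hypothetical toy data: one vector of odd length, one of even length
odd_values  <- c(7, 3, 9, 5, 11)
even_values <- c(7, 3, 9, 5, 11, 13)

# Odd case: order the data and take the single middle point
sorted_odd <- sort(odd_values)                       # 3 5 7 9 11
manual_odd <- sorted_odd[(length(sorted_odd) + 1) / 2]

# Even case: order the data and average the two middle points
sorted_even <- sort(even_values)                     # 3 5 7 9 11 13
n <- length(sorted_even)
manual_even <- mean(sorted_even[c(n / 2, n / 2 + 1)])

manual_odd            # 7
median(odd_values)    # 7
manual_even           # 8
median(even_values)   # 8
```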
10.0.3 Mode
The mode is the value that occurs most frequently in the dataset.
Using our dataset, we can use the following R code to find the most frequently observed count of chocolate chips before the new factory policy. The frequency table in the output shows which count category was most frequently observed.
library(knitr)       # provides kable()
library(kableExtra)  # provides kable_styling() and the %>% pipe
before = thebiscuits[which(thebiscuits$timeframe=="2017 - before no eating policy"),]
# Count how often each chocolate chip count occurs
thecounter = data.frame(summary(as.factor(before$chocolate_chips)))
colnames(thecounter)="Frequency"
thecounter$Category=rownames(thecounter)
rownames(thecounter)=NULL
kable(thecounter,
      caption="Table 1: Frequency table for chocolate chip count categories")%>%
  kable_styling("striped", full_width = F)
Frequency | Category |
---|---|
1 | 4 |
3 | 5 |
6 | 6 |
14 | 7 |
42 | 8 |
50 | 9 |
59 | 10 |
41 | 11 |
42 | 12 |
26 | 13 |
11 | 14 |
2 | 15 |
3 | 16 |
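The mode can also be extracted programmatically rather than read off the table. The following is a minimal sketch that rebuilds the data from the frequencies in Table 1, assuming the chip counts are whole numbers so that the table fully determines the data. The names `category`, `frequency`, `chips` and `mode_value` are introduced here for illustration only.

```r
# Categories and frequencies copied from Table 1
category  <- 4:16
frequency <- c(1, 3, 6, 14, 42, 50, 59, 41, 42, 26, 11, 2, 3)

# Assumption: counts are whole numbers, so repeating each category
# by its frequency reproduces the observed data
chips <- rep(category, times = frequency)

# Tabulate and pick the most frequent category.
# which.max() returns only the first maximum, so ties would need extra handling.
counts <- table(chips)
mode_value <- as.integer(names(counts)[which.max(counts)])

length(chips)    # 300 observations
mode_value       # 10
median(chips)    # 10
mean(chips)      # 10.18, which rounds to 10
```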
In this example, the mean, median and mode were identical (the raw data was rounded for these calculations). In lecture two we will discuss the significance of this observation.