Chapter 10 Measures of central tendency

Measures of central tendency provide a summary (“global view”) of our data and a preliminary indication of the nature of the variable(s) involved.

10.0.1 Mean

There are several types of mean: arithmetic, geometric and harmonic. For \(n\) data points \(x_{i}\), \(i = 1, \dots, n\), the arithmetic mean \(\bar{x}\) is the sum of the \(x_{i}\) divided by \(n\): \[ \bar{x} = \frac{\sum_{i=1}^{n} x_{i}}{n}. \]
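A minimal base-R sketch of the three means, using a small made-up vector `x` (hypothetical data, not taken from our dataset). It assumes every value is strictly positive, which the geometric and harmonic means require; the geometric mean is computed on the log scale rather than by multiplying all the values together.

```r
# Hypothetical example data; all values must be strictly positive
# for the geometric and harmonic means
x = c(2, 4, 8)
n = length(x)

# Arithmetic mean: sum of the values divided by n (equivalent to mean(x))
arith = sum(x) / n

# Geometric mean: n-th root of the product, computed as exp(mean(log(x)))
geom = exp(mean(log(x)))

# Harmonic mean: n divided by the sum of the reciprocals
harm = n / sum(1 / x)

c(arithmetic = arith, geometric = geom, harmonic = harm)
```

For these values the arithmetic mean is 14/3 (about 4.67), the geometric mean is 4 and the harmonic mean is 24/7 (about 3.43).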

The following R code finds the mean chocolate chip count in our dataset before the new factory policy, rounded to the nearest whole number.

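# Keep only the biscuits from the period before the no-eating policy
before = thebiscuits[which(thebiscuits$timeframe=="2017 - before no eating policy"),]
# Mean chocolate chip count, rounded to a whole number
round(mean(before$chocolate_chips), 0)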
## [1] 10

10.0.2 Median

To obtain the median, order the data from the smallest value to the largest; the data point in the “middle” of the ordered points is the median. When the number of data points is even, there is no single middle point, so the median is the mean of the two middle points of the ordered data.
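The ordering rule above can be sketched directly in base R. The helper `median_by_hand` is hypothetical (R's own `median()` should be used in practice), and this sketch assumes a numeric vector with no missing values.

```r
# Hypothetical helper: median by ordering the data, then taking the
# middle value (odd n) or the mean of the two middle values (even n)
median_by_hand = function(x) {
  x = sort(x)                      # order from smallest to largest
  n = length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                 # odd n: the single middle value
  } else {
    mean(x[c(n / 2, n / 2 + 1)])   # even n: mean of the two middle values
  }
}

median_by_hand(c(7, 3, 9))       # odd n: ordered 3, 7, 9 gives 7
median_by_hand(c(7, 3, 9, 12))   # even n: ordered 3, 7, 9, 12 gives (7 + 9) / 2 = 8
```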

The following R code finds the median chocolate chip count before the new factory policy.

before = thebiscuits[which(thebiscuits$timeframe=="2017 - before no eating policy"),]
median(before$chocolate_chips)
## [1] 10

10.0.3 Mode

The mode is the value that occurs most frequently in the dataset.
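Base R has no built-in function for the statistical mode: `mode()` reports an object's storage mode (for example "numeric"), not its most frequent value. A minimal sketch using `table()`, with a hypothetical helper `stat_mode` that returns every value tied for the highest frequency (a dataset can have more than one mode). It assumes numeric data, because the names produced by `table()` are converted back to numbers.

```r
# Hypothetical helper: return the most frequent value(s) of a numeric vector
stat_mode = function(x) {
  counts = table(x)                             # frequency of each distinct value
  modes = names(counts)[counts == max(counts)]  # all values tied for the top count
  as.numeric(modes)                             # table() names are character; convert back
}

stat_mode(c(8, 9, 10, 10, 11))   # one mode: 10
stat_mode(c(8, 8, 9, 10, 10))    # two modes: 8 and 10
```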

The following R code tabulates how often each chocolate chip count was observed before the new factory policy. The count with the highest frequency in the resulting table is the mode.

library(knitr)       # kable()
library(kableExtra)  # kable_styling() and the %>% pipe
before = thebiscuits[which(thebiscuits$timeframe=="2017 - before no eating policy"),]
# Frequency of each distinct chip count (summary() of a factor counts its levels)
thecounter = data.frame(summary(as.factor(before$chocolate_chips)))
colnames(thecounter) = "Frequency"
thecounter$Category = rownames(thecounter)
rownames(thecounter) = NULL
kable(thecounter,
      caption = "Frequency table for chocolate chip count categories") %>%
  kable_styling("striped", full_width = FALSE)
Table 10.1: Frequency table for chocolate chip count categories
Frequency  Category
        1         4
        3         5
        6         6
       14         7
       42         8
       50         9
       59        10
       41        11
       42        12
       26        13
       11        14
        2        15
        3        16

In this example, the mean, median and mode were all equal to 10 (after rounding the mean to the nearest whole number). In lecture two we will discuss the significance of this observation.
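As a cross-check, the chip counts can be rebuilt from the frequencies in Table 10.1 with `rep()` and all three measures recomputed without access to `thebiscuits`. This sketch assumes Table 10.1 lists every biscuit in the pre-policy group with no missing values; the variable names `chip_count`, `freq` and `chips` are illustrative.

```r
# Chip counts (Category) and their frequencies, copied from Table 10.1
chip_count = 4:16
freq = c(1, 3, 6, 14, 42, 50, 59, 41, 42, 26, 11, 2, 3)

# Rebuild the raw data: each chip count repeated by its frequency
chips = rep(chip_count, times = freq)

length(chips)                     # number of biscuits
mean(chips)                       # unrounded mean
round(mean(chips), 0)             # mean rounded to a whole number
median(chips)                     # median
chip_count[which.max(freq)]       # mode: the count with the highest frequency
```

The table covers 300 biscuits with 3054 chips in total, so the unrounded mean is 3054/300 = 10.18. The median and mode are both exactly 10; the mean matches them only after rounding.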