11.1 Dropping Coins in a Well
Two data sets can have quite different spreads and still share the same mean (\(\mu\)).
11.2 Finding the Mean Absolute Deviation
To measure the dispersion of data you can’t simply sum the distances from the mean, because the positive and negative differences cancel each other out. Summing the absolute values alone isn’t enough either, because larger data sets would automatically produce larger sums. We need to normalize by dividing by the total number of observations.
The mean absolute deviation follows exactly this procedure: it sums the absolute deviations and multiplies the sum by the reciprocal \(\frac{1}{n}\), which is the same as dividing by the number of observations.
Theorem 11.1 (Formula for the Mean Absolute Deviation (MAD)) \[MAD(x) = \frac{1}{n} \times \sum_{i=1}^{n}|x_{i} - \mu| \tag{11.1}\]
We call the result of this formula the mean absolute deviation (MAD). The MAD is a very useful and intuitive measure of how spread out your observations are. (98)
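As a quick sanity check, Equation 11.1 can be written directly in base R. The vector x below is just a made-up toy example, not data from the book:
x <- c(2, 4, 6, 8)
sum(abs(x - mean(x))) / length(x)
#> [1] 2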
11.3 Finding the Variance
Squaring the differences from the mean (\(x_{i} - \mu\)) is another way to obtain only positive values. This measure of dispersion is called variance and has the advantage of producing an “exponential penalty, meaning measurements very far away from the mean are penalized much more.” (99)
Theorem 11.2 (Formula for the Variance (Var)) \[Var(x) = \frac{1}{n} \times \sum_{i=1}^{n}(x_{i} - \mu)^2 \tag{11.2}\]
The formula for the variance is exactly the same as MAD in Equation 11.1 except that the absolute value function in MAD has been replaced with squaring.
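The same made-up vector x illustrates Equation 11.2; only the absolute value has been swapped for squaring:
x <- c(2, 4, 6, 8)
sum((x - mean(x))^2) / length(x)
#> [1] 5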
11.4 Finding the Standard Deviation
Because the variance works with squared differences, its values lose their intuitive meaning. Taking the square root of the variance gives the standard deviation, another measure of dispersion that is easier to interpret than the variance.
Theorem 11.3 (Formula for the Standard Deviation (sigma, \(\sigma\))) \[\sigma = \sqrt{\frac{1}{n} \times \sum_{i=1}^{n}(x_{i} - \mu)^2} \tag{11.3}\]
The standard deviation is so useful and ubiquitous that, in most of the literature on probability and statistics, variance is defined simply as \(\sigma^2\), or sigma squared!
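Again with the made-up vector x: taking the square root of the variance from Equation 11.2 gives \(\sigma\), and squaring \(\sigma\) recovers the variance:
x <- c(2, 4, 6, 8)
sigma <- sqrt(sum((x - mean(x))^2) / length(x))
sigma
#> [1] 2.236068
sigma^2
#> [1] 5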
Warning
There is another difference between the book’s variance and standard deviation and their base R counterparts: the base R functions stats::var() and stats::sd() use \(n - 1\) as the denominator, not just \(n\). The reason is, as Will Kurt explains on page 242 of the solution manual, that the base R functions compute the sample variance and standard deviation rather than the population versions.
The difference is not important if you have a large data set, but with the toy data in this chapter the difference matters.
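A small demonstration of this warning, again with the made-up vector x: stats::var() divides by \(n - 1\), so rescaling its result by \((n - 1)/n\) reproduces the population variance used in the book.
x <- c(2, 4, 6, 8)
var(x)                                # base R divides by n - 1
#> [1] 6.666667
var(x) * (length(x) - 1) / length(x)  # rescaled to divide by n
#> [1] 5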
11.5 Wrapping Up
In this chapter we learned about three different methods to measure the spread of data:
Mean absolute deviation (MAD): It is the most intuitive measure.
Variance (var): It is mathematically easy to use and has the nice property of an exponential penalty.
Standard deviation (sigma, \(\sigma\)): This is the most used measure of dispersion, as it is reasonably intuitive and mathematically easy to use.
11.6 Exercises
Try answering the following questions to see how well you understand these different methods of measuring the spread of data. The solutions can be found at https://nostarch.com/learnbayes/.
11.6.1 Exercise 11-1
One of the benefits of variance is that squaring the differences makes the penalties exponential. Give some examples of when this would be a useful property.
Solution:
Note
My solution hinted at outliers, but that is not the much more practical intended solution. The exponential penalty is useful whenever the cost of missing the intended value grows with the distance of the miss. Will Kurt uses the example of a teleporter missing its intended location by 3 feet, 3 miles, or 30 miles.
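To make the penalty concrete, here is a tiny numeric illustration with made-up distances: missing the target by 10 times as much incurs 100 times the penalty.
c(3, 30, 300)^2
#> [1]     9   900 90000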
11.6.2 Exercise 11-2
Calculate the mean, variance, and standard deviation for the following values: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
Solution:
Listing 11.1: Exercise 11-2: Calculate the mean, variance, and standard deviation for the following values: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
var_fun <- function(x) {
  sum((x - mean(x))^2) * 1 / length(x)
}

sd_fun <- function(x) {
  sqrt(sum((x - mean(x))^2) * 1 / length(x))
}

x3 <- 1:10
mean(x3)
#> [1] 5.5
Listing 11.2: Exercise 11-2: Calculate the mean, variance, and standard deviation for the following values: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
var_fun(x3)
#> [1] 8.25
Listing 11.3: Exercise 11-2: Calculate the mean, variance, and standard deviation for the following values: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
sd_fun(x3)
#> [1] 2.872281
11.7 Experiments
11.7.1 Computing the MAD
There is a function stats::mad() in base R to compute the MAD, but it differs from Will Kurt’s calculation in several respects:
First of all, the abbreviation stands for Median Absolute Deviation, i.e., by default it computes the deviations from the median. The median is the more robust measure of central tendency because outliers do not affect it as much as they affect the mean. You could, however, change the default argument center = median(x) to center = mean(x).
The function includes a scale factor of 1.4826 to ensure consistency for \(X_{i}\) distributed as \(N(\mu, \sigma^2)\) and large \(n\). Again, you could change this by setting the parameter constant = 1.
But most importantly, stats::mad() returns the median of the absolute deviations, whereas the book’s MAD takes their mean, i.e., multiplies their sum by \(\frac{1}{n}\). A sketch illustrating this difference follows the list.
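The sketch below uses the book’s vector x2 (also defined in Listing 11.4) to show this last difference: even with center = mean(x) and constant = 1, stats::mad() returns the median of the absolute deviations rather than their mean.
x2 <- c(3.31, 2.16, 3.02, 3.71, 2.80)
mad(x2, center = mean(x2), constant = 1)   # median of the absolute deviations
#> [1] 0.31
sum(abs(x2 - mean(x2))) / length(x2)       # book's MAD: mean of the absolute deviations
#> [1] 0.416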
So the best way is to write our own R function corresponding to the calculation by Will Kurt:
Listing 11.4: Create a function to compute the mean absolute deviation (MAD) as used in the book
mad_fun <- function(x) {
  sum(abs(x - mean(x))) * 1 / length(x)
}

x1 <- c(3.02, 2.95, 2.98, 3.08, 2.97)
x2 <- c(3.31, 2.16, 3.02, 3.71, 2.80)

mad_fun(x1)
#> [1] 0.04
Listing 11.5: Create a function to compute the mean absolute deviation (MAD) as used in the book
mad_fun(x2)
#> [1] 0.416
This is the same result as in the book, page 98.
11.7.2 Computing the Variance
The variance can be computed with the base R function stats::var(). But again there is a difference: stats::var() divides by \(n - 1\) (the sample variance), whereas the book multiplies the sum of squared deviations by \(\frac{1}{n}\).
Again we have to develop our own R function to get the same results as Will Kurt:
Listing 11.6: Create a function to compute the variance as used in the book
var_fun <- function(x) {
  sum((x - mean(x))^2) * 1 / length(x)
}

var_fun(x1)
#> [1] 0.00212
Listing 11.7: Create a function to compute the variance as used in the book
var_fun(x2)
#> [1] 0.26924
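As a cross-check (using x2 from Listing 11.4), the book’s variance can also be recovered from stats::var() by undoing the \(n - 1\) denominator:
var(x2) * (length(x2) - 1) / length(x2)
#> [1] 0.26924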
11.7.3 Computing the Standard Deviation (sigma, \(\sigma\))
The base R function for the standard deviation is stats::sd(). But as with Equation 11.1 and Equation 11.2, we need to write our own function to get the same values as in the book, because stats::sd() divides by \(n - 1\) while the book’s formula multiplies by \(\frac{1}{n}\).
Listing 11.8: Create a function to compute the standard deviation as used in the book
sd_fun <- function(x) {
  sqrt(sum((x - mean(x))^2) * 1 / length(x))
}

sd_fun(x1)
#> [1] 0.04604346

sd_fun(x2)
#> [1] 0.5188834
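And the corresponding cross-check for the standard deviation: stats::sd() divides by \(n - 1\) under the square root, so rescaling it by \(\sqrt{(n - 1)/n}\) reproduces the value of sd_fun(x2) from Listing 11.8.
sd(x2) * sqrt((length(x2) - 1) / length(x2))
#> [1] 0.5188834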