Chapter 5 Variance, Standard Deviation, and Range
- Range
- Variance
- Standard Deviation
- Z score
5.1 Measures of Spread
In the previous set of notes, we talked about measures of central tendency: mean, median, and mode.
In these notes, we will talk about how spread out the data are.
The range of the data is the simplest measure of spread to calculate: it runs from the smallest value to the largest.
We will use a dataset on extramarital affairs (yes, it is a real economics paper published in the Journal of Political Economy):
Fair, Ray C. “A Theory of Extramarital Affairs.” Journal of Political Economy 86.1 (1978): 45-61.
suppressWarnings(suppressPackageStartupMessages(library(AER)))
data("Affairs")
range(Affairs$affairs)
## [1] 0 12
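R's `range()` returns the minimum and maximum rather than a single number. A minimal sketch of turning that pair into one measure of spread, using a small made-up vector `x` (an assumption for illustration, not the Affairs data):

```r
# Hypothetical data, made up for illustration
x <- c(2, 4, 4, 5, 7, 9)

# range() returns the smallest and largest values
r <- range(x)
r

# The range as a single number is the maximum minus the minimum
range_width <- diff(r)
range_width
```

The same `diff(range(...))` pattern should work on any numeric column, such as `Affairs$affairs`.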
5.2 Population Variance
The variance of a random variable is the average squared deviation from the mean.
The formula for the population variance is \[\sigma^{2}=\frac{1}{N}\sum (X_{i}-\mu)^{2}\] Let’s break this formula into parts so that we understand it.
- Deviation from the mean \((X_{i}-\mu)\)
- Squared Deviation from the mean \((X_{i}-\mu)^{2}\)
- Average squared deviation from the mean. If we say that \(u_{i}=(X_{i}-\mu)^{2}\), then the formula looks a lot like a mean. \[\sigma^{2}=\frac{1}{N}\sum u_{i}\]
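The three parts above can be sketched step by step in R. The vector `X` is a hypothetical population of five values made up for illustration; the variable names are assumptions, not part of the original notes.

```r
# Hypothetical population of N = 5 values, made up for illustration
X <- c(2, 4, 6, 8, 10)
N <- length(X)

mu <- mean(X)               # population mean
dev <- X - mu               # deviation from the mean
sq_dev <- dev^2             # squared deviation from the mean (the u_i)
sigma2 <- sum(sq_dev) / N   # average squared deviation

sigma2
mean(sq_dev)                # same number: the variance is the mean of the u_i
```

Note that base R's `var()` is not used here because it divides by one less than the number of observations rather than by N.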
5.3 Sample Variance
The sample variance formula is \[s^{2}=\frac{1}{n-1}\sum (X_{i}-\bar{x})^{2}\]
There are two differences between the population variance and the sample variance.
- The population variance uses the population mean, \(\mu\), but the sample variance uses the sample mean, \(\bar{x}\).
- The population variance divides by the full size of the population, \(N\). The sample variance divides by \(n-1\), one less than the sample size \(n\).
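As a small sketch of the sample formula, the code below computes \(s^{2}\) by hand and compares it with base R's `var()`, which also divides by \(n-1\). The vector `x` is a hypothetical sample made up for illustration.

```r
# Hypothetical sample of n = 5 values, made up for illustration
x <- c(2, 4, 6, 8, 10)
n <- length(x)
xbar <- mean(x)                            # sample mean

s2_manual <- sum((x - xbar)^2) / (n - 1)   # divide by n - 1
s2_builtin <- var(x)                       # base R's var() also divides by n - 1

s2_manual
s2_builtin
```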
In the appendix of the text (and on Blackboard), we show that dividing by \(n-1\) makes the sample variance an unbiased estimator of the population variance: \[E[s^{2}]=\sigma^2\]
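A simulation is not a proof, but it can make this result plausible. The sketch below assumes a normal population with variance 4, a sample size of 5, and an arbitrary seed; all of these values are chosen only for illustration.

```r
set.seed(1)      # arbitrary seed so the sketch is reproducible
sigma2 <- 4      # assumed true population variance
n <- 5           # small sample size
reps <- 20000    # number of simulated samples

# Each row is one sample of size n from a normal population with variance sigma2
samples <- matrix(rnorm(reps * n, mean = 0, sd = sqrt(sigma2)), nrow = reps)

s2_unbiased <- apply(samples, 1, var)     # divides by n - 1
s2_biased <- s2_unbiased * (n - 1) / n    # same sums of squares divided by n

mean(s2_unbiased)   # should land close to sigma2
mean(s2_biased)     # should land close to sigma2 * (n - 1) / n
```

Averaged over many samples, the version that divides by \(n\) should come out too small, while the version that divides by \(n-1\) should sit near the true variance.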
5.4 Standard Deviation (sd)
The standard deviation is simply the square root of the variance. Because the variance is measured in squared units, it is hard to compare with the mean.
Taking the square root converts the squared units back into the original units, so the standard deviation is measured in the same units as the mean.
- population standard deviation \(\sigma = \sqrt{\sigma^2}\)
- sample standard deviation \(s=\sqrt{s^{2}}\)
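A brief sketch confirming that base R's `sd()` is the square root of `var()`. The vector `x` is a hypothetical sample made up for illustration.

```r
# Hypothetical sample, made up for illustration
x <- c(2, 4, 6, 8, 10)

s2 <- var(x)   # sample variance, in squared units
s <- sd(x)     # sample standard deviation, in the original units

sqrt(s2)
s              # same value as sqrt(s2)
```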
5.5 Z score
Normal distributions have a few nice properties.
- The distribution is symmetric with the mean, median, and mode at the center.
- Approximately 95 percent of the data are found within two standard deviations of the mean.
- A standard normal distribution has a mean of zero and a standard deviation of 1.
- All normally distributed random variables can be changed into a standard normal using the Z score.
The formula for the Z score is \[Z = \frac{X-\mu}{\sigma}\]
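As a minimal sketch of the formula, suppose a population has mean 70 and standard deviation 8. These parameters and the three observations in `X` are made up for illustration.

```r
# Assumed population parameters, made up for illustration
mu <- 70
sigma <- 8

# Three hypothetical observations
X <- c(62, 70, 86)

# Z score: distance from the mean, measured in standard deviations
Z <- (X - mu) / sigma
Z
```

An observation equal to the mean gets a Z score of zero, observations below the mean get negative Z scores, and observations above it get positive ones.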
5.6 Properties of a Z score
- The Z score is a unit of measurement.
- It tells us how far away an observation X is from the mean.
- The distance is measured in standard deviations.
- Remember that about 95 percent of the data are found within 2 standard deviations of the mean for a normal distribution.
- A common rule of thumb treats an observation as an outlier when its Z score is greater than 3 in absolute value.
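The share of a normal distribution within a given number of standard deviations can be checked with base R's `pnorm()`. The helper `within_k()` and the vector `z` of Z scores are hypothetical, introduced only for this sketch. Flagging large negative Z scores as outliers too, by using the absolute value, is an assumption made here.

```r
# Share of a normal distribution within k standard deviations of the mean
within_k <- function(k) pnorm(k) - pnorm(-k)

within_k(1)   # roughly 0.68
within_k(2)   # roughly 0.95
within_k(3)   # roughly 0.997

# Hypothetical Z scores, made up for illustration
z <- c(-0.5, 1.2, 3.4, -3.1, 0.0)

# Flag observations more than 3 standard deviations from the mean
is_outlier <- abs(z) > 3
is_outlier
```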
5.7 The mechanics of a Z score
The Z score first subtracts the mean from every observation. By construction, the new mean must be zero.
Next, it divides every centered observation by the standard deviation. This forces the new standard deviation to be equal to 1.
Suppose X has a mean of 10 and a standard deviation of 2.
1. Subtract 10 from all of the observations. Our new mean must be zero now.
2. Divide all of the centered observations by 2. The new SD is equal to 1.
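The two steps can be traced in R. The vector `X` below is a hypothetical dataset constructed so that its mean is 10 and its sample standard deviation is 2; base R's `scale()` is shown as one way to do both steps at once.

```r
# Hypothetical data constructed to have mean 10 and sample SD 2
X <- c(8, 10, 12)
mean(X)
sd(X)

centered <- X - mean(X)   # step 1: subtract the mean, so the new mean is 0
Z <- centered / sd(X)     # step 2: divide by the SD, so the new SD is 1

mean(Z)
sd(Z)

# scale() centers and rescales in one call (it returns a matrix, hence as.numeric)
Z_scaled <- as.numeric(scale(X))
Z_scaled
```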