Chapter 5 Variance, Standard Deviation, and Range

  1. Range
  2. Variance
  3. Standard Deviation
  4. Z score

5.1 Measures of Spread

In the previous set of notes, we talked about measures of central tendency: mean, median, and mode.

In these notes, we will talk about how spread out the data are.

The range of the data is the simplest measure of spread to calculate.

\[\text{Range} = \text{Maximum Value} - \text{Minimum Value}\]

We will use a dataset on extramarital affairs (yes, it is a real economics paper published in the Journal of Political Economy).

Fair, Ray C. “A Theory of Extramarital Affairs.” Journal of Political Economy 86.1 (1978): 45-61.

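suppressWarnings(suppressPackageStartupMessages(library(AER))) # load AER quietly; it provides the Affairs data
data("Affairs")
range(Affairs$affairs) # returns the minimum and maximum; the range is their difference, 12 - 0 = 12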
## [1]  0 12
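Note that R's range() reports the two endpoints rather than their difference. As a minimal sketch, using a small made-up vector (an assumption for illustration, not the Affairs data), the range as defined above can be computed like this:

```r
# Hypothetical toy data, not taken from the Affairs dataset
x <- c(0, 3, 7, 1, 12, 2)

min_max    <- range(x)         # the two endpoints: c(minimum, maximum)
spread     <- max(x) - min(x)  # Range = Maximum Value - Minimum Value
spread_alt <- diff(range(x))   # same number, computed from the endpoints

min_max
spread
spread_alt
```

Both spread and spread_alt give a single number, which is what the formula asks for.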

5.2 Population Variance

The variance of a random variable is the average squared deviation from the mean.

The formula for the population variance is \[\sigma^{2}=\frac{1}{N}\sum (X_{i}-\mu)^{2}\] Let’s break this formula into parts so that we understand it.

  • Deviation from the mean \((X_{i}-\mu)\)
  • Squared Deviation from the mean \((X_{i}-\mu)^{2}\)
  • Average squared deviation from the mean. If we define \(u_{i}=(X_{i}-\mu)^{2}\), then the formula looks a lot like a mean: \[\sigma^{2}=\frac{1}{N}\sum u_{i}\]
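The three parts can be traced by hand in R. This is only a sketch: the five values below are a made-up "population" chosen so that the arithmetic is easy to follow.

```r
# Hypothetical population of N = 5 values (assumed for illustration)
X  <- c(2, 4, 4, 6, 9)
N  <- length(X)
mu <- mean(X)            # population mean: 25 / 5 = 5

dev    <- X - mu         # deviations from the mean: -3 -1 -1  1  4
u      <- dev^2          # squared deviations:        9  1  1  1 16
sigma2 <- sum(u) / N     # average squared deviation: 28 / 5 = 5.6

sigma2
mean(u)                  # the same number, since sigma2 is just the mean of u
```

The deviations always sum to zero, which is why we square them before averaging.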

5.3 Sample Variance

The sample variance formula is \[s^{2}=\frac{1}{n-1}\sum (X_{i}-\bar{x})^{2}\]

There are two differences between the population variance and the sample variance.

  1. The population variance uses the population mean, \(\mu\), but the sample variance uses the sample mean, \(\bar{x}\).
  2. The population variance divides by the full size of the population, N. The sample variance divides by n - 1, one less than the sample size n.
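To see the second difference in numbers, here is a small sketch that treats a made-up vector (an assumption, not course data) as a sample. R's built-in var() uses the n - 1 divisor, so it matches the sample formula.

```r
# Hypothetical sample of n = 5 observations
x    <- c(2, 4, 4, 6, 9)
n    <- length(x)
xbar <- mean(x)                               # sample mean: 5

ss         <- sum((x - xbar)^2)               # sum of squared deviations: 28
s2_manual  <- ss / (n - 1)                    # sample variance: 28 / 4 = 7
s2_builtin <- var(x)                          # var() also divides by n - 1
div_by_n   <- ss / n                          # dividing by n instead: 28 / 5 = 5.6

c(s2_manual = s2_manual, s2_builtin = s2_builtin, div_by_n = div_by_n)
```

Dividing by n - 1 always gives a slightly larger number than dividing by n, and the gap shrinks as the sample grows.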

In the appendix of the text (and on Blackboard), we show that \[E[s^{2}]=\sigma^2\] In other words, the sample variance is an unbiased estimator of the population variance.
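The result can also be checked informally by simulation. The sketch below is not a proof: the seed, the sample size of 5, the number of replications, and the normal population with variance 4 are all arbitrary assumptions.

```r
set.seed(42)       # arbitrary seed so the run is reproducible
sigma2 <- 4        # true population variance (sd = 2)
n      <- 5        # size of each sample
reps   <- 20000    # number of simulated samples

# Sample variance (n - 1 divisor) of each simulated sample
s2 <- replicate(reps, var(rnorm(n, mean = 10, sd = 2)))

# The same sums of squares divided by n instead of n - 1
s2_div_n <- s2 * (n - 1) / n

mean_s2       <- mean(s2)        # should land close to sigma2 = 4
mean_s2_div_n <- mean(s2_div_n)  # should land close to 4 * (n - 1) / n = 3.2

c(mean_s2 = mean_s2, mean_s2_div_n = mean_s2_div_n)
```

On average, the n - 1 version sits near the true variance, while the version that divides by n comes out too small.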

5.4 Standard Deviation (sd)

The standard deviation is simply the square root of the variance. Because the variance is measured in squared units, it is hard to compare with the mean.

Taking the square root converts the squared units back into the original units, so the standard deviation is measured in the same units as the mean.

  • population standard deviation \(\sigma = \sqrt{\sigma^2}\)
  • sample standard deviation \(s=\sqrt{s^{2}}\)
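As a quick sketch with a made-up vector (an assumption for illustration), both versions can be computed in R. The built-in sd() is the sample version, so it equals the square root of var().

```r
# Hypothetical data, treated first as a sample and then as a whole population
x <- c(2, 4, 4, 6, 9)

s2 <- var(x)                        # sample variance: 7
s  <- sqrt(s2)                      # sample standard deviation, about 2.65
s_builtin <- sd(x)                  # sd() gives the same value as sqrt(var(x))

sigma2 <- mean((x - mean(x))^2)     # population variance: 5.6
sigma  <- sqrt(sigma2)              # population standard deviation, about 2.37

c(s = s, s_builtin = s_builtin, sigma = sigma)
```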

5.5 Z score

Normal distributions have a few nice properties.

  • The distribution is symmetric with the mean, median, and mode at the center.
  • Approximately 95 percent of the data are found within two standard deviations of the mean.
  • A standard normal distribution has a mean of zero and a standard deviation of 1.
  • All normally distributed random variables can be changed into a standard normal using the Z score.

The formula for the Z score is \[Z = \frac{X-\mu}{\sigma}\]

5.6 Properties of a Z score

  • The Z score is a unit of measurement.
  • It tells us how far away an observation X is from the mean.
  • The distance is measured in standard deviations.
  • Remember that about 95 percent of the data are found within 2 standard deviations of the mean for a normal distribution.
  • A common rule of thumb treats an observation as an outlier if its Z score is greater than 3 in absolute value.
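The properties above can be illustrated with a short sketch. The population mean and standard deviation are assumed to be known (10 and 2), the observations are made up, and the cutoff is applied to the absolute value of Z so that unusually small observations are flagged as well as unusually large ones.

```r
# Assumed (hypothetical) population parameters
mu    <- 10
sigma <- 2

# Made-up observations
x <- c(9, 11, 17, 10, 3.5)

z <- (x - mu) / sigma        # distance from the mean in standard deviations
is_outlier <- abs(z) > 3     # rule of thumb: more than 3 sd from the mean

data.frame(x = x, z = z, is_outlier = is_outlier)
x[is_outlier]                # the flagged observations: 17 and 3.5
```

Here 17 is 3.5 standard deviations above the mean and 3.5 is 3.25 standard deviations below it, so both are flagged.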

5.7 The mechanics of a Z score

The Z score first subtracts the mean from every observation. By construction, the new mean must be zero.

Next, it divides every centered observation by the standard deviation. This forces the new standard deviation to be equal to 1.

Suppose X has a mean of 10 and a standard deviation of 2.

  1. Subtract 10 from all of the observations. The new mean must be zero now.
  2. Divide all of the observations by 2. The new standard deviation is equal to 1.
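The two steps can be followed on a tiny made-up vector (an assumption for illustration) whose sample mean is exactly 10 and whose sample standard deviation is exactly 2:

```r
# Hypothetical data with mean 10 and sample standard deviation 2
x <- c(8, 10, 12)
mean(x)                    # 10
sd(x)                      # 2

# Step 1: subtract the mean; the mean becomes 0, the spread is unchanged
centered <- x - mean(x)    # -2  0  2
mean(centered)             # 0
sd(centered)               # still 2

# Step 2: divide by the standard deviation; the standard deviation becomes 1
z <- centered / sd(x)      # -1  0  1
mean(z)                    # 0
sd(z)                      # 1

# Base R's scale() performs both steps at once
z_scaled <- as.numeric(scale(x))
```

Centering shifts the data without changing its spread, and dividing by the standard deviation rescales the spread without moving the mean away from zero.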

5.8 Z scores in action with R

x <- rnorm(1000, mean = 10, sd = 2) # normal distribution with mean 10 and std 2
mean(x) #Mean of X
## [1] 10.03013
sd(x) #Standard Deviation of X
## [1] 2.009967
y <- (x - mean(x))/sd(x) #Z score of X
mean(y) #Mean of Y
## [1] -2.870272e-16
sd(y) #Standard Deviation of Y
## [1] 1

5.9 Z scores and probability

  • Pr(Z < 0)=0.5
  • Pr(-1 < Z < 1)=0.6826895
  • Pr(-2 < Z < 2)=0.9544997
  • Pr(-3 < Z < 3)=0.9973002
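These probabilities can be reproduced with pnorm(), the standard normal cumulative distribution function in base R, which returns Pr(Z < q). A short sketch:

```r
# Pr(Z < 0): half of the distribution lies below the mean
p_below_0 <- pnorm(0)

# Pr(-k < Z < k) = Pr(Z < k) - Pr(Z < -k)
p_within_1 <- pnorm(1) - pnorm(-1)
p_within_2 <- pnorm(2) - pnorm(-2)
p_within_3 <- pnorm(3) - pnorm(-3)

round(c(p_below_0, p_within_1, p_within_2, p_within_3), 7)
```

Subtracting the lower tail from the upper cumulative probability leaves the area between the two cutoffs.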

Use the images below to verify the probabilities for yourself.

A link to Z tables