Chapter 4 Understanding and Comparing Distributions

4.1 Displays for Comparing Groups

Let’s compare two popular tourist locations in terms of temperature rather than rainfall. Santorini, Greece, has a reputation for being warm in the summer (but cool in the winter, when there is little tourism), while Honolulu is noted for consistently warm temperatures throughout the year.

https://en.wikipedia.org/wiki/Santorini

https://en.wikipedia.org/wiki/Honolulu

Average High Temperatures (degrees Fahrenheit)

Month Santorini Honolulu
January 57 80.1
February 57 80.2
March 61 81.2
April 64 82.7
May 73 84.6
June 81 87.0
July 84 87.9
August 84 88.7
September 79 88.6
October 73 86.7
November 66 83.9
December 59 81.2

Here’s a time series plot of the data. Other graphical approaches could include bar charts or boxplots. Be careful with the scales of the axes!

##      Cities  min   Q1 median    Q3  max     mean        sd  n missing
## 1  Honolulu 80.1 81.2  84.25 87.45 88.7 84.40000  3.305918 12       0
## 2 Santorini 57.0 60.0  69.50 80.00 84.0 69.83333 10.460604 12       0
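The summary above can be reproduced with a short Python sketch using only the standard library. The function name `five_number_summary` is hypothetical, and the quartile rule (the median of the lower and upper halves of the sorted data) is an assumption, chosen because it matches the Q1 and Q3 values printed above. Other software defaults can give slightly different quartiles.

```python
from statistics import mean, median, stdev

# Average high temperatures (degrees Fahrenheit), January through December,
# copied from the table above.
temps = {
    "Honolulu": [80.1, 80.2, 81.2, 82.7, 84.6, 87.0,
                 87.9, 88.7, 88.6, 86.7, 83.9, 81.2],
    "Santorini": [57, 57, 61, 64, 73, 81, 84, 84, 79, 73, 66, 59],
}


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data. Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # the smallest `half` values
    upper = ordered[-half:]   # the largest `half` values
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])


summary = {}
for city, values in temps.items():
    summary[city] = {
        "five_number": five_number_summary(values),
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation (divides by n - 1)
        "n": len(values),
    }
    print(city, summary[city])
```

The much larger standard deviation for Santorini reflects its seasonal swing, while Honolulu's values stay within a range of under 9 degrees.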

A meaningful stemplot for Honolulu would probably use the tenths place, rather than the ones place, as the leaf. 80.1 would appear as \(80 | 1\) rather than \(8 | 0\). The \(80\) row would be for all temperatures from 80.0 to 80.9 degrees Fahrenheit.
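As a rough illustration of that choice, here is a minimal Python sketch that builds a stemplot for Honolulu with the whole degrees as the stem and the tenths place as the leaf. The function name `stemplot_tenths` is hypothetical, and the sketch assumes every value is recorded to one decimal place.

```python
from collections import defaultdict

# Honolulu average high temperatures (degrees Fahrenheit) from the table above.
honolulu = [80.1, 80.2, 81.2, 82.7, 84.6, 87.0,
            87.9, 88.7, 88.6, 86.7, 83.9, 81.2]


def stemplot_tenths(values):
    """Map each whole-degree stem to its sorted tenths-place leaves."""
    rows = defaultdict(list)
    for value in values:
        tenths = round(value * 10)       # 80.1 -> 801; rounding avoids float truncation
        stem, leaf = divmod(tenths, 10)  # 801 -> stem 80, leaf 1
        rows[stem].append(leaf)
    # Include stems with no leaves so gaps in the data stay visible.
    return {stem: sorted(rows[stem])
            for stem in range(min(rows), max(rows) + 1)}


plot = stemplot_tenths(honolulu)
for stem, leaves in plot.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

With the ones place as the leaf instead, all twelve values would land on a single stem of 8, which hides the shape of the distribution.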

4.2 Add/Subtract a constant

Let’s take a small sample of \(n=5\) people and record their ages.

\[36 \: \: 34 \: \: 31 \: \: 30 \: \: 22\]

Suppose I compute the sample mean and standard deviation. The mean is \[\bar{X} = \frac{\sum X}{n} = \frac{153}{5} = 30.6\]

So the standard deviation is:

\(X\) \((X-\bar{X})\) \((X-\bar{X})^2\)
36 5.4 29.16
34 3.4 11.56
31 0.4 0.16
30 -0.6 0.36
22 -8.6 73.96

Thus \[s^2 = \frac{\sum(X-\bar{X})^2}{n-1} = \frac{115.2}{4}=28.8\] \[s = \sqrt{s^2}= \sqrt{28.8} = 5.37\]

Now let’s look at these people 10 years from today. Obviously they will all be 10 years older. Mathematically, if \(X\) is your age today, we will let \(Y=X+10\). What do you think \(\bar{Y}\) will be? How about \(s_Y\)?
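One way to check your guess is to compute both statistics directly. The sketch below uses Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, stdev

ages_today = [36, 34, 31, 30, 22]          # X, the sample from above
ages_later = [x + 10 for x in ages_today]  # Y = X + 10

mean_x, sd_x = mean(ages_today), stdev(ages_today)  # stdev divides by n - 1
mean_y, sd_y = mean(ages_later), stdev(ages_later)

print(f"mean of X = {mean_x:.2f}, sd of X = {sd_x:.2f}")
print(f"mean of Y = {mean_y:.2f}, sd of Y = {sd_y:.2f}")
```

Adding 10 to every age moves the mean up by 10, but each deviation \((Y-\bar{Y})\) equals the matching \((X-\bar{X})\), so the standard deviation does not change.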

4.3 Multiply/Divide by a constant

Let \(X\) represent the amount of money a person has in their wallet right now, measured in U.S. dollars. The exchange rate is approximately $1 US to $20 Mexican pesos (Mexico also uses the dollar sign for its currency). Go to https://www.xe.com/ for more precise and up-to-date conversions.

Amount for \(n=5\) people, in U.S. dollars

\[ 20 \: \: 42 \: \: 8 \: \: 65 \: \: 15\]

For this small sample \(\bar{X}=30\) and \(s_X=23.3345\).

Let \(Y=20X\) be the amount of Mexican pesos you get. For example, the first person with a $20 bill gets \(Y=20 \times 20=400\) pesos.

What do you think \(\bar{Y}\) will be? What about \(s_Y\)?

It was probably not too hard to guess (and be correct in your guess) that \(\bar{Y}=20\bar{X}=20 \times 30 = 600\). Let’s see what \(s_Y\) will be.

\(X\) \(Y\) \((Y-\bar{Y})\) \((Y-\bar{Y})^2\)
20 400 -200 40000
42 840 240 57600
8 160 -440 193600
65 1300 700 490000
15 300 -300 90000

Thus \[s^2_Y = \frac{\sum(Y-\bar{Y})^2}{n-1} = \frac{871200}{4}=217800\] \[s_Y = \sqrt{s^2_Y}= \sqrt{217800} = 466.69\]

Notice that \(20 s_X = 20 \times 23.3345 = 466.69\). Both the mean and the standard deviation of our money were multiplied by 20.
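The same check can be done numerically. This is a minimal sketch with Python's standard `statistics` module; the rate of 20 pesos per dollar is the approximate figure used in this section, not a current exchange rate.

```python
from statistics import mean, stdev

dollars = [20, 42, 8, 65, 15]         # X, in U.S. dollars
rate = 20                             # approximate pesos per dollar, as assumed above
pesos = [rate * x for x in dollars]   # Y = 20X

mean_x, sd_x = mean(dollars), stdev(dollars)  # stdev divides by n - 1
mean_y, sd_y = mean(pesos), stdev(pesos)

print(f"mean: {mean_x} dollars, {mean_y} pesos")
print(f"sd:   {sd_x:.4f} dollars, {sd_y:.2f} pesos")
print(f"rate * sd of X = {rate * sd_x:.2f}")
```

Changing `rate` to any other positive constant scales both the mean and the standard deviation by that same constant.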

4.4 Linear Transformation

Let’s get crazy and both add a constant and multiply by a constant at the same time. For example, we can convert degrees Celsius to degrees Fahrenheit with the formula \[F = 1.8 C + 32\]

Here’s the predicted low temperature in 6 world cities on January 22, in degrees Celsius.

http://www.weatherbug.com

City Temperature
Toronto, Canada -4
Seoul, South Korea 0
Amsterdam, Netherlands 5
Lagos, Nigeria 26
Murray, Kentucky 1
Melbourne, Australia 13

The mean and standard deviation of this sample of size \(n=6\) are \(\bar{C}=41/6=6.83\) and \(s_C=11.02\) degrees Celsius.

If you are American, you want this in degrees Fahrenheit, so let’s convert.

City Temperature (degrees Fahrenheit)
Toronto, Canada 1.8(-4)+32=24.8
Seoul, South Korea 1.8(0)+32=32
Amsterdam, Netherlands 1.8(5)+32=41
Lagos, Nigeria 1.8(26)+32=78.8
Murray, Kentucky 1.8(1)+32=33.8
Melbourne, Australia 1.8(13)+32=55.4

What will the mean be in degrees Fahrenheit? The mean is changed both by the multiplication by a constant (a change of 1 Celsius degree is a change of 1.8 Fahrenheit degrees) and by the addition of a constant (we add 32 because water freezes at 0 C and 32 F). So the mean in degrees F will be: \[\bar{F} = 1.8 \bar{C} + 32 = 1.8(41/6)+32=44.3\]

However, the standard deviation is ONLY affected by multiplication/division of a constant and NOT by addition/subtraction. So \[s_F = 1.8 s_C = 1.8(11.02) = 19.84\]
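These two results can be checked against the converted temperatures directly. The sketch below uses Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, stdev

# Predicted lows in degrees Celsius for the six cities listed above.
celsius = [-4, 0, 5, 26, 1, 13]
fahrenheit = [1.8 * c + 32 for c in celsius]   # F = 1.8 C + 32

mean_c, sd_c = mean(celsius), stdev(celsius)   # stdev divides by n - 1
mean_f, sd_f = mean(fahrenheit), stdev(fahrenheit)

print(f"Celsius:    mean = {mean_c:.2f}, sd = {sd_c:.2f}")
print(f"Fahrenheit: mean = {mean_f:.2f}, sd = {sd_f:.2f}")
print(f"1.8 * mean_c + 32 = {1.8 * mean_c + 32:.2f}")
print(f"1.8 * sd_c        = {1.8 * sd_c:.2f}")
```

Computed from the unrounded \(s_C\), the Fahrenheit standard deviation comes out to about 19.83; the 19.84 above comes from multiplying the already rounded 11.02 by 1.8.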

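In general, when we do the linear transformation \[Y = cX + d\] then the mean will be \[\bar{Y}=c\bar{X}+d\] and the standard deviation will be \[s_Y = |c| s_X\] The absolute value matters only when \(c\) is negative, since a standard deviation can never be negative; for positive \(c\) this is simply \(c s_X\).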

For Celsius (\(X\)) to Fahrenheit (\(Y\)), the constants are \(c=1.8\) and \(d=32\).
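For readers who want to see why the added constant \(d\) drops out of the standard deviation, here is a short derivation using the same sample formulas as in Sections 4.2 and 4.3. The mean picks up both constants:

\[\bar{Y} = \frac{\sum (cX+d)}{n} = c\,\frac{\sum X}{n} + \frac{nd}{n} = c\bar{X}+d\]

Each deviation from the mean loses \(d\) and keeps only the factor \(c\):

\[Y-\bar{Y} = (cX+d)-(c\bar{X}+d) = c(X-\bar{X})\]

Squaring, summing, and dividing by \(n-1\) then gives

\[s_Y^2 = \frac{\sum (Y-\bar{Y})^2}{n-1} = c^2\,\frac{\sum (X-\bar{X})^2}{n-1} = c^2 s_X^2 \qquad \text{so} \qquad s_Y = \sqrt{c^2}\, s_X = |c|\, s_X\]

which equals \(c s_X\) whenever \(c\) is positive, as it is in every example in this chapter.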

4.5 Time Series

Below is a graph of the Fall enrollment at Murray State from 1923 to 2019. Can you think of some of the likely reasons for periods of time with a drop in enrollment or a steep increase in enrollment?

https://www.murraystate.edu/headermenu/administration/PresidentsOffice/institutional-effectiveness/OfficeOfInstitutionalResearch/Enrollment_history_1923-present_Apr_2019.pdf