Chapter 10 Stem and Leaf Plot

Let us look at a dataset built into R called rivers, which records the lengths (in miles) of 141 major rivers in North America. To see a description of this dataset, type ?rivers. The description will appear on the 4th panel under the Help tab.

To view the whole dataset, use the command View(rivers). A column of observations will appear on the Source panel, under the tab called rivers. You should see 1 column with 141 entries.

Let us look at the first 6 entries of rivers using head(rivers).

## [1] 735 320 325 392 524 450

This dataset is a numeric vector rather than a table, since it has only 1 column of entries. That is why the output of head(rivers) is printed as a row of entries.
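The checks described above can also be run from the console. The following is a minimal sketch that uses only base R functions; the variable names (n_rivers, first_six, is_plain_vector) are illustrative and not part of the dataset.

```r
# rivers is available in a standard R session (datasets package).
n_rivers <- length(rivers)     # number of observations
first_six <- head(rivers)      # first 6 entries

# A plain numeric vector has numeric values and no dimensions.
is_plain_vector <- is.numeric(rivers) && is.null(dim(rivers))

print(n_rivers)
print(first_six)
print(is_plain_vector)
```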

10.1 Making a Stem and Leaf Plot

To make a stemplot, we use the function stem(quantitative_variable). For the rivers data, stem(rivers) gives:

## 
##   The decimal point is 2 digit(s) to the right of the |
## 
##    0 | 4
##    2 | 011223334555566667778888899900001111223333344455555666688888999
##    4 | 111222333445566779001233344567
##    6 | 000112233578012234468
##    8 | 045790018
##   10 | 04507
##   12 | 1471
##   14 | 56
##   16 | 7
##   18 | 9
##   20 | 
##   22 | 25
##   24 | 3
##   26 | 
##   28 | 
##   30 | 
##   32 | 
##   34 | 
##   36 | 1

Notice that the stems increase in steps of 2. R chooses this increment automatically unless you specify otherwise.
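The plot above should be reproducible with a single call. In this minimal sketch, capture.output() is used only to store the printed lines so they can be inspected; plot_lines is an illustrative name.

```r
# Print the stem and leaf plot with the default settings.
stem(rivers)

# Optionally keep the printed lines as a character vector.
plot_lines <- capture.output(stem(rivers))

# The header line states where the decimal point sits.
print(grep("decimal point", plot_lines, value = TRUE))
```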

Be sure to read where R places the decimal point in the output. Here the decimal point is 2 digits to the right of the vertical bar. Since each leaf is a single digit, the decimal point falls 1 digit after the leaf, so you need to add a 0 after each leaf. For example, the first entry has a stem of 0 and a leaf of 4, which at first glance reads as a shortest river of 40 miles. The next entry has a stem of 2 and a leaf of 0, so that river is about 200 miles long. The third entry has a stem of 2 and a leaf of 1, so that river is about 210 miles long.

Note: The shortest river is actually 135 miles and not 40 miles. Because the stems are incremented by 2, each row holds the leaves for two stems (here 0 and 1), so the plot alone does not show whether the stem for the shortest river is 0 or 1. In this case it is 1: 135 rounds to 140, which gives a stem of 1 and a leaf of 4. The shortest river should therefore read 140 miles and not 40 miles.
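The arithmetic in this note can be checked directly. This is a small sketch of the reasoning, not a description of how stem() works internally; it assumes values are rounded half up to the nearest 10 miles, and the names tens, stem_part and leaf_part are illustrative.

```r
shortest <- min(rivers)               # actual length in miles

# Round half up to the nearest 10 miles (assumed rounding rule).
tens <- floor(shortest / 10 + 0.5)

stem_part <- tens %/% 10              # digits left of the vertical bar
leaf_part <- tens %% 10               # single leaf digit

cat("actual:", shortest, " stem:", stem_part, " leaf:", leaf_part, "\n")
```

The result places the shortest river on stem 1, which the default plot folds into the row labeled 0.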

10.2 Rescaling the Stemplot

To rescale the stemplot, change the “scale” argument of the function stem(). The default scale is 1. A scale greater than 1 lengthens the plot by spreading the data over more stems.

## 
##   The decimal point is 2 digit(s) to the right of the |
## 
##    1 | 4
##    2 | 0112233345555666677788888999
##    3 | 00001111223333344455555666688888999
##    4 | 111222333445566779
##    5 | 001233344567
##    6 | 000112233578
##    7 | 012234468
##    8 | 04579
##    9 | 0018
##   10 | 045
##   11 | 07
##   12 | 147
##   13 | 1
##   14 | 56
##   15 | 
##   16 | 
##   17 | 7
##   18 | 9
##   19 | 
##   20 | 
##   21 | 
##   22 | 
##   23 | 25
##   24 | 
##   25 | 3
##   26 | 
##   27 | 
##   28 | 
##   29 | 
##   30 | 
##   31 | 
##   32 | 
##   33 | 
##   34 | 
##   35 | 
##   36 | 
##   37 | 1

Notice that the decimal point is 2 digits to the right of the vertical bar, or 1 digit after the leaf. Therefore, the shortest river, with a stem of 1 and leaf of 4, is 140 miles long. The longest river, with a stem of 37 and leaf of 1, is 3710 miles long.
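The text does not state which scale value produced the plot above. As an assumption, scale = 2 is used in the sketch below, which also compares the number of printed lines with the default plot; default_lines and long_lines are illustrative names.

```r
# A scale above 1 spreads the data over more stems (scale = 2 is an assumed value).
stem(rivers, scale = 2)

# Compare the printed length of the default and the rescaled plot.
default_lines <- capture.output(stem(rivers))
long_lines <- capture.output(stem(rivers, scale = 2))

cat("default:", length(default_lines), "lines; scale = 2:",
    length(long_lines), "lines\n")
```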

A scale between 0 and 1 shortens the plot by using fewer stems.

## 
##   The decimal point is 3 digit(s) to the right of the |
## 
##   0 | 12222222222333333333333333333333333333333333333444444444444444444444
##   0 | 55555555555555556666666666677777777778888999999
##   1 | 0001122233
##   1 | 5589
##   2 | 33
##   2 | 5
##   3 | 
##   3 | 7

Notice that each stem now appears twice: the first row holds leaves 0 to 4 and the second row holds leaves 5 to 9.

Note where the decimal point is placed. It is now 3 digits to the right of the vertical bar, or 2 digits to the right of the leaf. That means you have to add two 0s after each leaf. Therefore, the shortest river, with a stem of 0 and leaf of 1, reads as 100 miles long. The longest river, with a stem of 3 and leaf of 7, reads as 3700 miles long. These are rounded values: the actual lengths are 135 and 3710 miles.
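The reading rule above can be written as a small helper. The scale value behind the compact plot is not given in the text, so scale = 0.5 is an assumption here, and decode_leaf() is a hypothetical helper introduced only for illustration. It assumes the row label is the true stem, which does not hold for rows that combine two stems.

```r
# A scale below 1 uses fewer stems (scale = 0.5 is an assumed value).
stem(rivers, scale = 0.5)
short_lines <- capture.output(stem(rivers, scale = 0.5))

# Hypothetical helper: turn a stem, a leaf, and the stated number of
# digits to the right of the bar into the value the plot displays.
decode_leaf <- function(stem_value, leaf, digits_right) {
  (stem_value * 10 + leaf) * 10^(digits_right - 1)
}

shortest_read <- decode_leaf(0, 1, 3)   # stem 0, leaf 1
longest_read <- decode_leaf(3, 7, 3)    # stem 3, leaf 7
print(c(shortest_read, longest_read))
```

The same helper applies to a plot whose decimal point is 2 digits to the right of the bar, for example decode_leaf(37, 1, 2).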

From each of the plots above, we see that no matter how we rescale, the distribution of river lengths is skewed to the right: most rivers fall on the lowest stems, while a few much longer rivers form a long tail and are possible outliers.
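The visual impression of right skew can be cross-checked numerically. This is a rough sketch rather than a formal test: it compares the mean with the median and flags candidate outliers using the 1.5 × IQR rule of boxplot.stats(), a convention that is assumed here and is not part of the chapter.

```r
# Five-number summary plus the mean.
print(summary(rivers))

# In a right-skewed distribution the mean is usually pulled above the median.
right_skew_hint <- mean(rivers) > median(rivers)

# Values beyond the boxplot whiskers (1.5 * IQR rule, an assumed convention).
candidate_outliers <- boxplot.stats(rivers)$out

print(right_skew_hint)
print(sort(candidate_outliers))
```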