Exercise solutions

Chapter 2

Exercise 2.2

Exercise 2.3

Using the tab object from the previous solution, study and practice the following R code to recreate Table 2.29.

Exercise 2.4

Study and execute the following R code:

Exercise 2.5

Using the tab2 object from the previous solution, study and practice the following R code to recreate Table 2.30. Note that the column distributions could also have been used.

Interpretation: The risk of death among non-smokers is higher than the risk of death among smokers, suggesting that there may be some confounding.

Exercise 2.6

Interpretation: The risk of death is not larger in non-smokers. In fact, it is larger among smokers in the older age groups.
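The contrast between the crude comparison and the age-specific comparison can be sketched as follows. The counts are hypothetical, invented only to reproduce the pattern described (they are not the values from Table 2.29 or Table 2.30), and the object names counts, crude_risk, and strat_risk are assumptions.

```r
# Hypothetical counts: Vital status x Smoking x Age (NOT the book's data)
counts <- array(
  c(10, 290,  5, 195,   # Young: smokers dead/alive, non-smokers dead/alive
    40,  60, 90, 210),  # Old:   smokers dead/alive, non-smokers dead/alive
  dim = c(2, 2, 2),
  dimnames = list(Vital   = c("Dead", "Alive"),
                  Smoking = c("Yes", "No"),
                  Age     = c("Young", "Old"))
)

# Crude (age collapsed) table and risk of death by smoking status
crude      <- apply(counts, c(1, 2), sum)
crude_risk <- prop.table(crude, margin = 2)["Dead", ]

# Age-specific risk of death: condition on Smoking and Age together
strat_risk <- prop.table(counts, margin = c(2, 3))["Dead", , ]

crude_risk
strat_risk
```

With these made-up counts the crude risk is 0.125 in smokers and 0.19 in non-smokers, yet within each age group the risk is higher in smokers. The reversal arises because the non-smokers in this sketch are concentrated in the older, higher-risk group.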

Chapter 3

Exercise 3.1

Approach 1: do the arithmetic using R as a calculator with the counts from Table 3.14.

Approach 2: do the arithmetic using R objects.

Approach 3: do the arithmetic using the apply and sweep functions. Study and try this solution.

Approach 4: do the arithmetic using the prop.table function. Study and try this solution.
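As a minimal sketch of how Approaches 2 to 4 relate, the code below computes column proportions (risks) three ways and checks that they agree. The counts are hypothetical placeholders rather than the values in Table 3.14, the object names are assumptions, and "the arithmetic" is assumed here to mean column proportions.

```r
# Hypothetical 2 x 2 table (NOT the counts from Table 3.14)
tab <- matrix(c(30, 170, 20, 180), nrow = 2,
              dimnames = list(Outcome   = c("Death", "Survivor"),
                              Treatment = c("Treated", "Placebo")))

# Approach 2: arithmetic on R objects
risk_objects <- tab["Death", ] / colSums(tab)

# Approach 3: apply gives the column totals, sweep divides each column by its total
col_totals <- apply(tab, 2, sum)
risk_sweep <- sweep(tab, 2, col_totals, "/")

# Approach 4: prop.table does the same division in one call
risk_prop <- prop.table(tab, margin = 2)

risk_objects
all.equal(risk_sweep, risk_prop)
```

The margin argument (2 here) selects the dimension that is held fixed, so each column sums to 1. Using margin = 1 would give row distributions instead.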

Exercise 3.2

  1. Evaluate the structure of the data frame.
  2. The Age variable levels are not ordered correctly. This is because R sorts factor levels in “alphabetical” order by default when it converts character vectors into factors (a conversion that read.csv performed automatically before R 4.0.0). One option is to set as.is=TRUE when using read.csv, and set the factor levels after reading in the data.
  3. Create 3-dimensional arrays using both the table and xtabs functions, with and without attaching the data frame.
#> , ,  = Female
#> 
#>        
#>         <=14  >55 15-19 20-24 25-29 30-34 35-44 45-54
#>   Black  165   92  2257  4503  3590  2628  1505   392
#>   Other   11   15   158   307   283   167   149    40
#>   White   14   24   253   475   433   316   243    55
#> 
#> , ,  = Male
#> 
#>        
#>         <=14  >55 15-19 20-24 25-29 30-34 35-44 45-54
#>   Black   31  823  1412  4059  4121  4453  3858  1619
#>   Other    7  108   210   654   633   520   492   202
#>   White    2  216    88   407   550   564   654   323
#> , , Sex = Female
#> 
#>        Age
#> Race    <=14  >55 15-19 20-24 25-29 30-34 35-44 45-54
#>   Black  165   92  2257  4503  3590  2628  1505   392
#>   Other   11   15   158   307   283   167   149    40
#>   White   14   24   253   475   433   316   243    55
#> 
#> , , Sex = Male
#> 
#>        Age
#> Race    <=14  >55 15-19 20-24 25-29 30-34 35-44 45-54
#>   Black   31  823  1412  4059  4121  4453  3858  1619
#>   Other    7  108   210   654   633   520   492   202
#>   White    2  216    88   407   550   564   654   323
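The steps in the list above can be sketched on a tiny stand-in data set. The rows below are hypothetical individual-level records (the real file is not reproduced here), only three of the age groups are used, and the names csv_text, dat, and age_levels are assumptions.

```r
# Hypothetical individual-level records standing in for the syphilis file
csv_text <- "Sex,Race,Age
Male,White,<=14
Male,Black,15-19
Female,Black,>55
Female,White,15-19
Male,Black,>55
Female,Black,<=14"

# as.is = TRUE keeps character columns as character (no automatic factors)
dat <- read.csv(text = csv_text, as.is = TRUE)
str(dat)

# Set the Age levels explicitly so they follow age order, not alphabetical order
age_levels <- c("<=14", "15-19", ">55")
dat$Age <- factor(dat$Age, levels = age_levels)

# Two ways to build the Race x Age x Sex array without attaching the data frame
arr_table <- table(dat$Race, dat$Age, dat$Sex, dnn = c("Race", "Age", "Sex"))
arr_xtabs <- xtabs(~ Race + Age + Sex, data = dat)

dimnames(arr_xtabs)$Age
```

The dnn argument supplies the dimension names that table would otherwise leave blank when given bare vectors, which is the difference visible between the first and second printed arrays above.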

Exercise 3.3

Here we use the apply function to get marginal totals for the syphilis 3-dimensional array.
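A minimal sketch of this use of apply is shown below. To keep it short, the array holds only the two youngest age groups, with counts copied from the output printed under Exercise 3.2. The object name syph_sub is an assumption.

```r
# Subset of the syphilis counts shown above: Race x Age x Sex (two age groups only)
syph_sub <- array(
  c(165, 11, 14, 2257, 158, 253,   # Female: <=14 then 15-19 (Black, Other, White)
     31,  7,  2, 1412, 210,  88),  # Male:   <=14 then 15-19 (Black, Other, White)
  dim = c(3, 2, 2),
  dimnames = list(Race = c("Black", "Other", "White"),
                  Age  = c("<=14", "15-19"),
                  Sex  = c("Female", "Male"))
)

# One-way marginal totals: sum over all dimensions except the one named
by_race <- apply(syph_sub, 1, sum)
by_age  <- apply(syph_sub, 2, sum)
by_sex  <- apply(syph_sub, 3, sum)

# Two-way marginal totals: keep Race and Sex, sum over Age
race_by_sex <- apply(syph_sub, c(1, 3), sum)

by_race
race_by_sex
```

The second argument to apply names the dimensions to keep, so every other dimension is summed out.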

Exercise 3.4

For this syphilis data example, we’ll choose one 3-D array.

Exercise 3.5

Exercise 3.6

Use the rep function on the data frame rows to recreate the individual-level data frame with over 40,000 observations.

It’s a good idea to understand how the rep function works with two vectors:

#>  [1] 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3

We can see that the second vector determines the frequency of the first vector elements. Now use this understanding with the syphilis data.
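The printed vector above is consistent with a call such as rep(1:3, times = c(4, 5, 6)), which is an assumption about the code that produced it. The sketch below shows that call and then applies the same idea to the rows of a small aggregated data frame. The three rows use counts from the <=14 column of the output under Exercise 3.2, and the column name Freq and the object names agg and indiv are hypothetical.

```r
# Each element of the first vector is repeated by the matching element of the second
rep(1:3, times = c(4, 5, 6))

# Small aggregated data frame: one row per group with its frequency
agg <- data.frame(
  Sex  = c("Male", "Female", "Male"),
  Race = c("White", "Other", "Other"),
  Age  = c("<=14", "<=14", "<=14"),
  Freq = c(2, 11, 7)
)

# Repeat each row index Freq times, then drop the Freq column
row_index <- rep(seq_len(nrow(agg)), times = agg$Freq)
indiv <- agg[row_index, c("Sex", "Race", "Age")]
rownames(indiv) <- NULL

nrow(indiv)
```

The same indexing pattern scales to the full aggregated data frame, where the row count of the result equals the sum of the frequency column.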

Exercise 3.7

Chapter 5

Exercise 5.1

Exercise 5.2

Exercise 5.3

Exercise 5.4

Exercise 5.5

Exercise 5.6

Exercise 5.7

Exercise 5.8

Exercise 5.9