Chapter 3 Data manipulation & Inference I

1. Download the file testdata.xlsx into your working directory and read the tables in Sheet 1 and Sheet 2 into R, storing them under the names d1 and d2. Use the function read.xlsx() from the package openxlsx to do so. (Use the argument sheet to read in different sheets of an Excel file; find help in Chapter 2.3.2)
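
A minimal sketch of how the sheet argument selects a sheet. Because testdata.xlsx is not available here, the example first writes a small hypothetical two-sheet workbook to a temporary file (the columns id, age and hospital are assumptions). With the real file you would call read.xlsx("testdata.xlsx", sheet = 1) and read.xlsx("testdata.xlsx", sheet = 2).

```r
library(openxlsx)

# Hypothetical stand-in for testdata.xlsx: two small sheets in a temporary file
tmp <- tempfile(fileext = ".xlsx")
write.xlsx(
  list(Sheet1 = data.frame(id = 1:3, age = c(45, 62, 38)),
       Sheet2 = data.frame(id = c(2, 3, 4), hospital = c("A", "B", "A"))),
  file = tmp
)

# sheet selects the sheet by position (a sheet name also works)
d1 <- read.xlsx(tmp, sheet = 1)
d2 <- read.xlsx(tmp, sheet = 2)
```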

 

2. Load the package tidyverse. If you have not installed it yet, install it with install.packages() first and then load it with library(). (Find help in Chapter 2.2)
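
One common pattern, sketched here: install the package only if it is missing (installation is needed once per machine), then load it with library() in every new R session.

```r
# Install tidyverse only if it is not installed yet (needs an internet connection)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Load (attach) tidyverse; this also attaches core packages such as dplyr and readr
library(tidyverse)
```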

 

3. Create a new data frame d that contains only the cases that appear in both data frames by joining d1 and d2 by the variable id. Use one of the functions of the xxx_join() family from the tidyverse package dplyr, described in Chapter 3.1.3.
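
A minimal sketch on small hypothetical tables (the columns age, weight and hospital are assumptions): inner_join() keeps only the ids present in both tables, whereas left_join() or full_join() would also keep ids that occur in only one of them.

```r
library(dplyr)

# Hypothetical toy tables: id 1 appears only in d1, id 4 only in d2
d1 <- data.frame(id = 1:3, age = c(45, 62, 38), weight = c(70, 82, 65))
d2 <- data.frame(id = 2:4, hospital = c("A", "B", "A"))

# inner_join() keeps only rows whose id occurs in both data frames
d <- inner_join(d1, d2, by = "id")
d
```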

 

4. Create a new data frame youngPatients containing only patients younger than 50 years. (Find help in Chapter 3.1.2)
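
A minimal sketch with filter(), assuming the age is stored in a variable called age (the toy data below are hypothetical). Note that "younger than 50" means age < 50, so a 50-year-old patient is excluded.

```r
library(dplyr)

# Hypothetical stand-in for the joined data frame d
d <- data.frame(id = 1:4, age = c(45, 62, 38, 50),
                hospital = c("A", "B", "A", "B"))

# filter() keeps the rows for which the condition is TRUE
youngPatients <- filter(d, age < 50)
youngPatients
```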

 

5. Create a new data frame youngPatientsHospitals that only contains the variables id and hospital from youngPatients. (Find help in Chapter 3.1.2)
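
A minimal sketch with select(), which keeps only the named columns (in the given order). The toy youngPatients below is hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for youngPatients
youngPatients <- data.frame(id = c(1, 3), age = c(45, 38),
                            hospital = c("A", "A"))

# select() keeps only the listed variables
youngPatientsHospitals <- select(youngPatients, id, hospital)
youngPatientsHospitals
```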

 

6. Create a new variable weightLbs in d that contains the weight in pounds. To convert the variable weight from kilograms to pounds, multiply it by 2.205. (Find help in Chapter 3.1.4)
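
A minimal sketch with mutate(), which adds a new column computed from existing ones. The toy d is hypothetical; for example, 70 kg corresponds to 70 × 2.205 = 154.35 lbs.

```r
library(dplyr)

# Hypothetical stand-in for d, with weight in kg
d <- data.frame(id = 1:3, weight = c(70, 82, 65))

# mutate() adds weightLbs = weight (kg) * 2.205
d <- mutate(d, weightLbs = weight * 2.205)
d
```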

 

7. Download the CSV file Melanoma.csv to your computer. Read it into R using read_csv() from the tidyverse package readr and assign it the name melanoma. All following exercises in this chapter refer to that data set. (Find help in Chapter 3.1.1)
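
A sketch under one assumption: that Melanoma.csv contains the Melanoma data set shipped with the MASS package (columns time, status, sex, age, year, thickness, ulcer). To keep the example self-contained, it writes that data set to a temporary CSV file and reads it back. With the downloaded file you would simply call read_csv("Melanoma.csv").

```r
library(readr)

# Assumed stand-in for Melanoma.csv: MASS::Melanoma written to a temporary file
tmp <- tempfile(fileext = ".csv")
write_csv(MASS::Melanoma, tmp)

# read_csv() returns a tibble and reports the column types it guessed
melanoma <- read_csv(tmp)
melanoma
```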

 

8. Create a new data frame melanomaSummaries containing the median age and the mean thickness for every value of the variable status (i.e. 1, 2 and 3). (Find help in Chapter 3.1.5)
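
A minimal sketch using group_by() followed by summarise(), which computes one row of summaries per value of status. MASS::Melanoma is used as an assumed stand-in for the data read from Melanoma.csv; the new column names medianAge and meanThickness are hypothetical choices.

```r
library(dplyr)

melanoma <- MASS::Melanoma  # assumed stand-in for Melanoma.csv

# One row per status value with the median age and the mean thickness
melanomaSummaries <- melanoma %>%
  group_by(status) %>%
  summarise(medianAge = median(age),
            meanThickness = mean(thickness))
melanomaSummaries
```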

 

9. Plot a ROC curve that shows the diagnostic value of the variable thickness for the diagnosis of ulcer. (Find help in Chapter 3.2)
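
To show what an ROC curve is built from, here is a dependency-free sketch in base R; the chapter may instead use a dedicated package (pROC's roc() is one such option). Every observed thickness is tried as a cut-off, a patient counts as "test positive" if the thickness is at or above the cut-off, and sensitivity is plotted against 1 − specificity. It assumes ulcer is coded 1 = present and 0 = absent, as in MASS::Melanoma, which serves as a stand-in for Melanoma.csv.

```r
melanoma <- MASS::Melanoma  # assumed stand-in for Melanoma.csv

# Candidate cut-offs from strictest (largest) to most lenient (smallest)
cutoffs <- sort(unique(melanoma$thickness), decreasing = TRUE)
withUlcer <- melanoma$thickness[melanoma$ulcer == 1]
withoutUlcer <- melanoma$thickness[melanoma$ulcer == 0]

# Sensitivity: ulcer patients classified positive; specificity: others classified negative
sens <- sapply(cutoffs, function(k) mean(withUlcer >= k))
spec <- sapply(cutoffs, function(k) mean(withoutUlcer < k))

# ROC coordinates, anchored at (0, 0) and (1, 1)
fpr <- c(0, 1 - spec, 1)
tpr <- c(0, sens, 1)

plot(fpr, tpr, type = "l", xlab = "1 - specificity", ylab = "sensitivity",
     main = "ROC curve: thickness for ulcer")
abline(0, 1, lty = 2)  # diagonal = a test without diagnostic value

# Area under the curve via the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc
```

The further the curve bends towards the top-left corner (AUC closer to 1), the better thickness separates patients with and without ulcer.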

 

10. Choose and apply an appropriate test to check if the age differs significantly between male and female patients. (Find help in Chapter 3.3)
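
One possible approach, sketched under assumptions: that age is roughly normally distributed within each sex (checked here with shapiro.test()), in which case a two-sample t-test is reasonable; if not, the rank-based wilcox.test() is the usual alternative. MASS::Melanoma, where sex is coded 1 = male and 0 = female, stands in for Melanoma.csv.

```r
melanoma <- MASS::Melanoma  # assumed stand-in for Melanoma.csv

# Shapiro-Wilk p-value of age within each sex (small values argue against normality)
tapply(melanoma$age, melanoma$sex, function(x) shapiro.test(x)$p.value)

# Welch two-sample t-test (R's default, var.equal = FALSE)
ageTest <- t.test(age ~ sex, data = melanoma)
ageTest
```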

 

11. Choose and apply an appropriate test to check if the thickness differs significantly between men and women. (Find help in Chapter 3.3)
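
Tumour thickness is typically right-skewed, so a rank-based test is a common choice here; the sketch below uses the Wilcoxon rank-sum test (Mann-Whitney U test). MASS::Melanoma is an assumed stand-in for Melanoma.csv. Setting exact = FALSE uses the normal approximation, which avoids the warning that an exact p-value cannot be computed when there are ties.

```r
melanoma <- MASS::Melanoma  # assumed stand-in for Melanoma.csv

# Wilcoxon rank-sum test of thickness between the two sexes
thicknessTest <- wilcox.test(thickness ~ sex, data = melanoma, exact = FALSE)
thicknessTest
```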

 

12. Choose and apply an appropriate test to check if the distribution of ulcer differs significantly between men and women. (Find help in Chapter 3.3)
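
Both sex and ulcer are categorical, so one option is a chi-squared test on their contingency table, sketched below with MASS::Melanoma as an assumed stand-in for Melanoma.csv. For a 2 × 2 table, chisq.test() applies Yates' continuity correction by default; if some expected counts are small (below about 5), fisher.test() on the same table is the usual alternative.

```r
melanoma <- MASS::Melanoma  # assumed stand-in for Melanoma.csv

# 2 x 2 contingency table of sex (1 = male, 0 = female) by ulcer (1 = present)
tab <- table(sex = melanoma$sex, ulcer = melanoma$ulcer)
tab

# Chi-squared test of independence between sex and ulcer
ulcerTest <- chisq.test(tab)
ulcerTest
```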