Chapter 3 Data manipulation & Inference I
1. Download the file testdata.xlsx into your working directory and read the tables in Sheet 1 and Sheet 2 into R, storing them under the names d1 and d2. Use the read.xlsx() function from the package openxlsx to do so. (Check out the argument sheet for reading in different sheets of an Excel file and find help in Chapter 2.3.2)
2. Load the package tidyverse. If you haven’t installed it before, start by using install.packages() before library(). (Find help in Chapter 2.2)
3. Create a new data frame d that contains only cases which appear in both data frames by joining d1 and d2 by the variable id, using one of the functions from the xxx_join() family of the tidyverse package described in Chapter 3.1.3.
4. Create a new data frame youngPatients containing only patients younger than 50 years. (Find help in Chapter 3.1.2)
5. Create a new data frame youngPatientsHospitals that only contains the variables id and hospital from youngPatients. (Find help in Chapter 3.1.2)
6. Create a new variable weightLbs in d that contains the weight in pounds. To get from the kg in weight to pounds, multiply by 2.205. (Find help in Chapter 3.1.4)
7. Download the CSV file Melanoma.csv to your computer. Read it into R using read_csv() from the tidyverse package and assign it the name melanoma. All following exercises in this chapter refer to that data set. (Find help in Chapter 3.1.1)
8. Create a new data frame melanomaSummaries containing the median age and the mean thickness for every value of the variable status (i.e. 1, 2 and 3). (Find help in Chapter 3.1.5)
9. Plot a ROC curve that shows the diagnostic value of the variable thickness for the diagnosis of ulcer. (Find help in Chapter 3.2)
10. Choose and apply an appropriate test to check if the age differs significantly between male and female patients. (Find help in Chapter 3.3)
11. Choose and apply an appropriate test to check if the thickness differs significantly between men and women. (Find help in Chapter 3.3)
12. Choose and apply an appropriate test to check if the distribution of ulcer differs significantly between men and women. (Find help in Chapter 3.3)
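
As a rough orientation for the data manipulation and testing exercises, the following sketch runs the same kinds of steps on a small made-up data set. The toy tables, their values and the names young, youngHosp and summaries are assumptions for illustration only and are not taken from testdata.xlsx or Melanoma.csv. The tests shown are one possible choice, not necessarily the intended solution, and on real data the choice depends on the distribution and the sample size.

```r
library(dplyr)

# Hypothetical toy tables standing in for the two Excel sheets
d1 <- data.frame(id    = 1:6,
                 age   = c(34, 61, 47, 52, 29, 70),
                 sex   = c("f", "m", "f", "m", "f", "m"),
                 ulcer = c(0, 1, 1, 0, 1, 0))
d2 <- data.frame(id       = 2:7,
                 hospital = c("A", "B", "A", "C", "B", "C"),
                 weight   = c(82, 64, 90, 58, 77, 69))

# Keep only the cases whose id appears in both tables
d <- inner_join(d1, d2, by = "id")

# Select rows by a condition, then select columns by name
young     <- filter(d, age < 50)
youngHosp <- select(young, id, hospital)

# Add a derived column (kg to pounds)
d <- mutate(d, weightLbs = weight * 2.205)

# Grouped summary: one row per value of the grouping variable
summaries <- d %>%
  group_by(sex) %>%
  summarise(medianAge = median(age), meanWeight = mean(weight))

# Two-group comparison of a numeric variable:
# Welch t-test (assumes roughly normal data) ...
tAge <- t.test(age ~ sex, data = d)
# ... or the rank-based Wilcoxon test if normality is doubtful
wWeight <- wilcox.test(weight ~ sex, data = d)

# Association between two categorical variables; Fisher's exact test
# is used here because the toy counts are tiny (chisq.test() is the
# usual alternative for larger samples)
fUlcer <- fisher.test(table(d$sex, d$ulcer))
```

Printing summaries, tAge, wWeight or fUlcer shows the grouped statistics and the test results, including the p-values.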