Resampling methods

Introduction

This module covers the bootstrap and cross-validation. These are two important resampling techniques, useful for studying sampling variability, evaluating model performance and choosing tuning parameters for many of the methods covered in this unit.
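
The core bootstrap idea can be sketched in a few lines. The block below is a minimal illustration in Python with NumPy (the course materials may use a different language). It resamples the data with replacement, recomputes a statistic on each resample, and takes the standard deviation of those replicates as the standard error. The function name `bootstrap_se`, the number of resamples and the simulated sample are illustrative assumptions. The sample is 169 generated values standing in for a continuous variable such as bmd; it is not the real bmd.csv data.

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Bootstrap estimate of the standard error of `statistic` (hypothetical helper).

    Draws n_boot resamples of `data` with replacement, each the same size as
    the original sample, evaluates the statistic on each resample, and returns
    the standard deviation of those bootstrap replicates.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # n indices in 0..n-1, drawn with replacement
        replicates[b] = statistic(data[idx])
    # Sample standard deviation of the bootstrap replicates
    return replicates.std(ddof=1)

# Simulated (not real) data standing in for a continuous variable such as bmd
rng = np.random.default_rng(42)
sample = rng.normal(loc=0.8, scale=0.15, size=169)

se_median = bootstrap_se(sample, np.median)
se_mean = bootstrap_se(sample, np.mean)
print(f"bootstrap SE of the median: {se_median:.4f}")
print(f"bootstrap SE of the mean:   {se_mean:.4f}")
# For the mean there is also a closed-form standard error, s / sqrt(n)
print(f"analytic SE of the mean:    {sample.std(ddof=1) / np.sqrt(len(sample)):.4f}")
```

For the mean, the bootstrap estimate should land close to the familiar s / sqrt(n). For the median there is no equally simple formula, and this is where the bootstrap is most useful.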

We will reverse the order used in the book An Introduction to Statistical Learning, starting with the bootstrap and then proceeding to cross-validation.

By the end of this module you should be able to:

Compute standard errors for different statistics by bootstrapping
Compute model performance statistics by cross-validation
Use cross-validation to select tuning parameters such as the number of neighbours in KNN
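
The second and third objectives can be illustrated together. Below is a minimal sketch in Python with NumPy, written from scratch rather than with any particular library. It uses 5-fold cross-validation to estimate the misclassification rate of a KNN classifier and to pick the number of neighbours k. The helper names `knn_predict` and `cv_error`, the grid of k values and the simulated two-class data are hypothetical choices for illustration only.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Classify each test point by majority vote among its k nearest
    training points (Euclidean distance). Labels must be 0/1 integers."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]  # indices of the k closest training points
    votes = y_train[nearest].mean(axis=1)    # fraction of neighbours with label 1
    return (votes > 0.5).astype(int)         # ties (possible for even k) go to class 0

def cv_error(X, y, k, n_folds=5, seed=0):
    """K-fold cross-validation estimate of the KNN misclassification rate."""
    rng = np.random.default_rng(seed)
    # Shuffle the row indices once and split them into n_folds nearly equal folds
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False  # hold out the current fold
        pred = knn_predict(X[train_mask], y[train_mask], X[test_idx], k)
        errors.append(np.mean(pred != y[test_idx]))
    # Average the per-fold error rates
    return float(np.mean(errors))

# Simulated (not real) two-class data: class 1 is shifted by 1.5 in each coordinate
rng = np.random.default_rng(1)
n = 200
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 2)) + 1.5 * y[:, None]

candidate_k = [1, 3, 5, 7, 9, 15, 25]
cv_errors = {k: cv_error(X, y, k) for k in candidate_k}
for k, err in cv_errors.items():
    print(f"k = {k:2d}: CV error = {err:.3f}")
best_k = min(cv_errors, key=cv_errors.get)
print("selected k:", best_k)
```

Because the seed is fixed, the same fold assignment is reused for every k, so all candidate values are compared on identical splits. The selected k is simply the one with the lowest cross-validated error.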

Dataset used in the examples

The file bmd.csv contains 169 records of bone densitometry scans (measurements of bone mineral density). The following variables were collected:

id – patient’s number
age – patient’s age
fracture – hip fracture (fracture / no fracture)
weight_kg – weight measured in kg
height_cm – height measured in cm
waiting_time – time the patient had to wait for the densitometry (in minutes)
bmd – bone mineral density measured at the hip

The file SBI.csv contains the records of 2349 children admitted to the emergency room with fever and tested for serious bacterial infection (sbi). The following variables were collected:

id – patient’s number
fever_hours – duration of the fever in hours
age – child’s age
sex – child’s sex (M / F)
wcc – white cell count
prevAB – previous antibiotics (Yes / No)
sbi – serious bacterial infection (Not Applicable / UTI / Pneum / Bact)
pct – procalcitonin
crp – C-reactive protein

Slides

You can download the slides used in the videos on resampling methods:

Slides

Machine Learning for Biostatistics

Module 3