# Machine Learning for Biostatistics

*Module 5*

*2022-08-01*

# Beyond Linearity

## Introduction

This module will cover methods to explore non-linear effects of numerical predictors on the outcome.

By the end of this module you should be able to:

- Identify approaches to model non-linear effects
- Implement linear and polynomial piecewise regression
- Understand the differences between polynomial splines, B-splines and natural splines
- Fit a GLM with different splines
- Use smoothing splines to approximate non-linear effects
- Integrate smoothing splines in modeling strategies using generalised additive models

## Dataset used in the examples

The dataset **triceps** is available in the `MultiKink` package. You may `install.packages("MultiKink")`, load the library (`library(MultiKink)`) and then run `data("triceps")`.
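Put together, the steps above can be run as the short script below. This is a minimal sketch: the `str()` and `summary()` calls are not part of the original instructions and are only added as a quick check that the data loaded as expected.

```r
# Run once if the package is not yet installed:
# install.packages("MultiKink")

library(MultiKink)   # package that ships the triceps data

data("triceps")      # loads the data frame `triceps` into the workspace

str(triceps)         # variable names and types
summary(triceps)     # quick numerical summary of each variable
```

If the load worked, `triceps` should be a data frame with the variables described below.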

The data are derived from an anthropometric study of 892 females under 50 years in three Gambian villages in West Africa. There are 892 observations on the following 3 variables:

- age - Age of respondents.
- lntriceps - Log of the triceps skinfold thickness.
- triceps - Triceps skinfold thickness.

The dataset SA_heart.csv is a retrospective sample of males in a high-risk heart-disease region of the Western Cape, South Africa. There are roughly two controls per case of coronary heart disease (CHD).

Many of the CHD-positive men have undergone blood pressure reduction treatment and other programs to reduce their risk factors after their CHD event. In some cases the measurements were made after these treatments. These data are taken from a larger dataset, described in Rousseauw et al., 1983, South African Medical Journal.

The dataset contains 462 observations on the following 10 variables:

- sbp - systolic blood pressure
- tobacco - cumulative tobacco (kg)
- ldl - low density lipoprotein cholesterol
- adiposity - a numeric vector
- famhist - family history of heart disease, a factor with levels Absent Present
- typea - type-A behavior
- obesity - a numeric vector
- alcohol - current alcohol consumption
- age - age at onset
- chd - response, coronary heart disease (1 = CHD, 0 = no CHD)
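A possible way to read this file into R is sketched below. The helper name `read_sa_heart` and the default path are hypothetical; the sketch also assumes the CSV columns are named exactly as in the list above, which may not hold for your copy of the file.

```r
# Hypothetical helper: reads SA_heart.csv and codes famhist as a factor.
# Assumes the column names match the variable list above.
read_sa_heart <- function(path = "SA_heart.csv") {
  sa_heart <- read.csv(path)
  # Family history has two levels: Absent and Present
  sa_heart$famhist <- factor(sa_heart$famhist)
  sa_heart
}

# Assumed location: adjust the path to wherever your copy is stored.
if (file.exists("SA_heart.csv")) {
  sa_heart <- read_sa_heart("SA_heart.csv")
  str(sa_heart)          # check variable names and types
  table(sa_heart$chd)    # counts of controls (0) and cases (1)
}
```

Tabulating `chd` is a simple way to check the description of roughly two controls per case.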