A.6 BioLINCC teaching datasets

Background

BioLINCC is the National Heart, Lung, and Blood Institute (NHLBI) Biologic Specimen and Data Repository Information Coordinating Center (National Heart, Lung and Blood Institute 2022). It makes the following teaching datasets available: the Childhood Asthma Management Program (CAMP) dataset, the Digitalis Investigation Group (DIG) dataset, and the Framingham Heart Study dataset.

Documentation

These teaching datasets, as provided by BioLINCC, are not appropriate for publication. Each has been anonymized using statistical procedures such as permutation and/or random visit selection.

Teaching Datasets

Any analyses, interpretations, or conclusions reached herein serve only to illustrate regression methods and are attributable to the author, not to BioLINCC. The author makes no claim or implication that any inferences derived from these teaching datasets are valid.

For the versions of these teaching datasets used in this text, the data values were not changed, but some variables were converted to factors and only a subset of variables was retained. For the CAMP dataset, only data from the main study and the 48-month follow-up were retained. For the Framingham Heart Study, the dataset fram_time_invar_rmph.rData contains only time-invariant predictors and excludes individuals with prevalent coronary heart disease, angina pectoris, myocardial infarction, or stroke at baseline. In addition, variables describing hypertension were set to missing for participants with prevalent hypertension at baseline. Separate datasets containing both time-invariant and time-varying information were also created for several outcomes.

Creating the Teaching Datasets

1. Request the DIG, CAMP, and Framingham datasets using the "Request a teaching dataset" link on the NHLBI Teaching Datasets site.
2. After your request has been granted, download the datasets DIG.csv, camp_teach.csv, and frmgham2.csv from the links provided by NHLBI.
3. Download the R script files DIG Process.R, CAMP Process.R, and Framingham Process.R from RMPH Resources.
4. Run the R scripts DIG Process.R, CAMP Process.R, and Framingham Process.R to process the raw data and create the following teaching datasets:
   - dig_rmph.rData
   - camp_0_48_rmph.rData
   - fram_time_invar_rmph.rData
   - fram_tv_angina_rmph.rData, fram_tv_mi_rmph.rData, fram_tv_mi_fchd_rmph.rData, fram_tv_anychd_rmph.rData, fram_tv_stroke_rmph.rData, fram_tv_cvd_rmph.rData, fram_tv_hyperten_rmph.rData, and fram_tv_death_rmph.rData
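After the processing scripts have run, it can help to confirm that every expected file was actually produced. The sketch below is a hypothetical helper, missing_teaching_files(), that is not part of RMPH Resources. It assumes the scripts write their output to a Data/ folder (the folder used by the load() calls in the next section); if your copies of the scripts save elsewhere, pass that folder instead.

```r
# Hypothetical helper (not part of RMPH Resources): return the names of the
# expected teaching datasets that are NOT present in `dir`.
# Assumption: the processing scripts save their .rData files into `dir`.
missing_teaching_files <- function(dir = "Data") {
  expected <- c(
    "dig_rmph.rData",
    "camp_0_48_rmph.rData",
    "fram_time_invar_rmph.rData",
    # One time-varying Framingham file per outcome
    paste0("fram_tv_",
           c("angina", "mi", "mi_fchd", "anychd",
             "stroke", "cvd", "hyperten", "death"),
           "_rmph.rData")
  )
  # Keep only the names with no matching file in `dir`
  expected[!file.exists(file.path(dir, expected))]
}
```

For example, after running each script with source() (e.g., source("DIG Process.R")), calling missing_teaching_files("Data") should return character(0) if all eleven files were created.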

Rows and Columns

The three main teaching datasets have the following numbers of rows and columns:

load("Data/dig_rmph.rData")
dim(dig)

## [1] 6800   71

load("Data/camp_0_48_rmph.rData")
dim(camp_0_48)

## [1] 629  15

load("Data/fram_time_invar_rmph.rData")
dim(fram_time_invar)

## [1] 4215   20
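The same check can be run for all of the teaching datasets at once, including the time-varying Framingham files whose object names are not listed above. The following is a minimal sketch using a hypothetical function, dataset_dims(). It relies on the fact that load() returns the names of the objects it restores, so the names do not need to be known in advance. Each file is loaded into its own environment so that nothing in the global workspace is overwritten.

```r
# Hypothetical helper: for each .rData file, load its contents into a fresh
# environment and record the number of rows and columns of every object.
dataset_dims <- function(paths) {
  per_file <- lapply(paths, function(path) {
    env <- new.env()
    obj_names <- load(path, envir = env)  # load() returns the object names
    do.call(rbind, lapply(obj_names, function(nm) {
      obj <- get(nm, envir = env)
      data.frame(file = basename(path), object = nm,
                 nrow = NROW(obj), ncol = NCOL(obj))
    }))
  })
  do.call(rbind, per_file)
}
```

For example, dataset_dims(list.files("Data", pattern = "\\.rData$", full.names = TRUE)) tabulates every .rData file in the Data folder. For the three main datasets, the nrow and ncol values should agree with the dim() output shown above.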

References

National Heart, Lung and Blood Institute. 2022. “About BioLINCC.” biolincc.nhlbi.nih.gov/about.