A.6 BioLINCC teaching datasets
Background
BioLINCC is the National Heart, Lung, and Blood Institute (NHLBI) Biologic Specimen and Data Repository (National Heart, Lung and Blood Institute 2022). It makes the following teaching datasets available: the Childhood Asthma Management Program (CAMP) dataset, the Digitalis Investigation Group (DIG) dataset, and the Framingham Heart Study dataset.
Documentation
- Digitalis Investigation Group (DIG)
- Childhood Asthma Management Program (CAMP)
- Framingham Heart Study
These teaching datasets, as provided by BioLINCC, are not appropriate for publication purposes. Each has been rendered anonymous through the application of certain statistical processes such as permutations and/or random visit selection.
Teaching Datasets
Any analyses, interpretations, or conclusions reached herein are only for the purpose of illustrating regression methods and are credited to the author, not to BioLINCC. The author makes no claim or implication that any inferences derived from these teaching datasets are valid.
For the versions of these teaching datasets used in this text, the data were not changed, but some variables were converted to factors and only a subset of variables was retained. For the CAMP dataset, only data for the main study and 48-month follow-up were retained. For the Framingham study, the dataset fram_time_invar_rmph.rData contains only time-invariant predictors and excludes individuals with prevalent coronary heart disease, angina pectoris, myocardial infarction, or stroke at baseline. Additionally, variables describing hypertension were set to missing for those with prevalent hypertension at baseline. Additional datasets containing both time-invariant and time-varying information were created for a set of outcomes.
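The kinds of processing described above can be illustrated on a toy data frame. This is only a sketch under stated assumptions, not the contents of the actual processing scripts: the column names (RANDID, SEX, PREVCHD, PREVAP, PREVMI, PREVSTRK, PREVHYP, HYPERTEN, TIMEHYP) are assumed to match the Framingham teaching dataset documentation, the factor labels are assumed, and every value is made up.

```r
# Toy data frame; column names are assumed to follow the Framingham
# teaching dataset documentation, and all values are made up.
toy <- data.frame(
  RANDID   = 1:5,
  SEX      = c(1, 2, 2, 1, 2),
  PREVCHD  = c(0, 1, 0, 0, 0),
  PREVAP   = c(0, 1, 0, 0, 0),
  PREVMI   = c(0, 0, 0, 0, 0),
  PREVSTRK = c(0, 0, 0, 1, 0),
  PREVHYP  = c(0, 0, 1, 0, 0),
  HYPERTEN = c(0, 1, 1, 1, 0),
  TIMEHYP  = c(8766, 0, 0, 2500, 8766)
)

# Convert a coded variable to a factor (labels assumed: 1 = Male, 2 = Female)
toy$SEX <- factor(toy$SEX, levels = 1:2, labels = c("Male", "Female"))

# Exclude anyone with prevalent CHD, angina, MI, or stroke at baseline
prevalent <- with(toy, PREVCHD == 1 | PREVAP == 1 | PREVMI == 1 | PREVSTRK == 1)
toy <- toy[!prevalent, ]

# Set the hypertension variables to missing for prevalent hypertension
toy[toy$PREVHYP == 1, c("HYPERTEN", "TIMEHYP")] <- NA

# Retain only a subset of variables
toy <- toy[, c("RANDID", "SEX", "PREVHYP", "HYPERTEN", "TIMEHYP")]
```

Setting the hypertension variables to missing, rather than dropping those rows, keeps individuals with prevalent hypertension available for analyses of other outcomes.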
Creating the Teaching Datasets
- Request the DIG, CAMP, and Framingham datasets from the "Request a teaching dataset" link at the NHLBI Teaching Datasets site.
- After your request has been granted, download the datasets DIG.csv, camp_teach.csv, and frmgham2.csv from the links provided by NHLBI.
- Download the R script files DIG Process.R, CAMP Process.R, and Framingham Process.R from RMPH Resources.
- Run the R scripts DIG Process.R, CAMP Process.R, and Framingham Process.R to process the raw data and create the following teaching datasets:
  - dig_rmph.rData
  - camp_0_48_rmph.rData
  - fram_time_invar_rmph.rData
  - fram_tv_angina_rmph.rData, fram_tv_mi_rmph.rData, fram_tv_mi_fchd_rmph.rData, fram_tv_anychd_rmph.rData, fram_tv_stroke_rmph.rData, fram_tv_cvd_rmph.rData, fram_tv_hyperten_rmph.rData, and fram_tv_death_rmph.rData
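The final step, running the three scripts, might be sketched as follows. This is a minimal sketch that assumes the raw CSV files and the scripts sit together in one directory and that each script reads its input from that directory; neither assumption is confirmed here, so adjust paths as needed. The helper run_if_ready is hypothetical and simply skips a script when its files are missing.

```r
# Hypothetical helper: run a processing script only if the script and its
# raw input file are both present in `dir` (assumed layout, not confirmed).
run_if_ready <- function(script, raw, dir = ".") {
  script_path <- file.path(dir, script)
  raw_path    <- file.path(dir, raw)
  if (!file.exists(script_path) || !file.exists(raw_path)) {
    message("Skipping ", script, ": script or raw file not found in ", dir)
    return(invisible(FALSE))
  }
  # Evaluate in a fresh environment so the script's objects stay out of the
  # global workspace; chdir = TRUE resolves relative paths against `dir`.
  source(script_path, local = new.env(), chdir = TRUE)
  invisible(TRUE)
}

# Script name -> raw file it is assumed to need
jobs <- c("DIG Process.R"        = "DIG.csv",
          "CAMP Process.R"       = "camp_teach.csv",
          "Framingham Process.R" = "frmgham2.csv")

# Named logical vector: TRUE where a script was run
ran <- mapply(run_if_ready, names(jobs), jobs)
```

Inspecting ran afterwards shows which scripts were actually executed and which were skipped.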
Rows and Columns
These files have the following numbers of rows and columns:
## [1] 6800 71
## [1] 629 15
## [1] 4215 20
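To check the dimensions of your own copies, something like the following may help. It is a sketch only: the helper rdata_dims is hypothetical, the names of the objects stored inside each .rData file are not assumed, and the choice of the three files below is an assumption about which files the output above refers to.

```r
# Hypothetical helper: load one .rData file into a private environment and
# return the dimensions of every data frame it contains.
rdata_dims <- function(path) {
  env <- new.env()
  load(path, envir = env)            # object names inside are not assumed
  objs <- mget(ls(env), envir = env)
  dfs  <- Filter(is.data.frame, objs)
  lapply(dfs, dim)                   # each element is c(rows, columns)
}

# Assumed file locations: the current working directory
files <- c("dig_rmph.rData", "camp_0_48_rmph.rData",
           "fram_time_invar_rmph.rData")

# Report only on files that are actually present
for (f in files[file.exists(files)]) {
  cat(f, "\n")
  print(rdata_dims(f))
}
```

Loading into a separate environment avoids overwriting objects that already exist in the workspace.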