A.3 U.S. Natality (2018)
Background
As required by federal law in the United States, the 2018 United States Birth Data were compiled from information on birth certificates by the National Vital Statistics System, part of the National Center for Health Statistics, in cooperation with states (National Center for Health Statistics 2022). Information collected includes gestational age, birthweight, maternal and paternal demographic information, risk factors, and characteristics of the labor and delivery.
Documentation
Documentation can be found in the User Guide to the 2018 Natality Public Use File. See also the NCHS Data User Agreement.
Teaching Datasets
Any analyses, interpretations, or conclusions reached herein are only for the purpose of illustrating regression methods and are credited to the author, not to NCHS, which is responsible only for the initial data. The author makes no claim or implication that any inferences derived from these teaching datasets are valid.
The teaching dataset natality2018_rmph.Rdata is a simple random sample of 2000 births intended only for illustrating regression methods. In the teaching dataset, variables with names in CAPS are coded as in the original dataset, except that missing value codes were set to NA and some cases were assigned values based on skip patterns. Variables with names in lowercase were derived from other variables. The gestational age variable COMBGEST was modified to create the variable gestage37 for use in survival analysis: gestational ages >37 weeks were censored at 37 weeks, and a random subset of observations was censored at times <37 weeks. Thus, in addition to being only a small sample of U.S. births, the data are slightly modified for teaching purposes.
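To make the censoring scheme for gestage37 concrete, here is a minimal sketch in R. It is an illustration under stated assumptions, not the code actually used to create gestage37: the function name censor_gestage and its arguments (max_week, p_cens) are hypothetical, the random censoring times are assumed to be uniform below each observation's time, and the usage example runs on simulated gestational ages rather than natality data. The result has a time column and an event indicator (1 = gestational age observed, 0 = censored).

```r
# Hypothetical sketch of the censoring scheme described above; this is NOT
# the code used to build gestage37. Names and parameters are illustrative.
censor_gestage <- function(gest, max_week = 37, p_cens = 0.1) {
  # Administrative censoring: follow-up ends at max_week, so gestational
  # ages beyond it are recorded as censored at max_week
  time  <- pmin(gest, max_week)
  event <- as.integer(gest <= max_week)

  # Random censoring (assumed scheme): select each observation with
  # probability p_cens and replace its time with a uniform draw strictly
  # below its current time, so the censoring time is < max_week
  pick <- which(runif(length(gest)) < p_cens)
  time[pick]  <- runif(length(pick), min = 0, max = time[pick])
  event[pick] <- 0L

  data.frame(gest = gest, time = time, event = event)
}

# Usage with simulated gestational ages (not real natality data)
set.seed(2018)
gest <- round(rnorm(20, mean = 38.5, sd = 2))
head(censor_gestage(gest))
```

Setting p_cens = 0 reproduces only the administrative censoring at 37 weeks, which makes it easy to see how much each part of the scheme contributes.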
Creating the Teaching Datasets
To create the teaching datasets, do the following:
- Download the .zip file containing the 2018 CSV file found at Vital Statistics Natality Birth Data.
- Extract the CSV file natl2018us.csv from the .zip file.
- Download the R script file Natality 2018 Process.R from RMPH Resources.
- Run the R script file Natality 2018 Process.R to process the raw data and create the following teaching datasets:
  - natality2018_rmph.Rdata
  - natality_CC_rmph.Rdata (an artificial matched case-control dataset used to illustrate conditional logistic regression)
- Place these .Rdata files in your “Data” folder.
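The steps above can be partly scripted in R. The sketch below is one possible approach, not part of the RMPH materials: the helper names extract_and_process and check_data_folder are hypothetical, the .zip file name is not given in the text and must be supplied by you, and it assumes natl2018us.csv sits at the top level of the archive. The processing script may expect the CSV in a particular location, so check its paths before running it.

```r
# Hypothetical helpers sketching the setup steps; not part of RMPH Resources.
# 'zipfile' is whatever name the downloaded archive actually has.
extract_and_process <- function(zipfile, script = "Natality 2018 Process.R",
                                exdir = ".") {
  # Extract only the CSV (assumes it is at the top level of the archive)
  utils::unzip(zipfile, files = "natl2018us.csv", exdir = exdir)
  # Run the processing script; check inside it for the paths it expects
  source(script)
}

# Report which of the two teaching datasets are present in a folder
check_data_folder <- function(dir = "Data") {
  expected <- c("natality2018_rmph.Rdata", "natality_CC_rmph.Rdata")
  setNames(file.exists(file.path(dir, expected)), expected)
}
```

After the last step, check_data_folder() should return TRUE for both files when run from the directory that contains your “Data” folder.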
Rows and Columns
These files have the following numbers of rows and columns:
- natality2018_rmph.Rdata: 2000 rows, 39 columns
- natality_CC_rmph.Rdata: 1570 rows, 4 columns
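To check these dimensions yourself, one option is a small helper that loads an .Rdata file into its own environment and reports dim() for every object it contains. This is a sketch: the helper name rdata_dims is hypothetical, and because the names of the objects stored in each file are not stated here, the helper reads them from the file (via the return value of load()) instead of assuming them.

```r
# Load all objects from an .Rdata file into a fresh environment (so the
# global workspace is untouched) and return the dimensions of each one.
# Object names are taken from the file, not assumed.
rdata_dims <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)
  lapply(mget(obj_names, envir = env), dim)
}
```

For example, rdata_dims("Data/natality2018_rmph.Rdata") and rdata_dims("Data/natality_CC_rmph.Rdata"), run from the directory containing your “Data” folder, should report the row and column counts listed above.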