Chapter 4 Missing data in R

In R the missing values are denoted by NA which means “Not Available”. If we open the same dataset as above in R we get the following result.

## # A tibble: 10 x 7
##       ID  Pain Tampascale Disability Radiation Gender    GA
##    <dbl> <dbl>      <dbl>      <dbl>     <dbl>  <dbl> <dbl>
##  1     1     9         45         20         1      1     8
##  2     2     6         43         10         0      0    36
##  3     3     1         36          1         0      1     8
##  4     4     5         38         14         0      0    NA
##  5     5     6         44         14         1      1     8
##  6     6     7         43         11         1      0    29
##  7     7     8         43         18         0      0    NA
##  8     8     6         43         11         1      0    34
##  9     9     2         37         11         1      1     8
## 10    10     4         36          3         0      0    38

The Variable Gestational Age (GA) contains the values for GA (e.g. 36, 29, etc.), the value 8 for males and the NA’s. In R the value 8 will be treated as a real value, so we have to recode that value to NA by using the following code to convert an 8 into an NA for the males.

## # A tibble: 10 x 7
##       ID  Pain Tampascale Disability Radiation Gender    GA
##    <dbl> <dbl>      <dbl>      <dbl>     <dbl>  <dbl> <dbl>
##  1     1     9         45         20         1      1    NA
##  2     2     6         43         10         0      0    36
##  3     3     1         36          1         0      1    NA
##  4     4     5         38         14         0      0    NA
##  5     5     6         44         14         1      1    NA
##  6     6     7         43         11         1      0    29
##  7     7     8         43         18         0      0    NA
##  8     8     6         43         11         1      0    34
##  9     9     2         37         11         1      1    NA
## 10    10     4         36          3         0      0    38

Within most functions in R the handling of NA values has to be defined, otherwise the function returns an NA as a result. For example, the following code to obtain the mean of Gestational Age results in an NA.

## [1] NA

To obtain the mean of the observed data the statement na.rm = TRUE has to be added.

## [1] 35.09524

The na.rm = TRUE statement in the mean-function, indicates that values that are NA are removed before the analysis is executed. Another NA handling procedure that is regularly used in functions is called na.action with as options na.fail, na.omit, NULL (no action) and na.exclude. For more information about na.action options you can look at the help-file by typing ?na.action in the Console window.