6.4 Missing (NA) values

In R, missing data are coded as NA. In real datasets, NA values turn up all the time. Unfortunately, most descriptive statistics functions will freak out if there is even one missing (NA) value in the data: instead of a number, they simply return NA. For example, the following code will return NA as a result because there is an NA value in the data vector:

a <- c(1, 5, NA, 2, 10)
mean(a)
## [1] NA

Thankfully, there’s a way we can work around this. To tell a descriptive statistic function to ignore missing (NA) values, include the argument na.rm = TRUE in the function. This argument explicitly tells the function to ignore NA values. Let’s try calculating the mean of the vector a again, this time with the additional na.rm = TRUE argument:

mean(a, na.rm = TRUE)
## [1] 4.5

Now, the function ignored the NA value and returned the mean of the remaining data. While this may seem trivial now (why did we include an NA value in the vector if we wanted to ignore it?!), it will become very important when we apply the function to real data which, very often, contains missing values.
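
The same pattern should carry over to other descriptive statistics functions. As a small sketch (the choice of median(), sum(), and max() here is just for illustration, they are not singled out in the text above), here is the same vector a run through a few of them, first without and then with na.rm = TRUE:

```r
# Same vector as before: five entries, one of them missing
a <- c(1, 5, NA, 2, 10)

# Without na.rm = TRUE, a single NA turns each result into NA
median(a)
## [1] NA
sum(a)
## [1] NA

# With na.rm = TRUE, the NA is dropped before the calculation
median(a, na.rm = TRUE)
## [1] 3.5
sum(a, na.rm = TRUE)
## [1] 18
max(a, na.rm = TRUE)
## [1] 10
```

Note that na.rm = TRUE does not change the vector a itself. The NA is only skipped for that one calculation, so you need to include the argument each time you call a function on data with missing values.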