Chapter 4 Type Mismatch Errors

What You’ll Learn:

  • Understanding R’s type system
  • How coercion works (and fails)
  • Type checking and conversion
  • Common type mismatch scenarios
  • How to prevent type errors

Key Errors Covered: 20+ type-related errors

Difficulty: ⭐ Beginner to ⭐⭐ Intermediate

4.1 Introduction

R is dynamically typed but strongly typed. This means:

  • You don’t declare types (dynamic)
  • But types matter for operations (strong)

x <- 5        # R figures out it's numeric
y <- "5"      # R figures out it's character

But try to mix them:

x + y  # Error!
#> Error in x + y: non-numeric argument to binary operator

Understanding type errors is fundamental to R mastery. This chapter covers the type mismatches you are most likely to encounter.

4.2 R’s Basic Types

💡 Key Insight: The Six Atomic Types

R has six atomic (fundamental) types:

# 1. Logical
is_true <- TRUE
typeof(is_true)
#> [1] "logical"

# 2. Integer
age <- 25L  # Note the L
typeof(age)
#> [1] "integer"

# 3. Double (numeric)
price <- 19.99
typeof(price)
#> [1] "double"

# 4. Character
name <- "Alice"
typeof(name)
#> [1] "character"

# 5. Complex
z <- 3 + 2i
typeof(z)
#> [1] "complex"

# 6. Raw (rarely used)
raw_byte <- charToRaw("A")
typeof(raw_byte)
#> [1] "raw"

Most common: logical, integer, double, character
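One way to confirm these six types at a glance is to collect one sample value of each and ask for typeof() in a single call. The sketch below reuses the values from the examples above; the names samples and sample_types are only illustrative.

```r
# One sample value per atomic type (same values as in the examples above)
samples <- list(TRUE, 25L, 19.99, "Alice", 3 + 2i, charToRaw("A"))

# vapply() applies typeof() to each element and guarantees a character result
sample_types <- vapply(samples, typeof, character(1))
sample_types
#> [1] "logical"   "integer"   "double"    "character" "complex"   "raw"
```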

4.3 Error #1: non-numeric argument to binary operator

⭐ BEGINNER 🔢 TYPE

4.3.1 The Error

"10" + 5
#> Error in "10" + 5: non-numeric argument to binary operator

🔴 ERROR

Error in "10" + 5 : non-numeric argument to binary operator

4.3.2 What It Means

You tried to use a mathematical operator (+, -, *, /, ^, %%, %/%) with something that isn’t a number.

Binary operator = operator that works on two things (left + right)
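In R an operator such as + is itself a function of two arguments, which is why the error message talks about an "argument". A minimal sketch (the names infix_result and call_result are illustrative):

```r
# The infix form and the function-call form are the same operation
infix_result <- 2 + 3
call_result <- `+`(2, 3)
identical(infix_result, call_result)
#> [1] TRUE

# Both arguments must be numeric (or coercible to numeric, like logical)
is.numeric(2) && is.numeric(3)
#> [1] TRUE
```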

4.3.3 Common Causes

4.3.3.1 Cause 1: Character That Looks Like Number

# Read from CSV without proper type specification
age <- "25"  # Actually character!
age + 10     # Error
#> Error in age + 10: non-numeric argument to binary operator
# Check type
class(age)
#> [1] "character"
is.numeric(age)
#> [1] FALSE
is.character(age)
#> [1] TRUE

4.3.3.2 Cause 2: Factor Instead of Numeric

# Factors are stored as integer codes with labels
scores <- factor(c("90", "85", "95"))
scores + 10  # Not an error: a warning, and every result is NA
#> Warning in Ops.factor(scores, 10): '+' not meaningful for factors
#> [1] NA NA NA
class(scores)
#> [1] "factor"
typeof(scores)  # "integer" but can't do math on it!
#> [1] "integer"

4.3.3.3 Cause 3: Missing Data Coerced to Character

# One text placeholder such as "N/A" turns the whole vector into character
values <- c(10, 20, "N/A", 40)
values[1] + 5  # Error!
#> Error in values[1] + 5: non-numeric argument to binary operator

4.3.3.4 Cause 4: Logical in Math (This Actually Works!)

# Wait, this works?
TRUE + 5   # TRUE becomes 1
#> [1] 6
FALSE + 5  # FALSE becomes 0
#> [1] 5

# This is by design - logical coerces to numeric
sum(c(TRUE, FALSE, TRUE))  # Counts TRUEs
#> [1] 2

But a character string that merely looks like a logical value does not coerce:

"TRUE" + 5  # Character, not logical
#> Error in "TRUE" + 5: non-numeric argument to binary operator

4.3.4 Solutions

SOLUTION 1: Convert to Numeric

# Basic conversion
age <- "25"
age <- as.numeric(age)
age + 10
#> [1] 35

# Check before converting
if (is.character(age)) {
  age <- as.numeric(age)
}

SOLUTION 2: Handle Factors Correctly

# Wrong way:
scores <- factor(c("90", "85", "95"))
as.numeric(scores)  # Gives the internal integer codes, not the values!
#> [1] 2 1 3

# Right way:
as.numeric(as.character(scores))  # Convert to char first
#> [1] 90 85 95

# Slightly more efficient (the form recommended in ?factor):
as.numeric(levels(scores))[scores]
#> [1] 90 85 95

SOLUTION 3: Read Data with Correct Types

# Base R - specify column types
data <- read.csv("file.csv", 
                colClasses = c("numeric", "character", "numeric"))

# tidyverse - specify on read
library(readr)
data <- read_csv("file.csv",
                col_types = cols(
                  age = col_double(),
                  name = col_character(),
                  score = col_double()
                ))

SOLUTION 4: Safe Conversion with Error Handling

safe_as_numeric <- function(x) {
  result <- suppressWarnings(as.numeric(x))
  
  if (all(is.na(result)) && !all(is.na(x))) {
    warning("Conversion produced all NAs - check your data")
  }
  
  return(result)
}

# Test
safe_as_numeric("25")      # Works
#> [1] 25
safe_as_numeric("abc")     # Warning + NA
#> Warning in safe_as_numeric("abc"): Conversion produced all NAs - check your
#> data
#> [1] NA
safe_as_numeric(c("1", "2", "three"))  # Partial conversion
#> [1]  1  2 NA

⚠️ Common Pitfall: Silent Failures

# This looks like it worked...
x <- c("1", "2", "3", "four")
x <- as.numeric(x)
#> Warning: NAs introduced by coercion
x
#> [1]  1  2  3 NA

Problem: “four” became NA, and the only signal was a warning that is easy to miss in a long script!

Solution: Check for NAs after conversion:

if (any(is.na(x))) {
  warning("Some values couldn't be converted")
}
#> Warning: Some values couldn't be converted
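If a failed conversion should stop the script rather than leave an NA behind, the coercion warning can be promoted to an error. The following is a minimal sketch; strict_as_numeric is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: turn the coercion warning into an error
strict_as_numeric <- function(x) {
  tryCatch(
    as.numeric(x),
    warning = function(w) {
      stop("Conversion failed: ", conditionMessage(w), call. = FALSE)
    }
  )
}

# Clean input converts as usual
ok <- strict_as_numeric(c("1", "2", "3"))
ok
#> [1] 1 2 3

# Dirty input now raises an error, captured here so the script can continue
bad <- tryCatch(
  strict_as_numeric(c("1", "four")),
  error = function(e) conditionMessage(e)
)
startsWith(bad, "Conversion failed")
#> [1] TRUE
```

This trades convenience for safety: one bad value rejects the whole vector, which suits validation steps better than exploratory work.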

4.4 Error #2: non-numeric argument to mathematical function

⭐ BEGINNER 🔢 TYPE

4.4.1 The Error

sqrt("16")
#> Error in sqrt("16"): non-numeric argument to mathematical function

🔴 ERROR

Error in sqrt("16") : non-numeric argument to mathematical function

4.4.2 What It Means

Mathematical functions (sqrt, log, exp, sin, cos, etc.) need numbers, not characters or other types.
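In your own functions you can check the input type up front and fail with a message that names the actual problem. A minimal sketch, where checked_sqrt is a hypothetical wrapper used only for illustration:

```r
# Hypothetical wrapper: validate the type before calling sqrt()
checked_sqrt <- function(x) {
  if (!is.numeric(x)) {
    stop("checked_sqrt() needs numeric input, but got type '", typeof(x), "'",
         call. = FALSE)
  }
  sqrt(x)
}

checked_sqrt(16)
#> [1] 4

# Capture the error message instead of stopping the script
msg <- tryCatch(checked_sqrt("16"), error = function(e) conditionMessage(e))
msg
#> [1] "checked_sqrt() needs numeric input, but got type 'character'"
```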

4.4.3 Common Functions That Give This Error

# All of these error with character input:
sqrt("16")
#> Error in sqrt("16"): non-numeric argument to mathematical function
log("10")
#> Error in log("10"): non-numeric argument to mathematical function
exp("2")
#> Error in exp("2"): non-numeric argument to mathematical function
abs("-5")
#> Error in abs("-5"): non-numeric argument to mathematical function
round("3.14")
#> Error in round("3.14"): non-numeric argument to mathematical function
floor("4.7")
#> Error in floor("4.7"): non-numeric argument to mathematical function
ceiling("4.2")
#> Error in ceiling("4.2"): non-numeric argument to mathematical function

4.4.4 Solutions

SOLUTIONS

1. Convert before calling function:

sqrt(as.numeric("16"))
#> [1] 4
log(as.numeric("10"))
#> [1] 2.302585

2. Vectorized conversion and operation:

values <- c("16", "25", "36")
sqrt(as.numeric(values))
#> [1] 4 5 6

3. Use type-safe reading:

# When reading data
data <- read.csv("data.csv", stringsAsFactors = FALSE)
data$numeric_col <- as.numeric(data$numeric_col)

4.5 Error #3: (list) object cannot be coerced to type 'double'

⭐⭐ INTERMEDIATE 🔢 TYPE

4.5.1 The Error

my_list <- list(a = 1, b = 2, c = 3)
sum(my_list)
#> Error in sum(my_list): invalid 'type' (list) of argument

🔴 ERROR

Error in sum(my_list) : invalid 'type' (list) of argument

4.5.2 What It Means

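You’re trying to do mathematical operations on a list, which is a container that can hold anything. R can’t automatically convert a list to numbers. sum() reports this as invalid 'type' (list) of argument; the "cannot be coerced to type 'double'" wording in this section’s title typically comes from as.numeric() applied to a list whose elements are not all of length one, and its exact wording varies between R versions.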
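Before flattening a list for arithmetic, it can help to confirm that every element really is a single number. The sketch below is one way to do that; the names mixed, all_scalar_numeric and total are illustrative.

```r
# A list whose elements all happen to be single numbers
mixed <- list(a = 1, b = 2.5, c = 3L)

# Check each element: numeric and of length one
all_scalar_numeric <- all(vapply(
  mixed,
  function(el) is.numeric(el) && length(el) == 1,
  logical(1)
))

# Only flatten and sum when the check passes
total <- if (all_scalar_numeric) sum(unlist(mixed)) else NA_real_
total
#> [1] 6.5
```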

4.5.3 Common Causes

4.5.3.1 Cause 1: Using List Instead of Vector

# List (wrong for math)
numbers_list <- list(1, 2, 3, 4, 5)
mean(numbers_list)  # Not an error: a warning, and the result is NA
#> Warning in mean.default(numbers_list): argument is not numeric or logical:
#> returning NA
#> [1] NA
# Vector (right for math)
numbers_vec <- c(1, 2, 3, 4, 5)
mean(numbers_vec)  # Works!
#> [1] 3

4.5.3.2 Cause 2: Extracting From Data Frame Incorrectly

df <- data.frame(x = 1:5, y = 6:10)

# Single bracket returns a data frame (list-based)
sum(df[1])  # Happens to work because every column is numeric, but df[1] is still a data frame
#> [1] 15

# Double bracket returns vector
sum(df[[1]])  # Works!
#> [1] 15

# Dollar sign returns vector
sum(df$x)  # Works!
#> [1] 15

4.5.3.3 Cause 3: List Column in Data Frame

# Modern R can have list columns
df <- data.frame(id = 1:3)
df$values <- list(c(1,2), c(3,4), c(5,6))

# Can't do math on list column
sum(df$values)  # Error!
#> Error in sum(df$values): invalid 'type' (list) of argument

4.5.4 Solutions

SOLUTION 1: Convert List to Vector

my_list <- list(a = 1, b = 2, c = 3)

# Unlist to vector
unlist(my_list)
#> a b c 
#> 1 2 3
sum(unlist(my_list))
#> [1] 6

# Or use do.call
do.call(sum, my_list)
#> [1] 6

SOLUTION 2: Use Correct Extraction

df <- data.frame(x = 1:5, y = 6:10)

# Good ways:
sum(df$x)      # Dollar sign
#> [1] 15
sum(df[[1]])   # Double bracket
#> [1] 15
sum(df[, 1])   # Bracket with comma
#> [1] 15

# Fragile way:
# sum(df[1])   # Single bracket = data frame; works only while all columns are numeric

SOLUTION 3: Handle List Columns

df <- data.frame(id = 1:3)
df$values <- list(c(1,2), c(3,4), c(5,6))

# Apply operation to each list element
sapply(df$values, sum)
#> [1]  3  7 11
lapply(df$values, mean)
#> [[1]]
#> [1] 1.5
#> 
#> [[2]]
#> [1] 3.5
#> 
#> [[3]]
#> [1] 5.5

# Or unnest first (tidyverse)
library(tidyr)
df %>% unnest(values)
#> # A tibble: 6 × 2
#>      id values
#>   <int>  <dbl>
#> 1     1      1
#> 2     1      2
#> 3     2      3
#> 4     2      4
#> 5     3      5
#> 6     3      6

💡 Key Insight: List vs Vector

# Vector: All same type
vec <- c(1, 2, 3)
typeof(vec)
#> [1] "double"
class(vec)
#> [1] "numeric"

# List: Can mix types
lst <- list(1, "two", TRUE)
typeof(lst)
#> [1] "list"
class(lst)
#> [1] "list"

# Data frame: Special list of vectors
df <- data.frame(x = 1:3, y = 4:6)
typeof(df)  # "list"!
#> [1] "list"
class(df)   # "data.frame"
#> [1] "data.frame"

# Single bracket keeps structure
df[1]      # Data frame (list)
#>   x
#> 1 1
#> 2 2
#> 3 3
df[[1]]    # Vector
#> [1] 1 2 3

4.6 Error #4: invalid type (closure) for variable 'X'

⭐⭐ INTERMEDIATE 🔢 TYPE

4.6.1 The Error

# Accidentally using a function as data
data <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))
lm(y ~ mean, data = data)  # 'mean' is not a column, so R finds the function!
#> Error in model.frame.default(formula = y ~ mean, data = data, drop.unused.levels = TRUE): invalid type (closure) for variable 'mean'

🔴 ERROR

Error in model.frame.default(formula = y ~ mean, data = data, drop.unused.levels = TRUE) : 
  invalid type (closure) for variable 'mean'

4.6.2 What It Means

“Closure” = function. You passed a function where R expected data.
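You can see where the word comes from by asking R for the internal type of a few functions. A short sketch (the name function_kinds is illustrative):

```r
# Functions written in R are closures; some core functions are built into the interpreter
function_kinds <- c(
  mean  = typeof(mean),   # "closure"
  paste = typeof(paste),  # "closure"
  sum   = typeof(sum)     # "builtin"
)

# Whatever the internal type, is.function() recognises all of them
all_functions <- all(is.function(mean), is.function(paste), is.function(sum))
all_functions
#> [1] TRUE
```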

4.6.3 Common Causes

4.6.3.1 Cause 1: Forgot to Call Function

numbers <- 1:10
plot(mean, numbers)  # Passed the function itself, so plot() tries to draw it as a curve
#> Error in curve(expr = x, from = from, to = to, xlim = xlim, ylab = ylab, : 'expr' did not evaluate to an object of length 'n'
# Fix: call the function and plot values of matching length
plot(numbers, numbers - mean(numbers))

4.6.3.2 Cause 2: Variable Name Same as Function

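# Created a variable named 'c'
c <- 100
data <- c(1, 2, 3)  # Still works: R skips non-functions when looking up a call like c()

# The danger is the reverse case: if a variable you meant to create
# (such as 'mean' or 'data') does not exist, R finds the function of that name
# Check what something is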
is.function(mean)
#> [1] TRUE
is.function(100)
#> [1] FALSE

4.6.4 Solutions

SOLUTIONS

1. Call the function (add parentheses):

numbers <- 1:10

# Wrong:
# plot(mean, numbers)

# Right:
plot(numbers, numbers - mean(numbers))

2. Don’t name variables after functions:

# Bad:
# mean <- 42
# sum <- 100
# data <- my_data

# Good:
average_value <- 42
total_sum <- 100
# my_data <- ...   # your data, under a descriptive name

3. Remove conflicting variable:

# If you accidentally created:
sum <- 100

# Remove it:
rm(sum)

# Now the name sum refers only to the function again
sum(1:10)
#> [1] 55

4.7 Error #5: cannot coerce class "X" to a data.frame

⭐⭐ INTERMEDIATE 🔢 TYPE

4.7.1 The Error

# Trying to convert incompatible type
my_func <- function() { return(42) }
as.data.frame(my_func)
#> Error in as.data.frame.default(my_func): cannot coerce class '"function"' to a data.frame

🔴 ERROR

Error in as.data.frame.default(my_func) : 
  cannot coerce class '"function"' to a data.frame

4.7.2 Common Causes

4.7.2.1 Cause 1: Wrong Object Type

# Can't convert function
as.data.frame(mean)
#> Error in as.data.frame.default(mean): cannot coerce class '"function"' to a data.frame

# Can't convert environment
as.data.frame(.GlobalEnv)
#> Error in as.data.frame.default(.GlobalEnv): cannot coerce class '"environment"' to a data.frame

4.7.2.2 Cause 2: Wrong List Structure

# Uneven list lengths
bad_list <- list(a = 1:3, b = 1:5)
as.data.frame(bad_list)  # Error - different lengths!
#> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 3, 5
# Must be same length or length 1
good_list <- list(a = 1:3, b = 4:6)
as.data.frame(good_list)
#>   a b
#> 1 1 4
#> 2 2 5
#> 3 3 6

# Or use recycling
recycled_list <- list(a = 1:3, b = 1)  # b recycled
as.data.frame(recycled_list)
#>   a b
#> 1 1 1
#> 2 2 1
#> 3 3 1

4.7.2.3 Cause 3: Nested Lists

# Some structures convert without an error but not the way you expect
nested <- list(list(1, 2), list(3, 4))
as.data.frame(nested)  # No error, but everything is flattened into one row
#>   X1 X2 X3 X4
#> 1  1  2  3  4

4.7.3 Solutions

SOLUTION 1: Fix List Structure

# Uneven lengths - fix it
bad_list <- list(a = 1:3, b = 1:5)

# Option 1: Trim to shortest
min_len <- min(lengths(bad_list))
fixed_list <- lapply(bad_list, function(x) x[1:min_len])
as.data.frame(fixed_list)
#>   a b
#> 1 1 1
#> 2 2 2
#> 3 3 3

# Option 2: Pad with NA
max_len <- max(lengths(bad_list))
fixed_list <- lapply(bad_list, function(x) {
  c(x, rep(NA, max_len - length(x)))
})
as.data.frame(fixed_list)
#>    a b
#> 1  1 1
#> 2  2 2
#> 3  3 3
#> 4 NA 4
#> 5 NA 5

SOLUTION 2: Convert Correctly

# From matrix
mat <- matrix(1:6, nrow = 2)
as.data.frame(mat)
#>   V1 V2 V3
#> 1  1  3  5
#> 2  2  4  6

# From vector with names
vec <- c(a = 1, b = 2, c = 3)
as.data.frame(as.list(vec))
#>   a b c
#> 1 1 2 3

# From nested list - flatten first
nested <- list(list(1, 2), list(3, 4))
flat <- unlist(nested, recursive = FALSE)
# Or handle differently depending on structure

SOLUTION 3: Check Before Converting

safe_as_df <- function(x) {
  # Check if it's already a data frame
  if (is.data.frame(x)) return(x)
  
  # Check if it's a matrix
  if (is.matrix(x)) return(as.data.frame(x))
  
  # Check if it's a list with equal lengths
  if (is.list(x)) {
    lens <- lengths(x)
    if (length(unique(lens)) == 1 || all(lens == 1 | lens == max(lens))) {
      return(as.data.frame(x))
    } else {
      stop("List elements have incompatible lengths: ", 
           paste(lens, collapse = ", "))
    }
  }
  
  # Try generic conversion
  tryCatch(
    as.data.frame(x),
    error = function(e) {
      stop("Cannot convert ", paste(class(x), collapse = "/"), " to data.frame: ", e$message)
    }
  )
}

# Test
safe_as_df(list(a = 1:3, b = 4:6))  # Works
#>   a b
#> 1 1 4
#> 2 2 5
#> 3 3 6

4.8 Error #6: NAs introduced by coercion

⭐ BEGINNER 🔢 TYPE

4.8.1 The Warning (Usually)

as.numeric(c("1", "2", "three", "4"))
#> Warning: NAs introduced by coercion
#> [1]  1  2 NA  4

🟡 WARNING

Warning message:
NAs introduced by coercion

4.8.2 What It Means

R tried to convert something to numeric, but some values couldn’t be converted, so they became NA.
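One way to see in advance which values will turn into NA is to test each string against a pattern for plain numbers before converting. The sketch below is a simplification: looks_numeric is a hypothetical helper, and its pattern covers ordinary decimal and scientific notation only, not every form that as.numeric() accepts (for example "Inf" or hexadecimal).

```r
# Hypothetical helper: TRUE where a string looks like a plain number
looks_numeric <- function(x) {
  pattern <- "^[[:space:]]*[-+]?([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][-+]?[0-9]+)?[[:space:]]*$"
  grepl(pattern, x)
}

scores <- c("90", "85", "N/A", "92.5", "absent")
convertible <- looks_numeric(scores)
convertible
#> [1]  TRUE  TRUE FALSE  TRUE FALSE

# Convert only the values that passed the check, so no warning is raised
as.numeric(scores[convertible])
#> [1] 90.0 85.0 92.5
```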

4.8.3 Common Scenarios

4.8.3.1 Scenario 1: Text in Numeric Column

# Data entry errors
scores <- c("90", "85", "N/A", "92", "absent")
as.numeric(scores)
#> Warning: NAs introduced by coercion
#> [1] 90 85 NA 92 NA

# Check which became NA
is.na(as.numeric(scores))
#> Warning: NAs introduced by coercion
#> [1] FALSE FALSE  TRUE FALSE  TRUE

4.8.3.2 Scenario 2: Special Characters

# Currency symbols
prices <- c("$10.99", "$25.50", "$8.75")
as.numeric(prices)  # All become NA!
#> Warning: NAs introduced by coercion
#> [1] NA NA NA

# Need to remove $ first
as.numeric(gsub("\\$", "", prices))
#> [1] 10.99 25.50  8.75

4.8.3.3 Scenario 3: Scientific Notation Issues

# Usually these work fine
as.numeric("1.5e-10")  # Scientific notation OK
#> [1] 1.5e-10

# But typos don't
as.numeric("1.5E-10a")  # Typo creates NA
#> Warning: NAs introduced by coercion
#> [1] NA

4.8.3.4 Scenario 4: Factors with Text Levels

# Factor with non-numeric levels
responses <- factor(c("Yes", "No", "Yes", "Maybe"))
as.numeric(responses)  # Gives factor codes (3,2,3,1), not what you want
#> [1] 3 2 3 1

# And converting the labels themselves gives NA
as.numeric(as.character(responses))
#> Warning: NAs introduced by coercion
#> [1] NA NA NA NA

4.8.4 Solutions

SOLUTION 1: Clean Data First

# Remove non-numeric characters
dirty <- c("$10.99", "€25.50", "8.75")

# Remove currency symbols
clean <- gsub("[^0-9.]", "", dirty)
as.numeric(clean)
#> [1] 10.99 25.50  8.75

# More robust cleaning
clean_numeric <- function(x) {
  # Remove everything except numbers, decimal, minus
  cleaned <- gsub("[^0-9.-]", "", x)
  as.numeric(cleaned)
}

clean_numeric(c("$10.99", "-25.5%", "8 dollars"))
#> [1]  10.99 -25.50   8.00

SOLUTION 2: Handle NAs Explicitly

values <- c("1", "2", "three", "4")
converted <- as.numeric(values)
#> Warning: NAs introduced by coercion

# Check which failed
failed <- is.na(converted) & !is.na(values)
if (any(failed)) {
  message("Could not convert: ", paste(values[failed], collapse = ", "))
}
#> Could not convert: three

# Or replace NAs with default
converted[is.na(converted)] <- 0
converted
#> [1] 1 2 0 4

SOLUTION 3: Use readr’s parse_number()

library(readr)

# Automatically extracts numbers
parse_number("$10.99")
#> [1] 10.99
parse_number("Price: $25.50")
#> [1] 25.5
parse_number("8.75%")
#> [1] 8.75

# Vector
parse_number(c("$10.99", "€25.50", "8.75"))
#> [1] 10.99 25.50  8.75

🎯 Best Practice: Validate After Coercion

coerce_with_validation <- function(x, to = "numeric") {
  original <- x
  
  # Suppress R's generic warning here; a more specific one is raised below
  if (to == "numeric") {
    converted <- suppressWarnings(as.numeric(x))
  } else if (to == "integer") {
    converted <- suppressWarnings(as.integer(x))
  } else {
    stop("Unsupported conversion type")
  }
  
  # Count NAs
  original_nas <- sum(is.na(original))
  new_nas <- sum(is.na(converted))
  introduced_nas <- new_nas - original_nas
  
  if (introduced_nas > 0) {
    warning(introduced_nas, " NAs introduced by coercion")
    failed_values <- original[is.na(converted) & !is.na(original)]
    message("Failed to convert: ", 
            paste(head(failed_values, 5), collapse = ", "),
            if(length(failed_values) > 5) "..." else "")
  }
  
  return(converted)
}

# Test
coerce_with_validation(c("1", "2", "three", "4"))
#> Warning in coerce_with_validation(c("1", "2", "three", "4")): 1 NAs introduced
#> by coercion
#> Failed to convert: three
#> [1]  1  2 NA  4

4.9 Error #7: character string is not in a standard unambiguous format

⭐⭐ INTERMEDIATE 🔢 TYPE

4.9.1 The Error

as.Date("2024/13/01")  # Month 13 doesn't exist
#> Error in charToDate(x): character string is not in a standard unambiguous format

🔴 ERROR

Error in charToDate(x) : 
  character string is not in a standard unambiguous format

4.9.2 What It Means

You’re trying to convert a string to a Date, but R can’t figure out the format, or the date is invalid.

4.9.3 Common Causes

4.9.3.1 Cause 1: Wrong Date Format

# American format (month/day/year)
as.Date("12/25/2024")  # R expects YYYY-MM-DD
#> Error in charToDate(x): character string is not in a standard unambiguous format
# Specify format
as.Date("12/25/2024", format = "%m/%d/%Y")
#> [1] "2024-12-25"

4.9.3.2 Cause 2: Invalid Date

as.Date("2024-02-30")  # February doesn't have 30 days
#> Error in charToDate(x): character string is not in a standard unambiguous format
as.Date("2024-13-01")  # Month 13 doesn't exist
#> Error in charToDate(x): character string is not in a standard unambiguous format

4.9.3.3 Cause 3: Ambiguous Format

# Is this Jan 2 or Feb 1?
as.Date("01/02/2024")  # No error: R matches %Y/%m/%d and silently returns a wrong date
#> [1] "0001-02-20"
# Be explicit
as.Date("01/02/2024", format = "%m/%d/%Y")  # Jan 2
#> [1] "2024-01-02"
as.Date("01/02/2024", format = "%d/%m/%Y")  # Feb 1
#> [1] "2024-02-01"

4.9.4 Solutions

SOLUTION 1: Specify Format

# Common formats
as.Date("2024-12-25")  # ISO format (default)
#> [1] "2024-12-25"
as.Date("12/25/2024", format = "%m/%d/%Y")
#> [1] "2024-12-25"
as.Date("25/12/2024", format = "%d/%m/%Y")
#> [1] "2024-12-25"
as.Date("Dec 25, 2024", format = "%b %d, %Y")
#> [1] "2024-12-25"
as.Date("December 25, 2024", format = "%B %d, %Y")
#> [1] "2024-12-25"

Format codes:

  • %Y = 4-digit year (2024)
  • %y = 2-digit year (24)
  • %m = numeric month (12)
  • %d = day of month (25)
  • %b = abbreviated month (Dec)
  • %B = full month (December)
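The same codes work in both directions: as.Date() uses them to read a string, and format() uses them to write one. The sketch below sticks to numeric codes so it does not depend on your locale, and it assumes an R version in which as.Date() has the tryFormats argument (R 3.5.0 or later). The names d and d2 are illustrative.

```r
# Parse with an explicit format, then print the same date in other layouts
d <- as.Date("25/12/2024", format = "%d/%m/%Y")
format(d, "%Y-%m-%d")
#> [1] "2024-12-25"
format(d, "%m/%d/%y")
#> [1] "12/25/24"

# tryFormats lets as.Date() try several candidate formats in order
d2 <- as.Date("25.12.2024", tryFormats = c("%Y-%m-%d", "%d.%m.%Y"))
d == d2
#> [1] TRUE
```

Note that tryFormats picks one format for the whole vector, so it does not help with a vector that mixes formats.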

SOLUTION 2: Use lubridate (Easier!)

library(lubridate)

# The function name gives the component order; separators are detected automatically
ymd("2024-12-25")
#> [1] "2024-12-25"
mdy("12/25/2024")
#> [1] "2024-12-25"
dmy("25/12/2024")
#> [1] "2024-12-25"
mdy("Dec 25, 2024")
#> [1] "2024-12-25"

# Vector of dates
dates <- c("2024-12-25", "2024/01/15", "2024.06.30")
ymd(dates)
#> [1] "2024-12-25" "2024-01-15" "2024-06-30"

SOLUTION 3: Handle Parse Failures

dates <- c("2024-12-25", "invalid", "2024-02-30", "2024-01-15")

# Base R - NAs for failures
parsed <- as.Date(dates)  # No warning: failures silently become NA
parsed
#> [1] "2024-12-25" NA           NA           "2024-01-15"

# lubridate - shows which failed
library(lubridate)
parsed <- ymd(dates, quiet = FALSE)
#> Warning: 2 failed to parse.
parsed
#> [1] "2024-12-25" NA           NA           "2024-01-15"

# Custom handling
safe_parse_date <- function(x, format = "%Y-%m-%d") {
  result <- as.Date(x, format = format)
  
  # Report failures
  failed <- is.na(result) & !is.na(x)
  if (any(failed)) {
    message("Failed to parse ", sum(failed), " dates:")
    message(paste(x[failed], collapse = ", "))
  }
  
  return(result)
}

safe_parse_date(dates)
#> Failed to parse 2 dates:
#> invalid, 2024-02-30
#> [1] "2024-12-25" NA           NA           "2024-01-15"

4.10 Type Checking Functions

🎯 Best Practice: Check Types Before Operating

# Checking functions
is.numeric(5)       # TRUE for integer or double
#> [1] TRUE
is.integer(5L)      # TRUE only for integer
#> [1] TRUE
is.double(5.0)      # TRUE only for double
#> [1] TRUE
is.character("5")   # TRUE for character
#> [1] TRUE
is.logical(TRUE)    # TRUE for logical
#> [1] TRUE
is.factor(factor(1:3))  # TRUE for factor
#> [1] TRUE

# Getting type info
typeof(5)           # "double"
#> [1] "double"
class(5)            # "numeric"
#> [1] "numeric"
mode(5)             # "numeric"
#> [1] "numeric"

# More specific checks
is.na(NA)           # TRUE for NA
#> [1] TRUE
is.null(NULL)       # TRUE for NULL
#> [1] TRUE
is.nan(NaN)         # TRUE for NaN (not a number)
#> [1] TRUE
is.infinite(Inf)    # TRUE for Inf
#> [1] TRUE
is.finite(5)        # TRUE for normal numbers
#> [1] TRUE

# Structure checks
is.vector(c(1,2,3))      # TRUE
#> [1] TRUE
is.list(list(1,2))       # TRUE
#> [1] TRUE
is.matrix(matrix(1:4, 2, 2))  # TRUE
#> [1] TRUE
is.data.frame(data.frame(x=1:3))  # TRUE
#> [1] TRUE
is.array(array(1:8, dim=c(2,2,2)))  # TRUE
#> [1] TRUE

4.11 Type Conversion Functions

💡 Key Insight: Conversion Functions

# To numeric
as.numeric("5")
#> [1] 5
as.integer("5")
#> [1] 5
as.double("5.5")
#> [1] 5.5

# To character
as.character(5)
#> [1] "5"
as.character(TRUE)
#> [1] "TRUE"

# To logical
as.logical(1)        # TRUE
#> [1] TRUE
as.logical(0)        # FALSE
#> [1] FALSE
as.logical("TRUE")   # TRUE
#> [1] TRUE
as.logical("T")      # TRUE
#> [1] TRUE

# To factor
as.factor(c("A", "B", "A"))
#> [1] A B A
#> Levels: A B

# Special conversions
as.Date("2024-01-15")
#> [1] "2024-01-15"
as.POSIXct("2024-01-15 10:30:00")  # Time zone shown depends on your system
#> [1] "2024-01-15 10:30:00 CST"

Coercion Hierarchy: logical → integer → double → character

Everything can become character!

c(TRUE, 1L, 1.5, "text")  # All become character
#> [1] "TRUE" "1"    "1.5"  "text"
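The hierarchy can be watched step by step by adding one "higher" type at a time and checking the type of the combined vector. A minimal sketch (the name steps is illustrative):

```r
# Each added element pulls the whole vector one step up the hierarchy
steps <- c(
  typeof(c(TRUE, FALSE)),
  typeof(c(TRUE, 1L)),
  typeof(c(TRUE, 1L, 1.5)),
  typeof(c(TRUE, 1L, 1.5, "text"))
)
steps
#> [1] "logical"   "integer"   "double"    "character"
```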

4.12 Summary

Key Takeaways:

  1. R has 6 atomic types: logical, integer, double, character, complex, raw
  2. Check types before operations: Use typeof(), class(), is.*() functions
  3. Explicit is better than implicit: Use as.numeric() rather than hoping
  4. Watch for silent failures: Check for NAs after coercion
  5. Factors are tricky: Convert to character before numeric
  6. Lists aren’t vectors: Use unlist() or [[]] extraction
  7. Specify date formats: Don’t rely on auto-detection
  8. Use lubridate for dates: Much easier than base R

Quick Reference:

Error                                   | Cause                    | Fix
non-numeric argument to binary operator | Character in math        | as.numeric()
non-numeric argument to math function   | Character in function    | as.numeric()
(list) cannot be coerced                | Wrong structure          | unlist() or [[]]
invalid type (closure)                  | Function instead of data | Call function or rename variable
cannot coerce to data.frame             | Incompatible type        | Fix structure or use correct conversion
NAs introduced by coercion              | Invalid values           | Clean data first
character string not in standard format | Date parse failure       | Specify format or use lubridate

Type Checking Checklist:

# Before doing math:
is.numeric(x)

# Before subsetting:
is.vector(x) || is.list(x)

# Before data frame operations:
is.data.frame(df)

# After conversion:
any(is.na(result))
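The checks in this list can be combined into a single guarded function. The following is a sketch under stated assumptions: column_mean and the survey data are hypothetical and exist only to show the checks working together.

```r
# Hypothetical helper combining the checklist: structure, type, then NAs
column_mean <- function(df, column) {
  stopifnot(
    is.data.frame(df),
    is.character(column), length(column) == 1,
    column %in% names(df)
  )
  values <- df[[column]]  # double bracket returns the column as a vector
  if (!is.numeric(values)) {
    stop("Column '", column, "' is ", class(values)[1], ", not numeric",
         call. = FALSE)
  }
  if (any(is.na(values))) {
    message("Ignoring ", sum(is.na(values)), " NA value(s) in '", column, "'")
  }
  mean(values, na.rm = TRUE)
}

survey <- data.frame(age = c(25, 31, NA, 40), name = c("A", "B", "C", "D"))
column_mean(survey, "age")
#> Ignoring 1 NA value(s) in 'age'
#> [1] 32
```

Calling column_mean(survey, "name") stops with a message naming the column and its class, instead of the less specific warning that mean() would give.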

4.13 Exercises

📝 Exercise 1: Type Detective

What’s wrong and how do you fix it?

# Scenario 1
age <- "25"
next_year <- age + 1

# Scenario 2
scores <- factor(c("90", "85", "95"))
average <- mean(as.numeric(scores))

# Scenario 3
df <- data.frame(x = 1:5)
total <- sum(df[1])

# Scenario 4
dates <- c("2024-01-15", "15/01/2024", "Jan 15 2024")
parsed <- as.Date(dates)

📝 Exercise 2: Type Conversion

Write a function that:

  1. Takes a vector of any type
  2. Tries to convert to numeric
  3. Reports which values failed
  4. Returns numeric vector with NAs for failures
  5. Provides a summary of conversions

📝 Exercise 3: Real Data

You receive this data:

sales <- c("$1,234.56", "$987.65", "N/A", "$2,345.67", "pending")
dates <- c("01/15/2024", "2024-02-20", "Mar 15, 2024")

Clean and convert both to appropriate types.

📝 Exercise 4: Data Frame Types

Debug this code:

df <- data.frame(
  id = 1:3,
  value = c("100", "200", "300"),
  date = c("2024-01-15", "2024-02-20", "2024-03-25")
)

# Want to do:
df$value_doubled <- df$value * 2
df$days_since <- Sys.Date() - df$date

Fix the types so operations work.

4.14 Exercise Answers


Exercise 1:

# Scenario 1 - Character in math
age <- "25"
age <- as.numeric(age)  # Fix
next_year <- age + 1

# Scenario 2 - Factor to numeric wrong way
scores <- factor(c("90", "85", "95"))
# Wrong: as.numeric(scores) gives the integer codes 2, 1, 3
# Right:
scores_num <- as.numeric(as.character(scores))
average <- mean(scores_num)

# Scenario 3 - Single bracket returns data frame
df <- data.frame(x = 1:5)
# Fragile: df[1] is still a data frame (sum() happens to work, mean() would return NA)
# Right:
total <- sum(df[[1]])  # or sum(df$x)

# Scenario 4 - Mixed date formats
dates <- c("2024-01-15", "15/01/2024", "Jan 15 2024")
# Need different formats for each
library(lubridate)
parsed <- c(ymd("2024-01-15"), 
            dmy("15/01/2024"),
            mdy("Jan 15 2024"))

Exercise 2:

smart_numeric_convert <- function(x) {
  # Store original
  original <- x
  original_class <- class(x)
  
  # Attempt conversion
  converted <- suppressWarnings(as.numeric(x))
  
  # Identify failures
  original_na <- is.na(original)
  new_na <- is.na(converted)
  failures <- new_na & !original_na
  
  # Report
  cat("Conversion Summary:\n")
  cat("  Original type:", original_class, "\n")
  cat("  Total values:", length(x), "\n")
  cat("  Successful:", sum(!new_na), "\n")
  cat("  Failed:", sum(failures), "\n")
  cat("  Already NA:", sum(original_na), "\n\n")
  
  if (any(failures)) {
    cat("Failed values:\n")
    print(head(original[failures], 10))
  }
  
  return(converted)
}

# Test
smart_numeric_convert(c("1", "2", "three", "4", "five"))
#> Conversion Summary:
#>   Original type: character 
#>   Total values: 5 
#>   Successful: 3 
#>   Failed: 2 
#>   Already NA: 0 
#> 
#> Failed values:
#> [1] "three" "five"
#> [1]  1  2 NA  4 NA

Exercise 3:

library(readr)
library(lubridate)

# Clean sales
sales <- c("$1,234.56", "$987.65", "N/A", "$2,345.67", "pending")

# Remove currency and commas, handle text
sales_clean <- gsub("[$,]", "", sales)
sales_num <- suppressWarnings(as.numeric(sales_clean))
sales_num[is.na(sales_num)]  <- 0  # Or handle differently

# Clean dates
dates <- c("01/15/2024", "2024-02-20", "Mar 15, 2024")

# Try multiple formats
dates_parsed <- as.Date(parse_date_time(dates, 
                                        orders = c("mdy", "ymd", "bdy")))

# Result: the two vectors have different lengths (5 and 3),
# so they cannot share one data frame; inspect them separately
sales_num
#> [1] 1234.56  987.65    0.00 2345.67    0.00
dates_parsed

Exercise 4:

df <- data.frame(
  id = 1:3,
  value = c("100", "200", "300"),
  date = c("2024-01-15", "2024-02-20", "2024-03-25"),
  stringsAsFactors = FALSE
)

# Fix types
df$value <- as.numeric(df$value)
df$date <- as.Date(df$date)

# Now operations work
df$value_doubled <- df$value * 2
df$days_since <- as.numeric(Sys.Date() - df$date)

df  # days_since depends on the date the code is run
#>   id value       date value_doubled days_since
#> 1  1   100 2024-01-15           200        650
#> 2  2   200 2024-02-20           400        614
#> 3  3   300 2024-03-25           600        580