Chapter 12 Factor Creation & Levels
What You’ll Learn:
- What factors are and why they exist
- Creating factors correctly
- Understanding levels and labels
- Ordered vs unordered factors
- Common factor creation pitfalls
Key Errors Covered: 15+ factor errors
Difficulty: ⭐⭐ Intermediate
12.1 Introduction
Factors are R’s way of representing categorical data, but they can behave in surprising ways:
# This looks like it should work...
grades <- factor(c("A", "B", "C"))
grades[1] <- "D"
#> Warning in `[<-.factor`(`*tmp*`, 1, value = "D"): invalid factor level, NA
#> generated
🟡 WARNING
Warning message:
In `[<-.factor`(`*tmp*`, 1, value = "D") :
invalid factor level, NA generated
Let’s understand factors to avoid these surprises.
12.2 What Are Factors?
💡 Key Insight: Factors Are Integers in Disguise
# Create a factor
colors <- factor(c("red", "blue", "red", "green"))
colors
#> [1] red blue red green
#> Levels: blue green red
# But underneath, it's integers!
typeof(colors) # "integer"
#> [1] "integer"
as.integer(colors) # 3 1 3 2
#> [1] 3 1 3 2
# The labels are stored separately
levels(colors)
#> [1] "blue" "green" "red"
# Structure revealed
str(colors)
#> Factor w/ 3 levels "blue","green",..: 3 1 3 2
Key points:
- Factors store data as integers (1, 2, 3, …)
- Each integer maps to a level (label)
- Levels are stored once; the data stores references to them
- More memory-efficient for repeated values
- Used extensively in statistical modeling
Why factors exist:
1. Memory efficiency (repeated strings)
2. Statistical modeling (R knows it’s categorical)
3. Ordering (can be ordered or unordered)
4. Validation (only valid levels allowed)
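To make the "integers plus a lookup table" idea concrete, the following minimal sketch rebuilds the original strings by hand from the integer codes and the levels. The variable names `codes`, `lookup`, and `rebuilt` are illustrative only.

```r
colors <- factor(c("red", "blue", "red", "green"))

# The two pieces a factor is made of
codes <- as.integer(colors)  # integer codes: 3 1 3 2
lookup <- levels(colors)     # lookup table: "blue" "green" "red"

# Indexing the lookup table by the codes recovers the original strings
rebuilt <- lookup[codes]
rebuilt
#> [1] "red"   "blue"  "red"   "green"

# This is the same result as.character() gives
identical(rebuilt, as.character(colors))
#> [1] TRUE
```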
12.3 Factor vs Character
💡 Factor vs Character Comparison
# Character vector
char_vec <- c("red", "blue", "red", "green")
typeof(char_vec)
#> [1] "character"
class(char_vec)
#> [1] "character"
# Factor
fac_vec <- factor(char_vec)
typeof(fac_vec)
#> [1] "integer"
class(fac_vec)
#> [1] "factor"
# Memory difference (with many repetitions)
x_char <- rep(c("Category A", "Category B"), 10000)
x_fac <- factor(x_char)
object.size(x_char)
#> 160176 bytes
object.size(x_fac) # About half the size
#> 80576 bytes
# Statistical modeling difference
df <- data.frame(
  group = factor(c("A", "B", "A", "B")),
  value = c(10, 20, 15, 25)
)
# R knows 'group' is categorical
lm(value ~ group, data = df)
#>
#> Call:
#> lm(formula = value ~ group, data = df)
#>
#> Coefficients:
#> (Intercept) groupB
#> 12.5 10.0
When to use each:
- Character: text data, unique values, or values you will manipulate as strings
- Factor: categories, repeated values, modeling and plotting
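One practical consequence of the difference is easy to see with `table()`. This is a small sketch with made-up survey answers: a factor remembers categories that never occur, while a character vector only knows the values it contains.

```r
answers_chr <- c("yes", "no", "yes")
answers_fac <- factor(answers_chr, levels = c("yes", "no", "maybe"))

# Character vector: only the values that actually occur are counted
counts_chr <- table(answers_chr)
names(counts_chr)

# Factor: every level is counted, so "maybe" appears with a count of 0
counts_fac <- table(answers_fac)
counts_fac[["maybe"]]
#> [1] 0
```

This is why factors are the better fit for plots and summaries that should show empty categories.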
12.4 Error #1: invalid factor level, NA generated
⭐ BEGINNER 🔢 TYPE
12.4.1 The Error
sizes <- factor(c("small", "medium", "large"))
sizes[1] <- "extra-large" # Not in levels!
#> Warning in `[<-.factor`(`*tmp*`, 1, value = "extra-large"): invalid factor
#> level, NA generated
🟡 WARNING
Warning message:
In `[<-.factor`(`*tmp*`, 1, value = "extra-large") :
invalid factor level, NA generated
12.4.2 What It Means
You’re trying to assign a value that’s not among the factor’s levels. R stores NA in that position instead and issues a warning rather than an error.
12.4.3 Why This Happens
sizes <- factor(c("small", "medium", "large"))
# Only these levels exist
levels(sizes)
#> [1] "large" "medium" "small"
# Can only assign existing levels
sizes[1] <- "medium" # OK
sizes
#> [1] medium medium large
#> Levels: large medium small
# New levels not allowed
sizes[2] <- "tiny" # Warning, becomes NA
#> Warning in `[<-.factor`(`*tmp*`, 2, value = "tiny"): invalid factor level, NA
#> generated
sizes
#> [1] medium <NA> large
#> Levels: large medium small
12.4.4 Common Causes
12.4.5 Solutions
✅ SOLUTION 1: Add New Level First
sizes <- factor(c("small", "medium", "large"))
# Add new level
levels(sizes) <- c(levels(sizes), "extra-large")
levels(sizes)
#> [1] "large" "medium" "small" "extra-large"
# Now assignment works
sizes[1] <- "extra-large"
sizes
#> [1] extra-large medium large
#> Levels: large medium small extra-large
✅ SOLUTION 2: Convert to Character, Modify, Convert Back
sizes <- factor(c("small", "medium", "large"))
# Convert to character
sizes_char <- as.character(sizes)
# Modify freely
sizes_char[1] <- "extra-large"
sizes_char[4] <- "tiny"
# Convert back to factor
sizes_new <- factor(sizes_char)
sizes_new
#> [1] extra-large medium large tiny
#> Levels: extra-large large medium tiny
levels(sizes_new)
#> [1] "extra-large" "large" "medium" "tiny"
✅ SOLUTION 3: Specify All Levels Upfront
# Specify all possible levels when creating
sizes <- factor(
  c("small", "medium", "large"),
  levels = c("tiny", "small", "medium", "large", "extra-large")
)
levels(sizes)
#> [1] "tiny" "small" "medium" "large" "extra-large"
# Now any level can be assigned
sizes[1] <- "extra-large"
sizes[4] <- "tiny"
sizes
#> [1] extra-large medium large tiny
#> Levels: tiny small medium large extra-large
✅ SOLUTION 4: Use forcats Package (Tidyverse)
library(forcats)
sizes <- factor(c("small", "medium", "large"))
# Add level dynamically
sizes <- fct_expand(sizes, "extra-large", "tiny")
levels(sizes)
#> [1] "large" "medium" "small" "extra-large" "tiny"
sizes[1] <- "extra-large"
sizes
#> [1] extra-large medium large
#> Levels: large medium small extra-large tiny
⚠️ Common Pitfall: Silent NA Creation
# Create factor
status <- factor(c("active", "inactive", "active"))
# Update many values
new_values <- c("active", "paused", "inactive")
status <- new_values # Replaces the factor with a character vector!
class(status) # Not a factor anymore!
#> [1] "character"
# Or if forcing to stay factor:
status <- factor(c("active", "inactive", "active"))
status[] <- new_values # "paused" becomes NA with only a warning!
#> Warning in `[<-.factor`(`*tmp*`, , value = c("active", "paused", "inactive":
#> invalid factor level, NA generated
status
#> [1] active <NA> inactive
#> Levels: active inactive
Always check for NAs after factor assignment.
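One way to run that check is sketched below, reusing `status` and `new_values` from the pitfall. The names `invalid`, `updated`, and `na_added` are illustrative. The sketch first looks for values that are not valid levels, then counts how many NAs the assignment introduced.

```r
status <- factor(c("active", "inactive", "active"))
new_values <- c("active", "paused", "inactive")

# Before assigning: which incoming values are not valid levels?
invalid <- setdiff(new_values, levels(status))
invalid
#> [1] "paused"

# After assigning: how many new NAs appeared?
updated <- status
suppressWarnings(updated[] <- new_values)  # warning silenced only for this demo
na_added <- sum(is.na(updated)) - sum(is.na(status))
na_added
#> [1] 1
```

Checking with `setdiff()` before the assignment is usually the better habit, since it names the offending values instead of only counting the damage afterwards.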
12.5 Error #2: number of levels differs
⭐⭐ INTERMEDIATE 🔢 TYPE
12.5.1 The Error
f1 <- factor(c("a", "b", "c"))
f2 <- factor(c("a", "b"))
c(f1, f2) # Try to combine
#> [1] a b c a b
#> Levels: a b c
🟡 WARNING
Warning message:
In c.factor(f1, f2) : number of levels differs
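What `c()` does with factors depends on the R version: since R 4.1.0 it returns a factor whose levels are the union of the inputs' levels, while older versions returned the bare integer codes. The following is a minimal sketch of a combination that should behave the same either way, because it works on the labels rather than the codes. The names `all_levels` and `combined` are illustrative.

```r
f1 <- factor(c("a", "b", "c"))
f2 <- factor(c("a", "b"))

# Union of both level sets, in order of first appearance
all_levels <- union(levels(f1), levels(f2))

# Combine the labels (not the integer codes), then rebuild the factor
combined <- factor(
  c(as.character(f1), as.character(f2)),
  levels = all_levels
)
combined
#> [1] a b c a b
#> Levels: a b c
```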
12.5.3 The Problem
f1 <- factor(c("red", "blue"))
f2 <- factor(c("green", "yellow"))
levels(f1)
#> [1] "blue" "red"
levels(f2)
#> [1] "green" "yellow"
# Combine: the result depends on the R version
combined <- c(f1, f2)
combined # A factor in R >= 4.1.0; bare integer codes in older versions
#> [1] red blue green yellow
#> Levels: blue red green yellow
class(combined)
#> [1] "factor"
12.5.4 Solutions
✅ SOLUTION 1: Convert to Character First
f1 <- factor(c("red", "blue"))
f2 <- factor(c("green", "yellow"))
# Convert both to character
combined <- c(as.character(f1), as.character(f2))
combined
#> [1] "red" "blue" "green" "yellow"
# Convert back to factor
combined <- factor(combined)
combined
#> [1] red blue green yellow
#> Levels: blue green red yellow
levels(combined)
#> [1] "blue" "green" "red" "yellow"
✅ SOLUTION 2: Use Same Levels for Both
# Define all levels upfront
all_levels <- c("red", "blue", "green", "yellow")
f1 <- factor(c("red", "blue"), levels = all_levels)
f2 <- factor(c("green", "yellow"), levels = all_levels)
# Now same levels
identical(levels(f1), levels(f2))
#> [1] TRUE
# Combine works better
combined <- c(f1, f2)
combined <- factor(combined, levels = all_levels)
combined
#> [1] red blue green yellow
#> Levels: red blue green yellow
12.6 Error #3: contrasts can be applied only to factors with 2 or more levels
⭐⭐ INTERMEDIATE 🧮 MATH
12.6.1 The Error
# Factor with only one level
single_level <- factor(c("A", "A", "A", "A"))
levels(single_level)
#> [1] "A"
# Try to use in model
df <- data.frame(
  group = single_level,
  value = c(10, 20, 15, 25)
)
lm(value ~ group, data = df)
#> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
🔴 ERROR
Error in `contrasts<-`(`*tmp*`, value = contr.treatment(2)) :
contrasts can be applied only to factors with 2 or more levels
12.6.2 What It Means
Statistical models need at least 2 levels to compare. A single-level factor can’t be used as a predictor.
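When a data frame has several factor columns, it helps to find the offending one before calling `lm()`. The sketch below uses a hypothetical helper, `has_too_few_levels()`, which is not part of any package. It flags factor columns with fewer than two levels actually in use.

```r
df <- data.frame(
  group = factor(c("A", "A", "A", "A")),
  site = factor(c("x", "y", "x", "y")),
  value = c(10, 20, 15, 25)
)

# Hypothetical helper: TRUE for factor columns with fewer than 2 levels in use
has_too_few_levels <- function(col) {
  is.factor(col) && nlevels(droplevels(col)) < 2
}

# Apply the check to every column and report the problem columns by name
too_few <- vapply(df, has_too_few_levels, logical(1))
names(df)[too_few]
#> [1] "group"
```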
12.6.3 Common Causes
12.6.3.1 Cause 1: Accidental Filtering
df <- data.frame(
  treatment = factor(c("A", "B", "A", "B", "C")),
  outcome = rnorm(5)
)
# Filter to subset
df_filtered <- df[df$treatment == "A", ]
df_filtered$treatment # Still a factor, but only one level used
#> [1] A A
#> Levels: A B C
# Try to model
lm(outcome ~ treatment, data = df_filtered)
#> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
12.6.3.2 Cause 2: Data Preparation Gone Wrong
# Read data
responses <- factor(c("yes", "no", "maybe", "yes"))
# Remove certain responses
clean_responses <- responses[responses != "no" & responses != "maybe"]
clean_responses # Only "yes" left
#> [1] yes yes
#> Levels: maybe no yes
df <- data.frame(
  response = clean_responses,
  score = c(80, 90)
)
lm(score ~ response, data = df)
#> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
12.6.4 Solutions
✅ SOLUTION 1: Drop Unused Levels
df <- data.frame(
  treatment = factor(c("A", "B", "A", "B", "C")),
  outcome = rnorm(5)
)
# Filter
df_filtered <- df[df$treatment == "A", ]
# Drop unused levels
df_filtered$treatment <- droplevels(df_filtered$treatment)
levels(df_filtered$treatment) # Only "A" now
#> [1] "A"
# lm() would still fail here (only 1 level),
# but at least the levels now match the data
✅ SOLUTION 2: Check Before Modeling
check_factor_for_modeling <- function(f) {
  # Check if factor
  if (!is.factor(f)) {
    stop("Input is not a factor")
  }
  # Count levels with data
  level_counts <- table(f)
  levels_with_data <- sum(level_counts > 0)
  if (levels_with_data < 2) {
    stop("Factor has only ", levels_with_data,
         " level(s) with data. Need at least 2 for modeling.")
  }
  # Check for unused levels
  if (nlevels(f) > levels_with_data) {
    message("Factor has ", nlevels(f) - levels_with_data,
            " unused level(s). Consider droplevels().")
  }
  return(TRUE)
}
# Test
single <- factor(c("A", "A"))
try(check_factor_for_modeling(single)) # Stops: only 1 level with data
✅ SOLUTION 3: Convert to Character If Needed
df <- data.frame(
  treatment = factor(c("A", "B", "A", "B", "C")),
  outcome = rnorm(5)
)
df_filtered <- df[df$treatment == "A", ]
# If you don't need it as a factor, convert
df_filtered$treatment <- as.character(df_filtered$treatment)
# Or remove from model
lm(outcome ~ 1, data = df_filtered) # Intercept-only model
#>
#> Call:
#> lm(formula = outcome ~ 1, data = df_filtered)
#>
#> Coefficients:
#> (Intercept)
#> 0.3262
12.7 Creating Factors Correctly
🎯 Best Practice: Factor Creation
# Method 1: Basic factor
sizes <- factor(c("S", "M", "L", "M", "S"))
sizes
#> [1] S M L M S
#> Levels: L M S
# Method 2: Specify levels explicitly
sizes <- factor(
  c("S", "M", "L"),
  levels = c("XS", "S", "M", "L", "XL")
)
sizes
#> [1] S M L
#> Levels: XS S M L XL
levels(sizes) # All levels present
#> [1] "XS" "S" "M" "L" "XL"
# Method 3: With labels (different from levels)
sizes <- factor(
  c(1, 2, 3, 2, 1),
  levels = 1:5,
  labels = c("XS", "S", "M", "L", "XL")
)
sizes
#> [1] XS S M S XS
#> Levels: XS S M L XL
# Method 4: Ordered factor
sizes <- factor(
  c("S", "M", "L", "M", "S"),
  levels = c("XS", "S", "M", "L", "XL"),
  ordered = TRUE
)
sizes
#> [1] S M L M S
#> Levels: XS < S < M < L < XL
class(sizes) # "ordered" "factor"
#> [1] "ordered" "factor"
# Can now compare
sizes[1] < sizes[3] # TRUE (S < L)
#> [1] TRUE
# Method 5: From numeric
ages_binned <- cut(
  c(15, 25, 35, 45, 55),
  breaks = c(0, 18, 30, 50, 100),
  labels = c("Youth", "Young Adult", "Middle Age", "Senior")
)
ages_binned
#> [1] Youth Young Adult Middle Age Middle Age Senior
#> Levels: Youth Young Adult Middle Age Senior
12.8 Levels vs Labels
💡 Key Insight: Levels vs Labels
# Levels: What you have in the data
# Labels: What you want to display
# Example: Survey responses coded as numbers
responses <- c(1, 2, 3, 2, 1, 3)
# Wrong: Just convert to factor
bad <- factor(responses)
bad # Shows 1, 2, 3
#> [1] 1 2 3 2 1 3
#> Levels: 1 2 3
# Right: Provide labels
good <- factor(
  responses,
  levels = 1:3,
  labels = c("Disagree", "Neutral", "Agree")
)
good # Shows actual meanings
#> [1] Disagree Neutral Agree Neutral Disagree Agree
#> Levels: Disagree Neutral Agree
# The underlying data is still integers
as.integer(good)
#> [1] 1 2 3 2 1 3
# But displays with labels
print(good)
#> [1] Disagree Neutral Agree Neutral Disagree Agree
#> Levels: Disagree Neutral Agree
levels(good)
#> [1] "Disagree" "Neutral" "Agree"
Key difference:
- levels: the values in your data (what it IS)
- labels: the display names (what you WANT TO SHOW)
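The same mapping works when the raw data holds short character codes instead of numbers. This is a small sketch with made-up dose codes. Labels are matched to levels by position, and once the factor is created the labels become its levels, so the original codes no longer appear anywhere.

```r
dose_codes <- c("H", "L", "L", "H")

dose <- factor(
  dose_codes,
  levels = c("L", "H"),       # values as they appear in the data
  labels = c("Low", "High")   # display names, matched to levels by position
)
dose
#> [1] High Low Low High
#> Levels: Low High

# The labels have replaced the raw codes as the factor's levels
levels(dose)
#> [1] "Low"  "High"

# The integer codes follow the order given in `levels`
as.integer(dose)
#> [1] 2 1 1 2
```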
12.9 Ordered Factors
💡 Ordered vs Unordered Factors
# Unordered (nominal)
colors <- factor(c("red", "blue", "green"))
colors
#> [1] red blue green
#> Levels: blue green red
class(colors)
#> [1] "factor"
# Can't compare
colors[1] < colors[2] # Not meaningful
#> Warning in Ops.factor(colors[1], colors[2]): '<' not meaningful for factors
#> [1] NA
# Ordered (ordinal)
sizes <- ordered(c("S", "M", "L", "M", "S"),
levels = c("S", "M", "L"))
sizes
#> [1] S M L M S
#> Levels: S < M < L
class(sizes)
#> [1] "ordered" "factor"
# Can compare
sizes[1] < sizes[3] # TRUE
#> [1] TRUE
# Or use factor with ordered = TRUE
grades <- factor(
  c("B", "A", "C", "A"),
  levels = c("F", "D", "C", "B", "A"),
  ordered = TRUE
)
grades
#> [1] B A C A
#> Levels: F < D < C < B < A
grades[1] < grades[2] # TRUE (B < A)
#> [1] TRUE
When to use ordered:
- Size (S < M < L)
- Grade (F < D < C < B < A)
- Likert scales (Strongly Disagree < … < Strongly Agree)
- Any data with a natural ordering
When NOT to use ordered:
- Colors (no natural order)
- Categories with no natural order
- Nominal data in general
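Beyond pairwise comparison, an ordered factor supports a few other operations that respect the level order instead of the alphabet. The sketch below reuses the S < M < L sizes from above.

```r
sizes <- factor(
  c("S", "M", "L", "M", "S"),
  levels = c("S", "M", "L"),
  ordered = TRUE
)

# Compare every element against a level name
at_least_medium <- sizes >= "M"
at_least_medium
#> [1] FALSE  TRUE  TRUE  TRUE FALSE

# max() and min() follow the level order
max(sizes)
#> [1] L
#> Levels: S < M < L

# So does sorting
as.character(sort(sizes))
#> [1] "S" "S" "M" "M" "L"
```

On an unordered factor, `>=` and `max()` are not meaningful, which is a quick way to tell which kind you are holding.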
12.10 Checking and Modifying Levels
🎯 Best Practice: Working with Levels
sizes <- factor(c("S", "M", "L", "M", "S"))
# Check levels
levels(sizes)
#> [1] "L" "M" "S"
nlevels(sizes)
#> [1] 3
# Check for specific level
"XL" %in% levels(sizes)
#> [1] FALSE
# Add levels
levels(sizes) <- c(levels(sizes), "XS", "XL")
levels(sizes)
#> [1] "L" "M" "S" "XS" "XL"
# Rename levels (new names must follow the current level order: L, M, S)
sizes <- factor(c("S", "M", "L"))
levels(sizes) <- c("Large", "Medium", "Small")
sizes
#> [1] Small Medium Large
#> Levels: Large Medium Small
# Reorder levels
sizes <- factor(c("L", "S", "M"))
sizes <- factor(sizes, levels = c("S", "M", "L"))
sizes
#> [1] L S M
#> Levels: S M L
# Drop unused levels
sizes <- factor(c("S", "M", "L"), levels = c("XS", "S", "M", "L", "XL"))
levels(sizes) # All 5 levels
#> [1] "XS" "S" "M" "L" "XL"
sizes <- sizes[sizes != "L"] # Remove L observations
levels(sizes) # Still shows L!
#> [1] "XS" "S" "M" "L" "XL"
sizes <- droplevels(sizes)
levels(sizes) # Now only S and M
#> [1] "S" "M"
# Collapse levels (fct_collapse() is from the forcats package)
sizes <- factor(c("XS", "S", "M", "L", "XL"))
sizes_collapsed <- fct_collapse(sizes,
  Small = c("XS", "S"),
  Medium = "M",
  Large = c("L", "XL")
)
sizes_collapsed
#> [1] Small Small Medium Large Large
#> Levels: Large Medium Small
12.11 Common Factor Mistakes
⚠️ Pitfall 1: Converting Factor to Numeric
# Factor with numeric-looking levels
scores <- factor(c("90", "85", "95", "88"))
scores
#> [1] 90 85 95 88
#> Levels: 85 88 90 95
# WRONG: Direct conversion
as.numeric(scores) # Gives 3 1 4 2 (factor codes!)
#> [1] 3 1 4 2
# RIGHT: Convert through character
as.numeric(as.character(scores)) # 90 85 95 88
#> [1] 90 85 95 88
# Or use levels
as.numeric(levels(scores))[scores] # 90 85 95 88
#> [1] 90 85 95 88
⚠️ Pitfall 2: Unexpected Coercion
⚠️ Pitfall 3: Factor Subsetting Keeps All Levels
sizes <- factor(c("S", "M", "L", "XL"))
levels(sizes)
#> [1] "L" "M" "S" "XL"
# Subset to only S and M
sizes_small <- sizes[sizes %in% c("S", "M")]
sizes_small
#> [1] S M
#> Levels: L M S XL
# But levels still show L and XL!
levels(sizes_small)
#> [1] "L" "M" "S" "XL"
# Drop unused levels
sizes_small <- droplevels(sizes_small)
levels(sizes_small)
#> [1] "M" "S"
12.12 Summary
Key Takeaways:
- Factors are integers with labels - Understanding this prevents confusion
- Can only assign existing levels - Add level first or convert to character
- Combining factors is tricky - Use forcats or convert to character
- Drop unused levels after subsetting with droplevels()
- Specify levels explicitly when creating factors
- Ordered factors for data with natural ordering
- Convert through character when converting factor to numeric
Quick Reference:
| Error/Warning | Cause | Fix |
|---|---|---|
| invalid factor level, NA | Assigning non-existent level | Add level first or use character |
| number of levels differs | Combining different factors | Use fct_c() or same levels |
| contrasts need 2+ levels | Single-level factor in model | Check levels before modeling |
| Wrong numeric conversion | as.numeric(factor) | as.numeric(as.character(factor)) |
Factor Operations:
# Creation
factor(x)
factor(x, levels = ...)
factor(x, levels = ..., labels = ...)
ordered(x, levels = ...)
# Inspection
levels(f)
nlevels(f)
is.factor(f)
is.ordered(f)
# Modification
levels(f) <- new_levels
f <- droplevels(f)
f <- factor(f, levels = new_order)
# Conversion
as.character(f)
as.numeric(as.character(f)) # If numeric-like
Best Practices:
# ✅ Good
factor(x, levels = all_possible_levels) # Explicit levels
as.character(f) %>% modify() %>% factor() # Modify as character
droplevels(f) # After subsetting
fct_c(f1, f2) # Combine factors
# ❌ Avoid
as.numeric(factor_with_numbers) # Wrong conversion
c(factor1, factor2) # Returns bare integer codes in R < 4.1.0
factor(x) # Without explicit levels
f[f %in% subset] # Without droplevels(), unused levels remain
12.13 Exercises
📝 Exercise 1: Factor Conversion
You have:
scores <- factor(c("85", "90", "95", "88", "92"))
- Convert to proper numeric values
- Bin into letter grades (A: 90-100, B: 80-89, etc.)
- Create ordered factor of letter grades
📝 Exercise 2: Combining Factors
You have survey data from two sources:
survey1 <- factor(c("Agree", "Disagree", "Neutral"))
survey2 <- factor(c("Strongly Agree", "Agree", "Disagree"))
Combine them into one factor with all response levels.
📝 Exercise 3: Factor Validation
Write validate_factor(f) that checks:
1. If input is a factor
2. If it has at least 2 levels
3. If it has unused levels
4. Returns report of issues found
📝 Exercise 4: Safe Factor Assignment
Write safe_assign_level(f, index, value) that:
1. Checks if value is in levels
2. Adds level if not present
3. Assigns the value
4. Returns modified factor
5. Warns about any changes made
12.14 Exercise Answers
Exercise 1:
scores <- factor(c("85", "90", "95", "88", "92"))
# 1. Convert to numeric
scores_num <- as.numeric(as.character(scores))
scores_num
#> [1] 85 90 95 88 92
# 2. Bin into letter grades
# right = FALSE makes intervals closed on the left, so 90 is an A and 80 is a B
letter_grades <- cut(
  scores_num,
  breaks = c(0, 60, 70, 80, 90, 100),
  labels = c("F", "D", "C", "B", "A"),
  right = FALSE,
  include.lowest = TRUE
)
letter_grades
#> [1] B A A B A
#> Levels: F D C B A
# 3. Create ordered factor
letter_grades_ordered <- ordered(
  letter_grades,
  levels = c("F", "D", "C", "B", "A")
)
letter_grades_ordered
#> [1] B A A B A
#> Levels: F < D < C < B < A
# Can now compare
letter_grades_ordered[1] < letter_grades_ordered[3]
#> [1] TRUE
Exercise 2:
library(forcats)
survey1 <- factor(c("Agree", "Disagree", "Neutral"))
survey2 <- factor(c("Strongly Agree", "Agree", "Disagree"))
# Define all possible levels
all_levels <- c("Strongly Disagree", "Disagree", "Neutral",
                "Agree", "Strongly Agree")
# Recreate with same levels
survey1 <- factor(survey1, levels = all_levels)
survey2 <- factor(survey2, levels = all_levels)
# Combine
combined <- fct_c(survey1, survey2)
combined
#> [1] Agree Disagree Neutral Strongly Agree Agree
#> [6] Disagree
#> Levels: Strongly Disagree Disagree Neutral Agree Strongly Agree
levels(combined)
#> [1] "Strongly Disagree" "Disagree" "Neutral"
#> [4] "Agree" "Strongly Agree"
# Alternative: convert to character first
survey1 <- factor(c("Agree", "Disagree", "Neutral"))
survey2 <- factor(c("Strongly Agree", "Agree", "Disagree"))
combined <- c(as.character(survey1), as.character(survey2))
combined <- factor(combined, levels = all_levels)
combined
#> [1] Agree Disagree Neutral Strongly Agree Agree
#> [6] Disagree
#> Levels: Strongly Disagree Disagree Neutral Agree Strongly Agree
Exercise 3:
validate_factor <- function(f) {
  issues <- list()
  # Check if factor
  if (!is.factor(f)) {
    issues$not_factor <- paste("Input is", class(f)[1], "not factor")
    return(issues)
  }
  # Check number of levels
  n_levels <- nlevels(f)
  if (n_levels < 2) {
    issues$too_few_levels <- paste("Only", n_levels,
                                   "level(s). Need at least 2 for most analyses.")
  }
  # Check for unused levels
  used_levels <- unique(as.character(f))
  all_levels <- levels(f)
  unused <- setdiff(all_levels, used_levels)
  if (length(unused) > 0) {
    issues$unused_levels <- paste("Unused levels:",
                                  paste(unused, collapse = ", "))
  }
  # Report
  if (length(issues) == 0) {
    message("✓ Factor validation passed")
    return(invisible(NULL))
  } else {
    message("Factor validation issues found:")
    for (name in names(issues)) {
      message(" - ", issues[[name]])
    }
    return(invisible(issues))
  }
}
# Test
good <- factor(c("A", "B", "A", "B"))
validate_factor(good)
#> ✓ Factor validation passed
bad <- factor(c("A", "A", "A"), levels = c("A", "B", "C"))
validate_factor(bad)
#> Factor validation issues found:
#> - Unused levels: B, C
Exercise 4:
safe_assign_level <- function(f, index, value) {
  # Validate input
  if (!is.factor(f)) {
    stop("Input must be a factor")
  }
  if (index < 1 || index > length(f)) {
    stop("Index out of bounds")
  }
  # Check if value is in levels
  if (!value %in% levels(f)) {
    message("Adding new level: '", value, "'")
    levels(f) <- c(levels(f), value)
  }
  # Assign
  old_value <- as.character(f[index])
  f[index] <- value
  # is.na() guard: comparing an NA old value with != would make if() fail
  if (is.na(old_value) || old_value != value) {
    message("Changed position ", index, " from '", old_value,
            "' to '", value, "'")
  }
  return(f)
}
# Test
sizes <- factor(c("S", "M", "L"))
# Existing level
sizes <- safe_assign_level(sizes, 1, "M")
#> Changed position 1 from 'S' to 'M'
sizes
#> [1] M M L
#> Levels: L M S
# New level
sizes <- safe_assign_level(sizes, 2, "XL")
#> Adding new level: 'XL'
#> Changed position 2 from 'M' to 'XL'
sizes
#> [1] M XL L
#> Levels: L M S XL
levels(sizes)
#> [1] "L" "M" "S" "XL"