Chapter 5 Vectors & Recycling

What You’ll Learn:

  • How R’s vector recycling works
  • When recycling helps and when it hurts
  • Length mismatch errors
  • Replacement length errors
  • Vectorization best practices

Key Errors Covered: 15+ recycling and length errors

Difficulty: ⭐ Beginner to ⭐⭐ Intermediate

5.1 Introduction

R’s superpower is vectorization: operations work on entire vectors at once. That power comes with a quirky feature called recycling, which is a frequent source of confusion.

# Simple vectorization
c(1, 2, 3) + c(10, 20, 30)
#> [1] 11 22 33

But what about this?

# Different lengths!
c(1, 2, 3, 4) + c(10, 20)
#> [1] 11 22 13 24

It works! But is this what you wanted? Let’s explore when recycling helps and when it causes errors.

5.2 Understanding Recycling

💡 Key Insight: The Recycling Rule

When vectors of different lengths are used together, R repeats the shorter one to match the longer one.

# What happens:
c(1, 2, 3, 4) + c(10, 20)
#> [1] 11 22 13 24

# R expands to:
c(1, 2, 3, 4) + c(10, 20, 10, 20)
#                         ↑   ↑  recycled!
#> [1] 11 22 13 24

Works smoothly when:

  • One vector is length 1 (scalar)
  • Lengths are multiples (2 and 4, 3 and 6)

Warns when:

  • Lengths aren’t multiples (3 and 5)

Errors when:

  • A data frame column’s length doesn’t divide the row count evenly
  • The replacement value has length zero
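The cases above can be sketched as a small helper. The name recycling_outcome() is hypothetical (it is not a base R function), and the sketch only predicts what arithmetic operators do with two lengths; it does not cover replacement or data frames.

```r
# Hypothetical helper: predict how an arithmetic operator treats two lengths.
recycling_outcome <- function(n, m) {
  if (n == 0 || m == 0) return("zero-length result")
  if (n == m) return("no recycling")
  if (max(n, m) %% min(n, m) == 0) return("silent recycling")
  "recycling with warning"
}

recycling_outcome(3, 3)  # equal lengths
recycling_outcome(4, 1)  # scalar case
recycling_outcome(4, 2)  # 4 is a multiple of 2
recycling_outcome(5, 3)  # 5 is not a multiple of 3
```

Keeping the prediction separate from the operation makes it easy to check a pair of lengths before running anything expensive.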

5.3 Error #1: longer object length is not a multiple

⭐ BEGINNER 📏 LENGTH

5.3.1 The Warning

c(1, 2, 3) + c(10, 20, 30, 40, 50)
#> Warning in c(1, 2, 3) + c(10, 20, 30, 40, 50): longer object length is not a
#> multiple of shorter object length
#> [1] 11 22 33 41 52

🟡 WARNING

Warning message:
In c(1, 2, 3) + c(10, 20, 30, 40, 50) :
  longer object length is not a multiple of shorter object length

5.3.2 What It Means

R recycles the shorter vector anyway, but the longer length is not a whole multiple of the shorter one, so the last repetition is cut off partway. This usually indicates a mistake.

5.3.3 Common Causes

5.3.3.1 Cause 1: Data Mismatch

# You have 100 observations
data <- rnorm(100)

# But only 3 group labels
groups <- c("A", "B", "C")

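# data.frame() refuses to recycle: 100 is not a multiple of 3
combined <- data.frame(value = data, group = groups)
#> Error in data.frame(value = data, group = groups): arguments imply differing number of rows: 100, 3

Here data.frame() goes further than a warning and stops with an error, because 100 is not a multiple of 3. The same length mismatch in plain arithmetic would only warn.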

5.3.3.2 Cause 2: Filtering Gone Wrong

x <- 1:10
y <- 1:7  # Oops, lost some values

# Operations warn
x + y
#> Warning in x + y: longer object length is not a multiple of shorter object
#> length
#>  [1]  2  4  6  8 10 12 14  9 11 13
x * y
#> Warning in x * y: longer object length is not a multiple of shorter object
#> length
#>  [1]  1  4  9 16 25 36 49  8 18 30

5.3.3.3 Cause 3: Unintended Partial Recycling

treatment <- c("Drug", "Placebo")
outcomes <- rnorm(25)  # 25 subjects

# Assigning treatment to outcomes
data.frame(outcome = outcomes, treatment = treatment)
#> Error in data.frame(outcome = outcomes, treatment = treatment): arguments imply differing number of rows: 25, 2

Because 25 is not a multiple of 2, data.frame() stops with an error instead of recycling.

5.3.4 Solutions

SOLUTION 1: Fix the Lengths

# Original problem
x <- 1:10
y <- 1:7

# Option A: Trim to match
min_len <- min(length(x), length(y))
x[1:min_len] + y[1:min_len]
#> [1]  2  4  6  8 10 12 14

# Option B: Extend with NA
y_extended <- c(y, rep(NA, length(x) - length(y)))
x + y_extended
#>  [1]  2  4  6  8 10 12 14 NA NA NA

# Option C: Explicit recycling (if intentional)
y_recycled <- rep(y, length.out = length(x))
x + y_recycled
#>  [1]  2  4  6  8 10 12 14  9 11 13

SOLUTION 2: Check Lengths Before Operating

safe_operation <- function(x, y, op = `+`) {
  if (length(x) != length(y)) {
    # Check if one is length 1 (scalar - OK)
    if (length(x) == 1 || length(y) == 1) {
      return(op(x, y))
    }
    
    # Check if lengths are multiples
    if (max(length(x), length(y)) %% min(length(x), length(y)) != 0) {
      warning("Lengths are not multiples: ", 
              length(x), " and ", length(y))
    }
  }
  
  return(op(x, y))
}

# Test
safe_operation(1:10, 1:7, `+`)  # Warns
#> Warning in safe_operation(1:10, 1:7, `+`): Lengths are not multiples: 10 and 7
#> Warning in op(x, y): longer object length is not a multiple of shorter object
#> length
#>  [1]  2  4  6  8 10 12 14  9 11 13
safe_operation(1:10, 1:5, `+`)  # No warning (10/5 = 2)
#>  [1]  2  4  6  8 10  7  9 11 13 15
safe_operation(1:10, 2, `+`)    # No warning (scalar)
#>  [1]  3  4  5  6  7  8  9 10 11 12

SOLUTION 3: Use rep() Explicitly

# Make intention clear
x <- 1:12
pattern <- c(1, 2, 3)

# Explicit recycling
y <- rep(pattern, length.out = length(x))
x + y
#>  [1]  2  4  6  5  7  9  8 10 12 11 13 15

# Or with times
y <- rep(pattern, times = length(x) / length(pattern))
x + y
#>  [1]  2  4  6  5  7  9  8 10 12 11 13 15

⚠️ Common Pitfall: Silent Recycling with Multiples

# No warning when lengths are multiples!
x <- 1:6
y <- c(10, 20, 30)  # 6 is multiple of 3

result <- x + y
result
#> [1] 11 22 33 14 25 36

# R expanded y to: c(10, 20, 30, 10, 20, 30)
# Was this intended?

Always check: Just because it doesn’t warn doesn’t mean it’s correct!

5.4 Error #2: replacement has X rows, data has Y

⭐⭐ INTERMEDIATE 📏 LENGTH

5.4.1 The Error

df <- data.frame(x = 1:5, y = 6:10)
df$z <- 1:3  # Wrong length!
#> Error in `$<-.data.frame`(`*tmp*`, z, value = 1:3): replacement has 3 rows, data has 5

🔴 ERROR

Error in `$<-.data.frame`(`*tmp*`, z, value = 1:3) : 
  replacement has 3 rows, data has 5

5.4.2 What It Means

You’re trying to add or replace a column, but the number of values neither equals the number of rows nor divides it evenly. (A length that divides the row count evenly is recycled silently.)
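As a rough illustration of how the length check behaves for base R data frames, the sketch below tries two replacement lengths on a 10-row data frame. The column names a and b are made up for the example.

```r
df <- data.frame(x = 1:10)

# Length 5 divides 10 evenly, so the values are recycled without any message
df$a <- 1:5

# Length 3 does not divide 10, so the assignment fails; capture the message
msg <- tryCatch(
  {
    df$b <- 1:3
    "no error"
  },
  error = function(e) conditionMessage(e)
)

df$a   # 1 2 3 4 5 1 2 3 4 5
msg    # the "replacement has ... rows" message
```

The silent case is arguably the more dangerous one, since nothing signals that the column was stretched.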

5.4.3 Common Causes

5.4.3.1 Cause 1: Wrong Length Column

df <- data.frame(id = 1:10)

# Calculated something with wrong length
summary_values <- c(100, 200, 300)  # Only 3 values

# Try to add as column
df$summary <- summary_values  # Error!
#> Error in `$<-.data.frame`(`*tmp*`, summary, value = c(100, 200, 300)): replacement has 3 rows, data has 10

5.4.3.2 Cause 2: Filtered Data Reassignment

df <- data.frame(x = 1:10, y = rnorm(10))

# Filter
subset_df <- df[df$y > 0, ]  # Maybe 6 rows

# Create column for subset
new_values <- 1:6

# Try to add to original
df$new <- new_values  # Error! Original has 10 rows
#> Error in `$<-.data.frame`(`*tmp*`, new, value = 1:6): replacement has 6 rows, data has 10

5.4.3.3 Cause 3: Aggregation Length Mismatch

# 20 observations
df <- data.frame(
  id = 1:20,
  group = rep(c("A", "B"), each = 10)
)

# Aggregate to 2 values (one per group)
group_means <- tapply(df$id, df$group, mean)

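# Try to add back to original
df$group_mean <- group_means  # 2 values, 20 rows
# Because 20 is a multiple of 2, this particular case recycles silently
# (A, B, A, B, ...), so the means no longer line up with df$group.
# With a row count that is not a multiple of the group count, you get the error.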

5.4.4 Solutions

SOLUTION 1: Match the Length

df <- data.frame(id = 1:10)
summary_values <- c(100, 200, 300)

# Recycle explicitly
df$summary <- rep(summary_values, length.out = nrow(df))

# Or extend with NA
df$summary <- c(summary_values, rep(NA, nrow(df) - length(summary_values)))

SOLUTION 2: Use Merge/Join for Aggregates

# Original data
df <- data.frame(
  id = 1:20,
  group = rep(c("A", "B"), each = 10),
  value = rnorm(20)
)

# Aggregate
group_summary <- aggregate(value ~ group, df, mean)
names(group_summary)[2] <- "group_mean"

# Merge back
df <- merge(df, group_summary, by = "group")
head(df)
#>   group id      value group_mean
#> 1     A  1 -0.7212893 -0.5035515
#> 2     A  2 -0.3361355 -0.5035515
#> 3     A  3 -0.5519150 -0.5035515
#> 4     A  4  0.1108687 -0.5035515
#> 5     A  5  0.5672052 -0.5035515
#> 6     A  6 -2.0882567 -0.5035515

SOLUTION 3: dplyr Way (Cleaner)

library(dplyr)

df <- data.frame(
  id = 1:20,
  group = rep(c("A", "B"), each = 10),
  value = rnorm(20)
)

# Add group mean to each row
df <- df %>%
  group_by(group) %>%
  mutate(group_mean = mean(value)) %>%
  ungroup()

head(df)
#> # A tibble: 6 × 4
#>      id group  value group_mean
#>   <int> <chr>  <dbl>      <dbl>
#> 1     1 A     -0.248     -0.782
#> 2     2 A     -1.84      -0.782
#> 3     3 A     -0.314     -0.782
#> 4     4 A     -0.769     -0.782
#> 5     5 A     -0.802     -0.782
#> 6     6 A     -0.512     -0.782
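For completeness, here is a base R sketch of the same idea without merging. It assumes a small made-up data frame with a numeric value column; ave() returns one value per row, and indexing the tapply() result by group name gives the same alignment.

```r
df <- data.frame(
  group = rep(c("A", "B"), each = 10),
  value = as.numeric(1:20)
)

# ave() applies FUN within each group and returns a vector aligned with the rows
df$group_mean <- ave(df$value, df$group, FUN = mean)

# Equivalent lookup: index the per-group means by each row's group name
group_means <- tapply(df$value, df$group, mean)
df$group_mean2 <- as.vector(group_means[as.character(df$group)])

head(df, 3)
```

Both approaches keep the original row order, which merge() does not guarantee.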

5.5 Error #3: number of items to replace is not a multiple

⭐⭐ INTERMEDIATE 📏 LENGTH

5.5.1 The Warning

x <- 1:10
x[1:7] <- c(100, 200)  # 7 positions, 2 values
#> Warning in x[1:7] <- c(100, 200): number of items to replace is not a multiple
#> of replacement length

🟡 WARNING

Warning message:
In x[1:7] <- c(100, 200) :
  number of items to replace is not a multiple of replacement length

5.5.2 What It Means

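You’re replacing a subset, but the number of positions is not a whole multiple of the number of replacement values. R still performs the assignment, recycling the values and cutting the last repetition short.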
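To see what the vector looks like after such an assignment, the following sketch captures the warning text with withCallingHandlers() and lets the assignment finish. The variable name warned is only for illustration.

```r
x <- 1:10

# Record the warning message, then muffle it so the assignment completes quietly
warned <- NULL
withCallingHandlers(
  x[1:7] <- c(100, 200),
  warning = function(w) {
    warned <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

x       # 100 200 100 200 100 200 100 8 9 10
warned  # the "not a multiple of replacement length" message
```

The seventh position receives 100 because the pattern is cut off after its first element.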

5.5.3 When This Happens

# Replacing 10 items with 3 values
x <- 1:10
x[] <- c(1, 2, 3)  # 10 is not a multiple of 3
#> Warning in x[] <- c(1, 2, 3): number of items to replace is not a multiple of
#> replacement length

# Replacing 7 items with 2 values
x[1:7] <- c(10, 20)  # 7 is not a multiple of 2
#> Warning in x[1:7] <- c(10, 20): number of items to replace is not a multiple of
#> replacement length

But these work:

# Length 1 always works
x <- 1:10
x[1:7] <- 99
x
#>  [1] 99 99 99 99 99 99 99  8  9 10

# Multiples work
x <- 1:10
x[1:6] <- c(10, 20, 30)  # 6 is multiple of 3
x
#>  [1] 10 20 30 10 20 30  7  8  9 10

5.5.4 Solutions

SOLUTION 1: Make Lengths Match

x <- 1:10

# Option A: Recycle explicitly
replacement <- rep(c(100, 200), length.out = 7)
x[1:7] <- replacement
x
#>  [1] 100 200 100 200 100 200 100   8   9  10

# Option B: Subset to match
x <- 1:10
x[1:2] <- c(100, 200)  # Only replace 2
x
#>  [1] 100 200   3   4   5   6   7   8   9  10

SOLUTION 2: Use ifelse() for Conditional Replacement

x <- 1:10

# Replace first 7 with pattern
pattern <- rep(c(100, 200), length.out = length(x))
x <- ifelse(seq_along(x) <= 7, pattern, x)
x
#>  [1] 100 200 100 200 100 200 100   8   9  10

5.6 Error #4: replacement has length zero

⭐⭐ INTERMEDIATE 📏 LENGTH

5.6.1 The Error

x <- 1:5
x[3] <- c()  # Empty vector!
#> Error in x[3] <- c(): replacement has length zero

🔴 ERROR

Error in x[3] <- c() : replacement has length zero

5.6.2 What It Means

You’re trying to replace elements with an empty vector (length 0).

5.6.3 Common Causes

5.6.3.1 Cause 1: Empty Filter Result

df <- data.frame(x = 1:10, y = letters[1:10])

# Filter returns empty
subset_values <- df$x[df$y == "z"]  # No "z", returns integer(0)

# Try to use it to replace values in an existing column
df$x[1:5] <- subset_values  # Error: replacement has length zero

5.6.3.2 Cause 2: Function Returns Empty

get_values <- function(condition) {
  if (condition) {
    return(1:5)
  } else {
    return(numeric(0))  # Oops!
  }
}

x <- 1:10
x[1:5] <- get_values(FALSE)  # Error!
#> Error in x[1:5] <- get_values(FALSE): replacement has length zero

5.6.4 Solutions

SOLUTION 1: Check Before Replacing

x <- 1:10
replacement <- numeric(0)  # Empty

# Check first
if (length(replacement) > 0) {
  x[seq_along(replacement)] <- replacement
} else {
  message("No replacement values")
}
#> No replacement values

SOLUTION 2: Use NA as Default

get_values_safe <- function(condition) {
  if (condition) {
    return(1:5)
  } else {
    return(NA)  # Or a default value
  }
}

x <- 1:10
x[1:5] <- get_values_safe(FALSE)  # Works, assigns NA
x
#>  [1] NA NA NA NA NA  6  7  8  9 10
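The same idea can be applied at the call site when you cannot change the function that returns the values. This is a minimal sketch; default_if_empty() is a hypothetical helper, not part of base R.

```r
# Hypothetical helper: substitute a default when a vector is empty
default_if_empty <- function(x, default = NA) {
  if (length(x) == 0) default else x
}

x <- 1:10
x[1:5] <- default_if_empty(numeric(0))   # empty input, so NA is assigned
x[6:7] <- default_if_empty(c(60, 70))    # non-empty input passes through
x
```

Whether NA is a sensible default depends on the analysis; an explicit error may be the better choice when an empty result signals a bug upstream.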

5.7 Vectorization Best Practices

🎯 Best Practice: Length-Safe Operations

# 1. Check lengths match
operate_safely <- function(x, y, fun) {
  if (length(x) != length(y)) {
    stop("Vectors must be same length. Got ", 
         length(x), " and ", length(y))
  }
  fun(x, y)
}

# 2. Use recycling intentionally (scalars only)
add_scalar <- function(vec, scalar) {
  stopifnot(length(scalar) == 1)
  vec + scalar
}

# 3. Document recycling behavior
#' Add vectors with explicit recycling
#' @param x numeric vector
#' @param y numeric vector (will be recycled to length of x)
add_with_recycling <- function(x, y) {
  if (length(y) == 1) {
    return(x + y)  # Scalar - always OK
  }
  
  y_recycled <- rep(y, length.out = length(x))
  return(x + y_recycled)
}

5.8 Understanding Vector Operations

💡 Key Insight: What Gets Recycled

# Arithmetic operators
1:4 + c(10, 20)           # Addition
#> [1] 11 22 13 24
1:4 - c(10, 20)           # Subtraction
#> [1]  -9 -18  -7 -16
1:4 * c(2, 3)             # Multiplication
#> [1]  2  6  6 12
1:4 / c(2, 4)             # Division
#> [1] 0.5 0.5 1.5 1.0

# Logical operators
c(TRUE, FALSE) & c(TRUE, TRUE, FALSE, FALSE)
#> [1]  TRUE FALSE FALSE FALSE
c(TRUE, FALSE) | c(FALSE, FALSE, TRUE, TRUE)
#> [1]  TRUE FALSE  TRUE  TRUE

# Comparison operators
1:6 > c(2, 4, 6)          # Recycles the shorter vector
#> [1] FALSE FALSE FALSE  TRUE  TRUE FALSE

# Assignment
x <- 1:12
x[] <- c(1, 2, 3)         # Recycles to 12
x
#>  [1] 1 2 3 1 2 3 1 2 3 1 2 3

Key point: Recycling happens in MANY contexts!

5.9 Edge Cases and Gotchas

5.9.1 Gotcha #1: Matrix Recycling

# Matrices recycle by column!
matrix(1:2, nrow = 3, ncol = 4)
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    2    1    2
#> [2,]    2    1    2    1
#> [3,]    1    2    1    2

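No warning appears here because 12 (3×4) is a multiple of 2. The values fill the matrix column by column, so the pattern flips from one column to the next.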
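The fill direction is easier to see side by side. The sketch below compares the default column-wise fill with byrow = TRUE and then uses a data length that does not fit the matrix (3 values into a 2 × 2 matrix) to trigger a warning. The variable names are illustrative only.

```r
# Default: values are recycled down the columns
by_col <- matrix(1:2, nrow = 3, ncol = 4)

# byrow = TRUE recycles across the rows instead, so every row reads 1 2 1 2
by_row <- matrix(1:2, nrow = 3, ncol = 4, byrow = TRUE)

# 3 values do not fit evenly into a 2 x 2 matrix, so matrix() warns
fit_check <- tryCatch(
  {
    matrix(1:3, nrow = 2, ncol = 2)
    "no warning"
  },
  warning = function(w) "warning"
)

by_col
by_row
fit_check
```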

5.9.2 Gotcha #2: Data Frame Column Recycling

# This works - length 1 always recycles
df <- data.frame(
  x = 1:5,
  y = 10  # Recycled to 5
)
df
#>   x  y
#> 1 1 10
#> 2 2 10
#> 3 3 10
#> 4 4 10
#> 5 5 10

# This works - multiple lengths
df <- data.frame(
  x = 1:6,
  y = c(1, 2)  # Recycled to 6
)
df
#>   x y
#> 1 1 1
#> 2 2 2
#> 3 3 1
#> 4 4 2
#> 5 5 1
#> 6 6 2
# This fails - not a multiple
df <- data.frame(
  x = 1:5,
  y = c(1, 2)  # 5 is not multiple of 2
)
#> Error in data.frame(x = 1:5, y = c(1, 2)): arguments imply differing number of rows: 5, 2

5.9.3 Gotcha #3: List Operations Don’t Recycle

# Vectors recycle
c(1, 2) + c(10, 20, 30)  # Works (with warning)
#> Warning in c(1, 2) + c(10, 20, 30): longer object length is not a multiple of
#> shorter object length
#> [1] 11 22 31

# Lists don't
list(1, 2) + list(10, 20, 30)  # Error!
#> Error in list(1, 2) + list(10, 20, 30): non-numeric argument to binary operator

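The error here is about type, not length: arithmetic operators aren’t defined for lists at all. Lists need explicit element-wise handling: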

x <- list(1, 2, 3)
y <- list(10, 20)

# Use Map or mapply
Map(`+`, x, rep(y, length.out = length(x)))
#> [[1]]
#> [1] 11
#> 
#> [[2]]
#> [1] 22
#> 
#> [[3]]
#> [1] 13
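Note that Map() and mapply() recycle shorter arguments on their own. The sketch below uses lengths that are exact multiples (4 and 2), then shows a vapply() variant where the pairing is spelled out; y_full is a made-up name for the explicitly recycled list.

```r
x <- list(1, 2, 3, 4)
y <- list(10, 20)

# Map() recycles the shorter list when called with lists of length 4 and 2
sums <- Map(`+`, x, y)
unlist(sums)   # 11 22 13 24

# vapply() over indices makes the pairing explicit and returns a plain vector
y_full <- rep(y, length.out = length(x))
explicit <- vapply(seq_along(x), function(i) x[[i]] + y_full[[i]], numeric(1))
explicit
```

The explicit version is longer, but it leaves no doubt about which elements are paired.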

5.10 Debugging Recycling Issues

💡 Debugging Checklist

# 1. Check lengths
x <- 1:10
y <- 1:7
length(x)
#> [1] 10
length(y)
#> [1] 7

# 2. Check if they're multiples
max(length(x), length(y)) %% min(length(x), length(y))
#> [1] 3
# 0 = clean multiple, anything else = partial recycling

# 3. Visualize recycling
rep(y, length.out = length(x))
#>  [1] 1 2 3 4 5 6 7 1 2 3

# 4. Test operation
tryCatch(
  x + y,
  warning = function(w) {
    message("Warning caught: ", w$message)
  }
)
#> Warning caught: longer object length is not a multiple of shorter object length

# 5. Check for unexpected conversions
class(x); typeof(x)
#> [1] "integer"
#> [1] "integer"
class(y); typeof(y)
#> [1] "integer"
#> [1] "integer"
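The checklist steps can be bundled into one function for repeated use. This is only a sketch: diagnose_recycling() is a hypothetical name, and it reports lengths, the remainder, and types without performing any operation.

```r
# Hypothetical helper bundling the checklist steps for two vectors
diagnose_recycling <- function(x, y) {
  lens <- c(x = length(x), y = length(y))
  # Remainder is undefined when either vector is empty
  remainder <- if (min(lens) == 0) NA_integer_ else max(lens) %% min(lens)
  list(
    lengths = lens,
    remainder = remainder,
    clean = !is.na(remainder) && remainder == 0,
    types = c(x = typeof(x), y = typeof(y))
  )
}

str(diagnose_recycling(1:10, 1:7))
```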

5.11 Summary

Key Takeaways:

  1. Recycling is automatic: R repeats shorter vectors to match longer ones
  2. Warnings appear: When lengths aren’t multiples (except scalars)
  3. Scalars always work: Length 1 recycles to any length
  4. Check before operating: Use length() to verify matches
  5. Explicit is better: Use rep() to show intent
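  6. Data frames are strict: Column lengths must equal the row count or divide it evenly (length 1 always works); anything else is an error
  7. Errors vs warnings: Arithmetic and vector replacement warn on non-multiple lengths; data frame columns and zero-length replacements error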

Quick Reference:

Situation                            Behavior
Same length                          No recycling needed
One is length 1                      Silent recycling (scalar)
Lengths are multiples                Silent recycling (e.g., 2 and 6)
Lengths not multiples                Warning + recycling (e.g., 3 and 7)
Vector replacement, not multiples    Warning + recycling
Replacement, length 0                Error
Data frame column                    Error unless length divides nrow evenly

Prevention:

# Always check
stopifnot(length(x) == length(y))

# Or use scalars only
stopifnot(length(y) == 1)

# Or recycle explicitly
y <- rep(y, length.out = length(x))

Remember: No warning doesn’t mean correct! Multiples recycle silently.

5.12 Exercises

📝 Exercise 1: Predict the Outcome

What will happen? Will it work, warn, or error?

# A
c(1, 2, 3, 4) + c(10, 20)

# B
c(1, 2, 3, 4, 5) + c(10, 20)

# C
df <- data.frame(x = 1:10)
df$y <- c(1, 2, 3, 4, 5)

# D
x <- 1:12
x[] <- c(1, 2, 3, 4)

# E
matrix(1:5, nrow = 5, ncol = 5)

📝 Exercise 2: Fix the Code

Debug these recycling problems:

# Problem 1
students <- 1:25
groups <- c("A", "B", "C")
data.frame(student = students, group = groups)

# Problem 2
values <- rnorm(100)
weights <- c(1, 2, 3)
weighted <- values * weights

# Problem 3
df <- data.frame(id = 1:20)
summary_stats <- c(mean = 50, sd = 10, n = 20)
df$mean <- summary_stats["mean"]

📝 Exercise 3: Safe Operations

Write a function safe_add(x, y) that:

  1. Checks if lengths match
  2. If not, asks user what to do:
       • Error
       • Recycle shorter
       • Trim longer
       • Extend with NA
  3. Performs the operation
  4. Returns result with attribute showing what was done

📝 Exercise 4: Real World

You have exam scores for 100 students across 4 quarters:

scores_q1 <- rnorm(100, mean = 75, sd = 10)
scores_q2 <- rnorm(98, mean = 78, sd = 10)   # 2 students dropped
scores_q3 <- rnorm(102, mean = 80, sd = 10)  # 2 new students
scores_q4 <- rnorm(100, mean = 82, sd = 10)

Create a data frame with:

  • All students who completed at least one quarter
  • NA for missing scores
  • Calculate average score per student

5.13 Exercise Answers

Exercise 1:

# A - Works, silent (4 is multiple of 2)
c(1, 2, 3, 4) + c(10, 20)
#> [1] 11 22 13 24

# B - Works, warns (5 not multiple of 2)
c(1, 2, 3, 4, 5) + c(10, 20)
#> Warning in c(1, 2, 3, 4, 5) + c(10, 20): longer object length is not a multiple
#> of shorter object length
#> [1] 11 22 13 24 15

# C - Works, silent (10 is a multiple of 5, so the column is recycled)
tryCatch(
  data.frame(x = 1:10, y = c(1, 2, 3, 4, 5)),
  error = function(e) message("Error: ", e$message)
)
#>     x y
#> 1   1 1
#> 2   2 2
#> 3   3 3
#> 4   4 4
#> 5   5 5
#> 6   6 1
#> 7   7 2
#> 8   8 3
#> 9   9 4
#> 10 10 5

# D - Works, silent (12 is multiple of 4)
x <- 1:12
x[] <- c(1, 2, 3, 4)
x
#>  [1] 1 2 3 4 1 2 3 4 1 2 3 4

# E - Works, silent (25 is multiple of 5)
matrix(1:5, nrow = 5, ncol = 5)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    1    1    1    1
#> [2,]    2    2    2    2    2
#> [3,]    3    3    3    3    3
#> [4,]    4    4    4    4    4
#> [5,]    5    5    5    5    5

Exercise 2:

# Problem 1 - Recycle groups explicitly
students <- 1:25
groups <- c("A", "B", "C")
data.frame(
  student = students, 
  group = rep(groups, length.out = length(students))
)
#>    student group
#> 1        1     A
#> 2        2     B
#> 3        3     C
#> 4        4     A
#> 5        5     B
#> 6        6     C
#> 7        7     A
#> 8        8     B
#> 9        9     C
#> 10      10     A
#> 11      11     B
#> 12      12     C
#> 13      13     A
#> 14      14     B
#> 15      15     C
#> 16      16     A
#> 17      17     B
#> 18      18     C
#> 19      19     A
#> 20      20     B
#> 21      21     C
#> 22      22     A
#> 23      23     B
#> 24      24     C
#> 25      25     A

# Problem 2 - Make intention clear
values <- rnorm(100)
weights <- c(1, 2, 3)
weights_full <- rep(weights, length.out = length(values))
weighted <- values * weights_full

# Problem 3 - Extract scalar properly
df <- data.frame(id = 1:20)
summary_stats <- c(mean = 50, sd = 10, n = 20)
df$mean <- summary_stats[["mean"]]  # Single value

Exercise 3:

safe_add <- function(x, y, action = c("error", "recycle", "trim", "extend")) {
  action <- match.arg(action)
  
  if (length(x) == length(y)) {
    result <- x + y
    attr(result, "action") <- "none_needed"
    return(result)
  }
  
  if (action == "error") {
    stop("Lengths don't match: ", length(x), " vs ", length(y))
  }
  
  if (action == "recycle") {
    max_len <- max(length(x), length(y))
    x <- rep(x, length.out = max_len)
    y <- rep(y, length.out = max_len)
    result <- x + y
    attr(result, "action") <- "recycled"
  }
  
  if (action == "trim") {
    min_len <- min(length(x), length(y))
    result <- x[seq_len(min_len)] + y[seq_len(min_len)]
    attr(result, "action") <- "trimmed"
  }
  
  if (action == "extend") {
    max_len <- max(length(x), length(y))
    x <- c(x, rep(NA, max_len - length(x)))
    y <- c(y, rep(NA, max_len - length(y)))
    result <- x + y
    attr(result, "action") <- "extended"
  }
  
  return(result)
}

# Test
safe_add(1:5, 1:3, "recycle")
#> [1] 2 4 6 5 7
#> attr(,"action")
#> [1] "recycled"

Exercise 4:

# Create scores with different lengths
set.seed(123)
scores_q1 <- rnorm(100, mean = 75, sd = 10)
scores_q2 <- rnorm(98, mean = 78, sd = 10)
scores_q3 <- rnorm(102, mean = 80, sd = 10)
scores_q4 <- rnorm(100, mean = 82, sd = 10)

# Find max number of students
max_students <- max(length(scores_q1), length(scores_q2), 
                   length(scores_q3), length(scores_q4))

# Extend all to max length with NA
# (assumes scores are in the same student order each quarter, so missing students fall at the end)
extend_with_na <- function(x, target_len) {
  c(x, rep(NA, target_len - length(x)))
}

# Create data frame
df <- data.frame(
  student_id = 1:max_students,
  q1 = extend_with_na(scores_q1, max_students),
  q2 = extend_with_na(scores_q2, max_students),
  q3 = extend_with_na(scores_q3, max_students),
  q4 = extend_with_na(scores_q4, max_students)
)

# Calculate average (ignoring NAs)
df$average <- rowMeans(df[, c("q1", "q2", "q3", "q4")], na.rm = TRUE)

# Keep only students with at least one score
df <- df[!is.nan(df$average), ]

head(df)
#>   student_id       q1       q2        q3       q4  average
#> 1          1 69.39524 70.89593  73.88834 74.84758 72.25677
#> 2          2 72.69823 80.56884  68.14520 74.47311 73.97134
#> 3          3 90.58708 75.53308 101.98810 72.61461 85.18072
#> 4          4 75.70508 74.52457  93.12413 71.47487 78.70716
#> 5          5 76.29288 68.48381  77.34855 77.62840 74.93841
#> 6          6 92.15065 77.54972  85.43194 85.31179 85.11103