## 5.10 Time: Wave participation & time-point presence

### 5.10.1 Data & Packages & functions

• Plot type: Stacked bar plot
• tidyr::expand(): To create observations/rows for non-observed variable combinations

### 5.10.2 Graph

• Here we’ll reproduce, critique and possibly improve Figure 5.17 (Bauer et al. 2020).
• Questions:
• What does the graph show? What are the underlying variables (and data)?
• How many scales/mappings does it use? Could we reduce them?
• What do you like, what do you dislike about the figure? What is good, what is bad?
• What kind of information could we add to the graph (if any)?
• How would you approach a replication of the graph?

Figure 5.17: Presence/participation at/in different time points/waves

### 5.10.3 Lab: Data & Code

• The code for Figure 5.17 is shown below (and creates Figure 5.18).

• Learning objectives

• How to make stacked barplots
• How to expand data

We’ll start by preparing the data for our plot. As shown below, the data is already in long format. It contains an individual identifier, pid, and two variables that carry the same information, the wave identifier, in different formats: wave.num (a number) and wave (a label).

If you want, you can move directly to the plot…

# data_wave_participation.csv
"1Y9z1shAjyaHgqpOxwt2T8uSoe-3RW-WI"),
col_types = cols())
head(data)
pid wave.num wave
421518540 1 Wave 1
441620046 1 Wave 1
454072144 1 Wave 1
477478244 1 Wave 1
481214044 1 Wave 1
453648542 1 Wave 1
nrow(data)
## [1] 6258
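Before dropping one of the two wave variables, it can be worth confirming that they really are redundant. The following is a minimal sketch on made-up toy data (the object names `toy`, `wave_check` and `redundant` are illustrative and not part of the original code): if every value of wave.num is paired with exactly one wave label, the cross-count has as many rows as there are distinct wave numbers.

```r
library(dplyr)

# Toy long-format data mimicking the structure above (values are made up)
toy <- tibble(
  pid      = c(1, 1, 2, 3, 3, 3),
  wave.num = c(1, 2, 1, 1, 2, 3),
  wave     = paste("Wave", wave.num)
)

# One row per observed combination of the two wave variables
wave_check <- toy %>% count(wave.num, wave)
wave_check

# TRUE if each wave.num is paired with exactly one wave label
redundant <- n_distinct(toy$wave.num) == nrow(wave_check)
redundant
```

The same two lines should work on the real data by replacing `toy` with `data`.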

We expand the data, creating a new data frame with one row for every respondent $$\times$$ wave combination, and join it with the original one. That way we end up with a data frame in which unobserved respondent $$\times$$ wave combinations are marked as missing (NA).

# Expand to get dataset with rows for non observations
data.expand <- data %>% tidyr::expand(pid, wave.num)
head(data.expand)
pid wave.num
401008246 1
401008246 2
401008246 3
401008443 1
401008443 2
401008443 3
nrow(data.expand)
## [1] 10269
# Right join with the long-format data to recover the actual presence of respondents
data.expand <- data %>% right_join(data.expand) %>% arrange(pid, wave.num)
## Joining, by = c("pid", "wave.num")
head(data.expand)
pid wave.num wave
401008246 1 NA
401008246 2 NA
401008246 3 Wave 3
401008443 1 Wave 1
401008443 2 NA
401008443 3 NA
nrow(data.expand)
## [1] 10269
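To see what the expand-and-join step does, it may help to run it on a tiny example. The sketch below uses made-up toy data (the names `toy`, `toy_grid` and `toy_full` are illustrative only): two respondents and three waves give six combinations, of which only three were observed.

```r
library(dplyr)
library(tidyr)

# Toy data (made up): respondent 1 took part in waves 1 and 2,
# respondent 2 only in wave 3
toy <- tibble(
  pid      = c(1, 1, 2),
  wave.num = c(1, 2, 3),
  wave     = paste("Wave", wave.num)
)

# All pid x wave.num combinations, whether observed or not
toy_grid <- toy %>% tidyr::expand(pid, wave.num)

# Joining back keeps every combination; unobserved ones get NA in `wave`
toy_full <- toy %>%
  right_join(toy_grid, by = c("pid", "wave.num")) %>%
  arrange(pid, wave.num)
toy_full
```

Passing `by = c("pid", "wave.num")` explicitly suppresses the "Joining, by" message. As far as I can tell, `tidyr::complete(pid, wave.num)` should yield the same rows in a single step here, but the two-step version makes the logic easier to follow.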

Next, we go through several steps to summarize the data across waves and to drop the two smallest categories (participants only in W2/W3 (N = 3) and only in W1/W3 (N = 2)). If you like, you can skip this whole part and go directly to the plot below.

# Subset variables
#data.expand <- data.expand %>% select(pid, wave.num, wave) # %>% distinct()

data.expand <- data.expand %>%
  pivot_wider(names_from = wave.num, values_from = wave) %>%
  arrange(pid)

# Rename wave variables
data.expand <- rename(data.expand,
                      wave1 = "1",
                      wave2 = "2",
                      wave3 = "3")

# Create "across_waves" with information of presence in single waves
data.expand <- unite(data.expand, across_waves, -pid)

# Aggregate to get observations per presence in different waves
data.expand <- data.expand %>% group_by(across_waves) %>% summarize(n = n())

# Separate united variable
data.expand <- data.expand %>% separate(across_waves, c("Wave1", "Wave2", "Wave3"), sep = "_")

# Replace values of wave variables with N values
data.expand$Wave1[data.expand$Wave1 != "NA"] <- data.expand$n[data.expand$Wave1 != "NA"]
data.expand$Wave2[data.expand$Wave2 != "NA"] <- data.expand$n[data.expand$Wave2 != "NA"]
data.expand$Wave3[data.expand$Wave3 != "NA"] <- data.expand$n[data.expand$Wave3 != "NA"]

# Delete groups only W2/W3 (N = 3) and only W1/W3 (N = 2)
data.expand <- data.expand %>% filter(n > 5) %>% select(-n)

# Reshape to long format for the barplot illustrating the samples across waves
data.expand <- pivot_longer(data.expand, Wave1:Wave3, names_to = "wave", values_to = "samples")
data.expand$samples <- as.numeric(data.expand$samples)
## Warning: NAs introduced by coercion
data.expand$samples_labels <- dplyr::recode(data.expand$samples,
"532" = "Only W1 (N = 532)",
"292" = "W1 and W2 (N = 292)",
"1269" = "W1, W2 and W3 (N = 1269)",
"482" = "Only W2 (N = 482)",
"843" = "Only W3 (N = 843)"
)

data.expand <- data.expand %>% filter(!is.na(samples))
data.expand <- data.expand %>% arrange(wave)
data_plot <- data.expand
data_plot
wave samples samples_labels
Wave1 532 Only W1 (N = 532)
Wave1 292 W1 and W2 (N = 292)
Wave1 1269 W1, W2 and W3 (N = 1269)
Wave2 482 Only W2 (N = 482)
Wave2 292 W1 and W2 (N = 292)
Wave2 1269 W1, W2 and W3 (N = 1269)
Wave3 843 Only W3 (N = 843)
Wave3 1269 W1, W2 and W3 (N = 1269)
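The pattern counts can also be obtained more directly from the long-format data, without expanding, uniting and separating. The following is a sketch of one possible alternative, not the approach used above; it runs on made-up toy data and the names `toy` and `patterns` are illustrative. Each respondent's observed waves are collapsed into one pattern string, and the patterns are then counted.

```r
library(dplyr)

# Toy long-format data (made up): one row per respondent and attended wave
toy <- tibble(
  pid      = c(1, 1, 1, 2, 3, 3, 4),
  wave.num = c(1, 2, 3, 1, 1, 2, 1)
)

patterns <- toy %>%
  group_by(pid) %>%
  # Collapse each respondent's waves into one string, e.g. "W1+W2"
  summarize(pattern = paste0("W", sort(wave.num), collapse = "+")) %>%
  # Number of respondents per participation pattern
  count(pattern, name = "n")
patterns
```

This yields one row per participation pattern with its frequency. It does not produce the wave-by-pattern layout that the stacked bar plot needs, so a further reshaping step would still be required.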

Finally, we plot the participation across waves in Figure 5.18.

Figure 5.18: Presence/participation at/in different time points/waves
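A stacked bar plot of this kind can be sketched with ggplot2 as follows. This is a minimal version and not necessarily the code behind Figure 5.18: the data frame is typed in by hand from the data_plot output printed above so that the example is self-contained, and the axis labels, legend title and theme are my own assumptions.

```r
library(ggplot2)

# Values copied from the data_plot output printed above
data_plot <- data.frame(
  wave = c("Wave1", "Wave1", "Wave1", "Wave2", "Wave2", "Wave2",
           "Wave3", "Wave3"),
  samples = c(532, 292, 1269, 482, 292, 1269, 843, 1269),
  samples_labels = c("Only W1 (N = 532)", "W1 and W2 (N = 292)",
                     "W1, W2 and W3 (N = 1269)", "Only W2 (N = 482)",
                     "W1 and W2 (N = 292)", "W1, W2 and W3 (N = 1269)",
                     "Only W3 (N = 843)", "W1, W2 and W3 (N = 1269)")
)

# One bar per wave; segments are the participation patterns
p <- ggplot(data_plot, aes(x = wave, y = samples, fill = samples_labels)) +
  geom_col() +  # geom_col() stacks segments by default (position = "stack")
  labs(x = "Wave", y = "Number of respondents",
       fill = "Participation pattern") +
  theme_minimal()
p
```

The fill aesthetic is what produces the stacking: each bar is split into one segment per value of samples_labels, and the segment heights add up to the number of respondents shown for that wave (excluding the two small groups dropped earlier).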

### 5.10.4 Exercise

• Try to produce such a graph with a panel survey that you are currently using. Store the panel data in long format, keep only the participant ID and the wave number, rename them pid and wave.num, and then run the code above. Note that the code also uses a label column wave, which you can derive from wave.num.
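The preparation step for your own data might look like the sketch below. Everything about `my_panel` is hypothetical: the column names `respondent_id`, `survey_wave` and `trust` and all values are made up for illustration, so substitute the names used in your own survey.

```r
library(dplyr)

# Hypothetical panel data; column names and values are made up
my_panel <- tibble(
  respondent_id = c("a", "a", "b", "c", "c"),
  survey_wave   = c(1, 2, 1, 2, 3),
  trust         = c(3, 4, 2, 5, 5)
)

data <- my_panel %>%
  # Keep only the two identifiers and rename them in one step
  select(pid = respondent_id, wave.num = survey_wave) %>%
  # Guard against duplicated respondent-wave rows
  distinct() %>%
  # Label column that the code in the lab expects
  mutate(wave = paste("Wave", wave.num))
data
```

With more or fewer than three waves, the renaming, separating and recoding steps in the lab code would need to be adapted, since they refer to the three waves by name.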

### References

Bauer, Paul C, Frederic Gerdon, Florian Keusch, and Frauke Kreuter. 2020. “The Impact of the GDPR Policy on Data Sharing/Privacy Attitudes.” Preliminary Draft, 1–22.