5.10 Time: Wave participation & time-point presence
5.10.1 Data & Packages & functions
- Plot type: Stacked bar plot
- tidyr::expand(): creates observations/rows for non-observed variable combinations
- Here we’ll reproduce, critique and possibly improve Figure 5.17 (Bauer et al. 2020)
- What does the graph show? What are the underlying variables (and data)?
- How many scales/mappings does it use? Could we reduce them?
- What do you like, what do you dislike about the figure? What is good, what is bad?
- What kind of information could we add to the graph (if any)?
- How would you approach a replication of the graph?
5.10.3 Lab: Data & Code
- How to make stacked barplots
- How to expand data
We’ll start by preparing the data for our plot. As you can see below, the data are already in long format and contain an individual identifier pid as well as two variables that hold the same information, namely the wave identifier, in different formats:
If you want, you can move directly to the plot…
##  6258
We expand the data, creating a new data frame that we join with the original one. This way we end up with a data frame that indicates missing values for unobserved respondent \(\times\) wave combinations.
##  10269
## Joining, by = c("pid", "wave.num")
##  10269
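The expand-and-join step can be tried out on a small toy dataset. The sketch below is a minimal illustration and not the code used for the figure: the object names (`data_toy`, `all_combinations`, `data_toy_expanded`) and the toy values are assumptions made for this example, while `pid`, `wave.num` and `wave` follow the variable names used in this section.

```r
library(dplyr)
library(tidyr)

# Toy long-format panel (hypothetical values): one row per observed pid x wave
data_toy <- tibble(
  pid      = c(1, 1, 1, 2, 2, 3),
  wave.num = c(1, 2, 3, 1, 3, 2),
  wave     = c("Wave 1", "Wave 2", "Wave 3", "Wave 1", "Wave 3", "Wave 2")
)

# All pid x wave.num combinations, whether they were observed or not
all_combinations <- data_toy %>% expand(pid, wave.num)

# Join the observed rows back on; unobserved combinations get wave = NA
data_toy_expanded <- all_combinations %>%
  left_join(data_toy, by = c("pid", "wave.num"))

nrow(data_toy)                      # 6 observed rows
nrow(data_toy_expanded)             # 9 rows = 3 respondents x 3 waves
sum(is.na(data_toy_expanded$wave))  # 3 unobserved respondent x wave rows
```

The same logic explains the row counts above: 6258 observed rows become 10269 rows after expanding, which is three rows for every respondent.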
Subsequently, we go through several steps to summarize the data across waves and to delete the categories with the smallest numbers (participants only in W2/W3 (N = 3) and only in W1/W3 (N = 2)). If you like, you can skip this whole part and go directly to the function below.
# Subset variables
# data.expand <- data.expand %>% select(pid, wave.num, wave) # %>% distinct()

# Spread dataset and arrange
data.expand <- data.expand %>%
  pivot_wider(names_from = wave.num, values_from = wave) %>%
  arrange(pid)

# Rename wave variables
data.expand <- rename(data.expand, wave1 = "1", wave2 = "2", wave3 = "3")

# Create "across_waves" with information of presence in single waves
data.expand <- unite(data.expand, across_waves, -pid)

# Aggregate to get observations per presence in different waves
data.expand <- data.expand %>%
  group_by(across_waves) %>%
  summarize(n = n())

# Separate united variable
data.expand <- data.expand %>%
  separate(across_waves, c("Wave1", "Wave2", "Wave3"), sep = "_")

# Replace values of wave variables with N values
data.expand$Wave1[data.expand$Wave1 != "NA"] <- data.expand$n[data.expand$Wave1 != "NA"]
data.expand$Wave2[data.expand$Wave2 != "NA"] <- data.expand$n[data.expand$Wave2 != "NA"]
data.expand$Wave3[data.expand$Wave3 != "NA"] <- data.expand$n[data.expand$Wave3 != "NA"]

# Delete groups only W2/W3 (N = 3) and only W1/W3 (N = 2)
data.expand <- data.expand %>%
  filter(n > 5) %>%
  select(-n)

# Reshape to long format for the barplot illustrating the samples across waves
data.expand <- pivot_longer(data.expand, Wave1:Wave3,
                            names_to = "wave", values_to = "samples")
# The string "NA" becomes a real NA here, which triggers the coercion warning
data.expand$samples <- as.numeric(data.expand$samples)
## Warning: NAs introduced by coercion
# Label each sample size with its participation pattern
data.expand$samples_labels <- dplyr::recode(
  data.expand$samples,
  "532"  = "Only W1 (N = 532)",
  "292"  = "W1 and W2 (N = 292)",
  "1269" = "W1, W2 and W3 (N = 1269)",
  "482"  = "Only W2 (N = 482)",
  "843"  = "Only W3 (N = 843)"
)

# Drop the waves in which a pattern was not present, then sort by wave
data.expand <- data.expand %>% filter(!is.na(samples))
data.expand <- data.expand %>% arrange(wave)

data_plot <- data.expand
data_plot
|wave|samples|samples_labels|
|---|---|---|
|Wave1|532|Only W1 (N = 532)|
|Wave1|292|W1 and W2 (N = 292)|
|Wave1|1269|W1, W2 and W3 (N = 1269)|
|Wave2|482|Only W2 (N = 482)|
|Wave2|292|W1 and W2 (N = 292)|
|Wave2|1269|W1, W2 and W3 (N = 1269)|
|Wave3|843|Only W3 (N = 843)|
|Wave3|1269|W1, W2 and W3 (N = 1269)|
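If we only need the number of respondents per participation pattern, a shorter route is possible. The following is a sketch under assumptions rather than a replacement for the steps above: it works on toy data (`data_toy` and its values are hypothetical), and it builds a pattern string per respondent and then counts the patterns.

```r
library(dplyr)

# Toy long-format panel (hypothetical values): one row per observed pid x wave
data_toy <- tibble(
  pid      = c(1, 1, 1, 2, 2, 3, 4, 4, 4),
  wave.num = c(1, 2, 3, 1, 2, 2, 1, 2, 3)
)

# One pattern string per respondent, e.g. "W1, W2", then count each pattern
pattern_counts <- data_toy %>%
  group_by(pid) %>%
  summarize(pattern = paste0("W", sort(wave.num), collapse = ", "),
            .groups = "drop") %>%
  count(pattern, name = "n")

pattern_counts
```

This yields one row per pattern with its frequency, but it does not produce the wave-by-pattern layout that the stacked bar plot needs, so the longer reshaping is still required for the figure.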
Finally, we plot the participation across waves in Figure 5.18.
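A stacked bar plot of this kind can be sketched with ggplot2 as follows. This is a minimal sketch and not necessarily the code behind the figure: the data frame is rebuilt by hand from the values in the table above, and the axis labels, legend title and theme are assumptions made for this example.

```r
library(ggplot2)

# Values taken from the data_plot table above
data_plot <- data.frame(
  wave    = c("Wave1", "Wave1", "Wave1", "Wave2", "Wave2", "Wave2",
              "Wave3", "Wave3"),
  samples = c(532, 292, 1269, 482, 292, 1269, 843, 1269),
  samples_labels = c(
    "Only W1 (N = 532)", "W1 and W2 (N = 292)", "W1, W2 and W3 (N = 1269)",
    "Only W2 (N = 482)", "W1 and W2 (N = 292)", "W1, W2 and W3 (N = 1269)",
    "Only W3 (N = 843)", "W1, W2 and W3 (N = 1269)"
  )
)

# geom_col() stacks bars that share an x value by default (position = "stack")
p <- ggplot(data_plot, aes(x = wave, y = samples, fill = samples_labels)) +
  geom_col() +
  labs(x = "Wave", y = "Number of participants",
       fill = "Participation pattern") +  # labels are assumptions
  theme_minimal()
p

# Total bar height per wave
wave_totals <- tapply(data_plot$samples, data_plot$wave, sum)
wave_totals
```

Each bar's total height is the number of participants in that wave (2093, 2043 and 2112), and the fill colour shows how that total splits across participation patterns.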
- Try to produce such a graph with a panel survey that you are currently using. Store the panel data in long format, keep only the participant ID and the wave number, rename these to pid and wave.num and then start with the code.
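For the exercise, the preparation of your own panel data might look like the sketch below. The input data frame `my_panel` and its column names (`respondent_id`, `survey_wave`, `trust`) are hypothetical placeholders, so replace them with the names in your own dataset.

```r
library(dplyr)

# Hypothetical panel in long format with its own column names
my_panel <- tibble(
  respondent_id = c(101, 101, 102, 103, 103, 103),
  survey_wave   = c(1, 2, 1, 1, 2, 3),
  trust         = c(3, 4, 2, 5, 5, 4)
)

# Keep only the ID and wave number, renamed to the names used in this lab
data_own <- my_panel %>%
  select(pid = respondent_id, wave.num = survey_wave) %>%
  distinct()  # guard against duplicated respondent x wave rows

data_own
```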
Bauer, Paul C, Frederic Gerdon, Florian Keusch, and Frauke Kreuter. 2020. “The Impact of the GDPR Policy on Data Sharing/Privacy Attitudes.” Preliminary Draft, 1–22.