5.11 Time: Means across time (or other categories)
5.11.1 Data & Packages & functions
- Data: Various one-dimensional distributions (several single variables)
- Plot type: Dot plot with error bars
- geom_errorbar(): To create error bars
- position = position_dodge(0.6): Dodge graph elements
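A minimal sketch of how these two pieces fit together, using made-up toy data (the data frame `toy` and its columns `time`, `group`, `mean` and `ci` are hypothetical and not part of the lab data). The same `position_dodge()` object is passed to both layers so that each point stays centered on its own error bar.

```r
library(ggplot2)

# Hypothetical toy data: two groups measured at two time points
toy <- data.frame(
  time  = rep(c("t1", "t2"), each = 2),
  group = rep(c("A", "B"), times = 2),
  mean  = c(40, 45, 55, 60),
  ci    = c(5, 4, 6, 3)  # half-width of each interval
)

# One dodge object, reused by every layer
pd <- position_dodge(width = 0.6)

p_toy <- ggplot(toy, aes(x = time, y = mean, color = group, group = group)) +
  geom_errorbar(aes(ymin = mean - ci, ymax = mean + ci),
                width = 0.1, position = pd) +
  geom_point(size = 3, position = pd)

p_toy
```

Without `position = pd`, the points and bars of both groups would be drawn on top of each other at the same x position.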
5.11.2 Graph
- We'll reproduce, critique, and possibly improve Figure 5.19 (Bauer et al. 2020).
- Questions:
- What does the graph show? What are the underlying variables (and data)?
- How many scales/mappings does it use? Could we reduce them?
- What do you like, what do you dislike about the figure? What is good, what is bad?
- What kind of information could we add to the graph (if any)?
- How would you approach a replication of the graph?

Figure 5.19: Means across time/categories
5.11.3 Lab: Data & Code
The code below reproduces Figure 5.19; its output is shown as Figure 5.20.
Learning objectives
- How to plot error bars
- How to dodge graph elements
The data have already been pre-processed: the dataframe contains the means together with 90% and 95% confidence intervals for different subsamples of the data (as well as for the full sample). The columns ci_90 and ci_95 hold the interval half-widths, which the plotting code adds to and subtracts from the mean. The subsamples are constructed from information on whether respondents participated in all waves or only in some of them. The dataframe also provides information on how these means should be grouped.
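How the pre-processing was done is not shown in this section. The following is only a sketch of how a summary with the same column names could be computed from respondent-level data. The raw data frame `raw`, the 0/100 coding of `gdpr.know.num`, and the normal-approximation intervals are all assumptions made for illustration, not the authors' procedure.

```r
library(dplyr)

# Hypothetical raw data: one row per respondent and wave,
# awareness coded 0/100 (assumed), NA = no answer
set.seed(1)
raw <- data.frame(
  wave = rep(c("w1", "w2"), each = 200),
  gdpr.know.num = sample(c(0, 100, NA), 400, replace = TRUE,
                         prob = c(0.45, 0.45, 0.10))
)

summary_df <- raw %>%
  group_by(wave) %>%
  summarise(
    gdpr.know.num.mean = mean(gdpr.know.num, na.rm = TRUE),
    gdpr.know.num.sd   = sd(gdpr.know.num, na.rm = TRUE),
    n       = n(),                         # all rows in the group
    n_wo_NA = sum(!is.na(gdpr.know.num)),  # rows with a valid answer
    .groups = "drop"
  ) %>%
  mutate(
    se    = gdpr.know.num.sd / sqrt(n_wo_NA),
    ci_95 = qnorm(0.975) * se,  # half-width, normal approximation (assumed)
    ci_90 = qnorm(0.95) * se
  )

summary_df
```

The lab itself skips this step and loads the finished summary directly.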
# Load readr (for read_csv) and ggplot2
library(tidyverse)

# data_gdpr_means_time.csv
data <- read_csv(sprintf("https://docs.google.com/uc?id=%s&export=download",
                         "1Ay7g1iIaCyxuDj2ce8UNc4kSRubForMN"))
## Parsed with column specification:
## cols(
## wave = col_character(),
## gdpr.know.num.mean = col_double(),
## gdpr.know.num.sd = col_double(),
## n = col_double(),
## n_wo_NA = col_double(),
## se = col_double(),
## ci_95 = col_double(),
## ci_90 = col_double(),
## participation = col_character(),
## label = col_character()
## )
# Dodge width shared by all layers so points and error bars stay aligned
pd <- position_dodge(0.6)

ggplot(data, aes(x = wave,
                 y = gdpr.know.num.mean,
                 color = factor(label),
                 group = factor(label))) +
  # 90% confidence intervals: thick bars without caps
  geom_errorbar(aes(ymin = gdpr.know.num.mean - ci_90,
                    ymax = gdpr.know.num.mean + ci_90),
                colour = "black",  # constant color overrides the mapped color
                linewidth = 1,     # ggplot2 >= 3.4.0; use size in older versions
                width = 0,
                position = pd) +
  # 95% confidence intervals: thin bars with small caps
  geom_errorbar(aes(ymin = gdpr.know.num.mean - ci_95,
                    ymax = gdpr.know.num.mean + ci_95),
                colour = "black",
                linewidth = 0.4,
                width = 0.1,
                position = pd) +
  # Group means, colored by participation pattern
  geom_point(size = 3,
             position = pd) +
  # No shape aesthetic is mapped, so this scale currently has no visible effect
  scale_shape(solid = FALSE) +
  ylim(0, 100) +
  ylab("% GDPR Awareness") +
  # Labels are matched to the waves by position
  scale_x_discrete(labels = c(
    "Wave 1 (N = 2093)\nApr 16 - 23, 2018",
    "Wave 2 (N = 2043)\nJul 24 - Aug 02, 2018",
    "Wave 3 (N = 2112)\nOct 29 - Nov 07, 2018"
  )) +
  theme_light() +
  theme(axis.title.x = element_blank()) +
  # Colors and legend labels follow the sorted levels of label
  scale_color_manual(
    values = c("black", "#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00"),
    name = "Participation",
    breaks = levels(factor(data$label)),
    labels = c(
      "Full Sample",
      "Only W1 (N = 532)",
      "Only W2 (N = 482)",
      "Only W3 (N = 843)",
      "W1 and W2 (N = 292)",
      "W1, W2 and W3 (N = 1269)"
    )
  )

Figure 5.20: Means across time/categories
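One possible variation, offered as a sketch rather than as the authors' code: an error bar with `width = 0` is just a vertical line, so `geom_linerange()` can draw both intervals, and letting the intervals inherit the group color removes the separate black layer. The toy data frame and its column names below are hypothetical, and `linewidth` assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Hypothetical toy summary with interval half-widths
toy <- data.frame(
  wave  = rep(c("w1", "w2"), each = 2),
  label = rep(c("a_full", "b_panel"), times = 2),
  mean  = c(50, 48, 62, 65),
  ci_90 = c(3, 4, 3, 4),
  ci_95 = c(3.6, 4.8, 3.6, 4.8)
)

pd <- position_dodge(width = 0.6)

p_alt <- ggplot(toy, aes(x = wave, y = mean, color = label, group = label)) +
  # Thin line: 95% interval
  geom_linerange(aes(ymin = mean - ci_95, ymax = mean + ci_95),
                 linewidth = 0.4, position = pd) +
  # Thick line: 90% interval, drawn on top of the thin one
  geom_linerange(aes(ymin = mean - ci_90, ymax = mean + ci_90),
                 linewidth = 1.2, position = pd) +
  geom_point(size = 3, position = pd) +
  ylim(0, 100)

p_alt
```

Whether colored intervals read better than black ones is a judgment call; the point is only that the thick/thin encoding of the two confidence levels does not depend on error-bar caps.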
References
Bauer, Paul C, Frederic Gerdon, Florian Keusch, and Frauke Kreuter. 2020. “The Impact of the GDPR Policy on Data Sharing/Privacy Attitudes.” Preliminary Draft, 1–22.