5.8 Numeric vs. numeric: Scatterplots + smoother
5.8.1 Data & Packages & functions
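geom_smooth(): adds a smoother (a fitted line that summarizes the trend) to a scatterplot
- geom_smooth(se = FALSE): hides the confidence interval around the smooth, which is displayed by default (se = TRUE)
- method = "loess": the default for small n; fits a smooth local regression (as described in ?loess)
  - The wiggliness of the line is controlled by the span parameter, which ranges from 0 (exceedingly wiggly) to 1 (not so wiggly)
- For n ≥ 1,000 (counted in the largest group), an alternative smoothing algorithm is used by default: a generalized additive model fitted with mgcv::gam(), for which span has no effect (Wickham 2016, 19)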
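The options above can be compared side by side in one plot. The sketch below uses simulated data (an assumption for illustration, not the lab data): a sine-shaped trend plus noise. It draws a wiggly loess curve (span = 0.3) without a confidence band (se = FALSE) and a smoother loess curve (span = 1) with the default band. method = "loess" is set explicitly because span is only used by the loess smoother.

```r
library(ggplot2)

# Simulated data (assumption for illustration): sine-shaped trend plus noise
set.seed(42)
sim <- data.frame(x = seq(0, 10, length.out = 200))
sim$y <- sin(sim$x) + rnorm(nrow(sim), sd = 0.3)

p <- ggplot(sim, aes(x = x, y = y)) +
  geom_point(alpha = 0.5) +
  # Small span: wiggly local regression; se = FALSE hides the confidence band
  geom_smooth(method = "loess", formula = y ~ x, span = 0.3,
              se = FALSE, color = "red") +
  # Span of 1: much smoother local regression, with the default confidence band
  geom_smooth(method = "loess", formula = y ~ x, span = 1,
              color = "blue")

print(p)

# Inspect the computed smoother data: layer 2 is the red curve, layer 3 the blue one
built <- ggplot_build(p)$data
wiggly <- built[[2]]
smooth <- built[[3]]

# Total variation of each fitted curve: larger values mean a wigglier line
sum(abs(diff(wiggly$y)))
sum(abs(diff(smooth$y)))
```

Only the blue layer carries ymin/ymax columns, since the red layer was fitted with se = FALSE.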
5.8.2 Graph
- Figures 5.13 and 5.14 provide two examples.
- Questions:
  - What does the graph show? What are the underlying variables (and data)?
  - How many scales/mappings does it use? Could we reduce them?
  - What do you like and what do you dislike about the figure? What is good, what is bad?
  - What kind of information could we add to the graph (if any)?
  - How would you approach a replication of the graph?

Figure 5.13: Small multiples of scatterplots

Figure 5.14: Scatterplot with colored subsets
5.8.3 Lab: Data & Code
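# Packages: readr (read_csv), dplyr (filter, %>%), ggplot2 (plotting)
library(tidyverse)

# data_twitter_influence.csv
data <- read_csv(sprintf("https://docs.google.com/uc?id=%s&export=download",
                         "1dLSTUJ5KA-BmAdS-CHmmxzqDFm2xVfv6"))

# Figure 5.13: small multiples, one scatterplot panel per party
ggplot(data %>% filter(followers_count < 50000), # drop accounts with 50,000+ followers
       aes(x = account_age_years,
           y = followers_count)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~party) +
  ylab("Number of followers") +
  xlab("Account age (in years)") +
  # limits removes observations outside 0-10 years before the smoothers are fitted
  scale_x_continuous(breaks = c(0, 5, 10), limits = c(0, 10)) +
  # Linear fit (black line, light gray confidence band), estimated separately per panel
  geom_smooth(method = "lm", formula = y ~ x, color = "black", fill = "lightgray") +
  # Wiggly local regression; span only applies to loess, so the method is set explicitly
  geom_smooth(method = "loess", formula = y ~ x, span = 0.3) +
  theme_light()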
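With facet_wrap(), geom_smooth() fits a separate model for each panel. The sketch below checks this on the built-in mtcars data (a stand-in for the Twitter data, which requires a download): it extracts the plotted linear fit from the first panel via ggplot_build() and compares it with lm() refitted by hand on that panel's subset.

```r
library(ggplot2)

# Built-in mtcars data: fuel economy vs. weight, one panel per number of cylinders
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~cyl) +
  geom_smooth(method = "lm", formula = y ~ x, color = "black", fill = "lightgray")

# Layer 2 holds the smoother; PANEL 1 is the first facet (cyl == 4)
line_data <- ggplot_build(p)$data[[2]]
panel_1 <- line_data[line_data$PANEL == 1, ]

# Refit the same linear model by hand on the cyl == 4 subset only
fit_4 <- lm(mpg ~ wt, data = subset(mtcars, cyl == 4))
by_hand <- unname(predict(fit_4, newdata = data.frame(wt = panel_1$x)))

# TRUE if the plotted line equals the per-panel regression
isTRUE(all.equal(panel_1$y, by_hand))
```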
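# Figure 5.14: all parties in one panel, distinguished by color
ggplot(data %>% filter(followers_count < 50000),
       aes(x = account_age_years,
           y = followers_count,
           color = factor(party))) +
  geom_point(alpha = 0.5) +
  # facet_wrap(~party) +  # no facets here: party is encoded by color instead
  ylab("Number of followers") +
  xlab("Account age (in years)") +
  scale_x_continuous(breaks = c(0, 5, 10), limits = c(0, 10)) +
  # One linear fit per party; fill uses the same expression as color so that
  # the lines and their confidence bands share a single legend
  geom_smooth(method = "lm", formula = y ~ x, aes(fill = factor(party))) +
  labs(color = "Party", fill = "Party") +
  theme_light()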
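Mapping color (and fill) to a grouping variable makes geom_smooth() fit one line per group in a single panel. A minimal sketch on the built-in mtcars data, where cyl plays the role of party; cyl is numeric, so it is wrapped in factor() to get a discrete scale, mirroring factor(party) above.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(alpha = 0.5) +
  # fill mapped with the same expression as color, so one legend covers both
  geom_smooth(method = "lm", formula = y ~ x, aes(fill = factor(cyl))) +
  labs(color = "Cylinders", fill = "Cylinders")

print(p)

# One fitted line (group) per cylinder count in the smoother layer
line_data <- ggplot_build(p)$data[[2]]
length(unique(line_data$group))
```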
References
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer.