## 6.6 Graph: Coefficient plots with facetting and coloring

• Figure 6.4 below combines facetting and coloring because the models were estimated in subsets of the data defined by two variables: Infant.Mortality_cat and Examination_cat.
• Questions:
  • What does the graph show? What are the underlying variables (and data)?
  • How many scales/mappings does it use? Could we reduce them?
  • What do you like, what do you dislike about the figure? What is good, what is bad?
  • What kind of information could we add to the graph (if any)?
  • How would you approach a replication of the graph?

### 6.6.1 Lab: Data & code

Data preparation is somewhat more complicated when we want to show both facets and colors, i.e., when we want to produce coefficient plots for subsets defined by two variables. Again we work with nested dataframes and with model results that are nested in a dataframe.
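
To make the idea of a nested dataframe concrete, here is a minimal sketch. It assumes a simple median split for the two categorical variables; this split and the names swiss_cat, nested_counts and n_obs are illustrative only, and the definitions used for Figure 6.4 may differ.

```r
library(dplyr)
library(tidyr)
library(purrr)

# ASSUMPTION: both categorical variables are built with a median split.
# This is only for illustration and is not the original coding.
swiss_cat <- swiss %>%
  mutate(
    Examination_cat = if_else(Examination > median(Examination), "high", "low"),
    Infant.Mortality_cat = if_else(Infant.Mortality > median(Infant.Mortality), "high", "low")
  )

# nest() keeps one row per combination of the two subset variables;
# the remaining columns are stored as a small dataframe in the list-column `data`
nested_counts <- swiss_cat %>%
  select(Examination_cat, Infant.Mortality_cat, Fertility, Agriculture, Education, Catholic) %>%
  nest(data = c(Fertility, Agriculture, Education, Catholic)) %>%
  # number of observations available in each subset
  mutate(n_obs = map_int(data, nrow))

nested_counts
```

Looking at n_obs before fitting is a quick way to spot subsets that are too small for a model with three predictors and an intercept.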

We proceed in several steps:

1. We split the dataset into subsets according to Examination_cat AND Infant.Mortality_cat.
2. We estimate the linear models in those subsets (see map(..lm(...))).
3. We tidy the estimates to obtain a tidy dataframe.
4. We compute 90% and 95% confidence intervals, again as dataframes (see map(fit, conf.level = 0.90, confint_tidy)).
5. We rename the variables in the confidence interval dataframes (see rename_all(...)).
6. We unnest() the data, obtaining one dataframe that contains the estimates and intervals across all subsets defined by Examination_cat AND Infant.Mortality_cat.
   • Here we had to delete the results for two subsets because they did not contain enough observations.
   • See e.g., filter(!(Examination_cat == "higher" & Infant.Mortality_cat == "high"))
7. Finally, we filter out the intercepts from those estimates.
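
```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# swiss must already contain Examination_cat and Infant.Mortality_cat
results <- swiss %>%
  # Keep only observations with both subset variables defined
  filter(!is.na(Infant.Mortality_cat), !is.na(Examination_cat)) %>%
  select(Examination_cat, Infant.Mortality_cat, Fertility, Agriculture, Education, Catholic) %>%
  # One row per subset; the model variables are nested in `data`
  nest(data = c(Fertility, Agriculture, Education, Catholic)) %>%
  # Fit one model per subset, then tidy the estimates and confidence intervals
  # (confint_tidy() is from broom; newer broom releases may no longer export it)
  mutate(fit = map(data, ~ lm(Fertility ~ Catholic + Agriculture + Education, data = .)),
         results = map(fit, tidy),
         results_90 = map(fit, conf.level = 0.90, confint_tidy),
         results_95 = map(fit, conf.level = 0.95, confint_tidy)) %>%
  # Suffix the interval columns so the 90% and 95% bounds stay distinct
  mutate(results_90 = map(results_90, ~ rename_all(., function(x) paste0(x, "_90"))),
         results_95 = map(results_95, ~ rename_all(., function(x) paste0(x, "_95")))) %>%
  # Drop the two subsets with too few observations
  filter(!(Examination_cat == "higher" & Infant.Mortality_cat == "high")) %>%
  filter(!(Examination_cat == "highest" & Infant.Mortality_cat == "low")) %>%
  unnest(c(results, results_90, results_95)) %>%
  rename(Variable = term,
         Coefficient = estimate,
         SE = std.error) %>%
  # Remove the intercepts
  filter(Variable != "(Intercept)")
```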
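
The confidence interval step can also be written with tidy() alone, which may help if confint_tidy() is not available in your installed broom release. The sketch below is a variation, not the code used for Figure 6.4: it fits the model on the full swiss data rather than on subsets, and it assumes that tidy() accepts conf.int and conf.level for lm objects. The object names ci_90, ci_95 and coefs are chosen for illustration.

```r
library(dplyr)
library(broom)

# One model on the full data, to keep the sketch small
fit <- lm(Fertility ~ Catholic + Agriculture + Education, data = swiss)

# tidy() returns the estimates together with the interval bounds;
# the bounds are renamed so the two confidence levels can sit side by side
ci_90 <- tidy(fit, conf.int = TRUE, conf.level = 0.90) %>%
  select(term, estimate, std.error,
         conf.low_90 = conf.low, conf.high_90 = conf.high)

ci_95 <- tidy(fit, conf.int = TRUE, conf.level = 0.95) %>%
  select(term, conf.low_95 = conf.low, conf.high_95 = conf.high)

# Join on the term name and drop the intercept, as in the steps above
coefs <- left_join(ci_90, ci_95, by = "term") %>%
  rename(Variable = term, Coefficient = estimate, SE = std.error) %>%
  filter(Variable != "(Intercept)")

coefs
```

The resulting columns carry the same names as in the nested version, so the same idea could be applied inside map() for each subset.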

Plotting the data is straightforward again. We use the same code as above but now we combine facetting and coloring:

• facet_grid(Examination_cat ~ .)
• colour = Infant.Mortality_cat within aes()

```r
library(ggplot2)

# Coefficient plot: colour by Infant.Mortality_cat, one facet row per Examination_cat
ggplot(results, aes(x = Variable, y = Coefficient, colour = Infant.Mortality_cat)) +
  # Dashed reference line at zero
  geom_hline(yintercept = 0, colour = gray(1/2), lty = 2) +
  # Point estimates, dodged so the colours do not overlap
  geom_point(position = position_dodge(width = 0.3)) +
  # Thick line: 90% confidence interval
  geom_linerange(aes(ymin = conf.low_90, ymax = conf.high_90),
                 lwd = 1, position = position_dodge(width = 0.3)) +
  # Thin line: 95% confidence interval
  geom_linerange(aes(ymin = conf.low_95, ymax = conf.high_95),
                 lwd = 1/2, position = position_dodge(width = 0.3)) +
  ggtitle("Outcome: Fertility (Subsets: Infant.Mortality, Examination)") +
  coord_flip() +
  facet_grid(Examination_cat ~ .)
```