5.6 Numeric vs. various variables

  • Here we’ll reproduce parts of Figure 5.10 (Bauer and Clemm von Hohenberg 2020)
  • Questions:
    • What does the graph show? What are the underlying variables (and data)? (See note 1 at the end of this section.)
    • How many scales/mappings does it use? Could we reduce them?
    • What do you like, what do you dislike about the figure? What is good, what is bad?
    • What kind of information could we add to the graph (if any)?
    • How would you approach a replication of the graph?


Figure 5.10: Several categorical variables



5.6.1 Lab: Data & Code

  • The code for a subset of Figure 5.10 is shown below (and creates Figure 5.11).

  • Learning objectives

    • How to generate ggplot plots in loops (aes_string)
    • How to visualize a numeric variable (Y) vs. different variables (X)
    • How to create graphs conditional on loop elements depending on variable types

Let’s first check out (and load) the datasets that underlie the plot.

  • data_loop: Contains the variable names (variable), labels (label), and types (type) of the covariates.
    • We’ll loop over the rows of this dataframe (it’s ordered by variable importance).
  • data_heterogeneity: Contains covariate values across individuals, as well as a prediction for each individual (these are predictions of a causal effect).
    • This is the data that gets visualized.

data_loop:

Importance  variable                 label                 type
0.3680689   trust_source_mainstream  Mainstr. media trust  continuous
0.1248805   vote_choice_afd_num      Vote choice AfD       categorical
0.0786331   income_num               Income                categorical

data_heterogeneity (excerpt):

predictions  trust_source_mainstream  vote_choice_afd_num  income_num
1.1508661    3.2857143                0                    11
0.6617550    3.0000000                0                    6
0.4056240    1.0000000                0                    11
0.3809769    2.0000000                0                    3
0.6726859    2.5714286                0                    3
0.2334704    0.5714286                1                    10
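
If the original data files are not at hand, the rows printed above can be typed in by hand to experiment with the loop. The following is a minimal sketch under that assumption. It recreates only the displayed rows (the full data_heterogeneity presumably contains many more individuals) and uses tibbles because the loop code treats data_plot as a tibble.

```r
library(tibble)

# data_loop exactly as printed above (three covariates, ordered by importance)
data_loop <- tribble(
  ~Importance, ~variable,                 ~label,                 ~type,
  0.3680689,   "trust_source_mainstream", "Mainstr. media trust", "continuous",
  0.1248805,   "vote_choice_afd_num",     "Vote choice AfD",      "categorical",
  0.0786331,   "income_num",              "Income",               "categorical"
)

# Only the six printed rows of data_heterogeneity, not the full dataset
data_heterogeneity <- tribble(
  ~predictions, ~trust_source_mainstream, ~vote_choice_afd_num, ~income_num,
  1.1508661,    3.2857143,                0,                    11,
  0.6617550,    3.0000000,                0,                    6,
  0.4056240,    1.0000000,                0,                    11,
  0.3809769,    2.0000000,                0,                    3,
  0.6726859,    2.5714286,                0,                    3,
  0.2334704,    0.5714286,                1,                    10
)

# Every variable listed in data_loop should exist as a column to plot
all(data_loop$variable %in% names(data_heterogeneity))
```

With so few rows the loess smoothers in the plots may emit warnings, so this excerpt is only useful for checking that the code runs, not for reproducing the figure.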



On the basis of data_loop and data_heterogeneity we then write a loop that cycles through the rows of data_loop and generates the corresponding plots.

  • What varies across iterations is the variable name, the label, and the variable type.
  • There are two variable types: continuous (numeric) and categorical.
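# Packages used below: dplyr (%>%, select), ggplot2 (plots), gridExtra (layout)
library(dplyr)
library(ggplot2)
library(gridExtra)

for (i in seq_len(nrow(data_loop))) {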
  #print(i)
  
  # Define objects taking them from the looping dataframe
  var_name <- data_loop$variable[i]
  var_label <- data_loop$label[i]
  var_type <- data_loop$type[i]
  
  # Create a plot letter (A, B, C, ...) from the loop index
  plot_number <- LETTERS[i]

  # Rotate the x-axis labels only for income_num
  angle <- if (var_name %in% c("income_num")) 45 else 0
  
  # Select data for plot
  data_plot <- data_heterogeneity %>% select(all_of(var_name), predictions)
  # all_of() selects the column named by the string stored in var_name;
  # predictions is selected by its bare name
  
  # CREATE PLOT DEPENDING ON VARIABLE TYPE
  if(var_type == "continuous") {
    
    p <- ggplot(data_plot, aes_string(x = as.name(var_name), 
                                      y = as.name("predictions"))) +
      geom_point(alpha = 3/10) +
      geom_smooth(method = "loess", span = 1, se = FALSE, colour = "gray") +
      labs(title = paste0("(", plot_number, ") ", var_label)) + 
      theme_light() +
      theme(axis.text.x = element_text(size = 6, angle = angle),
            axis.title.y = element_blank(),
            axis.title.x = element_blank(),
            plot.title = element_text(size = 8))
    
    } else {

      # Round the covariate and convert it to a factor so it is treated as discrete
      data_plot[[var_name]] <- factor(round(data_plot[[var_name]]))

      p <- ggplot(data_plot, 
                  aes_string(x = as.name(var_name),
                             y = as.name("predictions"))) + 
        geom_boxplot() +
        geom_smooth(method = "loess", se=FALSE, aes(group=1), colour="gray") +
        labs(title = paste0("(", plot_number, ") ", var_label)) + 
        #scale_x_discrete(labels = labels) +
        theme_light() +
        theme(axis.text.x = element_text(size = 6, angle = angle,
                                         hjust = 1, vjust = 1),
              axis.title.y = element_blank(),
              axis.title.x = element_blank(),
              plot.title = element_text(size = 8))
    }
  # Store the plot under the name p1, p2, ... for use after the loop
  assign(paste0("p", i), p)
}

  
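# Arrange the three plots side by side with a shared y-axis label (gridExtra)
gridExtra::grid.arrange(gridExtra::arrangeGrob(p1, p2, p3, ncol = 3),
                        left = grid::textGrob("Predicted source\ntreatment effect",
                                              rot = 90, vjust = 1))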

Figure 5.11: Numeric vs different variable types
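
The loop above relies on aes_string() and assign(). aes_string() has been soft-deprecated since ggplot2 3.0.0, so the following is a hedged sketch of one possible alternative, not the authors' code: it maps the column via the .data pronoun and collects the plots in a list. The names toy_loop, toy_data, and make_plot are hypothetical, and the data are simulated purely for illustration.

```r
library(ggplot2)

# Hypothetical, simulated stand-ins for data_loop and data_heterogeneity
set.seed(1)
toy_loop <- data.frame(
  variable = c("trust", "income"),
  label    = c("Media trust", "Income"),
  type     = c("continuous", "categorical"),
  stringsAsFactors = FALSE
)
toy_data <- data.frame(
  predictions = rnorm(60),
  trust       = runif(60, min = 0, max = 4),
  income      = sample(1:5, size = 60, replace = TRUE)
)

# Build one plot; the geom depends on the variable type
make_plot <- function(data, var_name, var_label, var_type, tag) {
  d <- data[, c(var_name, "predictions")]
  if (var_type != "continuous") {
    d[[var_name]] <- factor(round(d[[var_name]]))
  }
  p <- ggplot(d, aes(x = .data[[var_name]], y = predictions))
  p <- if (var_type == "continuous") {
    p + geom_point(alpha = 0.3)
  } else {
    p + geom_boxplot()
  }
  p + labs(title = paste0("(", tag, ") ", var_label)) + theme_light()
}

# One plot per row of toy_loop, collected in a list named by variable
plots <- Map(
  function(v, l, t, tag) make_plot(toy_data, v, l, t, tag),
  toy_loop$variable, toy_loop$label, toy_loop$type,
  LETTERS[seq_len(nrow(toy_loop))]
)

# To display them side by side (requires gridExtra):
# gridExtra::grid.arrange(grobs = plots, ncol = 2)
```

Wrapping the plotting code in a function is a deliberate choice here. aes() evaluates .data[[var_name]] only when the plot is drawn, so in a plain for loop every stored plot would look up whatever value var_name holds after the loop has finished. Each function call has its own environment, which keeps the right variable name attached to each plot.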

References

Bauer, Paul C, and Bernhard Clemm von Hohenberg. 2020. “Believing and Sharing Information by Fake Sources: An Experiment.”


  1. Data: The underlying data are four categorical variables with the same ordered categories. In the graph, these four variables are combined into two categorical variables: Platform (unordered categories) and Account (three ordered categories).