## 5.5 Numeric vs. categorical: Various plot types

### 5.5.1 Data & Packages & functions

• Data: 1 categorical variable, 1 numeric variable
• Packages & functions:
• geom_jitter() offers the same control over aesthetics as geom_point() (size, color, shape)
• geom_boxplot(), geom_violin(): You can control the outline color (color) or the internal fill color (fill)
• Strengths and weaknesses
• Boxplots summarize the bulk of the distribution with only five numbers
• Jittered plots show every point but only work with relatively small datasets
• Violin plots give the richest display, but rely on the calculation of a density estimate, which can be hard to interpret
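
The aesthetic controls and the five-number summary mentioned above can be tried out in a minimal sketch. The data frame `d` and its columns `group` and `value` are synthetic stand-ins invented for illustration, not the lab data, and the specific colors and sizes are arbitrary choices.

```r
library(ggplot2)

# Synthetic data (an assumption for illustration): 3 groups, 50 values each
set.seed(1)
d <- data.frame(
  group = rep(c("a", "b", "c"), each = 50),
  value = c(rnorm(50, mean = 0), rnorm(50, mean = 1), rnorm(50, mean = 2))
)

# geom_jitter() takes the same size, colour and shape settings as geom_point();
# height = 0 keeps the numeric values unchanged and only spreads points sideways
p_jitter <- ggplot(d, aes(x = group, y = value)) +
  geom_jitter(width = 0.2, height = 0, size = 1.5, colour = "steelblue", shape = 16)

# For boxplots and violins, colour sets the outline and fill sets the interior
p_box <- ggplot(d, aes(x = group, y = value)) +
  geom_boxplot(colour = "black", fill = "grey80")

p_violin <- ggplot(d, aes(x = group, y = value)) +
  geom_violin(colour = "black", fill = "grey80")

# Tukey's five numbers per group: minimum, lower hinge, median, upper hinge, maximum
five <- tapply(d$value, d$group, fivenum)
five[["a"]]
```

Note that geom_boxplot() draws its whiskers only as far as 1.5 times the interquartile range from the box and plots anything beyond as individual points, so the whisker ends need not coincide with the minimum and maximum returned by fivenum().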

### 5.5.2 Graph

• Figure 5.9 visualizes different ways of plotting a categorical vs. a numerical variable.
• Questions:
• What does the graph show? What are the underlying variables (and data)?
• How many scales/mappings does it use? Could we reduce them?
• What do you like, what do you dislike about the figure? What is good, what is bad?
• What kind of information could we add to the graph (if any)?
• How would you approach a replication of the graph?
## Parsed with column specification:
## cols(
##   screen_name = col_character(),
##   n_retweets = col_double(),
##   followers_count = col_double(),
##   party = col_character(),
##   party_color = col_character(),
##   first_name = col_character(),
##   account_created_at = col_datetime(format = ""),
##   account_age_months = col_double(),
##   account_age_years = col_double(),
##   last_name = col_character(),
##   female = col_double()
## )

Figure 5.9: Boxplots and jittered points

### 5.5.3 Lab: Data & Code

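# data_twitter_influence.csv
library(gridExtra)  # provides grid.arrange()
# p1 to p4 are four ggplot objects, one per panel (their definitions are not shown here)
grid.arrange(p1, p2, p3, p4, ncol = 2)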
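
The four objects passed to grid.arrange() are not defined in this section. The following is one possible sketch of how they might be built, not the original lab code. It assumes that the figure plots `followers_count` by `party` (both column names appear in the column specification above, but the choice is a guess), that the four panels are a jitter plot, a boxplot, a violin plot, and a boxplot with jittered points on top, and it uses a hypothetical helper `make_panels()` with a small synthetic data frame `toy` in place of data_twitter_influence.csv.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical helper: builds four plots of followers_count by party.
# Expects a data frame with the columns `party` and `followers_count`.
make_panels <- function(data) {
  base <- ggplot(data, aes(x = party, y = followers_count))
  list(
    p1 = base + geom_jitter(width = 0.2, height = 0),
    p2 = base + geom_boxplot(),
    p3 = base + geom_violin(),
    # Hide boxplot outliers so they are not drawn twice under the jittered points
    p4 = base +
      geom_boxplot(outlier.shape = NA) +
      geom_jitter(width = 0.2, height = 0, alpha = 0.5)
  )
}

# Synthetic stand-in data (an assumption); in the lab the data frame would be
# read from data_twitter_influence.csv instead
set.seed(1)
toy <- data.frame(
  party = rep(c("A", "B"), each = 30),
  followers_count = c(rexp(30, rate = 1 / 1000), rexp(30, rate = 1 / 5000))
)

panels <- make_panels(toy)

# Arrange the four panels in a 2 x 2 grid
grid.arrange(panels$p1, panels$p2, panels$p3, panels$p4, ncol = 2)
```

Building all panels from one shared `base` object keeps the x and y mappings identical across plot types, so the panels differ only in the geom used.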