Chapter 2 M2: Descriptives

This module’s focus is on describing variables and values that we’ve observed. We’ll see both visualization tools and some numerical summary measures, whether for a single variable or for the relationships between two (or more!) variables. Some key skills are:

  • Interpret contingency tables and proportions
  • Interpret bar plots
  • Be snarky about pie charts
  • Describe the distribution of numerical values in terms of shape, center, and spread
    • Find and interpret numerical summary measures like the mean, median, SD, variance, and IQR
    • Discuss and decide when it’s appropriate to use each kind of summary (considering things like outliers and skew)
    • Use terms like symmetry, skew, tail, uni/bi/multimodal, etc. to describe shape
  • Interpret histograms and box plots, and decide which one would be more appropriate for a given context/dataset
  • Explain why we might want to use a transformation on a quantitative variable
    • You don’t need to be able to magically decide on the Best Transformation for a given situation :) But it’s good to think about the kind of effect some common transformations can have. For example, the log “shrinks in” large values to reduce right skew, while linear transformations like unit changes don’t alter the shape of a distribution, just its center and spread.
  • Interpret visualizations for multiple variables (mosaic plots, different types of bar plots, scatterplots, faceted and side-by-side plots)
  • Decide which visualizations are most appropriate for a given dataset/context
  • Describe the relationship between numerical variables (think direction, form, and strength: is the relationship positive/negative, linear/nonlinear, weak/moderate/strong?)
  • Interpret a correlation value
  • Write the basic linear regression equation and explain what each piece represents
  • Define a residual and interpret it in context
  • Formally interpret \(R^2\) (the “proportion of variation in the response explained by the model” thing)
  • Notice when someone is doing unreasonable extrapolation, explain the problem, and avoid it in your own work :)
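
To make a few of these skills concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made up for illustration: the `incomes`, `x`, and `y` values are not real data, and `skewness` and `fit_line` are hypothetical helper names, not functions from any course package. It compares the mean and median of a right-skewed variable, checks that a log transformation reduces skew while a unit change leaves the shape alone, and then fits a least-squares line to get residuals and \(R^2\).

```python
import math
import statistics as st

# Made-up, right-skewed values (say, incomes in thousands); not real data.
incomes = [20, 22, 25, 27, 30, 35, 40, 60, 120, 400]


def skewness(values):
    """Moment-based skewness: the average cubed z-score (0 for a symmetric sample)."""
    m = st.fmean(values)
    s = st.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y_hat = b0 + b1 * x."""
    mx, my = st.fmean(x), st.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


# Center and spread: the one huge value pulls the mean and SD much more
# than it pulls the median and IQR.
mean_income = st.fmean(incomes)
median_income = st.median(incomes)
sd_income = st.stdev(incomes)
q1, _, q3 = st.quantiles(incomes, n=4)  # quartile conventions differ between tools
iqr_income = q3 - q1

# Transformations: the log shrinks in large values; a unit change only rescales.
logged = [math.log(v) for v in incomes]
rescaled = [1000 * v for v in incomes]  # thousands -> single units
skew_raw = skewness(incomes)
skew_log = skewness(logged)
skew_rescaled = skewness(rescaled)

# Regression on a second made-up dataset: fitted values, residuals, and R^2.
x = [1, 2, 3, 4, 5, 6]
y = [2.3, 2.9, 4.2, 4.8, 6.1, 6.6]
b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # observed minus predicted
sse = sum(e ** 2 for e in residuals)                # variation left unexplained
sst = sum((yi - st.fmean(y)) ** 2 for yi in y)      # total variation in the response
r2 = 1 - sse / sst

print(f"mean={mean_income:.1f} median={median_income:.1f} "
      f"SD={sd_income:.1f} IQR={iqr_income:.1f}")
print(f"skew raw={skew_raw:.2f} log={skew_log:.2f} rescaled={skew_rescaled:.2f}")
print(f"y_hat = {b0:.2f} + {b1:.2f} * x, R^2 = {r2:.3f}")
```

Each residual is an observed value minus the value the line predicts for it, and \(R^2\) is one minus the share of the response’s variation that the residuals still carry. The line was fit using `x` values from 1 to 6 only, so plugging in something like `x = 60` would be exactly the kind of extrapolation to be wary of.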