3.2 Graphical Measures

The following table summarizes key graphical measures along with guidance on when and why to use each. More detailed explanations, visual examples, and sample code will be discussed after this table.

Summary of key graphical measures
Graph Type When to Use Why It’s Useful
Histogram - Exploring the distribution (shape, center, spread) of a single continuous variable - Quickly identifies frequency, modes, skewness, and potential outliers
- Provides an overview of data “shape”
Box-and-Whisker Plot - Comparing the same continuous variable across multiple categories
- Identifying median, IQR, and outliers
- Shows distribution at a glance (median, quartiles)
- Highlights outliers and potential group differences
Stem-and-Leaf Plot - Small, single-variable datasets where you want a textual yet visual distribution view - Reveals the distribution while preserving actual data values
- Easy to spot clusters and gaps for small datasets
Notched Boxplot - Similar to a standard boxplot but with confidence intervals around the median - If notches don’t overlap, it suggests the medians differ significantly
- Helps clarify whether differences in medians are likely meaningful
Bagplot (2D Boxplot) - Bivariate data where you want a 2D “boxplot”-style overview
- Identifying outliers in two-dimensional space
- Depicts both central region (“bag”) and potential outliers
- Ideal for discovering clusters or unusual points in two continuous variables
Boxplot Matrix - Multiple continuous variables that you want to compare side-by-side - Quickly compares distributions of many variables simultaneously
- Helpful for spotting differences in median, spread, and outliers
Violin Plot - Same use case as boxplot but you want more detail on the distribution’s shape - Combines boxplot features with a density plot
- Shows where data are concentrated or sparse within each category
Scatterplot - Two continuous variables to check for relationships, trends, or outliers - Visualizes correlation or non-linear patterns
- Aids in identifying clusters or extreme values
Pairwise Scatterplots - Initial exploration of several variables at once - Enables a quick scan of relationships between all variable pairs
- Useful for identifying multivariate patterns or potential correlation structures

Tips for Selecting the Right Plot:

  • Focus on Your Question: Are you comparing groups, investigating correlations, or just exploring the overall shape of the data?

  • Match the Plot to Your Data Type: Continuous vs. categorical data often dictates your choice of chart.

  • Mind the Data Size: Some plots become cluttered or lose clarity with very large datasets (e.g., stem-and-leaf), while others may be less informative with very few data points.

3.2.1 Shape

Properly labeling your graphs is essential to ensure that viewers can easily understand the data presented. Below are several examples of graphical measures used to assess the shape of a dataset.

# Generate random data for demonstration purposes
data <- rnorm(100)

# Histogram: A graphical representation of the distribution 
# of a dataset.
hist(
    data,
    labels = TRUE,
    col = "grey",
    breaks = 12,
    main = "Histogram of Random Data",
    xlab = "Value",
    ylab = "Frequency"
)
Histogram displays frequency distribution of values ranging from -2 to 3. The highest frequency is 20 at value 1, with other notable frequencies being 16 at value 0 and 15 at value -1. Frequencies decrease towards the extremes of the value range.

Figure 2.2: Histogram of Random Data

# Interactive Histogram: Using 'highcharter' for 
# a more interactive visualization.
# pacman::p_load("highcharter")
# hchart(data, type = "column", name = "Random Data Distribution")
# Box-and-Whisker Plot: Useful for visualizing the distribution 
# and identifying outliers.
boxplot(
    count ~ spray,
    data = InsectSprays,
    col = "lightgray",
    main = "Boxplot of Insect Sprays",
    xlab = "Spray Type",
    ylab = "Count"
)
Boxplot of insect sprays showing the distribution of counts for six spray types labeled A to F. The y-axis represents the count, ranging from 0 to 25. Spray types A, B, and F have higher median counts, with F having the highest. Types C, D, and E have lower median counts, with some outliers present.

Figure 2.4: Boxplot of Insect Sprays

# Notched Boxplot: The notches indicate a confidence interval 
# around the median.
boxplot(
    len ~ supp * dose,
    data = ToothGrowth,
    notch = TRUE,
    col = c("gold", "darkgreen"),
    main = "Tooth Growth by Supplement and Dose",
    xlab = "Supplement and Dose",
    ylab = "Length"
)
Notched Boxplot illustrating tooth growth by supplement type and dose. The x-axis represents different supplement and dose combinations: OJ.0.5, VC.0.5, OJ.1, VC.1, OJ.2, and VC.2. The y-axis shows tooth length. Each box plot displays the median, quartiles, and potential outliers for each group. Yellow and green colors differentiate the supplement types.

Figure 2.5: Tooth Growth by Supplement and Dose

If the notches of two boxes do not overlap, this suggests that the medians differ significantly.

# Stem-and-Leaf Plot: Provides a quick way to visualize 
# the distribution of data.
stem(data)
#> 
#>   The decimal point is at the |
#> 
#>   -1 | 97555
#>   -1 | 4433221100
#>   -0 | 9998888877776655555
#>   -0 | 443333222222221100
#>    0 | 011111222222233333334444
#>    0 | 555577777899
#>    1 | 00111222
#>    1 | 67
#>    2 | 01
# Bagplot - A 2D Boxplot Extension: Visualizes the spread and 
# identifies outliers in two-dimensional data.
pacman::p_load(aplpack)
attach(mtcars)
bagplot(wt,
        mpg,
        xlab = "Car Weight",
        ylab = "Miles Per Gallon",
        main = "Bagplot of Car Weight vs. Miles Per Gallon")
Bagplot of car weight versus miles per gallon. The X-axis represents car weight, ranging from 2 to 5, and the Y-axis represents miles per gallon, ranging from 10 to 30. The plot includes a central blue area indicating the data's median and spread, with outliers marked outside this region. Red lines and a central orange arrow highlight data distribution and direction.

Figure 2.7: Bagplot of Car Weight vs. Miles Per Gallon

detach(mtcars)

Below are some advanced plot types that can provide deeper insights into data:

# boxplot.matrix(): Creates boxplots for each column in a matrix. 
# Useful for comparing multiple variables.
graphics::boxplot.matrix(
    cbind(
        Uni05 = (1:100) / 21,
        Norm = rnorm(100),
        T5 = rt(100, df = 5),
        Gam2 = rgamma(100, shape = 2)
    ),
    main = "Boxplot Marix",
    notch = TRUE,
    col = 1:4
)
Boxplot displays data for four categories: Uni05, Norm, T5, and Gam2. Each boxplot shows the median, quartiles, and potential outliers. Uni05 has the widest interquartile range, while Gam2 shows several outliers above the upper whisker. The y-axis ranges from -4 to 8.

Figure 2.9: Boxplot Marix

# Violin Plot (vioplot()): Combines a boxplot with a density plot, 
# providing more information about the distribution.
library("vioplot")
vioplot(data, col = "lightblue", main = "Violin Plot Example")
A violin plot displays data distribution. The plot shows a symmetrical, light blue shape with a central black boxplot, including a white dot representing the median. The y-axis ranges from -2 to 3, and the x-axis is labeled with a single category, 1. The plot illustrates data density and variability.

Figure 2.10: Violin Plot Example

3.2.2 Scatterplot

Scatterplots are useful for visualizing relationships between two continuous variables. They help identify patterns, correlations, and outliers.

Pairwise Scatterplots: Visualizes relationships between all pairs of variables in a dataset. This is especially useful for exploring potential correlations.

pairs(mtcars,
      main = "Pairwise Scatterplots",
      pch = 19,
      col = "blue")
Scatterplot matrix displays pairwise relationships between variables: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, and carb. Each subplot shows a scatterplot with blue data points, illustrating correlations between two variables. Axes are labeled with variable names and scales. The matrix provides a comprehensive visual overview of potential correlations in the dataset.

Figure 2.11: Pairwise Scatterplots