Using the scales package

01 - Tweaking axis breaks and labels

Qiushi Yan https://qiushi.netlify.com (Communication University of China)http://www.cuc.edu.cn/
11-27-2019

Table of Contents


One of the most slighted parts of making a ggplot2(Wickham et al. 2019) visualization is scaling, and its inverse, guiding. This is the case partly because in ggplot2 scales and guides are automatically generated, and generated pretty well. Perhaps frequentyly we work with scale_color_ and scale_fill_ to change palettes used, yet aside from that, we have few experience tweaking scales, adjusting breaks and labels, modifying axes and legends or so. The scales(Wickham and Seidel 2019) provides a internal scaling infrastructure used by ggplot2, and a set of consistent tools to override the default breaks, labels, transformations and palettes.

The scales package can be installed from cran via:


install.packages("scales")

or from GitHub if you want the development version:


devtools::install_github("r-lib/scales")

library(scales)
library(ggplot2)

Note: This sereis of blogs are based on scales 1.1.0.9000.

Basics

There are 4 helper functions in scales used to demonstrate ggplot2 style scales for specific types of data:

These functions share common API deisgn, with the first argument specifying the limits of the scale, and breaks, labels arguments overriding its default apperance.


demo_continuous(c(1, 10), breaks = breaks_width(2))

#> scale_x_continuous(breaks = breaks_width(2))


demo_discrete(c("A", "B", "C"))

#> scale_x_discrete()


one_month <- as.POSIXct(c("2020-05-01", "2020-06-01"))
demo_datetime(one_month, labels = label_date_short())

#> scale_x_datetime(labels = label_date_short())

Axis breaks

breaks_width(): equally spaced breaks

breaks_width() is commoly supplied to the breaks arguent in scale function for equally spaced breaks, useful for numeric, date, and date-time scales.


breaks_width(width, offset = 0)

width: Distance between each break. Either a number, or for date/times, a single string of the form “n unit”, e.g. “1 month”, “5 days”. Unit can be of one “sec”, “min”, “hour”, “day”, “week”, “month”, “year”.
offset: Use if you don’t want breaks to start at zero

An simple example :


demo_continuous(c(0, 100), breaks = breaks_width(20))

#> scale_x_continuous(breaks = breaks_width(20))

The break width doesn’t have to be a divisor of the scale span, in those cases limits of the scale will be automatically extented or cut:


demo_continuous(c(0, 100), breaks = breaks_width(30))

#> scale_x_continuous(breaks = breaks_width(30))

The offset argument specifies an new starting point with an “offset” away from the original one:


demo_continuous(c(0, 100), breaks = breaks_width(10, -4))

#> scale_x_continuous(breaks = breaks_width(10, -4))

breaks_width() also works on dates and time, now width could be a single string of the form “n unit”, e.g. “1 month”, “5 days”, or one of “sec”, “min”, “hour”, “day”, “week”, “month”, “year”.


one_month <- as.POSIXct(c("2020-05-01", "2020-06-01"))
demo_datetime(one_month)

#> scale_x_datetime()


# better specifying labels as well
demo_datetime(one_month, breaks = breaks_width("5 days"))

#> scale_x_datetime(breaks = breaks_width("5 days"))


demo_datetime(one_month, breaks = breaks_width("10 days"))

#> scale_x_datetime(breaks = breaks_width("10 days"))


demo_datetime(one_month, breaks = breaks_width("month"))

#> scale_x_datetime(breaks = breaks_width("month"))

breaks_pretty(): pretty breaks

In base R, pretty() compute breaks based on a specific sequence, i.e:


# automatically choosing # of breaks
pretty(1:30)  

#> [1]  0  5 10 15 20 25 30

# n giving the desired number of intervals,  result may be more or fewer
pretty(1:30, n = 3)

#> [1]  0 10 20 30

pretty() could also be used to compute breakpoints for date / time object, since they can be coerced to numeric data:


pretty(one_month, n = 6)

#> [1] "2020-04-27 CST" "2020-05-04 CST" "2020-05-11 CST"
#> [4] "2020-05-18 CST" "2020-05-25 CST" "2020-06-01 CST"

as.numeric(one_month)

#> [1] 1588262400 1590940800

Other breakpoints algorithm can be found in the labeling package (Talbot 2014).

breaks_pretty() uses default R break algorithm as implemented in pretty(), this is primarily used for datetime axes in ggplot2 ecosystem, and breaks_extended should do a slightly better job for numerical scales:


demo_datetime(one_month)

#> scale_x_datetime()


demo_datetime(one_month, breaks = breaks_pretty(n = 4))

#> scale_x_datetime(breaks = breaks_pretty(n = 4))

breaks_extended(): Wilkinson’s extended breaks algorithm for numerical axes

breaks_extended() uses Wilkinson’s extended breaks algorithm as implemented in the labeling package. extended(), its corresponding function in base R, is an enhanced version of Wilkinson’s optimization-based axis labeling approach wilkinson(). It performs better than a variety of labeling algorithm on random labeling and breaking tasks, including pretty().

For more details, please see Talbot, Lin, and Hanrahan (2010).

A algorithm comparison plot presented in the paper mentioned above

Figure 1: A algorithm comparison plot presented in the paper mentioned above


breaks_extended(n = 5, ...)  

n
Desired number of breaks. You may get slightly more or fewer breaks that requested.

other arguments passed on to labeling::extended()


demo_continuous(c(0, 10), breaks = breaks_extended(3))

#> scale_x_continuous(breaks = breaks_extended(3))


demo_continuous(c(0, 10), breaks = breaks_extended(10))

#> scale_x_continuous(breaks = breaks_extended(10))

breaks_log(): breaks for log axes


demo_log10(c(1, 1e5))

#> scale_x_log10()


# Request more breaks by setting n
demo_log10(c(1, 1e5), breaks = breaks_log(n = 6))

#> scale_x_log10(breaks = breaks_log(n = 6))

Axis labels

label numbers

decimal format

Use label_number() and its variants to force decimal display of numbers, that is, the antithesis of using scientific notation(e.g., \(2 \times 10^6\) in decimal format would be \(2,000, 000\)). label_comma() is a special case that inserts a comma every three digits.


label_number(accuracy = NULL, scale = 1, 
             prefix = "", suffix = "",
             big.mark = " ", decimal.mark = ".")

label_comma(accuracy = NULL, scale = 1, 
            prefix = "", suffix = "",
            big.mark = ",", decimal.mark = ".")
            
comma(x, accuracy = NULL, scale = 1, 
      prefix = "", suffix = "",
      big.mark = ",", decimal.mark = ".")

accuracy
A number to round to. Use (e.g.) 0.01 to show 2 decimal places of precision. If NULL, the default, uses a heuristic that should ensure breaks have the minimum number of digits needed to show the difference between adjacent values.

scale
A scaling factor: x will be multiplied by scale before formating. This is useful if the underlying data is very small or very large.

prefix, suffix
Symbols to display before and after value.

big.mark
Character used between every 3 digits to separate thousands.

decimal.mark
The character to be used to indicate the numeric decimal point.

label_numebr is maily used for large number and label_comma() for smaller one, but they are exchangeable.

some examples:


demo_continuous(c(-1e6, 1e6))

#> scale_x_continuous()


demo_continuous(c(-1e6, 1e6), labels = label_number())

#> scale_x_continuous(labels = label_number())


demo_continuous(c(-1e6, 1e6), labels = label_comma())

#> scale_x_continuous(labels = label_comma())


# smaller data
demo_continuous(c(-1e-6, 1e-6))

#> scale_x_continuous()


demo_continuous(c(-1e-6, 1e-6), labels = label_number())

#> scale_x_continuous(labels = label_number())

Use scale to rescale very small or large numbers to generate more readable labels:


demo_continuous(c(0, 1e6), labels = label_number(scale = 1 / 1e3))

#> scale_x_continuous(labels = label_number(scale = 1/1000))


demo_continuous(c(0, 1e-6), labels = label_number(scale = 1e6))

#> scale_x_continuous(labels = label_number(scale = 1e+06))

Use prefix and suffix for other types of display:


demo_continuous(c(32, 40), label = label_number(suffix = "\u00b0C"))

#> scale_x_continuous(label = label_number(suffix = "°C"))


demo_continuous(c(0, 100), label = label_number(suffix = " kg"))

#> scale_x_continuous(label = label_number(suffix = " kg"))

There is a label_number_auto() function that are designed to automatically generated scientific or decimal format labels:


# scientific notation
demo_continuous(c(0, 1e8), labels = label_number_auto())

#> scale_x_continuous(labels = label_number_auto())


# decimal foramt
demo_continuous(c(0, 1e-3), labels = label_number_auto())

#> scale_x_continuous(labels = label_number_auto())

scientific format

label_scientific() forces numbers to be labelled with scientific notation;


label_scientific(digits = 3, scale = 1, prefix = "", suffix = "",
  decimal.mark = "."

demo_continuous(c(1, 10), labels = label_scientific())

#> scale_x_continuous(labels = label_scientific())


demo_continuous(c(0, 1e6), labels = label_scientific(digits = 1))

#> scale_x_continuous(labels = label_scientific(digits = 1))

ordinal numbers (1st, 2nd, 3rd, etc.)

Round values to integers and then display as ordinal values (e.g. 1st, 2nd, 3rd). Built-in rules are provided for English, French, and Spanish.


label_ordinal(prefix = "", suffix = "", big.mark = " ",
              rules = ordinal_english(), ...)

demo_continuous(c(1, 5), labels = label_ordinal())

#> scale_x_continuous(labels = label_ordinal())

Other languages:


demo_continuous(c(1, 5), labels = label_ordinal(rules = ordinal_french()))

#> scale_x_continuous(labels = label_ordinal(rules = ordinal_french()))

SI unit prefix

SI units are any of the units adopted for international use under the Système International d’Unités, now employed for all scientific and most technical purposes. There are seven fundamental units: the metre, kilogram, second, ampere, kelvin, candela, and mole; and two supplementary units: the radian and the steradian.

label_number_si() automatically scales and labels with the best SI prefix, “K” for values ≥ 10e3, “M” for ≥ 10e6, “B” for ≥ 10e9, and “T” for ≥ 10e12.


label_number_si(accuracy = 1, unit = NULL)

# default si units
demo_continuous(c(1, 1e9), label = label_number_si())

#> scale_x_continuous(label = label_number_si())


# the original data are measuring weight, in g
demo_continuous(c(1e3, 1e6), label = label_number_si(unit = "g"))

#> scale_x_continuous(label = label_number_si(unit = "g"))


# the original data are measuring length, in m
demo_continuous(c(1, 1000), label = label_number_si(unit = "m"))

#> scale_x_continuous(label = label_number_si(unit = "m"))

percent format

label_percent() is used to generate percentage-format labels(e.g., 2.5%, 50%, etc.)


label_percent(accuracy = NULL, scale = 100, prefix = "",
  suffix = "%", big.mark = " ", decimal.mark = ".", trim = TRUE,
  ...)

demo_continuous(c(0, 1), labels = label_percent())

#> scale_x_continuous(labels = label_percent())

When applying label_percent(), every numebr are first multiplied by 100 and then assigned a “%” suffix, it’s sometimes useful to adjust scale to change this behaviour:


demo_continuous(c(0, 100), labels = label_percent(scale = 1))

#> scale_x_continuous(labels = label_percent(scale = 1))

label currencies

label_dollar() format numbers as currency, rounding values to dollars or cents using a convenient heuristic.


label_dollar(accuracy = NULL, scale = 1, prefix = "$", suffix = "",
  big.mark = ",", decimal.mark = ".", trim = TRUE,
  largest_with_cents = 1e+05, negative_parens = FALSE, ...)

demo_continuous(c(0, 1), labels = label_dollar())

#> scale_x_continuous(labels = label_dollar())

Change prefix:


demo_continuous(c(0, 1), labels = label_dollar(prefix = "USD "))

#> scale_x_continuous(labels = label_dollar(prefix = "USD "))

Use negative_parens = TRUE for finance style display:


demo_continuous(c(-1000, 1000), labels = label_dollar(negative_parens = T))

#> scale_x_continuous(labels = label_dollar(negative_parens = T))

mathematical annotations

label_parse() produces expression from strings by parsing them; label_math() constructs expressions by replacing the pronoun .x with each string.


label_parse()

label_math(expr = 10^.x, format = force)

Use label_parse() with discrete scales:


demo_discrete(c("alpha", "beta", "gamma", "theta"))

#> scale_x_discrete()


demo_discrete(c("alpha", "beta", "gamma", "theta"), labels = label_parse())

#> scale_x_discrete(labels = label_parse())

Use label_math() with continuous scales:


demo_continuous(c(1, 5), labels = label_math(alpha[.x]))

#> scale_x_continuous(labels = label_math(alpha[.x]))

label p-values

label_pvalue() is a convenient formmater for p-values, using “<” and “>” for p-values close to 0 and 1.


label_pvalue(accuracy = 0.001, decimal.mark = ".", prefix = NULL,
  add_p = FALSE)

demo_continuous(c(0, 1), labels = label_pvalue())

#> scale_x_continuous(labels = label_pvalue())

accuracy can be used as significant level:


demo_continuous(c(0, 1), labels = label_pvalue(accuracy = 0.05, add_p = TRUE))

#> scale_x_continuous(labels = label_pvalue(accuracy = 0.05, add_p = TRUE))

Or provide your own prefixes:


prefix <- c("p < ", "p = ", "p > ")
demo_continuous(c(0, 1), labels = label_pvalue(prefix = prefix))

#> scale_x_continuous(labels = label_pvalue(prefix = prefix))

label bytes

label_bytes scale bytes into human friendly units. Can use either SI units (e.g. kB = 1000 bytes) or binary units (e.g. kiB = 1024 bytes).


label_bytes(units = "auto_si", accuracy = 1)

units
Unit to use. Should either one of:
- “kB”, “MB”, “GB”, “TB”, “PB”, “EB”, “ZB”, and “YB” for SI units (base 1000). - “kiB”, “MiB”, “GiB”, “TiB”, “PiB”, “EiB”, “ZiB”, and “YiB” for binary units (base 1024).

auto_si or auto_binary to automatically pick the most approrpiate unit for each value.


demo_continuous(c(1, 1e6), label = label_bytes("kB"))

#> scale_x_continuous(label = label_bytes("kB"))

accuracy
A number to round to. Use (e.g.) 0.01 to show 2 decimal places of precision. If NULL, the default, uses a heuristic that should ensure breaks have the minimum number of digits needed to show the difference between adjacent values.

label date / times

label_date() and label_time() label date/times using date/time format strings. label_date_short() automatically constructs a short format string suffiicient to uniquely identify labels.


label_date(format = "%Y-%m-%d", tz = "UTC")

label_date_short(format = c("%Y", "%b", "%d", "%H:%M"),
  sep = "\n")

label_time(format = "%H:%M:%S", tz = "UTC")

date_range <- function(start, days) {
  library(lubridate)
  start <- ymd(start)
  c(as.POSIXct(start), as.POSIXct(start + days(days)))
}

library(scales)
demo_datetime(date_range("20170115", 30))

#> scale_x_datetime()


demo_datetime(date_range("20170115", 30), labels = label_date())

#> scale_x_datetime(labels = label_date())

Use label_date_short(), not here we combine what we have learned in breaks_width()


demo_datetime(date_range("20170115", 480), labels = label_date_short(), 
              breaks = breaks_width("60 days"))

#> scale_x_datetime(labels = label_date_short(), breaks = breaks_width("60 days"))

When scaling dates and times, more often than not we have to specify labels and breaks, so ggplot2 provides 2 short-hand arguments date_breaks() and date_labels()

i.e.
date_breaks = "2 weeks" equivalent to breaks = breaks_width("2 weeks")
date_labels = "%m/%d/%y" equivalent to labels = label_date(format = "%m/%d/%y")

if both are specified, date_labels and date_breaks override the other two.


demo_datetime(date_range("20170115", 30), date_labels = "%d/%m",
              date_breaks = "5 days")

#> scale_x_datetime(date_labels = "%d/%m", date_breaks = "5 days")

mix 2 types of argument:


demo_datetime(date_range("20170115", 180), 
              date_breaks = "month", labels = label_date_short())

#> scale_x_datetime(date_breaks = "month", labels = label_date_short())

label strings

Use label_wrap() to wrap long strings:


label_wrap(width)

width: Number of characters per line


x <- c(
  "this is a long label",
  "this is another long label",
  "this a label this is even longer"
)
demo_discrete(x)

#> scale_x_discrete()


demo_discrete(x, labels = label_wrap(width = 5))

#> scale_x_discrete(labels = label_wrap(width = 5))

Talbot, Justin. 2014. Labeling: Axis Labeling. https://CRAN.R-project.org/package=labeling.

Talbot, Justin, Sharon Lin, and Pat Hanrahan. 2010. “An Extension of Wilkinson’s Algorithm for Positioning Tick Labels on Axes.” IEEE Transactions on Visualization and Computer Graphics 16 (6): 1036–43.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, and Hiroaki Yutani. 2019. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.

Wickham, Hadley, and Dana Seidel. 2019. Scales: Scale Functions for Visualization. https://CRAN.R-project.org/package=scales.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Yan (2019, Nov. 27). R Visualization Tips: Using the scales package. Retrieved from https://bookdown.org/Maxine/ggplot2-maps/posts/2019-11-27-using-scales-package-to-modify-ggplot2-scale/

BibTeX citation

@misc{yan2019using,
  author = {Yan, Qiushi},
  title = {R Visualization Tips: Using the scales package},
  url = {https://bookdown.org/Maxine/ggplot2-maps/posts/2019-11-27-using-scales-package-to-modify-ggplot2-scale/},
  year = {2019}
}