01 - Tweaking axis breaks and labels
One of the most neglected parts of making a ggplot2 plot is its scales. Many of us reach for the scale_color_*() and scale_fill_*() functions to change the palettes used, yet aside from that we have little experience tweaking scales: adjusting breaks and labels, or modifying axes and legends.
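The scales package can be installed from CRAN via install.packages("scales"), or from GitHub (r-lib/scales) if you want the development version, for example with remotes::install_github("r-lib/scales").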
Note: This series of posts is based on scales 1.1.0.9000.
There are 4 helper functions in scales used to demonstrate ggplot2-style scales for specific types of data:
- demo_continuous() and demo_log10() for numerical axes
- demo_discrete() for discrete axes
- demo_datetime() for date / time axes
These functions share a common API design: the first argument specifies the limits of the scale, and the breaks and labels arguments override its default appearance.
#> scale_x_continuous(breaks = breaks_width(2))
#> scale_x_discrete()
#> scale_x_datetime(labels = label_date_short())
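The calls that produced the three lines above were not preserved, so here is a minimal sketch of what they may have looked like. The limits (0 to 10, the letters, the 2020 dates) are assumptions chosen for illustration. The demo helpers need ggplot2, so those calls are guarded; the first line shows that a breaks function can also be inspected on its own.

```r
library(scales)

# A breaks function takes the scale limits and returns the break positions.
width_breaks <- breaks_width(2)(c(0, 10))
print(width_breaks)

# The demo helpers build a blank plot with the requested scale and print the
# equivalent ggplot2 call. They require ggplot2, hence the guard.
# All limits below are made-up values for illustration.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p_continuous <- demo_continuous(c(0, 10), breaks = breaks_width(2))
  p_discrete <- demo_discrete(c("A", "B", "C"))
  p_datetime <- demo_datetime(
    as.POSIXct(c("2020-01-01", "2020-03-01"), tz = "UTC"),
    labels = label_date_short()
  )
  # print(p_continuous) would draw the first plot
}
```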
breaks_width(): equally spaced breaks
breaks_width() is commonly supplied to the breaks argument of a scale function to produce equally spaced breaks. It is useful for numeric, date, and date-time scales.
width: Distance between each break. Either a number or, for dates/times, a single string of the form “n unit”, e.g. “1 month”, “5 days”. Unit can be one of “sec”, “min”, “hour”, “day”, “week”, “month”, “year”.
offset: Use if you don’t want breaks to start at zero.
A simple example:
#> scale_x_continuous(breaks = breaks_width(20))
The break width doesn’t have to be a divisor of the scale span; in those cases the limits of the scale will be automatically extended or cut:
#> scale_x_continuous(breaks = breaks_width(30))
The offset argument specifies a new starting point, shifted by “offset” from the original one:
#> scale_x_continuous(breaks = breaks_width(10, -4))
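To see what these three settings do without drawing a plot, the breaks functions can be called directly on a pair of limits. This is a small sketch; the 0 to 100 range is an assumption, not necessarily the one used in the original plots.

```r
library(scales)

# Hypothetical limits for illustration.
limits <- c(0, 100)

# Width divides the span evenly: breaks land on both limits.
even_breaks <- breaks_width(20)(limits)

# Width does not divide the span: the sequence runs past the upper limit
# so that the whole range is covered.
uneven_breaks <- breaks_width(30)(limits)

# A negative offset shifts every break to the left by that amount.
shifted_breaks <- breaks_width(10, offset = -4)(limits)

print(even_breaks)
print(uneven_breaks)
print(shifted_breaks)
```

When used inside a ggplot2 scale, breaks that fall outside the scale limits are simply not drawn.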
breaks_width() also works on dates and times. Here width is a single string of the form “n unit”, e.g. “1 month”, “5 days”, or just a unit name: one of “sec”, “min”, “hour”, “day”, “week”, “month”, “year”.
#> scale_x_datetime()
#> scale_x_datetime(breaks = breaks_width("5 days"))
#> scale_x_datetime(breaks = breaks_width("10 days"))
#> scale_x_datetime(breaks = breaks_width("month"))
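The same direct-call trick works for dates. A minimal sketch, assuming a made-up date range (the original limits were not preserved):

```r
library(scales)

# Hypothetical date range used only for illustration.
date_limits <- as.Date(c("2020-01-01", "2020-03-01"))

# "n unit" strings set the spacing; a bare unit such as "month" means one unit.
ten_day_breaks <- breaks_width("10 days")(date_limits)
monthly_breaks <- breaks_width("month")(date_limits)

print(ten_day_breaks)
print(monthly_breaks)
```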
breaks_pretty(): pretty breaks
In base R, pretty() computes a sequence of equally spaced “round” values that cover the range of the data, e.g.:
#> [1] 0 5 10 15 20 25 30
#> [1] 0 10 20 30
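The calls behind the two outputs above were lost. Calls such as the following reproduce the same values; the 0 to 30 range and n = 3 are assumptions, picked because they match the printed results.

```r
# Base R only: pretty() picks "round" break values that cover the range.
default_breaks <- pretty(c(0, 30))         # default target of n = 5 intervals
coarse_breaks <- pretty(c(0, 30), n = 3)   # ask for fewer intervals

print(default_breaks)
print(coarse_breaks)
```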
pretty() can also be used to compute breakpoints for date / time objects, since they can be coerced to numeric data:
#> [1] "2020-04-27 CST" "2020-05-04 CST" "2020-05-11 CST" #> [4] "2020-05-18 CST" "2020-05-25 CST" "2020-06-01 CST"
#> [1] 1588262400 1590940800
Other breakpoint algorithms can be found in the labeling package.
breaks_pretty() uses R's default break algorithm as implemented in pretty(). It is primarily used for datetime axes in the ggplot2 ecosystem; breaks_extended() should do a slightly better job for numerical scales:
#> scale_x_datetime()
#> scale_x_datetime(breaks = breaks_pretty(n = 4))
breaks_extended(): Wilkinson’s extended breaks algorithm for numerical axes
breaks_extended() uses Wilkinson’s extended breaks algorithm as implemented in the labeling package. The function it wraps, labeling::extended(), is an enhanced version of Wilkinson’s optimization-based axis labeling approach, wilkinson(). It is reported to perform better than a variety of labeling algorithms, including pretty(), on random labeling and breaking tasks.
For more details, please see
Figure 1: An algorithm comparison plot presented in the paper mentioned above
n: Desired number of breaks. You may get slightly more or fewer breaks than requested.
…: Other arguments passed on to labeling::extended().
#> scale_x_continuous(breaks = breaks_extended(3))
#> scale_x_continuous(breaks = breaks_extended(10))
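A short sketch of how the two wrappers relate to the functions they call, using an assumed range of 0 to 100. It only checks that the wrapper and the wrapped function agree; it does not claim which algorithm gives nicer breaks for this range.

```r
library(scales)

# Hypothetical limits for illustration.
limits <- c(0, 100)

# breaks_extended() returns a function of the scale limits...
extended_breaks <- breaks_extended(n = 5)(limits)

# ...which hands the work to labeling::extended(), so calling that function
# directly with the same range and target count gives the same positions.
direct_call <- labeling::extended(limits[1], limits[2], m = 5)

# breaks_pretty() wraps base R's pretty() in the same way.
pretty_breaks <- breaks_pretty(n = 5)(limits)

print(extended_breaks)
print(pretty_breaks)
```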
breaks_log(): breaks for log axes
#> scale_x_log10()
#> scale_x_log10(breaks = breaks_log(n = 6))
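As with the other breaks functions, breaks_log() can be called on a pair of limits to inspect its result. The limits below are an assumption; n is a target rather than a guarantee, so the number of breaks returned may differ from it.

```r
library(scales)

# Hypothetical limits spanning six orders of magnitude.
limits <- c(1, 1e6)

default_log_breaks <- breaks_log()(limits)         # default target is n = 5
requested_six <- breaks_log(n = 6)(limits)

print(default_log_breaks)
print(requested_six)
```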
Use label_number() and its variants to force decimal display of numbers, that is, the opposite of scientific notation (e.g., 2 × 10^6 in decimal format would be 2,000,000). label_comma() is a special case that inserts a comma every three digits.
accuracy: A number to round to. Use (e.g.) 0.01 to show 2 decimal places of precision. If NULL, the default, uses a heuristic that should ensure breaks have the minimum number of digits needed to show the difference between adjacent values.
scale: A scaling factor: x will be multiplied by scale before formatting. This is useful if the underlying data is very small or very large.
prefix, suffix: Symbols to display before and after value.
big.mark: Character used between every 3 digits to separate thousands.
decimal.mark: The character to be used to indicate the numeric decimal point.
label_number() is mainly used for large numbers and label_comma() for smaller ones, but they are interchangeable.
Some examples:
#> scale_x_continuous()
#> scale_x_continuous(labels = label_number())
#> scale_x_continuous(labels = label_comma())
#> scale_x_continuous()
#> scale_x_continuous(labels = label_number())
Use scale to rescale very small or large numbers to generate more readable labels:
#> scale_x_continuous(labels = label_number(scale = 1/1000))
#> scale_x_continuous(labels = label_number(scale = 1e+06))
Use prefix and suffix for other types of display:
#> scale_x_continuous(label = label_number(suffix = "°C"))
#> scale_x_continuous(label = label_number(suffix = " kg"))
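The label functions are function factories too: each returns a function that turns a numeric vector into a character vector, so the arguments described above can be tried without a plot. A minimal sketch with made-up values:

```r
library(scales)

# Made-up values for illustration.
big <- c(1000, 1000000)

# label_comma() inserts a comma every three digits.
comma_labels <- label_comma()(big)

# scale multiplies the values before formatting; suffix is appended afterwards.
thousands <- label_number(scale = 1 / 1000, suffix = "k", big.mark = ",")(big)

# accuracy controls rounding: 0.01 keeps two decimal places.
two_decimals <- label_number(accuracy = 0.01)(c(1.234, 5.678))

# suffix (and prefix) add units around the number.
weights <- label_number(suffix = " kg")(c(50, 75, 100))

print(comma_labels)
print(thousands)
print(two_decimals)
print(weights)
```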
There is also a label_number_auto() function that is designed to automatically generate labels in either scientific or decimal format:
#> scale_x_continuous(labels = label_number_auto())
#> scale_x_continuous(labels = label_number_auto())
label_scientific() forces numbers to be labelled with scientific notation:
#> scale_x_continuous(labels = label_scientific())
#> scale_x_continuous(labels = label_scientific(digits = 1))
label_ordinal() rounds values to integers and then displays them as ordinal values (e.g. 1st, 2nd, 3rd). Built-in rules are provided for English, French, and Spanish.
#> scale_x_continuous(labels = label_ordinal())
Other languages:
#> scale_x_continuous(labels = label_ordinal(rules = ordinal_french()))
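A quick sketch of the two rule sets applied to the integers 1 to 4 (the input values are arbitrary):

```r
library(scales)

# English rules are the default.
english <- label_ordinal()(1:4)

# Other built-in rule sets are passed through the rules argument.
french <- label_ordinal(rules = ordinal_french())(1:4)

print(english)
print(french)
```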
SI units are any of the units adopted for international use under the Système International d’Unités, now employed for all scientific and most technical purposes. There are seven fundamental units: the metre, kilogram, second, ampere, kelvin, candela, and mole; and two supplementary units: the radian and the steradian.
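label_number_si() automatically scales values and labels them with a short-scale abbreviation: “K” for values ≥ 1e3, “M” for ≥ 1e6, “B” for ≥ 1e9, and “T” for ≥ 1e12. Note that these are not strict SI prefixes (SI uses k, M, G, T), and that scales 1.2.0 and later deprecate label_number_si() in favour of the scale_cut argument of label_number().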
#> scale_x_continuous(label = label_number_si())
#> scale_x_continuous(label = label_number_si(unit = "g"))
#> scale_x_continuous(label = label_number_si(unit = "m"))
label_percent() is used to generate percentage-format labels (e.g., 2.5%, 50%, etc.).
#> scale_x_continuous(labels = label_percent())
When applying label_percent(), every number is first multiplied by 100 and then given a “%” suffix. It is sometimes useful to adjust scale to change this behaviour:
#> scale_x_continuous(labels = label_percent(scale = 1))
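The effect of scale is easiest to see side by side. In this sketch (values are made up) the same labels come out of proportions with the default scale and of already-multiplied values with scale = 1:

```r
library(scales)

# Proportions: the default scale of 100 converts them to percentages.
from_proportions <- label_percent()(c(0.25, 0.5, 1))

# Values that are already percentages: scale = 1 leaves them untouched.
from_percentages <- label_percent(scale = 1)(c(25, 50, 100))

print(from_proportions)
print(from_percentages)
```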
label_dollar() formats numbers as currency, rounding values to dollars or cents using a convenient heuristic.
#> scale_x_continuous(labels = label_dollar())
Change prefix:
#> scale_x_continuous(labels = label_dollar(prefix = "USD "))
Use negative_parens = TRUE for finance-style display:
#> scale_x_continuous(labels = label_dollar(negative_parens = T))
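The dollars-or-cents heuristic can be checked directly. A small sketch with made-up amounts: whole numbers are shown without cents, while a vector containing a fractional amount gets two decimals throughout.

```r
library(scales)

# Whole amounts: no cents are shown.
whole <- label_dollar()(c(5, 50, 500))

# At least one fractional amount: every label gets cents.
with_cents <- label_dollar()(c(1.5, 20))

# A different currency marker via prefix.
usd <- label_dollar(prefix = "USD ")(c(5, 50))

print(whole)
print(with_cents)
print(usd)
```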
label_parse() produces expressions from strings by parsing them; label_math() constructs expressions by replacing the pronoun .x with each string.
Use label_parse() with discrete scales:
#> scale_x_discrete()
#> scale_x_discrete(labels = label_parse())
Use label_math() with continuous scales:
#> scale_x_continuous(labels = label_math(alpha[.x]))
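Both labellers return plotmath expressions rather than plain strings, which is what lets ggplot2 render Greek letters, superscripts, and subscripts. A minimal sketch; the input strings are assumptions, since the original discrete values were not preserved.

```r
library(scales)

# label_parse() turns plotmath strings into expressions.
parsed <- label_parse()(c("alpha", "beta^2", "x[1]"))

# label_math() substitutes each value for .x inside the template expression.
subscripted <- label_math(alpha[.x])(1:3)

print(parsed)
print(subscripted)
```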
label_pvalue() is a convenient formatter for p-values, using “<” and “>” for p-values close to 0 and 1.
#> scale_x_continuous(labels = label_pvalue())
accuracy can be used as the significance level:
#> scale_x_continuous(labels = label_pvalue(accuracy = 0.05, add_p = TRUE))
Or provide your own prefixes:
#> scale_x_continuous(labels = label_pvalue(prefix = prefix))
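The definition of prefix used in the last call was lost. The sketch below assumes a three-element character vector, which is the shape label_pvalue() expects (one prefix each for "less than", "equal", and "greater than"); the exact strings are an illustration, not necessarily the author's.

```r
library(scales)

# Made-up p-values covering both extremes.
p <- c(0.0001, 0.03, 0.5, 0.9999)

# Default accuracy is 0.001, so anything below it is shown as "<0.001".
default_p <- label_pvalue()(p)

# Hypothetical custom prefixes: less than, equal, greater than.
custom_prefix <- c("p < ", "p = ", "p > ")
prefixed_p <- label_pvalue(prefix = custom_prefix)(p)

print(default_p)
print(prefixed_p)
```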
label_bytes() scales bytes into human-friendly units. It can use either SI units (e.g. kB = 1000 bytes) or binary units (e.g. kiB = 1024 bytes).
units: Unit to use. Should be one of:
- “kB”, “MB”, “GB”, “TB”, “PB”, “EB”, “ZB”, and “YB” for SI units (base 1000).
- “kiB”, “MiB”, “GiB”, “TiB”, “PiB”, “EiB”, “ZiB”, and “YiB” for binary units (base 1024).
- auto_si or auto_binary to automatically pick the most appropriate unit for each value.
#> scale_x_continuous(label = label_bytes("kB"))
accuracy: A number to round to. Use (e.g.) 0.01 to show 2 decimal places of precision. If NULL, the default, uses a heuristic that should ensure breaks have the minimum number of digits needed to show the difference between adjacent values.
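A short sketch of units and accuracy working together, using made-up byte counts. Only SI units are asserted here; the automatic mode is shown without an expected result because the unit it picks depends on each value.

```r
library(scales)

# Fixed SI unit: 1 kB = 1000 bytes.
in_kb <- label_bytes("kB")(c(1000, 5000, 250000))

# accuracy = 0.1 keeps one decimal place.
in_mb <- label_bytes("MB", accuracy = 0.1)(c(1e6, 2.5e6))

# Let scales pick an SI unit per value.
auto <- label_bytes("auto_si")(c(500, 1500000))

print(in_kb)
print(in_mb)
print(auto)
```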
label_date() and label_time() label date/times using date/time format strings. label_date_short() automatically constructs a short format string sufficient to uniquely identify labels.
#> scale_x_datetime()
#> scale_x_datetime(labels = label_date())
Use label_date_short(); note that here we combine it with what we learned about breaks_width():
#> scale_x_datetime(labels = label_date_short(), breaks = breaks_width("60 days"))
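To compare the date labellers outside a plot, they can be applied to a vector of dates. The dates below are made up; the label_date_short() result is printed but not spelled out, since its format depends on which parts of the dates differ.

```r
library(scales)

# Hypothetical dates for illustration.
dates <- as.Date(c("2020-01-15", "2020-02-15", "2020-03-15"))

# Default format is "%Y-%m-%d".
iso_labels <- label_date()(dates)

# Any strftime-style format string can be supplied.
day_month <- label_date(format = "%d/%m")(dates)

# label_date_short() drops the parts that repeat from one label to the next.
short_labels <- label_date_short()(dates)

print(iso_labels)
print(day_month)
print(short_labels)
```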
When scaling dates and times, more often than not we have to specify both labels and breaks, so ggplot2 provides two shorthand arguments, date_breaks and date_labels, i.e.
date_breaks = "2 weeks" is equivalent to breaks = breaks_width("2 weeks")
date_labels = "%m/%d/%y" is equivalent to labels = label_date(format = "%m/%d/%y")
If both forms are specified, date_labels and date_breaks override the other two.
#> scale_x_datetime(date_labels = "%d/%m", date_breaks = "5 days")
The two types of argument can also be mixed:
#> scale_x_datetime(date_breaks = "month", labels = label_date_short())
Use label_wrap() to wrap long strings:
width: Number of characters per line
#> scale_x_discrete()
#> scale_x_discrete(labels = label_wrap(width = 5))
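A minimal sketch of what wrapping does to the label text itself; the strings and the width of 10 are made up for illustration. Long labels get newline characters inserted between words, while labels that already fit are returned unchanged.

```r
library(scales)

# Hypothetical labels: one long, one short.
long_labels <- c("a very long axis label", "short")

wrapped <- label_wrap(width = 10)(long_labels)

# cat() shows the inserted line breaks.
cat(wrapped, sep = "\n---\n")
```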
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Yan (2019, Nov. 27). R Visualization Tips: Using the scales package. Retrieved from https://bookdown.org/Maxine/ggplot2-maps/posts/2019-11-27-using-scales-package-to-modify-ggplot2-scale/
BibTeX citation
@misc{yan2019using, author = {Yan, Qiushi}, title = {R Visualization Tips: Using the scales package}, url = {https://bookdown.org/Maxine/ggplot2-maps/posts/2019-11-27-using-scales-package-to-modify-ggplot2-scale/}, year = {2019} }