12  Tidy evaluation

If you are using Shiny with the {tidyverse}, you will almost certainly encounter the challenge of programming with tidy evaluation. Tidy evaluation is used throughout the tidyverse to make interactive data exploration more fluid, but it comes with a cost: it’s hard to refer to variables indirectly, and hence harder to program with.

In this chapter, you’ll learn how to wrap {ggplot2} and {dplyr} functions in a Shiny app. The techniques for wrapping {ggplot2} and {dplyr} functions in a function and/or a package are a little different and are covered in other resources:

Resource 12.1 : Wrapping {tidyverse} functions in a function and/or a package

Tutorials for beginners

Additionally I have found other important (newer) information in articles about the {rlang} package. The references are sorted by increasing difficulty/specialization: Overviews — Guides — Notes.

Tidy evaluation

Metaprogramming

Metaprogramming in Advanced R

Additionally, there are several chapters on metaprogramming in Advanced R (Wickham 2019). There is also an online version.

12.1 Motivation

R Code 12.1 : Filter numeric variable to select rows that are greater than a threshold (not working)

Listing / Output 12.1: Trying to select rows that are greater than a threshold on a user-selected variable (not working correctly!)

As you can see from Listing / Output 12.1, the app runs without error, but it doesn’t return the correct result: all the rows shown have values of carat less than 1. {dplyr} thinks you have asked for filter(diamonds, "carat" > 1).

This is a problem of indirection: normally when using {tidyverse} functions you type the name of the variable directly in the function call. But now you want to refer to it indirectly: the variable (carat) is stored inside another variable (input$var).

That sentence might have made intuitive sense to you, but it’s a bit confusing because “variable” is used there to mean two slightly different things. It’s going to be easier to understand what’s happening if we can disambiguate the two uses by introducing two new terms:

An env-variable (environment variable) is a “programming” variable that you create with <-. input$var is an env-variable.

A data-variable (data frame variable) is a “statistical” variable that lives inside a data frame or tibble. carat is a data-variable.

With these new terms we can make the problem of indirection clearer: we have a data-variable (carat) stored inside an env-variable (input$var), and we need some way to tell {dplyr} this. There are two slightly different ways to inform {dplyr}, depending on whether the function you’re working with is a “data-masking” function or a “tidy-selection” function.
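To make the failure concrete, here is a minimal sketch outside of Shiny. It assumes {dplyr} is installed; the small data frame `df` and the env-variable `var` are illustrative stand-ins for `diamonds` and `input$var`, not part of the original app.

```r
library(dplyr)

# Illustrative stand-in for diamonds
df <- data.frame(carat = c(0.3, 0.9, 1.2, 2.0))

# The name of the data-variable is stored in an env-variable,
# just as it is with input$var in the app
var <- "carat"

# filter() finds no column called `var`, so it uses the env-variable and
# compares the string "carat" with 1. R coerces 1 to "1" and compares the
# two strings, so the condition is the same single value for every row.
wrong <- df |> filter(var > 1)

# Referring to the data-variable directly gives the intended result
right <- df |> filter(carat > 1)

nrow(wrong)  # same as nrow(df): nothing was filtered out
nrow(right)  # only the rows with carat above 1
```

No error is raised in the first call, which is why the problem is easy to miss in a running app.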

12.2 Data-masking

Data-masking lets you use variables in the “current” data frame without any extra syntax. It’s used in many {dplyr} functions like arrange(), filter(), group_by(), mutate(), and summarise(), and in {ggplot2}’s aes(). This is convenient because you can refer to data-variables by name alone, without repeating the name of the data frame.

Let’s begin with this call to dplyr::filter() which uses a data-variable (carat) and an env-variable (min):

Code
min <- 1
ggplot2::diamonds |> dplyr::filter(carat > min)

Compare this to the base R equivalent:

Code
ggplot2::diamonds[ggplot2::diamonds$carat > min, ]

In most base R functions you have to refer to data-variables with $. This means that you often have to repeat the name of the data frame multiple times, but it does make clear exactly what is a data-variable and what is an env-variable. It also makes it straightforward to use indirection because you can store the name of the data-variable in an env-variable, and then switch from $ to [[:

Code
var <- "carat"
ggplot2::diamonds[ggplot2::diamonds[[var]] > min, ]

How can we achieve the same result with tidy evaluation? We need some way to add $ back into the picture. Fortunately, inside data-masking functions you can use .data or .env if you want to be explicit about whether you’re talking about a data-variable or an env-variable:

Code
ggplot2::diamonds |> dplyr::filter(.data$carat > .env$min)

Now we can switch from $ to [[:

Code
ggplot2::diamonds |> dplyr::filter(.data[[var]] > .env$min)

R Code 12.2 : Filter numeric variable to select rows that are greater than a threshold (working!)

Listing / Output 12.2: Select rows that are greater than a threshold on a user-selected variable
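The app code for this listing is not reproduced in the text, so the following is only a minimal sketch of how the `.data[[var]]` idea might be applied in a Shiny app. It assumes {shiny}, {dplyr} and {ggplot2} are installed; the input ids `var` and `min`, the output id `output`, and the `num_vars` vector are assumptions made for illustration.

```r
library(shiny)
library(dplyr)

# Numeric columns of diamonds that the user may filter on (assumed choice list)
num_vars <- c("carat", "depth", "table", "price", "x", "y", "z")

ui <- fluidPage(
  selectInput("var", "Variable", choices = num_vars),
  numericInput("min", "Minimum", value = 1),
  tableOutput("output")
)

server <- function(input, output, session) {
  # .data[[input$var]] looks up the column whose name is stored in the input;
  # .env$input$min states explicitly that the threshold is an env-variable
  data <- reactive({
    ggplot2::diamonds |>
      filter(.data[[input$var]] > .env$input$min)
  })

  output$output <- renderTable(head(data()))
}

# Launch the app only in an interactive session
if (interactive()) shinyApp(ui, server)
```

With `var = "carat"` and `min = 1`, every row in the table should now have a carat value above 1.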

12.2.1 Example {ggplot2}

Let’s apply this idea to a dynamic plot where we allow the user to create a scatterplot by selecting the variables to appear on the x and y axes.

Code Collection 12.1 : Using indirection in {ggplot2}

R Code 12.3 : Indirection in {ggplot2} using {ggforce}

Listing / Output 12.3: A simple app that allows you to select which variables are plotted on the x and y axes.

R Code 12.4 : Indirection in {ggplot2} changing geom

Listing / Output 12.4: A more complex app that allows you to select which variables are plotted on the x and y axes and to choose the geom.

  • In the first tab, “Using {ggforce}”, we’ve used ggforce::position_auto() so that geom_point() works nicely regardless of whether the x and y variables are continuous or discrete.
  • Alternatively, we could allow the user to pick the geom. In the second tab, “Choosing geom”, the app uses a switch() statement to generate a reactive geom that is later added to the plot.
  • I could imagine a third, more complex variant in which the smoothing method can be chosen for datasets with fewer or more than 1000 observations.
  • Another, even more complex variant would be to let the user change which dataset to use.

Important 12.1: Programming complexity increases with number of user choices

This is one of the challenges of programming with user selected variables: your code has to become more complicated to handle all the cases the user might generate.
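The second variant described above, where the user picks the geom, might look roughly like the sketch below. The original listing is not reproduced in the text, so the input ids `x`, `y` and `geom`, the use of `iris`, and the three offered geoms are all assumptions. It requires {shiny} and {ggplot2}.

```r
library(shiny)
library(ggplot2)

ui <- fluidPage(
  selectInput("x", "X variable", choices = names(iris)),
  selectInput("y", "Y variable", choices = names(iris)),
  selectInput("geom", "geom", choices = c("point", "smooth", "jitter")),
  plotOutput("plot")
)

server <- function(input, output, session) {
  # switch() maps the selected name to a ggplot2 layer
  plot_geom <- reactive({
    switch(input$geom,
      point  = geom_point(),
      smooth = geom_smooth(se = FALSE),
      jitter = geom_jitter()
    )
  })

  output$plot <- renderPlot({
    # .data[[...]] turns the selected strings into data-variables inside aes()
    ggplot(iris, aes(.data[[input$x]], .data[[input$y]])) +
      plot_geom()
  }, res = 96)
}

# Launch the app only in an interactive session
if (interactive()) shinyApp(ui, server)
```

Each additional choice adds another branch to `switch()`, which is a small illustration of how the code grows with the number of options offered to the user.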

12.2.2 Example {dplyr}

The same technique also works for {dplyr}. The following app extends the previous simple example to allow you to choose a variable to filter, a minimum value to select, and a variable to sort by.

R Code 12.5 : Indirection in {dplyr}: Choose sorting order

Listing / Output 12.5: Indirection in {dplyr}: Choose sorting order
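Since the app code for this listing is not shown, here is a hedged sketch of what filtering and sorting by user-selected variables could look like. The use of `mtcars`, the input ids `var`, `min` and `sort`, and the reactive name `filtered` are assumptions; {shiny} and {dplyr} are required.

```r
library(shiny)
library(dplyr)

ui <- fluidPage(
  selectInput("var", "Select variable", choices = names(mtcars)),
  numericInput("min", "Minimum value", value = 0),
  selectInput("sort", "Sort by", choices = names(mtcars)),
  tableOutput("data")
)

server <- function(input, output, session) {
  filtered <- reactive({
    mtcars |>
      # keep rows where the chosen variable exceeds the chosen minimum
      filter(.data[[input$var]] > .env$input$min) |>
      # sort ascending by the second chosen variable
      arrange(.data[[input$sort]])
  })

  output$data <- renderTable(filtered())
}

# Launch the app only in an interactive session
if (interactive()) shinyApp(ui, server)
```

The same `.data[[...]]` pattern is used in both verbs; only the input that supplies the column name differs.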

12.2.3 User-supplied data

The following app allows the user to upload a tsv file, then select a variable and filter by it. It will work for the vast majority of inputs that you might try it with.

R Code 12.6 : User-supplied data could result in the clash of variable names

Listing / Output 12.6: User-supplied data could result in the clash of variable names: A variable name of input would result in an error

Column names can clash when you work with user-supplied data. If the uploaded data contains a variable named input, the Shiny app above results in an error because dplyr::filter() attempts to evaluate df$input$min.

This problem is due to the ambiguity of data-variables and env-variables, and because data-masking prefers to use a data-variable if both are available. We can resolve the problem by using .env to tell dplyr::filter() to look for min only in the env-variables:

Code
### use
df |> dplyr::filter(.data[[input$var]] > .env$input$min)
### instead of
## df |> dplyr::filter(.data[[input$var]] > input$min)

You only need to worry about this problem when working with user supplied data; when working with your own data, you can ensure the names of your data-variables don’t clash with the names of your env-variables (and if they accidentally do, you’ll discover it right away).
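The clash can be reproduced without Shiny. In the sketch below, `df` is a hypothetical upload that happens to contain a column called `input`, and a plain list stands in for Shiny’s `input` object; both are assumptions for illustration. To keep the example focused on the threshold, the column `x` is referenced directly.

```r
library(dplyr)

# A hypothetical upload that happens to contain a column called `input`
df <- data.frame(x = c(1, 5, 10), input = c("a", "b", "c"))

# Stand-in for Shiny's input object
input <- list(min = 3)

# Ambiguous: `input` resolves to the column df$input (a character vector),
# so `input$min` fails
ambiguous <- tryCatch(
  df |> filter(x > input$min),
  error = function(e) e
)
inherits(ambiguous, "error")

# Explicit: .env$input bypasses the data mask and finds the list defined above
explicit <- df |> filter(x > .env$input$min)
explicit$x
```

The first call signals an error, while the second keeps only the rows with `x` above 3.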

Caution 12.1: Two issues with user supplied data example

There are two issues with Listing / Output 12.6:

  1. The live version at https://hadley.shinyapps.io/ms-user-supplied does not work. After uploading the file the app disconnects from the server.
  2. For a real-world Shiny app you would have to provide a solution for factor variables because the example works only with numeric variables.

12.2.4 Why not use base R?

At this point you might wonder if you’re better off without filter(), and if instead you should use the equivalent base R code:

Code
df[df[[input$var]] > input$min, ]

That’s a totally legitimate position, as long as you’re aware of the work that dplyr::filter() does for you so you can generate the equivalent base R code. In this case:

  • You’ll need drop = FALSE if df only contains a single column (otherwise you’ll get a vector instead of a data frame).
  • You’ll need to use which() or similar to drop any missing values.
  • You can’t do group-wise filtering (e.g. df %>% group_by(g) %>% filter(n() == 1)).
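The first two points can be checked with a small base R sketch. The one-column data frame `df` is made up for illustration.

```r
# A one-column data frame with a missing value
df <- data.frame(x = c(1, NA, 3))

# Without drop = FALSE the result collapses to a plain vector,
# and the NA comparison produces an NA element
as_vector <- df[df$x > 1, ]

# drop = FALSE keeps the data frame, but the NA row is still present
with_na <- df[df$x > 1, , drop = FALSE]

# which() keeps only the positions where the condition is TRUE
clean <- df[which(df$x > 1), , drop = FALSE]

is.data.frame(as_vector)
nrow(with_na)
nrow(clean)
```

Only the last form returns a data frame without the missing row, which is the behaviour dplyr::filter() provides by default.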

In general, if you’re using {dplyr} for very simple cases, you might find it easier to use base R functions that don’t use data-masking. However, one of the advantages of the {tidyverse} is the careful thought that has been applied to edge cases so that functions work more consistently. It’s easy to forget the quirks of specific base R functions, and write code that works 95+% of the time, but fails in unusual ways the other 5% of the time.

12.3 Tidy selection and data-masking

Working with multiple variables is trivial when you’re working with a function that uses tidy-selection: you can just pass a character vector of variable names into tidyselect::any_of() or tidyselect::all_of(). Wouldn’t it be nice if we could do that in data-masking functions too? That’s the idea of the across() function, added in {dplyr} 1.0.0. It allows you to use tidy-selection inside data-masking functions.

dplyr::across() is typically used with either one or two arguments. The first argument selects variables, and is useful in functions like dplyr::group_by() or dplyr::distinct(). For example, the following app allows you to select any number of variables and count their unique values.

R Code 12.7 : Tidy selection: Using dplyr::across()

Listing / Output 12.7: Tidy-selection with one argument: Using dplyr::across()
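As the app code is not shown here, the following is a minimal sketch of counting unique combinations of user-selected variables. The use of `mtcars`, the input id `vars`, and the reactive name `counts` are assumptions; {shiny} and {dplyr} are required.

```r
library(shiny)
library(dplyr)

ui <- fluidPage(
  selectInput("vars", "Variables", choices = names(mtcars), multiple = TRUE),
  tableOutput("count")
)

server <- function(input, output, session) {
  counts <- reactive({
    # wait until at least one variable has been selected
    req(input$vars)
    mtcars |>
      # across() brings tidy-selection into the data-masking group_by()
      group_by(across(all_of(input$vars))) |>
      summarise(n = n(), .groups = "drop")
  })

  output$count <- renderTable(counts())
}

# Launch the app only in an interactive session
if (interactive()) shinyApp(ui, server)
```

Because `all_of()` takes a character vector, the multi-select input can be passed to it unchanged.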

The second argument is a function (or list of functions) that’s applied to each selected column. That makes it a good fit for dplyr::mutate() and dplyr::summarize() where you typically want to transform each variable in some way. For example, the following code lets the user select any number of grouping variables, and any number of variables to summarize with their means.

R Code 12.8 : Tidy-select with second parameter

Listing / Output 12.8: Tidy-select with two arguments using dplyr::across()
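The two-argument form can be sketched outside of Shiny as follows. The character vectors `vars_g` and `vars_s` are hypothetical stand-ins for two multi-select inputs, and `mtcars` is an assumed dataset; {dplyr} is required.

```r
library(dplyr)

# Stand-ins for two multi-select inputs (hypothetical names)
vars_g <- c("cyl", "am")  # grouping variables
vars_s <- c("mpg", "hp")  # variables to summarise

means <- mtcars |>
  # first argument only: select the grouping columns
  group_by(across(all_of(vars_g))) |>
  summarise(
    # second argument: function applied to each selected column
    across(all_of(vars_s), \(x) mean(x, na.rm = TRUE)),
    n = n(),
    .groups = "drop"
  )

names(means)
```

In an app, the two vectors would be replaced by the corresponding inputs, guarded by `req()` so that the pipeline only runs once a selection has been made.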

12.4 parse() + eval()

Before we go, it’s worth a brief comment about base::paste() + base::parse() + base::eval(). If you have no idea what this combination is, you can skip this section, but if you have used it, a small note of caution is necessary.

It’s a tempting approach because it requires learning very few new ideas. But it has some major downsides: because you are pasting strings together, it’s very easy to accidentally create invalid code, or code that can be abused to do something that you didn’t want. This isn’t super important if it’s a Shiny app that only you use, but it isn’t a good habit to get into; otherwise it’s very easy to accidentally create a security hole in an app that you share more widely. We’ll come back to that idea in XXX_22.
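Both downsides can be demonstrated with a small base R sketch. The helper `build_filter()`, the data frame `df` and the environment `leak` are hypothetical and exist only for this illustration.

```r
df <- data.frame(carat = c(0.5, 1.5, 2))

# Build R code as a string from (supposedly harmless) user input
build_filter <- function(var, min) {
  paste0("df[df$", var, " > ", min, ", , drop = FALSE]")
}

# Works for well-behaved input
ok <- eval(parse(text = build_filter("carat", 1)))

# A column name containing a space yields code that does not even parse
broken <- tryCatch(
  parse(text = build_filter("carat weight", 1)),
  error = function(e) e
)

# A crafted "minimum" smuggles in arbitrary code, which eval() then runs
leak <- new.env()
bad_min <- "{ assign('touched', TRUE, envir = leak); 1 }"
abused <- eval(parse(text = build_filter("carat", bad_min)))

inherits(broken, "error")
exists("touched", envir = leak)
```

The filter still returns a plausible result in the last case, so the injected side effect would go unnoticed. Looking the column up by name, for example with `df[[var]]` or `.data[[var]]`, never evaluates user input as code.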

Caution 12.2: Try to avoid string manipulation to solve indirection problems

You shouldn’t feel bad if this is the only way you can figure out to solve a problem, but when you have a bit more mental space, it is better to spend some time figuring out how to do it without string manipulation. This will help you to become a better R programmer.

12.5 Glossary Entries

| Term | Definition |
|---|---|
| Data-masking | Data-masking is a feature in the R programming language, particularly within the tidyverse ecosystem, that allows programming directly on a dataset, treating columns as normal objects. This feature simplifies data manipulation by reducing boilerplate code and making it easier to refer to data frame columns without explicitly specifying the data frame name each time. Data-masking works by defusing R code to prevent its immediate evaluation. The defused code is then resumed later in a context where data frame columns are defined. |
| Embracing operator | The "embracing operator" `{{ }}` in R, particularly within the tidyverse, is used to handle data-masking in functions. It allows arguments that refer to columns of a data frame to be passed from one function to another. This operator simplifies the process of working with data-masking by combining the functionality of `enquo()` and `!!` in one step. |
| Tidy evaluation | Tidy evaluation is a concept used in the tidyverse, particularly in packages like dplyr and ggplot2, to handle the evaluation of expressions in a way that allows for more flexible and powerful programming. It is especially useful when you want to build functions that can take column names as arguments and use them within dplyr verbs. |

Session Info

Code
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.5.1 (2025-06-13)
#>  os       macOS Sequoia 15.5
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Vienna
#>  date     2025-07-11
#>  pandoc   3.7.0.2 @ /opt/homebrew/bin/ (via rmarkdown)
#>  quarto   1.8.4 @ /usr/local/bin/quarto
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date (UTC) lib source
#>  cli            3.6.5      2025-04-23 [1] CRAN (R 4.5.0)
#>  commonmark     2.0.0      2025-07-07 [1] CRAN (R 4.5.0)
#>  curl           6.4.0      2025-06-22 [1] CRAN (R 4.5.0)
#>  dichromat      2.0-0.1    2022-05-02 [1] CRAN (R 4.5.0)
#>  digest         0.6.37     2024-08-19 [1] CRAN (R 4.5.0)
#>  evaluate       1.0.4      2025-06-18 [1] CRAN (R 4.5.0)
#>  farver         2.1.2      2024-05-13 [1] CRAN (R 4.5.0)
#>  fastmap        1.2.0      2024-05-15 [1] CRAN (R 4.5.0)
#>  glossary     * 1.0.0.9003 2025-06-08 [1] local
#>  glue           1.8.0      2024-09-30 [1] CRAN (R 4.5.0)
#>  htmltools      0.5.8.1    2024-04-04 [1] CRAN (R 4.5.0)
#>  htmlwidgets    1.6.4      2023-12-06 [1] CRAN (R 4.5.0)
#>  jsonlite       2.0.0      2025-03-27 [1] CRAN (R 4.5.0)
#>  kableExtra     1.4.0      2024-01-24 [1] CRAN (R 4.5.0)
#>  knitr          1.50       2025-03-16 [1] CRAN (R 4.5.0)
#>  lifecycle      1.0.4      2023-11-07 [1] CRAN (R 4.5.0)
#>  litedown       0.7        2025-04-08 [1] CRAN (R 4.5.0)
#>  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.5.0)
#>  markdown       2.0        2025-03-23 [1] CRAN (R 4.5.0)
#>  R6             2.6.1      2025-02-15 [1] CRAN (R 4.5.0)
#>  RColorBrewer   1.1-3      2022-04-03 [1] CRAN (R 4.5.0)
#>  rlang          1.1.6      2025-04-11 [1] CRAN (R 4.5.0)
#>  rmarkdown      2.29       2024-11-04 [1] CRAN (R 4.5.0)
#>  rstudioapi     0.17.1     2024-10-22 [1] CRAN (R 4.5.0)
#>  rversions      2.1.2      2022-08-31 [1] CRAN (R 4.5.0)
#>  scales         1.4.0      2025-04-24 [1] CRAN (R 4.5.0)
#>  sessioninfo    1.2.3      2025-02-05 [1] CRAN (R 4.5.0)
#>  stringi        1.8.7      2025-03-27 [1] CRAN (R 4.5.0)
#>  stringr        1.5.1      2023-11-14 [1] CRAN (R 4.5.0)
#>  svglite        2.2.1      2025-05-12 [1] CRAN (R 4.5.0)
#>  systemfonts    1.2.3      2025-04-30 [1] CRAN (R 4.5.0)
#>  textshaping    1.0.1      2025-05-01 [1] CRAN (R 4.5.0)
#>  vctrs          0.6.5      2023-12-01 [1] CRAN (R 4.5.0)
#>  viridisLite    0.4.2      2023-05-02 [1] CRAN (R 4.5.0)
#>  xfun           0.52       2025-04-02 [1] CRAN (R 4.5.0)
#>  xml2           1.3.8      2025-03-14 [1] CRAN (R 4.5.0)
#>  yaml           2.3.10     2024-07-26 [1] CRAN (R 4.5.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.5-arm64/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
#>  * ── Packages attached to the search path.
#> 
#> ──────────────────────────────────────────────────────────────────────────────

References

Wickham, Hadley. 2019. Advanced R, Second Edition. 2nd ed. Boca Raton: Taylor & Francis.

  1. One exception is base::subset().

  2. In Shiny apps, the most common form of indirection is having the name of a data-variable stored in a reactive value. Another form of indirection that is useful when you’re writing functions is denoted by double curly brackets, {{ x }}, also called embracing.