12 Tidy evaluation
If you are using Shiny with the {tidyverse}, you will almost certainly encounter the challenge of programming with tidy evaluation. Tidy evaluation is used throughout the tidyverse to make interactive data exploration more fluid, but it comes with a cost: it’s hard to refer to variables indirectly, and hence harder to program with.
In this chapter, you’ll learn how to wrap {ggplot2} and {dplyr} functions in a Shiny app. The techniques for wrapping {ggplot2} and {dplyr} functions in a function and/or a package are a little different and are covered in other resources:
Resource 12.1 : Wrapping {tidyverse} functions in a function and/or a package
Tutorials for beginners
Additionally I have found other important (newer) information in articles about the {rlang} package. The references are sorted by increasing difficulty/specialization: Overviews — Guides — Notes.
Tidy evaluation
- Overview: What is data-masking and why do I need `{{`?
- Overview: Data mask programming patterns
- Guide: The data mask ambiguity
- Guide: The double evaluation problem
- Note: What happens if I use injection operators out of context?
- Note: Does `{{` work on regular objects?
Metaprogramming
- Overview: Defusing R expressions
- Overview: Injecting with `!!`, `!!!`, and glue syntax
- Overview: Metaprogramming patterns
- Overview: What are quosures and when are they needed?
- Guide: Taking multiple columns without `...`
- Note: Why are strings and other constants enquosed in the empty environment?
Metaprogramming in Advanced R
Additionally there are several chapters on metaprogramming in Advanced R (Wickham 2019). There is also an online version.
12.1 Motivation
As you can see from Listing / Output 12.1, the app runs without error, but it doesn’t return the correct result: all the rows have values of `carat` less than 1. {dplyr} thinks you have asked for `filter(diamonds, "carat" > 1)`.
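A minimal sketch of this failure, assuming a small stand-in data frame (`diamonds_small`, hypothetical) in place of `diamonds` and a plain list named `input` in place of Shiny's reactive `input`:

```r
library(dplyr)

# Hypothetical stand-ins: a tiny data frame instead of ggplot2::diamonds,
# and a plain list instead of Shiny's reactive `input`.
diamonds_small <- data.frame(carat = c(0.3, 0.7, 1.2, 2.0))
input <- list(var = "carat", min = 1)

# filter() sees the string "carat", not the column `carat`.
# R compares "carat" with 1 as strings, which yields a single TRUE,
# so every row is kept.
wrong <- diamonds_small |> filter(input$var > input$min)
nrow(wrong)
```

No row is dropped, although two of the four `carat` values are below the requested minimum.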
This is a problem of indirection: normally when using {tidyverse} functions you type the name of the variable directly in the function call. But now you want to refer to it indirectly: the variable (`carat`) is stored inside another variable (`input$var`).
That sentence might have made intuitive sense to you, but it’s a bit confusing because “variable” means two slightly different things in it. It’s going to be easier to understand what’s happening if we disambiguate the two uses by introducing two new terms:
An env-variable (environment variable) is a “programming” variable that you create with `<-`. `input$var` is an env-variable.
A data-variable (data frame variable) is a “statistical” variable that lives inside a data frame or tibble. `carat` is a data-variable.
With these new terms we can make the problem of indirection clearer: we have a data-variable (`carat`) stored inside an env-variable (`input$var`), and we need some way to tell {dplyr} this. There are two slightly different ways to inform {dplyr}, depending on whether the function you’re working with is a “data-masking” function or a “tidy-selection” function.
12.2 Data-masking
Data-masking functions allow you to use variables in the “current” data frame without any extra syntax. Data-masking is used in many {dplyr} functions like `arrange()`, `filter()`, `group_by()`, `mutate()`, and `summarise()`, and in {ggplot2}’s `aes()`. It is useful because it lets you refer to data-variables by name alone, without repeating the name of the data frame.
Let’s begin with this call to `dplyr::filter()`, which uses a data-variable (`carat`) and an env-variable (`min`):
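A minimal sketch of such a call, assuming a small stand-in data frame (`diamonds_small`, hypothetical) in place of `diamonds`:

```r
library(dplyr)

# Hypothetical stand-in for ggplot2::diamonds.
diamonds_small <- data.frame(carat = c(0.3, 0.7, 1.2, 2.0))

# `min` is an env-variable; `carat` is a data-variable found inside the data frame.
min <- 1
big <- diamonds_small |> filter(carat > min)
big
```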
Compare this to the base R equivalent:
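A sketch of what the base R version might look like, again assuming a small stand-in data frame (`diamonds_small`, hypothetical):

```r
# Hypothetical stand-in for ggplot2::diamonds (two columns, so row
# subsetting keeps returning a data frame).
diamonds_small <- data.frame(
  carat = c(0.3, 0.7, 1.2, 2.0),
  price = c(400, 900, 5000, 12000)
)
min <- 1

# The data-variable is reached explicitly through `$`.
big <- diamonds_small[diamonds_small$carat > min, ]
big
```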
In most base R functions you have to refer to data-variables with `$`. This means that you often have to repeat the name of the data frame multiple times, but it does make it clear exactly what is a data-variable and what is an env-variable. It also makes it straightforward to use indirection, because you can store the name of the data-variable in an env-variable and then switch from `$` to `[[`:
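A sketch of that switch, assuming the same kind of stand-in data frame (`diamonds_small`, hypothetical):

```r
# Hypothetical stand-in for ggplot2::diamonds.
diamonds_small <- data.frame(
  carat = c(0.3, 0.7, 1.2, 2.0),
  price = c(400, 900, 5000, 12000)
)
min <- 1

# The name of the data-variable now lives in an env-variable ...
var <- "carat"

# ... and `[[` looks the column up by that name.
big <- diamonds_small[diamonds_small[[var]] > min, ]
big
```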
How can we achieve the same result with tidy evaluation? We need some way to add `$` back into the picture. Fortunately, inside data-masking functions you can use `.data` or `.env` if you want to be explicit about whether you’re talking about a data-variable or an env-variable:
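A sketch with both pronouns, assuming a small stand-in data frame (`diamonds_small`, hypothetical):

```r
library(dplyr)

# Hypothetical stand-in for ggplot2::diamonds.
diamonds_small <- data.frame(carat = c(0.3, 0.7, 1.2, 2.0))
min <- 1

# .data$ marks a data-variable, .env$ marks an env-variable.
big <- diamonds_small |> filter(.data$carat > .env$min)
big
```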
Now we can switch from `$` to `[[`:
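A sketch of the indirect version, assuming the same stand-in data frame (`diamonds_small`, hypothetical):

```r
library(dplyr)

# Hypothetical stand-in for ggplot2::diamonds.
diamonds_small <- data.frame(carat = c(0.3, 0.7, 1.2, 2.0))
min <- 1
var <- "carat"

# .data[[var]] looks up the column whose name is stored in `var`.
big <- diamonds_small |> filter(.data[[var]] > .env$min)
big
```

In a Shiny app, `var` would typically be replaced by something like `input$var`.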
12.2.1 Example {ggplot2}
Let’s apply this idea to a dynamic plot where we allow the user to create a scatterplot by selecting the variables to appear on the x and y axes.
Code Collection 12.1 : Using indirection in {ggplot2}
- In the first tab, “Using {ggforce}”, we’ve used `ggforce::position_auto()` so that `geom_point()` works nicely regardless of whether the `x` and `y` variables are continuous or discrete.
- Alternatively, we could allow the user to pick the `geom`. In the second tab, “Choosing `geom`”, the app uses a `switch()` statement to generate a reactive `geom` that is later added to the plot.
- I could imagine a third, more complex variant where the method of the `smooth` argument can be chosen for datasets with fewer and more than 1000 observations.
- Another, even more complex variant would be to let the user change which dataset to use.
This is one of the challenges of programming with user selected variables: your code has to become more complicated to handle all the cases the user might generate.
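The core of such an app can be sketched as follows. This is a simplified illustration, not the code from the tabs described above: it assumes the built-in `iris` dataset, leaves out {ggforce} and the `geom` switch, and uses a hypothetical helper `scatter_by()`.

```r
library(shiny)
library(ggplot2)

# Hypothetical helper: builds a scatterplot from column names held as strings.
scatter_by <- function(df, x, y) {
  ggplot(df, aes(.data[[x]], .data[[y]])) +
    geom_point()
}

ui <- fluidPage(
  selectInput("x", "X variable", choices = names(iris)),
  selectInput("y", "Y variable", choices = names(iris)),
  plotOutput("plot")
)

server <- function(input, output, session) {
  # input$x and input$y are strings; .data[[ ]] turns them into columns.
  output$plot <- renderPlot(scatter_by(iris, input$x, input$y))
}

# Build the app object; start it interactively with shiny::runApp(app).
app <- shinyApp(ui, server)
```

Keeping the plotting logic in a plain function makes it possible to try it outside of Shiny, e.g. `scatter_by(iris, "Sepal.Length", "Species")`.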
12.2.2 Example {dplyr}
The same technique also works for {dplyr}. The following app extends the previous simple example to allow you to choose a variable to filter, a minimum value to select, and a variable to sort by.
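A minimal sketch of such an app, assuming the built-in `mtcars` dataset (all columns numeric) and a hypothetical helper `filter_and_sort()`:

```r
library(shiny)
library(dplyr)

# Hypothetical helper: keep rows where `var` exceeds `min`, then sort by `sort_by`.
# `var` and `sort_by` are column names passed as strings.
filter_and_sort <- function(df, var, min, sort_by) {
  df |>
    filter(.data[[var]] > .env$min) |>
    arrange(.data[[sort_by]])
}

ui <- fluidPage(
  selectInput("var", "Filter variable", choices = names(mtcars)),
  numericInput("min", "Minimum", value = 1),
  selectInput("sort", "Sort by", choices = names(mtcars)),
  tableOutput("data")
)

server <- function(input, output, session) {
  output$data <- renderTable(
    filter_and_sort(mtcars, input$var, input$min, input$sort)
  )
}

# Build the app object; start it interactively with shiny::runApp(app).
app <- shinyApp(ui, server)
```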
12.2.3 User-supplied data
The following app allows the user to upload a `tsv` file, then select a variable and filter by it. It will work for the vast majority of inputs that you might try it with.
There could be a clash in the variable/column names when you work with user-supplied data. If the uploaded data contains a variable named `input`, the following Shiny app would result in an error, because `dplyr::filter()` would attempt to evaluate `df$input$min`.
This problem is due to the ambiguity of data-variables and env-variables, and because data-masking prefers to use a data-variable if both are available. We can resolve the problem by using `.env` to tell `dplyr::filter()` to look for `input` only among the env-variables:
Code
### use
df |> dplyr::filter(.data[[input$var]] > .env$input$min)
### instead of
## df |> dplyr::filter(.data[[input$var]] > input$min)
You only need to worry about this problem when working with user supplied data; when working with your own data, you can ensure the names of your data-variables don’t clash with the names of your env-variables (and if they accidentally do, you’ll discover it right away).
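The clash can be reproduced outside of Shiny. The following is a sketch under stated assumptions: `df` is a made-up data frame that happens to contain a column named `input`, and a plain list stands in for Shiny's reactive `input`.

```r
library(dplyr)

# Made-up user data with an unlucky column name.
df <- data.frame(x = c(1, 5, 10), input = c("a", "b", "c"))

# Plain list standing in for Shiny's reactive `input`.
input <- list(var = "x", min = 3)

# Ambiguous: inside filter(), `input` resolves to the column df$input,
# so `input$min` applies `$` to a character vector and fails.
ambiguous <- try(
  df |> filter(.data[[input$var]] > input$min),
  silent = TRUE
)
inherits(ambiguous, "try-error")

# Explicit: .env$input always refers to the env-variable.
explicit <- df |> filter(.data[[input$var]] > .env$input$min)
explicit
```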
There are two issues with Listing / Output 12.6:
- The live version at https://hadley.shinyapps.io/ms-user-supplied does not work. After uploading the file, the app disconnects from the server.
- For a real world Shiny app you would have to provide a solution for factor variables because the example works only with numeric variables.
12.2.4 Why not use base R?
At this point you might wonder if you’re better off without filter(), and if instead you should use the equivalent base R code:
Code
df[df[[input$var]] > input$min, ]
That’s a totally legitimate position, as long as you’re aware of the work that `dplyr::filter()` does for you so you can generate the equivalent base R code. In this case:
- You’ll need `drop = FALSE` if `df` only contains a single column (otherwise you’ll get a vector instead of a data frame).
- You’ll need to use `which()` or similar to drop any missing values.
- You can’t do group-wise filtering (e.g. `df %>% group_by(g) %>% filter(n() == 1)`).
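The first two points can be illustrated with a small sketch, assuming a made-up single-column data frame that contains a missing value:

```r
# Made-up data: one column, one missing value.
df <- data.frame(x = c(1, NA, 5, 10))
var <- "x"
min <- 3

# Naive version: collapses to a plain vector and keeps an NA entry.
naive <- df[df[[var]] > min, ]
naive

# More careful version: which() drops the NA comparison,
# drop = FALSE keeps the result a data frame.
robust <- df[which(df[[var]] > min), , drop = FALSE]
robust
```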
In general, if you’re using {dplyr} for very simple cases, you might find it easier to use base R functions that don’t use data-masking. However, one of the advantages of the {tidyverse} is the careful thought that has been applied to edge cases so that functions work more consistently. It’s easy to forget the quirks of specific base R functions, and write code that works 95+% of the time, but fails in unusual ways the other 5% of the time.
12.3 Tidy selection and data-masking
Working with multiple variables is trivial when you’re working with a function that uses tidy-selection: you can just pass a character vector of variable names into `tidyselect::any_of()` or `tidyselect::all_of()`. Wouldn’t it be nice if we could do that in data-masking functions too? That’s the idea of the `across()` function, added in {dplyr} 1.0.0. It allows you to use tidy-selection inside data-masking functions.
`dplyr::across()` is typically used with either one or two arguments. The first argument selects variables, and is useful in functions like `dplyr::group_by()` or `dplyr::distinct()`. For example, the following app allows you to select any number of variables and count their unique values.
R Code 12.7 : Tidy selection: Using `dplyr::across()`
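The data step of such an app can be sketched as a plain function. The helper name `count_unique()` and the example data are hypothetical; in a Shiny server the `vars` argument would typically come from a multi-select input such as `input$vars`.

```r
library(dplyr)

# Hypothetical helper: count the unique combinations of the columns named in `vars`.
count_unique <- function(df, vars) {
  df |>
    group_by(across(all_of(vars))) |>
    summarise(n = n(), .groups = "drop")
}

# Made-up example data.
gems <- data.frame(
  cut   = c("Ideal", "Ideal", "Good", "Good"),
  color = c("E", "E", "E", "J")
)

counts <- count_unique(gems, c("cut", "color"))
counts
```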
The second argument is a function (or list of functions) that’s applied to each selected column. That makes it a good fit for `dplyr::mutate()` and `dplyr::summarize()`, where you typically want to transform each variable in some way. For example, the following code lets the user select any number of grouping variables, and any number of variables to summarize with their means.
R Code 12.8 : Tidy-select with the second parameter of `dplyr::across()`
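A sketch of the corresponding data step, using the built-in `mtcars` dataset and a hypothetical helper `summarise_means()`; in an app, the two character vectors would typically come from inputs:

```r
library(dplyr)

# Hypothetical helper: group by `group_vars`, then take the mean of each
# column named in `summary_vars`.
summarise_means <- function(df, group_vars, summary_vars) {
  df |>
    group_by(across(all_of(group_vars))) |>
    summarise(across(all_of(summary_vars), mean), n = n(), .groups = "drop")
}

means <- summarise_means(mtcars, group_vars = "cyl", summary_vars = c("mpg", "hp"))
means
```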
12.4 parse() + eval()
Before we go, it’s worth a brief comment about `base::paste()` + `base::parse()` + `base::eval()`. If you have no idea what this combination is, you can skip this section, but if you have used it, a small note of caution is necessary.
It’s a tempting approach because it requires learning very few new ideas. But it has some major downsides: because you are pasting strings together, it’s very easy to accidentally create invalid code, or code that can be abused to do something that you didn’t want. This isn’t super important if it’s a Shiny app that only you use, but it isn’t a good habit to get into; otherwise it’s very easy to accidentally create a security hole in an app that you share more widely. We’ll come back to that idea in XXX_22.
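To make the risk concrete, here is a deliberately harmless sketch. The helper names `filter_by_string()` and `filter_by_name()` and the data are made up for illustration; the point is only that a pasted string is run as code, whereas `.data[[ ]]` can do nothing but look up a column.

```r
library(dplyr)

df <- data.frame(carat = c(0.5, 1.5, 2.0))

# Hypothetical helper using the string-pasting approach (shown as a caution).
filter_by_string <- function(df, var, min) {
  code <- paste0("filter(df, ", var, " > ", min, ")")
  eval(parse(text = code))
}

# Intended use: behaves as expected.
intended <- filter_by_string(df, "carat", 1)

# A crafted "variable name" is executed as code: the condition becomes
# TRUE | carat > 1, so the filter no longer filters anything.
leaky <- filter_by_string(df, "TRUE | carat", 1)

# Hypothetical helper using the .data pronoun: the string can only name a column.
filter_by_name <- function(df, var, min) {
  filter(df, .data[[var]] > .env$min)
}

# The same crafted input now fails because no such column exists.
rejected <- try(filter_by_name(df, "TRUE | carat", 1), silent = TRUE)
inherits(rejected, "try-error")
```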
You shouldn’t feel bad if this is the only way you can figure out to solve a problem, but when you have a bit more mental space, it is better to spend some time figuring out how to do it without string manipulation. This will help you to become a better R programmer.
12.5 Glossary Entries
term | definition |
---|---|
Data-masking | Data-masking is a feature in the R programming language, particularly within the tidyverse ecosystem, that allows programming directly on a dataset, treating columns as normal objects. This feature simplifies data manipulation by reducing boilerplate code and making it easier to refer to data frame columns without explicitly specifying the data frame name each time. Data-masking works by defusing R code to prevent its immediate evaluation. The defused code is then resumed later in a context where data frame columns are defined. |
embracing operator | The "embracing operator" `{{ }}` in R, particularly within the tidyverse, is used to handle data-masking in functions. It allows arguments that refer to columns of a data frame to be passed from one function to another. This operator simplifies the process of working with data-masking by combining the functionality of `enquo()` and `!!` in one step. |
Tidy evaluation | Tidy evaluation is a concept used in the tidyverse, particularly in packages like dplyr and ggplot2, to handle the evaluation of expressions in a way that allows for more flexible and powerful programming. It is especially useful when you want to build functions that can take column names as arguments and use them within dplyr verbs. |
Session Info
Code
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.5.1 (2025-06-13)
#> os macOS Sequoia 15.5
#> system aarch64, darwin20
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Vienna
#> date 2025-07-11
#> pandoc 3.7.0.2 @ /opt/homebrew/bin/ (via rmarkdown)
#> quarto 1.8.4 @ /usr/local/bin/quarto
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> cli 3.6.5 2025-04-23 [1] CRAN (R 4.5.0)
#> commonmark 2.0.0 2025-07-07 [1] CRAN (R 4.5.0)
#> curl 6.4.0 2025-06-22 [1] CRAN (R 4.5.0)
#> dichromat 2.0-0.1 2022-05-02 [1] CRAN (R 4.5.0)
#> digest 0.6.37 2024-08-19 [1] CRAN (R 4.5.0)
#> evaluate 1.0.4 2025-06-18 [1] CRAN (R 4.5.0)
#> farver 2.1.2 2024-05-13 [1] CRAN (R 4.5.0)
#> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.5.0)
#> glossary * 1.0.0.9003 2025-06-08 [1] local
#> glue 1.8.0 2024-09-30 [1] CRAN (R 4.5.0)
#> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.5.0)
#> htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.5.0)
#> jsonlite 2.0.0 2025-03-27 [1] CRAN (R 4.5.0)
#> kableExtra 1.4.0 2024-01-24 [1] CRAN (R 4.5.0)
#> knitr 1.50 2025-03-16 [1] CRAN (R 4.5.0)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.5.0)
#> litedown 0.7 2025-04-08 [1] CRAN (R 4.5.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.5.0)
#> markdown 2.0 2025-03-23 [1] CRAN (R 4.5.0)
#> R6 2.6.1 2025-02-15 [1] CRAN (R 4.5.0)
#> RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.5.0)
#> rlang 1.1.6 2025-04-11 [1] CRAN (R 4.5.0)
#> rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.5.0)
#> rstudioapi 0.17.1 2024-10-22 [1] CRAN (R 4.5.0)
#> rversions 2.1.2 2022-08-31 [1] CRAN (R 4.5.0)
#> scales 1.4.0 2025-04-24 [1] CRAN (R 4.5.0)
#> sessioninfo 1.2.3 2025-02-05 [1] CRAN (R 4.5.0)
#> stringi 1.8.7 2025-03-27 [1] CRAN (R 4.5.0)
#> stringr 1.5.1 2023-11-14 [1] CRAN (R 4.5.0)
#> svglite 2.2.1 2025-05-12 [1] CRAN (R 4.5.0)
#> systemfonts 1.2.3 2025-04-30 [1] CRAN (R 4.5.0)
#> textshaping 1.0.1 2025-05-01 [1] CRAN (R 4.5.0)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.5.0)
#> viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.5.0)
#> xfun 0.52 2025-04-02 [1] CRAN (R 4.5.0)
#> xml2 1.3.8 2025-03-14 [1] CRAN (R 4.5.0)
#> yaml 2.3.10 2024-07-26 [1] CRAN (R 4.5.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.5-arm64/library
#> [2] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
#> * ── Packages attached to the search path.
#>
#> ──────────────────────────────────────────────────────────────────────────────