% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/topic-nse.R
\name{topic-metaprogramming}
\alias{topic-metaprogramming}
\title{Metaprogramming patterns}
\description{
The patterns covered in this article rely on \emph{metaprogramming}, the ability to defuse, create, expand, and inject R expressions. A good place to start if you're new to programming on the language is the \href{https://adv-r.hadley.nz/metaprogramming.html}{Metaprogramming chapter} of the \href{https://adv-r.hadley.nz}{Advanced R} book.
If you haven't already, read \ifelse{html}{\link[=topic-data-mask-programming]{Data mask programming patterns}}{\link[=topic-data-mask-programming]{Data mask programming patterns}} which covers simpler patterns that do not require as much theory to get up to speed. It covers concepts like argument behaviours and the various patterns you can add to your toolbox (forwarding, names, bridge, and transformative patterns).
}
\section{Forwarding patterns}{
\subsection{Defuse and inject}{
\ifelse{html}{\code{\link[=embrace-operator]{\{\{}}}{\verb{\{\{}} and \code{...} are sufficient for most purposes. Sometimes, however, it is necessary to decompose the forwarding action into its two constitutive steps, \link[=topic-defuse]{defusing} and \link[=topic-inject]{injecting}.
\verb{\{\{} is the combination of \code{\link[=enquo]{enquo()}} and \code{\link[=injection-operator]{!!}}. These functions are completely equivalent:
\if{html}{\out{
}}\preformatted{my_summarise <- function(data, var) \{
  data \%>\% dplyr::summarise(\{\{ var \}\})
\}
my_summarise <- function(data, var) \{
  data \%>\% dplyr::summarise(!!enquo(var))
\}
}\if{html}{\out{
}}
Passing \code{...} is equivalent to the combination of \code{\link[=enquos]{enquos()}} and \code{\link[=splice-operator]{!!!}}:
\if{html}{\out{}}\preformatted{my_group_by <- function(.data, ...) \{
  .data \%>\% dplyr::group_by(...)
\}
my_group_by <- function(.data, ...) \{
  .data \%>\% dplyr::group_by(!!!enquos(...))
\}
}\if{html}{\out{
}}
The advantage of decomposing the steps is that you gain access to the \link[=topic-defuse]{defused expressions}. Once defused, you can inspect or modify the expressions before injecting them in their target context.
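As a minimal sketch of such an inspection, assuming only that rlang is attached, the following helper defuses its argument and reports what kind of expression was supplied. The name \code{describe_input()} is hypothetical and only serves the illustration; the predicates \code{quo_is_missing()}, \code{quo_is_symbol()}, and \code{quo_is_call()} are rlang functions.
\preformatted{library(rlang)

# Hypothetical helper: classify the expression supplied by the user
describe_input <- function(var) \{
  # Defuse the argument so it can be inspected without being evaluated
  var <- enquo(var)

  if (quo_is_missing(var)) \{
    return("missing")
  \}
  if (quo_is_symbol(var)) \{
    return("symbol")
  \}
  if (quo_is_call(var)) \{
    return("call")
  \}
  "constant"
\}

describe_input(cyl)
#> [1] "symbol"
describe_input(cyl * 100)
#> [1] "call"
describe_input()
#> [1] "missing"
}
A wrapper could use a check of this kind to fail early with an informative message before injecting the expression in a data-masking function.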
}
\subsection{Inspecting input labels}{
For instance, here is how to create an automatic name for a defused argument using \code{\link[=as_label]{as_label()}}:
\if{html}{\out{}}\preformatted{f <- function(var) \{
  var <- enquo(var)
  as_label(var)
\}
f(cyl)
#> [1] "cyl"
f(1 + 1)
#> [1] "1 + 1"
}\if{html}{\out{
}}
This is essentially equivalent to formatting an argument using \code{\link[=englue]{englue()}}:
\if{html}{\out{}}\preformatted{f2 <- function(var) \{
  englue("\{\{ var \}\}")
\}
f2(1 + 1)
#> [1] "1 + 1"
}\if{html}{\out{
}}
With multiple arguments, use the plural variant \code{\link[=enquos]{enquos()}}. Set \code{.named} to \code{TRUE} to automatically call \code{\link[=as_label]{as_label()}} on the inputs for which the user has not provided a name (the same behaviour as in most dplyr verbs):
\if{html}{\out{}}\preformatted{g <- function(...) \{
  vars <- enquos(..., .named = TRUE)
  names(vars)
\}
g(cyl, 1 + 1)
#> [1] "cyl" "1 + 1"
}\if{html}{\out{
}}
Just like with \code{dplyr::mutate()}, the user can override automatic names by supplying explicit names:
\if{html}{\out{}}\preformatted{g(foo = cyl, bar = 1 + 1)
#> [1] "foo" "bar"
}\if{html}{\out{
}}
Defuse-and-inject patterns are most useful for transforming inputs. Some applications are explored in the Transformation patterns section.
}
}
\section{Names patterns}{
\subsection{Symbolise and inject}{
The symbolise-and-inject pattern is a \emph{names pattern} that you can use when \code{across(all_of())} is not supported. It consists in creating \link[=topic-defuse]{defused expressions} that refer to the data-variables represented in the names vector. These are then injected in the data mask context.
Symbolise a single string with \code{\link[=sym]{sym()}} or \code{\link[=data_sym]{data_sym()}}:
\if{html}{\out{}}\preformatted{var <- "cyl"
sym(var)
#> cyl
data_sym(var)
#> .data$cyl
}\if{html}{\out{
}}
Symbolise a character vector with \code{\link[=syms]{syms()}} or \code{\link[=data_syms]{data_syms()}}.
\if{html}{\out{}}\preformatted{vars <- c("cyl", "am")
syms(vars)
#> [[1]]
#> cyl
#>
#> [[2]]
#> am
data_syms(vars)
#> [[1]]
#> .data$cyl
#>
#> [[2]]
#> .data$am
}\if{html}{\out{
}}
Simple symbols returned by \code{sym()} and \code{syms()} work in a wider variety of cases (with base functions in particular) but we'll mostly use \code{data_sym()} and \code{data_syms()} because they are more robust (see \ifelse{html}{\link[=topic-data-mask-ambiguity]{The data mask ambiguity}}{\link[=topic-data-mask-ambiguity]{The data mask ambiguity}}). Note that these do not return \emph{symbols} per se. Instead, they create \emph{calls} to \code{$} that subset the \code{\link{.data}} pronoun.
Since the \code{.data} pronoun is a tidy eval feature, you can't use it in base functions. As a rule, prefer the \code{data_}-prefixed variants when you're injecting in tidy eval functions and the unprefixed functions for base functions.
A list of symbols can be injected in data-masked dots with the splice operator \code{\link[=splice-operator]{!!!}}, which injects each element of the list as a separate argument. For instance, to implement a \code{group_by()} variant that takes a character vector of column names, you might write:
\if{html}{\out{}}\preformatted{my_group_by <- function(data, vars) \{
  data \%>\% dplyr::group_by(!!!data_syms(vars))
\}
mtcars \%>\% my_group_by(vars)
}\if{html}{\out{
}}
In more complex cases, you might want to add R code around the symbols. This requires \emph{transformation} patterns, see the section below.
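As a rough sketch of what adding code around the symbols can look like, assuming rlang and dplyr are available, each \code{.data$} call can be wrapped in a call to \code{mean()} with \code{expr()} before being spliced. The name \code{summarise_means()} is hypothetical and used for illustration only.
\preformatted{library(rlang)

# Hypothetical helper: average the columns named in a character vector
summarise_means <- function(data, vars) \{
  # Create one `.data$col` call per name
  cols <- data_syms(vars)
  # Wrap each of them in a call to `mean()`
  exprs <- lapply(cols, function(col) expr(mean(!!col, na.rm = TRUE)))
  # Name the expressions so the output columns are named after the inputs
  names(exprs) <- vars
  # Splice the expressions in the data-masked dots
  dplyr::summarise(data, !!!exprs)
\}

summarise_means(mtcars, c("cyl", "am"))
}
The result has one column per input name. Because the inputs are strings turned into \code{.data$} calls, they cannot collide with objects defined in the calling environment.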
}
}
\section{Bridge patterns}{
\subsection{\code{mutate()} as a data-mask to selection bridge}{
This is a variant of the \code{transmute()} bridge pattern described in \ifelse{html}{\link[=topic-data-mask-programming]{Data mask programming patterns}}{\link[=topic-data-mask-programming]{Data mask programming patterns}} that does not materialise \code{...} in the intermediate step. Instead, the \code{...} expressions are defused and inspected. Then the expressions, rather than the columns, are spliced in \code{mutate()}.
\if{html}{\out{}}\preformatted{my_pivot_longer <- function(data, ...) \{
  # Defuse the dots and inspect the names
  dots <- enquos(..., .named = TRUE)
  names <- names(dots)
  # Pass the inputs to `mutate()`
  data <- data \%>\% dplyr::mutate(!!!dots)
  # Select `...` inputs by name with `all_of()`
  data \%>\%
    tidyr::pivot_longer(cols = all_of(names))
\}
mtcars \%>\% my_pivot_longer(cyl, am = am * 100)
}\if{html}{\out{
}}
\enumerate{
\item Defuse the \code{...} expressions. The \code{.named} argument ensures unnamed inputs get a default name, just like they would if passed to \code{mutate()}. Take the names of the list of inputs.
\item Once we have the names, inject the argument expressions into \code{mutate()} to update the data frame.
\item Finally, pass the names to the tidy selection via \href{https://tidyselect.r-lib.org/reference/all_of.html}{\code{all_of()}}.
}
}
}
\section{Transformation patterns}{
\subsection{Transforming inputs manually}{
If \code{across()} and variants are not available, you will need to transform the inputs yourself using metaprogramming techniques. To illustrate the technique we'll reimplement \code{my_mean()} without using \code{across()}. The pattern consists in defusing the input expressions, building larger calls around them, and finally injecting the modified expressions inside the data-masking functions.
We'll start with a single named argument for simplicity:
\if{html}{\out{}}\preformatted{my_mean <- function(data, var) \{
  # Defuse the expression
  var <- enquo(var)
  # Wrap it in a call to `mean()`
  var <- expr(mean(!!var, na.rm = TRUE))
  # Inject the expanded expression
  data \%>\% dplyr::summarise(mean = !!var)
\}
mtcars \%>\% my_mean(cyl)
#> # A tibble: 1 x 1
#>    mean
#>   <dbl>
#> 1  6.19
}\if{html}{\out{
}}
With \code{...} the technique is similar, though a little more involved. We'll use the plural variants \code{enquos()} and \code{\link{!!!}}. We'll also loop over the variable number of inputs using \code{purrr::map()}. But the pattern is otherwise basically the same:
\if{html}{\out{}}\preformatted{my_mean <- function(.data, ...) \{
  # Defuse the dots. Make sure they are automatically named.
  vars <- enquos(..., .named = TRUE)
  # Map over each defused expression and wrap it in a call to `mean()`
  vars <- purrr::map(vars, ~ expr(mean(!!.x, na.rm = TRUE)))
  # Inject the expressions
  .data \%>\% dplyr::summarise(!!!vars)
\}
mtcars \%>\% my_mean(cyl)
#> # A tibble: 1 x 1
#>     cyl
#>   <dbl>
#> 1  6.19
}\if{html}{\out{
}}
Note that we are inheriting the data-masking behaviour of \code{summarise()} because we have effectively forwarded \code{...} inside that verb. This is different from transformation patterns based on \code{across()}, which inherit tidy selection behaviour. In practice, this means the function doesn't support selection helpers and syntax. Instead, it gains the ability to create new vectors on the fly:
\if{html}{\out{}}\preformatted{mtcars \%>\% my_mean(cyl = cyl * 100)
#> # A tibble: 1 x 1
#>     cyl
#>   <dbl>
#> 1  619.
}\if{html}{\out{
}}
}
}
\section{Base patterns}{
In this section, we review patterns for programming with \emph{base} data-masking functions. They essentially consist in building and evaluating expressions in the data mask. We review these patterns and compare them to rlang idioms.
\subsection{Data-masked \code{get()}}{
In the simplest version of this pattern, \code{get()} is called with a variable name to retrieve objects from the data mask:
\if{html}{\out{}}\preformatted{var <- "cyl"
with(mtcars, mean(get(var)))
#> [1] 6.1875
}\if{html}{\out{
}}
This sort of pattern is susceptible to \link[=topic-data-mask-ambiguity]{names collisions}. For instance, the input data frame might contain a variable called \code{var}:
\if{html}{\out{}}\preformatted{df <- data.frame(var = "wrong")
with(df, mean(get(var)))
#> Error in get(var): object 'wrong' not found
}\if{html}{\out{
}}
In general, prefer symbol injection over \code{get()} to prevent this sort of collision. With base functions you will need to enable injection operators explicitly using \code{\link[=inject]{inject()}}:
\if{html}{\out{}}\preformatted{inject(
  with(mtcars, mean(!!sym(var)))
)
#> [1] 6.1875
}\if{html}{\out{
}}
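To make the difference with \code{get()} concrete, here is a small sketch in which the data frame \code{df} and its columns are made up for illustration. Because \code{!!sym(var)} is resolved by \code{inject()} before \code{with()} is called, the \code{var} column in the data frame cannot shadow the \code{var} string defined in the environment:
\preformatted{library(rlang)

var <- "x"
# Made-up data containing a column that collides with `var`
df <- data.frame(x = 1:3, var = "wrong")

# After injection, this is equivalent to `with(df, mean(x))`
inject(
  with(df, mean(!!sym(var)))
)
#> [1] 2
}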
See \ifelse{html}{\link[=topic-data-mask-ambiguity]{The data mask ambiguity}}{\link[=topic-data-mask-ambiguity]{The data mask ambiguity}} for more information about names collisions.
}
\subsection{Data-masked \code{parse()} and \code{eval()}}{
A more involved pattern consists in building R code in a string and evaluating it in the mask:
\if{html}{\out{}}\preformatted{var1 <- "am"
var2 <- "vs"
code <- paste(var1, "==", var2)
with(mtcars, mean(eval(parse(text = code))))
#> [1] 0.59375
}\if{html}{\out{
}}
As before, the \code{code} variable is vulnerable to \link[=topic-data-mask-ambiguity]{names collisions}. More importantly, if \code{var1} and \code{var2} are user inputs, they could contain \href{https://xkcd.com/327/}{adversarial code}. Evaluating code assembled from strings is always a risky business:
\if{html}{\out{}}\preformatted{var1 <- "(function() \{
Sys.sleep(Inf) # Could be a coin mining routine
\})()"
var2 <- "vs"
code <- paste(var1, "==", var2)
with(mtcars, mean(eval(parse(text = code))))
}\if{html}{\out{
}}
This is not a big deal if your code is only used internally. However, this code could be part of a public Shiny app which Internet users could exploit. But even internally, parsing is a source of bugs when variable names contain syntactic symbols like \code{-} or \code{:}.
\if{html}{\out{}}\preformatted{var1 <- ":var:"
var2 <- "vs"
code <- paste(var1, "==", var2)
with(mtcars, mean(eval(parse(text = code))))
#> Error in parse(text = code): <text>:1:1: unexpected ':'
#> 1: :
#>     ^
}\if{html}{\out{
}}
For these reasons, always prefer to \emph{build} code instead of parsing code. Building variable names with \code{\link[=sym]{sym()}} is a way of sanitising inputs.
\if{html}{\out{}}\preformatted{var1 <- "(function() \{
Sys.sleep(Inf) # Could be a coin mining routine
\})()"
var2 <- "vs"
code <- call("==", sym(var1), sym(var2))
code
#> `(function() \{\\n Sys.sleep(Inf) # Could be a coin mining routine\\n\})()` ==
#> vs
}\if{html}{\out{
}}
The adversarial input now produces an error:
\if{html}{\out{}}\preformatted{with(mtcars, mean(eval(code)))
#> Error in eval(code): object '(function() \{\\n Sys.sleep(Inf) # Could be a coin mining routine\\n\})()' not found
}\if{html}{\out{
}}
Finally, it is recommended to inject the code instead of evaluating it to avoid names collisions:
\if{html}{\out{}}\preformatted{var1 <- "am"
var2 <- "vs"
code <- call("==", sym(var1), sym(var2))
inject(
  with(mtcars, mean(!!code))
)
#> [1] 0.59375
}\if{html}{\out{
}}
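For comparison, the same computation can be sketched with rlang idioms only. This is an illustration rather than the only way to write it: \code{call2()} builds the call and \code{eval_tidy()} evaluates it with the data frame supplied as data mask.
\preformatted{library(rlang)

var1 <- "am"
var2 <- "vs"

# Build the call `am == vs` from sanitised symbols
code <- call2("==", sym(var1), sym(var2))

# Evaluate the call in a data mask created from `mtcars`
mean(eval_tidy(code, data = mtcars))
#> [1] 0.59375
}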
}
}
\keyword{internal}