A.3 Function short-cut
apply a function to each element of a list-column to create a new variable: mutate(mod = map(data, function))
instead of for (i in 1:length(object)), use for (i in seq_along(object)); seq_along() also behaves correctly when object is empty
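The difference between 1:length(object) and seq_along(object) only shows up when the object is empty. A minimal sketch in base R; the names x, n_bad and n_good are illustrative only:

```r
x <- numeric(0)  # an empty vector

# 1:length(x) is 1:0, i.e. c(1, 0), so the loop body runs twice
n_bad <- 0
for (i in 1:length(x)) n_bad <- n_bad + 1

# seq_along(x) is integer(0), so the loop body never runs
n_good <- 0
for (i in seq_along(x)) n_good <- n_good + 1

n_bad   # 2
n_good  # 0
```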
apply a function to each element and return a double vector instead of a list: map_dbl
apply a function to two inputs in parallel: map2
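A small sketch of the map family described above, assuming purrr, dplyr and tidyr are installed. The objects values, means, sums and models are made up for illustration, and the lm() formula on mtcars is only a stand-in for whatever model you need:

```r
library(dplyr)
library(tidyr)
library(purrr)

values <- list(a = c(1, 2, 3), b = c(10, 20))

# map() always returns a list; map_dbl() returns a double vector
means <- map_dbl(values, mean)

# map2_dbl() walks two inputs in parallel
sums <- map2_dbl(c(1, 2, 3), c(10, 20, 30), function(x, y) x + y)

# mutate() + map(): fit one model per nested data frame
models <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%                                                  # list-column named data
  mutate(mod = map(data, function(df) lm(mpg ~ wt, data = df))) %>%
  mutate(slope = map_dbl(mod, function(m) coef(m)[["wt"]]))   # one number per model
```

Each row of models holds one cyl group, its data, the fitted model, and the extracted slope.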
autoplot(data)
plot time series data
mod_tidy = linear_reg() %>% set_engine('lm') %>% fit(price ~ ., data=data)
fit a linear model with the lm engine. Other engines can be set the same way (stan, spark, glmnet, keras)
- Sometimes data-masking cannot tell whether a name refers to an environment variable or a data variable. To disambiguate, use .data$variable or .env$variable. For example: data %>% mutate(x = .env$variable / .data$variable)
- Problems with data-masking:
  - Unexpected masking by data-var: use .data and .env to disambiguate
  - Data-var can't get through: tunnel the data-var with {{ }}, and subset .data with [[ ]] when the column name is a string
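Both problems can be reproduced on a toy tibble. This is a sketch assuming dplyr is installed; df, result, scale_col and scale_chr are hypothetical names introduced only for this example:

```r
library(dplyr)

df <- tibble(x = c(2, 4, 6))
x <- 100  # env-variable that shares its name with the column

# Unexpected masking: a bare x always resolves to the column first
result <- df %>%
  mutate(
    ambiguous = x / x,             # column / column
    explicit  = .env$x / .data$x   # env-variable / column
  )

# Data-var can't get through: tunnel a bare column name with {{ }}
scale_col <- function(data, col) {
  data %>% mutate(scaled = {{ col }} / max({{ col }}))
}

# Column name given as a string: subset .data with [[ ]]
scale_chr <- function(data, col) {
  data %>% mutate(scaled = .data[[col]] / max(.data[[col]]))
}

scale_col(df, x)
scale_chr(df, "x")
```

The two helpers return the same result; which one to use depends on whether callers pass bare column names or strings.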
- Passing Data-variables through arguments
library("dplyr")

# Tunnel data-vars with {{ }}; "{{ var }}" also tunnels the data-var into the new column name
mean_by <- function(data, by, var) {
  data %>%
    group_by({{ by }}) %>%
    summarise("{{ var }}" := mean({{ var }}))
}

# Single { } glues a string held in an env-variable, so var must be a character string here.
# This makes the function harder to reuse with bare column names.
mean_by <- function(data, by, var) {
  data %>%
    group_by({{ by }}) %>%
    summarise("{var}" := mean(.data[[var]]))
}
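A quick check of the tunneling pattern on the built-in mtcars data, assuming dplyr is installed. The function mean_by_prefixed and the mean_ prefix are hypothetical, added here only to show that the tunneled name can be combined with other text:

```r
library(dplyr)

mean_by_prefixed <- function(data, by, var) {
  data %>%
    group_by({{ by }}) %>%
    # "mean_{{ var }}" builds the column name from the tunneled data-var
    summarise("mean_{{ var }}" := mean({{ var }}))
}

out <- mtcars %>% mean_by_prefixed(cyl, mpg)
names(out)  # "cyl" "mean_mpg"
```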
- Trouble with selection:
library("dplyr")
library("purrr")
name <- c("mass", "height")
starwars %>% select(name) # data-var: this selects the column named "name", not mass and height
starwars %>% select(all_of(name)) # use all_of() to disambiguate when an env-variable shares its name with a column
averages <- function(data, vars) { # take character vectors with all_of()
  data %>%
    select(all_of(vars)) %>%
    map_dbl(mean, na.rm = TRUE)
}
x = c("Sepal.Length", "Petal.Length")
iris %>% averages(x)
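# Another way
averages <- function(data, vars) { # tunnel selections with {{ }}
  data %>%
    select({{ vars }}) %>%
    map_dbl(mean, na.rm = TRUE)
}
x = c("Sepal.Length", "Petal.Length")
iris %>% averages(all_of(x)) # wrap character vectors in all_of()
iris %>% averages(c(Sepal.Length, Petal.Length)) # or pass bare column names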
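One practical difference between the two versions: a selection tunneled with {{ }} accepts anything select() accepts, including tidyselect helpers. A sketch assuming dplyr and purrr are installed; the function is restated so the block runs on its own, and by_bare, by_helper and by_chr are illustrative names:

```r
library(dplyr)
library(purrr)

averages <- function(data, vars) {
  data %>%
    select({{ vars }}) %>%       # the selection is forwarded to select() as written
    map_dbl(mean, na.rm = TRUE)
}

x <- c("Sepal.Length", "Petal.Length")

by_bare   <- iris %>% averages(c(Sepal.Length, Petal.Length))  # bare column names
by_helper <- iris %>% averages(starts_with("Sepal"))           # tidyselect helper
by_chr    <- iris %>% averages(all_of(x))                      # character vector
```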