11.6 Exercises
The following basic exercises practice how functions can be defined and evaluated and how the flow of information can be controlled.
11.6.1 Exercise 1
Fun with errors
Imagine someone proudly presents the following 3 functions to you.
Each of them takes a vector v
as an input and tries to perform a simple task.
For each function:
- describe the task that the function is designed to perform,
- test whether it successfully performs this task,
- name any problem that you detect with the current function,
- fix the function so that it successfully performs its task.
# (1) ------
first_element <- function(v) {
  output <- NA    # initialize
  output <- v[1]  # set to 1st
}
# (2) ------
avg_mn_med <- function(v, na_rm = TRUE) {
  mn  <- mean(v)
  med <- median(v)
  avg <- (mn + med)/2
  return(avg)
}
# (3) ------
mean_sd <- function(v, na_rm = TRUE) {
  mn <- mean(v)
  sd <- sd(v)
  both <- c(mn, sd)
  names(both) <- c("mean", "sd")
  return(mn)
}
11.6.2 Exercise 2
Conditional feeding
Let’s write a first function and then add some conditions to it.
- Write a feed_me() function that takes a character string food as a required argument, and returns the sentence "I love to eat ___!". Test your function by running feed_me("apples"), etc.
Here’s a template with some blanks, to get you started:
- Note that the template used print() as its last expression. Would replacing this by return() make a difference? (Why or why not?)
- Modify feed_me() so that it returns "Nothing to eat." when food = NA.
- Extend your function to a feed_vegan() function that uses two additional arguments:
- type should be an optional character string, set to a default argument of "food". If type is not "food", the function should return "___ is not edible.".
- vegan should be an optional Boolean value, which is set to FALSE by default. If vegan is set to TRUE, the function should return "I love to eat ___!". Otherwise, the function should return "I do not eat ___.".
Test each of your functions by evaluating appropriate function calls.
11.6.3 Exercise 3
Buggy number recognition
This exercise analyzes and corrects someone else’s function.
- Explain what the following describe() function (not to be confused with describe() above) intends to do and why it fails in doing it.
describe <- function(x) {
  if (x %% 2 == 0) {print("x is an even number.")}
  else if (x %% 2 == 1) {print("x is an odd number.")}
  else if (x < 1) {print("x is too small.")}
  else if (x > 20) {print("x is too big.")}
  else if (x == 13) {print("x is a lucky number.")}
  else if (x == pi) {print("Let's make a pie!")}
  else {print("x is beyond description.")}
}
- Repair the describe() function to yield the following results:
# Desired results:
describe(0)
#> [1] "x is too small."
describe(1)
#> [1] "x is an odd number."
describe(13)
#> [1] "x is a lucky number."
describe(20)
#> [1] "x is an even number."
describe(21)
#> [1] "x is too big."
describe(pi)
#> [1] "Let's make a pie!"
- What are the results of describe(NA) and describe("one")? Correct the function to yield appropriate results in both cases.
- For what kind of x will describe() print "x is beyond description."?
11.6.4 Exercise 4
Double standards?
Smart Alex is a student in this course, whereas Smart Alexa has graduated and now works as a software developer for a company.
Both get the assignment to define a fac()
function that computes the factorial of some number n
and submit the following solution:
Surprisingly, the consequences differ: Whereas Smart Alex gets a bad grade, Smart Alexa gets promoted. Explain.
11.6.5 Exercise 5
Randomizers revisited
In Chapter 1, we explored the ds4psy functions coin(), dice(), and dice_2(), and used the base R function sample() to mimic their behavior (see Section 1.6.4 and Exercise 3 in Section 1.8.3).
Now we can create these functions.
- Study the dice() function of ds4psy and write a function my_dice() that accepts one argument N and always returns N random numbers from 1 to 6 (as a number).
Hint: Drawing random numbers from a uniform distribution could be achieved by stats::runif(), but beware of distortions when rounding its results. An easier solution uses the base R sample() function in the definition.
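The distortion mentioned in the hint can be made visible with a short simulation. The following is only a sketch: the seed, the sample size, and the variable names are arbitrary choices for illustration and not part of the exercise.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
n <- 60000    # arbitrary number of simulated throws

# Rounding uniform draws from [1, 6]: the end points 1 and 6 only collect
# values from intervals of width 0.5, all other numbers from intervals of width 1.
rounded <- round(runif(n, min = 1, max = 6))
prop_rounded <- as.vector(table(factor(rounded, levels = 1:6))) / n

# Sampling with replacement from the set of possible events:
sampled <- sample(x = 1:6, size = n, replace = TRUE)
prop_sampled <- as.vector(table(factor(sampled, levels = 1:6))) / n

round(prop_rounded, 2)  # 1 and 6 occur about half as often as 2 to 5
round(prop_sampled, 2)  # all six numbers occur about equally often
```

Other ways of mapping runif() results to whole numbers may avoid this problem, but sampling directly from the events sidesteps it.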
- Use (parts of) your new my_dice() function to write a my_coin() function that mimics the behavior of the ds4psy coin() function.
Hint: The sampling part of the function remains the same — only the events
to sample from change.
- Bonus task: We have seen (in Exercise 3, Section 1.8.3) that the dice_2() function of ds4psy yields non-random results. Write a similar my_special_dice() function that accepts two arguments:
- N is the number of dice to throw. Each dice should always yield a number from 1 to 6 and your function should return the N numbers of the set of dice.
- The 2nd argument special_number lets you specify a number (from 1 to 6) that will occur twice as often as the other numbers in exactly one of the dice.
- To make this scam less obvious, your function should return the results of the N dice in a random order.
Hint: We can rely on the dice()
function to imitate all dice. For the one “special” dice, we need to change the events to sample from so that the special_number
occurs twice as often as the other numbers. Randomizing the order of outputs can be achieved by the sample()
function (by drawing N
dice from the set of all dice without replacement).
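The two ingredients of this hint can be tried out in isolation before combining them. The sketch below assumes a special number of 6 and uses hypothetical variable names. It only shows how the set of events changes and how sample() shuffles a vector, and is not a complete my_special_dice() function.

```r
set.seed(2)                        # arbitrary seed, for reproducibility only
special_number <- 6                # hypothetical choice for illustration

# Adding the special number to the events a second time makes it
# twice as likely as each of the other numbers (2/7 vs. 1/7):
events <- c(1:6, special_number)
throws <- sample(events, size = 70000, replace = TRUE)
props <- as.vector(table(factor(throws, levels = 1:6))) / length(throws)
round(props, 2)

# Calling sample() on a vector without further arguments draws all of its
# elements without replacement, i.e., it returns them in a random order:
shuffled <- sample(c(3, 5, 6))
shuffled
```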
- Bonus task: Checking your
my_special_dice()
function:
- How could you verify that your
my_special_dice()
function works as intended?
Hint: Don’t solve it here — just describe what you would need to check your function.
11.6.6 Exercise 6
Tibble charts
This exercise writes a function to extract rows from tabular inputs based on the top values of some variable.
- Write a top_3() function that takes a tibble data and the column number col_nr of a numeric variable as its two inputs and returns the top-3 rows of the tibble after it has been sorted (in descending order) by the specified column number.
Use the data of sw <- dplyr::starwars
to illustrate your function.
Hint: To write this function, first solve its task for a specific case (e.g., for col_nr = 2).
When using the dplyr commands of the tidyverse, a problem you will encounter is that a tibble’s variables are typically referenced by their unquoted names, rather than by their number (or column index). Here are 2 ways to solve this problem:
- To obtain the unquoted name some_name of a given character string "some_name", you can call !!sym("some_name").
- Rather than aiming for a tidyverse solution, you could solve the problem with base R functions. In this case, look up and use the order() function to re-arrange the rows of a tibble or data frame.
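Both routes can be tried on a small example first. The following sketch uses a hypothetical toy data frame (not the starwars data) and sorts it by a column that is given as a number. The tidyverse variant only runs if dplyr and rlang are installed, and it calls sym() via rlang::sym(), which is an assumption about where the function is taken from.

```r
# Hypothetical toy data, for illustration only:
df <- data.frame(name   = c("a", "b", "c", "d"),
                 height = c(172, 96, 202, 150),
                 stringsAsFactors = FALSE)

col_nr   <- 2                    # column index of a numeric variable
col_name <- names(df)[col_nr]    # corresponding name, as a character string

# Base R: order() returns the row indices that sort a vector
ix <- order(df[[col_nr]], decreasing = TRUE)
by_base <- df[ix, ]

# Tidyverse: turn the string into a symbol and unquote it with !!
if (requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("rlang", quietly = TRUE)) {
  by_dplyr <- dplyr::arrange(df, dplyr::desc(!!rlang::sym(col_name)))
}

by_base$name
#> [1] "c" "a" "d" "b"
```

Selecting the first rows of the sorted result and wrapping the steps into a function remains part of the exercise.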
- What happens in your top_3() function when col_nr refers to a character variable (e.g., dplyr::starwars[ , 1])? Adjust the function so that its result varies by the type of the variable designated by the col_nr argument:
- if the corresponding variable is a character variable, sort the data in ascending order (alphabetically);
- if the corresponding variable is a numeric variable, sort the data in descending order (from high to low).
- Generalise your top_3() function to a top_n() function that returns the top n rows when sorted by col_nr. What would be a good default value for n? What should happen when n = NA and when n > nrow(data)?
Check all your functions with appropriate inputs.
Note: Functions for different tasks and data types
The following three exercises illustrate how functions can use, mix, and merge various data types to solve different tasks. Specifically, they ask you to write functions for
- visualizing data as plots (Exercise 7),
- printing numbers as text (Exercise 8),
- computing with dates (Exercise 9).
11.6.7 Exercise 7
A plotting function
This exercise asks you to write a function that uses some input data for creating a specific type of plot.
- Write a plot_scatter() function that takes a table (tibble or data frame) with two numeric variables x and y as my_data and plots a scatterplot of the values of y by the values of x.
Hint: First use base R or ggplot2 commands to create a scatterplot of my_data. Then wrap a new function plot_scatter() around this command that takes my_data as its argument.
Test your function by using the following two tibbles tb_1 and tb_2 as my_data:
set.seed(101)
n_num <- 100
x_val <- runif(n = n_num, min = 30, max = 90)
y_val <- runif(n = n_num, min = 30, max = 90)
tb_1 <- tibble::tibble(x = x_val, y = y_val)
tb_2 <- tibble::tibble(x = x_val, y = x_val + rnorm(n = n_num, mean = 0, sd = 10))
names(tb_1)
#> [1] "x" "y"
- For any table my_data that contains two numeric variables x and y we can fit a linear model as follows:
my_data <- tb_1
my_lm <- lm(y ~ x, data = my_data)
my_lm
#>
#> Call:
#> lm(formula = y ~ x, data = my_data)
#>
#> Coefficients:
#> (Intercept) x
#> 53.2340 0.1318
# Get the model's intercept and slope values:
my_lm$coefficients[1] # intercept
#> (Intercept)
#> 53.23402
my_lm$coefficients[2] # slope
#> x
#> 0.1318431
Incorporate the fit of a linear model into your plot_scatter()
function. Use a linear model to add a line to your plot that shows the prediction of the linear model (in a color that can be set by an optional col
argument).
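As a starting point, the individual steps can be tried outside of a function. The sketch below uses base R graphics and a hypothetical toy data frame. The names toy, fit, b, and line_col are assumptions for illustration, and wrapping the steps into plot_scatter() with a col argument is left to the exercise.

```r
set.seed(101)   # arbitrary seed, for reproducibility only
x <- runif(50, min = 30, max = 90)
y <- x + rnorm(50, mean = 0, sd = 10)
toy <- data.frame(x = x, y = y)     # hypothetical toy data

fit <- lm(y ~ x, data = toy)        # fit the linear model
b <- coef(fit)                      # b[1]: intercept, b[2]: slope

line_col <- "red"                   # hypothetical color choice
plot(toy$x, toy$y, xlab = "x", ylab = "y")   # scatterplot of y by x
abline(a = b[1], b = b[2], col = line_col)   # add the model's prediction line
```

With ggplot2, a comparable line could be added by a smoothing layer, but the base R version keeps the role of the intercept and slope explicit.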
11.6.8 Exercise 8
Printing numbers (as characters)
A common problem when printing numbers in text is that the number of digits to be printed (i.e., characters or symbols) depends on the number’s value. This means that series of different numbers often have different lengths, which makes it hard to align them (e.g., in tables). A potential solution to this is adding leading or trailing zeros (or empty spaces) to the front and back of a number.
The function num_as_char() of the ds4psy package provides a (sub-optimal) solution to this problem through three main arguments:
- x for the number(s) to be formatted (required);
- n_pre_dec for the number of digits prior to the decimal separator (default n_pre_dec = 2);
- n_dec to specify the number of digits after the decimal separator (default n_dec = 2).
Additional arguments specify the symbol sym to use for filling up digit positions and the symbol used as decimal separator sep.
- Experiment with
num_as_char()
to check its functionality and limits.
- Write your own function
num_to_char()
that achieves the same (or a similar) functionality.
Hint: The num_as_char() function of the ds4psy package also works for vectors, but uses two for loops to achieve this.
Try writing a simpler solution that works for individual numbers as x (i.e., scalars, or vectors of length 1).
If you get stuck, try adapting parts of the solution used by num_as_char().
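For comparison, base R can already pad numbers to a fixed width. The following sketch uses sprintf() with hypothetical example values. It illustrates the idea of aligned number strings, but it is not how num_as_char() is implemented and does not offer its options for fill symbols or decimal separators.

```r
x <- c(3.14159, 12.5, 0.2)        # hypothetical example values

# "%05.2f": fixed notation, 2 decimals, total width of 5 characters,
# padded with leading zeros where needed
padded <- sprintf("%05.2f", x)
padded
#> [1] "03.14" "12.50" "00.20"

nchar(padded)   # all strings have the same length
#> [1] 5 5 5
```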
11.6.9 Exercise 9
Computing with dates
Use what you have learned in Chapter 10 on Dates and times to write a function that takes a date or time (e.g., the date of someone’s birthday) as its input and returns the corresponding individual’s age (as a number, rounded to completed years) as output.
Check your function with appropriate examples.
Does your solution also work when multiple dates or times are entered (as a vector)?
Bonus task: Write alternative versions of the following lubridate functions:
- a my_leap_year() function (as an alternative to lubridate::leap_year()) that detects whether a given year is a leap year (see Wikipedia: leap year for definition).
- a my_change_tz() function (as an alternative to lubridate::with_tz()) that converts the time display from its current time zone into a different one (tz), but keeping the point in time (i.e., the represented time) the same.
- a my_change_time() function (as an alternative to lubridate::force_tz()) that changes a given time into the same nominal time (i.e., showing the same time display, but representing a different time) in a different time zone (tz).
Hint: See Section 10.3.4 of Chapter 10 for examples and the corresponding lubridate functions. Check your functions for a variety of examples and input types.
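The difference between changing the time display and changing the represented time can be explored with base R date-time objects. The sketch below is one possible way of illustrating the distinction, not the approach taken by lubridate. It assumes that the time zone "Europe/Berlin" is available on the system and uses hypothetical variable names.

```r
t_utc <- as.POSIXct("2022-01-01 12:00:00", tz = "UTC")

# (a) Same point in time, different display:
#     only the "tzone" attribute of the object changes
t_same <- t_utc
attr(t_same, "tzone") <- "Europe/Berlin"

# (b) Same nominal clock time, different point in time:
#     re-parse the displayed time in the new time zone
t_nominal <- as.POSIXct(format(t_utc, "%Y-%m-%d %H:%M:%S"), tz = "Europe/Berlin")

format(t_same, "%H:%M")      # display shifts by the offset to UTC
#> [1] "13:00"
format(t_nominal, "%H:%M")   # display stays the same
#> [1] "12:00"

as.numeric(t_utc) == as.numeric(t_same)                   # same represented time
as.numeric(difftime(t_utc, t_nominal, units = "hours"))   # represented times differ
```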
11.6.10 Exercise 10
A zodiac function
Use what you have learned in Chapter 10 on Dates and times to write a function that takes a date (e.g., the date of someone’s birthday) as its input and returns the corresponding individual’s zodiac sign (as a character or factor variable) as output.
Check your function with appropriate examples.
Does your function also work for vector inputs (containing more than one date)? What could be done to make it work for them?
Hint: This task may seem simple, but is quite challenging, for several reasons:
- When working with dates in no particular year, we can treat them as character or numeric variables.
- Basic conditionals in R only work for scalar inputs, not vectors. However, the base R function cut() classifies continuous variables into discrete categories (see Section 11.3.7).
- Different sources provide different names and date ranges for the 12 zodiac signs. See https://en.wikipedia.org/wiki/Zodiac or https://de.wikipedia.org/wiki/Tierkreiszeichen for alternatives.
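The first two points can be combined: dates without a year can be turned into numbers and then classified with cut(), which works on entire vectors. The sketch below illustrates this with calendar quarters rather than zodiac signs. The break points, labels, and variable names are assumptions chosen for illustration.

```r
dates <- as.Date(c("2022-01-15", "2022-05-03", "2022-11-30"))

# Drop the year: represent each date as a number of the form mmdd
mmdd <- as.numeric(format(dates, "%m%d"))
mmdd
#> [1]  115  503 1130

# cut() assigns each number to an interval (here: closed on the right)
quarter <- cut(mmdd,
               breaks = c(0, 331, 630, 930, 1231),
               labels = c("Q1", "Q2", "Q3", "Q4"))
as.character(quarter)
#> [1] "Q1" "Q2" "Q4"
```

Categories that span the turn of the year need extra care, since their numeric range is split across both ends of the scale.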
The ds4psy package also contains a zodiac()
function that provides multiple output options and allows re-defining date ranges (see Table 11.2 for default settings).
nr | name | from | to | symbol |
---|---|---|---|---|
1 | Aries | 2022-03-21 | 2022-04-20 | ♈ |
2 | Taurus | 2022-04-21 | 2022-05-20 | ♉ |
3 | Gemini | 2022-05-21 | 2022-06-20 | ♊ |
4 | Cancer | 2022-06-21 | 2022-07-22 | ♋ |
5 | Leo | 2022-07-23 | 2022-08-22 | ♌ |
6 | Virgo | 2022-08-23 | 2022-09-22 | ♍ |
7 | Libra | 2022-09-23 | 2022-10-22 | ♎ |
8 | Scorpio | 2022-10-23 | 2022-11-22 | ♏ |
9 | Sagittarius | 2022-11-23 | 2022-12-21 | ♐ |
10 | Capricorn | 2022-12-22 | 2023-01-19 | ♑ |
11 | Aquarius | 2023-01-20 | 2023-02-18 | ♒ |
12 | Pisces | 2023-02-19 | 2023-03-20 | ♓ |
This concludes our basic exercises on creating new functions. The following Section 11.7 contains additional exercises that address the more advanced topics of recursion and sorting.