A.11 Solutions (11)

ds4psy: Solutions 11

Here are the solutions of the exercises on creating functions of Chapter 11 (Section 11.4).

A.11.1 Exercise 1

Imagine someone proudly presents the following 3 functions to you. Each of them takes a vector v as an input and tries to perform a simple task. For each function:

  • describe the task that the function is designed to perform,
  • test whether it successfully performs this task,
  • name any problem that you detect with the current function,
  • fix the function so that it successfully performs its task.
# (1)  ------ 
first_element <- function(v) {
  
  output <- NA   # initialize
  output <- v[1] # set to 1st
  
}

# (2) ------ 
avg_mn_med <- function(v, na_rm = TRUE) {
  
  mn  <- mean(v)
  med <- median(v)
  avg <- (mn + med)/2
  
  return(avg)
}

# (3)  ------ 
mean_sd <- function(v, na_rm = TRUE) {
  
  mn  <- mean(v)
  sd  <- sd(v)
  
  both <- c(mn, sd)
  names(both) <- c("mean", "sd")
  
  return(mn)
  
}

Solution

Here are some statement suited to evaluate the functions:

# (1)  first_element aims to return the 1st element of a vector v.

# Check: 
first_element(v = c(10, 11, 12))
first_element(v = c("A", "B", "C"))
first_element(v = c(NA, 11, 12))
first_element(v = NA)

# (2) avg_mn_med aims to compute 3 different metrics of centrality for v. 

# Check:
avg_mn_med(v = c(1, 2, 3))
avg_mn_med(v = c(1, 2, 9))
avg_mn_med(v = c(1, 2, NA))
avg_mn_med(v = NA)
avg_mn_med(v = c("A", "B"))

# (3) mean_sd aims to compute the mean and standard deviation of v.

# Check:
mean_sd(v = c(1, 2, 3))
mean_sd(v = c(1, 2, 9))
mean_sd(v = c(1, 2, NA))
mean_sd(v = NA)
mean_sd(v = c("A", "B"))

Here are the same functions with some quick fixes:

# (1)  ------ 
# Task: Return the 1st element of v.
# Problem:  Nothing is returned.
# Solution: Add return(output).

first_element <- function(v) {
  
  output <- NA   # initialize
  output <- v[1] # set to 1st
  
  return(output)
  
}

# Check: 
first_element(v = c(10, 11, 12))
#> [1] 10
first_element(v = c("A", "B", "C"))
#> [1] "A"
first_element(v = c(NA, 11, 12))
#> [1] NA
first_element(v = NA)
#> [1] NA

# (2) ------ 
# Task: Return the average of mean and median of v.
# Problem  1: na_rm is not used in function body.
# Solution 1: Add na_rm as value to na.rm of base functions.
# Problem  2: Function returns ERROR for non-numeric v.
# Solution 2: Use if statement to check for is.numeric(v).

avg_mn_med <- function(v, na_rm = TRUE) {
  
  if (!is.numeric(v)) {
    message("avg_mn_md: v requires numeric inputs.")
  } else {
    
    mn  <- mean(v, na.rm = na_rm)
    med <- median(v, na.rm = na_rm)
    avg <- (mn + med)/2
    
    return(avg)
    
  } # if (!is.numeric(v)) etc.
}

# Check:
avg_mn_med(v = c(1, 2, 3))
#> [1] 2
avg_mn_med(v = c(1, 2, 9))
#> [1] 3
avg_mn_med(v = c(1, 2, NA))
#> [1] 1.5
avg_mn_med(v = NA)
avg_mn_med(v = c("A", "B"))

# (3)  ------ 
# Task: Return the mean and sd of v.
# Problem  1: Only the mean is returned.
# Solution 1: return(both)
# Problem  2: na_rm is not used in function body.
# Solution 2: Add na_rm as value to na.rm of base functions.
# Problem  3: Function returns ERROR for non-numeric v.
# Solution 3: Use if statement to check for is.numeric(v).

mean_sd <- function(v, na_rm = TRUE) {
  
  if (all(is.na(v))) {return(v)}
  
  if (!is.numeric(v)) {
    message("mean_sd: v requires numeric inputs.")
  } else {
    
    mn  <- mean(v, na.rm = na_rm)
    sd  <- sd(v, na.rm = na_rm)
    
    both <- c(mn, sd)
    names(both) <- c("mean", "sd")
    
    return(both)
    
  } # if (!is.numeric(v)) etc.
  
}

# Check:
mean_sd(v = c(1, 2, 3))
#> mean   sd 
#>    2    1
mean_sd(v = c(1, 2, 9))
#>     mean       sd 
#> 4.000000 4.358899
mean_sd(v = c(1, 2, NA))
#>      mean        sd 
#> 1.5000000 0.7071068
mean_sd(v = NA)
#> [1] NA
mean_sd(v = c("A", "B"))

A.11.2 Exercise 2

Conditional feeding

Let’s write a first function and then add some conditions to it.

  1. Write a function feed_me that takes a character string food as a required argument, and returns the sentence "I love to eat ___!". Test your function by running feed_me("apples"), etc.

Here’s a template with some blanks, to get you started:

feed_me <- function(___) {
  
  output <- paste0("I love to eat ", ___, "!")
  
  print(___)
}

Solution

A possible solution:

feed_me <- function(food) {
  
  output <- paste0("I love to eat ", food, "!")
  
  print(output)
  }

# Check:
feed_me("apples")
#> [1] "I love to eat apples!"
feed_me("books")
#> [1] "I love to eat books!"
feed_me("cake")
#> [1] "I love to eat cake!"
feed_me(NA)
#> [1] "I love to eat NA!"
  1. Modify feed_me so that it returns "Nothing to eat." when food = NA.

Solution

A possible solution:

feed_me <- function(food) {
  
  if (is.na(food)) {
    output <- "Nothing to eat."
  } else {
    output <- paste0("I love to eat ", food, "!")
  }
  
  print(output)
}

# Check:
feed_me(NA)
#> [1] "Nothing to eat."
feed_me("apples")
#> [1] "I love to eat apples!"
feed_me("books")
#> [1] "I love to eat books!"
  1. Extend your function to a feed_vegan function that uses 2 additional arguments:

    • type should be an optional character string, set to a default argument of "food". If type is not "food", the function should return "___ is not edible.".

    • vegan should be an optional Boolean value, which is set to FALSE by default. If vegan is TRUE, the function should return "I love to eat ___!". Otherwise, the function should return "I do not eat ___.".

Test each of your functions by evaluating appropriate function calls.

Solution

A possible solution:

feed_vegan <- function(food, type = "food", vegan = FALSE) {
  
  if (is.na(food)) {
    
    output <- "Nothing to eat."
    
  } else {  # food is not NA: 
    
    if (type != "food") {
      output <- paste0(food, " is not edible.")
    } else {  # type == "food": 
      
      if (vegan) {
        output <- paste0("I love to eat ", food, "!")
      } else {  # vegan 
        output <- paste0("I do not eat ", food, ".")
      }      
    }
  }
  
  print(output)
}

# Check:
feed_vegan(NA)
#> [1] "Nothing to eat."
feed_vegan("veggies", vegan = TRUE)
#> [1] "I love to eat veggies!"
feed_vegan("R4DS", type = "book")
#> [1] "R4DS is not edible."
feed_vegan("meat", vegan = FALSE)
#> [1] "I do not eat meat."
# but:
feed_vegan("spagetti", type = "pasta")  # due to type != "food"
#> [1] "spagetti is not edible."
feed_vegan("veggies")  # due to default: vegan = FALSE
#> [1] "I do not eat veggies."

A.11.3 Exercise 3

Number recognition

This exercise analyzes and corrects someone else’s function.

  1. Explain what the following function describe (not to be confused with describe above) intends to do and why it fails in doing it.
describe <- function(x) {
  
  if (x %% 2 == 0)  {print("x is an even number.")} 
  else if (x %% 2 == 1) {print("x is an odd number.")}
  else if (x < 1)   {print("x is too small.")} 
  else if (x > 20)  {print("x is too big.")} 
  else if (x == 13) {print("x is a lucky number.")} 
  else if (x == pi) {print("Let's make a pie!")}
  else {print("x is beyond description.")}
  
}

Solution

The function describe seems to want to categorize a number x into one of various cases:

  • numbers that are too small numbers vs. too big
  • odd vs. even numbers
  • 13 as a lucky number
  • pi to make a pie
  • numbers that cannot be described

However, it currently fails (and only distinguishes between even and odd numbers), because of the order of its if statements. Currently, the quite general tests at the beginning are TRUE for most cases, so that more specific later cases are never reached.

  1. Repair the describe function to yield the following results:
# Desired results:
describe(0)
#> [1] "x is too small."
describe(1)
#> [1] "x is an odd number."
describe(13)
#> [1] "x is a lucky number."
describe(20)
#> [1] "x is an even number."
describe(21)
#> [1] "x is too big."
describe(pi)
#> [1] "Let's make a pie!"

Solution

To repair the function, we need to re-arrange the if statements (putting some more specific statements before more general ones):

# Correction:
describe <- function(x) {
  
  if (x < 1)            {print("x is too small.")}  
  else if (x > 20)      {print("x is too big.")} 
  else if (x == 13)     {print("x is a lucky number.")} 
  else if (x %% 2 == 0) {print("x is an even number.")} 
  else if (x %% 2 == 1) {print("x is an odd number.")}
  else if (x == pi)     {print("Let's make a pie!")}
  else                  {print("x is beyond description.")}
  
}

# Check:
describe(0)
#> [1] "x is too small."
describe(1)
#> [1] "x is an odd number."
describe(13)
#> [1] "x is a lucky number."
describe(20)
#> [1] "x is an even number."
describe(21)
#> [1] "x is too big."
describe(pi)
#> [1] "Let's make a pie!"
  1. What are the results of describe(NA) and describe("one")? Correct the function to yield appropriate results in both cases.

Solution

# Check:
# describe(NA)   # yields an ERROR
describe("one")  # yields "x is too big."
#> [1] "x is too big."
"one" > 20       # is TRUE!
#> [1] TRUE

# Correction:
describe <- function(x) {
  
  # stopifnot(!is.na(x))  # would yield an error if is.na(x)
  
  if (is.na(x))         {print("x is NA.")}
  else if (is.character(x)) {print("x is a word.")}
  
  else if (x < 1)       {print("x is too small.")}  
  else if (x > 20)      {print("x is too big.")} 
  else if (x == 13)     {print("x is a lucky number.")} 
  else if (x %% 2 == 0) {print("x is an even number.")} 
  else if (x %% 2 == 1) {print("x is an odd number.")}
  else if (x == pi)     {print("Let's make a pie!")}
  else                  {print("x is beyond description.")}
  
}

# Check: 
describe(NA)     # yields "x is NA."
#> [1] "x is NA."
describe("one")  # yields "x is a word."
#> [1] "x is a word."
  1. For what kind of x will describe print "x is beyond description."?

Solution

The function describe prints "x is beyond description." for values of 1 < x < 20 that are not integers and not pi:

# For 1 < x < 20 that are not integers:
describe(3/2)
#> [1] "x is beyond description."
describe(sqrt(2))
#> [1] "x is beyond description."
describe(pi + .001)
#> [1] "x is beyond description."
# but:
describe(pi)
#> [1] "Let's make a pie!"

A.11.4 Exercise 4

Tibble charts

This exercise writes a function to extract rows from tabular inputs based on the top values of some variable.

  1. Write a top_3 function that takes a tibble data and a the column number col_nr of a numeric variable as its 2 inputs and returns the top-3 rows of the tibble after it has been sorted (in descending order) by the specified column number.

Use the data of sw <- dplyr::starwars to illustrate your function.

Hint: To write this function, first solve its task for a specific case (e.g., for col_nr = 2). When using the dplyr commands of the tidyverse, a problem you will encounter is that a tibble’s variables are typically referenced by their unquoted names, rather than by their number (or column index). Here are 2 ways to solve this problem:

  • To obtain the unquoted name some_name of a given character string "some_name", you can call !!sym("some_name").

  • Rather than aiming for a tidyverse solution, you could solve the problem with base R commands. In this case, look up and use the command order to re-arrange the rows of a tibble or data frame.

Solution

A possible tidyverse solution:

# Data:
sw <- dplyr::starwars

# (a) specific solution:
# dplyr-pipe solution uses unquoted variable name: 
sw %>% arrange(desc(height)) %>% slice(1:3)
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Yara…    264    NA none       white      yellow            NA male   Quermia  
#> 2 Tarf…    234   136 brown      brown      blue              NA male   Kashyyyk 
#> 3 Lama…    229    88 none       grey       black             NA male   Kamino   
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>

# (b) Same pipe with a quoted variable name:
sw %>% arrange(desc(!!sym("height"))) %>% slice(1:3)
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Yara…    264    NA none       white      yellow            NA male   Quermia  
#> 2 Tarf…    234   136 brown      brown      blue              NA male   Kashyyyk 
#> 3 Lama…    229    88 none       grey       black             NA male   Kamino   
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>

# (c) Translation into a function:
top_3 <- function(data, col_nr){

  col_name <- names(data)[col_nr]
  
  result <- data %>% 
    arrange(desc(!!sym(col_name))) %>%
    slice(1:3)
  
  return(result)
  
}

# Check:
top_3(sw, 2)  # top_3 height values
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Yara…    264    NA none       white      yellow            NA male   Quermia  
#> 2 Tarf…    234   136 brown      brown      blue              NA male   Kashyyyk 
#> 3 Lama…    229    88 none       grey       black             NA male   Kamino   
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
top_3(sw, 3)  # top_3 mass values
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Jabb…    175  1358 <NA>       green-tan… orange           600 herma… Nal Hutta
#> 2 Grie…    216   159 none       brown, wh… green, y…         NA male   Kalee    
#> 3 IG-88    200   140 none       metal      red               15 none   <NA>     
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
top_3(sw, 7)  # top_3 birth_year values
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Yoda      66    17 white      green      brown            896 male   <NA>     
#> 2 Jabb…    175  1358 <NA>       green-tan… orange           600 herma… Nal Hutta
#> 3 Chew…    228   112 brown      unknown    blue             200 male   Kashyyyk 
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
# But:
top_3(sw, 1)  # character variables are ordered in reverse of alphabetical order
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Zam …    168    55 blonde     fair, gre… yellow            NA female Zolan    
#> 2 Yoda      66    17 white      green      brown            896 male   <NA>     
#> 3 Yara…    264    NA none       white      yellow            NA male   Quermia  
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>

A solution using only base R functions (here: order):

# Data:
sw <- dplyr::starwars

# (a) specific solution for sorting data:  
# using order() function and a variable name:
sw[order(-sw$height), ]
#> # A tibble: 87 x 13
#>    name  height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
#>  1 Yara…    264    NA none       white      yellow          NA   male  
#>  2 Tarf…    234   136 brown      brown      blue            NA   male  
#>  3 Lama…    229    88 none       grey       black           NA   male  
#>  4 Chew…    228   112 brown      unknown    blue           200   male  
#>  5 Roos…    224    82 none       grey       orange          NA   male  
#>  6 Grie…    216   159 none       brown, wh… green, y…       NA   male  
#>  7 Taun…    213    NA none       grey       black           NA   female
#>  8 Rugo…    206    NA none       green      orange          NA   male  
#>  9 Tion…    206    80 none       grey       black           NA   male  
#> 10 Dart…    202   136 none       white      yellow          41.9 male  
#> # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

# Note:
sw$height # is a vector (of height values)
#>  [1] 172 167  96 202 150 178 165  97 183 182 188 180 228 180 173 175 170 180  66
#> [20] 170 183 200 190 177 175 180 150  NA  88 160 193 191 170 196 224 206 183 137
#> [39] 112 183 163 175 180 178  94 122 163 188 198 196 171 184 188 264 188 196 185
#> [58] 157 183 183 170 166 165 193 191 183 168 198 229 213 167  79  96 193 191
#>  [ reached getOption("max.print") -- omitted 12 entries ]
sw[ , 2]  # is the same vector (as column of sw)
#> # A tibble: 87 x 1
#>    height
#>     <int>
#>  1    172
#>  2    167
#>  3     96
#>  4    202
#>  5    150
#>  6    178
#>  7    165
#>  8     97
#>  9    183
#> 10    182
#> # … with 77 more rows

# (b) specific solution for sorting data: 
# using order() function and the variable's column number:
sw[order(-sw[ , 2]), ]
#> # A tibble: 87 x 13
#>    name  height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
#>  1 Yara…    264    NA none       white      yellow          NA   male  
#>  2 Tarf…    234   136 brown      brown      blue            NA   male  
#>  3 Lama…    229    88 none       grey       black           NA   male  
#>  4 Chew…    228   112 brown      unknown    blue           200   male  
#>  5 Roos…    224    82 none       grey       orange          NA   male  
#>  6 Grie…    216   159 none       brown, wh… green, y…       NA   male  
#>  7 Taun…    213    NA none       grey       black           NA   female
#>  8 Rugo…    206    NA none       green      orange          NA   male  
#>  9 Tion…    206    80 none       grey       black           NA   male  
#> 10 Dart…    202   136 none       white      yellow          41.9 male  
#> # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>


# (c) Translation into a function:
top_3_base <- function(data, col_nr){

  sorted_data <- data[order(-data[ , col_nr]), ]
  
  result <- sorted_data[1:3, ]  # top 3 rows
  
  return(result)
  
}

# Check:
top_3_base(sw, 2)  # top_3 height values
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Yara…    264    NA none       white      yellow            NA male   Quermia  
#> 2 Tarf…    234   136 brown      brown      blue              NA male   Kashyyyk 
#> 3 Lama…    229    88 none       grey       black             NA male   Kamino   
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
top_3_base(sw, 3)  # top_3 mass values
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Jabb…    175  1358 <NA>       green-tan… orange           600 herma… Nal Hutta
#> 2 Grie…    216   159 none       brown, wh… green, y…         NA male   Kalee    
#> 3 IG-88    200   140 none       metal      red               15 none   <NA>     
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
top_3_base(sw, 7)  # top_3 birth_year values
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Yoda      66    17 white      green      brown            896 male   <NA>     
#> 2 Jabb…    175  1358 <NA>       green-tan… orange           600 herma… Nal Hutta
#> 3 Chew…    228   112 brown      unknown    blue             200 male   Kashyyyk 
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
# But:
# top_3_base(sw, 1)  # would yield an error, as order does not allow "-" for character variables.
  1. What happens in your top_3 function when col_nr refers to a character variable (e.g., dplyr::starwars[ , 1])? Adjust the function so that its result varies by the type of the variable designated by the col_nr argument:

    • if the corresponding variable is a character variable, sort the data in ascending order (alphabetically);
    • if the corresponding variable is a numeric variable, sort the data in descending order (from high to low).

Solution

Adjusted tidyverse solution:

# What happens?
top_3(sw, 1)  # character variables are ordered in reverse of alphabetical order
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Zam …    168    55 blonde     fair, gre… yellow            NA female Zolan    
#> 2 Yoda      66    17 white      green      brown            896 male   <NA>     
#> 3 Yara…    264    NA none       white      yellow            NA male   Quermia  
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>

# Preparation: How can we determine the variable type?
# character variable: 
sw$name   # a vector name in sw
#>  [1] "Luke Skywalker"        "C-3PO"                 "R2-D2"                
#>  [4] "Darth Vader"           "Leia Organa"           "Owen Lars"            
#>  [7] "Beru Whitesun lars"    "R5-D4"                 "Biggs Darklighter"    
#> [10] "Obi-Wan Kenobi"        "Anakin Skywalker"      "Wilhuff Tarkin"       
#> [13] "Chewbacca"             "Han Solo"              "Greedo"               
#> [16] "Jabba Desilijic Tiure" "Wedge Antilles"        "Jek Tono Porkins"     
#> [19] "Yoda"                  "Palpatine"             "Boba Fett"            
#> [22] "IG-88"                 "Bossk"                 "Lando Calrissian"     
#> [25] "Lobot"                 "Ackbar"                "Mon Mothma"           
#> [28] "Arvel Crynyd"          "Wicket Systri Warrick" "Nien Nunb"            
#> [31] "Qui-Gon Jinn"          "Nute Gunray"           "Finis Valorum"        
#> [34] "Jar Jar Binks"         "Roos Tarpals"          "Rugor Nass"           
#> [37] "Ric Olié"              "Watto"                 "Sebulba"              
#> [40] "Quarsh Panaka"         "Shmi Skywalker"        "Darth Maul"           
#> [43] "Bib Fortuna"           "Ayla Secura"           "Dud Bolt"             
#> [46] "Gasgano"               "Ben Quadinaros"        "Mace Windu"           
#> [49] "Ki-Adi-Mundi"          "Kit Fisto"             "Eeth Koth"            
#> [52] "Adi Gallia"            "Saesee Tiin"           "Yarael Poof"          
#> [55] "Plo Koon"              "Mas Amedda"            "Gregar Typho"         
#> [58] "Cordé"                 "Cliegg Lars"           "Poggle the Lesser"    
#> [61] "Luminara Unduli"       "Barriss Offee"         "Dormé"                
#> [64] "Dooku"                 "Bail Prestor Organa"   "Jango Fett"           
#> [67] "Zam Wesell"            "Dexter Jettster"       "Lama Su"              
#> [70] "Taun We"               "Jocasta Nu"            "Ratts Tyerell"        
#> [73] "R4-P17"                "Wat Tambor"            "San Hill"             
#>  [ reached getOption("max.print") -- omitted 12 entries ]
sw[ , 1]  # 1st column of sw
#> # A tibble: 87 x 1
#>    name              
#>    <chr>             
#>  1 Luke Skywalker    
#>  2 C-3PO             
#>  3 R2-D2             
#>  4 Darth Vader       
#>  5 Leia Organa       
#>  6 Owen Lars         
#>  7 Beru Whitesun lars
#>  8 R5-D4             
#>  9 Biggs Darklighter 
#> 10 Obi-Wan Kenobi    
#> # … with 77 more rows

typeof(sw$name)  # character
#> [1] "character"
typeof(sw[ , 1]) # list!
#> [1] "list"
typeof(unlist(sw[ , 1])) # character !!
#> [1] "character"
typeof(sw[[1]])  # character
#> [1] "character"

# numeric variable:
sw$height  # a vector height in sw
#>  [1] 172 167  96 202 150 178 165  97 183 182 188 180 228 180 173 175 170 180  66
#> [20] 170 183 200 190 177 175 180 150  NA  88 160 193 191 170 196 224 206 183 137
#> [39] 112 183 163 175 180 178  94 122 163 188 198 196 171 184 188 264 188 196 185
#> [58] 157 183 183 170 166 165 193 191 183 168 198 229 213 167  79  96 193 191
#>  [ reached getOption("max.print") -- omitted 12 entries ]
sw[ , 2]   # 2nd column of sw 
#> # A tibble: 87 x 1
#>    height
#>     <int>
#>  1    172
#>  2    167
#>  3     96
#>  4    202
#>  5    150
#>  6    178
#>  7    165
#>  8     97
#>  9    183
#> 10    182
#> # … with 77 more rows
sw[[2]]    # as a vector
#>  [1] 172 167  96 202 150 178 165  97 183 182 188 180 228 180 173 175 170 180  66
#> [20] 170 183 200 190 177 175 180 150  NA  88 160 193 191 170 196 224 206 183 137
#> [39] 112 183 163 175 180 178  94 122 163 188 198 196 171 184 188 264 188 196 185
#> [58] 157 183 183 170 166 165 193 191 183 168 198 229 213 167  79  96 193 191
#>  [ reached getOption("max.print") -- omitted 12 entries ]

typeof(sw$height)  # integer
#> [1] "integer"
typeof(sw[ , 2])   # list!
#> [1] "list"
typeof(unlist(sw[ , 2])) # integer
#> [1] "integer"
typeof(sw[[2]])    # integer
#> [1] "integer"

# Adjusting function from above:
top_3 <- function(data, col_nr){
  
  col_name <- names(data)[col_nr]
  col_type <- typeof(unlist(data[ , col_nr]))
  
  if (col_type == "character") {
    
    result <- data %>% 
      arrange(!!sym(col_name)) %>%  # do NOT use desc()
      slice(1:3)
    
  } else {  # col_type != "character"
    
    result <- data %>% 
      arrange(desc(!!sym(col_name))) %>%
      slice(1:3)
    
  }
  
  return(result)
  
}

# Check:
top_3(sw, 2)  # top_3 height values
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Yara…    264    NA none       white      yellow            NA male   Quermia  
#> 2 Tarf…    234   136 brown      brown      blue              NA male   Kashyyyk 
#> 3 Lama…    229    88 none       grey       black             NA male   Kamino   
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
top_3(sw, 3)  # top_3 mass values
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Jabb…    175  1358 <NA>       green-tan… orange           600 herma… Nal Hutta
#> 2 Grie…    216   159 none       brown, wh… green, y…         NA male   Kalee    
#> 3 IG-88    200   140 none       metal      red               15 none   <NA>     
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
top_3(sw, 7)  # top_3 birth_year values
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Yoda      66    17 white      green      brown            896 male   <NA>     
#> 2 Jabb…    175  1358 <NA>       green-tan… orange           600 herma… Nal Hutta
#> 3 Chew…    228   112 brown      unknown    blue             200 male   Kashyyyk 
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
# AND now:
top_3(sw, 1)  # name in alphabetical order
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Ackb…    180    83 none       brown mot… orange          41   male   Mon Cala 
#> 2 Adi …    184    50 none       dark       blue            NA   female Coruscant
#> 3 Anak…    188    84 blond      fair       blue            41.9 male   Tatooine 
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
top_3(sw, 9)  # homeworld in alphabetical order
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Leia…    150    49 brown      light      brown             19 female Alderaan 
#> 2 Bail…    191    NA black      tan        brown             67 male   Alderaan 
#> 3 Raym…    188    79 brown      light      brown             NA male   Alderaan 
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>

Adjusted base R solution:

# What happens?
# top_3_base(sw, 1)  # would yield an error, as order does not allow "-" for character variables.

# (a) specific solution for sorting data:  
# using order() function and a variable name:
sw[order(-sw$height), ]
#> # A tibble: 87 x 13
#>    name  height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
#>  1 Yara…    264    NA none       white      yellow          NA   male  
#>  2 Tarf…    234   136 brown      brown      blue            NA   male  
#>  3 Lama…    229    88 none       grey       black           NA   male  
#>  4 Chew…    228   112 brown      unknown    blue           200   male  
#>  5 Roos…    224    82 none       grey       orange          NA   male  
#>  6 Grie…    216   159 none       brown, wh… green, y…       NA   male  
#>  7 Taun…    213    NA none       grey       black           NA   female
#>  8 Rugo…    206    NA none       green      orange          NA   male  
#>  9 Tion…    206    80 none       grey       black           NA   male  
#> 10 Dart…    202   136 none       white      yellow          41.9 male  
#> # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

# as above: 
typeof(unlist(sw[ , 1]))  # "character"
#> [1] "character"
typeof(unlist(sw[ , 2]))  # "integer"
#> [1] "integer"
typeof(unlist(sw[ , 3]))  # "double"
#> [1] "double"
typeof(sw[[3]])           # "double"
#> [1] "double"

sw[order(sw$name), ] # works
#> # A tibble: 87 x 13
#>    name  height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
#>  1 Ackb…    180    83 none       brown mot… orange          41   male  
#>  2 Adi …    184    50 none       dark       blue            NA   female
#>  3 Anak…    188    84 blond      fair       blue            41.9 male  
#>  4 Arve…     NA    NA brown      fair       brown           NA   male  
#>  5 Ayla…    178    55 none       blue       hazel           48   female
#>  6 Bail…    191    NA black      tan        brown           67   male  
#>  7 Barr…    166    50 black      yellow     blue            40   female
#>  8 BB8       NA    NA none       none       black           NA   none  
#>  9 Ben …    163    65 none       grey, gre… orange          NA   male  
#> 10 Beru…    165    75 brown      light      blue            47   female
#> # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>
sw$name # is a vector
#>  [1] "Luke Skywalker"        "C-3PO"                 "R2-D2"                
#>  [4] "Darth Vader"           "Leia Organa"           "Owen Lars"            
#>  [7] "Beru Whitesun lars"    "R5-D4"                 "Biggs Darklighter"    
#> [10] "Obi-Wan Kenobi"        "Anakin Skywalker"      "Wilhuff Tarkin"       
#> [13] "Chewbacca"             "Han Solo"              "Greedo"               
#> [16] "Jabba Desilijic Tiure" "Wedge Antilles"        "Jek Tono Porkins"     
#> [19] "Yoda"                  "Palpatine"             "Boba Fett"            
#> [22] "IG-88"                 "Bossk"                 "Lando Calrissian"     
#> [25] "Lobot"                 "Ackbar"                "Mon Mothma"           
#> [28] "Arvel Crynyd"          "Wicket Systri Warrick" "Nien Nunb"            
#> [31] "Qui-Gon Jinn"          "Nute Gunray"           "Finis Valorum"        
#> [34] "Jar Jar Binks"         "Roos Tarpals"          "Rugor Nass"           
#> [37] "Ric Olié"              "Watto"                 "Sebulba"              
#> [40] "Quarsh Panaka"         "Shmi Skywalker"        "Darth Maul"           
#> [43] "Bib Fortuna"           "Ayla Secura"           "Dud Bolt"             
#> [46] "Gasgano"               "Ben Quadinaros"        "Mace Windu"           
#> [49] "Ki-Adi-Mundi"          "Kit Fisto"             "Eeth Koth"            
#> [52] "Adi Gallia"            "Saesee Tiin"           "Yarael Poof"          
#> [55] "Plo Koon"              "Mas Amedda"            "Gregar Typho"         
#> [58] "Cordé"                 "Cliegg Lars"           "Poggle the Lesser"    
#> [61] "Luminara Unduli"       "Barriss Offee"         "Dormé"                
#> [64] "Dooku"                 "Bail Prestor Organa"   "Jango Fett"           
#> [67] "Zam Wesell"            "Dexter Jettster"       "Lama Su"              
#> [70] "Taun We"               "Jocasta Nu"            "Ratts Tyerell"        
#> [73] "R4-P17"                "Wat Tambor"            "San Hill"             
#>  [ reached getOption("max.print") -- omitted 12 entries ]
# BUT: 
sw[ , 1] # is a tibble
#> # A tibble: 87 x 1
#>    name              
#>    <chr>             
#>  1 Luke Skywalker    
#>  2 C-3PO             
#>  3 R2-D2             
#>  4 Darth Vader       
#>  5 Leia Organa       
#>  6 Owen Lars         
#>  7 Beru Whitesun lars
#>  8 R5-D4             
#>  9 Biggs Darklighter 
#> 10 Obi-Wan Kenobi    
#> # … with 77 more rows
as_vector(sw[ , 1]) # is a vector
#>                   name1                   name2                   name3 
#>        "Luke Skywalker"                 "C-3PO"                 "R2-D2" 
#>                   name4                   name5                   name6 
#>           "Darth Vader"           "Leia Organa"             "Owen Lars" 
#>                   name7                   name8                   name9 
#>    "Beru Whitesun lars"                 "R5-D4"     "Biggs Darklighter" 
#>                  name10                  name11                  name12 
#>        "Obi-Wan Kenobi"      "Anakin Skywalker"        "Wilhuff Tarkin" 
#>                  name13                  name14                  name15 
#>             "Chewbacca"              "Han Solo"                "Greedo" 
#>                  name16                  name17                  name18 
#> "Jabba Desilijic Tiure"        "Wedge Antilles"      "Jek Tono Porkins" 
#>                  name19                  name20                  name21 
#>                  "Yoda"             "Palpatine"             "Boba Fett" 
#>                  name22                  name23                  name24 
#>                 "IG-88"                 "Bossk"      "Lando Calrissian" 
#>                  name25                  name26                  name27 
#>                 "Lobot"                "Ackbar"            "Mon Mothma" 
#>                  name28                  name29                  name30 
#>          "Arvel Crynyd" "Wicket Systri Warrick"             "Nien Nunb" 
#>                  name31                  name32                  name33 
#>          "Qui-Gon Jinn"           "Nute Gunray"         "Finis Valorum" 
#>                  name34                  name35                  name36 
#>         "Jar Jar Binks"          "Roos Tarpals"            "Rugor Nass" 
#>                  name37                  name38                  name39 
#>              "Ric Olié"                 "Watto"               "Sebulba" 
#>                  name40                  name41                  name42 
#>         "Quarsh Panaka"        "Shmi Skywalker"            "Darth Maul" 
#>                  name43                  name44                  name45 
#>           "Bib Fortuna"           "Ayla Secura"              "Dud Bolt" 
#>                  name46                  name47                  name48 
#>               "Gasgano"        "Ben Quadinaros"            "Mace Windu" 
#>                  name49                  name50                  name51 
#>          "Ki-Adi-Mundi"             "Kit Fisto"             "Eeth Koth" 
#>                  name52                  name53                  name54 
#>            "Adi Gallia"           "Saesee Tiin"           "Yarael Poof" 
#>                  name55                  name56                  name57 
#>              "Plo Koon"            "Mas Amedda"          "Gregar Typho" 
#>                  name58                  name59                  name60 
#>                 "Cordé"           "Cliegg Lars"     "Poggle the Lesser" 
#>                  name61                  name62                  name63 
#>       "Luminara Unduli"         "Barriss Offee"                 "Dormé" 
#>                  name64                  name65                  name66 
#>                 "Dooku"   "Bail Prestor Organa"            "Jango Fett" 
#>                  name67                  name68                  name69 
#>            "Zam Wesell"       "Dexter Jettster"               "Lama Su" 
#>                  name70                  name71                  name72 
#>               "Taun We"            "Jocasta Nu"         "Ratts Tyerell" 
#>                  name73                  name74                  name75 
#>                "R4-P17"            "Wat Tambor"              "San Hill" 
#>  [ reached getOption("max.print") -- omitted 12 entries ]
sw[order(as_vector(sw[ , 1])), ] # works
#> # A tibble: 87 x 13
#>    name  height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
#>  1 Ackb…    180    83 none       brown mot… orange          41   male  
#>  2 Adi …    184    50 none       dark       blue            NA   female
#>  3 Anak…    188    84 blond      fair       blue            41.9 male  
#>  4 Arve…     NA    NA brown      fair       brown           NA   male  
#>  5 Ayla…    178    55 none       blue       hazel           48   female
#>  6 Bail…    191    NA black      tan        brown           67   male  
#>  7 Barr…    166    50 black      yellow     blue            40   female
#>  8 BB8       NA    NA none       none       black           NA   none  
#>  9 Ben …    163    65 none       grey, gre… orange          NA   male  
#> 10 Beru…    165    75 brown      light      blue            47   female
#> # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>


# Adjusting function from above:
top_3_base <- function(data, col_nr){
  
  col_type <- typeof(unlist(data[ , col_nr]))
  
  if (col_type == "character") {
    
    sorted_data <- data[order(as_vector(data[ , col_nr])), ]
    
  } else {
    
    sorted_data <- data[order(-data[ , col_nr]), ]
    
  }
  
  result <- sorted_data[1:3, ]  # top 3 rows
  
  return(result)
  
}

# Check:
top_3_base(sw, 2)  # top_3 height values
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Yara…    264    NA none       white      yellow            NA male   Quermia  
#> 2 Tarf…    234   136 brown      brown      blue              NA male   Kashyyyk 
#> 3 Lama…    229    88 none       grey       black             NA male   Kamino   
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
top_3_base(sw, 3)  # top_3 mass values
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Jabb…    175  1358 <NA>       green-tan… orange           600 herma… Nal Hutta
#> 2 Grie…    216   159 none       brown, wh… green, y…         NA male   Kalee    
#> 3 IG-88    200   140 none       metal      red               15 none   <NA>     
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
top_3_base(sw, 7)  # top_3 birth_year values
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Yoda      66    17 white      green      brown            896 male   <NA>     
#> 2 Jabb…    175  1358 <NA>       green-tan… orange           600 herma… Nal Hutta
#> 3 Chew…    228   112 brown      unknown    blue             200 male   Kashyyyk 
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
# AND now:
top_3_base(sw, 1)  # name in alphabetical order
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Ackb…    180    83 none       brown mot… orange          41   male   Mon Cala 
#> 2 Adi …    184    50 none       dark       blue            NA   female Coruscant
#> 3 Anak…    188    84 blond      fair       blue            41.9 male   Tatooine 
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
top_3_base(sw, 9)  # homeworld in alphabetical order
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Leia…    150    49 brown      light      brown             19 female Alderaan 
#> 2 Bail…    191    NA black      tan        brown             67 male   Alderaan 
#> 3 Raym…    188    79 brown      light      brown             NA male   Alderaan 
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
  1. Generalise your top_3 function to a top_n function that returns the top n rows when sorted by col_nr. What would be a good default value for n? What should happen when n = NA and when n > nrow(data)?

Check all your functions with appropriate inputs.

Solution

Adjusted tidyverse solution:

nrow(sw)  # 87
#> [1] 87

# Preparation:
# a. nrow(data) is a good default value for n. 
# b. n = NA should return NA. 
# c. n > nrow(data) should return n = nrow(data), plus a message.

# Generalizing function from above:
top_n <- function(data, col_nr, n = nrow(data)){
  
  if (is.na(n) || is.na(data)) {
    return(NA)  # return early
  }
  
  if (n > nrow(data)) {
    message("n exceeds nrow(data). Using n = nrow(data) instead...")
    n <- nrow(data)
  }
  
  col_name <- names(data)[col_nr]
  col_type <- typeof(unlist(data[ , col_nr]))
  
  if (col_type == "character") {
    
    result <- data %>% 
      arrange(!!sym(col_name)) %>%  # do NOT use desc()
      slice(1:n)  # use top n instead of 3
    
  } else {  # col_type != "character"
    
    result <- data %>% 
      arrange(desc(!!sym(col_name))) %>%
      slice(1:n)  # use top n instead of 3
    
  }
  
  return(result)
  
}

# Check:
top_n(NA) # NA (even though col_nr is missing)
#> [1] NA
top_n(NA, 2, n = 1)  # NA
#> [1] NA
top_n(sw, 2, n = NA) # NA
#> [1] NA

top_n(sw, 2, n = 3)  # top n = 3 height values
#> # A tibble: 3 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Yara…    264    NA none       white      yellow            NA male   Quermia  
#> 2 Tarf…    234   136 brown      brown      blue              NA male   Kashyyyk 
#> 3 Lama…    229    88 none       grey       black             NA male   Kamino   
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
top_n(sw, 3, n = 5)  # top n = 5 mass values
#> # A tibble: 5 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Jabb…    175  1358 <NA>       green-tan… orange         600   herma… Nal Hutta
#> 2 Grie…    216   159 none       brown, wh… green, y…       NA   male   Kalee    
#> 3 IG-88    200   140 none       metal      red             15   none   <NA>     
#> 4 Dart…    202   136 none       white      yellow          41.9 male   Tatooine 
#> 5 Tarf…    234   136 brown      brown      blue            NA   male   Kashyyyk 
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
top_n(sw, 7, n = 1)  # top n = 1 birth_year values
#> # A tibble: 1 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Yoda      66    17 white      green      brown            896 male   <NA>     
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
# AND:
top_n(sw, 1, n = 2)   # top n = 2 names in alphabetical order
#> # A tibble: 2 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Ackb…    180    83 none       brown mot… orange            41 male   Mon Cala 
#> 2 Adi …    184    50 none       dark       blue              NA female Coruscant
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>
top_n(sw, 9, n = 999) # message and top n = 87 homeworlds 
#> # A tibble: 87 x 13
#>    name  height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
#>  1 Leia…    150    49 brown      light      brown             19 female
#>  2 Bail…    191    NA black      tan        brown             67 male  
#>  3 Raym…    188    79 brown      light      brown             NA male  
#>  4 Ratt…     79    15 none       grey, blue unknown           NA male  
#>  5 Lobot    175    79 none       light      blue              37 male  
#>  6 Jek …    180   110 brown      fair       blue              NA male  
#>  7 Nute…    191    90 none       mottled g… red               NA male  
#>  8 Ki-A…    198    82 white      pale       yellow            92 male  
#>  9 Mas …    196    NA none       blue       blue              NA male  
#> 10 Mon …    150    NA auburn     fair       blue              48 female
#> # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

Note: Functions for different tasks and data types

The following three exercises illustrate how functions can use, mix, and merge various data types to solve different tasks. Specifically, they ask you to write functions for

  • visualizing data as plots (Exercise 5),
  • printing numbers as text (Exercise 6),
  • computing with dates (Exercise 7).

A.11.5 Exercise 5

A plotting function

This exercise asks you to write a function for creating a specific type of plot.

  1. Write a function plot_scatter that takes a table (tibble or data frame) with 2 numeric variables x and y as my_data and plots a scatterplot of the values of y by the values of x.

Hint: First write a ggplot command that creates a scatterplot of my_data. Then wrap a function plot_scatter around this command that takes my_data as its argument.

Test your function by using the following 2 tibbles tb_1 and tb_2 as my_data:

set.seed(101)
n_num <- 100
x_val <- runif(n = n_num, min = 30, max = 90)
y_val <- runif(n = n_num, min = 30, max = 90)

tb_1 <- tibble::tibble(x = x_val, y = y_val)
tb_2 <- tibble::tibble(x = x_val, y = x_val + rnorm(n = n_num, mean = 0, sd = 10))

names(tb_1)
#> [1] "x" "y"

Solution

library(tidyverse)

plot_scatter <- function(my_data) {
  
  ggplot(data = my_data) +
    geom_point(aes(x = x, y = y), alpha = .5) +
    labs(title = "My scatterplot", 
         caption = paste0("Data from ", deparse(substitute(my_data)))) +
    theme_bw()
  
}

# Check:
plot_scatter(my_data = tb_1)

plot_scatter(my_data = tb_2)

  1. For any table my_data that contains 2 numeric variables x and y we can fit a linear model as follows:
my_data <- tb_1

my_lm <- lm(y ~ x, data = my_data)
my_lm
#> 
#> Call:
#> lm(formula = y ~ x, data = my_data)
#> 
#> Coefficients:
#> (Intercept)            x  
#>     53.2340       0.1318

# Get the model's intercept and slope values:
my_lm$coefficients[1]  # intercept
#> (Intercept) 
#>    53.23402
my_lm$coefficients[2]  # slope
#>         x 
#> 0.1318431

Incorporate the fit of a linear model into your plot_scatter function. Use a linear model to add a line to your plot that shows the prediction of the linear model (in a color that can be set by an optional col argument).

Solution

plot_scatter <- function(my_data, col = "red") {

  # fit linear model:
  lm <- lm(y ~ x, data = my_data)
  intercept <- lm$coefficients[1]
  slope <- lm$coefficients[2]  
    
  # plot:
  ggplot(data = my_data) +
    geom_point(aes(x = x, y = y), alpha = 1/2, size = 2) +
    geom_abline(intercept = intercept, slope = slope, color = col, lty = 2, size = 1) + 
    labs(title = "My scatterplot", 
         caption = paste0("Data from ", deparse(substitute(my_data)))) +
    theme_bw()
  
}

# Check:
plot_scatter(my_data = tb_1, col = Pinky)

plot_scatter(my_data = tb_2, col = Seeblau)

A.11.6 Exercise 6

A common problem when printing numbers in text is that the number of digits to be printed (i.e., characters or symbols) depends on the number’s value. This means that series of different numbers often have different lengths, which makes it hard to align them (e.g., in tables). A potential solution to this is adding leading or trailing zeros (or empty spaces) to the front and back of a number.

The function num_as_char() of the ds4psy package provides a (sub-optimal) solution to this problem by containing 3 main arguments:

  • x for the number to be formatted (required);
  • n_pre_dec for the number of digits prior to the decimal separator (default n_pre_dec = 2);
  • n_dec to specify the number of digits after the decimal separator (default n_dec = 2).

Additional arguments specify the symbol sym to use for filling up digit positions and the symbol used as decimal separator sep.

  1. Experiment with num_as_char() to check its functionality and limits.

Solution

library(ds4psy)

# explore: 
num_as_char(x =  1)
#> [1] "01.00"
num_as_char(x =  2, n_pre_dec = 3, n_dec = 3)
#> [1] "002.000"
num_as_char(x =  3.33, n_pre_dec = 1, n_dec = 1)
#> [1] "3.3"
num_as_char(x =  6.66, n_pre_dec = 1, n_dec = 1)  # rounding up
#> [1] "6.7"
num_as_char(x =  6.66, n_pre_dec = 2, n_dec = 0)  # rounding up
#> [1] "07"
num_as_char(x = 66.66, n_pre_dec = 1, n_dec = 1)  # not removing needed digits
#> [1] "66.7"

# limits:
num_as_char(1:4)   # works for vectors 
#> [1] "01.00" "02.00" "03.00" "04.00"
num_as_char(1, sym = "89")  # works, but warns
#> [1] "891.8989"
# num_as_char(NA)  # yields an error
# num_as_char("X") # yields an error
  1. Write your own function num_to_char() that achieves the same (or a similar) functionality.

Hint: The ds4psy package contains a solution that also works for vectors, but uses 2 for loops to achieve this. Try writing a simpler solution that works for individual numbers x (i.e., scalars, or vectors of length 1). If you are stuck, try adapting parts of the solution used by num_as_char.

Solution

Evaluate ds4psy::num_as_char (without any parentheses) to inspect the (sub-optimal) solution implemented by the ds4psy package. Then copy and adapt it for a simpler solution — by removing the for loops — that works for scalar numbers.

A.11.7 Exercise 7

Computing with dates

  1. Use what you have learned in Chapter 10: Time data to write a function that takes a date or time (e.g., the date of someone’s birthday) as its input and returns the corresponding age (as a number, rounded to completed years) as output.

  2. Check your function with appropriate examples.

  3. Does your solution also work when multiple dates are entered (as a vector)?

Solution

Using base R date and time functions:

# Define a function that computes current age (in full years): 

age_in_years <- function(bdate){
  
  # Initialize:
  age <- NA
  
  # Get some numbers:
  bd_year <- as.numeric(format(bdate, "%Y"))
  bd_month <- as.numeric(format(bdate, "%m"))
  bd_day <- as.numeric(format(bdate, "%d"))
  
  today <- Sys.Date()
  cur_year <- as.numeric(format(today, "%Y"))
  cur_month <- as.numeric(format(today, "%m"))
  cur_day <- as.numeric(format(today, "%d"))
  
  # # Determine whether bday happened this year:
  
  # # Non-vectorized version: 
  # if ((cur_month > bd_month) || ((cur_month == bd_month) & (cur_day >= bd_day))) 
  # { 
  #   bday_this_year <- TRUE
  # } 
  # else {
  #   bday_this_year <- FALSE
  # }

  # Vectorized version: 
  bday_this_year <- ifelse((cur_month > bd_month) | ((cur_month == bd_month) & (cur_day >= bd_day)), TRUE, FALSE)
  
  # Compute age (in full years):
  age <- (cur_year - bd_year) - !bday_this_year
  
  return(age)
  
}

# Check with an example: 
today <- Sys.Date()    # today's date 
bday_1 <- today - 366  # more than 1 year ago
bday_2 <- today - 365  # 1 year ago
bday_3 <- today - 364  # less than 1 year ago

age_in_years(bday_1)  # 1
#> [1] 1
age_in_years(bday_2)  # 1
#> [1] 1
age_in_years(bday_3)  # 0 (qed)
#> [1] 0

# Note: This solution also works for vectors:
bdays <- today - 366:364
age_in_years(bdays)
#> [1] 1 1 0

Using the lubridate package:

library(lubridate)

lubriage <- function(bdate) {
  
  lifetime <- (bdate %--% today()) # time interval from bdate to today() 
  (lifetime %/% years(1))          # integer division (into a period of full years)
  
}

# Check with an example: 
bday_1 <- today() - years(18) - days(1)  # 18 years ago, yesterday
bday_2 <- today() - years(18) + days(0)  # 18 years ago, today
bday_3 <- today() - years(18) + days(1)  # 18 years ago, tomorrow

lubriage(bday_1)  # 18
#> [1] 18
lubriage(bday_2)  # 18
#> [1] 18
lubriage(bday_3)  # 17 (qed)
#> [1] 17

# Note: This solution also works for vectors:
bdays <- today() - years(18) + days(-1:1)  # 18 years ago, -1/0/+1 day
lubriage(bdays)
#> [1] 18 18 17

This concludes our exercises on creating new functions.