A.12 Solutions (12)

ds4psy: Solutions 12

Here are the solutions to the exercises on loops and applying functions to data structures of Chapter 12 (Section 12.4).

A.12.1 Exercise 1

Fibonacci loop and functions

  1. Look up the term Fibonacci numbers and use a for loop to create a numeric vector of the first 25 Fibonacci numbers (for a series of numbers starting with 0, 1).

  2. Incorporate your for loop into a fibonacci function that returns a numeric vector of the first n Fibonacci numbers. Test your function for fibonacci(n = 25).

  3. Generalize your fibonacci function to also accept the first 2 elements (e1 and e2) as inputs to the series and then create the first n Fibonacci numbers given these initial elements. Test your function for fibonacci(e1 = 1, e2 = 3, n = 25).

Solution

  1. According to Wikipedia, a Fibonacci sequence is the integer sequence of 0, 1, 1, 2, 3, 5, 8, ....

Thus, the first 2 elements \(e_{1}\) and \(e_{2}\) of the series need to be provided. For \(i > 2\), each element \(e_{i}\) is the sum of the 2 preceding elements:

\(e_{i} = e_{i-2} + e_{i-1}\).

We turn this into a for loop as follows:
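A minimal sketch of such a loop, assuming the results are stored in a pre-allocated vector (the names N and fib are chosen here for illustration):

```r
N <- 25                   # number of elements to create
fib <- rep(NA_real_, N)   # pre-allocate a numeric vector of length N
fib[1] <- 0               # 1st element e_1
fib[2] <- 1               # 2nd element e_2

for (i in 3:N) {
  # each element is the sum of its 2 predecessors:
  fib[i] <- fib[i - 2] + fib[i - 1]
}

fib
```

Pre-allocating fib avoids growing the vector in every iteration.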

  2. Incorporating the for loop into a function fibonacci(n):

Checking the function:
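One possible version of such a function, followed by a check, is sketched below. It assumes that n is a positive integer and handles the first 2 elements inside the loop:

```r
fibonacci <- function(n) {
  # Assumption: n is a positive integer.
  out <- rep(NA_real_, n)  # pre-allocate the output vector

  for (i in 1:n) {
    if (i == 1) {
      out[i] <- 0          # 1st element
    } else if (i == 2) {
      out[i] <- 1          # 2nd element
    } else {
      out[i] <- out[i - 2] + out[i - 1]  # sum of the 2 preceding elements
    }
  }

  out
}

# Check:
fibonacci(n = 25)
```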

Realizing that we only need the for loop when n > 2, we could re-write the same function as follows:

Checking the function:
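A sketch of this variant, which initializes the first 2 elements directly and only enters the loop when n > 2 (again assuming that n is a positive integer):

```r
fibonacci <- function(n) {
  out <- c(0, 1)  # the first 2 elements are given

  if (n > 2) {    # the loop is only needed for n > 2
    for (i in 3:n) {
      out[i] <- out[i - 2] + out[i - 1]
    }
  }

  out[seq_len(n)]  # return only the first n elements (relevant for n < 2)
}

# Check:
fibonacci(n = 25)
```

The guard n > 2 matters because 3:n would count backwards (e.g., 3:2 yields 3, 2) rather than yield an empty range.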

  3. Generalizing fibonacci(n) to a function fibonacci(e1, e2, n) is simple: The following version makes the arguments e1 and e2 optional to return the standard sequence by default.

This generalized fibonacci function still allows all previous calls, like:

but now also allows specifying different initial elements:
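A sketch of such a generalized function and both kinds of calls, assuming default values of e1 = 0 and e2 = 1 and a call that names the argument n:

```r
fibonacci <- function(e1 = 0, e2 = 1, n) {
  out <- c(e1, e2)  # initial elements (standard sequence by default)

  if (n > 2) {
    for (i in 3:n) {
      out[i] <- out[i - 2] + out[i - 1]
    }
  }

  out[seq_len(n)]   # return the first n elements
}

# Previous call (standard sequence):
fibonacci(n = 25)

# Different initial elements:
fibonacci(e1 = 1, e2 = 3, n = 25)
```

Because e1 and e2 precede n in this sketch, n must be passed by name (n = 25); placing n first would be an alternative design.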

A.12.2 Exercise 2

A.12.2.1 Looping for divisors

  1. Write a for loop that prints out all positive divisors of the number 1000.

Hint: Use N %% x == 0 to test whether x is a divisor of N.

Solution

A straightforward for loop that tests every candidate x in 1:N requires N = 1000 iterations. However, apart from N itself, no divisor can exceed N/2 (and no divisor of an odd number can exceed N/3). By looping only up to N/2 and adding N itself, we can achieve the same result with far fewer iterations:
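Both variants can be sketched as follows (a minimal illustration that uses the test N %% x == 0 from the hint):

```r
N <- 1000

# (a) Testing all candidates from 1 to N (1000 iterations):
for (x in 1:N) {
  if (N %% x == 0) {
    print(x)
  }
}

# (b) Testing candidates only up to N/2 (500 iterations):
for (x in 1:floor(N / 2)) {
  if (N %% x == 0) {
    print(x)
  }
}
print(N)  # N is always a divisor of itself
```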

  2. Write a divisors function that uses a for loop to return a numeric vector containing all positive divisors of a natural number N.

Hint: Note that we do not know the length of the resulting vector.

Solution
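A minimal sketch of such a function, assuming that N is a natural number. As the number of divisors is unknown in advance, the output vector grows whenever a divisor is found:

```r
divisors <- function(N) {
  out <- c()  # initialize an empty output (of unknown final length)

  # Candidates other than N itself cannot exceed N/2:
  for (x in seq_len(floor(N / 2))) {
    if (N %% x == 0) {
      out <- c(out, x)  # add x to the divisors found so far
    }
  }

  c(out, N)  # N is always a divisor of itself
}

# Check:
divisors(1000)
divisors(13)
```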

  3. Use your divisors function to answer the question: Does the number 1001 have fewer or more divisors than the number 1000?
  4. Use your divisors function and another for loop to answer the question: Which prime numbers exist between the number 111 and the number 1111?

Hint: A prime number (e.g., 13) has only 2 divisors: 1 and the number itself.
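Both questions can be addressed along the following lines. This is a sketch that assumes a divisors function as requested in 2. (a compact version is included so that the code runs on its own) and collects the prime numbers in a vector named primes (a name chosen here for illustration):

```r
# A compact divisors function (see 2.):
divisors <- function(N) {
  out <- c()
  for (x in seq_len(floor(N / 2))) {
    if (N %% x == 0) out <- c(out, x)
  }
  c(out, N)
}

# 3. Comparing the number of divisors of 1001 and 1000:
length(divisors(1001))
length(divisors(1000))
length(divisors(1001)) < length(divisors(1000))  # TRUE: 1001 has fewer divisors

# 4. Prime numbers between 111 and 1111:
range_min <- 111
range_max <- 1111
primes <- c()

for (i in range_min:range_max) {
  if (length(divisors(i)) == 2) {  # exactly 2 divisors: 1 and i
    primes <- c(primes, i)
  }
}

primes
```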

Note some details:

  • The loop above uses the divisors function within a for loop (in the range range_min:range_max). As the divisors function also uses a for loop to find all divisors of N, we are using a loop inside a loop. As such structures can quickly become very inefficient, it is a good idea to try reducing the number of iterations when possible.

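  • The condition length(divisors(i)) == 2 correctly excludes i = 1 (which has only one divisor and is not a prime number), but it computes all divisors of i merely to count them and hides the purpose of the test. A more general solution would first define an is_prime function and then use is_prime(i) in the if statement of the for loop: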
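One of many possible ways of writing is_prime is sketched below. It uses trial division up to the square root of i, which is an assumption of this sketch rather than the only sensible approach:

```r
is_prime <- function(i) {
  if (i < 2) return(FALSE)        # 0 and 1 are not prime
  if (i < 4) return(TRUE)         # 2 and 3 are prime
  if (i %% 2 == 0) return(FALSE)  # other even numbers are not prime

  x <- 3
  while (x * x <= i) {            # test odd candidates up to sqrt(i)
    if (i %% x == 0) return(FALSE)
    x <- x + 2
  }

  TRUE
}

# Using is_prime(i) in the if statement of the for loop:
range_min <- 111
range_max <- 1111
primes <- c()

for (i in range_min:range_max) {
  if (is_prime(i)) {
    primes <- c(primes, i)
  }
}

primes
```

As is_prime stops at the first divisor found, it needs far fewer iterations than counting all divisors of each number.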

The is_prime function can be written in many different ways, of course. Check out this stackoverflow thread for solutions.

A.12.3 Exercise 3

Throwing dice

  1. Implement a function dice that uses the base R function sample() to simulate a throw of a dice (i.e., yielding an integer from 1 to 6 with equal probability).

Solution
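A minimal sketch of such a function:

```r
dice <- function() {
  # draw 1 element from the integers 1 to 6 (with equal probability):
  sample(x = 1:6, size = 1)
}

# Check:
dice()
dice()
```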

  2. Add an argument n (for the number of throws) to your function and modify it by using a for loop to throw the dice n times, and returning a vector of length n that shows the results of the n throws.

Solution
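A sketch of this version, assuming a default value of n = 1:

```r
dice <- function(n = 1) {
  out <- rep(NA_integer_, n)  # pre-allocate the output vector

  for (i in seq_len(n)) {
    out[i] <- sample(x = 1:6, size = 1)  # 1 throw per iteration
  }

  out
}

# Check:
dice()
dice(n = 10)
```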

Note: As the sample function contains a size argument, a simpler version of the same function could have been:
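A sketch of this simpler version. Note that replace = TRUE is required here, as sample() draws without replacement by default:

```r
dice <- function(n = 1) {
  # draw n times from 1:6 (with replacement):
  sample(x = 1:6, size = n, replace = TRUE)
}

# Check:
dice(n = 10)
```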

  3. Use a while loop to throw dice(n = 1) until you throw the number 6 twice in a row and show the sequence of all throws up to this point.

Hint: Given a sequence throws, the n-th element is throws[n]. Hence, the last element of throws is throws[length(throws)].

Solution
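A possible while loop is sketched below. It starts with 2 throws (as at least 2 throws are needed) and keeps throwing until the last 2 elements of throws are both 6. A compact dice function is included so that the code runs on its own:

```r
dice <- function(n = 1) sample(x = 1:6, size = n, replace = TRUE)

throws <- dice(n = 2)  # at least 2 throws are needed
n <- length(throws)

# Continue while the last 2 throws are NOT both 6:
while (!(throws[n] == 6 && throws[n - 1] == 6)) {
  throws <- c(throws, dice(n = 1))  # add another throw
  n <- length(throws)
}

throws  # sequence of all throws
n       # number of throws needed
```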

  4. Use your solution of 3. to conduct a simulation that addresses the following question:
  • How many times on average do we need to throw dice(1) to obtain the number 6 twice in a row?

Hint: Use a for loop to run your solution to 3. for N = 10000 times and store the length of the individual throws in a numeric vector.
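A sketch of such a simulation, which wraps the while loop of 3. in a for loop. The seed value and the name n_throws are arbitrary choices made for this illustration:

```r
set.seed(123)  # arbitrary seed (for reproducible results)

dice <- function(n = 1) sample(x = 1:6, size = n, replace = TRUE)

N <- 10000                 # number of simulations
n_throws <- rep(NA, N)     # stores the number of throws of each simulation

for (i in 1:N) {
  throws <- dice(n = 2)
  n <- length(throws)

  while (!(throws[n] == 6 && throws[n - 1] == 6)) {
    throws <- c(throws, dice(n = 1))
    n <- length(throws)
  }

  n_throws[i] <- n         # store the length of the current sequence
}

mean(n_throws)  # average number of throws needed
```

The simulated mean should come close to the theoretical expected value of 42 (= 6 + 6^2) throws.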

A.12.4 Exercise 4

Mapping functions to data

Write code that uses a function of the base R apply or purrr map family of functions to:

  1. Compute the mean of every column in mtcars.
  2. Determine the type of each column in ggplot2::diamonds.
  3. Compute the number of unique values in each column of iris.
  4. Generate 10 random normal numbers for each of μ = −100, 0, and 100.

Note: This exercise is based on Exercise 1 of Chapter 21.5.3 in r4ds.

Solution

library(tidyverse)  # provides %>%, as_tibble(), map_*(), and n_distinct()

# 1. Compute the mean of every column in `mtcars`:

# (a) Solve for 1st column: 
mean(mtcars$mpg)  
#> [1] 20.09062

# (b) Generalize to all columns:
as_tibble(mtcars) %>% map_dbl(mean)
#>        mpg        cyl       disp         hp       drat         wt       qsec 
#>  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
#>         vs         am       gear       carb 
#>   0.437500   0.406250   3.687500   2.812500
apply(X = mtcars, MARGIN = 2, FUN = mean)
#>        mpg        cyl       disp         hp       drat         wt       qsec 
#>  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
#>         vs         am       gear       carb 
#>   0.437500   0.406250   3.687500   2.812500

# 2. Determine the type of each column in `ggplot2::diamonds`:

# (a) Solve for 1st column: 
typeof(ggplot2::diamonds$carat)  # solution for 1st column
#> [1] "double"

# (b) Generalize to all columns:
ggplot2::diamonds %>% map_chr(typeof)
#>     carat       cut     color   clarity     depth     table     price         x 
#>  "double" "integer" "integer" "integer"  "double"  "double" "integer"  "double" 
#>         y         z 
#>  "double"  "double"
apply(X = ggplot2::diamonds, MARGIN = 2, FUN = typeof) 
#>       carat         cut       color     clarity       depth       table 
#> "character" "character" "character" "character" "character" "character" 
#>       price           x           y           z 
#> "character" "character" "character" "character"
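# Note: apply() first converts its input into a matrix. As diamonds contains
#       non-numeric (factor) columns, all variables are viewed as characters!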

# 3. Compute the number of unique values in each column of `iris`:

# (a) Solve for 1st column: 
n_distinct(iris$Sepal.Length)  # solution for 1st column
#> [1] 35

# (b) Generalize to all columns:
as_tibble(iris) %>% map_int(n_distinct)
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
#>           35           23           43           22            3
apply(X = iris, MARGIN = 2, FUN = n_distinct)
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
#>           35           23           43           22            3

# 4. Generate 10 random normal numbers for each of `μ = −100, 0, and 100`:

# (a) Solve for 1st mean: 
mu <- c(-100, 0, 100)
rnorm(n = 10, mean = mu[1])
#>  [1]  -99.50863  -99.71473 -100.44417  -98.88835  -99.96750 -100.61690
#>  [7] -100.08146  -99.68594  -98.64650 -100.27637

# (b) Generalize to all means:
mu %>% map(rnorm, n = 10) %>% str()
#> List of 3
#>  $ : num [1:10] -99.3 -100 -99.6 -100.6 -101.6 ...
#>  $ : num [1:10] -1.503 -0.704 -1.304 -0.779 0.449 ...
#>  $ : num [1:10] 99.7 99.7 101 100.5 100.6 ...
lapply(X = mu, FUN = rnorm, n = 10) %>% str()
#> List of 3
#>  $ : num [1:10] -99 -100.8 -100.9 -99.5 -100 ...
#>  $ : num [1:10] 0.767 -0.449 -0.938 1.927 -0.862 ...
#>  $ : num [1:10] 100.2 99.3 99.9 101.8 98.8 ...

# Note: In 4(b), we add str() to show the structure of the output lists.

A.12.5 Exercise 5

Z-transforming tables

In this exercise, we will standardize an entire table of data (using a for loop, an apply, and a map function). We will first write a utility function that achieves the desired transformation for a vector and then compare and contrast different ways of applying this function to a table of data.

In case you are not familiar with the notion of a z score or standard score, look up these terms (e.g., on Wikipedia).

  1. Write a function called z_trans that takes a vector v as input and returns the z-transformed (or standardized) values as output if v is numeric and returns v unchanged if it is non-numeric.

Hint: Remember that z <- (v - mean(v)) / sd(v), but beware that v could contain NA values.

Solution
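A minimal sketch of such a function, assuming that NA values should be ignored when computing the mean and standard deviation (and remain NA in the output):

```r
z_trans <- function(v) {
  if (!is.numeric(v)) {
    return(v)  # return non-numeric inputs unchanged
  }

  # Standardize, ignoring NA values in mean() and sd():
  (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
}

# Check:
z_trans(c(1, 2, 3))
z_trans(c(1, 2, 3, NA))
z_trans(c("a", "b", "c"))
```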

  2. Load the dataset for the false positive psychology (see Section B.2 of Appendix B) into falsePosPsy and remove any non-numeric variables from it.
  • Use an appropriate map function to create a single vector that indicates, for each column in falsePosPsy, whether or not it is a numeric variable.

Hint: The function is.numeric tests whether a vector is numeric.

Solution
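The steps of this task can be sketched as follows. As the code for loading the actual dataset is not shown here, the sketch uses a small hypothetical data frame (toy_data) as a stand-in for falsePosPsy. The names is_num, fpp_numeric, out, and out_1 are also assumptions of this illustration:

```r
library(purrr)

z_trans <- function(v) {
  if (!is.numeric(v)) return(v)
  (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
}

# Hypothetical toy data (a stand-in for falsePosPsy):
toy_data <- data.frame(id    = c("a", "b", "c", "d"),
                       age   = c(20, 25, 30, NA),
                       score = c(1, 3, 5, 7),
                       stringsAsFactors = FALSE)

# Which columns are numeric? (map_lgl returns a logical vector)
is_num <- map_lgl(toy_data, is.numeric)
is_num

# Keep only numeric columns:
fpp_numeric <- toy_data[is_num]

# Apply z_trans to every column by a for loop:
out <- c()
for (i in seq_along(fpp_numeric)) {
  out <- cbind(out, z_trans(fpp_numeric[[i]]))  # add a column to out
}
colnames(out) <- names(fpp_numeric)

out_1 <- as.data.frame(out)
out_1
```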

Note that we use cbind rather than c within the for loop to add the results of z_trans to out. This is because z_trans returns a vector for every column of fpp_numeric. Alternatively, we could also have constructed a very long vector (with a length of nrow(fpp_numeric) x ncol(fpp_numeric) = 78 x 18 = 1404) and turned it into a rectangular table later.

  3. Repeat the task of 2. (i.e., applying z_trans to all numeric columns of falsePosPsy) by using the base R apply function, rather than a for loop. Save and print your resulting data structure as a tibble out_2.

Hint: Remember to set the MARGIN argument to apply z_trans over all columns, rather than rows.

Solution
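A sketch of the apply solution, again using a small hypothetical data frame of numeric columns as a stand-in for the numeric variables of falsePosPsy:

```r
library(tibble)

z_trans <- function(v) {
  if (!is.numeric(v)) return(v)
  (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
}

# Hypothetical toy data (a stand-in for the numeric columns of falsePosPsy):
fpp_numeric <- data.frame(age   = c(20, 25, 30, NA),
                          score = c(1, 3, 5, 7))

# MARGIN = 2 applies z_trans to columns (MARGIN = 1 would use rows):
out_2 <- apply(X = fpp_numeric, MARGIN = 2, FUN = z_trans)
out_2 <- as_tibble(out_2)  # turn the resulting matrix into a tibble
out_2
```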

  4. Repeat the task of 2. and 3. (i.e., applying z_trans to all numeric columns of falsePosPsy) by using an appropriate version of a map function from the purrr package. Save and print your resulting data structure as a tibble out_3.

Hint: Note that the desired output structure is a rectangular data table, which is also a list.
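As a data frame is a list of columns, map can apply z_trans to every column and return a list that is then converted into a tibble. The following sketch shows one way of doing this, using a hypothetical toy data frame as a stand-in for the numeric columns of falsePosPsy:

```r
library(purrr)
library(tibble)

z_trans <- function(v) {
  if (!is.numeric(v)) return(v)
  (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
}

# Hypothetical toy data (a stand-in for the numeric columns of falsePosPsy):
fpp_numeric <- data.frame(age   = c(20, 25, 30, NA),
                          score = c(1, 3, 5, 7))

# map() returns a named list (with 1 element per column):
out_3 <- as_tibble(map(fpp_numeric, z_trans))
out_3
```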

A.12.6 Exercise 6

Cumulative savings revisited

In Exercise 2 of Chapter 1: Basic R concepts and commands, we computed the cumulative sum of an initial investment amount a = 1000, given an annual interest rate int of 0.1% and an annual rate of inflation inf of 2%, after a number of n full years (e.g., n = 10):

Our solution in Chapter 1 consisted of an arithmetic formula that computes a new total based on the current task parameters:
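The task parameters and a formula of this kind can be sketched as follows. Note the assumption made here: The value of the investment changes by the net rate int - inf in each year, and this change is compounded annually.

```r
# Task parameters:
a   <- 1000       # initial investment
int <- 0.1 / 100  # annual interest rate (0.1%)
inf <- 2 / 100    # annual rate of inflation (2%)
n   <- 10         # number of full years

# Formula (assuming a yearly net change of int - inf):
a * (1 + int - inf)^n
```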

Given our new skills in writing loops and functions (from Chapter 11), we can solve this task in a variety of ways. This exercise illustrates some differences between loops, a function that implements the formula, and a vector-based solution. Although all these approaches solve the same problem, they differ in important ways.

  1. Write a for loop that iteratively computes the current value of your investment after each of 1:n years (with \(n \geq 1\)).

Hint: Express the new value of your investment a as a function of its current value a and its change based on inf and int in each year.

Solution
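A sketch of such a loop, assuming that the value of a changes by the net rate int - inf in each year:

```r
# Task parameters:
a   <- 1000       # initial investment
int <- 0.1 / 100  # annual interest rate (0.1%)
inf <- 2 / 100    # annual rate of inflation (2%)
n   <- 10         # number of full years

for (i in 1:n) {
  # the new value of a is a function of its current value:
  a <- a * (1 + int - inf)
  print(paste0("i = ", i, ": a = ", round(a, 2)))
}

a  # value after n years
```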

Note the difference between both for loops in this exercise and the vector-based solution:

  • In 6.1, the current value of a was used to iteratively compute each new value of a.

  • In 6.3, we use the function from 6.2 to directly compute a specific value x for given parameter values (e.g., of a and n). The loop used in 6.1 incrementally computes the new value of a for every increment of i. Thus, the corresponding loop must begin at i = 1 and increment its index in steps of consecutive integer values (2, 3, …). By contrast, the solution of 6.3 is more general and would also work for different loop ranges (e.g., i in c(5, 10, 15)).

  • The solution in 6.4 is similar to the loop used in 6.3, but replaces the increments of n by a vector of values for n.
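The remaining approaches mentioned in these notes can be sketched as follows. The function name compute_value and its default arguments are hypothetical, and the formula again assumes a yearly net change of int - inf:

```r
# A function that implements the formula (as in 6.2):
compute_value <- function(a = 1000, int = 0.1 / 100, inf = 2 / 100, n) {
  a * (1 + int - inf)^n
}

# A loop that uses the function (as in 6.3):
# As each value is computed directly, any loop range works:
for (i in c(5, 10, 15)) {
  print(compute_value(n = i))
}

# A vector-based solution (as in 6.4):
# The loop is replaced by a vector of values for n:
compute_value(n = 1:10)
```

The vector-based call works because the arithmetic operators in the formula are applied to every element of n.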

This concludes our exercises on loops and applying functions to data structures.



[50_solutions.Rmd updated on 2020-02-24 14:01:40 by hn.]