A.1 Solutions (01)

ds4psy: Solutions 1

Here are the solutions to the basic R exercises of Chapter 1 (Section 1.8).

A.1.1 Exercise 1

  1. Check the Environment tab of RStudio to see which objects are currently defined to which values (after working through this chapter). Then evaluate and explain the following expressions (and correct any errors that may occur):
# Note: The following assume the object definitions from above.
a
b
b <- a + a
a + a == b
!!a

sqrt(2)  # see ?sqrt
sqrt(2)^2
sqrt(2)^2 == 2  # Why FALSE? 
# Hint: Compute the difference sqrt(2)^2 - 2
sqrt(2)^2 - 2   # is not 0

o / O / 0   # (using o and O from above)
0 / (o * O)
0 / (o * 0)

a + b + C   # are all objects defined?

sum(a, b) - sum(a + b)

b:a  # divide b by a
length(b:a)

i <- i + 1  # increment i by 1

nchar(d) - length(d)

e
e + e + !!e

e <- stuff
paste(d, e)  # paste "adds" 2 character objects

Solution

Re-do the current object definitions from the end of Section 1.8:

a <- 100
b <- 200
d <- "weird"
e <- TRUE
o <- FALSE # OR 
# o <- "ene mene mu"
O <- 5

Evaluate and explain the following expressions (and correct any errors that may occur):

# Note: The following assume the object definitions from above.
a  # 100
b  # 200
b <- a + a  # b is changed to 100 + 100
a + a == b  # TRUE, as both are 200
!!a            # TRUE, as any non-zero number is coerced to TRUE:
as.logical(a)  # TRUE
as.logical(0)  # FALSE

sqrt(2)  # see ?sqrt
sqrt(2)^2       # 2 (as it should)
sqrt(2)^2 == 2  # Why FALSE? 

# Hint: Compute the difference sqrt(2)^2 - 2
sqrt(2)^2 - 2   # is not 0, but some very small number.

o / O / 0   # (using o and O from above)
# If o is set to "ene mene mu": Error, as non-numeric.
# If o is set to FALSE: FALSE/5 => 0, then 0/0 => NaN.
# Correction: Set o to some number:
o <- 1
0 / (o * O) # works
0 / (o * 0) # NaN, due to division of 0/0

a + b + C   # are all objects defined?
# Error: C is predefined as a function in R (see ?C for details), not a number.
# Correction: Set C to some number:
C <- 1
a + b + C  # works: 301

sum(a, b) - sum(a + b)  # 0, as both sums are 300:
sum(a, b)   # 300
sum(a + b)  # 300

b:a  # divide b by a?
# b:a creates a vector of integers from b = 200 to a = 100: 
length(b:a)  # 101 elements

i <- i + 1  # increment i by 1
# Error: i is not defined.
# Correction: Set i to some number:
i <- 1
i <- i + 1  # works: 
i           # 2

nchar(d) - length(d)  # returns 4
d          # d is set to "weird"
nchar(d)   # 5 characters
length(d)  # 1 element (in character vector, scalar)

e  # TRUE
e + e + !!e  # 3 = 1 + 1 + 1

e <- stuff
# Error: stuff is not defined.
e <- "stuff"  # define as character (text)
paste(d, e)   # works: "weird stuff"
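
The FALSE result of sqrt(2)^2 == 2 above is due to floating-point rounding. As a minimal sketch (using only base R), all.equal() compares numbers within a small tolerance, and wrapping it in isTRUE() yields a single logical value. The object names below are chosen for illustration only:

```r
# Comparing floating-point numbers with a tolerance (base R only):
x <- sqrt(2)^2

exact_equal <- (x == 2)                 # FALSE: x differs from 2 by a tiny rounding error
near_equal  <- isTRUE(all.equal(x, 2))  # TRUE: equal within the default tolerance
abs_diff    <- abs(x - 2)               # size of the rounding error

exact_equal
near_equal
abs_diff < 1e-10                        # TRUE: the difference is negligible
```

When testing computed numbers for equality, a tolerance-based comparison of this kind is usually safer than ==.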

  2. In Section 1.2.2, we explored the plot_fn() function of the ds4psy package to discover the meaning of its arguments. Assume the perspective of an empirical scientist to explore and decipher the arguments of the plot_fun() function in a similar fashion.
library(ds4psy)
plot_fun()

Hint: Solving this task essentially means to answer the question “What does this argument do?” for each argument (i.e., the lowercase letters from a to g, and c1 and c2).

Solution

The documentation of plot_fun() (available via ?plot_fun()) shows the following list of arguments:

  • a A (natural) number. Default: a = NA.
  • b A Boolean value. Default: b = TRUE.
  • c A Boolean value. Default: c = TRUE.
  • d A (decimal) number. Default: d = 1.0.
  • e A Boolean value. Default: e = FALSE.
  • f A Boolean value. Default: f = FALSE.
  • g A Boolean value. Default: g = FALSE.
  • c1 A color palette (e.g., as a vector). Default: c1 = c(rev(pal_seeblau), "white", pal_grau, "black", Bordeaux).
  • c2 A color (e.g., as a character). Default: c2 = "black".

The plot_fun() function of the ds4psy package is a simplified and deliberately obscured version of the plot_tiles() function. See the documentation of the latter (via ?plot_tiles()) to obtain the documentation of its arguments.
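
Besides reading the documentation, the arguments of a function and their default values can also be inspected programmatically. The following is a minimal sketch using the base R functions formals() and args(). The function my_fun() is a hypothetical stand-in with made-up defaults, not the real plot_fun():

```r
# Hypothetical function with obscure argument names (NOT the real plot_fun()):
my_fun <- function(a = NA, b = TRUE, d = 1.0, c2 = "black") {
  invisible(NULL)  # the body is irrelevant here
}

arg_names <- names(formals(my_fun))  # names of all arguments
arg_names

defaults <- formals(my_fun)          # list of default values
defaults$b
defaults$c2

args(my_fun)                         # prints the function signature
```

Knowing the names and defaults is only a starting point: to find out what an argument does, change one argument at a time and compare the results.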

  3. Use your exploration of plot_fun() to reconstruct the commands that create the following plots:

Hint: Check the documentation of plot_fun() (e.g., for color information).

Solution

plot_fun(a = 5, d = 4, e = TRUE, c2 = "white")

plot_fun(a = 4, c = FALSE, f = TRUE, g = TRUE, c1 = c("steelblue", "white", "firebrick"))

A.1.2 Exercise 2

With only a little knowledge of R you can perform quite fancy financial arithmetic. Assume that you have won an amount a of EUR 1000 and are considering depositing this amount into a new bank account that offers an annual interest rate int of 0.1%.

  1. How much would your account be worth after waiting for n = 2 full years?

  2. What would be the total value of your money after n = 2 full years if the annual inflation rate inf is 2%?

  3. What would be the results to 1. and 2. if you waited for n = 10 years?

Answer these questions by defining well-named objects and performing simple arithmetic computations on them.

Solution

# Definitions:
a_0 <- 1000     # initial amount of savings (year 0)
int  <- .1/100  # interest rate (annual)
inf <- 2/100    # inflation rate (annual)
n <- 2          # number of years

## 1. Savings with interest: ----- 

# 1a. In 2 steps: 
a_1 <- a_0 + (a_0 * int)  # after 1 year
a_1
#> [1] 1001

a_2 <- a_1 + (a_1 * int)  # after 2 years
a_2
#> [1] 1002.001

# 1b. Both in 1 step: 
a_0 * (1 + int)^n
#> [1] 1002.001


## 2. Also accounting for inflation: ----- 
total <- a_0 * (1 + int - inf)^n
total 
#> [1] 962.361

# 3. Different numbers of years:
n <- 10

# Use formulas from 1b and 2:
a_0 * (1 + int)^n        # interest only
#> [1] 1010.045
a_0 * (1 + int - inf)^n  # interest + inflation
#> [1] 825.4487

Note: Do not worry if you find this task difficult at this point — we will revisit it later. In Exercise 6 of Chapter 12: Iteration, we will use loops and functions to solve it in a more general fashion.
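
The solution above approximates the combined effect of interest and inflation by subtracting the two rates. As a hedged alternative sketch (not the method used above), the nominal value can instead be deflated by the compounded inflation factor. The helper functions nominal_value() and real_value() are hypothetical names introduced for illustration:

```r
# Definitions (as in the solution above):
a_0 <- 1000     # initial amount of savings (year 0)
int <- .1/100   # interest rate (annual)
inf <- 2/100    # inflation rate (annual)

# Hypothetical helper functions (names are illustrative):
nominal_value <- function(a, int, n) {
  a * (1 + int)^n                 # compound interest over n years
}

real_value <- function(a, int, inf, n) {
  a * ((1 + int) / (1 + inf))^n   # nominal value, deflated by inflation
}

n <- c(2, 10)   # both time spans at once (the functions are vectorized over n)
nominal_value(a_0, int, n)
real_value(a_0, int, inf, n)

# Approximation used in the solution above, for comparison:
a_0 * (1 + int - inf)^n
```

For the small rates used here, both approaches yield similar values, with the deflated value being slightly higher than the approximation.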

A.1.3 Exercise 3

When introducing arithmetic functions above, we showed that they can be used with numeric scalars (i.e., numeric objects with a length of 1).

  1. Demonstrate that the same arithmetic functions also work with 2 numeric vectors x and y (of the same length).

  2. What happens when x and y have different lengths?

Solution

Arithmetic with vectors, rather than scalars:

## 1. Arithmetic with vectors of the same length:
x <- c(2, 4, 6)
y <- c(1, 2, 3)

+ x     # keeping sign 
#> [1] 2 4 6
- y     # reversing sign
#> [1] -1 -2 -3
x + y   # addition
#> [1] 3 6 9
x - y   # subtraction
#> [1] 1 2 3
x * y   # multiplication
#> [1]  2  8 18
x / y   # division
#> [1] 2 2 2
x ^ y   # exponentiation
#> [1]   2  16 216
x %/% y # integer division
#> [1] 2 2 2
x %% y  # remainder of integer division (x mod y)
#> [1] 0 0 0

When vectors have different lengths, the shorter one is recycled to the length of the longer one (and a warning is issued if the longer length is not a multiple of the shorter one). The result of vector arithmetic involving multiple vectors is a vector with as many elements as the longest vector:

## 2. Arithmetic with vectors of different lengths:
x <- c(2, 4, 6)
y <- c(1, 2)

+ x     # keeping sign 
#> [1] 2 4 6
- y     # reversing sign
#> [1] -1 -2
x + y   # addition
#> [1] 3 6 7
x - y   # subtraction
#> [1] 1 2 5
x * y   # multiplication
#> [1] 2 8 6
x / y   # division
#> [1] 2 2 6
x ^ y   # exponentiation
#> [1]  2 16  6
x %/% y # integer division
#> [1] 2 2 6
x %% y  # remainder of integer division (x mod y)
#> [1] 0 0 0

Generalize to more than 2 vectors:

## Generalize to 3 vectors:
x <- c(1)
y <- c(1, 2)
z <- c(1, 2, 3)

x + y + z  # => 3 5 5 + Warning
#> [1] 3 5 5

# Explanation: Due to recycling of x and y to the length of z, 
# R actually computes: 
# c(1, 1, 1) + c(1, 2, 1) + c(1, 2, 3)
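
To make the recycling explicit, the following sketch uses the base R function rep_len() to extend the shorter vector by hand and checks when a warning occurs. The helper function warned() is a hypothetical name introduced for illustration:

```r
x <- c(2, 4, 6)
y <- c(1, 2)

# Recycle y explicitly to the length of x:
y_rec <- rep_len(y, length(x))   # 1 2 1
y_rec

# Explicit recycling gives the same result as implicit recycling:
identical(x + y_rec, suppressWarnings(x + y))  # TRUE

# Hypothetical helper: Does evaluating an expression issue a warning?
warned <- function(expr) {
  tryCatch({ expr; FALSE }, warning = function(w) TRUE)
}

warned(c(2, 4, 6) + c(1, 2))      # TRUE  (3 is not a multiple of 2)
warned(c(2, 4, 6, 8) + c(1, 2))   # FALSE (4 is a multiple of 2)
```

The absence of a warning in the second case shows that recycling can also happen silently, so it is worth checking vector lengths before doing arithmetic.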

A.1.4 Exercise 4

Predict the result of the arithmetic expression x %/% y * y + x %% y. Then test your prediction by assigning some number to x and y and evaluating the expression. Finally, explain why the result occurs.

Solution

## Note: The given expression (here with x = 1 and y = c(1, 2) from Exercise 3)
x %/% y * y + x %% y  
#> [1] 1 1
# is identical to: 
((x %/% y) * y) + (x %% y) 
#> [1] 1 1

## Prediction: 
x %/% y * y + x %% y  # will evaluate to x.
#> [1] 1 1

## Testing the prediction:
x <- 4711
y <- 1307

((x %/% y) * y) + (x %% y) == x  # prediction is TRUE
#> [1] TRUE

## Explanation:
x %/% y      # yields the integer part of x/y
#> [1] 3
x %/% y * y  # multiplies this integer part by y 
#> [1] 3921
x %% y       # yields the remainder of integer division
#> [1] 790

# => The sum  
(x %/% y * y) + (x %% y)
#> [1] 4711
# yields x.
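
A single pair of numbers does not show that the identity holds in general. As a minimal sketch, the following code tests it for many random integer pairs, including negative values (the ranges and sample size are arbitrary choices):

```r
set.seed(1)  # for replicable randomness

# Random integer pairs (positive and negative; y is never 0):
x <- sample(-1000:1000, 500, replace = TRUE)
y <- sample(c(-50:-1, 1:50), 500, replace = TRUE)

# The identity should hold for every pair:
all(x %/% y * y + x %% y == x)   # TRUE

# Example with a negative number:
(-7) %/% 2                       # -4 (rounds towards negative infinity)
(-7) %% 2                        #  1 (remainder has the sign of the divisor)
(-7) %/% 2 * 2 + (-7) %% 2       # -7
```

The example with -7 shows why the identity also holds for negative numbers: %/% rounds down rather than towards zero, and %% returns the matching remainder.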

A.1.5 Exercise 5

Assume the following definitions for a survey:

  • A person with an age from 1 to 17 years is classified as a minor,

  • a person with an age from 18 to 64 years is classified as an adult,

  • a person with an age from 65 to 99 years is classified as a senior.

Generate a vector with 100 random samples that specifies the age of 100 people (in years), but contains exactly 20 minors, 50 adults, and 30 seniors.

Now use some functions on your age vector to answer the following questions:

  1. What is the average (mean), minimum, and maximum age in this sample?

  2. How many people are younger than 25 years?

  3. What is the average (mean) age of people older than 50 years?

  4. How many people have a round age (i.e., an age that is divisible by 10)? What is their mean age?

Solution

set.seed(42) # for replicable randomness

# Creating 3 groups:
minor_range  <-  1:17
adult_range  <- 18:64
senior_range <- 65:99  # seniors start at 65 (64 still counts as adult)

# Creating 3 vectors (1 for each sub-group):
minors  <- sample(minor_range, 20, replace = TRUE)
adults  <- sample(adult_range, 50, replace = TRUE)
seniors <- sample(senior_range, 30, replace = TRUE)

# Combining 3 vectors into 1:
age <- c(minors, adults, seniors)
age
#>  [1] 17  5  1 10  4 17 15  7  4  5 14 15  3  9  4  5 13  5  2  8 20 50 59 41 47
#> [26] 60 32 39 25 53 21 39 35 62 45 22 21 51 52 41 40 43 23 23 19 20 38 19 55 27
#> [51] 57 22 50 56 53 62 59 26 46 29 37 26 60 52 46 33 54 45 63 22 91 65 81 87 81
#>  [ reached getOption("max.print") -- omitted 25 entries ]

# Checks:
length(age)
#> [1] 100
min(age)
#> [1] 1
max(age)
#> [1] 99
range(age)
#> [1]  1 99
mean(age)
#> [1] 46.45

# Using indexing: 
# How many people are younger than 25 years? 
length(age[age < 25])
#> [1] 31

# What is the average (mean) age of people older than 50 years?
mean(age[age > 50])
#> [1] 73.26087

# Round age:
round_ages <- age[age %% 10 == 0]
length(round_ages)
#> [1] 10
mean(round_ages)
#> [1] 48
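
To verify that the age vector contains the required number of people per group, each age can be classified and counted. The following sketch uses the base R functions cut() and table() and follows the age definitions of the exercise (with seniors ranging from 65 to 99). The object name group is an illustrative choice:

```r
set.seed(42) # for replicable randomness

# Sampling ages according to the definitions of the exercise:
age <- c(sample(1:17,  20, replace = TRUE),
         sample(18:64, 50, replace = TRUE),
         sample(65:99, 30, replace = TRUE))

# Classify each age into 1 of 3 groups:
group <- cut(age,
             breaks = c(0, 17, 64, 99),   # intervals (0,17], (17,64], (64,99]
             labels = c("minor", "adult", "senior"))

table(group)      # counts per group: 20 minors, 50 adults, 30 seniors

# Counting with sum() on a logical vector (alternative to length()):
sum(age < 25)     # same as length(age[age < 25])
```

As the 3 ranges do not overlap, the group counts are fixed by design, whereas the individual age values depend on the random seed.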

A.1.6 Exercise 6

Examine the participant information in p_info (Woodworth et al., 2018) by describing each of its variables:

  1. How many individuals are contained in the dataset?

  2. What percentage of them is female (i.e., has a sex value of 1)?

  3. How many participants were in one of the 3 treatment groups (i.e., have an intervention value of 1, 2, or 3)?

  4. What is the participants’ mean education level? What percentage has a university degree (i.e., an educ value of at least 4)?

  5. What is the age range (min to max) of participants? What is the average (mean and median) age?

  6. Describe the range of income levels present in this sample of participants. What percentage of participants self-identifies as having a below-average income (i.e., an income value of 1)?

Solution

# Load data:
# (a) from ds4psy package:
library(ds4psy)
p_info <- ds4psy::posPsy_p_info  # from ds4psy package

# (b) from file (stored online): 
# p_info <- readr::read_csv(file = "http://rpository.com/ds4psy/data/posPsy_participants.csv")
# p_info

## 1. Number of participants: ----- 
n_total <- nrow(p_info)
n_total
#> [1] 295

## 2. How many female participants? -----
n_female <- length(p_info$sex[p_info$sex == 1])
n_female
#> [1] 251
n_female <- sum(p_info$sex == 1)     # alternative solution

pc_female <- n_female/n_total * 100  # compute percentage
pc_female
#> [1] 85.08475

## 3. How many participants are in an intervention group? ----- 
range(p_info$intervention)
#> [1] 1 4
hist(p_info$intervention, col = Seeblau)  # plots a histogram


n_i1 <- sum(p_info$intervention == 1)
n_i2 <- sum(p_info$intervention == 2)
n_i3 <- sum(p_info$intervention == 3)
n_i4 <- sum(p_info$intervention == 4)

n_treat <- n_i1 + n_i2 + n_i3
n_treat
#> [1] 222

# Check: All participants NOT in control group 4:
n_treat == (n_total - n_i4)
#> [1] TRUE

## 4. Education level: ----- 
hist(p_info$educ, col = Bordeaux)

mean(p_info$educ)
#> [1] 3.979661
n_educ_uni <- sum(p_info$educ >= 4)

pc_educ_uni <- n_educ_uni/n_total * 100
pc_educ_uni
#> [1] 74.91525

## 5. Age: -----
hist(p_info$age, col = Seegruen)

range(p_info$age)
#> [1] 18 83
mean(p_info$age)
#> [1] 43.75932
median(p_info$age)
#> [1] 44

## 6. Income: ----- 
hist(p_info$income, col = Pinky)

n_income_low <- sum(p_info$income < 2)
pc_income_low <- n_income_low/n_total * 100
pc_income_low
#> [1] 24.74576

Answers to Exercise 6:

  1. The p_info data contains 295 individuals.

  2. 85.08% of the participants are female.

  3. 222 of the participants are in 1 of the 3 treatment groups.

  4. The participants’ mean education level is 3.98, and 74.92% of the participants have a university degree.

  5. Participants’ age values range from 18 to 83 years. Their mean age is 43.76 years, their median age is 44 years.

  6. 24.75% of the participants state that their income is below average.
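
The percentages above were computed by counting cases and dividing by the total number of participants. A shorter idiom relies on the fact that the mean of a logical vector is the proportion of its TRUE values. The following sketch illustrates this with a hypothetical mini data set named toy (with made-up values, not the real p_info data):

```r
# Hypothetical mini data set (NOT the real p_info data):
toy <- data.frame(sex    = c(1, 1, 2, 1, 2),
                  educ   = c(4, 5, 3, 2, 4),
                  income = c(1, 2, 3, 2, 1))

# The mean of a logical vector is the proportion of TRUE values:
pc_female     <- mean(toy$sex == 1) * 100     # 3 of 5 => 60
pc_educ_uni   <- mean(toy$educ >= 4) * 100    # 3 of 5 => 60
pc_income_low <- mean(toy$income == 1) * 100  # 2 of 5 => 40

round(c(female = pc_female, uni = pc_educ_uni, low_income = pc_income_low), 2)

# Frequency tables describe the range of levels of a variable:
table(toy$income)
prop.table(table(toy$income)) * 100  # percentages per level
```

The same expressions should work on p_info when replacing toy with p_info, assuming that the variables contain no missing values.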

This concludes our first set of exercises on base R.

References

Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R., & Schüz, B. (2018). Data from “Web-based positive psychology interventions: A reexamination of effectiveness”. Journal of Open Psychology Data, 6(1). https://doi.org/10.5334/jopd.35