Chapter 7 Base R Practice Problems

These 10 problems are for practice only and are meant to deepen your understanding of base R. You do not need to submit anything for them; just run the code yourself and make sure you understand it.

  1. Calculate the sum of squared deviations of the observations in the vector a from their mean.
# the original vector 
a <- c(1,2,3)
# deviations from the mean
# a is a vector and mean(a) is a single number
# notice that R uses the recycling rule to match the different lengths
a-mean(a)
## [1] -1  0  1
# square of deviations
(a-mean(a))^2
## [1] 1 0 1
# sum of squares
sum((a-mean(a))^2)
## [1] 2
  1. Without using R, calculate the variance of a.
# variance = SS/(n-1), where SS = sum of squares
# for a, n = 3
2/2
## [1] 1
# we can directly calculate variance using var()
var(a)
## [1] 1
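The hand calculation above can also be checked in code. The following is a minimal sketch; `sum_sq` and `manual_var` are hypothetical names introduced here for illustration and are not part of base R.

```r
# hypothetical helper: sum of squared deviations from the mean
sum_sq <- function(x) sum((x - mean(x))^2)

a <- c(1, 2, 3)
n <- length(a)

# variance computed by hand: SS / (n - 1)
manual_var <- sum_sq(a) / (n - 1)
manual_var

# all.equal() compares two numbers with a small tolerance,
# which is safer than == for floating-point values
all.equal(manual_var, var(a))
```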
  1. Without using R, what would be the fourth element of v1+v2?
# 9, because R recycles the shorter vector v2
# this is what happens internally: c(4,5,6,7) + c(10,2,10,2)
v1 <- c(4,5,6,7)
v2 <- c(10,2)
v1 + v2
## [1] 14  7 16  9
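To make the recycling step visible, the shorter vector can be extended by hand before adding. This is only a sketch of the idea; the name `v2_recycled` is an assumption made for this example.

```r
v1 <- c(4, 5, 6, 7)
v2 <- c(10, 2)

# rep_len() repeats v2 until it reaches the requested length
v2_recycled <- rep_len(v2, length(v1))
v2_recycled

# identical() confirms that the explicit version matches automatic recycling
identical(v1 + v2, v1 + v2_recycled)
```

Writing the recycled vector out like this is rarely needed in practice, but it is a handy way to predict the result before running the code.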
  1. Without using R, what would be the result of the following expression?
# sum() requires a numeric vector
# so TRUE is converted to 1 and FALSE is converted to 0
# essentially, this expression counts the number of TRUEs
sum(c(TRUE, TRUE, FALSE, TRUE, FALSE))
## [1] 3
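The same conversion works with other numeric functions. As a small sketch (the vector name `x` is chosen here for illustration), `mean()` of a logical vector gives the proportion of TRUEs:

```r
x <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

# the conversion that sum() performs behind the scenes
as.numeric(x)   # 1 1 0 1 0

# number of TRUEs
sum(x)          # 3

# proportion of TRUEs: 3 out of 5
mean(x)         # 0.6
```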
  1. In the mtcars data, what is the mean (rounded to 2 decimal places) of mpg for cars with 6 cylinders?
# $ is used to access a variable within a data.frame (or a list)
mpg <- mtcars$mpg
mpg
##  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
## [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4
cyl <- mtcars$cyl
cyl
##  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
# this comparison operation produces a logical vector
cyl==6
##  [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
# logical indexing: only the elements corresponding to TRUE are extracted
mpg[cyl==6]
## [1] 21.0 21.0 21.4 18.1 19.2 17.8 19.7
mean(mpg[cyl==6])
## [1] 19.74286
# once you are familiar with R, you can do this in one line
mean(mtcars$mpg[mtcars$cyl==6])
## [1] 19.74286
# rounded to 2 decimal places, the answer is 19.74
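The question asks for a rounded value, which `round()` can produce directly. The sketch below also shows two other base R ways to write the same calculation; these are alternatives for comparison, not the only correct approach.

```r
# round() with digits = 2 gives the answer in the requested form
round(mean(mtcars$mpg[mtcars$cyl == 6]), 2)   # 19.74

# with() lets us refer to the columns without repeating mtcars$
with(mtcars, round(mean(mpg[cyl == 6]), 2))

# tapply() computes the mean of mpg for every value of cyl at once
round(tapply(mtcars$mpg, mtcars$cyl, mean), 2)
```

`tapply()` is convenient when you want the group means for 4, 6, and 8 cylinders side by side instead of filtering one group at a time.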
  1. Given 1000000 random numbers drawn from a standard normal distribution (a normal distribution with mean = 0 and sd = 1), what would be the proportion of random numbers greater than 0?
# the distribution is symmetric around 0, so the proportion should be close to 0.5
set.seed(777)
a <- rnorm(1000000) # equivalent to rnorm(1000000, 0, 1)
length(a[a>0])/length(a)
## [1] 0.500623
# alternative way to do the same thing
sum(a>0)/length(a)
## [1] 0.500623
  1. Given 1000000 random numbers drawn from a standard normal distribution (a normal distribution with mean = 0 and sd = 1), what would be the proportion of random numbers greater than 1.96?
# have you seen 1.96 before? about 2.5% of a standard normal distribution lies above it
set.seed(777)
a <- rnorm(1000000) # equivalent to rnorm(1000000, 0, 1)
length(a[a>1.96])/length(a)
## [1] 0.025198
# alternative way to do the same thing
sum(a>1.96)/length(a)
## [1] 0.025198
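The simulated proportions in the last two problems can be compared with the exact tail probabilities from `pnorm()`. This is a sketch for checking the simulation; the variable name `z` is an assumption made here so that `a` is not overwritten.

```r
# exact P(Z > 0) for a standard normal variable
pnorm(0, lower.tail = FALSE)      # 0.5

# exact P(Z > 1.96), approximately 0.025
pnorm(1.96, lower.tail = FALSE)

# mean() of a logical vector is a third way to get the simulated proportion
set.seed(777)
z <- rnorm(1000000)
mean(z > 0)
mean(z > 1.96)
```

The simulated values differ slightly from the exact ones because they are based on a random sample; the gap shrinks as the number of draws grows.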
  1. Without using R, what would be the first element of the following expression?
# %in% operator 
# v1 %in% v2 returns a logical vector indicating 
# whether the elements of v1 are included in v2. 
c(1,2,3) %in% c(2,3,4,5,6)
## [1] FALSE  TRUE  TRUE
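The order of the operands matters for `%in%`, and the result is often used as an index. The following sketch illustrates both points with the same two vectors, stored under the names `v1` and `v2` for this example.

```r
v1 <- c(1, 2, 3)
v2 <- c(2, 3, 4, 5, 6)

# the result always has the same length as the left-hand vector
v1 %in% v2   # FALSE TRUE TRUE
v2 %in% v1   # TRUE TRUE FALSE FALSE FALSE

# use the logical vector as an index to keep only the matching elements
v1[v1 %in% v2]   # 2 3

# match() returns the positions in v2 (NA where there is no match)
match(v1, v2)    # NA 1 2
```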
  1. Without using R, which level would be displayed first when displaying the levels of the following factor?
# "high" will be displayed first because levels are sorted alphabetically by default
# notice that this order does not represent an intrinsic order in the factor
# it only affects display and sorting
# use ordered() to create an ordered factor
a <- factor(c("high", "high", "medium", "low"))
a
## [1] high   high   medium low   
## Levels: high low medium
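When the alphabetical default is not what you want, the level order can be set explicitly. The sketch below uses the hypothetical names `sizes`, `f`, and `o`; it shows the default order, a custom order, and an ordered factor that supports comparisons.

```r
sizes <- c("high", "high", "medium", "low")

# default: levels are sorted alphabetically
levels(factor(sizes))   # "high" "low" "medium"

# the levels argument sets the order explicitly
f <- factor(sizes, levels = c("low", "medium", "high"))
levels(f)               # "low" "medium" "high"

# ordered = TRUE additionally makes comparisons such as < meaningful
o <- factor(sizes, levels = c("low", "medium", "high"), ordered = TRUE)
o < "high"              # FALSE FALSE TRUE TRUE
```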
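  1. The following code will produce an error (TRUE or FALSE)?
# FALSE: this code produces a warning, not an error
# a factor only allows pre-specified values (i.e., levels), so "very high" is not stored
# instead, a[1] becomes NA and R warns "invalid factor level, NA generated"
a[1] <- "very high"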
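Assigning a value that is not among the levels leaves `NA` in that position and raises a warning. If the new value should be stored, one option is to add it to the levels first. This is a minimal sketch; the second factor `b` is a hypothetical copy used so the two cases can be compared.

```r
a <- factor(c("high", "high", "medium", "low"))

# "very high" is not a level, so R stores NA and issues a warning
a[1] <- "very high"
a

# to store the new value, extend the levels before assigning
b <- factor(c("high", "high", "medium", "low"))
levels(b) <- c(levels(b), "very high")
b[1] <- "very high"
b
```

After the first assignment the levels of `a` are unchanged, while `b` ends up with four levels and keeps the new value.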