Solutions for tasks in tutorials

This is where you’ll find solutions for all of the tutorials (mostly after we have discussed them in the seminar).

Solutions for Tutorial 3

Task 3.1

Create a data frame called data. The data frame should contain the following variables (in this order):

  • a vector called food. It should contain 5 elements, namely the names of your five favourite dishes.
  • a vector called description. For every dish mentioned in food, please describe the dish in a single sentence (for instance, if the first food you describe is “pizza”, you could write: “This is an Italian dish, which I prefer with a lot of cheese.”).
  • a vector called rating. Rate every dish mentioned in food from 1 to 5 (using every number only once), i.e., rate your absolute favourite dish out of all five with a 1 and your least favourite dish out of all five with a 5.

Solution:

#note: the rating column is named Rating (capital R) in this solution
data <- data.frame(food = c("pizza", "pasta", "ice cream", "crisps", "passion fruit"),
                   description = c("Italian dish, I actually prefer mine with little cheese",
                                   "Another Italian dish",
                                   "The perfect snack in summer",
                                   "Potatoes and oil - a luxurious combination",
                                   "A fruit that makes me think about  vacation"),
                   Rating = c(3, 1, 2, 4, 5))
data
##            food                                             description Rating
## 1         pizza Italian dish, I actually prefer mine with little cheese      3
## 2         pasta                                    Another Italian dish      1
## 3     ice cream                             The perfect snack in summer      2
## 4        crisps              Potatoes and oil - a luxurious combination      4
## 5 passion fruit             A fruit that makes me think about  vacation      5

Task 3.2

Can you sort the data in your data set by rating, with your favourite dish (i.e., the one rated “1”) at the top of the list and your least favourite dish (i.e., the one rated “5”) at the bottom?

Important: You do not yet know this command, so you’ll have to google for the right solution. Please do so and note down the exact search terms you used, so we can discuss them next week.

Solution:

There are several solutions, as we saw in the last class. This is just one of them!

library("dplyr")
data <- data %>% arrange(Rating)
data
##            food                                             description Rating
## 1         pasta                                    Another Italian dish      1
## 2     ice cream                             The perfect snack in summer      2
## 3         pizza Italian dish, I actually prefer mine with little cheese      3
## 4        crisps              Potatoes and oil - a luxurious combination      4
## 5 passion fruit             A fruit that makes me think about  vacation      5
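
As one of the other options, base R's order() can do the same sorting without loading a package. The sketch below uses a shortened stand-in for the data frame from Task 3.1 (without the description column) so that it runs on its own; the name data_sorted is only chosen for illustration.

```r
# Shortened stand-in for the data frame from Task 3.1
data <- data.frame(food = c("pizza", "pasta", "ice cream", "crisps", "passion fruit"),
                   Rating = c(3, 1, 2, 4, 5))

# order() returns the row positions that put Rating in ascending order;
# using them as the row index sorts the whole data frame
data_sorted <- data[order(data$Rating), ]
data_sorted
```

Unlike arrange(), this approach keeps the original row names, so the sorted rows are labelled 2, 3, 1, 4, 5 rather than 1 to 5.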

Solutions for Tutorial 5

Task 5.1

Read the data set into R. Writing the corresponding R code, find out

  • how many observations and how many variables the data set contains.

Solution:

data <- read.csv2("data_tutorial4.txt", sep = ",")
#number of rows / observations
nrow(data)
## [1] 85
#number of columns / variables
ncol(data)
## [1] 13
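
Both numbers can also be obtained in a single call with dim(). The sketch below uses a small made-up data frame (the name toy and its values are assumptions for illustration) instead of the tutorial file, so it runs without data_tutorial4.txt.

```r
# Hypothetical stand-in data: 3 observations, 2 variables
toy <- data.frame(competitorname = c("A", "B", "C"),
                  chocolate = c(1, 0, 1))

# dim() returns the number of rows first, then the number of columns
dim(toy)
```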

Task 5.2

Writing the corresponding R code, find out

  • how many candy bars contain chocolate.
  • how many candy bars contain fruit flavor.

Solution:

table(data$chocolate)
## 
##  0  1 
## 48 37
table(data$fruity)
## 
##  0  1 
## 47 38
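
If only the number of candy bars with a given property is needed, counting the rows that meet the condition with sum() is an alternative to table(). A minimal sketch on made-up values (the toy data frame is hypothetical, not the tutorial data):

```r
# Hypothetical stand-in data with two 0/1 indicator columns
toy <- data.frame(chocolate = c(1, 0, 1, 1, 0),
                  fruity    = c(0, 1, 0, 1, 1))

# A comparison yields TRUE/FALSE values; sum() counts the TRUE ones
n_chocolate <- sum(toy$chocolate == 1)
n_fruity <- sum(toy$fruity == 1)
```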

Task 5.3

Writing the corresponding R code, find out

  • the name(s) of candy bars containing both chocolate and fruit flavor.

Solution:

#Solution 1: base R
data$competitorname[data$chocolate == 1 & data$fruity == 1]
## [1] "Tootsie Pop"
#Solution 2: dplyr
data %>% filter(chocolate == 1 & fruity == 1) %>% select(competitorname)
##   competitorname
## 1    Tootsie Pop

Task 5.4

Create a new data frame called data_new. Writing the corresponding R code,

  • reduce the data set to only those observations containing chocolate but not caramel. The data set should also only include the variables competitorname and pricepercent.
  • round the variable pricepercent to two decimals.
  • sort the data by pricepercent in descending order, i.e., make sure that candy bars with the highest price are at the top of the data frame and those with the lowest price at the bottom.

Solution:

#Solution 1: base R
data_new <- data[data$chocolate == 1 & data$caramel == 0, c("competitorname", "pricepercent")]
data_new$pricepercent <- round(as.numeric(data_new$pricepercent), 2)
data_new <- data_new[order(data_new$pricepercent, decreasing = TRUE), ]
data_new
##                 competitorname pricepercent
## 63             Nestle Smarties         0.98
## 24           Hershey's Krackel         0.92
## 25    Hershey's Milk Chocolate         0.92
## 26      Hershey's Special Dark         0.92
## 41                 Mr Good Bar         0.92
## 40                      Mounds         0.86
## 85                    Whoppers         0.85
## 6                   Almond Joy         0.77
## 43         Nestle Butterfinger         0.77
## 44               Nestle Crunch         0.77
## 33         Peanut butter M&M's         0.65
## 34                       M&M's         0.65
## 48                 Peanut M&Ms         0.65
## 53   Reese's Peanut Butter cup         0.65
## 54              Reese's pieces         0.65
## 55 Reese's stuffed with pieces         0.65
## 2                 3 Musketeers         0.51
## 11             Charleston Chew         0.51
## 28                Junior Mints         0.51
## 29                     Kit Kat         0.51
## 76        Tootsie Roll Juniors         0.51
## 75                 Tootsie Pop         0.32
## 78     Tootsie Roll Snack Bars         0.32
## 52          Reese's Miniatures         0.28
## 23            Hershey's Kisses         0.09
## 60                     Sixlets         0.08
## 77        Tootsie Roll Midgies         0.01
#Solution 2: dplyr
data_new <- data %>%
  filter(chocolate == 1 & caramel == 0) %>%
  select(competitorname, pricepercent) %>%
  mutate(pricepercent = round(as.numeric(pricepercent), 2)) %>%
  arrange(desc(pricepercent))
data_new
##                 competitorname pricepercent
## 1              Nestle Smarties         0.98
## 2            Hershey's Krackel         0.92
## 3     Hershey's Milk Chocolate         0.92
## 4       Hershey's Special Dark         0.92
## 5                  Mr Good Bar         0.92
## 6                       Mounds         0.86
## 7                     Whoppers         0.85
## 8                   Almond Joy         0.77
## 9          Nestle Butterfinger         0.77
## 10               Nestle Crunch         0.77
## 11         Peanut butter M&M's         0.65
## 12                       M&M's         0.65
## 13                 Peanut M&Ms         0.65
## 14   Reese's Peanut Butter cup         0.65
## 15              Reese's pieces         0.65
## 16 Reese's stuffed with pieces         0.65
## 17                3 Musketeers         0.51
## 18             Charleston Chew         0.51
## 19                Junior Mints         0.51
## 20                     Kit Kat         0.51
## 21        Tootsie Roll Juniors         0.51
## 22                 Tootsie Pop         0.32
## 23     Tootsie Roll Snack Bars         0.32
## 24          Reese's Miniatures         0.28
## 25            Hershey's Kisses         0.09
## 26                     Sixlets         0.08
## 27        Tootsie Roll Midgies         0.01
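
Both solutions convert pricepercent with as.numeric() before rounding. A likely reason, assuming the column arrives as text, is that read.csv2() expects a decimal comma by default, so values written with a decimal point are not recognised as numbers. The sketch below illustrates this on a made-up two-row input (the text and its values are illustrative only, not taken from the tutorial file):

```r
# Made-up comma-separated input with decimal points
txt <- "competitorname,pricepercent\nA,0.511\nB,0.767"

# read.csv2() defaults to dec = ",", so "0.511" is not parsed as a number
raw <- read.csv2(text = txt, sep = ",")
is.numeric(raw$pricepercent)

# Converting explicitly makes rounding possible
price <- round(as.numeric(as.character(raw$pricepercent)), 2)
price
```

Reading the file with read.csv(), which uses a comma as separator and a decimal point by default, would be one way to get numeric columns directly.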