📌 Solutions for example tasks

This is where you’ll find solutions for all of the tutorials (usually posted after we have discussed them in the seminar).

Solutions for Tutorial 3

Task 3.1

Create a subfolder called “data” in your current working directory. Download the text file “data_halloween.txt” (via Moodle/Data for R), save it in the subfolder, and try to load it into R as an object called data_halloween.

Solution:

data <- read.csv2("data/data_halloween.txt", sep = ",")
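The task also asks for the subfolder to be created first. This can be done by hand in your file browser, or from within R. A minimal sketch of the R route, assuming a temporary directory stands in for your working directory (the names project_dir and data_dir are hypothetical and only used for illustration):

```r
# Assumption: a temporary directory stands in for your working directory
project_dir <- tempdir()
data_dir <- file.path(project_dir, "data")

# Create the subfolder only if it does not exist yet
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}

# Check whether the downloaded text file has already been saved there
file.exists(file.path(data_dir, "data_halloween.txt"))
```

If the last line returns FALSE, the file is not where R is looking for it, which is a common reason for read.csv2() to fail.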

Task 3.2

Now try to write the object you just created (data_halloween) to a .csv file in the subfolder called “data”. You may have to google for the right command.

Solution:

write.csv2(data, "data/data_export.csv")
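To see what write.csv2() actually produces, it can help to write a small data frame and read it back in. The sketch below uses hypothetical example data and a temporary file instead of “data/data_export.csv”. The argument row.names = FALSE is an addition that is not part of the solution above; it keeps the row numbers out of the exported file.

```r
# Hypothetical example data, not the Halloween data set
example <- data.frame(name = c("a", "b"), price = c(0.5, 1.25))

# Assumption: a temporary file stands in for "data/data_export.csv"
path <- tempfile(fileext = ".csv")

# write.csv2() separates columns with ";" and writes decimals with ","
write.csv2(example, path, row.names = FALSE)

# read.csv2() expects exactly this format by default
reimported <- read.csv2(path)
reimported
```

Because write.csv2() and read.csv2() share the same defaults, the numbers come back as numbers without any further conversion.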

Solutions for Tutorial 4

Task 4.1

Create a data frame called data. The data frame should contain the following variables (in this order):

a vector called food. It should contain 5 elements, namely the names of your five favourite dishes.
a vector called description. For every dish mentioned in food, please describe the dish in a single sentence (for instance, if the first food you describe is “pizza”, you could write: “This is an Italian dish, which I prefer with a lot of cheese.”)
a vector called rating. Rate every dish mentioned in food from 1 to 5, using every number only once: give your absolute favourite dish a 1 and your least favourite dish a 5.

Solution:

data <- data.frame("food" = c("pizza", "pasta", "ice cream", "crisps", "passion fruit"),
                   "description" = c("Italian dish, I actually prefer mine with little cheese",
                                     "Another Italian dish",
                                     "The perfect snack in summer",
                                     "Potatoes and oil - a luxurious combination",
                                     "A fruit that makes me think about  vacation"),
                   "Rating" = c(3,1,2,4,5))
data

##            food                                             description Rating
## 1         pizza Italian dish, I actually prefer mine with little cheese      3
## 2         pasta                                    Another Italian dish      1
## 3     ice cream                             The perfect snack in summer      2
## 4        crisps              Potatoes and oil - a luxurious combination      4
## 5 passion fruit             A fruit that makes me think about  vacation      5

Task 4.2

Can you sort the data in your data set by rating, with your favourite dish (i.e., the one rated “1”) at the top of the list and your least favourite dish (i.e., the one rated “5”) at the bottom?

Important: You do not yet know this command, so you’ll have to google for the right solution. Please do so and note down the exact search terms you used, so we can discuss them next week.

Solution:

library("dplyr")
data <- data %>% 
  arrange(Rating)
data

##            food                                             description Rating
## 1         pasta                                    Another Italian dish      1
## 2     ice cream                             The perfect snack in summer      2
## 3         pizza Italian dish, I actually prefer mine with little cheese      3
## 4        crisps              Potatoes and oil - a luxurious combination      4
## 5 passion fruit             A fruit that makes me think about  vacation      5
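If your search led you to a solution without dplyr, that is fine as well. One possible base R variant uses order(). The sketch below runs on a hypothetical, shortened data frame (dishes) so that it works on its own:

```r
# Hypothetical mini data frame with the same column names as above
dishes <- data.frame(food = c("pizza", "pasta", "ice cream"),
                     Rating = c(3, 1, 2))

# order() returns the row positions that sort Rating in ascending order;
# using them as row index rearranges the whole data frame
dishes_sorted <- dishes[order(dishes$Rating), ]
dishes_sorted
```

Both approaches give the same ordering; arrange() is simply easier to combine with other dplyr commands in a pipe.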

Solutions for Tutorial 5

Task 5.1

Read the data set into R. Writing the corresponding R code, find out how many observations and how many variables the data set contains.

Solution:

data <- read.csv2("data_halloween.txt", sep = ",")

#number of rows / observations
nrow(data)

## [1] 85

#number of columns / variables
ncol(data)

## [1] 13

Task 5.2

Writing the corresponding R code, find out

how many candy bars contain chocolate.
how many candy bars contain fruit flavor.

Solution:

table(data$chocolate)

## 
##  0  1 
## 48 37

table(data$fruity)

## 
##  0  1 
## 47 38

Task 5.3

Writing the corresponding R code, find out the name(s) of candy bars containing both chocolate and fruit flavor.

Solution:

data %>% 
  
  #keep only candy bars containing both flavors
  filter(chocolate == 1 & fruity == 1) %>% 
  
  #choose only the variable including the name of the candy bar
  select(competitorname)

##   competitorname
## 1    Tootsie Pop

Task 5.4

Create a new data frame called data_new. Writing the corresponding R code,

reduce the data set only to observations containing chocolate but not caramel. The data set should also only include the variables competitorname and pricepercent.
round the variable pricepercent to two decimals.
sort the data by pricepercent in descending order, i.e., make sure that candy bars with the highest price are on top of the data frame and those with the lowest price on the bottom.

Solution:

data_new <- data %>% 
  
  #reduce to observations containing chocolate but **not** caramel
  filter(chocolate == 1 & caramel == 0) %>%
  
  #only include variables "competitorname" and "pricepercent"
  select(competitorname, pricepercent) %>%
  
  #convert to numeric (needed if the column was imported as text), then round to two decimals
  mutate(pricepercent = as.numeric(pricepercent)) %>%
  mutate(pricepercent = round(pricepercent, 2)) %>%
  
  #sort by price
  arrange(desc(pricepercent))
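The as.numeric() step may look unnecessary. It is there because read.csv2() expects a decimal comma by default, so values written with a decimal point are not recognised as numbers. The sketch below illustrates this with a small hypothetical file (the candy names and prices are made up), assuming R 4.0 or later, where such columns are imported as character:

```r
# Hypothetical mini file that uses "," as separator and "." as decimal mark
path <- tempfile(fileext = ".txt")
writeLines(c("competitorname,pricepercent",
             "Candy A,0.861",
             "Candy B,0.514"),
           path)

# Same import call as in the solutions: read.csv2() assumes decimal commas
imported <- read.csv2(path, sep = ",")

# The price column is therefore not numeric after the import
class(imported$pricepercent)

# Converting it first makes round() work
prices <- round(as.numeric(imported$pricepercent), 2)
prices
```

An alternative would be to import the file with read.csv(), which assumes a decimal point, but the solutions here stick with read.csv2() and convert afterwards.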

Solutions for Tutorial 7

Task 7.1

Go to the Washington Post website. Using R code, download the content of the website.

Solution: (uploaded after the session)

library("polite")
session <- bow(url = "https://www.washingtonpost.com/",
               user_agent = "Teaching project, 
               Valerie Hase, 
               Department of Media and Communication,
               LMU Munich")
#Result
session

## <polite session> https://www.washingtonpost.com/
##     User-agent: Teaching project, 
##                Valerie Hase, 
##                Department of Media and Communication,
##                LMU Munich
##     robots.txt: 113 rules are defined for 8 bots
##    Crawl delay: 5 sec
##   The path is scrapable for this user-agent

#scrape the page (the object is called url, but it holds the downloaded HTML content)
url <- scrape(session)

Task 7.2

Next, identify the headlines of all articles.

Solution: (uploaded after the session)

library("rvest")
url %>%
  html_elements(".headline") %>%
  html_text() %>%
  head()

## [1] "In the next presidential election, some votes may matter more than others"     
## [2] "U.N. warns humanitarian efforts in ‘tatters’; Blinken criticizes civilian toll"
## [3] "Where should you live as you age? We asked 11 American seniors."               
## [4] "Will going outside in the cold with wet hair make you sick?"                   
## [5] "8 mindful practices to celebrate during Hanukkah "                             
## [6] "A father fears he’ll pass his body image issues on to his son"

Task 7.3

For the “front page” article (i.e., the first article on the page), can you identify its headline and the link to the article?

Solution: (uploaded after the session)

data <- tibble("headline" = url %>% 
                 html_element(".headline") %>%
                 html_text(),
               
               "link" = url %>% 
                 html_element(".headline") %>%
                 html_element("a") %>%
                 html_attr("href"))