Graded Task in R (I)

Graded Task 1.1

Writing the corresponding R code, create a data frame with the name dataframe. The data frame should include 20 observations and the following three variables:

as its first variable, a vector called numbers consisting of all numbers from 1 to 20.
as its second variable, a vector called letters consisting of all letters from a to t.
as its third variable, a vector called sentence consisting of the sentence “This is a task”, repeated 20 times.

The final data frame should look like this:

dataframe

##    numbers letters       sentence
## 1        1       a This is a task
## 2        2       b This is a task
## 3        3       c This is a task
## 4        4       d This is a task
## 5        5       e This is a task
## 6        6       f This is a task
## 7        7       g This is a task
## 8        8       h This is a task
## 9        9       i This is a task
## 10      10       j This is a task
## 11      11       k This is a task
## 12      12       l This is a task
## 13      13       m This is a task
## 14      14       n This is a task
## 15      15       o This is a task
## 16      16       p This is a task
## 17      17       q This is a task
## 18      18       r This is a task
## 19      19       s This is a task
## 20      20       t This is a task
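Below is one possible way to build this data frame; it is not the only valid solution. It uses base R's built-in constant `letters` (the lowercase alphabet) and `rep()` to repeat the sentence. Assigning to `letters` masks the built-in constant in your workspace. That is harmless here, but worth knowing.

```r
# Column 1: the numbers 1 to 20
numbers <- 1:20
# Column 2: the first 20 letters, "a" to "t"
# (in a fresh session the right-hand side reads base R's built-in `letters`;
#  the assignment then masks it with a new 20-element vector)
letters <- letters[1:20]
# Column 3: the same sentence repeated 20 times
sentence <- rep("This is a task", times = 20)

# data.frame() uses the object names as column names
dataframe <- data.frame(numbers, letters, sentence)
dataframe
```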

Graded Task 1.2

Writing the corresponding R code, create a new vector called numbers consisting of all numbers from 1 to 100. Replace the first and the last element of the vector with the number 0. Afterwards, divide every single value in the vector by 2.

The final vector should look like this:

numbers

##   [1]  0.0  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5 10.0 10.5 11.0 11.5
##  [24] 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0 20.5 21.0 21.5 22.0 22.5 23.0
##  [47] 23.5 24.0 24.5 25.0 25.5 26.0 26.5 27.0 27.5 28.0 28.5 29.0 29.5 30.0 30.5 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5
##  [70] 35.0 35.5 36.0 36.5 37.0 37.5 38.0 38.5 39.0 39.5 40.0 40.5 41.0 41.5 42.0 42.5 43.0 43.5 44.0 44.5 45.0 45.5 46.0
##  [93] 46.5 47.0 47.5 48.0 48.5 49.0 49.5  0.0
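Here is a minimal sketch of one approach. It indexes the first and last positions with `c(1, length(numbers))` and then relies on R's vectorized arithmetic to divide all elements in one step.

```r
# Create the vector 1, 2, ..., 100
numbers <- 1:100
# Replace the first and the last element with 0
numbers[c(1, length(numbers))] <- 0
# Divide every element by 2 (vectorized, no loop needed)
numbers <- numbers / 2
numbers
```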

Graded Task 1.3

Writing the corresponding R code, create a vector called numbers2 consisting of all numbers from 1 to 100. Next, replace every third value in numbers2 with a 2.

The final vector should look like this:

numbers2

##   [1]   1   2   2   4   5   2   7   8   2  10  11   2  13  14   2  16  17   2  19  20   2  22  23   2  25  26   2  28
##  [29]  29   2  31  32   2  34  35   2  37  38   2  40  41   2  43  44   2  46  47   2  49  50   2  52  53   2  55  56
##  [57]   2  58  59   2  61  62   2  64  65   2  67  68   2  70  71   2  73  74   2  76  77   2  79  80   2  82  83   2
##  [85]  85  86   2  88  89   2  91  92   2  94  95   2  97  98   2 100
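The sketch below assumes that "every third value" means positions 3, 6, 9, ..., 99, which matches the expected output. It generates those positions with `seq()` and assigns to them in one step.

```r
numbers2 <- 1:100
# Positions 3, 6, 9, ..., 99
every_third <- seq(from = 3, to = length(numbers2), by = 3)
# Overwrite those positions with 2
numbers2[every_third] <- 2
numbers2
```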

Graded Task 1.4

For the next tasks, we’ll again use the data set “data_tutorial6.txt” (via OLAT/Materials/Data for R). The data set was already introduced and explained in Tutorial 6: Control structures & functions in R, so you should already be familiar with it.

The data set is completely made up: a fictional survey of 1,000 citizens in Europe.

The data file “data_tutorial6.txt” is structured as follows:

Each row contains the answer for a single citizen.
Each column contains all values given by citizens for a single variable.

The five variables included here are:

country: the country in which each citizen was living at the time of the survey (France/Germany/Italy/Switzerland)
date: the date on which each citizen was surveyed (from 2021-09-20 to 2021-10-03)
gender: each citizen’s gender (female/male/NA)
trust_politics: how much each citizen trusts the political system (from 1 = no trust at all to 4 = a lot of trust)
trust_news_media: how much each citizen trusts the news media (from 1 = no trust at all to 4 = a lot of trust)

Read in the data set:

data <- read.csv2("data_tutorial6.txt")

This is what the data looks like in R:

head(data)

##       country       date gender trust_politics trust_news_media
## 1     Germany 2021-09-20 female              3                1
## 2 Switzerland 2021-10-02   male              2                1
## 3      France 2021-09-21   <NA>              1                3
## 4       Italy 2021-10-03   male              2                2
## 5     Germany 2021-09-21 female              3                1
## 6 Switzerland 2021-09-20   male              1                2

Writing the corresponding R code, create a new data frame called data_new consisting only of those observations where respondents were interviewed in October (not September) and where respondents indicated that their gender was either male or female (i.e., gender is not NA).

Moreover, the data frame data_new should only include the variables “date”, “gender”, “trust_politics”, and “trust_news_media”.

After doing so, the data frame data_new should look like this:

head(data_new)

##         date gender trust_politics trust_news_media
## 1 2021-10-02   male              2                1
## 2 2021-10-03   male              2                2
## 3 2021-10-02   male              4                4
## 4 2021-10-02 female              1                4
## 5 2021-10-03   male              4                1
## 6 2021-10-02 female              4                3

str(data_new)

## 'data.frame':    253 obs. of  4 variables:
##  $ date            : chr  "2021-10-02" "2021-10-03" "2021-10-02" "2021-10-02" ...
##  $ gender          : chr  "male" "male" "male" "female" ...
##  $ trust_politics  : int  2 2 4 1 4 4 3 2 3 4 ...
##  $ trust_news_media: int  1 2 4 4 1 3 3 4 1 4 ...
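The full file is not reproduced here, so the sketch below uses the six rows printed by `head(data)` above as a stand-in for `data`. With the real file, skip the stand-in and use the `read.csv2()` call instead. This is one possible base R approach. It extracts the month with `format(as.Date(...), "%m")`. It also resets the row names with `rownames(data_new) <- NULL` so that rows are numbered from 1, as in the expected output; base R subsetting otherwise keeps the original row numbers.

```r
# Stand-in for the full data set: the six rows printed by head(data) above.
# With the real file, use data <- read.csv2("data_tutorial6.txt") instead.
data <- data.frame(
  country          = c("Germany", "Switzerland", "France", "Italy", "Germany", "Switzerland"),
  date             = c("2021-09-20", "2021-10-02", "2021-09-21", "2021-10-03", "2021-09-21", "2021-09-20"),
  gender           = c("female", "male", NA, "male", "female", "male"),
  trust_politics   = c(3L, 2L, 1L, 2L, 3L, 1L),
  trust_news_media = c(1L, 1L, 3L, 2L, 1L, 2L)
)

# TRUE for rows surveyed in October (month "10")
in_october <- format(as.Date(data$date), "%m") == "10"
# TRUE for rows where gender is not missing
has_gender <- !is.na(data$gender)

# Keep the matching rows and only the four requested columns
data_new <- data[in_october & has_gender,
                 c("date", "gender", "trust_politics", "trust_news_media")]
# Renumber rows 1, 2, ... (base R subsetting otherwise keeps the original row names)
rownames(data_new) <- NULL
data_new
```

On the six stand-in rows, this keeps only the two October rows, which are the first two rows of `head(data_new)` above. On the full data set, the `str()` output above reports 253 observations.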

Graded Task 1.5

Writing the corresponding R code, generate the following descriptive statistics on the full data set data (i.e., including all N = 1,000 observations) for the variable gender:

the absolute number of female respondents, the absolute number of male respondents, and the absolute number of respondents with a missing value for gender.
the percentage of female respondents and the percentage of male respondents, excluding NAs. Thus, the percentages of female and male respondents should sum to 100%.
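Here is a minimal base R sketch. It uses the `gender` values from the six rows shown by `head(data)` in Task 1.4 as a stand-in for the full data set. `table()` with `useNA = "always"` adds a separate count for missing values. `table()` drops NAs by default, so `prop.table()` on the default table gives female and male percentages that sum to 100%.

```r
# Stand-in: gender values of the six rows shown by head(data) in Task 1.4
data <- data.frame(gender = c("female", "male", NA, "male", "female", "male"))

# Absolute counts, including a separate count for missing values
counts <- table(data$gender, useNA = "always")
counts

# Percentages excluding NAs (table() ignores NA by default), rounded to 2 decimals
percentages <- round(prop.table(table(data$gender)) * 100, 2)
percentages
```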

Graded Task 1.6

Writing the corresponding R code, write your own custom function with the name stats_helper.

The function should take a single argument called x and perform the following task:

Given a vector x with numeric data, the function should paste the following sentence: “This variable has a mean of M = XY with a standard deviation of SD = XY. In total, XY out of N = XY observations are missing.”

Important: The values XY should be replaced with the mean, standard deviation, number of missing values, and total number of observations of x (rounded to two decimals). That is, the function should calculate these values on its own for any x it is given.

Below, you see what the function should return when tested on two variables from the data frame data:

# When testing the function
stats_helper(x = data$trust_politics)

## [1] "This variable has a mean of M = 2.54 with a standard deviation of SD = 1.1. In total, 0 out of N = 1000 observations are missing."

stats_helper(x = data$trust_news_media)

## [1] "This variable has a mean of M = 2.46 with a standard deviation of SD = 1.12. In total, 0 out of N = 1000 observations are missing."
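Below is one possible implementation; it is not necessarily the intended solution. It counts missing values with `is.na()` and passes `na.rm = TRUE` so that `mean()` and `sd()` still work when x contains NAs. It then builds the sentence with `paste0()`. N is assumed to be `length(x)`, i.e., all observations including missing ones, which is how the expected output reads. The example call uses the `trust_politics` values of the six rows shown by `head(data)` in Task 1.4.

```r
stats_helper <- function(x) {
  n_total   <- length(x)          # all observations, including NAs
  n_missing <- sum(is.na(x))      # number of missing values
  m         <- round(mean(x, na.rm = TRUE), 2)
  s         <- round(sd(x, na.rm = TRUE), 2)

  paste0("This variable has a mean of M = ", m,
         " with a standard deviation of SD = ", s,
         ". In total, ", n_missing, " out of N = ", n_total,
         " observations are missing.")
}

# Example with the trust_politics values of the six rows shown by head(data)
stats_helper(x = c(3, 2, 1, 2, 3, 1))
# [1] "This variable has a mean of M = 2 with a standard deviation of SD = 0.89. In total, 0 out of N = 6 observations are missing."
```

`round()` returns a number, so trailing zeros are dropped when it is pasted into the sentence (e.g., 1.10 becomes 1.1). This matches the expected output above.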