Part B

In this part, select five countries of your choice from your data set. Sweden must be one of them; China, Germany, Italy and the United States of America must not be included. You can use the example code below, but replace the placeholder code with your chosen countries.


\newpage

# --- PART B ---

```{r}

# Select Countries
Country1<-"Sweden"
Country2<-"Austria"
Country3<-"Estonia"
Country4<-"Ireland"
Country5<-"Spain"

# Print the selection, one country per line
cat(Country1, Country2, Country3, Country4, Country5, sep="\n")

```
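
The selection rules above can be checked in code before any statistics are computed. The following is a minimal sketch only: `check_selection` is a hypothetical helper that is not part of the template, and the spelling of the excluded names (for example "United States") is an assumption that should be matched against the `location` values in your data set.

```r
# Hypothetical helper (not part of the assignment template).
# The excluded names are assumptions about how the data set spells them.
check_selection <- function(countries,
                            required = "Sweden",
                            excluded = c("China", "Germany", "Italy",
                                         "United States")) {
  c(five_countries    = length(countries) == 5,
    no_duplicates     = !anyDuplicated(countries),
    includes_required = all(required %in% countries),
    none_excluded     = !any(countries %in% excluded))
}

selection <- c("Sweden", "Austria", "Estonia", "Ireland", "Spain")
checks <- check_selection(selection)
print(checks)            # one TRUE/FALSE per rule
stopifnot(all(checks))   # stops with an error if any rule is violated
```

Returning one named flag per rule, rather than a single TRUE/FALSE, makes it easier to see which rule a selection breaks.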

Comment briefly on why you chose these 5 countries.

For more background on the statistics and graphics below, refer to the appendix.

B1 Descriptive Statistics

Calculate the number of non-missing observations (dates), the minimum, maximum, median, quartiles, mean and standard deviation of daily new COVID-19 cases over the whole time period for each of your five countries. Report the results in one table with a column for each country. You can use the code below, but change it so that the correct variable, New Cases, is used.


## Descriptive Statistics

```{r}

# Pick the column index of the variable you want to examine
# (placeholder value: change it so that it points at New Cases)
column<-8

# Filter data
data1<-data[data$location==Country1, column]
data2<-data[data$location==Country2, column]
data3<-data[data$location==Country3, column]
data4<-data[data$location==Country4, column]
data5<-data[data$location==Country5, column]

# Create functions
functions<-function(x){
    list("N"            = sum(!is.na(x)),
         "Min"          = min(x, na.rm = TRUE),
         "Max"          = max(x, na.rm = TRUE),
         "1st Quartile" = format(round(as.vector(quantile(x, probs = 0.25, na.rm = TRUE)),2),nsmall=2),
         "Median"       = format(round(median(x, na.rm = TRUE),2),nsmall=2),
         "3rd Quartile" = format(round(as.vector(quantile(x, probs = 0.75, na.rm = TRUE)),2),nsmall=2),
         "Mean"         = format(round(mean(x, na.rm = TRUE),2),nsmall=2),
         "Stdev"        = format(round(sd  (x, na.rm = TRUE),2),nsmall=2))}

# Apply functions to data
column1<-sapply(data1, functions)
column2<-sapply(data2, functions)
column3<-sapply(data3, functions)
column4<-sapply(data4, functions)
column5<-sapply(data5, functions)

# Create table
table<-cbind(column1,column2,column3,column4,column5)

# Format table
colnames(table)<-c(Country1,Country2,Country3,Country4,Country5)

# Print table
kable(table,
      format="latex",
      caption="Daily New COVID-19 Cases",
      align=rep('r',5),
      booktabs=TRUE)%>%
kable_styling(latex_options = 
                c("striped", "hold_position"))

```
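
To see what each statistic in the table measures, it can help to compute them by hand on a very small vector. The sketch below uses made-up values (not taken from the data set) that include one missing value. It relies on R's default quantile method (type 7), which is also what the code above uses.

```r
# Made-up values for illustration only; not taken from the data set
x <- c(2, 4, NA, 6, 8)

# Quartiles and median in one call; missing values are dropped
q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)

stats <- c(N      = sum(!is.na(x)),        # number of non-missing values
           Min    = min(x, na.rm = TRUE),
           Q1     = q[["25%"]],
           Median = q[["50%"]],
           Q3     = q[["75%"]],
           Max    = max(x, na.rm = TRUE),
           Mean   = mean(x, na.rm = TRUE),
           Stdev  = sd(x, na.rm = TRUE))   # sample standard deviation (n - 1)

print(round(stats, 2))
# N = 4, Min = 2, Q1 = 3.5, Median = 5, Q3 = 6.5, Max = 8, Mean = 5, Stdev = 2.58
```

Without `na.rm = TRUE`, every statistic except `N` would be returned as `NA` for this vector, which is why the template passes it in each call.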

Briefly compare the spread of COVID-19 in the five countries.

B2 Histograms

Draw histograms for the number of daily new COVID-19 cases for the whole time period, one plot for each of your five countries. You can use the code below.


\newpage

## Histograms

```{r, out.width=c('50%', '50%'), fig.show='hold'}

# Filter data
data1<-data[data$location==Country1,]
data2<-data[data$location==Country2,]
data3<-data[data$location==Country3,]
data4<-data[data$location==Country4,]
data5<-data[data$location==Country5,]

# Specify plot margins
margins<-par(mar=c(10,5,2,2))

barplot(height=data1$new_cases, 
        main=Country1,
        names= data1$date,
        las=2)

barplot(height=data2$new_cases, 
        main=Country2,
        names= data2$date,
        las=2)

barplot(height=data3$new_cases, 
        main=Country3,
        names= data3$date,
        las=2)

barplot(height=data4$new_cases, 
        main=Country4,
        names= data4$date,
        las=2)

barplot(height=data5$new_cases, 
        main=Country5,
        names= data5$date,
        las=2)


# Remove margins from workspace
rm(margins)
```
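
Note that `barplot()` in the block above draws one bar per date, so each plot shows how new cases develop over time. A histogram in the strict sense instead groups the daily values into bins and counts how many days fall into each bin. The following sketch shows that binning step with base R's `hist()` on made-up values; the bin width of 10 is an arbitrary choice for the example.

```r
# Made-up daily counts for illustration only; not taken from the data set
new_cases <- c(0, 0, 1, 2, 2, 3, 5, 8, 13, 40)

# plot = FALSE returns the bins and counts without drawing anything.
# Bins are [0,10], (10,20], (20,30], (30,40]
h <- hist(new_cases, breaks = seq(0, 40, by = 10), plot = FALSE)
print(h$counts)   # 8 1 0 1

# To draw the histogram instead, drop plot = FALSE, for example:
# hist(new_cases, breaks = seq(0, 40, by = 10),
#      main = "Example", xlab = "Daily new cases")
```

Both views are useful: the bar plot over dates shows when the waves occurred, while the binned histogram shows how skewed the distribution of daily values is.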

Compare briefly.

B3 Box Plots

Draw box plots of the number of daily new COVID-19 cases over the whole time period, one for each of your five countries, all in the same diagram (a single scale/axis for all box plots). Produce two diagrams: one with a regular scale and one with a logarithmic scale. Use the code below and switch to the logarithmic scale by removing the appropriate # in the code.


\newpage

## Box Plots

```{r, out.width=c('50%', '50%'), fig.show='hold'}

# Filter data
data<-data[ data$location==Country1|
            data$location==Country2|
            data$location==Country3|
            data$location==Country4|
            data$location==Country5,]

# Plot margins
m<-par(mar=c(5,10,2,2))

# Make a boxplot of daily cases

boxplot(data$new_cases~data$location,
        horizontal=TRUE,
        las=1,
        main="Daily COVID-19 Cases",
        xlab="",
        ylab="")


# Make a boxplot of daily cases with log scale

# A log axis cannot show zero or negative values, so raise every
# value below 0.1 to 0.1 first (missing values stay missing)
data$new_cases <- pmax(data$new_cases, 0.1)

boxplot(data$new_cases~data$location,
        horizontal=TRUE,
        #log = "x",
        las=1,
        main="Daily COVID-19 Cases log scale",
        xlab="",
        ylab="")

# Remove margins from workspace
rm(m)
```
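
The block above raises every value of new cases to at least 0.1 before drawing the log-scale plot. The reason is that the logarithm of zero is minus infinity, so days with zero cases cannot be placed on a logarithmic axis. A minimal sketch with made-up values shows the effect; the floor of 0.1 matches the value used above.

```r
# Made-up values for illustration only; not taken from the data set
new_cases <- c(0, 5, NA, 120)

print(log10(new_cases))      # the first element is -Inf

# Raise values below 0.1 to 0.1; pmax() leaves missing values as NA
floored <- pmax(new_cases, 0.1)
print(floored)               # 0.1 5.0 NA 120.0
print(log10(floored))        # the first element is now -1
```

Keep in mind that the floored zeros appear at 0.1 on the log-scale plot, so the left end of a box or whisker at 0.1 stands for days with no reported cases.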

Compare briefly.

B4 Time Series Plots

Draw four time series plots over the whole time period, each showing all five countries. Plot the following four variables over time: (1) daily new COVID-19 cases, (2) total cases per million capita, (3) total deaths per million capita, and (4) total vaccinations (per 100). Use the code below and adjust the relevant y variables to get the correct graphs.


\newpage

## Time Series Plots

```{r, out.width=c('50%', '50%'), fig.show='hold'}

# We add a variable to our data
data["people_fully_vaccinated_per_100"] <- round(100*data[,18]/data[,13],2)

# Specify plot margins
m<-par(mar=c(10,5,2,2))

# Change the "y=" variable in each plot below to get the relevant graphs

#Plot new cases over time 
data %>%
  ggplot(aes(x=date, y=new_cases, group=location, color=location)) +
    geom_line(na.rm=TRUE) +
      ggtitle("New Cases") + 
        xlab("Date") + ylab("Cases") +
          labs(colour = "Country")
    
#Plot total cases per million over time
data %>%
  ggplot(aes(x=date, y=new_cases, group=location, color=location)) +
    geom_line(na.rm=TRUE) + 
      ggtitle("Total Cases per million Capita") + 
        xlab("Date") + ylab("Total Cases / mCap") +
          labs(colour = "Country")

#Plot total deaths per million over time 
data %>%
  ggplot(aes(x=date, y=new_cases, group=location, color=location)) +
    geom_line(na.rm=TRUE) +
      ggtitle("Total Deaths per million Capita") + 
        xlab("Date") + ylab("Total Deaths / mCap") +
          labs(colour = "Country")

#Plot total vaccinations per 100 over time 
data %>%
  ggplot( aes(x=date, y=new_cases, group=location, color=location)) +
    geom_line(na.rm=TRUE) +
      ggtitle("Fully Vaccinated per hundred Capita since 2021") + 
        xlab("Date") + ylab("Fully Vaccinated / 100 people") +
          labs(colour = "Country") +
            scale_x_date(limits=c(as.Date("2021-01-01"),as.Date("2022-08-30")))

# Remove margins from workspace
rm(m)
```
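
The block above creates the per-100 variable from column positions 18 and 13. The same calculation can be written with column names, which keeps working if the column order changes. The sketch below uses a made-up data frame; the column names `population` and `people_fully_vaccinated` are assumptions and should be checked against `names(data)` in your own data set.

```r
# Made-up data for illustration only; the column names are assumptions
toy <- data.frame(
  location                = c("A", "A", "B", "B"),
  date                    = as.Date(c("2021-01-01", "2021-01-02",
                                      "2021-01-01", "2021-01-02")),
  population              = c(1000, 1000, 2000, 2000),
  people_fully_vaccinated = c(100, 250, NA, 500)
)

# Share of the population that is fully vaccinated, in percent
toy$people_fully_vaccinated_per_100 <-
  round(100 * toy$people_fully_vaccinated / toy$population, 2)

print(toy$people_fully_vaccinated_per_100)   # 10 25 NA 25
```

Missing vaccination counts stay missing after the division, which is why the plots call `geom_line(na.rm=TRUE)`.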

Make a brief comment about the plots. What can you say about the spread of COVID-19?

B5 Pie Charts

Draw a pie chart of the total number of COVID-19 cases in your five countries on 2022-08-15. Repeat for total deaths and for total vaccinations. Remember to insert the correct date into the code.


\newpage

## Pie Charts

```{r, out.width=c('50%', '50%'), fig.show='hold'}

# Pick a date (placeholder: replace with the date asked for above)
date1<-"2020-02-01"

# Keep only the rows for the chosen date
datax<-data[ data$date==date1,]

# Country labels and the three totals
lbls <- datax$location
cases <- datax$total_cases
deaths <- datax$total_deaths
vaccs <- datax$total_vaccinations

# Each country's share of the total, in whole percent
pctc <- round(cases/sum(cases)*100)
pctd <- round(deaths/sum(deaths)*100)
pctv <- round(vaccs/sum(vaccs)*100)

# Labels of the form "Country 25%"
lblsc <- paste0(lbls, " ", pctc, "%")
lblsd <- paste0(lbls, " ", pctd, "%")
lblsv <- paste0(lbls, " ", pctv, "%")

# Pie chart of total cases
pie(cases, labels = lblsc, main="Total Cases")

```
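
The percentage labels in the block above are each country's share of the combined total. A small sketch with made-up totals shows the arithmetic, and also why a missing value needs attention: a single `NA` makes `sum()` return `NA`, which would turn every percentage into `NA`.

```r
# Made-up totals for illustration only; not taken from the data set
totals <- c(A = 50, B = 30, C = 20)

# Share of the combined total in whole percent, then "Name 50%" style labels
pct    <- round(totals / sum(totals) * 100)
labels <- paste0(names(totals), " ", pct, "%")
print(labels)   # "A 50%" "B 30%" "C 20%"

# A missing total propagates through sum() unless na.rm = TRUE is set
with_gap <- c(50, NA, 20)
print(sum(with_gap))                 # NA
print(sum(with_gap, na.rm = TRUE))   # 70

# To draw the chart:
# pie(totals, labels = labels, main = "Example")
```

If a country has no reported value on the chosen date, it is worth checking the data for that date before drawing the chart, since `pie()` does not accept missing values.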

Compare briefly.