Section 6 March 23rd, 2023
6.1 Welcome!
As a reminder:
CPR = Copy, paste, and run
The group comes together for dedicated time to work in R.
Everybody can work at their own pace or in groups, etc.
Senior members help support junior members
When in doubt: Google
- Being able to identify a solution online is a unique skill.
Have fun
This week we will use data about horror movies from TidyTuesday.
You can CPR the following to get your data, which will be called horror_movies:
horror_movies <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-01/horror_movies.csv')
The following packages will likely be helpful for you today:
6.2 Some basic stuff
Get the descriptive statistics for movie revenue and runtime: mean, mode, median, sd, min, and max.
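R has no built-in function for the statistical mode, so that part usually needs a small helper. Here is a minimal sketch in base R on a toy data frame. The column names revenue and runtime are assumed to match horror_movies, and stat_mode and describe are hypothetical helper names made up for this example.

```r
# Toy stand-in for horror_movies; the column names are assumptions.
toy_movies <- data.frame(
  revenue = c(0, 0, 1500000, 42000000, 1500000),
  runtime = c(90, 95, 90, 102, 88)
)

# Hypothetical helper: most frequent value (first one found if there is a tie).
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  unique_values <- unique(x)
  unique_values[which.max(tabulate(match(x, unique_values)))]
}

# Hypothetical helper: collect all six statistics for one numeric column.
describe <- function(x) {
  c(
    mean   = mean(x, na.rm = TRUE),
    mode   = stat_mode(x),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE)
  )
}

# One column of results per variable, one row per statistic.
descriptives <- sapply(toy_movies[c("revenue", "runtime")], describe)
print(descriptives)
```

To try it on the real data, swap toy_movies for horror_movies, assuming the column names line up.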
What movie has the longest runtime?
Among movies with at least 1,000 votes, which one has the highest average rating?
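The filter-then-rank pattern for this kind of question can be sketched in base R as follows. The data here are made up, and the column names title, vote_count, and vote_average are assumed to match horror_movies.

```r
# Toy stand-in for horror_movies; titles and numbers are invented.
toy_movies <- data.frame(
  title        = c("Film A", "Film B", "Film C", "Film D"),
  vote_count   = c(2500, 12, 1000, 4800),
  vote_average = c(7.1, 9.8, 8.3, 6.4)
)

# Keep only movies with at least 1,000 votes before ranking.
well_voted <- toy_movies[toy_movies$vote_count >= 1000, ]

# which.max() returns the first row holding the highest rating.
top_rated <- well_voted[which.max(well_voted$vote_average), ]
print(top_rated)
```

Note that "Film B" has the highest rating overall but is dropped by the vote filter. which.min() works the same way for the lowest value.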
Plot a histogram of movie ratings. Exclude movies with no votes.
What movie with at least 1,000 votes got the lowest rating?
Who starred in that movie?
Plot a scatterplot of budget and vote average for movies with at least 5,000 votes that were originally released in English.
Think there’s a correlation between budget and vote average? Run it (only on movies that meet the stipulations in the scatterplot question above).
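One way to run the correlation on a filtered subset is cor.test(), sketched here in base R on toy data. The column names budget, vote_average, vote_count, and original_language, and the language code "en", are assumptions about how horror_movies is laid out.

```r
# Toy stand-in for horror_movies; all values are invented.
toy_movies <- data.frame(
  budget            = c(1e6, 5e6, 2e7, 4e7, 9e7, 3e6),
  vote_average      = c(5.9, 6.4, 6.8, 7.3, 7.0, 8.1),
  vote_count        = c(5200, 6100, 8000, 12000, 7000, 300),
  original_language = c("en", "en", "en", "en", "fr", "en")
)

# Apply both stipulations: at least 5,000 votes and originally in English.
eligible <- subset(toy_movies, vote_count >= 5000 & original_language == "en")

# cor.test() defaults to Pearson's r and also reports a p-value.
budget_vs_rating <- cor.test(eligible$budget, eligible$vote_average)
print(budget_vs_rating)
```

Filtering first and storing the result in eligible means the plot and the correlation can share exactly the same rows.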
Consider only Nightmare on Elm Street movies (collection == 8581). Plot the release dates and revenue of those movies.
Do the same thing for the Saw movies, but plot them on the same graph.
Run a t-test on the two collections to determine whether their revenue differs by a statistically significant amount. Use Welch’s t-test.
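A Welch's t-test in base R is t.test() with var.equal = FALSE, which is also its default. The sketch below uses toy revenues and placeholder collection ids 1 and 2, because the handout does not give the Saw collection id. The column names collection and revenue are assumed to match horror_movies.

```r
# Toy data; collection ids 1 and 2 are placeholders, not real ids.
toy_movies <- data.frame(
  collection = c(1, 1, 1, 1, 2, 2, 2, 2),
  revenue    = c(25, 30, 45, 22, 100, 150, 80, 110) * 1e6
)

# var.equal = FALSE requests Welch's unequal-variance t-test.
welch <- t.test(revenue ~ collection, data = toy_movies, var.equal = FALSE)
print(welch)
```

On the real data, you would first filter horror_movies down to the two collections of interest, since the formula interface expects exactly two groups.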