YaRrr! The Pirate’s Guide to R

18.6 Chapter 13: Hypothesis tests

Do male pirates have significantly longer beards than female pirates? Test this by conducting the appropriate test on the relevant data in the pirates dataset.

beard.sex.htest <- t.test(formula = beard.length ~ sex,
                           subset = sex %in% c("male", "female"),
                           data = pirates)

beard.sex.htest
## 
##  Welch Two Sample t-test
## 
## data:  beard.length by sex
## t = -70, df = 500, p-value <2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -20 -18
## sample estimates:
## mean in group female   mean in group male 
##                  0.4                 19.4
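The report printed above comes from an object of class htest, which is a list. Its fields, such as statistic, parameter and p.value, keep the full-precision values even when the printed report rounds them. The sketch below uses simulated data rather than the pirates dataset. brief.report() is a hypothetical helper written only for illustration. It is not the apa() function from the yarrr package and does not reproduce its exact output.

```r
# Simulated (hypothetical) data, not the pirates dataset
set.seed(3)
x <- rnorm(30, mean = 1)
y <- rnorm(30, mean = 0)
result <- t.test(x, y)

class(result)       # "htest"
result$statistic    # named numeric: the t value
result$parameter    # named numeric: the degrees of freedom
result$p.value      # the p-value at full precision

# Hypothetical helper, NOT yarrr's apa(): formats a few htest fields
brief.report <- function(h) {
  p.text <- if (h$p.value < 0.01) "p < 0.01" else sprintf("p = %.2f", h$p.value)
  sprintf("%s(%.2f) = %.2f, %s",
          names(h$statistic), h$parameter, h$statistic, p.text)
}

brief.report(result)
```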

apa(beard.sex.htest)
## [1] "mean difference = 19.02, t(499.82) = -70.89, p < 0.01 (2-tailed)"

Answer: Yes, male pirates have significantly longer beards than female pirates, mean difference = 19.02, t(499.82) = -70.89, p < 0.01 (2-tailed)
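To see where the t value and the fractional degrees of freedom (such as 499.82 above) in a Welch test come from, here is a minimal sketch that computes them by hand and checks them against t.test(). The two groups are simulated (hypothetical) data, and welch.t() is an illustrative helper, not part of any package. The degrees of freedom come from the Welch-Satterthwaite approximation.

```r
# Simulated (hypothetical) data standing in for two groups
set.seed(100)
group.a <- rnorm(n = 50, mean = 10, sd = 2)
group.b <- rnorm(n = 60, mean = 12, sd = 4)

# Hypothetical helper: Welch two-sample t-test computed by hand
welch.t <- function(x, y) {
  se2.x <- var(x) / length(x)  # squared standard error of mean(x)
  se2.y <- var(y) / length(y)  # squared standard error of mean(y)
  t.value <- (mean(x) - mean(y)) / sqrt(se2.x + se2.y)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df.value <- (se2.x + se2.y)^2 /
    (se2.x^2 / (length(x) - 1) + se2.y^2 / (length(y) - 1))
  p.value <- 2 * pt(-abs(t.value), df = df.value)  # two-tailed p-value
  list(t = t.value, df = df.value, p = p.value)
}

manual <- welch.t(group.a, group.b)
builtin <- t.test(group.a, group.b)  # Welch is the default (var.equal = FALSE)

c(manual = manual$t, t.test = unname(builtin$statistic))
```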

Are pirates whose favorite Pixar movie is Up more or less likely to wear an eye patch than those whose favorite Pixar movie is Inside Out? Test this by conducting the appropriate test on the relevant data in the pirates dataset.

df <- subset(pirates, fav.pixar %in% c("Up", "Inside Out"))
pixar.ep.table <- table(df$fav.pixar, df$eyepatch)

pixar.ep.htest <- chisq.test(pixar.ep.table)
pixar.ep.htest
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  pixar.ep.table
## X-squared = 90, df = 1, p-value <2e-16

apa(pixar.ep.htest)
## [1] "X(1, N = 422) = 88.96, p < 0.01 (2-tailed)"

Answer: Yes, pirates whose favorite Pixar movie is Inside Out are much more likely to wear an eye patch than those whose favorite Pixar movie is Up, X(1, N = 422) = 88.96, p < 0.01 (2-tailed)
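The chi-square test says that favorite movie and eye patch wearing are related, but not in which direction. Row proportions from prop.table() show the direction. The sketch below uses a small hypothetical 2 x 2 table (not the pirates data), recomputes the Yates-corrected statistic that chisq.test() reports for 2 x 2 tables, and prints the share of each row that does or does not wear an eye patch.

```r
# Hypothetical 2 x 2 table of counts (not the pirates data)
tab <- matrix(c(30, 70,
                60, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              eyepatch = c("no", "yes")))

# Expected counts under independence: row total * column total / N
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Yates' continuity correction for 2 x 2 tables: subtract 0.5
# (or the smallest |O - E|, if that is below 0.5) from each |O - E|
yates <- min(0.5, abs(tab - expected))
x2.manual <- sum((abs(tab - expected) - yates)^2 / expected)

builtin <- chisq.test(tab)

# Row proportions reveal the direction of the association
row.props <- prop.table(tab, margin = 1)
row.props
```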

Do longer movies have significantly higher budgets than shorter movies? Answer this question by conducting the appropriate test in the movies dataset.

budget.time.htest <- cor.test(formula = ~ budget + time,
                              data = movies)

budget.time.htest
## 
##  Pearson's product-moment correlation
## 
## data:  budget and time
## t = 10, df = 2000, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.24 0.32
## sample estimates:
##  cor 
## 0.28

apa(budget.time.htest)
## [1] "r = 0.28, t(2313) = 14.09, p < 0.01 (2-tailed)"

Answer: Yes, longer movies tend to have higher budgets than shorter movies, r = 0.28, t(2313) = 14.09, p < 0.01 (2-tailed)
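For a Pearson correlation, cor.test() converts r into a t statistic with n - 2 degrees of freedom, t = r * sqrt(n - 2) / sqrt(1 - r^2). By that rule, t(2313) above corresponds to 2,315 movies with both a budget and a time value. A minimal sketch on simulated (hypothetical) data rather than the movies dataset:

```r
# Simulated (hypothetical) data with a modest positive correlation
set.seed(1)
x <- rnorm(100)
y <- 0.3 * x + rnorm(100)

r <- cor(x, y)
n <- length(x)
t.stat <- r * sqrt(n - 2) / sqrt(1 - r^2)  # t statistic with n - 2 df
p.two <- 2 * pt(-abs(t.stat), df = n - 2)  # two-tailed p-value

builtin <- cor.test(x, y)  # Pearson correlation by default

c(manual = t.stat, cor.test = unname(builtin$statistic))
```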

Do R-rated movies earn significantly more money than PG-13 movies? Test this by conducting the appropriate test on the relevant data in the movies dataset.

revenue.rating.htest <- t.test(formula = revenue.all ~ rating,
                               subset = rating %in% c("R", "PG-13"),
                               data = movies)

revenue.rating.htest
## 
##  Welch Two Sample t-test
## 
## data:  revenue.all by rating
## t = 10, df = 2000, p-value <2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  56 82
## sample estimates:
## mean in group PG-13     mean in group R 
##                 148                  80

apa(revenue.rating.htest)
## [1] "mean difference = -68.86, t(1779.2) = 10.67, p < 0.01 (2-tailed)"

Answer: No, R-rated movies do not earn significantly more than PG-13 movies. In fact, PG-13 movies earn significantly more than R-rated movies, mean difference = -68.86, t(1779.2) = 10.67, p < 0.01 (2-tailed)
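The question asks about a direction (do R-rated movies earn more?), which a one-tailed test addresses through the alternative argument of t.test(). With the formula interface, the groups are ordered by factor level, so the estimated difference is mean(PG-13) minus mean(R), as the order of the sample estimates above suggests. A sketch on simulated revenue figures (hypothetical, not the movies data):

```r
# Simulated (hypothetical) revenue figures for two ratings
set.seed(2)
films <- data.frame(
  rating  = rep(c("PG-13", "R"), each = 40),
  revenue = c(rnorm(40, mean = 150, sd = 60),
              rnorm(40, mean = 80, sd = 60))
)

# Groups are ordered by factor level ("PG-13" before "R"), so the estimated
# difference is mean(PG-13) - mean(R). "R earns more" is therefore the
# alternative that this difference is less than 0.
one.sided <- t.test(revenue ~ rating, data = films, alternative = "less")
two.sided <- t.test(revenue ~ rating, data = films)

names(two.sided$estimate)
one.sided$p.value
```

A large one-tailed p-value here only says the data give no support to "R earns more"; showing that PG-13 earns more is what the two-tailed result addresses.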

Are certain movie genres significantly more common than others in the movies dataset?

genre.table <- table(movies$genre)
genre.htest <- chisq.test(genre.table)

genre.htest
## 
##  Chi-squared test for given probabilities
## 
## data:  genre.table
## X-squared = 6000, df = 10, p-value <2e-16

apa(genre.htest)
## [1] "X(13, N = 4682) = 6408.91, p < 0.01 (2-tailed)"

Answer: Yes, some movie genres are more common than others, X(13, N = 4682) = 6408.91, p < 0.01 (2-tailed)
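When chisq.test() is given a single table of counts, it tests whether all k categories are equally likely: each expected count is N / k and df = k - 1. The genre-by-sequel table later in this section lists 14 genres, giving df = 13, which matches the X(13, ...) reported by apa(). The printed report above appears to round to one significant digit (6408.91 is shown as 6000), which would explain why df shows as 10. A sketch with three hypothetical category counts:

```r
# Hypothetical counts for three categories
counts <- c(a = 50, b = 30, c = 20)

expected <- rep(sum(counts) / length(counts), length(counts))  # N / k each
x2.manual <- sum((counts - expected)^2 / expected)
df.gof <- length(counts) - 1
p.manual <- pchisq(x2.manual, df = df.gof, lower.tail = FALSE)

builtin <- chisq.test(counts)  # equal probabilities by default

x2.manual  # 14
```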

Do sequels and non-sequels differ in their genres?

genre.sequel.table <- table(movies$genre, movies$sequel)

genre.sequel.htest <- chisq.test(genre.sequel.table)
## Warning in chisq.test(genre.sequel.table): Chi-squared approximation may be
## incorrect

apa(genre.sequel.htest)
## [1] "X(13, N = 4669) = 387.17, p < 0.01 (2-tailed)"

Answer: Yes, sequels are more likely in some genres than others, X(13, N = 4669) = 387.17, p < 0.01 (2-tailed)
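The “Chi-squared approximation may be incorrect” warning is driven by expected counts rather than observed ones: chisq.test() warns when any expected count falls below 5. The expected counts are stored in the $expected field of the result, so you can check which cells are responsible. A sketch with a small hypothetical table (not the movies data):

```r
# Hypothetical sparse table (not the movies data)
tab <- matrix(c(40, 10,
                35, 12,
                 3,  0,
                 2,  1),
              ncol = 2, byrow = TRUE,
              dimnames = list(genre = c("A", "B", "C", "D"),
                              sequel = c("0", "1")))

# Expected counts under independence (the warning itself is muffled here)
expected <- suppressWarnings(chisq.test(tab))$expected
round(expected, 2)

# Row/column positions of cells whose expected count is below 5
small.cells <- which(expected < 5, arr.ind = TRUE)
small.cells
```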

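Note: The “Chi-squared approximation may be incorrect” message we get from this code is a warning, not an error. It appears because some cells in the table have very few (or no) entries, which leaves some expected counts below 5 and can make the test statistic unreliable. One crude way to get rid of the warning is to add a constant, here 20, to every element in the table as follows (keep in mind that this changes the data being tested):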

genre.sequel.table <- table(movies$genre, movies$sequel)

# Add 20 to each cell so that no cell is empty (this changes the data being tested)
genre.sequel.table <- genre.sequel.table + 20

# Here is the result
genre.sequel.table
##                      
##                          0    1
##   Action               550  178
##   Adventure            384  141
##   Black Comedy          54   20
##   Comedy              1078  172
##   Concert/Performance   34   20
##   Documentary           83   20
##   Drama               1077   46
##   Horror               235  105
##   Multiple Genres       21   20
##   Musical               92   25
##   Reality               22   20
##   Romantic Comedy      265   23
##   Thriller/Suspense    425   41
##   Western               57   21

# Run a chi-square test on the table
genre.sequel.htest <- chisq.test(genre.sequel.table)

# Print the result
genre.sequel.htest
## 
##  Pearson's Chi-squared test
## 
## data:  genre.sequel.table
## X-squared = 400, df = 10, p-value <2e-16
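Adding a constant to every cell removes the warning, but it also changes the counts being tested, so the statistic changes too. An alternative that leaves the data untouched is a Monte Carlo p-value via chisq.test(..., simulate.p.value = TRUE), which does not rely on the chi-square approximation. The sketch below uses a hypothetical sparse table (not the movies data) and B = 2000 simulated tables, which is the function's default.

```r
# Hypothetical sparse table (not the movies data) with two empty cells
tab <- matrix(c(40, 10,
                35, 12,
                 3,  0,
                 2,  0),
              ncol = 2, byrow = TRUE,
              dimnames = list(genre = c("A", "B", "C", "D"),
                              sequel = c("0", "1")))

# Statistic on the raw counts (this call would warn, so the warning is muffled)
raw.stat <- suppressWarnings(chisq.test(tab))$statistic
# Statistic after adding 20 to every cell: a different number
padded.stat <- chisq.test(tab + 20)$statistic

# Monte Carlo p-value on the raw counts; depends on the random draws
set.seed(42)
sim.test <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
sim.test$p.value

c(raw = unname(raw.stat), padded = unname(padded.stat))
```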