Chapter 4 Simulation-Based Hypothesis Tests

In a simulation-based hypothesis test, we test the null hypothesis of no relationship between one or more explanatory variables and a response variable.

4.1 Performing the Simulation-Based Test for Regression Coefficient

To test for a relationship between a response variable and a single quantitative explanatory variable, or a categorical variable with only two categories, we perform the following steps.

  1. Fit the model to the actual data and record the value of the regression coefficient \(b_1\), which describes the slope of the regression line (for a quantitative variable), or the difference between groups (for a categorical variable with 2 categories).
  2. Repeat the following steps many (say 10,000) times, using a "for" loop:

    • Randomly shuffle the values (or categories) of the explanatory variable to create a scenario where there is no systematic relationship between the explanatory and response variable.

    • Fit the model to the shuffled data, and record the value of \(b_1\).

  3. Observe how many of our simulations resulted in values of \(b_1\) as extreme as the one from the actual data. If this proportion is small, we have evidence that our result did not occur just by chance. If this value is large, it is plausible that the result we saw occurred by chance alone, and thus there is not enough evidence to say there is a relationship between the explanatory and response variables.

These steps are performed in the code below, using the example of price and acceleration time of 2015 cars.

4.1.1 Simulation-Based Hypothesis Test Example

Cars_M_A060  <- lm(data=Cars2015, LowPrice~Acc060)
b1 <- Cars_M_A060$coef[2] # record value of b1 from actual data
# perform simulation
b1Sim <- rep(NA, 10000)          # vector to hold results
ShuffledCars <- Cars2015    # create copy of dataset
for (i in 1:10000){
  #randomly shuffle acceleration times
ShuffledCars$Acc060 <- ShuffledCars$Acc060[sample(1:nrow(ShuffledCars))] 
ShuffledCars_M<- lm(data=ShuffledCars, LowPrice ~ Acc060)   #fit model to shuffled data
b1Sim[i] <- ShuffledCars_M$coef[2]  # record b1 from shuffled model
}
Cars_A060SimulationResults <- data.frame(b1Sim)  #save results in dataframe

Now that we've performed the simulation, we'll display a histogram of the sampling distribution for \(b_1\) when the null hypothesis of no relationship between price and acceleration time is true.

b1 <- Cars_M_A060$coef[2] # record value of b1 from actual data
Cars_A060SimulationResultsPlot <- ggplot(data=Cars_A060SimulationResults, aes(x=b1Sim)) + 
  geom_histogram(fill="lightblue", color="white") + 
  geom_vline(xintercept=c(b1, -1*b1), color="red") + 
  xlab("Simulated Value of b1") + ylab("Frequency") + 
  ggtitle("Distribution of b1 under assumption of no relationship")
Cars_A060SimulationResultsPlot

We can calculate the p-value by finding the proportion of simulations with \(b_1\) values more extreme than the observed value of \(b_1\), in absolute value.

mean(abs(b1Sim) > abs(b1))
## [1] 0

4.2 Simulation-Based F-Test

When testing for a relationship between the response variable and a categorical variable with more than 2 categories, we first use the F-statistic to capture the maginitude of differences between groups. The process is similar to the one above, with the exception that we recored the value of F, rather than \(b_1\).

4.2.1 Simulation-Based F-Test Example

Cars_M_Size <- lm(data=Cars2015, LowPrice~Size)
Fstat <- summary(Cars_M_Size)$fstatistic[1] # record value of F-statistic from actual data
# perform simulation
FSim <- rep(NA, 10000)          # vector to hold results
ShuffledCars <- Cars2015    # create copy of dataset
for (i in 1:10000){
  #randomly shuffle acceleration times
ShuffledCars$Size <- ShuffledCars$Size[sample(1:nrow(ShuffledCars))] 
ShuffledCars_M<- lm(data=ShuffledCars, LowPrice ~ Size)   #fit model to shuffled data
FSim[i] <- summary(ShuffledCars_M)$fstatistic[1]  # record F from shuffled model
}
CarSize_SimulationResults <- data.frame(FSim)  #save results in dataframe

Create the histogram of the sampling distribution for \(F\).

CarSize_SimulationResults_Plot <- ggplot(data=CarSize_SimulationResults, aes(x=FSim)) + 
  geom_histogram(fill="lightblue", color="white") +  
  geom_vline(xintercept=c(Fstat), color="red") + 
  xlab("Simulated Value of F") + ylab("Frequency") +  
  ggtitle("Distribution of F under assumption of no relationship")
CarSize_SimulationResults_Plot

Calculate the p-value:

mean(FSim > Fstat)
## [1] 1e-04

4.3 Testing for Differences Between Groups

If we find differences in the F-statistic, we should test for differences between individual groups, or categories.

In this example, those are given by \(b_1\), \(b_2\), and \(b_1-b_2\). When considering p-values, keep in mind that we are perfoming multiple tests simultaneously, so we should use a multiple-testing procedure, such as the Bonferroni correction.

4.3.1 Differences Between Groups Example

b1 <- Cars_M_Size$coefficients[2]  #record b1 from actual data
b2 <- Cars_M_Size$coefficients[3]  #record b2 from actual data
# perform simulation
b1Sim <- rep(NA, 10000)          # vector to hold results
b2Sim <- rep(NA, 10000)          # vector to hold results
ShuffledCars <- Cars2015    # create copy of dataset
for (i in 1:10000){
  #randomly shuffle acceleration times
ShuffledCars$Size <- ShuffledCars$Size[sample(1:nrow(ShuffledCars))] 
ShuffledCars_M<- lm(data=ShuffledCars, LowPrice ~ Size)   #fit model to shuffled data
b1Sim[i] <- ShuffledCars_M$coefficients[2]   # record b1 from shuffled model
b2Sim[i] <- ShuffledCars_M$coefficients[3]   # record b2 from shuffled model
}
Cars_Size2_SimulationResults <- data.frame(b1Sim, b2Sim)  #save results in dataframe

Sampling Distribution for \(b_1\)

Cars_Size2_SimulationResultsPlot_b1 <- ggplot(data=Cars_Size2_SimulationResults, aes(x=b1Sim)) + 
  geom_histogram(fill="lightblue", color="white") + 
  geom_vline(xintercept=c(b1, -1*b1), color="red") + 
  xlab("Simulated Value of b1") + ylab("Frequency") + 
  ggtitle("Large vs Midsize Cars: Distribution of b1 under assumption of no relationship")
Cars_Size2_SimulationResultsPlot_b1

p-value:

mean(abs(b1Sim)>abs(b1))
## [1] 0.0226

Sampling Distribution for \(b_2\)

Cars_Size2_SimulationResultsPlot_b2 <- ggplot(data=Cars_Size2_SimulationResults, aes(x=b2Sim)) + 
  geom_histogram(fill="lightblue", color="white") + 
  geom_vline(xintercept=c(b2, -1*b2), color="red") + 
  xlab("Simulated Value of b2") + ylab("Frequency") + 
  ggtitle("Large vs Small Cars: Distribution of b2 under assumption of no relationship")
Cars_Size2_SimulationResultsPlot_b2

p-value:

mean(abs(b2Sim)>abs(b2))
## [1] 0

Sampling Distribution for \(b_1 - b_2\)

Cars_Size2_SimulationResultsPlot_b1_b2 <- ggplot(data=Cars_Size2_SimulationResults, 
                                                 aes(x=b1Sim-b2Sim)) + 
  geom_histogram(fill="lightblue", color="white") + 
  geom_vline(xintercept=c(b1-b2, -1*(b1-b2)), color="red") + 
  xlab("Simulated Value of b1-b2") + ylab("Frequency") + 
  ggtitle("Small vs Midsize Cars: Distribution of b1-b2 under assumption of no relationship")
Cars_Size2_SimulationResultsPlot_b1_b2

p-value:

mean(abs(b1Sim-b2Sim)>abs(b1-b2))
## [1] 0.0629

4.4 "Theory-Based" Tests and Intervals in R

The quantity Pr(>|t|) in the coefficients table contains p-values pertaining to the test of the null hypothesis that that parameter is 0.

The confint() command returns confidence intervals for the regression model parameters.

The aov() command displays the F-statistic and p-value, as well as the sum of squared residuals and sum of squares explained by the model.

4.4.1 Example "Theory-Based" Test and Interval

summary(Cars_M_A060)
## 
## Call:
## lm(formula = LowPrice ~ Acc060, data = Cars2015)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.512  -6.544  -1.265   4.759  27.195 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  89.9036     5.0523   17.79   <2e-16 ***
## Acc060       -7.1933     0.6234  -11.54   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.71 on 108 degrees of freedom
## Multiple R-squared:  0.5521, Adjusted R-squared:  0.548 
## F-statistic: 133.1 on 1 and 108 DF,  p-value: < 2.2e-16
confint(Cars_M_A060, level=0.95)
##                 2.5 %    97.5 %
## (Intercept) 79.888995 99.918163
## Acc060      -8.429027 -5.957651

4.4.2 Example Theory-Based F-Test

summary(Cars_M_A060)
## 
## Call:
## lm(formula = LowPrice ~ Acc060, data = Cars2015)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.512  -6.544  -1.265   4.759  27.195 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  89.9036     5.0523   17.79   <2e-16 ***
## Acc060       -7.1933     0.6234  -11.54   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.71 on 108 degrees of freedom
## Multiple R-squared:  0.5521, Adjusted R-squared:  0.548 
## F-statistic: 133.1 on 1 and 108 DF,  p-value: < 2.2e-16
Cars_A_Size <- aov(data=Cars2015, LowPrice~Size)
summary(Cars_A_Size)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Size          2   4405  2202.7   10.14 9.27e-05 ***
## Residuals   107  23242   217.2                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(Cars_A_Size)
##                  2.5 %    97.5 %
## (Intercept)   36.88540 47.736327
## SizeMidsized -16.48366 -1.713069
## SizeSmall    -22.55789 -8.759618