Chapter 7 Simulationbased Inference
The type of inference you will perform depends on the type of data you are analyzing.
response variable (def) is the variable we think is being impacted or changed by the explanatory variable.
explanatory variable (def) is the variable we think is “explaining” the change in the response variable.
Here are the 7 general steps to performing an inference procedure using simulation techniques:
Step 








7.1 OneProportion Inference
Step  Example 


Less than 50% of New York City Voters will support some policy 

1000 voters in NYC 

Let pi be the proportion of NYC voters who support the policy. H0: pi = 0.50; Ha: pi < 0.50 

phat = 460/1000, or 46% 

Simulate a oneproportion inference n = 1000, observed = 460 

Onesided pvalue = 0.002 

Since the pvalue is so small, we reject the null hypothesis and we have evidence to believe that less than 50% of NYC voters support the policy 
7.1.1 When do we use oneproportion inference?
When we want to test the claim that a population proportion is not equal to some fixed percentage. For example, the proportion of women within some church congregation is greater than 55%.
\(H_0: \pi = 0.55\)
\(H_a: \pi > 0.55\)
Test statistic: we use the observed proportion from the sample. In this case, we identify a sample of church members and count how many women there are. Let’s say there are 62 out of 100. We say \(\hat{p} = 0.62\)
Let’s develop an algorithm using simulation to replicate the OneProportion applet found at link.
We will test the above claim that the proportion of women in the church congregation is greater than 55% by simulating a large number of trials of flipping n coins:
< 0.55 # probability of success for each toss
pi < 100 # Number of times we toss the penny (sample size)
n < 1000 # Number of trials (number of samples)
trials
< 62 # Observed number of heads
observed
= observed / n # phat  the observed proportion of heads
phat
< do(trials) * rflip(n, prob = pi )
data.sim
head(data.sim)
## n heads tails prop
## 1 100 62 38 0.62
## 2 100 53 47 0.53
## 3 100 50 50 0.50
## 4 100 44 56 0.44
## 5 100 54 46 0.54
## 6 100 50 50 0.50
If we consider the above distribution to be our null distribution, we can now consider the likelihood that we would observe a value “as extreme as” our test statistic  62 heads in this case.
histogram(~prop, data = data.sim,
v = phat,
width = 0.025,
xlab = "Proportion of women",
main = "Null hypothesis distribution",
groups = prop >= phat)
To compute a pvalue, we simply need to count the number of times, in our null distribution, we observed a value as extreme (or more so) as the test statistic, phat.
if(observed > pi * n) {
< sum(data.sim$prop >= phat) / trials
pvalue else {
} < sum(data.sim$prop <= phat) / trials
pvalue
}
paste("Onesided pvalue is", pvalue)
## [1] "Onesided pvalue is 0.095"
paste("Twosided pvalue is", 2 * pvalue)
## [1] "Twosided pvalue is 0.19"
Conclusion: if the pvalue is less than the level of significance, we REJECT H0 and conclude that there is evidence to support the alternative. In this case, we have a rather large pvalue (0.095), so we FAIL TO REJECT H0. There is not evidence to support the above claim.
7.1.2 OneProportion Inference Code
library(mosaic)
< 0.55 # probability of success for each toss
pi < 100 # Number of times we toss the penny (sample size)
n < 1000 # Number of trials (number of samples)
trials
< 62 # Observed number of heads
observed
= observed / n # phat  the observed proportion of heads
phat
< do(trials) * rflip(n, prob = pi )
data.sim
histogram(~prop, data = data.sim,
v = phat,
width = 0.025,
xlab = "Proportion",
main = "Null hypothesis distribution",
groups = prop >= phat)
if(observed > pi * n) {
< sum(data.sim$prop >= phat) / trials
pvalue else {
} < sum(data.sim$prop <= phat) / trials
pvalue
}
paste("Onesided pvalue is", pvalue)
paste("Twosided pvalue is", 2 * pvalue)
7.2 OneMean Inference
Here is the procedure for using a Confidence Interval to analyze a single mean.
Step  Example 


According to a 2000 report, the average number of flipflops owned by a typical college student in the US is 3 

A bunch of students in a data science class 

Let mu be the average number of flipflops owned by a college student. H0: mu = 3; Ha: mu > 3 

xbar (sample mean) = 5.26 

Simulate a onemean inference, 95% confidence interval is (4.3, 6.32) 

Since our sample mean falls within the interval, it is a plausible value for mu 

We fail to reject the null hypothesis and do not have evidence to believe that the number of flipflops owned by a typical college student is greater than 3. 
Step  Example 


According to a 2000 report, the average number of flipflops owned by a typical college student in the US is 3 

A bunch of students in a data science class 

Let mu be the average number of flipflops owned by a college student. H0: mu = 3; Ha: mu > 3 

xbar (sample mean) = 5.26 

Simulate a onemean inference 

pvalue = 0.0165 

We reject the null hypothesis and have evidence to believe that the number of flipflops owned by a typical college student is greater than 3. 
7.2.1 When do we use onemean inference?
When we want to test the claim that a population mean is not equal to some fixed value. For example, the average milespergallon of a car is equal to 22.
\(H_0: \mu = 22\) \(H_a: \mu \ne 22\)
Test statistic: we use the observed mean from the sample. In this case, we will compute the mean of the mtcars
data \(\bar{x} = 20.09\)
= 22
mu
favstats(~mpg, data=mtcars)
## min Q1 median Q3 max mean sd n missing
## 10.4 15.425 19.2 22.8 33.9 20.09062 6.026948 32 0
< mean(~mpg, data=mtcars, na.rm=T)
observed paste("Observed value for sample mean: ", observed)
## [1] "Observed value for sample mean: 20.090625"
Let’s develop an algorithm using simulation to compute a confidence interval for a single mean, using simulation. we will use the resample()
function to create a null distribution from the sample.
< 1000
trials < do(trials) * mosaic::mean(~mpg, data=resample(mtcars))
samples
# Let's compute a 95% Confidence Interval
< quantile(samples$mean,c(0.025,0.975))) (ci
## 2.5% 97.5%
## 18.07781 22.13758
Interpretation: We are 95% confident that the population mean is within the interval (17.45, 22.99). Since the test statistic of 22 is within this interval, we FAIL TO REJECT the claim that the average miles per gallon of all cars is not equal to 22 mpg.
And we can look at the distribution of sample means for our resampled data.
histogram(~mean, data=samples,
main = "Distribution of sample means",
v = c(ci[1], ci[2])
)
Notice that the value of 22 falls just within the confidence interval.
Let’s compute the pvalue.
if(observed < mu) {
< sum(samples$mean >= mu) / trials
pvalue else {
} < sum(samples$mean <= mu) / trials
pvalue
}
paste("Onesided pvalue is", pvalue)
## [1] "Onesided pvalue is 0.035"
paste("Twosided pvalue is", 2 * pvalue)
## [1] "Twosided pvalue is 0.07"
Since this is a twosided test of inference (mpg not equal to 22), we fail to reject the null hypothesis. We do not have evidence to believe that the mpg of all cars is different from 22 milepergallon.
7.2.2 Onemean Inference Code
library(mosaic)
# Read your data here
= 22
mu
favstats(~mpg, data=mtcars)
< mean(~mpg, data=mtcars, na.rm=T)
observed paste("Observed value for sample mean: ", observed)
< 1000
trials < do(trials) * mosaic::mean(~mpg, data=resample(mtcars))
samples
# Let's compute a 95% Confidence Interval
< quantile(samples$mean,c(0.025,0.975)))
(ci
histogram(~mean, data=samples,
main = "Distribution of sample means",
v = c(ci[1], ci[2])
)
if(observed < mu) {
< sum(samples$mean >= mu) / trials
pvalue else {
} < sum(samples$mean <= mu) / trials
pvalue
}
paste("Onesided pvalue is", pvalue)
paste("Twosided pvalue is", 2 * pvalue)
7.3 Twoproportion inference
Step  Example 


The death rate on Nurse Gilbert’s shifts is higher than the base death rate 

all shifts in the hospital 

Let pi1 be the proportion of patients who die on Nurse Gilbert shifts; pi2 will be the proportion of patients who die on shifts when Nurse Gilbert does not work. H0: pi1 = pi2; Ha: pi1 > pi2 

phat1 = 0.156; phat2 = 0.025 

Simulate a twoproportion inference n = 1000 

Onesided pvalue = 0 

Since the pvalue is so small, we reject the null hypothesis and we have evidence to believe that the death rate on Nurse Gilbert shifts is higher than the death rate on shifts when Nurse Gilbert does not work. 
7.3.1 When do we use twoproportion inference?
When we want to test the claim that two population proportions are significantly different. For example, the proportion of women who are in favor of some policy vs. the proportion of men who are in favor of the policy.
Let \(\pi_1\) be the proportion of women who favor the policy
Let \(\pi_2\) be the proportion of men who favor the policy
\(H_0: \pi_1 = \pi_2\) \(H_a: \pi_1 \ne \pi_2\)
Test statistic: we use the observed proportions from the sample. In this case, we identify a sample of church members, determine their gender, and ask them if they support the policy. Let’s say that out of 100 women, 62 support the policy, giving us \(\hat{p}_1 = 0.62\), and out of 100 men, 51 support the policy, giving us \(\hat{p}_2 = 0.51\).
Let’s develop an algorithm using simulation to replicate the TwoProportion applet found at link.
First, we need to create a dataframe representing the individual subjects in the study. We know the summary information that 51% of the men and 62% of the women support the policy. We will need to create, in effect 200 rows representing each of the subjects, as follows:
# Step 1: Create a dataframe representing the two variables
# NOTE: Values should be in alphabetical order
# df < data.table(Group = c(rep("men", 100),
# rep("women", 100)),
# Support = c(rep("no",38),
# rep("yes", 62),
# rep("no",49),
# rep("yes",51)))
< rbind(
df do(38) * data.frame(Group = "Men", Support = "no"),
do(62) * data.frame(Group = "Men", Support = "yes"),
do(49) * data.frame(Group = "Women", Support ="no"),
do(51) * data.frame(Group = "Women", Support = "yes")
)
# Compute my summary AND print the results
< tally(Support ~ Group, data=df)) (df.summary
## Group
## Support Men Women
## no 38 49
## yes 62 51
tally(Support~Group, data=df, format="proportion")
## Group
## Support Men Women
## no 0.38 0.49
## yes 0.62 0.51
62% of the men support the policy and 51% of the women support the policy.
Let’s compute the difference in the two sample proportions, and view a barplot of the data:
# Step 2: Compute the observed difference in proportions
< diffprop(Support ~ Group, data = df)
observed paste("Observed difference in proportions: " , round(observed,3))
## [1] "Observed difference in proportions: 0.11"
barplot(prop.table(df.summary, margin = 2),
las = 1,
main = "Proportion that Support",
col = c("green", "purple"),
legend.text = c("no", "yes"))
Let’s shuffle the treatment labels and look at the null distribution.
# STep 3: Shuffle the treatment labels
< 1000
trials
< do(trials) * diffprop(Support ~ shuffle(Group), data=df)
null.dist
histogram( ~ diffprop, data= null.dist,
xlab = "Differences in proportions",
main = "Null distribution for differences in proportions",
v= observed)
Lastly, we’ll compute the pvalue  that is, the probability of observing a test statistic as extreme (or more so) as 0.11, given that the null hypothesis is true.
if (observed > 0 ) {
< prop(~diffprop >= observed, data= null.dist)
p.value else {
} < prop(~diffprop <= observed, data= null.dist)
p.value
}
paste(" Onesided pvalue: ", round(p.value,3))
## [1] " Onesided pvalue: 0.094"
paste(" Twosided pvalue: ", 2 * round(p.value,3))
## [1] " Twosided pvalue: 0.188"
7.3.2 TwoProportion Inference Code  Method I (rbind):
library(mosaic)
library(data.table)
# Step 1: Create a dataframe representing the two variables
# NOTE: Values should be in alphabetical order
< rbind(
df do(38) * data.frame(Group = "Men", Support = "no"),
do(62) * data.frame(Group = "Men", Support = "yes"),
do(49) * data.frame(Group = "Women", Support ="no"),
do(51) * data.frame(Group = "Women", Support = "yes")
)
# Compute my summary AND print the results
< tally(Support ~ Group, data=df))
(df.summary
tally(Support~Group, data=df, format="proportion")
# Step 2: Compute the observed difference in proportions
< diffprop(Support ~ Group, data = df)
observed paste("Observed difference in proportions: " , round(observed,3))
barplot(prop.table(df.summary, margin = 2),
las = 1,
main = "Proportion that Support",
col = c("green", "purple"),
legend.text = c("no", "yes"))
# STep 3: Shuffle the treatment labels
< 1000
trials
< do(trials) * diffprop(Support ~ shuffle(Group), data=df)
null.dist
histogram( ~ diffprop, data= null.dist,
xlab = "Differences in proportions",
main = "Null distribution for differences in proportions",
groups= diffprop >= observed,
v= observed)
if (observed > 0 ) {
< prop(~diffprop >= observed, data= null.dist)
p.value else {
} < prop(~diffprop <= observed, data= null.dist)
p.value
}
paste("Onesided pvalue is", pvalue)
paste("Twosided pvalue is", 2 * pvalue)
7.3.3 TwoProportion Inference Code  Method II (matrix):
library(mosaic)
library(data.table)
rm(list=ls())
# Step 1: Create a dataframe representing the two variables
< matrix(c(15,3,4,8), nrow = 2)
df rownames(df) < c("Positive", "Negative")
colnames(df) < c("Good", "Bad")
# Step 2: Compute our differences
< sum(df[,1])
total_col1 < df[1,1]/total_col1
prop_col1 < sum(df[,2])
total_col2 < df[1,2]/total_col2
prop_col2
< prop_col1  prop_col2
observed paste("Proportion  column 1: " , round(prop_col1,3))
paste("Proportion  column 2: " , round(prop_col2,3))
paste("Observed difference in proportions: " , round(observed,3))
paste("Relative Risk: " , round(prop_col1/prop_col2,3))
barplot(prop.table(df,margin = 2),
las = 1,
main = "xxxxxxx",
ylab = "Observed proportion",
col = c("green", "purple"),
legend.text = rownames(df))
# STep 3: Shuffle the treatment labels and compute a pvalue
< 1000
trials < do(trials) * (mean(sample(0:1, total_col1, replace = T)) 
null.dist mean(sample(0:1, total_col2, replace=T)))
histogram( ~ result, data= null.dist,
xlab = "Differences in proportions",
main = "Null distribution for differences in proportions",
v= observed)
if (observed > 0) {
< prop(~result >= observed, data= null.dist)
p.value else {
} < prop(~result <= observed, data= null.dist)
p.value
}
paste(" Onesided pvalue: ", round(p.value,3))
paste(" Twosided pvalue: ", 2 * round(p.value,3))
7.4 Twomean inference
Step  Example 


Intereruption times by Old Faithful changed between 1978 and 2003 

the 203 observations made in which we measured the intereruption times 

Let mu1 be the intereruption time in 1978; mu2 is the intereruption time in 2003. H0: mu1 = mu2; Ha: mu1 not = mu2 

xbar1 = 71.07 minutes; xbar2 = 91.19 minutes 

Simulate a twomean inference assuming the times are equal 

Onesided pvalue = 0 

Since the pvalue is so small, we reject the null hypothesis and we have evidence to believe that the intereruption times are different in 2003 from 1978. 
When do we use twomean inference?
When we want to test the claim that two population means are significantly different. For example, the average amount of mercury found in two types of tuna fish  albacore and yellowfin
Let \(\mu_1\) be the average amount of mercury in albacore Let \(\mu_2\) be the average amount of mercury in yellowfin
\(H_0: \mu_1 = \mu_2\) \(H_a: \mu_1 \ne \mu_2\)
Test statistic: we use the observed sample means. We will have to compute these from the data.
Let’s read in the data and compute the sample means
read.delim("http://citadel.sjfc.edu/faculty/ageraci/data/tuna.txt") > df
str(df)
## 'data.frame': 274 obs. of 2 variables:
## $ Tuna : chr "albacore " "albacore " "albacore " "albacore " ...
## $ Mercury: num 0 0.41 0.82 0.32 0.036 0.28 0.29 0.34 0.36 0.42 ...
options(digits = 5)
favstats(~Mercury  Tuna, data=df)
## Tuna min Q1 median Q3 max mean sd n missing
## 1 albacore 0 0.300 0.3600 0.3965 0.820 0.35763 0.13799 43 0
## 2 yellowfin 0 0.184 0.3109 0.4625 1.478 0.35444 0.23100 231 0
Now we will compute the observed difference in the means:
# Compute the observed difference in the means between the two groups
< diffmean(~Mercury  Tuna, data=df, na.rm = T)
observed paste("Observed difference in the means: ", round(observed, 3 ))
## [1] "Observed difference in the means: 0.003"
Next we will ‘shuffle()’ the Tuna variable and compare the means:
< 1000
trials < "less"
moreorless
# Compare the difference in means when we shuffle the Tuna varaible  do this lots of times
< do(trials) * diffmean(Mercury ~ shuffle(Tuna), data=df, na.rm = T)
null.dist
# Graph the histogram with the observed difference of proportions
histogram( ~ diffmean, data=null.dist,
main = "Sampling distribution for null hypothesis",
xlab = "Difference in means",
v = observed, na.rm = T)
… and compute the pvalue.
# Let's compute the pvalue
if (moreorless == "more") {
< prop(~ diffmean >= observed, data=null.dist)
pvalue else {
} < prop(~ diffmean <= observed, data=null.dist)
pvalue
}paste("The onesided pvalue is ", round(pvalue,3))
## [1] "The onesided pvalue is 0.432"
paste("The twosided pvalue is ", 2 * round(pvalue,3))
## [1] "The twosided pvalue is 0.864"
7.4.1 TwoMean Inference Code:
library(mosaic)
library(data.table)
< read.delim("http://citadel.sjfc.edu/faculty/ageraci/data/tuna.txt")
df
favstats(~Mercury  Tuna, data=df)
# Compute the observed difference in the means between the two groups
< diffmean(~Mercury  Tuna, data=df, na.rm = T)
observed paste("Observed difference in the means: ", round(observed, 3 ))
< 1000
trials
# Compare the difference in means when we shuffle the Tuna varaible  do this lots of times
< do(trials) * diffmean(Mercury ~ shuffle(Tuna), data=df, na.rm = T)
null.dist
# Graph the histogram with the observed difference of proportions
histogram( ~ diffmean, data=null.dist,
main = "Sampling distribution for null hypothesis",
xlab = "Difference in means",
groups = diffmean >= observed,
v = observed, na.rm = T)
# Let's compute the pvalue
if (observed > 0) {
< prop(~ diffmean >= observed, data=null.dist)
pvalue else {
} < prop(~ diffmean <= observed, data=null.dist)
pvalue
}paste("The onesided pvalue is ", round(pvalue,3))
paste("The twosided pvalue is ", 2 * round(pvalue,3))