Chapter 32 Reflect, Review, and Relax
Motivating scenarios: We are taking stock of where we are in the term, thinking about stats and science, and making sure we understand the material to date.
Required reading / Viewing:
The Science of Doubt. link. By Michael Whitlock.
32.1 Review / Setup
So much of statistics aims to learn the TRUTH.
We focus so much on our data: how to measure uncertainty around estimates, and how (in)compatible the data are with a null model. We will review and solidify these ideas, but we must also
recognize that much beyond sampling error can mislead us.
32.2 How science goes wrong
Watch the video below. When you do, consider these types of errors that accompany science. You should be able to think about these and ask good questions about them.
- Fraud.
- Wrong models.
- Experimental design error.
- Communication error.
- Statistician error.
- HARKing (Hypothesizing After the Results are Known).
- Coding error.
- Technical error.
- Publication bias.
You should have something to say about
- The “replication crisis,” and
- Whether (and why) preregistration of studies is a good idea.
A brief word on publication bias: Scientists are overworked and have too much to do. They get more rewards for publishing statistically significant results, so those are usually higher on the to-do list. This results in the "file drawer effect," in which non-significant results are less likely to be submitted for publication.
I simulate this below, and then have a web app (basically this code dressed up in sliders) for you to use to explore this.
# Set it up
library(tidyverse)  # tibble(), uncount(), mutate(), summarise(), ggplot(), etc.
library(plotly)     # ggplotly() for the interactive plot

sample_sizes      <- c(2, 4, 6, 8, 12, 16, 24, 32, 48, 64, 96, 128, 192, 250, 384, 500, 768, 1000)
replicates        <- 10000                              # experiments per sample size
total_experiments <- length(sample_sizes) * replicates
exp_id            <- factor(1:total_experiments)        # unique id for each experiment
mu                <- .2                                 # true mean: a small, real effect
sigma             <- 1                                  # true standard deviation
# Simulate
sim_dat <- tibble(exp_id      = factor(1:total_experiments),
                  sample_size = rep(sample_sizes, each = replicates)) %>%
  uncount(weights = sample_size, .remove = FALSE) %>%
  mutate(sim_val = rnorm(n = n(), mean = mu, sd = sigma))
# Summarize and hypothesis test: a one-sample t-test of H0: mu = 0 for each experiment
sim_summary <- sim_dat %>%
  group_by(exp_id) %>%
  summarise(n        = n(),
            mean_val = mean(sim_val),
            se       = sd(sim_val) / sqrt(n),
            t        = mean_val / se,
            p_val    = 2 * pt(q = abs(t), df = n - 1, lower.tail = FALSE),
            reject   = p_val < 0.05) %>%
  group_by(n) %>%
  mutate(power = mean(reject))  # proportion of experiments rejecting H0 at each sample size
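Even before plotting, the bias shows up in simple summaries. Here is a minimal sketch (my addition; it assumes the sim_summary tibble built above, and bias_check is just an illustrative name) comparing the mean estimate across all experiments to the mean among only the "significant" ones at each sample size:
# For each sample size, compare the mean estimate across ALL experiments
# to the mean estimate among only the "significant" ones.
# With a true mean of 0.2, the significant subset overshoots when power is low.
bias_check <- sim_summary %>%
  group_by(n) %>%
  summarise(power            = mean(reject),
            mean_all         = mean(mean_val),
            mean_significant = mean(mean_val[reject]))
bias_check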
# Plot
sim_plot <- ggplot(sim_summary, aes(x = power, y = mean_val, label = n)) +
  stat_summary(aes(color = reject),
               geom = "text", size = 3,
               position = position_nudge(y = .02, x = -.015),
               show.legend = FALSE) +
  stat_summary(aes(color = reject), geom = "point",
               show.legend = FALSE) +
  stat_summary(aes(color = reject), geom = "line") +
  stat_summary(geom = "line", color = "black") +
  annotate(x = .5, y = mu + .02, geom = "text", label = "mean of all sims", size = 2) +
  theme_light() +
  labs(title = "Significant results are biased. Numbers show n")
ggplotly(sim_plot)
Interact with the app below (basically this code, with widgets) to see how this biases our estimates.
I find this stuff fascinating. If you want more, here are some good resources.
Videos from Calling Bullshit (largely redundant with the video above): 7.2 Science is amazing, but…, 7.3 Reproducibility, 7.4 A Replication Crisis, 7.5 Publication Bias, and 7.6 Science is not Bullshit.
The replication crisis
- Estimating the reproducibility of psychological science (Open Science Collaboration 2015) link,
- A glass half full interpretation of the replicability of psychological science (Leek, Patil, and Peng 2015) link,
- The Persistence of Underpowered Studies in Psychological Research: Causes, Consequences, and Remedies (Maxwell 2004) link.
P-hacking: The Extent and Consequences of P-Hacking in Science (Head 2015) link.
The garden of forking paths: Why multiple comparisons can be a problem, even when there is no "fishing expedition" or "p-hacking" and the research hypothesis was posited ahead of time (Gelman and Loken 2013) link.
32.3 Review
You should be pretty comfortable with the ideas of
- Parameters vs Estimates
- Sampling and what can go wrong
- Null hypothesis significance testing
- Common test statistics
- F
- t
- Calculating Sums of Squares
- Interpreting stats output like that below
ToothGrowth <- mutate(ToothGrowth, dose = factor(dose))
tooth_lm    <- lm(len ~ supp * dose, data = ToothGrowth)

summary(tooth_lm)
##
## Call:
## lm(formula = len ~ supp * dose, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.20 -2.72 -0.27 2.65 8.27
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.230 1.148 11.521 3.60e-16 ***
## suppVC -5.250 1.624 -3.233 0.00209 **
## dose1 9.470 1.624 5.831 3.18e-07 ***
## dose2 12.830 1.624 7.900 1.43e-10 ***
## suppVC:dose1 -0.680 2.297 -0.296 0.76831
## suppVC:dose2 5.330 2.297 2.321 0.02411 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.631 on 54 degrees of freedom
## Multiple R-squared: 0.7937, Adjusted R-squared: 0.7746
## F-statistic: 41.56 on 5 and 54 DF, p-value: < 2.2e-16
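A useful check on your read of the coefficient table: with R's default treatment contrasts, the intercept is the mean of the reference group (OJ at dose 0.5), and other cell means are sums of coefficients. A quick sanity check (my addition, using only base R and the coefficients printed above):
# Mean for VC at dose 2 = intercept + suppVC + dose2 + suppVC:dose2
13.230 + (-5.250) + 12.830 + 5.330                          # 26.14
# ...which matches the observed cell mean (exactly, since the model is saturated):
with(ToothGrowth, mean(len[supp == "VC" & dose == "2"]))    # 26.14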
anova(tooth_lm)
## Analysis of Variance Table
##
## Response: len
## Df Sum Sq Mean Sq F value Pr(>F)
## supp 1 205.35 205.35 15.572 0.0002312 ***
## dose 2 2426.43 1213.22 92.000 < 2.2e-16 ***
## supp:dose 2 108.32 54.16 4.107 0.0218603 *
## Residuals 54 712.11 13.19
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
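Note that anova() reports sequential (Type I) sums of squares; because this design is balanced, they happen to match the Type II sums of squares from car::Anova() below.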
library(car)  # Anova() comes from the car package
Anova(tooth_lm, type = "II")
## Anova Table (Type II tests)
##
## Response: len
## Sum Sq Df F value Pr(>F)
## supp 205.35 1 15.572 0.0002312 ***
## dose 2426.43 2 92.000 < 2.2e-16 ***
## supp:dose 108.32 2 4.107 0.0218603 *
## Residuals 712.11 54
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
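To tie this output back to the "Calculating Sums of Squares" item above, here is a minimal sketch (my addition) recovering the residual and total sums of squares, and hence the proportion of variance explained, from the fitted model:
# Residual sum of squares: squared deviations of observations from fitted values
ss_residual <- sum(residuals(tooth_lm)^2)                        # 712.11, as in the table
# Total sum of squares: squared deviations of observations from the grand mean
ss_total    <- sum((ToothGrowth$len - mean(ToothGrowth$len))^2)  # 3452.21 = 205.35 + 2426.43 + 108.32 + 712.11
# Proportion of variance explained; matches Multiple R-squared from summary()
(ss_total - ss_residual) / ss_total                              # 0.7937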
32.4 Quiz
Reflection questions on [canvas]