2. Statistical Power and Sample Size (Week 3)

Effect Size for T-Test

T-Test Results
M (no voucher) M (voucher) statistic p.value conf.low conf.high std.error
21.13 26.029 -2.911 0.004 -8.205 -1.593 1.682719
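For reference, here is a minimal sketch of the kind of call that could produce results like those above. The data frame and variable names (nysp, post_ach, voucher) are hypothetical stand-ins, and the data are simulated (the SD of about 19 comes from sqrt(MS error) = sqrt(363.75) in the ANOVA table further below):

```r
# Simulated stand-in for the NYSP data (hypothetical names and values)
set.seed(1)
nysp <- data.frame(
  voucher  = factor(rep(c("no voucher", "voucher"), times = c(230, 291))),
  post_ach = c(rnorm(230, mean = 21.13, sd = 19.1),
               rnorm(291, mean = 26.03, sd = 19.1))
)

# Welch two-sample t-test of post-test achievement by voucher status
t.test(post_ach ~ voucher, data = nysp)
```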

Cohen’s d effect size for t-test

Using the effectsize package:
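A hedged sketch of the call (reusing the simulated nysp data frame from above; with the real data it would reproduce the output below):

```r
library(effectsize)

# Cohen's d for post-test achievement by voucher status
cohens_d(post_ach ~ voucher, data = nysp)
```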

Effect size for t-test
Cohen's d 95% CI
-0.257 [-0.43, -0.083]

Interpretation

Students who were offered a voucher scored about 0.26 standard deviations higher than students who were not offered a voucher.

Calculating Power

Power calculations use the pwr package and follow the examples at https://www.statmethods.net/stats/power.html (Kabacoff, 2017).

New York Scholarship Program (NYSP) Power Analysis

T-Tests

The following calculates power for the NYSP t-test example (Strategy 1, Table 4.1, pg. 49).
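A sketch of the pwr call, using the two-group function for unequal sample sizes (all values taken from the table below):

```r
library(pwr)

# Post-hoc power: n1 = 230 (no voucher), n2 = 291 (voucher), |d| = 0.257
pwr.t2n.test(n1 = 230, n2 = 291, d = 0.257, sig.level = 0.05)
```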

Power of the NYSP Voucher T-Test
n1 n2 d sig power
230 291 0.257 0.05 0.8283176

Interpretation

This is a post-hoc power analysis. The study above had a power of .83. That is, it had an 83% chance of detecting an effect if one existed, and a 17% chance of making a Type II error (failing to reject the null hypothesis when there is a true effect).

Simple Linear Regression

Recall Strategy 2 (Table 4.1, pg. 49):

Simple Linear Regression
Predictor b b_95%_CI beta beta_95%_CI sr2 sr2_95%_CI r Fit
(Intercept) 21.13** [18.66, 23.60]
voucher 4.90** [1.59, 8.20] 0.13 [0.04, 0.21] .02 [.00, .04] .13**
R2 = .016**
95% CI[.00,.04]
ANOVA Table for Simple Linear Regression
Predictor SS df MS F p partial_eta2 CI_90_partial_eta2
(Intercept) 102693.91 1 102693.91 282.32 .000
voucher 3082.89 1 3082.89 8.48 .004 .02 [.00, .04]
Error 188787.59 519 363.75

Power for NYSP Simple Linear Regression

Use pwr.f2.test(u = , v = , f2 = , sig.level = , power = ), where:

  • u = numerator df, i.e., the number of predictors (counting each dummy variable separately)
  • v = denominator df, i.e., the residual df
  • f2 = Cohen's $f^2$, which is equal to $f^2 = \frac{R^2}{1 - R^2}$ (see the sketch after this list)
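A minimal sketch for the simple regression, converting the model R2 to $f^2$ (values from the regression table above):

```r
library(pwr)

# f2 from the model R2 (R2 = .016)
r2 <- 0.016
f2 <- r2 / (1 - r2)   # ~0.0163

# u = 1 predictor, v = 519 residual df (from the ANOVA table)
pwr.f2.test(u = 1, v = 519, f2 = f2, sig.level = 0.05)
```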

Based on the regression results, the NYSP simple linear regression model had the following power:

Power of the NYSP Voucher Simple Linear Regression Test
Predictors Residual df f2 sig power
1 519 0.01626016 0.05 0.8277319

Interpretation

Because no covariates were included, the power here is essentially the same as for the t-test above.

Power for NYSP Multiple Regression

(Strategy 3, Table 4.1, pg. 49)

Predictor b b_95%_CI beta beta_95%_CI sr2 sr2_95%_CI r Fit
(Intercept) 7.72** [5.43, 10.00]
voucher 4.10** [1.61, 6.59] 0.11 [0.04, 0.17] .01 [-.00, .02] .13**
pre_ach 0.69** [0.62, 0.76] 0.65 [0.59, 0.72] .43 [.36, .49] .66**
R2 = .442**
95% CI[.38,.49]
Predictor SS df MS F p partial_eta2 CI_90_partial_eta2
(Intercept) 9100.16 1 9100.16 44.05 .000
voucher 2154.80 1 2154.80 10.43 .001 .02 [.00, .04]
pre_ach 81780.28 1 81780.28 395.88 .000 .43 [.38, .48]
Error 107007.31 518 206.58

Based on the regression results, the NYSP multiple regression model had the following power:
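A sketch of the corresponding call, assuming the full-model R2 of .442 is converted to $f^2$:

```r
library(pwr)

# f2 from the full-model R2 (.442)
f2 <- 0.442 / (1 - 0.442)   # ~0.792

# u = 2 predictors (voucher, pre_ach), v = 518 residual df
pwr.f2.test(u = 2, v = 518, f2 = f2, sig.level = 0.05)
```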

Power of the NYSP Voucher Multiple Regression Test
Predictors Residual df f2 sig power
2 518 0.7921147 0.05 1

Interpretation

The post-hoc power analysis indicated that, given the large sample size and the large R2 (and hence large f2), the model had a power of 1, or approximately a 100% chance of detecting an effect if there was one.

Effect Size Calculator

Here is a quick interactive calculator I made. It’s very basic.
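In case the embedded widget does not render, here is a plain-R sketch of the same idea: Cohen's d from summary statistics using the pooled SD. The function name and the assumed common SD of 19.1 are my own stand-ins:

```r
# Cohen's d from group means, SDs, and ns (pooled-SD formula)
cohens_d_calc <- function(m1, m2, sd1, sd2, n1, n2) {
  sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  (m1 - m2) / sp
}

# Roughly reproduces d = -0.257 for the NYSP example
cohens_d_calc(m1 = 21.13, m2 = 26.03, sd1 = 19.1, sd2 = 19.1,
              n1 = 230, n2 = 291)
```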

Accuracy in Parameter Estimation (AIPE)

AIPE is another method for estimating the required sample size. Rather than targeting power, it is based on specifying how narrow you want the confidence interval around an effect size of interest to be. Here is an example based on the NYSP multiple regression using the MBESS package:
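A sketch of the MBESS call that could produce the output below; the width of .20 (i.e., a 95% CI of [.3, .5] around R2 = .442) and p = 2 predictors are my reading of the example:

```r
library(MBESS)

# Sample size so the 95% CI for R2 (= .442) has a total width of .20,
# with p = 2 predictors
ss.aipe.R2(Population.R2 = 0.442, conf.level = 0.95, width = 0.20, p = 2)
```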

## [1] "The approximate sample size is given below; you should consider using the additional"
## [1] "argument 'verify.ss=TRUE' to ensure the exact sample size value is obtained."
## $Required.Sample.Size
## [1] 661

To find an R2 of .442 with a 95% CI of [.3, .5], you would need the sample size indicated above (661). The actual sample size that produced the R2 of .442 was 520. The estimate was not exact, but it was reasonably close.


References

Kabacoff, R. I. (2017). Power analysis. Quick-R. https://www.statmethods.net/stats/power.html

Murnane, R. J., & Willett, J. B. (2010). Methods matter: Improving causal inference in educational and social science research. Oxford University Press.