An introductory seminar for students of political science
Updated: Jul 17, 2024
When, where, requirements, contact, grading: See syllabus
Document for posting questions can be found here.
Content of slides is a fusion of…
Sociology of research methodology (Where did you study?)
Measure: ‘Would you say that most people can be trusted or that you can’t be too careful in dealing with people, if 0 means “Can’t be too careful” and 10 means “Most people can be trusted”?’
RQ: What is the average level of trust (Y)? How are individuals distributed? (univariate)
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
303 | 42 | 172 | 270 | 369 | 1281 | 853 | 1344 | 1295 | 353 | 356 |
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
no victim | 259 | 36 | 135 | 214 | 320 | 1142 | 782 | 1228 | 1193 | 326 | 331 |
victim | 44 | 6 | 37 | 56 | 48 | 139 | 70 | 114 | 101 | 27 | 25 |
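The group means implied by the frequency table above can be computed directly (a small Python sketch; the counts are copied from the table):

```python
# Mean trust per group, computed from the frequency table above.
values = list(range(11))  # trust scale 0-10
no_victim = [259, 36, 135, 214, 320, 1142, 782, 1228, 1193, 326, 331]
victim = [44, 6, 37, 56, 48, 139, 70, 114, 101, 27, 25]

def weighted_mean(values, counts):
    # Frequency-weighted mean: sum(value * count) / total count
    return sum(v * c for v, c in zip(values, counts)) / sum(counts)

print(round(weighted_mean(values, no_victim), 2))  # 6.2
print(round(weighted_mean(values, victim), 2))     # 5.48
```

Victims report lower average trust, which is exactly the descriptive pattern the causal questions below ask us to interpret.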
Is there a causal effect of victimization on trust in a sample of Swiss citizens?
What is the impact of negative experiences on trust?
Is there a causal effect of victimization on generalized trust in a sample of Swiss citizens aged 18 to 98 in 2010?
What is the impact of victimization on generalized trust?
Hypotheses = expectations we have for the answers to our research question (descriptive or causal)
RQ: Does smoking increase the probability of/cause cancer?
Sampling: Select subset of units from population to estimate characteristics of that population
Steps: Researcher (we)…
Q: Are the above steps necessary when we work with secondary data (e.g., ALLBUS)?
Sampling techniques (Cochran 2007)
Simple random sampling: (1) Units in the population are numbered from 1 to N; (2) Series of random numbers between 1 and N is drawn; (3) Units which bear these numbers constitute the sample (ibid, 11-12) → Each unit has same probability of being chosen
Stratified random sampling: (1) Population divided into non-overlapping, exhaustive subpopulations (strata); (2) Simple random sample is taken in each stratum (ibid, 65f)
Quota sampling: Decide about N units that are wanted from each stratum (e.g., age, gender, state) and continue sampling until the necessary “quota” has been obtained in each stratum (ibid, 105)
Snowball sampling: (1) Locate members of special population (e.g., drug addicts); (2) Ask them to name other members of population and repeat this step (Sudman and Kalton 1986, 413) → use snowballing to create sampling frame, then sample
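The first two techniques can be sketched in a few lines (a Python sketch; population size, sample sizes, and strata are made up):

```python
import random

random.seed(1)

# Simple random sampling: number units 1..N, draw n distinct random
# numbers, and let the units bearing these numbers form the sample.
# Each unit has the same probability of being chosen.
N, n = 1000, 50
population = list(range(1, N + 1))
srs = random.sample(population, n)

# Stratified random sampling: divide the population into exhaustive,
# non-overlapping strata, then take a simple random sample per stratum
# (here 10 units per hypothetical age stratum).
strata = {"18-39": list(range(1, 401)),
          "40-64": list(range(401, 801)),
          "65+":   list(range(801, 1001))}
stratified = [u for units in strata.values() for u in random.sample(units, 10)]
```

Quota and snowball sampling, by contrast, are not probability samples, so inclusion probabilities are unknown.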
“the most important thing in statistics that’s not in the textbooks” (Gelman, April, 2015)
Theories (and the hypotheses they imply) (Moore and Siegel 2013, 3–4)
A [theoretical] variable has different theoretical “levels” or “values” (Jaccard and Jacoby 2019, 13)
Empirical value of a variable for a given unit u (ui): the number assigned by some measurement process to u (Holland 1986, 954), e.g., male (0) or female (1)
Random variables: “If we have beliefs (i.e., probabilities) attached to the possible values that a variable may attain, we will call that variable a random variable.” (Pearl 2009, 8)
id | gender | age | degree | subject |
---|---|---|---|---|
John | M | 25 | bachelor | sociology |
Petra | F | 30 | master | physics |
Hans | M | 29 | master | biology |
names | id | gender | income | education | happiness | age |
---|---|---|---|---|---|---|
Hans | 1 | male | 1000 | 0 | 5 | 30 |
Peter | 2 | male | 5000 | 3 | 10 | 30 |
Julia | 3 | female | 500 | 1 | 3 | 30 |
Andrea | 4 | female | 1600 | 3 | 7 | 30 |
Feli | 5 | female | 1600 | 3 | 7 | 30 |
Columns = variables
Rows = observations (often observations = units but not always)
Q: Which type of dataset has more observations (rows) than units? (Tip: Pa…)
Q: What are the theoretical and observed (empirical) values of happiness and age?
Q: Which are constants and which are variables in the above data frame? What is the difference?
Systematic error:
Random error: In repeated measures, a scale randomly deviates from your true weight
Partly conceptual confusion around terms such as reliability, repeatability (Bartlett and Frost 2008) (Q: Validity? Reliability?)
After deciding on the research question, population, sample, concepts, and measures, we finally choose a research method and collect data
Lecture focuses on causal inference using experimental/observational data so let’s quickly reiterate what data is!
Data: Units’ observed values on different variables (observations)
Variables: Dimensions of the data space
Empirical observations are distributed across those dimensions, i.e., across (theoretical) values of those variables
Name | trust2006 | threat2006 | education2006 |
---|---|---|---|
Aseela | 4 | 0 | 8 |
Dominic | 5 | 1 | 1 |
Elshaday | 0 | 0 | 0 |
Daniel | 5 | 0 | 9 |
Sulaimaan | 7 | 0 | 4 |
Peyton | 5 | 0 | 1 |
Mudrik | 2 | 0 | 4 |
Alexander | 7 | 0 | 5 |
.. | .. | .. | .. |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
303 | 42 | 172 | 270 | 368 | 1281 | 852 | 1342 | 1294 | 353 | 356 |
0 | 1 |
---|---|
5966 | 667 |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
380 | 806 | 194 | 89 | 2182 | 324 | 687 | 474 | 195 | 425 | 877 |
Measures on several variables → multivariate joint distribution
3 variables: Victimization/Threat (0,1); Education (0-10) → Trust (0-10)
Q: How many dimensions? How many theoretical value combinations?
Joint distribution is basis for any quantitative analysis (Holland 1986, 948) of variables used in the design/analysis
Associational inference (descriptive questions, what?):
Causal inference (causal questions, why?):
Given a unit (an individual; you!) and a set of actions (take aspirin or not), we associate each action-individual pair with a potential outcome
Example (Peter has a headache): Aspirin (0 = no/1 = yes) → headache/Kopfschmerzen (0 = no/1 = yes)
Definition individual-level causal effect
Difference in potential outcomes, same individual, same moment in time post-treatment (Imbens and Rubin 2015)
Causal effect = treatment effect
\(ITE_{i} = Y_{i}(\color{red}{1}) - Y_{i}(\color{blue}{0})\) (time index \(t\) often omitted)
\(ITE_{Peter} = \text{Headache}_{Peter}(\color{red}{\text{Aspirin}}) - \text{Headache}_{Peter}(\color{blue}{\text{No aspirin}})\)
\(\quad i\quad\) | \(\quad D_i\quad\) | \(\quad Y_{i}\quad\) | \(\quad Y_{i}(\color{red}{1})\quad\) | \(Y_{i}(\color{blue}{0})\) |
---|---|---|---|---|
Peter | 1 | 0 | 0 | ? |
Missing data problem: We don’t observe the missing potential outcome
Definition of causal effect does not require more than one individual (Imbens and Rubin 2015, 8)
BUT only one potential outcome realized (and observable)
Estimation: Pursue between-individual comparisons or within-individual comparisons
Individuals = units = can be anything (school classes, firms, governments etc.)
Q: How many potential outcomes do we have for a treatment variable Aspirin (no/yes), Education (primary school/high school/university) and Motivation (lowest/low/high/highest)?
\(Unit\) | \(D_{i} (Aspirin:Yes/No)\quad\) | \(Y_{i}(1) (Headache \mid Aspirin)\quad\) | \(Y_{i}(0) (Headache \mid NoAspirin)\) |
---|---|---|---|
Simon | 1 | 0 | ? |
Julia | 1 | 1 | ? |
Paul | 0 | ? | 1 |
Trump | 0 | ? | 0 |
Fabrizio | 0 | ? | 0 |
Diego | 0 | ? | 0 |
\(Unit\) | \(D_{i} (Aspirin: Yes/No)\quad\) | \(Y_{i} (Headache: Yes/No)\quad\) |
---|---|---|
Simon | 1 | 0 |
Julia | 1 | 1 |
Paul | 0 | 1 |
Trump | 0 | 0 |
Fabrizio | 0 | 0 |
Diego | 0 | 0 |
\(Unit\) | \(D_{i} (Aspirin: Yes/No)\quad\) | \(Y_{i} (Head.: Yes/No)\quad\) | \(Y_{i}(1) (Head. \mid YesAspirin)\quad\) | \(Y_{i}(0) (Head. \mid NoAspirin)\) |
---|---|---|---|---|
Simon | 1 | 0 | 0 | ? |
Julia | 1 | 1 | 1 | ? |
Paul | 0 | 1 | ? | 1 |
Trump | 0 | 0 | ? | 0 |
Fabrizio | 0 | 0 | ? | 0 |
Diego | 0 | 0 | ? | 0 |
 | \(Y = 1\) | \(Y = 0\) |
---|---|---|
\(D = 1\) | 0.5 | ? |
\(D = 0\) | ? | 0.25 |
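The non-missing cells of this 2×2 table can be recomputed from the observed aspirin data above (a Python sketch, with the data copied from the table):

```python
# Observed data from the aspirin table: (name, D_i, Y_i).
data = [("Simon", 1, 0), ("Julia", 1, 1), ("Paul", 0, 1),
        ("Trump", 0, 0), ("Fabrizio", 0, 0), ("Diego", 0, 0)]

def p_y1_given_d(d):
    # Share of units with Y = 1 within the group with treatment status d
    group = [y for _, di, y in data if di == d]
    return sum(group) / len(group)

p_treated = p_y1_given_d(1)  # P(Y = 1 | D = 1) = 0.5
p_control = p_y1_given_d(0)  # P(Y = 1 | D = 0) = 0.25
```

The "?" cells cannot be computed: they would require the missing potential outcomes.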
Potential outcomes: Sometimes \(Y_{i}(1)\) written as \(Y_{1i}\), \(Y_{i}^{t}\), \(y_{i}^{1}\) (Morgan and Winship 2007, 43)
ATE notation (finite vs. superpopulation, Imbens and Rubin (2015, 18))
Often the focus is on subpopulations (Imbens and Rubin 2015, 18)
Females (covariate value): \(\tau_{fs} (f) = \frac{1}{N(f)} \sum\limits_{i: X_{i} = f}^{N} (Y_{i}(1) - Y_{i}(0))\) (Conditional ATE)
ATT: \(\tau_{fs,t} = \frac{1}{N_{t}} \sum\limits_{i: D_{i} = 1}^{N} (Y_{i}(1) - Y_{i}(0))\)
Other subpopulations: Complier Average Treatment Effect; Intent-to-Treat Effect (see overview at Egap)
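For illustration, assume a god's-eye view in which both potential outcomes are known for every unit (the values below are invented); the finite-sample ATE and ATT are then simply averages over different sets of units:

```python
# Hypothetical units with BOTH potential outcomes known (in real data
# one of the two is always missing); d marks the treated units.
units = [
    {"d": 1, "y1": 0, "y0": 1},
    {"d": 1, "y1": 1, "y0": 1},
    {"d": 0, "y1": 1, "y0": 1},
    {"d": 0, "y1": 0, "y0": 0},
]

# ATE: average individual effect across ALL units
ate = sum(u["y1"] - u["y0"] for u in units) / len(units)

# ATT: average individual effect across TREATED units only
treated = [u for u in units if u["d"] == 1]
att = sum(u["y1"] - u["y0"] for u in treated) / len(treated)
```

Here `ate` is -0.25 but `att` is -0.5: the two estimands can differ whenever treatment effects vary across units.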
Q: For which units in the Table below would you need to fill in the missing potential outcomes if you were interested in…
\(Unit\) | \(D_{i}\) | \(Y_{i}(1)\) | \(Y_{i}(0)\) |
---|---|---|---|
Simon | 1 | 0 | ? |
Julia | 1 | 1 | ? |
Paul | 0 | ? | 1 |
Trump | 0 | ? | 0 |
Fabrizio | 0 | ? | 0 |
Diego | 0 | ? | 0 |
Causal ordering assumption: Written as \(D_{i} \longrightarrow Y_{i}\) (Imai’s notation)
Independence assumption (IA): also called unconfounded assignment (For other names see Imbens and Rubin 2015, 43)
Stable Unit Treatment Values Assumption (SUTVA): (1) No interference assumption & (2) Consistency assumption
\(Unit\) | \(D_{i}\quad\) | \(Y_{i}\quad\) | \(Y_{i}(1)\quad\) | \(Y_{i}(0)\) |
---|---|---|---|---|
Simon | 1 | 0 | 0 | ? |
Julia | 1 | 1 | 1 | ? |
Paul | 0 | 1 | ? | 1 |
Trump | 0 | 0 | ? | 0 |
Fabrizio | 0 | 0 | ? | 0 |
Diego | 0 | 0 | ? | 0 |
Step \(\stackrel{1}{=}\): In causal inference, the observed outcome \(Y_{i}\) can be expressed as a combination of potential outcomes \(Y_{i}(0)\) and \(Y_{i}(1)\) based on the treatment indicator \(D_{i}\) (Observed Outcome Definition). Specifically, \(Y_{i} = Y_{i}(0) + D_{i} (Y_{i}(1) - Y_{i}(0))\).
Step \(\stackrel{2}{=}\): If \(D_{i} = 1\), then \(Y_{i} = Y_{i}(0) + 1\times (Y_{i}(1) - Y_{i}(0)) = Y_{i}(1)\), so \(Y_{i}(0)\) cancels out and we end up with \(E[Y_{i}(1)\mid D_{i} = 1]\)
Step \(\stackrel{3}{=}\): Because \(Y_{i}(1)\) is independent of \(D_{i}\) (independence assumption) we can replace \(E[Y_{i}(1)\mid D_{i} = 1]\) with \(E[Y_{i}(1)]\)
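Step 1's observed-outcome identity can be checked mechanically (a Python sketch):

```python
# Check Y_i = Y_i(0) + D_i * (Y_i(1) - Y_i(0)) in both treatment states.
def observed(d, y1, y0):
    return y0 + d * (y1 - y0)

assert observed(1, y1=1, y0=0) == 1  # treated: the identity returns Y_i(1)
assert observed(0, y1=1, y0=0) == 0  # control: the identity returns Y_i(0)
```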
Longer explanation
Insight: If we assume SUTVA! (Imbens and Rubin 2015, 10)
Q: Imagine you all have a headache and you sit in the same room. Then we try to randomly assign aspirin to one half of you to test its effect: What would the two assumptions (No interference, No hidden variations) mean and how could they be violated in that situation? Discuss in groups!
Q: In groups, think of one more empirical example where those assumptions could be violated.
Questions?
Q: What is the fundamental problem of causal inference? (missing data!)
To ensure these assumptions hold in a real study:
Distinction going back to Cochran (1965) and others
Assignment mechanism: “process that determines which units receive which treatments, hence which potential outcomes are realized and thus can be observed [and which are missing]” (Imbens and Rubin 2015, 31)
Experiments (experimental studies)
Observational studies
Provided perfect randomization, we can estimate the causal effect by comparing outcome averages between treatment and control group (e.g., using t-tests)
Various checks are recommended!
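A minimal simulation of why randomization licenses this simple comparison (a Python sketch; the effect size, outcome distribution, and sample size are made up):

```python
import random
import statistics

random.seed(7)

# Simulated randomized experiment with a constant true effect of 2.
# Random assignment makes D independent of the potential outcomes,
# so the difference in group means is an unbiased ATE estimate.
n = 2000
y0 = [random.gauss(5, 1) for _ in range(n)]        # potential outcomes Y(0)
d = [1] * (n // 2) + [0] * (n // 2)
random.shuffle(d)                                   # random assignment
y = [y0_i + 2 * d_i for y0_i, d_i in zip(y0, d)]    # observed outcomes

treat_mean = statistics.mean(yi for yi, di in zip(y, d) if di == 1)
control_mean = statistics.mean(yi for yi, di in zip(y, d) if di == 0)
diff_in_means = treat_mean - control_mean           # close to the true 2
```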
Examples from…
We are leaving the realm of experimental studies and enter the realm of observational studies!
Good, because many questions cannot be answered using experiments (ethical and resource constraints)!
Lab_2_Experimental_data.html
Lab_2_Experimental_data.qmd
Lab2_data
Questions?
Today’s objective
Terminology: Conditioning on vs. controlling for
Assignment mechanism: “process that determines which units receive which treatments, hence which potential outcomes are realized and thus can be observed [and which are missing]” (see Imbens and Rubin 2015, 31)
Q: What is the difference between experimental (experiments) and observational studies?
Many questions cannot be answered using experiments (Q: Any examples?)
Imbens and Rubin (2015) show that under certain assumptions the assignment mechanism within subpopulations of units with the same value for the covariates (Q?) can be interpreted as if it were a completely randomized experiment (see Imbens and Rubin 2015, 257)
I & R call assignment mechanisms that fulfill those assumptions regular assignment mechanism
Given assumptions 1-3 the probability of receiving the treatment is equal to \(e(x)=N_{t}(x)/(N_{c}(x)+N_{t}(x))\) for all units with \(X_{i}=x\) conditional on the number of treated and control units composing such a subpopulation
We don’t know a priori assignment probabilities for units, but know that units with the same pre-treatment covariate values have same \(e(x)\), i.e., the same prob. of getting the treatment
This insight still suggests feasible strategies (e.g., focus on subsample)!
Q: What might be the problem if we have many distinct values of covariates?
\(Unit\) | \(D_{i}\) | \(X_{i}\) | \(Y_{i}\) | \(Y_{i}(1)\) | \(Y_{i}(0)\) |
---|---|---|---|---|---|
Simon | 1 | 0 | 0 | 0 | ? |
Julia | 1 | 1 | 1 | 1 | ? |
Paul | 0 | 0 | 1 | ? | 1 |
Sarah | 0 | 1 | 0 | ? | 0 |
Fabrizio | 0 | 0 | 0 | ? | 0 |
Diego | 0 | 0 | 0 | ? | 0 |
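Under a regular assignment mechanism, comparisons within covariate strata mimic mini-experiments. A Python sketch of the resulting subclassification estimator, applied to the table above (within-stratum differences in means, weighted by stratum shares):

```python
# Units from the table above: (name, D, X, Y).
data = [("Simon", 1, 0, 0), ("Julia", 1, 1, 1), ("Paul", 0, 0, 1),
        ("Sarah", 0, 1, 0), ("Fabrizio", 0, 0, 0), ("Diego", 0, 0, 0)]

def stratum_diff(x):
    # Difference in mean outcomes between treated and control within X = x
    t = [y for _, d, xi, y in data if xi == x and d == 1]
    c = [y for _, d, xi, y in data if xi == x and d == 0]
    return sum(t) / len(t) - sum(c) / len(c)

# Weight each within-stratum difference by the stratum's share of units
n = len(data)
ate_est = sum(stratum_diff(x) * sum(1 for _, _, xi, _ in data if xi == x) / n
              for x in {0, 1})
```

With this table, the X = 0 stratum contributes -1/3 and the X = 1 stratum contributes 1, giving a weighted estimate of 1/9.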
Remember the fundamental objective: estimate the true causal effect of D on Y without bias (unbiased)
Common-cause confounding bias (Elwert and Winship 2014b, 37)
Overcontrol/post-treatment bias
Endogenous selection bias
Lab_3_SSO_Observational_data.html
Lab_3_SSO_Observational_data.qmd
Lab3_data.csv
Remember: In randomized experiments we can simply compare means in treatment and control
Observational studies (and data) require more refined estimation strategies
4 (+1) broad classes of strategies for estimation (Imbens and Rubin 2015, 268f)
Strategies (1) and (2-4) differ in that (2-4) can be implemented before seeing any outcome data
We focus on model-based imputation (1) (= regression) and matching methods (4)
Theory: Impute missing potential outcomes by building a model for the missing outcomes
In practice: “off-the-shelf” methods
But model-based imputation problematic when covariate distributions are far apart!
Better: Prior to using regression methods ensure balance between covariate distributions for treatment and control
“broadly […] any method that aims to equate (or “balance”) the distribution of covariates in the treated and control groups” (Stuart 2010, 2)
Goal
Approach
Q: What tradeoff is there when it comes to pruning units (think of representativeness)?
Pure regression approach is increasingly questioned (e.g., Aronow and Samii 2015)
Matching methods (Stuart 2010, 2)
Q: If we use matching, do we still need the conditional unconfoundedness/independence assumption?
Standardized mean differences (SMD): Difference in means of each covariate between treatment/control, divided by a standardization factor so that it is on the same scale across all covariates
Variance Ratios: Ratio of variance of covariate in one group to that in the other
Empirical CDF Statistics: Evaluate the difference in empirical cumulative distribution functions (eCDFs) of each covariate between treatment/control (allows assessing imbalance across the entire covariate distribution)
Visual Diagnostics: Can help tailoring matching method to target imbalance
Idea: find lower-dimensional functions of covariates that suffice for removing bias associated with differences in the pre-treatment variables (Imbens and Rubin 2015, 266f)
Formally: balancing score is a function of the covariates such that the probability of receiving the active treatment given the covariates is free of dependence on the covariates given the balancing score: \(D_{i} \perp \!\!\! \perp X_{i} | b(X_{i})\)
Important property: if assignment to treatment is unconfounded given the full set of covariates, then assignment is also unconfounded conditioning only on a balancing score
x_age and x_education (2 dimensions/variables) are mapped to pr_score (1 dimension/variable)

Name | x_education | x_age | d_victim | y_trust | pr_score |
---|---|---|---|---|---|
Alissa | 4 | 75 | 0 | 8 | 0.14 |
Damaris | 0 | 17 | 1 | 7 | 0.50 |
Juan | 0 | 18 | 0 | 9 | 0.50 |
Rosa | 6 | 62 | 0 | 5 | 0.11 |
Janeth | 4 | 62 | 0 | 6 | 0.17 |
Yeimi | 2 | 51 | 1 | 9 | 0.28 |
Jacob | 4 | 31 | 0 | 5 | 0.24 |
Monica | 4 | 38 | 1 | 8 | 0.22 |
Cesar | 6 | 44 | 1 | 3 | 0.14 |
Marcos | 0 | 18 | 1 | 5 | 0.50 |
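One simple way to use the one-dimensional pr_score is 1:1 nearest-neighbor matching with replacement (an illustrative Python sketch using the table above; this is not the exact procedure from the lab):

```python
# Rows from the table above: (name, d_victim, y_trust, pr_score).
rows = [("Alissa", 0, 8, 0.14), ("Damaris", 1, 7, 0.50), ("Juan", 0, 9, 0.50),
        ("Rosa", 0, 5, 0.11), ("Janeth", 0, 6, 0.17), ("Yeimi", 1, 9, 0.28),
        ("Jacob", 0, 5, 0.24), ("Monica", 1, 8, 0.22), ("Cesar", 1, 3, 0.14),
        ("Marcos", 1, 5, 0.50)]

treated = [r for r in rows if r[1] == 1]
controls = [r for r in rows if r[1] == 0]

# For each treated unit, find the control closest on the propensity
# score (with replacement) and take the outcome difference.
diffs = []
for name, _, y, ps in treated:
    match = min(controls, key=lambda c: abs(c[3] - ps))
    diffs.append(y - match[2])

att_est = sum(diffs) / len(diffs)  # simple ATT estimate on matched pairs
```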
Lab_4_SSO_Matching.html
Lab_4_SSO_Matching.qmd
Lab4_data.csv
Panel data: Observing units (individuals, countries etc.) several times (Q: Balanced panel?)
Term time-series data used for countries/country-level measures across time (“aggregated units”)
Identification strategies (e.g., Keele 2015, 10)
Classical FE/FD approaches recently reassessed from causal inference perspective (e.g., Imai & Kim 2019)
Data perspective: Time just another dimension in the joint distribution (see next slides)
unit | trust.2006 | trust.2007 | Victimization.2006 | Victimization.2007 |
---|---|---|---|---|
Peter | 1 | 1 | 0 | 1 |
Julia | 5 | 5 | 0 | 1 |
Pedro | 6 | 6 | 1 | 0 |
unit | time | trust | Victimization |
---|---|---|---|
Julia | 2006 | 5 | 0 |
Julia | 2007 | 5 | 1 |
Pedro | 2006 | 6 | 1 |
Pedro | 2007 | 6 | 0 |
Peter | 2006 | 1 | 0 |
Peter | 2007 | 1 | 1 |
Two groups of units (e.g., restaurants, individuals)
Outcome Y: Observed twice, before and after treatment (2 timepoints/periods \(t_{0}\) and \(t_{1}\))
Treatment D: happens between \(t_{0}\) and \(t_{1}\)
Covariates X: Observed at \(t_{0}\) or before
Graph (left) shows average outcome (across time and groups)
Naive strategies:
Problem: Bias due to pre-existing trends/pre-existing differences (Q: ?)
Solution/Idea: Take trend of control group as counterfactual for unobserved trend in treatment group
\(Unit\) | \(D_{i}\) | \(Y_{i, t = 0}\) | \(Y_{i, t = 1}\) | \(\Delta Y\) |
---|---|---|---|---|
Restaurant 1 | 1 | 40 | 20 | -20 |
Restaurant 2 | 1 | 60 | 30 | -30 |
Restaurant 3 | 0 | 50 | 40 | -10 |
Restaurant 4 | 0 | 30 | 30 | 0 |
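The counterfactual-trend idea applied to the table above (a Python sketch): the control group's average change stands in for the treated group's unobserved trend.

```python
# Restaurants from the table above: (D, Y_t0, Y_t1).
restaurants = [(1, 40, 20), (1, 60, 30), (0, 50, 40), (0, 30, 30)]

def mean_change(group_d):
    # Average before/after change within one treatment group
    changes = [y1 - y0 for d, y0, y1 in restaurants if d == group_d]
    return sum(changes) / len(changes)

# DiD: treated group's change minus control group's change
did = mean_change(1) - mean_change(0)  # -25 - (-5) = -20
```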
We can easily estimate our DiD effect via regression.. but how?
We want to estimate:
So what do we need?
name | time T | treated D | outcome Y |
---|---|---|---|
Restaurant_1 | 0 | 1 | 15.00 |
Restaurant_3 | 0 | 1 | 24.00 |
Restaurant_5 | 0 | 1 | 15.00 |
Restaurant_2 | 0 | 0 | 40.50 |
Restaurant_4 | 0 | 0 | 13.75 |
Restaurant_6 | 0 | 0 | 8.50 |
Restaurant_1 | 1 | 1 | 27.00 |
Restaurant_3 | 1 | 1 | 23.00 |
Restaurant_5 | 1 | 1 | 21.50 |
Restaurant_2 | 1 | 0 | 24.00 |
Restaurant_4 | 1 | 0 | 11.50 |
Restaurant_6 | 1 | 0 | 10.50 |
time T | treated D | mean_Y |
---|---|---|
0 | 0 | 20.92 |
0 | 1 | 18.00 |
1 | 0 | 15.33 |
1 | 1 | 23.83 |
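From the four cell means above, the DiD estimate follows directly and equals the interaction coefficient reported in the regression output (a Python sketch of the arithmetic):

```python
# Group means from the table above, indexed by (time T, treated D).
means = {(0, 0): 20.92, (0, 1): 18.00, (1, 0): 15.33, (1, 1): 23.83}

# DiD = (treated change over time) - (control change over time)
did = (means[(1, 1)] - means[(0, 1)]) - (means[(1, 0)] - means[(0, 0)])
```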
 | Dependent variable: outcome Y |
---|---|
treated D | -2.92 (8.02) |
time T | -5.58 (8.02) |
treated D * time T | 11.42 (11.35) |
Constant | 20.92*** (5.67) |
Observations | 12 |
R2 | 0.14 |
Adjusted R2 | -0.19 |
Residual Std. Error | 9.83 (df = 8) |
F Statistic | 0.42 (df = 3; 8) |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
Equation: \(\text{outcome Y}_{i} =\) \(\beta_{1} + \beta_{2}\text{time T}_{i} + \beta_{3}\text{treated D}_{i} +\) \(\beta_{4}\text{time T}_{i}\times \text{treated D}_{i}+ \epsilon_{i}\)
We can estimate the model for the data above as follows (see left):
name | D | Y (T = 0) | Y (T = 1) | Y_diff |
---|---|---|---|---|
Restaurant_1 | 1 | 15.00 | 27.0 | 12.00 |
Restaurant_2 | 0 | 40.50 | 24.0 | -16.50 |
Restaurant_3 | 1 | 24.00 | 23.0 | -1.00 |
Restaurant_4 | 0 | 13.75 | 11.5 | -2.25 |
Restaurant_5 | 1 | 15.00 | 21.5 | 6.50 |
Restaurant_6 | 0 | 8.50 | 10.5 | 2.00 |
lm(Y ~ D + T + D*T + X, data= data_long)
lm(Y_diff ~ D + X, data= data_long)
X are covariates that we can add to make the parallel trends assumption more realistic

Classic and first use: Evaluation of scholarship programs (Thistlethwaite 1960)
All units receive a score (e.g., a grade), and a treatment (e.g., a scholarship) is assigned to those units whose score is above a known cutoff and withheld from those units whose score is below the cutoff
Q: How have you been selected for your BA program?
Features of (all) RDDs
\(X_{i}\): Score with \(c\) as a known cutoff
\(Z_{i}\) = Assignment variable; \(1 \text{ if } X_{i} \geq c, 0\text{ otherwise}\)
\(D_{i}\): Treatment actually received
Q: Can you give an example of \(X_{i}\), \(Z_{i}\) and \(D_{i}\) for a concrete person, e.g., thinking of access to a Bachelor program?
\[\mathbb{E}[Y_{i}|X_{i}] = \begin{cases} \mathbb{E}[Y_{i}(0)|X_{i}] & \quad \text{if } X_{i} < c\\ \mathbb{E}[Y_{i}(1)|X_{i}] & \quad \text{if } X_{i} \geq c \end{cases}\]
Fund. problem of causal inference: Only observe the outcome under control, \(Y_{i}(0)\), for those units whose score is below the cutoff \(c\), and the outcome under treatment, \(Y_{i}(1)\), for those units whose score is above the cutoff \(c\)
Fig. 3 plots the average potential outcomes given the score, \(E[Y_{i}(1)|X_{i} = x]\) and \(E[Y_{i}(0)|X_{i} = x]\), against the score
We can estimate the regression function \(E[Y_{i}(1)|X_{i}]\) for values of the score to the right of the cutoff because we observe \(Y_{i}(1)\), for every \(i\) when \(X \geq c\) (solid red line)
Sharp RD treatment effect: \(\tau_{SRD} = \mathbb{E}[Y_{i}(1) - Y_{i}(0)|X_{i} = c]\)
Two approaches to estimation: randomization-based vs. continuity-based approach (we focus on the latter!)
Assumption of comparability formalized by Hahn et al. (2001) using continuity assumptions
Continuity: As score \(x\) gets closer to cutoff \(c\), the average potential outcome function \(\mathbb{E}[Y_{i}(0)|X_{i} = x]\) gets closer to its value at the cutoff \(\mathbb{E}[Y_{i}(0)|X_{i} = c]\) (same for \(\mathbb{E}[Y_{i}(1)|X_{i} = x]\))
In contrast, randomization-based approach explicitly assumes that RDD induces randomized experiment in window near the cutoff (local randomization assumption)
Choose a polynomial order \(p\) and a kernel function \(K(\cdot)\).
Choose a bandwidth \(h\) around \(c\) (see Fig 14, Cattaneo et al. 2019, 46).
For observations above the cutoff, fit a weighted least squares regression of the outcome \(Y_{i}\) on a constant and \((X_{i}−c),(X_{i}−c)^{2},...,(X_{i}−c)^{p}\), where \(p\) is the chosen polynomial order, with weight \(K(\frac{X_{i}−c}{h})\) for each observation. The estimated intercept from this local weighted regression, \(\hat{\mu}_{+}\), is an estimate of the point \(\mu_{+}=\mathbb{E}[Y_{i}(1)|X_{i}=c]\). Fit the same local weighted regression for observations below the cutoff to obtain \(\hat{\mu}_{-}\), an estimate of \(\mu_{-}=\mathbb{E}[Y_{i}(0)|X_{i}=c]\).
Calculate the Sharp RD point estimate: \(\hat{\tau}_{SRD} = \hat{\mu}_{+} − \hat{\mu}_{-}\)
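The steps above can be sketched end-to-end on simulated data (a Python sketch with \(p = 1\) and a triangular kernel; the data-generating process, the true jump of 2, the bandwidth, and the seed are all my assumptions, and in practice one would use dedicated software such as the rdrobust package):

```python
import random

random.seed(3)

# Simulated sharp RD: linear trend plus a true jump of 2 at cutoff c = 0.
c, h, n = 0.0, 0.5, 500
x = [random.uniform(-1, 1) for _ in range(n)]
y = [0.5 * xi + (2.0 if xi >= c else 0.0) + random.gauss(0, 0.1) for xi in x]

def triangular(u):
    # Triangular kernel: weight declines linearly to 0 at |u| = 1
    return max(0.0, 1.0 - abs(u))

def local_linear_intercept(side):
    # Weighted least squares of y on a constant and (x - c), restricted
    # to one side of the cutoff, within bandwidth h (local linear, p = 1).
    pts = [(xi - c, yi, triangular((xi - c) / h))
           for xi, yi in zip(x, y)
           if (xi >= c if side == "+" else xi < c) and abs(xi - c) < h]
    sw = sum(w for _, _, w in pts)
    xm = sum(w * d for d, _, w in pts) / sw
    ym = sum(w * yi for _, yi, w in pts) / sw
    b = (sum(w * (d - xm) * (yi - ym) for d, yi, w in pts)
         / sum(w * (d - xm) ** 2 for d, _, w in pts))
    return ym - b * xm  # intercept = estimate of E[Y | X = c] on that side

# Sharp RD point estimate: difference of the two intercepts at the cutoff
tau_srd = local_linear_intercept("+") - local_linear_intercept("-")
```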
Continuity (and local randomization) assumptions inherently untestable but “empirical implications” testable
General problem: Researcher has no control over assignment
Qualitative tests of assumptions: Explore how manipulable score/assignment are, e.g., institutional appeal possibility (get scholarship despite low score) or administrative process of score assignment
Quantitative tests of assumptions
“If we want to estimate the ATE of social class (D) on educational attainment (Y) and assume D –> educational aspiration (X) –> Y, controlling for aspirations induces ########## bias”
“Education (D) –> SES (Y). We can have a ########## bias if we forget to include parents’ SES (X) because it affects both the education (D) and the SES (Y) of their children.”
“An example of ########## bias would be to control the moisture content of a plant (X) after watering (D), if we are interested in the causal effect of watering plants on growth (Y).”
If interest is causal mediators (i.e., which post-treatment variables matter) → causal mediation analysis
Q: Difference-in-differences design/data: How often do we observe (or measure) outcome and treatment?
Panel data
Allows us to focus on changes between time points
Commonly used estimators
Name | victim.2006 | victim.2007 | victim.2008 | trust.2006 | trust.2007 | trust.2008 |
---|---|---|---|---|---|---|
Brittany | 0 | 0 | 1 | 4 | 4 | 2 |
Ethan | 1 | 1 | 0 | 5 | 6 | 4 |
Kyle | 0 | 0 | 0 | 0 | 7 | 5 |
Jacob | 0 | 1 | 1 | 5 | 3 | 6 |
Jessica | 0 | 0 | 0 | 7 | 9 | 4 |
trust.06.07 = trust.2007 - trust.2006
Name | trust.06.07 | trust.07.08 | victim.06.07 | victim.07.08 |
---|---|---|---|---|
Brittany | 0 | -2 | 0 | 1 |
Ethan | 1 | -2 | 0 | -1 |
Kyle | 7 | -2 | 0 | 0 |
Jacob | -2 | 3 | 1 | 0 |
Jessica | 2 | -5 | 0 | 0 |
Name | trust.2006.dem | trust.2007.dem | trust.2008.dem | victim.2006.dem | victim.2007.dem | victim.2008.dem |
---|---|---|---|---|---|---|
Brittany | 0.67 | 0.67 | -1.33 | -0.33 | -0.33 | 0.67 |
Ethan | 0.00 | 1.00 | -1.00 | 0.33 | 0.33 | -0.67 |
Kyle | -4.00 | 3.00 | 1.00 | 0.00 | 0.00 | 0.00 |
Jacob | 0.33 | -1.67 | 1.33 | -0.67 | 0.33 | 0.33 |
Jessica | 0.33 | 2.33 | -2.67 | 0.00 | 0.00 | 0.00 |
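Both transformations are purely mechanical; a Python sketch reproducing a few cells of the tables above:

```python
# Wide panel from the first table: name -> trust in 2006, 2007, 2008.
trust = {"Brittany": [4, 4, 2], "Ethan": [5, 6, 4], "Kyle": [0, 7, 5],
         "Jacob": [5, 3, 6], "Jessica": [7, 9, 4]}

def first_differences(values):
    # Change from one period to the next (FD transformation)
    return [b - a for a, b in zip(values, values[1:])]

def demeaned(values):
    # Deviation of each period from the unit's own mean (FE/within transformation)
    m = sum(values) / len(values)
    return [round(v - m, 2) for v in values]
```

For example, `first_differences(trust["Kyle"])` gives [7, -2] and `demeaned(trust["Brittany"])` gives [0.67, 0.67, -1.33], matching the tables.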
Advantage: Stable, i.e., time invariant (unobserved & observed) confounders drop out
Assumptions (standard): Selection only on observable time-variant covariates/confounders (Q: ?)
Focus on within-unit variation may dramatically change outcome variable and variation we are looking at: explore variation in Y after transformation! (Mummolo and Peterson 2018)
FE and FD should be equivalent for \(T = 2\) (e.g., see here)
BUT causal inference with panel data + within-comparisons is an ongoing research field (Imai & Kim 2019)
FD and FE regression models… (Imai et al. 2020, 1, Imai and Kim, 2019, 2020)
Reflects my personal experience (Bauer 2015, 2018) → check out new methods!
Imai and Kim (2019) develop methods for within estimation with unit fixed effects (wfe: Weighted Linear Fixed Effects Estimators for Causal Inference) [partly solves problems]
Imai et al. (2020) also develop matching methods for TSCS data and between-unit comparisons → PanelMatch
Name | victim.2006 | victim.2007 | victim.2008 | trust.2006 | trust.2007 | trust.2008 |
---|---|---|---|---|---|---|
Brittany | 0 | 0 | 1 | 4 | 4 | 2 |
Ethan | 1 | 1 | 0 | 5 | 6 | 4 |
Kyle | 0 | 0 | 0 | 0 | 7 | 5 |
Jacob | 0 | 1 | 1 | 5 | 3 | 6 |
Jessica | 0 | 0 | 0 | 7 | 9 | 4 |
Specifying causal quantity of interest requires specifying leads and lags
Leads: Choose the number of leads \(F\)
Lags: Specify how many previous time periods \(L\) one wants to adjust (match) for
After selecting \(F\) and \(L\) we can specify quantity of interest
Average treatment effect of policy (= treatment) change among the treated (ATT) (Imai et al. 2020, 11)
\(\delta(F,L)=\) \(E\{Y_{i,t+F}(X_{it}=1,X_{i,t−1} = 0,\{X_{i,t−\ell}\}^{L}_{\ell=2})\) \(− Y_{i,t+F}(X_{it} = 0, X_{i,t−1}= 0,\{X_{i,t−\ell}\}^{L}_{\ell=2})|X_{it}= 1,X_{i,t−1}= 0\}\)
The above causal quantity allows for a future treatment reversal, i.e., treatment status could go back to the control condition before the outcome is measured, i.e., \(X_{i,t+\ell}= 0\) for some \(\ell\) with \(1 \leq \ell \leq F\) (other definitions are possible)
Identification assumptions
How should researchers choose the values of L and F?
Large value of \(L\) improves the credibility of the limited carryover effect assumption
\(F\) should be motivated by interest in short-term or long-term causal effects:
In Step 1 we adjust for the treatment history.
To make the parallel trends assumption credible we should also adjust for other confounders such as past outcomes and (possibly time-varying) covariates
We (can) apply various matching methods (see Imai et al. 2020, 14; cf. Lecture 5 & 6)
print(), plot(), and summary() methods for matched.set objects and get_covariate_balance()
Assume estimation of the ATT of a stable policy treatment: Which units would be suitable controls for observation \((i,t) = (3,4)\) (treated at \(t = 4\)) with \(\delta(F = 1,L = 2)\)?
Unit 1 was not in control at \(t = 4\) ❌
Unit 2, 4 and 5 were all in control at \(t = 4\) ✅
Unit 2, 4 and 5 all share the same treatment history for \(L = 2\) ✅
Unit 2 changes treatment status from \(t = 4\) to \(t = 5\) ❌
Unit 4,5 remain in control until \(t = 5\) ✅
Q: Which units would be suitable controls for unit \(i = 3\) treated at \(t = 4\) with \(\delta(F = 1,L = 3)\)?
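The matched-set logic in the bullets above can be coded up directly (a Python sketch; the treatment paths are hypothetical but consistent with the bullets, and the real PanelMatch package of course adds refinement steps):

```python
# Hypothetical treatment paths over t = 1..5, consistent with the
# bullets above; unit 3 is first treated at t = 4.
D = {1: [0, 0, 1, 1, 1],
     2: [0, 0, 0, 0, 1],
     3: [0, 0, 0, 1, 1],
     4: [0, 0, 0, 0, 0],
     5: [0, 0, 0, 0, 0]}

def matched_controls(treated_unit, t, L, F):
    """Controls at t sharing the treated unit's L-period treatment
    history and staying in control through the F-period lead window
    (no treatment reversal allowed for controls)."""
    history = D[treated_unit][t - L - 1:t - 1]  # periods t-L .. t-1
    return sorted(u for u, path in D.items()
                  if u != treated_unit
                  and path[t - 1:t + F] == [0] * (F + 1)
                  and path[t - L - 1:t - 1] == history)

controls = matched_controls(3, t=4, L=2, F=1)  # -> [4, 5]
```

Unit 1 fails because it is already treated at \(t = 4\); unit 2 matches the history but switches into treatment at \(t = 5\), inside the lead window; units 4 and 5 survive both checks.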
DisplayTreatment(): Visualize treatment distribution across units, across time
- See ?DisplayTreatment for arguments
- dense.plot = TRUE/FALSE: Allows you to remove lines between tiles in case the number of units and/or time periods is very high (makes the plot more readable)

PanelMatch(): Create refined/weighted sets of treated and control units using different matching/weighting strategies
- See ?PanelMatch for arguments
- lead =: Specify the lead window, i.e., for how long “after” treatment you would like to estimate effects; 0 (default) corresponds to the contemporaneous treatment effect
- lag =: Choose how many treatment history periods you want to match on
- refinement.method =: Specify the matching or weighting method to be used for refining the matched sets, i.e., in addition to matching on the treatment history you may want to match on the history of other variables (covariates and outcome)
- exact.match.variables =: Specify variables for exact matching
- covs.formula =: Provide a formula object indicating which variables should be used for matching and refinement
- forbid.treatment.reversal: Whether or not it is permissible for treatment to reverse in the specified lead window

get_covariate_balance(): Calculate covariate balance for user-specified covariates across matched sets (see also balance_scatter())

PanelEstimate(): Estimate the causal quantity of interest based on the matched sets (summarize results with summary() and plot())

First-difference/fixed-effects (within-comparisons): parametric assumptions, few diagnostic tools, difficult to understand how counterfactual outcomes are estimated (see newer methods!)
PanelMatch (between-comparisons): Combination of matching and difference-in-differences estimator (design-based approach)
PanelMatch proceeds in three steps:
PanelMatch compares between units (not within!)
PanelMatch is implemented in R package PanelMatch
Visualizing treatment variation across time is always helpful!
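A minimal sketch of the full pipeline, assuming the `dem` example data shipped with the PanelMatch package; the argument values are illustrative and the exact interface may differ across package versions:

```r
# Sketch of the PanelMatch workflow using the package's 'dem' example data.
# Guarded so it only runs if the package is installed; arguments are illustrative.
if (requireNamespace("PanelMatch", quietly = TRUE)) {
  library(PanelMatch)
  # Visualize treatment variation across units and time
  DisplayTreatment(unit.id = "wbcode2", time.id = "year",
                   treatment = "dem", data = dem)
  # Match on 4 periods of treatment history, refine with Mahalanobis distance
  pm <- PanelMatch(lag = 4, time.id = "year", unit.id = "wbcode2",
                   treatment = "dem", refinement.method = "mahalanobis",
                   data = dem, covs.formula = ~ tradewb,
                   qoi = "att", outcome.var = "y", lead = 0:4,
                   forbid.treatment.reversal = FALSE)
  # Check covariate balance across the matched sets
  get_covariate_balance(pm$att, data = dem, covariates = "tradewb")
  # Difference-in-differences estimates for leads 0-4
  pe <- PanelEstimate(pm, data = dem)
  summary(pe)
}
```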
ivreg() in R (e.g., from the AER package)
Requires strong assumptions (can be partially validated with data)
As-if random (independence) IV assumption
Additional issues with use of multiple regression models
IVs help confront the problem of confounding
IV logic is helpful even if you don’t find a good instrument
Experimental studies: Assignment can be seen as an IV of actually taking treatment/control
Observational studies: Often very hard to find good, credible instrumental variables
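The IV logic can be illustrated with a hand-rolled two-stage least squares on simulated data (all data-generating values below are invented; in applied work you would use ivreg()):

```r
# Hand-rolled 2SLS on simulated data; the true causal effect of d on y is 2
set.seed(1)
n <- 5000
z <- rbinom(n, 1, 0.5)                      # instrument, assigned as-if at random
u <- rnorm(n)                               # unobserved confounder
d <- as.numeric(z + u + rnorm(n) > 0.5)     # treatment take-up depends on z AND u
y <- 2 * d + u + rnorm(n)                   # outcome; true effect = 2

naive <- coef(lm(y ~ d))["d"]               # biased: u drives both d and y
d_hat <- fitted(lm(d ~ z))                  # first stage: predict d from z
iv <- coef(lm(y ~ d_hat))["d_hat"]          # second stage: IV estimate
c(naive = unname(naive), iv = unname(iv))   # naive is inflated; iv is close to 2
```

Because the instrument \(z\) shifts \(d\) but is independent of the confounder \(u\), the second-stage coefficient recovers the causal effect that the naive regression overstates.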
15 Minutes for the evaluation!
Questions and answer document: See this file!
Retake: IV
Regression discontinuity design (based on Cattaneo et al. 2019)
1960s: Health Insurance Plan (HIP) clinical trial studied effects of screening \(D\) for breast cancer \(Y\) (Dunning 2009)
Instrument: Invitation for screening \(Z\) issued at random
Table 1 (next slide) shows death rates from breast cancer 5 years after trial
Homework: Read through Dunning’s (2009) example and explain the different comparisons we can make and the logic of IV analysis using Table 1 below.
(23/20200)*1000 = 1.14; (47/20200)*1000 = 2.33
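The per-1,000 death rates above are simple arithmetic, assuming a denominator of 20,200 in each comparison group as in the slide:

```r
# Death rate per 1,000, rounded to two decimals
rate_per_1000 <- function(deaths, n) round(deaths / n * 1000, 2)
rate_per_1000(23, 20200)  # 1.14
rate_per_1000(47, 20200)  # 2.33
```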
Studies in sociology and political science often make a statement about statistical significance but rarely about actual effect size (but it’s getting better!).
Q: What is the difference between statistical significance and effect size? What is the source of uncertainty in our effect estimates?
‘Sing me a song with social significance’: The (mis)use of statistical significance testing in European sociological research (2017)
Yes, income!
Examples (in my own research):
“Vague” scales
# z-standardize the trust variable (mean 0, SD 1)
data$trust.2006 <- (data$trust.2006 - mean(data$trust.2006)) / sd(data$trust.2006)
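A quick check of this standardization with made-up data: the transformed variable has mean 0 and SD 1, and base R's scale() produces the same result:

```r
trust <- c(5, 7, 3, 8, 6, 4)            # made-up trust scores
z <- (trust - mean(trust)) / sd(trust)  # manual z-standardization
all.equal(z, as.numeric(scale(trust)))  # TRUE: scale() does the same
c(mean(z), sd(z))                       # approximately 0 and exactly 1
```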
Frequentist paradigm: In theory, we draw repeated samples \((S_{1}, S_{2}, S_{3}, ...)\) from population \(P\)
In each sample we calculate the quantity of interest resulting in a vector of estimates: \((\hat{\theta}_{S1}, \hat{\theta}_{S2}, \hat{\theta}_{S3}, ...)\)
Central limit theorem (CLT): Roughly states that the sampling distribution of a statistic (given certain conditions) approaches a normal distribution as the sample size grows
In reality, we normally only observe one sample but we can still assume (CLT!) that the sampling distribution would look like a normal distribution (or a t-distribution)
Classic standard errors (SEs) [and also confidence intervals]: are based on this frequentist logic and are designed to capture sampling variation
Standard error (SE): the standard deviation (SD) of the sampling distribution of a statistic, e.g., of our causal effect of interest
Sample mean as ‘simple’ example: Mean \(\bar{x}\) is quantity of interest and the samples are uncorrelated
If sampling distribution is normally distributed, sample mean, standard error, and quantiles of the normal distribution can be used to calculate confidence intervals for the true population mean
Same logic applies to other quantities of interest, e.g., difference in means (causal effect etc.), estimates from linear regression model etc.
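The frequentist logic above can be simulated: draw many samples from a population, compute the mean in each, and compare the SD of those means (the sampling distribution's SD, i.e., the SE) with the analytical \(\sigma/\sqrt{n}\). The population values are illustrative:

```r
set.seed(42)
population <- rnorm(100000, mean = 5, sd = 2)   # hypothetical population
n <- 100
# sampling distribution of the mean across 5000 repeated samples
means <- replicate(5000, mean(sample(population, n)))
sd(means)       # empirical SE of the mean
2 / sqrt(n)     # analytical SE: sigma / sqrt(n) = 0.2
# 95% CI for the population mean from a single sample
x <- sample(population, n)
mean(x) + c(-1.96, 1.96) * sd(x) / sqrt(n)
```

The SD of the simulated means closely matches \(\sigma/\sqrt{n}\), which is why one observed sample plus the CLT suffices to construct confidence intervals.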
Sampling-based uncertainty: stems from the fact that we only observe a subset of the population (Abadie et al. 2020)
Table I: finite population consisting of \(n\) units with each unit characterized by a pair of variables \(Y_{i}\) and \(Z_{i}\), with inclusion of unit \(i\) in a sample encoded by the binary variable \(R_{i} \in \{0,1\}\)
Design-based uncertainty: “arises when the parameter of interest is defined in terms of the unobserved outcomes that some units would attain under a certain intervention” (Abadie et al. 2020, 266)
Table II: scenario in which we observe, for each unit in the population, the value of one of two potential outcome variables, either \(Y^{∗}_{i}(1)\) or \(Y^{∗}_{i}(0)\); \(X_{i}\in\{0,1\}\) indicates which potential outcome we observe
We face a missing data process that may combine features of these two examples
Articulating both exact nature of the estimand of interest and the source of uncertainty that makes an estimator stochastic is a crucial first step to valid inference (Abadie et al. 2020, 266)
Useful to distinguish…
When interest is descriptive (e.g., mean age in population, income difference between men and women) we are only concerned with sampling-based uncertainty
We can now also return to the concepts of internal vs. external validity
Abadie et al. (2020, 271)’s distinction between sampling-based and design-based uncertainty suggests a definition of these concepts
“Internal validity bears on the question of whether \(E[\theta|\mathbf{R},N_{1},N_{0}]\) is equal to \(\theta^{causal,sample}\). This relies on random assignment of the treatment. Whether or not the sampling is random is irrelevant for this question because \(\theta^{causal,sample}\) conditions on which units were sampled.” (Abadie et al. 2020, 271)
“External validity bears on the question of whether \(E[\theta^{causal,sample}|N_{1},N_{0}]\) equal to \(\theta^{causal}\) [population!]. This relies on the random sampling assumption and does not require that the assignment is random” (Abadie et al. 2020, 271)
“However, for \(\hat{\theta}\) to be a good estimator of \(\theta^{causal}\), which is often the most interesting estimand, we need both internal and external validity, and thus both random assignment and random sampling.” (Abadie et al. 2020, 271)
Whether variable \(A\) has a causal effect on variable \(B\) vs. explain how causal relationship between \(A\) and \(B\) arises
Causal mechanism (CM): “a process in which a causal variable of interest, i.e., a treatment variable, influences an outcome” (Imai et al. 2011, 765)
Q: Can you think of any examples of mediators \(M\) that mediate a causal relationship between \(D\) and \(Y\)?
Contributions (Imai et al. 2011, 766): “commonly used statistical methods [cf. Baron & Kenny 1986] rely upon untestable assumptions and are often inappropriate even under those assumptions” (Imai et al. 2011, 764)
Example of Brader et al. (2008)
\(M_{i}(t)\): potential value of mediator for unit \(i\) under the treatment status \(T_{i}=t\) (Q: \(M_{i}(1)=?\) )
\(Y_{i}(t,m)\): potential outcome that would result if the treatment and mediating variables equal \(t\) and \(m\)
Difference to usual situation
Example for sequential ignorability (1) and (2)
If sequential ignorability holds we can identify ACME and ADE [+ Assumption 2: consistency assumption, see discussion in Imai et al. (2011, 782)]
Imai et al. (2011) also provide R code to conduct causal mediation analysis
Appendix: Only relevant when discussed during the lecture
Slides: https://paulcbauer.github.io/research_design_2022/lecture_14.html
Causal mediation analysis (rest)
Review of material
Exam
Questions!
Causal Mediation Analysis
Example for sequential ignorability (1) and (2)
If sequential ignorability holds we can identify ACME and ADE [+ Assumption 2: consistency assumption, see discussion in Imai et al. (2011, 782)]
Imai et al. (2011) also provide R code to conduct causal mediation analysis
Key component of causal analysis (Imbens and Rubin 2015)
“process that determines which units receive which treatments, hence which potential outcomes are realized and thus can be observed [and which are missing]” (Imbens and Rubin 2015, 31)
“describes, as a function of all covariates and of all potential outcomes, the probability of any vector of assignments” (Imbens and Rubin 2015, 31)
\(Pr(\mathbf{D}|\mathbf{X},\mathbf{Y}(0),\mathbf{Y}(1))\): Function that assigns probabilities to all possible values of vector of assignments \(\mathbf{D}\) ( \(\mathbf{D}\) = \(\mathbf{W}\) in Imbens and Rubin (2015))
\(\neq\) unit-level assignment probability \(p_{i}(\mathbf{X},\mathbf{Y}(0),\mathbf{Y}(1))\))
Imbens and Rubin (2015, 31f) define assignment mechanism and provide a systematic outline of the underlying causal assumptions
In part, they introduce new terms (called restrictions): individualistic assignment, probabilistic assignment, unconfounded assignment
Since these terms reflect the assumptions we discussed so far (independence, SUTVA, etc.), we will stick to the latter terms
In experiments we randomize \(\rightarrow\) some ways of randomizing are better than others
Imbens and Rubin (2015, 47f) provide a very insightful taxonomy of classical randomized experiments
Bernoulli experiment tosses a fair coin for each unit
Often used in online survey experiments (tossing “digital coin” as people enter survey)
Disadvantage: The number of treated units is random, so highly unbalanced assignments (e.g., all units in treatment) can occur
Q: Below you have all 16 possible assignment vectors for a Bernoulli experiment with 4 persons (N = 4). Please compare the different vectors. Which one(s) would you prefer? What is the probability of any one of them occurring?
unit | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 | D10 | D11 | D12 | D13 | D14 | D15 | D16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Simon | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
Julia | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
Claudia | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
Diego | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
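The 16 vectors in the table can be enumerated directly; under a fair coin every vector has probability \(0.5^4 = 1/16\):

```r
# Enumerate all Bernoulli assignment vectors for 4 units
vectors <- expand.grid(Simon = 0:1, Julia = 0:1, Claudia = 0:1, Diego = 0:1)
nrow(vectors)  # 16 possible assignment vectors
0.5^4          # probability of each: 0.0625 = 1/16
# note that the vectors with everyone in control or everyone in
# treatment are just as likely as any other
```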
\[Pr(\mathbf{D}|\mathbf{X},\mathbf{Y}(0),\mathbf{Y}(1)) = \begin{cases} \left( \begin{array}{c} N \\ N_{t} \end{array} \right)^{-1} & \quad \sum_{i=1}^{N} D_{i}=N_{t},\\ 0 & \quad otherwise \end{cases}\]
\[Pr(\mathbf{D}|\mathbf{X},\mathbf{Y}(0),\mathbf{Y}(1)) = \begin{cases}\text{Assignment vectors with } \sum_{i=1}^{N} D_{i}=N_{t} \text{ occur with probability } \left(\begin{array}{c}N \\ N_{t}\end{array}\right)^{-1},\\ \text{all other assignment vectors occur with probability } 0.\end{cases}\]
unit | D1 | D2 | D3 | D4 | D5 | D6 |
---|---|---|---|---|---|---|
Simon | 1 | 1 | 0 | 1 | 0 | 0 |
Julia | 1 | 0 | 1 | 0 | 1 | 0 |
Claudia | 0 | 1 | 1 | 0 | 0 | 1 |
Diego | 0 | 0 | 0 | 1 | 1 | 1 |
\(\left(\begin{array}{c}N \\ N_{t}\end{array}\right)^{-1}\) = \(\left(\frac{N!}{N_{t}!\,(N-N_{t})!}\right)^{-1}\) = \(\left(\begin{array}{c}4\\ 2\end{array}\right)^{-1}\) = \(\left(\frac{4!}{2!\,(4-2)!}\right)^{-1}\) = \(\left(\frac{4\cdot3\cdot2\cdot1}{(2\cdot1)\,(2\cdot1)}\right)^{-1}\) = \(\left(\frac{24}{4}\right)^{-1}\) = \(6^{-1}\) = \(\frac{1}{6}\)
Q: How many assignment vectors are there if \(N = 4\) and \(N_{t} = 3\)? What is their probability of occurring?
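R's choose() computes the binomial coefficient from the formula above, i.e., the number of admissible assignment vectors in a completely randomized experiment, and its inverse gives the probability of each vector:

```r
choose(4, 2)      # 6 assignment vectors with N = 4, N_t = 2
1 / choose(4, 2)  # each occurs with probability 1/6
# the question above (N = 4, N_t = 3):
choose(4, 3)      # 4 assignment vectors
1 / choose(4, 3)  # each occurs with probability 1/4
```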
Population of units in the study is first partitioned into blocks or strata
Within each block, we conduct a completely randomized experiment, with assignments independent across blocks
Example: 2 blocks, e.g., males and females, where independent completely randomized (block) experiments are conducted for each group/block
Assignment mechanism (see Imbens and Rubin 2015, 52): Same formula as completely randomized experiments but replacing \(N\) and \(N_{t}\) with \(N(m)\) and \(N_{t}(m)\) for males and \(N(f)\) and \(N_{t}(f)\) for females
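Because the block-level experiments are independent, the probability of any admissible full assignment vector is the product of the per-block probabilities. The block sizes below are made up for illustration:

```r
# Hypothetical blocks: 4 males (2 treated), 6 females (3 treated)
p_m <- 1 / choose(4, 2)  # probability of a given male assignment: 1/6
p_f <- 1 / choose(6, 3)  # probability of a given female assignment: 1/20
p_m * p_f                # probability of the full assignment vector: 1/120
```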
Q: What are the main differences between the four types of randomization we discussed: Bernoulli trials, completely randomized experiments, stratified randomized experiments, and paired randomized experiments?
Q: Imagine that you conducted an experiment in which you successfully randomly assigned participants to treatment and control. What can still go wrong in a classical randomized experiment?
Branch of statistics which assumes that sample data comes from a population that follows a probability distribution based on a fixed set of parameters (Wikipedia)
Since a parametric model relies on a fixed parameter set, it assumes more about a given population than non-parametric methods do
Non-parametric model: Parameter set is not fixed, can increase/decrease if new relevant information is collected
Various helpful resources
“Probability” has been defined in different ways
Frequentist view (classical approach)
Bayesian view
Probability distributions (Everitt 2010, 338) (Examples)
Also called theoretical distributions (as opposed to empirical distributions of data) and often invented by mathematicians (Q: Most famous one?)
Bernoulli distribution (simplest discrete distribution): e.g., flipping a fair coin
as.numeric(simDAG::rbernoulli(5, 0.5)) = 1, 0, 0, 0, 1
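Without the simDAG package, the same Bernoulli draws can be produced in base R with rbinom():

```r
set.seed(1)
draws <- rbinom(5, size = 1, prob = 0.5)  # five fair-coin (0/1) draws
draws
```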
Generalizability of a study’s empirical findings to new environments, settings or populations (Pearl and Bareinboim 2014)
Outcome measurement might become a treatment in itself. Make sure that everyone is equally exposed, i.e., that exposure to outcome measurement is constant across units.
Individualistic assignment and SUTVA are both critical assumptions in causal inference but serve different purposes. Individualistic assignment ensures that the treatment assignment for each unit is based solely on its own characteristics, while SUTVA ensures that the potential outcomes for each unit are not influenced by the treatment assignments of other units and that observed outcomes match potential outcomes for the received treatment. Both assumptions simplify the analysis and interpretation of causal effects, but they operate in distinct areas of the causal inference framework.
Dependence on pre-treatment variables: Individualistic assignment focuses on the independence of the treatment assignment probability from the pre-treatment variables of other units; it ensures that the assignment mechanism for each unit is based solely on its own pre-treatment characteristics. SUTVA does not explicitly address the assignment mechanism; instead, it focuses on the independence of the potential outcomes from the treatment assignments of other units (no interference) and the consistency of observed and potential outcomes.
Scope of assumptions: Individualistic assignment pertains specifically to the mechanism by which treatments are assigned to units, ensuring that this mechanism is individualistic. SUTVA pertains to the nature of potential outcomes and their relationship with treatment assignments, ensuring no interference and consistency.
Implications for analysis: Individualistic assignment means that when modeling the probability of treatment assignment, one needs to consider only the pre-treatment variables of the specific unit. SUTVA means that one can interpret potential outcomes meaningfully without worrying about spillover effects from other units' treatments, simplifying causal inference by focusing on each unit independently.
e.g., \(e(x)=40/(60+40)=0.4\)
Small summary: Same propensity score ≠ same covariate values; propensity score matching mimics a completely randomized experiment, whereas other matching methods mimic blocked/stratified randomized experiments