4.22 Assumptions: Independence Assumption (IA)

E[Y_i|D_i = 1] \(\stackrel{1}{=}\) E[Y_i0 + D_i(Y_i1 - Y_i0)| D_i = 1] \(\stackrel{2}{=}\) E[Y_i1|D_i = 1] \(\stackrel{3}{=}\) E[Y_i1]²³
Steps \(\stackrel{1}{=}\), \(\stackrel{2}{=}\) and \(\stackrel{3}{=}\) together: the expectation of the observed outcome conditional on treatment assignment equals the expectation of the potential outcome, which is only partly observed²⁴
Same logic for E[Y_i|D_i = 0] = E[Y_i0]
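Spelled out for the control group, the same three steps read as follows (a worked restatement in the notation used above):

\[
E[Y_i \mid D_i = 0] \stackrel{1}{=} E[Y_{i0} + D_i(Y_{i1} - Y_{i0}) \mid D_i = 0] \stackrel{2}{=} E[Y_{i0} \mid D_i = 0] \stackrel{3}{=} E[Y_{i0}]
\]

In step 2 the term D_i(Y_i1 - Y_i0) vanishes because D_i = 0, and step 3 again uses the independence assumption.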
It follows: ATE = E[Y_i1 - Y_i0] = E[Y_i1] - E[Y_i0] = E[Y_i|D_i = 1] - E[Y_i|D_i = 0]
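The identity above can be checked with a small simulation. The following is a minimal sketch, not part of the original slides: the sample size, the outcome probabilities (0.3 and 0.5) and all variable names are illustrative assumptions. Treatment is assigned by a coin flip, so D_i is independent of the potential outcomes by construction.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n = 100_000  # hypothetical sample size

# Hypothetical binary potential outcomes (assumed probabilities):
# P(Y_i0 = 1) = 0.3 and P(Y_i1 = 1) = 0.5
y0 = [1 if random.random() < 0.3 else 0 for _ in range(n)]
y1 = [1 if random.random() < 0.5 else 0 for _ in range(n)]

# Random assignment: D_i is drawn independently of (Y_i0, Y_i1),
# which is exactly what the independence assumption requires
d = [1 if random.random() < 0.5 else 0 for _ in range(n)]

# Observed outcome: Y_i = Y_i0 + D_i * (Y_i1 - Y_i0)
y = [y0_i + d_i * (y1_i - y0_i) for y0_i, y1_i, d_i in zip(y0, y1, d)]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# "True" ATE, computable here only because we simulated both columns
true_ate = mean(y1) - mean(y0)

# Difference in observed group means: E[Y_i|D_i = 1] - E[Y_i|D_i = 0]
treated = [y_i for y_i, d_i in zip(y, d) if d_i == 1]
control = [y_i for y_i, d_i in zip(y, d) if d_i == 0]
diff_in_means = mean(treated) - mean(control)

print("true ATE:", true_ate)
print("difference in means:", diff_in_means)
```

Under random assignment the two printed numbers should be close, differing only by sampling noise. If assignment depended on the potential outcomes instead, the difference in means would generally no longer match the ATE.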

Unit	D_i	Y_i	Y_i1	Y_i0
Simone	1	0	0	?
Julia	1	1	1	?
Paul	0	1	?	1
Trump	0	0	?	0
Fabrizio	0	0	?	0
Diego	0	0	?	0
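As a small illustration, the two observable group means can be computed directly from the table. This is a sketch only: the variable names are assumptions, and with six units the result is a toy example rather than a meaningful estimate.

```python
# Observed data from the table: (unit, D_i, Y_i)
rows = [
    ("Simone", 1, 0),
    ("Julia", 1, 1),
    ("Paul", 0, 1),
    ("Trump", 0, 0),
    ("Fabrizio", 0, 0),
    ("Diego", 0, 0),
]

# Split the observed outcomes by treatment status
treated = [y for _, d, y in rows if d == 1]
control = [y for _, d, y in rows if d == 0]

# Sample analogues of E[Y_i|D_i = 1] and E[Y_i|D_i = 0]
mean_treated = sum(treated) / len(treated)  # (0 + 1) / 2 = 0.5
mean_control = sum(control) / len(control)  # (1 + 0 + 0 + 0) / 4 = 0.25

# Difference in observed means; it equals the ATE only if the IA holds
diff_in_means = mean_treated - mean_control  # 0.25

print(mean_treated, mean_control, diff_in_means)
```

The missing entries (?) never enter the calculation. Whether the resulting 0.25 can be read as the ATE depends entirely on the independence assumption.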

The IA allows us to equate the expected value of the whole column Y_i0, i.e. E[Y_i0] (blue and orange values), with the expected value of the orange values alone, i.e. E[Y_i|D_i = 0] (the same holds for column Y_i1 and E[Y_i|D_i = 1]).

When is the independence assumption justified? (it depends… next slide)

Step \(\stackrel{1}{=}\): Replace Y_i (column 2) with Y_i0 + D_i(Y_i1 - Y_i0), a combination of columns 3 (Y_i1) and 4 (Y_i0); Step \(\stackrel{2}{=}\): Conditional on D_i = 1, Y_i0 cancels out (Y_i0 + Y_i1 - Y_i0 = Y_i1) and we end up with E[Y_i1|D_i = 1]; Step \(\stackrel{3}{=}\): Because Y_i1 is independent of D_i (independence assumption), we can replace E[Y_i1|D_i = 1] with E[Y_i1].↩
Normally, to estimate the ATE we would calculate the expected value of the differences between column Y_i1 and column Y_i0. In other words, we would have to observe both treatment and control units in their counterfactual states (e.g. observe what the value of control units would be if they had been treated). However, for the units that were assigned to control (D_i = 0) we do not observe Y_i1, and for the units that were assigned to treatment (D_i = 1) we do not observe Y_i0. Starting with the column Y_i1, the independence assumption simply means that the expected value of the whole column Y_i1 (red and green values) can be equated with the expected value of the first two rows of the column, namely Y_i1|D_i = 1 (the red values). And that is what we actually observe. Hence, under this assumption there is no need to observe the missing green values any more. The same logic applies to column Y_i0: the IA allows us to equate the expected value of the whole column, E[Y_i0] (blue and orange values), with the expected value of the orange values, i.e. E[Y_i|D_i = 0].↩