7 Policy Learning I - Binary Treatment
Before you jump into this chapter, we recommend that you’ve read:
- Introduction to Machine Learning
- Introduction to Causal Inference
- ATE I - Binary treatment
- HTE I - Binary treatment
- Policy evaluation I - Binary treatment
library(grf)
library(policytree)
7.1 Learning a policy
In the previous section you saw how to evaluate a given policy. Now, let’s see how to learn a policy from data.
set.seed(1)
n <- 1000
p <- 4
X <- matrix(runif(n*p), n, p)
e <- 1/(1+exp(-3*(X[,1]-.5)-3*(X[,2]-.5)))
W <- rbinom(n, prob = e, size = 1)
Y <- .5*X[,2] + (X[,1] - .5)*W + .01 * rnorm(n)
y.norm <- (Y - min(Y))/(max(Y)-min(Y))  # just for plotting
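The normalized outcome y.norm is only used for plotting. As a minimal sketch (not part of the original code), one way to visualize the simulated data is to shade points by y.norm:
# Hedged sketch: scatter the first two covariates; filled points are treated units,
# darker points correspond to larger (normalized) outcomes.
plot(X[,1], X[,2], pch = ifelse(W == 1, 19, 1), col = grey(1 - y.norm),
     xlab = "X1", ylab = "X2", main = "Simulated data (filled = treated)")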
7.2 Learning policies
7.2.1 Using policytree
Use policytree to learn a shallow decision-tree policy; tree-based policies are easier to describe to a stakeholder. See this short policytree tutorial for other similar examples.
train <- 1:(n/2)
multi.forest <- multi_causal_forest(X = X[train,], Y = Y[train], W = W[train])
gamma.hat.train <- double_robust_scores(multi.forest)
policy <- policy_tree(X[train,], gamma.hat.train)
plot(policy, leaf.labels = c("control", "treatment"))
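Besides plotting, it can be useful to inspect the fitted tree directly. A minimal sketch (assuming the print method and the "node.id" prediction type available in recent policytree versions):
# Print the splits of the learned tree
print(policy)

# Leaf membership of each test observation, to see how the policy partitions the data
node.id <- predict(policy, X[-train,], type = "node.id")
table(node.id)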
To evaluate the policy, we simply use the code we learned in the previous section, remembering that we can only use the test set for evaluation. For an experimental setting:
# Only valid for experimental setting!

# Predicting pi(X[i]) for each value in the test set.
# policytree labels each treatment as 1, 2, 3, ... Here, we subtract one so that
# zero represents control, as we're used to in the case of a binary treatment.
w.opt <- predict(policy, X[-train,]) - 1
A <- w.opt == 1

# Evaluate on the held-out observations only
Y.test <- Y[-train]
W.test <- W[-train]

# Copied and pasted from the Policy Evaluation section
value.estimate <- mean(Y.test[A & (W.test==1)]) * mean(A) + mean(Y.test[!A & (W.test==0)]) * mean(!A)
value.stderr <- sqrt(var(Y.test[A & (W.test==1)]) / sum(A & (W.test==1)) * mean(A)^2 + var(Y.test[!A & (W.test==0)]) / sum(!A & (W.test==0)) * mean(!A)^2)
print(paste("Value estimate:", value.estimate, "Std. Error:", value.stderr))
## [1] "Value estimate: 0.351117998129077 Std. Error: 0.00841548479712636"
Note: Very small policy tree leaves make it hard to reliably evaluate policy values, in particular when the treatment is categorical with many levels. You can avoid small tree leaves by increasing the min.node.size argument in policy_tree.
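For instance, here is a minimal sketch of re-fitting the tree with a larger minimum leaf size (the value 100 below is arbitrary and only for illustration):
# Require at least 100 observations per leaf to avoid tiny leaves
policy.bigleaf <- policy_tree(X[train,], gamma.hat.train, min.node.size = 100)
plot(policy.bigleaf, leaf.labels = c("control", "treatment"))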
7.2.2 Using grf
Given any estimate of the treatment effect \(\widehat{\tau}(.)\), we can define the policy that treats whenever the treatment effect is positive, or above some threshold \(C\) indicating the cost of treatment, \[\begin{align} \tag{7.1} \pi(x) = \mathbb{I}\{ \widehat{\tau}(x) \geq C \}. \end{align}\]
In the example below, we estimate the treatment effect via grf, but any other non-parametric method (including the glmnet+splines examples we saw in previous sections) would be valid.
train <- 1:(n/2)
cf <- causal_forest(X[train,], Y[train], W[train], num.trees=200)  # increase num.trees in a real application
tau.hat.test <- predict(cf, X[-train,])$predictions
w.opt <- 1*(tau.hat.test > 0)
Again, to evaluate the policy value in an experimental setting:
# Only valid in experimental setting!
A <- w.opt == 1

# Evaluate on the held-out observations only
Y.test <- Y[-train]
W.test <- W[-train]

# Copied and pasted from the Policy Evaluation section
value.estimate <- mean(Y.test[A & (W.test==1)]) * mean(A) + mean(Y.test[!A & (W.test==0)]) * mean(!A)
value.stderr <- sqrt(var(Y.test[A & (W.test==1)]) / sum(A & (W.test==1)) * mean(A)^2 + var(Y.test[!A & (W.test==0)]) / sum(!A & (W.test==0)) * mean(!A)^2)
print(paste("Value estimate:", value.estimate, "Std. Error:", value.stderr))
## [1] "Value estimate: 0.347418681830061 Std. Error: 0.00833062316678724"