7 Policy Learning I - Binary Treatment

Before you jump into this chapter, we recommend that you've read the previous section on policy evaluation.

library(grf)
library(policytree)

7.1 Learning a policy

In the previous section you saw how to evaluate a given policy. Now, let’s see how to learn a policy from data.

set.seed(1)
n <- 1000
p <- 4
X <- matrix(runif(n*p), n, p)                    # covariates
e <- 1/(1+exp(-3*(X[,1]-.5)-3*(X[,2]-.5)))       # treatment propensity
W <- rbinom(n, prob = e, size = 1)               # treatment assignment
Y <- .5*X[,2] + (X[,1] - .5)*W + .01 * rnorm(n)  # outcome; the CATE is X1 - 0.5

y.norm <- (Y - min(Y))/(max(Y)-min(Y))           # rescaled outcome, just for plotting
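The rescaled outcome y.norm is only used for visualization. As a minimal sketch (the particular plot below is illustrative, not part of the original analysis), we can shade each point by its rescaled outcome using base R graphics:

# Plot the first two covariates, with darker points indicating larger outcomes.
plot(X[,1], X[,2], pch = 19, col = gray(1 - y.norm), xlab = "X1", ylab = "X2")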

7.2 Learning policies

7.2.1 Using policytree

The policytree package learns a shallow decision tree that maps covariates to treatment assignments. Tree-based policies like these are easy to describe to a stakeholder. See this short policytree tutorial for other similar examples.

train <- 1:(n/2)

# Fit a causal forest on the training data and compute doubly robust (AIPW)
# scores, one column per arm: control and treatment.
cf.train <- causal_forest(X = X[train,], Y = Y[train], W = W[train])
gamma.hat.train <- double_robust_scores(cf.train)

# Fit a shallow (depth-2) tree assigning each observation to the arm with the
# highest estimated score.
policy <- policy_tree(X[train,], gamma.hat.train, depth = 2)
plot(policy, leaf.labels = c("control", "treatment"))
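The fitted object can also be printed to inspect the splits in text form (useful if the DiagrammeR package used for plotting is not installed):

print(policy)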

To evaluate the policy we simply use the code from the previous section, remembering that we can only use the test set for evaluation. For an experimental setting:

# Only valid for experimental setting!

# Predict pi(X[i]) for each observation in the test set.
# policytree labels each treatment as 1, 2, 3, ... Here we subtract one so that
# zero represents control, as we're used to in the case of a binary treatment.
w.opt <- predict(policy, X[-train,]) - 1
A <- w.opt == 1
Y.test <- Y[-train]
W.test <- W[-train]

# Copied and pasted from the Policy Evaluation section
value.estimate <- mean(Y.test[A & (W.test==1)]) * mean(A) + mean(Y.test[!A & (W.test==0)]) * mean(!A)
value.stderr <- sqrt(var(Y.test[A & (W.test==1)]) / sum(A & (W.test==1)) * mean(A)^2 + var(Y.test[!A & (W.test==0)]) / sum(!A & (W.test==0)) * mean(!A)^2)
print(paste("Value estimate:", value.estimate, "Std. Error:", value.stderr))
## [1] "Value estimate: 0.351117998129077 Std. Error: 0.00841548479712636"

Note: Very small policy tree leaves make it hard to reliably evaluate policy values, in particular when the treatment is categorical with many levels. You can avoid small tree leaves by increasing the min.node.size argument in policy_tree, as in the sketch below.
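For example, here is a minimal sketch refitting the tree above with a larger minimum leaf size (the value 50 and the name policy.big.leaves are illustrative, not from the original analysis):

# Require at least 50 training observations per leaf so that each leaf's
# value can be evaluated more reliably.
policy.big.leaves <- policy_tree(X[train,], gamma.hat.train, depth = 2, min.node.size = 50)
plot(policy.big.leaves, leaf.labels = c("control", "treatment"))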

7.2.2 Using grf

Given any estimate of the treatment effect \(\widehat{\tau}(\cdot)\), we can define the policy that treats whenever the treatment effect is positive, or above some threshold \(C\) indicating the cost of treatment,
\[\begin{align}
\tag{7.1}
\pi(x) = \mathbb{I}\{ \widehat{\tau}(x) \geq C \}.
\end{align}\]

In the example below, we estimate the treatment effect via grf, but any other non-parametric method (including the glmnet+splines examples we saw in previous sections) would be valid.

train <- 1:(n/2)

# Fit a causal forest on the training set.
cf <- causal_forest(X[train,], Y[train], W[train], num.trees=200)  # increase num.trees in a real application

# Estimate the CATE on the test set and treat whenever it is positive,
# i.e., rule (7.1) with cost C = 0.
tau.hat.test <- predict(cf, X[-train,])$predictions
w.opt <- 1*(tau.hat.test > 0)
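As an aside, if treatment carried a known cost, rule (7.1) says we would only treat units whose estimated benefit exceeds that cost. A minimal sketch, where the cost value is purely illustrative (the evaluation below keeps the zero-cost rule above):

# Hypothetical per-unit treatment cost (illustrative value only).
C <- 0.1
# Treat only when the estimated benefit exceeds the cost, as in (7.1).
w.opt.costly <- 1*(tau.hat.test >= C)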

Again, to evaluate the policy value in an experimental setting:

# Only valid in experimental setting!
A <- w.opt == 1
Y.test <- Y[-train]
W.test <- W[-train]

# Copied and pasted from the Policy Evaluation section
value.estimate <- mean(Y.test[A & (W.test==1)]) * mean(A) + mean(Y.test[!A & (W.test==0)]) * mean(!A)
value.stderr <- sqrt(var(Y.test[A & (W.test==1)]) / sum(A & (W.test==1)) * mean(A)^2 + var(Y.test[!A & (W.test==0)]) / sum(!A & (W.test==0)) * mean(!A)^2)
print(paste("Value estimate:", value.estimate, "Std. Error:", value.stderr))
## [1] "Value estimate: 0.347418681830061 Std. Error: 0.00833062316678724"

7.2.2.1 Aside: using predictions vs using doubly-robust scores

7.3 Heterogeneity via learned policies