Fixes #1501 #1502

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
r-package/grf/vignettes/diagnostics.Rmd (35 additions, 0 deletions)
@@ -80,6 +80,41 @@
ate.high[["estimate"]] - ate.low[["estimate"]] +
c(-1, 1) * qnorm(0.975) * sqrt(ate.high[["std.err"]]^2 + ate.low[["std.err"]]^2)
```

While this approach may give some qualitative insight into heterogeneity, the grouping is naive: the doubly robust scores used to determine the subgroups are not independent of the scores used to estimate the group ATEs (see Athey and Wager, 2019).

To avoid this, we can use a cross-fitting approach: split the data into two folds, determine the "high"/"low" groups in each fold with a forest fit on the other fold, and then estimate the ATEs with the default out-of-bag predictions via the [average_treatment_effect](https://grf-labs.github.io/grf/reference/average_treatment_effect.html) function.

```{r}
# Split the data into two folds of (roughly) equal size.
folds <- sample(rep(1:2, length.out = nrow(X)))
idxA <- which(folds == 1)
idxB <- which(folds == 2)

# Fit a separate causal forest on each fold.
cfA <- causal_forest(X[idxA,], Y[idxA], W[idxA])
cfB <- causal_forest(X[idxB,], Y[idxB], W[idxB])

# Define the "high"/"low" subgroups in each fold using CATE predictions
# from the forest fit on the *other* fold.
tau.hatB <- predict(cfA, newdata = X[idxB,])$predictions
high.effectB <- tau.hatB > median(tau.hatB)
tau.hatA <- predict(cfB, newdata = X[idxA,])$predictions
high.effectA <- tau.hatA > median(tau.hatA)

# Estimate the subgroup ATEs within each fold.
ate.highA <- average_treatment_effect(cfA, subset = high.effectA)
ate.lowA <- average_treatment_effect(cfA, subset = !high.effectA)
ate.highB <- average_treatment_effect(cfB, subset = high.effectB)
ate.lowB <- average_treatment_effect(cfB, subset = !high.effectB)
```

This gives us a 95% confidence interval for the difference in ATE between the two subgroups in each fold, using the same approach as above.

```{r}
# Fold A: difference in subgroup ATEs with a 95% confidence interval.
ate.highA[["estimate"]] - ate.lowA[["estimate"]] +
c(-1, 1) * qnorm(0.975) * sqrt(ate.highA[["std.err"]]^2 + ate.lowA[["std.err"]]^2)

# Fold B: the same comparison on the other fold.
ate.highB[["estimate"]] - ate.lowB[["estimate"]] +
c(-1, 1) * qnorm(0.975) * sqrt(ate.highB[["std.err"]]^2 + ate.lowB[["std.err"]]^2)
```
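
If a single summary across folds is desired, one option is to average the two fold-level differences. This is only a sketch: it treats the two fold estimates as approximately independent, which is an assumption, since the subgroup definitions share data across folds.

```{r}
# Sketch: combine the two fold-level differences into one cross-fit estimate.
# Treats the fold estimates as approximately independent (an assumption).
diff.A <- ate.highA[["estimate"]] - ate.lowA[["estimate"]]
diff.B <- ate.highB[["estimate"]] - ate.lowB[["estimate"]]
se.A <- sqrt(ate.highA[["std.err"]]^2 + ate.lowA[["std.err"]]^2)
se.B <- sqrt(ate.highB[["std.err"]]^2 + ate.lowB[["std.err"]]^2)
(diff.A + diff.B) / 2 + c(-1, 1) * qnorm(0.975) * sqrt(se.A^2 + se.B^2) / 2
```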

For another way to assess heterogeneity, see the function [rank_average_treatment_effect](https://grf-labs.github.io/grf/reference/rank_average_treatment_effect.html) and the accompanying [vignette](https://grf-labs.github.io/grf/articles/rate.html).
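
As an illustration (a sketch reusing the cross-fitting objects above; using the default AUTOC target is our assumption), the rank-weighted ATE on fold B can be estimated by prioritizing its observations with the CATE predictions from the fold-A forest:

```{r}
# Sketch: RATE on fold B, ranking observations by tau.hatB (predictions
# from the forest fit on fold A), so the priorities are cross-fit.
rate.B <- rank_average_treatment_effect(cfB, priorities = tau.hatB)
rate.B
# An approximate 95% confidence interval for the AUTOC.
rate.B$estimate + c(-1, 1) * qnorm(0.975) * rate.B$std.err
```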

Athey et al. (2017) suggest a bias measure to gauge how much work the propensity and outcome models have to do to get an unbiased estimate, relative to looking at a simple difference-in-means: $bias(x) = (e(x) - p) \times (p(\mu(0, x) - \mu_0) + (1 - p)(\mu(1, x) - \mu_1))$.
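
As a rough sketch of how this could be computed (assuming a causal forest `cf` fit on the full data, with $\mu_0$ and $\mu_1$ approximated by the sample means of the estimated conditional means):

```{r}
# Sketch (assumes `cf` <- causal_forest(X, Y, W) has been fit).
# Recover mu(w, x) from the forest's stored nuisance estimates, using
# Y.hat = W.hat * mu(1, x) + (1 - W.hat) * mu(0, x) and tau = mu(1, x) - mu(0, x).
p <- mean(W)
tau.hat <- predict(cf)$predictions
mu.hat.0 <- cf$Y.hat - cf$W.hat * tau.hat
mu.hat.1 <- cf$Y.hat + (1 - cf$W.hat) * tau.hat
bias <- (cf$W.hat - p) * (p * (mu.hat.0 - mean(mu.hat.0)) +
  (1 - p) * (mu.hat.1 - mean(mu.hat.1)))
hist(bias / sd(Y), main = "Bias relative to sd(Y)")
```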