Description
This issue is about a method proposed in the tutorial *Evaluating a causal forest fit*.
One heuristic method for detecting heterogeneity described in the tutorial over-rejects under the null, I believe due to the winner's curse: the same model is used both to determine the subgroups and to estimate their effects. Cross-fitting resolves this problem, and I have proposed a small modification to the tutorial that adds this suggestion in pull request #1502.
Description of the bug
```r
# Rank units by their estimated CATEs and split at the median.
tau.hat <- predict(cf)$predictions
high.effect <- tau.hat > median(tau.hat)

# Estimate the ATE within each subgroup using the same forest.
ate.high <- average_treatment_effect(cf, subset = high.effect)
ate.low <- average_treatment_effect(cf, subset = !high.effect)

# 95% confidence interval for the difference in subgroup ATEs.
ate.high[["estimate"]] - ate.low[["estimate"]] +
  c(-1, 1) * qnorm(0.975) * sqrt(ate.high[["std.err"]]^2 + ate.low[["std.err"]]^2)
#> [1] 0.6591796 1.0443646
```
(This is just the method as given in the tutorial; the bug is the rejection rate of the resulting interval under the null.)
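A cross-fit variant avoids the winner's curse by using different data to define the subgroups and to estimate their effects. A minimal sketch of one such split-sample approach (the fold construction and variable names here are illustrative, not necessarily the version proposed in the pull request):

```r
# Sketch: split-sample cross-fitting (illustrative, assumes X, Y, W as in the tutorial).
n <- length(Y)
fold <- sample(rep(1:2, length.out = n))

# Fit a forest on fold 1 only.
cf1 <- causal_forest(X[fold == 1, ], Y[fold == 1], W[fold == 1])

# Rank fold-2 observations using the fold-1 forest ...
tau.hat.2 <- predict(cf1, X[fold == 2, ])$predictions
high.effect <- tau.hat.2 > median(tau.hat.2)

# ... and estimate the subgroup ATEs with a forest fit on fold 2,
# so subgroup selection and estimation never share data.
cf2 <- causal_forest(X[fold == 2, ], Y[fold == 2], W[fold == 2])
ate.high <- average_treatment_effect(cf2, subset = high.effect)
ate.low <- average_treatment_effect(cf2, subset = !high.effect)
```

Swapping the roles of the two folds and averaging would recover some of the lost efficiency.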
Steps to reproduce
Even when the sharp null is true (i.e., treatment is not associated with outcomes), this method rejects at higher-than-nominal rates.
I used the code from the tutorial and created a gist showing the over-rejection here:
https://gist.github.com/mollyow/c1690ac8fd4a8d333d61cdefeeef82a9
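The over-rejection can be checked with a simple Monte Carlo under the sharp null (outcome independent of treatment). This sketch is illustrative and not the exact code in the gist; the sample size and number of replications are arbitrary:

```r
# Illustrative simulation: rejection rate of the tutorial's heuristic
# under the sharp null (Y generated independently of W).
library(grf)
set.seed(1)

rejects <- replicate(200, {
  n <- 1000; p <- 5
  X <- matrix(rnorm(n * p), n, p)
  W <- rbinom(n, 1, 0.5)
  Y <- rnorm(n)  # sharp null: no treatment effect for any unit

  cf <- causal_forest(X, Y, W)
  tau.hat <- predict(cf)$predictions
  high <- tau.hat > median(tau.hat)

  hi <- average_treatment_effect(cf, subset = high)
  lo <- average_treatment_effect(cf, subset = !high)
  z <- (hi[["estimate"]] - lo[["estimate"]]) /
    sqrt(hi[["std.err"]]^2 + lo[["std.err"]]^2)
  abs(z) > qnorm(0.975)
})

mean(rejects)  # a valid 5% test should reject at a rate near 0.05
```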
A longer write-up is available here: https://alexandercoppock.com/testing_with_grf.pdf
GRF version
2.4.0 (but the issue concerns only the tutorial, not the underlying code).