Incorrect Code in Chapter 20 (and theoretical nitpicking) #402

Open
@aliquod

First of all, thank you for making this very accessible book!

In the section about continuous treatment in chapter 20, you defined

$$Y^*_i := (Y_i- \bar{Y})\dfrac{(T_i - M(T_i))}{(T_i - M(T_i))^2}$$

to be the pseudo-outcome[^1], and then you dropped the denominator because you are interested in comparing treatment effects, not in their absolute values. But dropping it does not preserve order[^2]. Instead, why not simplify it to

$$Y^*_i = \dfrac{Y_i- \bar{Y}}{T_i - M(T_i)}?$$
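To make the ordering concern concrete, here is a minimal sketch with made-up residuals for two hypothetical segments (the column names `y_res`, `t_res`, `ratio_form`, and `product_form` are my own, not from the book). It only illustrates that the two forms can rank segments differently; it is not meant as an estimator.

```python
import pandas as pd

# Made-up residuals for two hypothetical segments, A and B
df = pd.DataFrame({
    "segment": ["A", "A", "B", "B"],
    "y_res": [1.0, 1.0, 3.0, 3.0],  # stands in for Y_i - Y_bar
    "t_res": [0.5, 0.5, 3.0, 3.0],  # stands in for T_i - M(T_i)
})

# Simplified pseudo-outcome: one factor of the treatment residual cancels
df["ratio_form"] = df["y_res"] / df["t_res"]

# Pseudo-outcome with the denominator dropped
df["product_form"] = df["y_res"] * df["t_res"]

# Average each form within a segment, as one would to compare effects
by_segment = df.groupby("segment")[["ratio_form", "product_form"]].mean()
print(by_segment)
```

With these toy numbers, segment A has the larger average under the ratio form but the smaller one under the product form, so the ranking of the two segments flips.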

Now on to the actual issue: the code block that comes after

$$Y^*_i = (Y_i- \bar{Y})(T_i - M(T_i))$$

is

y_star_cont = (train["price"] - train["price"].mean()
               *train["sales"] - train["sales"].mean())

but this is missing some parentheses, so it actually computes

$$Y^*_i \overset{???}{=} Y_i- (\bar{Y} \times T_i) - M(T_i).$$
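A small check of the precedence problem, plus what I presume the intended parenthesization is. The `train` frame below is a toy stand-in with made-up values, not the book's data, and the corrected expression is my assumption about what the formula calls for.

```python
import pandas as pd

# Toy stand-in for the book's `train` frame; the values are made up
train = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0],
    "sales": [10.0, 8.0, 7.0, 3.0],
})

# As written in the chapter: `*` binds tighter than `-`,
# so only mean(price) * sales is multiplied
as_written = (train["price"] - train["price"].mean()
              * train["sales"] - train["sales"].mean())

# The same expression with the implicit grouping made explicit
explicit = (train["price"]
            - (train["price"].mean() * train["sales"])
            - train["sales"].mean())

# Presumably intended: the product of the two centered columns
y_star_cont = ((train["price"] - train["price"].mean())
               * (train["sales"] - train["sales"].mean()))

print(as_written.equals(explicit))     # same computation
print(as_written.equals(y_star_cont))  # differs from the intended product
```

Wrapping each centered column in its own parentheses is enough to match the formula; nothing else in the line needs to change.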

Footnotes

  1. I assume the denominator is an estimate of the conditional variance $\mathrm{Var}(T \mid X)$, but for most regression methods the squared residual underestimates it.

  2. In the end we average these values to estimate the CATE. In the randomized-treatment case every term is scaled by the same σ², so the scaling can be undone without changing the order; here each term has a different factor.
