When randomization is clean, every estimator should land on the truth — and here it does
Nagoya University (GSID)
June 11, 2026
Act I
The hard part isn’t sending the money — it is proving it worked.
So we simulate a known truth: the program raises log consumption by \(0.12\) (roughly a 12% increase). Can the estimators recover it?
| Estimator family | Estimand | Estimate |
|---|---|---|
| Cross-sectional (RA / IPW / DR) | ATE (offer) | 0.113 |
| Difference-in-differences | ATT (offer) | 0.135–0.137 |
| Endogenous treatment (IV) | ATE (receipt) | 0.147 |
| True effect | — | 0.12 |
Twelve specifications, three estimands, one truth at \(0.12\) — and every 95% CI covers it.
Act II
- treat: the offer, randomized within poverty strata (intent-to-treat)
- D: actual receipt, endogenous; only 85% of offered households took up

The wedge: random treat is exogenous, while actual D is a choice. Most of the deck estimates the offer effect; Act III’s coda returns to receipt.
Standardized mean differences for all baseline covariates. Dashed lines mark the \(\pm 10\%\) rule of thumb; female is the only borderline case at \(\approx 9.3\%\).
| Covariate | Control | Treatment | SMD |
|---|---|---|---|
| Log consumption \(y\) | 10.025 | 10.006 | <0.05 |
| Age | 35.34 | 34.93 | <0.05 |
| Education | 11.97 | 12.08 | 0.052 |
| Female-headed | 0.484 | 0.531 | 0.093 |
| Poverty | 0.307 | 0.318 | <0.05 |
\(p = 0.038\) flags female, but the SMD of \(9.3\%\) is the right lens — large \(n\) makes tiny gaps “significant.”
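The following minimal Python sketch compares the two lenses using only the rounded female-headed shares from the table. The arm sizes passed to the hypothetical `two_sample_p` helper are assumptions, because the deck does not report them here. The pooled-SD formula is one common SMD definition and not necessarily the one behind the figure. The SMD comes out near 0.094, which is close to the reported 9.3% given rounding. It does not move as n grows, while the p-value for the same gap collapses.

```python
import math


def smd_binary(p_t: float, p_c: float) -> float:
    """Standardized mean difference for a 0/1 covariate, pooled-SD version."""
    pooled_sd = math.sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)
    return (p_t - p_c) / pooled_sd


def two_sample_p(p_t: float, p_c: float, n_per_arm: int) -> float:
    """Two-sided p-value of a difference-in-proportions z-test, equal arm sizes."""
    se = math.sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / n_per_arm)
    z = (p_t - p_c) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Female-headed shares from the balance table (already rounded).
p_treat, p_control = 0.531, 0.484
print(f"SMD = {smd_binary(p_treat, p_control):.3f}")  # n appears nowhere in the formula

# Hypothetical arm sizes: same gap, very different p-values.
for n in (100, 1000, 5000):
    print(n, round(two_sample_p(p_treat, p_control, n), 4))
```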
\[\hat\tau_{\text{baseline}} = -0.024 \quad (p = 0.196)\]
Run the doubly-robust estimator on baseline only, before the program exists. A null “effect” is exactly what a clean randomization predicts.
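A simulation can check the placebo logic. This sketch swaps the doubly-robust estimator for a plain difference in means. It draws a hypothetical baseline wave: normal log consumption with mean 10 and SD 0.5, 1,000 households, seed 7. Because the program does not yet exist in these data, a 5% test should reject in only about 5% of replications.

```python
import math
import random
import statistics


def diff_in_means_z(y, t):
    """Treated-minus-control difference in means and its Welch z-statistic."""
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    diff = statistics.fmean(y1) - statistics.fmean(y0)
    se = math.sqrt(statistics.variance(y1) / len(y1) + statistics.variance(y0) / len(y0))
    return diff, diff / se


def placebo_rejection_rate(n=1000, reps=400, seed=7):
    """Share of simulated baselines in which a 5% test wrongly finds an 'effect'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Baseline wave: the program does not exist yet, so assignment cannot matter.
        y = [rng.gauss(10.0, 0.5) for _ in range(n)]
        t = [rng.randint(0, 1) for _ in range(n)]
        _, z = diff_in_means_z(y, t)
        rejections += abs(z) > 1.96
    return rejections / reps


rate = placebo_rejection_rate()
print(f"placebo rejection rate: {rate:.3f}")  # should hover around 0.05
```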
Overidentification test: \(\chi^2(5) = 3.22\), \(p = 0.667\); no detectable residual imbalance after weighting.
Log-consumption densities for treatment vs. control, before and after AIPW weighting. The weighted curves overlap almost perfectly.
Estimated propensity-score densities for both groups span roughly 0.43–0.55 with heavy overlap and no mass near 0 or 1.
\[\text{ATE} = E[Y(1) - Y(0)]\]
\[\text{ATT} = E[Y(1) - Y(0) \mid T = 1]\]
Under randomization the treated are a random draw from the whole sample, so ATE \(=\) ATT even when effects vary across households. DiD identifies only the ATT.
Randomized treatment is independent of potential outcomes, so the raw difference in means is already unbiased.
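A short Monte Carlo sketch illustrates the unbiasedness claim. The data-generating process is a hypothetical stand-in for the deck's simulation: normal baseline log consumption with SD 0.5, a constant 0.12 effect, fair-coin assignment, and 2,000 households per draw. Averaged over 200 draws, the raw difference in means should sit very close to 0.12.

```python
import random
import statistics

TRUE_EFFECT = 0.12  # log points, the simulated program effect


def one_trial(rng, n=2000):
    """Draw one randomized sample and return the raw difference in means."""
    treated, control = [], []
    for _ in range(n):
        y0 = rng.gauss(10.0, 0.5)      # potential outcome without the program
        y1 = y0 + TRUE_EFFECT          # potential outcome with the program
        if rng.random() < 0.5:         # coin-flip assignment, independent of (y0, y1)
            treated.append(y1)
        else:
            control.append(y0)
    return statistics.fmean(treated) - statistics.fmean(control)


rng = random.Random(2026)
estimates = [one_trial(rng) for _ in range(200)]
print(f"mean of estimates: {statistics.fmean(estimates):.4f}")
print(f"sampling SD:       {statistics.stdev(estimates):.4f}")
```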
No confounding here, so covariates don’t fix bias — they soak up residual variation and tighten the estimate.
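This sketch shows the variance reduction. It compares the raw difference in means with a regression-adjustment estimator in the spirit of `teffects ra`. That estimator fits a separate line of y on one covariate in each arm and then averages the predicted differences over everyone. The covariate `x` and its slope are hypothetical. Because `x` explains most of the outcome variance here, the gain is much larger than the one in the deck's table.

```python
import random
import statistics


def ols_line(x, y):
    """Intercept and slope of a least-squares line of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return my - (sxy / sxx) * mx, sxy / sxx


def arm(values, t, which):
    """Subset of values belonging to one arm (1 treated, 0 control)."""
    return [v for v, ti in zip(values, t) if ti == which]


def one_draw(rng, n=1000, effect=0.12):
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]            # hypothetical baseline covariate
    t = [int(rng.random() < 0.5) for _ in range(n)]        # randomized treatment
    y = [10.0 + 0.4 * xi + effect * ti + rng.gauss(0.0, 0.2) for xi, ti in zip(x, t)]
    dim = statistics.fmean(arm(y, t, 1)) - statistics.fmean(arm(y, t, 0))
    a1, b1 = ols_line(arm(x, t, 1), arm(y, t, 1))           # outcome model, treated arm
    a0, b0 = ols_line(arm(x, t, 0), arm(y, t, 0))           # outcome model, control arm
    # RA: predict both potential outcomes for everyone, then average the gap.
    ra = statistics.fmean([(a1 + b1 * xi) - (a0 + b0 * xi) for xi in x])
    return dim, ra


rng = random.Random(11)
draws = [one_draw(rng) for _ in range(200)]
sd_dim = statistics.stdev([d for d, _ in draws])
sd_ra = statistics.stdev([r for _, r in draws])
mean_ra = statistics.fmean([r for _, r in draws])
print(f"SD, diff-in-means: {sd_dim:.4f}   SD, RA: {sd_ra:.4f}   mean RA: {mean_ra:.4f}")
```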
Doubly robust (AIPW / IPWRA) fits both an outcome model and a treatment model, and is consistent if either one is correctly specified.
\[\hat\tau_{DR}^{ATE} = \frac1N \sum_{i=1}^{N}\Big[\hat\mu_1(X_i) - \hat\mu_0(X_i) + \frac{T_i\,(Y_i - \hat\mu_1(X_i))}{\hat p(X_i)} - \frac{(1-T_i)(Y_i - \hat\mu_0(X_i))}{1 - \hat p(X_i)}\Big]\]
First two terms: the RA prediction. Last two: IPW residuals that cancel RA’s bias.
Belt and suspenders: if the belt fails, the suspenders hold. Only both wrong is fatal.
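The AIPW formula lets us check the belt-and-suspenders claim directly. The sketch below uses a hypothetical design with true propensity 0.5 and an outcome that is linear in one covariate. The hypothetical `aipw_ate` helper then receives deliberately wrong inputs: a constant outcome prediction, a constant propensity of 0.2, or both. If either model is right, the estimate should stay near 0.12. With both wrong, it should drift well above the truth.

```python
import random
import statistics


def aipw_ate(y, t, mu1, mu0, p):
    """AIPW ATE: RA prediction plus inverse-probability-weighted residuals."""
    terms = [
        m1 - m0 + ti * (yi - m1) / pi - (1 - ti) * (yi - m0) / (1 - pi)
        for yi, ti, m1, m0, pi in zip(y, t, mu1, mu0, p)
    ]
    return statistics.fmean(terms)


rng = random.Random(42)
n = 100_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
t = [int(rng.random() < 0.5) for _ in range(n)]          # true propensity is 0.5
y = [10.0 + 0.4 * xi + 0.12 * ti + rng.gauss(0.0, 0.2) for xi, ti in zip(x, t)]

right_mu1 = [10.12 + 0.4 * xi for xi in x]                # correct outcome models
right_mu0 = [10.0 + 0.4 * xi for xi in x]
wrong_mu = [statistics.fmean(y)] * n                       # ignores x and treatment
right_p = [0.5] * n
wrong_p = [0.2] * n                                        # badly wrong propensity

cases = {
    "outcome wrong, propensity right": aipw_ate(y, t, wrong_mu, wrong_mu, right_p),
    "outcome right, propensity wrong": aipw_ate(y, t, right_mu1, right_mu0, wrong_p),
    "both wrong": aipw_ate(y, t, wrong_mu, wrong_mu, wrong_p),
}
for label, est in cases.items():
    print(f"{label:32s} {est:.3f}")
```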
```stata
* Cross-sectional estimators use the endline wave only
keep if post==1

* RA: models the outcome only
teffects ra (y c.age c.edu i.female i.poverty) (treat), ate

* IPW: models the treatment only
teffects ipw (y) (treat c.age c.edu i.female i.poverty), ate

* Doubly robust: models both
teffects ipwra (y c.age c.edu i.female i.poverty) (treat c.age c.edu i.female i.poverty), vce(robust)
```

0.113
ATE of the offer · RA, IPW, and doubly-robust IPWRA agree to three decimals (SE 0.019)
| Method | Models | Estimand | Estimate | 95% CI |
|---|---|---|---|---|
| Simple diff-in-means | none | ATE | 0.116 | [0.078, 0.154] |
| Regression adjustment | outcome | ATE | 0.113 | [0.075, 0.150] |
| Inverse prob. weighting | treatment | ATE | 0.113 | [0.075, 0.150] |
| IPWRA (doubly robust) | both | ATE | 0.113 | [0.075, 0.150] |
| True effect | — | — | 0.12 | — |
Adjusted estimates (\(0.113\)) sit just below the raw \(0.116\). That shift reflects adjustment for the chance gender imbalance; the precision gain itself is small (CI width \(0.075\) vs \(0.076\)).
Cross-sectional adjustment can only control for what you observe.
Difference-in-differences compares each household to itself over time, cancelling every time-invariant unobservable — motivation, geography, family culture.
\[\hat\tau_{DiD} = \underbrace{(\bar Y_{T,post} - \bar Y_{T,pre})}_{\text{effect}\,+\,\text{trend}} - \underbrace{(\bar Y_{C,post} - \bar Y_{C,pre})}_{\text{trend only}}\]
Treated change = effect + trend. Control change = trend alone. Subtract — the trend cancels.
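This sketch shows why differencing helps, under a hypothetical departure from the RCT. Households self-select into treatment on a time-invariant trait `alpha` that also raises consumption in both waves, and the common trend of 0.05 is likewise assumed. The endline-only gap absorbs that selection. The four-means DiD from the formula above cancels `alpha` and recovers the 0.12 effect.

```python
import random
import statistics

rng = random.Random(3)
n, effect, trend = 20_000, 0.12, 0.05
rows = []  # (treat, y_pre, y_post) per household
for _ in range(n):
    alpha = rng.gauss(0.0, 0.5)                    # time-invariant unobservable
    treat = int(alpha + rng.gauss(0.0, 0.5) > 0)   # hypothetical self-selection on alpha
    y_pre = 10.0 + alpha + rng.gauss(0.0, 0.1)
    y_post = 10.0 + alpha + trend + effect * treat + rng.gauss(0.0, 0.1)
    rows.append((treat, y_pre, y_post))

PRE, POST = 1, 2  # column positions inside each row


def group_mean(arm, column):
    """Mean of one column for treated (arm=1) or control (arm=0) households."""
    return statistics.fmean([r[column] for r in rows if r[0] == arm])


post_gap = group_mean(1, POST) - group_mean(0, POST)     # effect + selection on alpha
did = (group_mean(1, POST) - group_mean(1, PRE)) - (group_mean(0, POST) - group_mean(0, PRE))
print(f"endline-only gap: {post_gap:.3f}   DiD: {did:.3f}")
```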
Identification rides on parallel trends — plausible here because randomization equalized the groups at baseline.
The basic DiD estimate of \(0.135\) sits just above the cross-sectional \(0.113\); the wider SE (\(0.027\) vs \(0.019\)) is the price of differencing.
\[\hat\tau_{DR}^{DiD} = \frac{1}{N_1}\sum_{i=1}^{N}\big[w_1(D_i) - w_0(D_i, X_i)\big]\big[\Delta Y_i - \hat\mu_{0,\Delta}(X_i)\big]\]
Subtract the control group’s predicted change from each household’s actual change \(\Delta Y_i\), then IPW-reweight the controls so they resemble the treated. Consistent if either model is right.
Needs only conditional parallel trends — covariate-specific time trends are allowed.
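This is a sketch of the DR-DiD estimator with one hypothetical binary covariate. The two covariate groups follow different time trends, so only conditional parallel trends hold. Because the covariate is binary, cell means serve as correctly specified propensity and outcome-change models. The weights are normalized to mean one, which is one standard normalization (the panel form of Sant'Anna and Zhao's DR-DiD). Averaging over all N households with \(w_1 = D_i/\bar D\) then equals the slide's \(1/N_1\) sum over treated households. Basic DiD should be biased upward here (about 0.16), while DR-DiD should land near 0.12.

```python
import random
import statistics

rng = random.Random(5)
n, effect = 20_000, 0.12
x, d, dy = [], [], []
for _ in range(n):
    xi = int(rng.random() < 0.5)                      # hypothetical binary covariate
    di = int(rng.random() < (0.7 if xi else 0.3))     # treatment more common when x = 1
    trend = 0.10 if xi else 0.0                       # covariate-specific time trend
    x.append(xi)
    d.append(di)
    dy.append(trend + effect * di + rng.gauss(0.0, 0.1))  # Delta Y = post minus pre

# Saturated nuisance models (cell means), correctly specified by construction.
p_hat = {v: statistics.fmean([di for xi, di in zip(x, d) if xi == v]) for v in (0, 1)}
mu0_delta = {
    v: statistics.fmean([yi for xi, di, yi in zip(x, d, dy) if xi == v and di == 0])
    for v in (0, 1)
}

# w1 = D / mean(D); w0 = odds-weighted controls, normalized to mean one.
raw_w0 = [p_hat[xi] * (1 - di) / (1 - p_hat[xi]) for xi, di in zip(x, d)]
mean_d, mean_w0 = statistics.fmean(d), statistics.fmean(raw_w0)
dr_did = statistics.fmean([
    (di / mean_d - w / mean_w0) * (yi - mu0_delta[xi])
    for xi, di, yi, w in zip(x, d, dy, raw_w0)
])

basic_did = (statistics.fmean([yi for yi, di in zip(dy, d) if di == 1])
             - statistics.fmean([yi for yi, di in zip(dy, d) if di == 0]))
print(f"basic DiD: {basic_did:.3f}   DR-DiD: {dr_did:.3f}")
```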
Act III
0.137
ATT of the offer · drdid and xthdidregress aipw match to four decimals (SE 0.027)
| Method | Estimand | Data | Estimate | 95% CI |
|---|---|---|---|---|
| RA / IPW / DR | ATE | endline | 0.113 | [0.075, 0.150] |
| Basic DiD | ATT | both waves | 0.135 | [0.081, 0.188] |
| DR-DiD (drdid) | ATT | both waves | 0.137 | [0.084, 0.191] |
| DR-DiD (xthdidregress) | ATT | both waves | 0.137 | [0.084, 0.191] |
| True effect | — | — | 0.12 | — |
DiD’s value is not a tighter SE — it is robustness to time-invariant unobservables.
Most of the deck estimated the effect of the offer (treat, intent-to-treat). The effect of receipt (D) needs the random offer as an instrument, because take-up is a choice.
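The Wald ratio, the simplest instrumental-variables estimator, makes this logic concrete. It is not what `etregress` fits. It simply divides the offer effect (ITT) by the take-up gap. In this hypothetical design, households with a higher unobserved trait `u` take up more often, so comparing receivers with non-receivers is biased. The ratio recovers the 0.12 receipt effect. Because the effect is constant here, the complier effect equals the ATE.

```python
import random
import statistics

rng = random.Random(8)
n, effect = 50_000, 0.12
z, d, y = [], [], []
for _ in range(n):
    u = rng.gauss(0.0, 0.3)                               # unobserved trait affecting consumption
    zi = int(rng.random() < 0.5)                          # random offer (the instrument)
    takes_up = rng.random() < (0.95 if u > 0 else 0.75)   # hypothetical: higher-u households take up more
    di = zi * int(takes_up)                               # receipt requires an offer
    z.append(zi)
    d.append(di)
    y.append(10.0 + u + effect * di + rng.gauss(0.0, 0.1))


def mean_where(values, flags, value):
    """Mean of values over observations whose flag equals value."""
    return statistics.fmean([v for v, f in zip(values, flags) if f == value])


itt = mean_where(y, z, 1) - mean_where(y, z, 0)           # effect of the offer
first_stage = mean_where(d, z, 1) - mean_where(d, z, 0)   # take-up gap, about 0.85
wald = itt / first_stage                                  # effect of receipt on compliers
naive = mean_where(y, d, 1) - mean_where(y, d, 0)         # receivers vs everyone else
print(f"ITT {itt:.3f}  first stage {first_stage:.3f}  Wald {wald:.3f}  naive {naive:.3f}")
```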
| Specification | Estimand | Estimate | 95% CI |
|---|---|---|---|
| etregress (IV) | ATE (receipt) | 0.147 | [0.099, 0.195] |
| teffects ipwra + \(y_0\) | ATE (receipt) | 0.117 | [0.054, 0.180] |
The doubly-robust receipt estimate (\(0.117\)) lands closest to the truth; both cover \(0.12\).
Propensity-score densities for receivers vs. non-receivers of the transfer show ample common support after IPWRA weighting.
Objection. Twelve estimators all near 0.12 — surely that proves the cash transfer caused the gain?
Response. The randomization identifies the effect; the estimators merely recover it efficiently and robustly. On real observational data the same methods rest on conditional independence and parallel trends — assumptions a clean RCT hands you for free but the world rarely does.