From ATE to IATE to a better training-assignment rule
Nagoya University (GSID)
June 11, 2026
Act I
A government runs a job-training programme. The average effect is positive. So we train everyone — right?
Caseworkers steer the neediest into training, so a simple treated-versus-untreated comparison confuses the programme with who was selected. And even a correct average is not enough.
Forest plot · Naive, DoubleML, CausalForestDML, and the truth (orange star, 5.628). The naive interval sits entirely below the truth.
Act II
\[\text{ATE} = E[Y(1) - Y(0)]\]
\[\text{GATE}(z) = E[Y(1) - Y(0) \mid Z = z]\]
\[\text{IATE}(x) = E[Y(1) - Y(0) \mid X = x]\]
Population average → subgroup average → one number per person. Earn the coarse estimand before the fine one — the order is the discipline.
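A minimal simulation makes the three estimands concrete, assuming a made-up effect function (the names `iate`, `z`, `w` and every number below are hypothetical, not the deck's data-generating process). With both potential outcomes in hand, the ATE is one mean, each GATE is a mean within a proficiency group, and the IATE is a function of \(X\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical covariates: Z = Dutch proficiency (0 = low, 1 = mid, 2 = high), W = anything else
z = rng.integers(0, 3, size=n)
w = rng.normal(size=n)

def iate(z, w):
    """Hypothetical true IATE(x) = E[Y(1) - Y(0) | X = x] with X = (Z, W)."""
    return 7.0 - 1.5 * z + 0.5 * w

# Both potential outcomes are visible only because the data are simulated
y0 = 10.0 + 2.0 * w + rng.normal(size=n)
y1 = y0 + iate(z, w) + rng.normal(size=n)   # individual effect = IATE plus idiosyncratic noise
effect = y1 - y0

ate = effect.mean()                                  # one number for the population
gate = {k: effect[z == k].mean() for k in range(3)}  # one number per proficiency group
iate_values = iate(z, w)                             # one number per person

print(f"ATE {ate:.2f}")
print({k: round(v, 2) for k, v in gate.items()})
```

An individual's effect `y1 - y0` still differs from \(\text{IATE}(x)\) by noise: the IATE is the best prediction of that effect given \(X\), not the effect itself.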
The data are observational. We assume selection-on-observables: conditional on \(X\), treatment is as good as random.
\[D \perp \{Y(1), Y(0)\} \mid X\]
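This strong, untestable assumption licenses everything downstream, together with overlap: every jobseeker has a probability of training strictly between 0 and 1. Treatment is random only within levels of \(X\); because caseworkers select on \(X\), and \(X\) also drives outcomes, the unconditional difference in means is genuinely biased here, not merely imprecise.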
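A small simulation with made-up numbers reproduces the mechanism: selection depends on an observed "need" score that also lowers outcomes, so the raw difference in means is biased downward, while a comparison that holds \(X\) fixed recovers the effect. The constant effect and linear outcome model are assumptions that make plain OLS adjustment valid here; they are not the deck's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
tau = 5.6                                     # hypothetical constant true effect (months)

need = rng.normal(size=n)                     # observed X: higher = needier jobseeker
p = 1 / (1 + np.exp(-need))                   # caseworkers steer the needy into training
d = rng.binomial(1, p)
y = 20.0 - 3.0 * need + tau * d + rng.normal(size=n)   # needier jobseekers do worse anyway

# Naive difference in means mixes the effect with who was selected
naive = y[d == 1].mean() - y[d == 0].mean()

# Adjusting for X: OLS of Y on (1, D, X); valid here because the outcome model is linear
design = np.column_stack([np.ones(n), d, need])
adjusted = np.linalg.lstsq(design, y, rcond=None)[0][1]

print(f"naive {naive:.2f}, adjusted {adjusted:.2f}, truth {tau}")
```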
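Synthetic cohort modelled on the Flemish active labour market policy (ALMP) evaluation of Cockx, Lechner & Bollens (2023). Because the data are simulated, the true effects are known, so every estimator is benchmarked against them.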
Propensity-score histograms by treatment status; the two distributions overlap heavily across [0.2, 0.8].
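The overlap check behind this histogram can be sketched as follows, assuming simulated covariates and a logistic propensity model (the deck's DoubleML fit uses a random forest for \(m(X)\)). Out-of-fold scores avoid the optimism of in-sample predictions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n = 5_000
X = rng.normal(size=(n, 3))                              # hypothetical covariates
index = 0.6 * X[:, 0] - 0.4 * X[:, 1]
d = rng.binomial(1, 1 / (1 + np.exp(-index)))

# Out-of-fold propensity scores: each person is scored by a model not fit on them
m_hat = cross_val_predict(LogisticRegression(), X, d, cv=5, method="predict_proba")[:, 1]

# Overlap diagnostics: score ranges by group and the share in the central band
for group in (0, 1):
    s = m_hat[d == group]
    print(f"D={group}: min {s.min():.3f}, max {s.max():.3f}")
share_mid = np.mean((m_hat >= 0.2) & (m_hat <= 0.8))
print(f"share with propensity in [0.2, 0.8]: {share_mid:.1%}")
```

Thin overlap would show up as many scores near 0 or 1, exactly where the inverse-propensity weights in a doubly robust score become large and unstable.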
−0.52
Naive bias in months · estimate 5.111, 95% CI [4.93, 5.30], which fails to cover the true 5.628
\[\psi_i = g_1(X_i) - g_0(X_i) + \frac{D_i\,(Y_i - g_1(X_i))}{m(X_i)} - \frac{(1-D_i)\,(Y_i - g_0(X_i))}{1 - m(X_i)}\]
Outcome regression \(g_d(X) = E[Y \mid D=d, X]\) plus an inverse-propensity-weighted residual correction, with propensity score \(m(X) = P(D=1 \mid X)\). \(E[\psi_i] = \text{ATE}\) if either \(g\) or \(m\) is correctly specified: the “double” in doubly robust.
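The score can be computed by hand in a few lines. This sketch assumes linear outcome models and a logistic propensity model (correct here only because the simulated data are linear; the deck uses random forests), five-fold cross-fitting so that no observation's nuisance predictions come from a model trained on it, and propensity clipping at 0.01. `aipw_scores` is a hypothetical helper, not a DoubleML function.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold

def aipw_scores(X, d, y, n_folds=5, clip=0.01, seed=0):
    """Cross-fitted doubly robust (AIPW) scores psi_i; their mean estimates the ATE."""
    psi = np.empty(len(y))
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train, test in folds.split(X):
        Xtr, dtr, ytr = X[train], d[train], y[train]
        g1 = LinearRegression().fit(Xtr[dtr == 1], ytr[dtr == 1])  # g_1(X) = E[Y | D=1, X]
        g0 = LinearRegression().fit(Xtr[dtr == 0], ytr[dtr == 0])  # g_0(X) = E[Y | D=0, X]
        m = LogisticRegression().fit(Xtr, dtr)                     # m(X) = P(D=1 | X)
        g1_hat, g0_hat = g1.predict(X[test]), g0.predict(X[test])
        m_hat = np.clip(m.predict_proba(X[test])[:, 1], clip, 1 - clip)
        dt, yt = d[test], y[test]
        psi[test] = (g1_hat - g0_hat
                     + dt * (yt - g1_hat) / m_hat
                     - (1 - dt) * (yt - g0_hat) / (1 - m_hat))
    return psi

# Hypothetical data with a known ATE of 5
rng = np.random.default_rng(3)
n = 4_000
X = rng.normal(size=(n, 2))
d = rng.binomial(1, 1 / (1 + np.exp(-0.8 * X[:, 0])))
y = 2.0 * X[:, 0] + (5.0 + X[:, 1]) * d + rng.normal(size=n)

psi = aipw_scores(X, d, y)
ate = psi.mean()
se = psi.std(ddof=1) / np.sqrt(n)   # scores are approximately i.i.d., so SE = sd / sqrt(n)
print(f"ATE {ate:.3f} (SE {se:.3f})")
```

In large samples, replacing either nuisance model with a deliberately wrong one (for example a constant propensity) leaves the mean of `psi` centred on 5 as long as the other model stays correct; that is the double robustness property in action.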
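```python
from doubleml import DoubleMLData, DoubleMLIRM
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# df (pandas DataFrame with Y, D and covariates) and X_COLS (covariate names) are assumed defined earlier
dml_data = DoubleMLData(df, y_col="Y", d_cols="D", x_cols=X_COLS)
ml_g = RandomForestRegressor(n_estimators=200, min_samples_leaf=5)   # outcome model g_d(X)
ml_m = RandomForestClassifier(n_estimators=200, min_samples_leaf=5)  # propensity model m(X)
dml_irm = DoubleMLIRM(
    dml_data, ml_g=ml_g, ml_m=ml_m,
    n_folds=5, score="ATE", trimming_threshold=0.01,  # 5-fold cross-fitting; m(X) clipped to [0.01, 0.99]
)
dml_irm.fit(store_predictions=True)
ate, se = dml_irm.coef[0], dml_irm.se[0]  # ATE point estimate and its standard error
```
5.520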
DoubleML ATE, 95% CI [5.36, 5.68] · covers the true 5.628 · bias cut from −0.52 to −0.11 months; SE from 0.094 (naive) to 0.081
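The reported numbers are internally consistent, assuming both intervals are the usual normal approximation, estimate ± 1.96 × SE. A few lines reproduce the biases, intervals and coverage from the point estimates and standard errors on these slides.

```python
TRUE_ATE = 5.628  # known because the cohort is synthetic

def normal_ci(est, se, z=1.96):
    """Large-sample 95% interval: estimate ± 1.96 × SE."""
    return est - z * se, est + z * se

results = {}
for name, est, se in [("naive", 5.111, 0.094), ("DoubleML", 5.520, 0.081)]:
    lo, hi = normal_ci(est, se)
    covers = lo <= TRUE_ATE <= hi
    results[name] = (est - TRUE_ATE, lo, hi, covers)
    print(f"{name}: bias {est - TRUE_ATE:+.2f}, 95% CI [{lo:.2f}, {hi:.2f}], covers truth: {covers}")
# naive: bias -0.52, 95% CI [4.93, 5.30], covers truth: False
# DoubleML: bias -0.11, 95% CI [5.36, 5.68], covers truth: True
```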
Estimated GATE (steel) vs true GATE (orange) by Dutch proficiency; both decline monotonically and nearly coincide.
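One common way to estimate a GATE is to average the doubly robust scores \(\psi_i\) within each group, since \(E[\psi_i \mid Z = z] = \text{GATE}(z)\) when \(Z\) is part of \(X\) and the nuisances are correct. The sketch assumes a simulated cohort and, to stay short, plugs the true nuisance functions into \(\psi_i\); in practice these would be cross-fitted predictions. It is one standard approach, not necessarily how the figure was produced.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
z = rng.integers(0, 3, size=n)                    # hypothetical proficiency: 0 = low, 1 = mid, 2 = high
w = rng.normal(size=n)
m = 1 / (1 + np.exp(-(0.5 - 0.5 * z + 0.3 * w)))  # propensity m(X) with X = (Z, W)
d = rng.binomial(1, m)
g0 = 10.0 + 2.0 * w                               # g_0(X) = E[Y | D=0, X]
g1 = g0 + 7.0 - 1.5 * z                           # g_1(X); true GATE(z) = 7 - 1.5 z
y = np.where(d == 1, g1, g0) + rng.normal(size=n)

# Doubly robust score with the true nuisances (an oracle shortcut for this sketch)
psi = g1 - g0 + d * (y - g1) / m - (1 - d) * (y - g0) / (1 - m)

# GATE(z): average the scores within each group, with a normal-approximation 95% interval
gate = np.array([psi[z == k].mean() for k in range(3)])
for k in range(3):
    s = psi[z == k]
    half = 1.96 * s.std(ddof=1) / np.sqrt(s.size)
    print(f"Z={k}: GATE {gate[k]:.2f} ± {half:.2f} (true {7.0 - 1.5 * k})")
```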
Estimated IATE vs true individual effect \(\tau\) for 5,000 jobseekers, with a 45° line; points cluster tightly on the diagonal.
Estimated IATEs coloured by Dutch proficiency; a dashed line marks the true ATE of 5.63. Distributions shift monotonically left as proficiency rises.
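The deck's IATEs come from `CausalForestDML`. As a lighter, swapped-in illustration of the same output (one predicted effect per person), the sketch below uses a T-learner: fit separate outcome forests for trainees and non-trainees and take the difference of their predictions. Data and effect function are hypothetical. Unlike `CausalForestDML`, the T-learner does not residualise outcome and treatment on \(X\), so regularisation error in each forest can leak into the estimated effects; treat it as a baseline, not a substitute.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
n = 4_000
X = rng.normal(size=(n, 2))                          # hypothetical covariates
d = rng.binomial(1, 1 / (1 + np.exp(-0.8 * X[:, 0])))
tau = 5.6 - 1.0 * X[:, 1]                            # hypothetical true IATE(x)
y = 2.0 * X[:, 0] + tau * d + rng.normal(size=n)

# T-learner: one outcome forest per arm, IATE = difference of their predictions
forest = dict(n_estimators=200, min_samples_leaf=20, random_state=0)
g1 = RandomForestRegressor(**forest).fit(X[d == 1], y[d == 1])
g0 = RandomForestRegressor(**forest).fit(X[d == 0], y[d == 0])
iate_hat = g1.predict(X) - g0.predict(X)

print(f"mean IATE {iate_hat.mean():.2f}, corr with truth {np.corrcoef(iate_hat, tau)[0, 1]:.2f}")
```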
Act III
\[W(\text{rule}) = E\big[\,\text{rule}(X)\cdot(\tau(X) - c)\,\big]\]
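Average, over all jobseekers, the true effect minus the cost \(c = 4\) months for each person the rule treats; untreated people contribute zero. Treat-all leaves welfare on the table by including people whose effect is below the cost; targeting those with \(\tau(X) > c\) does better.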
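The welfare criterion is a one-line average once effects are known. The sketch assumes simulated true effects and noisy estimates of them (not the deck's cohort) and compares treat-none, treat-all, a rule based on estimated IATEs, and the oracle rule that uses the true effects.

```python
import numpy as np

COST = 4.0  # cost c in months, as on the slide

def welfare(treat, tau, cost=COST):
    """Average net welfare per person: E[rule(X) * (tau(X) - c)]."""
    return np.mean(treat * (tau - cost))

rng = np.random.default_rng(6)
n = 5_000
tau = rng.normal(5.6, 2.0, size=n)              # hypothetical true effects (months)
tau_hat = tau + rng.normal(0.0, 0.5, size=n)    # hypothetical noisy IATE estimates

rules = {
    "treat none": np.zeros(n),
    "treat all": np.ones(n),
    "IATE rule": (tau_hat > COST).astype(float),  # treat if estimated effect exceeds cost
    "oracle": (tau > COST).astype(float),         # treat if true effect exceeds cost
}
W = {name: welfare(rule, tau) for name, rule in rules.items()}
for name, value in W.items():
    print(f"{name:>10}: W = {value:.3f}, treated {rules[name].mean():.1%}")
```

Because the oracle treats exactly those with \(\tau > c\), it maximises this average over all 0/1 rules; an estimated rule can only approach it, never beat it.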
1.749
Welfare per person (months) under the IATE rule, vs the oracle’s 1.758 (99.5% of it) and treat-all’s 1.628 (+7.4%); the IATE rule treats 83.9% of jobseekers, the oracle 83.8%
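The two percentages are ratios of the welfare figures on this slide: 1.749 / 1.758 ≈ 0.995 and 1.749 / 1.628 ≈ 1.074. The “+7.4%” is therefore a relative gain over treat-all (about 0.12 months per person in absolute terms), not a percentage-point difference.

```python
w_iate, w_oracle, w_treat_all = 1.749, 1.758, 1.628   # welfare per person, from the slide

share_of_oracle = w_iate / w_oracle               # fraction of the oracle's welfare attained
gain_over_treat_all = w_iate / w_treat_all - 1    # relative gain, not percentage points

print(f"{share_of_oracle:.1%} of oracle, {gain_over_treat_all:+.1%} vs treat-all")
# 99.5% of oracle, +7.4% vs treat-all
```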
Average net welfare per person under four rules; the IATE rule (1.749) nearly matches the oracle (1.758) and beats treat-all (1.628).
Objection. A flexible forest that picks its own controls must be “more causal” than a naive comparison.
Response. Flexibility helps estimation, not identification. \(\tau(X)\) is identified only under unconfoundedness and overlap; given those, the forest merely estimates it well. On a real cohort with unobserved confounders or thin overlap, the favourable performance here would not transfer automatically.
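A short simulation illustrates the response under stated assumptions: a hypothetical unobserved variable \(U\) drives both training and outcomes, so unconfoundedness given the observed \(X\) fails. A flexible forest-based estimator and rigid OLS, both using \(X\) only, land far from the truth; only conditioning on \(U\), which is infeasible with real data, recovers it. The forest estimator here is a simple T-learner, not the deck's `CausalForestDML`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def t_learner_ate(features, d, y):
    """Flexible ATE estimate: mean difference of per-arm random-forest predictions."""
    kw = dict(n_estimators=100, min_samples_leaf=20, random_state=0)
    g1 = RandomForestRegressor(**kw).fit(features[d == 1], y[d == 1])
    g0 = RandomForestRegressor(**kw).fit(features[d == 0], y[d == 0])
    return np.mean(g1.predict(features) - g0.predict(features))

def ols_ate(features, d, y):
    """Coefficient on D from OLS of Y on (1, D, features)."""
    design = np.column_stack([np.ones(len(y)), d, features])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]

rng = np.random.default_rng(7)
n = 4_000
x = rng.normal(size=(n, 1))     # observed covariate
u = rng.normal(size=(n, 1))     # hypothetical unobserved confounder
d = rng.binomial(1, 1 / (1 + np.exp(-(x[:, 0] + 1.5 * u[:, 0]))))
y = 2.0 * x[:, 0] - 3.0 * u[:, 0] + 5.6 * d + rng.normal(size=n)   # true ATE = 5.6

forest_x = t_learner_ate(x, d, y)           # flexible, but unconfoundedness given X fails
ols_x = ols_ate(x, d, y)                    # rigid, fails for the same reason
ols_xu = ols_ate(np.hstack([x, u]), d, y)   # infeasible benchmark: U is never observed
print(f"forest|X {forest_x:.2f}, OLS|X {ols_x:.2f}, OLS|X,U {ols_xu:.2f}, truth 5.6")
```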