Why does DiD give two different answers?
Did the Affordable Care Act's Medicaid expansion reduce adult mortality? The simplest 2×2 difference-in-differences (DiD) on 2,604 U.S. counties gives answers of opposite sign: an unweighted ATT of +0.12 deaths per 100,000 (no effect, or a tiny increase) and a population-weighted ATT of −2.56 (a meaningful reduction).
Same data. Same arithmetic. Same identifying assumption. The only difference is how you average across counties of very different size. Weighting silently changes the target parameter — not the precision, the parameter itself. The unweighted answer is the effect on the typical treated county; the weighted answer is the effect on the typical treated adult. They answer different questions.
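To make the "different target parameter" point concrete, here is a minimal Python sketch of the cell-means 2×2 DiD with an optional population weight. The `County` record, the `did_2x2` helper and the toy counties are hypothetical illustrations, not the post's code or data. Three small treated counties whose mortality rises and one large treated county whose mortality falls are enough to flip the sign.

```python
from dataclasses import dataclass


@dataclass
class County:
    treated: bool   # in the expansion cohort?
    y_pre: float    # deaths per 100,000, pre-period
    y_post: float   # deaths per 100,000, post-period
    pop: float      # adult population, used only when weighting


def did_2x2(counties, weighted=False):
    """Cell-means 2x2 DiD: mean change among treated minus mean change among controls.

    weighted=False gives each county equal weight (target: the typical treated county);
    weighted=True weights by population (target: the typical treated adult).
    """
    def mean_change(group):
        weights = [c.pop if weighted else 1.0 for c in group]
        changes = [c.y_post - c.y_pre for c in group]
        return sum(w * d for w, d in zip(weights, changes)) / sum(weights)

    treated = [c for c in counties if c.treated]
    control = [c for c in counties if not c.treated]
    return mean_change(treated) - mean_change(control)


# Hypothetical counties: three small treated counties get worse, one large one improves.
toy = [
    County(True, 100.0, 104.0, 10_000),
    County(True, 100.0, 104.0, 10_000),
    County(True, 100.0, 104.0, 10_000),
    County(True, 200.0, 194.0, 1_000_000),
    County(False, 150.0, 151.0, 50_000),
    County(False, 150.0, 151.0, 50_000),
]
print(did_2x2(toy))                           # 0.5
print(round(did_2x2(toy, weighted=True), 2))  # -6.71
```

Nothing about the data or the differencing changes between the two calls; only the weights inside `mean_change` do.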
The parallel-trends assumption, animated
DiD assumes that, absent treatment, the treated group would have moved in parallel with the control group. The dashed orange line below shows that counterfactual trend; the solid orange line shows the treated group's actual trajectory after the policy takes effect. The vertical teal bar at the right edge is the ATT: the gap between the treated group's actual outcome and its parallel-trends counterfactual.
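The counterfactual in the animation is simple arithmetic: shift the treated group's pre-period level by the control group's change, then read the ATT off as actual minus counterfactual. A sketch with made-up cell means (the function name is hypothetical):

```python
def did_counterfactual(treated_pre, treated_post, control_pre, control_post):
    """Return (counterfactual, att) for a 2x2 DiD under parallel trends."""
    # Parallel trends: without treatment, the treated group would have changed
    # by the same amount as the control group.
    counterfactual = treated_pre + (control_post - control_pre)
    # ATT: actual post-period outcome minus that counterfactual.
    att = treated_post - counterfactual
    return counterfactual, att


# Hypothetical cell means (deaths per 100,000), not the post's data.
cf, att = did_counterfactual(treated_pre=300.0, treated_post=296.0,
                             control_pre=350.0, control_post=352.0)
print(cf, att)  # 302.0 -6.0
```

This is algebraically the familiar difference of differences: (296 − 300) − (352 − 350) = −6.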
Weighting Simulator
Drag the heterogeneity slider and watch the unweighted and population-weighted means diverge — even though every county satisfies parallel trends.
Forest Plot
The post's headline numbers — seven estimators (2×2 means, TWFE, OR, IPW, DRDID, 2×T dynamic, G×T dynamic), unweighted vs population-weighted.
Event Study
The full G×T dynamic ATT(e) trajectory from the post, e = −10 to +5, with shaded 95% confidence bands.
The three key takeaways
- The 2×2 sign reversal is real and structural. Unweighted ATT(2014) = +0.12; weighted ATT(2014) = −2.56. The pre-period gap is nearly identical in both regimes (−54.77 vs −53.68), so the reversal is driven entirely by which counties dominate the averages.
- Weighting choice dominates methodology. Within either weighting, the five 2×2 estimators (cell means, TWFE, OR, IPW, DRDID) agree to within 1.7 deaths per 100,000. Across weightings, the gap is 2.5–10 deaths per 100,000. The estimator menu matters less than the weighting button.
- Power is the binding constraint. None of the six 2×2 covariate-adjusted 95% CIs excludes zero. Only the unweighted G×T event study at e = 5 escapes — at +16.96 deaths per 100,000, in the opposite-of-expected direction.
Glossary
- DiD (Difference-in-Differences)
- ATT (Average Treatment effect on the Treated)
- Parallel-trends assumption
- Unweighted vs population-weighted
- TWFE (Two-way fixed effects)
- DRDID (doubly-robust DiD)
- Callaway-Sant'Anna ATT(g,t)
- HonestDiD M̄
Weighting Simulator — why the sign can flip
Two cohorts: a small group of large urban counties and a large group of small rural counties. Both share parallel trends. The treatment effect is different in each cohort. Drag the effect heterogeneity slider and watch the unweighted mean (blue) and population-weighted mean (orange) diverge — exactly the mechanism that flips the sign on the real Medicaid data.
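The simulator's mechanism can be sketched in a few lines. This assumes the simplest model consistent with the description: the unweighted ATT averages the two cohort effects by county share, and the weighted ATT averages them by population share. The parameter names are hypothetical, and the widget's internal formula is not shown on the page, so its readouts need not match this sketch for the same slider settings.

```python
def simulate(urban_effect, rural_effect, urban_share, pop_ratio):
    """Two-cohort weighting sketch (assumed model, not the widget's code).

    urban_share: fraction of counties that are urban (0 < urban_share < 1).
    pop_ratio:   population of an urban county relative to a rural one.
    Returns (unweighted_att, population_weighted_att).
    """
    rural_share = 1.0 - urban_share
    # Unweighted: each county counts once, so cohorts are weighted by county share.
    unweighted = urban_share * urban_effect + rural_share * rural_effect
    # Population-weighted: each cohort is weighted by its share of total population.
    urban_pop = urban_share * pop_ratio
    rural_pop = rural_share * 1.0
    weighted = (urban_pop * urban_effect + rural_pop * rural_effect) / (urban_pop + rural_pop)
    return unweighted, weighted


# Hypothetical settings: few, large urban counties with a negative effect.
u, w = simulate(urban_effect=-5.0, rural_effect=1.5, urban_share=0.2, pop_ratio=10.0)
print(round(u, 2), round(w, 2))  # 0.2 -3.14
```

Setting `urban_effect == rural_effect` makes both outputs equal, and `pop_ratio` enters only the weighted branch, which is why, in this model, the unweighted ATT ignores the population slider entirely.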
What to look for
- Set both effects to the same value. The unweighted and weighted ATTs collapse to the same number. Heterogeneity is required for the reversal.
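- Increase the population ratio. A bigger urban / rural population gap amplifies the divergence: the population-weighted ATT migrates toward the urban effect, while the unweighted ATT does not move at all. It counts counties, not people, so it stays anchored near the effect of the more numerous rural cohort.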
- Reproduce the post's headline. Urban effect = −6, rural = +2, urban share = 0.3, ratio = 15 gives weighted ≈ −3, unweighted ≈ +0.4 — the same sign reversal the manuscript reports on the actual Medicaid data.
Where does the post's asymmetry come from?
The real Medicaid panel has the same structure: the never-expansion cohort is 46.9% of counties but only 38.2% of adults; the 2014-expansion cohort is 37.6% of counties but 49.5% of adults. Switching to population weights moves 8.7 percentage points of mass out of the never-expansion cohort and adds 11.9 points to the 2014 cohort, exactly the mechanism this simulator illustrates.
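The percentage-point shifts follow directly from the quoted shares. A small sketch; the "other" row (the 2015, 2016 and 2019 cohorts together) is derived by subtraction and assumes the cohorts partition the 2,604 counties:

```python
# Shares (%) of counties and of adults, as quoted in the paragraph above.
county_share = {"never": 46.9, "2014": 37.6}
adult_share = {"never": 38.2, "2014": 49.5}
# Remainder = 2015, 2016 and 2019 cohorts combined (assumes the cohorts partition the sample).
county_share["other"] = 100.0 - sum(county_share.values())
adult_share["other"] = 100.0 - sum(adult_share.values())

# Percentage-point change in each cohort's weight when switching to population weights.
shift = {k: round(adult_share[k] - county_share[k], 1) for k in county_share}
print(shift)  # {'never': -8.7, '2014': 11.9, 'other': -3.2}
```

The shifts sum to zero: weighting only redistributes mass, mostly from the never-expansion cohort to the 2014 cohort.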
The post's estimator forest plot
Every number on this plot comes from the post's own results CSVs. Each method-by-weighting cell is one estimate with a 95% confidence interval. The dominant fact is colour: every blue dot (unweighted) sits to the right of its orange counterpart (population-weighted), and that gap of 2.5 to 10 deaths per 100,000 is wider than the spread across estimators within either weighting.
What to look for
- Toggle off the dynamic methods. The 2×2-only view (cell-means, TWFE, OR, IPW, DRDID) shows the within-weighting agreement: the orange dots span at most 1.7 deaths per 100,000, and so do the blue ones.
- Toggle on the dynamic methods. The 2×T row is the post's most dramatic divergence: +9.43 unweighted vs −0.68 weighted. Pooling across the 2014, 2015, 2016, and 2019 cohorts (the G×T row) shrinks the gap to 7.7 but does not close it.
- Hover any dot. See the exact estimate, SE, and CI. Notice that no weighted CI excludes zero by a comfortable margin, and the only unweighted CI that excludes zero is the G×T dynamic one.
The unweighted-vs-weighted gap, by stage
The post's Section 11 headline table shows how the gap grows once the design lets effects vary over event time (2×T) or across staggered cohorts (G×T):
- 2×2 cell-means: gap = 2.69 (only one cohort)
- 2×2 DRDID: gap = 2.53 (covariate-adjusted)
- 2×T dynamic: gap = 10.11 (one cohort, eleven periods — the widest gap)
- G×T dynamic: gap = 7.65 (all four cohorts pooled)
The gap is largest when effects may vary over event time or across cohorts (2×T and G×T) and smallest when the four-cell 2×2 design forces a single ATT(2014). That's the manuscript's core lesson made visible at a glance: methodology and target parameter are orthogonal axes of choice, and the second dominates the first.
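Each gap is just the unweighted estimate minus the weighted one. Using the point-estimate pairs quoted on this page (the DRDID pair is not quoted, so it is left out), a quick check:

```python
# (unweighted, population-weighted) point estimates quoted on this page, deaths per 100,000.
estimates = {
    "2x2 cell-means": (0.12, -2.56),
    "2xT dynamic": (9.43, -0.68),
    "GxT dynamic": (7.92, 0.27),  # average of ATT(e) over e >= 0
}

gaps = {name: round(u - w, 2) for name, (u, w) in estimates.items()}
print(gaps)  # {'2x2 cell-means': 2.68, '2xT dynamic': 10.11, 'GxT dynamic': 7.65}
```

The cell-means pair gives 2.68 rather than the table's 2.69, presumably because the displayed point estimates are themselves rounded to two decimals.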
G×T event study — ATT(e) across cohorts
The Callaway-Sant'Anna group-time framework produces an ATT for every cohort × calendar-year cell. Aggregated across cohorts at fixed event time e, you get a single ATT(e) per relative year. Below: e = −10 to +5, both weightings, shaded 95% confidence bands. The orange dashed line at e = −0.5 separates leads (placebo test for parallel trends) from lags (causal effect).
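A minimal sketch of the aggregation step, in the spirit of the Callaway-Sant'Anna dynamic aggregation but not the `did` package's actual code: each ATT(e) is a cohort-size-weighted mean of ATT(g, g + e) over the cohorts observed at event time e, and the overall dynamic ATT is taken here as the simple mean of ATT(e) for e ≥ 0. Cohort sizes can be county counts (unweighted) or adult populations (weighted). The function names and numbers are hypothetical, and inference (standard errors, confidence bands) is omitted.

```python
from collections import defaultdict


def aggregate_event_time(att_gt, cohort_size):
    """Collapse group-time ATTs into event-time ATTs.

    att_gt:      {(g, t): estimate}, g = first treated year, t = calendar year.
    cohort_size: {g: weight}, e.g. county counts (unweighted) or adults (weighted).
    ATT(e) is the size-weighted mean of ATT(g, g + e) over cohorts observed at e.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for (g, t), est in att_gt.items():
        e = t - g  # event time relative to the cohort's first treated year
        num[e] += cohort_size[g] * est
        den[e] += cohort_size[g]
    return {e: num[e] / den[e] for e in sorted(num)}


def overall_dynamic(att_e):
    """Simple average of ATT(e) over post-treatment event times (e >= 0)."""
    post = [v for e, v in att_e.items() if e >= 0]
    return sum(post) / len(post)


# Hypothetical group-time estimates for a 2014 and a 2016 cohort.
att_gt = {
    (2014, 2013): 0.0, (2014, 2014): -1.0, (2014, 2015): -2.0, (2014, 2016): -3.0,
    (2016, 2015): 0.0, (2016, 2016): 3.0,
}
by_county = aggregate_event_time(att_gt, {2014: 3.0, 2016: 1.0})
by_adult = aggregate_event_time(att_gt, {2014: 1.0, 2016: 3.0})
print(by_county)  # {-1: 0.0, 0: 0.0, 1: -2.0, 2: -3.0}
print(by_adult)   # {-1: 0.0, 0: 2.0, 1: -2.0, 2: -3.0}
print(round(overall_dynamic(by_county), 3), round(overall_dynamic(by_adult), 3))  # -1.667 -1.0
```

Only e = 0 has both cohorts in this toy example, so that is where the weighting bites; at e = 1 and e = 2 a single cohort supplies the whole estimate, much as only the earliest cohorts reach the longest lags in a real panel.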
What to look for
- Pre-period leads at e = −10, −9, −8 are sharply negative under both weightings (around −15 to −26 deaths per 100,000, CIs excluding zero). These are driven entirely by the small 2019 cohort — the only cohort with a pre-history that long.
- From e = −7 onward, leads settle near zero. Approximate parallel trends holds across the bulk of the comparison window, even though the assumption is technically violated at long pre-horizons.
- Post-treatment divergence. Unweighted ATT(e) climbs from −0.45 at e = 0 to +16.96 at e = 5 (CI [+6.83, +27.09], which excludes zero). Weighted ATT(e) oscillates within [−3.74, +4.49], and every weighted CI contains zero.
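The claim that the e = 5 interval excludes zero can be checked by backing out the implied standard error. This sketch assumes a symmetric interval built with the pointwise normal critical value 1.96; if the plotted bands are simultaneous (uniform) bands, the critical value is larger and the implied SE correspondingly smaller. The helper name is hypothetical.

```python
def implied_se_and_z(lo, hi, crit=1.96):
    """Back out point estimate, standard error and z-statistic from a symmetric CI.

    crit=1.96 assumes a pointwise normal 95% interval; a simultaneous
    (uniform) band uses a larger critical value and implies a smaller SE.
    """
    estimate = (lo + hi) / 2
    se = (hi - lo) / (2 * crit)
    return estimate, se, estimate / se


est, se, z = implied_se_and_z(6.83, 27.09)
print(round(est, 2), round(se, 2), round(z, 2))  # 16.96 5.17 3.28
```

The midpoint reproduces the quoted +16.96, and a z-statistic near 3.3 is comfortably past 1.96.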
What does this mean for the policy question?
The dynamic-aggregated ATT averaged over e ≥ 0 is +7.92 unweighted versus +0.27 population-weighted. For the typical treated adult (the policy-relevant target parameter), there is no statistically credible mortality effect in either direction. For the typical treated county-as-a-unit, the unweighted G×T design gives a positive sign — opposite of what one might expect for an insurance-expansion policy.
The manuscript flags this case as pedagogical rather than as the best-possible estimate of Medicaid's mortality effect. HonestDiD's sensitivity analysis underscores why: at M̄ = 0 the unweighted bound is entirely positive [+2.01, +14.09], but by M̄ = 0.25 it crosses zero. Allowing post-treatment departures from parallel trends of at most a quarter of the largest pre-treatment departure is enough to overturn the sign conclusion. The weighted bound already straddles zero at M̄ = 0.