Why modern Difference-in-Differences?
For decades, the workhorse for evaluating policy interventions has been the two-way fixed-effects (TWFE) regression. With staggered treatment adoption and heterogeneous treatment effects, however, TWFE silently mixes valid and invalid comparisons — already-treated units sneak in as the "control" for later-treated units — and the resulting coefficient can be biased toward zero, or even take the wrong sign.
This app lets you reproduce the post's headline findings interactively. In the three tabs that follow you will (1) slide the parallel-trends violation knob and watch the DiD estimate drift away from the true effect; (2) compare TWFE with the Callaway-Sant'Anna group-time ATT across 100 simulated staggered panels; and (3) explore the actual minimum-wage results: TWFE = −0.038, doubly robust = −0.065, breakdown M̄ ≈ 0.67.
Parallel trends — the assumption that does all the work
DiD is identified by one assumption: in the absence of treatment, the treated group's outcome would have moved in parallel with the control's. The orange line is what we actually observe; the dashed grey line is the unobserved counterfactual. Their gap at t+3 is the DiD effect.
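For reference, here is the assumption and the identity it buys, written in potential-outcomes notation. The notation (untreated outcome Y(0), treated outcome Y(1), group indicator D, periods pre/post) is ours rather than the app's, and the identity also needs no anticipation, so that the treated group's pre-period outcome equals its untreated outcome.

```latex
% Parallel trends: absent treatment, both groups' outcomes change by the same amount
\mathbb{E}\!\left[Y_{\text{post}}(0) - Y_{\text{pre}}(0) \mid D = 1\right]
  = \mathbb{E}\!\left[Y_{\text{post}}(0) - Y_{\text{pre}}(0) \mid D = 0\right]

% With no anticipation, the ATT equals the 2x2 difference of observed means
\text{ATT} = \mathbb{E}\!\left[Y_{\text{post}}(1) - Y_{\text{post}}(0) \mid D = 1\right]
  = \left(\mathbb{E}[Y_{\text{post}} \mid D = 1] - \mathbb{E}[Y_{\text{pre}} \mid D = 1]\right)
  - \left(\mathbb{E}[Y_{\text{post}} \mid D = 0] - \mathbb{E}[Y_{\text{pre}} \mid D = 0]\right)
```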
Adjust the divergence slider in Tab 2 to see how the estimated DiD effect moves with the assumed deviation from parallel trends.
Parallel Trends
Slide the violation knob. Toggle pre-trends. See how the dashed counterfactual moves and what the DiD estimator returns as a result.
TWFE vs CS Showdown
Simulate staggered panels with cohort-specific dynamics. Run 100 sims and watch the TWFE distribution drift away from the truth while CS stays centred.
Forest Plot + Event Study
The post's full menu: TWFE, CS overall, doubly robust, IPW, not-yet-treated, anticipation, lagged outcomes. Hover for SEs and CIs.
Glossary (open a card if a term is unfamiliar)
Parallel trends
Staggered adoption
TWFE regression
Group-time ATT — ATT(g, t)
Forbidden comparisons
Doubly robust DiD
Event study ATT(e)
HonestDiD breakdown M̄
The three takeaways the app re-runs in code
- TWFE understates the effect by ~33%. The minimum-wage TWFE estimate is −0.038 vs. the proper overall ATT of −0.057; 64% of the bias is pre-treatment contamination, 36% is improper post-treatment weighting.
- The doubly robust ATT (−0.065) is stable. Across estimation methods (regression, IPW, DR), comparison groups (never-treated, not-yet-treated), and base periods (universal, varying), the estimate moves by only 0.001.
- HonestDiD breakdown M̄ ≈ 0.67. The on-impact effect survives parallel-trends violations up to ~67% of the largest observed pre-trend deviation before the CI first crosses zero.
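The ~33% figure is plain arithmetic on the two estimates. The sketch below reproduces it and converts the post's 64% / 36% split into absolute units; the split itself is taken from the post, not derived here.

```python
# Headline numbers from the takeaways (minimum-wage application).
twfe = -0.038        # static TWFE estimate
cs_overall = -0.057  # Callaway-Sant'Anna overall ATT

# Share of the CS effect that TWFE misses.
understatement = 1 - twfe / cs_overall

# Absolute bias (positive: TWFE is pulled toward zero), split with the post's reported shares.
bias = twfe - cs_overall
pre_part = 0.64 * bias    # pre-treatment contamination
post_part = 0.36 * bias   # improper post-treatment weighting

print(f"understatement: {understatement:.1%}")
print(f"bias {bias:+.4f} = pre {pre_part:+.4f} + post {post_part:+.4f}")
```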
Parallel-trends lab
The single identifying assumption: without treatment, the treated group's outcome would move in parallel with the control's. Set the true effect δ, then slide the pre-trend to add a per-period divergence on top of that, and watch what the standard 2×2 DiD estimator returns.
When δ < 0, the treated path bends downward each period after treatment. This reflects a true causal effect, not a violation.
Tilts the counterfactual relative to the control. Any nonzero pre-trend is a parallel-trends violation.
What the lines show
Blue = control mean. Dashed grey = counterfactual treated path (what the treated group would have done without treatment; it runs parallel to the control only when the pre-trend is 0). Orange = observed treated. The DiD effect at t+3 is annotated top-right.
Simple 2×2 DiD estimator
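A minimal sketch of the estimator, paired with a hypothetical DGP that loosely mimics the lab. The functions `did_2x2`, `lab_paths` and `lab_estimate`, the linear control trend, the level gap of 0.5 and the period indexing (last pre-period t = 0, evaluation at t = 3) are all our assumptions, not the app's code.

```python
import numpy as np

def did_2x2(y_treat_pre, y_treat_post, y_ctrl_pre, y_ctrl_post):
    """Classic 2x2 DiD: change for the treated minus change for the control."""
    return (y_treat_post - y_treat_pre) - (y_ctrl_post - y_ctrl_pre)

def lab_paths(delta, pretrend, level_gap=0.5):
    """Hypothetical DGP loosely mimicking the lab, periods t = -3..3.

    The control follows a linear trend. The treated group's untreated
    (counterfactual) path adds a constant level gap plus `pretrend * t`,
    i.e. a per-period divergence from parallel trends. After period 0 the
    observed treated path adds a causal effect that grows by `delta` per period.
    """
    t = np.arange(-3, 4, dtype=float)
    control = 1.0 + 0.1 * t
    counterfactual = control + level_gap + pretrend * t
    observed = counterfactual + np.where(t > 0, delta * t, 0.0)
    return t, control, counterfactual, observed

def lab_estimate(delta, pretrend, horizon=3):
    """2x2 DiD comparing period `horizon` with the last pre-period (t = 0)."""
    t, control, _, observed = lab_paths(delta, pretrend)
    pre = int(np.flatnonzero(t == 0)[0])
    post = int(np.flatnonzero(t == horizon)[0])
    estimate = did_2x2(observed[pre], observed[post], control[pre], control[post])
    return estimate, delta * horizon   # (DiD estimate, true effect at the horizon)

for pretrend in (0.0, 0.01, 0.03):
    est, truth = lab_estimate(delta=-0.02, pretrend=pretrend)
    print(f"pre-trend {pretrend:.2f}: DiD {est:+.3f}  truth {truth:+.3f}  bias {est - truth:+.3f}")
```

In this toy DGP the bias is exactly pre-trend × horizon, so a pre-trend of 0.01 moves a true effect of −0.06 to an estimate of −0.03: the sign survives but half the magnitude is lost.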
What to look for
- With pre-trend = 0, DiD is unbiased for the true effect.
- With a small positive pre-trend (say 0.01), the DiD estimator understates the magnitude of a negative effect, pulling it toward zero.
- This is the failure mode HonestDiD formalises. The post's M̄ ≈ 0.67 says that post-treatment violations of up to 67% of the largest observed pre-treatment deviation still leave the on-impact effect statistically below zero.
TWFE vs Callaway-Sant'Anna — a head-to-head
With a single treatment date (two periods), TWFE = 2×2 DiD = CS. With staggered treatment and heterogeneous dynamics, TWFE silently uses already-treated units as controls, and here the damage shows up as bias toward zero. Crank the dials and confirm.
Strongly time-varying effects worsen TWFE's negative-weight problem.
Callaway-Sant'Anna
TWFE
Distribution across 100 simulated panels
Each draw re-randomises the noise while keeping the data-generating process fixed (same DGP, same seed family). With staggered timing and heterogeneous dynamics, TWFE's histogram drifts to the right of the (negative) true value, i.e. toward zero, while the CS estimator stays centred on it.
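A self-contained sketch of the experiment under our own hypothetical DGP, not the app's: cohorts first treated in periods 4, 6 and 8 of 10 plus a never-treated group, with effects that start at −0.02 and deepen by 0.01 per period. CS is written out by hand as ATT(g, t) against never-treated units with base period g − 1, averaged over post-treatment cells with cohort-size weights. We believe this mirrors the `did` package's simple aggregation, but it is not that package's code and it computes no standard errors.

```python
import numpy as np

COHORTS = (4, 6, 8)   # first-treatment periods; 0 codes never-treated units
PERIODS = 10

def simulate_panel(rng, n_per_group=40, base=-0.02, slope=-0.01, noise=0.05):
    """Hypothetical staggered DGP: unit FE + time FE + treatment effect + noise.

    The effect is `base` on impact and grows by `slope` per period since treatment,
    identically for every cohort, so the heterogeneity is in the dynamics.
    """
    groups = np.repeat([0, *COHORTS], n_per_group)
    t = np.arange(1, PERIODS + 1)
    treated = (groups[:, None] > 0) & (t[None, :] >= groups[:, None])
    tau = np.where(treated, base + slope * (t[None, :] - groups[:, None]), 0.0)
    unit_fe = rng.normal(0.0, 1.0, size=(groups.size, 1))
    time_fe = 0.05 * t[None, :]
    y = unit_fe + time_fe + tau + rng.normal(0.0, noise, size=tau.shape)
    return y, groups, treated, tau

def twfe(y, treated):
    """Static TWFE coefficient; two-way demeaning equals the dummy regression in a balanced panel."""
    def demean(a):
        return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()
    d = demean(treated.astype(float))
    return (d * demean(y)).sum() / (d ** 2).sum()

def post_cells(groups):
    """Yield (cohort mask, base column, post column) for every post-treatment ATT(g, t) cell."""
    for g in COHORTS:
        members = groups == g
        for col in range(g - 1, PERIODS):   # column c holds period c + 1
            yield members, g - 2, col       # base period g - 1 sits in column g - 2

def cs_overall(y, groups):
    """ATT(g, t) vs never-treated with base period g - 1, averaged with cohort-size weights."""
    never = groups == 0
    atts, weights = [], []
    for members, base_col, col in post_cells(groups):
        treated_change = (y[members, col] - y[members, base_col]).mean()
        control_change = (y[never, col] - y[never, base_col]).mean()
        atts.append(treated_change - control_change)
        weights.append(members.sum())
    return np.average(atts, weights=weights)

def true_overall(tau, groups):
    """Same aggregation applied to the true effects (the estimand CS targets)."""
    cells = [(tau[m, col].mean(), m.sum()) for m, _, col in post_cells(groups)]
    values, weights = zip(*cells)
    return np.average(values, weights=weights)

rng = np.random.default_rng(2024)
twfe_draws, cs_draws = [], []
for _ in range(100):
    y, groups, treated, tau = simulate_panel(rng)
    twfe_draws.append(twfe(y, treated))
    cs_draws.append(cs_overall(y, groups))

truth = true_overall(tau, groups)
print(f"truth {truth:+.4f} | TWFE mean {np.mean(twfe_draws):+.4f} | CS mean {np.mean(cs_draws):+.4f}")
```

In this design the static TWFE coefficient averages to roughly −0.029 against a truth of about −0.043: the late periods of the earliest cohort, where effects are largest, receive negative TWFE weight.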
The post's headline results, interactively
Every estimator the post computes, plotted on one axis. The post's overall ATT is −0.057 (CS unconditional) or −0.065 (doubly robust with covariates). Toggle methods on/off; hover for SEs and confidence intervals.
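The doubly robust estimator combines a propensity-score model with an outcome regression for the untreated trend, and it stays consistent if either model is correctly specified. The sketch below follows the two-period panel form of Sant'Anna and Zhao's DR DiD on simulated data. The two covariates are hypothetical stand-ins for the post's log population and log average pay, the helper names `fit_logit` and `dr_did` are ours, and this is not the `did` or `DRDID` package code (no standard errors, no trimming).

```python
import numpy as np

def fit_logit(X, d, iters=25):
    """Logistic regression by Newton-Raphson; X already contains an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def dr_did(dy, d, X):
    """Doubly robust ATT for a two-period panel (Sant'Anna-Zhao style, no standard errors).

    dy: post-minus-pre outcome change per unit; d: 1.0 if treated else 0.0;
    X: pre-treatment covariates without an intercept column.
    """
    Xc = np.column_stack([np.ones(len(dy)), X])
    ps = 1.0 / (1.0 + np.exp(-Xc @ fit_logit(Xc, d)))           # propensity score
    ctrl = d == 0
    coef, *_ = np.linalg.lstsq(Xc[ctrl], dy[ctrl], rcond=None)   # outcome model on controls
    resid = dy - Xc @ coef                                       # change net of predicted untreated change
    w_treat = d / d.mean()
    w_ctrl = ps * (1.0 - d) / (1.0 - ps)
    w_ctrl = w_ctrl / w_ctrl.mean()
    return np.mean((w_treat - w_ctrl) * resid)

# Simulated example: trends depend on covariates, and so does treatment take-up.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))   # hypothetical covariates
take_up = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.4 * X[:, 1])))
d = (rng.random(n) < take_up).astype(float)
true_att = -0.06
dy = 0.1 + 0.3 * X[:, 0] + 0.1 * X[:, 1] + true_att * d + rng.normal(0.0, 0.1, n)

naive = dy[d == 1].mean() - dy[d == 0].mean()   # unconditional 2x2 DiD on changes
dr = dr_did(dy, d, X)
print(f"naive DiD {naive:+.3f} | DR DiD {dr:+.3f} | truth {true_att:+.3f}")
```

Because parallel trends holds here only conditional on the covariates, the unconditional DiD is badly biased while the DR estimate lands near the truth.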
Methods to display
Event study — ATT(e) by event time
Three estimators stacked on the same axis. TWFE (Sun-Abraham) and Callaway-Sant'Anna use no covariates; Doubly robust conditions on log population and log average pay.
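Each event-study point averages the group-time effects that share an event time e = t − g. A minimal sketch of that aggregation, assuming cohort-size weights as in Callaway and Sant'Anna's dynamic aggregation; the helper `event_study` and the ATT(g, t) inputs are made up for illustration, and no standard errors are computed.

```python
from collections import defaultdict

def event_study(att_gt, cohort_size):
    """Aggregate ATT(g, t) into ATT(e), e = t - g, with cohort-size weights.

    att_gt: {(g, t): estimate}; cohort_size: {g: units first treated in period g}.
    Only cohorts observed at event time e contribute to ATT(e).
    """
    num, den = defaultdict(float), defaultdict(float)
    for (g, t), att in att_gt.items():
        e = t - g
        num[e] += cohort_size[g] * att
        den[e] += cohort_size[g]
    return {e: num[e] / den[e] for e in sorted(num)}

# Hypothetical ATT(g, t) values for two cohorts observed in periods 1-5 (not the post's estimates).
att_gt = {
    (3, 1): 0.002, (3, 2): -0.001, (3, 3): -0.030, (3, 4): -0.045, (3, 5): -0.060,
    (4, 1): -0.002, (4, 2): 0.001, (4, 3): 0.000, (4, 4): -0.020, (4, 5): -0.040,
}
sizes = {3: 30, 4: 10}
att_e = event_study(att_gt, sizes)
for e, value in att_e.items():
    print(f"e = {e:+d}: {value:+.4f}")
```

Here ATT(2) comes from cohort 3 alone, so the mix of contributing cohorts shifts along the event-time axis; the `did` package's dynamic aggregation offers a `balance_e` option for when that composition change matters.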
HonestDiD sensitivity — when does the result break?
The dashed orange line marks the breakdown M̄ ≈ 0.67: post-treatment parallel-trends violations up to ~67% of the largest pre-treatment deviation still leave the on-impact effect statistically below zero. Beyond that, the CI crosses zero.
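HonestDiD computes proper robust confidence sets. What follows is only a back-of-the-envelope version of the relative-magnitudes logic for the on-impact effect: it treats the pre-period coefficients as known, bounds the on-impact violation by M̄ times the largest pre-period jump, and widens a plain ±1.96·SE interval by that slack. The function names and inputs are hypothetical, and because it ignores sampling error in the pre-trends it will not reproduce the package's M̄.

```python
import math

def max_pre_jump(pre_coefs):
    """Largest period-to-period change in the pre-treatment event-study path.

    pre_coefs: coefficients for e = -K, ..., -2 in time order (at least one);
    e = -1 is the reference period and is appended as 0.
    """
    path = [*pre_coefs, 0.0]
    return max(abs(b - a) for a, b in zip(path, path[1:]))

def rm_interval(beta0, se0, pre_coefs, mbar, z=1.96):
    """Plug-in CI for the on-impact effect when its violation is at most mbar * max pre jump."""
    slack = mbar * max_pre_jump(pre_coefs)
    return beta0 - slack - z * se0, beta0 + slack + z * se0

def breakdown_mbar(beta0, se0, pre_coefs, z=1.96):
    """Smallest mbar at which the plug-in interval first contains zero."""
    jump = max_pre_jump(pre_coefs)
    if jump == 0:
        return math.inf
    return max(0.0, (abs(beta0) - z * se0) / jump)

# Hypothetical inputs, not the post's estimates.
beta0, se0, pre = -0.05, 0.01, [0.01, -0.02]
m_star = breakdown_mbar(beta0, se0, pre)
print(f"breakdown M-bar ~ {m_star:.2f}")
print(rm_interval(beta0, se0, pre, 0.9 * m_star))   # upper end still below zero
print(rm_interval(beta0, se0, pre, 1.1 * m_star))   # interval now straddles zero
```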