Difference-in-Differences Interactive Lab

Why modern Difference-in-Differences?

For decades, the workhorse for evaluating policy interventions has been the two-way fixed-effects (TWFE) regression. With staggered treatment adoption and heterogeneous treatment effects, however, TWFE silently mixes valid and invalid comparisons — already-treated units sneak in as the "control" for later-treated units — and the resulting coefficient can be biased toward zero, or even take the wrong sign.

This app lets you reproduce the post's headline finding interactively. Across four tabs (this overview plus three labs) you will (1) slide the parallel-trends violation knob and watch the DiD estimate respond; (2) compare TWFE to the Callaway-Sant'Anna group-time ATT across 100 synthetic panels; and (3) explore the actual minimum-wage results: TWFE = −0.038, doubly robust = −0.065, breakdown M̄ ≈ 0.67.

Parallel trends — the assumption that does all the work

DiD is identified by one assumption: in the absence of treatment, the treated group's outcome would have moved in parallel with the control's. The orange line is what we actually observe; the dashed grey line is the unobserved counterfactual. Their gap at t+3 is the DiD effect.

Adjust the pre-trend slope slider in Tab 2 to see how the estimated DiD effect moves with the assumed deviation from parallel trends.

Tab 2

Parallel Trends

Slide the violation knob. Toggle pre-trends. See how the dashed counterfactual moves and what the DiD estimator returns as a result.

Tab 3

TWFE vs CS Showdown

Simulate staggered panels with cohort-specific dynamics. Run 100 sims and watch the TWFE distribution drift away from the truth while CS stays centred.

Tab 4

Forest Plot + Event Study

The post's full menu: TWFE, CS overall, doubly robust, IPW, not-yet-treated, anticipation, lagged outcomes. Hover for SEs and CIs.

Glossary (open a card if a term is unfamiliar)

Parallel trends

Absent treatment, treated and control would move in lockstep. The single identifying assumption of DiD.

Staggered adoption

Different units enter treatment at different dates. The post has cohorts G ∈ {0, 2004, 2006}, where G = 0 marks the never-treated units.

TWFE regression

Two-way fixed effects: Y_it = θ_t + η_i + α D_it + v_it. Equivalent to 2×2 DiD when there's only one treatment date — but biased with staggered timing under heterogeneity.

Group-time ATT — ATT(g, t)

The treatment effect for cohort g at calendar time t. The Callaway-Sant'Anna building block; aggregated into overall ATT or event study.

Forbidden comparisons

When TWFE uses already-treated units as the "control" for later-treated units. The source of the negative-weight problem.

Doubly robust DiD

Combines an outcome regression and a propensity-score model. Consistent if either model is correct — belt and suspenders.

Event study ATT(e)

ATT aggregated by event time e = t − g, not calendar time. Lines up each cohort at its own "year zero."

HonestDiD breakdown M̄

The threshold of allowed parallel-trends violation at which the CI first crosses zero. Bigger M̄ = more robust. The post reports M̄ ≈ 0.67.
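The event-study card above describes re-indexing ATT(g, t) by event time e = t − g. A minimal sketch of that aggregation follows, assuming cohort-size weights. The cohort sizes, the per-cohort slopes and the 2007 end year are hypothetical placeholders chosen for illustration, not the post's estimates.

```python
from collections import defaultdict

def event_study(att_gt, cohort_size):
    """Aggregate ATT(g, t) into ATT(e), with e = t - g, weighting cohorts by size."""
    num, den = defaultdict(float), defaultdict(float)
    for (g, t), att in att_gt.items():
        e = t - g                         # event time: years since cohort g was treated
        num[e] += cohort_size[g] * att
        den[e] += cohort_size[g]
    return {e: num[e] / den[e] for e in sorted(num)}

# Hypothetical inputs (NOT the post's numbers): two treated cohorts whose
# effects grow linearly in event time, observed through 2007.
cohort_size = {2004: 20, 2006: 40}
slope = {2004: -0.02, 2006: -0.04}
att_gt = {
    (g, t): slope[g] * (t - g + 1)
    for g in cohort_size
    for t in range(g, 2008)               # post-treatment cells only (e >= 0)
}

att_e = event_study(att_gt, cohort_size)
for e, att in att_e.items():
    print(f"e = {e}: ATT(e) = {att:+.4f}")
```

Only the 2004 cohort is observed at e = 2 and e = 3, so those event times reflect a single cohort. That change in composition is worth keeping in mind when reading any event-study plot.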

The three takeaways the app re-runs in code

TWFE understates the effect by ~33%. The minimum-wage TWFE estimate is −0.038 vs. the proper overall ATT of −0.057; 64% of the bias is pre-treatment contamination, 36% is improper post-treatment weighting.
The doubly robust ATT (−0.065) is stable. Across estimation methods (regression, IPW, DR), comparison groups (never-treated, not-yet-treated), and base periods (universal, varying), the estimate moves by only 0.001.
HonestDiD breakdown M̄ ≈ 0.67. The on-impact effect survives parallel-trends violations up to ~67% of the largest observed pre-trend deviation before the CI first crosses zero.
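The ~33% figure in the first takeaway follows directly from the two reported estimates. The short check below uses only numbers quoted above; the variable names are illustrative.

```python
twfe = -0.038          # TWFE coefficient reported in the post
overall_att = -0.057   # overall ATT reported in the post

gap = twfe - overall_att                  # how far TWFE sits toward zero
understatement = gap / abs(overall_att)   # share of the effect that TWFE misses

# Split the gap using the 64% / 36% decomposition quoted above.
pre_part = 0.64 * gap    # pre-treatment contamination
post_part = 0.36 * gap   # improper post-treatment weighting

print(f"gap = {gap:+.3f}, understatement = {understatement:.1%}")
print(f"pre-treatment part = {pre_part:+.4f}, post-treatment part = {post_part:+.4f}")
```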

Parallel-trends lab

The single identifying assumption: without treatment, the treated group's outcome would move in parallel with the control's. Slide δ to set the per-period treatment effect, add a pre-trend slope as a per-period violation of that assumption, and watch what the standard 2×2 DiD estimator returns.

Per-period treatment effect δ -0.030

When δ < 0, the treated path bends further downward each period after treatment. This is the true causal effect.

Pre-trend slope (violation) 0.000

Tilts the treated counterfactual relative to the control path. Any non-zero pre-trend is a parallel-trends violation.

What the lines show

Blue = control mean. Dashed grey = counterfactual treated (what the treated group would show without treatment). Orange = observed treated. The DiD effect at t+3 is annotated top-right.

Simple 2×2 DiD estimator

α̂_DiD (t+3 vs. t-2)—

True effect at t+3—

Bias (α̂ − truth)—

What to look for

With pre-trend = 0, DiD is unbiased for the true effect.
With a small positive pre-trend (say 0.01), the DiD estimator understates a negative effect, because the upward drift of the treated group is counted as part of the effect.
This is the failure mode HonestDiD formalises. The post's M̄ ≈ 0.67 says that a violation up to 67% of the largest observed pre-trend deviation still leaves the effect significant.
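The three observations above can be checked with a noise-free sketch of the lab. The data-generating process here is an assumption, not the app's actual code: the control follows a common linear trend, the untreated treated path adds pretrend × e on top of it, and the effect at event time e ≥ 0 is δ × (e + 1). The function name and the common_trend parameter are hypothetical.

```python
def did_2x2(delta, pretrend, post=3, pre=-2, common_trend=0.02):
    """2x2 DiD on group means, comparing event time `post` with `pre`.

    Assumed DGP (illustrative): treatment starts at e = 0 and the effect
    grows by `delta` each period, so the effect at e >= 0 is delta * (e + 1).
    """
    def control(e):
        return common_trend * e

    def effect(e):
        return delta * (e + 1) if e >= 0 else 0.0

    def treated(e):
        # pretrend * e is the parallel-trends violation
        return control(e) + pretrend * e + effect(e)

    estimate = (treated(post) - treated(pre)) - (control(post) - control(pre))
    truth = effect(post)
    return estimate, truth, estimate - truth

for slope in (0.0, 0.01):
    est, truth, bias = did_2x2(delta=-0.03, pretrend=slope)
    print(f"pre-trend {slope:.2f}: estimate {est:+.3f}, truth {truth:+.3f}, bias {bias:+.3f}")
```

Under these assumptions the bias equals pretrend × (post − pre), so it grows with the pre-trend slope and with the distance between the two periods being compared.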

TWFE vs Callaway-Sant'Anna — a head-to-head

With one treatment date, TWFE = 2×2 DiD = CS. With staggered treatment and heterogeneous dynamics, TWFE silently uses already-treated units as controls. The damage shows up as bias toward zero. Crank the dials and confirm.

Number of units per cohort 200

Treatment dynamics δ_growth (per period) -0.020

Strongly time-varying effects worsen TWFE's negative-weight problem.

Cohort gap (years between G=A and G=B) 2

Noise σ 0.10

Callaway-Sant'Anna

α̂ (overall ATT)—

true ATT—

bias—

TWFE

α̂ (single coef)—

true ATT—

bias—

Distribution across 100 simulated panels

Each draw re-randomises noise (same DGP, same seed family). With staggered timing and heterogeneous dynamics, TWFE's histogram drifts to the right of the true value (toward zero, since the effects are negative) while the CS-style estimator remains centred on it.
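The simulation described above can be approximated with NumPy. This is a sketch under stated assumptions, not the app's implementation: three equal-sized cohorts (never-treated, A, B), ten periods, effects equal to δ_growth × (e + 1), and a hand-rolled "CS-style" estimator that averages ATT(g, t) against the never-treated group with base period g − 1. It does not call the `did` package, and all function names are hypothetical.

```python
import numpy as np

def simulate_panel(rng, n=200, T=10, g_a=4, gap=2, delta_growth=-0.02, sigma=0.10):
    """Balanced panel with cohorts: never-treated (g = 0), A (g_a), B (g_a + gap)."""
    g = np.repeat([0, g_a, g_a + gap], n)            # first treated period per unit
    t = np.arange(T)
    event = t[None, :] - g[:, None]                  # event time e = t - g
    D = (g[:, None] > 0) & (event >= 0)              # treatment indicator
    tau = np.where(D, delta_growth * (event + 1), 0.0)   # effect grows each period
    unit_fe = rng.normal(0.0, 1.0, size=(g.size, 1))
    time_fe = 0.05 * t[None, :]
    Y = unit_fe + time_fe + tau + rng.normal(0.0, sigma, size=(g.size, T))
    return Y, D.astype(float), g, tau

def twfe(Y, D):
    """TWFE coefficient via two-way demeaning (valid for a balanced panel)."""
    def demean(X):
        return X - X.mean(axis=1, keepdims=True) - X.mean(axis=0, keepdims=True) + X.mean()
    Yd, Dd = demean(Y), demean(D)
    return (Dd * Yd).sum() / (Dd ** 2).sum()

def cs_style_att(Y, g):
    """Average of ATT(g, t) over post-treatment cells, never-treated as control."""
    never = g == 0
    cells = []
    for cohort in np.unique(g[g > 0]):
        cohort = int(cohort)
        treated = g == cohort
        base = cohort - 1                            # last untreated period
        for t in range(cohort, Y.shape[1]):
            d_treated = (Y[treated, t] - Y[treated, base]).mean()
            d_never = (Y[never, t] - Y[never, base]).mean()
            cells.append(d_treated - d_never)
    # Equal cohort sizes make this match a group-size-weighted average.
    return float(np.mean(cells))

def run(n_sims=100, seed=0, **kwargs):
    """Mean bias of each estimator relative to the average effect on treated cells."""
    rng = np.random.default_rng(seed)
    twfe_bias, cs_bias = [], []
    for _ in range(n_sims):
        Y, D, g, tau = simulate_panel(rng, **kwargs)
        truth = tau[D == 1].mean()
        twfe_bias.append(twfe(Y, D) - truth)
        cs_bias.append(cs_style_att(Y, g) - truth)
    return float(np.mean(twfe_bias)), float(np.mean(cs_bias))

mean_twfe_bias, mean_cs_bias = run()
print(f"mean TWFE bias: {mean_twfe_bias:+.4f}")
print(f"mean CS-style bias: {mean_cs_bias:+.4f}")
```

With these defaults the TWFE bias should come out positive (toward zero) while the CS-style bias stays near zero. Setting delta_growth to a larger magnitude, or widening gap, is one way to see the TWFE drift grow.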
