HonestDiD — Interactive Lab

A pedagogical companion to Sensitivity Analysis for Parallel Trends in Difference-in-Differences Using honestdid in Stata

How robust is your DiD result to violations of parallel trends?

Every difference-in-differences (DiD) estimate rests on a single, fundamentally untestable assumption: parallel trends. The honestdid package, which implements the approach of Rambachan and Roth (2023), reframes the question. Instead of asking "Do parallel trends hold?" it asks "How large would a violation need to be before our conclusion changes?" The answer is a single number, the breakdown value, that quantifies the robustness of any DiD result.

This app lets you turn the dials yourself. You will sweep the sensitivity parameter M and watch the robust CI expand in real time; simulate an event study where you control the size of the parallel-trends violation; and compare the breakdown values from every analysis in the post — 2x2, full panel, average effect, staggered DiD, and smoothness restrictions.

The two restrictions visualised — ΔRM vs ΔSD

Relative magnitudes (ΔRM): the post-treatment violation can be at most M̄ times the largest pre-treatment violation. Smoothness (ΔSD): the trend's "acceleration" between consecutive periods is bounded by M. The animation below shows the sensitivity CI sweeping from the original (narrow) bound to the breakdown point where it first crosses zero.
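To make the two restrictions concrete, here is a minimal Python sketch that checks whether a given violation path satisfies each one. It is an illustration only, not part of honestdid: the function names, the path layout (pre-periods first, ending with a reference period normalised to zero), and the small numerical tolerance are all assumptions made for this example. For ΔRM it uses the period-to-period form from Rambachan and Roth (2023), in which the bound applies to changes in the violation between consecutive periods.

```python
def first_differences(path):
    """Changes between consecutive entries of a path."""
    return [b - a for a, b in zip(path, path[1:])]


def satisfies_rm(delta, n_pre, mbar, tol=1e-12):
    """Relative magnitudes check (hypothetical helper).

    delta : violation path in event-time order; the first n_pre entries
            are pre-treatment, the last of them being the reference period.
    Every post-treatment change must be at most mbar times the largest
    pre-treatment change in absolute value.
    """
    diffs = first_differences(delta)
    pre_max = max(abs(d) for d in diffs[: n_pre - 1])
    return all(abs(d) <= mbar * pre_max + tol for d in diffs[n_pre - 1:])


def satisfies_sd(delta, m, tol=1e-12):
    """Smoothness check: every second difference is at most m in absolute value."""
    second = first_differences(first_differences(delta))
    return all(abs(d) <= m + tol for d in second)


# Two made-up paths: a steady linear drift and an abrupt jump at treatment.
linear = [-0.02, -0.01, 0.0, 0.01, 0.02]
jump = [0.0, 0.0, 0.0, 0.05, 0.05]

print(satisfies_rm(linear, n_pre=3, mbar=1.0))  # drift continues at its pre-period pace
print(satisfies_sd(linear, m=0.0))              # a straight line has no acceleration
print(satisfies_rm(jump, n_pre=3, mbar=2.0))    # flat pre-trend allows no post change
print(satisfies_sd(jump, m=0.01))               # the jump is too abrupt for a small M
```

The linear drift passes both checks while the jump fails both, which is the "speed limit" versus "acceleration limit" distinction in miniature.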

Tab 2: M-Slider

Drag M̄ from 0 to 2 and watch the robust CI widen on real Medicaid expansion estimates. Find the breakdown value yourself.

Tab 3: DGP Simulator

Generate an event study where you set the true parallel-trends violation. Sweep the smoothness parameter M and watch the CI react.

Tab 4: Breakdown Forest

Side-by-side robust CIs for every analysis in the post. Compare 2x2 DiD vs full panel vs staggered DiD vs smoothness.

Glossary (open a card if a term is unfamiliar)

Parallel trends (PTA)
The identifying assumption for DiD. Treated and control would have moved together absent treatment. Fundamentally untestable: we never see the treated counterfactual.
DiD ATT
Average Treatment effect on the Treated. The 2x2 estimate is 6.18 pp; the 2014 event-study coefficient is 4.23 pp.
Pre-trends test
Joint F-test of pre-period leads = 0. The Medicaid F = 0.86, p = 0.518 — passes. But low power means passing is not the same as parallel trends holding.
Sensitivity analysis
Replaces the binary "does PTA hold?" question with a continuous "how much violation can we tolerate?" Rambachan-Roth (2023) formalised this for DiD.
ΔRM — Relative Magnitudes
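The change in the violation between consecutive post-treatment periods is at most M̄ times the largest such change across the pre-treatment periods (loosely: post-violation ≤ M̄ × max pre-violation). M̄ is dimensionless. Use when you have few pre-periods or pre-trends look like random noise.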
ΔSD — Smoothness
Bounds the trend's second difference: |δ_{t+1} − 2δ_t + δ_{t−1}| ≤ M. Use when there's a visible pre-trend and you want to allow gradual but not abrupt deviations.
Breakdown value
The M (or M̄) at which the robust CI first includes zero. The headline robustness statistic. Medicaid: ~1.5–2 under ΔRM; ~0.015–0.02 under ΔSD.
C-LF / FLCI methods
C-LF = Conditional Least-Favourable (used for ΔRM); FLCI = Fixed-Length CI (used for ΔSD). Both produce honest coverage under the chosen restriction.

M-Slider — find the breakdown value yourself

These are the actual Medicaid expansion estimates from the post. Drag the M̄ slider (relative magnitudes) or M slider (smoothness) and watch the robust confidence interval expand. The breakdown value is the smallest M (or M̄) at which the robust CI includes zero, that is, where its lower bound reaches zero. The numbers in the cards below come directly from the output of honestdid, pre(1/5) post(7/8).

Each analysis uses ΔRM — the relative magnitudes restriction.
Slide right to allow larger post-treatment PTA violations relative to pre-treatment.
Readout cards: original point estimate (the 2014 event-study coefficient); robust CI at the current M̄, shown as [lower, upper]; breakdown value (the M̄ where the lower CI bound crosses 0); and the current verdict at this M̄.

What to look for

  • Start at M̄ = 0. The robust CI matches the original confidence interval — exact PTA. As you slide right, the lower bound shrinks toward zero.
  • Switch to the 2x2 DiD. With only 1 pre-period, the breakdown value is > 2, so the 2x2 looks like the most robust analysis. Read this with care: a single pre-period gives less information to calibrate the worst-case violation.
  • Switch to "Average 2014–2015". The breakdown drops to ~1.4. Averaging over a longer horizon accumulates potential violations, making the average effect less robust than the first-period effect alone.
  • Compare TWFE (full panel) vs csdid (staggered). Their breakdown values are nearly identical (~1.5–2) — reassuring when you only have one treatment cohort.
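The "averaging accumulates violations" point can be checked with a little arithmetic. Under ΔRM each post-period change in the violation is capped at M̄ × pre-max, so the worst-case bias grows by one such step per period. The sketch below is a bias-bound calculation only, under the assumption that the reference period is normalised to zero; the function names are made up for this example, and it says nothing about point estimates or standard errors, which also affect the breakdown value.

```python
def worst_case_bias_rm(horizon, mbar, pre_max):
    """Worst-case |violation| in post periods 1..horizon under ΔRM.

    Each period can add at most mbar * pre_max to the violation,
    so the bound for period t is t * mbar * pre_max.
    """
    step = mbar * pre_max
    return [t * step for t in range(1, horizon + 1)]


def worst_case_average_bias_rm(horizon, mbar, pre_max):
    """Worst-case bias of the simple average over post periods 1..horizon."""
    per_period = worst_case_bias_rm(horizon, mbar, pre_max)
    return sum(per_period) / horizon


# Illustrative units: pre_max = 1 and mbar = 1.
first_period = worst_case_bias_rm(1, mbar=1.0, pre_max=1.0)[0]
two_period_avg = worst_case_average_bias_rm(2, mbar=1.0, pre_max=1.0)
print(first_period, two_period_avg)  # 1.0 1.5
```

In this stylised setting the two-period average carries 1.5 times the worst-case bias of the first period alone, which is why its robust CI widens faster as M̄ grows.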

DGP Simulator — set the violation, watch the CI react

Simulate an event study where you control the true treatment effect, the pre-treatment noise, and the size of the parallel-trends violation. Then sweep M and watch the robust CI react. This is where the "speed limit" / "acceleration limit" intuition becomes concrete.

Simulator inputs:

  • Number of units (n). Larger n ⇒ tighter original CI.
  • True treatment effect. The "real" post-treatment effect.
  • Violation size. Size of the hidden post-period bias. Higher = more bias hiding in the estimate.
  • Noise SD. Standard deviation of period-by-period noise.
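If you want to reproduce the flavour of this simulator offline, the following is a minimal Python sketch of such a data-generating process. It is not the app's actual code. The linear form of the violation, the rule se = noise_sd / sqrt(n_units), the default values, and all names are assumptions chosen to keep the example short.

```python
import random


def simulate_event_study(n_units=500, true_effect=0.05, violation_slope=0.01,
                         noise_sd=0.2, n_pre=5, n_post=2, seed=0):
    """Return (event_times, coefs, se) for a stylised event study.

    Each coefficient = treatment effect (post periods only)
                     + a linear parallel-trends violation
                     + sampling noise.
    Event time -1 is the reference period and is fixed at zero.
    """
    rng = random.Random(seed)
    # Assumed link between sample size and precision: larger n, smaller se.
    se = noise_sd / n_units ** 0.5
    event_times = list(range(-n_pre, n_post))
    coefs = []
    for t in event_times:
        if t == -1:
            coefs.append(0.0)  # normalised reference period
            continue
        effect = true_effect if t >= 0 else 0.0
        violation = violation_slope * (t + 1)  # zero at t = -1, grows linearly
        coefs.append(effect + violation + rng.gauss(0.0, se))
    return event_times, coefs, se


times, coefs, se = simulate_event_study()
for t, b in zip(times, coefs):
    print(f"event time {t:+d}: coefficient {b:.4f} (se {se:.4f})")
```

Setting noise_sd to 0 strips out the sampling noise, which makes it easy to see how much of each post-period coefficient is true effect and how much is hidden violation.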

ΔRM (Relative Magnitudes) readouts: observed pre-max; robust CI at M̄ = 1; robust CI at M̄ = 2; breakdown M̄.

ΔSD (Smoothness) readouts: observed acceleration; robust CI at M = 0.01; robust CI at M = 0.02; breakdown M.

How does the breakdown value change with the violation?

Run 100 simulations with fresh random draws (same parameters, different ε) to see how much the breakdown value varies from sample to sample.

Why this experiment matters

  • The breakdown value depends on the observed pre-trends. Higher pre-period noise gives a larger "max pre-violation", which makes M̄ × pre-max bigger at every M̄. The robust CI therefore widens faster, and the breakdown value falls.
  • The true violation is hidden. You set it, but the analyst doesn't know it. The breakdown value tells them how large a hidden violation could be before the result reverses.
  • RM and SD measure different things. RM scales with magnitudes (dimensionless); SD scales with acceleration (in outcome units). Both can be informative — report both when feasible.
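The first point can be illustrated with a deliberately crude plug-in calculation. The sketch below is not the C-LF procedure that honestdid uses: it simply widens the usual 95% CI by M̄ × pre-max, treats the estimated pre-period changes as if they were known, and solves for the M̄ at which the lower bound reaches zero. All names and parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def plug_in_breakdown_rm(pre_coefs, post_coef, se, z=1.96):
    """Crude breakdown M-bar for the first post-period coefficient.

    pre_coefs : pre-period coefficients in time order, ending with the
                reference period (normalised to 0).
    Heuristic lower bound: post_coef - z * se - mbar * pre_max, where
    pre_max is the largest consecutive change among pre_coefs.
    """
    pre_max = max(abs(b - a) for a, b in zip(pre_coefs, pre_coefs[1:]))
    lower = post_coef - z * se
    if lower <= 0:
        return 0.0           # the original CI already includes zero
    if pre_max == 0:
        return float("inf")  # no observed pre-period movement to scale by
    return lower / pre_max


def simulate_breakdowns(n_sims=100, effect=0.05, se=0.01,
                        pre_noise_sd=0.01, n_pre=5, seed=1):
    """Breakdown values across repeated draws with the same parameters."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        pre = [rng.gauss(0.0, pre_noise_sd) for _ in range(n_pre - 1)] + [0.0]
        post = effect + rng.gauss(0.0, se)
        results.append(plug_in_breakdown_rm(pre, post, se))
    return results


quiet = simulate_breakdowns(pre_noise_sd=0.01)
noisy = simulate_breakdowns(pre_noise_sd=0.02)
print("median breakdown, quiet pre-trends:", round(statistics.median(quiet), 2))
print("median breakdown, noisy pre-trends:", round(statistics.median(noisy), 2))
```

Doubling the pre-period noise enlarges the observed pre-max, so the same M̄ buys a larger allowed violation and the breakdown value comes out lower.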

Breakdown values across every analysis in the post

The numbers below come straight from results.json — exactly the sensitivity tables in §6, §9, §10, and §11 of the post. Each forest bar shows the robust CI for the chosen M̄ value. Toggle methods to compare ΔRM at different M̄ values across analyses.

What to look for

  • The 2x2 DiD is the most robust. Its breakdown value is > 2 because with only 1 pre-period, the "max pre-violation" is small — the CI tolerates more relative scaling.
  • The average 2014–2015 effect is the least robust under ΔRM. Breakdown ≈ 1.4 — averaging amplifies cumulative bias.
  • TWFE (full panel) and csdid (staggered) agree. Both produce breakdown ~1.5–2 — reassuring with a single treatment cohort.
  • Smoothness breakdown is on a different scale. Breakdown ≈ 0.018 in outcome units under ΔSD, while the ΔRM breakdown is 1.55 (a dimensionless multiple). The two numbers are not directly comparable, but they tell the same robustness story.

Sensitivity parameter M (or M̄)

Discrete grid: 0, 0.5, 1, 1.5, 2 (the same as Stata's mvec(0(0.5)2)).

Breakdown values summary table

The headline number for each analysis. The breakdown value is the M (or M̄) at which the lower bound of the robust CI first touches zero.
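Because the sensitivity tables are computed on a discrete grid, the breakdown value can only be bracketed between two grid points. The sketch below shows one way to read that bracket off a table of lower bounds. The function name is hypothetical, the lower bounds are made-up numbers rather than values from the post, and the logic assumes a positive point estimate so that the lower bound is the one that matters.

```python
def breakdown_bracket(m_grid, lower_bounds):
    """Bracket the breakdown value on a discrete grid.

    Returns (last_m_excluding_zero, first_m_including_zero).
    The first element is None if the CI includes zero at the first grid point;
    the second is None if the CI excludes zero on the whole grid.
    """
    previous = None
    for m, lower in zip(m_grid, lower_bounds):
        if lower <= 0:
            return (previous, m)
        previous = m
    return (previous, None)


# Same grid as mvec(0(0.5)2): 0, 0.5, 1, 1.5, 2.
grid = [i * 0.5 for i in range(5)]

# Made-up lower bounds for two hypothetical analyses.
crosses_zero = [0.030, 0.020, 0.010, 0.002, -0.010]
never_crosses = [0.050, 0.040, 0.030, 0.020, 0.010]

print(breakdown_bracket(grid, crosses_zero))   # (1.5, 2.0)
print(breakdown_bracket(grid, never_crosses))  # (2.0, None)
```

A bracket of (1.5, 2.0) corresponds to reporting the breakdown as "~1.5–2", and (2.0, None) corresponds to "> 2". A finer grid narrows the bracket.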

Connecting back to Tab 2

The forest bars above are static snapshots at a fixed M̄. Tab 2 lets you sweep M̄ continuously and watch a single analysis's CI react. Tab 3 lets you generate fresh data and compute breakdown values yourself. Together, they give you three views of the same Rambachan-Roth idea: replace "Does PTA hold?" with "How much violation does this result tolerate?"