How robust is your DiD result to violations of parallel trends?
Every difference-in-differences (DiD) estimate rests on a single
fundamentally untestable assumption — parallel trends. The
honestdid package, due to Rambachan and Roth (2023), reframes
the question. Instead of asking "Do parallel trends hold?" it asks "How
large would a violation need to be before our conclusion changes?" The
answer is a single number — the breakdown value — that
quantifies the robustness of any DiD result.
This app lets you turn the dials yourself. You will sweep the sensitivity parameter M and watch the robust CI expand in real time; simulate an event study where you control the size of the parallel-trends violation; and compare the breakdown values from every analysis in the post — 2x2, full panel, average effect, staggered DiD, and smoothness restrictions.
The two restrictions visualised — ΔRM vs ΔSD
Relative magnitudes (ΔRM): the post-treatment violation can be at most M̄ times the largest pre-treatment violation. Smoothness (ΔSD): the trend's "acceleration" between consecutive periods is bounded by M. The animation below shows the sensitivity CI sweeping from the original (narrow) bound to the breakdown point where it first crosses zero.
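For readers who want the two restrictions in symbols, here is a sketch following the notation of Rambachan and Roth (2023). The symbols are not used elsewhere on this page and are introduced as assumptions: δ_t is the parallel-trends violation in event-time period t, and the reference period is normalised so that δ_0 = 0. Both sets are stated in terms of changes between consecutive periods.

```latex
% Relative magnitudes: each post-treatment change in the violation is at most
% \bar{M} times the largest pre-treatment change.
\Delta^{RM}(\bar{M}) = \Bigl\{ \delta : \; |\delta_{t+1} - \delta_t|
    \le \bar{M} \cdot \max_{s < 0} |\delta_{s+1} - \delta_s|
    \quad \text{for all } t \ge 0 \Bigr\}

% Smoothness: the change in slope ("acceleration") between consecutive
% periods is at most M.
\Delta^{SD}(M) = \Bigl\{ \delta : \; \bigl| (\delta_{t+1} - \delta_t) - (\delta_t - \delta_{t-1}) \bigr|
    \le M \quad \text{for all } t \Bigr\}
```

Setting M̄ = 0 or M = 0 recovers the two extremes: exact parallel trends under ΔRM, and an exactly linear violation under ΔSD.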
M-Slider
Drag M̄ from 0 to 2 and watch the robust CI widen on real Medicaid expansion estimates. Find the breakdown value yourself.
DGP Simulator
Generate an event study where you set the true parallel-trends violation. Sweep the smoothness parameter M and watch the CI react.
Breakdown Forest
Side-by-side robust CIs for every analysis in the post. Compare 2x2 DiD vs full panel vs staggered DiD vs smoothness.
Glossary (open a card if a term is unfamiliar)
Parallel trends (PTA)
DiD ATT
Pre-trends test
Sensitivity analysis
ΔRM — Relative Magnitudes
ΔSD — Smoothness
Breakdown value
C-LF / FLCI methods
M-Slider — find the breakdown value yourself
These are the actual Medicaid expansion estimates from
the post. Drag the M̄ slider (relative magnitudes) or M slider
(smoothness) and watch the robust confidence interval expand. The
breakdown value is the smallest M at which the lower CI
bound first crosses zero. The numbers in the cards below come directly
from the output of honestdid, pre(1/5) post(7/8).
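The breakdown value can be read off a sensitivity table by finding where the lower bound changes sign. The sketch below interpolates linearly between grid points. The function name breakdown_value and the numbers are hypothetical and for illustration only; they are not the post's estimates and not part of honestdid.

```python
import numpy as np

def breakdown_value(m_grid, lower_bounds):
    """Smallest M at which the lower CI bound reaches zero (linear interpolation).

    Returns None if the lower bound stays positive over the whole grid,
    meaning the breakdown value lies beyond max(m_grid).
    """
    m = np.asarray(m_grid, dtype=float)
    lb = np.asarray(lower_bounds, dtype=float)
    if lb[0] <= 0:
        return float(m[0])  # already includes zero at the first grid point
    for i in range(1, len(m)):
        if lb[i] <= 0:
            # Interpolate between the last positive and first non-positive bound.
            frac = lb[i - 1] / (lb[i - 1] - lb[i])
            return float(m[i - 1] + frac * (m[i] - m[i - 1]))
    return None

# Hypothetical lower bounds on a 0, 0.5, ..., 2 grid (NOT the post's numbers).
m_grid = [0.0, 0.5, 1.0, 1.5, 2.0]
lower = [0.030, 0.020, 0.010, 0.000, -0.010]
result = breakdown_value(m_grid, lower)
print(result)
```

A return value of None corresponds to the "> 2" entries: the bound never reaches zero on the grid, so the table can only report that the breakdown value exceeds the largest M tried.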
What to look for
- Start at M̄ = 0. The robust CI matches the original confidence interval, because M̄ = 0 imposes exact PTA. As you slide right, the lower bound shrinks toward zero.
- Switch to the 2x2 DiD. With only 1 pre-period, the breakdown value is > 2, so the 2x2 is the most robust analysis by this measure, partly because a single pre-period gives less information to calibrate the worst-case violation.
- Switch to "Average 2014–2015". The breakdown drops to ~1.4. Averaging over a longer horizon accumulates potential violations, making the average effect less robust than the first-period effect alone.
- Compare TWFE (full panel) vs csdid (staggered). Their breakdown values are nearly identical (~1.5–2) — reassuring when you only have one treatment cohort.
DGP Simulator — set the violation, watch the CI react
Simulate an event study where you control the true treatment effect, the pre-treatment noise, and the size of the parallel-trends violation. Then sweep M and watch the robust CI react. This is where the "speed limit" / "acceleration limit" intuition becomes concrete.
ΔRM — Relative Magnitudes
ΔSD — Smoothness
How does the breakdown value change with the violation?
Run 100 simulations with fresh random draws (same parameters, different ε) to see how much the breakdown value varies from draw to draw when the true violation is held fixed.
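The repeated-draws experiment can be sketched offline as follows. This is a deliberately simplified stand-in, not the procedure honestdid uses: it assumes a linear violation δ_t = slope × t, a known standard error, and a naive plug-in ΔRM bound that ignores sampling error in the pre-period coefficients. All function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_coefs(true_effect, slope, noise_sd, n_pre=5, rng=None):
    """Event-study coefficients with a linear violation delta_t = slope * t.

    The reference period t = 0 is normalised to zero; returns the pre-period
    coefficients (t = -n_pre..-1) and the first post-period coefficient (t = 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    pre_t = np.arange(-n_pre, 0)
    beta_pre = slope * pre_t + rng.normal(0.0, noise_sd, n_pre)
    beta_post = true_effect + slope * 1 + rng.normal(0.0, noise_sd)
    return beta_pre, float(beta_post)

def naive_rm_breakdown(beta_pre, beta_post, se):
    """Plug-in breakdown M-bar for the first post-period effect (simplified).

    Bias is bounded by M-bar times the largest consecutive pre-period change,
    so the interval reaches zero when M-bar * max_pre = |beta_post| - 1.96 * se.
    """
    path = np.append(beta_pre, 0.0)  # include the step into the reference period
    max_pre = np.max(np.abs(np.diff(path)))
    margin = abs(beta_post) - 1.96 * se
    if margin <= 0:
        return 0.0  # not significant even under exact parallel trends
    if max_pre == 0:
        return float("inf")
    return float(margin / max_pre)

# 100 draws with the same (hypothetical) parameters and fresh noise.
rng = np.random.default_rng(42)
draws = np.array([
    naive_rm_breakdown(*simulate_coefs(0.05, 0.005, 0.01, rng=rng), se=0.01)
    for _ in range(100)
])
print(np.percentile(draws, [10, 50, 90]))
```

Printing percentiles rather than a single mean keeps the focus on spread: the same true violation can produce quite different breakdown values depending on the noise in the observed pre-trends.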
Why this experiment matters
- The breakdown value depends on the observed pre-trends. Higher pre-period noise gives a larger "max pre-violation", which makes M̄ × pre-max bigger. The robust CI is then wider at any given M̄, so the breakdown M̄ is smaller.
- The true violation is hidden. You set it, but the analyst doesn't know it. The breakdown value tells them how large a hidden violation could be before the result reverses.
- RM and SD measure different things. RM scales with magnitudes (dimensionless); SD scales with acceleration (in outcome units). Both can be informative — report both when feasible.
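The difference in units between the two restrictions can be made concrete for the first post-treatment period. The sketch below uses simplified plug-in bounds that ignore sampling error, with the reference period normalised to zero; it is not the inference procedure honestdid implements, and the function names and coefficients are hypothetical.

```python
def rm_bias_bound(beta_pre, m_bar):
    """Largest first-period bias allowed under relative magnitudes.

    m_bar is a dimensionless multiplier on the largest consecutive change in
    the pre-period coefficients (including the step into the reference period).
    """
    path = list(beta_pre) + [0.0]
    max_pre = max(abs(b - a) for a, b in zip(path, path[1:]))
    return m_bar * max_pre

def sd_identified_set(beta_pre, beta_post, m):
    """Set of first-period effects consistent with smoothness bound m.

    With delta_0 = 0, smoothness gives delta_1 in [-beta_pre[-1] - m,
    -beta_pre[-1] + m]: the pre-trend is extrapolated linearly, then allowed
    to bend by at most m (measured in outcome units).
    """
    center = beta_post + beta_pre[-1]
    return center - m, center + m

# Hypothetical coefficients: a pre-trend rising by 0.005 per period.
beta_pre = [-0.020, -0.015, -0.010, -0.005]
beta_post = 0.060

print(rm_bias_bound(beta_pre, m_bar=1.0))
print(sd_identified_set(beta_pre, beta_post, m=0.0))
print(sd_identified_set(beta_pre, beta_post, m=0.01))
```

At M = 0 the smoothness set collapses to a single point, the estimate with the linear pre-trend extrapolated away, while ΔRM at M̄ = 1 allows a bias as large as the biggest pre-period step in either direction.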
Breakdown values across every analysis in the post
The numbers below come straight from results.json — exactly
the sensitivity tables in §6, §9, §10, and §11 of the post. Each forest
bar shows the robust CI for the chosen M̄ value. Toggle methods to compare
ΔRM at different M̄ values across analyses.
What to look for
- The 2x2 DiD is the most robust. Its breakdown value is > 2 because with only 1 pre-period, the "max pre-violation" is small — the CI tolerates more relative scaling.
- The average 2014–2015 effect is the least robust under ΔRM. Breakdown ≈ 1.4 — averaging amplifies cumulative bias.
- TWFE (full panel) and csdid (staggered) agree. Both produce breakdown ~1.5–2 — reassuring with a single treatment cohort.
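- Smoothness breaks down at a much smaller number, but on a different scale. Breakdown ≈ 0.018 in outcome units under ΔSD, while the ΔRM breakdown is 1.55 (a dimensionless multiplier). The two values are not directly comparable. Different metrics, same robustness story.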
Sensitivity parameter M (or M̄)
mvec(0(0.5)2)).
Analyses to show
Breakdown values summary table
The headline number for each analysis. The breakdown value is the M (or M̄) at which the lower bound of the robust CI first touches zero.
Connecting back to Tab 2
The forest bars above are static snapshots at a fixed M̄. Tab 2 lets you sweep M̄ continuously and watch a single analysis's CI react. Tab 3 lets you generate fresh data and compute breakdown values yourself. Together, they give you three views of the same Rambachan-Roth idea: replace "Does PTA hold?" with "How much violation does this result tolerate?"