HonestDiD — Interactive Lab

How robust is your DiD result to violations of parallel trends?

Every difference-in-differences (DiD) estimate rests on a single, fundamentally untestable assumption: parallel trends. The honestdid package, which implements the approach of Rambachan and Roth (2023), reframes the question. Instead of asking "Do parallel trends hold?" it asks "How large would a violation need to be before our conclusion changes?" The answer is a single number, the breakdown value, that quantifies the robustness of a DiD result.

This app lets you turn the dials yourself. You will sweep the sensitivity parameter M and watch the robust CI expand in real time; simulate an event study where you control the size of the parallel-trends violation; and compare the breakdown values from every analysis in the post — 2x2, full panel, average effect, staggered DiD, and smoothness restrictions.

The two restrictions visualised — Δ^RM vs Δ^SD

Relative magnitudes (Δ^RM): the post-treatment violation can be at most M̄ times the largest pre-treatment violation. Smoothness (Δ^SD): the trend's "acceleration" between consecutive periods is bounded by M. The animation below shows the sensitivity CI sweeping from the original (narrow) bound to the breakdown point where it first crosses zero.

Tab 2

M-Slider

Drag M̄ from 0 to 2 and watch the robust CI widen on real Medicaid expansion estimates. Find the breakdown value yourself.

Tab 3

DGP Simulator

Generate an event study where you set the true parallel-trends violation. Sweep the smoothness parameter M and watch the CI react.

Tab 4

Breakdown Forest

Side-by-side robust CIs for every analysis in the post. Compare 2x2 DiD vs full panel vs staggered DiD vs smoothness.

Glossary (open a card if a term is unfamiliar)

Parallel trends (PTA)

The identifying assumption for DiD: absent treatment, average outcomes for the treated and control groups would have moved in parallel. Fundamentally untestable, because we never observe the treated group's untreated counterfactual.

DiD ATT

Average Treatment effect on the Treated. The 2x2 estimate is 6.18 pp; the 2014 event-study coefficient is 4.23 pp.
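As a minimal sketch of what a 2x2 estimate computes, the snippet below differences four group means twice. The means are hypothetical placeholders, not the Medicaid data, and the function name did_2x2 is illustrative only.

```python
def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences from four group means."""
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    # The control group's change stands in for the treated group's
    # counterfactual change; that substitution is the parallel-trends assumption.
    return treated_change - control_change

# Hypothetical outcome rates in percent, chosen only for illustration.
estimate = did_2x2(treated_pre=60.0, treated_post=70.0,
                   control_pre=62.0, control_post=66.0)
print(f"2x2 DiD estimate: {estimate:.2f} pp")
```

Here the treated group gains 10 points and the control group gains 4, so the sketch reports 6.00 pp.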

Pre-trends test

Joint F-test that all pre-period (lead) coefficients equal zero. For Medicaid, F = 0.86 and p = 0.518, so the test does not reject. But the test has low power, so failing to reject is not the same as parallel trends holding.
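The low-power point can be made concrete with a simplified calculation. The sketch below is an assumption-laden stand-in: it uses a two-sided z-test on a single pre-period coefficient rather than the joint F-test, and the violation size and standard error are hypothetical.

```python
from statistics import NormalDist

def pretest_rejection_prob(true_violation, se, alpha=0.05):
    """Probability that a two-sided z-test on one pre-period coefficient
    rejects 'coefficient = 0' when the true value is true_violation."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = true_violation / se
    return z.cdf(-crit - shift) + 1 - z.cdf(crit - shift)

# Hypothetical case: a true pre-period violation equal to one standard error.
power = pretest_rejection_prob(true_violation=1.0, se=1.0)
print(f"Chance the pre-test flags the violation: {power:.2f}")
```

Under these assumptions the test flags the violation only about 17% of the time, so a passing pre-test leaves plenty of room for a violation of that size.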

Sensitivity analysis

Replaces the binary "does PTA hold?" question with a continuous "how much violation can we tolerate?" Rambachan-Roth (2023) formalised this for DiD.

Δ^RM — Relative Magnitudes

Post-violation ≤ M̄ × max pre-violation. M̄ is dimensionless. Use when you have few pre-periods or pre-trends look like random noise.
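A minimal sketch of the relative-magnitudes check follows. It assumes that "violation" means the change in the trend difference between consecutive periods, with the reference period normalised to zero; that is my reading of the Rambachan-Roth definition, and the helper name satisfies_rm and the example paths are hypothetical.

```python
def satisfies_rm(pre_path, post_path, m_bar):
    """Check a violation path against the relative-magnitudes restriction.

    pre_path and post_path list the violation delta_t in time order; the
    reference period (delta = 0) sits between them. Violation size is
    taken to be the change between consecutive periods (an assumption).
    """
    path = list(pre_path) + [0.0] + list(post_path)
    steps = [abs(b - a) for a, b in zip(path, path[1:])]
    n_pre = len(pre_path)
    max_pre_step = max(steps[:n_pre])
    return all(step <= m_bar * max_pre_step for step in steps[n_pre:])

# Hypothetical paths: pre-period steps of 0.01, post-period steps of 0.015.
pre, post = [0.02, 0.01], [0.015, 0.03]
print(satisfies_rm(pre, post, m_bar=1.0))
print(satisfies_rm(pre, post, m_bar=2.0))
```

The post-period steps are 1.5 times the largest pre-period step, so this path is ruled out at M̄ = 1 and allowed at M̄ = 2.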

Δ^SD — Smoothness

Bounds the trend's second difference: |δ_{t+1} − 2δ_t + δ_{t−1}| ≤ M. Use when there's a visible pre-trend and you want to allow gradual but not abrupt deviations.
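The second-difference bound can be checked directly on a candidate violation path. The sketch below is illustrative: the function names and both example paths are hypothetical.

```python
def max_second_difference(path):
    """Largest |delta[t+1] - 2*delta[t] + delta[t-1]| along a path."""
    return max(abs(path[i + 1] - 2 * path[i] + path[i - 1])
               for i in range(1, len(path) - 1))

def satisfies_sd(path, m):
    """True if every second difference of the path is at most m."""
    return max_second_difference(path) <= m

linear = [-0.02, -0.01, 0.0, 0.01, 0.02]  # constant slope: no acceleration
kinked = [-0.02, -0.01, 0.0, 0.03, 0.06]  # slope jumps after the reference period

print(satisfies_sd(linear, m=0.0))
print(satisfies_sd(kinked, m=0.01))
print(satisfies_sd(kinked, m=0.03))
```

A purely linear pre-trend passes even at M = 0, which is why this restriction suits cases with a visible but steady pre-trend; the kinked path needs M of about 0.02 to be admitted.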

Breakdown value

The M (or M̄) at which the robust CI first includes zero. The headline robustness statistic. Medicaid: ~1.5–2 under Δ^RM; ~0.015–0.02 under Δ^SD.
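Given a sensitivity table of robust CIs on a grid of M values, the breakdown value can be read off mechanically. The sketch below assumes rows of (M, lower, upper) sorted by M; the table is hypothetical, not the Medicaid output, and breakdown_on_grid is an illustrative name.

```python
def breakdown_on_grid(rows):
    """rows: iterable of (m, ci_lower, ci_upper), sorted by m.

    Returns the smallest grid value of m whose CI contains zero,
    or None if every CI on the grid excludes zero.
    """
    for m, lower, upper in rows:
        if lower <= 0.0 <= upper:
            return m
    return None

# Hypothetical sensitivity table, NOT the Medicaid results.
table = [
    (0.0, 2.0, 6.0),
    (0.5, 1.2, 6.8),
    (1.0, 0.4, 7.6),
    (1.5, -0.4, 8.4),
    (2.0, -1.2, 9.2),
]
print(breakdown_on_grid(table))  # 1.5
```

On a coarse grid the answer is only located to grid resolution: in this example the CI first covers zero somewhere between 1.0 and 1.5, which is why breakdown values are often quoted as ranges.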

C-LF / FLCI methods

C-LF = Conditional Least-Favourable (used for Δ^RM); FLCI = Fixed-Length CI (used for Δ^SD). Both produce honest coverage under the chosen restriction.

M-Slider — find the breakdown value yourself

These are the actual Medicaid expansion estimates from the post. Drag the M̄ slider (relative magnitudes) or M slider (smoothness) and watch the robust confidence interval expand. The breakdown value is the smallest M (or M̄) at which the lower CI bound reaches zero. The numbers in the cards below come directly from the output of honestdid, pre(1/5) post(7/8).

Analysis

Each analysis uses Δ^RM — the relative magnitudes restriction.

M̄ (relative magnitudes): 1.00

Slide right to allow larger post-treatment PTA violations relative to pre-treatment.

Original point estimate: the 2014 event-study coefficient

Robust CI at current M̄: [lower, upper]

Breakdown value: the M̄ where the lower CI bound crosses 0

Current verdict: the conclusion at this M̄

What to look for

Start at M̄ = 0. The robust CI matches the original confidence interval — exact PTA. As you slide right, the lower bound shrinks toward zero.
Switch to the 2x2 DiD. With only 1 pre-period, the breakdown value is > 2, making the 2x2 the most robust analysis by this measure. Part of the reason is mechanical: a single pre-period gives less information for calibrating the worst-case violation.
Switch to "Average 2014–2015". The breakdown drops to ~1.4. Averaging over a longer horizon accumulates potential violations, making the average effect less robust than the first-period effect alone.
Compare TWFE (full panel) vs csdid (staggered). Their breakdown values are nearly identical (~1.5–2) — reassuring when you only have one treatment cohort.
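The widening of the CI, and the lower breakdown for a multi-period average, can be sketched with a deliberately crude toy. This is not the C-LF procedure that honestdid uses. It assumes the largest pre-period step is known exactly, that the worst-case bias in the k-th post period is k × M̄ × that step (one step can accumulate per period), and that the usual 95% CI is simply widened by the worst-case bias on each side. All inputs are hypothetical, not the Medicaid estimates.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)

def toy_robust_ci(estimate, se, m_bar, max_pre_step, bias_multiplier=1.0):
    """Widen the usual 95% CI by the assumed worst-case bias on both sides."""
    worst_bias = bias_multiplier * m_bar * max_pre_step
    half_width = Z95 * se
    return estimate - half_width - worst_bias, estimate + half_width + worst_bias

def toy_breakdown(estimate, se, max_pre_step, bias_multiplier=1.0):
    """M-bar at which the toy CI's lower bound hits zero (positive estimate)."""
    return (estimate - Z95 * se) / (bias_multiplier * max_pre_step)

# Hypothetical inputs in percentage points.
est, se, pre_step = 4.0, 1.0, 1.0

# First post period: at most one step of bias.
first_period = toy_breakdown(est, se, pre_step)
# Average of two post periods: (1 + 2) / 2 = 1.5 steps of bias on average.
# For simplicity the estimate and se are held fixed, which real data would not do.
two_period_avg = toy_breakdown(est, se, pre_step, bias_multiplier=1.5)

print(f"M-bar = 0 CI: {toy_robust_ci(est, se, 0.0, pre_step)}")
print(f"first-period breakdown: {first_period:.2f}")
print(f"two-period-average breakdown: {two_period_avg:.2f}")
```

At M̄ = 0 the toy interval equals the ordinary CI, and the average effect breaks down sooner than the first-period effect because its worst-case bias grows faster in M̄.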

DGP Simulator — set the violation, watch the CI react

Simulate an event study where you control the true treatment effect, the pre-treatment noise, and the size of the parallel-trends violation. Then sweep M and watch the robust CI react. This is where the "speed limit" / "acceleration limit" intuition becomes concrete.

Sample size n: 200

Number of units. Larger n ⇒ tighter original CI.

True ATT: 0.05

The "real" post-treatment effect.

PTA violation magnitude: 0.020

Size of the hidden post-period bias. Higher = more bias hiding in the estimate.

Noise σ: 0.020

Standard deviation of period-by-period noise.
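One way such a simulator could generate event-study coefficients is sketched below. The data-generating process is an assumption, not the app's actual code: the violation is modelled as a linear drift that equals the chosen magnitude in the first post period, period -1 is the omitted reference, and each coefficient gets independent noise with standard deviation σ / √n.

```python
import random

def simulate_event_study(n=200, true_att=0.05, violation=0.02, sigma=0.02,
                         n_pre=5, n_post=2, seed=0):
    """Return {relative_period: coefficient}; period -1 is the reference.

    Assumed DGP (hypothetical): linear drift of `violation` per period,
    zero at the reference period, plus noise with sd sigma / sqrt(n).
    """
    rng = random.Random(seed)
    se = sigma / n ** 0.5
    coefs = {}
    for t in range(-n_pre - 1, n_post):
        if t == -1:
            continue  # omitted reference period
        effect = true_att if t >= 0 else 0.0
        drift = violation * (t + 1)  # equals `violation` at t = 0
        coefs[t] = effect + drift + rng.gauss(0.0, se)
    return coefs

for period, coef in simulate_event_study().items():
    print(f"t = {period:+d}: {coef:.4f}")
```

With this drift, the first post-period coefficient estimates the true ATT plus the violation, so the bias is invisible unless the pre-period coefficients reveal the trend.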

Δ^RM — Relative Magnitudes

Observed pre-max

Robust CI at M̄ = 1

Robust CI at M̄ = 2

Breakdown M̄

Δ^SD — Smoothness

Observed acceleration

Robust CI at M = 0.01

Robust CI at M = 0.02

Breakdown M

How does the breakdown value change with the violation?

Run 100 simulations with fresh random draws (same parameters, different ε) to see how the breakdown value varies from draw to draw.
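A repeated-draw experiment of this kind might look like the sketch below. It uses a toy breakdown rule, not the honestdid computation: the breakdown M̄ is the lower bound of the ordinary 95% CI divided by the largest observed pre-period step. The standard error, drift, and seed are hypothetical choices.

```python
import random
import statistics

def one_breakdown(rng, true_att=0.05, violation=0.02, se=0.005, n_pre=5):
    """One draw of a toy relative-magnitudes breakdown value."""
    # Pre-period steps: the assumed linear drift adds `violation` per period, plus noise.
    pre_steps = [violation + rng.gauss(0.0, se) for _ in range(n_pre)]
    max_pre = max(abs(step) for step in pre_steps)
    # The first post-period estimate is contaminated by the hidden violation.
    estimate = true_att + violation + rng.gauss(0.0, se)
    lower = estimate - 1.96 * se
    return max(lower, 0.0) / max_pre

rng = random.Random(42)
draws = [one_breakdown(rng) for _ in range(100)]
deciles = statistics.quantiles(draws, n=10)
print(f"median breakdown: {statistics.median(draws):.2f}")
print(f"10th to 90th percentile: {deciles[0]:.2f} to {deciles[-1]:.2f}")
```

The spread between the 10th and 90th percentiles shows how much of a single reported breakdown value is sampling noise in the pre-period coefficients.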

Why this experiment matters

The breakdown value depends on the observed pre-trends. Higher pre-period noise gives a larger "max pre-violation", which makes M̄ × pre-max bigger at every M̄. The robust CI therefore widens faster as M̄ grows, and the breakdown M̄ is smaller.
The true violation is hidden. You set it, but the analyst doesn't know it. The breakdown value tells them how large a hidden violation could be before the result reverses.
RM and SD measure different things. RM scales with magnitudes (dimensionless); SD scales with acceleration (in outcome units). Both can be informative — report both when feasible.
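The units point can be checked by rescaling the outcome. In the sketch below (hypothetical numbers and helper names), the same violation path is expressed first as proportions and then as percentage points: the relative-magnitudes ratio is unchanged, while the second difference scales with the outcome.

```python
def rm_ratio(pre_steps, post_step):
    """Post-period change relative to the largest pre-period change."""
    return abs(post_step) / max(abs(step) for step in pre_steps)

def second_difference(prev, cur, nxt):
    """Discrete acceleration of the trend at the middle period."""
    return nxt - 2 * cur + prev

# Hypothetical violation path, in proportions.
pre_steps, post_step = [0.004, -0.010, 0.006], 0.015
path = (0.00, 0.01, 0.03)

scale = 100  # proportions -> percentage points
ratio_raw = rm_ratio(pre_steps, post_step)
ratio_scaled = rm_ratio([scale * s for s in pre_steps], scale * post_step)
accel_raw = second_difference(*path)
accel_scaled = second_difference(*(scale * p for p in path))

print(f"RM ratio: {ratio_raw:.2f} (raw) vs {ratio_scaled:.2f} (rescaled)")
print(f"second difference: {accel_raw:.4f} (raw) vs {accel_scaled:.4f} (rescaled)")
```

This is why an M̄ breakdown can be compared across outcomes, while an M breakdown only means something alongside the outcome's units.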

Breakdown values across every analysis in the post

The numbers below come straight from results.json — exactly the sensitivity tables in §6, §9, §10, and §11 of the post. Each forest bar shows the robust CI for the chosen M̄ value. Toggle methods to compare Δ^RM at different M̄ values across analyses.

What to look for

The 2x2 DiD is the most robust. Its breakdown value is > 2 because with only 1 pre-period, the "max pre-violation" is small — the CI tolerates more relative scaling.
The average 2014–2015 effect is the least robust under Δ^RM. Breakdown ≈ 1.4 — averaging amplifies cumulative bias.
TWFE (full panel) and csdid (staggered) agree. Both produce breakdown ~1.5–2 — reassuring with a single treatment cohort.
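Smoothness gives a numerically smaller breakdown. Breakdown ≈ 0.018 in outcome units under Δ^SD, versus 1.55 (dimensionless) under Δ^RM. The two numbers are on different scales and cannot be compared directly, but they tell the same robustness story.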

Sensitivity parameter M (or M̄)

Choose M̄ value (relative magnitudes): 1.00

Discrete grid: 0, 0.5, 1, 1.5, 2 (the same as Stata's mvec(0(0.5)2)).
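For readers working outside Stata, the numlist 0(0.5)2 reads as start(step)stop. A small sketch that expands it into the same grid follows; stata_numlist is a hypothetical helper written for this illustration, not part of any package.

```python
def stata_numlist(start, step, stop):
    """Expand a Stata-style numlist start(step)stop into a Python list."""
    # The small epsilon guards against floating-point shortfall in the division.
    count = int((stop - start) / step + 1e-9) + 1
    return [start + i * step for i in range(count)]

print(stata_numlist(0, 0.5, 2))  # [0.0, 0.5, 1.0, 1.5, 2.0]
```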

Analyses to show

2x2 DiD (1 pre)
Full panel (5 pre)
Average 2014–2015
Staggered (csdid)

Breakdown values summary table

The headline number for each analysis. The breakdown value is the M (or M̄) at which the lower bound of the robust CI first touches zero.

Connecting back to Tab 2

The forest bars above are static snapshots at a fixed M̄. Tab 2 lets you sweep M̄ continuously and watch a single analysis's CI react. Tab 3 lets you generate fresh data and compute breakdown values yourself. Together, they give you three views of the same Rambachan-Roth idea: replace "Does PTA hold?" with "How much violation does this result tolerate?"
