How Far Can Parallel Trends Bend Before DiD Breaks?

Sensitivity analysis for difference-in-differences with honestdid in Stata

6.18 pp · Medicaid DiD ATT
M-bar 1.5-2 · breakdown · relative magnitudes
M 0.015-0.02 · breakdown · smoothness

Carlos Mendez

Nagoya University (GSID)

June 11, 2026

The Tension

Act I

Every difference-in-differences estimate rests on an assumption you cannot test

Medicaid expanded in 2014. Insurance coverage in expansion states jumped, but that jump measures the policy’s effect only if those states would otherwise have tracked the non-expanders.

That counterfactual is never observed. With two periods, parallel trends is fundamentally untestable. So how much should we trust the estimate?

Where we’re going

  • The 2x2 DiD: an estimate with no way to test parallel trends
  • The event study: more pre-periods, a pre-trends test, and why it still misleads
  • Relative magnitudes (\(\bar M\)) and smoothness (\(M\)): bending the assumption on purpose
  • The breakdown value: how robust the Medicaid result really is

The Investigation

Act II

The lab: 38 states over 2008-2015, ACA Medicaid expansion

  • Outcome: dins, insurance coverage among low-income childless adults
  • Treatment: 22 states that expanded Medicaid in 2014
  • Control: 16 states that never expanded
  • Estimand: the average treatment effect on the treated (ATT), under parallel trends

Observational, not randomized: states chose to expand, so parallel trends is a genuine concern — exactly the worry honestdid quantifies.

The 2x2 DiD is the difference of two changes

\[Y_{it} = \alpha + \beta\,\text{Treat}_i + \gamma\,\text{Post}_t + \delta\,(\text{Treat}_i \times \text{Post}_t) + \varepsilon_{it}\]

The interaction \(\delta\) is the DiD estimate: how much the treated group’s change exceeds the control group’s change.

Control rose 6.46 pp; treated rose 12.64 pp; the difference is the policy effect.
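To see why the interaction coefficient is literally a difference of two changes, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the variable names treat and post, the seed, and the data-generating numbers are invented and are not the Medicaid sample.

```stata
* Minimal sketch on SIMULATED data (not the Medicaid sample).
* treat, post, and every numeric value below are illustrative assumptions.
clear
set seed 20140101
set obs 38                                  // one row per hypothetical state
gen int stfips = _n
gen byte treat = (stfips <= 22)             // 22 "expanders", 16 "controls"
expand 2                                    // two periods per state
bysort stfips: gen byte post = (_n == 2)    // 0 = before, 1 = after
gen double dins = 0.62 + 0.03*treat + 0.06*post + 0.06*treat*post ///
    + rnormal(0, 0.02)

* Saturated 2x2 regression: the interaction coefficient is the DiD estimate
regress dins i.treat##i.post, vce(cluster stfips)
scalar delta_hat = _b[1.treat#1.post]

* The same number from four cell means: (treated change) - (control change)
foreach g in 0 1 {
    foreach p in 0 1 {
        summarize dins if treat == `g' & post == `p', meanonly
        scalar m`g'`p' = r(mean)
    }
}
scalar did_means = (m11 - m10) - (m01 - m00)
display "regression: " %8.5f delta_hat "   cell means: " %8.5f did_means
```

The two displayed numbers coincide because the 2x2 regression is saturated: it fits the four cell means exactly, so the interaction can only be the difference of the two changes.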

Treated states gained 6.18 pp more than controls

Quantity | Value
Treated change (65.45% to 78.09%) | +12.64 pp
Control change (61.90% to 68.36%) | +6.46 pp
DiD ATT \(\hat\delta\) | +6.18 pp

\(t = 7.24\), \(p < 0.001\), 95% CI \([4.45,\ 7.91]\) pp · clustered on 38 states.
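The headline numbers can be checked by hand. The sketch below recomputes the DiD from the four group means in the table and backs out the standard error from the reported t statistic. One assumption is not stated on the slide: that the confidence interval uses a t distribution with 38 - 1 = 37 degrees of freedom, the usual convention with 38 clusters.

```stata
* Check the slide's arithmetic (inputs copied from the table above)
scalar treated_change = 78.09 - 65.45
scalar control_change = 68.36 - 61.90
scalar did = treated_change - control_change
display "DiD ATT (pp): " %4.2f did

* ASSUMPTION: t distribution with 37 degrees of freedom (38 clusters - 1)
scalar se    = did / 7.24                   // back out the SE from t = 7.24
scalar tcrit = invttail(37, 0.025)
scalar ci_lo = did - tcrit * se
scalar ci_hi = did + tcrit * se
display "95% CI (pp): [" %4.2f ci_lo ", " %4.2f ci_hi "]"
```

Under that assumption the interval should land on the reported [4.45, 7.91] pp, which suggests the estimate, t statistic, and interval on the slide are mutually consistent.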

With one photograph of two runners, you cannot see who was accelerating

Group means before and after expansion. The dashed counterfactual is where treated states sit under parallel trends; the gap to the solid treated line is the 6.18 pp DiD estimate.

The relative-magnitudes restriction bounds the post-violation by the worst pre-violation

\[\Delta^{RM}(\bar M): \quad |\delta_t^{\text{post}}| \;\le\; \bar M \cdot \max_{s \in \text{pre}} |\delta_s|\]

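Set \(\bar M = 1\) and the post-treatment violation may be as large as the worst pre-treatment deviation; \(\bar M = 2\) allows twice that. With more than one pre-period, the same bound applies to period-to-period changes in the violation rather than to its level.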

We never observe the true \(\delta_s\) — the package uses the estimated pre-period coefficients and their uncertainty to build valid CIs.
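A worked illustration of what the bound buys, ignoring sampling uncertainty. The 1 pp pre-period deviation is a hypothetical round number, not an estimate from the Medicaid data, and \(\beta_{\text{post}}\) (the post-period coefficient) and \(\tau\) (the true effect) are notation introduced here. Because the coefficient is the effect plus the violation, \(\beta_{\text{post}} = \tau + \delta^{\text{post}}\), the restriction confines \(\tau\) to a band around the estimate:

\[|\tau - \beta_{\text{post}}| \;\le\; \bar M \cdot \max_{s \in \text{pre}} |\delta_s| \quad\Longrightarrow\quad \tau \in \big[\,6.18 - \bar M \cdot 1,\;\; 6.18 + \bar M \cdot 1\,\big] \text{ pp}\]

With \(\bar M = 2\) the band would be \([4.18,\ 8.18]\) pp. The intervals the package reports are wider than this, since they also account for the pre- and post-period coefficients being estimated rather than known.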

Three lines turn an event study into a breakdown value

* one pre-coefficient (2012), one post-coefficient (2014), 2013 omitted
reghdfe dins b2013.Dyear, absorb(stfips year) cluster(stfips) noconstant
honestdid, pre(1/1) post(3/3) mvec(0(0.5)2)            // relative magnitudes
honestdid, pre(1/1) post(3/3) mvec(0(0.5)2) coefplot   // and the picture

Even at twice the worst pre-trend, the 2x2 result stays above zero

Robust CI under relative magnitudes vs \(\bar M\). The interval widens as we relax parallel trends but never crosses zero through \(\bar M = 2\).

Robust 95% CI, in proportion units (0.026 = 2.6 pp):

\(\bar M\) | Lower bound | Upper bound
0.0 | 0.026 | 0.059
1.0 | 0.017 | 0.064
2.0 | 0.003 | 0.076

Smoothness limits how fast the trend can change direction

\[\Delta^{SD}(M): \quad \big|(\delta_{t+1}-\delta_t)-(\delta_t-\delta_{t-1})\big| \;\le\; M \quad \text{for all } t\]

Relative magnitudes is a speed limit on the violation; smoothness is an acceleration limit — the trend may drift, but not lurch.
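Two hypothetical violation paths, in pp, show what the acceleration limit rules in and out. The numbers are invented for round arithmetic and are not estimates from the Medicaid panel. A steady drift \((\delta_{t-1}, \delta_t, \delta_{t+1}) = (-1,\ 0,\ 1)\) has zero acceleration, while a lurch to \((-1,\ 0,\ 3)\) does not:

\[(1 - 0) - (0 - (-1)) = 0, \qquad (3 - 0) - (0 - (-1)) = 2\]

The first path satisfies the restriction even at \(M = 0\), which is why \(M = 0\) amounts to extrapolating a linear pre-trend. The second needs \(M \ge 2\) pp, or 0.02 when the outcome is measured as a proportion.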

Needs \(\ge 2\) pre-periods (three points define one acceleration) — unavailable in the 2x2, unlocked by the full panel.
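A sketch of how the full-panel calls might look, mirroring the 2x2 snippet. It assumes the 2008-2015 panel is in memory with dins, Dyear, stfips, and year defined as in that snippet, and that reghdfe and honestdid are installed. The coefficient positions follow the same counting as the 2x2 call (the omitted base year occupies a slot), and the smoothness grid is an illustrative choice, not the one used for the results in this deck.

```stata
* Sketch only: not self-contained; needs the panel in memory plus the
* reghdfe and honestdid packages.
* With 2013 as the base year, coefficients 1-5 are 2008-2012 (pre),
* 6 is the omitted 2013, and 7-8 are 2014-2015 (post).
reghdfe dins b2013.Dyear, absorb(stfips year) cluster(stfips) noconstant
honestdid, pre(1/5) post(7/8) mvec(0(0.5)2)                  // relative magnitudes
honestdid, pre(1/5) post(7/8) mvec(0(0.005)0.03) delta(sd)   // smoothness (grid is an assumption)
```

The second call is the one the 2x2 design cannot support: with a single pre-period there is no pre-treatment curvature to discipline the restriction.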

The Resolution

Act III

The Medicaid effect survives violations up to 1.5-2x the worst pre-trend

M-bar 1.5-2

breakdown value under \(\Delta^{RM}\) · the post-violation must reach 1.5-2x the worst pre-deviation before the robust CI includes zero

The robust CI widens with \(\bar M\) and crosses zero between 1.5 and 2

Relative magnitudes with five pre-periods. The CI steadily widens; the lower bound passes through zero between \(\bar M = 1.5\) and \(2\).

Robust 95% CI, in proportion units (0.013 = 1.3 pp):

\(\bar M\) | Lower bound | Upper bound
1.0 | 0.013 | 0.071
1.5 | 0.003 | 0.081
2.0 | −0.007 | 0.091
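The table brackets the breakdown value but does not pin it down. A rough point value can be read off by interpolating between the two grid points on either side of zero. The sketch below assumes the lower bound is approximately linear in \(\bar M\) between 1.5 and 2, which the slide does not establish; the scalar names are invented here.

```stata
* Linear interpolation between the grid points that bracket zero.
* ASSUMPTION: the CI lower bound is roughly linear in Mbar on [1.5, 2].
scalar m_lo  = 1.5
scalar lb_lo = 0.003                        // lower bound at Mbar = 1.5
scalar m_hi  = 2.0
scalar lb_hi = -0.007                       // lower bound at Mbar = 2.0
scalar breakdown = m_lo + (m_hi - m_lo) * lb_lo / (lb_lo - lb_hi)
display "approximate breakdown Mbar: " %4.2f breakdown
```

This gives roughly 1.65. Rerunning the sensitivity analysis on a finer grid, for example mvec(1.5(0.05)2), would locate the crossing without leaning on the linearity assumption.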

Smoothness gives a complementary, tighter view of the same result

Smoothness restriction: robust CI vs \(M\). The lower bound crosses zero near \(M = 0.02\).

Restriction | Parameter | Breakdown | Meaning
Relative magnitudes | \(\bar M\) | 1.5-2 | post-violation \(\le\) 1.5-2x worst pre-violation
Smoothness | \(M\) | 0.015-0.02 | trend slope changes by \(\le\) 1.5-2 pp per period

The staggered-robust estimator reaches the same verdict

Relative magnitudes on Callaway-Sant’Anna (csdid) event-study estimates. The breakdown again lands between \(\bar M = 1.5\) and \(2\).

The Callaway-Sant’Anna ATT agrees with TWFE here because the sample keeps only the single 2014 expansion cohort, so there is no staggered timing for TWFE to mishandle. That makes it a clean robustness check.

Does honestdid make the claim causal? No — it disciplines doubt, not identification

Objection. A breakdown value is just a sensitivity number — it cannot prove the treated and control states really were on parallel trends.

Response. Correct, and that is the point. The breakdown value says how large a violation would overturn the result; subject-matter knowledge says whether a violation that large is plausible. Sensitivity is not identification — it quantifies how much identification we need.

Report the breakdown value next to every DiD estimate — it is the honest measure of doubt.