Why DiD? Why not just compare before to after?
A government implements an after-school tutoring program in 10 of 35 high schools. Did the program improve student GPA? A naive before-after look at the treated schools shows a jump of 36.20 GPA points — but that estimate absorbs whatever time-trend was already lifting every school in the district. The 25 untreated comparison schools rose 10.88 points on their own. The Difference-in-Differences estimator subtracts that drift first, recovering an ATT of 25.32 points — about 30% smaller than the naive number.
This app lets you turn the dials yourself. Beyond this overview, three interactive tabs let you (1) slide a parallel-trends violation knob and watch the DiD estimate move; (2) compare DiD against the naive ITS across 100 simulated panels; and (3) explore the post's five-estimator agreement on the real Corral-Yang (2024) case study, where all five converge on essentially the same 25.32-point ATT.
Parallel trends — the assumption that does all the work
DiD is identified by one assumption: in the absence of treatment, the treated group's GPA would have moved in parallel with the comparison group's. The orange line is what we actually observe; the dashed grey line is the unobserved counterfactual. The vertical gap at t+1 is the DiD effect.
Adjust the violation slider in Tab 2 to see how the estimated DiD effect moves with the assumed deviation from parallel trends.
Parallel Trends Lab
Adjust the violation knob and the true treatment effect, then watch the counterfactual move and the 2×2 DiD bias appear in real time.
DiD vs ITS Simulator
Two estimators on the same synthetic panel: DiD vs the naive ITS that ignores the comparison group. Run 100 sims and watch ITS systematically inflate the effect.
Five-Estimator Forest Plot
The post's headline finding: diff, reg, didregress, xtreg, and reghdfe all return essentially the same ATT (25.31–25.33); only the SEs differ.
Glossary (open a card if a term is unfamiliar)
2×2 DiD
Parallel trends assumption
ATT
E[Y(1) − Y(0) | D=1]: the causal effect for the 10 schools that received the program. It is not the ATE and does not extrapolate to untreated schools.
Counterfactual
TWFE (two-way fixed effects)
Y_it = γ_i + θ_t + β · txp_it + ε_it. The coefficient β on the treatment-period interaction txp_it is the DiD ATT.
Event study
Dynamic treatment effects, estimated in the post with eventdd.
Interrupted Time Series (ITS)
Clustered standard errors
didregress auto-clusters at the school level, lifting the SE from 0.61 to 0.83. Different SE, same point estimate.
The three takeaways this app reruns in code
- The naive ITS overstates the effect by 43%. ITS reports 36.20 GPA points; DiD reports 25.32. The 10.88-point gap is district-wide drift that ITS cannot subtract — the price of skipping a comparison group.
- Five Stata estimators converge on the same ATT. diff, reg, didregress, xtreg, and reghdfe all return 25.31–25.33. They differ only in how standard errors are computed (clustering and robust corrections).
- The event study supports parallel trends. Pre-treatment leads are 0.34, −0.32, and 0.59, all statistically insignificant. Post-treatment lags are 24.71–25.70 with no fade-out, suggesting a sustained effect.
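Why do the regression-based commands agree on the point estimate? In a balanced two-period panel, a two-way fixed-effects regression of Y on school effects, period effects, and txp returns exactly the 2×2 difference of mean changes. The Python sketch below checks this on simulated data. The function names, the school-level and noise distributions, and the panel sizes are illustrative assumptions, not the post's data or Stata code. It computes β through two-way demeaning. By the Frisch–Waugh–Lovell theorem, that matches the dummy-variable OLS coefficient in a balanced panel.

```python
import random

def simulate_two_period_panel(n_treated=10, n_control=25, att=25.32,
                              trend=10.88, noise_sd=5.0, seed=0):
    """Hypothetical balanced panel; rows are (school, post, treated, y)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_treated + n_control):
        treated = 1 if i < n_treated else 0
        school_effect = rng.gauss(70.0, 8.0)  # time-invariant school level (assumed)
        for post in (0, 1):
            y = (school_effect + trend * post + att * treated * post
                 + rng.gauss(0.0, noise_sd))
            rows.append((i, post, treated, y))
    return rows

def did_2x2(rows):
    """Treated before-after change in means minus the comparison change."""
    cells = {}
    for _, post, treated, y in rows:
        cells.setdefault((treated, post), []).append(y)
    m = {k: sum(v) / len(v) for k, v in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])

def twfe_beta(rows):
    """OLS coefficient on txp = treated * post after absorbing school and
    period fixed effects via two-way demeaning (exact for a balanced panel)."""
    def demean(values):
        by_i, by_t = {}, {}
        for (i, t, _, _), v in zip(rows, values):
            by_i.setdefault(i, []).append(v)
            by_t.setdefault(t, []).append(v)
        mi = {k: sum(v) / len(v) for k, v in by_i.items()}
        mt = {k: sum(v) / len(v) for k, v in by_t.items()}
        grand = sum(values) / len(values)
        return [v - mi[i] - mt[t] + grand
                for (i, t, _, _), v in zip(rows, values)]
    x = demean([treated * post for _, post, treated, _ in rows])
    y = demean([r[3] for r in rows])
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

rows = simulate_two_period_panel()
# Both numbers should match up to floating-point rounding.
print(round(did_2x2(rows), 6), round(twfe_beta(rows), 6))
```

Both routes reduce to the same difference of means. Clustering or robust corrections therefore change only the standard errors, which is the pattern the takeaway describes.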
Parallel-trends lab
The single identifying assumption: without treatment, treated schools' GPA would move in parallel with comparison schools'. Slide the divergence δ to add a per-period violation on top of that, and watch what the standard 2×2 DiD estimator returns.
Magnitude of the true ATT. Set to 25 to approximate the post's 25.32; slide to see what would happen for different program sizes.
Tilts the treated group's counterfactual relative to the comparison group. A pre-trend δ > 0 means treated schools were drifting upward faster than comparison schools, which is a parallel-trends violation.
What the lines show
Blue = comparison group mean. Dashed grey = counterfactual treated (what we would see if parallel trends held). Orange = observed treated. The DiD effect at t+1 is annotated top-right.
Simple 2×2 DiD estimator
What to look for
- With pre-trend = 0, DiD is unbiased for the true ATT — bias is essentially zero.
- With a positive pre-trend (treated drifting up faster), DiD is biased upward and overstates a positive program effect.
- With a negative pre-trend, DiD is biased downward and understates it.
- In the post, the event-study leads are 0.34, −0.32, 0.59 (all p > 0.10) — close enough to zero that the parallel-trends assumption is plausible.
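The bias pattern in the list can be reproduced with noise-free cell means. This sketch assumes a 2×2 design in which the treated group's untreated outcome rises by the comparison group's change plus a divergence δ. The function name is hypothetical, and the default means are taken from the post's Table 1. Under that assumption the 2×2 DiD returns ATT + δ, so the bias equals δ.

```python
def two_by_two_with_violation(att, delta, control_pre=71.22,
                              control_post=82.10, treated_pre=60.17):
    """Noise-free 2x2 DiD under an assumed parallel-trends violation delta.

    Assumption: without treatment, the treated group would have risen by the
    comparison group's change plus delta between the pre and post periods.
    """
    control_change = control_post - control_pre
    counterfactual_post = treated_pre + control_change + delta  # unobserved
    observed_post = counterfactual_post + att                   # what we see
    did = (observed_post - treated_pre) - control_change
    return did, did - att  # (estimate, bias)

for delta in (-2.0, 0.0, 2.0):
    est, bias = two_by_two_with_violation(att=25.0, delta=delta)
    print(f"delta={delta:+.1f}  DiD={est:.2f}  bias={bias:+.2f}")
```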
DiD vs ITS — head-to-head on simulated panels
Two estimators on the same data. DiD uses the comparison group to subtract the secular trend; ITS (Interrupted Time Series) looks only at the treated group's before-after change. When the secular trend is nonzero, ITS systematically over-estimates the program effect. The post's data shows this exactly: ITS = 36.20, DiD = 25.32, gap = 10.88 (the district-wide drift).
10 treated + n−10 comparison schools per simulated panel.
The true program effect, in GPA points. Matched to the post's 25.32.
District-wide drift that affects both groups equally. ITS absorbs this; DiD removes it.
School-level noise in GPA points.
DiD (2×2)
ITS (naive)
Distribution across 100 simulated panels
Each draw re-randomises school-level noise (same DGP, same seed family). With a nonzero secular trend, ITS's histogram is shifted away from the truth by roughly the size of the trend (to the right when the trend is positive), while DiD's histogram stays centred on the truth. This is the §3 message of the post: ITS confuses drift with effect; DiD separates them.
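Below is a minimal Monte Carlo sketch of the head-to-head comparison in plain Python. The data-generating process is an assumption for illustration and is not necessarily the app's own:

- each school has a random baseline;
- all schools share the same secular trend;
- treated schools gain `att` in the post period;
- every school-period observation adds independent Gaussian noise.

ITS is the treated group's mean change, and DiD subtracts the comparison group's mean change.

```python
import random
import statistics

def simulate_panel_estimates(n_schools=35, n_treated=10, att=25.32,
                             trend=10.88, noise_sd=5.0, rng=None):
    """One synthetic two-period panel under the assumed DGP; returns (DiD, ITS)."""
    if rng is None:
        rng = random.Random()
    treated_changes, control_changes = [], []
    for i in range(n_schools):
        base = rng.gauss(70.0, 8.0)           # school baseline (cancels in changes)
        treated = i < n_treated
        pre = base + rng.gauss(0.0, noise_sd)
        post = base + trend + (att if treated else 0.0) + rng.gauss(0.0, noise_sd)
        (treated_changes if treated else control_changes).append(post - pre)
    its = statistics.mean(treated_changes)         # ignores the comparison group
    did = its - statistics.mean(control_changes)   # subtracts the shared drift
    return did, its

rng = random.Random(2024)
draws = [simulate_panel_estimates(rng=rng) for _ in range(100)]
mean_did = statistics.mean(d for d, _ in draws)
mean_its = statistics.mean(i for _, i in draws)
print(f"mean DiD = {mean_did:.2f}, mean ITS = {mean_its:.2f}, "
      f"ITS - DiD = {mean_its - mean_did:.2f}")
```

The defaults are set to the post's numbers (att = 25.32, trend = 10.88). With those values, the ITS average should land near 36.20 and the DiD average near 25.32.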
The post's headline results, interactively
Every estimator the post computes, plotted on one axis. The diff, reg, didregress, xtreg, and reghdfe commands all converge on 25.31–25.33 GPA points. The naive ITS estimator (36.20) is included so you can see its 10.88-point inflation relative to the others. Toggle methods on or off; hover for SEs and CIs.
Methods to display
Event study — dynamic treatment effects from eventdd
The dynamic specification replaces the single txp interaction with one coefficient per period relative to treatment onset. The pre-treatment leads (event time −4 to −2) should be near zero if parallel trends hold; the post-treatment lags (event time 0 to 3) trace the dynamic effect path.
Reading the plot: pre-treatment coefficients (0.34, −0.32, 0.59) hover around zero with insignificant p-values — parallel pre-trends look fine. Post-treatment coefficients (25.03, 24.71, 24.77, 25.70) snap up immediately and stay flat — no fade-out, no ramp-up. The program delivered its full benefit from period 0 and sustained it.
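Assume common treatment timing and a balanced panel. In a fully dynamic TWFE specification with period −1 as the omitted reference, each event-time coefficient then equals a 2×2 DiD of that period against the reference. The sketch below computes those coefficients from group means. The reference period, the function name, and the noise-free example panel are assumptions for illustration, and this is not eventdd's implementation.

```python
from collections import defaultdict

def event_study_from_means(rows, onset, ref=-1):
    """Event-time coefficients as 2x2 DiDs against a reference period.

    Assumes all treated schools start at `onset` and the panel is balanced.
    rows are (school, period, treated, y). For each event time k != ref:
    beta_k = (treated mean at k - treated mean at ref)
             - (comparison mean at k - comparison mean at ref)
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for _, period, treated, y in rows:
        key = (treated, period - onset)  # group and event time relative to onset
        sums[key] += y
        counts[key] += 1
    mean = {k: sums[k] / counts[k] for k in sums}
    event_times = sorted({k for _, k in mean})
    return {k: (mean[(1, k)] - mean[(1, ref)]) - (mean[(0, k)] - mean[(0, ref)])
            for k in event_times if k != ref}

# Hypothetical noise-free panel: periods 0..7 with onset at period 4, so event
# time runs -4..3; shared drift of 2 points per period; flat effect of 25.
rows = []
for school in range(35):
    treated = 1 if school < 10 else 0
    for period in range(8):
        y = 60.0 + 2.0 * period + (25.0 if treated and period >= 4 else 0.0)
        rows.append((school, period, treated, y))
print(event_study_from_means(rows, onset=4))
```

On this constructed panel the leads come out at zero and every lag at 25. That is the immediate, flat, no-fade-out shape described above.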
The 2×2 means table (Table 1 of the post)
| | Pre | Post | Change |
|---|---|---|---|
| Comparison (25 schools) | 71.22 | 82.10 | +10.88 |
| Treated (10 schools) | 60.17 | 96.37 | +36.20 |
| DiD ATT | | | +25.32 |
The DiD estimator is the difference of the two row-wise changes: 36.20 − 10.88 = 25.32. Equivalently, it is the difference of the two column-wise gaps: (96.37 − 82.10) − (60.17 − 71.22) = 14.27 − (−11.05) = 25.32. Same number, two routes: that is the "double difference."
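Both routes can be checked directly from the table's cell means. The short Python sketch below uses an illustrative dictionary layout and illustrative variable names. It also recomputes the 43% ITS overstatement as 36.20 / 25.32 − 1.

```python
# Cell means from the 2x2 table (Table 1 of the post).
means = {
    ("comparison", "pre"): 71.22, ("comparison", "post"): 82.10,
    ("treated", "pre"): 60.17,    ("treated", "post"): 96.37,
}

# Route 1: difference of the row-wise (before-after) changes.
treated_change = means[("treated", "post")] - means[("treated", "pre")]           # naive ITS
comparison_change = means[("comparison", "post")] - means[("comparison", "pre")]  # drift
did_rows = treated_change - comparison_change

# Route 2: difference of the column-wise (treated minus comparison) gaps.
gap_post = means[("treated", "post")] - means[("comparison", "post")]
gap_pre = means[("treated", "pre")] - means[("comparison", "pre")]
did_cols = gap_post - gap_pre

its_inflation = treated_change / did_rows - 1  # relative overstatement by ITS
print(f"{did_rows:.2f} {did_cols:.2f} {its_inflation:.0%}")  # prints: 25.32 25.32 43%
```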
Why so many ways to compute the same number?
The point-estimate column above is essentially identical across five Stata commands (25.31–25.33). So why include all of them? The differences live in the standard-error column:
- reg with robust SE: 0.615. Heteroskedasticity-robust, but no clustering.
- didregress (Stata 17+): 0.834. Auto-clusters at the school level, giving a wider CI that is more honest under within-school error correlation.
- xtreg, fe with cluster: 0.585. The smallest SE, because only within-school variation is used.
- reghdfe: 0.585. Identical to xtreg, but scales to many fixed effects.
- reghdfe + female_share: 0.605. Adding an unrelated covariate doesn't change the estimate but mildly inflates the SE.
The lesson from the post: research design (DiD + parallel trends) drives the answer; SE choice fine-tunes inference but does not move the point estimate.
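This sketch shows how the SE choice changes inference without moving the estimate by turning the standard errors listed above into 95% confidence intervals. It makes two simplifying assumptions:

- every method shares a point estimate of 25.32 (the post reports 25.31–25.33);
- the critical value is the normal one.

Stata may use t critical values instead (for clustered SEs, based on the number of clusters), so its reported intervals can be slightly wider.

```python
# Standard errors as listed above; the common point estimate is an
# illustrative assumption (the post reports 25.31-25.33 across methods).
ATT = 25.32
SES = {
    "reg, robust": 0.615,
    "didregress": 0.834,
    "xtreg, fe cluster": 0.585,
    "reghdfe": 0.585,
    "reghdfe + female_share": 0.605,
}
Z_95 = 1.959964  # two-sided 95% normal critical value (approximation)

def normal_ci(estimate, se, z=Z_95):
    """Symmetric normal-approximation confidence interval."""
    return estimate - z * se, estimate + z * se

for method, se in SES.items():
    lo, hi = normal_ci(ATT, se)
    print(f"{method:<24} SE={se:.3f}  95% CI=({lo:.2f}, {hi:.2f})  width={hi - lo:.2f}")
```

Every interval is centred on the same number. Only the width moves, and didregress's school-level clustering gives the widest interval.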