DiD in Stata — Interactive Lab

A pedagogical companion to Introduction to Difference-in-Differences (DiD) in Stata.

Why DiD? Why not just compare before to after?

A government implements an after-school tutoring program in 10 of 35 high schools. Did the program improve student GPA? A naive before-after look at the treated schools shows a jump of 36.20 GPA points — but that estimate absorbs whatever time-trend was already lifting every school in the district. The 25 untreated comparison schools rose 10.88 points on their own. The Difference-in-Differences estimator subtracts that drift first, recovering an ATT of 25.32 points — about 30% smaller than the naive number.

This app lets you turn the dials yourself. In the remaining three tabs you will (1) slide a parallel-trends violation knob and watch the DiD estimate move; (2) compare DiD against the naive ITS across 100 simulated panels; and (3) explore the post's five-estimator agreement on the real Corral-Yang (2024) case study, where all five estimators converge on the same 25.32-point ATT.

Parallel trends — the assumption that does all the work

DiD is identified by one assumption: in the absence of treatment, the treated group's GPA would have moved in parallel with the comparison group's. The orange line is what we actually observe; the dashed grey line is the unobserved counterfactual. The vertical gap at t+1 is the DiD effect.

Adjust the violation slider in Tab 2 to see how the estimated DiD effect moves with the assumed deviation from parallel trends.
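The bias mechanics behind the slider can be sketched in a few lines. This is a minimal, noise-free sketch, not the app's code. The functions `two_by_two_did` and `simulate_group_means` and the `violation` parameter (extra drift in the treated group's untreated path) are hypothetical, and the starting levels are borrowed from Table 1. Under these assumptions the 2×2 estimate equals the true ATT plus the violation, one for one.

```python
def two_by_two_did(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """2x2 DiD: treated pre-to-post change minus comparison pre-to-post change."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)


def simulate_group_means(true_att, secular_trend, violation,
                         treat_pre=60.17, ctrl_pre=71.22):
    """Noise-free group means under a hypothetical data-generating process.

    Both groups share `secular_trend`; the treated group's untreated path
    drifts by an extra `violation` points (violation = 0 means parallel trends).
    """
    ctrl_post = ctrl_pre + secular_trend
    treat_counterfactual = treat_pre + secular_trend + violation
    treat_post = treat_counterfactual + true_att
    return treat_pre, treat_post, ctrl_pre, ctrl_post


for violation in (-5.0, 0.0, 5.0):
    means = simulate_group_means(true_att=25.32, secular_trend=10.88,
                                 violation=violation)
    estimate = two_by_two_did(*means)
    print(f"violation {violation:+.1f}: DiD = {estimate:.2f}, "
          f"bias = {estimate - 25.32:+.2f}")
```

A violation of +5 points pushes the estimate to about 30.32; a violation of −5 pulls it to about 20.32. DiD cannot tell a slope difference from a treatment effect.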

Tab 2

Parallel Trends Lab

Move the violation knob and the true treatment effect. Watch the counterfactual shift and the bias of the 2×2 DiD appear in real time.

Tab 3

DiD vs ITS Simulator

Two estimators on the same synthetic panel: DiD, and the naive ITS that ignores the comparison group. Run 100 simulations and watch ITS fold the secular trend into its effect estimate.

Tab 4

Five-Estimator Forest Plot

The post's headline finding: diff, reg, didregress, xtreg, and reghdfe all return essentially the same ATT (25.31 to 25.33). Only the SEs differ.

Glossary

2×2 DiD
Take the post-period treated-minus-control gap and subtract the pre-period treated-minus-control gap. Under parallel trends, this double difference is the ATT.
Parallel trends assumption
Absent treatment, the treated and control outcomes would have moved in lockstep. This is the single identifying assumption of DiD. Differences in starting levels are fine; differences in slopes break the design.
ATT
Average Treatment effect on the Treated: E[Y(1) − Y(0) | D=1], the causal effect for the 10 schools that received the program. It is not the ATE and does not extrapolate to untreated schools.
Counterfactual
The post-period outcome treated schools would have had without treatment. Never observed. DiD constructs it as treated pre-level + control's pre-to-post change: 60.17 + 10.88 = 71.05.
TWFE (two-way fixed effects)
Regression DiD: Y_it = γ_i + θ_t + β · txp_it + ε_it. The coefficient on the treatment-period interaction txp is the DiD ATT.
Event study
A dynamic specification with one coefficient per period relative to treatment. Leads test for parallel pre-trends; lags trace dynamic effects. The post uses the user-written eventdd command (available from SSC).
Interrupted Time Series (ITS)
A single-group before-after comparison with no control group, so it counts secular drift as treatment effect. In the post, ITS = 36.20 (vs DiD = 25.32) because it absorbs the 10.88-point district-wide trend.
Clustered standard errors
SEs that allow errors to be correlated within a school. The post's didregress clusters at the school level by default, raising the SE from 0.61 (reg with robust SEs) to 0.83. Different SE, same point estimate.
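The TWFE entry says the coefficient on txp is the DiD ATT. A minimal numpy sketch can check this, assuming a hypothetical balanced two-period panel of 35 schools (10 treated) with made-up school effects and noise. OLS on school dummies, a post dummy, and txp should return exactly the 2×2 difference of group means. The sketch illustrates the identity only; it does not reproduce the post's data or any Stata command.

```python
import numpy as np

rng = np.random.default_rng(0)
n_schools, n_treated = 35, 10
true_att, trend = 25.32, 10.88           # hypothetical DGP parameters

# Balanced long panel: every school observed once pre (post=0) and once post (post=1).
school = np.repeat(np.arange(n_schools), 2)
post = np.tile([0.0, 1.0], n_schools)
treated = (school < n_treated).astype(float)
txp = treated * post                      # treatment-period interaction

gamma = rng.normal(70.0, 8.0, n_schools)  # made-up school fixed effects
y = gamma[school] + trend * post + true_att * txp + rng.normal(0.0, 2.0, school.size)

# TWFE design: one dummy per school (gamma_i), a post dummy (theta_t), and txp.
school_dummies = (school[:, None] == np.arange(n_schools)).astype(float)
X = np.column_stack([school_dummies, post, txp])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_twfe = coef[-1]


def cell_mean(is_treated, is_post):
    """Mean outcome in one cell of the 2x2 table."""
    return y[(treated == is_treated) & (post == is_post)].mean()


did_means = ((cell_mean(1, 1) - cell_mean(1, 0))
             - (cell_mean(0, 1) - cell_mean(0, 0)))

print(f"TWFE beta on txp: {beta_twfe:.6f}")
print(f"2x2 DiD of means: {did_means:.6f}")
```

In a balanced two-period panel the two numbers agree to floating-point precision, because the school dummies absorb every level difference and leave only the within-school changes.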

The three takeaways this app reruns in code

  1. The naive ITS overstates the effect by 43%. ITS reports 36.20 GPA points; DiD reports 25.32. The 10.88-point gap is district-wide drift that ITS cannot subtract — the price of skipping a comparison group.
  2. Five Stata estimators converge on the same ATT. diff, reg, didregress, xtreg, and reghdfe all return 25.31–25.33. They differ only in how standard errors are computed (clustering and robust corrections).
  3. The event study is consistent with parallel trends. Pre-treatment leads are 0.34, −0.32, and 0.59, all statistically insignificant. Post-treatment lags range from 24.71 to 25.70 with no fade-out, suggesting a sustained effect.
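The "30% smaller" in the introduction and the "43%" in the first takeaway describe the same 10.88-point gap measured against different denominators. A quick check:

```python
its, did = 36.20, 25.32
drift = its - did                  # secular trend that ITS fails to subtract

overstatement = drift / did        # ITS relative to DiD (denominator 25.32)
reduction = drift / its            # DiD relative to ITS (denominator 36.20)

print(f"drift = {drift:.2f}")                            # drift = 10.88
print(f"ITS overstates DiD by {overstatement:.0%}")      # ITS overstates DiD by 43%
print(f"DiD is {reduction:.0%} smaller than ITS")        # DiD is 30% smaller than ITS
```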

DiD vs ITS — head-to-head on simulated panels

Two estimators on the same data. DiD uses the comparison group to subtract the secular trend. ITS (Interrupted Time Series) looks only at the treated group's before-after change, so its bias equals the secular trend. When that trend is positive, ITS systematically overestimates the program effect; when it is negative, ITS underestimates it. The post's data show this exactly: ITS = 36.20, DiD = 25.32, gap = 10.88 (the district-wide drift).

Simulation controls:

  • Schools (n): 10 treated + n−10 comparison schools per simulated panel.
  • True effect: the program effect in GPA points, matched to the post's 25.32.
  • Secular trend: district-wide drift that affects both groups equally. ITS absorbs it; DiD removes it.
  • Noise: school-level noise in GPA points.

For each estimator, DiD (2×2) and ITS (naive), the app reports the estimate α̂, the true ATT, and the bias (α̂ minus the true ATT).

Distribution across 100 simulated panels

Each draw re-randomises school-level noise (same DGP, same seed family). With a positive secular trend, ITS's histogram sits to the right of the truth, shifted by roughly the size of the trend, while DiD's histogram stays centred on it. This is the §3 message of the post: ITS confuses drift with effect; DiD separates them.
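The simulator's logic can be sketched as a small Monte Carlo. The parameters here are assumptions, not the app's settings: school starting levels drawn from N(70, 8²), independent pre and post noise with SD 5, and numpy's default generator. The function names are hypothetical. Averaged over 100 panels, DiD should land close to the true 25.32, while ITS should land roughly one secular trend (10.88) above it.

```python
import numpy as np

TRUE_ATT, TREND = 25.32, 10.88     # matched to the post's numbers


def simulate_panel(rng, n=35, n_treated=10, true_att=TRUE_ATT,
                   trend=TREND, noise_sd=5.0):
    """One hypothetical two-period panel of school mean GPAs."""
    treated = np.arange(n) < n_treated
    base = rng.normal(70.0, 8.0, n)                  # school starting levels
    pre = base + rng.normal(0.0, noise_sd, n)
    post = base + trend + true_att * treated + rng.normal(0.0, noise_sd, n)
    return treated, pre, post


def did_estimate(treated, pre, post):
    """Treated change minus comparison change."""
    change = post - pre
    return change[treated].mean() - change[~treated].mean()


def its_estimate(treated, pre, post):
    """Naive before-after on treated schools only (no comparison group)."""
    return (post - pre)[treated].mean()


rng = np.random.default_rng(2024)
panels = [simulate_panel(rng) for _ in range(100)]
did = np.array([did_estimate(*p) for p in panels])
its = np.array([its_estimate(*p) for p in panels])

print(f"DiD: mean {did.mean():.2f}, mean bias {did.mean() - TRUE_ATT:+.2f}")
print(f"ITS: mean {its.mean():.2f}, mean bias {its.mean() - TRUE_ATT:+.2f}")
```

Setting `trend=0.0` in `simulate_panel` should make the two histograms line up, which is the special case in which skipping the comparison group costs nothing in bias.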

The post's headline results, interactively

Every estimator the post computes, plotted on one axis. The diff, reg, didregress, xtreg, and reghdfe commands all converge on 25.31–25.33 GPA points. The naive ITS estimator (36.20) is included so you can see the 10.88-point inflation against the others. Toggle methods on/off; hover for SEs and CIs.


Event study — dynamic treatment effects from eventdd

The dynamic specification replaces the single txp interaction with one coefficient per period relative to treatment onset. The pre-treatment leads (event times −4 to −2) should be near zero if parallel trends hold. Event time −1 is the omitted reference period. The post-treatment lags (event times 0 to 3) trace the dynamic effect path.

Reading the plot: the pre-treatment coefficients (0.34, −0.32, 0.59) hover around zero and are statistically insignificant, which is consistent with parallel pre-trends. The post-treatment coefficients (25.03, 24.71, 24.77, 25.70) jump immediately and stay flat, with no fade-out and no ramp-up. The program delivered its full benefit from period 0 and sustained it.
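Here is a concrete version of the specification as a hypothetical numpy sketch on simulated data. It is not eventdd itself. The setup assumes 35 schools observed at event times −4 to 3, a common onset for the 10 treated schools, a sustained 25-point effect from period 0, a common upward trend, and event time −1 as the omitted reference. Regressing on school dummies, period dummies, and treated × event-time dummies should return leads near zero and lags near 25. The effect size, trend, and noise are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n_schools, n_treated = 35, 10
periods = np.arange(-4, 4)            # event time -4..3, common onset at 0
REF = -1                              # omitted reference period
EFFECT = 25.0                         # hypothetical sustained effect from t = 0

school = np.repeat(np.arange(n_schools), periods.size)
t = np.tile(periods, n_schools)
treated = school < n_treated

y = (rng.normal(70.0, 8.0, n_schools)[school]      # school fixed effects
     + 3.0 * (t - periods.min())                  # common secular trend
     + np.where(treated & (t >= 0), EFFECT, 0.0)  # treatment effect
     + rng.normal(0.0, 1.0, t.size))              # idiosyncratic noise

# Regressors: school dummies, period dummies (first period dropped because the
# school dummies already span the constant), and treated x 1{event time == k}
# for every k except the reference period.
school_d = (school[:, None] == np.arange(n_schools)).astype(float)
period_d = (t[:, None] == periods[1:]).astype(float)
event_ks = [int(k) for k in periods if k != REF]
event_d = np.column_stack([(treated & (t == k)).astype(float) for k in event_ks])

X = np.column_stack([school_d, period_d, event_d])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
event_coefs = dict(zip(event_ks, coef[-len(event_ks):]))

for k, b in event_coefs.items():
    print(f"event time {k:+d}: {b:6.2f}")
```

Each coefficient measures the treated-minus-comparison gap at event time k relative to the same gap at −1. This is why one period must be dropped: keeping all eight interactions would be collinear with the school dummies.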

The 2×2 means table (Table 1 of the post)

                          Pre      Post     Change
Comparison (25 schools)   71.22    82.10    +10.88
Treated (10 schools)      60.17    96.37    +36.20
DiD ATT                                     +25.32

The DiD estimator is the difference of the two row-wise changes: 36.20 − 10.88 = 25.32. Equivalently, it is the difference of the two column-wise gaps: (96.37 − 82.10) − (60.17 − 71.22) = 14.27 − (−11.05) = 25.32. Same number, two routes — that is the "double difference."
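You can check both routes directly from the table's cell means. The dictionary layout is just an illustrative choice. The last line also rebuilds the counterfactual described in the glossary, 60.17 + 10.88 = 71.05.

```python
# Group means from the 2x2 table (GPA points).
means = {
    ("comparison", "pre"): 71.22, ("comparison", "post"): 82.10,
    ("treated", "pre"): 60.17,    ("treated", "post"): 96.37,
}

# Route 1: difference of the row-wise (over-time) changes.
treated_change = means["treated", "post"] - means["treated", "pre"]
comparison_change = means["comparison", "post"] - means["comparison", "pre"]
att_rows = treated_change - comparison_change

# Route 2: difference of the column-wise (treated-minus-comparison) gaps.
gap_post = means["treated", "post"] - means["comparison", "post"]
gap_pre = means["treated", "pre"] - means["comparison", "pre"]
att_cols = gap_post - gap_pre

# Counterfactual: treated pre-level plus the comparison group's change.
counterfactual = means["treated", "pre"] + comparison_change

print(f"{att_rows:.2f} {att_cols:.2f}")   # 25.32 25.32
print(f"{counterfactual:.2f} {means['treated', 'post'] - counterfactual:.2f}")   # 71.05 25.32
```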

Why so many ways to compute the same number?

The point estimates above agree to within 0.02 GPA points (25.31 to 25.33) across five Stata commands. So why include all of them? The differences live in the standard-error column:

  • reg with robust SE: 0.615 — heteroskedasticity-robust, but no clustering.
  • didregress (Stata 17+): 0.834 — auto-clusters at school level. Wider CI but more honest under within-school error correlation.
  • xtreg, fe with cluster: 0.585 — the smallest SE because within-school variation is the only variation used.
  • reghdfe: 0.585 — identical to xtreg but scales to many fixed effects.
  • reghdfe + female_share: 0.605 — adding an unrelated covariate doesn't change the estimate but mildly inflates the SE.
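The textbook sandwich formulas show how point estimates can stay identical while SEs change. This numpy sketch is not the code behind any of the five commands. It assumes Stata-style finite-sample factors: N/(N−K) for the robust HC1 variance, and G/(G−1)·(N−1)/(N−K) for the clustered one. It also uses a hypothetical panel in which each school gets its own post-period shock, so errors are correlated within school and do not cancel in the before-after difference. The coefficient on txp is the same under both variance estimators; only the SE changes. In this particular DGP, the clustered SE will typically be the larger one.

```python
import numpy as np


def ols_with_ses(X, y, cluster):
    """OLS coefficients plus HC1 (robust) and CR1 (cluster-robust) SEs."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta

    # HC1: sum of e_i^2 x_i x_i' over observations, scaled by n/(n-k).
    meat_hc = (X * resid[:, None] ** 2).T @ X
    v_hc1 = XtX_inv @ meat_hc @ XtX_inv * n / (n - k)

    # CR1: add up scores within each cluster first, then form the meat.
    groups = np.unique(cluster)
    g = groups.size
    meat_cl = np.zeros((k, k))
    for c in groups:
        in_c = cluster == c
        score = X[in_c].T @ resid[in_c]
        meat_cl += np.outer(score, score)
    v_cr1 = XtX_inv @ meat_cl @ XtX_inv * (g / (g - 1)) * ((n - 1) / (n - k))

    return beta, np.sqrt(np.diag(v_hc1)), np.sqrt(np.diag(v_cr1))


# Hypothetical panel: 35 schools x 8 periods (4 pre, 4 post), 10 treated.
rng = np.random.default_rng(1)
n_schools, n_treated, n_periods = 35, 10, 8
school = np.repeat(np.arange(n_schools), n_periods)
period = np.tile(np.arange(n_periods), n_schools)
treated = (school < n_treated).astype(float)
post = (period >= 4).astype(float)
txp = treated * post

# Each school gets its own post-period shift, so errors are correlated
# within school and do not cancel in the before-after difference.
school_post_shock = rng.normal(0.0, 6.0, n_schools)[school] * post
y = 70.0 + 10.88 * post + 25.32 * txp + school_post_shock + rng.normal(0.0, 1.0, school.size)

X = np.column_stack([np.ones_like(y), treated, post, txp])
beta, se_robust, se_cluster = ols_with_ses(X, y, school)
print(f"coefficient on txp: {beta[-1]:.2f}")
print(f"robust (HC1) SE:    {se_robust[-1]:.3f}")
print(f"cluster (CR1) SE:   {se_cluster[-1]:.3f}")
```

The variance choice only changes how residuals enter the "meat" of the sandwich. `beta` is computed once and never touched again, which is the same reason the five Stata commands share a point estimate.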

The lesson from the post: research design (DiD + parallel trends) drives the answer; SE choice fine-tunes inference but does not move the point estimate.