Staggered Synthetic Difference-in-Differences

Do gender quotas raise women in parliament? Nine countries, seven clocks

Between 2000 and 2013, nine countries adopted parliamentary gender quotas, in seven different adoption years. This is a staggered adoption design, where the textbook two-way fixed-effects regression quietly breaks. Staggered SDID fixes it by building a separate synthetic control for each adoption cohort and aggregating the cohort estimates. Use the tabs to take the analysis apart.

Overall ATT: +8.0 percentage points

Adopting countries / adoption cohorts: 9 / 7

Range of cohort effects (heterogeneity): −3.5 to +21.8

Average treatment effect on the treated (ATT), in percentage points of the share of women in the national parliament; 119 countries, 1990–2015.

Block design, staggered design, and how to aggregate

The single-adoption "block" design (all treated units adopt in the same year; in the simplest case, one unit in one year) is the textbook SDID setting. Real policies usually arrive on different clocks. Staggered SDID runs the block analysis once per cohort, always against the never-treated controls, then averages the cohort effects with transparent, non-negative weights.
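For readers who want the block step in symbols, here is a sketch in the standard SDID notation. The cohort index a and the superscripts are labels added here for illustration, not notation taken from this post. For cohort a, the sums run over that cohort's countries plus the never-treated controls, W_it is the treatment indicator, and the unit weights ω and time weights λ are estimated beforehand:

```latex
\left(\hat{\tau}_a,\ \hat{\mu},\ \hat{\alpha},\ \hat{\beta}\right)
  = \arg\min_{\tau,\,\mu,\,\alpha,\,\beta}
    \sum_{i}\sum_{t}
    \left(Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\,\tau\right)^{2}
    \hat{\omega}^{(a)}_{i}\,\hat{\lambda}^{(a)}_{t}
```

Setting every ω and λ to a constant recovers ordinary difference-in-differences, which is why SDID can be read as a reweighted two-way fixed-effects regression run on a clean sample.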

Block design

Predecessor: California, Prop 99

One treated unit, one adoption year (1989). A single synthetic control, a single ATT, placebo inference.

Staggered design

This post: 9 countries, 7 cohorts

Each cohort gets its own synthetic control built from the 110 never-treated countries — never reusing an already-treated unit as a control.

Aggregation

Weights: treated unit-years

The overall ATT is the average of the cohort effects, weighted by each cohort's share of treated country-years. Earlier, longer-exposed cohorts count more, as do cohorts with more countries.
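The weighting rule can be sketched in a few lines of Python. The cohort sizes and the 2015 end year come from this post; the function names are hypothetical, and the code assumes the adoption year itself counts as a treated year (event time 0) and that the panel is balanced through 2015.

```python
# Adoption year -> number of adopting countries (from the glossary on this page).
COHORT_SIZES = {2000: 1, 2002: 2, 2003: 2, 2005: 1, 2010: 1, 2012: 1, 2013: 1}
LAST_YEAR = 2015  # final year of the panel


def treated_country_years(cohort_sizes, last_year):
    """Count treated country-years per cohort.

    Assumption: the adoption year itself is treated, so a cohort adopting in
    year g contributes (last_year - g + 1) treated years per country.
    """
    return {g: n * (last_year - g + 1) for g, n in cohort_sizes.items()}


def aggregation_weights(cohort_sizes, last_year):
    """Each cohort's share of all treated country-years (non-negative, sums to 1)."""
    counts = treated_country_years(cohort_sizes, last_year)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def aggregate_att(cohort_effects, weights):
    """Weighted average of cohort effects; needs one effect per weighted cohort."""
    return sum(weights[g] * cohort_effects[g] for g in weights)


if __name__ == "__main__":
    for g, w in aggregation_weights(COHORT_SIZES, LAST_YEAR).items():
        print(f"cohort {g}: weight {w:.3f}")
```

Under these assumptions there are 94 treated country-years in total, and the two-country 2002 cohort alone carries 28 of them (about 30% of the weight), while the 2013 cohort carries 3.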

Tab 2

Cohort effects

See all seven cohort effects with their confidence intervals against the aggregate ATT — and the staggered adoption timeline that produces them.

Tab 3

Weights & counterfactual

Pick a cohort and watch its synthetic control redraw — the treated path against its anchored counterfactual, plus the donor countries that build it.

Tab 4

Event study

Read the dynamic effect by event time: flat placebo coefficients before adoption, a sustained rise after — the shape behind the single number.

Why an average can mislead

The headline ATT is +8.0 points, but the cohort effects run from −3.5 (the 2005 cohort) to +21.8 (the 2012 cohort). A single number is an honest summary only if you remember the spread underneath it. And the naive two-way fixed-effects regression most analysts reach for first does worse than summarize — under staggered timing it can use already-treated countries as controls and return a contaminated, even sign-flipped, estimate.

Glossary — open a card if a term is unfamiliar

Staggered adoption

Units adopt treatment in different years (here, quota cohorts in 2000, 2002, 2003, 2005, 2010, 2012, 2013) rather than all at once.

ATT

Average treatment effect on the treated — the effect of quotas in the countries that adopted them, averaged over their post-adoption years.

Adoption cohort

The countries that first adopt in the same year. Staggered SDID estimates one effect per cohort, then aggregates. The 2002 and 2003 cohorts have two countries each; the rest one.

Unit weights (ω)

How much each never-treated donor country counts toward a cohort's synthetic control. Each cohort gets its own ω.

Time weights (λ)

How much each pre-adoption year counts toward the pre-period baseline, which is also the baseline the event study measures against.

Event time

Years relative to a cohort's own adoption (… −2, −1, 0, +1 …), so cohorts that adopted in different calendar years can be compared.

Pre-trend placebo

Event-study coefficients for pre-adoption periods. Near zero ⇒ treated and synthetic moved in parallel before treatment — the identifying assumption made visible.

Forbidden comparison

When a regression uses an already-treated unit as a control for a later adopter. Naive TWFE does this under staggered timing; staggered SDID never does.
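Two of the glossary ideas, the clean cohort sample and event time, are mechanical enough to sketch in code. This is a minimal illustration, not the implementation behind this page: the function names and the country labels "A" to "E" are hypothetical, and None marks a never-treated country.

```python
def cohort_sample(adoption_year_by_country, cohort_year):
    """Split countries into one cohort's treated units and its allowed controls.

    Only never-treated countries (adoption year None) are kept as controls.
    Countries that adopt in any other year are dropped entirely, so no
    already-treated unit can enter the comparison.
    """
    treated = sorted(
        c for c, g in adoption_year_by_country.items() if g == cohort_year
    )
    controls = sorted(
        c for c, g in adoption_year_by_country.items() if g is None
    )
    return treated, controls


def event_time(calendar_year, adoption_year):
    """Years relative to the cohort's own adoption (0 = adoption year)."""
    return calendar_year - adoption_year


if __name__ == "__main__":
    # Hypothetical toy panel: two adopters in 2002, one in 2005, two never-treated.
    adoption = {"A": 2002, "B": 2002, "C": 2005, "D": None, "E": None}
    print(cohort_sample(adoption, 2005))  # "A" and "B" are excluded, not reused
    print(event_time(2001, 2002), event_time(2004, 2002))
```

Dropping the other adopters rather than treating them as "not yet treated" controls matches the rule stated on this page that each cohort is compared only with never-treated countries.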

Cohort effects — one SDID per adoption year

Each adoption cohort gets its own clean SDID estimate. The diamonds are the cohort effects with 95% confidence intervals; the teal line is the treated-period-weighted aggregate ATT. Hover a cohort to read its effect, precision, and aggregation weight.

Hover readout for the selected diamond: cohort, cohort effect (pp), post-adoption years (exposure window), and aggregation weight (share of treated country-years).

The staggered adoption timeline

Steel = pre-adoption (control) years; orange = treated years through 2015. The staircase is the staggered design.

What to look for

The aggregate is not the simple average. The teal line (8.0) is weighted by treated country-years, so the earlier, longer-exposed 2000/2002/2003 cohorts pull it up; the plain mean of the seven effects is about 7.0.
Precision varies wildly. Most cohorts are tightly estimated, but the 2003 cohort's interval runs from −4 to +32 (SE 9.1) — a fragile synthetic control built from few controls.
Two cohorts are negative. The 2005 and 2013 cohorts sit below zero. Heterogeneity, not a universal effect, is the real finding.
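As a quick check on the 2003 interval, assume the intervals are the usual normal-approximation form, estimate ± 1.96 × SE (the page does not say how they are built). The point estimate of about 14 below is inferred from the interval's midpoint and is not a figure reported here:

```latex
1.96 \times 9.1 \approx 17.8,
\qquad
\left[\,14 - 17.8,\ 14 + 17.8\,\right] \approx \left[\,-3.8,\ 31.8\,\right]
```

That reproduces the stated −4 to +32 range, so the wide interval is fully accounted for by the large standard error.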

Weights & counterfactual — pick a cohort, build its synthetic twin

For each cohort, SDID blends never-treated donor countries into a synthetic control that tracks the cohort's pre-adoption trend (the level gap is absorbed by the unit fixed effect, so we anchor the synthetic to the treated path for display). Choose a cohort and watch its counterfactual and donor mix redraw.

Readout for the selected cohort (default: 2000): cohort effect (pp), shown as the post-adoption divergence, and the number of donor countries with weight, out of 110 never-treated.

Top donor countries (ω)

The never-treated countries whose weighted blend forms this cohort's synthetic control. Top 12 shown.

How to read the counterfactual

Before adoption, the treated cohort (orange) and its anchored synthetic control (steel) move together — SDID built the synthetic to match the pre-trend. After the vertical adoption line they separate: the gap is the cohort's effect. A wide, diffuse donor mix (many small ω) is more robust than leaning on one or two countries.

SDID matches the pre-period trend, not the level — the unit fixed effect absorbs any constant gap, which is why we anchor the synthetic to the treated path here.
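The blend-then-anchor step can be sketched as follows. The page does not say exactly how the display anchoring is done, so this version assumes a simple shift by the unweighted mean pre-period gap; a λ-weighted gap would be an equally plausible choice. All names and numbers are hypothetical toy values.

```python
def synthetic_path(donor_paths, omega):
    """Weighted blend of donor outcome paths: sum_i omega_i * Y_it."""
    n_periods = len(next(iter(donor_paths.values())))
    return [
        sum(omega[name] * donor_paths[name][t] for name in omega)
        for t in range(n_periods)
    ]


def anchor_to_treated(treated, synthetic, n_pre):
    """Shift the synthetic path by the mean pre-period gap (display only).

    Assumption: an unweighted mean over the first n_pre periods. The shift
    changes the level of the synthetic path, never its shape.
    """
    gap = sum(treated[t] - synthetic[t] for t in range(n_pre)) / n_pre
    return [s + gap for s in synthetic]


if __name__ == "__main__":
    # Toy data: 3 pre-adoption periods followed by 2 post-adoption periods.
    treated = [10.0, 11.0, 12.0, 16.0, 18.0]
    donors = {"donor_1": [4.0, 5.0, 6.0, 7.0, 8.0],
              "donor_2": [6.0, 7.0, 8.0, 9.0, 10.0]}
    omega = {"donor_1": 0.5, "donor_2": 0.5}

    synth = synthetic_path(donors, omega)
    anchored = anchor_to_treated(treated, synth, n_pre=3)
    post_gaps = [y - s for y, s in zip(treated[3:], anchored[3:])]
    print(anchored, post_gaps)
```

In the toy data the synthetic path sits a constant 5 points below the treated path before adoption, so anchoring removes the level gap and leaves only the post-adoption divergence visible.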

What to look for

Pre-adoption overlap is the credibility test. If the two lines hug each other before the adoption year, the synthetic control is doing its job.
The post-adoption gap is the effect. Switch cohorts and watch the divergence flip sign for 2005 and 2013 — the negative cohorts.
Diffuse weights beat sparse ones. The more donor countries share the weight, the less any single idiosyncratic country can distort the counterfactual.
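One way to put a number on "diffuse versus sparse" is the effective number of donors, the inverse of the sum of squared weights. This is a generic concentration summary brought in here for illustration; it is not a statistic reported on this page or by the SDID software.

```python
def effective_donors(omega):
    """Inverse Herfindahl index of the donor weights.

    Equals the donor count when weights are uniform and 1.0 when a single
    donor carries all the weight. Zero-weight donors do not contribute.
    """
    total = sum(omega)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    shares = [w / total for w in omega]  # normalise defensively
    return 1.0 / sum(s * s for s in shares)


if __name__ == "__main__":
    print(effective_donors([0.25, 0.25, 0.25, 0.25]))  # diffuse mix
    print(effective_donors([0.9, 0.1, 0.0, 0.0]))      # sparse mix
```

A cohort whose effective donor count is close to 1 is leaning on a single country, which is the situation the bullet above warns about.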

Event study — when does the effect appear?

The sdid_event command traces the effect by event time — years relative to adoption — for the 2002 cohort. Points left of zero are placebo tests of the parallel-trends assumption; points right of zero are the dynamic ATT. Hover any point for its estimate and confidence interval.

Readouts: largest pre-period |coef| (placebo, should be ≈ 0); effect at adoption (t = 0), the immediate jump; average post-period effect (dynamic ATT, 2002 cohort).

Three things to read in this plot

The baseline is λ-weighted. SDID measures everything against the optimally weighted pre-period average, not the single year before adoption, so the zero line is a weighted baseline.
Pre-period points are placebos. Here every pre-adoption coefficient sits within a whisker of zero, so we cannot reject parallel synthetic trends.
Post-period points are the dynamic ATT. The effect appears immediately at adoption, roughly doubles within a year, and persists above zero for over a decade.
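The λ-weighted baseline can be written out as a formula. This is a sketch of how such a coefficient is typically constructed, with notation introduced here for illustration; the exact computation in sdid_event should be checked against its own documentation. For event time ℓ, the coefficient is the treated-minus-synthetic gap at ℓ, less the λ-weighted average of the same gap over the pre-adoption years:

```latex
\hat{\delta}_{\ell}
  = \left(\bar{Y}^{\mathrm{tr}}_{\ell} - \sum_{i \in \mathrm{co}} \hat{\omega}_i\, Y_{i,\ell}\right)
  - \sum_{t \in \mathrm{pre}} \hat{\lambda}_t
    \left(\bar{Y}^{\mathrm{tr}}_{t} - \sum_{i \in \mathrm{co}} \hat{\omega}_i\, Y_{i,t}\right)
```

Because the λ weights sum to one, a constant level gap between the treated cohort and its synthetic control cancels out of every coefficient, and pre-period coefficients near zero mean the gap was stable before adoption.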

What to look for

Flat before, climbing after. The visual signature of a credible event study: placebos near zero, then a sustained post-adoption rise.
The confidence band widens with horizon. Late event-times rest on fewer comparisons, so the ribbon fans out — read the far-right points with caution.
The post points average to the cohort ATT. Aggregated by the same treated-period logic, they reproduce the 2002 cohort's ≈ +7 effect — the number the plot unpacks.