Do gender quotas raise women in parliament? Nine countries, seven clocks
Between 2000 and 2013, nine countries adopted parliamentary gender quotas in seven different adoption years. This is a staggered adoption design, the setting where the textbook two-way fixed-effects regression quietly breaks. Staggered synthetic difference-in-differences (SDID) addresses the problem by building a separate synthetic control for each adoption cohort and then aggregating the cohort estimates. Use the tabs to take the analysis apart.
Average treatment effect on the treated (ATT), in percentage points of women in the national parliament, 119 countries, 1990–2015.
Block design, staggered design, and how to aggregate
The single-adoption "block" design, in which every treated unit adopts in the same year, is the textbook SDID setting. Real policies usually arrive on different clocks. Staggered SDID runs the block analysis once per cohort, always against the never-treated controls, then averages the cohort effects with transparent, non-negative weights.
Block design
Predecessor: California, Prop 99
One treated unit, one adoption year (1989). A single synthetic control, a single ATT, placebo inference.
Staggered design
This post: 9 countries, 7 cohorts
Each cohort gets its own synthetic control built from the 110 never-treated countries — never reusing an already-treated unit as a control.
Aggregation
Weights: treated unit-years
The overall ATT is the average of the cohort effects, weighted by each cohort's share of treated country-years. Earlier, longer-exposed cohorts count more.
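The aggregation rule can be sketched in a few lines. This is a minimal illustration, assuming a country counts as treated from its adoption year through 2015; the cohort values are made up for the example and are not the post's estimates, and the names `cohorts`, `treated_country_years`, and `aggregate_att` are hypothetical.

```python
# Hypothetical cohort-level results: illustrative placeholders only,
# NOT the estimates reported in this post.
cohorts = [
    # (adoption_year, n_treated_countries, cohort_att)
    (2000, 2, 10.0),
    (2005, 1, -2.0),
    (2012, 1, 4.0),
]
LAST_YEAR = 2015  # final year of the panel, as stated in the post


def treated_country_years(adoption_year, n_countries, last_year=LAST_YEAR):
    # Assumption: treatment starts in the adoption year itself.
    return n_countries * (last_year - adoption_year + 1)


def aggregate_att(cohorts):
    """Average cohort ATTs, weighting each by its share of treated country-years."""
    exposure = [treated_country_years(year, n) for year, n, _ in cohorts]
    total = sum(exposure)
    weights = [e / total for e in exposure]
    att = sum(w * tau for w, (_, _, tau) in zip(weights, cohorts))
    return att, weights


att, weights = aggregate_att(cohorts)
simple_mean = sum(tau for _, _, tau in cohorts) / len(cohorts)
print(f"weighted ATT = {att:.2f}, simple mean = {simple_mean:.2f}")
```

In this toy setup the early two-country cohort holds 32 of the 47 treated country-years, so it dominates the weighted figure, which is why a weighted aggregate and a plain mean of cohort effects can differ.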
Cohort effects
See all seven cohort effects with their confidence intervals against the aggregate ATT — and the staggered adoption timeline that produces them.
Weights & counterfactual
Pick a cohort and watch its synthetic control redraw — the treated path against its anchored counterfactual, plus the donor countries that build it.
Event study
Read the dynamic effect by event time: flat placebo coefficients before adoption, a sustained rise after — the shape behind the single number.
Why an average can mislead
The headline ATT is +8.0 points, but the cohort effects run from −3.5 (the 2005 cohort) to +21.8 (the 2012 cohort). A single number is an honest summary only if you remember the spread underneath it. And the naive two-way fixed-effects regression most analysts reach for first does worse than summarize — under staggered timing it can use already-treated countries as controls and return a contaminated, even sign-flipped, estimate.
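To see how an already-treated control can flip a sign, here is a toy two-by-two comparison. It is a sketch under invented assumptions (a flat baseline, an early adopter whose effect grows by one point per year, a late adopter with a constant +3 effect); it is a single difference-in-differences contrast, not a full two-way fixed-effects regression, and every name and number in it is hypothetical.

```python
BASELINE = 10.0  # hypothetical outcome level absent any quota


def outcome(year, adoption_year, effect):
    """Baseline plus the treatment effect once the quota is in force."""
    if adoption_year is None or year < adoption_year:
        return BASELINE
    return BASELINE + effect(year - adoption_year)


def early_effect(years_since):   # effect keeps growing after adoption
    return 1.0 * (years_since + 1)


def late_effect(years_since):    # constant true effect of +3
    return 3.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


PRE, POST = range(2005, 2010), range(2010, 2015)


def did(treated, control):
    """(post - pre) change for the treated unit minus the same change for the control."""
    def change(unit):
        return mean(unit(y) for y in POST) - mean(unit(y) for y in PRE)
    return change(treated) - change(control)


late = lambda y: outcome(y, 2010, late_effect)
early = lambda y: outcome(y, 2000, early_effect)
never = lambda y: outcome(y, None, None)

did_clean = did(late, never)      # never-treated control
did_forbidden = did(late, early)  # already-treated control
print(did_clean, did_forbidden)
```

With the never-treated control the contrast recovers the late adopter's +3. With the early adopter as the control, the early adopter's still-growing effect is subtracted as if it were a trend, and the same contrast comes out negative.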
Glossary — open a card if a term is unfamiliar
Staggered adoption
ATT
Adoption cohort
Unit weights (ω)
Time weights (λ)
Event time
Pre-trend placebo
Forbidden comparison
Cohort effects — one SDID per adoption year
Each adoption cohort gets its own clean SDID estimate. The diamonds are the cohort effects with 95% confidence intervals; the teal line is the treated-period-weighted aggregate ATT. Hover a cohort to read its effect, precision, and aggregation weight.
The staggered adoption timeline
Steel = pre-adoption (control) years; orange = treated years through 2015. The staircase is the staggered design.
What to look for
- The aggregate is not the simple average. The teal line (8.0) is weighted by treated country-years, so the earlier, longer-exposed 2000/2002/2003 cohorts pull it up; the plain mean of the seven effects is about 7.0.
- Precision varies wildly. Most cohorts are tightly estimated, but the 2003 cohort's interval runs from −4 to +32 (SE 9.1) — a fragile synthetic control built from few controls.
- Two cohorts are negative. The 2005 and 2013 cohorts sit below zero. Heterogeneity, not a universal effect, is the real finding.
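As a quick consistency check on the 2003 cohort, the quoted interval and standard error can be compared directly. This sketch assumes the interval is a symmetric normal-approximation interval (estimate ± z × SE), which the post does not state explicitly; the bounds are the rounded values quoted above.

```python
from statistics import NormalDist

lower, upper = -4.0, 32.0           # 2003 cohort interval, as quoted (rounded)
z = NormalDist().inv_cdf(0.975)     # two-sided 95% critical value, about 1.96

midpoint = (lower + upper) / 2      # implied point estimate under symmetry
implied_se = (upper - lower) / (2 * z)

print(f"implied estimate = {midpoint:.1f}, implied SE = {implied_se:.2f}")
```

Under that assumption the implied standard error is close to the quoted 9.1, with the small difference attributable to rounding of the interval bounds.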
Weights & counterfactual — pick a cohort, build its synthetic twin
For each cohort, SDID blends never-treated donor countries into a synthetic control that tracks the cohort's pre-adoption trend (the level gap is absorbed by the unit fixed effect, so we anchor the synthetic to the treated path for display). Choose a cohort and watch its counterfactual and donor mix redraw.
Top donor countries (ω)
The never-treated countries whose weighted blend forms this cohort's synthetic control. Top 12 shown.
How to read the counterfactual
Before adoption, the treated cohort (orange) and its anchored synthetic control (steel) move together — SDID built the synthetic to match the pre-trend. After the vertical adoption line they separate: the gap is the cohort's effect. A wide, diffuse donor mix (many small ω) is more robust than leaning on one or two countries.
SDID matches the pre-period trend, not the level — the unit fixed effect absorbs any constant gap, which is why we anchor the synthetic to the treated path here.
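The anchoring step can be sketched as follows. This is an illustration only: it assumes the display shift is the simple mean pre-period gap (the post's chart may anchor differently, for example with λ weights), and the series, weights, and function names are hypothetical.

```python
def synthetic_path(donor_paths, omega):
    """Blend donor series with unit weights omega (assumed non-negative, summing to 1)."""
    n_periods = len(donor_paths[0])
    return [
        sum(w * path[t] for w, path in zip(omega, donor_paths))
        for t in range(n_periods)
    ]


def anchor_to_treated(treated, synthetic, n_pre):
    """Shift the synthetic path by the mean pre-period gap so the levels line up."""
    gap = sum(treated[t] - synthetic[t] for t in range(n_pre)) / n_pre
    return [s + gap for s in synthetic]


# Hypothetical data: 3 pre-adoption periods, 2 post-adoption periods.
treated = [20.0, 21.0, 22.0, 27.0, 28.0]
donors = [
    [10.0, 11.0, 12.0, 13.0, 14.0],
    [12.0, 13.0, 14.0, 15.0, 16.0],
]
omega = [0.5, 0.5]
N_PRE = 3

synthetic = synthetic_path(donors, omega)
anchored = anchor_to_treated(treated, synthetic, N_PRE)
post_gap = [y - a for y, a in zip(treated[N_PRE:], anchored[N_PRE:])]
print(anchored, post_gap)
```

The raw synthetic path sits nine points below the treated path but shares its pre-period slope; after the shift the two coincide before adoption, and the remaining post-adoption gap is what the chart displays as the effect.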
What to look for
- Pre-adoption overlap is the credibility test. If the two lines hug each other before the adoption year, the synthetic control is doing its job.
- The post-adoption gap is the effect. Switch cohorts and watch the divergence flip sign for 2005 and 2013 — the negative cohorts.
- Diffuse weights beat sparse ones. The more donor countries share the weight, the less any single idiosyncratic country can distort the counterfactual.
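One way to put a number on "diffuse versus sparse" is the effective number of donors, the inverse of the sum of squared weights. This measure is an addition for illustration, not something the post reports, and the weight vectors below are hypothetical.

```python
def effective_donors(omega):
    """Inverse Herfindahl index: n for n equal weights, near 1 when one donor dominates."""
    return 1.0 / sum(w * w for w in omega)


diffuse = [0.1] * 10    # ten donors sharing the weight equally
sparse = [0.9, 0.1]     # one donor carrying almost everything

print(round(effective_donors(diffuse), 2), round(effective_donors(sparse), 2))
```

Ten equal weights behave like ten donors, while the 0.9/0.1 split behaves like barely more than one, so a shock to that single country would pass almost directly into the counterfactual.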
Event study — when does the effect appear?
The sdid_event command traces the effect by event time (years relative to adoption) for the 2002 cohort. Points left of zero are placebo tests of the parallel-trends assumption; points right of zero are the dynamic ATT. Hover any point for its estimate and confidence interval.
Three things to read in this plot
- The baseline is λ-weighted. SDID measures everything against the optimally weighted pre-period average, not the single year before adoption, so the zero line is a weighted baseline.
- Pre-period points are placebos. Here every pre-adoption coefficient sits within a whisker of zero, so we cannot reject parallel synthetic trends.
- Post-period points are the dynamic ATT. The effect appears immediately at adoption, roughly doubles within a year, and persists above zero for over a decade.
What to look for
- Flat before, climbing after. The visual signature of a credible event study: placebos near zero, then a sustained post-adoption rise.
- The confidence band widens with horizon. Late event-times rest on fewer comparisons, so the ribbon fans out — read the far-right points with caution.
- The post points average to the cohort ATT. Aggregated by the same treated-period logic, they reproduce the 2002 cohort's ≈ +7 effect — the number the plot unpacks.
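The relationship between the event-time points and the cohort ATT can be sketched for a single cohort. This is a simplified illustration of the logic, not the sdid_event implementation: it assumes one treated series, a given synthetic series, given time weights λ, and event time 0 at the adoption period; all values and names are hypothetical.

```python
def event_study(treated, synthetic, lam, n_pre):
    """Gap in each period minus the lambda-weighted average pre-period gap."""
    gaps = [y - s for y, s in zip(treated, synthetic)]
    baseline = sum(l * g for l, g in zip(lam, gaps[:n_pre]))
    # Event time e = t - n_pre, so e = 0 is the adoption period (assumed indexing).
    return {t - n_pre: gaps[t] - baseline for t in range(len(gaps))}


# Hypothetical data: 3 pre-adoption periods, 3 post-adoption periods.
treated = [20.0, 21.0, 22.0, 26.0, 30.0, 31.0]
synthetic = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
lam = [0.2, 0.3, 0.5]   # time weights on pre-periods, summing to 1
N_PRE = 3

effects = event_study(treated, synthetic, lam, N_PRE)
post = [v for e, v in effects.items() if e >= 0]
cohort_att = sum(post) / len(post)
print(effects, cohort_att)
```

In this toy series the pre-period points sit at zero, the effect starts at 3 and doubles to 6 a period later, and the mean of the post-period points equals the cohort ATT computed as the average post-period gap minus the λ-weighted pre-period gap.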