Data dictionary · Staggered Synthetic Difference-in-Differences: Gender Quotas and Women in Parliament

Downloads

Each dataset is available as a labeled Stata .dta and its source file.

Download all data (ZIP) · stata_codebook.do

Dataset	Grain	Rows	Stata	Source
`quota_example`	country-year	3,094 × 7	quota_example.dta	quota_example.dta

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_sdid_staggered/data/"
use "${BASE}quota_example.dta", clear
describe
notes

Python

!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_sdid_staggered/data/"
df = pd.read_stata(BASE + "quota_example.dta")

# load every dataset at once
files = ["quota_example"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "quota_example.dta", "quota_example.dta")
df, meta = pyreadstat.read_dta("quota_example.dta")

Copy and paste this snippet into a Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_sdid_staggered/data/"
df <- read_dta(paste0(BASE, "quota_example.dta"))

Overview & sources

Companion data for a Stata tutorial that extends synthetic difference-in-differences (SDID) to staggered adoption, where units adopt treatment at different times. The single file is quota_example.dta, the balanced panel distributed with the sdid package (Bhalotra, Clarke, Gomes & Venkataramani, 2023): 119 countries observed annually from 1990 to 2015 (3,094 observations). The outcome is the share of seats held by women in the national parliament; the treatment is the adoption of a reserved-seat gender quota (absorbing — once adopted it stays on); the covariate is log GDP per capita. Treatment is staggered: 9 countries adopt a quota across 7 cohorts (2000, 2002, 2003, 2005, 2010, 2012, 2013) and 110 countries remain never-treated, forming the donor pool. The post estimates a separate, clean SDID per cohort against the never-treated controls, aggregates the cohort effects into an overall ATT of +8.0 percentage points, and complements it with the sdid_event event study and bootstrap, jackknife, and placebo inference.

One file, balanced panel. quota_example is an annual country panel (one row per country × year): 119 countries × 26 years = 3,094 rows, with no gaps in the outcome or treatment. Because country is a string, declare the panel in Stata by encoding it first (encode country, gen(cid)) and then running xtset cid year. The treatment quota is absorbing and switches on for only ~3% of country-years; quotaYear records each adopting country's cohort (missing for the 110 never-treated countries); lngdp has 104 missing values, which matter only when it is used as a covariate.
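The balance claim can be verified after loading df with the Python snippet above. The helper below is a minimal pandas sketch (is_balanced is a hypothetical name, not part of pandas or sdid): it treats the panel as balanced when no country-year pair repeats and the row count equals countries × years. In Stata, xtset reports "strongly balanced" for the same condition.

```python
import pandas as pd

def is_balanced(df: pd.DataFrame, unit: str = "country", time: str = "year") -> bool:
    """True if every unit is observed exactly once in every period (hypothetical helper)."""
    n_units = df[unit].nunique()
    n_periods = df[time].nunique()
    # No repeated (unit, time) pairs + exactly N*T rows => every pair is present once.
    no_duplicates = not df.duplicated([unit, time]).any()
    return no_duplicates and len(df) == n_units * n_periods

# Toy demo; on the real data call is_balanced(df) after loading quota_example.dta.
toy = pd.DataFrame({"country": ["A", "A", "B", "B"], "year": [2000, 2001, 2000, 2001]})
print(is_balanced(toy), is_balanced(toy.iloc[:3]))  # True False
```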

Data sources

Source	Provides	Reference / URL
quota_example (sdid package)	The analysis panel — women-in-parliament outcome, gender-quota treatment, log GDP, quota-adoption year	Bhalotra, S., Clarke, D., Gomes, J. F., & Venkataramani, A. (2023). Maternal Mortality and Women's Political Power. Journal of the European Economic Association. https://doi.org/10.1093/jeea/jvad043
sdid (Stata package)	The estimator and the distributed example dataset (webuse quota_example)	Clarke, D., Pailañir, D., Athey, S., & Imbens, G. (2024). On Synthetic Difference-in-Differences and Related Estimation Methods in Stata. The Stata Journal, 24(4). ssc install sdid.
Method references	Estimators and concepts	Arkhangelsky, Athey, Hirshberg, Imbens & Wager (2021) — SDID; Goodman-Bacon (2021); de Chaisemartin & D'Haultfœuille (2020); Ciccia, Clarke & Pailañir (2024) — sdid_event.

Cite this data

Please cite this dataset as follows.

APA

Mendez, C. (2026). Staggered Synthetic Difference-in-Differences (SDID) in Stata: Gender Quotas and Women in Parliament [Data set]. https://carlos-mendez.org/post/stata_sdid_staggered/

Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic Difference-in-Differences. American Economic Review, 111(12), 4088–4118. https://doi.org/10.1257/aer.20190159  ·  Clarke, D., Pailañir, D., Athey, S., & Imbens, G. (2024). On Synthetic Difference-in-Differences and Related Estimation Methods in Stata. The Stata Journal, 24(4). https://doi.org/10.1177/1536867X241297184  ·  Bhalotra, S., Clarke, D., Gomes, J. F., & Venkataramani, A. (2023). Maternal Mortality and Women's Political Power. Journal of the European Economic Association. https://doi.org/10.1093/jeea/jvad043 (source of the quota_example data).

BibTeX

@misc{mendez2026statasdidstaggered,
  author       = {Mendez, Carlos},
  title        = {Staggered Synthetic Difference-in-Differences (SDID) in Stata: Gender Quotas and Women in Parliament},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/stata_sdid_staggered/}},
  note         = {Data set}
}

@article{arkhangelsky2021sdid,
  author  = {Arkhangelsky, Dmitry and Athey, Susan and Hirshberg, David A. and Imbens, Guido W. and Wager, Stefan},
  title   = {Synthetic Difference-in-Differences},
  journal = {American Economic Review},
  volume  = {111}, number = {12}, pages = {4088--4118}, year = {2021},
  doi     = {10.1257/aer.20190159}
}
@article{clarke2024sdid,
  author  = {Clarke, Damian and Paila{\~n}ir, Daniel and Athey, Susan and Imbens, Guido},
  title   = {On Synthetic Difference-in-Differences and Related Estimation Methods in Stata},
  journal = {The Stata Journal},
  volume  = {24}, number = {4}, year = {2024},
  doi     = {10.1177/1536867X241297184}
}
@article{bhalotra2023maternal,
  author  = {Bhalotra, Sonia and Clarke, Damian and Gomes, Joseph F. and Venkataramani, Atheendar},
  title   = {Maternal Mortality and Women's Political Power},
  journal = {Journal of the European Economic Association},
  year    = {2023},
  doi     = {10.1093/jeea/jvad043}
}

Variable explorer (all 7 variables)

Variable	Type	Label	Definition	Units	In files	Source
`country`	identifier	Country	Country name — the panel unit (i).	string	quota_example	quota_example (Bhalotra et al. 2023)
`lngdp`	continuous	Log GDP per capita	Natural log of GDP per capita — the covariate (X).	log GDP	quota_example	quota_example (Bhalotra et al. 2023)
`lnmmrt`	continuous	Maternal mortality	Natural log of the maternal mortality ratio (ships with the dataset; not used in the post's quota analysis).	log ratio	quota_example	quota_example (Bhalotra et al. 2023)
`quota`	dummy	Parliamentary gender quota (=1)	Treatment indicator: 1 once a country has a reserved-seat gender quota, 0 before / never.	0/1	quota_example	quota_example (Bhalotra et al. 2023)
`quotaYear`	year	Year quota adopted (cohort)	First year a country is treated — its adoption cohort; missing for the 110 never-treated countries.	year	quota_example	quota_example (Bhalotra et al. 2023)
`womparl`	continuous	Women in parliament	Percentage of seats held by women in the national (lower) parliament — the outcome.	% of seats	quota_example	quota_example (Bhalotra et al. 2023)
`year`	year	Year	Calendar year — the panel time index (t).	year	quota_example	quota_example (Bhalotra et al. 2023)

Cross-file variable index

Which file each variable appears in (● = present).

Variable	quota_example
`country`	●
`lngdp`	●
`lnmmrt`	●
`quota`	●
`quotaYear`	●
`womparl`	●
`year`	●

Construction & formulas

The estimand is the average treatment effect on the treated (ATT) — the effect of adopting a quota on the women-in-parliament share, in the countries that adopted one, averaged over their post-adoption years:

τ = [1 / (N_tr · T_post)] · Σ_(i: W_i=1) Σ_(t>T_pre) [ Y_it(1) − Y_it(0) ], where N_tr is the number of treated countries and T_post the number of post-adoption years.

SDID (Arkhangelsky et al., 2021) is a weighted two-way fixed-effects regression that chooses the ATT plus a constant, unit fixed effects, and time fixed effects to minimize a weighted sum of squared residuals, weighting each observation by a unit weight ω_i times a time weight λ_t:

Objective: min Σ_i Σ_t (Y_it − μ − α_i − β_t − W_it·τ)² · ω_i · λ_t.
Unit weights ω: chosen (with an intercept and a ridge penalty) so a non-negative blend of control countries tracks the treated cohort's pre-period trend; the level gap is absorbed by the unit fixed effect.
Time weights λ: chosen so the weighted pre-period years best predict each control's post-period average; pre-period years that most resemble the post-period (often the most recent ones) tend to receive more weight.
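Once ω and λ are fixed, the weighted two-way fixed-effects objective has a closed form: τ is a weighted double difference (treated post-minus-pre change minus the ω-weighted control change), with treated units and post-period years weighted uniformly. The NumPy sketch below checks that equivalence numerically. It assumes the weights are already given; computing ω and λ themselves (the regularized, intercept-adjusted least-squares problems of Arkhangelsky et al., 2021) is not shown, so random positive weights stand in for them, and both function names are hypothetical.

```python
import numpy as np

def sdid_tau_double_difference(Y, treated, T_pre, omega, lam):
    """Weighted double difference for given SDID weights (hypothetical helper).

    Y: (N, T) outcomes, rows = units, columns = periods with pre periods first.
    treated: (N,) boolean mask; omega: control weights summing to 1;
    lam: pre-period weights summing to 1. Treated units and post periods are uniform.
    """
    Y_tr, Y_co = Y[treated], Y[~treated]
    tr_change = Y_tr[:, T_pre:].mean() - Y_tr[:, :T_pre].mean(axis=0) @ lam
    co_change = omega @ Y_co[:, T_pre:].mean(axis=1) - omega @ Y_co[:, :T_pre] @ lam
    return tr_change - co_change

def sdid_tau_wls(Y, treated, T_pre, omega, lam):
    """Minimize sum (Y_it - alpha_i - beta_t - W_it*tau)^2 * w_i * l_t by weighted least squares."""
    N, T = Y.shape
    w_unit = np.empty(N)
    w_unit[treated] = 1.0 / treated.sum()
    w_unit[~treated] = omega
    w_time = np.concatenate([lam, np.full(T - T_pre, 1.0 / (T - T_pre))])
    X, y, w = [], [], []
    for i in range(N):
        for t in range(T):
            row = np.zeros(N + T)        # N unit effects, T-1 time effects, tau
            row[i] = 1.0                 # alpha_i (absorbs the constant mu)
            if t > 0:
                row[N + t - 1] = 1.0     # beta_t; period 0 is the baseline
            row[-1] = float(treated[i] and t >= T_pre)  # W_it
            X.append(row)
            y.append(Y[i, t])
            w.append(w_unit[i] * w_time[t])
    sw = np.sqrt(np.array(w))
    coef, *_ = np.linalg.lstsq(np.array(X) * sw[:, None], np.array(y) * sw, rcond=None)
    return coef[-1]

rng = np.random.default_rng(0)
N, T, T_pre = 6, 5, 3
treated = np.array([True, True, False, False, False, False])
Y = rng.normal(size=(N, T))
omega = rng.uniform(0.1, 1.0, size=(~treated).sum())
omega /= omega.sum()
lam = rng.uniform(0.1, 1.0, size=T_pre)
lam /= lam.sum()
tau_dd = sdid_tau_double_difference(Y, treated, T_pre, omega, lam)
tau_wls = sdid_tau_wls(Y, treated, T_pre, omega, lam)
print(np.isclose(tau_dd, tau_wls))  # True
```

The explicit regression is only a check; the double-difference form is the cheaper way to evaluate τ once the weights are known.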

Staggered extension. Run single-cohort SDID once per adoption cohort a (using only that cohort's treated units plus the never-treated controls) to obtain τ_a, then aggregate with non-negative weights equal to each cohort's share of treated country-years: ATT = Σ_a [ N_tr^a · T_post^a / Σ_b (N_tr^b · T_post^b) ] · τ_a. Because each cohort is compared only with never-treated controls, an already-treated unit is never used as a control for a later adopter, which avoids the contamination that breaks naive TWFE under staggered timing.
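The aggregation weights depend only on cohort sizes and the panel's last year, so they can be computed directly from the cohort list in the variable dictionary. A minimal sketch with hypothetical helper names, assuming T_post^a counts the adoption year itself (quota switches on in that year) through 2015; the per-cohort estimates τ_a from the post are not reproduced here.

```python
def cohort_weights(cohort_sizes: dict[int, int], last_year: int) -> dict[int, float]:
    """Weight of cohort a = N_tr^a * T_post^a / sum_b (N_tr^b * T_post^b)."""
    # Treated country-years per cohort; the adoption year is the first treated year.
    cells = {a: n * (last_year - a + 1) for a, n in cohort_sizes.items()}
    total = sum(cells.values())
    return {a: c / total for a, c in cells.items()}

def aggregate_att(tau_by_cohort: dict[int, float], weights: dict[int, float]) -> float:
    """Weighted average of per-cohort SDID estimates tau_a."""
    return sum(weights[a] * tau for a, tau in tau_by_cohort.items())

# Cohort sizes from the variable dictionary (two countries in 2002 and 2003, one elsewhere).
cohort_sizes = {2000: 1, 2002: 2, 2003: 2, 2005: 1, 2010: 1, 2012: 1, 2013: 1}
weights = cohort_weights(cohort_sizes, last_year=2015)
print({a: round(w, 4) for a, w in weights.items()})
# {2000: 0.1702, 2002: 0.2979, 2003: 0.2766, 2005: 0.117, 2010: 0.0638, 2012: 0.0426, 2013: 0.0319}
```

The denominator, 94 treated country-years, is also the number of rows with quota = 1: 94 / 3,094 ≈ 0.030, the mean of quota in the statistics table.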

The datasets


quota_example · country-year · 3,094 × 7 · 1990-2015 · 119 countries (9 ever-treated, 110 never-treated)

Panel key: country × year · Estimate the effect of gender quotas on women in parliament via staggered SDID.

Variable dictionary

Variable	Label	Definition	Construction	Units	Source	Coverage
`womparl` continuous	Women in parliament	Percentage of seats held by women in the national (lower) parliament — the outcome.	Distributed with the quota_example dataset; observed annually per country.	% of seats	quota_example (Bhalotra et al. 2023)	all 3,094 country-years
`lnmmrt` continuous	Maternal mortality	Natural log of the maternal mortality ratio (ships with the dataset; not used in the post's quota analysis).	Distributed with the quota_example dataset.	log ratio	quota_example (Bhalotra et al. 2023)	3,068 country-years (26 missing)
`country` identifier	Country	Country name — the panel unit (i).	119 countries; 9 ever adopt a quota, 110 never treated (the donor pool).	string	quota_example (Bhalotra et al. 2023)	119 countries
`year` year	Year	Calendar year — the panel time index (t).	Annual, 1990-2015 (26 years), balanced across all countries.	year	quota_example (Bhalotra et al. 2023)	1990-2015
`quota` dummy	Parliamentary gender quota (=1)	Treatment indicator: 1 once a country has a reserved-seat gender quota, 0 before / never.	Absorbing — switches to 1 in the adoption year and stays on; 1 for ~3% of country-years.	0/1	quota_example (Bhalotra et al. 2023)	all 3,094 country-years
`lngdp` continuous	Log GDP per capita	Natural log of GDP per capita — the covariate (X).	Distributed with the quota_example dataset; used in the optimized/projected covariate specifications.	log GDP	quota_example (Bhalotra et al. 2023)	2,990 country-years (104 missing)
`quotaYear` year	Year quota adopted (cohort)	First year a country is treated — its adoption cohort; missing for the 110 never-treated countries.	Cohorts: 2000, 2002, 2003, 2005, 2010, 2012, 2013 (two countries each in 2002 and 2003, one in the rest).	year	quota_example (Bhalotra et al. 2023)	234 treated country-years (9 countries); missing for 110 never-treated
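Because quota is absorbing and quotaYear is the first treated year, the treatment indicator can be rebuilt from the cohort variable and checked against the stored column. A minimal pandas sketch; quota_from_cohort and is_absorbing are hypothetical helpers, and the column names are assumed to match the dictionary above.

```python
import pandas as pd

def quota_from_cohort(df: pd.DataFrame, cohort: str = "quotaYear", time: str = "year") -> pd.Series:
    """Absorbing treatment: 1 from the adoption year onward, 0 before adoption or if never treated."""
    # Comparisons with a missing quotaYear (NaN) evaluate to False, so never-treated rows get 0.
    return (df[time] >= df[cohort]).astype(int)

def is_absorbing(df: pd.DataFrame, treat: str = "quota", unit: str = "country", time: str = "year") -> bool:
    """True if, within every unit, the treatment never switches back from 1 to 0."""
    ordered = df.sort_values([unit, time])
    return bool(ordered.groupby(unit)[treat].apply(lambda s: s.is_monotonic_increasing).all())

# Hypothetical usage on the loaded panel:
#   (quota_from_cohort(df) == df["quota"]).all()   # agreement with the stored indicator
#   is_absorbing(df)
```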

Distribution & statistics

Variable	Coverage	N	Distinct	Min	Mean	Median	Max	SD
`womparl`	100%	3,094	449	0	14.97	12.00	63.80	10.97
`lnmmrt`	99%	3,068	680	1.10	4.19	4.25	7.24	1.59
`country`	100%	3,094	119	—	—	—	—	—
`year`	100%	3,094	26	1990	2002.5	2002	2015	7.50
`quota`	100%	3,094	2	0	0.030	0	1.00	0.172
`lngdp`	97%	2,990	2,956	5.87	9.15	9.21	11.62	1.14
`quotaYear`	8%	234	7	2000	2005.6	2003	2013	4.56

Known limitations & caveats

Teaching subset. quota_example is the example dataset distributed with the sdid package, drawn from Bhalotra et al. (2023); the numbers illustrate the method, not a final verdict on quota policy.
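Effect concentration. Cohort effects are heterogeneous. The 2012 cohort's own estimate is +21.8 points, but with one country and four post-adoption years it receives only 4/94 ≈ 4% of the aggregation weight, while the early 2000/2002/2003 cohorts carry 70/94 ≈ 74% of it. The +8 aggregate ATT therefore rests mostly on the early cohorts; dropping 2012 lowers it to roughly 7.4 points.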
Fragile counterfactuals. With 110 controls and as few as one treated country per cohort, some synthetic controls are imprecise (the 2003 cohort's standard error of 9.13 is the tell).
Identifying assumptions. SDID requires no anticipation, an absorbing treatment, no cross-country spillovers, and that quota timing is not itself a response to the outcome's trajectory; the flat event-study placebos support, but cannot prove, the parallel-(synthetic-)trends assumption.
Missing covariate. lngdp has 104 missing country-years; SDID needs a balanced panel, so those rows are dropped before the covariate specifications and event study.
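Dropping individual rows with a missing covariate can unbalance the panel. A safer variant drops every country that has any missing value, which keeps the remaining panel balanced; if the 104 gaps cover whole countries (104 is exactly 4 × 26), both approaches give the same result. A minimal pandas sketch with a hypothetical helper name:

```python
import pandas as pd

def drop_units_with_missing(df: pd.DataFrame, cols: list[str], unit: str = "country") -> pd.DataFrame:
    """Keep only units whose `cols` are non-missing in every period (hypothetical helper)."""
    row_complete = df[cols].notna().all(axis=1)
    # A unit survives only if all of its rows are complete.
    unit_complete = row_complete.groupby(df[unit]).transform("all").astype(bool)
    return df[unit_complete]

# Hypothetical usage on the loaded panel:
#   df_cov = drop_units_with_missing(df, ["lngdp"])
```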