Downloads
Each dataset is available as a labeled Stata .dta and its source file.
⇩ Download all data (ZIP) · stata_codebook.do
| Dataset | Grain | Rows | Stata | Source |
|---|---|---|---|---|
| quota_example | country-year | 3,094 × 7 | quota_example.dta | quota_example.dta |
Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.
Load directly in code
Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.
Stata
* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_sdid_staggered/data/"
use "${BASE}quota_example.dta", clear
describe
notes
Python
!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_sdid_staggered/data/"
df = pd.read_stata(BASE + "quota_example.dta")
# load every dataset at once
files = ["quota_example"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}
# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "quota_example.dta", "quota_example.dta")
df, meta = pyreadstat.read_dta("quota_example.dta")
Copy and paste this snippet into Google Colab: https://colab.research.google.com/notebooks/empty.ipynb
R
# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_sdid_staggered/data/"
df <- read_dta(paste0(BASE, "quota_example.dta"))
Overview & sources
Companion data for a Stata tutorial that extends synthetic difference-in-differences (SDID) to staggered adoption, where units adopt treatment at different times. The single file is quota_example.dta, the balanced panel distributed with the sdid package (Bhalotra, Clarke, Gomes & Venkataramani, 2023): 119 countries observed annually from 1990 to 2015 (3,094 observations). The outcome is the share of seats held by women in the national parliament; the treatment is the adoption of a reserved-seat gender quota (absorbing — once adopted it stays on); the covariate is log GDP per capita. Treatment is staggered: 9 countries adopt a quota across 7 cohorts (2000, 2002, 2003, 2005, 2010, 2012, 2013) and 110 countries remain never-treated, forming the donor pool. The post estimates a separate, clean SDID per cohort against the never-treated controls, aggregates the cohort effects into an overall ATT of +8.0 percentage points, and complements it with the sdid_event event study and bootstrap, jackknife, and placebo inference.
quota_example is an annual country panel (one row per country × year), 119 countries × 26 years = 3,094 rows with no gaps in the outcome or treatment. Set with xtset country year. The treatment quota is absorbing and switches on for only ~3% of country-years; quotaYear records each adopting country's cohort (missing for the 110 never-treated countries); lngdp has 104 missing values that matter only when used as a covariate.
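The counts quoted above can be re-derived from the panel dimensions and the cohort list alone. The following is a minimal sketch that uses only numbers stated on this page (no data file is read); the variable names are illustrative and are not part of the dataset or the sdid package.

```python
# Panel dimensions and adoption cohorts as described in the text above.
N_COUNTRIES = 119
FIRST_YEAR, LAST_YEAR = 1990, 2015

# adoption year (cohort) -> number of adopting countries
COHORT_SIZES = {2000: 1, 2002: 2, 2003: 2, 2005: 1, 2010: 1, 2012: 1, 2013: 1}

n_years = LAST_YEAR - FIRST_YEAR + 1          # annual panel, both ends included
n_rows = N_COUNTRIES * n_years                # balanced: one row per country-year
n_adopters = sum(COHORT_SIZES.values())
n_never_treated = N_COUNTRIES - n_adopters    # the donor pool

# quota is absorbing and switches on in the adoption year itself, so a
# country in cohort a is treated in every year from a through LAST_YEAR.
treated_country_years = sum(
    size * (LAST_YEAR - cohort + 1) for cohort, size in COHORT_SIZES.items()
)
treated_share = treated_country_years / n_rows

print("years:", n_years, "rows:", n_rows)
print("adopters:", n_adopters, "never treated:", n_never_treated)
print("treated country-years:", treated_country_years)
print("treated share:", round(treated_share, 3))
```

This reproduces the 3,094 rows, 9 adopters and 110 never-treated countries, and gives 94 treated country-years, which is the roughly 3% share reported for quota.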
Data sources
| Source | Provides | Reference / URL |
|---|---|---|
| quota_example (sdid package) | The analysis panel — women-in-parliament outcome, gender-quota treatment, log GDP, quota-adoption year | Bhalotra, S., Clarke, D., Gomes, J. F., & Venkataramani, A. (2023). Maternal Mortality and Women's Political Power. Journal of the European Economic Association. https://doi.org/10.1093/jeea/jvad043 |
| sdid (Stata package) | The estimator and the distributed example dataset (webuse quota_example) | Clarke, D., Pailañir, D., Athey, S., & Imbens, G. (2024). On Synthetic Difference-in-Differences and Related Estimation Methods in Stata. The Stata Journal, 24(4). ssc install sdid. |
| Method references | Estimators and concepts | Arkhangelsky, Athey, Hirshberg, Imbens & Wager (2021) — SDID; Goodman-Bacon (2021); de Chaisemartin & D'Haultfœuille (2020); Ciccia, Clarke & Pailañir (2024) — sdid_event. |
Cite this data
Please cite this dataset as follows.
APA
Mendez, C. (2026). Staggered Synthetic Difference-in-Differences (SDID) in Stata: Gender Quotas and Women in Parliament [Data set]. https://carlos-mendez.org/post/stata_sdid_staggered/
Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic Difference-in-Differences. American Economic Review, 111(12), 4088–4118. https://doi.org/10.1257/aer.20190159
Clarke, D., Pailañir, D., Athey, S., & Imbens, G. (2024). On Synthetic Difference-in-Differences and Related Estimation Methods in Stata. The Stata Journal, 24(4). https://doi.org/10.1177/1536867X241297184
Bhalotra, S., Clarke, D., Gomes, J. F., & Venkataramani, A. (2023). Maternal Mortality and Women's Political Power. Journal of the European Economic Association. https://doi.org/10.1093/jeea/jvad043 (source of the quota_example data).
BibTeX
@misc{mendez2026statasdidstaggered,
author = {Mendez, Carlos},
title = {Staggered Synthetic Difference-in-Differences (SDID) in Stata: Gender Quotas and Women in Parliament},
year = {2026},
howpublished = {\url{https://carlos-mendez.org/post/stata_sdid_staggered/}},
note = {Data set}
}
@article{arkhangelsky2021sdid,
author = {Arkhangelsky, Dmitry and Athey, Susan and Hirshberg, David A. and Imbens, Guido W. and Wager, Stefan},
title = {Synthetic Difference-in-Differences},
journal = {American Economic Review},
volume = {111}, number = {12}, pages = {4088--4118}, year = {2021},
doi = {10.1257/aer.20190159}
}
@article{clarke2024sdid,
author = {Clarke, Damian and Paila{\~n}ir, Daniel and Athey, Susan and Imbens, Guido},
title = {On Synthetic Difference-in-Differences and Related Estimation Methods in Stata},
journal = {The Stata Journal},
volume = {24}, number = {4}, year = {2024},
doi = {10.1177/1536867X241297184}
}
@article{bhalotra2023maternal,
author = {Bhalotra, Sonia and Clarke, Damian and Gomes, Joseph F. and Venkataramani, Atheendar},
title = {Maternal Mortality and Women's Political Power},
journal = {Journal of the European Economic Association},
year = {2023},
doi = {10.1093/jeea/jvad043}
}
Variable explorer
| Variable | Type | Label | Definition | Units | In files | Source |
|---|---|---|---|---|---|---|
| country | identifier | Country | Country name; the panel unit (i). | string | quota_example | quota_example (Bhalotra et al. 2023) |
| lngdp | continuous | Log GDP per capita | Natural log of GDP per capita; the covariate (X). | log GDP | quota_example | quota_example (Bhalotra et al. 2023) |
| lnmmrt | continuous | Maternal mortality | Natural log of the maternal mortality ratio (ships with the dataset; not used in the post's quota analysis). | log ratio | quota_example | quota_example (Bhalotra et al. 2023) |
| quota | dummy | Parliamentary gender quota (=1) | Treatment indicator: 1 once a country has a reserved-seat gender quota, 0 before / never. | 0/1 | quota_example | quota_example (Bhalotra et al. 2023) |
| quotaYear | year | Year quota adopted (cohort) | First year a country is treated (its adoption cohort); missing for the 110 never-treated countries. | year | quota_example | quota_example (Bhalotra et al. 2023) |
| womparl | continuous | Women in parliament | Percentage of seats held by women in the national (lower) parliament; the outcome. | % of seats | quota_example | quota_example (Bhalotra et al. 2023) |
| year | year | Year | Calendar year; the panel time index (t). | year | quota_example | quota_example (Bhalotra et al. 2023) |
Cross-file variable index
There is a single data file, so all 7 variables appear in quota_example.
Construction & formulas
The estimand is the average treatment effect on the treated (ATT) — the effect of adopting a quota on the women-in-parliament share, in the countries that adopted one, averaged over their post-adoption years:
τ = [ 1 / (N_tr · T_post) ] · Σ_(i: W_i=1) Σ_(t>T_pre) [ Y_it(1) − Y_it(0) ]
SDID (Arkhangelsky et al., 2021) is a weighted two-way fixed-effects regression
that chooses the ATT plus a constant, unit fixed effects, and time fixed effects to minimize a
weighted sum of squared residuals, weighting each observation by a unit weight
ω_i times a time weight λ_t:
- Objective: min Σ_i Σ_t (Y_it − μ − α_i − β_t − W_it·τ)² · ω_i · λ_t.
- Unit weights ω: chosen (with an intercept and a ridge penalty) so a non-negative blend of control countries tracks the treated cohort's pre-period trend; the level gap is absorbed by the unit fixed effect.
- Time weights λ: chosen so the weighted pre-period years best predict each control's post-period average; recent, similar years count more.
Staggered extension. Run single-cohort SDID once per adoption cohort a (that cohort's treated units plus the never-treated controls only), obtaining τ_a, then aggregate with non-negative treated-period-share weights:
ATT = Σ_a [ (N_tr^a · T_post^a) / Σ_b (N_tr^b · T_post^b) ] · τ_a.
Because each cohort is compared only to never-treated controls, an already-treated unit is never
used as a control for a later adopter — the contamination that breaks naive TWFE under staggered
timing.
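The aggregation weights depend only on cohort sizes and the number of treated years, so they can be computed without estimating anything. The sketch below does this from the cohort list on this page, assuming the adoption year counts as treated and the panel ends in 2015. The function names are hypothetical, and the cohort effects passed to `aggregate_att` at the end are placeholder values for a sanity check, not the post's estimates.

```python
LAST_YEAR = 2015

# adoption year (cohort) -> number of adopting countries, from this page
COHORT_SIZES = {2000: 1, 2002: 2, 2003: 2, 2005: 1, 2010: 1, 2012: 1, 2013: 1}


def cohort_weights(cohort_sizes, last_year):
    """Share of all treated country-years contributed by each cohort."""
    treated_periods = {
        a: n * (last_year - a + 1)   # N_tr^a * T_post^a, adoption year included
        for a, n in cohort_sizes.items()
    }
    total = sum(treated_periods.values())
    return {a: tp / total for a, tp in treated_periods.items()}


def aggregate_att(tau_by_cohort, weights):
    """Weighted average of cohort-specific effects tau_a."""
    return sum(weights[a] * tau_by_cohort[a] for a in weights)


weights = cohort_weights(COHORT_SIZES, LAST_YEAR)
early_share = weights[2000] + weights[2002] + weights[2003]

for cohort, w in sorted(weights.items()):
    print(cohort, round(w, 3))
print("early cohorts share:", round(early_share, 3))

# Sanity check with placeholder effects: if every cohort had the same
# effect, the aggregate would equal that effect because weights sum to 1.
placeholder_tau = {a: 2.0 for a in weights}
print(round(aggregate_att(placeholder_tau, weights), 6))
```

On these inputs the 2000, 2002 and 2003 cohorts together hold about 74% of the weight (70 of 94 treated country-years), while the 2012 cohort holds about 4%. Late adopters therefore move the aggregate only when their estimated effects are large.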
The datasets
Each dataset is documented with a full variable dictionary, followed by a statistics table with data coverage.
Variable dictionary
| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|---|
| womparl | continuous | Women in parliament | Percentage of seats held by women in the national (lower) parliament; the outcome. | Distributed with the quota_example dataset; observed annually per country. | % of seats | quota_example (Bhalotra et al. 2023) | all 3,094 country-years |
| lnmmrt | continuous | Maternal mortality | Natural log of the maternal mortality ratio (ships with the dataset; not used in the post's quota analysis). | Distributed with the quota_example dataset. | log ratio | quota_example (Bhalotra et al. 2023) | 3,068 country-years (26 missing) |
| country | identifier | Country | Country name; the panel unit (i). | 119 countries; 9 ever adopt a quota, 110 never treated (the donor pool). | string | quota_example (Bhalotra et al. 2023) | 119 countries |
| year | year | Year | Calendar year; the panel time index (t). | Annual, 1990-2015 (26 years), balanced across all countries. | year | quota_example (Bhalotra et al. 2023) | 1990-2015 |
| quota | dummy | Parliamentary gender quota (=1) | Treatment indicator: 1 once a country has a reserved-seat gender quota, 0 before / never. | Absorbing: switches to 1 in the adoption year and stays on; 1 for ~3% of country-years. | 0/1 | quota_example (Bhalotra et al. 2023) | all 3,094 country-years |
| lngdp | continuous | Log GDP per capita | Natural log of GDP per capita; the covariate (X). | Distributed with the quota_example dataset; used in the optimized/projected covariate specifications. | log GDP | quota_example (Bhalotra et al. 2023) | 2,990 country-years (104 missing) |
| quotaYear | year | Year quota adopted (cohort) | First year a country is treated (its adoption cohort); missing for the 110 never-treated countries. | Cohorts: 2000, 2002, 2003, 2005, 2010, 2012, 2013 (two countries each in 2002 and 2003, one in the rest). | year | quota_example (Bhalotra et al. 2023) | 234 country-years (all 26 years of the 9 adopting countries); missing for the 110 never-treated |
Distribution & statistics
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
| womparl | 100% | 3,094 | 449 | 0 | 14.97 | 12.00 | 63.80 | 10.97 |
| lnmmrt | 99% | 3,068 | 680 | 1.10 | 4.19 | 4.25 | 7.24 | 1.59 |
| country | 100% | 3,094 | 119 | – | – | – | – | – |
| year | 100% | 3,094 | 26 | 1990 | 2002.5 | 2002 | 2015 | 7.50 |
| quota | 100% | 3,094 | 2 | 0 | 0.030 | 0 | 1.00 | 0.172 |
| lngdp | 97% | 2,990 | 2,956 | 5.87 | 9.15 | 9.21 | 11.62 | 1.14 |
| quotaYear | 8% | 234 | 7 | 2000 | 2005.6 | 2003 | 2013 | 4.56 |
Known limitations & caveats
- Teaching subset. quota_example is the example dataset distributed with the sdid package, drawn from Bhalotra et al. (2023); the numbers illustrate the method, not a final verdict on quota policy.
- Effect concentration. The +8 aggregate ATT leans heavily on a few cohorts: the 2012 cohort alone contributes +21.8 points and the early 2000/2002/2003 cohorts carry most of the aggregation weight; dropping 2012 lowers the average noticeably.
- Fragile counterfactuals. With 110 controls and as few as one treated country per cohort, some synthetic controls are imprecise (the 2003 cohort's standard error of 9.13 is the tell).
- Identifying assumptions. SDID requires no anticipation, an absorbing treatment, no cross-country spillovers, and that quota timing is not itself a response to the outcome's trajectory; the flat event-study placebos support, but cannot prove, the parallel-(synthetic-)trends assumption.
- Missing covariate. lngdp has 104 missing country-years; SDID needs a balanced panel, so those rows are dropped before the covariate specifications and event study.