Interactive data dictionary

California's Proposition 99: The Synthetic DiD Panel

The canonical state-cigarette panel behind synthetic difference-in-differences, in Stata.

1 dataset · 4 variables · 39 states · 1970–2000

Downloads

Each dataset is available as a labeled Stata .dta and its source file.

Download all data (ZIP) · stata_codebook.do

| Dataset | Grain | Rows × columns | Stata | Source |
| --- | --- | --- | --- | --- |
| prop99_example | state-year | 1,209 × 4 | prop99_example.dta | prop99_example.dta |

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_sdid/data/"
use "${BASE}prop99_example.dta", clear
describe
notes

Python

# Colab/Jupyter shell magic; in a plain terminal run: pip install pyreadstat
!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_sdid/data/"
df = pd.read_stata(BASE + "prop99_example.dta")

# load every dataset at once
files = ["prop99_example"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "prop99_example.dta", "prop99_example.dta")
df, meta = pyreadstat.read_dta("prop99_example.dta")

Copy and paste this snippet into a Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_sdid/data/"
df <- read_dta(paste0(BASE, "prop99_example.dta"))

Overview & sources

Companion data for a hands-on Stata tutorial on synthetic difference-in-differences (SDID), applied to re-evaluate California's Proposition 99, the 1988 ballot measure that raised the cigarette excise tax by 25 cents a pack and funded an anti-smoking campaign. The file is the canonical strongly balanced panel distributed with the sdid package (originally from Abadie, Diamond & Hainmueller 2010, and used by Arkhangelsky et al. 2021): 39 US states observed annually from 1970 to 2000, for 39 × 31 = 1,209 observations, with annual cigarette sales in packs per capita as the sole outcome. California is the single treated unit, and the policy takes effect from 1989 onward. The post writes DiD, synthetic control, and SDID as one weighted two-way fixed-effects regression, estimates the average treatment effect on the treated (ATT) of Proposition 99 with the sdid command, and cross-checks the synthetic control estimate against synth2.

One file, one outcome. prop99_example is a strongly balanced state-year panel (one row per state × year, no gaps) carrying a single outcome, cigarette packs per capita, and a 0/1 treatment indicator. Of the 1,209 observations, only 12 are treated (California, 1989–2000). The panel deliberately contains no covariates, so synthetic control and SDID see exactly the same information set (the pre-period smoking paths), which makes the comparison apples-to-apples.
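The structural claims in this overview (one row per state × year, no gaps, a handful of treated cells) are easy to verify once the file is loaded. A minimal sketch in pandas, assuming the column names `state`, `year`, `packspercapita`, and `treated` from the variable dictionary. The helper `check_panel` and the toy frame are hypothetical and stand in for the real file so the snippet runs offline:

```python
import pandas as pd


def check_panel(df: pd.DataFrame) -> dict:
    """Summarise panel shape; assumes columns state, year, treated."""
    n_units = df["state"].nunique()
    n_periods = df["year"].nunique()
    return {
        "units": n_units,
        "periods": n_periods,
        "rows": len(df),
        # strongly balanced: every state-year pair appears exactly once
        "balanced": (len(df) == n_units * n_periods
                     and not df.duplicated(["state", "year"]).any()),
        "treated_cells": int(df["treated"].sum()),
    }


# Toy stand-in (made-up numbers, NOT the real data): 3 states, 4 years.
toy = pd.DataFrame(
    [(s, y, 100.0 - 2 * i - j, int(s == "California" and y >= 1989))
     for i, s in enumerate(["Alabama", "California", "Utah"])
     for j, y in enumerate(range(1987, 1991))],
    columns=["state", "year", "packspercapita", "treated"],
)
summary = check_panel(toy)
print(summary)

# Wide N x T outcome matrix: the pre-period rows of this matrix are
# the only information synthetic control and SDID get to use.
wide = toy.pivot(index="state", columns="year", values="packspercapita")
print(wide.shape)
```

Run on the real prop99_example file, the same call should report 39 units, 31 periods, 1,209 rows, and 12 treated cells if the description above holds.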

Data sources

| Source | Provides | Reference / URL |
| --- | --- | --- |
| Abadie, Diamond & Hainmueller (2010) | Original Proposition 99 panel (39 states, 1970–2000, packs per capita) | Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association, 105(490), 493–505. https://doi.org/10.1198/jasa.2009.ap08746 |
| sdid package (Clarke et al. 2024) | Distribution of prop99_example.dta; the sdid estimation command | Clarke, D., Pailañir, D., Athey, S., & Imbens, G. (2024). On Synthetic Difference-in-Differences and Related Estimation Methods in Stata. The Stata Journal (st0757). https://doi.org/10.1177/1536867X241297184 |
| Method references | Estimators and concepts | Arkhangelsky, Athey, Hirshberg, Imbens & Wager (2021): synthetic DiD; Abadie & Gardeazabal (2003): synthetic control; Yan & Chen (2023): synth2. |

Cite this data

Please cite this dataset as follows.

APA

Mendez, C. (2026). Synthetic Difference-in-Differences (SDID) in Stata: Re-evaluating California's Proposition 99 [Data set]. https://carlos-mendez.org/post/stata_sdid/

Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic Difference-in-Differences. American Economic Review, 111(12), 4088–4118. https://doi.org/10.1257/aer.20190159

Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association, 105(490), 493–505. https://doi.org/10.1198/jasa.2009.ap08746

BibTeX

@misc{mendez2026statasdid,
  author       = {Mendez, Carlos},
  title        = {Synthetic Difference-in-Differences (SDID) in Stata: Re-evaluating California's Proposition 99},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/stata_sdid/}},
  note         = {Data set}
}

@article{arkhangelsky2021synthetic,
  author  = {Arkhangelsky, Dmitry and Athey, Susan and Hirshberg, David A. and Imbens, Guido W. and Wager, Stefan},
  title   = {Synthetic Difference-in-Differences},
  journal = {American Economic Review},
  volume  = {111}, number = {12}, pages = {4088--4118}, year = {2021},
  doi     = {10.1257/aer.20190159}
}
@article{abadie2010synthetic,
  author  = {Abadie, Alberto and Diamond, Alexis and Hainmueller, Jens},
  title   = {Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program},
  journal = {Journal of the American Statistical Association},
  volume  = {105}, number = {490}, pages = {493--505}, year = {2010},
  doi     = {10.1198/jasa.2009.ap08746}
}

Variable explorer

| Variable | Type | Distribution | Label | Definition | Units | In files | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| packspercapita | continuous | min 40.7, median 116, max 296 | Cigarette sales (packs per capita) | Annual per-capita cigarette pack sales, the sole outcome Y_it. Mean about 119 packs; range roughly 41-296. | packs per capita per year | prop99_example | Abadie et al. (2010) / sdid package |
| state | identifier | | State | US state name, the panel unit. 39 states: California (treated) plus 38 control states forming the donor pool. | string | prop99_example | Abadie et al. (2010) / sdid package |
| treated | dummy | share coded 1 = 0.010 | Treated indicator (Prop 99) | Treatment status W_it: 1 for California in 1989-2000 (the 12 post-Proposition-99 years), 0 otherwise. Only 12 of 1,209 observations are treated. | 0/1 | prop99_example | Abadie et al. (2010) / sdid package |
| year | year | | Year | Calendar year, the panel time index (19 pre-treatment years 1970-1988, 12 post-treatment years 1989-2000). | year | prop99_example | Abadie et al. (2010) / sdid package |

Cross-file variable index

Which file each variable appears in (● = present).

| Variable | prop99_example |
| --- | --- |
| packspercapita | ● |
| state | ● |
| treated | ● |
| year | ● |

Construction & formulas

Every estimator in the post is the same weighted two-way fixed-effects (TWFE) regression over packspercapita (Y) and the treatment indicator treated (W), changing only the weights — the unifying view of Arkhangelsky et al. (2021).
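For a single treated block with no covariates, this weighted regression can be read as a double difference of weighted averages. A rough sketch, assuming the weights are already known: the function `weighted_did` and the toy matrix are hypothetical, and the real unit and time weights come from the optimisation steps inside the sdid command, which this sketch does not reproduce.

```python
import numpy as np


def weighted_did(Y, n_pre, omega, lam):
    """Double difference for a block design.

    Y     : (N, T) outcomes, treated unit(s) in the LAST rows
    n_pre : number of pre-treatment periods
    omega : weights on the control units (length N_co)
    lam   : weights on the pre-treatment periods (length n_pre)
    """
    Y = np.asarray(Y, dtype=float)
    n_co = len(omega)
    # post-period mean minus lambda-weighted pre-period mean, per unit
    delta = Y[:, n_pre:].mean(axis=1) - Y[:, :n_pre] @ np.asarray(lam)
    # treated change minus omega-weighted control change
    return delta[n_co:].mean() - np.asarray(omega) @ delta[:n_co]


# Toy 3x4 panel (made-up numbers): 2 controls, 1 treated, 2 pre + 2 post.
Y = [[10, 11, 12, 13],
     [20, 21, 22, 23],
     [15, 16, 12, 13]]  # treated: tracks the control average, then drops by 5

omega = [0.5, 0.5]
# Uniform unit and time weights: plain difference-in-differences.
tau_did = weighted_did(Y, 2, omega, lam=[0.5, 0.5])
# Zero time weights: no pre-period differencing, the synthetic-control form.
tau_sc_form = weighted_did(Y, 2, omega, lam=[0.0, 0.0])
print(tau_did, tau_sc_form)
```

In this toy panel the equal-weighted controls reproduce the treated unit's pre-period path exactly, so both weightings recover the same gap. On real data the estimators differ precisely because the weights differ.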

Estimand (ATT). τ = (1 / (N_tr·T_post)) · Σ_{i: W_i=1} Σ_{t > T_pre} [ Y_it(1) − Y_it(0) ] — the effect of Proposition 99 on California over the post-1988 period, where the counterfactual Y_it(0) is never observed and each method imputes it differently. Here N_tr = 1 (California). Because California was not randomly assigned, this is an observational design.

The datasets

prop99_example · state-year · 1,209 × 4 · 1970-2000 · 39 US states (strongly balanced)

Panel key: state × year. Purpose: estimate the ATT of California's Proposition 99 (DiD / synthetic control / SDID).

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
| --- | --- | --- | --- | --- | --- | --- | --- |
| state | identifier | State | US state name, the panel unit. 39 states: California (treated) plus 38 control states forming the donor pool. | From the distributed dataset; encoded to a numeric id in the post (encode state, gen(id)) for xtset/synth2. | string | Abadie et al. (2010) / sdid package | 39 states |
| year | year | Year | Calendar year, the panel time index (19 pre-treatment years 1970-1988, 12 post-treatment years 1989-2000). | Annual observations; strongly balanced (every state observed every year, no gaps). | year | Abadie et al. (2010) / sdid package | 1970-2000 |
| packspercapita | continuous | Cigarette sales (packs per capita) | Annual per-capita cigarette pack sales, the sole outcome Y_it. Mean about 119 packs; range roughly 41-296. | Distributed outcome series; the only outcome in the panel (no income, price, or demographic covariates). | packs per capita per year | Abadie et al. (2010) / sdid package | all state-years |
| treated | dummy | Treated indicator (Prop 99) | Treatment status W_it: 1 for California in 1989-2000 (the 12 post-Proposition-99 years), 0 otherwise. Only 12 of 1,209 observations are treated. | 1 where state == California and year >= 1989; the single-treated-unit block design. | 0/1 | Abadie et al. (2010) / sdid package | all state-years |
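The Construction column describes two small transformations: the treatment rule and the numeric state id. A sketch of both in pandas, assuming the treated state is spelled `California` in `state`; the toy frame is hypothetical. Stata's `encode` numbers string values alphabetically starting at 1, which sorted category codes plus one mimic here:

```python
import pandas as pd

# Toy rows (made up) covering treated and untreated cases.
toy = pd.DataFrame({
    "state": ["Utah", "California", "Alabama", "California"],
    "year":  [1989, 1988, 1995, 1989],
})

# treated = 1 only for California from 1989 onward
toy["treated"] = (
    (toy["state"] == "California") & (toy["year"] >= 1989)
).astype(int)

# numeric id in alphabetical order, starting at 1
toy["id"] = toy["state"].astype("category").cat.codes + 1

print(toy)
```

Comparing a rebuilt `treated` column against the distributed one is a cheap way to confirm that the file matches the rule stated in the dictionary.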

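Distribution & statistics

| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| state | | 100% | 1,209 | 39 | | | | | |
| year | | 100% | 1,209 | 31 | 1970 | 1985.0 | 1985 | 2000 | 8.95 |
| packspercapita | min 40.7, median 116, max 296 | 100% | 1,209 | 703 | 40.70 | 118.9 | 116.3 | 296.2 | 32.77 |
| treated | share coded 1 = 0.010 | 100% | 1,209 | 2 | 0 | 0.010 | 0 | 1.00 | 0.099 |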
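The year and treated rows follow from the panel design alone (39 states × 31 years, 12 treated cells), which gives a quick consistency check on the table. A small sketch using only the standard library, assuming the reported SD is the sample (n − 1) standard deviation:

```python
import statistics

N_STATES, YEARS = 39, range(1970, 2001)

# year: each of the 31 years appears once per state
year_col = [y for _ in range(N_STATES) for y in YEARS]
# treated: 12 ones (California, 1989-2000), zeros elsewhere
treated_col = [1] * 12 + [0] * (len(year_col) - 12)

year_sd = statistics.stdev(year_col)
treated_mean = statistics.mean(treated_col)
treated_sd = statistics.stdev(treated_col)
print(len(year_col), round(year_sd, 2),
      round(treated_mean, 3), round(treated_sd, 3))
```

The rounded values agree with the N, SD, and mean reported for year and treated, so those rows are at least internally consistent with the stated panel shape.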

Known limitations & caveats