Interactive data dictionary

Bayesian Spatial Synthetic Control: California's Proposition 99

The Abadie et al. (2010) tobacco panel (39 US states, 1970–2000), used to fit classical, Bayesian-horseshoe, and Bayesian-spatial synthetic control models in R.

1 dataset · 6 variables · 39 states · years 1970–2000

Downloads

Each dataset is available as a labeled Stata .dta and its source file.

Download all data (ZIP) · stata_codebook.do

| Dataset | Grain | Rows × columns | Stata file | Source file |
| --- | --- | --- | --- | --- |
| r_sc_bayes_spatial_source_data | state-year | 1,209 × 6 | r_sc_bayes_spatial_source_data.dta | r_sc_bayes_spatial_source_data.csv |

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_sc_bayes_spatial/data/"
use "${BASE}r_sc_bayes_spatial_source_data.dta", clear
describe
notes

Python

!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_sc_bayes_spatial/data/"
df = pd.read_stata(BASE + "r_sc_bayes_spatial_source_data.dta")

# load every dataset at once
files = ["r_sc_bayes_spatial_source_data"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "r_sc_bayes_spatial_source_data.dta", "r_sc_bayes_spatial_source_data.dta")
df, meta = pyreadstat.read_dta("r_sc_bayes_spatial_source_data.dta")

Copy and paste this snippet into a Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_sc_bayes_spatial/data/"
df <- read_dta(paste0(BASE, "r_sc_bayes_spatial_source_data.dta"))

Overview & sources

Companion data for an R tutorial that replicates the California case study of Sakaguchi & Tagawa (2026) on cigarette consumption and Proposition 99. The single file is the balanced state-year panel bundled in the scspill replication package, which is the same tobacco panel introduced by Abadie, Diamond & Hainmueller (2010): per-capita cigarette sales (cigsale) and the real retail price (retprice) for 39 US states over 1970–2000. California is the only treated unit (treatment switches on in 1988 under the package's Prop 99 convention); the other 38 states form the donor pool. The post fits three nested synthetic-control estimators on this panel (classical SCM via tidysynth, a Bayesian SCM with a horseshoe prior, and a Bayesian spatial SCM with a spatial autoregressive (SAR) layer) and reads the average treatment effect on the treated (ATT), the donor weights, and the per-state spillovers off each.
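The classical SCM step can be pictured as a constrained least-squares problem: pick non-negative donor weights that sum to one so that the weighted donors track the treated unit before treatment, then read the ATT as the post-treatment gap. The sketch below is a minimal, outcome-only version under that assumption; tidysynth (like Abadie et al. 2010) also matches on predictors through a V weighting matrix, which is omitted here. The helper name `scm_weights` is hypothetical and the data are simulated, not the tobacco panel.

```python
import numpy as np
from scipy.optimize import minimize


def scm_weights(y1_pre, Y0_pre):
    """Donor weights w (w >= 0, sum(w) = 1) minimising the mean squared
    pre-treatment gap between the treated unit and the weighted donors.

    y1_pre : (T0,) pre-treatment outcomes of the treated unit
    Y0_pre : (T0, J) pre-treatment outcomes of J donors, one column per donor
    """
    T0, J = Y0_pre.shape

    def loss(w):
        gap = y1_pre - Y0_pre @ w
        return gap @ gap / T0

    def grad(w):
        return -2.0 * Y0_pre.T @ (y1_pre - Y0_pre @ w) / T0

    res = minimize(
        loss,
        x0=np.full(J, 1.0 / J),                 # start from equal weights
        jac=grad,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,                # no negative weights
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # sum to one
        options={"ftol": 1e-12, "maxiter": 1000},
    )
    return res.x


# Simulated example (NOT the tobacco data): 18 pre-years, 13 post-years, 5 donors
rng = np.random.default_rng(42)
T0, T1, J = 18, 13, 5
Y_donors = rng.normal(120.0, 20.0, size=(T0 + T1, J))
true_w = np.array([0.6, 0.4, 0.0, 0.0, 0.0])
y_treated = Y_donors @ true_w
y_treated[T0:] -= 20.0                          # hypothetical post-treatment effect of -20

w_hat = scm_weights(y_treated[:T0], Y_donors[:T0])
synthetic = Y_donors @ w_hat                    # synthetic control path, all years
att = (y_treated[T0:] - synthetic[T0:]).mean()  # average post-treatment gap
print(np.round(w_hat, 3), round(att, 2))
```

Because the simulated treated unit is an exact convex combination of the donors before treatment, the recovered weights should land close to `true_w` and the ATT close to -20.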

One file. r_sc_bayes_spatial_source_data.csv is a balanced annual state panel with one row per state × year: 39 states × 31 years = 1,209 rows. The panel spans 18 pre-treatment years (1970–1987) and 13 post-treatment years (1988–2000), so the treatment dummy equals 1 only for California in 1988–2000 (13 rows). The contiguity weights used by the SAR layer (California's w row and the 38×38 donor W matrix) ship separately inside the scspill package and are not part of this CSV.
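For intuition about what a contiguity matrix does in a SAR layer, the generic spatial-lag form y = ρWy + Xβ + ε (LeSage & Pace 2009) has the reduced form y = (I - ρW)^-1 (Xβ + ε), so a shock to one state reaches its neighbours through the multiplier (I - ρW)^-1. The sketch below assumes a row-standardized binary contiguity matrix, a made-up four-state chain, and ρ = 0.4; it is not the scspill W and not necessarily the exact specification used by Sakaguchi & Tagawa (2026). The helper names are hypothetical.

```python
import numpy as np


def row_standardize(C):
    """Divide each row of a binary contiguity matrix by its row sum (rows of zeros stay zero)."""
    C = np.asarray(C, dtype=float)
    rs = C.sum(axis=1, keepdims=True)
    return np.divide(C, rs, out=np.zeros_like(C), where=rs > 0)


def sar_multiplier(W, rho):
    """Spatial multiplier (I - rho W)^-1 from y = rho W y + X beta + eps."""
    n = W.shape[0]
    return np.linalg.inv(np.eye(n) - rho * W)


# Hypothetical chain of four states: A-B, B-C and C-D share a border
C = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
W = row_standardize(C)
M = sar_multiplier(W, rho=0.4)

shock = np.array([-10.0, 0.0, 0.0, 0.0])  # hypothetical shock to state A only
response = M @ shock                      # equilibrium change in every state's outcome
print(np.round(response, 2))
```

Because every row of W sums to one, a common shock of 1 to all states scales to 1/(1 - ρ), about 1.67 with ρ = 0.4; a shock to a single state is amplified at home and spills, with decaying size, to its neighbours and their neighbours.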

Data sources

| Source | Provides | Reference / URL |
| --- | --- | --- |
| Sakaguchi & Tagawa (2026) | Inspiring study and the scspill replication package that bundles this exact panel and the spatial weights | Sakaguchi, S. & Tagawa, H. (2026). Identification and Bayesian Inference for Synthetic Control Methods with Spillover Effects. The Econometrics Journal. https://doi.org/10.1093/ectj/utag006 (replication package: Zenodo record 19066186). |
| Abadie, Diamond & Hainmueller (2010) | Original California tobacco panel (cigsale, retprice) and the synthetic control method | Abadie, A., Diamond, A. & Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490), 493–505. https://doi.org/10.1198/jasa.2009.ap08746 |
| Method references | Estimators and concepts | Carvalho, Polson & Scott (2010, horseshoe prior); LeSage & Pace (2009, spatial econometrics). |

Cite this data

Please cite this dataset as follows.

APA

Mendez, C. (2026). Bayesian Spatial Synthetic Control: California's Proposition 99 in R [Data set]. https://carlos-mendez.org/post/r_sc_bayes_spatial/

Sakaguchi, S., & Tagawa, H. (2026). Identification and Bayesian Inference for Synthetic Control Methods with Spillover Effects. The Econometrics Journal. https://doi.org/10.1093/ectj/utag006
Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490), 493–505. https://doi.org/10.1198/jasa.2009.ap08746

BibTeX

@misc{mendez2026rscbayesspatial,
  author       = {Mendez, Carlos},
  title        = {Bayesian Spatial Synthetic Control: California's Proposition 99 in R},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/r_sc_bayes_spatial/}},
  note         = {Data set}
}

@article{sakaguchi2026spillover,
  author  = {Sakaguchi, Shosei and Tagawa, Hisahiro},
  title   = {Identification and {Bayesian} Inference for Synthetic Control Methods with Spillover Effects},
  journal = {The Econometrics Journal},
  year    = {2026},
  doi     = {10.1093/ectj/utag006}
}
@article{abadie2010synthetic,
  author  = {Abadie, Alberto and Diamond, Alexis and Hainmueller, Jens},
  title   = {Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program},
  journal = {Journal of the American Statistical Association},
  volume  = {105}, number = {490}, pages = {493--505}, year = {2010},
  doi     = {10.1198/jasa.2009.ap08746}
}

Variable explorer (all 6 variables)


| Variable | Type | Distribution | Label | Definition | Units | In files | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cigsale | continuous | min 40.7, median 116, max 296 | Per-capita cigarette sales (packs) | Annual per-capita cigarette pack sales; the synthetic-control outcome. | packs per capita per year | r_sc_bayes_spatial_source_data | scspill package (Abadie et al. 2010) |
| retprice | continuous | min 27.3, median 95.5, max 351 | Real retail price (cents/pack) | Average real retail price of a cigarette pack; the lone SAR covariate X. | cents per pack (real) | r_sc_bayes_spatial_source_data | scspill package (Abadie et al. 2010) |
| state | identifier | | State name | US state identifier (the treated unit is California; the other 38 are donors). | string | r_sc_bayes_spatial_source_data | scspill package (Abadie et al. 2010) |
| state_id | identifier | | State numeric ID | Integer index for the state (1–39); California is 3. | integer (1–39) | r_sc_bayes_spatial_source_data | scspill package |
| treatment | dummy | share coded 1 = 0.011 | Treatment dummy (1 = CA post-Prop 99) | 1 for California in the post-treatment period, else 0; flags the treated unit-years. | 0/1 | r_sc_bayes_spatial_source_data | Constructed in analysis.R |
| year | year | | Calendar year | Annual time index of the panel. | year | r_sc_bayes_spatial_source_data | scspill package (Abadie et al. 2010) |

Cross-file variable index

Which file each variable appears in (● = present).

| Variable | r_sc_bayes_spatial_source_data |
| --- | --- |
| cigsale | ● |
| retprice | ● |
| state | ● |
| state_id | ● |
| treatment | ● |
| year | ● |

Construction & formulas

The data are an observed state-year panel. Apart from the treatment dummy, the post derives no new columns: all three estimators operate on the same cigsale outcome. California is the treated unit; the other 38 states are donors.

The treatment dummy is constructed as 1 if state == "California" & year ≥ 1988 else 0 (the package's Prop 99 convention). Everything else in the file is observed data read directly from the replication package.
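The dummy rule translates directly into pandas. A minimal sketch, assuming the file has been read into a DataFrame with `state` and `year` columns as documented; `add_treatment` is a hypothetical helper name and the six-row frame is a toy with a placeholder donor label.

```python
import pandas as pd


def add_treatment(df, treated="California", start_year=1988):
    """Return a copy of df with treatment = 1 for the treated state from start_year on, else 0."""
    out = df.copy()
    out["treatment"] = ((out["state"] == treated) & (out["year"] >= start_year)).astype(int)
    return out


# Toy frame: three years for California and for a placeholder donor
toy = pd.DataFrame({"state": ["California"] * 3 + ["donor_01"] * 3,
                    "year": [1987, 1988, 1989] * 2})
toy_out = add_treatment(toy)
print(toy_out["treatment"].tolist())   # [0, 1, 1, 0, 0, 0]

# With the real file, the rebuilt dummy should agree with the shipped column:
# (add_treatment(df)["treatment"] == df["treatment"]).all()
```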

The datasets

Each dataset section shows the full variable dictionary plus a statistics table with distribution summaries and data coverage.


state-year · 1,209 × 6 · 1970–2000 · 39 US states (balanced)

Panel key: state (state_id) × year. Purpose: fit classical, Bayesian-horseshoe, and Bayesian-spatial synthetic control models to estimate California's Prop 99 ATT.
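Before fitting anything, it is worth confirming that the panel key really is unique and the panel balanced. A minimal pandas sketch; `check_balanced_panel` is a hypothetical helper, and the demo frame rebuilds only the documented state × year grid with placeholder donor labels rather than real state names.

```python
import pandas as pd


def check_balanced_panel(df, unit="state", time="year"):
    """Return (n_units, n_periods) if (unit, time) uniquely keys a balanced panel.

    Raises ValueError if the key has duplicates or some unit-period pairs are missing.
    """
    if df.duplicated(subset=[unit, time]).any():
        raise ValueError(f"({unit}, {time}) does not uniquely identify rows")
    n_units, n_periods = df[unit].nunique(), df[time].nunique()
    # With a unique key, rows == units x periods holds only if every pair is present
    if len(df) != n_units * n_periods:
        raise ValueError(f"unbalanced: {len(df)} rows vs {n_units} x {n_periods}")
    return n_units, n_periods


# Toy grid with the documented structure; donor labels are placeholders
states = ["California"] + [f"donor_{i:02d}" for i in range(1, 39)]
toy = pd.DataFrame([(s, y) for s in states for y in range(1970, 2001)],
                   columns=["state", "year"])

print(check_balanced_panel(toy))   # (39, 31)
years = toy["year"].unique()
n_pre, n_post = int((years < 1988).sum()), int((years >= 1988).sum())
print(n_pre, n_post)               # 18 13

# With the real file: check_balanced_panel(df) should give (39, 31) per this dictionary
```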

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
| --- | --- | --- | --- | --- | --- | --- | --- |
| state | identifier | State name | US state identifier (the treated unit is California; the other 38 are donors). | From the scspill replication package panel (Abadie et al. 2010). | string | scspill package (Abadie et al. 2010) | 39 states |
| state_id | identifier | State numeric ID | Integer index for the state (1–39); California is 3. | Sequential package index aligned to the alphabetical state list. | integer (1–39) | scspill package | 39 states |
| year | year | Calendar year | Annual time index of the panel. | Observed year, 1970–2000 (balanced; 31 years per state). | year | scspill package (Abadie et al. 2010) | 1970–2000 |
| cigsale | continuous | Per-capita cigarette sales (packs) | Annual per-capita cigarette pack sales; the synthetic-control outcome. | Observed tax-paid cigarette sales per capita, from the Abadie et al. (2010) tobacco data. | packs per capita per year | scspill package (Abadie et al. 2010) | all rows |
| retprice | continuous | Real retail price (cents/pack) | Average real retail price of a cigarette pack; the lone SAR covariate X. | Observed retail price per pack from the Abadie et al. (2010) tobacco data. | cents per pack (real) | scspill package (Abadie et al. 2010) | all rows |
| treatment | dummy | Treatment dummy (1 = CA post-Prop 99) | 1 for California in the post-treatment period, else 0; flags the treated unit-years. | 1 if state == "California" & year >= 1988, else 0 (package Prop 99 convention). | 0/1 | Constructed in analysis.R | 13 ones (California 1988–2000); 1,196 zeros |

Distribution & statistics

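| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| state | | 100% | 1,209 | 39 | | | | | |
| state_id | | 100% | 1,209 | 39 | | | | | |
| year | | 100% | 1,209 | 31 | 1970 | 1985.0 | 1985 | 2000 | 8.95 |
| cigsale | min 40.7, median 116, max 296 | 100% | 1,209 | 703 | 40.70 | 118.9 | 116.3 | 296.2 | 32.77 |
| retprice | min 27.3, median 95.5, max 351 | 100% | 1,209 | 849 | 27.30 | 108.3 | 95.50 | 351.2 | 64.38 |
| treatment | share coded 1 = 0.011 | 100% | 1,209 | 2 | 0 | 0.011 | 0 | 1.00 | 0.103 |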
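The statistics above can be recomputed with pandas. A sketch, assuming sample standard deviations (pandas' default `ddof=1`), which is consistent with the reported 8.95 for `year` (the population SD would round to 8.94). `dictionary_stats` is a hypothetical helper; the demo runs on a toy frame that shares only the panel's state × year grid and treatment rule, so it should reproduce the `year` and `treatment` rows but says nothing about cigsale or retprice.

```python
import pandas as pd


def dictionary_stats(df, cols):
    """Coverage, N, distinct, min, mean, median, max and SD for each column.
    Numeric summaries are NaN for non-numeric columns."""
    nan = float("nan")
    rows = {}
    for c in cols:
        s = df[c]
        num = pd.api.types.is_numeric_dtype(s)
        rows[c] = {
            "coverage_pct": 100.0 * s.notna().mean(),
            "N": int(s.notna().sum()),
            "distinct": int(s.nunique()),
            "min": s.min() if num else nan,
            "mean": s.mean() if num else nan,
            "median": s.median() if num else nan,
            "max": s.max() if num else nan,
            "sd": s.std() if num else nan,   # sample SD (ddof=1, pandas default)
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy grid with the documented structure (placeholder donor labels, no outcome data)
states = ["California"] + [f"donor_{i:02d}" for i in range(1, 39)]
toy = pd.DataFrame([(s, y) for s in states for y in range(1970, 2001)],
                   columns=["state", "year"])
toy["treatment"] = ((toy["state"] == "California") & (toy["year"] >= 1988)).astype(int)

stats = dictionary_stats(toy, ["state", "year", "treatment"])
print(stats.round(3))

# With the real file: dictionary_stats(df, list(df.columns))
```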

Known limitations & caveats