Downloads
Each dataset is available as a labeled Stata .dta and its source file.
⇩ Download all data (ZIP) · stata_codebook.do
| Dataset | Grain | Rows | Stata | Source |
|---|---|---|---|---|
| kansas | state-quarter | 5,250 × 32 | kansas.dta | kansas.csv |
Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.
Load directly in code
Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.
Stata
* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_augsynth/data/"
use "${BASE}kansas.dta", clear
describe
notes
Python
!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_augsynth/data/"
df = pd.read_stata(BASE + "kansas.dta")
# load every dataset at once
files = ["kansas"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}
# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "kansas.dta", "kansas.dta")
df, meta = pyreadstat.read_dta("kansas.dta")
Copy and paste this snippet into Google Colab: https://colab.research.google.com/notebooks/empty.ipynb
R
# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_augsynth/data/"
df <- read_dta(paste0(BASE, "kansas.dta"))
Overview & sources
Companion data for a beginner-friendly R tutorial on the Augmented Synthetic Control Method (ASCM) of Ben-Michael, Feller & Rothstein (2021), estimating the effect of the May 2012 Kansas personal-income-tax cut on log gross state product (GSP) per capita. The dataset is the kansas object shipped with the augsynth R package: a balanced panel of 50 U.S. states observed every quarter from 1990 Q1 to 2016 Q1 (105 quarters per state, 5,250 rows). Kansas (FIPS 20) is the single treated unit, switched on from 2012 Q2 (year_qtr = 2012.25) onward; the other 49 states form the donor pool. The outcome is lngdpcapita; per-capita revenue, wage, establishment and employment series serve as auxiliary covariates. The tutorial moves from classic SCM through ridge augmentation and covariate balancing, then compares four ways to do inference.
kansas is a balanced state-by-quarter panel (one row per state × quarter) over 1990 Q1–2016 Q1. The key is fips × year_qtr. The treatment indicator treated is 1 only for Kansas (FIPS 20) from 2012 Q2 onward (89 pre-treatment + 16 post-treatment quarters), 0 everywhere else. Three state/local revenue series are observed only annually and are NA in most quarters.
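The panel structure described above can be checked mechanically. The following is a minimal sketch, not part of the original tutorial: `check_panel` is a hypothetical helper, and the toy frame is a synthetic stand-in with the same shape as kansas (placeholder unit ids 1 to 50 rather than real FIPS codes). With the real data, pass the loaded `df` instead.

```python
import numpy as np
import pandas as pd

def check_panel(df, unit="fips", time="year_qtr", treat="treated"):
    """Summarise a unit-by-time panel; raise if the unit-time key is duplicated."""
    if df.duplicated([unit, time]).any():
        raise ValueError("duplicate unit-time rows")
    periods = np.sort(df[time].unique())
    per_unit = df.groupby(unit)[time].nunique()
    first_treated = df.loc[df[treat] == 1, time].min()
    return {
        "n_units": df[unit].nunique(),
        "n_periods": len(periods),
        "balanced": bool((per_unit == len(periods)).all()),
        "treated_units": df.loc[df[treat] == 1, unit].unique().tolist(),
        "n_pre": int((periods < first_treated).sum()),
        "n_post": int((periods >= first_treated).sum()),
    }

# Synthetic stand-in: 50 units x 105 quarters, 1990 Q1 to 2016 Q1.
quarters = 1990 + np.arange(105) / 4          # 1990.0, 1990.25, ..., 2016.0
units = np.arange(1, 51)                      # placeholder ids, not real FIPS codes
toy = pd.DataFrame(
    [(u, q) for u in units for q in quarters], columns=["fips", "year_qtr"]
)
# Unit 20 switches on from 2012 Q2 (year_qtr = 2012.25) onward.
toy["treated"] = ((toy["fips"] == 20) & (toy["year_qtr"] >= 2012.25)).astype(int)

summary = check_panel(toy)
print(summary)
```

On the toy frame the summary reports 50 units, 105 periods, a balanced panel, and an 89/16 split of pre- and post-treatment quarters, matching the description above.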
Data sources
| Source | Provides | Reference / URL |
|---|---|---|
| augsynth R package (kansas) | The full panel — the package's bundled kansas object, copied to CSV for reproducibility | Ben-Michael, E. augsynth: Augmented Synthetic Control Method. https://github.com/ebenmichael/augsynth |
| BEA / BLS QCEW (underlying) | Gross state product, per-capita revenue, wages, establishments and employment series | U.S. Bureau of Economic Analysis (GSP) and Bureau of Labor Statistics QCEW, as assembled in the augsynth kansas object. |
| Ben-Michael, Feller & Rothstein (2021) | Replicated method (ASCM) and the canonical Kansas application | Ben-Michael, E., Feller, A., & Rothstein, J. (2021). The Augmented Synthetic Control Method. Journal of the American Statistical Association, 116(536), 1789–1803. |
| Method references | Estimators and inference concepts | Abadie, Diamond & Hainmueller (2010); Abadie & Gardeazabal (2003); Chernozhukov, Wüthrich & Zhu (2021). |
Cite this data
Please cite this dataset as follows.
APA
Mendez, C. (2026). The Augmented Synthetic Control Method: A Beginner's Tutorial with the Kansas Tax Cuts [Data set]. https://carlos-mendez.org/post/r_augsynth/
Ben-Michael, E., Feller, A., & Rothstein, J. (2021). The Augmented Synthetic Control Method. Journal of the American Statistical Association, 116(536), 1789–1803. https://doi.org/10.1080/01621459.2021.1929245
BibTeX
@misc{mendez2026raugsynth,
author = {Mendez, Carlos},
title = {The Augmented Synthetic Control Method: A Beginner's Tutorial with the Kansas Tax Cuts},
year = {2026},
howpublished = {\url{https://carlos-mendez.org/post/r_augsynth/}},
note = {Data set}
}
@article{benmichael2021augmented,
author = {Ben-Michael, Eli and Feller, Avi and Rothstein, Jesse},
title = {The Augmented Synthetic Control Method},
journal = {Journal of the American Statistical Association},
volume = {116}, number = {536}, pages = {1789--1803}, year = {2021},
doi = {10.1080/01621459.2021.1929245}
}
Variable explorer (all 32 variables)
| Variable | Type | Distribution | Label | Definition | Units | In files | Source |
|---|---|---|---|---|---|---|---|
| abb | identifier | – | State abbreviation | Two-letter U.S. state postal abbreviation (e.g. KS for Kansas). | string | kansas | augsynth kansas |
| avg_wkly_wage | continuous | | Average weekly wage | Average weekly wage in the state-quarter (QCEW). | US$ per week | kansas | BLS QCEW (via augsynth) |
| avgwklywagecapita | continuous | | Average weekly wage (per-capita covariate) | Average weekly wage used as an ASCM covariate (log-transformed in the model). | US$ per week | kansas | Derived |
| emplvl1capita | continuous | | Month-1 employment per capita | First-month employment level as a share of population. | ratio | kansas | Derived |
| emplvl2capita | continuous | | Month-2 employment per capita | Second-month employment level as a share of population. | ratio | kansas | Derived |
| emplvl3capita | continuous | | Month-3 employment per capita | Third-month employment level as a share of population. | ratio | kansas | Derived |
| emplvlcapita | continuous | | Employment per capita (covariate) | Quarterly employment level as a share of population; an ASCM covariate. | ratio | kansas | Derived |
| estabscapita | continuous | | Establishments per capita (covariate) | Business establishments as a share of population; an ASCM covariate. | ratio | kansas | Derived |
| fips | identifier | – | State FIPS code | Federal Information Processing Standard numeric state identifier (Kansas = 20). | code | kansas | augsynth kansas |
| gdp | continuous | | Gross state product (level) | Quarterly gross state product (gross domestic product by state). | US$ (millions) | kansas | BEA (via augsynth) |
| gdpcapita | continuous | | GSP per capita (level) | Gross state product per resident. | US$ per capita | kansas | Derived |
| lngdp | continuous | | Log gross state product | Natural log of quarterly gross state product. | log US$ | kansas | Derived |
| lngdpcapita | continuous | | Log GSP per capita (OUTCOME) | Natural log of gross state product per capita; the study outcome. | log US$ per capita | kansas | Derived |
| month1_emplvl | continuous | | Employment level, month 1 of quarter | Employment count in the first month of the quarter (QCEW). | persons | kansas | BLS QCEW (via augsynth) |
| month2_emplvl | continuous | | Employment level, month 2 of quarter | Employment count in the second month of the quarter (QCEW). | persons | kansas | BLS QCEW (via augsynth) |
| month3_emplvl | continuous | | Employment level, month 3 of quarter | Employment count in the third month of the quarter (QCEW). | persons | kansas | BLS QCEW (via augsynth) |
| popestimate | continuous | | Population estimate | Estimated state resident population for the quarter. | persons | kansas | augsynth kansas |
| qtr | identifier | – | Calendar quarter (1-4) | Quarter of the year (1 = Q1 ... 4 = Q4). | 1-4 | kansas | augsynth kansas |
| qtrly_estabs_count | continuous | | Quarterly establishment count | Number of business establishments in the state-quarter (QCEW). | establishments | kansas | BLS QCEW (via augsynth) |
| rev_local_total | continuous | | Total local government revenue (annual) | Total local-government revenue for the state-year; observed annually only. | US$ | kansas | augsynth kansas |
| rev_state_total | continuous | | Total state government revenue (annual) | Total state-government revenue for the state-year; observed annually only. | US$ | kansas | augsynth kansas |
| revenuepop | continuous | | State revenue per capita (annual) | Total state revenue per resident; observed annually only. | US$ per capita | kansas | augsynth kansas |
| revlocalcapita | continuous | | Local revenue per capita (annual) | Total local revenue per resident; annual covariate (log-transformed in the model). | US$ per capita | kansas | Derived |
| revstatecapita | continuous | | State revenue per capita (annual) | Total state revenue per resident; annual covariate (log-transformed in the model). | US$ per capita | kansas | Derived |
| state | identifier | – | State name | Full U.S. state name. | string | kansas | augsynth kansas |
| taxable_qtrly_wages | continuous | | Taxable quarterly wages | Portion of quarterly wages subject to unemployment-insurance tax (QCEW). | US$ | kansas | BLS QCEW (via augsynth) |
| taxwagescapita | continuous | | Taxable wages per capita | Taxable quarterly wages per resident. | US$ per capita | kansas | Derived |
| total_qtrly_wages | continuous | | Total quarterly wages | Total wages paid in the state-quarter (QCEW). | US$ | kansas | BLS QCEW (via augsynth) |
| totalwagescapita | continuous | | Total wages per capita | Total quarterly wages per resident. | US$ per capita | kansas | Derived |
| treated | dummy | | Treated indicator (1 = Kansas post-2012 Q2) | 1 for Kansas (FIPS 20) from 2012 Q2 onward, 0 for all other state-quarters. | 0/1 | kansas | augsynth kansas |
| year | year | – | Calendar year | Year of the observation (1990-2016). | year | kansas | augsynth kansas |
| year_qtr | continuous | | Decimal year-quarter | Time index as year plus quarter fraction (e.g. 2012.25 = 2012 Q2). | decimal year | kansas | Derived |
Cross-file variable index
All 32 variables appear in the single file, kansas.
Construction & formulas
The estimand is the ATT (actual minus synthetic, for Kansas, from 2012 Q2 onward) on the outcome lngdpcapita. Write $X_1$ for Kansas's pre-treatment outcome vector and $X_0$ for the donors' matching matrix.
- Classic SCM weights (progfunc = "None"): $\hat{\gamma}_{\text{scm}} = \arg\min_{\gamma} \lVert X_1 - X_0'\gamma \rVert_2^2$ subject to $\sum_i \gamma_i = 1$, $\gamma_i \ge 0$, that is, a non-negative, sum-to-one (convex) blend of donors.
- Per-quarter effect: $\hat{\tau}_t = Y_{1t} - \sum_i \hat{\gamma}_i Y_{it}$ for $t > T_0$ (actual minus synthetic).
- Ridge-augmented counterfactual (progfunc = "Ridge"): $\hat{Y}_{\text{aug}}(0) = \sum_i \hat{\gamma}_i Y_{iT} + \left(\hat{m}_{1T} - \sum_i \hat{\gamma}_i \hat{m}_{iT}\right)$, that is, SCM plus a bias correction from a ridge outcome model $\hat{m}$ of post-period outcomes on lagged outcomes, with penalty $\lambda \lVert \eta \rVert_2^2$ ($\lambda$ chosen by leave-one-pre-period-out CV, one-SE rule).
- Covariate ASCM: list covariates after a | in the formula (lngdpcapita ~ treated | log(revstatecapita) + … + emplvlcapita); both lagged outcomes and covariates enter the SCM balancing problem and the ridge model.
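The first three formulas above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the augsynth implementation: the function names are hypothetical, the simplex-constrained fit uses plain projected gradient descent as a stand-in solver, the ridge model has an unpenalized intercept, λ is fixed by hand rather than chosen by cross-validation, and the data are made up.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def scm_weights(X1, X0, n_iter=5000):
    """Minimise ||X1 - X0' gamma||^2 over the simplex by projected gradient descent.

    X1: (T0,) pre-treatment outcomes of the treated unit.
    X0: (n_donors, T0) pre-treatment outcomes of the donors.
    """
    n = X0.shape[0]
    gamma = np.full(n, 1.0 / n)                          # start from uniform weights
    step = 1.0 / (2.0 * np.linalg.norm(X0 @ X0.T, 2))    # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = 2.0 * X0 @ (X0.T @ gamma - X1)
        gamma = project_to_simplex(gamma - step * grad)
    return gamma

def ridge_augmented(X1, X0, Y0_T, gamma, lam):
    """SCM prediction for one post period plus a ridge bias correction.

    The intercept of the ridge model cancels because the weights sum to one,
    so the correction reduces to (X1 - X0' gamma)' eta.
    """
    Xc = X0 - X0.mean(axis=0)                            # centre over donors
    yc = Y0_T - Y0_T.mean()
    eta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X0.shape[1]), Xc.T @ yc)
    return gamma @ Y0_T + (X1 - X0.T @ gamma) @ eta

# Made-up data: 4 donors, 6 pre-treatment quarters, 1 post-treatment quarter.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(4, 6))
true_w = np.array([0.5, 0.3, 0.2, 0.0])
X1 = X0.T @ true_w                    # treated unit is an exact convex blend
Y0_T = rng.normal(size=4)
Y1_T = true_w @ Y0_T - 0.02           # built-in effect of -0.02

gamma = scm_weights(X1, X0)
tau_scm = Y1_T - gamma @ Y0_T                                     # per-period effect
tau_aug = Y1_T - ridge_augmented(X1, X0, Y0_T, gamma, lam=1.0)    # lam is arbitrary here
print(gamma.round(3), tau_scm, tau_aug)
```

When the SCM weights already reproduce X1 exactly, the imbalance term is zero and the augmented estimate coincides with the classic one; the ridge correction only matters when pre-treatment fit is imperfect.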
Per-capita series are constructed by dividing a state-quarter total by population
(popestimate); lngdp/lngdpcapita are natural logs of
gdp/gdpcapita. year_qtr encodes the quarter as
year + (qtr−1)/4 (e.g. 2012 Q2 = 2012.25).
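As a rough illustration of these constructions, the sketch below rebuilds a few derived columns from raw ones. `add_derived` is a hypothetical helper and the two toy rows hold made-up values. gdpcapita is left out because the source only says it is "scaled" without giving the factor.

```python
import numpy as np
import pandas as pd

def add_derived(df):
    """Rebuild a few derived columns from raw kansas-style columns."""
    out = df.copy()
    out["year_qtr"] = out["year"] + (out["qtr"] - 1) / 4            # 2012 Q2 -> 2012.25
    out["totalwagescapita"] = out["total_qtrly_wages"] / out["popestimate"]
    out["emplvl1capita"] = out["month1_emplvl"] / out["popestimate"]
    out["lngdp"] = np.log(out["gdp"])                               # natural log
    return out

# Two made-up state-quarter rows (values are illustrative only).
toy = pd.DataFrame({
    "year": [2012, 2012],
    "qtr": [1, 2],
    "gdp": [100_000.0, 101_000.0],
    "popestimate": [2_000_000, 2_000_000],
    "total_qtrly_wages": [8.0e9, 8.2e9],
    "month1_emplvl": [900_000, 910_000],
})
derived = add_derived(toy)
print(derived[["year_qtr", "totalwagescapita", "emplvl1capita", "lngdp"]])
```

Comparing such rebuilt columns against the shipped ones (for example with `np.allclose`) is a quick way to confirm the construction rules on the real file.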
The datasets
Each dataset comes with a full variable dictionary and a statistics table with data coverage.
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
| fips (identifier) | State FIPS code | Federal Information Processing Standard numeric state identifier (Kansas = 20). | From the augsynth kansas object; the panel unit id. | code | augsynth kansas | 50 states |
| year (year) | Calendar year | Year of the observation (1990-2016). | From the kansas object. | year | augsynth kansas | |
| qtr (identifier) | Calendar quarter (1-4) | Quarter of the year (1 = Q1 ... 4 = Q4). | From the kansas object. | 1-4 | augsynth kansas | |
| state (identifier) | State name | Full U.S. state name. | From the kansas object. | string | augsynth kansas | |
| gdp (continuous) | Gross state product (level) | Quarterly gross state product (gross domestic product by state). | From the kansas object (BEA GSP). | US$ (millions) | BEA (via augsynth) | |
| revenuepop (continuous) | State revenue per capita (annual) | Total state revenue per resident; observed annually only. | From the kansas object; NA in non-reporting quarters. | US$ per capita | augsynth kansas | annual (NA in 3 of 4 quarters) |
| rev_state_total (continuous) | Total state government revenue (annual) | Total state-government revenue for the state-year; observed annually only. | From the kansas object; NA in non-reporting quarters. | US$ | augsynth kansas | annual (NA in 3 of 4 quarters) |
| rev_local_total (continuous) | Total local government revenue (annual) | Total local-government revenue for the state-year; observed annually only. | From the kansas object; NA in non-reporting quarters. | US$ | augsynth kansas | annual (NA in 3 of 4 quarters) |
| popestimate (continuous) | Population estimate | Estimated state resident population for the quarter. | From the kansas object; used as the per-capita denominator. | persons | augsynth kansas | |
| qtrly_estabs_count (continuous) | Quarterly establishment count | Number of business establishments in the state-quarter (QCEW). | From the kansas object. | establishments | BLS QCEW (via augsynth) | |
| month1_emplvl (continuous) | Employment level, month 1 of quarter | Employment count in the first month of the quarter (QCEW). | From the kansas object. | persons | BLS QCEW (via augsynth) | |
| month2_emplvl (continuous) | Employment level, month 2 of quarter | Employment count in the second month of the quarter (QCEW). | From the kansas object. | persons | BLS QCEW (via augsynth) | |
| month3_emplvl (continuous) | Employment level, month 3 of quarter | Employment count in the third month of the quarter (QCEW). | From the kansas object. | persons | BLS QCEW (via augsynth) | |
| total_qtrly_wages (continuous) | Total quarterly wages | Total wages paid in the state-quarter (QCEW). | From the kansas object. | US$ | BLS QCEW (via augsynth) | |
| taxable_qtrly_wages (continuous) | Taxable quarterly wages | Portion of quarterly wages subject to unemployment-insurance tax (QCEW). | From the kansas object. | US$ | BLS QCEW (via augsynth) | |
| avg_wkly_wage (continuous) | Average weekly wage | Average weekly wage in the state-quarter (QCEW). | From the kansas object. | US$ per week | BLS QCEW (via augsynth) | |
| year_qtr (continuous) | Decimal year-quarter | Time index as year plus quarter fraction (e.g. 2012.25 = 2012 Q2). | year + (qtr-1)/4; the panel's time variable in augsynth(). | decimal year | Derived | |
| treated (dummy) | Treated indicator (1 = Kansas post-2012 Q2) | 1 for Kansas (FIPS 20) from 2012 Q2 onward, 0 for all other state-quarters. | Indicator switched on at year_qtr >= 2012.25 for Kansas only. | 0/1 | augsynth kansas | |
| gdpcapita (continuous) | GSP per capita (level) | Gross state product per resident. | gdp / popestimate (scaled). | US$ per capita | Derived | |
| lngdp (continuous) | Log gross state product | Natural log of quarterly gross state product. | log(gdp). | log US$ | Derived | |
| lngdpcapita (continuous) | Log GSP per capita (OUTCOME) | Natural log of gross state product per capita; the study outcome. | log(gdpcapita). | log US$ per capita | Derived | |
| revstatecapita (continuous) | State revenue per capita (annual) | Total state revenue per resident; annual covariate (log-transformed in the model). | rev_state_total / popestimate; NA in non-reporting quarters. | US$ per capita | Derived | annual (NA in 3 of 4 quarters) |
| revlocalcapita (continuous) | Local revenue per capita (annual) | Total local revenue per resident; annual covariate (log-transformed in the model). | rev_local_total / popestimate; NA in non-reporting quarters. | US$ per capita | Derived | annual (NA in 3 of 4 quarters) |
| emplvl1capita (continuous) | Month-1 employment per capita | First-month employment level as a share of population. | month1_emplvl / popestimate. | ratio | Derived | |
| emplvl2capita (continuous) | Month-2 employment per capita | Second-month employment level as a share of population. | month2_emplvl / popestimate. | ratio | Derived | |
| emplvl3capita (continuous) | Month-3 employment per capita | Third-month employment level as a share of population. | month3_emplvl / popestimate. | ratio | Derived | |
| emplvlcapita (continuous) | Employment per capita (covariate) | Quarterly employment level as a share of population; an ASCM covariate. | Quarterly employment / popestimate. | ratio | Derived | |
| totalwagescapita (continuous) | Total wages per capita | Total quarterly wages per resident. | total_qtrly_wages / popestimate. | US$ per capita | Derived | |
| taxwagescapita (continuous) | Taxable wages per capita | Taxable quarterly wages per resident. | taxable_qtrly_wages / popestimate. | US$ per capita | Derived | |
| avgwklywagecapita (continuous) | Average weekly wage (per-capita covariate) | Average weekly wage used as an ASCM covariate (log-transformed in the model). | From avg_wkly_wage; an auxiliary covariate. | US$ per week | Derived | |
| estabscapita (continuous) | Establishments per capita (covariate) | Business establishments as a share of population; an ASCM covariate. | qtrly_estabs_count / popestimate. | ratio | Derived | |
| abb (identifier) | State abbreviation | Two-letter U.S. state postal abbreviation (e.g. KS for Kansas). | From the kansas object. | string | augsynth kansas | |
Distribution & statistics
| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
| fips | – | 100% | 5,250 | 50 | – | – | – | – | – |
| year | – | 100% | 5,250 | 27 | 1990 | 2002.6 | 2003 | 2016 | 7.58 |
| qtr | – | 100% | 5,250 | 4 | – | – | – | – | – |
| state | – | 100% | 5,250 | 50 | – | – | – | – | – |
| gdp | | 100% | 5,250 | 5,227 | 11,509 | 228,237 | 130,650 | 2,568,986 | 298,949 |
| revenuepop | | 57% | 3,000 | 716 | 1,334.6 | 3,851.1 | 3,628.5 | 14,609 | 1,352.8 |
| rev_state_total | | 46% | 2,400 | 600 | 1,667.6 | 20,813 | 13,868 | 182,530 | 24,051 |
| rev_local_total | | 46% | 2,400 | 600 | 550.0 | 17,197 | 10,041 | 143,137 | 23,640 |
| popestimate | | 100% | 5,250 | 5,249 | 453,690 | 5,767,107 | 3,997,978 | 39,250,017 | 6,352,795 |
| qtrly_estabs_count | | 100% | 5,250 | 5,177 | 15,133 | 161,021 | 108,822 | 1,448,488 | 189,559 |
| month1_emplvl | | 100% | 5,250 | 5,240 | 178,737 | 2,482,331 | 1,675,988 | 16,600,851 | 2,656,261 |
| month2_emplvl | | 100% | 5,250 | 5,244 | 178,587 | 2,494,933 | 1,684,341 | 16,633,834 | 2,669,643 |
| month3_emplvl | | 100% | 5,250 | 5,240 | 181,521 | 2,510,204 | 1,699,044 | 16,606,038 | 2,684,404 |
| total_qtrly_wages | | 100% | 5,250 | 5,246 | 881,111,969 | 24,019,317,100 | 13,622,726,754 | 275,326,853,350 | 31,004,422,823 |
| taxable_qtrly_wages | | 100% | 5,250 | 3,051 | 0 | 3,775,737,420 | 1,095,917,898 | 76,893,472,173 | 7,327,919,613 |
| avg_wkly_wage | | 100% | 5,250 | 854 | 301.0 | 674.8 | 658.0 | 1,792.0 | 196.2 |
| year_qtr | | 100% | 5,250 | 105 | 1990.0 | 2003.0 | 2003.0 | 2016.0 | 7.58 |
| treated | | 100% | 5,250 | 2 | 0 | 0.003 | 0 | 1.00 | 0.055 |
| gdpcapita | | 100% | 5,250 | 5,250 | 15,029 | 37,808 | 36,449 | 84,382 | 12,552 |
| lngdp | | 100% | 5,250 | 5,227 | 9.35 | 11.75 | 11.78 | 14.76 | 1.09 |
| lngdpcapita | | 100% | 5,250 | 5,250 | 9.62 | 10.49 | 10.50 | 11.34 | 0.331 |
| revstatecapita | | 46% | 2,400 | 2,400 | 2,021.3 | 3,741.9 | 3,380.1 | 20,353 | 1,608.9 |
| revlocalcapita | | 46% | 2,400 | 2,400 | 883.6 | 2,480.2 | 2,428.3 | 7,160.9 | 784.0 |
| emplvl1capita | | 100% | 5,250 | 5,250 | 0.325 | 0.437 | 0.436 | 1.05 | 0.041 |
| emplvl2capita | | 100% | 5,250 | 5,250 | 0.325 | 0.439 | 0.438 | 1.05 | 0.040 |
| emplvl3capita | | 100% | 5,250 | 5,250 | 0.329 | 0.442 | 0.441 | 1.05 | 0.041 |
| emplvlcapita | | 100% | 5,250 | 5,250 | 0.327 | 0.439 | 0.438 | 1.05 | 0.041 |
| totalwagescapita | | 100% | 5,250 | 5,250 | 1,492.9 | 3,869.4 | 3,787.3 | 10,275 | 1,216.3 |
| taxwagescapita | | 100% | 5,250 | 3,051 | 0 | 728.8 | 355.7 | 5,254.4 | 924.7 |
| avgwklywagecapita | | 100% | 5,250 | 854 | 301.0 | 674.8 | 658.0 | 1,792.0 | 196.2 |
| estabscapita | | 100% | 5,250 | 5,250 | 0.020 | 0.029 | 0.028 | 0.071 | 0.005 |
| abb | – | 100% | 5,250 | 50 | – | – | – | – | – |
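For readers who want to recompute a table like this from the downloaded file, here is a minimal pandas sketch. `column_stats` is a hypothetical helper, the toy frame is synthetic, and using the sample standard deviation (ddof = 1) is an assumption about how the SD column was produced.

```python
import numpy as np
import pandas as pd

def column_stats(df):
    """One row per numeric column: coverage, counts and basic moments."""
    num = df.select_dtypes("number")
    return pd.DataFrame({
        "coverage_pct": (100 * num.notna().mean()).round(0),  # share of non-missing rows
        "n": num.count(),                                     # non-missing count
        "distinct": num.nunique(),                            # NaN is not counted
        "min": num.min(),
        "mean": num.mean(),
        "median": num.median(),
        "max": num.max(),
        "sd": num.std(),                                      # sample SD (assumed)
    })

# Synthetic frame with 5,250 rows: 16 treated rows, and a revenue-like
# column that is non-missing in only 2,400 rows.
n = 5250
toy = pd.DataFrame({
    "treated": np.r_[np.ones(16), np.zeros(n - 16)],
    "rev": np.where(np.arange(n) < 2400, 1000.0, np.nan),
})
stats = column_stats(toy)
print(stats.round(3))
```

The toy treated column reproduces the logic behind the table's treated row: 16 ones in 5,250 rows give a mean of about 0.003. With the real data, `column_stats(df)` would be called on the loaded frame instead.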
Known limitations & caveats
- Net effect, not tax-only. The ATT is the gap between actual Kansas and a synthetic blend; over 2012–2016 Kansas also faced a severe drought and aerospace-sector shocks that SCM cannot strip out separately.
- Annual revenue series. rev_state_total, rev_local_total, revenuepop, and the per-capita revstatecapita/revlocalcapita are recorded only once a year and are NA in the other three quarters (2,250–2,850 missing of 5,250).
- Single treated unit, modest donor pool. One Kansas and 49 donors limit statistical power; the four inference methods disagree at the margin (jackknife+ excludes zero; conformal p = 0.066; permutation p = 0.10).
- SCM identifying assumption. Results rest on a convex (or lightly extrapolated) blend of donors standing in for Kansas's untreated path, and on no donor state being affected by Kansas's policy.