Data dictionary · Difference-in-Differences for Regional Data

Downloads

Each dataset is available as a labeled Stata .dta file and as its source .csv file.

⇩ Download all data (ZIP) · stata_codebook.do

Dataset	Grain	Rows	Stata	Source
`raw_data`	county-year	31,843 × 22	raw_data.dta	raw_data.csv
`data_prepared`	county-year	28,644 × 17	data_prepared.dta	data_prepared.csv

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_did2/data/"
use "${BASE}raw_data.dta", clear
describe
notes

Python

# Notebook only (Colab/Jupyter): the leading "!" runs pip in the shell
!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_did2/data/"
df = pd.read_stata(BASE + "raw_data.dta")

# load every dataset at once
files = ["raw_data", "data_prepared"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "raw_data.dta", "raw_data.dta")
df, meta = pyreadstat.read_dta("raw_data.dta")

Copy and paste this snippet into a Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb
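If pyreadstat is not available, pandas can also expose the variable labels stored in a .dta file. The sketch below is an illustration under stated assumptions: it writes a tiny stand-in file (`toy_labeled.dta`, a hypothetical name with made-up values) so that it runs offline, then reads the labels back. Pointing it at a downloaded copy of raw_data.dta should work the same way, though the labels returned depend on what that file actually stores.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for a labeled .dta file (values are made up for illustration).
toy = pd.DataFrame({
    "county_code": [1001, 1003],
    "crude_rate_20_64": [450.2, 398.7],
})
labels = {
    "county_code": "County FIPS code",
    "crude_rate_20_64": "Crude mortality rate, adults 20-64",
}

path = os.path.join(tempfile.mkdtemp(), "toy_labeled.dta")
toy.to_stata(path, write_index=False, variable_labels=labels)

# iterator=True returns a StataReader, which exposes the stored labels.
with pd.read_stata(path, iterator=True) as reader:
    var_labels = reader.variable_labels()   # dict: column name -> label
    df_toy = reader.read()                  # the data itself

print(var_labels)
```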

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_did2/data/"
df <- read_dta(paste0(BASE, "raw_data.dta"))

Overview & sources

Companion data for a hands-on R tutorial that asks whether the Affordable Care Act's staggered Medicaid expansion reduced adult mortality, and uses the question to show how population weighting changes the target parameter when the units (U.S. counties) differ in size by orders of magnitude. Following Baker, Callaway, Cunningham, Goodman-Bacon and Sant'Anna's (2025) Practitioner's Guide, the post runs an eight-stage DiD pipeline — 2×2 cell means, three equivalent TWFE specifications, covariate-adjusted OR/IPW/DRDID, a 2×T event study, the Callaway–Sant'Anna staggered ATT(g,t) design, and a Rambachan–Roth HonestDiD sensitivity analysis — computing every estimate both unweighted and weighted by 2013 adult population. The headline 2×2 ATT(2014) flips sign with weighting, from +0.122 deaths per 100,000 unweighted to −2.563 weighted, while no 95% confidence interval at any stage comfortably excludes zero. The two estimands answer different questions — the typical treated county versus the typical treated adult.

Two files. raw_data is the merged CDC county mortality file (deaths per 100,000 adults aged 20–64) joined to state-level ACA Medicaid-expansion timing, as downloaded — one row per county × year, 2009–2019, before any cleaning. data_prepared is the balanced analysis sample built from it: drop the five pre-2014 expanders (DC, DE, MA, NY, VT), require full mortality coverage 2009–2019 and full covariate coverage in 2013–2014, and add the modeling columns (covariate shares, fixed 2013 population weight, treatment-year and post indicators).
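The sample restriction can be sketched in pandas. This is a hedged illustration, not the post's R script: `balanced_sample` is a hypothetical helper, the toy rows are made up, it assumes a `state_abb` column has already been derived, and it leaves out the covariate-coverage requirement for 2013–2014.

```python
import pandas as pd

YEARS = list(range(2009, 2020))                    # 2009-2019 inclusive
EARLY_EXPANDERS = {"DC", "DE", "MA", "NY", "VT"}   # the five pre-2014 expanders

def balanced_sample(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep counties outside the early-expander states that have a
    non-missing mortality rate in every year of YEARS."""
    df = raw[~raw["state_abb"].isin(EARLY_EXPANDERS)]
    df = df[df["year"].isin(YEARS) & df["crude_rate_20_64"].notna()]
    # Count the distinct years observed per county and keep full coverage only.
    n_years = df.groupby("county_code")["year"].transform("nunique")
    return df[n_years == len(YEARS)]

# Toy input: one complete AL county, one NY county, one AL county with a gap.
rows = []
for code, abb in [(1001, "AL"), (36001, "NY"), (1003, "AL")]:
    for year in YEARS:
        rate = None if (code == 1003 and year == 2012) else 400.0
        rows.append({"county_code": code, "state_abb": abb,
                     "year": year, "crude_rate_20_64": rate})
toy = pd.DataFrame(rows)

kept = balanced_sample(toy)
print(sorted(kept["county_code"].unique()))   # only county 1001 survives
```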

Data sources

Source	Provides	Reference / URL
CDC WONDER (mortality)	County-level death counts and crude mortality rates for adults aged 20-64, plus population denominators	U.S. Centers for Disease Control and Prevention, CDC WONDER. https://wonder.cdc.gov/
ACA Medicaid expansion timing	State Medicaid-expansion adoption status, year, and month (the staggered treatment timing)	KFF, Status of State Medicaid Expansion Decisions. https://www.kff.org/medicaid/issue-brief/status-of-state-medicaid-expansion-decisions-interactive-map/
County socioeconomic covariates	Unemployment, poverty, and median household income by county-year (baseline covariates)	U.S. Census Bureau / Bureau of Labor Statistics county-level series (as merged in the source file).
Replicated study & method references	Empirical example, estimators, and concepts	Baker et al. (2025, arXiv:2503.13323); Callaway & Sant'Anna (2021); Sant'Anna & Zhao (2020); Rambachan & Roth (2023); Imbens & Rubin (2015).

Cite this data

Please cite this dataset as follows.

APA

Mendez, C. (2026). Difference-in-Differences for Regional Data: Did Medicaid Expansion Reduce Mortality? [Data set]. https://carlos-mendez.org/post/r_did2/

Baker, A., Callaway, B., Cunningham, S., Goodman-Bacon, A., & Sant'Anna, P. H. C. (2025). Difference-in-Differences Designs: A Practitioner's Guide. arXiv:2503.13323.

Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-Differences with multiple time periods. Journal of Econometrics, 225(2), 200-230.

Sant'Anna, P. H. C., & Zhao, J. (2020). Doubly robust difference-in-differences estimators. Journal of Econometrics, 219(1), 101-122.

Rambachan, A., & Roth, J. (2023). A More Credible Approach to Parallel Trends. Review of Economic Studies, 90(5), 2555-2591.

BibTeX

@misc{mendez2026rdid2,
  author       = {Mendez, Carlos},
  title        = {Difference-in-Differences for Regional Data: Did Medicaid Expansion Reduce Mortality?},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/r_did2/}},
  note         = {Data set}
}

@article{baker2025did,
  author  = {Baker, Andrew and Callaway, Brantly and Cunningham, Scott and Goodman-Bacon, Andrew and Sant'Anna, Pedro H. C.},
  title   = {Difference-in-Differences Designs: A Practitioner's Guide},
  journal = {arXiv preprint arXiv:2503.13323}, year = {2025}
}
@article{callaway2021did,
  author  = {Callaway, Brantly and Sant'Anna, Pedro H. C.},
  title   = {Difference-in-Differences with multiple time periods},
  journal = {Journal of Econometrics}, volume = {225}, number = {2},
  pages   = {200--230}, year = {2021}
}
@article{santanna2020drdid,
  author  = {Sant'Anna, Pedro H. C. and Zhao, Jun},
  title   = {Doubly robust difference-in-differences estimators},
  journal = {Journal of Econometrics}, volume = {219}, number = {1},
  pages   = {101--122}, year = {2020}
}
@article{rambachan2023honest,
  author  = {Rambachan, Ashesh and Roth, Jonathan},
  title   = {A More Credible Approach to Parallel Trends},
  journal = {Review of Economic Studies}, volume = {90}, number = {5},
  pages   = {2555--2591}, year = {2023}
}

Variable explorer · all 30 variables

Variable	Type	Distribution	Label	Definition	Units	In files	Source
`Description`	identifier	–	Expansion description (free text)	Free-text note on the state's expansion implementation (date, retroactivity, etc.).	string	raw_data	KFF / ACA timing
`Post`	dummy		Post-2014 period dummy	1 for years 2014 and later, else 0 (the post period in the 2x2 design).	0/1	data_prepared	Derived
`Treat_2014`	dummy		2014-cohort treatment dummy	1 if the county's state expanded Medicaid in 2014, else 0.	0/1	data_prepared	Derived
`county`	identifier	–	County name (with state abbrev.)	County name followed by its two-letter state abbreviation, e.g. "Autauga County, AL".	string	raw_data, data_prepared	CDC WONDER
`county_code`	identifier	–	County FIPS code	Five-digit federal (FIPS) county identifier; the unit id for the panel.	FIPS	raw_data, data_prepared	CDC WONDER
`crude_rate_20_64`	continuous		Crude mortality rate, adults 20-64	Deaths per 100,000 adults aged 20-64, the DiD outcome variable.	per 100,000	raw_data, data_prepared	CDC WONDER
`deaths`	continuous		Deaths, adults 20-64	Count of deaths among adults aged 20-64 in the county-year.	count	raw_data	CDC WONDER
`expansion_status`	identifier	–	ACA Medicaid expansion status (text)	Whether the state had adopted and implemented Medicaid expansion.	category	raw_data	KFF / ACA timing
`labor_force`	continuous		Civilian labor force	County civilian labor force count (denominator of the raw unemployment rate).	persons	raw_data	BLS (merged)
`maca`	identifier	–	Month of ACA Medicaid expansion	Calendar month (1-12) in which the state implemented expansion; missing for never-expanders.	month (1-12)	raw_data	KFF / ACA timing
`median_income`	continuous		Median household income	County median household income; in the raw file expressed in US$, rescaled to thousands of US$ in the prepared data.	US$ (raw) / US$ 000s (prepared)	raw_data, data_prepared	Census (merged)
`perc_female`	continuous		Female share, adults 20-64 (%)	Percent of the county's 20-64 population that is female (baseline covariate).	%	data_prepared	Derived (CDC)
`perc_hispanic`	continuous		Hispanic share, adults 20-64 (%)	Percent of the county's 20-64 population that is Hispanic (baseline covariate).	%	data_prepared	Derived (CDC)
`perc_white`	continuous		White share, adults 20-64 (%)	Percent of the county's 20-64 population that is white (baseline covariate).	%	data_prepared	Derived (CDC)
`population_20_64`	continuous		Population aged 20-64	County adult population aged 20-64 (denominator of the crude mortality rate).	persons	raw_data, data_prepared	CDC WONDER
`population_20_64_female`	continuous		Population 20-64, female	Female adults aged 20-64 (numerator for perc_female).	persons	raw_data	CDC WONDER
`population_20_64_hispanic`	continuous		Population 20-64, Hispanic	Adults aged 20-64 identifying as Hispanic (numerator for perc_hispanic).	persons	raw_data	CDC WONDER
`population_20_64_white`	continuous		Population 20-64, white	White adults aged 20-64 (numerator for perc_white).	persons	raw_data	CDC WONDER
`population_total`	continuous		Total county population	Total resident population of the county-year (all ages).	persons	raw_data	CDC WONDER
`poverty_rate`	continuous		Poverty rate (%)	Share of the county population below the federal poverty line.	%	raw_data, data_prepared	Census (merged)
`set_wt`	continuous		Fixed 2013 adult population weight	Each county's 2013 population aged 20-64, held constant across all 11 years (the population weight).	persons	data_prepared	Derived (CDC)
`state`	identifier	–	State name	Full name of the U.S. state the county belongs to.	string	raw_data	CDC WONDER
`state_abb`	identifier	–	State abbreviation	Two-letter U.S. state postal abbreviation.	code	data_prepared	Derived
`stfips`	identifier	–	State FIPS code	Numeric federal (FIPS) identifier for the state (1-56).	FIPS	raw_data	CDC WONDER
`treat_year`	identifier	–	Treatment year (did convention)	Year the county's state expanded Medicaid (2014/2015/2016/2019), or 0 for never-treated counties.	year / 0	data_prepared	Derived
`unemp_rate`	continuous		Unemployment rate (%)	County unemployment rate (baseline covariate).	%	raw_data, data_prepared	Derived (BLS)
`unemployed`	continuous		Number unemployed	County count of unemployed persons in the labor force.	persons	raw_data	BLS (merged)
`yaca`	year	–	Year of ACA Medicaid expansion	Calendar year the state implemented Medicaid expansion; missing (NA) for never-expanders.	year	raw_data, data_prepared	KFF / ACA timing
`year`	year	–	Calendar year	Annual time index of the observation.	year	raw_data, data_prepared	CDC WONDER
`year_code`	year	–	Year code (CDC)	CDC WONDER's year code; numerically equal to the calendar year here.	year	raw_data	CDC WONDER

Cross-file variable index

Which file each variable appears in (● = present).

Variable	raw_data	data_prepared
`Description`	●
`Post`		●
`Treat_2014`		●
`county`	●	●
`county_code`	●	●
`crude_rate_20_64`	●	●
`deaths`	●
`expansion_status`	●
`labor_force`	●
`maca`	●
`median_income`	●	●
`perc_female`		●
`perc_hispanic`		●
`perc_white`		●
`population_20_64`	●	●
`population_20_64_female`	●
`population_20_64_hispanic`	●
`population_20_64_white`	●
`population_total`	●
`poverty_rate`	●	●
`set_wt`		●
`state`	●
`state_abb`		●
`stfips`	●
`treat_year`		●
`unemp_rate`	●	●
`unemployed`	●
`yaca`	●	●
`year`	●	●
`year_code`	●

Construction & formulas

The outcome is the CDC crude mortality rate crude_rate_20_64 (deaths per 100,000 adults aged 20–64). Every estimate is computed twice: unweighted (each county counts equally — the ATT for the typical treated county) and population-weighted by the fixed 2013 adult population set_wt (each adult counts equally — the ATT for the typical treated adult).
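A minimal sketch of how the two versions can diverge, assuming the data_prepared column names (`Treat_2014`, `Post`, `crude_rate_20_64`, `set_wt`) and a toy four-county panel with made-up values. `did_2x2` is a hypothetical helper, not the post's R code; it takes the treated group's change minus the control group's change using plain or weighted cell means.

```python
import numpy as np
import pandas as pd

def did_2x2(df, weight_col=None):
    """(treated post - treated pre) - (control post - control pre),
    computed from plain cell means or from weighted cell means."""
    def cell_mean(treat, post):
        cell = df[(df["Treat_2014"] == treat) & (df["Post"] == post)]
        w = cell[weight_col] if weight_col else None
        return np.average(cell["crude_rate_20_64"], weights=w)
    return ((cell_mean(1, 1) - cell_mean(1, 0))
            - (cell_mean(0, 1) - cell_mean(0, 0)))

# Toy panel (made-up numbers): a small treated county whose rate rises,
# a large treated county whose rate falls, and two flat control counties.
toy = pd.DataFrame({
    "county_code":      [1, 1, 2, 2, 3, 3, 4, 4],
    "year":             [2013, 2014] * 4,
    "Treat_2014":       [1, 1, 1, 1, 0, 0, 0, 0],
    "crude_rate_20_64": [400.0, 410.0, 300.0, 294.0, 350.0, 350.0, 350.0, 350.0],
    "set_wt":           [1_000, 1_000, 99_000, 99_000, 5_000, 5_000, 50_000, 50_000],
})
toy["Post"] = (toy["year"] >= 2014).astype(int)

att_county = did_2x2(toy)                      # each county counts equally
att_adult = did_2x2(toy, weight_col="set_wt")  # each adult counts equally
print(round(att_county, 2), round(att_adult, 2))
```

In this toy panel the unweighted estimate is +2.0 and the weighted one is -5.84: the large county dominates once adults, rather than counties, are the unit of averaging.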

2×2 cell-means DiD (ATT(2014)): (Ȳ_T,post − Ȳ_T,pre) − (Ȳ_C,post − Ȳ_C,pre) — the treated group's 2013→2014 change minus the control group's change.
TWFE 2×2: Y_it = β₀ + β₁·1{D=1} + β₂·1{t=2014} + β^(2×2)·(D×Post) + ε_it; on a balanced 2×2 panel the levels, two-way-FE, and long-difference forms recover the same β^(2×2).
Normalized difference (covariate balance): (X̄_T − X̄_C) / √[(S²_T + S²_C)/2]; |value| > 0.25 flags imbalance (Imbens & Rubin 2015).
Doubly-robust DiD (DRDID): (1/n) Σ (ŵ_{D=1} − ŵ_{D=0})(ΔY_i − μ̂_{Δ,D=0}(X_i)) — consistent if either the outcome model or the propensity model is correct (Sant'Anna & Zhao 2020).
Group-time ATT (Callaway–Sant'Anna): ATT(g,t) = E[Y_it(g) − Y_it(∞) | G_i = g], aggregated by cohort and by event time.
HonestDiD relative magnitudes: bound the post-period parallel-trends violation at a multiple M̄ of the worst observed pre-period violation; the breakdown value is the smallest M̄ that overturns the sign.
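Two of the formulas above can be checked numerically on toy data. The sketch below uses made-up values and plain NumPy least squares rather than the post's R estimators. It confirms that the cell-means DiD, the interaction coefficient in the levels regression, and the long-difference coefficient coincide on a balanced 2×2 panel, then evaluates the normalized difference. `normalized_difference` is a hypothetical helper, and using sample variances (ddof=1) is an assumption.

```python
import numpy as np

# Toy balanced 2x2 panel: 3 treated and 3 control units, one pre and one post period.
d      = np.array([1, 1, 1, 0, 0, 0])                            # treatment indicator D
y_pre  = np.array([410.0, 395.0, 430.0, 380.0, 405.0, 390.0])
y_post = np.array([400.0, 399.0, 421.0, 384.0, 402.0, 396.0])

# (a) Cell-means DiD: treated change minus control change.
cell = ((y_post[d == 1].mean() - y_pre[d == 1].mean())
        - (y_post[d == 0].mean() - y_pre[d == 0].mean()))

# (b) Levels regression: Y on constant, D, Post, and D x Post.
y = np.concatenate([y_pre, y_post])
D = np.concatenate([d, d])
post = np.concatenate([np.zeros_like(d), np.ones_like(d)])
X = np.column_stack([np.ones_like(y), D, post, D * post])
beta_levels = np.linalg.lstsq(X, y, rcond=None)[0][3]

# (c) Long difference: (Y_post - Y_pre) on constant and D.
Xd = np.column_stack([np.ones(len(d)), d])
beta_long = np.linalg.lstsq(Xd, y_post - y_pre, rcond=None)[0][1]

def normalized_difference(x_treated, x_control):
    """(mean_T - mean_C) / sqrt((S2_T + S2_C) / 2), with sample variances."""
    x_t = np.asarray(x_treated, dtype=float)
    x_c = np.asarray(x_control, dtype=float)
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return (x_t.mean() - x_c.mean()) / pooled_sd

nd = normalized_difference([14.0, 16.0, 18.0], [15.0, 17.0, 19.0])
print(cell, beta_levels, beta_long, nd)
```

Here the three DiD numbers agree up to floating-point error, and the normalized difference of -0.5 would be flagged under the |value| > 0.25 rule.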

Constructed columns in data_prepared (built by the post's R script from raw_data): perc_white = population_20_64_white / population_20_64 · 100 (and likewise perc_hispanic, perc_female); unemp_rate rescaled to percent (×100); median_income rescaled to thousands of US$ (÷1000); set_wt = each county's 2013 population_20_64, held constant across years; treat_year = yaca if it falls in 2014–2019, else 0 (the did never-treated convention); Treat_2014 = 1 if yaca == 2014; Post = 1 if year ≥ 2014.
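For readers working in Python, the constructions above could be written roughly as follows. This is a sketch rather than the post's R script: `add_model_columns` is a hypothetical helper, the toy rows are made up, and the input is assumed to carry the raw_data column names with `yaca` stored as text.

```python
import numpy as np
import pandas as pd

def add_model_columns(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.copy()
    # yaca arrives as text with "NA" sentinels; coerce so "NA" becomes NaN.
    df["yaca"] = pd.to_numeric(df["yaca"], errors="coerce")
    # Covariate shares in percent.
    for group in ("white", "hispanic", "female"):
        df[f"perc_{group}"] = (100 * df[f"population_20_64_{group}"]
                               / df["population_20_64"])
    df["unemp_rate"] = 100 * df["unemp_rate"]         # fraction -> percent
    df["median_income"] = df["median_income"] / 1000  # US$ -> thousands of US$
    # set_wt: each county's 2013 adult population, broadcast to all its years.
    pop_2013 = (df.loc[df["year"] == 2013]
                  .set_index("county_code")["population_20_64"])
    df["set_wt"] = df["county_code"].map(pop_2013)
    # treat_year: yaca if it falls in 2014-2019, else 0 (never-treated coding).
    in_window = df["yaca"].between(2014, 2019)
    df["treat_year"] = np.where(in_window, df["yaca"], 0).astype(int)
    df["Treat_2014"] = (df["yaca"] == 2014).astype(int)
    df["Post"] = (df["year"] >= 2014).astype(int)
    return df

# Toy input: a 2014 expander (county 1) and a never-expander (county 2).
toy = pd.DataFrame({
    "county_code":               [1, 1, 2, 2],
    "year":                      [2013, 2014, 2013, 2014],
    "yaca":                      ["2014", "2014", "NA", "NA"],
    "population_20_64":          [1000, 1100, 2000, 2000],
    "population_20_64_white":    [800, 880, 1000, 1000],
    "population_20_64_hispanic": [100, 110, 500, 500],
    "population_20_64_female":   [500, 550, 1000, 1000],
    "unemp_rate":                [0.05, 0.06, 0.07, 0.07],
    "median_income":             [45000, 46000, 52000, 53000],
})
out = add_model_columns(toy)
print(out[["county_code", "year", "set_wt", "treat_year", "Treat_2014", "Post"]])
```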

The datasets

Each dataset below is documented with its full variable dictionary and a statistics table that includes data coverage.

raw_data · county-year 31,843 × 22 · 2009-2019 · U.S. counties, 51 state codes (3,064 distinct counties, about 2,900 per year; unbalanced)

Panel key: county_code x year · Source file as downloaded — merged CDC mortality, population, covariates, and state Medicaid-expansion status before any inclusion filtering.

Variable dictionary

Variable	Label	Definition	Construction	Units	Source	Coverage
`state` identifier	State name	Full name of the U.S. state the county belongs to.	From the CDC mortality file.	string	CDC WONDER	raw file
`stfips` identifier	State FIPS code	Numeric federal (FIPS) identifier for the state (1-56).	From the CDC mortality file.	FIPS	CDC WONDER	raw file
`county` identifier	County name (with state abbrev.)	County name followed by its two-letter state abbreviation, e.g. "Autauga County, AL".	From the CDC mortality file; the trailing two characters give state_abb in the prepared data.	string	CDC WONDER	both files
`county_code` identifier	County FIPS code	Five-digit federal (FIPS) county identifier; the unit id for the panel.	From the CDC mortality file.	FIPS	CDC WONDER	both files
`year` year	Calendar year	Annual time index of the observation.	From the CDC mortality file (2009-2019).	year	CDC WONDER	both files
`year_code` year	Year code (CDC)	CDC WONDER's year code; numerically equal to the calendar year here.	From the CDC mortality file.	year	CDC WONDER	raw file
`deaths` continuous	Deaths, adults 20-64	Count of deaths among adults aged 20-64 in the county-year.	From the CDC mortality file (numerator of the crude rate).	count	CDC WONDER	raw file
`population_20_64` continuous	Population aged 20-64	County adult population aged 20-64 (denominator of the crude mortality rate).	From the CDC mortality file.	persons	CDC WONDER	both files
`crude_rate_20_64` continuous	Crude mortality rate, adults 20-64	Deaths per 100,000 adults aged 20-64 — the DiD outcome variable.	100,000 x deaths / population_20_64 (as supplied by CDC WONDER).	per 100,000	CDC WONDER	both files
`population_total` continuous	Total county population	Total resident population of the county-year (all ages).	From the CDC mortality file.	persons	CDC WONDER	raw file
`population_20_64_hispanic` continuous	Population 20-64, Hispanic	Adults aged 20-64 identifying as Hispanic (numerator for perc_hispanic).	From the CDC mortality file.	persons	CDC WONDER	raw file
`population_20_64_female` continuous	Population 20-64, female	Female adults aged 20-64 (numerator for perc_female).	From the CDC mortality file.	persons	CDC WONDER	raw file
`population_20_64_white` continuous	Population 20-64, white	White adults aged 20-64 (numerator for perc_white).	From the CDC mortality file.	persons	CDC WONDER	raw file
`unemployed` continuous	Number unemployed	County count of unemployed persons in the labor force.	From the county labor-market series in the merged file.	persons	BLS (merged)	raw file
`labor_force` continuous	Civilian labor force	County civilian labor force count (denominator of the raw unemployment rate).	From the county labor-market series in the merged file.	persons	BLS (merged)	raw file
`unemp_rate` continuous	Unemployment rate (%)	County unemployment rate (baseline covariate).	Raw fractional rate rescaled to percent (x100) in the prepared data.	%	Derived (BLS)	both files
`poverty_rate` continuous	Poverty rate (%)	Share of the county population below the federal poverty line.	From the county socioeconomic series in the merged file (already in percent).	%	Census (merged)	both files
`median_income` continuous	Median household income	County median household income; in the raw file expressed in US$, rescaled to thousands of US$ in the prepared data.	From the county socioeconomic series; the prepared data divides by 1,000.	US$ (raw) / US$ 000s (prepared)	Census (merged)	both files
`expansion_status` identifier	ACA Medicaid expansion status (text)	Whether the state had adopted and implemented Medicaid expansion.	From the state ACA-expansion timing source.	category	KFF / ACA timing	raw file
`Description` identifier	Expansion description (free text)	Free-text note on the state's expansion implementation (date, retroactivity, etc.).	From the state ACA-expansion timing source.	string	KFF / ACA timing	raw file
`yaca` year	Year of ACA Medicaid expansion	Calendar year the state implemented Medicaid expansion; missing (NA) for never-expanders.	Parsed from the state ACA-expansion timing source; arrives as a string with "NA" sentinels and is coerced to numeric.	year	KFF / ACA timing	both files
`maca` identifier	Month of ACA Medicaid expansion	Calendar month (1-12) in which the state implemented expansion; missing for never-expanders.	Parsed from the state ACA-expansion timing source.	month (1-12)	KFF / ACA timing	raw file

Distribution & statistics

Variable	Distribution	Coverage	N	Distinct	Min	Mean	Median	Max	SD
`state`	–	100%	31,843	51	—	—	—	—	—
`stfips`	–	100%	31,843	51	—	—	—	—	—
`county`	–	100%	31,843	3,064	—	—	—	—	—
`county_code`	–	100%	31,843	3,064	—	—	—	—	—
`year`	–	100%	31,843	11	2009	2014.0	2014	2019	3.16
`year_code`	–	100%	31,843	11	2009	2014.0	2014	2019	3.16
`deaths`		100%	31,783	1,985	0	229.8	79.00	16,188	603.6
`population_20_64`		100%	31,783	24,240	47.00	65,477	16,906	6,338,759	206,172
`crude_rate_20_64`		100%	31,783	30,001	0	454.1	435.9	1,883.8	158.7
`population_total`		100%	31,783	26,782	71.00	109,908	29,358	10,170,292	336,957
`population_20_64_hispanic`		100%	31,783	9,396	0	11,009	671.0	3,016,128	75,656
`population_20_64_female`		100%	31,783	19,955	20.00	32,949	8,370.0	3,183,635	104,084
`population_20_64_white`		100%	31,783	23,587	17.00	51,224	14,695	4,558,532	149,425
`unemployed`		100%	31,758	7,939	4.00	3,515.2	873.0	621,950	12,818
`labor_force`		100%	31,758	23,031	43.00	54,367	13,724	5,148,584	169,077
`unemp_rate`		100%	31,758	31,497	0.011	0.067	0.061	0.294	0.031
`poverty_rate`		100%	31,777	448	2.60	16.44	15.50	56.70	6.43
`median_income`		100%	31,777	22,080	18,860	47,863	45,641	151,806	13,223
`expansion_status`	–	100%	31,843	2	—	—	—	—	—
`Description`	–	69%	22,054	18	—	—	—	—	—
`yaca`	–	69%	22,054	7	2014	2016.2	2014	2023	3.10
`maca`	–	69%	22,054	8	—	—	—	—	—

data_prepared · county-year 28,644 × 17 · 2009-2019 · 2,604 counties x 11 years = 28,644 rows (46 states)

Panel key: county_code x year · Cleaned, balanced panel that feeds every DiD stage (2x2, TWFE, OR/IPW/DRDID, 2xT, GxT, HonestDiD), with covariate shares and the 2013 population weight.
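One way to confirm that a file is keyed by county_code x year and balanced is sketched below. `panel_report` is a hypothetical helper and the toy frame is made up; applied to data_prepared, a balanced result would correspond to 2,604 units × 11 periods = 28,644 rows.

```python
import pandas as pd

def panel_report(df, unit="county_code", time="year"):
    """Count duplicate (unit, time) keys and check that every unit
    is observed in every period."""
    n_dupes = int(df.duplicated([unit, time]).sum())
    n_units, n_periods = df[unit].nunique(), df[time].nunique()
    balanced = n_dupes == 0 and len(df) == n_units * n_periods
    return {"duplicates": n_dupes, "units": n_units,
            "periods": n_periods, "balanced": balanced}

# Toy panel: 2 counties x 3 years, then the same panel with one row dropped.
toy = pd.DataFrame({
    "county_code": [1, 1, 1, 2, 2, 2],
    "year":        [2012, 2013, 2014] * 2,
})
full = panel_report(toy)
gappy = panel_report(toy.iloc[:-1])
print(full)
print(gappy)
```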

Variable dictionary

Variable	Label	Definition	Construction	Units	Source	Coverage
`state_abb` identifier	State abbreviation	Two-letter U.S. state postal abbreviation.	Last two characters of the county string (str_sub).	code	Derived	prepared file
`county` identifier	County name (with state abbrev.)	County name followed by its two-letter state abbreviation, e.g. "Autauga County, AL".	From the CDC mortality file; the trailing two characters give state_abb in the prepared data.	string	CDC WONDER	both files
`county_code` identifier	County FIPS code	Five-digit federal (FIPS) county identifier; the unit id for the panel.	From the CDC mortality file.	FIPS	CDC WONDER	both files
`year` year	Calendar year	Annual time index of the observation.	From the CDC mortality file (2009-2019).	year	CDC WONDER	both files
`population_20_64` continuous	Population aged 20-64	County adult population aged 20-64 (denominator of the crude mortality rate).	From the CDC mortality file.	persons	CDC WONDER	both files
`yaca` year	Year of ACA Medicaid expansion	Calendar year the state implemented Medicaid expansion; missing (NA) for never-expanders.	Parsed from the state ACA-expansion timing source; arrives as a string with "NA" sentinels and is coerced to numeric.	year	KFF / ACA timing	both files
`crude_rate_20_64` continuous	Crude mortality rate, adults 20-64	Deaths per 100,000 adults aged 20-64 — the DiD outcome variable.	100,000 x deaths / population_20_64 (as supplied by CDC WONDER).	per 100,000	CDC WONDER	both files
`perc_female` continuous	Female share, adults 20-64 (%)	Percent of the county's 20-64 population that is female (baseline covariate).	100 x population_20_64_female / population_20_64.	%	Derived (CDC)	prepared file
`perc_white` continuous	White share, adults 20-64 (%)	Percent of the county's 20-64 population that is white (baseline covariate).	100 x population_20_64_white / population_20_64.	%	Derived (CDC)	prepared file
`perc_hispanic` continuous	Hispanic share, adults 20-64 (%)	Percent of the county's 20-64 population that is Hispanic (baseline covariate).	100 x population_20_64_hispanic / population_20_64.	%	Derived (CDC)	prepared file
`unemp_rate` continuous	Unemployment rate (%)	County unemployment rate (baseline covariate).	Raw fractional rate rescaled to percent (x100) in the prepared data.	%	Derived (BLS)	both files
`poverty_rate` continuous	Poverty rate (%)	Share of the county population below the federal poverty line.	From the county socioeconomic series in the merged file (already in percent).	%	Census (merged)	both files
`median_income` continuous	Median household income	County median household income; in the raw file expressed in US$, rescaled to thousands of US$ in the prepared data.	From the county socioeconomic series; the prepared data divides by 1,000.	US$ (raw) / US$ 000s (prepared)	Census (merged)	both files
`set_wt` continuous	Fixed 2013 adult population weight	Each county's 2013 population aged 20-64, held constant across all 11 years (the population weight).	population_20_64 in 2013, broadcast to every year of the county so weighting does not conflate population growth with mortality change.	persons	Derived (CDC)	prepared file
`treat_year` identifier	Treatment year (did convention)	Year the county's state expanded Medicaid (2014/2015/2016/2019), or 0 for never-treated counties.	yaca if 2014 <= yaca <= 2019, else 0 — the did package's never-treated coding.	year / 0	Derived	prepared file
`Treat_2014` dummy	2014-cohort treatment dummy	1 if the county's state expanded Medicaid in 2014, else 0.	1 if yaca == 2014, else 0.	0/1	Derived	prepared file
`Post` dummy	Post-2014 period dummy	1 for years 2014 and later, else 0 (the post period in the 2x2 design).	1 if year >= 2014, else 0.	0/1	Derived	prepared file

Distribution & statistics

Variable	Distribution	Coverage	N	Distinct	Min	Mean	Median	Max	SD
`state_abb`	–	100%	28,644	46	—	—	—	—	—
`county`	–	100%	28,644	2,604	—	—	—	—	—
`county_code`	–	100%	28,644	2,604	—	—	—	—	—
`year`	–	100%	28,644	11	2009	2014.0	2014	2019	3.16
`population_20_64`		100%	28,644	22,256	1,793.0	65,737	18,232	6,338,759	207,493
`yaca`	–	68%	19,459	7	2014	2016.2	2014	2023	3.10
`crude_rate_20_64`		100%	28,644	27,553	72.33	458.3	441.6	1,560.7	153.7
`perc_female`		100%	28,644	28,483	24.15	49.37	50.02	60.28	3.05
`perc_white`		100%	28,644	28,599	10.10	84.95	91.72	99.70	16.49
`perc_hispanic`		100%	28,644	28,532	0.150	8.39	3.65	96.43	12.86
`unemp_rate`		100%	28,644	28,484	1.07	6.83	6.23	29.41	3.08
`poverty_rate`		100%	28,644	438	2.60	16.66	15.90	50.40	6.45
`median_income`		100%	28,644	20,550	20.99	47.62	45.24	151.8	13.23
`set_wt`		100%	28,644	2,534	1,891.0	65,530	18,408	6,221,536	206,632
`treat_year`	–	100%	28,644	5	—	—	—	—	—
`Treat_2014`		100%	28,644	2	0	0.376	0	1.00	0.484
`Post`		100%	28,644	2	0	0.545	1.00	1.00	0.498

Known limitations & caveats

Pedagogical, not definitive. The authors of the replicated guide flag this case as illustrative: "The results are pedagogical in spirit and do not represent the best possible estimates of Medicaid's effect on adult mortality."
Underpowered. None of the six 2x2 covariate-adjusted 95% confidence intervals excludes zero; the data cannot settle the policy question.
Weighting changes the estimand. Unweighted (ATT for the typical treated county) and population-weighted (ATT for the typical treated adult) answer different causal questions; the 2x2 ATT(2014) flips sign between them (+0.122 vs -2.563).
Crude, not age-adjusted. The outcome is the CDC crude death rate for ages 20-64, not an age-adjusted rate, so compositional differences across cohorts are not removed.
Small cohorts are noisy. The 2015, 2016, and 2019 expansion cohorts are small (171 / 93 / 140 counties); cohort-specific estimates carry wide confidence intervals.
Tutorial bootstrap. The Callaway-Sant'Anna bootstrap uses 2,000 iterations for speed (the reference scripts use 25,000), so the third significant figure of each CI may differ from the reference results.