Interactive data dictionary

Difference-in-Differences for Regional Data

CDC county mortality + ACA Medicaid-expansion timing — did expansion reduce adult mortality?

2 datasets · 35 variables · 2,604 counties (analysis) · 2009–2019 (years)

Downloads

Each dataset is available as a labeled Stata .dta and its source file.

Download all data (ZIP) · stata_codebook.do

| Dataset | Grain | Rows × columns | Stata | Source |
|---|---|---|---|---|
| raw_data | county-year | 31,843 × 22 | raw_data.dta | raw_data.csv |
| data_prepared | county-year | 28,644 × 17 | data_prepared.dta | data_prepared.csv |

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_did2/data/"
use "${BASE}raw_data.dta", clear
describe
notes

Python

!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_did2/data/"
df = pd.read_stata(BASE + "raw_data.dta")

# load every dataset at once
files = ["raw_data", "data_prepared"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "raw_data.dta", "raw_data.dta")
df, meta = pyreadstat.read_dta("raw_data.dta")

Copy and paste this snippet into a Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_did2/data/"
df <- read_dta(paste0(BASE, "raw_data.dta"))

Overview & sources

Companion data for a hands-on R tutorial that asks whether the Affordable Care Act's staggered Medicaid expansion reduced adult mortality, and uses the question to show how population weighting changes the target parameter when the units (U.S. counties) differ in size by orders of magnitude. Following Baker, Callaway, Cunningham, Goodman-Bacon and Sant'Anna's (2025) Practitioner's Guide, the post runs an eight-stage DiD pipeline — 2×2 cell means, three equivalent TWFE specifications, covariate-adjusted OR/IPW/DRDID, a 2×T event study, the Callaway–Sant'Anna staggered ATT(g,t) design, and a Rambachan–Roth HonestDiD sensitivity analysis — computing every estimate both unweighted and weighted by 2013 adult population. The headline 2×2 ATT(2014) flips sign with weighting, from +0.122 deaths per 100,000 unweighted to −2.563 weighted, while no 95% confidence interval at any stage comfortably excludes zero. The two estimands answer different questions — the typical treated county versus the typical treated adult.

Two files. raw_data is the merged CDC county mortality file (deaths per 100,000 adults aged 20–64) joined to state-level ACA Medicaid-expansion timing, as downloaded — one row per county × year, 2009–2019, before any cleaning. data_prepared is the balanced analysis sample built from it: drop the five pre-2014 expanders (DC, DE, MA, NY, VT), require full mortality coverage 2009–2019 and full covariate coverage in 2013–2014, and add the modeling columns (covariate shares, fixed 2013 population weight, treatment-year and post indicators).
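The inclusion rules above can be sketched in a few lines of Python. This is a minimal illustration on invented toy rows, not the post's R script: the helper name balanced_sample is hypothetical, the state abbreviation is assumed to be available on each row, and the 2013–2014 covariate-coverage requirement is left out for brevity.

```python
# Toy illustration of the sample filter: drop early expanders, then keep
# only counties with a non-missing mortality rate in every year 2009-2019.
EARLY_EXPANDERS = {"DC", "DE", "MA", "NY", "VT"}  # the five pre-2014 expanders
YEARS = set(range(2009, 2020))                     # 2009-2019 inclusive


def balanced_sample(rows):
    """Return the rows of counties that pass both filters (hypothetical helper)."""
    years_seen = {}
    for r in rows:
        if r["state_abb"] in EARLY_EXPANDERS or r["crude_rate_20_64"] is None:
            continue
        years_seen.setdefault(r["county_code"], set()).add(r["year"])
    keep = {code for code, yrs in years_seen.items() if yrs >= YEARS}
    return [r for r in rows
            if r["county_code"] in keep and r["crude_rate_20_64"] is not None]


# Invented data: one complete county, one early expander, one with a gap in 2012.
rows = []
for year in sorted(YEARS):
    rows.append({"county_code": "01001", "state_abb": "AL", "year": year,
                 "crude_rate_20_64": 450.0})
    rows.append({"county_code": "36001", "state_abb": "NY", "year": year,
                 "crude_rate_20_64": 300.0})
    rows.append({"county_code": "48001", "state_abb": "TX", "year": year,
                 "crude_rate_20_64": None if year == 2012 else 500.0})

sample = balanced_sample(rows)
kept_codes = sorted({r["county_code"] for r in sample})
print(len(sample), kept_codes)  # 11 rows, only county 01001 survives
```

A balanced panel of this kind always has (number of counties) × 11 rows, which is how 2,604 counties yield the 28,644 rows of data_prepared.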

Data sources

| Source | Provides | Reference / URL |
|---|---|---|
| CDC WONDER (mortality) | County-level death counts and crude mortality rates for adults aged 20-64, plus population denominators | U.S. Centers for Disease Control and Prevention, CDC WONDER. https://wonder.cdc.gov/ |
| ACA Medicaid expansion timing | State Medicaid-expansion adoption status, year, and month (the staggered treatment timing) | KFF, Status of State Medicaid Expansion Decisions. https://www.kff.org/medicaid/issue-brief/status-of-state-medicaid-expansion-decisions-interactive-map/ |
| County socioeconomic covariates | Unemployment, poverty, and median household income by county-year (baseline covariates) | U.S. Census Bureau / Bureau of Labor Statistics county-level series (as merged in the source file). |
| Replicated study & method references | Empirical example, estimators, and concepts | Baker et al. (2025, arXiv:2503.13323); Callaway & Sant'Anna (2021); Sant'Anna & Zhao (2020); Rambachan & Roth (2023); Imbens & Rubin (2015). |

Cite this data

Please cite this dataset as follows.

APA

Mendez, C. (2026). Difference-in-Differences for Regional Data: Did Medicaid Expansion Reduce Mortality? [Data set]. https://carlos-mendez.org/post/r_did2/

Baker, A., Callaway, B., Cunningham, S., Goodman-Bacon, A., & Sant'Anna, P. H. C. (2025). Difference-in-Differences Designs: A Practitioner's Guide. arXiv:2503.13323.

Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-Differences with multiple time periods. Journal of Econometrics, 225(2), 200-230.

Sant'Anna, P. H. C., & Zhao, J. (2020). Doubly robust difference-in-differences estimators. Journal of Econometrics, 219(1), 101-122.

Rambachan, A., & Roth, J. (2023). A More Credible Approach to Parallel Trends. Review of Economic Studies, 90(5), 2555-2591.

BibTeX

@misc{mendez2026rdid2,
  author       = {Mendez, Carlos},
  title        = {Difference-in-Differences for Regional Data: Did Medicaid Expansion Reduce Mortality?},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/r_did2/}},
  note         = {Data set}
}

@article{baker2025did,
  author  = {Baker, Andrew and Callaway, Brantly and Cunningham, Scott and Goodman-Bacon, Andrew and Sant'Anna, Pedro H. C.},
  title   = {Difference-in-Differences Designs: A Practitioner's Guide},
  journal = {arXiv preprint arXiv:2503.13323}, year = {2025}
}
@article{callaway2021did,
  author  = {Callaway, Brantly and Sant'Anna, Pedro H. C.},
  title   = {Difference-in-Differences with multiple time periods},
  journal = {Journal of Econometrics}, volume = {225}, number = {2},
  pages   = {200--230}, year = {2021}
}
@article{santanna2020drdid,
  author  = {Sant'Anna, Pedro H. C. and Zhao, Jun},
  title   = {Doubly robust difference-in-differences estimators},
  journal = {Journal of Econometrics}, volume = {219}, number = {1},
  pages   = {101--122}, year = {2020}
}
@article{rambachan2023honest,
  author  = {Rambachan, Ashesh and Roth, Jonathan},
  title   = {A More Credible Approach to Parallel Trends},
  journal = {Review of Economic Studies}, volume = {90}, number = {5},
  pages   = {2555--2591}, year = {2023}
}

Variable explorer (all 30 variables)


| Variable | Type | Distribution | Label | Definition | Units | In files | Source |
|---|---|---|---|---|---|---|---|
| Description | identifier | | Expansion description (free text) | Free-text note on the state's expansion implementation (date, retroactivity, etc.). | string | raw_data | KFF / ACA timing |
| Post | dummy | share coded 1 = 0.545 | Post-2014 period dummy | 1 for years 2014 and later, else 0 (the post period in the 2x2 design). | 0/1 | data_prepared | Derived |
| Treat_2014 | dummy | share coded 1 = 0.376 | 2014-cohort treatment dummy | 1 if the county's state expanded Medicaid in 2014, else 0. | 0/1 | data_prepared | Derived |
| county | identifier | | County name (with state abbrev.) | County name followed by its two-letter state abbreviation, e.g. "Autauga County, AL". | string | raw_data, data_prepared | CDC WONDER |
| county_code | identifier | | County FIPS code | Five-digit federal (FIPS) county identifier; the unit id for the panel. | FIPS | raw_data, data_prepared | CDC WONDER |
| crude_rate_20_64 | continuous | min 0; median 436; max 1.88e+03 | Crude mortality rate, adults 20-64 | Deaths per 100,000 adults aged 20-64; the DiD outcome variable. | per 100,000 | raw_data, data_prepared | CDC WONDER |
| deaths | continuous | min 0; median 79; max 1.62e+04 | Deaths, adults 20-64 | Count of deaths among adults aged 20-64 in the county-year. | count | raw_data | CDC WONDER |
| expansion_status | identifier | | ACA Medicaid expansion status (text) | Whether the state had adopted and implemented Medicaid expansion. | category | raw_data | KFF / ACA timing |
| labor_force | continuous | min 43; median 1.37e+04; max 5.15e+06 | Civilian labor force | County civilian labor force count (denominator of the raw unemployment rate). | persons | raw_data | BLS (merged) |
| maca | identifier | | Month of ACA Medicaid expansion | Calendar month (1-12) in which the state implemented expansion; missing for never-expanders. | month (1-12) | raw_data | KFF / ACA timing |
| median_income | continuous | min 1.89e+04; median 4.56e+04; max 1.52e+05 | Median household income | County median household income; in the raw file expressed in US$, rescaled to thousands of US$ in the prepared data. | US$ (raw) / US$ 000s (prepared) | raw_data, data_prepared | Census (merged) |
| perc_female | continuous | min 24.1; median 50; max 60.3 | Female share, adults 20-64 (%) | Percent of the county's 20-64 population that is female (baseline covariate). | % | data_prepared | Derived (CDC) |
| perc_hispanic | continuous | min 0.15; median 3.65; max 96.4 | Hispanic share, adults 20-64 (%) | Percent of the county's 20-64 population that is Hispanic (baseline covariate). | % | data_prepared | Derived (CDC) |
| perc_white | continuous | min 10.1; median 91.7; max 99.7 | White share, adults 20-64 (%) | Percent of the county's 20-64 population that is white (baseline covariate). | % | data_prepared | Derived (CDC) |
| population_20_64 | continuous | min 47; median 1.69e+04; max 6.34e+06 | Population aged 20-64 | County adult population aged 20-64 (denominator of the crude mortality rate). | persons | raw_data, data_prepared | CDC WONDER |
| population_20_64_female | continuous | min 20; median 8.37e+03; max 3.18e+06 | Population 20-64, female | Female adults aged 20-64 (numerator for perc_female). | persons | raw_data | CDC WONDER |
| population_20_64_hispanic | continuous | min 0; median 671; max 3.02e+06 | Population 20-64, Hispanic | Adults aged 20-64 identifying as Hispanic (numerator for perc_hispanic). | persons | raw_data | CDC WONDER |
| population_20_64_white | continuous | min 17; median 1.47e+04; max 4.56e+06 | Population 20-64, white | White adults aged 20-64 (numerator for perc_white). | persons | raw_data | CDC WONDER |
| population_total | continuous | min 71; median 2.94e+04; max 1.02e+07 | Total county population | Total resident population of the county-year (all ages). | persons | raw_data | CDC WONDER |
| poverty_rate | continuous | min 2.6; median 15.5; max 56.7 | Poverty rate (%) | Share of the county population below the federal poverty line. | % | raw_data, data_prepared | Census (merged) |
| set_wt | continuous | min 1.89e+03; median 1.84e+04; max 6.22e+06 | Fixed 2013 adult population weight | Each county's 2013 population aged 20-64, held constant across all 11 years (the population weight). | persons | data_prepared | Derived (CDC) |
| state | identifier | | State name | Full name of the U.S. state the county belongs to. | string | raw_data | CDC WONDER |
| state_abb | identifier | | State abbreviation | Two-letter U.S. state postal abbreviation. | code | data_prepared | Derived |
| stfips | identifier | | State FIPS code | Numeric federal (FIPS) identifier for the state (1-56). | FIPS | raw_data | CDC WONDER |
| treat_year | identifier | | Treatment year (did convention) | Year the county's state expanded Medicaid (2014/2015/2016/2019), or 0 for never-treated counties. | year / 0 | data_prepared | Derived |
| unemp_rate | continuous | min 0.0107; median 0.0609; max 0.294 | Unemployment rate (%) | County unemployment rate (baseline covariate). | fraction (raw) / % (prepared) | raw_data, data_prepared | Derived (BLS) |
| unemployed | continuous | min 4; median 873; max 6.22e+05 | Number unemployed | County count of unemployed persons in the labor force. | persons | raw_data | BLS (merged) |
| yaca | year | | Year of ACA Medicaid expansion | Calendar year the state implemented Medicaid expansion; missing (NA) for never-expanders. | year | raw_data, data_prepared | KFF / ACA timing |
| year | year | | Calendar year | Annual time index of the observation. | year | raw_data, data_prepared | CDC WONDER |
| year_code | year | | Year code (CDC) | CDC WONDER's year code; numerically equal to the calendar year here. | year | raw_data | CDC WONDER |

Cross-file variable index

Which file each variable appears in is listed in the "In files" column of the variable explorer.

Construction & formulas

The outcome is the CDC crude mortality rate crude_rate_20_64 (deaths per 100,000 adults aged 20–64). Every estimate is computed twice: unweighted (each county counts equally — the ATT for the typical treated county) and population-weighted by the fixed 2013 adult population set_wt (each adult counts equally — the ATT for the typical treated adult).
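To see how the two weightings can disagree, the sketch below computes a 2×2 difference-in-differences on cell means twice: once with every county weighted equally and once with set_wt as the weight. The four toy counties and their rates are invented, and the helper names wmean and did_2x2 are hypothetical. It illustrates the idea only; it is not the post's R code and does not reproduce its estimates.

```python
def wmean(values, weights):
    """Weighted mean; with all weights equal to 1 this is the plain mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def did_2x2(rows, weighted):
    """(treated post - treated pre) - (control post - control pre) on cell means."""
    cell = {}
    for treat in (0, 1):
        for post in (0, 1):
            sel = [r for r in rows
                   if r["Treat_2014"] == treat and r["Post"] == post]
            weights = [r["set_wt"] if weighted else 1.0 for r in sel]
            cell[treat, post] = wmean([r["crude_rate_20_64"] for r in sel], weights)
    return (cell[1, 1] - cell[1, 0]) - (cell[0, 1] - cell[0, 0])


def county(treat, set_wt, pre_rate, post_rate):
    """Two rows (one pre, one post) for a made-up county."""
    return [
        {"Treat_2014": treat, "Post": 0, "set_wt": set_wt, "crude_rate_20_64": pre_rate},
        {"Treat_2014": treat, "Post": 1, "set_wt": set_wt, "crude_rate_20_64": post_rate},
    ]


# Invented data: mortality rises in the small treated county, falls in the large one.
toy = (county(1, 1_000, 400.0, 420.0) + county(1, 9_000, 300.0, 290.0)
       + county(0, 1_000, 400.0, 400.0) + county(0, 9_000, 300.0, 300.0))

att_county = did_2x2(toy, weighted=False)  # typical treated county
att_adult = did_2x2(toy, weighted=True)    # typical treated adult
print(att_county, att_adult)               # 5.0 -7.0
```

In the toy data the small county dominates the unweighted average and the large county dominates the weighted one, so the sign flips even though nothing about the underlying counties changes.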

Constructed columns in data_prepared (built by the post's R script from raw_data): perc_white = population_20_64_white / population_20_64 · 100 (and likewise perc_hispanic, perc_female); unemp_rate rescaled to percent (×100); median_income rescaled to thousands of US$ (÷1000); set_wt = each county's 2013 population_20_64, held constant across years; treat_year = yaca if it falls in 2014–2019, else 0 (the did never-treated convention); Treat_2014 = 1 if yaca == 2014; Post = 1 if year ≥ 2014.
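The formulas above translate directly into code. The following Python sketch applies them to a few invented rows. It assumes yaca has already been coerced to a number (None for never-expanders), uses the hypothetical helper names prepare and row, and shows only perc_white among the three share variables. The post itself builds these columns in R.

```python
def prepare(raw_rows):
    """Derive the modeling columns for each raw county-year row (illustrative only)."""
    # set_wt: the county's 2013 adult population, reused for every year
    wt_2013 = {r["county_code"]: r["population_20_64"]
               for r in raw_rows if r["year"] == 2013}
    out = []
    for r in raw_rows:
        yaca = r["yaca"]  # None stands in for a never-expander (NA)
        out.append({
            "county_code": r["county_code"],
            "year": r["year"],
            "perc_white": 100 * r["population_20_64_white"] / r["population_20_64"],
            "unemp_rate": 100 * r["unemp_rate"],         # fraction -> percent
            "median_income": r["median_income"] / 1000,  # US$ -> thousands of US$
            "set_wt": wt_2013[r["county_code"]],
            "treat_year": yaca if yaca is not None and 2014 <= yaca <= 2019 else 0,
            "Treat_2014": int(yaca == 2014),
            "Post": int(r["year"] >= 2014),
        })
    return out


def row(code, year, pop, white, yaca):
    """Build one made-up raw row; covariate values are arbitrary."""
    return {"county_code": code, "year": year, "population_20_64": pop,
            "population_20_64_white": white, "unemp_rate": 0.0625,
            "median_income": 45000, "yaca": yaca}


raw = [row("A", 2013, 20000, 15000, 2014), row("A", 2014, 21000, 15750, 2014),
       row("B", 2013, 5000, 4000, None), row("B", 2014, 5100, 4080, None)]

prepared = prepare(raw)
for p in prepared:
    print(p)
```

Note that set_wt stays at the 2013 value (20000 for county A) even in the 2014 row, which is what keeps population growth out of the weights.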

The datasets

Each dataset below has a full variable dictionary plus a statistics table with distributions and data coverage.


raw_data · county-year · 31,843 × 22 · 2009-2019 · U.S. counties, 51 state codes (3,064 distinct counties, about 2,900 per year on average; unbalanced)

Panel key: county_code x year · Source file as downloaded — merged CDC mortality, population, covariates, and state Medicaid-expansion status before any inclusion filtering.

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|---|
| state | identifier | State name | Full name of the U.S. state the county belongs to. | From the CDC mortality file. | string | CDC WONDER | raw file |
| stfips | identifier | State FIPS code | Numeric federal (FIPS) identifier for the state (1-56). | From the CDC mortality file. | FIPS | CDC WONDER | raw file |
| county | identifier | County name (with state abbrev.) | County name followed by its two-letter state abbreviation, e.g. "Autauga County, AL". | From the CDC mortality file; the trailing two characters give state_abb in the prepared data. | string | CDC WONDER | both files |
| county_code | identifier | County FIPS code | Five-digit federal (FIPS) county identifier; the unit id for the panel. | From the CDC mortality file. | FIPS | CDC WONDER | both files |
| year | year | Calendar year | Annual time index of the observation. | From the CDC mortality file (2009-2019). | year | CDC WONDER | both files |
| year_code | year | Year code (CDC) | CDC WONDER's year code; numerically equal to the calendar year here. | From the CDC mortality file. | year | CDC WONDER | raw file |
| deaths | continuous | Deaths, adults 20-64 | Count of deaths among adults aged 20-64 in the county-year. | From the CDC mortality file (numerator of the crude rate). | count | CDC WONDER | raw file |
| population_20_64 | continuous | Population aged 20-64 | County adult population aged 20-64 (denominator of the crude mortality rate). | From the CDC mortality file. | persons | CDC WONDER | both files |
| crude_rate_20_64 | continuous | Crude mortality rate, adults 20-64 | Deaths per 100,000 adults aged 20-64; the DiD outcome variable. | 100,000 x deaths / population_20_64 (as supplied by CDC WONDER). | per 100,000 | CDC WONDER | both files |
| population_total | continuous | Total county population | Total resident population of the county-year (all ages). | From the CDC mortality file. | persons | CDC WONDER | raw file |
| population_20_64_hispanic | continuous | Population 20-64, Hispanic | Adults aged 20-64 identifying as Hispanic (numerator for perc_hispanic). | From the CDC mortality file. | persons | CDC WONDER | raw file |
| population_20_64_female | continuous | Population 20-64, female | Female adults aged 20-64 (numerator for perc_female). | From the CDC mortality file. | persons | CDC WONDER | raw file |
| population_20_64_white | continuous | Population 20-64, white | White adults aged 20-64 (numerator for perc_white). | From the CDC mortality file. | persons | CDC WONDER | raw file |
| unemployed | continuous | Number unemployed | County count of unemployed persons in the labor force. | From the county labor-market series in the merged file. | persons | BLS (merged) | raw file |
| labor_force | continuous | Civilian labor force | County civilian labor force count (denominator of the raw unemployment rate). | From the county labor-market series in the merged file. | persons | BLS (merged) | raw file |
| unemp_rate | continuous | Unemployment rate (%) | County unemployment rate (baseline covariate). | Raw fractional rate rescaled to percent (x100) in the prepared data. | fraction (raw) / % (prepared) | Derived (BLS) | both files |
| poverty_rate | continuous | Poverty rate (%) | Share of the county population below the federal poverty line. | From the county socioeconomic series in the merged file (already in percent). | % | Census (merged) | both files |
| median_income | continuous | Median household income | County median household income; in the raw file expressed in US$, rescaled to thousands of US$ in the prepared data. | From the county socioeconomic series; the prepared data divides by 1,000. | US$ (raw) / US$ 000s (prepared) | Census (merged) | both files |
| expansion_status | identifier | ACA Medicaid expansion status (text) | Whether the state had adopted and implemented Medicaid expansion. | From the state ACA-expansion timing source. | category | KFF / ACA timing | raw file |
| Description | identifier | Expansion description (free text) | Free-text note on the state's expansion implementation (date, retroactivity, etc.). | From the state ACA-expansion timing source. | string | KFF / ACA timing | raw file |
| yaca | year | Year of ACA Medicaid expansion | Calendar year the state implemented Medicaid expansion; missing (NA) for never-expanders. | Parsed from the state ACA-expansion timing source; arrives as a string with "NA" sentinels and is coerced to numeric. | year | KFF / ACA timing | both files |
| maca | identifier | Month of ACA Medicaid expansion | Calendar month (1-12) in which the state implemented expansion; missing for never-expanders. | Parsed from the state ACA-expansion timing source. | month (1-12) | KFF / ACA timing | raw file |
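The yaca entry notes that the column arrives as a string with "NA" sentinels and is coerced to numeric. A minimal Python sketch of that coercion is shown below. The helper name parse_yaca is hypothetical, and treating blank strings as missing is an assumption rather than something the source file is known to require.

```python
def parse_yaca(value):
    """Return the expansion year as an int, or None for "NA"/blank/missing values."""
    if value is None:
        return None
    text = str(value).strip()
    if text in {"", "NA"}:
        return None
    return int(float(text))  # also accepts strings such as "2014.0"


raw_values = ["2014", "NA", "2019", " 2016 "]
years = [parse_yaca(v) for v in raw_values]
print(years)  # [2014, None, 2019, 2016]
```

When loading the CSV with pandas, passing na_values=["NA"] to read_csv is another common way to get the same effect at read time.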

Distribution & statistics

| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
| state | | 100% | 31,843 | 51 | | | | | |
| stfips | | 100% | 31,843 | 51 | | | | | |
| county | | 100% | 31,843 | 3,064 | | | | | |
| county_code | | 100% | 31,843 | 3,064 | | | | | |
| year | | 100% | 31,843 | 11 | 2009 | 2014.0 | 2014 | 2019 | 3.16 |
| year_code | | 100% | 31,843 | 11 | 2009 | 2014.0 | 2014 | 2019 | 3.16 |
| deaths | min 0; median 79; max 1.62e+04 | 100% | 31,783 | 1,985 | 0 | 229.8 | 79.00 | 16,188 | 603.6 |
| population_20_64 | min 47; median 1.69e+04; max 6.34e+06 | 100% | 31,783 | 24,240 | 47.00 | 65,477 | 16,906 | 6,338,759 | 206,172 |
| crude_rate_20_64 | min 0; median 436; max 1.88e+03 | 100% | 31,783 | 30,001 | 0 | 454.1 | 435.9 | 1,883.8 | 158.7 |
| population_total | min 71; median 2.94e+04; max 1.02e+07 | 100% | 31,783 | 26,782 | 71.00 | 109,908 | 29,358 | 10,170,292 | 336,957 |
| population_20_64_hispanic | min 0; median 671; max 3.02e+06 | 100% | 31,783 | 9,396 | 0 | 11,009 | 671.0 | 3,016,128 | 75,656 |
| population_20_64_female | min 20; median 8.37e+03; max 3.18e+06 | 100% | 31,783 | 19,955 | 20.00 | 32,949 | 8,370.0 | 3,183,635 | 104,084 |
| population_20_64_white | min 17; median 1.47e+04; max 4.56e+06 | 100% | 31,783 | 23,587 | 17.00 | 51,224 | 14,695 | 4,558,532 | 149,425 |
| unemployed | min 4; median 873; max 6.22e+05 | 100% | 31,758 | 7,939 | 4.00 | 3,515.2 | 873.0 | 621,950 | 12,818 |
| labor_force | min 43; median 1.37e+04; max 5.15e+06 | 100% | 31,758 | 23,031 | 43.00 | 54,367 | 13,724 | 5,148,584 | 169,077 |
| unemp_rate | min 0.0107; median 0.0609; max 0.294 | 100% | 31,758 | 31,497 | 0.011 | 0.067 | 0.061 | 0.294 | 0.031 |
| poverty_rate | min 2.6; median 15.5; max 56.7 | 100% | 31,777 | 448 | 2.60 | 16.44 | 15.50 | 56.70 | 6.43 |
| median_income | min 1.89e+04; median 4.56e+04; max 1.52e+05 | 100% | 31,777 | 22,080 | 18,860 | 47,863 | 45,641 | 151,806 | 13,223 |
| expansion_status | | 100% | 31,843 | 2 | | | | | |
| Description | | 69% | 22,054 | 18 | | | | | |
| yaca | | 69% | 22,054 | 7 | 2014 | 2016.2 | 2014 | 2023 | 3.10 |
| maca | | 69% | 22,054 | 8 | | | | | |

data_prepared · county-year · 28,644 × 17 · 2009-2019 · 2,604 counties × 11 years = 28,644 rows (46 states)

Panel key: county_code x year · Cleaned, balanced panel that feeds every DiD stage (2x2, TWFE, OR/IPW/DRDID, 2xT, GxT, HonestDiD), with covariate shares and the 2013 population weight.

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|---|
| state_abb | identifier | State abbreviation | Two-letter U.S. state postal abbreviation. | Last two characters of the county string (str_sub). | code | Derived | prepared file |
| county | identifier | County name (with state abbrev.) | County name followed by its two-letter state abbreviation, e.g. "Autauga County, AL". | From the CDC mortality file; the trailing two characters give state_abb in the prepared data. | string | CDC WONDER | both files |
| county_code | identifier | County FIPS code | Five-digit federal (FIPS) county identifier; the unit id for the panel. | From the CDC mortality file. | FIPS | CDC WONDER | both files |
| year | year | Calendar year | Annual time index of the observation. | From the CDC mortality file (2009-2019). | year | CDC WONDER | both files |
| population_20_64 | continuous | Population aged 20-64 | County adult population aged 20-64 (denominator of the crude mortality rate). | From the CDC mortality file. | persons | CDC WONDER | both files |
| yaca | year | Year of ACA Medicaid expansion | Calendar year the state implemented Medicaid expansion; missing (NA) for never-expanders. | Parsed from the state ACA-expansion timing source; arrives as a string with "NA" sentinels and is coerced to numeric. | year | KFF / ACA timing | both files |
| crude_rate_20_64 | continuous | Crude mortality rate, adults 20-64 | Deaths per 100,000 adults aged 20-64; the DiD outcome variable. | 100,000 x deaths / population_20_64 (as supplied by CDC WONDER). | per 100,000 | CDC WONDER | both files |
| perc_female | continuous | Female share, adults 20-64 (%) | Percent of the county's 20-64 population that is female (baseline covariate). | 100 x population_20_64_female / population_20_64. | % | Derived (CDC) | prepared file |
| perc_white | continuous | White share, adults 20-64 (%) | Percent of the county's 20-64 population that is white (baseline covariate). | 100 x population_20_64_white / population_20_64. | % | Derived (CDC) | prepared file |
| perc_hispanic | continuous | Hispanic share, adults 20-64 (%) | Percent of the county's 20-64 population that is Hispanic (baseline covariate). | 100 x population_20_64_hispanic / population_20_64. | % | Derived (CDC) | prepared file |
| unemp_rate | continuous | Unemployment rate (%) | County unemployment rate (baseline covariate). | Raw fractional rate rescaled to percent (x100) in the prepared data. | fraction (raw) / % (prepared) | Derived (BLS) | both files |
| poverty_rate | continuous | Poverty rate (%) | Share of the county population below the federal poverty line. | From the county socioeconomic series in the merged file (already in percent). | % | Census (merged) | both files |
| median_income | continuous | Median household income | County median household income; in the raw file expressed in US$, rescaled to thousands of US$ in the prepared data. | From the county socioeconomic series; the prepared data divides by 1,000. | US$ (raw) / US$ 000s (prepared) | Census (merged) | both files |
| set_wt | continuous | Fixed 2013 adult population weight | Each county's 2013 population aged 20-64, held constant across all 11 years (the population weight). | population_20_64 in 2013, broadcast to every year of the county so weighting does not conflate population growth with mortality change. | persons | Derived (CDC) | prepared file |
| treat_year | identifier | Treatment year (did convention) | Year the county's state expanded Medicaid (2014/2015/2016/2019), or 0 for never-treated counties. | yaca if 2014 <= yaca <= 2019, else 0 (the did package's never-treated coding). | year / 0 | Derived | prepared file |
| Treat_2014 | dummy | 2014-cohort treatment dummy | 1 if the county's state expanded Medicaid in 2014, else 0. | 1 if yaca == 2014, else 0. | 0/1 | Derived | prepared file |
| Post | dummy | Post-2014 period dummy | 1 for years 2014 and later, else 0 (the post period in the 2x2 design). | 1 if year >= 2014, else 0. | 0/1 | Derived | prepared file |

Distribution & statistics

| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
| state_abb | | 100% | 28,644 | 46 | | | | | |
| county | | 100% | 28,644 | 2,604 | | | | | |
| county_code | | 100% | 28,644 | 2,604 | | | | | |
| year | | 100% | 28,644 | 11 | 2009 | 2014.0 | 2014 | 2019 | 3.16 |
| population_20_64 | min 1.79e+03; median 1.82e+04; max 6.34e+06 | 100% | 28,644 | 22,256 | 1,793.0 | 65,737 | 18,232 | 6,338,759 | 207,493 |
| yaca | | 68% | 19,459 | 7 | 2014 | 2016.2 | 2014 | 2023 | 3.10 |
| crude_rate_20_64 | min 72.3; median 442; max 1.56e+03 | 100% | 28,644 | 27,553 | 72.33 | 458.3 | 441.6 | 1,560.7 | 153.7 |
| perc_female | min 24.1; median 50; max 60.3 | 100% | 28,644 | 28,483 | 24.15 | 49.37 | 50.02 | 60.28 | 3.05 |
| perc_white | min 10.1; median 91.7; max 99.7 | 100% | 28,644 | 28,599 | 10.10 | 84.95 | 91.72 | 99.70 | 16.49 |
| perc_hispanic | min 0.15; median 3.65; max 96.4 | 100% | 28,644 | 28,532 | 0.150 | 8.39 | 3.65 | 96.43 | 12.86 |
| unemp_rate | min 1.07; median 6.23; max 29.4 | 100% | 28,644 | 28,484 | 1.07 | 6.83 | 6.23 | 29.41 | 3.08 |
| poverty_rate | min 2.6; median 15.9; max 50.4 | 100% | 28,644 | 438 | 2.60 | 16.66 | 15.90 | 50.40 | 6.45 |
| median_income | min 21; median 45.2; max 152 | 100% | 28,644 | 20,550 | 20.99 | 47.62 | 45.24 | 151.8 | 13.23 |
| set_wt | min 1.89e+03; median 1.84e+04; max 6.22e+06 | 100% | 28,644 | 2,534 | 1,891.0 | 65,530 | 18,408 | 6,221,536 | 206,632 |
| treat_year | | 100% | 28,644 | 5 | | | | | |
| Treat_2014 | share coded 1 = 0.376 | 100% | 28,644 | 2 | 0 | 0.376 | 0 | 1.00 | 0.484 |
| Post | share coded 1 = 0.545 | 100% | 28,644 | 2 | 0 | 0.545 | 1.00 | 1.00 | 0.498 |
Known limitations & caveats