Interactive data dictionary

Augmented Synthetic Control for Multiple Countries

Simulated panels for an augsynth tutorial — a two-country intuition example and a 25-country factor-model panel with a known injected effect.

2 datasets · 14 variables · 25 simulated countries · years 1985–2023

Downloads

Each dataset is available as a labeled Stata .dta and its source file.

Download all data (ZIP) · stata_codebook.do

| Dataset | Grain | Rows × columns | Stata | Source |
| --- | --- | --- | --- | --- |
| synthetic_panel_2country_intuition | country-year | 48 × 5 | synthetic_panel_2country_intuition.dta | synthetic_panel_2country_intuition.csv |
| synthetic_panel_multicountry | country-year | 975 × 11 | synthetic_panel_multicountry.dta | synthetic_panel_multicountry.csv |

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_sc_multi_country/data/"
use "${BASE}synthetic_panel_2country_intuition.dta", clear
describe
notes

Python

!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_sc_multi_country/data/"
df = pd.read_stata(BASE + "synthetic_panel_2country_intuition.dta")

# load every dataset at once
files = ["synthetic_panel_2country_intuition", "synthetic_panel_multicountry"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "synthetic_panel_2country_intuition.dta", "synthetic_panel_2country_intuition.dta")
df, meta = pyreadstat.read_dta("synthetic_panel_2country_intuition.dta")

To try it, paste this snippet into a new Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_sc_multi_country/data/"
df <- read_dta(paste0(BASE, "synthetic_panel_2country_intuition.dta"))

Overview & sources

Companion data for a hands-on R tutorial on the Augmented Synthetic Control Method (ASCM) of Ben-Michael, Feller & Rothstein (2021), demonstrated in a multi-country setting with the augsynth package and its three entry points (single_augsynth, multisynth, augsynth_multiout). Both files here are fully synthetic: the true counterfactual and the injected effect are known by construction, so every estimate can be graded against ground truth before the method is applied to the real Penn World Table EMU data (which lives in the post's reference/ folder and is not documented here). The simulated panels check that each estimator recovers a known effect, and they show where plain SCM fails and Ridge augmentation rescues it.

Two files. synthetic_panel_2country_intuition is a minimal two-unit annual panel (Atlantia treated vs Borealis control, 2000–2023) used in §3 to show synthetic control in its simplest possible form. synthetic_panel_multicountry is the reusable 25-country × 39-year (1985–2023) factor-model panel used in §4–§9: five treated units with staggered adoption and twenty never-treated donors, shipped with the true counterfactual and true injected effect columns for grading.

Data sources

| Source | Provides | Reference / URL |
| --- | --- | --- |
| Synthetic (this study) | All values: simulated via a calibrated factor model (open & reproducible) | Mendez, C. (2026). See the post's R script analysis.R for the full data-generating process. |
| Ben-Michael, Feller & Rothstein (2021) | The Augmented Synthetic Control Method estimator implemented in augsynth | Ben-Michael, E., Feller, A., & Rothstein, J. (2021). The Augmented Synthetic Control Method. Journal of the American Statistical Association, 116(536), 1415–1427. |
| Abadie, Diamond & Hainmueller (2010) | The classic synthetic control method (the progfunc = None baseline) | Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies. Journal of the American Statistical Association, 105(490), 493–505. |
| augsynth package | Estimation (single_augsynth / multisynth / augsynth_multiout) and inference tools | Ben-Michael, E. augsynth: https://github.com/ebenmichael/augsynth |

Cite this data

Please cite this dataset as follows.

APA

Mendez, C. (2026). Augmented Synthetic Control for Multiple Countries: A Tutorial with augsynth [Data set]. https://carlos-mendez.org/post/r_sc_multi_country/

Ben-Michael, E., Feller, A., & Rothstein, J. (2021). The Augmented Synthetic Control Method. Journal of the American Statistical Association, 116(536), 1415–1427.

Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies. Journal of the American Statistical Association, 105(490), 493–505.

Papaioannou, S. K. (2021). European monetary integration, TFP and productivity convergence. Economics Letters, 199, 109696.

BibTeX

@misc{mendez2026rscmulticountry,
  author       = {Mendez, Carlos},
  title        = {Augmented Synthetic Control for Multiple Countries: A Tutorial with augsynth},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/r_sc_multi_country/}},
  note         = {Data set}
}

@article{benmichael2021augmented,
  author  = {Ben-Michael, Eli and Feller, Avi and Rothstein, Jesse},
  title   = {The Augmented Synthetic Control Method},
  journal = {Journal of the American Statistical Association},
  volume  = {116}, number = {536}, pages = {1415--1427}, year = {2021}
}
@article{abadie2010synthetic,
  author  = {Abadie, Alberto and Diamond, Alexis and Hainmueller, Jens},
  title   = {Synthetic Control Methods for Comparative Case Studies},
  journal = {Journal of the American Statistical Association},
  volume  = {105}, number = {490}, pages = {493--505}, year = {2010}
}
@article{papaioannou2021european,
  author  = {Papaioannou, Sotiris K.},
  title   = {European monetary integration, TFP and productivity convergence},
  journal = {Economics Letters},
  volume  = {199}, pages = {109696}, year = {2021}
}

Variable explorer (all 14 variables)


| Variable | Type | Distribution | Label | Definition | Units | In files | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| adopt_year | year | | Adoption year (multi-country file) | The year treatment switches on for a treated unit; missing for donors. | year | synthetic_panel_multicountry | Simulation |
| country | identifier | | Country identifier | Synthetic unit name (panel id). | string | synthetic_panel_2country_intuition, synthetic_panel_multicountry | Simulation |
| gdp_index | continuous | min 3.84, median 10.6, max 21.3 | Primary outcome (observed GDP index) | The primary observed outcome; for treated units it includes the injected effect. | index | synthetic_panel_multicountry | Simulation |
| gdp_index_cf | continuous | min 3.84, median 10.5, max 20.3 | True GDP counterfactual (Y(0)) | The untreated potential outcome for gdp_index: the true counterfactual, revealed for grading. | index | synthetic_panel_multicountry | Simulation (ground truth) |
| outcome | continuous | min 39.7, median 55.7, max 85 | Outcome series (two-country file) | The single observed outcome path for each unit in the intuition example. | index | synthetic_panel_2country_intuition | Simulation |
| role | identifier | | Unit role (two-country file) | Whether the unit is the treated unit or the control/counterfactual. | category | synthetic_panel_2country_intuition | Simulation |
| trade_index | continuous | min 3.47, median 8.5, max 16.8 | Secondary outcome (observed trade index) | A second correlated observed outcome used by augsynth_multiout. | index | synthetic_panel_multicountry | Simulation |
| trade_index_cf | continuous | min 3.47, median 8.4, max 16.8 | True trade counterfactual (Y(0)) | The untreated potential outcome for trade_index: the true counterfactual, revealed for grading. | index | synthetic_panel_multicountry | Simulation (ground truth) |
| treat | dummy | share coded 1 = 0.250 | Treatment indicator (two-country file) | 1 from the intervention year onward for the treated unit, else 0. | 0/1 | synthetic_panel_2country_intuition | Simulation |
| treat_ms | dummy | share coded 1 = 0.056 | Time-varying treatment indicator (multi-country file) | 1 from a treated unit's adoption year onward, else 0 (the multisynth treatment column). | 0/1 | synthetic_panel_multicountry | Simulation |
| treated_unit | dummy | share coded 1 = 0.200 | Treated-unit flag (multi-country file) | 1 if the country is ever treated (C01-C05), 0 for a never-treated donor. | 0/1 | synthetic_panel_multicountry | Simulation |
| true_effect_gdp | continuous | min -1.35, median 0, max 9.5 | True injected GDP effect | The known treatment effect on gdp_index (gdp_index minus gdp_index_cf); 0 before adoption and for donors. | index units | synthetic_panel_multicountry | Simulation (ground truth) |
| true_effect_trade | continuous | min -0.81, median 0, max 5.7 | True injected trade effect | The known treatment effect on trade_index; 0 before adoption and for donors. | index units | synthetic_panel_multicountry | Simulation (ground truth) |
| year | year | | Calendar year | Annual time index. | year | synthetic_panel_2country_intuition, synthetic_panel_multicountry | Simulation |

Cross-file variable index

Which file each variable appears in (● = present).

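| Variable | synthetic_panel_2country_intuition | synthetic_panel_multicountry |
| --- | --- | --- |
| adopt_year | | ● |
| country | ● | ● |
| gdp_index | | ● |
| gdp_index_cf | | ● |
| outcome | ● | |
| role | ● | |
| trade_index | | ● |
| trade_index_cf | | ● |
| treat | ● | |
| treat_ms | | ● |
| treated_unit | | ● |
| true_effect_gdp | | ● |
| true_effect_trade | | ● |
| year | ● | ● |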

Construction & formulas

Synthetic control answers a counterfactual question: among units that were not treated, find the weighted recipe whose pre-treatment path matches the treated unit, and read the post-treatment gap as the effect (the ATT).
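The weighted-recipe idea can be sketched in a few lines. This is a minimal illustration of plain (non-augmented) synthetic control, assuming nonnegative weights that sum to one and fitted here by projected gradient descent. The helper names `project_to_simplex` and `scm_weights`, the toy numbers, and the solver choice are all assumptions made for illustration; none of this is how augsynth or the post's analysis.R implements the estimator.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def scm_weights(donors_pre, treated_pre, n_iter=5000):
    """Minimise ||donors_pre @ w - treated_pre||^2 over the simplex."""
    n_donors = donors_pre.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)
    step = 1.0 / np.linalg.norm(donors_pre, 2) ** 2  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = donors_pre.T @ (donors_pre @ w - treated_pre)
        w = project_to_simplex(w - step * grad)
    return w

# Toy panel (hypothetical numbers): 24 periods, 3 donors, treatment at period 12.
rng = np.random.default_rng(0)
donors = 10.0 + rng.normal(0.0, 3.0, size=(24, 3))
true_w = np.array([0.7, 0.3, 0.0])
effect = 2.0
treated = donors @ true_w
treated[12:] += effect                        # inject a known post-treatment effect

w_hat = scm_weights(donors[:12], treated[:12])
gap = treated[12:] - donors[12:] @ w_hat      # post-treatment gap, period by period
att_hat = gap.mean()                          # average gap = estimated ATT
print(np.round(w_hat, 3), round(att_hat, 3))
```

Because the toy treated unit is an exact blend of two donors, the recovered weights and the average gap can be compared directly with `true_w` and `effect`, which is the same grading logic the simulated files are built for.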

Two-country DGP (synthetic_panel_2country_intuition): a common trend 40 + 1.2·t + 3·sin(2π·t/9) with Gaussian noise; Borealis follows the trend, Atlantia follows the trend plus an injected effect that switches on in 2012 and grows by 1.5 units/year. By construction Borealis is the counterfactual, so the post-2012 gap equals the injected effect.
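A rough re-creation of this two-country process, for readers who want to see the pieces move. Several details are assumptions, not facts from the source: the time index is taken as t = year − 2000, the ramp is taken to be 1.5 in 2012 and 1.5 more each year after, 0.6 is read as the noise standard deviation, the two countries get independent noise draws, and the seed is arbitrary. The draws will therefore not reproduce the shipped file.

```python
import numpy as np

rng = np.random.default_rng(42)               # arbitrary seed, not the post's
years = np.arange(2000, 2024)                 # 2000-2023, 24 years
t = years - 2000                              # assumed time index (starts at 0)
trend = 40 + 1.2 * t + 3 * np.sin(2 * np.pi * t / 9)

# Assumed ramp: 1.5 in 2012, growing by 1.5 each year; zero before 2012.
effect = np.where(years >= 2012, 1.5 * (years - 2011), 0.0)
treat = (years >= 2012).astype(int)           # Atlantia's treatment indicator

# Assumed: independent N(0, sd = 0.6) noise for each country.
borealis = trend + rng.normal(0.0, 0.6, size=years.size)
atlantia = trend + effect + rng.normal(0.0, 0.6, size=years.size)

gap = atlantia - borealis                     # Borealis stands in for Y(0)
att_hat = gap[years >= 2012].mean()           # estimated average post-2012 effect
att_true = effect[years >= 2012].mean()       # average injected effect
treat_share = treat.sum() / (2 * years.size)  # Borealis rows are all 0
print(round(att_hat, 2), round(att_true, 2), treat_share)
```

Under these assumptions the post-2012 gap tracks the injected effect up to noise, and the share of rows with treat = 1 is 12 of 48, matching the 0.250 reported for the file.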

Multi-country DGP (synthetic_panel_multicountry): a three-latent-factor model Y(0) = μ + L1·f1 + L2·f2 + L3·f3 + noise with a unit fixed effect; the second outcome is 0.6·Y1 + ν + k·f1 + noise. Treated units C01–C04 are each a sparse convex blend of three named donors (a near-perfect synthetic control provably exists); C05 is placed outside the donor hull (loadings beyond every donor) to stress-test the methods. The injected effect on gdp_index is a jump at adoption plus a yearly ramp (post·jump + slope·(year − adopt)), with a correlated 0.6× effect on trade_index.
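Why a convex blend of donors guarantees a near-perfect synthetic control can be seen in a noise-free sketch of the factor model. Everything numeric below is hypothetical (seed, loading ranges, which donors are blended, the 0.5 offset), and the sketch assumes the treated unit's fixed effect is blended with the same weights as its loadings; the post's analysis.R holds the actual calibration.

```python
import numpy as np

rng = np.random.default_rng(1)                    # arbitrary seed
n_years, n_donors, n_factors = 39, 20, 3
factors = rng.normal(size=(n_years, n_factors))   # f1, f2, f3 over time
mu = rng.uniform(8.0, 12.0, size=n_donors)        # donor fixed effects
loadings = rng.uniform(0.0, 1.0, size=(n_donors, n_factors))

# Noise-free Y(0) for every donor: mu_j + L_j . f_t  -> shape (years, donors)
y0_donors = mu + factors @ loadings.T

# A "C01-style" treated unit: sparse convex blend of three donors.
blend = np.zeros(n_donors)
blend[[2, 7, 11]] = [0.5, 0.3, 0.2]               # hypothetical donors and weights
y0_inside = (blend @ mu) + factors @ (blend @ loadings)

# A "C05-style" unit: loadings beyond every donor, so outside the donor hull.
load_outside = loadings.max(axis=0) + 0.5
y0_outside = mu.mean() + factors @ load_outside

# The same blend of donor paths reproduces the inside unit exactly (no noise).
max_err_inside = np.abs(y0_donors @ blend - y0_inside).max()
print(max_err_inside)
```

For the inside unit the blend weights are themselves a valid set of synthetic control weights, so only noise separates the synthetic path from the truth. For the outside unit every convex combination of donor loadings stays below `load_outside`, so no weights on the simplex can match it, which is the case where augmentation is expected to matter.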

The datasets

Each dataset below lists its full variable dictionary, followed by a statistics table with distribution summaries and data coverage.


synthetic_panel_2country_intuition · country-year · 48 × 5 · 2000-2023 · 2 countries (Atlantia, Borealis)

Panel key: country × year · Purpose: show synthetic control in its simplest form, with one perfectly matched comparison unit.

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
| --- | --- | --- | --- | --- | --- | --- | --- |
| country | identifier | Country identifier | Synthetic unit name (panel id). | Two-country file: 'Atlantia (treated)' / 'Borealis (control)'. Multi-country file: C01-C25 (C01-C05 treated, C06-C25 donors). | string | Simulation | 100% |
| year | year | Calendar year | Annual time index. | Two-country file 2000-2023; multi-country file 1985-2023. | year | Simulation | 100% |
| outcome | continuous | Outcome series (two-country file) | The single observed outcome path for each unit in the intuition example. | Common trend 40 + 1.2*t + 3*sin(2*pi*t/9) + N(0,0.6); treated also adds the injected post-2012 effect. | index | Simulation | 100% |
| treat | dummy | Treatment indicator (two-country file) | 1 from the intervention year onward for the treated unit, else 0. | as.integer(year >= 2012) for Atlantia; 0 for Borealis. | 0/1 | Simulation | 100% |
| role | identifier | Unit role (two-country file) | Whether the unit is the treated unit or the control/counterfactual. | 'treated' for Atlantia, 'control' for Borealis. | category | Simulation | 100% |

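Distribution & statistics

| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| country | | 100% | 48 | 2 | | | | | |
| year | | 100% | 48 | 24 | 2000 | 2011.5 | 2011 | 2023 | 7.00 |
| outcome | min 39.7, median 55.7, max 85 | 100% | 48 | 48 | 39.70 | 56.51 | 55.72 | 85.01 | 12.01 |
| treat | share coded 1 = 0.250 | 100% | 48 | 2 | 0 | 0.250 | 0 | 1.00 | 0.438 |
| role | | 100% | 48 | 2 | | | | | |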

synthetic_panel_multicountry · country-year · 975 × 11 · 1985-2023 · 25 countries (5 treated, 20 donors), balanced

Panel key: country × year · Purpose: validate all three augsynth entry points and the suitability/inference tests against a known truth.

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
| --- | --- | --- | --- | --- | --- | --- | --- |
| country | identifier | Country identifier | Synthetic unit name (panel id). | Two-country file: 'Atlantia (treated)' / 'Borealis (control)'. Multi-country file: C01-C25 (C01-C05 treated, C06-C25 donors). | string | Simulation | 100% |
| year | year | Calendar year | Annual time index. | Two-country file 2000-2023; multi-country file 1985-2023. | year | Simulation | 100% |
| treated_unit | dummy | Treated-unit flag (multi-country file) | 1 if the country is ever treated (C01-C05), 0 for a never-treated donor. | 1 for the five treated units, 0 for the twenty donors. | 0/1 | Simulation | 100% |
| adopt_year | year | Adoption year (multi-country file) | The year treatment switches on for a treated unit; missing for donors. | C01/C02 2010, C03 2013, C04/C05 2016; NA for donors. | year | Simulation | 20% |
| treat_ms | dummy | Time-varying treatment indicator (multi-country file) | 1 from a treated unit's adoption year onward, else 0 (the multisynth treatment column). | as.integer(year >= adopt_year) for treated units; 0 for donors. | 0/1 | Simulation | 100% |
| gdp_index | continuous | Primary outcome (observed GDP index) | The primary observed outcome; for treated units it includes the injected effect. | Y(0) from a three-factor model (mu + L1*f1 + L2*f2 + L3*f3 + noise) plus the injected effect for treated units. | index | Simulation | 100% |
| trade_index | continuous | Secondary outcome (observed trade index) | A second correlated observed outcome used by augsynth_multiout. | 0.6*gdp_index_cf + nu + k*f1 + noise, plus 0.6x the gdp injected effect for treated units. | index | Simulation | 100% |
| gdp_index_cf | continuous | True GDP counterfactual (Y(0)) | The untreated potential outcome for gdp_index: the true counterfactual, revealed for grading. | Y(0) from the factor model, before adding any injected effect (equals gdp_index for donors). | index | Simulation (ground truth) | 100% |
| trade_index_cf | continuous | True trade counterfactual (Y(0)) | The untreated potential outcome for trade_index: the true counterfactual, revealed for grading. | Second-outcome Y(0) before adding any injected effect (equals trade_index for donors). | index | Simulation (ground truth) | 100% |
| true_effect_gdp | continuous | True injected GDP effect | The known treatment effect on gdp_index (gdp_index minus gdp_index_cf); 0 before adoption and for donors. | post*jump + slope*(year - adopt): a jump at adoption plus a yearly ramp. | index units | Simulation (ground truth) | 100% |
| true_effect_trade | continuous | True injected trade effect | The known treatment effect on trade_index; 0 before adoption and for donors. | 0.6 * true_effect_gdp. | index units | Simulation (ground truth) | 100% |
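The construction rules for the ground-truth columns imply two accounting identities that any grading script can check first: gdp_index minus gdp_index_cf equals true_effect_gdp, and true_effect_trade equals 0.6 times true_effect_gdp. The sketch below builds a few rows by those rules and verifies them. The function names and every numeric value (jump, slope, counterfactual levels) are hypothetical, and the sketch assumes the whole effect expression is switched off before adoption, as the definition "0 before adoption and for donors" suggests.

```python
def injected_effect(year, adopt_year, jump, slope):
    """post * (jump + slope * (year - adopt_year)); 0 before adoption or for donors."""
    if adopt_year is None or year < adopt_year:
        return 0.0
    return jump + slope * (year - adopt_year)

def build_row(year, adopt_year, gdp_cf, trade_cf, jump, slope):
    """Assemble one country-year row following the Construction column."""
    eff_gdp = injected_effect(year, adopt_year, jump, slope)
    eff_trade = 0.6 * eff_gdp
    return {
        "year": year,
        "adopt_year": adopt_year,
        "treat_ms": int(adopt_year is not None and year >= adopt_year),
        "gdp_index_cf": gdp_cf,
        "gdp_index": gdp_cf + eff_gdp,
        "trade_index_cf": trade_cf,
        "trade_index": trade_cf + eff_trade,
        "true_effect_gdp": eff_gdp,
        "true_effect_trade": eff_trade,
    }

# Hypothetical values: the shipped jump/slope parameters are not listed here.
rows = [
    build_row(2009, 2010, gdp_cf=10.0, trade_cf=8.0, jump=1.0, slope=0.5),
    build_row(2013, 2010, gdp_cf=11.0, trade_cf=8.5, jump=1.0, slope=0.5),
    build_row(2013, None, gdp_cf=9.0, trade_cf=7.0, jump=1.0, slope=0.5),  # donor
]

def identities_hold(row, tol=1e-9):
    """Check the two accounting identities a grader can rely on."""
    ok_gdp = abs(row["gdp_index"] - row["gdp_index_cf"] - row["true_effect_gdp"]) < tol
    ok_trade = abs(row["true_effect_trade"] - 0.6 * row["true_effect_gdp"]) < tol
    return ok_gdp and ok_trade

print([identities_hold(r) for r in rows])
```

Running the same `identities_hold` check over the rows of the real file (loaded as dictionaries) would be a quick sanity test before comparing any estimate with the truth columns.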

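Distribution & statistics

| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| country | | 100% | 975 | 25 | | | | | |
| year | | 100% | 975 | 39 | 1985 | 2004.0 | 2004 | 2023 | 11.26 |
| treated_unit | share coded 1 = 0.200 | 100% | 975 | 2 | 0 | 0.200 | 0 | 1.00 | 0.400 |
| adopt_year | | 20% | 195 | 3 | 2010 | 2013.0 | 2013 | 2016 | 2.69 |
| treat_ms | share coded 1 = 0.056 | 100% | 975 | 2 | 0 | 0.056 | 0 | 1.00 | 0.231 |
| gdp_index | min 3.84, median 10.6, max 21.3 | 100% | 975 | 975 | 3.84 | 10.84 | 10.64 | 21.33 | 2.88 |
| trade_index | min 3.47, median 8.5, max 16.8 | 100% | 975 | 975 | 3.47 | 8.52 | 8.50 | 16.83 | 2.09 |
| gdp_index_cf | min 3.84, median 10.5, max 20.3 | 100% | 975 | 975 | 3.84 | 10.60 | 10.52 | 20.29 | 2.61 |
| trade_index_cf | min 3.47, median 8.4, max 16.8 | 100% | 975 | 975 | 3.47 | 8.38 | 8.40 | 16.83 | 1.97 |
| true_effect_gdp | min -1.35, median 0, max 9.5 | 100% | 975 | 40 | -1.35 | 0.246 | 0 | 9.50 | 1.22 |
| true_effect_trade | min -0.81, median 0, max 5.7 | 100% | 975 | 40 | -0.810 | 0.148 | 0 | 5.70 | 0.734 |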
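The coverage and share figures for the treatment columns follow from the panel layout alone, so they can be rechecked by hand. The short calculation below uses only numbers stated in this dictionary (25 countries, 1985-2023, and the adoption years C01/C02 2010, C03 2013, C04/C05 2016); the variable names are illustrative.

```python
adopt_year = {"C01": 2010, "C02": 2010, "C03": 2013, "C04": 2016, "C05": 2016}
first_year, last_year, n_countries = 1985, 2023, 25
n_years = last_year - first_year + 1              # 39
n_rows = n_countries * n_years                    # 975

treated_rows = len(adopt_year) * n_years          # 195 rows with adopt_year set
post_rows = sum(last_year - a + 1 for a in adopt_year.values())  # 14+14+11+8+8

share_treated_unit = treated_rows / n_rows        # mean of treated_unit
share_treat_ms = post_rows / n_rows               # mean of treat_ms
print(n_rows, treated_rows, post_rows,
      round(share_treated_unit, 3), round(share_treat_ms, 3))
# prints: 975 195 55 0.2 0.056
```

The 195 treated-unit rows out of 975 give both the 20% coverage of adopt_year and the 0.200 share for treated_unit, and the 55 post-adoption rows give the 0.056 share for treat_ms.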

Known limitations & caveats