Downloads
Each dataset is available as a labeled Stata .dta file and as its source .csv file.
Download all data (ZIP) | stata_codebook.do
| Dataset | Grain | Rows | Stata | Source |
|---|---|---|---|---|
| synthetic_panel_2country_intuition | country-year | 48 × 5 | synthetic_panel_2country_intuition.dta | synthetic_panel_2country_intuition.csv |
| synthetic_panel_multicountry | country-year | 975 × 11 | synthetic_panel_multicountry.dta | synthetic_panel_multicountry.csv |
Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.
Load directly in code
Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.
Stata
* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_sc_multi_country/data/"
use "${BASE}synthetic_panel_2country_intuition.dta", clear
describe
notes
Python
!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_sc_multi_country/data/"
df = pd.read_stata(BASE + "synthetic_panel_2country_intuition.dta")
# load every dataset at once
files = ["synthetic_panel_2country_intuition", "synthetic_panel_multicountry"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}
# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "synthetic_panel_2country_intuition.dta", "synthetic_panel_2country_intuition.dta")
df, meta = pyreadstat.read_dta("synthetic_panel_2country_intuition.dta")
Copy and paste this snippet into Google Colab: https://colab.research.google.com/notebooks/empty.ipynb
R
# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_sc_multi_country/data/"
df <- read_dta(paste0(BASE, "synthetic_panel_2country_intuition.dta"))
Overview & sources
Companion data for a hands-on R tutorial on the Augmented Synthetic Control Method (ASCM) of Ben-Michael, Feller & Rothstein (2021), demonstrated in a multi-country setting with the augsynth package and its three entry points (single_augsynth, multisynth, augsynth_multiout). Both files here are fully synthetic: the true counterfactual and injected effect are known by construction, so every estimate can be graded against ground truth before the method is turned loose on the real Penn-World-Table EMU data (which lives in the post's reference/ folder and is not documented here). The simulated panels validate that each estimator recovers a known effect and show exactly where plain SCM fails and Ridge augmentation rescues it.
synthetic_panel_2country_intuition is a minimal two-unit annual panel (Atlantia treated vs Borealis control, 2000–2023) used in §3 to show synthetic control in its simplest possible form. synthetic_panel_multicountry is the reusable 25-country × 39-year (1985–2023) factor-model panel used in §4–§9: five treated units with staggered adoption and twenty never-treated donors, shipped with the true counterfactual and true injected effect columns for grading.
Data sources
| Source | Provides | Reference / URL |
|---|---|---|
| Synthetic (this study) | All values — simulated via a calibrated factor model (open & reproducible) | Mendez, C. (2026). See the post's R script analysis.R for the full data-generating process. |
| Ben-Michael, Feller & Rothstein (2021) | The Augmented Synthetic Control Method estimator implemented in augsynth | Ben-Michael, E., Feller, A., & Rothstein, J. (2021). The Augmented Synthetic Control Method. Journal of the American Statistical Association, 116(536), 1415–1427. |
| Abadie, Diamond & Hainmueller (2010) | The classic synthetic control method (the progfunc = None baseline) | Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies. Journal of the American Statistical Association, 105(490), 493–505. |
| augsynth package | Estimation (single_augsynth / multisynth / augsynth_multiout) and inference tools | Ben-Michael, E. augsynth: https://github.com/ebenmichael/augsynth |
Cite this data
Please cite this dataset as follows.
APA
Mendez, C. (2026). Augmented Synthetic Control for Multiple Countries: A Tutorial with augsynth [Data set]. https://carlos-mendez.org/post/r_sc_multi_country/
Ben-Michael, E., Feller, A., & Rothstein, J. (2021). The Augmented Synthetic Control Method. Journal of the American Statistical Association, 116(536), 1415–1427.
Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies. Journal of the American Statistical Association, 105(490), 493–505.
Papaioannou, S. K. (2021). European monetary integration, TFP and productivity convergence. Economics Letters, 199, 109696.
BibTeX
@misc{mendez2026rscmulticountry,
author = {Mendez, Carlos},
title = {Augmented Synthetic Control for Multiple Countries: A Tutorial with augsynth},
year = {2026},
howpublished = {\url{https://carlos-mendez.org/post/r_sc_multi_country/}},
note = {Data set}
}
@article{benmichael2021augmented,
author = {Ben-Michael, Eli and Feller, Avi and Rothstein, Jesse},
title = {The Augmented Synthetic Control Method},
journal = {Journal of the American Statistical Association},
volume = {116}, number = {536}, pages = {1415--1427}, year = {2021}
}
@article{abadie2010synthetic,
author = {Abadie, Alberto and Diamond, Alexis and Hainmueller, Jens},
title = {Synthetic Control Methods for Comparative Case Studies},
journal = {Journal of the American Statistical Association},
volume = {105}, number = {490}, pages = {493--505}, year = {2010}
}
@article{papaioannou2021european,
author = {Papaioannou, Sotiris K.},
title = {European monetary integration, TFP and productivity convergence},
journal = {Economics Letters},
volume = {199}, pages = {109696}, year = {2021}
}
Variable explorer (all 14 variables)
| Variable | Type | Label | Definition | Units | In files | Source |
|---|---|---|---|---|---|---|
| adopt_year | year | Adoption year (multi-country file) | The year treatment switches on for a treated unit; missing for donors. | year | synthetic_panel_multicountry | Simulation |
| country | identifier | Country identifier | Synthetic unit name (panel id). | string | synthetic_panel_2country_intuition, synthetic_panel_multicountry | Simulation |
| gdp_index | continuous | Primary outcome (observed GDP index) | The primary observed outcome; for treated units it includes the injected effect. | index | synthetic_panel_multicountry | Simulation |
| gdp_index_cf | continuous | True GDP counterfactual (Y(0)) | The untreated potential outcome for gdp_index: the true counterfactual, revealed for grading. | index | synthetic_panel_multicountry | Simulation (ground truth) |
| outcome | continuous | Outcome series (two-country file) | The single observed outcome path for each unit in the intuition example. | index | synthetic_panel_2country_intuition | Simulation |
| role | identifier | Unit role (two-country file) | Whether the unit is the treated unit or the control/counterfactual. | category | synthetic_panel_2country_intuition | Simulation |
| trade_index | continuous | Secondary outcome (observed trade index) | A second correlated observed outcome used by augsynth_multiout. | index | synthetic_panel_multicountry | Simulation |
| trade_index_cf | continuous | True trade counterfactual (Y(0)) | The untreated potential outcome for trade_index: the true counterfactual, revealed for grading. | index | synthetic_panel_multicountry | Simulation (ground truth) |
| treat | dummy | Treatment indicator (two-country file) | 1 from the intervention year onward for the treated unit, else 0. | 0/1 | synthetic_panel_2country_intuition | Simulation |
| treat_ms | dummy | Time-varying treatment indicator (multi-country file) | 1 from a treated unit's adoption year onward, else 0 (the multisynth treatment column). | 0/1 | synthetic_panel_multicountry | Simulation |
| treated_unit | dummy | Treated-unit flag (multi-country file) | 1 if the country is ever treated (C01-C05), 0 for a never-treated donor. | 0/1 | synthetic_panel_multicountry | Simulation |
| true_effect_gdp | continuous | True injected GDP effect | The known treatment effect on gdp_index (gdp_index minus gdp_index_cf); 0 before adoption and for donors. | index units | synthetic_panel_multicountry | Simulation (ground truth) |
| true_effect_trade | continuous | True injected trade effect | The known treatment effect on trade_index; 0 before adoption and for donors. | index units | synthetic_panel_multicountry | Simulation (ground truth) |
| year | year | Calendar year | Annual time index. | year | synthetic_panel_2country_intuition, synthetic_panel_multicountry | Simulation |
Cross-file variable index
Which file each variable appears in (● = present).
| Variable | synthetic_panel_2country_intuition | synthetic_panel_multicountry |
|---|---|---|
| adopt_year | | ● |
| country | ● | ● |
| gdp_index | | ● |
| gdp_index_cf | | ● |
| outcome | ● | |
| role | ● | |
| trade_index | | ● |
| trade_index_cf | | ● |
| treat | ● | |
| treat_ms | | ● |
| treated_unit | | ● |
| true_effect_gdp | | ● |
| true_effect_trade | | ● |
| year | ● | ● |
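The index above can be rebuilt from the column lists of the two files. The sketch below is a minimal illustration: the column names are taken from the variable dictionaries on this page, while the function name `cross_file_index` and the `FILES` mapping are hypothetical helpers, not part of the shipped data.

```python
# Column lists as documented on this page (5 and 11 columns respectively).
FILES = {
    "synthetic_panel_2country_intuition": [
        "country", "year", "outcome", "treat", "role",
    ],
    "synthetic_panel_multicountry": [
        "country", "year", "treated_unit", "adopt_year", "treat_ms",
        "gdp_index", "trade_index", "gdp_index_cf", "trade_index_cf",
        "true_effect_gdp", "true_effect_trade",
    ],
}


def cross_file_index(files):
    """Map each variable name to the set of files that contain it."""
    index = {}
    for file_name, columns in files.items():
        for column in columns:
            index.setdefault(column, set()).add(file_name)
    return index


index = cross_file_index(FILES)
# Variables present in every file are the only safe merge keys.
shared = sorted(v for v, present in index.items() if len(present) == len(FILES))

for variable in sorted(index):
    marks = ["●" if variable in FILES[name] else " " for name in FILES]
    print(f"{variable:<18} | " + " | ".join(marks))
print("shared:", shared)
```

Only `country` and `year` appear in both files, so they are the only columns the two panels have in common.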
Construction & formulas
Synthetic control answers a counterfactual question: among units that were not treated, find the weighted recipe whose pre-treatment path matches the treated unit, and read the post-treatment gap as the effect (the ATT).
- SCM weight problem: choose convex donor weights W minimizing ||X_1 − X_0·W||_V subject to w_j ≥ 0 and Σ_j w_j = 1. The result is an interpolation of the donors, never an extrapolation.
- Augmented (bias-corrected) estimator: τ̂_t^aug = (Y_1t − Σ_j w_j Y_jt) − (m̂_t(X_1) − Σ_j w_j m̂_t(X_j)). This is the ordinary SCM gap minus what a prognostic model m̂_t (Ridge when progfunc = "ridge") predicts the residual imbalance should be. With a perfect pre-fit the correction vanishes and ASCM equals SCM.
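The two formulas above can be traced on a toy example. What follows is a minimal pure-Python sketch, not the augsynth implementation: it assumes V is the identity, solves the weight problem by projected gradient descent, and replaces the fitted Ridge model with a hypothetical, already-known linear function `m_hat`. The donor values, `BETA`, and `TRUE_EFFECT` are all made up for illustration.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1}."""
    theta, running = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        running += u_k
        candidate = (running - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def scm_weights(x1, donors, steps=5000, lr=0.005):
    """Minimise ||x1 - sum_j w_j * donors[j]||^2 over the simplex (V = identity)."""
    n, periods = len(donors), range(len(x1))
    w = [1.0 / n] * n
    for _ in range(steps):
        resid = [x1[t] - sum(w[j] * donors[j][t] for j in range(n)) for t in periods]
        grad = [-2.0 * sum(resid[t] * donors[j][t] for t in periods) for j in range(n)]
        w = project_to_simplex([w[j] - lr * grad[j] for j in range(n)])
    return w


BETA = (0.1, 0.2, 0.3, 0.4)  # hypothetical coefficients of the prognostic model


def m_hat(x, intercept=0.5):
    """Hypothetical linear prognostic model standing in for a fitted Ridge."""
    return intercept + sum(b * v for b, v in zip(BETA, x))


def augmented_gap(y1, y_donors, w, m1, m_donors):
    """(Y_1t - sum_j w_j Y_jt) - (m(X_1) - sum_j w_j m(X_j))."""
    scm_gap = y1 - sum(wj * yj for wj, yj in zip(w, y_donors))
    correction = m1 - sum(wj * mj for wj, mj in zip(w, m_donors))
    return scm_gap - correction


DONORS = [[1.0, 2.0, 3.0, 5.0], [2.0, 1.0, 2.0, 1.0], [4.0, 3.0, 1.0, 0.0]]
TRUE_EFFECT = 2.0  # injected post-period effect in this toy setup


def estimate(x1):
    """Return (weights, plain SCM gap, augmented gap) for one treated unit."""
    w = scm_weights(x1, DONORS)
    m_donors = [m_hat(d) for d in DONORS]
    y_donors = m_donors                   # untreated outcomes follow m_hat exactly
    y1 = m_hat(x1) + TRUE_EFFECT          # treated outcome = Y(0) + effect
    scm_gap = y1 - sum(wj * yj for wj, yj in zip(w, y_donors))
    return w, scm_gap, augmented_gap(y1, y_donors, w, m_hat(x1), m_donors)


x_inside = [1.9, 1.9, 2.3, 2.8]    # 0.5*d1 + 0.3*d2 + 0.2*d3, inside the hull
x_outside = [0.5, 2.5, 3.5, 7.0]   # 1.5*d1 - 0.5*d2, outside the hull

w_in, scm_in, aug_in = estimate(x_inside)
w_out, scm_out, aug_out = estimate(x_outside)
print([round(w, 3) for w in w_in], round(scm_in, 3), round(aug_in, 3))
print([round(w, 3) for w in w_out], round(scm_out, 3), round(aug_out, 3))
```

For the unit inside the hull the pre-fit is exact, so the correction is zero and both estimators agree. For the unit outside the hull the weights collapse onto one donor, the plain SCM gap is biased, and the correction removes the bias. It removes it completely only because the toy prognostic model is perfect by construction; a fitted Ridge model would typically remove only part of it.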
Two-country DGP (synthetic_panel_2country_intuition): a common trend
40 + 1.2·t + 3·sin(2π·t/9) with Gaussian noise; Borealis follows the trend, Atlantia
follows the trend plus an injected effect that switches on in 2012 and grows by 1.5 units/year. By
construction Borealis is the counterfactual, so the post-2012 gap equals the injected effect.
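A minimal Python sketch of this two-country process is given below; the post's own generator is the R script analysis.R, which this does not reproduce. Several details are assumptions: t is taken as year − 2000, the ramp is read as 1.5 × (year − 2011) from 2012 onward, 0.6 is treated as the noise standard deviation, and the `effect` column is an extra helper that is not in the shipped file.

```python
import math
import random


def simulate_two_country(seed=0, noise_sd=0.6, start=2000, end=2023,
                         treat_year=2012, ramp=1.5):
    """Simulate the treated/control pair described above (assumptions in comments)."""
    rng = random.Random(seed)
    rows = []
    for country, role in (("Atlantia (treated)", "treated"),
                          ("Borealis (control)", "control")):
        for year in range(start, end + 1):
            t = year - start  # ASSUMPTION: time index starts at 0 in 2000
            trend = 40 + 1.2 * t + 3 * math.sin(2 * math.pi * t / 9)
            treat = int(role == "treated" and year >= treat_year)
            # ASSUMPTION: the effect is 1.5 in 2012 and grows by 1.5 each year.
            effect = ramp * (year - treat_year + 1) * treat
            rows.append({
                "country": country,
                "year": year,
                "role": role,
                "treat": treat,
                "effect": effect,  # helper column, not in the shipped file
                "outcome": trend + effect + rng.gauss(0, noise_sd),
            })
    return rows


# With the noise switched off, the treated-minus-control gap is the effect itself.
clean = simulate_two_country(noise_sd=0.0)
treated = {r["year"]: r for r in clean if r["role"] == "treated"}
control = {r["year"]: r for r in clean if r["role"] == "control"}
gaps = {y: treated[y]["outcome"] - control[y]["outcome"] for y in treated}
print(round(gaps[2011], 6), round(gaps[2012], 6), round(gaps[2023], 6))
```

Because both units share the same trend, the control path is the counterfactual, and any remaining difference between the gap and the injected effect in the noisy version comes from the Gaussian noise alone.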
Multi-country DGP (synthetic_panel_multicountry): a three-latent-factor
model Y(0) = μ + L1·f1 + L2·f2 + L3·f3 + noise with a unit fixed effect; the second
outcome is 0.6·Y1 + ν + k·f1 + noise. Treated units C01–C04 are each a sparse convex
blend of three named donors (a near-perfect synthetic control provably exists); C05 is placed
outside the donor hull (loadings beyond every donor) to stress-test the methods. The injected
effect on gdp_index is a jump at adoption plus a yearly ramp
(post·jump + slope·(year − adopt)), with a correlated 0.6× effect on trade_index.
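The staggered design and the shape of the injected effect can be sketched as follows. The adoption years and unit labels come from the variable dictionary on this page. The `jump` and `slope` values are hypothetical, since the page does not state them. Gating the ramp so the whole effect is zero before adoption is an assumption, made to match the statement that the true effect is 0 before adoption.

```python
ADOPT_YEAR = {"C01": 2010, "C02": 2010, "C03": 2013, "C04": 2016, "C05": 2016}
COUNTRIES = [f"C{i:02d}" for i in range(1, 26)]  # C01-C05 treated, C06-C25 donors
YEARS = range(1985, 2024)                         # 39 annual observations


def injected_effect(year, adopt, jump, slope):
    """Jump at adoption plus a yearly ramp; zero for donors and before adoption."""
    if adopt is None or year < adopt:
        return 0.0
    return jump + slope * (year - adopt)


def build_design(jump=1.0, slope=0.25):
    """Panel skeleton with treatment indicators; jump and slope are hypothetical."""
    rows = []
    for country in COUNTRIES:
        adopt = ADOPT_YEAR.get(country)  # None for never-treated donors
        for year in YEARS:
            effect = injected_effect(year, adopt, jump, slope)
            rows.append({
                "country": country,
                "year": year,
                "treated_unit": int(adopt is not None),
                "adopt_year": adopt,
                "treat_ms": int(adopt is not None and year >= adopt),
                "true_effect_gdp": effect,
                "true_effect_trade": 0.6 * effect,  # correlated 0.6x effect
            })
    return rows


design = build_design()
share_treated = sum(r["treat_ms"] for r in design) / len(design)
print(len(design), round(share_treated, 3))  # 975 0.056
```

The observed outcome for a treated unit is then its factor-model Y(0) plus `true_effect_gdp`; the factor model itself is left out because its loadings and noise scale are not given on this page.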
The datasets
Each dataset has a full variable dictionary followed by a statistics table with data coverage.
Variable dictionary (synthetic_panel_2country_intuition)
| Variable | Type | Label | Definition | Construction | Units | Source |
|---|---|---|---|---|---|---|
| country | identifier | Country identifier | Synthetic unit name (panel id). | Two-country file: 'Atlantia (treated)' / 'Borealis (control)'. Multi-country file: C01-C25 (C01-C05 treated, C06-C25 donors). | string | Simulation |
| year | year | Calendar year | Annual time index. | Two-country file 2000-2023; multi-country file 1985-2023. | year | Simulation |
| outcome | continuous | Outcome series (two-country file) | The single observed outcome path for each unit in the intuition example. | Common trend 40 + 1.2*t + 3*sin(2*pi*t/9) + N(0,0.6); treated also adds the injected post-2012 effect. | index | Simulation |
| treat | dummy | Treatment indicator (two-country file) | 1 from the intervention year onward for the treated unit, else 0. | as.integer(year >= 2012) for Atlantia; 0 for Borealis. | 0/1 | Simulation |
| role | identifier | Unit role (two-country file) | Whether the unit is the treated unit or the control/counterfactual. | 'treated' for Atlantia, 'control' for Borealis. | category | Simulation |
Statistics (synthetic_panel_2country_intuition)
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
| country | 100% | 48 | 2 | – | – | – | – | – |
| year | 100% | 48 | 24 | 2000 | 2011.5 | 2011 | 2023 | 7.00 |
| outcome | 100% | 48 | 48 | 39.70 | 56.51 | 55.72 | 85.01 | 12.01 |
| treat | 100% | 48 | 2 | 0 | 0.250 | 0 | 1.00 | 0.438 |
| role | 100% | 48 | 2 | – | – | – | – | – |
Variable dictionary (synthetic_panel_multicountry)
| Variable | Type | Label | Definition | Construction | Units | Source |
|---|---|---|---|---|---|---|
| country | identifier | Country identifier | Synthetic unit name (panel id). | Two-country file: 'Atlantia (treated)' / 'Borealis (control)'. Multi-country file: C01-C25 (C01-C05 treated, C06-C25 donors). | string | Simulation |
| year | year | Calendar year | Annual time index. | Two-country file 2000-2023; multi-country file 1985-2023. | year | Simulation |
| treated_unit | dummy | Treated-unit flag (multi-country file) | 1 if the country is ever treated (C01-C05), 0 for a never-treated donor. | 1 for the five treated units, 0 for the twenty donors. | 0/1 | Simulation |
| adopt_year | year | Adoption year (multi-country file) | The year treatment switches on for a treated unit; missing for donors. | C01/C02 2010, C03 2013, C04/C05 2016; NA for donors. | year | Simulation |
| treat_ms | dummy | Time-varying treatment indicator (multi-country file) | 1 from a treated unit's adoption year onward, else 0 (the multisynth treatment column). | as.integer(year >= adopt_year) for treated units; 0 for donors. | 0/1 | Simulation |
| gdp_index | continuous | Primary outcome (observed GDP index) | The primary observed outcome; for treated units it includes the injected effect. | Y(0) from a three-factor model (mu + L1*f1 + L2*f2 + L3*f3 + noise) plus the injected effect for treated units. | index | Simulation |
| trade_index | continuous | Secondary outcome (observed trade index) | A second correlated observed outcome used by augsynth_multiout. | 0.6*gdp_index_cf + nu + k*f1 + noise, plus 0.6x the gdp injected effect for treated units. | index | Simulation |
| gdp_index_cf | continuous | True GDP counterfactual (Y(0)) | The untreated potential outcome for gdp_index: the true counterfactual, revealed for grading. | Y(0) from the factor model, before adding any injected effect (equals gdp_index for donors). | index | Simulation (ground truth) |
| trade_index_cf | continuous | True trade counterfactual (Y(0)) | The untreated potential outcome for trade_index: the true counterfactual, revealed for grading. | Second-outcome Y(0) before adding any injected effect (equals trade_index for donors). | index | Simulation (ground truth) |
| true_effect_gdp | continuous | True injected GDP effect | The known treatment effect on gdp_index (gdp_index minus gdp_index_cf); 0 before adoption and for donors. | post*jump + slope*(year - adopt): a jump at adoption plus a yearly ramp. | index units | Simulation (ground truth) |
| true_effect_trade | continuous | True injected trade effect | The known treatment effect on trade_index; 0 before adoption and for donors. | 0.6 * true_effect_gdp. | index units | Simulation (ground truth) |
Statistics (synthetic_panel_multicountry)
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
| country | 100% | 975 | 25 | – | – | – | – | – |
| year | 100% | 975 | 39 | 1985 | 2004.0 | 2004 | 2023 | 11.26 |
| treated_unit | 100% | 975 | 2 | 0 | 0.200 | 0 | 1.00 | 0.400 |
| adopt_year | 20% | 195 | 3 | 2010 | 2013.0 | 2013 | 2016 | 2.69 |
| treat_ms | 100% | 975 | 2 | 0 | 0.056 | 0 | 1.00 | 0.231 |
| gdp_index | 100% | 975 | 975 | 3.84 | 10.84 | 10.64 | 21.33 | 2.88 |
| trade_index | 100% | 975 | 975 | 3.47 | 8.52 | 8.50 | 16.83 | 2.09 |
| gdp_index_cf | 100% | 975 | 975 | 3.84 | 10.60 | 10.52 | 20.29 | 2.61 |
| trade_index_cf | 100% | 975 | 975 | 3.47 | 8.38 | 8.40 | 16.83 | 1.97 |
| true_effect_gdp | 100% | 975 | 40 | -1.35 | 0.246 | 0 | 9.50 | 1.22 |
| true_effect_trade | 100% | 975 | 40 | -0.810 | 0.148 | 0 | 5.70 | 0.734 |
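The design columns in this table (year, treated_unit, adopt_year, treat_ms) are fully determined by the documented panel layout, so their statistics can be re-derived without downloading anything. The sketch below is one way to do that with the standard library. The `describe` helper is hypothetical, and it assumes the reported SD is the sample standard deviation (n − 1 denominator).

```python
import statistics

ADOPT_YEAR = {"C01": 2010, "C02": 2010, "C03": 2013, "C04": 2016, "C05": 2016}


def describe(values):
    """Summary statistics for one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "coverage": len(present) / len(values),
        "n": len(present),
        "distinct": len(set(present)),
        "min": min(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "max": max(present),
        "sd": statistics.stdev(present),  # sample SD (n - 1 denominator)
    }


# Rebuild the four design columns for 25 countries x 39 years.
columns = {"year": [], "treated_unit": [], "adopt_year": [], "treat_ms": []}
for i in range(1, 26):
    adopt = ADOPT_YEAR.get(f"C{i:02d}")  # None for the twenty donors
    for year in range(1985, 2024):
        columns["year"].append(year)
        columns["treated_unit"].append(int(adopt is not None))
        columns["adopt_year"].append(adopt)
        columns["treat_ms"].append(int(adopt is not None and year >= adopt))

summary = {name: describe(values) for name, values in columns.items()}
for name, s in summary.items():
    print(name, f"{s['coverage']:.0%}", s["n"], s["distinct"], s["min"],
          round(s["mean"], 3), s["median"], s["max"], round(s["sd"], 3))
```

The outcome and ground-truth columns depend on simulated noise and unstated effect sizes, so they can only be checked against the downloaded files.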
Known limitations & caveats
- Synthetic data. Both files are fully simulated; results are internally consistent with the calibration but are not empirical evidence about real-world policy effects. The real euro-area data underlying Part 2 is the Penn World Table panel in the post's reference/ folder, which is not documented here.
- C05 is outside the donor hull by design. No convex blend of the donors can reproduce its pre-treatment path, so plain SCM mis-signs its effect; this is intentional, to show where Ridge augmentation earns its keep.
- Ground-truth columns ship with the data. gdp_index_cf / trade_index_cf (the true counterfactuals) and true_effect_gdp / true_effect_trade (the injected effects) are revealed here for grading; real datasets never give you these.
- Significance is a choice. On these simulated panels the injected effects are real, so headlines are significant — but the jackknife and the wild bootstrap can still disagree (the pooled multisynth effect is significant under the jackknife yet not under the bootstrap). Match the inference tool to the estimator and report when they disagree.
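Because the ground-truth columns ship with the data, any estimated effect path can be scored directly against true_effect_gdp. The sketch below shows one possible scoring helper. The function name `grade` and all numbers are hypothetical and do not come from the tutorial's results.

```python
import math


def grade(estimated, truth):
    """Score an estimated effect path against the known injected effect."""
    errors = [e - t for e, t in zip(estimated, truth)]
    mean_est = sum(estimated) / len(estimated)
    mean_truth = sum(truth) / len(truth)
    return {
        "bias": sum(errors) / len(errors),
        "rmse": math.sqrt(sum(err ** 2 for err in errors) / len(errors)),
        # False when the average estimate has the wrong sign (a mis-signed effect).
        "same_sign": (mean_est > 0) == (mean_truth > 0),
    }


# Hypothetical post-adoption paths for one treated unit.
truth = [1.0, 1.5, 2.0, 2.5]      # stand-in for true_effect_gdp
good_fit = [1.1, 1.4, 2.2, 2.3]   # an estimator that tracks the truth
poor_fit = [-0.5, -0.2, 0.1, 0.4] # an estimator that mis-signs the average effect

for label, path in (("good", good_fit), ("poor", poor_fit)):
    score = grade(path, truth)
    print(label, round(score["bias"], 3), round(score["rmse"], 3), score["same_sign"])
```

Bias and RMSE summarise how far the path is from the truth, while the sign check flags the kind of failure described for C05 above.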