Downloads
Each dataset is available as a labeled Stata .dta and its source file.
⇩ Download all data (ZIP) · stata_codebook.do
| Dataset | Grain | Rows × columns | Stata | Source |
|---|---|---|---|---|
| carbontax_data | country-year | 690 × 9 | carbontax_data.dta | carbontax_data.dta |
| disentangling_data | year (Sweden only) | 46 × 6 | disentangling_data.dta | disentangling_data.dta |
Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.
Load directly in code
Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.
Stata
* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_sc_co2tax/data/"
use "${BASE}carbontax_data.dta", clear
describe
notes
Python
!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_sc_co2tax/data/"
df = pd.read_stata(BASE + "carbontax_data.dta")
# load every dataset at once
files = ["carbontax_data", "disentangling_data"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}
# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "carbontax_data.dta", "carbontax_data.dta")
df, meta = pyreadstat.read_dta("carbontax_data.dta")
Copy and paste this snippet into Google Colab: https://colab.research.google.com/notebooks/empty.ipynb
R
# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_sc_co2tax/data/"
df <- read_dta(paste0(BASE, "carbontax_data.dta"))
Overview & sources
Companion data for a Python tutorial that replicates Andersson (2019, AEJ: Economic Policy) on whether Sweden's 1991 carbon tax cut transport CO2 emissions — and at what economic cost. The headline dataset is an OECD panel of 15 advanced economies observed over 1960–2005 (46 years; 30 pre-treatment, 16 post-treatment), measuring per-capita CO2 emissions from transport in metric tons alongside the predictors used to build the counterfactual (GDP per capita, vehicles, gasoline consumption, urbanisation, population density). The post layers a naive before/after comparison, difference-in-differences, and the synthetic-control method (pysyncon) — validated by in-time, in-space, and leave-one-out placebo tests — then OLS/IV demand regressions (pyfixest). Synthetic Sweden, built from six donors (Denmark, Belgium, New Zealand, Greece, United States, Switzerland), implies an average 11.3% annual reduction over 1990–2005 with no measurable growth penalty. A second file carries the three counterfactual emission paths used to disentangle the carbon tax from the bundled VAT.
carbontax_data is the OECD country panel (one row per country × year; 15 countries, balanced 1960–2005) carrying the transport-CO2 outcome and the synthetic-control predictors. disentangling_data is a Sweden-only annual time series (one row per year, 1960–2005) holding three simulated counterfactual emission paths (carbon-tax + VAT, no carbon tax / with VAT, no carbon tax / no VAT) plus two reduction series — the inputs to the carbon-tax-versus-VAT decomposition.
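The grain described above (one row per country × year, balanced) implies 15 × 46 = 690 rows in carbontax_data. A minimal sketch of how that could be checked after loading; the helper name `check_balanced_panel` and the toy frame are illustrative assumptions, not part of the tutorial:

```python
import pandas as pd

def check_balanced_panel(df, unit="country", time="year"):
    """Return (n_units, n_periods, is_balanced) for a long-format panel."""
    n_units = df[unit].nunique()
    n_periods = df[time].nunique()
    # Balanced: every unit-period pair appears exactly once.
    no_duplicates = not df.duplicated([unit, time]).any()
    is_balanced = no_duplicates and len(df) == n_units * n_periods
    return n_units, n_periods, is_balanced

# Toy stand-in for carbontax_data: 3 made-up units x 4 years (not real data).
toy = pd.DataFrame(
    [(c, y) for c in ["A", "B", "C"] for y in range(1960, 1964)],
    columns=["country", "year"],
)
print(check_balanced_panel(toy))  # (3, 4, True)
```

Applied to the real carbontax_data frame, the description above suggests the result should be (15, 46, True).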
Data sources
| Source | Provides | Reference / URL |
|---|---|---|
| Andersson (2019) | Replicated study; data, donor pool, synthetic-control design, OLS/IV specifications | Andersson, J. J. (2019). Carbon Taxes and CO2 Emissions: Sweden as a Case Study. American Economic Journal: Economic Policy, 11(4), 1–30. https://doi.org/10.1257/pol.20170144 |
| Graefe (2020) — RTutor | Bundled replication package (exercise sequence + the .dta material files) | Graefe, T. (2020). RTutor: Carbon Taxes and CO2 Emissions. https://github.com/TheresaGraefe/RTutorCarbonTaxesAndCO2Emissions |
| Underlying data providers (via Andersson 2019) | OECD/IEA transport CO2 & fuel data; Penn World Table / World Bank GDP, vehicles, urbanisation, population | See Andersson (2019), Data section, for the full provenance of each series across the 15 OECD countries. |
| Method references | Synthetic-control estimator and inference | Abadie, Diamond & Hainmueller (2010, JASA; 2015, AJPS); Abadie & Gardeazabal (2003, AER). |
Cite this data
Please cite this dataset as follows.
APA
Mendez, C. (2026). Carbon Taxes and CO2 Emissions: A Synthetic-Control Analysis in Python [Data set]. https://carlos-mendez.org/post/python_sc_co2tax/
Andersson, J. J. (2019). Carbon Taxes and CO2 Emissions: Sweden as a Case Study. American Economic Journal: Economic Policy, 11(4), 1–30. https://doi.org/10.1257/pol.20170144
Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association, 105(490), 493–505.
BibTeX
@misc{mendez2026pythonscco2tax,
author = {Mendez, Carlos},
title = {Carbon Taxes and CO2 Emissions: A Synthetic-Control Analysis in Python},
year = {2026},
howpublished = {\url{https://carlos-mendez.org/post/python_sc_co2tax/}},
note = {Data set}
}
@article{andersson2019carbon,
author = {Andersson, Julius J.},
title = {Carbon Taxes and {CO2} Emissions: Sweden as a Case Study},
journal = {American Economic Journal: Economic Policy},
volume = {11}, number = {4}, pages = {1--30}, year = {2019},
doi = {10.1257/pol.20170144}
}
@article{abadie2010synthetic,
author = {Abadie, Alberto and Diamond, Alexis and Hainmueller, Jens},
title = {Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of {California's} Tobacco Control Program},
journal = {Journal of the American Statistical Association},
volume = {105}, number = {490}, pages = {493--505}, year = {2010}
}
Variable explorer: all 14 variables
| Variable | Type | Label | Definition | Units | In files | Source |
|---|---|---|---|---|---|---|
| CO2_reductions_simulation | continuous | Simulated CO2 reduction series | Simulated per-capita CO2 reduction attributable to the reform (model-based). | metric tons / capita | disentangling_data | Simulated (Andersson 2019) |
| CO2_reductions_synth | continuous | Synthetic-control CO2 reduction series | Per-capita CO2 reduction implied by the Synthetic-Sweden gap (Sweden minus synthetic). | metric tons / capita | disentangling_data | Computed via synthetic control (Andersson 2019) |
| CO2_transport_capita | continuous | Transport CO2 per capita (metric tons) | Per-capita carbon-dioxide emissions from the transport sector; the outcome variable. | metric tons / capita | carbontax_data | OECD/IEA via Andersson (2019) |
| CarbonTaxandVAT | continuous | Counterfactual: carbon tax + VAT (actual) | Simulated transport CO2 with all three components active (carbon tax, VAT, energy tax); the factual path. | metric tons / capita | disentangling_data | Simulated (Andersson 2019) |
| Countryno | identifier | Country number (1-15) | Integer identifier for each of the 15 OECD countries in the panel. | integer code | carbontax_data | Andersson (2019) replication file |
| GDP_per_capita | continuous | GDP per capita (real US$) | Real gross domestic product per capita; a synthetic-control predictor. | US$ per capita (real) | carbontax_data | PWT / World Bank via Andersson (2019) |
| NoCarbonTaxNoVAT | continuous | Counterfactual: no carbon tax, no VAT | Simulated transport CO2 if both the carbon tax and the VAT were removed (energy tax only). | metric tons / capita | disentangling_data | Simulated (Andersson 2019) |
| NoCarbonTaxWithVAT | continuous | Counterfactual: no carbon tax, with VAT | Simulated transport CO2 if the carbon tax were removed but the VAT retained. | metric tons / capita | disentangling_data | Simulated (Andersson 2019) |
| country | identifier | Country name | Name of the OECD country. | string | carbontax_data | Andersson (2019) replication file |
| gas_cons_capita | continuous | Gasoline consumption per capita | Per-capita road-transport gasoline consumption; a synthetic-control predictor. | kg oil-equivalent / capita (approx.) | carbontax_data | IEA via Andersson (2019) |
| pop_density | continuous | Population density | People per unit of land area (auxiliary covariate; not in the headline predictor set). | persons / km^2 (approx.) | carbontax_data | World Bank via Andersson (2019) |
| urban_pop | continuous | Urban population share (%) | Share of the national population living in urban areas; a synthetic-control predictor. | % (0-100) | carbontax_data | World Bank via Andersson (2019) |
| vehicles_capita | continuous | Motor vehicles per capita | Per-capita stock of motor vehicles; a synthetic-control predictor. | vehicles per capita | carbontax_data | Andersson (2019) replication file |
| year | year | Calendar year | Annual time index (the panel time variable). | year | carbontax_data, disentangling_data | Andersson (2019) replication file |
Cross-file variable index
Which file each variable appears in (● = present).
| Variable | carbontax_data | disentangling_data |
|---|---|---|
| CO2_reductions_simulation | | ● |
| CO2_reductions_synth | | ● |
| CO2_transport_capita | ● | |
| CarbonTaxandVAT | | ● |
| Countryno | ● | |
| GDP_per_capita | ● | |
| NoCarbonTaxNoVAT | | ● |
| NoCarbonTaxWithVAT | | ● |
| country | ● | |
| gas_cons_capita | ● | |
| pop_density | ● | |
| urban_pop | ● | |
| vehicles_capita | ● | |
| year | ● | ● |
Construction & formulas
The outcome throughout is per-capita CO2 emissions from transport
(CO2_transport_capita), in metric tons per person per year.
Synthetic control. Let $X_1$ be the pre-treatment predictor vector for Sweden and $X_0$ the same predictors for the donor countries (one column per donor). The donor weights $w$ minimise the weighted distance

$$w^{*} = \arg\min_{w}\; (X_1 - X_0 w)^{\top} V \,(X_1 - X_0 w) \quad \text{subject to} \quad w_j \ge 0 \ \text{and} \ \sum_j w_j = 1,$$

which makes Synthetic Sweden a convex combination of donors, with no extrapolation.

- The diagonal matrix $V$ (predictor importances) is itself chosen to minimise the pre-treatment mean squared prediction error (MSPE) of the outcome, CO2.
- Predictors: GDP_per_capita, vehicles_capita, gas_cons_capita, urban_pop (means over 1980–1989), plus three lagged outcome levels (CO2 in 1970, 1980, 1989).
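The inner weight problem above can be sketched with plain NumPy. This is an illustration under stated assumptions, not the pysyncon implementation: V is held fixed rather than optimised, the solver is a simple projected gradient descent, and the function names and toy predictor matrix are invented for the example.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1}."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]   # last index kept positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def scm_weights(X1, X0, v_diag, n_iter=5000):
    """Minimise (X1 - X0 w)' V (X1 - X0 w) over the simplex, with V fixed."""
    V = np.diag(v_diag)
    H = X0.T @ V @ X0
    b = X0.T @ V @ X1
    # Step size 1/L, where L = 2 * lambda_max(H) bounds the gradient's slope.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(H).max() + 1e-12)
    w = np.full(X0.shape[1], 1.0 / X0.shape[1])  # start from equal weights
    for _ in range(n_iter):
        grad = 2.0 * (H @ w - b)
        w = project_to_simplex(w - step * grad)
    return w

# Toy data (made up): 3 predictors x 3 donors; the treated unit is built as
# an exact convex combination of the donors so the answer is known.
X0 = np.array([[1.0, 2.0, 3.0],
               [2.0, 1.0, 0.5],
               [0.5, 3.0, 1.0]])
true_w = np.array([0.5, 0.3, 0.2])
X1 = X0 @ true_w
w_hat = scm_weights(X1, X0, v_diag=np.ones(3))
print(np.round(w_hat, 3))
```

In the full method the outer step would search over the diagonal of V to minimise pre-treatment MSPE of the outcome; that step is omitted here.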
Treatment gap, year by year:

$$\text{gap}_t = \mathrm{CO2}_t(\text{Sweden}) - \mathrm{CO2}_t(\text{Synthetic Sweden}),$$

where the synthetic series is $\sum_j w_j \cdot \mathrm{CO2}_t(\text{donor } j)$.
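As a small worked example of the gap formula, the synthetic series is a weighted sum of donor outcomes and the gap is the treated series minus it. The function name and all numbers below are made up for illustration:

```python
import numpy as np

def treatment_gap(y_treated, Y_donors, w):
    """Return (gap, synthetic) where synthetic_t = sum_j w_j * Y_donors[t, j]."""
    synthetic = Y_donors @ w          # Y_donors has shape (T, J); w has shape (J,)
    return y_treated - synthetic, synthetic

# Hypothetical outcome paths for two donors over three years.
Y_donors = np.array([[2.0, 1.0],
                     [2.2, 1.2],
                     [2.4, 1.4]])
w = np.array([0.5, 0.5])
y_treated = np.array([1.5, 1.7, 1.6])

gap, synthetic = treatment_gap(y_treated, Y_donors, w)
print(np.round(synthetic, 2), np.round(gap, 2))
```

A negative gap means the treated unit emits less than its synthetic counterpart; per the variable dictionary, CO2_reductions_synth stores this Sweden-minus-synthetic difference.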
Permutation inference: re-run the SCM treating each unit as treated; rank by the post/pre MSPE ratio $\mathrm{MSPE}_{\text{post}} / \mathrm{MSPE}_{\text{pre}}$; the p-value is the share of units with a ratio ≥ Sweden's (with 15 units the floor is 1/15 ≈ 0.067).
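The ranking step can be sketched in a few lines of plain Python. The helper names and the placebo ratios are hypothetical; only the rule (share of units with a ratio at least as large as the treated unit's) comes from the text above.

```python
def mspe(gaps):
    """Mean squared prediction error of a list of gaps."""
    return sum(g * g for g in gaps) / len(gaps)

def mspe_ratio(gap_by_year, first_post_year):
    """Post-treatment MSPE divided by pre-treatment MSPE for one unit."""
    pre = [g for y, g in gap_by_year.items() if y < first_post_year]
    post = [g for y, g in gap_by_year.items() if y >= first_post_year]
    return mspe(post) / mspe(pre)

def permutation_p_value(ratios, treated):
    """Share of units whose ratio is >= the treated unit's ratio."""
    n_at_least = sum(1 for r in ratios.values() if r >= ratios[treated])
    return n_at_least / len(ratios)

# Made-up ratios for 14 placebo units plus the treated unit (15 in total).
ratios = {f"placebo_{i}": 1.0 + 0.5 * i for i in range(14)}
ratios["Sweden"] = 40.0   # hypothetical value, larger than every placebo

p = permutation_p_value(ratios, "Sweden")
print(round(p, 3))  # 0.067
```

Because the treated unit always counts itself, the p-value can never fall below 1 divided by the number of units.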
Disentangling. The three columns in disentangling_data are simulated
emission paths under different tax bundles. The vertical gap between two paths attributes the CO2
reduction to the component switched between them — e.g.
(NoCarbonTaxWithVAT − CarbonTaxandVAT) / NoCarbonTaxWithVAT is the carbon-tax-only share.
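A worked instance of the share formula, using invented per-year values rather than figures from disentangling_data. The function name `path_share` is an assumption; it simply divides the gap between two paths by the path without the component.

```python
def path_share(without_component, with_component):
    """Relative reduction from switching a component on:
    (without - with) / without."""
    return (without_component - with_component) / without_component

# Hypothetical emissions (metric tons per capita) in a single year.
no_carbon_tax_with_vat = 2.50   # stand-in for NoCarbonTaxWithVAT
carbon_tax_and_vat = 2.25       # stand-in for CarbonTaxandVAT

share = path_share(no_carbon_tax_with_vat, carbon_tax_and_vat)
print(f"{share:.1%}")  # 10.0%
```

The same function applied to NoCarbonTaxNoVAT and NoCarbonTaxWithVAT would give the analogous VAT-only share; how per-year shares are averaged over the post-treatment window is left open here.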
The datasets
Each dataset has a full variable dictionary plus a statistics table with data coverage.
carbontax_data
Variable dictionary
| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|---|
| Countryno | identifier | Country number (1-15) | Integer identifier for each of the 15 OECD countries in the panel. | Sequential code 1..15 assigned in alphabetical country order (1=Australia ... 15=United States). | integer code | Andersson (2019) replication file | 15 countries |
| country | identifier | Country name | Name of the OECD country. | One of the 15 advanced economies: Australia, Belgium, Canada, Denmark, France, Greece, Iceland, Japan, New Zealand, Poland, Portugal, Spain, Sweden, Switzerland, United States. | string | Andersson (2019) replication file | 15 countries |
| year | year | Calendar year | Annual time index (the panel time variable). | 1960-2005, balanced for every country. | year | Andersson (2019) replication file | 1960-2005 |
| CO2_transport_capita | continuous | Transport CO2 per capita (metric tons) | Per-capita carbon-dioxide emissions from the transport sector; the outcome variable. | Total transport-sector CO2 emissions divided by population, per country-year. | metric tons / capita | OECD/IEA via Andersson (2019) | country-year panel |
| GDP_per_capita | continuous | GDP per capita (real US$) | Real gross domestic product per capita; a synthetic-control predictor. | Real GDP per capita (PPP-adjusted in Andersson's source); averaged over 1980-1989 as an SCM predictor. | US$ per capita (real) | PWT / World Bank via Andersson (2019) | 680 of 690 rows (10 missing) |
| gas_cons_capita | continuous | Gasoline consumption per capita | Per-capita road-transport gasoline consumption; a synthetic-control predictor. | Gasoline consumption divided by population, per country-year; averaged 1980-1989 as an SCM predictor. | kg oil-equivalent / capita (approx.) | IEA via Andersson (2019) | country-year panel |
| vehicles_capita | continuous | Motor vehicles per capita | Per-capita stock of motor vehicles; a synthetic-control predictor. | Number of registered vehicles divided by population (often expressed per 1,000 people), per country-year; averaged 1980-1989 as an SCM predictor. | vehicles per capita | Andersson (2019) replication file | country-year panel |
| urban_pop | continuous | Urban population share (%) | Share of the national population living in urban areas; a synthetic-control predictor. | Urban population as a percentage of total population, per country-year; averaged 1980-1989 as an SCM predictor. | % (0-100) | World Bank via Andersson (2019) | country-year panel |
| pop_density | continuous | Population density | People per unit of land area (auxiliary covariate; not in the headline predictor set). | Total population divided by land area, per country-year. | persons / km^2 (approx.) | World Bank via Andersson (2019) | country-year panel |
Distribution & statistics
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
| Countryno | 100% | 690 | 15 | n/a | n/a | n/a | n/a | n/a |
| country | 100% | 690 | 15 | n/a | n/a | n/a | n/a | n/a |
| year | 100% | 690 | 46 | 1960 | 1982.5 | 1982 | 2005 | 13.29 |
| CO2_transport_capita | 100% | 690 | 690 | 0.200 | 2.07 | 1.81 | 6.06 | 1.34 |
| GDP_per_capita | 99% | 680 | 680 | 3,656.6 | 19,291 | 18,672 | 43,212 | 8,337.7 |
| gas_cons_capita | 100% | 690 | 690 | 17.40 | 399.5 | 304.2 | 1,405.1 | 314.1 |
| vehicles_capita | 100% | 690 | 690 | 8.03 | 374.8 | 379.9 | 825.0 | 193.6 |
| urban_pop | 100% | 690 | 671 | 34.95 | 74.62 | 76.05 | 97.40 | 13.12 |
| pop_density | 100% | 690 | 689 | 1.34 | 95.83 | 79.39 | 350.5 | 97.39 |
disentangling_data
Variable dictionary
| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|---|
| year | year | Calendar year | Annual time index. | 1960-2005, one row per year (Sweden only). | year | Andersson (2019) replication file | 1960-2005 |
| CO2_reductions_simulation | continuous | Simulated CO2 reduction series | Simulated per-capita CO2 reduction attributable to the reform (model-based). | Difference between simulated actual and counterfactual emission paths for Sweden, per year. | metric tons / capita | Simulated (Andersson 2019) | Sweden, 1960-2005 |
| CO2_reductions_synth | continuous | Synthetic-control CO2 reduction series | Per-capita CO2 reduction implied by the Synthetic-Sweden gap (Sweden minus synthetic). | Sweden actual transport CO2 minus Synthetic-Sweden transport CO2, per year. | metric tons / capita | Computed via synthetic control (Andersson 2019) | Sweden, 1960-2005 |
| CarbonTaxandVAT | continuous | Counterfactual: carbon tax + VAT (actual) | Simulated transport CO2 with all three components active (carbon tax, VAT, energy tax); the factual path. | Demand model simulated at the actual price/tax bundle, per year. | metric tons / capita | Simulated (Andersson 2019) | Sweden, 1970-2005 (36 rows) |
| NoCarbonTaxWithVAT | continuous | Counterfactual: no carbon tax, with VAT | Simulated transport CO2 if the carbon tax were removed but the VAT retained. | Demand model simulated with the carbon tax switched off and VAT on, per year. | metric tons / capita | Simulated (Andersson 2019) | Sweden, 1970-2005 (36 rows) |
| NoCarbonTaxNoVAT | continuous | Counterfactual: no carbon tax, no VAT | Simulated transport CO2 if both the carbon tax and the VAT were removed (energy tax only). | Demand model simulated with carbon tax and VAT both switched off, per year. | metric tons / capita | Simulated (Andersson 2019) | Sweden, 1970-2005 (36 rows) |
Distribution & statistics
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
| year | 100% | 46 | 46 | 1960 | 1982.5 | 1982 | 2005 | 13.42 |
| CO2_reductions_simulation | 100% | 46 | 36 | -0.785 | -0.149 | -2.51e-06 | 6.74e-06 | 0.232 |
| CO2_reductions_synth | 100% | 46 | 46 | -0.383 | -0.099 | -0.030 | 0.075 | 0.150 |
| CarbonTaxandVAT | 78% | 36 | 36 | 1.77 | 2.17 | 2.25 | 2.48 | 0.221 |
| NoCarbonTaxWithVAT | 78% | 36 | 36 | 1.77 | 2.28 | 2.34 | 2.92 | 0.343 |
| NoCarbonTaxNoVAT | 78% | 36 | 36 | 1.77 | 2.36 | 2.42 | 3.10 | 0.417 |
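The year column spans 1960-2005 in both files, yet its SD is reported as 13.29 in the 690-row panel and 13.42 in the 46-row series. A quick check reproduces both figures if the SD column is the sample standard deviation (n − 1 denominator); that is an inference from the numbers, not something the page states.

```python
import statistics

years = list(range(1960, 2006))   # 46 calendar years, 1960..2005

# carbontax_data: each year appears once per country (15 countries, 690 rows)
sd_panel = statistics.stdev(years * 15)
# disentangling_data: one row per year (46 rows)
sd_series = statistics.stdev(years)

print(round(sd_panel, 2), round(sd_series, 2))  # 13.29 13.42
```

The two values differ only because the n − 1 correction matters more with 46 rows than with 690.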
Known limitations & caveats
- Single-country case. Sweden is one treated unit; external validity to larger emitters or developing economies is not guaranteed.
- Donor-pool size caps the p-value. With 15 countries the smallest possible permutation p-value is 1/15 ≈ 0.067 — exactly the value the post hits.
- Coverage gaps. GDP_per_capita is missing for 10 of 690 country-year rows (680 present); the synthetic-control predictors are averaged over windows where data exist.
- Simulated counterfactual paths. The three *CarbonTax*/*VAT* columns in disentangling_data are model-simulated emission scenarios (defined only from 1970 onward), not raw observations; they normalise differently from the synthetic-control baseline, so the carbon-tax-only share (≈9.5% here) and Andersson's headline (6.3%) describe the same physical wedge under different denominators.
- Window ends in 2005. The panel predates the electric-vehicle surge and later EU climate policy; re-running with newer data would test persistence.