Interactive data dictionary

Carbon Taxes and CO2 Emissions: Sweden as a Case Study

Replication data for a synthetic-control & IV analysis in Python of Sweden's 1991 carbon tax.

2 datasets · 14 variables · 15 countries · 1960–2005 (years)

Downloads

Each dataset is available as a labeled Stata .dta and its source file.

Download all data (ZIP) · stata_codebook.do

| Dataset | Grain | Rows × columns | Stata | Source |
|---|---|---|---|---|
| carbontax_data | country-year | 690 × 9 | carbontax_data.dta | carbontax_data.dta |
| disentangling_data | year (Sweden only) | 46 × 6 | disentangling_data.dta | disentangling_data.dta |

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_sc_co2tax/data/"
use "${BASE}carbontax_data.dta", clear
describe
notes

Python

!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_sc_co2tax/data/"
df = pd.read_stata(BASE + "carbontax_data.dta")

# load every dataset at once
files = ["carbontax_data", "disentangling_data"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "carbontax_data.dta", "carbontax_data.dta")
df, meta = pyreadstat.read_dta("carbontax_data.dta")

Copy and paste this snippet into a Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_sc_co2tax/data/"
df <- read_dta(paste0(BASE, "carbontax_data.dta"))

Overview & sources

Companion data for a Python tutorial that replicates Andersson (2019, AEJ: Economic Policy) on whether Sweden's 1991 carbon tax cut transport CO2 emissions, and at what economic cost. The headline dataset is an OECD panel of 15 advanced economies observed over 1960–2005 (46 years; 30 pre-treatment, 16 post-treatment). It measures per-capita CO2 emissions from transport in metric tons alongside the predictors used to build the counterfactual (GDP per capita, vehicles, gasoline consumption, urbanisation), plus population density as an auxiliary covariate. The post layers a naive before/after comparison, difference-in-differences, and the synthetic-control method (pysyncon), validated by in-time, in-space, and leave-one-out placebo tests, followed by OLS/IV demand regressions (pyfixest). Synthetic Sweden, built from six donors (Denmark, Belgium, New Zealand, Greece, United States, Switzerland), implies that transport emissions were on average 11.3% lower per year than the counterfactual over 1990–2005, with no measurable growth penalty. A second file carries the three counterfactual emission paths used to disentangle the carbon tax from the bundled VAT.

Two files. carbontax_data is the OECD country panel (one row per country × year; 15 countries, balanced 1960–2005) carrying the transport-CO2 outcome and the synthetic-control predictors. disentangling_data is a Sweden-only annual time series (one row per year, 1960–2005) holding three simulated counterfactual emission paths (carbon-tax + VAT, no carbon tax / with VAT, no carbon tax / no VAT) plus two reduction series — the inputs to the carbon-tax-versus-VAT decomposition.
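
As a quick sanity check on the panel structure described above, the sketch below builds a toy country-year index with the documented dimensions and verifies that it is balanced. The helper name `is_balanced` and the placeholder country labels are illustrative assumptions, not part of the replication files; with the real data one would pass the `country` and `year` columns of carbontax_data instead.

```python
from itertools import product

def is_balanced(rows):
    """Return True if every unit is observed in every year exactly once.

    `rows` is an iterable of (unit, year) pairs.
    """
    rows = list(rows)
    units = {u for u, _ in rows}
    years = {y for _, y in rows}
    # Balanced: no duplicate keys, and the keys fill the full unit x year grid.
    return len(set(rows)) == len(rows) == len(units) * len(years)

# Toy index with the documented dimensions: 15 units x 46 years (1960-2005).
units = [f"country_{i}" for i in range(1, 16)]   # placeholder labels (assumption)
years = range(1960, 2006)
panel = list(product(units, years))

n_rows = len(panel)                       # 15 * 46 rows
pre = [y for y in years if y <= 1989]     # pre-treatment years
post = [y for y in years if y >= 1990]    # post-treatment years
print(n_rows, len(pre), len(post), is_balanced(panel))
```

Dropping any single row from `panel` makes `is_balanced` return False, which is a convenient way to catch accidental filtering before running the synthetic-control step.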

Data sources

| Source | Provides | Reference / URL |
|---|---|---|
| Andersson (2019) | Replicated study; data, donor pool, synthetic-control design, OLS/IV specifications | Andersson, J. J. (2019). Carbon Taxes and CO2 Emissions: Sweden as a Case Study. American Economic Journal: Economic Policy, 11(4), 1–30. https://doi.org/10.1257/pol.20170144 |
| Graefe (2020), RTutor | Bundled replication package (exercise sequence + the .dta material files) | Graefe, T. (2020). RTutor: Carbon Taxes and CO2 Emissions. https://github.com/TheresaGraefe/RTutorCarbonTaxesAndCO2Emissions |
| Underlying data providers (via Andersson 2019) | OECD/IEA transport CO2 & fuel data; Penn World Table / World Bank GDP, vehicles, urbanisation, population | See Andersson (2019), Data section, for the full provenance of each series across the 15 OECD countries. |
| Method references | Synthetic-control estimator and inference | Abadie, Diamond & Hainmueller (2010, JASA; 2015, AJPS); Abadie & Gardeazabal (2003, AER). |

Cite this data

Please cite this dataset as follows.

APA

Mendez, C. (2026). Carbon Taxes and CO2 Emissions: A Synthetic-Control Analysis in Python [Data set]. https://carlos-mendez.org/post/python_sc_co2tax/

Andersson, J. J. (2019). Carbon Taxes and CO2 Emissions: Sweden as a Case Study. American Economic Journal: Economic Policy, 11(4), 1–30. https://doi.org/10.1257/pol.20170144

Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association, 105(490), 493–505.

BibTeX

@misc{mendez2026pythonscco2tax,
  author       = {Mendez, Carlos},
  title        = {Carbon Taxes and CO2 Emissions: A Synthetic-Control Analysis in Python},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/python_sc_co2tax/}},
  note         = {Data set}
}

@article{andersson2019carbon,
  author  = {Andersson, Julius J.},
  title   = {Carbon Taxes and {CO2} Emissions: Sweden as a Case Study},
  journal = {American Economic Journal: Economic Policy},
  volume  = {11}, number = {4}, pages = {1--30}, year = {2019},
  doi     = {10.1257/pol.20170144}
}
@article{abadie2010synthetic,
  author  = {Abadie, Alberto and Diamond, Alexis and Hainmueller, Jens},
  title   = {Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of {California's} Tobacco Control Program},
  journal = {Journal of the American Statistical Association},
  volume  = {105}, number = {490}, pages = {493--505}, year = {2010}
}

Variable explorer: all 14 variables


| Variable | Type | Distribution | Label | Definition | Units | In files | Source |
|---|---|---|---|---|---|---|---|
| CO2_reductions_simulation | continuous | min -0.785, median -2.51e-06, max 6.74e-06 | Simulated CO2 reduction series | Simulated per-capita CO2 reduction attributable to the reform (model-based). | metric tons / capita | disentangling_data | Simulated (Andersson 2019) |
| CO2_reductions_synth | continuous | min -0.383, median -0.0302, max 0.0751 | Synthetic-control CO2 reduction series | Per-capita CO2 reduction implied by the Synthetic-Sweden gap (Sweden minus synthetic). | metric tons / capita | disentangling_data | Computed via synthetic control (Andersson 2019) |
| CO2_transport_capita | continuous | min 0.2, median 1.81, max 6.06 | Transport CO2 per capita (metric tons) | Per-capita carbon-dioxide emissions from the transport sector; the outcome variable. | metric tons / capita | carbontax_data | OECD/IEA via Andersson (2019) |
| CarbonTaxandVAT | continuous | min 1.77, median 2.25, max 2.48 | Counterfactual: carbon tax + VAT (actual) | Simulated transport CO2 with all three components active (carbon tax, VAT, energy tax); the factual path. | metric tons / capita | disentangling_data | Simulated (Andersson 2019) |
| Countryno | identifier | | Country number (1-15) | Integer identifier for each of the 15 OECD countries in the panel. | integer code | carbontax_data | Andersson (2019) replication file |
| GDP_per_capita | continuous | min 3.66e+03, median 1.87e+04, max 4.32e+04 | GDP per capita (real US$) | Real gross domestic product per capita; a synthetic-control predictor. | US$ per capita (real) | carbontax_data | PWT / World Bank via Andersson (2019) |
| NoCarbonTaxNoVAT | continuous | min 1.77, median 2.42, max 3.1 | Counterfactual: no carbon tax, no VAT | Simulated transport CO2 if both the carbon tax and the VAT were removed (energy tax only). | metric tons / capita | disentangling_data | Simulated (Andersson 2019) |
| NoCarbonTaxWithVAT | continuous | min 1.77, median 2.34, max 2.92 | Counterfactual: no carbon tax, with VAT | Simulated transport CO2 if the carbon tax were removed but the VAT retained. | metric tons / capita | disentangling_data | Simulated (Andersson 2019) |
| country | identifier | | Country name | Name of the OECD country. | string | carbontax_data | Andersson (2019) replication file |
| gas_cons_capita | continuous | min 17.4, median 304, max 1.41e+03 | Gasoline consumption per capita | Per-capita road-transport gasoline consumption; a synthetic-control predictor. | kg oil-equivalent / capita (approx.) | carbontax_data | IEA via Andersson (2019) |
| pop_density | continuous | min 1.34, median 79.4, max 351 | Population density | People per unit of land area (auxiliary covariate; not in the headline predictor set). | persons / km^2 (approx.) | carbontax_data | World Bank via Andersson (2019) |
| urban_pop | continuous | min 35, median 76, max 97.4 | Urban population share (%) | Share of the national population living in urban areas; a synthetic-control predictor. | % (0-100) | carbontax_data | World Bank via Andersson (2019) |
| vehicles_capita | continuous | min 8.03, median 380, max 825 | Motor vehicles per capita | Per-capita stock of motor vehicles; a synthetic-control predictor. | vehicles per capita (the value range suggests per 1,000 people) | carbontax_data | Andersson (2019) replication file |
| year | year | | Calendar year | Annual time index (the panel time variable). | year | carbontax_data, disentangling_data | Andersson (2019) replication file |

Cross-file variable index

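Which file each variable appears in (● = present).

| Variable | carbontax_data | disentangling_data |
|---|---|---|
| CO2_reductions_simulation | | ● |
| CO2_reductions_synth | | ● |
| CO2_transport_capita | ● | |
| CarbonTaxandVAT | | ● |
| Countryno | ● | |
| GDP_per_capita | ● | |
| NoCarbonTaxNoVAT | | ● |
| NoCarbonTaxWithVAT | | ● |
| country | ● | |
| gas_cons_capita | ● | |
| pop_density | ● | |
| urban_pop | ● | |
| vehicles_capita | ● | |
| year | ● | ● |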

Construction & formulas

The outcome throughout is per-capita CO2 emissions from transport (CO2_transport_capita), in metric tons per person per year.

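Synthetic control. Let $X_1$ be the pre-treatment predictor vector for Sweden and $X_0$ the matrix of the same predictors for the donor countries (one column per donor). The donor weights $w$ minimise the weighted distance

$$\lVert X_1 - X_0 w \rVert_V = \sqrt{(X_1 - X_0 w)^\top V \,(X_1 - X_0 w)} \quad \text{subject to } w_j \ge 0,\ \sum_j w_j = 1,$$

where $V$ is a positive semi-definite matrix of predictor weights, as in Abadie, Diamond & Hainmueller (2010).

Treatment gap: year by year, $\text{gap}_t = \text{CO2}_t(\text{Sweden}) - \text{CO2}_t(\text{Synthetic Sweden})$, where the synthetic series is $\sum_j w_j \cdot \text{CO2}_t(\text{donor } j)$.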
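
The weight search and the gap calculation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the optimiser used in the post: the weights are restricted to be non-negative and to sum to one (the usual synthetic-control constraint), the predictor weights are a diagonal list `v`, and the minimum is found by a brute-force grid search. All numbers and the function names `weighted_distance` and `grid_search_weights` are made up for the example.

```python
from itertools import product

def weighted_distance(x1, x0, w, v):
    """sqrt((x1 - X0 w)' V (x1 - X0 w)) with a diagonal V given as the list `v`.

    x1: k predictor values for the treated unit
    x0: k rows, each holding one value per donor
    w:  donor weights; v: predictor weights
    """
    total = 0.0
    for k, row in enumerate(x0):
        synth_k = sum(wj * xkj for wj, xkj in zip(w, row))
        total += v[k] * (x1[k] - synth_k) ** 2
    return total ** 0.5

def grid_search_weights(x1, x0, v, steps=20):
    """Brute-force search over the simplex (w_j >= 0, sum of w_j = 1)."""
    n_donors = len(x0[0])
    best_w, best_d = None, float("inf")
    for combo in product(range(steps + 1), repeat=n_donors - 1):
        if sum(combo) > steps:
            continue  # the remaining weight would be negative
        w = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        d = weighted_distance(x1, x0, w, v)
        if d < best_d:
            best_w, best_d = w, d
    return best_w, best_d

# Made-up inputs: 2 predictors, 3 hypothetical donors.
x1 = [2.0, 10.0]
x0 = [[1.0, 3.0, 2.0],
      [8.0, 12.0, 20.0]]
v = [1.0, 1.0]
w, dist = grid_search_weights(x1, x0, v)

# Gap in each year: treated outcome minus the weighted donor outcome.
treated = [2.0, 2.1, 1.9]
donors = [[1.0, 1.1, 1.2], [3.0, 3.1, 3.2], [2.0, 2.0, 2.0]]  # one list per donor
synthetic = [sum(wj * d[t] for wj, d in zip(w, donors)) for t in range(len(treated))]
gap = [y - s for y, s in zip(treated, synthetic)]
print(w, round(dist, 4), [round(g, 3) for g in gap])
```

A grid search only scales to a handful of donors; with the 14-country donor pool a numerical optimiser is the practical choice, but the objective and the constraints stay the same.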

Permutation inference: re-run the SCM treating each unit as treated; rank by the post/pre MSPE ratio $\text{MSPE}_{\text{post}} / \text{MSPE}_{\text{pre}}$; the p-value is the share of units with a ratio ≥ Sweden's (with 15 units the floor is 1/15 ≈ 0.067).
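
The ranking step can be written in a few lines. The sketch below assumes the placebo runs have already produced one list of pre-treatment gaps and one list of post-treatment gaps per unit; the ratios used here are made up, and the function names are illustrative rather than taken from the post's code.

```python
def mspe(gaps):
    """Mean squared prediction error of a list of gaps."""
    return sum(g * g for g in gaps) / len(gaps)

def mspe_ratio(pre_gaps, post_gaps):
    """Post-treatment MSPE divided by pre-treatment MSPE."""
    return mspe(post_gaps) / mspe(pre_gaps)

def permutation_p_value(ratios, treated):
    """Share of units whose ratio is at least as large as the treated unit's."""
    threshold = ratios[treated]
    return sum(r >= threshold for r in ratios.values()) / len(ratios)

# Made-up ratios for 15 units; the unit labelled "treated" has the largest one.
ratios = {f"placebo_{i}": float(i) for i in range(1, 15)}
ratios["treated"] = 100.0

p = permutation_p_value(ratios, "treated")
print(round(p, 3))
```

Because the treated unit counts itself, the smallest attainable p-value with 15 units is 1/15, which is the floor quoted above.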

Disentangling. The three columns in disentangling_data are simulated emission paths under different tax bundles. The vertical gap between two paths attributes the CO2 reduction to the component switched between them — e.g. (NoCarbonTaxWithVAT − CarbonTaxandVAT) / NoCarbonTaxWithVAT is the carbon-tax-only share.
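
As a worked illustration of the decomposition, the sketch below applies the carbon-tax-only formula to one made-up year. The VAT-only and combined figures extend the same "gap between two paths" logic and are an assumption of this example, not a formula stated in the source; the values are invented and not read from disentangling_data.

```python
def relative_reduction(with_component, without_component):
    """Proportional drop in emissions when a component is switched on."""
    return (without_component - with_component) / without_component

# Made-up single-year values in metric tons per capita (not taken from the file).
row = {
    "CarbonTaxandVAT": 2.20,
    "NoCarbonTaxWithVAT": 2.50,
    "NoCarbonTaxNoVAT": 2.75,
}

# Carbon tax only: compare the two paths that differ only in the carbon tax.
carbon_tax_only = relative_reduction(row["CarbonTaxandVAT"], row["NoCarbonTaxWithVAT"])
# VAT only (assumed analogue): compare the two paths that differ only in the VAT.
vat_only = relative_reduction(row["NoCarbonTaxWithVAT"], row["NoCarbonTaxNoVAT"])
# Both components together.
total = relative_reduction(row["CarbonTaxandVAT"], row["NoCarbonTaxNoVAT"])

print(round(carbon_tax_only, 3), round(vat_only, 3), round(total, 3))
```

Defined this way, the two component reductions compose multiplicatively: (1 − carbon-tax share) × (1 − VAT share) equals 1 − total, so the shares do not simply add up.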

The datasets

Each dataset is listed below with its full variable dictionary and a statistics table covering distribution summaries and data coverage.


carbontax_data: country-year · 690 × 9 · 1960-2005 · 15 OECD countries (balanced)

Panel key: country (Countryno) × year. Used for DiD and Synthetic Sweden: the outcome (transport CO2) plus the predictors that build the counterfactual.

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|---|
| Countryno | identifier | Country number (1-15) | Integer identifier for each of the 15 OECD countries in the panel. | Sequential code 1..15 assigned in alphabetical country order (1=Australia ... 15=United States). | integer code | Andersson (2019) replication file | 15 countries |
| country | identifier | Country name | Name of the OECD country. | One of the 15 advanced economies: Australia, Belgium, Canada, Denmark, France, Greece, Iceland, Japan, New Zealand, Poland, Portugal, Spain, Sweden, Switzerland, United States. | string | Andersson (2019) replication file | 15 countries |
| year | year | Calendar year | Annual time index (the panel time variable). | 1960-2005, balanced for every country. | year | Andersson (2019) replication file | 1960-2005 |
| CO2_transport_capita | continuous | Transport CO2 per capita (metric tons) | Per-capita carbon-dioxide emissions from the transport sector; the outcome variable. | Total transport-sector CO2 emissions divided by population, per country-year. | metric tons / capita | OECD/IEA via Andersson (2019) | country-year panel |
| GDP_per_capita | continuous | GDP per capita (real US$) | Real gross domestic product per capita; a synthetic-control predictor. | Real GDP per capita (PPP-adjusted in Andersson's source); averaged over 1980-1989 as an SCM predictor. | US$ per capita (real) | PWT / World Bank via Andersson (2019) | 680 of 690 rows (10 missing) |
| gas_cons_capita | continuous | Gasoline consumption per capita | Per-capita road-transport gasoline consumption; a synthetic-control predictor. | Gasoline consumption divided by population, per country-year; averaged 1980-1989 as an SCM predictor. | kg oil-equivalent / capita (approx.) | IEA via Andersson (2019) | country-year panel |
| vehicles_capita | continuous | Motor vehicles per capita | Per-capita stock of motor vehicles; a synthetic-control predictor. | Number of registered vehicles divided by population (often expressed per 1,000 people), per country-year; averaged 1980-1989 as an SCM predictor. | vehicles per capita (the value range suggests per 1,000 people) | Andersson (2019) replication file | country-year panel |
| urban_pop | continuous | Urban population share (%) | Share of the national population living in urban areas; a synthetic-control predictor. | Urban population as a percentage of total population, per country-year; averaged 1980-1989 as an SCM predictor. | % (0-100) | World Bank via Andersson (2019) | country-year panel |
| pop_density | continuous | Population density | People per unit of land area (auxiliary covariate; not in the headline predictor set). | Total population divided by land area, per country-year. | persons / km^2 (approx.) | World Bank via Andersson (2019) | country-year panel |
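
Several predictors above enter the synthetic control as 1980-1989 averages. The sketch below shows one way to compute such a window mean per country while skipping missing values; the helper name `window_mean`, the placeholder country labels, and the numbers are assumptions made for illustration, and the post itself may compute the averages differently (for example inside pysyncon).

```python
from statistics import mean

def window_mean(rows, variable, start=1980, end=1989):
    """Average `variable` per country over start..end (inclusive), skipping missing values."""
    by_country = {}
    for r in rows:
        if start <= r["year"] <= end and r[variable] is not None:
            by_country.setdefault(r["country"], []).append(r[variable])
    return {c: mean(vals) for c, vals in by_country.items()}

# Made-up rows for two placeholder countries; values are not from the file.
rows = [
    {"country": "A", "year": 1979, "urban_pop": 70.0},   # outside the window
    {"country": "A", "year": 1980, "urban_pop": 80.0},
    {"country": "A", "year": 1989, "urban_pop": 82.0},
    {"country": "B", "year": 1985, "urban_pop": 60.0},
    {"country": "B", "year": 1986, "urban_pop": None},   # missing value is skipped
]

predictors = window_mean(rows, "urban_pop")
print(predictors)
```

Skipping missing values matters for GDP_per_capita, which is the only column with gaps (680 of 690 rows).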

Distribution & statistics

| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
| Countryno | 100% | 690 | 15 | | | | | |
| country | 100% | 690 | 15 | | | | | |
| year | 100% | 690 | 46 | 1960 | 1982.5 | 1982 | 2005 | 13.29 |
| CO2_transport_capita | 100% | 690 | 690 | 0.200 | 2.07 | 1.81 | 6.06 | 1.34 |
| GDP_per_capita | 99% | 680 | 680 | 3,656.6 | 19,291 | 18,672 | 43,212 | 8,337.7 |
| gas_cons_capita | 100% | 690 | 690 | 17.40 | 399.5 | 304.2 | 1,405.1 | 314.1 |
| vehicles_capita | 100% | 690 | 690 | 8.03 | 374.8 | 379.9 | 825.0 | 193.6 |
| urban_pop | 100% | 690 | 671 | 34.95 | 74.62 | 76.05 | 97.40 | 13.12 |
| pop_density | 100% | 690 | 689 | 1.34 | 95.83 | 79.39 | 350.5 | 97.39 |

disentangling_data: year (Sweden only) · 46 × 6 · 1960-2005 · Sweden; scenario columns defined from 1970

Panel key: year. Separates the carbon-tax-only effect from the bundled-VAT effect via three simulated emission paths.

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|---|
| year | year | Calendar year | Annual time index (the time variable). | 1960-2005, one row per year (Sweden only). | year | Andersson (2019) replication file | 1960-2005 |
| CO2_reductions_simulation | continuous | Simulated CO2 reduction series | Simulated per-capita CO2 reduction attributable to the reform (model-based). | Difference between simulated actual and counterfactual emission paths for Sweden, per year. | metric tons / capita | Simulated (Andersson 2019) | Sweden, 1960-2005 |
| CO2_reductions_synth | continuous | Synthetic-control CO2 reduction series | Per-capita CO2 reduction implied by the Synthetic-Sweden gap (Sweden minus synthetic). | Sweden actual transport CO2 minus Synthetic-Sweden transport CO2, per year. | metric tons / capita | Computed via synthetic control (Andersson 2019) | Sweden, 1960-2005 |
| CarbonTaxandVAT | continuous | Counterfactual: carbon tax + VAT (actual) | Simulated transport CO2 with all three components active (carbon tax, VAT, energy tax); the factual path. | Demand model simulated at the actual price/tax bundle, per year. | metric tons / capita | Simulated (Andersson 2019) | Sweden, 1970-2005 (36 rows) |
| NoCarbonTaxWithVAT | continuous | Counterfactual: no carbon tax, with VAT | Simulated transport CO2 if the carbon tax were removed but the VAT retained. | Demand model simulated with the carbon tax switched off and VAT on, per year. | metric tons / capita | Simulated (Andersson 2019) | Sweden, 1970-2005 (36 rows) |
| NoCarbonTaxNoVAT | continuous | Counterfactual: no carbon tax, no VAT | Simulated transport CO2 if both the carbon tax and the VAT were removed (energy tax only). | Demand model simulated with carbon tax and VAT both switched off, per year. | metric tons / capita | Simulated (Andersson 2019) | Sweden, 1970-2005 (36 rows) |

Distribution & statistics

| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
| year | 100% | 46 | 46 | 1960 | 1982.5 | 1982 | 2005 | 13.42 |
| CO2_reductions_simulation | 100% | 46 | 36 | -0.785 | -0.149 | -2.51e-06 | 6.74e-06 | 0.232 |
| CO2_reductions_synth | 100% | 46 | 46 | -0.383 | -0.099 | -0.030 | 0.075 | 0.150 |
| CarbonTaxandVAT | 78% | 36 | 36 | 1.77 | 2.17 | 2.25 | 2.48 | 0.221 |
| NoCarbonTaxWithVAT | 78% | 36 | 36 | 1.77 | 2.28 | 2.34 | 2.92 | 0.343 |
| NoCarbonTaxNoVAT | 78% | 36 | 36 | 1.77 | 2.36 | 2.42 | 3.10 | 0.417 |
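
The year column reports a different SD in the two files (13.29 in the panel, 13.42 in the Sweden-only series) even though both cover 1960-2005. A short check, assuming the tables report the sample standard deviation (n − 1 denominator), shows that the difference comes only from the number of rows. The same script reproduces the coverage percentages from the row counts given in the dictionaries.

```python
from statistics import mean, stdev

years = list(range(1960, 2006))       # 46 annual observations

sd_single = stdev(years)              # one series, as in disentangling_data
sd_panel = stdev(years * 15)          # the same years repeated for 15 countries

coverage_scenarios = 36 / 46          # scenario columns are defined from 1970
coverage_gdp = 680 / 690              # GDP_per_capita has 10 missing rows

print(round(mean(years), 1), round(sd_single, 2), round(sd_panel, 2))
print(f"{coverage_scenarios:.0%} {coverage_gdp:.0%}")
```

Under that assumption the script returns 13.42 and 13.29 for the two SDs and 78% and 99% for the two coverage figures, matching the tables.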

Known limitations & caveats