Interactive data dictionary

What Does TWFE Actually Do? Manual Demeaning and the FWL Theorem

Companion data for an R tutorial proving that two-way fixed effects is OLS on two-way demeaned data, on a synthetic Barro convergence panel.

2 datasets · 17 variables · 150 countries · 8 periods (time span)

Downloads

Each dataset is available as a labeled Stata .dta and its source file.

Download all data (ZIP) · stata_codebook.do

| Dataset | Grain | Rows × columns | Stata | Source |
|---|---|---|---|---|
| source_data | country-year (period) | 1,200 × 11 | source_data.dta | source_data.csv |
| data_demeaned | country-year (period) | 1,200 × 14 | data_demeaned.dta | data_demeaned.csv |

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_demeaning_twfe/data/"
use "${BASE}source_data.dta", clear
describe
notes

Python

# Notebook-only (Colab/Jupyter): the leading ! runs pip as a shell command
!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_demeaning_twfe/data/"
df = pd.read_stata(BASE + "source_data.dta")

# load every dataset at once
files = ["source_data", "data_demeaned"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "source_data.dta", "source_data.dta")
df, meta = pyreadstat.read_dta("source_data.dta")

Copy and paste this snippet into a Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_demeaning_twfe/data/"
df <- read_dta(paste0(BASE, "source_data.dta"))

Overview & sources

Companion data for a hands-on R tutorial that takes the two-way fixed effects (TWFE) estimator apart to show that it is nothing more than ordinary least squares applied to two-way demeaned data, the equivalence guaranteed by the Frisch–Waugh–Lovell (FWL) theorem. The tutorial uses a balanced, synthetic Barro convergence panel of 150 countries observed over 8 time periods (1,200 observations), regressing GDP-per-capita growth on log initial income, investment share, population growth, human capital, and government consumption.

It estimates the model with country and time fixed effects via fixest::feols(), then replicates the coefficients by hand: subtracting country means, subtracting time means, and adding back the grand mean before running base R's lm(). The two routes match to at least 12 significant digits (the convergence coefficient is −0.055286 either way; the largest coefficient difference is 3.05×10^−16, on the order of machine epsilon). Naive lm() standard errors, however, understate uncertainty by 7–22% because they ignore the degrees of freedom consumed by the absorbed fixed effects.
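The equivalence described above can be checked on a small simulated panel. The sketch below is written in Python with NumPy rather than the tutorial's R, and the panel size, coefficients, and function name two_way_demean are hypothetical choices for illustration only. It compares OLS with explicit country and time dummies against OLS on two-way demeaned data, then works out a degrees-of-freedom factor for the 150 × 8 panel, assuming the naive regression fits an intercept plus five slopes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_id, n_t, k = 30, 6, 2                 # hypothetical toy panel, not the 150 x 8 data
ids = np.repeat(np.arange(n_id), n_t)   # rows sorted by country, then period
times = np.tile(np.arange(n_t), n_id)

# Regressors correlated with the country effect, plus a period effect in y
alpha = rng.normal(size=n_id)
lam = rng.normal(size=n_t)
X = rng.normal(size=(n_id * n_t, k)) + alpha[ids, None]
beta_true = np.array([-0.05, 0.02])
y = X @ beta_true + alpha[ids] + lam[times] + rng.normal(scale=0.1, size=n_id * n_t)

def two_way_demean(v):
    """x_it - country mean - time mean + grand mean (balanced panel, id-major order)."""
    m = v.reshape(n_id, n_t, -1)
    out = (m - m.mean(axis=1, keepdims=True)
             - m.mean(axis=0, keepdims=True)
             + m.mean(axis=(0, 1), keepdims=True))
    return out.reshape(v.shape)

# Route 1: OLS with country dummies and time dummies (first period dropped)
D_id = (ids[:, None] == np.arange(n_id)).astype(float)
D_t = (times[:, None] == np.arange(1, n_t)).astype(float)
beta_dummies = np.linalg.lstsq(np.hstack([X, D_id, D_t]), y, rcond=None)[0][:k]

# Route 2: OLS on the two-way demeaned data
beta_demeaned = np.linalg.lstsq(two_way_demean(X), two_way_demean(y), rcond=None)[0]

max_gap = np.max(np.abs(beta_dummies - beta_demeaned))  # floating-point noise only

# Degrees of freedom for the 150 x 8 panel with 5 regressors
N, n_countries, n_periods, n_slopes = 1200, 150, 8, 5
df_naive = N - n_slopes - 1                                  # assumed: intercept + 5 slopes
df_twfe = N - n_slopes - n_countries - (n_periods - 1)       # also counts absorbed effects
se_ratio = (df_naive / df_twfe) ** 0.5                       # about 1.07
```

Under these assumptions the degrees-of-freedom adjustment alone accounts for a gap of roughly 7%, in line with the lower end of the range quoted above. The sketch does not attempt to reproduce the upper end.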

Two files. source_data is the raw balanced country×period panel as loaded for the analysis (one row per country×year, 1,200 rows). data_demeaned is the within-transformed analysis dataset: the same 1,200 rows carrying the six model variables in both their raw form and their two-way demeaned form (the _dm columns), each equal to the observed value minus its country mean minus its time mean plus the grand mean.
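The construction of the _dm columns can be sketched in Python with pandas. This is a minimal illustration, not the tutorial's R code: the helper name add_two_way_demeaned and the toy values are hypothetical, and the simple subtract-and-add-back formula is the exact within transform only for a balanced panel such as this one.

```python
import numpy as np
import pandas as pd

def add_two_way_demeaned(df, cols, unit="id", period="time"):
    """Append <col>_dm = x - country mean - time mean + grand mean."""
    out = df.copy()
    for c in cols:
        unit_mean = out.groupby(unit)[c].transform("mean")    # country mean, per row
        time_mean = out.groupby(period)[c].transform("mean")  # period mean, per row
        out[c + "_dm"] = out[c] - unit_mean - time_mean + out[c].mean()
    return out

# Toy balanced panel: 4 countries x 3 periods with made-up values
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "id": np.repeat(np.arange(1, 5), 3),
    "time": np.tile(np.arange(1, 4), 4),
    "growth": rng.normal(-0.12, 0.04, size=12),
    "gov_cons": rng.uniform(0.07, 0.22, size=12),
})
toy_dm = add_two_way_demeaned(toy, ["growth", "gov_cons"])
```

In a balanced panel the resulting columns average to zero overall, within every country, and within every period, up to floating-point error.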

Data sources

| Source | Provides | Reference / URL |
|---|---|---|
| Synthetic (this study) | All values: a simulated balanced Barro convergence panel (open & reproducible) | Mendez, C. (2026). See the post's R script analysis.R and referenceMaterials/manual_demeaning_twfe_tutorial.qmd for the data-generating process. |
| Frisch–Waugh–Lovell theorem | The result that guarantees the TWFE / OLS-on-demeaned equivalence | Frisch, R., & Waugh, F. V. (1933). Partial Time Regressions as Compared with Individual Trends. Econometrica, 1(4), 387–401; Lovell, M. C. (1963). Seasonal Adjustment of Economic Time Series and Multiple Regression Analysis. JASA, 58(304), 993–1010. |
| Method references | Two-way fixed effects / within transformation, growth convergence, and the estimator | Bergé, L. (2018). fixest: Fast Fixed-Effects Estimations (R package); Barro, R. J., & Sala-i-Martin, X. (2004). Economic Growth (2nd ed.). MIT Press. |

Cite this data

Please cite this dataset as follows.

APA

Mendez, C. (2026). What Does TWFE Actually Do? Manual Demeaning and the FWL Theorem [Data set]. https://carlos-mendez.org/post/r_demeaning_twfe/

Frisch, R., & Waugh, F. V. (1933). Partial Time Regressions as Compared with Individual Trends. Econometrica, 1(4), 387–401.

Lovell, M. C. (1963). Seasonal Adjustment of Economic Time Series and Multiple Regression Analysis. Journal of the American Statistical Association, 58(304), 993–1010.

BibTeX

@misc{mendez2026rdemeaningtwfe,
  author       = {Mendez, Carlos},
  title        = {What Does TWFE Actually Do? Manual Demeaning and the FWL Theorem},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/r_demeaning_twfe/}},
  note         = {Data set}
}

@article{frisch1933partial,
  author  = {Frisch, Ragnar and Waugh, Frederick V.},
  title   = {Partial Time Regressions as Compared with Individual Trends},
  journal = {Econometrica},
  volume  = {1}, number = {4}, pages = {387--401}, year = {1933}
}
@article{lovell1963seasonal,
  author  = {Lovell, Michael C.},
  title   = {Seasonal Adjustment of Economic Time Series and Multiple Regression Analysis},
  journal = {Journal of the American Statistical Association},
  volume  = {58}, number = {304}, pages = {993--1010}, year = {1963}
}

Variable explorer: all 17 variables

| Variable | Type | Distribution | Label | Definition | Units | In files | Source |
|---|---|---|---|---|---|---|---|
| gov_cons | continuous | min 0.0704; median 0.145; max 0.22 | Government consumption share | Government consumption as a share of GDP; a control regressor. | 0-1 (share) | source_data, data_demeaned | Simulation |
| gov_cons_dm | continuous | min -0.0507; median -5.08e-05; max 0.0404 | Demeaned government consumption share | Two-way demeaned government consumption share (deviation from country + time means, grand mean restored). | share (deviation) | data_demeaned | Derived (within transform) |
| growth | continuous | min -0.238; median -0.122; max -0.00398 | GDP per capita growth (dependent variable) | Annualized GDP-per-capita growth rate; the outcome regressed in the TWFE model. | rate (per year) | source_data, data_demeaned | Simulation |
| growth_dm | continuous | min -0.0772; median 6.88e-05; max 0.0908 | Demeaned GDP per capita growth | Two-way demeaned growth: deviation of growth from its country mean and time mean (grand mean added back). The within-variation that identifies the TWFE coefficient. | rate (deviation) | data_demeaned | Derived (within transform) |
| hcap | continuous | min 1.02; median 1.98; max 2.98 | Human capital index | Human-capital stock proxy (e.g., schooling-based index). | index | source_data | Simulation |
| id | identifier | | Country identifier | Sequential country index (the entity dimension of the panel; treated as a factor for the fixed effects). | integer code | source_data, data_demeaned | Simulation |
| ln_y_initial | continuous | min 1.92; median 5.16; max 9.87 | Log initial income (convergence term) | Natural log of initial GDP per capita; the beta-convergence regressor (a negative slope means poorer countries grow faster). | log US$ | source_data, data_demeaned | Simulation |
| ln_y_initial_dm | continuous | min -0.69; median -0.000416; max 0.574 | Demeaned log initial income | Two-way demeaned log initial income (deviation from country + time means, grand mean restored). | log US$ (deviation) | data_demeaned | Derived (within transform) |
| log_hcap | continuous | min 0.0216; median 0.682; max 1.09 | Log human capital | Natural log of the human-capital index. | log index | source_data, data_demeaned | Derived |
| log_hcap_dm | continuous | min -0.337; median -4.09e-05; max 0.177 | Demeaned log human capital | Two-way demeaned log human capital (deviation from country + time means, grand mean restored). | log index (deviation) | data_demeaned | Derived (within transform) |
| log_n_gd | continuous | min -2.9; median -2.66; max -2.43 | Log of population growth + g + d | Natural log of population growth plus the standard 0.05 for growth and depreciation; the Solow n + g + d regressor. | log rate | source_data, data_demeaned | Derived |
| log_n_gd_dm | continuous | min -0.124; median 0.00164; max 0.125 | Demeaned log(n + g + d) | Two-way demeaned log of population growth plus g + d (deviation from country + time means, grand mean restored). | log rate (deviation) | data_demeaned | Derived (within transform) |
| log_s_k | continuous | min -2.39; median -1.54; max -1.02 | Log investment share | Natural log of the investment share of GDP (Solow capital-accumulation regressor). | log share | source_data, data_demeaned | Derived |
| log_s_k_dm | continuous | min -0.421; median 0.0029; max 0.377 | Demeaned log investment share | Two-way demeaned log investment share (deviation from country + time means, grand mean restored). | log share (deviation) | data_demeaned | Derived (within transform) |
| n_pop | continuous | min 0.005; median 0.0202; max 0.0383 | Population growth rate | Population growth rate (the n in the Solow n + g + d term). | rate (per year) | source_data | Simulation |
| s_k | continuous | min 0.0919; median 0.215; max 0.36 | Investment share of GDP | Physical-capital investment share of GDP (Solow-style accumulation rate). | 0-1 (share) | source_data | Simulation |
| time | identifier | | Time period identifier | Sequential time-period index (the time dimension of the panel; treated as a factor for the fixed effects). | integer code | source_data, data_demeaned | Simulation |

Cross-file variable index

Which file each variable appears in (● = present).

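| Variable | source_data | data_demeaned |
|---|---|---|
| gov_cons | ● | ● |
| gov_cons_dm | | ● |
| growth | ● | ● |
| growth_dm | | ● |
| hcap | ● | |
| id | ● | ● |
| ln_y_initial | ● | ● |
| ln_y_initial_dm | | ● |
| log_hcap | ● | ● |
| log_hcap_dm | | ● |
| log_n_gd | ● | ● |
| log_n_gd_dm | | ● |
| log_s_k | ● | ● |
| log_s_k_dm | | ● |
| n_pop | ● | |
| s_k | ● | |
| time | ● | ● |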

Construction & formulas

The model is a two-way fixed-effects growth regression over country i and period t:
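Written out with the variable names from this dictionary, the regression and the within transform applied to each variable take the following form. The symbols are chosen here for illustration and are not taken from the tutorial: α_i for the country effect, λ_t for the period effect, β_1 to β_5 for the slopes, ε_it for the error, and a double dot for the demeaned value.

```latex
\begin{aligned}
\text{growth}_{it} ={}& \beta_1\,\text{ln\_y\_initial}_{it}
  + \beta_2\,\text{log\_s\_k}_{it}
  + \beta_3\,\text{log\_n\_gd}_{it}
  + \beta_4\,\text{log\_hcap}_{it}
  + \beta_5\,\text{gov\_cons}_{it} \\
 & + \alpha_i + \lambda_t + \varepsilon_{it},
  \qquad i = 1,\dots,150,\quad t = 1,\dots,8 \\[6pt]
\ddot{x}_{it} ={}& x_{it} - \bar{x}_{i\cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot\cdot}
\end{aligned}
```

Here the second line is the _dm construction: the observed value minus its country mean, minus its time mean, plus the grand mean.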

Regressor construction from the raw inputs: log_s_k = log(s_k) (log investment share), log_n_gd = log(n_pop + 0.05) (log of population growth plus the standard 0.05 for growth + depreciation), and log_hcap = log(hcap) (log human capital).
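The three transformations above are simple enough to check by hand. The sketch below is a hypothetical Python helper (build_regressors is not part of the tutorial) that applies them to the reported minimum and maximum of each raw input. The inputs are pooled extremes used only to check the reported ranges of the log columns, not an actual row of the data.

```python
import math

G_PLUS_D = 0.05  # the "standard 0.05 for growth and depreciation" from the text

def build_regressors(s_k, n_pop, hcap):
    """Return the three log regressors for one country-period observation."""
    return {
        "log_s_k": math.log(s_k),
        "log_n_gd": math.log(n_pop + G_PLUS_D),
        "log_hcap": math.log(hcap),
    }

# Reported minima and maxima of the raw inputs (rounded as displayed)
low = build_regressors(s_k=0.0919, n_pop=0.005, hcap=1.02)
high = build_regressors(s_k=0.36, n_pop=0.0383, hcap=2.98)

for name in ("log_s_k", "log_n_gd"):
    print(f"{name}: {low[name]:.2f} to {high[name]:.2f}")
```

The printed ranges for log_s_k and log_n_gd agree with the minima and maxima listed for those columns in the statistics tables.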

The datasets

Each dataset below lists its full variable dictionary, followed by a statistics table with distribution summaries and data coverage.

source_data: country-year (period) · 1,200 rows × 11 columns · 8 periods (indexed 1-8) · 150 countries (balanced)

Panel key: id x time · The raw analysis panel: TWFE growth regression of growth on log initial income and four controls.
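Whether id × time really is a unique key, and whether the panel is balanced, can be verified after loading. The following is a minimal pandas sketch with a hypothetical helper name (check_panel_key) and toy frames standing in for the real file.

```python
import pandas as pd

def check_panel_key(df, unit="id", period="time"):
    """Return (is_unique, is_balanced) for a long-format panel."""
    # Unique key: no (unit, period) pair appears more than once
    is_unique = not df.duplicated([unit, period]).any()
    # Balanced: every unit is observed in every period
    n_cells = df[unit].nunique() * df[period].nunique()
    is_balanced = is_unique and len(df) == n_cells
    return is_unique, is_balanced

# Toy frames: a balanced 2 x 3 panel, and the same panel with one row missing
balanced = pd.DataFrame({"id": [1, 1, 1, 2, 2, 2], "time": [1, 2, 3, 1, 2, 3]})
unbalanced = balanced.iloc[:-1]

print(check_panel_key(balanced))
print(check_panel_key(unbalanced))
```

For source_data, a balanced result would correspond to 150 × 8 = 1,200 rows, matching the row count stated above.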

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|---|
| id | identifier | Country identifier | Sequential country index (the entity dimension of the panel; treated as a factor for the fixed effects). | 1..150, one per synthetic country. | integer code | Simulation | both files |
| time | identifier | Time period identifier | Sequential time-period index (the time dimension of the panel; treated as a factor for the fixed effects). | 1..8, one per period. | integer code | Simulation | both files |
| growth | continuous | GDP per capita growth (dependent variable) | Annualized GDP-per-capita growth rate; the outcome regressed in the TWFE model. | Simulated outcome of the Barro convergence data-generating process. | rate (per year) | Simulation | both files |
| ln_y_initial | continuous | Log initial income (convergence term) | Natural log of initial GDP per capita; the beta-convergence regressor (a negative slope means poorer countries grow faster). | Simulated log initial income. | log US$ | Simulation | both files |
| s_k | continuous | Investment share of GDP | Physical-capital investment share of GDP (Solow-style accumulation rate). | Simulated; log_s_k = log(s_k). | 0-1 (share) | Simulation | source_data only |
| n_pop | continuous | Population growth rate | Population growth rate (the n in the Solow n + g + d term). | Simulated; log_n_gd = log(n_pop + 0.05). | rate (per year) | Simulation | source_data only |
| hcap | continuous | Human capital index | Human-capital stock proxy (e.g., schooling-based index). | Simulated; log_hcap = log(hcap). | index | Simulation | source_data only |
| gov_cons | continuous | Government consumption share | Government consumption as a share of GDP; a control regressor. | Simulated. | 0-1 (share) | Simulation | both files |
| log_s_k | continuous | Log investment share | Natural log of the investment share of GDP (Solow capital-accumulation regressor). | log(s_k). | log share | Derived | both files |
| log_n_gd | continuous | Log of population growth + g + d | Natural log of population growth plus the standard 0.05 for growth and depreciation; the Solow n + g + d regressor. | log(n_pop + 0.05). | log rate | Derived | both files |
| log_hcap | continuous | Log human capital | Natural log of the human-capital index. | log(hcap). | log index | Derived | both files |

Distribution & statistics

| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
| id | | 100% | 1,200 | 150 | | | | | |
| time | | 100% | 1,200 | 8 | | | | | |
| growth | min -0.238; median -0.122; max -0.00398 | 100% | 1,200 | 1,200 | -0.238 | -0.124 | -0.122 | -0.004 | 0.045 |
| ln_y_initial | min 1.92; median 5.16; max 9.87 | 100% | 1,200 | 1,200 | 1.92 | 5.36 | 5.16 | 9.87 | 1.59 |
| s_k | min 0.0919; median 0.215; max 0.36 | 100% | 1,200 | 1,200 | 0.092 | 0.214 | 0.215 | 0.360 | 0.049 |
| n_pop | min 0.005; median 0.0202; max 0.0383 | 100% | 1,200 | 1,198 | 0.005 | 0.020 | 0.020 | 0.038 | 0.005 |
| hcap | min 1.02; median 1.98; max 2.98 | 100% | 1,200 | 1,200 | 1.02 | 1.98 | 1.98 | 2.98 | 0.347 |
| gov_cons | min 0.0704; median 0.145; max 0.22 | 100% | 1,200 | 1,200 | 0.070 | 0.146 | 0.145 | 0.220 | 0.028 |
| log_s_k | min -2.39; median -1.54; max -1.02 | 100% | 1,200 | 1,200 | -2.39 | -1.57 | -1.54 | -1.02 | 0.244 |
| log_n_gd | min -2.9; median -2.66; max -2.43 | 100% | 1,200 | 1,198 | -2.90 | -2.66 | -2.66 | -2.43 | 0.073 |
| log_hcap | min 0.0216; median 0.682; max 1.09 | 100% | 1,200 | 1,200 | 0.022 | 0.665 | 0.682 | 1.09 | 0.185 |

data_demeaned: country-year (period) · 1,200 rows × 14 columns · 8 periods (indexed 1-8) · 150 countries (balanced)

Panel key: id x time · OLS on the _dm columns reproduces the TWFE coefficients, demonstrating the FWL theorem.
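Once data_demeaned is loaded into a DataFrame, the regression on the _dm columns is a single least-squares call. The sketch below uses the column names from this dictionary but a made-up stand-in frame, so its numbers are illustrative only. The helper name ols_on_demeaned is hypothetical, and it omits the intercept on the assumption that every _dm column has mean approximately zero.

```python
import numpy as np
import pandas as pd

DM_REGRESSORS = ["ln_y_initial_dm", "log_s_k_dm", "log_n_gd_dm",
                 "log_hcap_dm", "gov_cons_dm"]

def ols_on_demeaned(df, y="growth_dm", xs=DM_REGRESSORS):
    """Slope coefficients from OLS of y on xs, without an intercept."""
    X = df[xs].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, df[y].to_numpy(dtype=float), rcond=None)
    return pd.Series(coef, index=xs)

# Stand-in frame with the same column names; values are invented for illustration
rng = np.random.default_rng(2)
demo = pd.DataFrame(rng.normal(size=(200, 5)), columns=DM_REGRESSORS)
demo["growth_dm"] = -0.05 * demo["ln_y_initial_dm"] + 0.01 * demo["gov_cons_dm"]

coefs = ols_on_demeaned(demo)  # recovers the slopes used to build growth_dm
```

With the real file, the same call would take pd.read_stata(BASE + "data_demeaned.dta") as its input, where BASE is the URL prefix from the loading snippets. According to the overview, the coefficient on ln_y_initial_dm should then come out at about −0.055286.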

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|---|
| id | identifier | Country identifier | Sequential country index (the entity dimension of the panel; treated as a factor for the fixed effects). | 1..150, one per synthetic country. | integer code | Simulation | both files |
| time | identifier | Time period identifier | Sequential time-period index (the time dimension of the panel; treated as a factor for the fixed effects). | 1..8, one per period. | integer code | Simulation | both files |
| growth | continuous | GDP per capita growth (dependent variable) | Annualized GDP-per-capita growth rate; the outcome regressed in the TWFE model. | Simulated outcome of the Barro convergence data-generating process. | rate (per year) | Simulation | both files |
| ln_y_initial | continuous | Log initial income (convergence term) | Natural log of initial GDP per capita; the beta-convergence regressor (a negative slope means poorer countries grow faster). | Simulated log initial income. | log US$ | Simulation | both files |
| log_s_k | continuous | Log investment share | Natural log of the investment share of GDP (Solow capital-accumulation regressor). | log(s_k). | log share | Derived | both files |
| log_n_gd | continuous | Log of population growth + g + d | Natural log of population growth plus the standard 0.05 for growth and depreciation; the Solow n + g + d regressor. | log(n_pop + 0.05). | log rate | Derived | both files |
| log_hcap | continuous | Log human capital | Natural log of the human-capital index. | log(hcap). | log index | Derived | both files |
| gov_cons | continuous | Government consumption share | Government consumption as a share of GDP; a control regressor. | Simulated. | 0-1 (share) | Simulation | both files |
| growth_dm | continuous | Demeaned GDP per capita growth | Two-way demeaned growth: deviation of growth from its country mean and time mean (grand mean added back). The within-variation that identifies the TWFE coefficient. | growth - country_mean(growth) - time_mean(growth) + grand_mean(growth); mean ≈ 0. | rate (deviation) | Derived (within transform) | data_demeaned only |
| ln_y_initial_dm | continuous | Demeaned log initial income | Two-way demeaned log initial income (deviation from country + time means, grand mean restored). | ln_y_initial - country mean - time mean + grand mean; mean ≈ 0. | log US$ (deviation) | Derived (within transform) | data_demeaned only |
| log_s_k_dm | continuous | Demeaned log investment share | Two-way demeaned log investment share (deviation from country + time means, grand mean restored). | log_s_k - country mean - time mean + grand mean; mean ≈ 0. | log share (deviation) | Derived (within transform) | data_demeaned only |
| log_n_gd_dm | continuous | Demeaned log(n + g + d) | Two-way demeaned log of population growth plus g + d (deviation from country + time means, grand mean restored). | log_n_gd - country mean - time mean + grand mean; mean ≈ 0. | log rate (deviation) | Derived (within transform) | data_demeaned only |
| log_hcap_dm | continuous | Demeaned log human capital | Two-way demeaned log human capital (deviation from country + time means, grand mean restored). | log_hcap - country mean - time mean + grand mean; mean ≈ 0. | log index (deviation) | Derived (within transform) | data_demeaned only |
| gov_cons_dm | continuous | Demeaned government consumption share | Two-way demeaned government consumption share (deviation from country + time means, grand mean restored). | gov_cons - country mean - time mean + grand mean; mean ≈ 0. | share (deviation) | Derived (within transform) | data_demeaned only |

Distribution & statistics

| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
| id | | 100% | 1,200 | 150 | | | | | |
| time | | 100% | 1,200 | 8 | | | | | |
| growth | min -0.238; median -0.122; max -0.00398 | 100% | 1,200 | 1,200 | -0.238 | -0.124 | -0.122 | -0.004 | 0.045 |
| ln_y_initial | min 1.92; median 5.16; max 9.87 | 100% | 1,200 | 1,200 | 1.92 | 5.36 | 5.16 | 9.87 | 1.59 |
| log_s_k | min -2.39; median -1.54; max -1.02 | 100% | 1,200 | 1,200 | -2.39 | -1.57 | -1.54 | -1.02 | 0.244 |
| log_n_gd | min -2.9; median -2.66; max -2.43 | 100% | 1,200 | 1,198 | -2.90 | -2.66 | -2.66 | -2.43 | 0.073 |
| log_hcap | min 0.0216; median 0.682; max 1.09 | 100% | 1,200 | 1,200 | 0.022 | 0.665 | 0.682 | 1.09 | 0.185 |
| gov_cons | min 0.0704; median 0.145; max 0.22 | 100% | 1,200 | 1,200 | 0.070 | 0.146 | 0.145 | 0.220 | 0.028 |
| growth_dm | min -0.0772; median 6.88e-05; max 0.0908 | 100% | 1,200 | 1,200 | -0.077 | -8.09e-17 | 6.88e-05 | 0.091 | 0.023 |
| ln_y_initial_dm | min -0.69; median -0.000416; max 0.574 | 100% | 1,200 | 1,200 | -0.690 | 8.30e-15 | -4.16e-04 | 0.574 | 0.164 |
| log_s_k_dm | min -0.421; median 0.0029; max 0.377 | 100% | 1,200 | 1,200 | -0.421 | -1.49e-15 | 0.003 | 0.377 | 0.087 |
| log_n_gd_dm | min -0.124; median 0.00164; max 0.125 | 100% | 1,200 | 1,200 | -0.124 | 1.60e-15 | 0.002 | 0.125 | 0.033 |
| log_hcap_dm | min -0.337; median -4.09e-05; max 0.177 | 100% | 1,200 | 1,200 | -0.337 | 5.38e-17 | -4.09e-05 | 0.177 | 0.044 |
| gov_cons_dm | min -0.0507; median -5.08e-05; max 0.0404 | 100% | 1,200 | 1,200 | -0.051 | 1.83e-16 | -5.08e-05 | 0.040 | 0.014 |

Known limitations & caveats