Interactive data dictionary

Dynamic Panel Data Models: Employment Persistence

The classic Arellano–Bond panel of 140 UK manufacturing firms (1976–1984), with the lagged/differenced analysis variables.

2 datasets · 20 variables · 140 firms · years 1976–1984

Downloads

Each dataset is available as a labeled Stata .dta and its source file.

Download all data (ZIP) · stata_codebook.do

Dataset | Grain | Rows × columns | Stata | Source
abdata | firm-year | 1,031 × 10 | abdata.dta | abdata.csv
data_prepared | firm-year | 1,031 × 20 | data_prepared.dta | data_prepared.csv

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_dynamic_panel/data/"
use "${BASE}abdata.dta", clear
describe
notes

Python

!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_dynamic_panel/data/"
df = pd.read_stata(BASE + "abdata.dta")

# load every dataset at once
files = ["abdata", "data_prepared"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "abdata.dta", "abdata.dta")
df, meta = pyreadstat.read_dta("abdata.dta")

Copy and paste this snippet into a Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb
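The variable labels stored in the .dta files can also be read with plain pandas, without pyreadstat. This is a minimal sketch, assuming a recent pandas in which `pd.read_stata(..., iterator=True)` returns a `StataReader` that works as a context manager. The helper `read_variable_labels` is hypothetical, and the toy file and its values are made up so the example runs offline.

```python
import os
import tempfile

import pandas as pd


def read_variable_labels(path):
    """Return {variable name: variable label} from a Stata .dta file."""
    # With iterator=True, read_stata returns a StataReader; its
    # variable_labels() method reads the labels from the file header.
    with pd.read_stata(path, iterator=True) as reader:
        return reader.variable_labels()


# Offline demo: write a tiny labeled .dta file (made-up values), then read it back.
toy = pd.DataFrame({"id": [1, 1, 2], "year": [1976, 1977, 1976], "n": [0.5, 0.6, 1.2]})
labels = {"id": "Firm identifier", "year": "Calendar year", "n": "Log employment"}
path = os.path.join(tempfile.mkdtemp(), "toy.dta")
toy.to_stata(path, write_index=False, variable_labels=labels)

print(read_variable_labels(path))
```

Passing `BASE + "abdata.dta"` instead of the toy path should return the labels listed in the dictionaries below, although reading a URL through the iterator is not tested here.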

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_dynamic_panel/data/"
df <- read_dta(paste0(BASE, "abdata.dta"))

Overview & sources

Companion data for a hands-on Python tutorial that estimates how persistent firm-level employment is, measured by the autoregressive coefficient ρ of a dynamic labor-demand equation. It uses the canonical Arellano and Bond (1991) panel of 140 UK manufacturing firms observed annually over 1976–1984 (1,031 firm-years, unbalanced). The tutorial walks the full estimator ladder: pooled OLS (biased upward by the omitted firm effect) and fixed effects (biased downward by Nickell bias) via pyfixest, then Anderson–Hsiao IV, Arellano–Bond difference GMM, and Blundell–Bond system GMM via pydynpd. The AR(2), Hansen, and instrument-collapse diagnostics are what separate the one defensible estimate (system GMM, ρ̂ = 0.927) from the four plausible-looking but wrong ones. This is the classic dynamic-panel teaching dataset, used by Arellano & Bond (1991), Blundell & Bond (1998), and Roodman (2009).
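The two bias directions named above can be seen in a small simulation. This is a minimal sketch on made-up data, not the tutorial's code: it assumes a pure AR(1) for log employment with a normal firm effect, no wage or capital terms, ρ = 0.5, and 9 years per firm (matching the 1976–1984 window). Pooled OLS lets the firm effect load onto the lagged outcome, pushing ρ̂ up. The within (fixed-effects) estimator demeans by each firm's own mean, which is correlated with the shocks when T is short, pushing ρ̂ down.

```python
import numpy as np


def simulate_panel(n_firms=2000, n_years=9, rho=0.5, seed=0):
    """Simulate n_it = rho * n_i,t-1 + alpha_i + eps_it (made-up data)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n_firms)  # firm effect alpha_i
    n = np.empty((n_firms, n_years))
    # start each firm at its stationary distribution
    n[:, 0] = alpha / (1 - rho) + rng.normal(size=n_firms) / np.sqrt(1 - rho**2)
    for t in range(1, n_years):
        n[:, t] = rho * n[:, t - 1] + alpha + rng.normal(size=n_firms)
    return n  # shape (firms, years)


def pooled_ols_rho(n):
    """OLS slope of n_t on n_t-1 with one common intercept (firm effect ignored)."""
    lag, cur = n[:, :-1].ravel(), n[:, 1:].ravel()
    lag_dm, cur_dm = lag - lag.mean(), cur - cur.mean()
    return float(lag_dm @ cur_dm / (lag_dm @ lag_dm))


def within_rho(n):
    """Fixed-effects slope: demean n_t and n_t-1 within each firm first."""
    lag, cur = n[:, :-1], n[:, 1:]
    lag_dm = lag - lag.mean(axis=1, keepdims=True)
    cur_dm = cur - cur.mean(axis=1, keepdims=True)
    return float((lag_dm * cur_dm).sum() / (lag_dm**2).sum())


n_panel = simulate_panel()
print(f"true rho 0.5 | pooled OLS {pooled_ols_rho(n_panel):.3f} | "
      f"fixed effects {within_rho(n_panel):.3f}")
```

With these settings pooled OLS lands well above 0.5 and the within estimator well below it; the exact values depend on the seed.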

Two files. abdata is the raw input panel (one row per firm × year, unbalanced over 1976–1984), carrying employment, wages, capital, and industry output in both levels and logs. data_prepared is the analysis sample: the same panel plus the one-period lags, the two-period lag of employment, and the first differences (computed firm by firm, respecting firm boundaries) that every estimator runs on. Because each firm loses its first year to the one-period lag, the usable sample falls from 1,031 to 891 rows (1,031 − 140); the GMM estimators, which work with the lagged difference and therefore need two prior years, run on 751 (891 − 140).
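The row counts follow from counting, for each firm, how many rows have k earlier observations. A minimal pandas sketch, assuming lags are taken positionally within firm (as `groupby().shift()` does); `rows_with_k_lags` and the toy panel are hypothetical.

```python
import pandas as pd


def rows_with_k_lags(panel, k, id_col="id", time_col="year"):
    """Count rows that have at least k earlier rows for the same firm."""
    ordered = panel.sort_values([id_col, time_col])
    # cumcount() numbers each firm's rows 0, 1, 2, ... in time order,
    # so a row has k earlier rows exactly when its position is >= k.
    position = ordered.groupby(id_col).cumcount()
    return int((position >= k).sum())


# Toy unbalanced panel (hypothetical): firms with 4, 3 and 2 years.
toy = pd.DataFrame({
    "id":   [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "year": [1976, 1977, 1978, 1979, 1977, 1978, 1979, 1978, 1979],
})
print([rows_with_k_lags(toy, k) for k in (0, 1, 2)])  # → [9, 6, 3]
```

Each firm with at least k years loses exactly k rows, which is why the figures above reduce to 1,031 − 140 and 1,031 − 2 × 140.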

Data sources

Source | Provides | Reference / URL
Arellano & Bond (1991) | The original UK manufacturing employment panel (140 firms, 1976–1984; the abdata dataset) | Arellano, M., & Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. Review of Economic Studies, 58(2), 277–297. https://doi.org/10.2307/2297968
pydynpd (Wu, Hua & Xu 2023) | Distribution of the dataset (bundled as abdata) and the published replication benchmark | Wu, D., Hua, L., & Xu, J. (2023). pydynpd: A Python package for dynamic panel model. Journal of Open Source Software, 8(83), 4416. https://doi.org/10.21105/joss.04416 · https://github.com/dazhwu/pydynpd
Method references | Estimators and concepts | Anderson & Hsiao (1981); Blundell & Bond (1998); Bond (2002); Roodman (2009); Windmeijer (2005); Nickell (1981).

Cite this data

Please cite this dataset as follows.

APA

Mendez, C. (2026). Dynamic Panel Data Models in Python: From Nickell Bias to System GMM [Data set]. https://carlos-mendez.org/post/python_dynamic_panel/

Arellano, M., & Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. Review of Economic Studies, 58(2), 277–297. https://doi.org/10.2307/2297968

BibTeX

@misc{mendez2026pythondynamicpanel,
  author       = {Mendez, Carlos},
  title        = {Dynamic Panel Data Models in Python: From Nickell Bias to System GMM},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/python_dynamic_panel/}},
  note         = {Data set}
}

@article{arellano1991some,
  author  = {Arellano, Manuel and Bond, Stephen},
  title   = {Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations},
  journal = {Review of Economic Studies},
  volume  = {58}, number = {2}, pages = {277--297}, year = {1991},
  doi     = {10.2307/2297968}
}

Variable explorer: all 20 variables


Variable | Type | Distribution | Label | Definition | Units | In files | Source
cap | continuous | min 0.0119, median 0.518, max 47.1 | Gross capital stock (level) | Firm gross capital stock (the level behind log capital k). | index/level | abdata, data_prepared | Arellano & Bond (1991)
d_k | continuous | min -1.08, median -0.0445, max 0.884 | First difference of log capital stock | Year-on-year change in the log capital stock. | log change | data_prepared | Derived (this study)
d_k_lag1 | continuous | min -1.08, median -0.0301, max 0.884 | Lagged first difference of log capital stock | Last year's change in the log capital stock; a control in the differenced equation. | log change | data_prepared | Derived (this study)
d_n | continuous | min -0.997, median -0.0251, max 0.806 | First difference of log employment | Year-on-year change in log employment; the dependent variable of the differenced equation. | log change | data_prepared | Derived (this study)
d_n_lag1 | continuous | min -0.997, median -0.0191, max 0.806 | Lagged first difference of log employment | Last year's change in log employment; the endogenous regressor in the Anderson–Hsiao 2SLS. | log change | data_prepared | Derived (this study)
d_w | continuous | min -0.675, median 0.00479, max 0.924 | First difference of log real wage | Year-on-year change in the log real wage. | log change | data_prepared | Derived (this study)
d_w_lag1 | continuous | min -0.675, median 0.000759, max 0.924 | Lagged first difference of log real wage | Last year's change in the log real wage; a control in the differenced equation. | log change | data_prepared | Derived (this study)
emp | continuous | min 0.104, median 2.29, max 109 | Employment (level) | Firm employment in thousands (the level behind log employment n). | thousands of employees | abdata, data_prepared | Arellano & Bond (1991)
id | identifier | | Firm identifier | Sequential firm (panel unit) identifier; 140 UK manufacturing firms. | integer | abdata, data_prepared | Arellano & Bond (1991)
indoutpt | continuous | min 86.9, median 101, max 128 | Industry output (level) | Industry-level output for the firm's sector (the level behind log industry output ys). | index/level | abdata, data_prepared | Arellano & Bond (1991)
k | continuous | min -4.43, median -0.658, max 3.85 | Log capital stock | Natural log of the firm gross capital stock; a current control. | log level | abdata, data_prepared | Arellano & Bond (1991)
k_lag1 | continuous | min -4.43, median -0.631, max 3.85 | Log capital stock, one-period lag | Last year's log capital stock; a lagged control in the levels equation. | log level | data_prepared | Derived (this study)
n | continuous | min -2.26, median 0.827, max 4.69 | Log employment | Natural log of firm employment; the dependent variable of the dynamic model. | log thousands | abdata, data_prepared | Arellano & Bond (1991)
n_lag1 | continuous | min -2.1, median 0.857, max 4.69 | Log employment, one-period lag | Last year's log employment; the lagged dependent variable carrying the persistence ρ (labeled L1.n in GMM output). | log thousands | data_prepared | Derived (this study)
n_lag2 | continuous | min -2.08, median 0.882, max 4.69 | Log employment, two-period lag | Log employment two years ago; the Anderson–Hsiao instrument for the differenced lag. | log thousands | data_prepared | Derived (this study)
w | continuous | min 2.08, median 3.18, max 3.81 | Log real wage | Natural log of the firm real wage; a current control. | log level | abdata, data_prepared | Arellano & Bond (1991)
w_lag1 | continuous | min 2.08, median 3.17, max 3.81 | Log real wage, one-period lag | Last year's log real wage; a lagged control in the levels equation. | log level | data_prepared | Derived (this study)
wage | continuous | min 8.02, median 24, max 45.2 | Real wage (level) | Firm real product wage (the level behind log wage w). | index/level | abdata, data_prepared | Arellano & Bond (1991)
year | year | | Calendar year | Annual time index of the observation. | year | abdata, data_prepared | Arellano & Bond (1991)
ys | continuous | min 4.46, median 4.61, max 4.85 | Log industry output | Natural log of industry output for the firm's sector (auxiliary variable). | log level | abdata, data_prepared | Arellano & Bond (1991)

Cross-file variable index

Which file each variable appears in (● = present).

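Variable | abdata | data_prepared
cap | ● | ●
d_k | | ●
d_k_lag1 | | ●
d_n | | ●
d_n_lag1 | | ●
d_w | | ●
d_w_lag1 | | ●
emp | ● | ●
id | ● | ●
indoutpt | ● | ●
k | ● | ●
k_lag1 | | ●
n | ● | ●
n_lag1 | | ●
n_lag2 | | ●
w | ● | ●
w_lag1 | | ●
wage | ● | ●
year | ● | ●
ys | ● | ●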
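The presence index can be rebuilt from the column lists of the two files. This is a minimal sketch: `cross_file_index` is a hypothetical helper, and the column lists below are typed in from this dictionary rather than read from the files.

```python
import pandas as pd


def cross_file_index(files):
    """Variables x files table with '●' where a variable is present in a file."""
    present = {name: set(cols) for name, cols in files.items()}
    variables = sorted(set().union(*present.values()))
    return pd.DataFrame(
        {name: ["●" if v in cols else "" for v in variables]
         for name, cols in present.items()},
        index=pd.Index(variables, name="Variable"),
    )


# Column lists typed in from this dictionary (not read from the files).
abdata_cols = ["id", "year", "emp", "wage", "cap", "indoutpt", "n", "w", "k", "ys"]
derived_cols = ["n_lag1", "w_lag1", "k_lag1", "n_lag2", "d_n", "d_w", "d_k",
                "d_n_lag1", "d_w_lag1", "d_k_lag1"]
presence = cross_file_index({"abdata": abdata_cols,
                             "data_prepared": abdata_cols + derived_cols})
print(presence)
```

With the `data` dictionary from the Python loading snippet, `cross_file_index({name: frame.columns for name, frame in data.items()})` should give the same table.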

Construction & formulas

The model is a dynamic labor-demand equation for log employment n, with current and lagged log real wages w and log capital k, a firm fixed effect α_i, year effects δ_t, and an idiosyncratic shock ε_it:
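The equation itself is missing from this copy of the page. Below is a reconstruction consistent with the variable list (one lag each of n, w, and k). The coefficient names β₁ to β₄ are assumptions, and the tutorial's exact specification may differ. First-differencing removes α_i, which gives the differenced equation built from the d_ columns (d_n = Δn_it, d_n_lag1 = Δn_{i,t-1}, and so on). If ε_it is serially uncorrelated, the level n_{i,t-2} (n_lag2) is a valid Anderson–Hsiao instrument for Δn_{i,t-1}.

```latex
\begin{aligned}
n_{it} &= \rho\, n_{i,t-1} + \beta_1 w_{it} + \beta_2 w_{i,t-1} + \beta_3 k_{it} + \beta_4 k_{i,t-1} + \alpha_i + \delta_t + \varepsilon_{it},\\
\Delta n_{it} &= \rho\, \Delta n_{i,t-1} + \beta_1 \Delta w_{it} + \beta_2 \Delta w_{i,t-1} + \beta_3 \Delta k_{it} + \beta_4 \Delta k_{i,t-1} + \Delta\delta_t + \Delta\varepsilon_{it},\\
\mathbb{E}\left[\, n_{i,t-2}\, \Delta\varepsilon_{it} \right] &= 0 .
\end{aligned}
```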

All lags and differences are computed within firm id after sorting by [id, year], so no firm inherits a lag from another firm. As a result, every firm's first observed year has missing one-period lags and differences, and its first two years have missing n_lag2 and lagged differences.
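The construction rule can be sketched in pandas, following the `groupby('id')[...].shift()` / `.diff()` recipe listed in the Construction column of data_prepared. The function name `add_lags_and_diffs` and the toy panel are hypothetical. The toy rows are deliberately unsorted so the sort step matters.

```python
import pandas as pd


def add_lags_and_diffs(panel):
    """Add data_prepared-style lag and difference columns, computed within firm."""
    # Sort first: shift()/diff() work on row order inside each firm group.
    out = panel.sort_values(["id", "year"]).reset_index(drop=True)
    by_firm = out.groupby("id")
    for v in ["n", "w", "k"]:
        out[f"{v}_lag1"] = by_firm[v].shift(1)  # v_i,t-1 (previous row of same firm)
        out[f"d_{v}"] = by_firm[v].diff()       # v_it - v_i,t-1
    out["n_lag2"] = by_firm["n"].shift(2)       # n_i,t-2, the Anderson-Hsiao instrument
    # Regroup so the new d_ columns are visible to the grouped object.
    by_firm = out.groupby("id")
    for v in ["n", "w", "k"]:
        out[f"d_{v}_lag1"] = by_firm[f"d_{v}"].shift(1)
    # Note: shift() is positional, so a firm with a gap in its years would get
    # its previous *observed* year here, not necessarily t-1.
    return out


# Toy panel (made-up values), deliberately unsorted, with two firms.
toy = pd.DataFrame({
    "id":   [2, 1, 1, 2, 1],
    "year": [1977, 1978, 1976, 1978, 1977],
    "n":    [1.0, 0.3, 0.1, 1.5, 0.2],
    "w":    [3.0, 3.2, 3.1, 3.3, 3.0],
    "k":    [-0.5, -0.4, -0.6, -0.3, -0.5],
})
prepared = add_lags_and_diffs(toy)
print(prepared[["id", "year", "n", "n_lag1", "n_lag2", "d_n", "d_n_lag1"]])
```

Firm 2's first row gets a missing n_lag1 rather than firm 1's last value, which is exactly the boundary rule stated above.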

The datasets

Each dataset below comes with its full variable dictionary plus a statistics table with distribution summaries and data coverage.


abdata: firm-year, 1,031 × 10 · 1976–1984 · 140 firms (unbalanced; 1,031 firm-years)

Panel key: id × year · Raw input panel for the dynamic labor-demand estimators.

Variable dictionary

Variable | Type | Label | Definition | Construction | Units | Source | Coverage
id | identifier | Firm identifier | Sequential firm (panel unit) identifier; 140 UK manufacturing firms. | From the Arellano–Bond panel; the panel id passed to estimators as [id, year]. | integer | Arellano & Bond (1991) | both files
year | year | Calendar year | Annual time index of the observation. | From the Arellano–Bond panel; range 1976–1984. | year | Arellano & Bond (1991) | both files
emp | continuous | Employment (level) | Firm employment in thousands (the level behind log employment n). | Raw level; n = log(emp). | thousands of employees | Arellano & Bond (1991) | both files
wage | continuous | Real wage (level) | Firm real product wage (the level behind log wage w). | Raw level; w = log(wage). | index/level | Arellano & Bond (1991) | both files
cap | continuous | Gross capital stock (level) | Firm gross capital stock (the level behind log capital k). | Raw level; k = log(cap). | index/level | Arellano & Bond (1991) | both files
indoutpt | continuous | Industry output (level) | Industry-level output for the firm's sector (the level behind log industry output ys). | Raw level; ys = log(indoutpt). | index/level | Arellano & Bond (1991) | both files
n | continuous | Log employment | Natural log of firm employment; the dependent variable of the dynamic model. | log(emp) | log thousands | Arellano & Bond (1991) | both files
w | continuous | Log real wage | Natural log of the firm real wage; a current control. | log(wage) | log level | Arellano & Bond (1991) | both files
k | continuous | Log capital stock | Natural log of the firm gross capital stock; a current control. | log(cap) | log level | Arellano & Bond (1991) | both files
ys | continuous | Log industry output | Natural log of industry output for the firm's sector (auxiliary variable). | log(indoutpt) | log level | Arellano & Bond (1991) | both files
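The Construction column says each log variable is the natural log of its level, which is easy to verify after loading. This is a sketch only: `check_log_columns` is hypothetical, and the tolerance `atol=1e-5` is an assumption meant to absorb rounding from how floats are stored in the .dta file. The toy values are made up.

```python
import numpy as np
import pandas as pd

# level column -> log column, as listed in the Construction column above
LOG_PAIRS = {"emp": "n", "wage": "w", "cap": "k", "indoutpt": "ys"}


def check_log_columns(df, pairs=LOG_PAIRS, atol=1e-5):
    """Return the log columns that differ from the natural log of their level."""
    mismatched = []
    for level_col, log_col in pairs.items():
        if not np.allclose(np.log(df[level_col]), df[log_col], atol=atol):
            mismatched.append(log_col)
    return mismatched


# Toy frame with made-up positive levels and their exact logs.
toy = pd.DataFrame({"emp": [0.5, 2.0, 40.0], "wage": [10.0, 24.0, 40.0],
                    "cap": [0.02, 0.5, 20.0], "indoutpt": [90.0, 100.0, 120.0]})
for level_col, log_col in LOG_PAIRS.items():
    toy[log_col] = np.log(toy[level_col])

print(check_log_columns(toy))  # → []
```

On the real abdata frame an empty list would confirm the stated construction; whether the stored values pass at this exact tolerance is untested here.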

Distribution & statistics

Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD
id | 100% | 1,031 | 140 | | | | |
year | 100% | 1,031 | 9 | 1976 | 1979.7 | 1980 | 1984 | 2.22
emp | 100% | 1,031 | 955 | 0.104 | 7.89 | 2.29 | 108.6 | 15.93
wage | 100% | 1,031 | 1,029 | 8.02 | 23.92 | 24.01 | 45.23 | 5.65
cap | 100% | 1,031 | 1,001 | 0.012 | 2.51 | 0.518 | 47.11 | 6.25
indoutpt | 100% | 1,031 | 330 | 86.90 | 103.8 | 100.6 | 128.4 | 9.94
n | 100% | 1,031 | 955 | -2.26 | 1.06 | 0.827 | 4.69 | 1.34
w | 100% | 1,031 | 1,029 | 2.08 | 3.14 | 3.18 | 3.81 | 0.263
k | 100% | 1,031 | 1,001 | -4.43 | -0.442 | -0.658 | 3.85 | 1.51
ys | 100% | 1,031 | 330 | 4.46 | 4.64 | 4.61 | 4.85 | 0.094
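A table like the one above can be recomputed with pandas. This is a sketch, not the generator behind this page. It assumes Coverage is the non-missing share, Distinct ignores missing values, and SD is the sample standard deviation (pandas' default `ddof=1`); the page does not say which conventions it uses. `summary_table` is a hypothetical helper.

```python
import pandas as pd


def summary_table(df):
    """Coverage, N, Distinct, Min, Mean, Median, Max and SD for each column."""
    rows = {}
    for col in df.columns:
        s = df[col]
        rows[col] = {
            "Coverage": s.notna().mean(),  # share of rows that are non-missing
            "N": int(s.notna().sum()),
            "Distinct": int(s.nunique()),  # nunique() skips missing values
            "Min": s.min(),
            "Mean": s.mean(),
            "Median": s.median(),
            "Max": s.max(),
            "SD": s.std(),                 # sample SD (pandas default ddof=1)
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy columns (made-up values): one with a missing entry, one complete.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, None], "g": [1, 1, 2, 2]})
table = summary_table(toy)
print(table.round(3))
```

Applied to data_prepared, Coverage for n_lag1 should come out at 891 / 1,031 ≈ 0.86, the 86% shown in the next table.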

data_prepared: firm-year, 1,031 × 20 · 1976–1984 · 140 firms (1,031 rows; one-period lags and differences are missing in each firm's first year, while n_lag2 and the lagged differences are missing in its first two years)

Panel key: id × year · Lags and differences are built once, identically, so every estimator runs on the same transformed variables.

Variable dictionary

Variable | Type | Label | Definition | Construction | Units | Source | Coverage
id | identifier | Firm identifier | Sequential firm (panel unit) identifier; 140 UK manufacturing firms. | From the Arellano–Bond panel; the panel id passed to estimators as [id, year]. | integer | Arellano & Bond (1991) | both files
year | year | Calendar year | Annual time index of the observation. | From the Arellano–Bond panel; range 1976–1984. | year | Arellano & Bond (1991) | both files
emp | continuous | Employment (level) | Firm employment in thousands (the level behind log employment n). | Raw level; n = log(emp). | thousands of employees | Arellano & Bond (1991) | both files
wage | continuous | Real wage (level) | Firm real product wage (the level behind log wage w). | Raw level; w = log(wage). | index/level | Arellano & Bond (1991) | both files
cap | continuous | Gross capital stock (level) | Firm gross capital stock (the level behind log capital k). | Raw level; k = log(cap). | index/level | Arellano & Bond (1991) | both files
indoutpt | continuous | Industry output (level) | Industry-level output for the firm's sector (the level behind log industry output ys). | Raw level; ys = log(indoutpt). | index/level | Arellano & Bond (1991) | both files
n | continuous | Log employment | Natural log of firm employment; the dependent variable of the dynamic model. | log(emp) | log thousands | Arellano & Bond (1991) | both files
w | continuous | Log real wage | Natural log of the firm real wage; a current control. | log(wage) | log level | Arellano & Bond (1991) | both files
k | continuous | Log capital stock | Natural log of the firm gross capital stock; a current control. | log(cap) | log level | Arellano & Bond (1991) | both files
ys | continuous | Log industry output | Natural log of industry output for the firm's sector (auxiliary variable). | log(indoutpt) | log level | Arellano & Bond (1991) | both files
n_lag1 | continuous | Log employment, one-period lag | Last year's log employment; the lagged dependent variable carrying the persistence ρ (labeled L1.n in GMM output). | Within firm id: n_i,t-1 = groupby('id')['n'].shift(1) | log thousands | Derived (this study) | data_prepared
w_lag1 | continuous | Log real wage, one-period lag | Last year's log real wage; a lagged control in the levels equation. | Within firm id: w_i,t-1 = groupby('id')['w'].shift(1) | log level | Derived (this study) | data_prepared
k_lag1 | continuous | Log capital stock, one-period lag | Last year's log capital stock; a lagged control in the levels equation. | Within firm id: k_i,t-1 = groupby('id')['k'].shift(1) | log level | Derived (this study) | data_prepared
n_lag2 | continuous | Log employment, two-period lag | Log employment two years ago; the Anderson–Hsiao instrument for the differenced lag. | Within firm id: n_i,t-2 = groupby('id')['n'].shift(2) | log thousands | Derived (this study) | data_prepared
d_n | continuous | First difference of log employment | Year-on-year change in log employment; the dependent variable of the differenced equation. | Within firm id: n_it - n_i,t-1 = groupby('id')['n'].diff() | log change | Derived (this study) | data_prepared
d_w | continuous | First difference of log real wage | Year-on-year change in the log real wage. | Within firm id: w_it - w_i,t-1 = groupby('id')['w'].diff() | log change | Derived (this study) | data_prepared
d_k | continuous | First difference of log capital stock | Year-on-year change in the log capital stock. | Within firm id: k_it - k_i,t-1 = groupby('id')['k'].diff() | log change | Derived (this study) | data_prepared
d_n_lag1 | continuous | Lagged first difference of log employment | Last year's change in log employment; the endogenous regressor in the Anderson–Hsiao 2SLS. | Within firm id: d_n_i,t-1 = groupby('id')['d_n'].shift(1) | log change | Derived (this study) | data_prepared
d_w_lag1 | continuous | Lagged first difference of log real wage | Last year's change in the log real wage; a control in the differenced equation. | Within firm id: d_w_i,t-1 = groupby('id')['d_w'].shift(1) | log change | Derived (this study) | data_prepared
d_k_lag1 | continuous | Lagged first difference of log capital stock | Last year's change in the log capital stock; a control in the differenced equation. | Within firm id: d_k_i,t-1 = groupby('id')['d_k'].shift(1) | log change | Derived (this study) | data_prepared
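The Anderson–Hsiao step named in this dictionary (d_n_lag1 instrumented by n_lag2) reduces, without controls, to a just-identified IV slope cov(z, y) / cov(z, x). This is a minimal sketch on simulated data. It drops the wage and capital controls and year effects the tutorial includes, assumes ρ = 0.5 and serially uncorrelated shocks, and uses the hypothetical helpers `simulated_panel` and `anderson_hsiao_rho`. The column names mirror data_prepared.

```python
import numpy as np
import pandas as pd


def simulated_panel(n_firms=5000, n_years=9, rho=0.5, seed=1):
    """Toy AR(1) panel with firm effects in long id/year format (made-up data)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n_firms)  # firm effect alpha_i
    n = np.empty((n_firms, n_years))
    # start each firm at its stationary distribution
    n[:, 0] = alpha / (1 - rho) + rng.normal(size=n_firms) / np.sqrt(1 - rho**2)
    for t in range(1, n_years):
        n[:, t] = rho * n[:, t - 1] + alpha + rng.normal(size=n_firms)
    return pd.DataFrame({
        "id": np.repeat(np.arange(1, n_firms + 1), n_years),
        "year": np.tile(np.arange(1976, 1976 + n_years), n_firms),
        "n": n.ravel(),  # row-major: firm 1's years first, then firm 2's, ...
    })


def anderson_hsiao_rho(df, y="d_n", x="d_n_lag1", z="n_lag2"):
    """Just-identified IV slope of y on x (with intercept), using z as instrument."""
    d = df[[y, x, z]].dropna()  # rows where the instrument exists
    z_dev = d[z] - d[z].mean()
    # IV slope = cov(z, y) / cov(z, x)
    return float((z_dev * d[y]).sum() / (z_dev * d[x]).sum())


# The panel is already sorted by [id, year], so within-firm lags are safe.
panel = simulated_panel()
by_firm = panel.groupby("id")["n"]
panel["n_lag2"] = by_firm.shift(2)
panel["d_n"] = by_firm.diff()
panel["d_n_lag1"] = panel.groupby("id")["d_n"].shift(1)
rho_ah = anderson_hsiao_rho(panel)
print(f"Anderson-Hsiao estimate of rho: {rho_ah:.3f} (true value 0.5)")
```

On this toy panel the IV estimate should land near the true 0.5; the exact value depends on the seed. Each firm contributes its last 7 of 9 years, mirroring how the real sample shrinks to 751 rows.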

Distribution & statistics

Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD
id | 100% | 1,031 | 140 | | | | |
year | 100% | 1,031 | 9 | 1976 | 1979.7 | 1980 | 1984 | 2.22
emp | 100% | 1,031 | 955 | 0.104 | 7.89 | 2.29 | 108.6 | 15.93
wage | 100% | 1,031 | 1,029 | 8.02 | 23.92 | 24.01 | 45.23 | 5.65
cap | 100% | 1,031 | 1,001 | 0.012 | 2.51 | 0.518 | 47.11 | 6.25
indoutpt | 100% | 1,031 | 330 | 86.90 | 103.8 | 100.6 | 128.4 | 9.94
n | 100% | 1,031 | 955 | -2.26 | 1.06 | 0.827 | 4.69 | 1.34
w | 100% | 1,031 | 1,029 | 2.08 | 3.14 | 3.18 | 3.81 | 0.263
k | 100% | 1,031 | 1,001 | -4.43 | -0.442 | -0.658 | 3.85 | 1.51
ys | 100% | 1,031 | 330 | 4.46 | 4.64 | 4.61 | 4.85 | 0.094
n_lag1 | 86% | 891 | 832 | -2.10 | 1.08 | 0.857 | 4.69 | 1.34
w_lag1 | 86% | 891 | 889 | 2.08 | 3.13 | 3.17 | 3.81 | 0.264
k_lag1 | 86% | 891 | 866 | -4.43 | -0.413 | -0.631 | 3.85 | 1.50
n_lag2 | 73% | 751 | 702 | -2.08 | 1.11 | 0.882 | 4.69 | 1.33
d_n | 86% | 891 | 886 | -0.997 | -0.044 | -0.025 | 0.806 | 0.138
d_w | 86% | 891 | 891 | -0.675 | 0.006 | 0.005 | 0.924 | 0.088
d_k | 86% | 891 | 891 | -1.08 | -0.036 | -0.045 | 0.884 | 0.162
d_n_lag1 | 73% | 751 | 746 | -0.997 | -0.038 | -0.019 | 0.806 | 0.140
d_w_lag1 | 73% | 751 | 751 | -0.675 | 0.002 | 7.59e-04 | 0.924 | 0.090
d_k_lag1 | 73% | 751 | 751 | -1.08 | -0.025 | -0.030 | 0.884 | 0.163

Known limitations & caveats