Interactive data dictionary

Visualizing Regression with the FWL Theorem in R

Three example datasets for the fwlplot tutorial: one simulated retail cross-section and two real-world datasets (a sample of flights and a wage panel).

3 datasets · 57 variables · 9,560 rows (200 / 5,000 / 4,360 rows per file)

Downloads

Each dataset is available as a labeled Stata .dta and its source file.

⇩ Download all data (ZIP) · stata_codebook.do

| Dataset | Grain | Rows × variables | Stata | Source |
|---|---|---|---|---|
| store_data | store (cross-section) | 200 × 4 | store_data.dta | store_data.csv |
| flights_sample | flight | 5,000 × 9 | flights_sample.dta | flights_sample.csv |
| wagepan | individual-year | 4,360 × 44 | wagepan.dta | wagepan.csv |

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_fwlplot/data/"
use "${BASE}store_data.dta", clear
describe
notes

Python

!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_fwlplot/data/"
df = pd.read_stata(BASE + "store_data.dta")

# load every dataset at once
files = ["store_data", "flights_sample", "wagepan"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "store_data.dta", "store_data.dta")
df, meta = pyreadstat.read_dta("store_data.dta")

To run this snippet, paste it into a new Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_fwlplot/data/"
df <- read_dta(paste0(BASE, "store_data.dta"))

Overview & sources

Companion data for a hands-on R tutorial on the fwlplot package (Butts & McDermott, 2024), which renders the Frisch–Waugh–Lovell (FWL) theorem as a picture: any multiple-regression coefficient equals the slope of a simple bivariate regression after partialling the other controls out of both axes. The post builds intuition across three datasets: an n=200 simulated retail cross-section where income confounds the coupon–sales relationship, the nycflights13 flights data (a 5,000-row cleaned sample), and Wooldridge's wagepan panel (545 individuals over 1980–1987). In the simulated case, confounding flips the sign of the coupon slope: the naive estimate is −0.093, while controlling for income gives +0.212 (true effect +0.2). Fixed effects on the flights and wage data show what "controlling for" looks like geometrically.

Three files. store_data is a simulated cross-section (one row per store, n=200) with sales, coupons, income and day-of-week. flights_sample is a 5,000-row sample of cleaned 2013 NYC departures (one row per flight) from the nycflights13 package. wagepan is a balanced wage panel (one row per individual × year; 545 individuals × 8 years = 4,360 rows, 1980–1987) from the wooldridge R package.

Data sources

| Source | Provides | Reference / URL |
|---|---|---|
| Simulated (this study) | store_data: a synthetic retail cross-section with a known confounder (income) and a known true coupon effect (+0.2) | Mendez, C. (2026). See the post's R script analysis.R for the full data-generating process (set.seed(42)). |
| nycflights13 | flights_sample: a 5,000-row cleaned sample of on-time departures from New York's three airports in 2013 | Wickham, H. (2021). nycflights13: Flights that Departed NYC in 2013. CRAN. https://cran.r-project.org/package=nycflights13 (source: US Bureau of Transportation Statistics). |
| Wooldridge wagepan | wagepan: panel of 545 men over 8 years (1980–1987) used in Wooldridge's panel-data examples | Wooldridge, J. M. Introductory Econometrics. wagepan dataset via the wooldridge R package. https://cran.r-project.org/package=wooldridge (originally from Vella & Verbeek, 1998, J. Applied Econometrics). |
| Method references | FWL theorem and the fwlplot / fixest implementation | Frisch & Waugh (1933); Lovell (1963); Butts & McDermott (2024, fwlplot); Berge (2018, fixest). |

Cite this data

Please cite this dataset as follows.

APA

Mendez, C. (2026). Visualizing Regression with the FWL Theorem in R [Data set]. https://carlos-mendez.org/post/r_fwlplot/

Butts, K., & McDermott, G. (2024). fwlplot: Scatter Plot After Residualizing. CRAN. https://cran.r-project.org/package=fwlplot

Frisch, R., & Waugh, F. V. (1933). Partial Time Regressions as Compared with Individual Trends. Econometrica, 1(4), 387–401.

Lovell, M. C. (1963). Seasonal Adjustment of Economic Time Series and Multiple Regression Analysis. Journal of the American Statistical Association, 58(304), 993–1010.

Wickham, H. (2021). nycflights13: Flights that Departed NYC in 2013. CRAN.

BibTeX

@misc{mendez2026rfwlplot,
  author       = {Mendez, Carlos},
  title        = {Visualizing Regression with the FWL Theorem in R},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/r_fwlplot/}},
  note         = {Data set}
}

@misc{butts2024fwlplot,
  author = {Butts, Kyle and McDermott, Grant},
  title  = {fwlplot: Scatter Plot After Residualizing},
  year   = {2024}, howpublished = {CRAN}, note = {R package},
  url    = {https://cran.r-project.org/package=fwlplot}
}
@article{frisch1933partial,
  author  = {Frisch, Ragnar and Waugh, Frederick V.},
  title   = {Partial Time Regressions as Compared with Individual Trends},
  journal = {Econometrica}, volume = {1}, number = {4}, pages = {387--401}, year = {1933}
}
@article{lovell1963seasonal,
  author  = {Lovell, Michael C.},
  title   = {Seasonal Adjustment of Economic Time Series and Multiple Regression Analysis},
  journal = {Journal of the American Statistical Association},
  volume  = {58}, number = {304}, pages = {993--1010}, year = {1963}
}

Variable explorer

All 57 variables, listed alphabetically. The Distribution column gives min, median, and max for continuous variables and the share coded 1 for dummies.

| Variable | Type | Distribution | Label | Definition | Units | In files | Source |
|---|---|---|---|---|---|---|---|
| agric | dummy | share coded 1 = 0.032 | Industry: agriculture (1=yes) | 1 if employed in agriculture, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| air_time | continuous | min 22, median 130, max 650 | Air time (min) | Time in the air, in minutes (the regressor of interest in the flights example). | minutes | flights_sample | nycflights13 (US BTS) |
| arr_delay | continuous | min -66, median -6, max 166 | Arrival delay (min) | Arrival delay in minutes. | minutes | flights_sample | nycflights13 (US BTS) |
| black | dummy | share coded 1 = 0.116 | Race: Black (1=yes) | 1 if the individual is Black, else 0 (time-invariant). | 0/1 | wagepan | Wooldridge wagepan |
| bus | dummy | share coded 1 = 0.076 | Industry: business/repair services (1=yes) | 1 if employed in business and repair services, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| carrier | identifier | | Carrier code | Two-letter airline carrier code. | code | flights_sample | nycflights13 (US BTS) |
| construc | dummy | share coded 1 = 0.075 | Industry: construction (1=yes) | 1 if employed in construction, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| coupons | continuous | min 18.7, median 34.8, max 53.2 | Coupons distributed (treatment) | Number/intensity of coupons distributed (the regressor of interest). | count/index | store_data | Simulation (this study) |
| d81 | dummy | share coded 1 = 0.125 | Year dummy: 1981 (1=yes) | 1 if the observation year is 1981, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| d82 | dummy | share coded 1 = 0.125 | Year dummy: 1982 (1=yes) | 1 if the observation year is 1982, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| d83 | dummy | share coded 1 = 0.125 | Year dummy: 1983 (1=yes) | 1 if the observation year is 1983, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| d84 | dummy | share coded 1 = 0.125 | Year dummy: 1984 (1=yes) | 1 if the observation year is 1984, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| d85 | dummy | share coded 1 = 0.125 | Year dummy: 1985 (1=yes) | 1 if the observation year is 1985, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| d86 | dummy | share coded 1 = 0.125 | Year dummy: 1986 (1=yes) | 1 if the observation year is 1986, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| d87 | dummy | share coded 1 = 0.125 | Year dummy: 1987 (1=yes) | 1 if the observation year is 1987, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| day | identifier | | Day of month (1-31) | Calendar day of month of the scheduled departure. | 1-31 | flights_sample | nycflights13 (US BTS) |
| dayofweek | identifier | | Day of week (1-7) | Day-of-week indicator used as an additional control in §5.4. | 1-7 | store_data | Simulation (this study) |
| dep_delay | continuous | min -20, median -2, max 119 | Departure delay (min) | Departure delay in minutes (the outcome in the flights regressions). | minutes | flights_sample | nycflights13 (US BTS) |
| dest | identifier | | Destination airport (FE) | Destination airport code; used as a fixed effect alongside origin. | code | flights_sample | nycflights13 (US BTS) |
| educ | continuous | min 3, median 12, max 16 | Years of education | Years of schooling (time-invariant; drops out under individual FE). | years | wagepan | Wooldridge wagepan |
| ent | dummy | share coded 1 = 0.015 | Industry: entertainment (1=yes) | 1 if employed in entertainment, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| exper | continuous | min 0, median 6, max 18 | Labor-market experience (years) | Years of (potential) labor-market experience; the regressor of interest in §7. | years | wagepan | Wooldridge wagepan |
| expersq | continuous | min 0, median 36, max 324 | Experience squared | Square of labor-market experience (captures the concave wage-experience profile). | years^2 | wagepan | Wooldridge wagepan (derived) |
| fin | dummy | share coded 1 = 0.037 | Industry: finance (1=yes) | 1 if employed in finance, insurance, or real estate, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| hisp | dummy | share coded 1 = 0.156 | Ethnicity: Hispanic (1=yes) | 1 if the individual is Hispanic, else 0 (time-invariant). | 0/1 | wagepan | Wooldridge wagepan |
| hour | identifier | | Scheduled departure hour (0-23) | Scheduled departure hour (local). | 0-23 | flights_sample | nycflights13 (US BTS) |
| hours | continuous | min 120, median 2.08e+03, max 4.99e+03 | Annual hours worked | Annual hours worked. | hours/year | wagepan | Wooldridge wagepan |
| income | continuous | min 20.1, median 49.8, max 77 | Neighborhood income (confounder) | Neighborhood income level: the confounder that drives both coupons and sales. | index units | store_data | Simulation (this study) |
| lwage | continuous | min -3.58, median 1.67, max 4.05 | Log hourly wage | Natural log of the hourly wage (the outcome in the wage regressions). | log US$ | wagepan | Wooldridge wagepan |
| manuf | dummy | share coded 1 = 0.282 | Industry: manufacturing (1=yes) | 1 if employed in manufacturing, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| married | dummy | share coded 1 = 0.439 | Married (1=yes) | 1 if married, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| min | dummy | share coded 1 = 0.016 | Industry: mining (1=yes) | 1 if employed in mining, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| month | identifier | | Month of flight (1-12) | Calendar month of the scheduled departure. | 1-12 | flights_sample | nycflights13 (US BTS) |
| nr | identifier | | Person identifier | Unique individual identifier (the panel unit; used as the individual fixed effect). | id | wagepan | Wooldridge wagepan |
| nrthcen | dummy | share coded 1 = 0.258 | Region: North Central (1=yes) | 1 if resident of the North Central census region, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| nrtheast | dummy | share coded 1 = 0.190 | Region: Northeast (1=yes) | 1 if resident of the Northeast census region, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| occ1 | dummy | share coded 1 = 0.104 | Occupation group 1 (1=yes) | 1 if in occupation group 1, else 0 (occupational dummies occ1-occ9). | 0/1 | wagepan | Wooldridge wagepan |
| occ2 | dummy | share coded 1 = 0.092 | Occupation group 2 (1=yes) | 1 if in occupation group 2, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| occ3 | dummy | share coded 1 = 0.053 | Occupation group 3 (1=yes) | 1 if in occupation group 3, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| occ4 | dummy | share coded 1 = 0.111 | Occupation group 4 (1=yes) | 1 if in occupation group 4, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| occ5 | dummy | share coded 1 = 0.214 | Occupation group 5 (1=yes) | 1 if in occupation group 5, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| occ6 | dummy | share coded 1 = 0.202 | Occupation group 6 (1=yes) | 1 if in occupation group 6, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| occ7 | dummy | share coded 1 = 0.092 | Occupation group 7 (1=yes) | 1 if in occupation group 7, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| occ8 | dummy | share coded 1 = 0.015 | Occupation group 8 (1=yes) | 1 if in occupation group 8, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| occ9 | dummy | share coded 1 = 0.117 | Occupation group 9 (1=yes) | 1 if in occupation group 9, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| origin | identifier | | Origin airport (FE) | Origin airport code (one of New York's three airports); used as a fixed effect. | code | flights_sample | nycflights13 (US BTS) |
| per | dummy | share coded 1 = 0.017 | Industry: personal services (1=yes) | 1 if employed in personal services, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| poorhlth | dummy | share coded 1 = 0.017 | Poor health (1=yes) | 1 if the individual reports being in poor health, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| pro | dummy | share coded 1 = 0.076 | Industry: professional services (1=yes) | 1 if employed in professional and related services, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| pub | dummy | share coded 1 = 0.040 | Industry: public administration (1=yes) | 1 if employed in public administration, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| rur | dummy | share coded 1 = 0.204 | Rural residence (1=yes) | 1 if resident in a rural area, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| sales | continuous | min 24.9, median 33.6, max 45.2 | Store sales (simulated) | Simulated sales for the store (the outcome variable). | index units | store_data | Simulation (this study) |
| south | dummy | share coded 1 = 0.351 | Region: South (1=yes) | 1 if resident of the South census region, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| tra | dummy | share coded 1 = 0.066 | Industry: transportation (1=yes) | 1 if employed in transportation, communications, or utilities, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| trad | dummy | share coded 1 = 0.268 | Industry: trade (1=yes) | 1 if employed in wholesale or retail trade, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| union | dummy | share coded 1 = 0.244 | Union contract (1=yes) | 1 if wage is set by a collective-bargaining agreement, else 0. | 0/1 | wagepan | Wooldridge wagepan |
| year | year | | Calendar year (1980-1987) | Year of the observation. | year | wagepan | Wooldridge wagepan |

Cross-file variable index

Which file each variable appears in. No variable is shared across files.

store_data (4): coupons, dayofweek, income, sales

flights_sample (9): air_time, arr_delay, carrier, day, dep_delay, dest, hour, month, origin

wagepan (44): agric, black, bus, construc, d81, d82, d83, d84, d85, d86, d87, educ, ent, exper, expersq, fin, hisp, hours, lwage, manuf, married, min, nr, nrthcen, nrtheast, occ1, occ2, occ3, occ4, occ5, occ6, occ7, occ8, occ9, per, poorhlth, pro, pub, rur, south, tra, trad, union, year

Construction & formulas

The Frisch–Waugh–Lovell (FWL) theorem: in the regression Y = X₁β₁ + X₂β₂ + ε, the coefficient β₁ on the variable of interest equals the slope from a simple bivariate regression after partialling X₂ out of both axes:

β̂₁ = (X₁'M₂X₁)⁻¹ X₁'M₂Y

That is, β̂₁ is the slope from regressing the residualized outcome M₂Y on the residualized regressor M₂X₁. Here M₂ = I − X₂(X₂'X₂)⁻¹X₂' is the residual-maker matrix. Fixed effects are FWL applied to group dummies: including | origin + dest (flights) or | nr (wages) partials the group dummies out of each variable before fitting (for a single fixed effect such as nr, this is within-group demeaning). fwl_plot() automates all of this and plots the residualized scatter (an added-variable plot) with the regression line overlaid.
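The equivalence stated by the theorem can be checked numerically. The following is a minimal sketch using only the Python standard library, on hypothetical data that loosely follows the store example (it does not load store_data, and the helper names `simple_ols`, `residualize`, and `coef_x1_full` are illustrative, not part of fwlplot). It compares the coupon coefficient from the full two-regressor fit with the slope obtained after residualizing both variables on income.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residualize(v, control):
    """Residuals of v after regressing it on a constant and control."""
    intercept, slope = simple_ols(control, v)
    return [vi - (intercept + slope * ci) for vi, ci in zip(v, control)]


def coef_x1_full(y, x1, x2):
    """Coefficient on x1 in y ~ 1 + x1 + x2, from the 2x2 normal equations."""
    my, m1, m2 = mean(y), mean(x1), mean(x2)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)


# Hypothetical data (assumed coefficients, not the actual store_data file).
rng = random.Random(0)
income = [rng.gauss(50, 10) for _ in range(200)]
coupons = [60 - 0.5 * inc + rng.gauss(0, 5) for inc in income]
sales = [10 + 0.2 * c + 0.3 * inc + rng.gauss(0, 3)
         for c, inc in zip(coupons, income)]

beta_full = coef_x1_full(sales, coupons, income)
_, beta_fwl = simple_ols(residualize(coupons, income),
                         residualize(sales, income))
print(f"full regression: {beta_full:.6f}  residualized: {beta_fwl:.6f}")
```

The two printed values should agree up to floating-point error, which is the content of the theorem; the residualized pair is what an added-variable plot puts on its axes.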

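Omitted variable bias: bias = γ × δ, where γ is the coefficient of the omitted control (income) in the full regression and δ is the slope from regressing the omitted control on the regressor of interest (income on coupons). The post reports 0.300 × (−0.494) = −0.148 for the store data. Note that the gap between the naive and controlled coupon slopes quoted in the overview is −0.093 − 0.212 = −0.305, which is the value γ × δ reproduces when δ is estimated from the regression of income on coupons and the controlled model contains only coupons and income; the value −0.494 is close to the reverse slope (coupons on income, −0.5 in the data-generating process).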
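The omitted-variable-bias identity can be verified in a few lines. This is a sketch under stated assumptions: the data are simulated here with the Python standard library rather than loaded from store_data, and `ovb_decomposition` is a hypothetical helper name. It checks that the naive slope equals the controlled slope plus γ × δ, with δ taken from the regression of the omitted control on the regressor of interest.

```python
import random


def ovb_decomposition(y, x, z):
    """Return (naive, controlled, gamma, delta) for y ~ x versus y ~ x + z."""
    n = len(y)
    my, mx, mz = sum(y) / n, sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    szy = sum((b - mz) * (c - my) for b, c in zip(z, y))
    det = sxx * szz - sxz ** 2
    controlled = (sxy * szz - szy * sxz) / det  # coefficient on x, z included
    gamma = (szy * sxx - sxy * sxz) / det       # coefficient on z, x included
    delta = sxz / sxx                           # slope of z regressed on x
    naive = sxy / sxx                           # slope of y regressed on x alone
    return naive, controlled, gamma, delta


# Hypothetical data (assumed coefficients, not the actual store_data file).
rng = random.Random(1)
income = [rng.gauss(50, 10) for _ in range(200)]
coupons = [60 - 0.5 * inc + rng.gauss(0, 5) for inc in income]
sales = [10 + 0.2 * c + 0.3 * inc + rng.gauss(0, 3)
         for c, inc in zip(coupons, income)]

naive, controlled, gamma, delta = ovb_decomposition(sales, coupons, income)
bias = gamma * delta
print(f"naive={naive:.4f} controlled={controlled:.4f} gamma*delta={bias:.4f}")
```

In-sample the identity naive = controlled + γ × δ holds exactly (up to rounding), so the printed bias should equal the gap between the two slopes.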

The datasets

Each dataset below has its full variable dictionary followed by a statistics table with distribution summaries and data coverage.


store_data · grain: store (cross-section) · 200 × 4 · period: n/a (simulated) · 200 simulated stores

Panel key: row index (no id column)
Purpose: illustrate confounding, FWL residualization, OVB, and Simpson's paradox where the true coupon effect (+0.2) is known.

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|---|
| sales | continuous | Store sales (simulated) | Simulated sales for the store (the outcome variable). | sales = 10 + 0.2·coupons + 0.3·income + 0.5·dayofweek + N(0,3); rounded to 2 decimals. | index units | Simulation (this study) | 200 stores |
| coupons | continuous | Coupons distributed (treatment) | Number/intensity of coupons distributed (the regressor of interest). | coupons = 60 − 0.5·income + N(0,5); rounded to 2 decimals. Negatively driven by income (the confounder). | count/index | Simulation (this study) | 200 stores |
| income | continuous | Neighborhood income (confounder) | Neighborhood income level: the confounder that drives both coupons and sales. | income ~ N(50, 10); rounded to 2 decimals. | index units | Simulation (this study) | 200 stores |
| dayofweek | identifier | Day of week (1-7) | Day-of-week indicator used as an additional control in §5.4. | Uniform draw sample(1:7); 1=first day ... 7=last day. | 1-7 | Simulation (this study) | 200 stores |
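The construction rules in this dictionary can be turned into a small simulator. The sketch below is an illustration in standard-library Python, not the post's analysis.R: it assumes the second argument of N(·,·) is a standard deviation (as in R's rnorm), the function names are hypothetical, and because Python's random generator differs from R's set.seed(42) the draws will not reproduce store_data.csv.

```python
import random


def simulate_store_data(n=200, seed=42):
    """Simulate a store cross-section from the construction rules above."""
    rng = random.Random(seed)  # Python RNG: values differ from the R output
    rows = []
    for _ in range(n):
        income = rng.gauss(50, 10)                     # confounder
        coupons = 60 - 0.5 * income + rng.gauss(0, 5)  # lower income, more coupons
        dayofweek = rng.randint(1, 7)                  # uniform on 1..7
        sales = (10 + 0.2 * coupons + 0.3 * income
                 + 0.5 * dayofweek + rng.gauss(0, 3))
        rows.append({
            "sales": round(sales, 2),
            "coupons": round(coupons, 2),
            "income": round(income, 2),
            "dayofweek": dayofweek,
        })
    return rows


def naive_slope(rows):
    """Slope of sales on coupons alone, ignoring income."""
    x = [r["coupons"] for r in rows]
    y = [r["sales"] for r in rows]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


stores = simulate_store_data()
print(len(stores), "stores; naive coupon slope:", round(naive_slope(stores), 3))
```

Under these rules the population slope of sales on coupons alone is negative even though the true coupon effect is +0.2, which is the confounding pattern the store example is built to show.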

Distribution & statistics

| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
| sales | min 24.9, median 33.6, max 45.2 | 100% | 200 | 191 | 24.89 | 33.67 | 33.55 | 45.23 | 3.81 |
| coupons | min 18.7, median 34.8, max 53.2 | 100% | 200 | 190 | 18.72 | 34.86 | 34.82 | 53.25 | 6.79 |
| income | min 20.1, median 49.8, max 77 | 100% | 200 | 192 | 20.07 | 49.73 | 49.84 | 77.02 | 9.75 |
| dayofweek | | 100% | 200 | 7 | | | | | |

flights_sample · grain: flight · 5,000 × 9 · period: 2013 · 5,000 flights sampled from ~317,578 cleaned departures (EWR/JFK/LGA)

Panel key: row (one per flight; no stable id)
Purpose: demonstrate fixed-effects residualization (origin + destination FE) on real data with fwl_plot().

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|---|
| dep_delay | continuous | Departure delay (min) | Departure delay in minutes (the outcome in the flights regressions). | From nycflights13; cleaned sample keeps dep_delay in (−30, 120). | minutes | nycflights13 (US BTS) | 5,000 flights |
| arr_delay | continuous | Arrival delay (min) | Arrival delay in minutes. | From nycflights13 (carried in the saved sample; not used in the post's regressions). | minutes | nycflights13 (US BTS) | 5,000 flights |
| air_time | continuous | Air time (min) | Time in the air, in minutes (the regressor of interest in the flights example). | From nycflights13; cleaned to non-missing values. | minutes | nycflights13 (US BTS) | 5,000 flights |
| origin | identifier | Origin airport (FE) | Origin airport code (one of New York's three airports); used as a fixed effect. | From nycflights13: EWR, JFK, or LGA. | code | nycflights13 (US BTS) | 5,000 flights |
| dest | identifier | Destination airport (FE) | Destination airport code; used as a fixed effect alongside origin. | From nycflights13 (IATA destination code). | code | nycflights13 (US BTS) | 5,000 flights |
| carrier | identifier | Carrier code | Two-letter airline carrier code. | From nycflights13 (carried in the sample; not used in the post's regressions). | code | nycflights13 (US BTS) | 5,000 flights |
| month | identifier | Month of flight (1-12) | Calendar month of the scheduled departure. | From nycflights13. | 1-12 | nycflights13 (US BTS) | 5,000 flights |
| day | identifier | Day of month (1-31) | Calendar day of month of the scheduled departure. | From nycflights13. | 1-31 | nycflights13 (US BTS) | 5,000 flights |
| hour | identifier | Scheduled departure hour (0-23) | Scheduled departure hour (local). | From nycflights13. | 0-23 | nycflights13 (US BTS) | 5,000 flights |

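Distribution & statistics

| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
| dep_delay | min -20, median -2, max 119 | 100% | 5,000 | 137 | -20.00 | 7.32 | -2.00 | 119.0 | 22.84 |
| arr_delay | min -66, median -6, max 166 | 100% | 5,000 | 191 | -66.00 | 1.40 | -6.00 | 166.0 | 29.43 |
| air_time | min 22, median 130, max 650 | 100% | 5,000 | 361 | 22.00 | 150.4 | 130.0 | 650.0 | 93.48 |
| origin | | 100% | 5,000 | 3 | | | | | |
| dest | | 100% | 5,000 | 96 | | | | | |
| carrier | | 100% | 5,000 | 15 | | | | | |
| month | | 100% | 5,000 | 12 | | | | | |
| day | | 100% | 5,000 | 31 | | | | | |
| hour | | 100% | 5,000 | 19 | | | | | |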

wagepan · grain: individual-year · 4,360 × 44 · period: 1980-1987 · 545 individuals × 8 years = 4,360 observations

Panel key: nr × year
Purpose: demonstrate individual and two-way fixed effects (returns to experience) with fwl_plot().

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|---|
| nr | identifier | Person identifier | Unique individual identifier (the panel unit; used as the individual fixed effect). | From the Wooldridge wagepan dataset. | id | Wooldridge wagepan | 545 individuals |
| year | year | Calendar year (1980-1987) | Year of the observation. | From wagepan; used as the year fixed effect in two-way FE models. | year | Wooldridge wagepan | 1980-1987 |
| agric | dummy | Industry: agriculture (1=yes) | 1 if employed in agriculture, else 0. | From wagepan industry indicators. | 0/1 | Wooldridge wagepan | panel |
| black | dummy | Race: Black (1=yes) | 1 if the individual is Black, else 0 (time-invariant). | From wagepan. | 0/1 | Wooldridge wagepan | panel |
| bus | dummy | Industry: business/repair services (1=yes) | 1 if employed in business and repair services, else 0. | From wagepan industry indicators. | 0/1 | Wooldridge wagepan | panel |
| construc | dummy | Industry: construction (1=yes) | 1 if employed in construction, else 0. | From wagepan industry indicators. | 0/1 | Wooldridge wagepan | panel |
| ent | dummy | Industry: entertainment (1=yes) | 1 if employed in entertainment, else 0. | From wagepan industry indicators. | 0/1 | Wooldridge wagepan | panel |
| exper | continuous | Labor-market experience (years) | Years of (potential) labor-market experience; the regressor of interest in §7. | From wagepan; increments by one year per individual per year. | years | Wooldridge wagepan | 0-18 |
| fin | dummy | Industry: finance (1=yes) | 1 if employed in finance, insurance, or real estate, else 0. | From wagepan industry indicators. | 0/1 | Wooldridge wagepan | panel |
| hisp | dummy | Ethnicity: Hispanic (1=yes) | 1 if the individual is Hispanic, else 0 (time-invariant). | From wagepan. | 0/1 | Wooldridge wagepan | panel |
| poorhlth | dummy | Poor health (1=yes) | 1 if the individual reports being in poor health, else 0. | From wagepan. | 0/1 | Wooldridge wagepan | panel |
| hours | continuous | Annual hours worked | Annual hours worked. | From wagepan. | hours/year | Wooldridge wagepan | panel |
| manuf | dummy | Industry: manufacturing (1=yes) | 1 if employed in manufacturing, else 0. | From wagepan industry indicators. | 0/1 | Wooldridge wagepan | panel |
| married | dummy | Married (1=yes) | 1 if married, else 0. | From wagepan. | 0/1 | Wooldridge wagepan | panel |
| min | dummy | Industry: mining (1=yes) | 1 if employed in mining, else 0. | From wagepan industry indicators. | 0/1 | Wooldridge wagepan | panel |
| nrthcen | dummy | Region: North Central (1=yes) | 1 if resident of the North Central census region, else 0. | From wagepan region indicators. | 0/1 | Wooldridge wagepan | panel |
| nrtheast | dummy | Region: Northeast (1=yes) | 1 if resident of the Northeast census region, else 0. | From wagepan region indicators. | 0/1 | Wooldridge wagepan | panel |
| occ1 | dummy | Occupation group 1 (1=yes) | 1 if in occupation group 1, else 0 (occupational dummies occ1-occ9). | From wagepan occupation indicators. | 0/1 | Wooldridge wagepan | panel |
| occ2 | dummy | Occupation group 2 (1=yes) | 1 if in occupation group 2, else 0. | From wagepan occupation indicators. | 0/1 | Wooldridge wagepan | panel |
| occ3 | dummy | Occupation group 3 (1=yes) | 1 if in occupation group 3, else 0. | From wagepan occupation indicators. | 0/1 | Wooldridge wagepan | panel |
| occ4 | dummy | Occupation group 4 (1=yes) | 1 if in occupation group 4, else 0. | From wagepan occupation indicators. | 0/1 | Wooldridge wagepan | panel |
| occ5 | dummy | Occupation group 5 (1=yes) | 1 if in occupation group 5, else 0. | From wagepan occupation indicators. | 0/1 | Wooldridge wagepan | panel |
| occ6 | dummy | Occupation group 6 (1=yes) | 1 if in occupation group 6, else 0. | From wagepan occupation indicators. | 0/1 | Wooldridge wagepan | panel |
| occ7 | dummy | Occupation group 7 (1=yes) | 1 if in occupation group 7, else 0. | From wagepan occupation indicators. | 0/1 | Wooldridge wagepan | panel |
| occ8 | dummy | Occupation group 8 (1=yes) | 1 if in occupation group 8, else 0. | From wagepan occupation indicators. | 0/1 | Wooldridge wagepan | panel |
| occ9 | dummy | Occupation group 9 (1=yes) | 1 if in occupation group 9, else 0. | From wagepan occupation indicators. | 0/1 | Wooldridge wagepan | panel |
| per | dummy | Industry: personal services (1=yes) | 1 if employed in personal services, else 0. | From wagepan industry indicators. | 0/1 | Wooldridge wagepan | panel |
| pro | dummy | Industry: professional services (1=yes) | 1 if employed in professional and related services, else 0. | From wagepan industry indicators. | 0/1 | Wooldridge wagepan | panel |
| pub | dummy | Industry: public administration (1=yes) | 1 if employed in public administration, else 0. | From wagepan industry indicators. | 0/1 | Wooldridge wagepan | panel |
| rur | dummy | Rural residence (1=yes) | 1 if resident in a rural area, else 0. | From wagepan. | 0/1 | Wooldridge wagepan | panel |
| south | dummy | Region: South (1=yes) | 1 if resident of the South census region, else 0. | From wagepan region indicators. | 0/1 | Wooldridge wagepan | panel |
| educ | continuous | Years of education | Years of schooling (time-invariant; drops out under individual FE). | From wagepan. | years | Wooldridge wagepan | 3-16 |
| tra | dummy | Industry: transportation (1=yes) | 1 if employed in transportation, communications, or utilities, else 0. | From wagepan industry indicators. | 0/1 | Wooldridge wagepan | panel |
| trad | dummy | Industry: trade (1=yes) | 1 if employed in wholesale or retail trade, else 0. | From wagepan industry indicators. | 0/1 | Wooldridge wagepan | panel |
| union | dummy | Union contract (1=yes) | 1 if wage is set by a collective-bargaining agreement, else 0. | From wagepan. | 0/1 | Wooldridge wagepan | panel |
| lwage | continuous | Log hourly wage | Natural log of the hourly wage (the outcome in the wage regressions). | From wagepan (log of hourly wage). | log US$ | Wooldridge wagepan | panel |
| d81 | dummy | Year dummy: 1981 (1=yes) | 1 if the observation year is 1981, else 0. | From wagepan year dummies d81-d87. | 0/1 | Wooldridge wagepan | panel |
| d82 | dummy | Year dummy: 1982 (1=yes) | 1 if the observation year is 1982, else 0. | From wagepan year dummies d81-d87. | 0/1 | Wooldridge wagepan | panel |
| d83 | dummy | Year dummy: 1983 (1=yes) | 1 if the observation year is 1983, else 0. | From wagepan year dummies d81-d87. | 0/1 | Wooldridge wagepan | panel |
| d84 | dummy | Year dummy: 1984 (1=yes) | 1 if the observation year is 1984, else 0. | From wagepan year dummies d81-d87. | 0/1 | Wooldridge wagepan | panel |
| d85 | dummy | Year dummy: 1985 (1=yes) | 1 if the observation year is 1985, else 0. | From wagepan year dummies d81-d87. | 0/1 | Wooldridge wagepan | panel |
| d86 | dummy | Year dummy: 1986 (1=yes) | 1 if the observation year is 1986, else 0. | From wagepan year dummies d81-d87. | 0/1 | Wooldridge wagepan | panel |
| d87 | dummy | Year dummy: 1987 (1=yes) | 1 if the observation year is 1987, else 0. | From wagepan year dummies d81-d87. | 0/1 | Wooldridge wagepan | panel |
| expersq | continuous | Experience squared | Square of labor-market experience (captures the concave wage-experience profile). | exper^2. | years^2 | Wooldridge wagepan (derived) | panel |

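Distribution & statistics

| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
| nr | | 100% | 4,360 | 545 | | | | | |
| year | | 100% | 4,360 | 8 | 1980 | 1983.5 | 1983 | 1987 | 2.29 |
| agric | share coded 1 = 0.032 | 100% | 4,360 | 2 | 0 | 0.032 | 0 | 1.00 | 0.176 |
| black | share coded 1 = 0.116 | 100% | 4,360 | 2 | 0 | 0.116 | 0 | 1.00 | 0.320 |
| bus | share coded 1 = 0.076 | 100% | 4,360 | 2 | 0 | 0.076 | 0 | 1.00 | 0.265 |
| construc | share coded 1 = 0.075 | 100% | 4,360 | 2 | 0 | 0.075 | 0 | 1.00 | 0.263 |
| ent | share coded 1 = 0.015 | 100% | 4,360 | 2 | 0 | 0.015 | 0 | 1.00 | 0.122 |
| exper | min 0, median 6, max 18 | 100% | 4,360 | 19 | 0 | 6.51 | 6.00 | 18.00 | 2.83 |
| fin | share coded 1 = 0.037 | 100% | 4,360 | 2 | 0 | 0.037 | 0 | 1.00 | 0.189 |
| hisp | share coded 1 = 0.156 | 100% | 4,360 | 2 | 0 | 0.156 | 0 | 1.00 | 0.363 |
| poorhlth | share coded 1 = 0.017 | 100% | 4,360 | 2 | 0 | 0.017 | 0 | 1.00 | 0.129 |
| hours | min 120, median 2.08e+03, max 4.99e+03 | 100% | 4,360 | 1,276 | 120.0 | 2,191.3 | 2,080.0 | 4,992.0 | 566.4 |
| manuf | share coded 1 = 0.282 | 100% | 4,360 | 2 | 0 | 0.282 | 0 | 1.00 | 0.450 |
| married | share coded 1 = 0.439 | 100% | 4,360 | 2 | 0 | 0.439 | 0 | 1.00 | 0.496 |
| min | share coded 1 = 0.016 | 100% | 4,360 | 2 | 0 | 0.016 | 0 | 1.00 | 0.124 |
| nrthcen | share coded 1 = 0.258 | 100% | 4,360 | 2 | 0 | 0.258 | 0 | 1.00 | 0.437 |
| nrtheast | share coded 1 = 0.190 | 100% | 4,360 | 2 | 0 | 0.190 | 0 | 1.00 | 0.392 |
| occ1 | share coded 1 = 0.104 | 100% | 4,360 | 2 | 0 | 0.104 | 0 | 1.00 | 0.305 |
| occ2 | share coded 1 = 0.092 | 100% | 4,360 | 2 | 0 | 0.092 | 0 | 1.00 | 0.288 |
| occ3 | share coded 1 = 0.053 | 100% | 4,360 | 2 | 0 | 0.053 | 0 | 1.00 | 0.225 |
| occ4 | share coded 1 = 0.111 | 100% | 4,360 | 2 | 0 | 0.111 | 0 | 1.00 | 0.315 |
| occ5 | share coded 1 = 0.214 | 100% | 4,360 | 2 | 0 | 0.214 | 0 | 1.00 | 0.410 |
| occ6 | share coded 1 = 0.202 | 100% | 4,360 | 2 | 0 | 0.202 | 0 | 1.00 | 0.402 |
| occ7 | share coded 1 = 0.092 | 100% | 4,360 | 2 | 0 | 0.092 | 0 | 1.00 | 0.289 |
| occ8 | share coded 1 = 0.015 | 100% | 4,360 | 2 | 0 | 0.015 | 0 | 1.00 | 0.120 |
| occ9 | share coded 1 = 0.117 | 100% | 4,360 | 2 | 0 | 0.117 | 0 | 1.00 | 0.321 |
| per | share coded 1 = 0.017 | 100% | 4,360 | 2 | 0 | 0.017 | 0 | 1.00 | 0.128 |
| pro | share coded 1 = 0.076 | 100% | 4,360 | 2 | 0 | 0.076 | 0 | 1.00 | 0.266 |
| pub | share coded 1 = 0.040 | 100% | 4,360 | 2 | 0 | 0.040 | 0 | 1.00 | 0.196 |
| rur | share coded 1 = 0.204 | 100% | 4,360 | 2 | 0 | 0.204 | 0 | 1.00 | 0.403 |
| south | share coded 1 = 0.351 | 100% | 4,360 | 2 | 0 | 0.351 | 0 | 1.00 | 0.477 |
| educ | min 3, median 12, max 16 | 100% | 4,360 | 13 | 3.00 | 11.77 | 12.00 | 16.00 | 1.75 |
| tra | share coded 1 = 0.066 | 100% | 4,360 | 2 | 0 | 0.066 | 0 | 1.00 | 0.248 |
| trad | share coded 1 = 0.268 | 100% | 4,360 | 2 | 0 | 0.268 | 0 | 1.00 | 0.443 |
| union | share coded 1 = 0.244 | 100% | 4,360 | 2 | 0 | 0.244 | 0 | 1.00 | 0.430 |
| lwage | min -3.58, median 1.67, max 4.05 | 100% | 4,360 | 3,631 | -3.58 | 1.65 | 1.67 | 4.05 | 0.533 |
| d81 | share coded 1 = 0.125 | 100% | 4,360 | 2 | 0 | 0.125 | 0 | 1.00 | 0.331 |
| d82 | share coded 1 = 0.125 | 100% | 4,360 | 2 | 0 | 0.125 | 0 | 1.00 | 0.331 |
| d83 | share coded 1 = 0.125 | 100% | 4,360 | 2 | 0 | 0.125 | 0 | 1.00 | 0.331 |
| d84 | share coded 1 = 0.125 | 100% | 4,360 | 2 | 0 | 0.125 | 0 | 1.00 | 0.331 |
| d85 | share coded 1 = 0.125 | 100% | 4,360 | 2 | 0 | 0.125 | 0 | 1.00 | 0.331 |
| d86 | share coded 1 = 0.125 | 100% | 4,360 | 2 | 0 | 0.125 | 0 | 1.00 | 0.331 |
| d87 | share coded 1 = 0.125 | 100% | 4,360 | 2 | 0 | 0.125 | 0 | 1.00 | 0.331 |
| expersq | min 0, median 36, max 324 | 100% | 4,360 | 19 | 0 | 50.42 | 36.00 | 324.0 | 40.78 |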
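The individual fixed effect that this panel is used to demonstrate amounts to within-person demeaning. The sketch below illustrates the idea on a hypothetical balanced panel with the same shape as wagepan (545 units × 8 periods); it does not load wagepan, and the coefficients, variable names, and helper functions are assumptions made for the example.

```python
import random
from collections import defaultdict


def demean_within(values, groups):
    """Subtract each group's mean (the residual from regressing on group dummies)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


# Hypothetical panel: an unobserved individual effect drives both x and y.
rng = random.Random(0)
TRUE_BETA = 0.5
ids, x, y = [], [], []
for unit in range(545):
    alpha = rng.gauss(0, 1)            # unobserved, time-invariant effect
    for _ in range(8):
        xi = alpha + rng.gauss(0, 1)   # regressor correlated with the effect
        ids.append(unit)
        x.append(xi)
        y.append(2 * alpha + TRUE_BETA * xi + rng.gauss(0, 0.5))

pooled = slope(x, y)                                          # ignores the effect
within = slope(demean_within(x, ids), demean_within(y, ids))  # individual FE
print(f"pooled={pooled:.3f}  within={within:.3f}  true={TRUE_BETA}")
```

In this setup the pooled slope absorbs the individual effect and overshoots, while the within slope should land near the assumed true value; any time-invariant column would be demeaned to zero, which is why such variables drop out under individual fixed effects.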

Known limitations & caveats