What does scatterfit y x, controls(z) actually do?
The Frisch–Waugh–Lovell (FWL) theorem is the engine inside the
community-contributed Stata commands scatterfit and reghdfe.
It says the coefficient on a variable in a multiple regression equals the
slope of a simple bivariate regression, after first partialling every
other control out of both the outcome and that variable (regressing each
on the controls and keeping the residuals). This is the picture:
the raw scatter (orange, left) gives a misleading slope; once income is
partialled out via controls(income), the true positive coupon
effect emerges (teal, right).
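To make the theorem concrete, here is a minimal Python sketch (standard library only) that checks the FWL identity on simulated data. The data-generating numbers and the helper names `ols` and `residualize` are illustrative assumptions, not code from scatterfit or reghdfe. The sketch fits the long regression of y on a constant, x, and z. It then regresses residualized y on residualized x and confirms that the two slopes agree.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y,
    solved by Gauss-Jordan elimination (fine for a handful of columns)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def residualize(v, Z):
    """Residuals from regressing v on the columns of Z."""
    b = ols(Z, v)
    return [vi - sum(bj * zj for bj, zj in zip(b, row)) for vi, row in zip(v, Z)]

random.seed(1)
n = 500
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + random.gauss(0, 1) for zi in z]  # x is correlated with the control z
y = [1.0 + 0.5 * xi - 0.7 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

# Long regression y ~ 1 + x + z; the coefficient on x sits at index 1.
beta_long = ols([[1.0, xi, zi] for xi, zi in zip(x, z)], y)[1]

# FWL: partial the constant and z out of both y and x, then regress residual on residual.
Z = [[1.0, zi] for zi in z]
ry, rx = residualize(y, Z), residualize(x, Z)
beta_fwl = sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)

print(beta_long, beta_fwl)  # agree up to floating-point error
```

Because the residuals have mean zero, the residual-on-residual slope needs no intercept.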
The fog lifts — watch confounding disappear
The same 150 simulated stores, plotted two ways. The left panel is the
raw scatterfit sales coupons scatter; its slope has the wrong sign
because high-income neighborhoods get fewer coupons but spend more. The
right panel morphs from raw to the FWL residualized version
(scatterfit sales coupons, controls(income)): as income is
partialled out, the cloud reshapes and the slope flips to its true
positive value.
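The sign flip can be reproduced with a short simulation. This is a sketch under assumed values that are not the post's exact data-generating process. It assumes the linear form sales = β·coupons + γ·income + noise and coupons = δ·income + noise, with β = 0.2, γ = 0.3, δ = −0.5, an income standard deviation of 2, and unit-variance noise. It compares the naive slope, which corresponds to scatterfit sales coupons, with the FWL slope after income is partialled out of both variables.

```python
import random

def slope(x, y):
    """Simple OLS slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(v, z):
    """Residuals of v after a simple regression on z (with intercept)."""
    b = slope(z, v)
    mv, mz = sum(v) / len(v), sum(z) / len(z)
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]

random.seed(42)
n = 5000
BETA, GAMMA, DELTA = 0.2, 0.3, -0.5  # assumed true effects
income = [random.gauss(0, 2) for _ in range(n)]
coupons = [DELTA * inc + random.gauss(0, 1) for inc in income]
sales = [BETA * c + GAMMA * inc + random.gauss(0, 1) for c, inc in zip(coupons, income)]

naive = slope(coupons, sales)  # raw scatter slope
fwl = slope(residualize(coupons, income), residualize(sales, income))  # FWL analogue of controls(income)
print(f"naive {naive:+.3f}   FWL {fwl:+.3f}")
```

Under these settings the naive slope should land near −0.1 and the FWL slope near +0.2. The population naive slope is β + γ·π, where π = δ·Var(income)/Var(coupons) = −0.5·4/2 = −1.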
Confounding Lab
Slide the true causal effect, the income effect, and the confounding link. Watch the naive and FWL slopes diverge — or coincide, when there is no confounder to remove.
Forest Plot
The post's three Stata stories — store, NYC flights with fcontrols(), and wages with absorb() — side by side. Hover any point for SE, CI, and FE count.
Panel FE
Within-person residualization in action. Toggle the view to demean a panel by absorb(nr) and see the slope jump from 0.03 (pooled) to 0.12 (within-person).
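Glossary
- FWL theorem: the coefficient on a variable in a multiple regression equals the slope from regressing the outcome's residuals on that variable's residuals, both taken after partialling out the other controls. It is the engine inside scatterfit and reghdfe.
- scatterfit: controls() for continuous confounders, fcontrols() for fixed effects, binned for large data.
- reghdfe: linear regression with high-dimensional fixed effects absorbed via absorb(); scatterfit relies on it behind the scenes for fcontrols().
- Confounder: a variable that drives both the treatment and the outcome. In the store example, income lowers coupons (δ < 0) and raises sales (γ > 0).
- Residualization: replace a variable with its residuals from a regression on the controls, e.g. regress y z; predict y_resid, residuals. Wipe the fog off the window before looking through it.
- Omitted-variable bias (OVB): the gap between the naive slope and the controlled slope that arises when a confounder is left out of the regression.
- Binned scatter: scatterfit y x, binned replaces an unreadable cloud with a few readable dots, which is essential for large datasets where R/Python FWL plots fail.
- Within (FE) demeaning: subtract each individual's own mean from every variable. This is FWL applied to a full set of individual dummies, as in reghdfe ... , absorb(nr).
Confounding Lab — when does the naive slope mislead?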
The post simulates a store DGP with true β = +0.2 for coupons
and an income confounder (γ = +0.3 on sales, δ = −0.5 on coupons). Slide
those three knobs and watch the naive slope (orange, what
scatterfit sales coupons shows) drift away from the truth while
the FWL slope (teal, what scatterfit sales coupons, controls(income)
shows) stays anchored. When the confounding link δ = 0, the naive and FWL
slopes coincide — that is the regime where "controlling for" does nothing.
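A one-line derivation shows why δ = 0 is the no-bias regime. It assumes the linear form sales = β·coupons + γ·income + ε and coupons = δ·income + u, with ε and u independent of income and of each other. That is an assumption, and the post's exact DGP may differ. Writing σ_I² for Var(income) and σ_u² for Var(u), the naive slope converges to:

```latex
\operatorname{plim}\hat\beta_{\text{naive}}
  = \frac{\operatorname{Cov}(\text{sales},\text{coupons})}{\operatorname{Var}(\text{coupons})}
  = \beta + \gamma\,\pi,
\qquad
\pi = \frac{\operatorname{Cov}(\text{income},\text{coupons})}{\operatorname{Var}(\text{coupons})}
    = \frac{\delta\,\sigma_I^2}{\delta^2\sigma_I^2 + \sigma_u^2}
```

Setting δ = 0 makes π = 0, so the naive and FWL slopes share the same limit β. The factor that multiplies γ is π, not δ. The two coincide only when δ²σ_I² + σ_u² = σ_I², that is, when coupons and income have the same variance.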
Naive slope: regress sales coupons. FWL slope: the coupons coefficient in regress sales coupons income.
What to look for
- Move δ to zero. The naive and FWL slopes converge. When the confounder does not predict the treatment, there is no bias to remove, and adding controls(income) in scatterfit would change almost nothing (only sampling noise).
- Crank γ up. A stronger income effect on sales amplifies the OVB, which scales in proportion to γ. The naive slope drifts further from the truth; the FWL slope stays put.
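- Flip the sign of β. Set β = −0.1, keep γ = 0.3, and flip δ to +0.5. With γ and δ now sharing a sign, the bias is positive, so the naive slope can appear positive while the truth is negative: Simpson's paradox in action. (Leaving δ at −0.5 keeps the bias negative, which only pushes the naive slope further below −0.1.)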
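- OVB column. The product γ × δ gives the direction of the bias, but the exact omitted-variable-bias formula is γ × π, where π is the slope from regressing income on coupons. In this DGP π = δ × Var(income) / Var(coupons), so γ × δ matches the gap (naive − true) only when income and coupons have equal variance. Otherwise the two differ by that variance ratio, not just by sampling noise. The post's measured OVB at the default settings is −0.148, while the post's naive and controls(income) estimates differ by −0.093 − 0.212 = −0.305.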
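In any single sample, the omitted-variable-bias identity holds exactly: naive slope − controlled slope = γ̂ × π̂, where π̂ comes from regressing income on coupons. The sketch below checks this under assumed values chosen for illustration, not taken from the post. It assumes sales = 0.2·coupons + 0.3·income + noise, coupons = −0.5·income + noise, an income standard deviation of 2, and unit-variance noise. The long regression is solved from centered cross-products, and π̂ and δ̂ are computed side by side to show that they differ.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

random.seed(7)
n = 5000
income = [random.gauss(0, 2) for _ in range(n)]  # assumed sd of 2
coupons = [-0.5 * inc + random.gauss(0, 1) for inc in income]
sales = [0.2 * c + 0.3 * inc + random.gauss(0, 1) for c, inc in zip(coupons, income)]

c, i, s = center(coupons), center(income), center(sales)
Scc, Sii, Sci = dot(c, c), dot(i, i), dot(c, i)
Scs, Sis = dot(c, s), dot(i, s)

# Long regression sales ~ coupons + income: 2x2 normal equations via Cramer's rule.
det = Scc * Sii - Sci ** 2
beta_hat = (Sii * Scs - Sci * Sis) / det   # controlled (FWL) coupons slope
gamma_hat = (Scc * Sis - Sci * Scs) / det  # income effect on sales

naive = Scs / Scc      # short regression sales ~ coupons
pi_hat = Sci / Scc     # auxiliary regression income ~ coupons
delta_hat = Sci / Sii  # coupons ~ income (the DGP direction)

print(f"naive - FWL     = {naive - beta_hat:+.4f}")
print(f"gamma_hat*pi    = {gamma_hat * pi_hat:+.4f}")
print(f"gamma_hat*delta = {gamma_hat * delta_hat:+.4f}")
```

The first two lines match to rounding. The third is about half as large here, because π̂/δ̂ = Var(income)/Var(coupons) ≈ 4/2 under these assumed variances.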
Bias vs variance over many simulations
Single draws are noisy. Run the pipeline 100 times with fresh random draws (same parameters) to see whether the bias is systematic.
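The repeated-draw experiment can be sketched in a few lines. These are assumptions, not the page's actual code: a linear store model with β = 0.2, γ = 0.3, δ = −0.5, an income standard deviation of 2, and unit-variance noise, with 150 stores per draw (matching the 150 simulated stores in the scatter) and 100 draws. A systematic bias shows up as a mean that stays away from β. Pure noise shows up only as spread.

```python
import random
import statistics

def slope(x, y):
    """Simple OLS slope of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def one_draw(rng, n=150, beta=0.2, gamma=0.3, delta=-0.5):
    """Simulate one store sample; return (naive slope, FWL slope)."""
    income = [rng.gauss(0, 2) for _ in range(n)]
    coupons = [delta * inc + rng.gauss(0, 1) for inc in income]
    sales = [beta * c + gamma * inc + rng.gauss(0, 1) for c, inc in zip(coupons, income)]

    def resid(v):
        # Residuals of v after a simple regression on income.
        b = slope(income, v)
        mv, mi = statistics.fmean(v), statistics.fmean(income)
        return [vi - mv - b * (ii - mi) for vi, ii in zip(v, income)]

    return slope(coupons, sales), slope(resid(coupons), resid(sales))

rng = random.Random(2024)
draws = [one_draw(rng) for _ in range(100)]
naive = [d[0] for d in draws]
fwl = [d[1] for d in draws]
print(f"naive: mean {statistics.fmean(naive):+.3f}, sd {statistics.stdev(naive):.3f}")
print(f"FWL:   mean {statistics.fmean(fwl):+.3f}, sd {statistics.stdev(fwl):.3f}")
```

The naive mean should sit well below zero across draws, while the FWL mean should hover near 0.2. That separation is what distinguishes bias from variance.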
The post's three FWL stories — at a glance
The numbers below come from stata_fwl/index.md §4, §7, §8 —
measured with regress and reghdfe in Stata. Each
outcome is one empirical setting; each row is a progressive "control" step
handled by controls() or fcontrols() /
absorb(). Watch the coefficient march from the naive value to
the fully-controlled estimate — that march is FWL in action.
What to look for
- Store DGP. Naive β̂ = −0.093 (wrong sign!), controls(income) gives +0.212 (close to the truth, +0.2), and controls(income dayofweek) gives +0.222. The sign flip is the headline confounding result.
- Flights (Stata sample, 5,000 obs). The no-FE estimate is −0.005 with R² ≈ 0. With fcontrols(origin_fe) the effect strengthens to −0.008; with fcontrols(origin_fe dest_fe) it jumps to −0.032, but 6 singleton routes are dropped and the SE widens (the CI now includes zero).
- Wages (4,360 person-years). The pooled exper effect is +0.105; with absorb(nr) the within-person return jumps to +0.122. The pooled view, confounded by ability, understates the within-person return.
- Hover any point for the SE, the 95% CI, and the number of controls/FE that estimator used.
Why does the Flights CI widen with fcontrols(origin_fe dest_fe)?
Adding ~100 destination dummies on top of the origin dummies absorbs
nearly every cross-route comparison. Because every origin is a New York
airport, the remaining variation is essentially within-route: for flights
on the same route, does longer-than-usual air time predict
longer-than-usual delay? With so little variation left, the answer is
noisy. The point estimate moves from −0.008 to −0.032, but the SE jumps
from 0.003 to 0.027 (six singleton routes are dropped, so N goes from
5,000 to 4,994). Fixed effects absorb confounding and identifying
variation alike; that trade-off is always there. This is exactly the kind
of diagnostic that scatterfit's fcontrols() makes visual.
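The CI claim is simple arithmetic on the reported coefficients and standard errors. This sketch uses the normal approximation (±1.96 SE). Stata's regression tables use t critical values, which at several thousand degrees of freedom differ from 1.96 only in the third decimal.

```python
# Reported Flights estimates as (coefficient, standard error).
estimates = {
    "fcontrols(origin_fe)": (-0.008, 0.003),
    "fcontrols(origin_fe dest_fe)": (-0.032, 0.027),
}

def ci95(coef, se, z=1.96):
    """Normal-approximation 95% confidence interval."""
    return coef - z * se, coef + z * se

for label, (coef, se) in estimates.items():
    lo, hi = ci95(coef, se)
    print(f"{label:30s} [{lo:+.4f}, {hi:+.4f}]  includes zero: {lo <= 0 <= hi}")
```

The first interval is roughly [−0.014, −0.002], which excludes zero. The second is roughly [−0.085, +0.021], which straddles it.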
Why does the Wages slope grow with absorb(nr)?
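Higher-ability workers earn more at any given experience level. For the
pooled slope (+0.105) to sit below the within-person slope, ability must
also be negatively correlated with experience. One plausible channel is
schooling: people who stay in school longer have fewer years of
experience at a given age. The raw cross-section blends these two
effects, so the pooled slope is biased downward.
reghdfe lwage exper expersq, absorb(nr) strips away ability, along with
every other time-invariant trait, and forces the slope to identify off
the same individual at two different points in their career. This
recovers a within-person return to experience of +0.122.
scatterfit lwage exper, fcontrols(nr) draws this picture directly.
Because that plot does not partial out expersq, its fitted slope need
not equal the regression coefficient exactly.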
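The sign argument can be written out under a simplified model. It assumes a time-invariant ability term a_i that enters log wages additively, errors ε uncorrelated with experience, and it ignores expersq for brevity. Demeaning within person removes a_i, while the pooled slope picks up the covariance of a_i with experience:

```latex
\text{lwage}_{it} = \beta\,\text{exper}_{it} + a_i + \varepsilon_{it}
\;\Longrightarrow\;
\text{lwage}_{it} - \overline{\text{lwage}}_i
  = \beta\,(\text{exper}_{it} - \overline{\text{exper}}_i) + (\varepsilon_{it} - \bar\varepsilon_i),
\qquad
\operatorname{plim}\hat\beta_{\text{pooled}}
  = \beta + \frac{\operatorname{Cov}(a_i,\ \text{exper}_{it})}{\operatorname{Var}(\text{exper}_{it})}
```

A pooled slope below the within slope therefore requires Cov(a_i, exper_it) < 0.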
Panel FE — within-person residualization in pictures
A synthetic 60 × 8 wage panel (60 individuals, 8 years each) with strong
unobserved ability and a homogeneous true within-person return to
experience. Click the toggle to demean each individual's exper and lwage
by their own mean — that is FWL applied to individual dummies, exactly
what scatterfit lwage exper, fcontrols(nr) does behind the
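scenes via reghdfe. The pooled slope is shallow (≈ 0.03); the within
slope is roughly 4× steeper (≈ 0.12), in line with the post's §8.3
within-person estimate of +0.122. The synthetic pooled slope is much
flatter than the post's real pooled estimate (+0.105), so the demo
exaggerates the gap between the two.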
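A minimal version of this demo is sketched below. The numbers are illustrative assumptions, not the page's code: 60 people observed for 8 years, a true within-person return of 0.12, and ability that raises wages but is negatively correlated with starting experience. The sketch computes the pooled OLS slope and the within slope after demeaning each person's exper and lwage.

```python
import random
from collections import defaultdict

def slope(x, y):
    """Simple OLS slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

random.seed(3)
TRUE_RETURN = 0.12
rows = []  # (person id, exper, lwage)
for person in range(60):
    ability = random.gauss(0, 1)
    start = 8 - 2 * ability + random.gauss(0, 1)  # assumed: abler people start with less experience
    for year in range(8):
        exper = start + year
        lwage = 1.0 + TRUE_RETURN * exper + 0.5 * ability + random.gauss(0, 0.1)
        rows.append((person, exper, lwage))

pooled = slope([r[1] for r in rows], [r[2] for r in rows])

# Within transformation: subtract each person's own means (FWL with person dummies).
sums = defaultdict(lambda: [0.0, 0.0, 0])
for p, x, y in rows:
    sums[p][0] += x
    sums[p][1] += y
    sums[p][2] += 1
x_within = [x - sums[p][0] / sums[p][2] for p, x, _ in rows]
y_within = [y - sums[p][1] / sums[p][2] for p, _, y in rows]
within = slope(x_within, y_within)

print(f"pooled {pooled:.3f}   within {within:.3f}")
```

With these assumed values the pooled slope should be far below 0.12, while the within slope should land close to it.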
What to look for
- Raw mode. A wide fan of points: same experience level, very different wages. Each color is a person; ability differences spread them apart vertically. This is the scatterfit lwage exper picture.
- Within mode. Subtract every person's mean from both axes. The cloud collapses around zero; what's left is each person's deviation from their own career average. The slope steepens. This is what fcontrols(nr) draws.
- Why steeper? Higher-ability people earn more, but in this panel they sit at lower experience levels on average. Pooled OLS therefore mixes the experience gradient with a negative ability gradient, which flattens the slope. Demeaning removes the ability component, leaving only the within-person increment.
Connecting back to Tab 1
The morphing animation in Tab 1 and the toggle here do the same thing:
they show the data before and after partialling-out.
In Tab 1 it was a continuous confounder (income, removed via
controls()); here it is a categorical confounder (each
individual's identity, removed via fcontrols() /
absorb()). Both are special cases of the same FWL recipe —
and both produce the same picture: the raw slope is confounded, and the
residualized slope is exactly the coefficient the regress /
reghdfe table reports.
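The claim that fcontrols()/absorb() is FWL with a categorical confounder can be checked directly. The slope from a regression with one dummy per group (least-squares dummy variables) equals the slope after demeaning within groups. The sketch below uses hypothetical data with four groups and made-up effects. It relies on a small normal-equations solver named `ols`, introduced here for illustration.

```python
import random

def ols(X, y):
    """Solve the normal equations X'X b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(11)
G, T = 4, 25
group_effect = [random.gauss(0, 2) for _ in range(G)]
data = []  # (group, x, y)
for g in range(G):
    for _ in range(T):
        x = group_effect[g] + random.gauss(0, 1)  # x correlated with the group
        y = 0.7 * x + 3 * group_effect[g] + random.gauss(0, 1)
        data.append((g, x, y))

# LSDV: y on x plus one dummy per group (the dummies replace the intercept).
X = [[x] + [1.0 if g == h else 0.0 for h in range(G)] for g, x, _ in data]
b_lsdv = ols(X, [y for _, _, y in data])[0]

# Within: demean x and y by group, then regress residual on residual.
mean_x = [sum(x for g, x, _ in data if g == h) / T for h in range(G)]
mean_y = [sum(y for g, _, y in data if g == h) / T for h in range(G)]
xd = [x - mean_x[g] for g, x, _ in data]
yd = [y - mean_y[g] for g, _, y in data]
b_within = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

print(b_lsdv, b_within)  # equal up to floating-point error
```

Demeaning scales to thousands of groups where building explicit dummies would not. That scaling is the practical reason reghdfe absorbs fixed effects instead of estimating them.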