Visualizing Regression with the FWL Theorem in Stata

What ‘controlling for’ looks like as a scatter plot

−0.093 · naive slope · wrong sign

+0.212 · after partialling out income

0.212288 · manual FWL matches to 6 decimals

Carlos Mendez

Nagoya University (GSID)

July 8, 2026

The Tension

Act I

“Controlling for income” is the one move you can never draw on a scatter

A store manager asks: do coupons lift sales? The raw scatter says no — more coupons, lower sales.

“Holding income fixed” lives in three dimensions. How do you put that on two axes?

The same data, two slopes: one picture is honest and one is a lie

Naive scatter (left): negative slope, R² = 0.028. With controls(income) (right): the slope reverses to positive, R² = 0.32. Same 200 stores, two pictures.

Where we’re going

A confounded toy dataset where the right answer is known
scatterfit — partialling-out in one command
The same number by hand, to six decimals (the FWL identity)
Fixed effects as FWL on group dummies — flights and a wage panel

The Investigation

Act II

A toy dataset built so income confounds the coupon–sales link

Outcome — store sales (200 stores)
Treatment — coupons distributed (true effect +0.2)
Confounder — income: rich areas get fewer coupons but buy more

Coupons and income are negatively linked (−0.71) — a backdoor the naive slope misses.
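A dataset with this structure can be simulated in a few lines. The sketch below is hypothetical: the seed, intercepts, and noise scales are assumptions chosen so the signs and rough magnitudes resemble the ones described here (coupon effect +0.2, income effect +0.3, fewer coupons in richer areas). It will not reproduce the exact numbers on these slides.

```stata
* Hypothetical data-generating process. Only the two true effects
* (+0.2 for coupons, +0.3 for income) come from the slides; every
* other constant below is an illustrative assumption.
clear
set seed 12345
set obs 200
generate double income  = rnormal(50, 10)
generate double coupons = 60 - 0.5*income + rnormal(0, 5)   // rich areas get fewer coupons
generate double sales   = 10 + 0.2*coupons + 0.3*income + rnormal(0, 3.3)

* Pairwise correlations: coupons and income should be strongly negative
correlate sales coupons income
```

Because income raises sales and lowers coupons at the same time, the raw coupon-sales correlation in data like this can be negative even though the built-in coupon effect is positive.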

The raw correlation lies: coupons and sales correlate −0.17

Pair	Correlation	Reading
coupons · sales	−0.17	looks like coupons hurt sales
income · coupons	−0.71	store sends fewer coupons to rich areas
income · sales	+0.50	rich areas buy more

The true coupon effect is +0.2 — the negative raw correlation is income leaking through.

The FWL theorem: residualize first, then read one slope

\[\hat\beta_1=\frac{\operatorname{Cov}(\tilde y,\tilde x_1)}{\operatorname{Var}(\tilde x_1)}\]

Residualize both axes on \(Z\), then read one slope — it equals the multiple-regression coefficient on \(x_1\).

The tildes mean “the part \(Z\) cannot explain.” Two paths, one number.
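Written out for a single control \(Z\), the two residualizing steps look like this. The symbols \(\hat a_y,\hat b_y,\hat a_x,\hat b_x\) are ad hoc names (not from the slides) for the fitted intercepts and slopes of the two auxiliary regressions on \(Z\):

\[\tilde y = y-(\hat a_y+\hat b_y Z),\qquad \tilde x_1 = x_1-(\hat a_x+\hat b_x Z)\]

The slope of \(\tilde y\) on \(\tilde x_1\) then equals the coefficient \(\hat\beta_1\) from the single regression \(y=\beta_0+\beta_1 x_1+\gamma Z+\varepsilon\).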

One option turns the multiple regression into a partial-regression plot

scatterfit sales coupons, regparameters(coef pval r2)              // naive: −0.093
scatterfit sales coupons, controls(income) regparameters(coef pval r2)  // FWL: +0.212

controls(income) tells scatterfit to call reghdfe, residualize both axes on income, then plot \(\tilde y\) against \(\tilde x_1\).

Partialling out income flips the slope from −0.093 to +0.212

Term	Naive OLS	+ income	Reading
coupons	−0.0934	0.2123	sign reverses
income	—	0.3004	confounder, near true 0.3
R²	0.028	0.321	variation explained jumps

The +0.212 lands close to the true effect of +0.2; income’s coefficient (0.3004) lands almost exactly on its true 0.3.

The omitted-variable-bias formula predicts the gap exactly

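\[\text{bias}=\hat\gamma\times\hat\delta=0.3004\times(-1.018)\approx-0.306\]

\(\hat\gamma\) is income’s effect on sales; \(\hat\delta\) is the slope from regressing income on coupons (the omitted variable on the included one), which the two coupon slopes above imply is about −1.018. The naive slope is the full-regression slope plus this bias: 0.2123 − 0.3057 = −0.0934. The reverse slope, coupons on income (−0.494), is not the one the formula uses.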

Nothing mysterious: a positive \(\hat\gamma\) times a negative \(\hat\delta\) drags the naive estimate down.
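The identity can be checked numerically. The sketch below uses simulated stand-in data (seed and constants are assumptions, so the printed values will differ from the slides) and takes \(\hat\delta\) from the regression of the omitted variable, income, on the included one, coupons.

```stata
* Hypothetical stand-in data; constants are illustrative assumptions
clear
set seed 2026
set obs 200
generate double income  = rnormal(50, 10)
generate double coupons = 60 - 0.5*income + rnormal(0, 5)
generate double sales   = 10 + 0.2*coupons + 0.3*income + rnormal(0, 3.3)

quietly regress sales coupons income      // long regression
scalar b_full = _b[coupons]
scalar g_hat  = _b[income]
quietly regress income coupons            // omitted variable on included variable
scalar d_hat  = _b[coupons]
quietly regress sales coupons             // short (naive) regression
scalar b_naive = _b[coupons]

* In-sample, the naive slope equals the full slope plus g_hat*d_hat
display "naive slope         = " %10.6f b_naive
display "full + g_hat*d_hat  = " %10.6f b_full + g_hat*d_hat
```

The two displayed numbers agree up to floating-point error, because the omitted-variable-bias decomposition is an exact property of OLS in the sample, not an approximation.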

By hand, the residual-on-residual slope is 0.212288 — six matching decimals

* Step 1 — residualize sales on income
regress sales income
predict resid_sales, residuals
* Step 2 — residualize coupons on income
regress coupons income
predict resid_coupons, residuals
* Step 3 — regress residual on residual
regress resid_sales resid_coupons      // slope = 0.212288

The manual slope 0.212288 equals the full-regression coefficient 0.212288 — not close, identical.
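The match can also be asserted rather than read off the output. This sketch repeats the three steps on Stata's bundled auto data, used here purely as a stand-in for the coupon data (price, weight, and mpg play the roles of sales, coupons, and income), and compares the result with the full regression.

```stata
* Stand-in data: Stata's bundled auto dataset (not the coupon data)
sysuse auto, clear
quietly regress price mpg                 // residualize the outcome on the control
predict double r_price, residuals
quietly regress weight mpg                // residualize the regressor on the control
predict double r_weight, residuals
quietly regress r_price r_weight          // residual on residual
scalar b_fwl = _b[r_weight]
quietly regress price weight mpg          // full multiple regression
scalar b_full = _b[weight]

display %14.8f b_fwl "  " %14.8f b_full
assert reldif(b_fwl, b_full) < 1e-8       // stops with an error if the slopes differ
```

Only the slope is guaranteed to match. The standard error from the residual-on-residual regression is slightly too small, because that regression does not count the degree of freedom already spent on the control.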

The residual regression reproduces the coefficient to six decimals

0.212288

Manual FWL slope = full-regression coupon coefficient. Same number, two paths.

Add controls progressively and watch the scatter tighten

No controls (left, R² = 0.028) → + income (center, R² = 0.32) → + income + day-of-week (right, R² = 0.37). Each panel residualizes on more, so the cloud tightens.

For thousands of points, bin the scatter — the slope is unchanged

Unbinned (left) vs. binned into 20 quantiles (right). Both show the same FWL-residualized fit (β = 0.21, R² = 0.32); binning replaces 200 points with 20 readable means.
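Binning can be approximated by hand to see what the right-hand panel does. The sketch below is not scatterfit's internal routine. It residualizes on Stata's bundled auto data (a stand-in), fits the slope on all points, then replaces the cloud with quantile-bin means. The choice of 10 bins is an assumption suited to that dataset's 74 observations.

```stata
* Stand-in data: Stata's bundled auto dataset; run as one do-file
sysuse auto, clear
quietly regress price mpg
predict double r_price, residuals
quietly regress weight mpg
predict double r_weight, residuals

quietly regress r_price r_weight          // slope fitted on ALL points
local b = _b[r_weight]

xtile bin = r_weight, nquantiles(10)      // 10 quantile bins (assumption)
collapse (mean) r_price r_weight, by(bin) // one mean point per bin

* Bin means with the full-sample line; residuals have mean zero,
* so the fitted line passes through the origin
twoway (scatter r_price r_weight) (function y = `b'*x, range(r_weight)), legend(off)
```

The line is drawn from the slope estimated before collapsing, which is why the binned picture shows the same fit as the unbinned one.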

Fixed effects are just FWL on group dummies — and they reshape the cloud

Air-time vs. delay, NYC flights. No FE (left, R² ≈ 0) → origin FE (center) → origin + destination FE (right). fcontrols() demeans by group via reghdfe.
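The equivalence between fixed effects and partialling out group dummies can be sketched without any add-on commands. The example below uses Stata's bundled auto data as a stand-in for the flights data, with foreign as the (hypothetical) grouping variable: demeaning by group and regressing gives the same slope as including the dummies directly.

```stata
* Stand-in data: Stata's bundled auto dataset (not the flights data)
sysuse auto, clear

* Demean outcome and regressor within each group
egen double m_price  = mean(price),  by(foreign)
egen double m_weight = mean(weight), by(foreign)
generate double d_price  = price  - m_price
generate double d_weight = weight - m_weight

quietly regress d_price d_weight          // within-group (demeaned) regression
scalar b_within = _b[d_weight]
quietly regress price weight i.foreign    // same model with explicit group dummies
scalar b_dummy = _b[weight]

display %14.8f b_within "  " %14.8f b_dummy
assert reldif(b_within, b_dummy) < 1e-8   // stops with an error if the slopes differ
```

Regressing a variable on a full set of group dummies leaves its deviation from the group mean as the residual, so demeaning is FWL applied to the dummies.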

The Resolution

Act III

In a wage panel, individual fixed effects lift R² from 0.04 to 0.59

Raw pooled cross-section (left, R² = 0.043) vs. individual fixed-effects residualized scatter (right, R² = 0.59). fcontrols(nr) strips each person’s average — within-person experience returns ≈ 7%.

One algebra, three languages — only the syntax changes

Stata (`scatterfit`)

scatterfit y x — raw
, controls(z) — partial out Z
, fcontrols(fe) — group FE
, binned · regparameters()

R / Python

R: fwl_plot(y ~ x + z)
R: fwl_plot(y ~ x | fe)
Python: manual resid()
no binning / on-plot stats

The numbers match across all three because the datasets are identical — FWL is the same theorem everywhere.

Does the scatter make this causal? No — it only makes the algebra visible

Objection. A partial-regression plot looks like proof that coupons cause +0.212.

Response. It is not. FWL is an algebraic identity: it visualizes “holding Z fixed,” nothing more. The +0.212 is causal here only because we built the data-generating process (DGP) ourselves and know that income is the only confounder.

“Controlling for X” is a residual-on-residual slope you can finally draw.