FWL Theorem — Interactive Lab

A pedagogical companion to The FWL Theorem: Making Multivariate Regressions Intuitive.

What does it really mean to "control for" a variable?

A multivariate regression returns a coefficient, but a multivariate scatter plot does not exist. The Frisch–Waugh–Lovell theorem resolves the tension. It says any coefficient from a multi-variable regression can be recovered from a simple two-variable regression — after you partial out all the other variables. The animation below shows the trick in motion.
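
In matrix form the statement is compact. The notation here is an assumption made for illustration, not taken from the post: x_1 is the regressor of interest, X_2 collects all other regressors (including the constant), and M_2 is the matrix that turns a variable into its residual after regressing on X_2.

```latex
\[
y = x_1 \beta_1 + X_2 \beta_2 + \varepsilon,
\qquad
M_2 = I - X_2 \left( X_2^{\top} X_2 \right)^{-1} X_2^{\top}
\]
\[
\tilde{y} = M_2 \, y,
\qquad
\tilde{x}_1 = M_2 \, x_1,
\qquad
\hat{\beta}_1 = \frac{\tilde{x}_1^{\top} \tilde{y}}{\tilde{x}_1^{\top} \tilde{x}_1}
\]
```

The last expression is the slope of a simple regression of ỹ on x̃₁, and it equals the coefficient on x_1 from the full regression.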

In the post's retail example, ignoring neighbourhood income produces a coupon-on-sales slope of −0.106 (wrong sign). After residualising both variables on income, the slope flips to +0.267 — very close to the true DGP value of +0.20. Drag the slider in Tab 2 to see why.

Partialling out — what the regression is doing under the hood

The loop below alternates between two views. The first shows raw coupons against income with a regression line and the residuals (dashed); that line's slope is income's effect on coupons. The second view discards everything explained by income and plots what is left: the residuals of sales against the residuals of coupons. The slope of that second scatter is the FWL slope.

Left view: raw scatter with residuals shown as dashed lines (the part of coupons income cannot explain). Right view: residualised scatter — the conditional effect emerges as a positive slope.
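
The two views can be reproduced in a few lines. This is a minimal sketch, not the post's code: the DGP (standard-normal income and noise, `n = 50`, `alpha = 0.20`, `gamma = 0.30`, `delta = -0.50`) and the helper names `ols_slope` and `residualise` are assumptions chosen for illustration.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple regression of y on x (intercept included)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def residualise(v, control):
    """Part of v that a linear fit on `control` (with intercept) cannot explain."""
    b = ols_slope(control, v)
    a = v.mean() - b * control.mean()
    return v - (a + b * control)


# Assumed DGP: illustrative values, not the post's exact simulation.
rng = np.random.default_rng(0)
n, alpha, gamma, delta = 50, 0.20, 0.30, -0.50
income = rng.normal(size=n)
coupons = delta * income + rng.normal(size=n)
sales = alpha * coupons + gamma * income + rng.normal(size=n)

# One-shot regression: sales ~ 1 + coupons + income.
X = np.column_stack([np.ones(n), coupons, income])
beta_full = np.linalg.lstsq(X, sales, rcond=None)[0]

# FWL: purge income from both variables, then run a simple regression.
coupons_res = residualise(coupons, income)
sales_res = residualise(sales, income)
fwl_slope = ols_slope(coupons_res, sales_res)

print(f"naive slope    : {ols_slope(coupons, sales):+.4f}")
print(f"full OLS slope : {beta_full[1]:+.4f}")
print(f"FWL slope      : {fwl_slope:+.4f}")
```

The full-OLS and FWL slopes agree to floating-point precision for any seed or parameter values, because the equality is algebraic rather than statistical.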
Tab 2

Confounding Lab

Set the strength of the confounder yourself. Watch the naive estimate flip sign as you crank up the income–coupons link.

Tab 3

Forest Plot

The post's summary table as a horizontal forest plot. Six estimators, one true effect, hover for SEs and p-values.

Tab 4

Monte Carlo

Run the whole simulation 100 times. See the naive estimator's bias and the FWL estimator's centering on the truth.

Glossary (open a card if a term is unfamiliar)

FWL theorem
The OLS coefficient on x₁ from the full regression equals the OLS slope of ỹ on x̃₁, where the tildes are residuals from regressing on the other controls. An algebraic identity, not an approximation.
Partialling out
Replace each variable with the part not explained by the controls. Wiping a foggy window before looking through it.
Confounder
A variable that affects both the treatment and the outcome. In the post, income confounds coupons → sales.
Omitted variable bias
OVB = γ · δ — the omitted variable's effect on y times its slope on the treatment. The exact tilt of the foggy lens.
Conditional vs marginal effect
Conditional holds other variables fixed; marginal averages over them. They agree only when no relevant variable is omitted.
Backdoor path
A non-causal route from treatment to outcome through a confounder. Including the confounder closes the door.
Simpson's paradox
A trend in aggregate data that reverses inside subgroups. The coupons → sales sign flip in the post is the canonical retail example.
DML bridge
Double Machine Learning replaces the OLS residualisation with a flexible ML model. FWL with a smarter mop.

Confounding Lab — make the sign flip yourself

The simulator below mirrors the post's DGP: income drives coupons (negatively) and sales (positively); coupons drive sales (positively). The true causal effect of coupons on sales is fixed at α = 0.20. Adjust the three sliders and compare the naive coupons-on-sales slope to the FWL slope. Find a setting where the naive slope flips negative — that is when ignoring income misleads the analyst.

n (number of stores): more stores ⇒ each estimate is sharper. The post uses 50.
γ (income → sales): how much income raises sales. Set close to zero to disable the confounding.
δ (income → coupons): negative ⇒ wealthy stores use fewer coupons. Bias = γ · δ.

Naive OLS (sales ~ coupons): α̂ (slope), SE, bias vs truth, γ · δ (predicted OVB)

FWL / Full OLS (controlling for income): α̂ (FWL slope), SE, bias vs truth, true α = 0.20
Horizontal bars: naive (orange) vs FWL (teal). The vertical steel line marks the true causal effect α = 0.20.

What to look for

  • Sign flip. With the default γ = 0.30 and δ = −0.50, the naive slope sits well below zero while the FWL slope hugs the true 0.20. That is Simpson's paradox in one slider.
  • OVB formula. The "γ · δ" stat predicts the bias as the product of the two backdoor coefficients. Verify by comparing it to (naive α̂) − (true α): they should agree to within sampling noise.
  • Turn off the confounding. Drag γ to 0. The naive and FWL slopes collapse to nearly identical values — without a relevant omitted variable, the two strategies agree.
  • Sample size. Drag n up to 500. Both standard errors shrink, but the naive estimate stays biased — sampling more data does not fix a misspecified model.
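
The omitted-variable-bias check can also be done in code. The sketch below uses an assumed DGP (standard-normal income and noise) and a hypothetical helper `fit`. It verifies the in-sample version of the formula: the gap between the naive and full slopes equals the estimated income coefficient times the slope from regressing income on coupons. Whether that product coincides with the simulator's γ · δ readout depends on how the simulator scales its variables, which the page does not state.

```python
import numpy as np


def fit(y, *regressors):
    """OLS coefficients of y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Assumed DGP: illustrative values, not the simulator's exact code.
rng = np.random.default_rng(1)
n, alpha, gamma, delta = 500, 0.20, 0.30, -0.50
income = rng.normal(size=n)
coupons = delta * income + rng.normal(size=n)
sales = alpha * coupons + gamma * income + rng.normal(size=n)

naive_alpha = fit(sales, coupons)[1]                  # sales ~ coupons
_, full_alpha, full_gamma = fit(sales, coupons, income)  # sales ~ coupons + income
income_on_coupons = fit(income, coupons)[1]           # income ~ coupons

predicted_bias = full_gamma * income_on_coupons
observed_gap = naive_alpha - full_alpha

print(f"predicted bias : {predicted_bias:+.4f}")
print(f"observed gap   : {observed_gap:+.4f}")
```

Within a single sample the two printed numbers match exactly. Comparing the naive slope with the true α instead of the full-OLS estimate adds sampling noise, which is why the bullet above says "within sampling noise".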

The post's summary table — interactively

These six numbers come straight from the Summary of results table at the end of the post. Every FWL variant produces a coefficient identical to the corresponding full regression — that is the algebraic guarantee of the theorem. The vertical steel line marks the true DGP value α = 0.20. Toggle methods to declutter.

What to look for

  • Naive OLS sits on the wrong side of zero. It is the only estimator whose point estimate is negative: confounding by income flips the apparent direction.
  • Full OLS and FWL Step 2 are visually indistinguishable. Their estimates coincide at 0.267 down to four decimal places. That is the FWL identity made visible.
  • FWL Step 1's CI is enormous. Hover the row: SE = 1.271. Residualising only the treatment (not the outcome) leaves income-driven sales variation in the residuals, inflating uncertainty.
  • Adding day-of-week (last two rows) barely moves the coupon coefficient but tightens the SE — extra non-confounding controls help precision without changing the point estimate.

Why are the FWL and full-OLS estimates exactly equal?

The FWL theorem is an algebraic identity, not a statistical approximation. The OLS coefficient on coupons in a regression of sales on coupons and income equals the OLS slope from regressing the residual of sales on the residual of coupons, both purged of income. The identity holds for any number of additional controls. The only thing the two procedures can disagree on is the reported standard error: the residual-on-residual regression does not count the degrees of freedom used up by the residualising step, whereas the one-shot regression accounts for them explicitly.
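
A short sketch makes the standard-error point concrete. The DGP values and the helper `ols` are assumptions for illustration, and the standard errors are the classical homoskedastic ones. It compares three regressions: the one-shot regression, residual on residual, and raw sales on the coupon residual (treatment-only residualising).

```python
import numpy as np


def ols(y, X):
    """Coefficients, classical standard errors and residuals for y ~ X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]  # degrees of freedom this regression "knows" about
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov)), resid


# Assumed DGP: illustrative values, not the post's exact simulation.
rng = np.random.default_rng(2)
n = 50
income = rng.normal(size=n)
coupons = -0.50 * income + rng.normal(size=n)
sales = 0.20 * coupons + 0.30 * income + rng.normal(size=n)
ones = np.ones(n)

# One-shot regression: three estimated parameters, so dof = n - 3.
b_full, se_full, _ = ols(sales, np.column_stack([ones, coupons, income]))

# Residualise both variables on income (with intercept).
Z = np.column_stack([ones, income])
coupons_res = ols(coupons, Z)[2]
sales_res = ols(sales, Z)[2]

# Residual on residual: same slope, but the regression only sees dof = n - 2.
b_fwl, se_fwl, _ = ols(sales_res, np.column_stack([ones, coupons_res]))

# Raw sales on the coupon residual: same slope, noisier residuals.
b_step1, se_step1, _ = ols(sales, np.column_stack([ones, coupons_res]))

print(f"slopes: full={b_full[1]:.4f}  fwl={b_fwl[1]:.4f}  step1={b_step1[1]:.4f}")
print(f"SEs   : full={se_full[1]:.4f}  fwl={se_fwl[1]:.4f}  step1={se_step1[1]:.4f}")
print(f"SE ratio full/fwl = {se_full[1] / se_fwl[1]:.4f}, "
      f"sqrt((n-2)/(n-3)) = {np.sqrt((n - 2) / (n - 3)):.4f}")
```

All three slopes coincide. With one control, the full and residual-on-residual standard errors differ only by the factor sqrt((n − 2)/(n − 3)), and the treatment-only version is never tighter, because income-driven variation in sales stays in its residuals.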

Monte Carlo — bias vs variance over many draws

A single sample is noisy: maybe the naive estimate was off because the random draw happened to be unfair. To find out, run the whole simulation 100 times with fresh random draws and look at the distribution of estimates. If naive OLS were merely noisy, the orange histogram would centre on α = 0.20. If it is biased, it centres somewhere else.

n (stores per draw): each Monte Carlo draw simulates this many stores.
γ (income → sales): set close to zero to make the naive estimator unbiased.
δ (income → coupons): negative δ + positive γ ⇒ negative bias on the naive slope.

What to look for

  • Two distributions, not one. The teal histogram (FWL) centres near 0.20 (true α). The orange histogram (naive) centres somewhere else — often even on the wrong side of zero.
  • Variance vs bias. Crank n up to 500. The teal cluster tightens around 0.20 (consistent estimator). The orange cluster tightens too — but around the wrong value. Larger samples cannot rescue an omitted-variables model.
  • Disable the bias. Set γ to 0 (or δ to 0). The two histograms collapse on top of each other — both centred on 0.20. The naive sign-flip rate drops to roughly its noise floor.
  • The sign-flip stat is the headline of the post. With the default settings it is typically 40–80%: in nearly half or more of the simulated retail studies, a naive analyst would conclude that coupons hurt sales.
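
The experiment can be sketched offline as follows. The DGP scaling (standard-normal income and noise) and the function name `simulate_once` are assumptions, so the printed means and sign-flip rate will not match this page's readouts; only the qualitative pattern should carry over. The FWL slope is computed through the full regression, which the theorem guarantees is the same number.

```python
import numpy as np


def simulate_once(rng, n=50, alpha=0.20, gamma=0.30, delta=-0.50):
    """Draw one sample of stores and return (naive slope, FWL slope)."""
    income = rng.normal(size=n)
    coupons = delta * income + rng.normal(size=n)
    sales = alpha * coupons + gamma * income + rng.normal(size=n)
    ones = np.ones(n)
    X_naive = np.column_stack([ones, coupons])
    X_full = np.column_stack([ones, coupons, income])
    naive = np.linalg.lstsq(X_naive, sales, rcond=None)[0][1]
    fwl = np.linalg.lstsq(X_full, sales, rcond=None)[0][1]
    return naive, fwl


rng = np.random.default_rng(3)
draws = np.array([simulate_once(rng) for _ in range(100)])  # shape (100, 2)

naive_mean, fwl_mean = draws.mean(axis=0)
sign_flip_rate = (draws[:, 0] < 0).mean()  # share of draws with a negative naive slope

print(f"mean naive slope : {naive_mean:+.3f}")
print(f"mean FWL slope   : {fwl_mean:+.3f}  (true alpha = +0.200)")
print(f"naive sign flips : {sign_flip_rate:.0%}")
```

Passing a larger `n` to `simulate_once` tightens both columns of `draws`, but only the FWL column tightens around 0.20.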