What does two-way fixed effects actually do?
Suppose you fit a panel regression with feols(y ~ x | id + time).
What is the package doing under the hood? For a balanced panel, the
Frisch-Waugh-Lovell (FWL) theorem says it is mathematically
equivalent to (a) subtracting each unit's mean, (b) subtracting each
period's mean, (c) adding back the grand mean to correct for
double-subtraction, and (d) running plain OLS on the residuals. The
coefficients match to machine precision: in this post's panel of 150
countries × 8 periods, the maximum difference is 3.05 × 10⁻¹⁶,
on the order of IEEE 754 double-precision machine epsilon (about 2.2 × 10⁻¹⁶).
This app lets you experiment with the transformation. In four tabs you
will: sweep the within-variation knob and watch how cross-country
differences disappear; reproduce the FWL coefficient equivalence on
simulated panels; toggle covariates on the post's forest plot; and see
why naive lm() standard errors are systematically too small.
Shrinkage vs. absorption — two different ways to deal with controls
The animation contrasts two strategies for handling many controls. Shrinkage comes in two flavours: L1 (LASSO) shrinks coefficients toward zero and sets some of them exactly to zero, while L2 (Ridge) shrinks but never reaches zero. Fixed effects take a different route: they absorb the unit-level and period-level variation entirely, leaving the coefficient identified only from within-variation. The TWFE coefficient is plain OLS on what is left over, with no shrinkage at all. The slider here is illustrative; Tab 2 has the actual demeaning controls.
Demeaning Lab
Build a simulated panel. Watch unit means and time means stripped away. See within-variation emerge.
FWL Showdown
Two routes to the same coefficient: full LSDV regression vs OLS on demeaned data. Confirm machine-precision agreement across 100 simulations.
Forest Plot
The post's headline result: feols TWFE and manual demeaning OLS overlap exactly. Hover for SEs and confidence intervals.
Glossary (open a card if a term is unfamiliar)
Two-way fixed effects (TWFE)
Within transformation (demeaning)
Frisch-Waugh-Lovell (FWL) theorem
LSDV (Least-Squares Dummy Variables)
Within-R²
Degrees of freedom (df)
Clustered standard errors
Balanced panel
Demeaning Lab — see what TWFE strips away
Build a simulated panel of n units × T periods. The data-generating process bakes in unit-specific intercepts (some countries are always richer than others) and a common time trend (everyone grows on average). The slider below lets you increase the treatment signal. The plot shows the LASSO path as a stand-in for the "many possible models" that demeaning makes unnecessary: with FE, you do not need to choose. You absorb them all.
What to look for
- Cross-unit variation is large. In real panels, most variation lies between units (countries differ in income levels by factors of 100×). Demeaning removes all of that.
- Within-variation is what identifies β. After two-way demeaning, only deviations from each unit's average and each period's average remain. In the post, that within-variation explains 17.7% of the residual variance.
- FE never need a tuning parameter. Unlike LASSO's λ, the demeaning transformation is parameter-free. You always subtract the same means and add back the same grand mean.
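The points above can be checked on a toy panel. The following is a minimal sketch in plain Python (not the app's own code): the data-generating process, the seed, and the names `two_way_demean` and `within_share` are illustrative assumptions, and the panel is assumed to be balanced.

```python
import random
from statistics import mean, pvariance

def two_way_demean(panel):
    """Return panel[i][t] minus unit mean minus period mean plus grand mean.

    Assumes a balanced panel stored as a list of units, each a list of periods.
    """
    n, T = len(panel), len(panel[0])
    unit_means = [mean(row) for row in panel]
    time_means = [mean(panel[i][t] for i in range(n)) for t in range(T)]
    grand_mean = mean(unit_means)  # valid because the panel is balanced
    return [[panel[i][t] - unit_means[i] - time_means[t] + grand_mean
             for t in range(T)] for i in range(n)]

random.seed(1)
n_units, n_periods = 50, 8

# Hypothetical DGP: large unit intercepts, a common trend, small noise.
alpha = [random.gauss(0, 3.0) for _ in range(n_units)]
y = [[alpha[i] + 0.5 * t + random.gauss(0, 1.0) for t in range(n_periods)]
     for i in range(n_units)]

y_within = two_way_demean(y)

# After demeaning, every unit mean and every period mean is zero (up to round-off).
max_unit_mean = max(abs(mean(row)) for row in y_within)
max_time_mean = max(abs(mean(y_within[i][t] for i in range(n_units)))
                    for t in range(n_periods))

def flatten(panel):
    return [v for row in panel for v in row]

# Share of the total variance that survives two-way demeaning.
within_share = pvariance(flatten(y_within)) / pvariance(flatten(y))
print(f"within share of total variance: {within_share:.3f}")
```

With these made-up parameters most of the variance sits in the unit intercepts, so only a small share survives the transformation. No tuning parameter appears anywhere in `two_way_demean`.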
FWL Showdown — feols vs manual demeaning
The Frisch-Waugh-Lovell theorem guarantees that feols(y ~ x | id + time)
and lm(y_demeaned ~ x_demeaned) produce identical
coefficients on the slopes. Not approximately. Not "close." Identical to
machine precision. The two cards below show both estimates side-by-side
on the same simulated panel. Run 100 simulations to see the
distributions overlap exactly.
feols TWFE (LSDV)
Estimator: OLS on [d, X, unit dummies, time dummies] — feols absorbs them internally.
Manual demeaning
Estimator: ỹ_it = y_it − ȳ_i· − ȳ_·t + ȳ_··, then plain lm().
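Why the transformation removes both sets of fixed effects can be seen in a short derivation. It is a sketch under an assumed model with unit effects α_i, period effects γ_t, a single slope β and error ε_it on a balanced panel. These symbols are introduced here for illustration and do not come from the post.

```latex
\begin{align*}
y_{it} &= \alpha_i + \gamma_t + \beta x_{it} + \varepsilon_{it} \\
\bar{y}_{i\cdot} &= \alpha_i + \bar{\gamma} + \beta \bar{x}_{i\cdot} + \bar{\varepsilon}_{i\cdot} \\
\bar{y}_{\cdot t} &= \bar{\alpha} + \gamma_t + \beta \bar{x}_{\cdot t} + \bar{\varepsilon}_{\cdot t} \\
\bar{y}_{\cdot\cdot} &= \bar{\alpha} + \bar{\gamma} + \beta \bar{x}_{\cdot\cdot} + \bar{\varepsilon}_{\cdot\cdot} \\
\tilde{y}_{it} &= y_{it} - \bar{y}_{i\cdot} - \bar{y}_{\cdot t} + \bar{y}_{\cdot\cdot} \\
 &= \underbrace{(\alpha_i - \alpha_i - \bar{\alpha} + \bar{\alpha})}_{=\,0}
  + \underbrace{(\gamma_t - \bar{\gamma} - \gamma_t + \bar{\gamma})}_{=\,0}
  + \beta \tilde{x}_{it} + \tilde{\varepsilon}_{it}
  = \beta \tilde{x}_{it} + \tilde{\varepsilon}_{it}
\end{align*}
```

Both fixed-effect terms cancel exactly, so a regression of ỹ on x̃ involves only β.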
Why the coefficients match to machine precision
- Same projection. Both LSDV (with dummies) and demeaning project the data onto the same subspace — the orthogonal complement of the span of unit and time indicator vectors.
- Coefficients identical, SEs differ. The slopes are guaranteed to match exactly. But naive lm() doesn't know about the absorbed FE, so its SE uses the wrong degrees of freedom (1194 instead of 1038 in the post's panel).
- This works only for balanced panels in one pass. With unbalanced panels, the closed-form three-step demeaning fails and you need the iterative algorithm in fixest.
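The equivalence can be reproduced without any regression package. The sketch below is plain Python and only a stand-in for feols and lm(): it fits the dummy-variable regression by solving the normal equations directly, then compares the slope with OLS on two-way demeaned data. The tiny balanced panel, the seed, the true slope of 2.0 and all function names are illustrative assumptions.

```python
import random

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * k
    for r in reversed(range(k)):
        z[r] = (M[r][k] - sum(M[r][j] * z[j] for j in range(r + 1, k))) / M[r][r]
    return z

def demean(panel):
    """Two-way within transformation for a balanced panel."""
    n, T = len(panel), len(panel[0])
    um = [sum(row) / T for row in panel]
    tm = [sum(panel[i][t] for i in range(n)) / n for t in range(T)]
    g = sum(um) / n
    return [[panel[i][t] - um[i] - tm[t] + g for t in range(T)] for i in range(n)]

random.seed(42)
n, T, beta_true = 6, 4, 2.0
alpha = [random.gauss(0, 2) for _ in range(n)]   # unit effects
gamma = [random.gauss(0, 1) for _ in range(T)]   # period effects
x = [[alpha[i] + random.gauss(0, 1) for t in range(T)] for i in range(n)]
y = [[beta_true * x[i][t] + alpha[i] + gamma[t] + random.gauss(0, 0.5)
      for t in range(T)] for i in range(n)]

# Route 1: LSDV. Columns are intercept, x, unit dummies, time dummies,
# dropping the first unit and the first period as reference categories.
rows, ys = [], []
for i in range(n):
    for t in range(T):
        unit_d = [1.0 if i == u else 0.0 for u in range(1, n)]
        time_d = [1.0 if t == s else 0.0 for s in range(1, T)]
        rows.append([1.0, x[i][t]] + unit_d + time_d)
        ys.append(y[i][t])
p = len(rows[0])
XtX = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
Xty = [sum(r[a] * yv for r, yv in zip(rows, ys)) for a in range(p)]
beta_lsdv = solve(XtX, Xty)[1]

# Route 2: OLS on demeaned data. The demeaned variables have mean zero,
# so the slope is the same with or without an intercept.
xd, yd = demean(x), demean(y)
num = sum(xd[i][t] * yd[i][t] for i in range(n) for t in range(T))
den = sum(xd[i][t] ** 2 for i in range(n) for t in range(T))
beta_within = num / den

gap = abs(beta_lsdv - beta_within)
print(beta_lsdv, beta_within, gap)
```

The gap is pure floating-point round-off. Its exact size depends on the solver, so a hand-rolled elimination like this one will not necessarily land on the same last digits as fixest.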
Stability of the FWL equivalence over many simulations
Single runs are noisy. Run the whole pipeline 100 times with fresh draws (same parameters, different random errors) to see the two distributions overlap perfectly.
The post's coefficient table — interactively
These numbers come straight from coefficient_comparison.csv and
se_comparison.csv in the post's folder — the same data used to
produce the headline figures. Toggle outcomes (the five regressors) and
methods (feols TWFE vs manual demeaning OLS). The two markers overlap
exactly for every variable. The bars at the bottom show standard-error
comparisons across naive, IID-corrected, and clustered approaches.
What to look for
- The two markers always overlap. Every covariate's feols and manual estimate coincide to the fourth decimal place. The largest absolute difference in the post is 3.05 × 10⁻¹⁶ — pure floating-point round-off.
- Confidence intervals differ slightly. Even though the point estimates match, the CIs use different SEs (clustered for feols, naive for lm). Hover any marker to inspect.
- The SE bars below show the three SE variants side-by-side. Naive lm() (steel blue) ignores the absorbed FE and so uses too many df. feols IID (green) uses the correct df. feols cluster (orange) further adjusts for within-unit residual correlation. For log(n+g+d), cluster SE is ~22% larger than naive; for gov. consumption it is ~5% larger.
Outcomes (regressors)
Methods
Why are the naive lm() SEs wrong?
The naive lm() on demeaned data thinks the residuals have
1200 − 6 = 1194 degrees of freedom, subtracting one for each of the 5
slopes plus the intercept. But the demeaning silently consumed
150 + 8 − 1 = 157 additional df (one per country FE plus one per
time FE, minus 1 normalisation). The correct df is
1200 − 5 − 157 = 1038. Using 1194 instead of 1038 overstates the
residual degrees of freedom, which shrinks the estimated error
variance and produces SEs that are too small. The feols function in the
fixest package knows about the absorbed FE and gets the df right automatically.
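The arithmetic above can be written out in a few lines. This is a sketch that assumes the naive and corrected IID standard errors share the same residuals and differ only in the degrees of freedom used for the error variance. The variable names are illustrative, and the exact small-sample adjustment fixest applies may differ.

```python
import math

n_obs = 150 * 8              # balanced panel: 1,200 observations
n_slopes = 5                 # regressors in the post's model
absorbed = 150 + 8 - 1       # country FE + period FE, minus one normalisation

df_naive = n_obs - n_slopes - 1           # lm() counts 5 slopes plus an intercept
df_correct = n_obs - n_slopes - absorbed  # the intercept is part of the absorbed FE

# The error variance is RSS / df, so the SE scales with 1 / sqrt(df).
se_ratio = math.sqrt(df_naive / df_correct)

print(df_naive, df_correct, round(se_ratio, 4))
```

Under that assumption, the df-corrected IID standard errors are roughly 7% larger than the naive ones. Clustering is a separate adjustment on top of this.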
Connecting back to Tab 3
The FWL equivalence you just saw in simulation is exactly what happens on the real Barro convergence panel of 150 countries × 8 periods:
- Log Initial Income (convergence parameter): feols = −0.0553, manual = −0.0553 (difference = −4.2 × 10⁻¹⁷).
- Gov. Consumption Share: feols = −0.1028, manual = −0.1028 (difference = −3.1 × 10⁻¹⁶).
- All five coefficients agree to 12 significant digits. R's all.equal() returns TRUE.
The takeaway from the post is therefore visible twice: once on a controlled simulation where you set the truth, and once on the original 150 × 8 = 1,200-observation panel that motivates the whole exercise.