<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Fixed Effects and TWFE | Carlos Mendez</title><link>https://carlos-mendez.org/category/fixed-effects-and-twfe/</link><atom:link href="https://carlos-mendez.org/category/fixed-effects-and-twfe/index.xml" rel="self" type="application/rss+xml"/><description>Fixed Effects and TWFE</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>Carlos Mendez</copyright><lastBuildDate>Thu, 02 Apr 2026 00:00:00 +0000</lastBuildDate><image><url>https://carlos-mendez.org/media/icon_huedfae549300b4ca5d201a9bd09a3ecd5_79625_512x512_fill_lanczos_center_3.png</url><title>Fixed Effects and TWFE</title><link>https://carlos-mendez.org/category/fixed-effects-and-twfe/</link></image><item><title>What Does TWFE Actually Do? Manual Demeaning and the FWL Theorem</title><link>https://carlos-mendez.org/post/r_demeaning_twfe/</link><pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/r_demeaning_twfe/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Two-way fixed effects (TWFE) is one of the most widely used estimators in applied economics. Packages like &lt;code>fixest&lt;/code> make it easy to estimate TWFE models with a single line of code. But what does the estimator actually &lt;em>do&lt;/em> to the data? Why do time-invariant regressors like geography or colonial origin get dropped? And if you run &lt;code>lm()&lt;/code> on manually demeaned data, should you get the same answer?&lt;/p>
&lt;p>This tutorial answers these questions by taking TWFE apart. We estimate a standard growth regression with country and time fixed effects, then replicate the exact same coefficients by hand &amp;mdash; subtracting country means and time means, then adding back the grand mean, before running ordinary least squares. The result is not an approximation: the coefficients match to 12 significant digits. The theoretical foundation for this equivalence is the &lt;strong>Frisch-Waugh-Lovell (FWL) theorem&lt;/strong>, a fundamental result in econometrics that connects controlling for variables in a regression to projecting them out by residualization.&lt;/p>
&lt;p>We use a balanced panel of 150 countries observed over 8 time periods from the Barro convergence dataset. Along the way, we also discover why standard errors from naive &lt;code>lm()&lt;/code> on demeaned data are wrong &amp;mdash; and why you should always use a dedicated panel estimator for inference.&lt;/p>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand what two-way fixed effects does mechanically to the data and why time-invariant regressors are dropped&lt;/li>
&lt;li>Implement the two-way demeaning formula step by step: subtract country means, subtract time means, add back the grand mean&lt;/li>
&lt;li>Verify the Frisch-Waugh-Lovell theorem empirically by comparing &lt;code>feols()&lt;/code> and &lt;code>lm()&lt;/code> coefficients&lt;/li>
&lt;li>Interpret why naive standard errors from &lt;code>lm()&lt;/code> on demeaned data are incorrect and how &lt;code>fixest&lt;/code> corrects them&lt;/li>
&lt;li>Visualize the demeaning transformation to build intuition about within-variation identification&lt;/li>
&lt;/ul>
&lt;h2 id="2-the-frisch-waugh-lovell-theorem">2. The Frisch-Waugh-Lovell Theorem&lt;/h2>
&lt;p>Before diving into code, let us build the conceptual foundation. The FWL theorem answers a simple question: if you want to estimate the effect of $X$ on $Y$ while controlling for a set of variables $Z$, do you need to include everything in one big regression?&lt;/p>
&lt;p>Think of it like noise-canceling headphones. Instead of listening to music with the engine noise mixed in, the headphones first &lt;em>subtract out&lt;/em> the engine noise from what you hear. The result is the same music you would hear in a silent room. The FWL theorem says: instead of including all control variables in one regression, you can first &amp;ldquo;subtract them out&amp;rdquo; from both $Y$ and $X$, and then regress the residuals on each other. The coefficient on $X$ will be identical either way.&lt;/p>
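&lt;p>Before applying the theorem to panel data, it can be checked on a tiny simulated example. The sketch below (illustrative code with made-up data, not part of the tutorial&amp;rsquo;s analysis) regresses $y$ on $x$ and $z$ jointly, then residualizes both $y$ and $x$ on $z$ and regresses the residuals on each other &amp;mdash; the slope on $x$ is identical:&lt;/p>
&lt;pre>&lt;code class="language-r"># Minimal FWL check on simulated data
set.seed(123)
n &amp;lt;- 200
z &amp;lt;- rnorm(n)
x &amp;lt;- 0.5 * z + rnorm(n)
y &amp;lt;- 2 * x - z + rnorm(n)
b_joint &amp;lt;- coef(lm(y ~ x + z))[&amp;quot;x&amp;quot;]                        # one big regression
b_fwl   &amp;lt;- coef(lm(resid(lm(y ~ z)) ~ resid(lm(x ~ z))))[2]   # residual-on-residual
all.equal(unname(b_joint), unname(b_fwl))  # TRUE
&lt;/code>&lt;/pre>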
&lt;h3 id="applying-fwl-to-two-way-fixed-effects">Applying FWL to two-way fixed effects&lt;/h3>
&lt;p>In a TWFE model, the &amp;ldquo;controls&amp;rdquo; $Z$ are the full set of country dummies and time dummies. Including all these dummies is equivalent to subtracting group means. For a variable $x_{it}$ observed for country $i$ in period $t$, the &lt;strong>two-way demeaned&lt;/strong> version is:&lt;/p>
&lt;p>$$\tilde{x}_{it} = x_{it} - \bar{x}_{i \cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot \cdot}$$&lt;/p>
&lt;p>In words, this formula says: take the observed value, subtract the country average (to remove persistent country differences), subtract the time-period average (to remove common shocks), and add back the overall average (to correct for double-subtracting the grand mean).&lt;/p>
&lt;p>Here is what each symbol means:&lt;/p>
&lt;ul>
&lt;li>$x_{it}$ is the observed value for country $i$ at time $t$ &amp;mdash; in code, this is a single cell in the panel dataset&lt;/li>
&lt;li>$\bar{x}_{i \cdot}$ is the &lt;strong>country mean&lt;/strong> &amp;mdash; the average of $x$ across all periods for country $i$&lt;/li>
&lt;li>$\bar{x}_{\cdot t}$ is the &lt;strong>time mean&lt;/strong> &amp;mdash; the average of $x$ across all countries in period $t$&lt;/li>
&lt;li>$\bar{x}_{\cdot \cdot}$ is the &lt;strong>grand mean&lt;/strong> &amp;mdash; the overall average of $x$ across all observations&lt;/li>
&lt;/ul>
&lt;h3 id="why-add-back-the-grand-mean">Why add back the grand mean?&lt;/h3>
&lt;p>When we subtract both the country mean and the time mean, the grand mean gets subtracted &lt;em>twice&lt;/em> &amp;mdash; once as part of $\bar{x}_{i \cdot}$ and once as part of $\bar{x}_{\cdot t}$. Adding $\bar{x}_{\cdot \cdot}$ back corrects for this double subtraction. Think of it like a Venn diagram with two overlapping circles. If you subtract both circles entirely, the overlap region gets removed twice. Adding the overlap back once restores the correct amount. Without this correction, the demeaned variables would not be centered at zero, and the equivalence with TWFE would break.&lt;/p>
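&lt;p>A quick toy check (illustrative code on a made-up 3-by-4 panel, not the tutorial&amp;rsquo;s data) shows both facts at once: with the grand mean added back, the demeaned variable averages exactly zero, while omitting the correction shifts every observation by the grand mean:&lt;/p>
&lt;pre>&lt;code class="language-r"># Toy balanced panel: 3 units x 4 periods
d &amp;lt;- data.frame(id = rep(1:3, each = 4), t = rep(1:4, times = 3))
set.seed(1)
d$x &amp;lt;- rnorm(12)
d$x_dm  &amp;lt;- d$x - ave(d$x, d$id) - ave(d$x, d$t) + mean(d$x)  # correct formula
d$x_bad &amp;lt;- d$x - ave(d$x, d$id) - ave(d$x, d$t)              # grand mean omitted
round(mean(d$x_dm), 12)               # 0
all.equal(mean(d$x_bad), -mean(d$x))  # TRUE: off by exactly the grand mean
&lt;/code>&lt;/pre>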
&lt;p>The FWL theorem guarantees this equivalence formally:&lt;/p>
&lt;p>$$\hat{\beta}_{\text{TWFE}} = \hat{\beta}_{\text{OLS on demeaned data}}$$&lt;/p>
&lt;p>In words, the slope coefficients from a regression that includes a full set of entity and time dummies are exactly equal to the slopes from OLS applied to the two-way demeaned data. Not approximately &amp;mdash; exactly. Let us verify this with real data.&lt;/p>
&lt;h2 id="3-setup">3. Setup&lt;/h2>
&lt;p>We need &lt;code>fixest&lt;/code> for TWFE estimation and &lt;code>tidyverse&lt;/code> for data wrangling and visualization. The &lt;code>scales&lt;/code> package provides axis formatting utilities.&lt;/p>
&lt;pre>&lt;code class="language-r">library(fixest)
library(tidyverse)
library(scales)
set.seed(42)
# Site color palette
STEEL_BLUE &amp;lt;- &amp;quot;#6a9bcc&amp;quot;
WARM_ORANGE &amp;lt;- &amp;quot;#d97757&amp;quot;
NEAR_BLACK &amp;lt;- &amp;quot;#141413&amp;quot;
TEAL &amp;lt;- &amp;quot;#00d4c8&amp;quot;
# Variables to demean
VARS_TO_DEMEAN &amp;lt;- c(&amp;quot;growth&amp;quot;, &amp;quot;ln_y_initial&amp;quot;, &amp;quot;log_s_k&amp;quot;,
                    &amp;quot;log_n_gd&amp;quot;, &amp;quot;log_hcap&amp;quot;, &amp;quot;gov_cons&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>We define the six variables that will be demeaned: the dependent variable (&lt;code>growth&lt;/code>) and all five regressors. Keeping them in a vector allows us to apply the demeaning formula programmatically rather than copying and pasting for each variable.&lt;/p>
&lt;h2 id="4-data-loading-and-panel-structure">4. Data Loading and Panel Structure&lt;/h2>
&lt;p>We load a balanced panel dataset with 150 countries observed over 8 time periods. The data comes from a Barro convergence exercise where the key question is whether poorer countries grow faster (conditional convergence). We convert &lt;code>id&lt;/code> and &lt;code>time&lt;/code> to factors so R treats them as categorical grouping variables.&lt;/p>
&lt;pre>&lt;code class="language-r">panel_data &amp;lt;- read.csv(&amp;quot;referenceMaterials/barro_convergence_panel.csv&amp;quot;)
panel_data$id &amp;lt;- factor(panel_data$id)
panel_data$time &amp;lt;- factor(panel_data$time)
cat(&amp;quot;Countries:&amp;quot;, nlevels(panel_data$id), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Time periods:&amp;quot;, nlevels(panel_data$time), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Total observations:&amp;quot;, nrow(panel_data), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Balanced panel:&amp;quot;, all(table(panel_data$id) == nlevels(panel_data$time)), &amp;quot;\n&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Countries: 150
Time periods: 8
Total observations: 1200
Balanced panel: TRUE
&lt;/code>&lt;/pre>
&lt;p>The dataset is a perfectly balanced panel of 150 countries observed across 8 time periods, yielding 1,200 total observations. A balanced panel means every country appears in every period with no missing cells &amp;mdash; the ideal setting for demonstrating the demeaning formula. The key variables are:&lt;/p>
&lt;ul>
&lt;li>&lt;code>growth&lt;/code>: annualized GDP per capita growth rate (dependent variable)&lt;/li>
&lt;li>&lt;code>ln_y_initial&lt;/code>: log of initial income (convergence term)&lt;/li>
&lt;li>&lt;code>log_s_k&lt;/code>: log of the investment share&lt;/li>
&lt;li>&lt;code>log_n_gd&lt;/code>: log of population growth plus depreciation&lt;/li>
&lt;li>&lt;code>log_hcap&lt;/code>: log of human capital&lt;/li>
&lt;li>&lt;code>gov_cons&lt;/code>: government consumption share&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="r_demeaning_twfe_panel_structure.png" alt="Panel structure: 150 countries across 8 time periods, all cells filled.">
&lt;em>Panel structure heatmap showing all 150 countries observed across 8 time periods with no missing cells.&lt;/em>&lt;/p>
&lt;p>The heatmap confirms the balanced structure. Every one of the 150 countries is observed in all 8 time periods. This balance simplifies our demeaning procedure because we can use the closed-form formula directly, without the iterative projection that unbalanced panels would require.&lt;/p>
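&lt;p>For intuition about that iterative approach, here is a hedged sketch of the alternating-projections idea (&lt;code>demean_iter&lt;/code> is our illustrative helper, not a &lt;code>fixest&lt;/code> function): repeatedly sweep out entity means and then time means until both are zero. On a balanced panel this reproduces the closed-form formula; on unbalanced panels it simply takes more passes:&lt;/p>
&lt;pre>&lt;code class="language-r"># Alternating projections: subtract entity means, then time means, repeat
demean_iter &amp;lt;- function(x, id, t, tol = 1e-12, max_iter = 100) {
  for (k in seq_len(max_iter)) {
    x_new &amp;lt;- x - ave(x, id)          # sweep out entity means
    x_new &amp;lt;- x_new - ave(x_new, t)   # then time means
    if (max(abs(x_new - x)) &amp;lt; tol) break  # stop once nothing changes
    x &amp;lt;- x_new
  }
  x_new
}
# On a balanced toy panel it matches the closed-form formula
set.seed(2)
d &amp;lt;- data.frame(id = rep(1:5, each = 4), t = rep(1:4, times = 5))
d$x &amp;lt;- rnorm(20)
closed &amp;lt;- d$x - ave(d$x, d$id) - ave(d$x, d$t) + mean(d$x)
all.equal(demean_iter(d$x, d$id, d$t), closed)  # TRUE
&lt;/code>&lt;/pre>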
&lt;h2 id="5-twfe-estimation-with-fixest">5. TWFE Estimation with fixest&lt;/h2>
&lt;p>The &lt;code>fixest&lt;/code> package makes TWFE estimation straightforward. The formula uses &lt;code>|&lt;/code> to separate the regressors (left) from the fixed effects dimensions (right). Writing &lt;code>| id + time&lt;/code> tells &lt;code>feols()&lt;/code> to absorb both country and time fixed effects. Internally, &lt;code>fixest&lt;/code> performs an efficient iterative demeaning algorithm to remove the fixed effects before estimating the slope coefficients.&lt;/p>
&lt;pre>&lt;code class="language-r">twfe_model &amp;lt;- feols(
  growth ~ ln_y_initial + log_s_k + log_n_gd + log_hcap + gov_cons | id + time,
  data = panel_data
)
summary(twfe_model)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">OLS estimation, Dep. Var.: growth
Observations: 1,200
Fixed-effects: id: 150, time: 8
Standard-errors: Clustered (id)
                 Estimate Std. Error    t value   Pr(&amp;gt;|t|)
ln_y_initial    -0.055286   0.003744 -14.765156  &amp;lt; 2.2e-16 ***
log_s_k          0.019725   0.007583   2.601311   0.010223 *
log_n_gd        -0.049614   0.022168  -2.238117   0.026696 *
log_hcap         0.009081   0.014564   0.623549   0.533877
gov_cons        -0.102795   0.046398  -2.215501   0.028243 *
RMSE: 0.020517 Adj. R2: 0.755103
Within R2: 0.176777
&lt;/code>&lt;/pre>
&lt;p>The TWFE model reveals strong conditional beta-convergence &amp;mdash; the hypothesis that poorer countries tend to grow faster, so income levels converge over time. The coefficient on log initial income is -0.055 (t = -14.77, p &amp;lt; 2.2e-16), meaning that a 1% higher initial income is associated with 0.055 percentage points slower subsequent growth, after controlling for the other covariates. Investment has the expected positive effect (0.020, p = 0.010), population growth has the expected negative effect (-0.050, p = 0.027), and government consumption is significantly negative (-0.103, p = 0.028). Human capital is positive but not statistically significant (0.009, p = 0.534). The model explains 75.5% of total variation (Adj. R-squared = 0.755), though only 17.7% of the within-variation (Within R-squared = 0.177) &amp;mdash; typical for panel models where fixed effects absorb most cross-country heterogeneity.&lt;/p>
&lt;p>Now let us replicate these coefficients by hand.&lt;/p>
&lt;h2 id="6-manual-demeaning-----step-by-step">6. Manual Demeaning &amp;mdash; Step by Step&lt;/h2>
&lt;p>We now walk through the demeaning procedure one step at a time. The goal is to transform every variable so that the country and time effects are removed. We will then run plain OLS on the result and verify that the coefficients match.&lt;/p>
&lt;h3 id="step-1-country-means">Step 1: Country means&lt;/h3>
&lt;p>For each country, we compute the average of each variable across all time periods. This gives us one mean per country per variable &amp;mdash; capturing persistent country characteristics like geography, institutions, or long-run income level.&lt;/p>
&lt;pre>&lt;code class="language-r">country_means &amp;lt;- panel_data |&amp;gt;
  group_by(id) |&amp;gt;
  summarise(across(all_of(VARS_TO_DEMEAN), mean), .groups = &amp;quot;drop&amp;quot;)
&lt;/code>&lt;/pre>
&lt;h3 id="step-2-time-means">Step 2: Time means&lt;/h3>
&lt;p>For each time period, we compute the average of each variable across all countries. These time means capture common shocks or trends that affect all countries in a given period &amp;mdash; for instance, a global recession or a worldwide productivity boom.&lt;/p>
&lt;pre>&lt;code class="language-r">time_means &amp;lt;- panel_data |&amp;gt;
  group_by(time) |&amp;gt;
  summarise(across(all_of(VARS_TO_DEMEAN), mean), .groups = &amp;quot;drop&amp;quot;)
&lt;/code>&lt;/pre>
&lt;h3 id="step-3-grand-mean">Step 3: Grand mean&lt;/h3>
&lt;p>The grand mean is simply the overall average of each variable across all countries and all time periods. It is a single number per variable, and we need it to correct for the double subtraction.&lt;/p>
&lt;pre>&lt;code class="language-r">grand_means &amp;lt;- colMeans(panel_data[VARS_TO_DEMEAN])
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> growth ln_y_initial log_s_k log_n_gd log_hcap gov_cons
-0.1243637 5.3643127 -1.5699117 -2.6569021 0.6645657 0.1461335
&lt;/code>&lt;/pre>
&lt;h3 id="step-4-apply-the-demeaning-formula">Step 4: Apply the demeaning formula&lt;/h3>
&lt;p>Now we bring everything together. We merge the country means and time means back into the main dataset, then apply the formula $\tilde{x}_{it} = x_{it} - \bar{x}_{i \cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot \cdot}$ programmatically to each variable.&lt;/p>
&lt;pre>&lt;code class="language-r"># Merge means
panel_dm &amp;lt;- panel_data |&amp;gt;
  left_join(
    country_means |&amp;gt; rename_with(~ paste0(.x, &amp;quot;_cmean&amp;quot;), all_of(VARS_TO_DEMEAN)),
    by = &amp;quot;id&amp;quot;
  ) |&amp;gt;
  left_join(
    time_means |&amp;gt; rename_with(~ paste0(.x, &amp;quot;_tmean&amp;quot;), all_of(VARS_TO_DEMEAN)),
    by = &amp;quot;time&amp;quot;
  )
# Apply demeaning formula
for (v in VARS_TO_DEMEAN) {
  panel_dm[[paste0(v, &amp;quot;_dm&amp;quot;)]] &amp;lt;-
    panel_dm[[v]] -
    panel_dm[[paste0(v, &amp;quot;_cmean&amp;quot;)]] -
    panel_dm[[paste0(v, &amp;quot;_tmean&amp;quot;)]] +
    grand_means[v]
}
&lt;/code>&lt;/pre>
&lt;p>Let us verify that the demeaning worked correctly. If the formula is implemented right, the mean of each demeaned variable should be approximately zero.&lt;/p>
&lt;pre>&lt;code class="language-text">Mean of demeaned variables (should be ~0):
growth_dm : -8.114169e-17
ln_y_initial_dm : 8.295170e-15
log_s_k_dm : -1.482923e-15
log_n_gd_dm : 1.599953e-15
log_hcap_dm : 5.384582e-17
gov_cons_dm : 1.832302e-16
&lt;/code>&lt;/pre>
&lt;p>All six demeaned variables have means on the order of $10^{-15}$ to $10^{-17}$ &amp;mdash; effectively zero within floating-point precision. The demeaning formula is implemented correctly: the within-variation that remains is purely the deviation from both entity-specific and time-specific patterns.&lt;/p>
&lt;h2 id="7-ols-on-the-demeaned-data">7. OLS on the Demeaned Data&lt;/h2>
&lt;p>With the demeaning complete, we run a standard OLS regression on the demeaned variables using base R&amp;rsquo;s &lt;code>lm()&lt;/code>. We deliberately use &lt;code>lm()&lt;/code> rather than &lt;code>feols()&lt;/code> to emphasize that this is plain ordinary least squares &amp;mdash; no fixed effects machinery is involved.&lt;/p>
&lt;pre>&lt;code class="language-r">manual_model &amp;lt;- lm(
  growth_dm ~ ln_y_initial_dm + log_s_k_dm + log_n_gd_dm + log_hcap_dm + gov_cons_dm,
  data = panel_dm
)
summary(manual_model)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Coefficients:
                  Estimate Std. Error t value  Pr(&amp;gt;|t|)
(Intercept)      5.035e-16  5.938e-04   0.000  1.00000
ln_y_initial_dm -5.529e-02  3.618e-03 -15.282  &amp;lt; 2e-16 ***
log_s_k_dm       1.972e-02  6.846e-03   2.881  0.00403 **
log_n_gd_dm     -4.961e-02  1.820e-02  -2.726  0.00651 **
log_hcap_dm      9.081e-03  1.370e-02   0.663  0.50751
gov_cons_dm     -1.028e-01  4.411e-02  -2.331  0.01994 *
Residual standard error: 0.02057 on 1194 degrees of freedom
Multiple R-squared: 0.1768
&lt;/code>&lt;/pre>
&lt;p>Two things stand out. First, the &lt;strong>intercept is 5.03 x 10^-16&lt;/strong> &amp;mdash; effectively zero. After proper two-way demeaning, the mean of all demeaned variables is near zero, so there is nothing left for the intercept to capture. This is a good sanity check: if the grand mean correction had been omitted, the intercept would be non-zero. Second, the &lt;strong>slope coefficients&lt;/strong> look identical to those from &lt;code>feols()&lt;/code>. But &amp;ldquo;look identical&amp;rdquo; is not the same as &amp;ldquo;are identical.&amp;rdquo; The next section proves they are.&lt;/p>
&lt;h2 id="8-coefficient-comparison-the-proof">8. Coefficient Comparison: The Proof&lt;/h2>
&lt;p>We now place the coefficients from both approaches side by side and compute their difference. If the FWL theorem holds, the slope coefficients must be identical up to floating-point precision.&lt;/p>
&lt;pre>&lt;code class="language-r">twfe_coefs &amp;lt;- coef(twfe_model)
manual_coefs &amp;lt;- coef(manual_model)[-1] # drop intercept
names(manual_coefs) &amp;lt;- names(twfe_coefs)
comparison &amp;lt;- data.frame(
  feols_TWFE = round(twfe_coefs, 12),
  Manual_OLS = round(manual_coefs, 12),
  Difference = twfe_coefs - manual_coefs
)
all.equal(unname(twfe_coefs), unname(manual_coefs))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Side-by-side coefficient comparison:
variable          feols_TWFE       manual_OLS        difference
ln_y_initial -0.055286009819  -0.055286009819  -4.163336342e-17
log_s_k       0.019724899416   0.019724899416   3.469446952e-18
log_n_gd     -0.049613972524  -0.049613972524  -2.775557562e-16
log_hcap      0.009081150621   0.009081150621   3.469446952e-17
gov_cons     -0.102795317426  -0.102795317426  -3.053113318e-16
Maximum absolute difference: 3.053113e-16
all.equal() test: TRUE
&lt;/code>&lt;/pre>
&lt;p>This is the central result of the tutorial. All five slope coefficients are identical to at least 12 significant digits. The largest difference is 3.05 x 10^-16 &amp;mdash; on the order of IEEE 754 double-precision machine epsilon (~2.2 x 10^-16). R&amp;rsquo;s &lt;code>all.equal()&lt;/code> function confirms equality within its default tolerance. This is not an approximation: it is an exact algebraic identity guaranteed by the Frisch-Waugh-Lovell theorem.&lt;/p>
&lt;p>&lt;img src="r_demeaning_twfe_coef_comparison.png" alt="TWFE and manual demeaning coefficients overlap perfectly for all five variables.">
&lt;em>Coefficient comparison: feols TWFE (blue circles) and manual demeaning OLS (orange triangles) occupy the exact same positions.&lt;/em>&lt;/p>
&lt;p>The dot plot makes the equivalence visually concrete. For each of the five covariates, the steel blue circle (feols TWFE) and warm orange triangle (manual demeaning OLS) occupy the exact same position. Government consumption has the largest coefficient in magnitude at -0.103, while the convergence parameter (log initial income) sits at -0.055. The dashed zero line helps distinguish positive from negative effects.&lt;/p>
&lt;h2 id="9-visualizing-what-demeaning-does">9. Visualizing What Demeaning Does&lt;/h2>
&lt;p>The coefficient equivalence is proven, but what does demeaning &lt;em>look like&lt;/em>? How does it change the data? The following visualizations build intuition about the transformation.&lt;/p>
&lt;p>&lt;img src="r_demeaning_twfe_scatter_before_after.png" alt="Raw data shows wide cross-country spread; demeaned data collapses to a narrow range around zero.">
&lt;em>Before vs after two-way demeaning: the wide cross-country spread (left) collapses to a narrow range around zero (right).&lt;/em>&lt;/p>
&lt;p>The faceted scatter plot tells the story. In the left panel (raw data), 10 countries are plotted with log initial income on the x-axis and growth on the y-axis. Each country&amp;rsquo;s observations form a distinct cluster at different income levels &amp;mdash; the x-axis spans roughly 3 to 9. In the right panel (after demeaning), the same data is compressed to approximately -0.5 to 0.3 around zero. The between-country income differences and common time trends have been stripped away, leaving only the &lt;strong>within-variation&lt;/strong> &amp;mdash; the deviations from each country&amp;rsquo;s own average and each period&amp;rsquo;s common trend. This is the variation that identifies the TWFE coefficient.&lt;/p>
&lt;h3 id="decomposing-the-formula-for-one-country">Decomposing the formula for one country&lt;/h3>
&lt;p>To see exactly how the formula works, let us trace each component for Country 1&amp;rsquo;s growth rate across all 8 periods.&lt;/p>
&lt;p>&lt;img src="r_demeaning_twfe_decomposition.png" alt="Observed values, country mean, time means, grand mean, and the demeaned residual for Country 1.">
&lt;em>Demeaning decomposition for Country 1: observed growth (blue), country mean (orange dashed), time means (teal), grand mean (gray), and the demeaned residual (black).&lt;/em>&lt;/p>
&lt;p>The decomposition makes the formula concrete. The observed growth values (blue line) decline from about -0.18 to -0.07. The country mean (orange dashed line) is a flat horizontal at -0.127 &amp;mdash; this is $\bar{x}_{i \cdot}$. The time means (teal dot-dash line) capture the common cross-country trend, declining from -0.189 to -0.076 &amp;mdash; this is $\bar{x}_{\cdot t}$. The grand mean (gray dotted) sits at -0.124 &amp;mdash; this is $\bar{x}_{\cdot \cdot}$. The demeaned series (black line) is the residual: $\tilde{x}_{it} = x_{it} - \bar{x}_{i \cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot \cdot}$. It fluctuates around zero, capturing only the within-country, within-period deviations that TWFE uses for identification.&lt;/p>
&lt;h2 id="10-a-caveat-standard-errors-differ">10. A Caveat: Standard Errors Differ&lt;/h2>
&lt;p>While the coefficients are identical, the &lt;strong>standard errors&lt;/strong> from &lt;code>lm()&lt;/code> on demeaned data are wrong. This is a critical practical point that many textbooks gloss over.&lt;/p>
&lt;pre>&lt;code class="language-r">se_naive &amp;lt;- summary(manual_model)$coefficients[-1, &amp;quot;Std. Error&amp;quot;]
se_feols_iid &amp;lt;- se(twfe_model, se = &amp;quot;iid&amp;quot;)
se_feols_cl &amp;lt;- se(twfe_model) # default: clustered by first FE
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Standard error comparison:
variable     se_naive_lm se_feols_iid se_feols_cluster
ln_y_initial  0.00361766   0.00388000       0.00374436
log_s_k       0.00684559   0.00734199       0.00758268
log_n_gd      0.01820117   0.01952104       0.02216773
log_hcap      0.01369872   0.01469209       0.01456365
gov_cons      0.04410809   0.04730660       0.04639822
&lt;/code>&lt;/pre>
&lt;p>Why do they differ? The &lt;code>lm()&lt;/code> function does not know that 157 degrees of freedom were consumed by estimating 150 country effects and 8 time effects (minus 1 for normalization). It uses $df = N \times T - K - 1 = 1{,}194$ (the extra 1 is the intercept) when the correct value is $df = N \times T - N - T + 1 - K = 1{,}038$. This makes naive SEs systematically too small &amp;mdash; the correct standard errors are 7&amp;ndash;22% larger, depending on the variable.&lt;/p>
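&lt;p>The degrees-of-freedom arithmetic is easy to spell out with this tutorial&amp;rsquo;s sample sizes (a simple check, not part of the original code):&lt;/p>
&lt;pre>&lt;code class="language-r"># Degrees-of-freedom accounting: N = 150 countries, T = 8 periods, K = 5 slopes
N &amp;lt;- 150; T_ &amp;lt;- 8; K &amp;lt;- 5
df_naive   &amp;lt;- N * T_ - K - 1             # what lm() uses: 1194 (includes intercept)
df_correct &amp;lt;- N * T_ - (N + T_ - 1) - K  # after absorbing 157 FE parameters: 1038
c(df_naive = df_naive, df_correct = df_correct)
&lt;/code>&lt;/pre>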
&lt;p>&lt;img src="r_demeaning_twfe_se_comparison.png" alt="Naive lm() SEs are systematically smaller than both feols variants.">
&lt;em>Standard error comparison: naive lm() (gray) systematically underestimates uncertainty compared to feols IID (orange) and clustered (blue).&lt;/em>&lt;/p>
&lt;p>The grouped bar chart makes the pattern clear. For every variable, the gray bars (naive &lt;code>lm()&lt;/code>) are shorter than the orange (feols IID) and blue (feols clustered) bars. The gap is most visible for &lt;code>log(n+g+d)&lt;/code>, where the naive SE is 0.0182 versus 0.0222 for clustered &amp;mdash; the clustered SE is 22% larger. The feols IID SEs correct for the degrees-of-freedom adjustment, while the clustered SEs additionally account for within-entity serial correlation. The practical lesson: &lt;strong>always use a dedicated panel estimator for inference&lt;/strong>, even though &lt;code>lm()&lt;/code> on demeaned data gives the correct point estimates.&lt;/p>
&lt;h2 id="11-discussion">11. Discussion&lt;/h2>
&lt;p>This tutorial has demonstrated a fundamental equivalence in econometrics. TWFE is not a special estimator &amp;mdash; it is ordinary least squares applied to data that has been demeaned by entity and time. The &lt;code>fixest&lt;/code> package automates this process efficiently, but the underlying operation is straightforward subtraction. The FWL theorem guarantees the equivalence mathematically, and our empirical verification confirms it to machine precision.&lt;/p>
&lt;p>Three practical insights emerge:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Demeaning reveals what FE can and cannot identify.&lt;/strong> Any variable that does not vary within a country over time (like geography or colonial history) has a country mean equal to itself. After demeaning, such a variable becomes zero everywhere and drops out of the regression. This is why fixed effects models cannot estimate the effect of time-invariant characteristics.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The grand mean correction is not optional.&lt;/strong> Omitting the $+ \bar{x}_{\cdot \cdot}$ term in the demeaning formula would double-subtract the overall level, producing a non-zero intercept and subtly wrong demeaned values. The correction is algebraically necessary for the FWL equivalence to hold.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Correct coefficients do not mean correct inference.&lt;/strong> The &lt;code>lm()&lt;/code> standard errors are too small because they ignore the degrees of freedom consumed by the absorbed fixed effects. In applied work, this means artificially narrow confidence intervals and inflated t-statistics. Always use &lt;code>feols()&lt;/code> or an equivalent panel estimator for standard errors and hypothesis testing.&lt;/p>
&lt;/li>
&lt;/ol>
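&lt;p>The first insight is easy to confirm in code (a toy sketch with a made-up &amp;ldquo;geography&amp;rdquo; variable): anything constant within each unit is annihilated by the two-way demeaning formula, leaving no variation to identify a coefficient:&lt;/p>
&lt;pre>&lt;code class="language-r"># A time-invariant regressor is wiped out by two-way demeaning
d &amp;lt;- data.frame(id = rep(1:3, each = 4), t = rep(1:4, times = 3))
d$geo &amp;lt;- rep(c(1.2, -0.4, 0.7), each = 4)  # constant within each unit
geo_dm &amp;lt;- d$geo - ave(d$geo, d$id) - ave(d$geo, d$t) + mean(d$geo)
all(abs(geo_dm) &amp;lt; 1e-12)  # TRUE
&lt;/code>&lt;/pre>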
&lt;h2 id="12-summary-and-next-steps">12. Summary and Next Steps&lt;/h2>
&lt;p>&lt;strong>Key takeaways:&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>TWFE estimation via &lt;code>feols()&lt;/code> and OLS on manually demeaned data produce identical coefficients &amp;mdash; the maximum difference across 5 coefficients is 3.05 x 10^-16, confirming the FWL theorem.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The demeaning formula subtracts entity means and time means, then adds back the grand mean to correct for double subtraction. After demeaning, all variable means are effectively zero (order of 10^-15).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The Within R-squared of 0.177 versus the overall Adjusted R-squared of 0.755 shows that most variation in growth is absorbed by the fixed effects, not by the regressors.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Naive &lt;code>lm()&lt;/code> standard errors understate uncertainty because they ignore the 157 degrees of freedom consumed by the absorbed fixed effects; the correct standard errors are 7&amp;ndash;22% larger. Always use a dedicated panel estimator for inference.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Limitations:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>The dataset is simulated, so coefficient values reflect the data-generating process rather than real-world economic dynamics.&lt;/li>
&lt;li>The tutorial assumes a balanced panel. With unbalanced panels, the simple closed-form demeaning still works algebraically, but &lt;code>fixest&lt;/code> uses a more efficient iterative algorithm.&lt;/li>
&lt;li>The SE comparison covers only IID and entity-clustered SEs. Other corrections (heteroskedasticity-robust, Driscoll-Kraay for cross-sectional dependence) may be relevant in applied work.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Next steps:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Apply the demeaning logic to understand why specific variables drop out of your own FE models.&lt;/li>
&lt;li>Explore heterogeneous treatment effects with interaction-weighted TWFE estimators.&lt;/li>
&lt;li>Read Cunningham (2021), &lt;em>Causal Inference: The Mixtape&lt;/em>, Chapter 9, for the connection between TWFE demeaning and difference-in-differences designs.&lt;/li>
&lt;/ul>
&lt;h2 id="13-exercises">13. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Omit the grand mean correction.&lt;/strong> Modify the demeaning formula to skip the $+ \bar{x}_{\cdot \cdot}$ term. Run &lt;code>lm()&lt;/code> on the incorrectly demeaned data. What happens to the intercept? Do the slope coefficients still match the TWFE estimates? Why or why not?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>One-way demeaning.&lt;/strong> Repeat the exercise using only entity demeaning (subtract country means, skip time means). Compare the coefficients to a one-way FE model (&lt;code>feols(growth ~ ... | id)&lt;/code>). Verify the equivalence and examine how the coefficients change compared to the two-way specification.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Visualize a different variable.&lt;/strong> Recreate the demeaning decomposition plot (Section 9) for &lt;code>log_s_k&lt;/code> (investment share) instead of &lt;code>growth&lt;/code>. Does the country mean, time mean, or within-variation dominate for this variable? What does this tell you about the source of variation that identifies its coefficient?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="14-references">14. References&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>Frisch, R. and Waugh, F.V. (1933). &amp;ldquo;Partial Time Regressions as Compared with Individual Trends.&amp;rdquo; &lt;em>Econometrica&lt;/em>, 1(4), 387&amp;ndash;401.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Lovell, M.C. (1963). &amp;ldquo;Seasonal Adjustment of Economic Time Series and Multiple Regression Analysis.&amp;rdquo; &lt;em>Journal of the American Statistical Association&lt;/em>, 58(304), 993&amp;ndash;1010.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Berge, L. (2018). &lt;em>fixest: Fast Fixed-Effects Estimations&lt;/em>. R package. &lt;a href="https://cran.r-project.org/package=fixest" target="_blank" rel="noopener">CRAN&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Cunningham, S. (2021). &lt;em>Causal Inference: The Mixtape&lt;/em>. Yale University Press. &lt;a href="https://mixtape.scunning.com/" target="_blank" rel="noopener">Online edition&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Barro, R.J. and Sala-i-Martin, X. (2004). &lt;em>Economic Growth&lt;/em>. 2nd edition. MIT Press.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h4 id="acknowledgements">Acknowledgements&lt;/h4>
&lt;p>AI tools (Claude Code, Gemini, NotebookLM) were used to make the contents of this post more accessible to students. Nevertheless, the content in this post may still have errors. Caution is needed when applying the contents of this post to real research projects.&lt;/p></description></item><item><title>High-Dimensional Fixed Effects Regression: An Introduction in Python</title><link>https://carlos-mendez.org/post/python_pyfixest/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_pyfixest/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Imagine you want to know whether union membership raises wages. You run a regression and find a strong positive association: union workers earn 18% more. But wait &amp;mdash; what if the workers who join unions are also more motivated, more experienced, or work in industries that pay well regardless? That 18% could be mostly &lt;em>selection&lt;/em>, not a genuine union effect. This is one of the most pervasive problems in empirical research: &lt;strong>omitted variable bias&lt;/strong>. Any time your data is grouped &amp;mdash; by individual, firm, country, or time period &amp;mdash; unobserved characteristics that differ across groups can contaminate your estimates, leading to conclusions that look solid but are fundamentally misleading.&lt;/p>
&lt;p>&lt;strong>Fixed effects regression&lt;/strong> is the workhorse solution. By absorbing all time-invariant group-level heterogeneity &amp;mdash; a worker&amp;rsquo;s innate ability, a firm&amp;rsquo;s management culture, a country&amp;rsquo;s institutional quality &amp;mdash; fixed effects eliminate an entire class of confounders in one step. The result is striking: in the wage panel we analyze below, the apparent union premium drops from 18% to just 7% once we account for individual fixed effects, revealing that more than half the raw association was driven by who selects into unions, not what unions do. This kind of dramatic correction is routine in applied research, which is why fixed effects appear in virtually every empirical paper that uses panel data.&lt;/p>
&lt;p>Modern implementations make this computationally painless. Rather than estimating thousands of dummy variables, they use a &lt;em>demeaning&lt;/em> algorithm that sweeps out group means before estimation. &lt;a href="https://pyfixest.org/" target="_blank" rel="noopener">PyFixest&lt;/a> brings this approach to Python with a concise formula syntax inspired by R&amp;rsquo;s &lt;code>fixest&lt;/code> package &amp;mdash; the most popular fixed effects library in the R ecosystem. In this tutorial we use PyFixest to build from simple OLS through one-way and two-way fixed effects, compare inference methods, perform instrumental variable estimation, analyze a real wage panel, and run event study designs for difference-in-differences &amp;mdash; all with a few lines of code. Along the way, we will see &lt;em>why&lt;/em> fixed effects work (by manually reproducing them via demeaning), discover what they &lt;em>cannot&lt;/em> do (estimate time-invariant effects like education), learn when standard TWFE breaks down in staggered treatment designs, and apply the CRE/Mundlak approach to recover the very coefficients that one-way FE absorb.&lt;/p>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand why unobserved group heterogeneity biases OLS and how fixed effects remove that bias&lt;/li>
&lt;li>Implement one-way and two-way fixed effects regressions using PyFixest&amp;rsquo;s formula syntax&lt;/li>
&lt;li>Compare multiple model specifications efficiently using PyFixest&amp;rsquo;s stepwise operators&lt;/li>
&lt;li>Assess robustness by computing standard errors under different clustering assumptions&lt;/li>
&lt;li>Decompose panel variation into between and within components to diagnose what FE can and cannot estimate&lt;/li>
&lt;li>Frame a real wage panel through the Mincer equation and its panel extensions&lt;/li>
&lt;li>Recover time-invariant coefficients (education, race) using the CRE/Mundlak approach&lt;/li>
&lt;li>Apply fixed effects to event study designs with staggered treatment adoption&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Content outline.&lt;/strong> Sections 2&amp;ndash;4 set up the environment and establish an OLS baseline. Sections 5&amp;ndash;6 introduce fixed effects &amp;mdash; first through PyFixest&amp;rsquo;s absorption syntax, then by reproducing the same result manually via demeaning, building intuition for what FE actually does to the data. Section 7 shows how to compare multiple specifications in a single call, and Section 8 explores how standard error choices affect inference. Section 9 extends to two-way FE, and Section 10 combines FE with instrumental variables. Section 11 is the core case study: a real wage panel framed by the Mincer equation, where we decompose within and between variation, see how one-way FE absorb time-invariant variables like education, stress-test the common trends assumption with group-specific time effects, and recover education&amp;rsquo;s coefficient through the CRE/Mundlak approach. Section 12 applies FE to event study designs, with a careful discussion of why period −1 serves as the universal baseline. Throughout, each section builds on the previous &amp;mdash; the manual demeaning in Section 6 explains why education vanishes in Section 11, and the stepwise comparison in Section 7 foreshadows the specification table in Section 11.&lt;/p>
&lt;h2 id="2-setup-and-imports">2. Setup and imports&lt;/h2>
&lt;p>Before running the analysis, install the required packages if needed:&lt;/p>
&lt;pre>&lt;code class="language-python">pip install pyfixest
&lt;/code>&lt;/pre>
&lt;p>The following code imports PyFixest and standard data science libraries. PyFixest provides &lt;a href="https://pyfixest.org/reference/estimation.feols.html" target="_blank" rel="noopener">feols()&lt;/a> as its main estimation function, which accepts R-style formulas with a pipe &lt;code>|&lt;/code> separator for fixed effects.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pyfixest as pf
# Reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
# Site color palette
STEEL_BLUE = &amp;quot;#6a9bcc&amp;quot;
WARM_ORANGE = &amp;quot;#d97757&amp;quot;
NEAR_BLACK = &amp;quot;#141413&amp;quot;
TEAL = &amp;quot;#00d4c8&amp;quot;
&lt;/code>&lt;/pre>
&lt;details>
&lt;summary>&lt;strong>Dark theme figure styling&lt;/strong> (click to expand)&lt;/summary>
&lt;pre>&lt;code class="language-python"># Dark theme palette (consistent with site navbar/dark sections)
DARK_NAVY = &amp;quot;#0f1729&amp;quot;
GRID_LINE = &amp;quot;#1f2b5e&amp;quot;
LIGHT_TEXT = &amp;quot;#c8d0e0&amp;quot;
WHITE_TEXT = &amp;quot;#e8ecf2&amp;quot;
# Plot defaults — minimal, spine-free, dark background
plt.rcParams.update({
    &amp;quot;figure.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.edgecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.linewidth&amp;quot;: 0,
    &amp;quot;axes.labelcolor&amp;quot;: LIGHT_TEXT,
    &amp;quot;axes.titlecolor&amp;quot;: WHITE_TEXT,
    &amp;quot;axes.spines.top&amp;quot;: False,
    &amp;quot;axes.spines.right&amp;quot;: False,
    &amp;quot;axes.spines.left&amp;quot;: False,
    &amp;quot;axes.spines.bottom&amp;quot;: False,
    &amp;quot;axes.grid&amp;quot;: True,
    &amp;quot;grid.color&amp;quot;: GRID_LINE,
    &amp;quot;grid.linewidth&amp;quot;: 0.6,
    &amp;quot;grid.alpha&amp;quot;: 0.8,
    &amp;quot;xtick.color&amp;quot;: LIGHT_TEXT,
    &amp;quot;ytick.color&amp;quot;: LIGHT_TEXT,
    &amp;quot;xtick.major.size&amp;quot;: 0,
    &amp;quot;ytick.major.size&amp;quot;: 0,
    &amp;quot;text.color&amp;quot;: WHITE_TEXT,
    &amp;quot;font.size&amp;quot;: 12,
    &amp;quot;legend.frameon&amp;quot;: False,
    &amp;quot;legend.fontsize&amp;quot;: 11,
    &amp;quot;legend.labelcolor&amp;quot;: LIGHT_TEXT,
    &amp;quot;figure.edgecolor&amp;quot;: DARK_NAVY,
    &amp;quot;savefig.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;savefig.edgecolor&amp;quot;: DARK_NAVY,
})
&lt;/code>&lt;/pre>
&lt;/details>
&lt;h2 id="3-data-loading-and-exploration">3. Data loading and exploration&lt;/h2>
&lt;h3 id="31-loading-the-dataset">3.1 Loading the dataset&lt;/h3>
&lt;p>PyFixest includes a built-in synthetic dataset designed for demonstrating fixed effects regression. We load it with &lt;a href="https://pyfixest.org/reference/utils.get_data.html" target="_blank" rel="noopener">pf.get_data()&lt;/a>, which returns a DataFrame with outcome variables (&lt;code>Y&lt;/code>, &lt;code>Y2&lt;/code>), covariates (&lt;code>X1&lt;/code>, &lt;code>X2&lt;/code>), fixed effect identifiers (&lt;code>f1&lt;/code>, &lt;code>f2&lt;/code>, &lt;code>f3&lt;/code>, &lt;code>group_id&lt;/code>), instruments (&lt;code>Z1&lt;/code>, &lt;code>Z2&lt;/code>), and sampling weights.&lt;/p>
&lt;pre>&lt;code class="language-python">data = pf.get_data()
print(f&amp;quot;Dataset shape: {data.shape}&amp;quot;)
print(f&amp;quot;\nColumn names: {list(data.columns)}&amp;quot;)
print(data.head())
print(data.describe().round(3))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Dataset shape: (1000, 11)
Column names: ['Y', 'Y2', 'X1', 'X2', 'f1', 'f2', 'f3', 'group_id', 'Z1', 'Z2', 'weights']
          Y        Y2   X1        X2  ...  group_id        Z1        Z2   weights
0       NaN  2.357103  0.0  0.457858  ...       9.0 -0.330607  1.054826  0.661478
1 -1.458643  5.163147  NaN -4.998406  ...       8.0       NaN -4.113690  0.772732
2  0.169132  0.751140  2.0  1.558480  ...      16.0  1.207778  0.465282  0.990929
3  3.319513 -2.656368  1.0  1.560402  ...       3.0  2.869997  0.467570  0.021123
4  0.134420 -1.866416  2.0 -3.472232  ...      14.0  0.835819 -3.115669  0.790815
              Y        Y2       X1  ...        Z1        Z2   weights
count   999.000  1000.000  999.000  ...   999.000  1000.000  1000.000
mean     -0.127    -0.309    1.043  ...     1.040    -0.113     0.495
std       2.305     5.584    0.808  ...     1.307     3.172     0.291
min      -6.536   -16.974    0.000  ...    -2.825   -11.576     0.000
25%      -1.732    -4.029    0.000  ...     0.121    -2.252     0.248
50%      -0.211    -0.459    1.000  ...     1.040    -0.064     0.469
75%       1.576     3.528    2.000  ...     1.946     2.028     0.746
max       6.907    17.156    2.000  ...     4.601    11.420     1.000
&lt;/code>&lt;/pre>
&lt;p>The dataset has 1,000 observations across 11 columns. The outcome &lt;code>Y&lt;/code> has a mean of -0.127 and standard deviation of 2.305, while &lt;code>X1&lt;/code> takes discrete values 0, 1, and 2. A few observations have missing values (1 missing in &lt;code>Y&lt;/code>, &lt;code>X1&lt;/code>, &lt;code>f1&lt;/code>, and &lt;code>Z1&lt;/code>), which PyFixest handles automatically by dropping incomplete cases. The &lt;code>group_id&lt;/code> variable identifies the group each observation belongs to, and this is the dimension we will control for with fixed effects.&lt;/p>
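&lt;p>Before estimation, the same complete-case rule can be previewed with plain pandas. Here is a minimal sketch on a toy frame (hypothetical values, not the PyFixest dataset) whose columns mirror the ones above:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd

# Toy frame with one missing value each in Y and X1 (hypothetical data)
toy = pd.DataFrame({
    'Y': [1.0, np.nan, 2.5, 0.3, -0.7],
    'X1': [0.0, 1.0, np.nan, 2.0, 1.0],
    'group_id': [1, 1, 2, 2, 2],
})

# Count missing values per column as a pre-estimation check
print(toy.isna().sum())

# Listwise deletion keeps only complete rows, matching the drop of
# incomplete cases described above (998 of 1,000 rows survive there)
complete = toy.dropna()
print(len(complete))  # 3
&lt;/code>&lt;/pre>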
&lt;h3 id="32-visualizing-group-structure">3.2 Visualizing group structure&lt;/h3>
&lt;p>Before estimating any model, it helps to see how the relationship between &lt;code>X1&lt;/code> and &lt;code>Y&lt;/code> varies across groups. If groups have different average levels of &lt;code>Y&lt;/code>, standard OLS will mix within-group variation (what we care about) with between-group variation (which may reflect confounders).&lt;/p>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(10, 6))
groups = data[&amp;quot;group_id&amp;quot;].unique()
n_groups = len(groups)
cmap = plt.cm.tab20
for i, g in enumerate(sorted(groups)):
    subset = data[data[&amp;quot;group_id&amp;quot;] == g]
    ax.scatter(subset[&amp;quot;X1&amp;quot;], subset[&amp;quot;Y&amp;quot;], alpha=0.5, s=20,
               color=cmap(i / n_groups),
               label=f&amp;quot;Group {g}&amp;quot; if i &amp;lt; 5 else None)
ax.set_xlabel(&amp;quot;X1&amp;quot;, fontsize=13)
ax.set_ylabel(&amp;quot;Y&amp;quot;, fontsize=13)
ax.set_title(&amp;quot;Outcome (Y) vs Covariate (X1) by Group&amp;quot;, fontsize=15, fontweight=&amp;quot;bold&amp;quot;)
ax.legend(title=&amp;quot;Group (first 5)&amp;quot;, fontsize=9)
plt.savefig(&amp;quot;pyfixest_scatter_by_group.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_scatter_by_group.png" alt="Scatter plot of Y versus X1 colored by group membership, showing different intercepts across groups.">&lt;/p>
&lt;p>The scatter plot reveals that different groups have distinct average levels of &lt;code>Y&lt;/code> &amp;mdash; some clusters sit higher and others lower on the vertical axis. Within each group, however, &lt;code>Y&lt;/code> tends to decrease as &lt;code>X1&lt;/code> increases. This visual separation between groups is exactly the kind of heterogeneity that fixed effects regression absorbs, allowing us to isolate the within-group relationship between &lt;code>X1&lt;/code> and &lt;code>Y&lt;/code>.&lt;/p>
&lt;h2 id="4-simple-ols-baseline-no-fixed-effects">4. Simple OLS baseline (no fixed effects)&lt;/h2>
&lt;p>To establish a benchmark, we first estimate a standard OLS regression of &lt;code>Y&lt;/code> on &lt;code>X1&lt;/code> without any fixed effects. The model is:&lt;/p>
&lt;p>$$Y_i = \beta_0 + \beta_1 X_{1i} + \epsilon_i$$&lt;/p>
&lt;p>In words, we assume the outcome $Y$ is a linear function of $X_1$ plus random noise $\epsilon$. This gives us the overall association, mixing both within-group and between-group variation. We use heteroskedasticity-robust standard errors (&lt;code>HC1&lt;/code>) to account for non-constant variance.&lt;/p>
&lt;pre>&lt;code class="language-python">fit_ols = pf.feols(&amp;quot;Y ~ X1&amp;quot;, data=data, vcov=&amp;quot;HC1&amp;quot;)
print(fit_ols.summary())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: Y, Fixed effects: 0
Inference: HC1
Observations: 998
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.919 | 0.112 | 8.223 | 0.000 | 0.699 | 1.138 |
| X1 | -1.000 | 0.082 | -12.134 | 0.000 | -1.162 | -0.838 |
---
RMSE: 2.158 R2: 0.123
&lt;/code>&lt;/pre>
&lt;p>The pooled OLS estimates a coefficient of -1.000 on &lt;code>X1&lt;/code> (SE = 0.082, p &amp;lt; 0.001), with an R-squared of 0.123. This means that a one-unit increase in &lt;code>X1&lt;/code> is associated with a 1.0-point decrease in &lt;code>Y&lt;/code> on average. However, this estimate ignores group-level differences &amp;mdash; it could be biased if &lt;code>X1&lt;/code> correlates with unobserved group characteristics. The model explains only 12.3% of the total variation in &lt;code>Y&lt;/code>, leaving substantial unexplained heterogeneity. Let us now see how fixed effects change the picture.&lt;/p>
&lt;h2 id="5-one-way-fixed-effects">5. One-way fixed effects&lt;/h2>
&lt;p>The following diagram illustrates the core problem fixed effects solve. When an unobserved group characteristic correlates with both the covariate and the outcome, it creates a &lt;em>backdoor path&lt;/em> that biases OLS. Fixed effects block this path by absorbing all group-level variation.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
A[&amp;quot;&amp;lt;b&amp;gt;Group Characteristics&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(unobserved)&amp;quot;] --&amp;gt;|&amp;quot;correlates&amp;quot;| X[&amp;quot;&amp;lt;b&amp;gt;X1&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(covariate)&amp;quot;]
A --&amp;gt;|&amp;quot;affects&amp;quot;| Y[&amp;quot;&amp;lt;b&amp;gt;Y&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(outcome)&amp;quot;]
X --&amp;gt;|&amp;quot;causal effect β = ?&amp;quot;| Y
FE[&amp;quot;&amp;lt;b&amp;gt;Fixed Effects&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(absorbs A)&amp;quot;] -.-&amp;gt;|&amp;quot;blocks backdoor&amp;quot;| A
style A fill:#d97757,stroke:#141413,color:#fff
style X fill:#6a9bcc,stroke:#141413,color:#fff
style Y fill:#00d4c8,stroke:#141413,color:#fff
style FE fill:#1a3a8a,stroke:#141413,color:#fff,stroke-dasharray: 5 5
&lt;/code>&lt;/pre>
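&lt;p>The backdoor logic can be made concrete with a small simulation. The sketch below uses synthetic data with made-up parameters (independent of the tutorial dataset): a group-level characteristic &lt;code>A&lt;/code> feeds into both the covariate and the outcome, so the pooled slope overstates the true effect of 1.0, while the within-group (demeaned) slope recovers it:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_groups, n_per = 50, 20

# Unobserved group characteristic A feeds into both X1 and Y (backdoor path)
A = rng.normal(size=n_groups).repeat(n_per)
X1 = 0.8 * A + rng.normal(size=A.size)
Y = 1.0 * X1 + 2.0 * A + rng.normal(size=A.size)  # true slope is 1.0
df = pd.DataFrame({'Y': Y, 'X1': X1, 'g': np.arange(n_groups).repeat(n_per)})

# Pooled OLS slope: contaminated by the backdoor path through A
pooled = np.polyfit(df['X1'], df['Y'], 1)[0]

# Within slope: demeaning by group absorbs A, blocking the backdoor
Yd = df['Y'] - df.groupby('g')['Y'].transform('mean')
Xd = df['X1'] - df.groupby('g')['X1'].transform('mean')
within = np.polyfit(Xd, Yd, 1)[0]

print(round(pooled, 2), round(within, 2))  # pooled is biased upward, within is near 1.0
&lt;/code>&lt;/pre>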
&lt;h3 id="51-absorbing-group-heterogeneity">5.1 Absorbing group heterogeneity&lt;/h3>
&lt;p>Fixed effects regression controls for all time-invariant group characteristics by effectively adding a separate intercept for each group. In PyFixest, we specify fixed effects after a pipe &lt;code>|&lt;/code> in the formula. The syntax &lt;code>Y ~ X1 | group_id&lt;/code> means: regress &lt;code>Y&lt;/code> on &lt;code>X1&lt;/code>, absorbing &lt;code>group_id&lt;/code> fixed effects. Think of this as asking: &amp;ldquo;within each group, what is the relationship between &lt;code>X1&lt;/code> and &lt;code>Y&lt;/code>?&amp;rdquo;&lt;/p>
&lt;pre>&lt;code class="language-python">fit_fe1 = pf.feols(&amp;quot;Y ~ X1 | group_id&amp;quot;, data=data, vcov=&amp;quot;HC1&amp;quot;)
print(fit_fe1.summary())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: Y, Fixed effects: group_id
Inference: HC1
Observations: 998
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1 | -1.019 | 0.083 | -12.234 | 0.000 | -1.182 | -0.856 |
---
RMSE: 2.141 R2: 0.137 R2 Within: 0.126
&lt;/code>&lt;/pre>
&lt;p>With &lt;code>group_id&lt;/code> fixed effects absorbed, the coefficient on &lt;code>X1&lt;/code> shifts slightly to -1.019 (SE = 0.083). The within R-squared of 0.126 tells us how much of the within-group variation in &lt;code>Y&lt;/code> is explained by &lt;code>X1&lt;/code> after removing group means. Compared to the pooled OLS estimate of -1.000, the fixed effects estimate is similar in this synthetic dataset, suggesting that &lt;code>X1&lt;/code> does not strongly correlate with group-level unobservables here. In real data, the shift can be dramatic &amp;mdash; that gap is the omitted variable bias that fixed effects remove.&lt;/p>
&lt;h3 id="52-equivalence-with-dummy-variables">5.2 Equivalence with dummy variables&lt;/h3>
&lt;p>Under the hood, fixed effects absorption produces the same point estimates as including explicit dummy variables for each group. PyFixest&amp;rsquo;s &lt;code>C()&lt;/code> operator creates these dummies. The key advantage of absorption is computational: with thousands of groups, estimating thousands of dummy coefficients is slow and memory-intensive, while demeaning is fast.&lt;/p>
&lt;pre>&lt;code class="language-python">fit_dummy = pf.feols(&amp;quot;Y ~ X1 + C(group_id)&amp;quot;, data=data, vcov=&amp;quot;HC1&amp;quot;)
print(f&amp;quot;X1 coefficient (FE absorption): {fit_fe1.coef()['X1']:.4f}&amp;quot;)
print(f&amp;quot;X1 coefficient (dummy vars): {fit_dummy.coef()['X1']:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">X1 coefficient (FE absorption): -1.0190
X1 coefficient (dummy vars): -1.0190
&lt;/code>&lt;/pre>
&lt;p>Both approaches yield identical coefficients of -1.0190 on &lt;code>X1&lt;/code>, confirming that FE absorption and dummy variable inclusion are algebraically equivalent. The absorption approach simply avoids estimating and storing the hundreds or thousands of group intercepts that are typically not of interest &amp;mdash; what econometricians call &lt;em>nuisance parameters&lt;/em>.&lt;/p>
&lt;h2 id="6-understanding-fixed-effects-via-manual-demeaning">6. Understanding fixed effects via manual demeaning&lt;/h2>
&lt;h3 id="61-the-within-transformation">6.1 The within transformation&lt;/h3>
&lt;p>To build intuition for what fixed effects actually do, we can perform the &lt;em>within transformation&lt;/em> manually. For each observation, we subtract its group mean from both &lt;code>Y&lt;/code> and &lt;code>X1&lt;/code>. This removes all between-group variation, leaving only the deviations from each group&amp;rsquo;s average. Regressing the demeaned &lt;code>Y&lt;/code> on the demeaned &lt;code>X1&lt;/code> recovers the same coefficient as the FE estimator. It is like centering each group at the origin &amp;mdash; the only variation left is how individuals within a group differ from their group&amp;rsquo;s typical level.&lt;/p>
&lt;p>The fixed effects estimator solves:&lt;/p>
&lt;p>$$\hat{\beta}_{FE} = \left(\sum_{i=1}^{N} \ddot{X}_i' \ddot{X}_i\right)^{-1} \sum_{i=1}^{N} \ddot{X}_i' \ddot{Y}_i$$&lt;/p>
&lt;p>where $\ddot{X}_i$ and $\ddot{Y}_i$ stack the demeaned observations $\ddot{X}_{it} = X_{it} - \bar{X}_i$ and $\ddot{Y}_{it} = Y_{it} - \bar{Y}_i$ for unit $i$. In words, the FE estimator uses only within-group deviations from group means, eliminating any bias from group-level confounders that are constant over time.&lt;/p>
&lt;pre>&lt;code class="language-python"># Manual demeaning (within transformation)
data_dm = data.copy()
for col in [&amp;quot;Y&amp;quot;, &amp;quot;X1&amp;quot;]:
    group_means = data_dm.groupby(&amp;quot;group_id&amp;quot;)[col].transform(&amp;quot;mean&amp;quot;)
    data_dm[f&amp;quot;{col}_dm&amp;quot;] = data_dm[col] - group_means
fit_demeaned = pf.feols(&amp;quot;Y_dm ~ X1_dm&amp;quot;, data=data_dm, vcov=&amp;quot;HC1&amp;quot;)
print(f&amp;quot;X1 coefficient (FE absorption): {fit_fe1.coef()['X1']:.4f}&amp;quot;)
print(f&amp;quot;X1 coefficient (manual demean): {fit_demeaned.coef()['X1_dm']:.4f}&amp;quot;)
print(f&amp;quot;X1 coefficient (OLS, no FE): {fit_ols.coef()['X1']:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">X1 coefficient (FE absorption): -1.0190
X1 coefficient (manual demean): -1.0190
X1 coefficient (OLS, no FE): -1.0001
&lt;/code>&lt;/pre>
&lt;p>The manual demeaning produces a coefficient of -1.0190, exactly matching the FE absorption result. The pooled OLS gave -1.0001 by comparison. This confirms that fixed effects regression is mathematically equivalent to subtracting group means from every variable before running OLS. The small gap between -1.019 (FE) and -1.000 (OLS) is the contribution of between-group variation, which demeaning removes; when a covariate correlates strongly with group-level confounders, that gap becomes the large omitted variable bias that fixed effects are designed to eliminate.&lt;/p>
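&lt;p>The closed-form expression from Section 6.1 can also be verified directly in NumPy. The sketch below runs on a small simulated panel (hypothetical data; the &lt;code>demean&lt;/code> helper is defined here and is not part of PyFixest) and confirms that the matrix formula and OLS on demeaned data return the same slope:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(1)
G, T = 30, 10
g = np.arange(G).repeat(T)
alpha = rng.normal(size=G)[g]  # group fixed effects, constant within unit
x = 0.5 * alpha + rng.normal(size=G * T)
y = -1.0 * x + alpha + rng.normal(size=G * T)  # true slope is -1.0

def demean(v, g):
    # Subtract each observation's group mean (the within transformation)
    means = np.bincount(g, weights=v) / np.bincount(g)
    return v - means[g]

xd, yd = demean(x, g), demean(y, g)

# FE estimator in closed form; with one regressor it is sum(xd*yd) / sum(xd*xd)
beta_fe = (xd @ yd) / (xd @ xd)

# Same slope from least squares on the demeaned data
beta_ols = np.polyfit(xd, yd, 1)[0]
print(np.isclose(beta_fe, beta_ols))  # True
&lt;/code>&lt;/pre>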
&lt;h3 id="62-visualizing-the-demeaning">6.2 Visualizing the demeaning&lt;/h3>
&lt;pre>&lt;code class="language-python">fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# Left: Raw data
for i, g in enumerate(sorted(groups)[:5]):
    subset = data[data[&amp;quot;group_id&amp;quot;] == g]
    axes[0].scatter(subset[&amp;quot;X1&amp;quot;], subset[&amp;quot;Y&amp;quot;], alpha=0.4, s=20,
                    color=cmap(i / n_groups))
axes[0].set_xlabel(&amp;quot;X1 (raw)&amp;quot;, fontsize=13)
axes[0].set_ylabel(&amp;quot;Y (raw)&amp;quot;, fontsize=13)
axes[0].set_title(&amp;quot;Raw Data: Between + Within Variation&amp;quot;, fontsize=13, fontweight=&amp;quot;bold&amp;quot;)
# Right: Demeaned data
axes[1].scatter(data_dm[&amp;quot;X1_dm&amp;quot;], data_dm[&amp;quot;Y_dm&amp;quot;], alpha=0.4, s=20, color=STEEL_BLUE)
x_range = np.linspace(data_dm[&amp;quot;X1_dm&amp;quot;].min(), data_dm[&amp;quot;X1_dm&amp;quot;].max(), 100)
y_pred = fit_demeaned.coef()[&amp;quot;X1_dm&amp;quot;] * x_range
axes[1].plot(x_range, y_pred, color=WARM_ORANGE, linewidth=2.5,
label=f&amp;quot;FE slope = {fit_demeaned.coef()['X1_dm']:.3f}&amp;quot;)
axes[1].set_xlabel(&amp;quot;X1 (demeaned)&amp;quot;, fontsize=13)
axes[1].set_ylabel(&amp;quot;Y (demeaned)&amp;quot;, fontsize=13)
axes[1].set_title(&amp;quot;Demeaned Data: Within-Group Variation Only&amp;quot;, fontsize=13, fontweight=&amp;quot;bold&amp;quot;)
axes[1].legend(fontsize=11)
plt.savefig(&amp;quot;pyfixest_demeaning.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_demeaning.png" alt="Side-by-side comparison of raw data (left) showing scattered clusters at different vertical levels, and demeaned data (right) centered at the origin with a clear negative slope.">&lt;/p>
&lt;p>The left panel shows the raw data with groups scattered at different vertical levels &amp;mdash; this between-group variation is what confounds the OLS estimate. The right panel shows the demeaned data: all groups are now centered at the origin, and the clear negative slope of -1.019 reflects the pure within-group relationship. This visual makes the FE intuition concrete: by removing group averages, we eliminate confounding from any variable that is constant within groups. Now let us explore how to estimate multiple specifications efficiently.&lt;/p>
&lt;h2 id="7-multiple-estimation-with-stepwise-operators">7. Multiple estimation with stepwise operators&lt;/h2>
&lt;h3 id="71-cumulative-stepwise-fixed-effects">7.1 Cumulative stepwise fixed effects&lt;/h3>
&lt;p>One of PyFixest&amp;rsquo;s most powerful features is its formula operators for estimating multiple models in a single call. The &lt;code>csw0()&lt;/code> operator adds fixed effects &lt;em>cumulatively&lt;/em>: &lt;code>csw0(f1, f2)&lt;/code> estimates three models &amp;mdash; no FE, then &lt;code>f1&lt;/code> only, then &lt;code>f1 + f2&lt;/code> &amp;mdash; in one line. This is far more efficient than writing three separate calls and makes it easy to see how results change as we add controls.&lt;/p>
&lt;pre>&lt;code class="language-python">fit_multi = pf.feols(&amp;quot;Y ~ X1 | csw0(f1, f2)&amp;quot;, data=data, vcov=&amp;quot;HC1&amp;quot;)
# Print summary for each model
models = fit_multi.all_fitted_models
for key in models:
    m = models[key]
    print(f&amp;quot;\nModel: {key}&amp;quot;)
    print(m.summary())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Model: Y~X1
Estimation: OLS
Dep. var.: Y, Fixed effects: 0
Inference: HC1
Observations: 998
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) |
|:--------------|-----------:|-------------:|----------:|-----------:|
| Intercept | 0.919 | 0.112 | 8.223 | 0.000 |
| X1 | -1.000 | 0.082 | -12.134 | 0.000 |
---
RMSE: 2.158 R2: 0.123
Model: Y~X1|f1
Estimation: OLS
Dep. var.: Y, Fixed effects: f1
Inference: HC1
Observations: 997
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) |
|:--------------|-----------:|-------------:|----------:|-----------:|
| X1 | -0.949 | 0.067 | -14.094 | 0.000 |
---
RMSE: 1.73 R2: 0.437 R2 Within: 0.161
Model: Y~X1|f1+f2
Estimation: OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference: HC1
Observations: 997
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) |
|:--------------|-----------:|-------------:|----------:|-----------:|
| X1 | -0.919 | 0.060 | -15.440 | 0.000 |
---
RMSE: 1.441 R2: 0.609 R2 Within: 0.200
&lt;/code>&lt;/pre>
&lt;p>The coefficient on &lt;code>X1&lt;/code> shifts from -1.000 (no FE) to -0.949 (with &lt;code>f1&lt;/code>) to -0.919 (with &lt;code>f1 + f2&lt;/code>), while the overall R-squared jumps from 0.123 to 0.437 to 0.609. Adding &lt;code>f1&lt;/code> alone explains an additional 31 percentage points of variation &amp;mdash; a massive improvement that shows how much group-level heterogeneity &lt;code>f1&lt;/code> captures. Adding &lt;code>f2&lt;/code> on top of &lt;code>f1&lt;/code> brings R-squared to 0.609, meaning the two fixed effect dimensions together account for over 60% of the total variation in &lt;code>Y&lt;/code>. The standard error on &lt;code>X1&lt;/code> also shrinks from 0.082 to 0.060, reflecting the precision gain from reducing residual noise.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Specification&lt;/th>
&lt;th>X1 Coef.&lt;/th>
&lt;th>SE&lt;/th>
&lt;th>R-squared&lt;/th>
&lt;th>R-squared Within&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>No FE&lt;/td>
&lt;td>-1.000&lt;/td>
&lt;td>0.082&lt;/td>
&lt;td>0.123&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>FE: f1&lt;/td>
&lt;td>-0.949&lt;/td>
&lt;td>0.067&lt;/td>
&lt;td>0.437&lt;/td>
&lt;td>0.161&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>FE: f1 + f2&lt;/td>
&lt;td>-0.919&lt;/td>
&lt;td>0.060&lt;/td>
&lt;td>0.609&lt;/td>
&lt;td>0.200&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="72-visualizing-coefficient-stability">7.2 Visualizing coefficient stability&lt;/h3>
&lt;p>The table above shows the numbers, but a figure makes the comparison more immediate. Plotting the coefficient with its 95% confidence interval across specifications reveals both the stability of the point estimate and the precision gain from adding fixed effects.&lt;/p>
&lt;pre>&lt;code class="language-python"># Coefficient comparison across specifications
model_names = [&amp;quot;No FE&amp;quot;, &amp;quot;FE: f1&amp;quot;, &amp;quot;FE: f1 + f2&amp;quot;]
coefs = [models[k].coef()[&amp;quot;X1&amp;quot;] for k in models]
ses = [models[k].se()[&amp;quot;X1&amp;quot;] for k in models]
fig, ax = plt.subplots(figsize=(8, 5))
y_pos = np.arange(len(model_names))
ax.barh(y_pos, coefs, xerr=[1.96 * s for s in ses], height=0.5,
color=[STEEL_BLUE, WARM_ORANGE, TEAL], edgecolor=DARK_NAVY, capsize=5)
ax.set_yticks(y_pos)
ax.set_yticklabels(model_names, fontsize=12)
ax.set_xlabel(&amp;quot;Coefficient on X1&amp;quot;, fontsize=13)
ax.set_title(&amp;quot;Effect of X1 Across Fixed Effect Specifications&amp;quot;, fontsize=14, fontweight=&amp;quot;bold&amp;quot;)
ax.axvline(x=0, color=NEAR_BLACK, linewidth=0.8, linestyle=&amp;quot;--&amp;quot;, alpha=0.5)
plt.savefig(&amp;quot;pyfixest_coef_comparison.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_coef_comparison.png" alt="Horizontal bar chart comparing X1 coefficient estimates across no FE, one-way FE, and two-way FE specifications, all showing negative effects near -1.0 with narrowing confidence intervals.">&lt;/p>
&lt;p>The coefficient comparison chart shows that the point estimate on &lt;code>X1&lt;/code> remains stable around -1.0 across all three specifications, with confidence intervals narrowing as we add fixed effects. This stability suggests the estimate is robust to the inclusion of group-level controls. In applied research, large shifts across specifications would signal omitted variable concerns, making this type of comparison essential for assessing credibility.&lt;/p>
&lt;h2 id="8-inference-choosing-the-right-standard-errors">8. Inference: choosing the right standard errors&lt;/h2>
&lt;h3 id="81-comparing-standard-error-estimators">8.1 Comparing standard error estimators&lt;/h3>
&lt;p>The choice of standard errors can dramatically change statistical inference, even when point estimates remain the same. Standard (iid) errors assume all observations are independent and identically distributed. Heteroskedasticity-robust (HC1) errors relax the constant-variance assumption. Cluster-robust (CRV) errors account for arbitrary correlation within groups &amp;mdash; essential when observations within a group are not independent, like repeated measurements of the same individual. Think of it like estimating average height: if you measure the same person ten times, those ten measurements are not ten independent observations, and your standard error should reflect that.&lt;/p>
&lt;pre>&lt;code class="language-python">se_types = {
&amp;quot;iid&amp;quot;: &amp;quot;iid&amp;quot;,
&amp;quot;HC1 (robust)&amp;quot;: &amp;quot;HC1&amp;quot;,
&amp;quot;CRV1 (group_id)&amp;quot;: {&amp;quot;CRV1&amp;quot;: &amp;quot;group_id&amp;quot;},
&amp;quot;CRV1 (group_id + f2)&amp;quot;: {&amp;quot;CRV1&amp;quot;: &amp;quot;group_id + f2&amp;quot;},
&amp;quot;CRV3 (group_id)&amp;quot;: {&amp;quot;CRV3&amp;quot;: &amp;quot;group_id&amp;quot;},
}
print(f&amp;quot;{'SE Type':&amp;lt;22} {'SE(X1)':&amp;lt;10} {'t-stat':&amp;lt;10} {'p-value':&amp;lt;10}&amp;quot;)
print(&amp;quot;-&amp;quot; * 52)
for name, vcov in se_types.items():
    fit_tmp = pf.feols(&amp;quot;Y ~ X1 | group_id&amp;quot;, data=data, vcov=vcov)
    print(f&amp;quot;{name:&amp;lt;22} {fit_tmp.se()['X1']:&amp;lt;10.4f} &amp;quot;
          f&amp;quot;{fit_tmp.tstat()['X1']:&amp;lt;10.3f} {fit_tmp.pvalue()['X1']:&amp;lt;10.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">SE Type SE(X1) t-stat p-value
----------------------------------------------------
iid 0.0858 -11.875 0.0000
HC1 (robust) 0.0833 -12.234 0.0000
CRV1 (group_id) 0.1172 -8.696 0.0000
CRV1 (group_id + f2) 0.1207 -8.445 0.0000
CRV3 (group_id) 0.1247 -8.174 0.0000
&lt;/code>&lt;/pre>
&lt;p>The standard error on &lt;code>X1&lt;/code> ranges from 0.0833 (HC1) to 0.1247 (CRV3), a roughly 50% increase depending on the assumed error correlation structure. While all p-values remain below 0.001 in this case, the t-statistic drops from 12.2 to 8.2 &amp;mdash; a substantial difference that could determine significance for weaker effects. Cluster-robust SEs (CRV1) inflate to 0.1172 because they allow errors to be correlated within groups. The CRV3 estimator, which applies a more conservative finite-sample correction, gives the largest SE of 0.1247. In practice, cluster at the level where you believe errors are correlated.&lt;/p>
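&lt;p>To see where the cluster-robust numbers come from, the CRV1 sandwich can be computed by hand. The sketch below uses synthetic clustered data with made-up parameters, and the small-sample factor follows the common CRV1 convention (PyFixest may differ in minor details). When errors share a within-cluster component, the iid formula understates uncertainty:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(2)
G, T = 40, 25
N = G * T
g = np.arange(G).repeat(T)

# Regressor and error both share a common component within each cluster
x = rng.normal(size=N) + rng.normal(size=G)[g]
u = rng.normal(size=N) + rng.normal(size=G)[g]
y = 1.0 * x + u

X = np.column_stack([np.ones(N), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
bread = np.linalg.inv(X.T @ X)
k = X.shape[1]

# iid variance: s2 times inv(X'X)
v_iid = (resid @ resid) / (N - k) * bread

# CRV1 sandwich: cluster-summed scores in the middle, plus the usual
# finite-sample factor G/(G-1) * (N-1)/(N-k)
meat = np.zeros((k, k))
for j in range(G):
    s = X[g == j].T @ resid[g == j]
    meat += np.outer(s, s)
scale = G / (G - 1) * (N - 1) / (N - k)
v_crv = scale * bread @ meat @ bread

se_iid, se_crv = np.sqrt(v_iid[1, 1]), np.sqrt(v_crv[1, 1])
print(round(se_iid, 4), round(se_crv, 4))  # the clustered SE is several times larger here
&lt;/code>&lt;/pre>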
&lt;h3 id="82-visualizing-the-se-tradeoff">8.2 Visualizing the SE tradeoff&lt;/h3>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(9, 5))
se_names = list(se_types.keys())
se_vals = []
for name, vcov in se_types.items():
fit_tmp = pf.feols(&amp;quot;Y ~ X1 | group_id&amp;quot;, data=data, vcov=vcov)
se_vals.append(fit_tmp.se()[&amp;quot;X1&amp;quot;])
colors = [STEEL_BLUE, WARM_ORANGE, TEAL, &amp;quot;#e8956a&amp;quot;, &amp;quot;#f0a88c&amp;quot;]
bars = ax.bar(range(len(se_names)), se_vals, color=colors, edgecolor=DARK_NAVY, width=0.6)
ax.set_xticks(range(len(se_names)))
ax.set_xticklabels(se_names, rotation=25, ha=&amp;quot;right&amp;quot;, fontsize=10)
ax.set_ylabel(&amp;quot;Standard Error of X1&amp;quot;, fontsize=13)
ax.set_title(&amp;quot;Standard Errors Under Different Assumptions&amp;quot;, fontsize=14, fontweight=&amp;quot;bold&amp;quot;)
for i, v in enumerate(se_vals):
ax.text(i, v + 0.002, f&amp;quot;{v:.4f}&amp;quot;, ha=&amp;quot;center&amp;quot;, fontsize=10, fontweight=&amp;quot;bold&amp;quot;)
plt.savefig(&amp;quot;pyfixest_se_comparison.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_se_comparison.png" alt="Bar chart showing standard errors increasing from iid (0.0858) to CRV3 (0.1247), illustrating how clustering assumptions inflate uncertainty.">&lt;/p>
&lt;p>The bar chart makes the progression vivid: moving from iid to cluster-robust standard errors increases uncertainty by nearly 50%. The iid and HC1 estimates are similar because heteroskedasticity is not a major concern here. The real jump occurs when we account for within-group correlation (CRV1), and the CRV3 bias-corrected estimator is the most conservative. For applied work with grouped data, defaulting to cluster-robust errors is the safest choice &amp;mdash; underestimating standard errors leads to falsely significant results.&lt;/p>
&lt;h2 id="9-two-way-fixed-effects">9. Two-way fixed effects&lt;/h2>
&lt;p>When data has two grouping dimensions &amp;mdash; for example, firms and years, or workers and occupations &amp;mdash; two-way fixed effects absorb unobserved heterogeneity along both dimensions. In PyFixest, we simply list both FE variables after the pipe: &lt;code>Y ~ X1 + X2 | f1 + f2&lt;/code>. This absorbs all factors that are constant within each level of &lt;code>f1&lt;/code> and each level of &lt;code>f2&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python">fit_twoway = pf.feols(&amp;quot;Y ~ X1 + X2 | f1 + f2&amp;quot;, data=data, vcov=&amp;quot;HC1&amp;quot;)
print(fit_twoway.summary())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference: HC1
Observations: 997
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1 | -0.924 | 0.056 | -16.375 | 0.000 | -1.035 | -0.813 |
| X2 | -0.174 | 0.015 | -11.246 | 0.000 | -0.204 | -0.144 |
---
RMSE: 1.346 R2: 0.659 R2 Within: 0.303
&lt;/code>&lt;/pre>
&lt;p>Adding both &lt;code>f1&lt;/code> and &lt;code>f2&lt;/code> as fixed effects plus the additional covariate &lt;code>X2&lt;/code> yields an R-squared of 0.659 and a within R-squared of 0.303. The coefficient on &lt;code>X1&lt;/code> is -0.924 (SE = 0.056) and &lt;code>X2&lt;/code> is -0.174 (SE = 0.015), both highly significant. The within R-squared of 0.303 means that &lt;code>X1&lt;/code> and &lt;code>X2&lt;/code> together explain about 30% of the variation in &lt;code>Y&lt;/code> after absorbing both dimensions of fixed effects &amp;mdash; a substantial improvement over the 20% with &lt;code>X1&lt;/code> alone in the previous section.&lt;/p>
&lt;h2 id="10-instrumental-variables-with-fixed-effects">10. Instrumental variables with fixed effects&lt;/h2>
&lt;p>Sometimes the explanatory variable itself is &lt;em>endogenous&lt;/em> &amp;mdash; correlated with the error term due to measurement error, simultaneity, or omitted variables that fixed effects do not capture. Instrumental variables (IV) estimation addresses this by using external variables (instruments) that affect the outcome only through the endogenous variable. Think of instruments as a natural experiment embedded in the data: &lt;code>Z&lt;/code> affects &lt;code>X&lt;/code> but has no direct path to &lt;code>Y&lt;/code>, so any association between &lt;code>Z&lt;/code> and &lt;code>Y&lt;/code> must flow through &lt;code>X&lt;/code>. In PyFixest, the IV syntax uses a second pipe: &lt;code>Y2 ~ 1 | f1 + f2 | X1 ~ Z1 + Z2&lt;/code>. This reads: outcome &lt;code>Y2&lt;/code>, no exogenous controls (just the intercept &lt;code>1&lt;/code>), fixed effects &lt;code>f1 + f2&lt;/code>, and endogenous variable &lt;code>X1&lt;/code> instrumented by &lt;code>Z1&lt;/code> and &lt;code>Z2&lt;/code>.&lt;/p>
&lt;p>The IV estimator recovers the coefficient on &lt;code>X1&lt;/code> by first predicting &lt;code>X1&lt;/code> using the instruments, then using these predictions in the second-stage regression:&lt;/p>
&lt;p>$$\text{First stage: } X_1 = \pi_0 + \pi_1 Z_1 + \pi_2 Z_2 + \alpha_i + \gamma_t + \nu$$&lt;/p>
&lt;p>$$\text{Second stage: } Y_2 = \beta X_1^{predicted} + \alpha_i + \gamma_t + \epsilon$$&lt;/p>
&lt;p>In words, the first stage isolates the variation in &lt;code>X1&lt;/code> that is driven by the instruments &lt;code>Z1&lt;/code> and &lt;code>Z2&lt;/code>, stripping away the endogenous component. The second stage then uses only this &amp;ldquo;clean&amp;rdquo; variation to estimate the effect of &lt;code>X1&lt;/code> on &lt;code>Y2&lt;/code>. Here, $\alpha_i$ corresponds to the &lt;code>f1&lt;/code> fixed effects, $\gamma_t$ corresponds to the &lt;code>f2&lt;/code> fixed effects, and $\beta$ is the causal parameter of interest that we recover from the &lt;code>X1&lt;/code> coefficient in PyFixest&amp;rsquo;s output.&lt;/p>
&lt;pre>&lt;code class="language-python">fit_iv = pf.feols(&amp;quot;Y2 ~ 1 | f1 + f2 | X1 ~ Z1 + Z2&amp;quot;, data=data)
print(fit_iv.summary())
print(f&amp;quot;\nFirst-stage F-statistic: {fit_iv._f_stat_1st_stage:.2f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: IV
Dep. var.: Y2, Fixed effects: f1+f2
Inference: iid
Observations: 998
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1 | -1.600 | 0.336 | -4.768 | 0.000 | -2.259 | -0.942 |
---
First-stage F-statistic: 311.54
&lt;/code>&lt;/pre>
&lt;p>The IV estimate of &lt;code>X1&lt;/code> is -1.600 (SE = 0.336), substantially larger in magnitude than the OLS estimate of approximately -1.0. This divergence suggests that the OLS coefficient on &lt;code>X1&lt;/code> is attenuated &amp;mdash; a classic sign of measurement error or endogeneity that biases OLS toward zero. The first-stage F-statistic of 311.54 is well above the conventional threshold of 10, indicating that &lt;code>Z1&lt;/code> and &lt;code>Z2&lt;/code> are strong instruments. Strong instruments mean the IV estimate is reliable; with weak instruments, IV can perform worse than OLS. Note that with heterogeneous treatment effects, IV identifies the &lt;em>Local Average Treatment Effect&lt;/em> (LATE) &amp;mdash; the effect for units whose treatment status is shifted by the instruments &amp;mdash; rather than the Average Treatment Effect (ATE) for the entire population.&lt;/p>
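&lt;p>The two-stage logic can be replicated by hand in a few lines. The simulation below is a self-contained illustration (synthetic data with made-up coefficients, not this tutorial&amp;rsquo;s dataset): a confounder biases the OLS slope well away from the true value of 2.0, while a manual two-stage procedure that regresses the endogenous variable on the instrument and reuses the fitted values recovers it:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
z = rng.normal(size=n)                        # instrument: moves x, excluded from y
u = rng.normal(size=n)                        # confounder: moves both x and y
x = 0.8 * z + u + rng.normal(size=n)          # endogenous regressor
y = 2.0 * x - 3.0 * u + rng.normal(size=n)    # true coefficient on x is 2.0

def slope(w, v):
    """OLS slope from regressing v on w with an intercept."""
    W = np.column_stack([np.ones_like(w), w])
    return np.linalg.lstsq(W, v, rcond=None)[0][1]

beta_ols = slope(x, y)                        # biased: u enters both equations

# First stage: predict x from z; second stage: regress y on the prediction
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
beta_2sls = slope(x_hat, y)

print(f"OLS:  {beta_ols:.2f}")   # pulled well away from 2.0
print(f"2SLS: {beta_2sls:.2f}")  # close to 2.0
```

&lt;p>This is the same first-stage/second-stage structure that runs inside the IV formula above, although the package also computes correct second-stage standard errors, which a naive manual second stage does not.&lt;/p>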
&lt;h2 id="11-panel-data-application-wage-determinants">11. Panel data application: wage determinants&lt;/h2>
&lt;h3 id="111-the-wage-panel-variables-and-structure">11.1 The wage panel: variables and structure&lt;/h3>
&lt;p>To see fixed effects in action with real data, we analyze the Vella and Verbeek (1998) panel of 545 young men observed over 8 years (1980&amp;ndash;1987) from the National Longitudinal Survey of Youth (NLSY). This dataset, used in many econometrics textbooks, is ideal for studying wage determinants because it tracks the same workers as they enter the labor market, gain experience, change jobs, and make decisions about union membership and marriage. The key challenge is that unobserved individual ability differs across workers and correlates with both wages and these covariates &amp;mdash; a classic case for one-way fixed effects.&lt;/p>
&lt;pre>&lt;code class="language-python">url = &amp;quot;https://raw.githubusercontent.com/bashtage/linearmodels/main/linearmodels/datasets/wage_panel/wage_panel.csv.bz2&amp;quot;
wage_df = pd.read_csv(url, compression=&amp;quot;bz2&amp;quot;)
print(f&amp;quot;Wage panel shape: {wage_df.shape}&amp;quot;)
print(wage_df.describe().round(3))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Wage panel shape: (4360, 12)
nr year black exper hisp ... educ union lwage expersq occupation
count 4360.000 4360.000 4360.000 4360.000 4360.000 ... 4360.000 4360.000 4360.000 4360.000 4360.000
mean 5262.059 1983.500 0.116 6.500 0.161 ... 11.768 0.244 1.649 50.425 4.989
std 3496.150 2.292 0.320 2.292 0.367 ... 1.353 0.430 0.533 40.782 2.320
min 13.000 1980.000 0.000 1.000 0.000 ... 3.000 0.000 -3.579 1.000 1.000
25% 2329.000 1981.750 0.000 4.750 0.000 ... 11.000 0.000 1.351 16.000 4.000
50% 4569.000 1983.500 0.000 6.500 0.000 ... 12.000 0.000 1.671 36.000 5.000
75% 8406.000 1985.250 0.000 8.250 0.000 ... 12.000 0.000 1.991 81.000 6.000
max 12548.000 1987.000 1.000 12.000 1.000 ... 16.000 1.000 4.052 324.000 9.000
&lt;/code>&lt;/pre>
&lt;p>The panel contains 4,360 observations (545 individuals over 8 years) with 12 variables. Before running any model, it is important to understand how each variable is defined and measured.&lt;/p>
&lt;p>&lt;strong>Outcome variable:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;code>lwage&lt;/code> &amp;mdash; the natural logarithm of hourly wage. The log transformation means that coefficients are interpreted as approximate percentage changes. The mean of 1.649 corresponds to about \$5.20 per hour in 1980s dollars ($e^{1.649} \approx 5.20$). The standard deviation of 0.533 indicates substantial wage dispersion: the gap between a worker at the 25th percentile (\$3.86/hr) and the 75th percentile (\$7.32/hr) is roughly a doubling of wages.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Time-varying covariates&lt;/strong> (change within a worker over time):&lt;/p>
&lt;ul>
&lt;li>&lt;code>hours&lt;/code> &amp;mdash; annual hours worked. Mean of 2,191 (roughly 42 hours per week for 52 weeks). Ranges from 120 to 4,992, capturing both part-time spells and heavy overtime. We include hours to control for labor supply differences that affect hourly wage calculations.&lt;/li>
&lt;li>&lt;code>union&lt;/code> &amp;mdash; binary indicator (1 = covered by a union contract in the current year, 0 = not covered). About 24.4% of person-year observations are union-covered. Workers can move in and out of union jobs across years, and this within-worker variation in union status is what one-way FE use to identify the union wage premium.&lt;/li>
&lt;li>&lt;code>married&lt;/code> &amp;mdash; binary indicator (1 = currently married, 0 = not married). About 43.9% of observations are married. Since these are young men tracked from their early twenties, many transition from single to married during the panel, providing within-worker variation.&lt;/li>
&lt;li>&lt;code>exper&lt;/code> &amp;mdash; years of potential labor market experience, defined as age minus years of education minus 6. Ranges from 1 to 12 years. In this balanced panel where every worker is observed in every year, experience increases by exactly 1 each year, making it perfectly collinear with entity + year fixed effects. We therefore use &lt;code>expersq&lt;/code> instead in FE models.&lt;/li>
&lt;li>&lt;code>expersq&lt;/code> &amp;mdash; experience squared ($exper^2$). Captures the well-documented concavity in the experience&amp;ndash;earnings profile: wages rise with experience but at a diminishing rate. Unlike &lt;code>exper&lt;/code>, the squared term is a nonlinear function of time, so it is not collinear with entity + year FE and can be estimated.&lt;/li>
&lt;li>&lt;code>occupation&lt;/code> &amp;mdash; occupational category, coded 1 through 9 (9 distinct categories). Workers can and do switch occupations across years. This variable can be used as an additional fixed effect dimension.&lt;/li>
&lt;/ul>
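&lt;p>The collinearity claim for &lt;code>exper&lt;/code> is easy to verify. In any balanced panel, experience is the sum of a worker-specific starting level and a common year term, so the two-way within transformation leaves nothing behind. A toy example (made-up numbers, same structure as the real panel):&lt;/p>

```python
import pandas as pd

# Toy balanced panel: 3 workers observed 1980-1983; exper rises by 1 each year
toy = pd.DataFrame({
    "nr":    [1] * 4 + [2] * 4 + [3] * 4,
    "year":  [1980, 1981, 1982, 1983] * 3,
    "exper": [3, 4, 5, 6,  1, 2, 3, 4,  7, 8, 9, 10],
})

# Two-way within transformation: subtract worker and year means, add back grand mean
demeaned = (toy["exper"]
            - toy.groupby("nr")["exper"].transform("mean")
            - toy.groupby("year")["exper"].transform("mean")
            + toy["exper"].mean())

print(demeaned.abs().max())  # zero up to floating-point error
```

&lt;p>After subtracting worker means and year means (and adding back the grand mean), every entry is zero, which is exactly why &lt;code>exper&lt;/code> cannot appear alongside entity and year fixed effects while &lt;code>expersq&lt;/code> can.&lt;/p>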
&lt;p>&lt;strong>Time-invariant covariates&lt;/strong> (fixed for each worker across all years):&lt;/p>
&lt;ul>
&lt;li>&lt;code>educ&lt;/code> &amp;mdash; years of completed schooling at the start of the panel. Mean of 11.77 years (just below a high school diploma), ranging from 3 to 16 years. Because the sample tracks young men who have already finished their schooling, education does not change over time. The median of 12 years (exactly a high school diploma) and the 75th percentile of 12 years indicate that most workers in this sample have a high school education, with a smaller group holding college degrees.&lt;/li>
&lt;li>&lt;code>black&lt;/code> &amp;mdash; binary indicator (1 = Black, 0 = non-Black). About 11.6% of workers are Black. Because race does not change over time, one-way FE absorb any wage differences associated with being Black.&lt;/li>
&lt;li>&lt;code>hisp&lt;/code> &amp;mdash; binary indicator (1 = Hispanic, 0 = non-Hispanic). About 16.1% of workers are Hispanic. Like &lt;code>black&lt;/code>, this is absorbed by one-way FE.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Panel identifiers:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;code>nr&lt;/code> &amp;mdash; unique worker identifier (545 distinct workers). This defines the entity dimension for fixed effects.&lt;/li>
&lt;li>&lt;code>year&lt;/code> &amp;mdash; calendar year, taking values 1980 through 1987. The panel is balanced: every worker appears in every year, giving exactly $545 \times 8 = 4,360$ observations.&lt;/li>
&lt;/ul>
&lt;p>The distinction between time-varying and time-invariant variables is the most consequential feature of this dataset for fixed effects analysis. Time-invariant variables will be perfectly collinear with entity dummies and cannot be estimated under one-way FE. Time-varying variables survive the within transformation and their effects can be identified. We verify this classification empirically:&lt;/p>
&lt;pre>&lt;code class="language-python">invariance = wage_df.groupby(&amp;quot;nr&amp;quot;)[[&amp;quot;educ&amp;quot;, &amp;quot;black&amp;quot;, &amp;quot;hisp&amp;quot;]].nunique()
print(&amp;quot;Max unique values per worker:&amp;quot;)
print(invariance.max())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Max unique values per worker:
educ 1
black 1
hisp 1
dtype: int64
&lt;/code>&lt;/pre>
&lt;p>Each worker has exactly one value of education, race, and ethnicity across all eight years &amp;mdash; confirming these are truly time-invariant. By contrast, occupation is time-varying:&lt;/p>
&lt;pre>&lt;code class="language-python">occ_changes = wage_df.groupby(&amp;quot;nr&amp;quot;)[&amp;quot;occupation&amp;quot;].nunique()
print(f&amp;quot;Workers who change occupation: {(occ_changes &amp;gt; 1).sum()} / {len(occ_changes)}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Workers who change occupation: 484 / 545
&lt;/code>&lt;/pre>
&lt;p>Nearly 89% of workers switch occupations at least once during the panel. This high rate of switching makes occupation a valid candidate for a fixed effect dimension of its own (Section 11.5). By contrast, a variable like education, which never changes within a worker, would produce a column of zeros after demeaning and must be dropped &amp;mdash; a point we return to in Sections 11.3 and 11.4.&lt;/p>
&lt;h3 id="112-within-vs-between-variation">11.2 Within vs between variation&lt;/h3>
&lt;p>Before estimating any model, it helps to decompose the variation in each variable into &lt;em>between-worker&lt;/em> variation (permanent differences across workers) and &lt;em>within-worker&lt;/em> variation (changes over a worker&amp;rsquo;s career). This decomposition foreshadows what one-way fixed effects can and cannot estimate.&lt;/p>
&lt;pre>&lt;code class="language-python">cols = [&amp;quot;lwage&amp;quot;, &amp;quot;hours&amp;quot;, &amp;quot;union&amp;quot;, &amp;quot;married&amp;quot;, &amp;quot;expersq&amp;quot;, &amp;quot;educ&amp;quot;]
between = wage_df.groupby(&amp;quot;nr&amp;quot;)[cols].mean().std()
for col in cols:
wage_df[f&amp;quot;{col}_within&amp;quot;] = wage_df[col] - wage_df.groupby(&amp;quot;nr&amp;quot;)[col].transform(&amp;quot;mean&amp;quot;)
within = wage_df[[f&amp;quot;{c}_within&amp;quot; for c in cols]].std()
variation = pd.DataFrame({&amp;quot;Between&amp;quot;: between, &amp;quot;Within&amp;quot;: within}).round(4)
print(variation)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Between Within
lwage 0.3907 0.3623
hours 381.7831 418.6057
union 0.3294 0.2760
married 0.3766 0.3236
expersq 26.3513 31.1431
educ 1.7476 0.0000
&lt;/code>&lt;/pre>
&lt;p>The raw standard deviations differ wildly across variables (hours is in the hundreds, union is a fraction), so we normalize by computing each variable&amp;rsquo;s &lt;em>within share&lt;/em> &amp;mdash; the fraction of total variance (between-worker plus within-worker) contributed by within-worker changes over time. This puts all variables on the same 0&amp;ndash;100% scale:&lt;/p>
&lt;pre>&lt;code class="language-python"># Variance shares: the within and between components sum to 100% by construction
within_share = (within**2 / (between**2 + within**2)).fillna(0)
between_share = 1 - within_share
fig, ax = plt.subplots(figsize=(10, 5))
y_pos = np.arange(len(cols))
bar_height = 0.55
# Stacked horizontal bars: between (left) + within (right) = 100%
ax.barh(y_pos, between_share.values, bar_height,
        label=&amp;quot;Between (cross-worker)&amp;quot;, color=STEEL_BLUE, edgecolor=DARK_NAVY)
ax.barh(y_pos, within_share.values, bar_height, left=between_share.values,
        label=&amp;quot;Within (over career)&amp;quot;, color=WARM_ORANGE, edgecolor=DARK_NAVY)
ax.set_yticks(y_pos)
ax.set_yticklabels(cols, fontsize=11)
ax.set_xlabel(&amp;quot;Share of total variance&amp;quot;, fontsize=12)
ax.legend(loc=&amp;quot;lower right&amp;quot;, fontsize=10)
plt.savefig(&amp;quot;pyfixest_within_between.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
            facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_within_between.png" alt="Stacked horizontal bar chart showing the within vs between share of total variance for key wage panel variables, with education at 100% between variation.">&lt;/p>
&lt;p>The decomposition reveals a critical pattern. Education is 100% between-worker variation &amp;mdash; its within share is exactly 0% &amp;mdash; because no worker changes their education level during the panel. This means one-way FE literally cannot estimate education&amp;rsquo;s effect: the demeaned education column is all zeros. Log wages split almost evenly, with roughly 46% of their variance coming from changes over a worker&amp;rsquo;s career and 54% from permanent differences across workers. Variables with substantial within shares &amp;mdash; union (41%), married (42%), hours (55%), expersq (58%) &amp;mdash; can be estimated under one-way FE because they change over a worker&amp;rsquo;s career. The higher the within share, the more statistical power one-way FE retains for that variable.&lt;/p>
&lt;h3 id="113-the-mincer-equation-and-its-panel-extensions">11.3 The Mincer equation and its panel extensions&lt;/h3>
&lt;p>Before estimating any models, it helps to lay out the econometric framework that organizes all subsequent specifications. The &lt;strong>classic Mincer equation&lt;/strong> (Mincer, 1974) is the workhorse model of labor economics:&lt;/p>
&lt;p>$$\ln(wage_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + \epsilon_i$$&lt;/p>
&lt;p>This log-linear specification models wages as a function of years of schooling and experience, with experience entering quadratically to capture concave returns &amp;mdash; each additional year of experience raises wages, but by a diminishing amount. It is a cross-sectional model, estimating the average relationship across all workers at a single point in time.&lt;/p>
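&lt;p>The quadratic term has a concrete implication: the marginal return to experience, $\beta_2 + 2\beta_3 \cdot exper$, declines linearly and hits zero at $exper = -\beta_2/(2\beta_3)$, the peak of the earnings profile. A quick sketch with purely illustrative coefficients (assumed values, not estimates from this data):&lt;/p>

```python
# Hypothetical Mincer coefficients, chosen only to illustrate the quadratic shape
b_exper, b_expersq = 0.08, -0.002

def marginal_return(exper):
    """Derivative of log wage with respect to experience."""
    return b_exper + 2 * b_expersq * exper

peak = -b_exper / (2 * b_expersq)  # experience level where the profile flattens
print(f"Return at 5 years:  {marginal_return(5):.3f}")   # ~6% per additional year
print(f"Return at 15 years: {marginal_return(15):.3f}")  # ~2%: diminishing returns
print(f"Profile peaks near {peak:.0f} years")
```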
&lt;p>The &lt;strong>extended Mincer equation&lt;/strong> adds controls for union membership, marital status, hours worked, and demographic characteristics:&lt;/p>
&lt;p>$$\ln(wage_{it}) = \beta_0 + \beta_1 educ_i + \beta_2 expersq_{it} + \beta_3 union_{it} + \beta_4 married_{it} + \beta_5 hours_{it} + \beta_6 black_i + \beta_7 hisp_i + \epsilon_{it}$$&lt;/p>
&lt;p>The &lt;strong>panel FE extension&lt;/strong> replaces explicit controls for time-invariant characteristics with entity and time fixed effects:&lt;/p>
&lt;p>$$\ln(wage_{it}) = \beta X_{it} + \gamma Z_i + \alpha_i + \delta_t + \epsilon_{it}$$&lt;/p>
&lt;p>where $X_{it}$ denotes time-varying covariates (union, married, hours, experience), $Z_i$ denotes time-invariant characteristics (education, race), $\alpha_i$ captures one-way fixed effects (one intercept per worker), and $\delta_t$ captures year fixed effects. The key insight: when we include $\alpha_i$, the time-invariant variables $Z_i$ become perfectly collinear with the entity dummies and are absorbed. We gain protection against omitted variable bias from all unobserved time-invariant confounders, but we lose the ability to estimate $\gamma$.&lt;/p>
&lt;p>The &lt;strong>CRE/Mundlak extension&lt;/strong> &amp;mdash; the Mundlak (1978) device &amp;mdash; offers a way to recover $\gamma$:&lt;/p>
&lt;p>$$\ln(wage_{it}) = \beta X_{it} + \gamma Z_i + \pi \bar{X}_i + \epsilon_{it}$$&lt;/p>
&lt;p>where $\bar{X}_i$ are individual means of the time-varying variables. This replaces entity dummies with individual means, which model the correlation between unobserved heterogeneity and the covariates. The result: $\hat{\beta} \approx \hat{\beta}_{FE}$ for the time-varying variables, while $\gamma$ is now estimable because we no longer include entity dummies that absorb it.&lt;/p>
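&lt;p>A minimal simulation makes the Mundlak equivalence concrete. The data below are synthetic (assumed coefficients, not the wage panel): unobserved heterogeneity is correlated with the regressor, so pooled OLS is biased, but adding the individual mean $\bar{X}_i$ reproduces the within (FE) slope:&lt;/p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
N, T = 300, 6
idx = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)                         # unobserved heterogeneity
x = 0.7 * alpha[idx] + rng.normal(size=N * T)      # x correlated with alpha
y = 1.5 * x + alpha[idx] + rng.normal(size=N * T)  # true coefficient is 1.5

df = pd.DataFrame({"i": idx, "x": x, "y": y})
df["x_bar"] = df.groupby("i")["x"].transform("mean")   # Mundlak term

def ols_coef_on_x(cols):
    """Coefficient on the first regressor in cols from OLS with an intercept."""
    X = np.column_stack([np.ones(len(df))] + [df[c].to_numpy() for c in cols])
    return np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)[0][1]

# Within (FE) estimate via demeaning
xd = (df["x"] - df["x_bar"]).to_numpy()
yd = (df["y"] - df.groupby("i")["y"].transform("mean")).to_numpy()
beta_fe = xd @ yd / (xd @ xd)

print(f"Pooled OLS:  {ols_coef_on_x(['x']):.3f}")           # biased upward by alpha
print(f"Within (FE): {beta_fe:.3f}")                        # close to 1.5
print(f"Mundlak (x): {ols_coef_on_x(['x', 'x_bar']):.3f}")  # matches the FE estimate
```

&lt;p>The coefficient on &lt;code>x&lt;/code> in the Mundlak regression matches the demeaning-based FE estimate to numerical precision, which is the identity Section 11.7 exploits.&lt;/p>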
&lt;p>Sections 11.4&amp;ndash;11.7 estimate these models progressively: pooled OLS and one-way FE (11.4), two-way and three-way FE (11.5), group-specific time trends (11.6), and CRE/Mundlak (11.7).&lt;/p>
&lt;h3 id="114-from-pooled-ols-to-one-way-fe-the-education-tradeoff">11.4 From pooled OLS to one-way FE: the education tradeoff&lt;/h3>
&lt;p>We begin with the extended Mincer equation estimated by pooled OLS, which includes both time-varying and time-invariant variables:&lt;/p>
&lt;pre>&lt;code class="language-python">fit_pooled = pf.feols(
&amp;quot;lwage ~ educ + expersq + union + married + hours + black + hisp&amp;quot;,
data=wage_df, vcov=&amp;quot;HC1&amp;quot;
)
print(fit_pooled.summary())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: lwage, Fixed effects: 0
Inference: HC1
Observations: 4360
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.265 | 0.069 | 3.823 | 0.000 | 0.129 | 0.402 |
| educ | 0.106 | 0.005 | 22.924 | 0.000 | 0.097 | 0.115 |
| expersq | 0.003 | 0.000 | 16.930 | 0.000 | 0.003 | 0.004 |
| union | 0.183 | 0.016 | 11.205 | 0.000 | 0.151 | 0.215 |
| married | 0.141 | 0.015 | 9.308 | 0.000 | 0.111 | 0.171 |
| hours | -0.000 | 0.000 | -3.139 | 0.002 | -0.000 | -0.000 |
| black | -0.135 | 0.024 | -5.549 | 0.000 | -0.182 | -0.087 |
| hisp | 0.013 | 0.020 | 0.670 | 0.503 | -0.025 | 0.052 |
---
RMSE: 0.484 R2: 0.175
&lt;/code>&lt;/pre>
&lt;p>Pooled OLS estimates a 10.6% return to each year of education, an 18.3% union premium, and a 14.1% marriage premium. Black workers earn about 13.5% less, while the Hispanic coefficient is small and insignificant. The R-squared is 0.175 &amp;mdash; these variables explain less than a fifth of wage variation.&lt;/p>
&lt;p>Now we estimate the one-way FE model, which absorbs all time-invariant worker characteristics:&lt;/p>
&lt;pre>&lt;code class="language-python">fit_entity = pf.feols(&amp;quot;lwage ~ expersq + union + married + hours | nr&amp;quot;,
data=wage_df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;nr&amp;quot;})
print(fit_entity.summary())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: lwage, Fixed effects: nr
Inference: CRV1
Observations: 4360
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| expersq | 0.004 | 0.000 | 16.537 | 0.000 | 0.003 | 0.004 |
| union | 0.078 | 0.024 | 3.319 | 0.001 | 0.032 | 0.125 |
| married | 0.115 | 0.022 | 5.217 | 0.000 | 0.071 | 0.158 |
| hours | -0.000 | 0.000 | -3.807 | 0.000 | -0.000 | -0.000 |
---
RMSE: 0.335 R2: 0.605 R2 Within: 0.145
&lt;/code>&lt;/pre>
&lt;p>One-way fixed effects dramatically improve model fit: R-squared jumps from 0.175 (pooled OLS) to 0.605, meaning worker-level heterogeneity accounts for over 40 percentage points of explained variation. The union premium drops from 18.3% to 7.8% (SE = 0.024) &amp;mdash; more than half the pooled estimate was driven by selection (workers who join unions differ systematically from those who do not). The marriage premium falls from 14.1% to 11.5% (SE = 0.022), a smaller reduction suggesting that marital status is less confounded by unobserved ability. The &lt;code>expersq&lt;/code> coefficient of 0.004 captures the concavity of the experience&amp;ndash;earnings profile within workers over time. Notice that &lt;code>educ&lt;/code>, &lt;code>black&lt;/code>, and &lt;code>hisp&lt;/code> are absent: these time-invariant variables are perfectly collinear with the 545 worker dummies and cannot be estimated under one-way FE.&lt;/p>
&lt;p>To see what happens when we try to include a time-invariant variable alongside one-way FE:&lt;/p>
&lt;pre>&lt;code class="language-python">import warnings
with warnings.catch_warnings(record=True) as w:
warnings.simplefilter(&amp;quot;always&amp;quot;)
fit_educ = pf.feols(&amp;quot;lwage ~ expersq + union + married + educ | nr&amp;quot;,
data=wage_df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;nr&amp;quot;})
print(f&amp;quot;Coefficients estimated: {list(fit_educ.coef().index)}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Coefficients estimated: ['expersq', 'union', 'married']
&lt;/code>&lt;/pre>
&lt;p>Education is silently dropped. This is not a bug &amp;mdash; it is a fundamental consequence of the within transformation (Section 6):&lt;/p>
&lt;p>$$\ddot{educ}_{it} = educ_i - \bar{educ}_i = 0 \quad \text{for all } t$$&lt;/p>
&lt;p>Because a worker&amp;rsquo;s education does not change over the eight years of the panel, the demeaned value is exactly zero for every observation. A column of zeros is perfectly collinear with the entity dummies, so it must be dropped. The same applies to &lt;code>black&lt;/code> and &lt;code>hisp&lt;/code>.&lt;/p>
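&lt;p>The zero-column argument takes only a few lines to verify on a toy panel (made-up values, same layout as the NLSY data):&lt;/p>

```python
import pandas as pd

# Toy panel: 2 workers x 3 years; educ is fixed per worker, union switches
toy = pd.DataFrame({
    "nr":    [1, 1, 1, 2, 2, 2],
    "educ":  [12, 12, 12, 16, 16, 16],  # time-invariant
    "union": [0, 1, 1, 0, 0, 1],        # time-varying
})
demeaned = toy[["educ", "union"]] - toy.groupby("nr")[["educ", "union"]].transform("mean")
print(demeaned)
```

&lt;p>The demeaned &lt;code>educ&lt;/code> column is identically zero while &lt;code>union&lt;/code> retains within-worker deviations, which is precisely the distinction between dropped and identified regressors.&lt;/p>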
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Pooled OLS&lt;/th>
&lt;th>One-Way FE&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>educ&lt;/td>
&lt;td>0.106&lt;/td>
&lt;td>dropped&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>expersq&lt;/td>
&lt;td>0.003&lt;/td>
&lt;td>0.004&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>union&lt;/td>
&lt;td>0.183&lt;/td>
&lt;td>0.078&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>married&lt;/td>
&lt;td>0.141&lt;/td>
&lt;td>0.115&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>hours&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>black&lt;/td>
&lt;td>-0.135&lt;/td>
&lt;td>dropped&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>hisp&lt;/td>
&lt;td>0.013&lt;/td>
&lt;td>dropped&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>R-squared&lt;/td>
&lt;td>0.175&lt;/td>
&lt;td>0.605&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>This table crystallizes the fundamental tradeoff. Pooled OLS estimates everything &amp;mdash; education, race, union, marriage &amp;mdash; but its estimates are biased by unobserved ability. One-Way FE eliminates the ability bias, and the union premium drops from 18.3% to 7.8%, revealing that more than half the raw association was selection. But the price is steep: education, Black, and Hispanic are all absorbed into the individual intercepts. We cannot estimate the return to schooling or the racial wage gap under one-way FE. Sections 11.5&amp;ndash;11.6 push further with additional FE dimensions, and Section 11.7 shows how CRE partially resolves this tradeoff.&lt;/p>
&lt;h3 id="115-two-way-and-three-way-fixed-effects">11.5 Two-way and three-way fixed effects&lt;/h3>
&lt;p>Adding year fixed effects to one-way FE creates a two-way FE (TWFE) model that absorbs both individual heterogeneity and common time trends:&lt;/p>
&lt;pre>&lt;code class="language-python">fit_panel = pf.feols(&amp;quot;lwage ~ expersq + union + married + hours | nr + year&amp;quot;,
data=wage_df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;nr + year&amp;quot;})
&lt;/code>&lt;/pre>
&lt;p>We can go further by adding occupation as a third fixed effect dimension. As we saw in Section 11.1, nearly 89% of workers switch occupations during the panel, so occupation is a valid time-varying dimension:&lt;/p>
&lt;pre>&lt;code class="language-python">fit_threeway = pf.feols(
&amp;quot;lwage ~ expersq + union + married + hours | nr + year + C(occupation)&amp;quot;,
data=wage_df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;nr&amp;quot;}
)
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Pooled OLS&lt;/th>
&lt;th>One-Way FE&lt;/th>
&lt;th>Two-Way FE&lt;/th>
&lt;th>Three-Way FE&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>expersq&lt;/td>
&lt;td>0.003&lt;/td>
&lt;td>0.004&lt;/td>
&lt;td>-0.006&lt;/td>
&lt;td>-0.006&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>union&lt;/td>
&lt;td>0.183&lt;/td>
&lt;td>0.078&lt;/td>
&lt;td>0.073&lt;/td>
&lt;td>0.075&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>married&lt;/td>
&lt;td>0.141&lt;/td>
&lt;td>0.115&lt;/td>
&lt;td>0.048&lt;/td>
&lt;td>0.047&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>hours&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>R-squared&lt;/td>
&lt;td>0.175&lt;/td>
&lt;td>0.605&lt;/td>
&lt;td>0.631&lt;/td>
&lt;td>0.632&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;pre>&lt;code class="language-python">fig, axes = plt.subplots(2, 2, figsize=(12, 8))
panel_models = {&amp;quot;Pooled OLS&amp;quot;: fit_pooled, &amp;quot;One-Way FE&amp;quot;: fit_entity,
&amp;quot;Two-Way FE&amp;quot;: fit_panel, &amp;quot;Three-Way FE&amp;quot;: fit_threeway}
panel_vars = [&amp;quot;expersq&amp;quot;, &amp;quot;union&amp;quot;, &amp;quot;married&amp;quot;, &amp;quot;hours&amp;quot;]
panel_colors = [STEEL_BLUE, WARM_ORANGE, TEAL, &amp;quot;#e8956a&amp;quot;]
for idx, var in enumerate(panel_vars):
ax = axes.flatten()[idx]
model_names_p = list(panel_models.keys())
coefs_p = [panel_models[m].coef()[var] for m in model_names_p]
ses_p = [panel_models[m].se()[var] for m in model_names_p]
ax.bar(range(4), coefs_p, yerr=[1.96 * s for s in ses_p],
color=panel_colors, edgecolor=DARK_NAVY, width=0.5, capsize=4)
ax.set_xticks(range(4))
ax.set_xticklabels(model_names_p, fontsize=8, rotation=15)
ax.set_title(var, fontsize=12, fontweight=&amp;quot;bold&amp;quot;)
ax.axhline(y=0, color=NEAR_BLACK, linewidth=0.5, linestyle=&amp;quot;--&amp;quot;, alpha=0.5)
fig.suptitle(&amp;quot;Coefficient Estimates Across FE Specifications&amp;quot;,
fontsize=14, fontweight=&amp;quot;bold&amp;quot;, y=1.02)
plt.savefig(&amp;quot;pyfixest_wage_extended.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_wage_extended.png" alt="Four-panel chart comparing coefficient estimates across pooled OLS, one-way FE, two-way FE, and three-way FE specifications.">&lt;/p>
&lt;p>The results show diminishing returns to additional FE dimensions. The big action was one-way FE: R-squared jumps from 0.175 to 0.605, and the union premium drops from 18.3% to 7.8%. Adding year effects (TWFE) pushes R-squared to 0.631 and the union premium stabilizes at 7.3%. Adding occupation as a third dimension barely moves anything &amp;mdash; R-squared rises to 0.632 and the union premium is 7.5%. The &lt;code>expersq&lt;/code> coefficient flips sign with TWFE (-0.006) because year effects absorb common trends in experience and wages. The stability of the union and marriage coefficients across the last three specifications suggests these estimates are robust to additional controls for time trends and occupational sorting.&lt;/p>
&lt;h3 id="116-interactive-fixed-effects">11.6 Interactive fixed effects&lt;/h3>
&lt;p>Sections 11.4&amp;ndash;11.5 used &lt;em>additive&lt;/em> fixed effects (&lt;code>nr + year&lt;/code>), where every individual shares the same set of year effects. &lt;strong>Interactive&lt;/strong> (or &lt;em>interacted&lt;/em>) fixed effects generalize this by allowing one FE dimension to vary across levels of another &amp;mdash; producing group-specific intercepts for each time period. Instead of a single set of year dummies shared by all workers, we estimate separate year effects for each demographic group.&lt;/p>
&lt;p>Why does this matter? Black and non-Black workers may face different labor market trends during the 1980s. If macroeconomic shocks hit these groups differently, a common set of year effects would be misspecified. We can test this by allowing year effects to vary by race:&lt;/p>
&lt;p>$$\ln(wage_{it}) = \beta X_{it} + \alpha_i + \gamma_{t,g(i)} + \epsilon_{it}$$&lt;/p>
&lt;p>where $g(i) \in \{Black, non\text{-}Black\}$, so we estimate separate year effects for each racial group.&lt;/p>
&lt;p>Pyfixest implements interactive FE with the &lt;strong>caret operator&lt;/strong> (&lt;code>^&lt;/code>): the syntax &lt;code>year^black&lt;/code> in the fixed-effects slot creates a separate year dummy for each value of &lt;code>black&lt;/code>. This mirrors R&amp;rsquo;s fixest package. The equivalent manual approach is to concatenate the columns (&lt;code>wage_df[&amp;quot;year_black&amp;quot;] = wage_df[&amp;quot;year&amp;quot;].astype(str) + &amp;quot;_&amp;quot; + wage_df[&amp;quot;black&amp;quot;].astype(str)&lt;/code>) and absorb the resulting string variable, but the caret operator is preferred because it keeps the interaction structure visible in the formula.&lt;/p>
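&lt;p>To see why the two approaches induce the same set of dummies, here is a minimal pure-Python sketch with toy data (no pyfixest): grouping observations on the &lt;em>pair&lt;/em> of values produces exactly the same partition as grouping on a concatenated key.&lt;/p>

```python
from collections import defaultdict

# Toy (year, black) values for six observations (hypothetical)
rows = [(1980, 0), (1980, 1), (1981, 0), (1981, 1), (1980, 0), (1981, 1)]

# Grouping on the pair of values: what year^black does conceptually
pair_groups = defaultdict(list)
for i, (year, black) in enumerate(rows):
    pair_groups[(year, black)].append(i)

# Grouping on a concatenated string key: the manual alternative
concat_groups = defaultdict(list)
for i, (year, black) in enumerate(rows):
    concat_groups[f'{year}_{black}'].append(i)

# Both induce the same partition of observations, hence the same dummies
assert sorted(pair_groups.values()) == sorted(concat_groups.values())
print(len(pair_groups))   # 4 interacted groups: 2 years x 2 race values
```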
&lt;pre>&lt;code class="language-python"># Pyfixest caret operator for interacted fixed effects
fit_gtrends = pf.feols(&amp;quot;lwage ~ expersq + union + married + hours | nr + year^black&amp;quot;,
data=wage_df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;nr&amp;quot;})
print(fit_gtrends.summary())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: lwage, Fixed effects: nr+year^black
Inference: CRV1
Observations: 4360
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|----------: |------------: |--------: |---------: |-----: |------: |
| expersq | -0.006 | 0.001 | -5.878 | 0.000 | -0.008 | -0.004 |
| union | 0.074 | 0.024 | 3.129 | 0.002 | 0.028 | 0.121 |
| married | 0.045 | 0.020 | 2.262 | 0.024 | 0.006 | 0.084 |
| hours | -0.000 | 0.000 | -0.393 | 0.694 | -0.001 | 0.001 |
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Two-Way FE (additive)&lt;/th>
&lt;th>Interactive FE (year × race)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>expersq&lt;/td>
&lt;td>-0.006&lt;/td>
&lt;td>-0.006&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>union&lt;/td>
&lt;td>0.073&lt;/td>
&lt;td>0.074&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>married&lt;/td>
&lt;td>0.048&lt;/td>
&lt;td>0.045&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>hours&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(9, 5))
vars_plot = [&amp;quot;expersq&amp;quot;, &amp;quot;union&amp;quot;, &amp;quot;married&amp;quot;, &amp;quot;hours&amp;quot;]
x = np.arange(len(vars_plot))
width = 0.35
twfe_coefs = [fit_panel.coef()[v] for v in vars_plot]
gtrend_coefs = [fit_gtrends.coef()[v] for v in vars_plot]
ax.bar(x - width/2, twfe_coefs, width, label=&amp;quot;Two-Way FE&amp;quot;, color=STEEL_BLUE, edgecolor=DARK_NAVY)
ax.bar(x + width/2, gtrend_coefs, width, label=&amp;quot;Interactive FE&amp;quot;, color=WARM_ORANGE, edgecolor=DARK_NAVY)
ax.set_xticks(x)
ax.set_xticklabels(vars_plot, fontsize=11)
ax.set_ylabel(&amp;quot;Coefficient Estimate&amp;quot;, fontsize=13)
ax.set_title(&amp;quot;Additive vs Interactive Fixed Effects&amp;quot;, fontsize=14, fontweight=&amp;quot;bold&amp;quot;)
ax.legend(fontsize=11)
ax.axhline(y=0, color=NEAR_BLACK, linewidth=0.5, linestyle=&amp;quot;--&amp;quot;, alpha=0.5)
plt.savefig(&amp;quot;pyfixest_group_trends.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_group_trends.png" alt="Side-by-side bar chart comparing additive TWFE and interactive fixed effect coefficient estimates.">&lt;/p>
&lt;p>The coefficients are nearly identical under both specifications. Moving from additive to interactive fixed effects barely changes the estimated returns to union membership (7.3% → 7.4%), marriage (4.8% → 4.5%), or experience. This stability indicates that year effects are similar across racial groups &amp;mdash; the additive TWFE specification is not misspecified by imposing common year effects. The interactive model uses 545 one-way FE plus 16 group-year FE (8 years × 2 groups) = 561 FE parameters to explain 4,360 observations &amp;mdash; well short of saturation. Had the coefficients shifted substantially, that would have signaled that Black and non-Black workers face sufficiently different macro trends to warrant group-specific year effects, and that the standard additive TWFE was masking this heterogeneity.&lt;/p>
&lt;h3 id="117-recovering-time-invariant-effects-the-cremundlak-approach">11.7 Recovering time-invariant effects: the CRE/Mundlak approach&lt;/h3>
&lt;p>Sections 11.4&amp;ndash;11.6 revealed a fundamental tradeoff in panel econometrics. One-way FE eliminate omitted variable bias from all unobserved time-invariant confounders &amp;mdash; a powerful guarantee &amp;mdash; but they absorb education, race, and ethnicity in the process. Pooled OLS estimates coefficients for everything, but those estimates are biased whenever unobserved worker traits correlate with the covariates. We want the best of both worlds: the bias protection of FE with the ability to estimate time-invariant effects.&lt;/p>
&lt;p>Imagine you could describe each worker&amp;rsquo;s &amp;ldquo;type&amp;rdquo; not with a unique ID but with a summary of their career trajectory &amp;mdash; their average union participation rate, average hours worked, average marital status, and so on. Two workers with similar career averages are arguably similar in unobserved ways too: a worker who spends 80% of their career in a union likely differs systematically from one who never joins. The &lt;strong>Correlated Random Effects&lt;/strong> (CRE) model &amp;mdash; also called the &lt;strong>Mundlak (1978) device&lt;/strong> &amp;mdash; operationalizes this intuition by replacing the 545 entity dummies with a handful of individual-mean variables that capture the same correlation structure.&lt;/p>
&lt;p>&lt;strong>The CRE equation.&lt;/strong> Recall from Section 11.3 that the CRE equation replaces entity dummies $\alpha_i$ with individual means $\bar{X}_i$ of the time-varying variables:&lt;/p>
&lt;p>$$\ln(wage_{it}) = \beta X_{it} + \gamma Z_i + \pi \bar{X}_i + \epsilon_{it}$$&lt;/p>
&lt;p>In words, this equation says that a worker&amp;rsquo;s log wage depends on three components: (1) their current values of time-varying covariates ($X_{it}$), (2) their permanent characteristics ($Z_i$ like education and race), and (3) a set of correction terms ($\bar{X}_i$) that capture the &lt;em>average&lt;/em> level of each time-varying variable across their career. In our code, $X_{it}$ corresponds to &lt;code>expersq&lt;/code>, &lt;code>union&lt;/code>, &lt;code>married&lt;/code>, and &lt;code>hours&lt;/code> in each year; $Z_i$ corresponds to &lt;code>educ&lt;/code>, &lt;code>black&lt;/code>, and &lt;code>hisp&lt;/code>; and $\bar{X}_i$ corresponds to the &lt;code>*_mean&lt;/code> columns we compute below.&lt;/p>
&lt;p>&lt;strong>Why does including $\bar{X}_i$ work?&lt;/strong> The individual means proxy for the unobserved individual effect $\alpha_i$. Consider union membership: if workers who join unions more often (high $\overline{union}_i$) also have higher unobserved ability or motivation, then $\overline{union}_i$ captures that correlation. Once we control for it, the remaining within-person variation in union status is &amp;ldquo;clean&amp;rdquo; &amp;mdash; and the time-invariant variables are no longer collinear with entity dummies (because there are no entity dummies).&lt;/p>
&lt;p>&lt;strong>Contrast with FE.&lt;/strong> One-way FE assumes $\alpha_i$ can be &lt;em>anything&lt;/em> &amp;mdash; completely unrestricted. CRE assumes $\alpha_i = \pi \bar{X}_i + \text{error}$ &amp;mdash; the individual effect is a linear function of the career averages. This is a stronger assumption, but it buys back education and race. The payoff: $\hat{\beta}$ for time-varying variables should approximately match the one-way FE estimates (because the means absorb the same correlation), while $\gamma$ for time-invariant variables is now estimable.&lt;/p>
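&lt;p>The equivalence for time-varying coefficients can be checked in standalone Python before touching the real data. The sketch below uses simulated data with a hypothetical true within slope of 0.5 and solves the three-variable OLS by hand via the normal equations; in a balanced panel, the Mundlak slope on $x$ matches the within (one-way FE) estimator exactly.&lt;/p>

```python
import random

random.seed(7)

# Simulated balanced panel (hypothetical): the entity effect alpha_i
# shifts both y and the level of x, so pooled OLS would be biased
N, T = 60, 5
ids, x, y = [], [], []
for i in range(N):
    alpha = random.gauss(0, 1)
    for t in range(T):
        xi = 2.0 * alpha + random.gauss(0, 1)   # x correlated with alpha
        ids.append(i)
        x.append(xi)
        y.append(0.5 * xi + alpha + random.gauss(0, 0.1))
n = N * T

# Entity means of x and y
xbar_i = {i: sum(x[k] for k in range(n) if ids[k] == i) / T for i in range(N)}
ybar_i = {i: sum(y[k] for k in range(n) if ids[k] == i) / T for i in range(N)}
xbar = [xbar_i[i] for i in ids]

# Within (one-way FE) estimator: slope of demeaned y on demeaned x
xd = [x[k] - xbar[k] for k in range(n)]
yd = [y[k] - ybar_i[ids[k]] for k in range(n)]
beta_within = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

def solve3(A, rhs):
    # Gauss-Jordan elimination with partial pivoting for a 3x3 system
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(3):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[r][3] / M[r][r] for r in range(3)]

# Mundlak/CRE regression: y on [1, x, xbar] via the normal equations
X = [[1.0, x[k], xbar[k]] for k in range(n)]
XtX = [[sum(r[a] * r[b] for r in X) for b in range(3)] for a in range(3)]
Xty = [sum(r[a] * yk for r, yk in zip(X, y)) for a in range(3)]
coefs = solve3(XtX, Xty)

# In a balanced panel the slope on x matches the within estimator exactly
assert round(coefs[1] - beta_within, 8) == 0
print(f'within = {beta_within:.4f}, CRE = {coefs[1]:.4f}')   # both near 0.5
```

&lt;p>This is the same equivalence the comparison table below displays for the real wage panel.&lt;/p>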
&lt;pre>&lt;code class="language-python">mundlak_vars = [&amp;quot;union&amp;quot;, &amp;quot;married&amp;quot;, &amp;quot;hours&amp;quot;, &amp;quot;expersq&amp;quot;]
for var in mundlak_vars:
    wage_df[f&amp;quot;{var}_mean&amp;quot;] = wage_df.groupby(&amp;quot;nr&amp;quot;)[var].transform(&amp;quot;mean&amp;quot;)
fit_mundlak = pf.feols(
&amp;quot;lwage ~ expersq + union + married + hours + educ + black + hisp &amp;quot;
&amp;quot;+ expersq_mean + union_mean + married_mean + hours_mean&amp;quot;,
data=wage_df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;nr&amp;quot;}
)
print(fit_mundlak.summary())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: lwage, Fixed effects: 0
Inference: CRV1
Observations: 4360
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.276 | 0.073 | 3.798 | 0.000 | 0.133 | 0.418 |
| expersq | 0.004 | 0.000 | 13.284 | 0.000 | 0.004 | 0.005 |
| union | 0.078 | 0.019 | 4.050 | 0.000 | 0.040 | 0.116 |
| married | 0.115 | 0.017 | 6.664 | 0.000 | 0.081 | 0.149 |
| hours | -0.000 | 0.000 | -0.007 | 0.994 | -0.000 | 0.000 |
| educ | 0.094 | 0.005 | 17.295 | 0.000 | 0.083 | 0.104 |
| black | -0.140 | 0.024 | -5.930 | 0.000 | -0.187 | -0.094 |
| hisp | 0.009 | 0.019 | 0.469 | 0.639 | -0.028 | 0.045 |
| expersq_mean | -0.003 | 0.001 | -3.498 | 0.001 | -0.005 | -0.001 |
| union_mean | 0.179 | 0.037 | 4.838 | 0.000 | 0.106 | 0.251 |
| married_mean | -0.041 | 0.042 | -0.969 | 0.333 | -0.123 | 0.042 |
| hours_mean | 0.002 | 0.001 | 3.109 | 0.002 | 0.001 | 0.003 |
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>One-Way FE&lt;/th>
&lt;th>CRE&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>expersq&lt;/td>
&lt;td>0.004&lt;/td>
&lt;td>0.004&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>union&lt;/td>
&lt;td>0.078&lt;/td>
&lt;td>0.078&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>married&lt;/td>
&lt;td>0.115&lt;/td>
&lt;td>0.115&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>hours&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>educ&lt;/td>
&lt;td>dropped&lt;/td>
&lt;td>0.094&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>black&lt;/td>
&lt;td>dropped&lt;/td>
&lt;td>-0.140&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>hisp&lt;/td>
&lt;td>dropped&lt;/td>
&lt;td>0.009&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(10, 6))
compare_vars = [&amp;quot;expersq&amp;quot;, &amp;quot;union&amp;quot;, &amp;quot;married&amp;quot;, &amp;quot;hours&amp;quot;, &amp;quot;educ&amp;quot;, &amp;quot;black&amp;quot;, &amp;quot;hisp&amp;quot;]
x = np.arange(len(compare_vars))
width = 0.25
pooled_vals = [fit_pooled.coef()[v] for v in compare_vars]
entity_vals = [fit_entity.coef()[v] if v in fit_entity.coef().index else 0 for v in compare_vars]
mundlak_vals = [fit_mundlak.coef()[v] if v in fit_mundlak.coef().index else 0 for v in compare_vars]
ax.bar(x - width, pooled_vals, width, label=&amp;quot;Pooled OLS&amp;quot;, color=STEEL_BLUE, edgecolor=DARK_NAVY)
ax.bar(x, entity_vals, width, label=&amp;quot;One-Way FE&amp;quot;, color=WARM_ORANGE, edgecolor=DARK_NAVY)
ax.bar(x + width, mundlak_vals, width, label=&amp;quot;CRE&amp;quot;, color=TEAL, edgecolor=DARK_NAVY)
ax.set_xticks(x)
ax.set_xticklabels(compare_vars, fontsize=10, rotation=15)
ax.set_ylabel(&amp;quot;Coefficient Estimate&amp;quot;, fontsize=13)
ax.set_title(&amp;quot;Pooled OLS vs One-Way FE vs CRE&amp;quot;, fontsize=14, fontweight=&amp;quot;bold&amp;quot;)
ax.legend(fontsize=11)
ax.axhline(y=0, color=NEAR_BLACK, linewidth=0.5, linestyle=&amp;quot;--&amp;quot;, alpha=0.5)
plt.savefig(&amp;quot;pyfixest_mundlak.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_mundlak.png" alt="Grouped bar chart comparing Pooled OLS, One-Way FE, and CRE coefficient estimates, showing CRE recovers education while matching one-way FE on time-varying variables.">&lt;/p>
&lt;p>The CRE model bridges one-way FE and pooled OLS. For time-varying variables (union, married, hours, expersq), the CRE coefficients closely match the one-way FE estimates &amp;mdash; confirming that the individual means successfully proxy for entity dummies. For time-invariant variables, CRE recovers what one-way FE cannot: education&amp;rsquo;s coefficient is 0.094 per year of schooling (a 9.4% return), and the Black wage gap is -0.140 (14.0% lower wages). These are close to the pooled OLS estimates, but now they are estimated in a framework that controls for the correlation between unobserved heterogeneity and the covariates (via the individual means).&lt;/p>
&lt;p>The CRE correction terms ($\pi$ coefficients) are informative in their own right. The &lt;code>union_mean&lt;/code> coefficient of 0.179 is large and highly significant ($p &amp;lt; 0.001$): workers with persistently higher union participation earn substantially more &lt;em>on average&lt;/em>, even after controlling for the within-person union effect (0.078). This gap &amp;mdash; 0.179 versus 0.078 &amp;mdash; is evidence of positive selection into unions: workers who join unions more often tend to have higher unobserved ability or to work in higher-paying industries. The &lt;code>hours_mean&lt;/code> coefficient (0.002, $p = 0.002$) suggests that workers who consistently work longer hours earn more per hour on average, while &lt;code>married_mean&lt;/code> is small and insignificant, indicating that selection into marriage is not strongly associated with unobserved wage determinants once other factors are controlled.&lt;/p>
&lt;p>The caveat is that CRE relies on the assumption that unobserved heterogeneity correlates with covariates &lt;em>only through their individual means&lt;/em> &amp;mdash; a stronger assumption than one-way FE, which makes no such restriction. However, this assumption is testable. The CRE correction terms provide a built-in Hausman-type test: if $\pi = 0$ jointly (all correction terms are zero), the random effects and fixed effects estimates coincide, and the simpler, more efficient random effects model suffices. In our case, the large and significant &lt;code>union_mean&lt;/code> and &lt;code>hours_mean&lt;/code> coefficients strongly reject $\pi = 0$, confirming that unobserved heterogeneity &lt;em>does&lt;/em> correlate with the covariates and that FE or CRE is needed over pooled OLS. Exercise 6 asks you to formalize this test.&lt;/p>
&lt;h3 id="118-what-fixed-effects-absorb-vs-what-survives">11.8 What fixed effects absorb vs. what survives&lt;/h3>
&lt;p>The wage panel illustrates a general principle: one-way fixed effects absorb everything about a person that does not change over the observation window. Variables that &lt;em>do&lt;/em> change over time &amp;mdash; like union status, marital status, and occupation &amp;mdash; survive the within transformation and can be estimated. The CRE/Mundlak approach (Section 11.7) partially resolves the tradeoff by recovering time-invariant coefficients. The diagram below summarizes this partition and recovery:&lt;/p>
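&lt;p>A tiny standalone check (toy data, hypothetical values) makes the partition concrete: demeaning by person zeroes out any time-invariant column, while time-varying columns retain their within variation.&lt;/p>

```python
# Toy panel: (person, educ, union) for two workers over three periods
panel = [
    (1, 12, 0), (1, 12, 1), (1, 12, 1),
    (2, 16, 0), (2, 16, 0), (2, 16, 1),
]
demeaned = {}
for col, name in [(1, 'educ'), (2, 'union')]:
    totals = {}
    for row in panel:
        totals.setdefault(row[0], []).append(row[col])
    means = {p: sum(v) / len(v) for p, v in totals.items()}
    demeaned[name] = [row[col] - means[row[0]] for row in panel]

print(demeaned['educ'])   # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0] -- absorbed by FE
print(demeaned['union'])  # nonzero within-person variation: survives
```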
&lt;pre>&lt;code class="language-mermaid">graph LR
subgraph &amp;quot;Absorbed by One-Way FE&amp;quot;
ED[&amp;quot;&amp;lt;b&amp;gt;Education&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(time-invariant)&amp;quot;]
AB[&amp;quot;&amp;lt;b&amp;gt;Ability&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(unobserved)&amp;quot;]
RC[&amp;quot;&amp;lt;b&amp;gt;Race&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(time-invariant)&amp;quot;]
end
subgraph &amp;quot;Estimated (time-varying)&amp;quot;
UN[&amp;quot;&amp;lt;b&amp;gt;Union&amp;lt;/b&amp;gt;&amp;quot;]
MA[&amp;quot;&amp;lt;b&amp;gt;Married&amp;lt;/b&amp;gt;&amp;quot;]
OC[&amp;quot;&amp;lt;b&amp;gt;Occupation&amp;lt;/b&amp;gt;&amp;quot;]
end
subgraph &amp;quot;Recovery strategies&amp;quot;
MK[&amp;quot;&amp;lt;b&amp;gt;CRE/Mundlak&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(individual means)&amp;quot;]
end
UN --&amp;gt; W[&amp;quot;&amp;lt;b&amp;gt;Log Wage&amp;lt;/b&amp;gt;&amp;quot;]
MA --&amp;gt; W
OC --&amp;gt; W
ED -.-&amp;gt; W
AB -.-&amp;gt; W
MK -.-&amp;gt;|&amp;quot;recovers γ&amp;quot;| ED
MK -.-&amp;gt;|&amp;quot;recovers γ&amp;quot;| RC
style ED fill:#d97757,stroke:#141413,color:#fff,stroke-dasharray: 5 5
style AB fill:#d97757,stroke:#141413,color:#fff,stroke-dasharray: 5 5
style RC fill:#d97757,stroke:#141413,color:#fff,stroke-dasharray: 5 5
style UN fill:#6a9bcc,stroke:#141413,color:#fff
style MA fill:#6a9bcc,stroke:#141413,color:#fff
style OC fill:#6a9bcc,stroke:#141413,color:#fff
style W fill:#00d4c8,stroke:#141413,color:#fff
style MK fill:#1a3a8a,stroke:#141413,color:#fff,stroke-dasharray: 5 5
&lt;/code>&lt;/pre>
&lt;p>The dashed arrows from the orange (absorbed) variables indicate that their effects on wages are &lt;em>real&lt;/em> but &lt;em>unestimable&lt;/em> under one-way FE &amp;mdash; they are folded into each worker&amp;rsquo;s individual intercept. The solid arrows from the blue (estimated) variables show the effects we can identify: changes in union status, marital status, and occupation that occur within a worker&amp;rsquo;s career. The dark blue CRE/Mundlak node represents the recovery strategy from Section 11.7: by substituting individual means for entity dummies, we recover the coefficients $\gamma$ for education and race while producing time-varying estimates that closely match one-way FE. This partially resolves the tradeoff from Section 11.4, though at the cost of a stronger modeling assumption.&lt;/p>
&lt;h2 id="12-event-study-difference-in-differences">12. Event study: difference-in-differences&lt;/h2>
&lt;h3 id="121-staggered-treatment-adoption">12.1 Staggered treatment adoption&lt;/h3>
&lt;p>Event studies are a popular extension of fixed effects that estimate dynamic treatment effects around the time of an intervention. In a &lt;em>staggered&lt;/em> design, different groups (states, firms, individuals) receive treatment at different times &amp;mdash; for example, states adopting a minimum wage increase in different years. The standard approach uses TWFE with relative-time indicators. However, this can produce biased estimates when treatment timing varies across groups and effects are heterogeneous. The DID2S estimator (Gardner, 2022) addresses this by separating the estimation into two stages: first estimating fixed effects from untreated observations, then recovering treatment effects from the residuals. The target estimand in this design is the &lt;em>Average Treatment Effect on the Treated&lt;/em> (ATT) &amp;mdash; the average effect for units that actually received treatment.&lt;/p>
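&lt;p>The two-stage logic is easiest to see in the canonical 2x2 case. Below is a standalone sketch with made-up group-period means (not the &lt;code>df_het&lt;/code> data): stage 1 fits group and period effects from the three untreated cells, and the stage-2 residual of the lone treated cell reproduces the textbook difference-in-differences.&lt;/p>

```python
# Stage 1 fixed effects come from untreated cells only; stage 2 averages
# the residuals of treated cells. In the 2x2 case this is exact.
y = {('C', 0): 1.0, ('C', 1): 1.5,   # control group, periods 0 and 1
     ('T', 0): 2.0, ('T', 1): 3.0}   # treated group, treated in period 1

# Stage 1: normalize a_C = 0 and b_0 = 0; the three untreated cells then
# pin down the intercept, the group effect, and the period effect
mu = y[('C', 0)]            # intercept
a_T = y[('T', 0)] - mu      # group effect of T
b_1 = y[('C', 1)] - mu      # period effect of 1

# Stage 2: residual of the only treated cell is the estimated ATT
att = y[('T', 1)] - (mu + a_T + b_1)
print(att)  # 0.5

# Identical to the textbook 2x2 difference-in-differences
did = (y[('T', 1)] - y[('T', 0)]) - (y[('C', 1)] - y[('C', 0)])
assert att == did
```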
&lt;p>PyFixest provides both approaches. We use a simulated dataset with staggered treatment adoption across states:&lt;/p>
&lt;pre>&lt;code class="language-python">df_het = pd.read_csv(
&amp;quot;https://raw.githubusercontent.com/py-econometrics/pyfixest/master/pyfixest/did/data/df_het.csv&amp;quot;
)
print(f&amp;quot;DiD dataset shape: {df_het.shape}&amp;quot;)
print(f&amp;quot;Columns: {list(df_het.columns)}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">DiD dataset shape: (46500, 14)
Columns: ['unit', 'state', 'group', 'unit_fe', 'g', 'year', 'year_fe', 'treat',
'rel_year', 'rel_year_binned', 'error', 'te', 'te_dynamic', 'dep_var']
&lt;/code>&lt;/pre>
&lt;p>The event study dataset contains 46,500 observations across units nested in states, with a binary treatment indicator and relative time variable measuring periods before and after treatment onset. The &lt;code>dep_var&lt;/code> column is the outcome we want to explain, and &lt;code>rel_year&lt;/code> measures the distance in years from each unit&amp;rsquo;s treatment date (negative values are pre-treatment). This structure is typical of policy evaluation studies where different states adopt a policy at different times.&lt;/p>
&lt;h3 id="122-year-1-as-the-universal-baseline">12.2 Year −1 as the universal baseline&lt;/h3>
&lt;p>Both estimators use &lt;code>ref=-1.0&lt;/code>, setting the last pre-treatment period as the baseline. This choice is not arbitrary &amp;mdash; it is the conventional and most informative reference point for three reasons:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Closest to treatment onset.&lt;/strong> Period −1 is the last observation before treatment begins. Using it as the baseline minimizes the extrapolation distance: we compare each period&amp;rsquo;s outcome to the most recent untreated state, rather than to some distant past.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Universal across cohorts.&lt;/strong> In staggered designs, different states adopt treatment in different calendar years. But &lt;code>rel_year = -1&lt;/code> has the same meaning for every cohort: &amp;ldquo;the last year before this group was treated.&amp;rdquo; It aligns all cohorts to a common relative-time clock, making the coefficients directly comparable.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Transparent parallel trends test.&lt;/strong> Pre-treatment coefficients (periods −20 through −2) measure deviations from the baseline. If these coefficients are near zero, the treated and control groups were on parallel trajectories &lt;em>before&lt;/em> treatment &amp;mdash; validating the key identifying assumption. Choosing −1 as the baseline makes this test as transparent as possible: any non-zero pre-treatment coefficient is a direct signal of differential pre-trends.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>How to read the event study plot.&lt;/strong> Each coefficient represents the difference in outcomes between treatment and control groups, relative to their difference at period −1. Pre-treatment coefficients near zero validate parallel trends. The coefficient at period 0 is the immediate treatment effect. Post-treatment coefficients show how the effect evolves over time. If we had chosen a different baseline (say, period −5), all coefficients would shift by a constant &amp;mdash; the &lt;em>shape&lt;/em> of the event study would be identical, but the levels would change. The convention of using −1 simply makes the plot easiest to interpret.&lt;/p>
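&lt;p>The baseline-shift invariance is easy to verify with hypothetical period means (standalone sketch, values made up): changing the reference period shifts every coefficient by a single constant, leaving the gaps between periods, i.e. the shape, unchanged.&lt;/p>

```python
# Hypothetical group-mean outcomes by relative year (values made up)
means = {-3: 0.10, -2: 0.00, -1: 0.05, 0: 1.30, 1: 1.80, 2: 2.20}

def event_coefs(ref):
    # Each coefficient is a period mean minus the baseline-period mean
    return {t: m - means[ref] for t, m in means.items() if t != ref}

c_ref1 = event_coefs(-1)   # conventional baseline: t = -1
c_ref3 = event_coefs(-3)   # alternative baseline: t = -3

# The gap between any two periods is identical under either baseline ...
for t in (0, 1, 2):
    gap1 = c_ref1[t] - c_ref1[-2]
    gap3 = c_ref3[t] - c_ref3[-2]
    assert round(gap1 - gap3, 12) == 0
# ... and the levels differ by one constant, means[-1] - means[-3]
assert round(c_ref3[0] - c_ref1[0], 12) == round(means[-1] - means[-3], 12)
```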
&lt;h3 id="123-twfe-vs-did2s">12.3 TWFE vs DID2S&lt;/h3>
&lt;p>We estimate event study coefficients using both TWFE and DID2S, with period -1 (the year before treatment) as the reference category. The &lt;code>i()&lt;/code> operator in PyFixest creates indicator variables for each relative year, analogous to R&amp;rsquo;s &lt;code>i()&lt;/code> function.&lt;/p>
&lt;pre>&lt;code class="language-python"># TWFE event study
fit_twfe = pf.feols(
&amp;quot;dep_var ~ i(rel_year, ref=-1.0) | state + year&amp;quot;,
data=df_het, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;state&amp;quot;},
)
# DID2S (Gardner 2022) -- two-stage estimator
fit_did2s = pf.did2s(
df_het, yname=&amp;quot;dep_var&amp;quot;,
first_stage=&amp;quot;~ 0 | state + year&amp;quot;,
second_stage=&amp;quot;~ i(rel_year, ref=-1.0)&amp;quot;,
treatment=&amp;quot;treat&amp;quot;, cluster=&amp;quot;state&amp;quot;,
)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-python"># Extract coefficients from both estimators for plotting
import re
def parse_rel_years(coef_dict, se_dict):
    years, vals, ses_list = [], [], []
    for k in coef_dict.index:
        match = re.search(r'\[T\.(-?\d+\.?\d*)\]', str(k))
        if match:
            years.append(float(match.group(1)))
            vals.append(coef_dict[k])
            ses_list.append(se_dict[k])
    return years, vals, ses_list
twfe_years, twfe_vals, twfe_ses = parse_rel_years(fit_twfe.coef(), fit_twfe.se())
did2s_years, did2s_vals, did2s_ses = parse_rel_years(fit_did2s.coef(), fit_did2s.se())
&lt;/code>&lt;/pre>
&lt;p>PyFixest stores event study coefficients with names like &lt;code>[T.-5.0]&lt;/code>, &lt;code>[T.0.0]&lt;/code>, etc. The helper function above extracts the relative year from each coefficient name and pairs it with the estimate and standard error, giving us arrays ready for plotting.&lt;/p>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(12, 6))
offset = 0.15
ax.errorbar([y - offset for y in twfe_years], twfe_vals,
yerr=[1.96*s for s in twfe_ses],
fmt='o', color=STEEL_BLUE, capsize=3, label='TWFE')
ax.errorbar([y + offset for y in did2s_years], did2s_vals,
yerr=[1.96*s for s in did2s_ses],
fmt='s', color=WARM_ORANGE, capsize=3, label='DID2S (Gardner 2022)')
ax.axhline(y=0, color=LIGHT_TEXT, linewidth=0.8, linestyle=&amp;quot;--&amp;quot;, alpha=0.5)
ax.axvline(x=-0.5, color=LIGHT_TEXT, linewidth=1, linestyle=&amp;quot;--&amp;quot;, alpha=0.6)
ax.plot(-1, 0, 'D', color=TEAL, markersize=10, zorder=5,
label=&amp;quot;Baseline (t = −1)&amp;quot;)
ax.set_xlabel(&amp;quot;Relative Year&amp;quot;, fontsize=13)
ax.set_ylabel(&amp;quot;Coefficient Estimate&amp;quot;, fontsize=13)
ax.set_title(&amp;quot;Event Study: TWFE vs DID2S&amp;quot;, fontsize=14, fontweight=&amp;quot;bold&amp;quot;)
ax.legend(fontsize=11)
plt.savefig(&amp;quot;pyfixest_event_study.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_event_study.png" alt="Event study plot comparing TWFE and DID2S coefficient estimates across relative years, showing flat pre-trends and rising post-treatment effects.">&lt;/p>
&lt;p>Both estimators show near-zero pre-treatment coefficients (validating the parallel trends assumption) and a sharp jump at treatment onset. The immediate treatment effect at period 0 is approximately 1.3&amp;ndash;1.4, growing steadily to about 2.8 by period 20. The TWFE estimates (blue circles) are slightly larger than DID2S (orange squares) in post-treatment periods &amp;mdash; this upward bias is the well-documented problem with TWFE under staggered adoption and heterogeneous effects. The DID2S estimator corrects this by using only untreated observations to estimate the counterfactual, producing cleaner estimates of the dynamic treatment path.&lt;/p>
&lt;h2 id="13-hypothesis-testing-wald-test">13. Hypothesis testing: Wald test&lt;/h2>
&lt;p>PyFixest supports joint hypothesis testing via &lt;a href="https://pyfixest.org/reference/estimation.feols_.Feols.wald_test.html" target="_blank" rel="noopener">Wald tests&lt;/a>, which assess whether multiple coefficients are simultaneously equal to zero. This is useful when you want to test whether a group of related variables jointly matters, not just one at a time.&lt;/p>
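&lt;p>Mechanically, the Wald statistic for $H_0: R\beta = 0$ is $W = (R\hat{\beta})'(R\hat{V}R')^{-1}(R\hat{\beta})$. The standalone sketch below uses made-up numbers (not the fit below) for a two-coefficient system with $R = I_2$, and shows why the correlation between estimates matters.&lt;/p>

```python
# Hypothetical coefficient vector and covariance matrix (made-up numbers);
# with R = I_2 the restriction is simply beta = 0
b = [0.8, -0.5]
V = [[0.010, 0.004],
     [0.004, 0.020]]

# Invert the 2x2 covariance by hand
det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
Vinv = [[V[1][1] / det, -V[0][1] / det],
        [-V[1][0] / det, V[0][0] / det]]

# Wald statistic W = b' V^{-1} b, chi-squared with 2 df under the null
W = sum(b[i] * Vinv[i][j] * b[j] for i in range(2) for j in range(2))
print(round(W, 2))       # 100.54

# With zero covariance, W collapses to the sum of squared t-statistics;
# the off-diagonal term is what a pair of individual t-tests ignores
W_diag = b[0] ** 2 / V[0][0] + b[1] ** 2 / V[1][1]
print(round(W_diag, 1))  # 76.5
```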
&lt;pre>&lt;code class="language-python">fit_wald = pf.feols(&amp;quot;Y ~ X1 + X2 | f1&amp;quot;, data=data, vcov=&amp;quot;HC1&amp;quot;)
R = np.eye(2) # Test both X1=0 and X2=0 jointly
wald_result = fit_wald.wald_test(R=R)
print(f&amp;quot;Wald test (joint null: X1=0, X2=0):&amp;quot;)
print(wald_result)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Wald test (joint null: X1=0, X2=0):
statistic 1.554006e+02
pvalue 1.110223e-16
&lt;/code>&lt;/pre>
&lt;p>The Wald test statistic is 155.4 with a p-value that is zero to machine precision (the reported $1.1 \times 10^{-16}$ is the limit of floating-point accuracy), overwhelmingly rejecting the null hypothesis that both &lt;code>X1&lt;/code> and &lt;code>X2&lt;/code> have zero effect on &lt;code>Y&lt;/code>. This joint test is more informative than individual t-tests because it accounts for the correlation between the two coefficient estimates. In practice, Wald tests are essential for testing hypotheses about groups of variables, such as whether all interaction terms or all year dummies are jointly significant.&lt;/p>
&lt;h2 id="14-wild-cluster-bootstrap">14. Wild cluster bootstrap&lt;/h2>
&lt;p>When the number of clusters is small (roughly below 50), cluster-robust standard errors can be unreliable. The &lt;em>wild cluster bootstrap&lt;/em> provides more accurate inference in this setting by simulating the distribution of the test statistic under the null hypothesis. PyFixest integrates with the &lt;code>wildboottest&lt;/code> package to make this straightforward:&lt;/p>
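&lt;p>Before the pyfixest call, here is a standalone sketch of the core idea on simulated data. It is a deliberately simplified version (unstudentized, Rademacher weights, regression through the origin for brevity), not what the &lt;code>wildboottest&lt;/code> package implements in full: impose the null, flip the restricted residuals by cluster-level &amp;plusmn;1 weights, re-estimate, and compare the observed statistic against the simulated distribution.&lt;/p>

```python
import random

random.seed(1)

# Simulated clustered data: 20 clusters of 10, true slope 1 on x,
# plus a cluster-level shock (hypothetical setup, not the data above)
G, n_per = 20, 10
xs, ys, gs = [], [], []
for g in range(G):
    shock = random.gauss(0, 0.5)
    for _ in range(n_per):
        xi = random.gauss(0, 1)
        xs.append(xi)
        gs.append(g)
        ys.append(1.0 * xi + shock + random.gauss(0, 0.5))

def slope(y):
    # OLS slope through the origin (no intercept, for brevity)
    return sum(a * b for a, b in zip(xs, y)) / sum(a * a for a in xs)

obs = slope(ys)

# Wild cluster bootstrap imposing the null slope = 0: the restricted
# residuals are y itself; each draw flips them by a cluster-level sign
reps, extreme = 999, 0
for _ in range(reps):
    w = {g: random.choice([-1.0, 1.0]) for g in range(G)}
    y_star = [yi * w[g] for yi, g in zip(ys, gs)]
    if abs(slope(y_star)) >= abs(obs):
        extreme += 1
p_boot = extreme / reps
print(f'slope = {obs:.3f}, bootstrap p = {p_boot:.3f}')
```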
&lt;pre>&lt;code class="language-python">fit_boot = pf.feols(&amp;quot;Y ~ X1 | group_id&amp;quot;, data=data, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;group_id&amp;quot;})
boot_result = fit_boot.wildboottest(param=&amp;quot;X1&amp;quot;, reps=999, seed=42)
print(boot_result)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">param X1
t value -8.616818459577098
Pr(&amp;gt;|t|) 0.0
bootstrap_type 11
inference CRV(group_id)
impose_null True
&lt;/code>&lt;/pre>
&lt;p>The wild bootstrap t-statistic of -8.62 and p-value of 0.0 confirm that the effect of &lt;code>X1&lt;/code> remains highly significant even under the more conservative bootstrap inference. The &lt;code>impose_null=True&lt;/code> setting means the bootstrap simulates data under the null hypothesis of no effect, which generally provides better size control in finite samples. With only ~20 groups in this dataset, the bootstrap p-value is more trustworthy than the asymptotic cluster-robust p-value.&lt;/p>
&lt;h2 id="15-discussion">15. Discussion&lt;/h2>
&lt;p>This tutorial posed a simple question: how do unobserved group-level characteristics bias regression estimates, and how can we account for them? The answer, demonstrated across multiple settings, is that fixed effects regression removes this bias by focusing on within-group variation only.&lt;/p>
&lt;p>The synthetic data showed that OLS estimates shift from -1.000 to -1.019 when absorbing group fixed effects &amp;mdash; a modest change in this controlled setting, but one that demonstrates the mechanism. The real-world wage panel told a more dramatic story: the union wage premium dropped from 18.3% (pooled OLS) to 7.3% (two-way FE), revealing that more than half of the apparent union premium reflects worker selection rather than a genuine union effect. This has direct implications for labor economists and policymakers: overestimating the union premium leads to overestimating the economic impact of declining unionization.&lt;/p>
&lt;p>Framing the wage panel through the Mincer equation (Section 11.3) provided a unifying thread for the entire analysis. The classic Mincer specification &amp;mdash; log wages as a function of education, experience, and experience squared &amp;mdash; is the starting point for virtually all empirical wage research. By extending it with additional controls and then progressively adding fixed effects, we traced a clear arc from pooled cross-sectional estimation to panel methods that account for unobserved heterogeneity. The within-versus-between decomposition (Section 11.2) made this arc concrete: education has zero within-worker variation, so one-way FE cannot estimate its effect, while variables like union status and marital status have substantial within-worker variation and can be identified.&lt;/p>
&lt;p>The wage panel also highlighted a fundamental tradeoff in fixed effects estimation: the very mechanism that removes ability bias &amp;mdash; absorbing all time-invariant individual characteristics &amp;mdash; also prevents estimation of time-invariant variables like education. This is not a limitation to be worked around but a defining feature of the method. The CRE/Mundlak approach (Section 11.7) offers a principled resolution: by including individual means of time-varying variables as additional regressors, it proxies for the unobserved heterogeneity that one-way FE would absorb, recovering education&amp;rsquo;s coefficient (0.094 per year of schooling) while producing time-varying estimates that closely match one-way FE. The key assumption &amp;mdash; that unobserved heterogeneity correlates with covariates only through their individual means &amp;mdash; is stronger than FE&amp;rsquo;s assumption of no time-varying confounding, but it is the price of recovering time-invariant effects.&lt;/p>
&lt;p>The three-way FE extension (adding occupation fixed effects) showed that occupation sorting explains negligible additional wage variation beyond individual and time effects, confirming that the dominant source of wage heterogeneity is persistent individual characteristics. The group-specific time trends analysis (Section 11.6) showed that allowing Black and non-Black workers to have different year effects produces estimates nearly identical to standard TWFE, supporting the common trends assumption in this particular panel. This is a useful diagnostic in practice: if group-specific trends substantially change the coefficients, the researcher should worry about whether the standard TWFE results are confounded by differential macro trends.&lt;/p>
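&lt;p>The group-specific trends diagnostic boils down to interacting year dummies with a group indicator and checking whether the coefficient of interest moves. A numpy sketch on simulated data where common trends hold by construction, so the two estimates should agree, as they did in the wage panel:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(2)
n, t = 200, 8
group = np.arange(n) % 2                  # binary indicator, e.g. `black`
year_fx = np.linspace(0, 0.4, t)          # a common macro trend
x = rng.normal(0, 1, (n, t))
y = 0.3 * x + year_fx[None, :] + rng.normal(0, 0.2, (n, t))

def slope(y, x, dummies):
    """OLS slope on x, controlling for an intercept and the given dummies."""
    Z = np.column_stack([np.ones(y.size), x.ravel(), dummies])
    return np.linalg.lstsq(Z, y.ravel(), rcond=None)[0][1]

years = np.tile(np.eye(t)[:, 1:], (n, 1))  # year dummies (base year dropped)
b_common = slope(y, x, years)
# interaction: each group gets its own set of year effects
g = np.repeat(group, t)[:, None]
b_specific = slope(y, x, np.column_stack([years, years * g]))
# b_common and b_specific agree when common trends hold
```

&lt;p>In real data, a large gap between the two estimates is the warning sign described above: the groups are on different macro trajectories, and standard TWFE may be confounded.&lt;/p>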
&lt;p>PyFixest makes the entire workflow &amp;mdash; from simple OLS through two-way FE, IV, CRE/Mundlak, and event studies &amp;mdash; accessible with a concise formula syntax. The ability to estimate multiple specifications in one call (&lt;code>csw0&lt;/code>) and compare inference methods (iid, HC1, CRV1, CRV3, wild bootstrap) means researchers can quickly build a comprehensive picture of how sensitive their results are to modeling choices.&lt;/p>
&lt;h2 id="16-summary-and-next-steps">16. Summary and next steps&lt;/h2>
&lt;p>&lt;strong>Key takeaways:&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Fixed effects remove group-level confounding.&lt;/strong> In the wage panel, individual FE reduced the apparent union premium from 18.3% to 7.8%, revealing that over half the raw premium reflects selection on unobserved ability. Without FE, policy conclusions about unionization would be substantially biased.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The within-between decomposition diagnoses what FE can estimate.&lt;/strong> Decomposing each variable&amp;rsquo;s variation into between-worker and within-worker components reveals which coefficients survive one-way FE. Education has zero within variation and is absorbed; union status and marital status have substantial within shares (64% and 65%) and can be estimated. This diagnostic should precede any panel analysis.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The Mincer equation provides a unifying framework for wage regressions.&lt;/strong> Framing the analysis through the classic Mincer specification &amp;mdash; and its extensions to panel data &amp;mdash; makes the progression from pooled OLS to one-way FE to CRE/Mundlak a coherent arc rather than a collection of ad hoc specifications.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Standard errors matter as much as point estimates.&lt;/strong> Clustering standard errors inflated the SE on &lt;code>X1&lt;/code> by 50% compared to iid errors (0.1247 vs 0.0833). With weaker effects, this difference could flip a result from significant to insignificant &amp;mdash; always cluster at the appropriate level.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Multiple specifications are a robustness check, not a fishing exercise.&lt;/strong> The coefficient on &lt;code>X1&lt;/code> remained stable around -1.0 across no FE, one-way FE, and two-way FE. In the wage panel, the union premium stabilized at 7.3&amp;ndash;7.8% across one-way FE, two-way FE, three-way FE, and group-specific time trends &amp;mdash; strong evidence that these estimates are robust.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Group-specific time trends test the common trends assumption.&lt;/strong> Allowing Black and non-Black workers to have different year effects produced estimates nearly identical to standard TWFE, supporting the assumption that both groups faced similar macroeconomic trends during 1980&amp;ndash;1987. When this test fails, standard TWFE results may be unreliable.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>One-way FE cannot estimate time-invariant effects, but CRE can recover them.&lt;/strong> Education was silently dropped from the one-way FE model because the within transformation reduces any constant variable to zero. The CRE model partially resolves this tradeoff by substituting individual means of time-varying variables for entity dummies, recovering education&amp;rsquo;s coefficient (0.094 per year) while producing time-varying estimates that match one-way FE. The cost is a stronger modeling assumption &amp;mdash; that unobserved heterogeneity correlates with covariates only through their individual means.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>TWFE event studies can be biased with staggered adoption.&lt;/strong> The DID2S estimator produced cleaner estimates by separating counterfactual estimation from treatment effect recovery. When treatment timing varies, always compare TWFE with a robust alternative like DID2S.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The event study baseline is not arbitrary.&lt;/strong> Setting &lt;code>ref=-1&lt;/code> (the last pre-treatment period) is the convention because it provides the most transparent test of parallel trends and minimizes extrapolation from the baseline to treatment onset. All cohorts in a staggered design share this reference point, making it the natural common clock.&lt;/p>
&lt;/li>
&lt;/ol>
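&lt;p>Takeaway 4 deserves a concrete sketch. The CRV1 (Liang&amp;ndash;Zeger) sandwich estimator behind clustered standard errors fits in a few lines of numpy; the simulation below builds cluster-correlated errors and a cluster-constant regressor so the clustered SE is visibly larger than the iid one. This is a generic illustration, not the post&amp;rsquo;s data:&lt;/p>

```python
import numpy as np

def crv1_se(X, resid, cluster):
    """Cluster-robust (CRV1) standard errors for OLS.

    meat = sum over clusters g of (X_g' u_g)(X_g' u_g)'; the small-sample
    factor G/(G-1) * (N-1)/(N-k) matches common software defaults.
    """
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((k, k))
    labels = np.unique(cluster)
    for g in labels:
        s = X[cluster == g].T @ resid[cluster == g]
        meat += np.outer(s, s)
    G = len(labels)
    adj = G / (G - 1) * (n - 1) / (n - k)
    return np.sqrt(np.diag(adj * bread @ meat @ bread))

rng = np.random.default_rng(3)
G, m = 40, 25
cluster = np.repeat(np.arange(G), m)
x = np.repeat(rng.normal(0, 1, G), m)      # regressor constant within cluster
X = np.column_stack([np.ones(G * m), x])
y = 1.0 * x + np.repeat(rng.normal(0, 1, G), m) + rng.normal(0, 0.3, G * m)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
se_cluster = crv1_se(X, resid, cluster)
sigma2 = (resid @ resid) / (G * m - 2)
se_iid = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
# se_cluster[1] is several times se_iid[1]: iid inference is far too optimistic
```

&lt;p>The more the regressor and the errors co-move within clusters, the larger the gap &amp;mdash; which is why clustering at the right level can flip a marginal result.&lt;/p>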
&lt;p>&lt;strong>Limitations:&lt;/strong> Fixed effects only remove time-invariant confounders. If a relevant confounder changes over time within groups, FE cannot address it. Additionally, FE estimation discards all between-group variation, which reduces statistical power and makes it impossible to estimate the effects of time-invariant variables &amp;mdash; as we saw directly in Section 11.2, where education&amp;rsquo;s within share was exactly zero. CRE offers a partial resolution, but its assumption that unobserved heterogeneity correlates with covariates only through individual means may not hold in all settings &amp;mdash; if ability correlates with the &lt;em>trajectory&lt;/em> of union membership rather than its mean, the CRE estimates would still be biased. The group-specific time trends test (Section 11.6) is a useful diagnostic but is not definitive: passing it does not prove that common trends hold, only that the data are consistent with the assumption along the dimension tested. Finally, the datasets here are synthetic or well-studied &amp;mdash; in messy real-world data, the parallel trends assumption underlying event studies may not hold.&lt;/p>
&lt;p>&lt;strong>Next steps:&lt;/strong> The CRE/Mundlak approach demonstrated in Section 11.7 can be extended in several directions: Wooldridge (2010, Ch. 10) develops the correlated random effects framework more formally, including CRE probit and tobit models for limited dependent variables. Hausman-Taylor estimation offers an alternative strategy for recovering time-invariant coefficients under different identifying assumptions. Beyond the wage panel, explore PyFixest&amp;rsquo;s support for Poisson regression (&lt;code>pf.fepois&lt;/code>) for count data, quantile regression (&lt;code>pf.quantreg&lt;/code>) for distributional effects, and the &lt;code>pf.event_study()&lt;/code> common API for streamlined event study estimation with multiple estimators. For more advanced inference, investigate randomization inference via &lt;code>fit.ritest()&lt;/code> and multiple testing corrections with &lt;code>pf.bonferroni()&lt;/code> and &lt;code>pf.rwolf()&lt;/code>.&lt;/p>
&lt;h2 id="17-exercises">17. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Varying the clustering level.&lt;/strong> Re-estimate the one-way FE model (&lt;code>Y ~ X1 | group_id&lt;/code>) with different clustering variables: &lt;code>f1&lt;/code>, &lt;code>f2&lt;/code>, and &lt;code>f3&lt;/code>. How do the standard errors change? Which clustering level produces the most conservative inference, and why?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Weak instruments.&lt;/strong> Modify the IV specification to use only &lt;code>Z1&lt;/code> as an instrument (instead of both &lt;code>Z1&lt;/code> and &lt;code>Z2&lt;/code>). How does the first-stage F-statistic change? How does the IV coefficient and its standard error respond to the weaker first stage?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>CRE with additional means.&lt;/strong> In Section 11.7, we included individual means only for the time-varying regressors. What happens if you also include year fixed effects alongside the CRE correction terms (i.e., add &lt;code>| year&lt;/code> to the CRE specification)? Do the time-varying coefficients shift closer to the TWFE estimates? Does the education coefficient change?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Group-specific trends by other dimensions.&lt;/strong> Section 11.6 allowed year effects to vary by race (&lt;code>black&lt;/code>). Repeat this analysis using &lt;code>hisp&lt;/code> instead, or using a union-status interaction (&lt;code>C(year):C(union)&lt;/code>). Do the results differ from the standard TWFE specification? What does this tell you about the common trends assumption along different group dimensions?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Within-between decomposition on new data.&lt;/strong> Download a panel dataset of your choice (e.g., Penn World Table, World Development Indicators) and compute the within-versus-between decomposition for all variables. Which variables have the highest within share? What does this predict about which coefficients will survive one-way FE? Verify by estimating both pooled OLS and one-way FE models.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Hausman test via CRE.&lt;/strong> The CRE model provides a simple Hausman-type test: if the coefficients on the individual means ($\bar{X}_i$) are jointly zero, then pooled OLS and one-way FE yield the same estimates, and random effects is efficient. Test whether the four CRE correction terms (union_mean, married_mean, hours_mean, expersq_mean) are jointly significant using a Wald test. What does the result imply about the choice between random effects and fixed effects for this panel?&lt;/p>
&lt;/li>
&lt;/ol>
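&lt;p>As a starting point for exercise 6, a generic Wald test helper; the numbers below are made up for illustration and are &lt;em>not&lt;/em> the wage-panel estimates:&lt;/p>

```python
import numpy as np

def wald_joint_test(beta, vcov, idx):
    """Wald statistic for H0: the coefficients in idx are jointly zero.

    W = b' V^{-1} b, with b the selected coefficients and V their
    variance block; under H0, W is chi-squared with len(idx) df.
    """
    b = np.asarray(beta)[idx]
    V = np.asarray(vcov)[np.ix_(idx, idx)]
    return float(b @ np.linalg.solve(V, b))

# hypothetical coefficients and (diagonal) vcov for the four mean terms
beta = np.array([0.05, 0.02, 0.10, 0.03])
vcov = np.diag(np.array([0.01, 0.02, 0.01, 0.02]) ** 2)
W = wald_joint_test(beta, vcov, [0, 1, 2, 3])
# compare W to the chi-squared(4) 5% critical value, 9.488
```

&lt;p>With real estimates, take beta and vcov from the fitted CRE model; rejecting the null is evidence against random effects and in favor of FE (or CRE).&lt;/p>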
&lt;h2 id="18-references">18. References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="http://scorreia.com/research/hdfe.pdf" target="_blank" rel="noopener">Correia, S. (2016). A Feasible Estimator for Linear Models with Multi-Way Fixed Effects. Working Paper.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1016/j.jeconom.2021.10.004" target="_blank" rel="noopener">Gardner, J. (2022). Two-Stage Differences in Differences. Journal of Econometrics.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/py-econometrics/pyfixest" target="_blank" rel="noopener">Fischer, A. and Schar, S. (2024). PyFixest: Fast High-Dimensional Fixed Effects Estimation in Python.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://pyfixest.org/quickstart.html" target="_blank" rel="noopener">PyFixest Documentation &amp;ndash; Quickstart Guide.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1002/%28SICI%291099-1255%28199803/04%2913:2%3c163::AID-JAE460%3e3.0.CO;2-Y" target="_blank" rel="noopener">Vella, F. and Verbeek, M. (1998). Whose Wages Do Unions Raise? A Dynamic Model of Unionism and Wage Rate Determination for Young Men. Journal of Applied Econometrics.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.3368/jhr.50.2.317" target="_blank" rel="noopener">Cameron, A.C. and Miller, D.L. (2015). A Practitioner&amp;rsquo;s Guide to Cluster-Robust Inference. Journal of Human Resources.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.nber.org/books-and-chapters/schooling-experience-and-earnings" target="_blank" rel="noopener">Mincer, J. (1974). &lt;em>Schooling, Experience, and Earnings.&lt;/em> Columbia University Press.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.2307/1913646" target="_blank" rel="noopener">Mundlak, Y. (1978). On the Pooling of Time Series and Cross Section Data. &lt;em>Econometrica&lt;/em>, 46(1), 69&amp;ndash;85.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://mitpress.mit.edu/9780262232586/" target="_blank" rel="noopener">Wooldridge, J.M. (2010). &lt;em>Econometric Analysis of Cross Section and Panel Data.&lt;/em> 2nd ed. MIT Press.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1080/00401706.2013.806694" target="_blank" rel="noopener">Olea, J.L.M. and Pflueger, C. (2013). A Robust Test for Weak Instruments. Journal of Business &amp;amp; Economic Statistics.&lt;/a>&lt;/li>
&lt;/ol>
&lt;h4 id="acknowledgements">Acknowledgements&lt;/h4>
&lt;p>AI tools (Claude Code, Gemini, NotebookLM) were used to make the contents of this post more accessible to students. Nevertheless, the post may still contain errors, so apply its contents to real research projects with caution.&lt;/p></description></item></channel></rss>