<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Panel Data | Carlos Mendez</title><link>https://carlos-mendez.org/category/panel-data/</link><atom:link href="https://carlos-mendez.org/category/panel-data/index.xml" rel="self" type="application/rss+xml"/><description>Panel Data</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>Carlos Mendez</copyright><lastBuildDate>Sat, 04 Apr 2026 00:00:00 +0000</lastBuildDate><image><url>https://carlos-mendez.org/media/icon_huedfae549300b4ca5d201a9bd09a3ecd5_79625_512x512_fill_lanczos_center_3.png</url><title>Panel Data</title><link>https://carlos-mendez.org/category/panel-data/</link></image><item><title>Identifying Latent Group Structures in Panel Data: The classifylasso Command in Stata</title><link>https://carlos-mendez.org/post/stata_panel_lasso_cluster/</link><pubDate>Sat, 04 Apr 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/stata_panel_lasso_cluster/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Do all countries respond the same way to inflation? To interest rates? To democratic transitions? Most panel data models assume yes. They force every country to share the same slope coefficients. That is a strong assumption &amp;mdash; and often a wrong one.&lt;/p>
&lt;p>Here is a preview of what we will discover. When we estimate the effect of inflation on savings across 56 countries, the pooled model says: &amp;ldquo;no significant effect.&amp;rdquo; But that average is a lie. One group of countries saves &lt;em>less&lt;/em> when inflation rises. Another group saves &lt;em>more&lt;/em>. The pooled estimate averages a negative and a positive effect, producing a misleading zero.&lt;/p>
&lt;p>The &lt;strong>Classifier-LASSO&lt;/strong> (C-LASSO) method solves this problem. Developed by Su, Shi, and Phillips (2016), it discovers &lt;strong>latent groups&lt;/strong> in your panel data. Countries within each group share the same coefficients. Countries across groups can differ. Think of it like a sorting hat: rather than treating all countries as identical or all as unique, C-LASSO sorts them into a small number of groups with shared behavioral patterns.&lt;/p>
&lt;p>This tutorial demonstrates the &lt;code>classifylasso&lt;/code> Stata command (Huang, Wang, and Zhou 2024) with two applications:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Savings behavior&lt;/strong> across 56 countries (1995&amp;ndash;2010) &amp;mdash; where inflation affects savings in &lt;em>opposite directions&lt;/em> depending on the country group&lt;/li>
&lt;li>&lt;strong>Democracy and economic growth&lt;/strong> across 98 countries (1970&amp;ndash;2010) &amp;mdash; where the pooled estimate of +1.05 masks a split of +2.15 in one group and -0.94 in another&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand why assuming homogeneous slopes can be misleading in panel data&lt;/li>
&lt;li>Learn the Classifier-LASSO method for identifying latent group structures&lt;/li>
&lt;li>Implement &lt;code>classifylasso&lt;/code> in Stata with both static and dynamic specifications&lt;/li>
&lt;li>Use postestimation commands (&lt;code>classogroup&lt;/code>, &lt;code>classocoef&lt;/code>, &lt;code>predict gid&lt;/code>) to visualize and interpret results&lt;/li>
&lt;li>Compare pooled fixed-effects estimates with group-specific C-LASSO estimates&lt;/li>
&lt;/ul>
&lt;p>The diagram below maps the tutorial&amp;rsquo;s progression. We start simple and build complexity step by step.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
A[&amp;quot;&amp;lt;b&amp;gt;EDA&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Savings data&amp;quot;] --&amp;gt; B[&amp;quot;&amp;lt;b&amp;gt;Baseline FE&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Pooled &amp;amp;&amp;lt;br/&amp;gt;fixed effects&amp;quot;]
B --&amp;gt; C[&amp;quot;&amp;lt;b&amp;gt;C-LASSO&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Static model&amp;lt;br/&amp;gt;(no lagged DV)&amp;quot;]
C --&amp;gt; D[&amp;quot;&amp;lt;b&amp;gt;C-LASSO&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Dynamic model&amp;lt;br/&amp;gt;(jackknife)&amp;quot;]
D --&amp;gt; E[&amp;quot;&amp;lt;b&amp;gt;Democracy&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Application&amp;lt;br/&amp;gt;(two-way FE)&amp;quot;]
E --&amp;gt; F[&amp;quot;&amp;lt;b&amp;gt;Comparison&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Pooled vs&amp;lt;br/&amp;gt;group-specific&amp;quot;]
style A fill:#141413,stroke:#141413,color:#fff
style B fill:#6a9bcc,stroke:#141413,color:#fff
style C fill:#d97757,stroke:#141413,color:#fff
style D fill:#d97757,stroke:#141413,color:#fff
style E fill:#00d4c8,stroke:#141413,color:#141413
style F fill:#1a3a8a,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;hr>
&lt;h2 id="2-the-problem-homogeneous-vs-heterogeneous-slopes">2. The Problem: Homogeneous vs Heterogeneous Slopes&lt;/h2>
&lt;h3 id="21-three-approaches-to-slope-heterogeneity">2.1 Three approaches to slope heterogeneity&lt;/h3>
&lt;p>Imagine 56 students taking the same exam. &lt;strong>Approach 1&lt;/strong> assumes they all studied the same way &amp;mdash; one average study strategy explains everyone&amp;rsquo;s score. &lt;strong>Approach 2&lt;/strong> gives each student a unique strategy &amp;mdash; but with only a few data points per student, the estimates are noisy. &lt;strong>Approach 3&lt;/strong> (C-LASSO) discovers that students naturally fall into 2&amp;ndash;3 study groups. Students within a group share the same strategy. Students across groups differ.&lt;/p>
&lt;p>The same logic applies to panel data. The standard fixed-effects model is:&lt;/p>
&lt;p>$$y_{it} = \mu_i + \boldsymbol{\beta}' \mathbf{x}_{it} + u_{it}$$&lt;/p>
&lt;p>Here, $y_{it}$ is the outcome for country $i$ at time $t$. The term $\mu_i$ captures country-specific intercepts (fixed effects). The slope vector $\boldsymbol{\beta}$ links the regressors $\mathbf{x}_{it}$ to the outcome. The critical assumption: $\boldsymbol{\beta}$ is the &lt;strong>same for all countries&lt;/strong>. Japan and Nigeria get the same coefficient on inflation. That may be wrong.&lt;/p>
&lt;p>At the other extreme, we could run separate regressions for each country. But with only $T = 15$ time periods per country, individual estimates are noisy. We lose statistical power.&lt;/p>
&lt;p>C-LASSO introduces a middle ground. It assumes countries belong to $K$ latent groups:&lt;/p>
&lt;p>$$\boldsymbol{\beta}_i = \boldsymbol{\alpha}_k \quad \text{if} \quad i \in G_k, \quad k = 1, \ldots, K$$&lt;/p>
&lt;p>In words, country $i$ gets the slope coefficients of its group $G_k$. The method estimates three things simultaneously: the number of groups $K$, which countries belong to which group, and each group&amp;rsquo;s coefficients $\boldsymbol{\alpha}_k$. You do not need to specify the groups in advance. The data reveals them.&lt;/p>
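&lt;p>To make the latent group structure concrete, here is a minimal simulation sketch &amp;mdash; in Python rather than Stata, purely for intuition. All numbers (two groups, slopes of $-0.2$ and $0.5$, the noise levels) are made up; nothing here comes from the tutorial&amp;rsquo;s data. Each unit draws its slope from one of two group-level values, and unit-by-unit within regressions recover estimates that cluster around those values.&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(42)
N, T, K = 56, 15, 2                    # units, periods, latent groups
alpha = np.array([-0.2, 0.5])          # group-specific slopes (made up)
group = rng.integers(0, K, size=N)     # true but unobserved memberships

mu = rng.normal(0, 1, size=N)          # unit fixed effects
x = rng.normal(0, 1, size=(N, T))
u = rng.normal(0, 0.3, size=(N, T))
y = mu[:, None] + alpha[group][:, None] * x + u

# Within-demeaned unit-by-unit OLS slopes cluster around alpha_k
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_i = (xd * yd).sum(axis=1) / (xd ** 2).sum(axis=1)
for k in range(K):
    print(k, round(beta_i[group == k].mean(), 2))
```

&lt;p>With only $T = 15$ periods each individual slope is noisy, but the group averages sit close to the true $\boldsymbol{\alpha}_k$. That is exactly the structure C-LASSO is built to exploit.&lt;/p>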
&lt;h3 id="22-why-not-just-use-k-means">2.2 Why not just use K-means?&lt;/h3>
&lt;p>A natural question: why not run individual regressions first and then cluster the coefficients with K-means? C-LASSO has two advantages. First, it estimates group membership and coefficients &lt;strong>jointly&lt;/strong>. A two-step approach (estimate, then cluster) propagates first-stage errors into the grouping. Second, C-LASSO&amp;rsquo;s penalty structure naturally pulls similar countries toward the same group. It is a statistically principled sorting mechanism, not an ad-hoc post-processing step.&lt;/p>
&lt;hr>
&lt;h2 id="3-the-classifier-lasso-method">3. The Classifier-LASSO Method&lt;/h2>
&lt;h3 id="31-the-c-lasso-objective-function">3.1 The C-LASSO objective function&lt;/h3>
&lt;p>C-LASSO minimizes a penalized least-squares objective:&lt;/p>
&lt;p>$$Q_{NT,\lambda}^{(K)} = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} (y_{it} - \boldsymbol{\beta}_i' \mathbf{x}_{it})^2 + \frac{\lambda_{NT}}{N} \sum_{i=1}^{N} \prod_{k=1}^{K} \lVert \boldsymbol{\beta}_i - \boldsymbol{\alpha}_k \rVert$$&lt;/p>
&lt;p>The first term is the standard sum of squared residuals. It measures how well the model fits the data. The second term is the &lt;strong>penalty&lt;/strong>. It encourages each country&amp;rsquo;s coefficients $\boldsymbol{\beta}_i$ to be close to one of the group centers $\boldsymbol{\alpha}_k$.&lt;/p>
&lt;p>Think of each group center as a &lt;strong>planet with gravitational pull&lt;/strong>. If a country&amp;rsquo;s coefficients are close to &lt;em>any&lt;/em> planet, the product $\prod_k \lVert \boldsymbol{\beta}_i - \boldsymbol{\alpha}_k \rVert$ shrinks toward zero. The penalty becomes small. The country gets pulled into that group. If the coefficients are far from all planets, the penalty stays large. The tuning parameter $\lambda_{NT} = c_\lambda T^{-1/3}$ controls how strong this gravitational pull is.&lt;/p>
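&lt;p>The &amp;ldquo;gravitational pull&amp;rdquo; is easy to verify numerically. The Python sketch below, with invented centers and coefficients, evaluates the product penalty for a unit close to one center and a unit far from both.&lt;/p>

```python
import numpy as np

def classo_penalty(beta_i, centers):
    # Product over groups of the distance to each group center;
    # near zero if beta_i is close to ANY center
    dists = [np.linalg.norm(beta_i - a) for a in centers]
    return np.prod(dists)

centers = [np.array([-0.2, -0.2]), np.array([0.5, 0.3])]  # two planets
near = np.array([-0.19, -0.21])   # close to the first center
far = np.array([2.0, 2.0])        # far from both centers
print(round(classo_penalty(near, centers), 4))
print(round(classo_penalty(far, centers), 4))
```

&lt;p>Because the terms multiply, proximity to &lt;em>any one&lt;/em> center is enough to collapse the whole penalty, so each unit is pulled toward its nearest group.&lt;/p>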
&lt;h3 id="32-three-step-estimation-procedure">3.2 Three-step estimation procedure&lt;/h3>
&lt;p>The &lt;code>classifylasso&lt;/code> command works in three steps:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Sort countries into groups.&lt;/strong> For each candidate number of groups $K$, the algorithm iteratively updates group centers and reassigns countries until convergence. Starting values come from unit-by-unit regressions.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Re-estimate within groups (postlasso).&lt;/strong> The LASSO penalty biases the coefficient estimates. So after sorting, we discard the penalized estimates and re-run plain OLS within each group. Think of it like a talent show: LASSO is the audition that selects who is in which group, but the final performance (the coefficient estimates) is unpenalized. This postlasso step gives us valid standard errors and confidence intervals.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Pick the best $K$ (information criterion).&lt;/strong> How many groups are there? The command tests $K = 1, 2, \ldots, K_{\max}$ and picks the $K$ that minimizes an information criterion. The IC acts like a &lt;strong>referee&lt;/strong> balancing two concerns: fit (more groups fit better) and complexity (more groups risk overfitting). It works like AIC or BIC. The tuning parameter $\rho_{NT} = c_\rho (NT)^{-1/2}$ controls how harshly the referee penalizes extra groups.&lt;/p>
&lt;/li>
&lt;/ol>
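&lt;p>The sorting logic can be caricatured in a few lines of Python. This is &lt;em>not&lt;/em> the actual penalized C-LASSO optimization: it skips the penalty and the information-criterion step and simply alternates assignment with within-group re-estimation on toy unit-level slopes (closer in spirit to k-means), but it conveys the iterate-until-convergence structure of steps 1 and 2.&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy unit-level slopes scattered around two true centers
true = np.repeat([-0.2, 0.5], 28)
beta_i = true + rng.normal(0, 0.08, size=56)

centers = np.array([beta_i.min(), beta_i.max()])  # crude starting values
for it in range(20):
    # Step 1: sort units into groups (nearest center)
    g = np.argmin(np.abs(beta_i[:, None] - centers[None, :]), axis=1)
    # Step 2: post-lasso-style re-estimation within each group
    new = np.array([beta_i[g == k].mean() for k in range(2)])
    if np.allclose(new, centers):
        break
    centers = new
print(np.round(centers, 2))
```

&lt;p>The recovered centers land near the true group slopes after a handful of iterations. The real command additionally penalizes the assignment and then repeats the whole exercise for each candidate $K$ before the information criterion picks a winner.&lt;/p>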
&lt;h3 id="33-dynamic-panels-and-nickell-bias">3.3 Dynamic panels and Nickell bias&lt;/h3>
&lt;p>What if your model includes a lagged dependent variable, like $y_{i,t-1}$? This creates a problem called &lt;strong>Nickell bias&lt;/strong>. When you demean the data to remove fixed effects, the demeaned lagged outcome becomes correlated with the demeaned error. The result: biased coefficients.&lt;/p>
&lt;p>The &lt;code>classifylasso&lt;/code> command offers a &lt;code>dynamic&lt;/code> option to fix this. It uses the &lt;strong>half-panel jackknife&lt;/strong> (Dhaene and Jochmans 2015). The idea is simple: split the time series in half. Estimate the model on each half. Combine the two estimates in a way that cancels the bias. Problem solved.&lt;/p>
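&lt;p>The correction arithmetic is worth seeing once. If the fixed-effects estimator has a Nickell-type bias of order $1/T$, halving the panel doubles the bias, so the combination $2\hat{\theta}_{\text{full}} - \tfrac{1}{2}(\hat{\theta}_1 + \hat{\theta}_2)$ cancels the leading term. A stylized Python check with made-up numbers:&lt;/p>

```python
# Suppose the estimator's expectation is theta + b/T (a 1/T-order bias)
theta, b = 0.70, -1.5
T = 15

full = theta + b / T            # full-panel estimate
half = theta + b / (T / 2)      # average of the two half-panel estimates
jack = 2 * full - half          # half-panel jackknife combination
print(full, half, jack)         # jack recovers theta in this stylized setup
```

&lt;p>In real data the bias is not exactly $b/T$, so the jackknife removes the leading term rather than all bias, but the cancellation logic is just this arithmetic.&lt;/p>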
&lt;p>Now that we understand the method, let&amp;rsquo;s apply it to real data.&lt;/p>
&lt;hr>
&lt;h2 id="4-data-exploration-savings">4. Data Exploration: Savings&lt;/h2>
&lt;h3 id="41-load-and-describe-the-data">4.1 Load and describe the data&lt;/h3>
&lt;p>Our first application uses a panel of 56 countries over 15 years, from Su, Shi, and Phillips (2016). The outcome is the savings-to-GDP ratio. The regressors are lagged savings, CPI inflation, real interest rates, and GDP growth.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/cmg777/starter-academic-v501/raw/master/content/post/stata_panel_lasso_cluster/refMaterials/saving.dta&amp;quot;, clear
xtset code year
summarize savings lagsavings cpi interest gdp
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
savings | 840 -2.87e-08 1.000596 -2.495871 2.893858
lagsavings | 840 5.81e-08 1.000596 -2.832278 2.91508
cpi | 840 3.56e-09 1.000596 -2.773791 3.548945
interest | 840 -7.17e-09 1.000596 -3.600348 3.277582
gdp | 840 1.06e-08 1.000596 -3.554419 2.461317
&lt;/code>&lt;/pre>
&lt;p>The panel is strongly balanced: 56 countries $\times$ 15 years = 840 observations. All variables are standardized to mean zero and standard deviation one. This means coefficients are in standard-deviation units. A coefficient of 0.18 means &amp;ldquo;a one-SD increase in CPI is associated with a 0.18-SD change in savings.&amp;rdquo; The balanced structure matters: C-LASSO requires all countries to be observed in all time periods.&lt;/p>
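&lt;p>The standard-deviation-units reading has a tidy algebraic basis: the OLS slope from regressing a z-scored outcome on a z-scored regressor equals their correlation. A quick Python illustration with simulated data (the numbers are arbitrary):&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(10, 4, size=840)            # raw regressor, arbitrary units
y = 2.0 * x + rng.normal(0, 5, size=840)   # raw outcome

def z(v):                                  # standardize to mean 0, SD 1
    return (v - v.mean()) / v.std()

zx, zy = z(x), z(y)
beta_std = (zx * zy).mean()                # OLS slope on z-scored data
print(round(beta_std, 2))                  # SD change in y per one-SD move in x
```

&lt;p>Whatever the raw units were, the standardized slope answers one question: how many standard deviations of the outcome move with one standard deviation of the regressor.&lt;/p>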
&lt;h3 id="42-visualize-cross-country-heterogeneity">4.2 Visualize cross-country heterogeneity&lt;/h3>
&lt;p>Before running any regressions, it helps to visualize how savings trajectories differ across countries. The &lt;code>xtline&lt;/code> command overlays all 56 country lines on a single plot:&lt;/p>
&lt;pre>&lt;code class="language-stata">xtline savings, overlay ///
title(&amp;quot;Savings-to-GDP Ratio Across 56 Countries&amp;quot;, size(medium)) ///
subtitle(&amp;quot;Each line represents one country&amp;quot;, size(small)) ///
ytitle(&amp;quot;Savings / GDP&amp;quot;) xtitle(&amp;quot;Year&amp;quot;) legend(off)
graph export &amp;quot;stata_panel_lasso_cluster_fig1_savings_scatter.png&amp;quot;, replace width(2400)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_panel_lasso_cluster_fig1_savings_scatter.png" alt="Spaghetti plot of savings-to-GDP ratio across 56 countries, showing wide dispersion in trajectories.">
&lt;em>Figure 1: Savings-to-GDP ratio across 56 countries (1995&amp;ndash;2010). Each line represents one country, revealing substantial heterogeneity in savings dynamics.&lt;/em>&lt;/p>
&lt;p>The spaghetti plot tells a clear story: countries do not move in lockstep. Some maintain positive savings ratios throughout. Others swing below zero. The lines diverge, cross, and cluster &amp;mdash; suggesting that different countries follow fundamentally different savings dynamics. This is exactly the kind of heterogeneity that C-LASSO is designed to detect. Perhaps subsets of countries share similar responses, even if the full panel does not.&lt;/p>
&lt;p>But first, let&amp;rsquo;s see what the standard models say.&lt;/p>
&lt;hr>
&lt;h2 id="5-baseline-pooled-and-fixed-effects-regressions">5. Baseline: Pooled and Fixed Effects Regressions&lt;/h2>
&lt;p>Before applying C-LASSO, we establish a benchmark by estimating the standard pooled OLS and fixed-effects models. These models assume that all 56 countries share the same slope coefficients.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Pooled OLS
regress savings lagsavings cpi interest gdp
* Standard Fixed Effects
xtreg savings lagsavings cpi interest gdp, fe
* Robust Fixed Effects (reghdfe)
reghdfe savings lagsavings cpi interest gdp, absorb(code) vce(robust)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Pooled OLS FE (robust)
lagsavings 0.6051 0.6051
cpi 0.0301 0.0301
interest 0.0059 0.0059
gdp 0.1882 0.1882
&lt;/code>&lt;/pre>
&lt;p>The pooled OLS and fixed-effects estimates are virtually identical. R-squared is 0.438. Lagged savings dominates (coefficient 0.605, $p &amp;lt; 0.001$). GDP growth matters too (0.188, $p &amp;lt; 0.001$).&lt;/p>
&lt;p>Now look at the two remaining variables. CPI: 0.030. Interest rate: 0.006. Both statistically insignificant. A textbook conclusion would be: &amp;ldquo;Inflation and interest rates do not affect savings.&amp;rdquo;&lt;/p>
&lt;p>But what if the average is lying? Imagine a city where half the neighborhoods warm up by 5 degrees and the other half cool down by 5 degrees. The citywide average temperature change is zero. A meteorologist reporting &amp;ldquo;no change&amp;rdquo; would be wrong &amp;mdash; there &lt;em>are&lt;/em> changes, just in opposite directions. This is exactly what we will discover with C-LASSO.&lt;/p>
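&lt;p>The cancellation in the meteorologist example takes two lines of Python to verify:&lt;/p>

```python
group1 = [-5.0] * 28   # neighborhoods cooling by 5 degrees
group2 = [+5.0] * 28   # neighborhoods warming by 5 degrees

citywide = sum(group1 + group2) / len(group1 + group2)
print(citywide)        # 0.0: 'no change' on average, two real changes underneath
```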
&lt;hr>
&lt;h2 id="6-classifier-lasso-savings-static-model">6. Classifier-LASSO: Savings, Static Model&lt;/h2>
&lt;h3 id="61-estimation">6.1 Estimation&lt;/h3>
&lt;p>We start with the simplest C-LASSO specification: a static model without the lagged dependent variable. This lets us focus on the core mechanics before adding complexity.&lt;/p>
&lt;pre>&lt;code class="language-stata">classifylasso savings cpi interest gdp, grouplist(1/5) tolerance(1e-4)
&lt;/code>&lt;/pre>
&lt;p>The command searches over $K = 1$ to $K = 5$ groups and reports the information criterion (IC) for each:&lt;/p>
&lt;pre>&lt;code class="language-text">Estimation 1: Group Number = 1; IC = 0.054
Estimation 2: Group Number = 2; IC = -0.028 ← minimum
Estimation 3: Group Number = 3; IC = 0.059
Estimation 4: Group Number = 4; IC = 0.131
Estimation 5: Group Number = 5; IC = 0.213
* Selected Group Number: 2
&lt;/code>&lt;/pre>
&lt;p>The IC is minimized at $K = 2$, with values rising monotonically as $K$ increases beyond 2. This clear U-shape provides strong evidence for exactly two latent groups in the data.&lt;/p>
&lt;h3 id="62-group-specific-coefficients">6.2 Group-specific coefficients&lt;/h3>
&lt;pre>&lt;code class="language-stata">classoselect, postselection
predict gid_static, gid
tabulate gid_static
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Group 1 (34 countries, 510 obs): Within R-sq. = 0.2019
cpi | -0.1813 (z = -4.29, p &amp;lt; 0.001)
interest | -0.1966 (z = -4.64, p &amp;lt; 0.001)
gdp | 0.3346 (z = 7.98, p &amp;lt; 0.001)
Group 2 (22 countries, 330 obs): Within R-sq. = 0.2369
cpi | 0.4781 (z = 9.10, p &amp;lt; 0.001)
interest | 0.2631 (z = 5.01, p &amp;lt; 0.001)
gdp | 0.1117 (z = 2.23, p = 0.026)
&lt;/code>&lt;/pre>
&lt;p>The results are striking. Look at CPI.&lt;/p>
&lt;p>In &lt;strong>Group 1&lt;/strong> (34 countries), higher inflation &lt;em>reduces&lt;/em> savings: coefficient $-0.181$ ($p &amp;lt; 0.001$). In &lt;strong>Group 2&lt;/strong> (22 countries), higher inflation &lt;em>increases&lt;/em> savings: coefficient $+0.478$ ($p &amp;lt; 0.001$). The sign flips completely.&lt;/p>
&lt;p>The same reversal appears for the interest rate: $-0.197$ in Group 1 versus $+0.263$ in Group 2.&lt;/p>
&lt;p>Now the pooled CPI coefficient of $+0.030$ makes sense. It was averaging $-0.181$ and $+0.478$ &amp;mdash; a negative and a positive effect canceling each other out. The &amp;ldquo;insignificant&amp;rdquo; result was not evidence of no effect. It was evidence of &lt;strong>two opposing effects&lt;/strong> hidden inside the average.&lt;/p>
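&lt;p>A rough sanity check in Python: the pooled slope is &lt;em>not&lt;/em> literally the observation-weighted mean of the group slopes (pooled OLS weights each group by its regressor variation), but even the crude weighted average shows how the opposing signs collapse toward zero.&lt;/p>

```python
# Group sizes and CPI slopes from the static C-LASSO fit above
n1, b1 = 34, -0.1813
n2, b2 = 22, 0.4781

crude = (n1 * b1 + n2 * b2) / (n1 + n2)
print(round(crude, 3))   # small in absolute terms, as the pooled 0.030 is,
                         # even though both group effects are large
```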
&lt;p>Why the reversal? In Group 1, higher inflation erodes the real value of savings, discouraging people from saving. In Group 2, higher inflation may trigger &lt;strong>precautionary savings&lt;/strong> &amp;mdash; households save &lt;em>more&lt;/em> precisely because the economic environment feels uncertain. Same macroeconomic shock, opposite behavioral response.&lt;/p>
&lt;h3 id="63-group-selection-plot">6.3 Group selection plot&lt;/h3>
&lt;pre>&lt;code class="language-stata">classogroup
graph export &amp;quot;stata_panel_lasso_cluster_fig2_group_selection_static.png&amp;quot;, replace width(2400)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_panel_lasso_cluster_fig2_group_selection_static.png" alt="Information criterion and iteration count by number of groups for the static savings model. IC is minimized at K=2.">
&lt;em>Figure 2: Group selection for the static savings model. The information criterion (left axis) is minimized at K=2, with a clear U-shape from K=3 onward.&lt;/em>&lt;/p>
&lt;p>The triangle marks the IC minimum at $K = 2$. The left axis shows IC values; the right axis shows iterations to convergence. Notice: $K = 2$ converged quickly (about 3 iterations). Models with $K \geq 3$ hit the maximum of 20 iterations. When the algorithm struggles to converge, it is a sign of overparameterization &amp;mdash; too many groups for the data to support.&lt;/p>
&lt;p>So far, we have found two groups with a static model. But we omitted lagged savings. Let&amp;rsquo;s add it back.&lt;/p>
&lt;hr>
&lt;h2 id="7-classifier-lasso-savings-dynamic-model">7. Classifier-LASSO: Savings, Dynamic Model&lt;/h2>
&lt;h3 id="71-adding-the-lagged-dependent-variable">7.1 Adding the lagged dependent variable&lt;/h3>
&lt;p>Savings are highly persistent. The pooled coefficient on &lt;code>lagsavings&lt;/code> was 0.605 &amp;mdash; a country&amp;rsquo;s savings this year strongly predicts its savings next year. Omitting this variable may bias everything else. We now add it back and replicate Su, Shi, and Phillips (2016). The &lt;code>dynamic&lt;/code> option activates the half-panel jackknife to correct Nickell bias.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/cmg777/starter-academic-v501/raw/master/content/post/stata_panel_lasso_cluster/refMaterials/saving.dta&amp;quot;, clear
xtset code year
classifylasso savings lagsavings cpi interest gdp, ///
grouplist(1/5) lambda(1.5485) tolerance(1e-4) dynamic
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">* Selected Group Number: 2
The algorithm takes 9min57s.
Group 1 (31 countries, 465 obs): Within R-sq. = 0.4988
lagsavings | 0.6952 (z = 18.15, p &amp;lt; 0.001)
cpi | -0.1602 (z = -4.09, p &amp;lt; 0.001)
interest | -0.1490 (z = -4.04, p &amp;lt; 0.001)
gdp | 0.2892 (z = 7.62, p &amp;lt; 0.001)
Group 2 (25 countries, 375 obs): Within R-sq. = 0.4372
lagsavings | 0.6939 (z = 19.45, p &amp;lt; 0.001)
cpi | 0.1967 (z = 4.93, p &amp;lt; 0.001)
interest | 0.1225 (z = 2.98, p = 0.003)
gdp | 0.1127 (z = 2.38, p = 0.018)
&lt;/code>&lt;/pre>
&lt;p>Again, C-LASSO selects $K = 2$ groups. The sign reversal on CPI survives: $-0.160$ in Group 1 versus $+0.197$ in Group 2. Same for the interest rate: $-0.149$ versus $+0.123$.&lt;/p>
&lt;p>Here is what is interesting about the &lt;code>lagsavings&lt;/code> coefficient. Both groups show nearly identical persistence: 0.695 in Group 1 and 0.694 in Group 2. Think of it like a speedometer. Both groups of countries cruise at the same speed (savings persistence). But they swerve in opposite directions when they hit a pothole (an inflation or interest rate shock). The heterogeneity is about &lt;em>reactions to shocks&lt;/em>, not about baseline behavior.&lt;/p>
&lt;p>Adding lagged savings also improved the fit. Within R-squared jumped from 0.20&amp;ndash;0.24 (static) to 0.44&amp;ndash;0.50 (dynamic). The lagged variable clearly matters.&lt;/p>
&lt;h3 id="72-coefficient-plots">7.2 Coefficient plots&lt;/h3>
&lt;p>The &lt;code>classocoef&lt;/code> postestimation command visualizes group-specific coefficients with 95% confidence bands:&lt;/p>
&lt;pre>&lt;code class="language-stata">classocoef cpi
graph export &amp;quot;stata_panel_lasso_cluster_fig3_coef_cpi.png&amp;quot;, replace width(2400)
classocoef interest
graph export &amp;quot;stata_panel_lasso_cluster_fig4_coef_interest.png&amp;quot;, replace width(2400)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_panel_lasso_cluster_fig3_coef_cpi.png" alt="CPI coefficient estimates and 95% confidence bands by group, showing a clear sign reversal with non-overlapping confidence intervals.">
&lt;em>Figure 3: Heterogeneous effects of CPI on savings. Group 1 (31 countries) shows a negative effect; Group 2 (25 countries) shows a positive effect. Confidence bands do not overlap.&lt;/em>&lt;/p>
&lt;p>This is the &amp;ldquo;smoking gun&amp;rdquo; figure. The two horizontal lines are the group-specific coefficients. The dashed lines show 95% confidence bands. The bands do not overlap. This is not a marginal difference. It is a robust sign reversal.&lt;/p>
&lt;p>For 31 countries (Group 1), higher inflation reduces savings ($-0.160$, $p &amp;lt; 0.001$). For 25 countries (Group 2), higher inflation increases savings ($+0.197$, $p &amp;lt; 0.001$). A pooled model averages these opposing forces and finds CPI &amp;ldquo;insignificant.&amp;rdquo; That is aggregation bias at work.&lt;/p>
&lt;p>&lt;img src="stata_panel_lasso_cluster_fig4_coef_interest.png" alt="Interest rate coefficient estimates and 95% confidence bands by group, showing the same sign reversal pattern as CPI.">
&lt;em>Figure 4: Heterogeneous effects of the interest rate on savings. The same sign reversal pattern as CPI: negative in Group 1, positive in Group 2.&lt;/em>&lt;/p>
&lt;p>The interest rate tells the same story. Group 1 countries save &lt;em>less&lt;/em> when rates rise ($-0.149$). Group 2 countries save &lt;em>more&lt;/em> ($+0.123$).&lt;/p>
&lt;p>Why? One interpretation: in Group 1 (more developed financial markets), higher returns make consumption more attractive &amp;mdash; the &lt;strong>substitution effect&lt;/strong> dominates. In Group 2 (limited financial access), higher returns make saving more rewarding &amp;mdash; the &lt;strong>income effect&lt;/strong> dominates.&lt;/p>
&lt;p>We have now established that latent groups exist in savings data. The next question: does the same pattern appear in a completely different economic context?&lt;/p>
&lt;hr>
&lt;h2 id="8-democracy-application-does-democracy-cause-growth">8. Democracy Application: Does Democracy Cause Growth?&lt;/h2>
&lt;h3 id="81-the-acemoglu-et-al-2019-question">8.1 The Acemoglu et al. (2019) question&lt;/h3>
&lt;p>&amp;ldquo;Democracy does cause growth.&amp;rdquo; That is the title of a famous 2019 paper by Acemoglu, Naidu, Restrepo, and Robinson in the &lt;em>Journal of Political Economy&lt;/em>. Their evidence: a pooled two-way fixed-effects model with lagged GDP finds a positive, significant effect.&lt;/p>
&lt;p>But we have learned to be skeptical of pooled estimates. Does this average apply to all 98 countries? Or does it mask the same kind of sign reversal we found in savings?&lt;/p>
&lt;h3 id="82-data-exploration">8.2 Data exploration&lt;/h3>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/cmg777/starter-academic-v501/raw/master/content/post/stata_panel_lasso_cluster/refMaterials/democracy.dta&amp;quot;, clear
xtset country year
summarize lnPGDP Democracy ly1
tabulate Democracy
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
lnPGDP | 4,018 758.5558 162.9137 405.6728 1094.003
Democracy | 4,018 .5450473 .4980286 0 1
ly1 | 3,920 757.7754 162.6702 405.6728 1094.003
Democracy | Freq. Percent
------------+-----------------------------------
0 | 1,828 45.50
1 | 2,190 54.50
&lt;/code>&lt;/pre>
&lt;p>The panel covers 98 countries from 1970 to 2010 &amp;mdash; 4,018 observations. The binary &lt;code>Democracy&lt;/code> indicator is 1 for democratic country-years and 0 otherwise. About 55% of observations are democratic, reflecting the global wave of democratization. The dependent variable &lt;code>lnPGDP&lt;/code> (log per-capita GDP, scaled) ranges from 406 to 1,094 &amp;mdash; the full spectrum from low-income to high-income countries.&lt;/p>
&lt;h3 id="83-pooled-fixed-effects-benchmark">8.3 Pooled fixed-effects benchmark&lt;/h3>
&lt;pre>&lt;code class="language-stata">reghdfe lnPGDP Democracy ly1, absorb(country year) cluster(country)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">HDFE Linear regression Number of obs = 3,920
R-squared = 0.9991
Within R-sq. = 0.9607
(Std. err. adjusted for 98 clusters in country)
lnPGDP | Coefficient Robust std. err. t P&amp;gt;|t|
Democracy | 1.054992 .369806 2.85 0.005
ly1 | .970495 .0059964 161.85 0.000
&lt;/code>&lt;/pre>
&lt;p>Democracy is associated with a 1.055-unit increase in log per-capita GDP ($p = 0.005$, clustered SE = 0.370). Lagged GDP has a coefficient of 0.970 &amp;mdash; strong persistence. This replicates Acemoglu et al. (2019): on average, democracy promotes growth.&lt;/p>
&lt;p>On average. But we already know what &amp;ldquo;on average&amp;rdquo; can hide. Let&amp;rsquo;s run C-LASSO.&lt;/p>
&lt;h3 id="84-c-lasso-revealing-the-heterogeneity">8.4 C-LASSO: revealing the heterogeneity&lt;/h3>
&lt;pre>&lt;code class="language-stata">classifylasso lnPGDP Democracy ly1, ///
grouplist(1/5) rho(0.2) absorb(country year) ///
cluster(country) dynamic optmaxiter(300)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">* Selected Group Number: 2
The algorithm takes 2h33min41s.
Group 1 (57 countries, 2,280 obs): Within R-sq. = 0.9609
Democracy | 2.151397 (z = 3.94, p &amp;lt; 0.001)
ly1 | 1.032752 (z = 149.97, p &amp;lt; 0.001)
Group 2 (41 countries, 1,640 obs): Within R-sq. = 0.9538
Democracy | -0.935589 (z = -2.69, p = 0.007)
ly1 | 0.979327 (z = 95.73, p &amp;lt; 0.001)
&lt;/code>&lt;/pre>
&lt;p>This is the tutorial&amp;rsquo;s most striking finding.&lt;/p>
&lt;p>The pooled coefficient of $+1.055$ is &lt;strong>not representative of any actual country group&lt;/strong>. It is a weighted average of two fundamentally different effects:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Group 1&lt;/strong> (57 countries): democracy effect = $+2.151$ ($p &amp;lt; 0.001$). More than twice the pooled estimate.&lt;/li>
&lt;li>&lt;strong>Group 2&lt;/strong> (41 countries): democracy effect = $-0.936$ ($p = 0.007$). Negative and significant.&lt;/li>
&lt;/ul>
&lt;p>The coefficient literally changes sign. For 58% of countries, democratic transitions are associated with GDP gains. For the remaining 42%, they are associated with GDP declines. The pooled model sees one number. C-LASSO sees two stories.&lt;/p>
&lt;p>Note: these are conditional associations within the panel model. A causal interpretation requires the same identifying assumptions as Acemoglu et al. (2019).&lt;/p>
&lt;h3 id="85-visualizing-the-democracy-growth-split">8.5 Visualizing the democracy-growth split&lt;/h3>
&lt;pre>&lt;code class="language-stata">classogroup
graph export &amp;quot;stata_panel_lasso_cluster_fig5_democracy_selection.png&amp;quot;, replace width(2400)
classocoef Democracy
graph export &amp;quot;stata_panel_lasso_cluster_fig6_democracy_coef.png&amp;quot;, replace width(2400)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_panel_lasso_cluster_fig5_democracy_selection.png" alt="Information criterion and iteration count for the democracy model. IC is minimized at K=2, though values are close across specifications.">
&lt;em>Figure 5: Group selection for the democracy-growth model. IC is minimized at K=2, though values are close across all K (range 3.267&amp;ndash;3.280).&lt;/em>&lt;/p>
&lt;p>The IC selects $K = 2$. But look closely: the IC values range from 3.267 to 3.280 &amp;mdash; a span of just 0.013. The 2-group structure is optimal but not overwhelmingly so. This is a useful reminder: always check sensitivity to the tuning parameter $\rho$.&lt;/p>
&lt;p>&lt;img src="stata_panel_lasso_cluster_fig6_democracy_coef.png" alt="Democracy coefficient polarization across two groups: Group 1 (57 countries) shows a positive effect around +2.2, Group 2 (41 countries) shows a negative effect around -1.0.">
&lt;em>Figure 6: Heterogeneous effects of democracy on economic growth. Group 1 (57 countries) shows a positive effect (+2.15); Group 2 (41 countries) shows a negative effect (-0.94). The pooled estimate of +1.05 describes neither group.&lt;/em>&lt;/p>
&lt;p>This is the key figure of the tutorial. Each dot is one country&amp;rsquo;s individual coefficient estimate. The horizontal lines show group-specific postlasso estimates with 95% confidence bands.&lt;/p>
&lt;p>The polarization is unmistakable. Group 1 (left cluster): strongly positive. Group 2 (right cluster): negative. Neither group&amp;rsquo;s confidence band crosses zero. Both effects are statistically significant.&lt;/p>
&lt;p>This is not &amp;ldquo;some countries benefit, others see no effect.&amp;rdquo; It is a genuine sign reversal. Democracy is associated with growth in one group and with decline in another.&lt;/p>
&lt;hr>
&lt;h2 id="9-comparison-what-the-pooled-model-misses">9. Comparison: What the Pooled Model Misses&lt;/h2>
&lt;h3 id="91-summary-table">9.1 Summary table&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>Pooled FE&lt;/th>
&lt;th>C-LASSO Group 1&lt;/th>
&lt;th>C-LASSO Group 2&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>Democracy coefficient&lt;/strong>&lt;/td>
&lt;td>+1.055&lt;/td>
&lt;td>+2.151&lt;/td>
&lt;td>-0.936&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Standard error&lt;/strong>&lt;/td>
&lt;td>0.370&lt;/td>
&lt;td>0.546&lt;/td>
&lt;td>0.348&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>p-value&lt;/strong>&lt;/td>
&lt;td>0.005&lt;/td>
&lt;td>&amp;lt; 0.001&lt;/td>
&lt;td>0.007&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Lagged GDP&lt;/strong>&lt;/td>
&lt;td>0.970&lt;/td>
&lt;td>1.033&lt;/td>
&lt;td>0.979&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Countries&lt;/strong>&lt;/td>
&lt;td>98&lt;/td>
&lt;td>57&lt;/td>
&lt;td>41&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Observations&lt;/strong>&lt;/td>
&lt;td>3,920&lt;/td>
&lt;td>2,280&lt;/td>
&lt;td>1,640&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="92-simpsons-paradox-in-panel-data">9.2 Simpson&amp;rsquo;s paradox in panel data&lt;/h3>
&lt;p>This is &lt;strong>Simpson&amp;rsquo;s paradox&lt;/strong> &amp;mdash; the phenomenon where a trend that appears in aggregated data reverses when you look at subgroups.&lt;/p>
&lt;p>Here is a concrete analogy. A hospital treats two types of patients: mild cases and severe cases. For mild cases, Treatment A has a higher survival rate. For severe cases, Treatment A also has a higher survival rate. But when you pool all patients together, Treatment B appears better &amp;mdash; because it treats a disproportionate number of mild (easy) cases. The aggregate reverses the subgroup trend.&lt;/p>
&lt;p>The same thing happened here. The pooled democracy estimate of $+1.055$ sits between $+2.151$ and $-0.936$. It describes neither group accurately. A policymaker relying on the pooled result would conclude that democracy universally promotes growth. They would miss that for 41 countries (42% of the sample), the relationship runs in the opposite direction.&lt;/p>
&lt;p>The savings model showed the same pattern. The insignificant pooled CPI coefficient ($+0.030$) masked significant effects of $-0.160$ and $+0.197$. When effects have opposite signs, pooling does not just underestimate the magnitude. It produces a qualitatively wrong conclusion.&lt;/p>
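&lt;p>The arithmetic of this masking is easy to reproduce. Below is an illustrative Python simulation (not the tutorial&amp;rsquo;s Stata code or data): two equal-sized groups share one regressor series, and their true slopes are set to the postlasso CPI estimates of $-0.160$ and $+0.197$.&lt;/p>

```python
import numpy as np

# Illustrative simulation, not the tutorial's data: two equal-sized
# groups whose true slopes match the postlasso CPI estimates
# (-0.160 and +0.197) from the savings model.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(5, 2, n)                      # shared regressor (e.g. inflation)
y_neg = -0.160 * x + rng.normal(0, 0.1, n)   # group 1: saves less
y_pos = 0.197 * x + rng.normal(0, 0.1, n)    # group 2: saves more

def slope(xs, ys):
    """OLS slope from a simple regression of ys on xs."""
    return np.polyfit(xs, ys, 1)[0]

b1 = slope(x, y_neg)
b2 = slope(x, y_pos)
b_pooled = slope(np.concatenate([x, x]), np.concatenate([y_neg, y_pos]))
print(b1, b2, b_pooled)  # two opposite-signed slopes; pooled slope near zero
```

&lt;p>The two group slopes are recovered with opposite signs, while the pooled slope is their average and lands near zero, echoing the insignificant pooled CPI coefficient of $+0.030$.&lt;/p>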
&lt;h3 id="93-robustness-of-the-group-structure">9.3 Robustness of the group structure&lt;/h3>
&lt;p>Across all three C-LASSO specifications &amp;mdash; static savings, dynamic savings, and democracy &amp;mdash; the IC consistently selected $K = 2$ groups. The CPI sign reversal survived the switch from static to dynamic, despite a shift in group composition (34/22 to 31/25). This consistency suggests the latent groups are real structural features of the data, not artifacts of a particular specification.&lt;/p>
&lt;hr>
&lt;h2 id="10-summary-and-takeaways">10. Summary and Takeaways&lt;/h2>
&lt;h3 id="101-what-we-learned">10.1 What we learned&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Pooled estimates can be misleading.&lt;/strong> The insignificant pooled CPI coefficient ($+0.030$) in the savings model masked opposing effects of $-0.160$ and $+0.197$ in two latent groups. The pooled democracy coefficient ($+1.055$) masked a split of $+2.151$ versus $-0.936$.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>C-LASSO finds latent groups.&lt;/strong> In all three specifications, the information criterion selected $K = 2$ groups, revealing binary latent structures in both datasets. The &lt;code>classifylasso&lt;/code> command handles the full workflow: estimation, group selection, and postestimation.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The &lt;code>dynamic&lt;/code> option corrects Nickell bias.&lt;/strong> When lagged dependent variables are included, the half-panel jackknife bias correction preserves the group structure while improving within-group R-squared (from 0.20&amp;ndash;0.24 in the static model to 0.44&amp;ndash;0.50 in the dynamic model).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Postestimation tools aid interpretation.&lt;/strong> The &lt;code>classogroup&lt;/code> command visualizes the information criterion, &lt;code>classocoef&lt;/code> plots group-specific coefficients with confidence bands, and &lt;code>predict gid&lt;/code> assigns countries to groups.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="102-limitations">10.2 Limitations&lt;/h3>
&lt;p>Three caveats. First, the IC values in the democracy model were very close across $K = 1$ through $K = 5$ (range 3.267&amp;ndash;3.280). The 2-group structure is optimal but not dominant. Second, the datasets use numeric country codes, not names. We cannot easily identify which countries are in which group. Third, C-LASSO is computationally intensive. The democracy model took over 2.5 hours. Plan accordingly.&lt;/p>
&lt;h3 id="103-exercises">10.3 Exercises&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Sensitivity analysis.&lt;/strong> Re-run the democracy model with &lt;code>rho(0.5)&lt;/code> and &lt;code>rho(1.0)&lt;/code> instead of &lt;code>rho(0.2)&lt;/code>. Does the selected number of groups change? How sensitive are the group assignments to this tuning parameter?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Extended lag structure.&lt;/strong> Following the reference &lt;code>empirical.do&lt;/code>, estimate the democracy model with 2, 3, and 4 lags of GDP (&lt;code>ly1-ly2&lt;/code>, &lt;code>ly1-ly3&lt;/code>, &lt;code>ly1-ly4&lt;/code>). Do the group-specific democracy coefficients remain stable?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Static vs dynamic comparison.&lt;/strong> Run &lt;code>classifylasso savings cpi interest gdp&lt;/code> (without &lt;code>dynamic&lt;/code>) on the savings data and compare group assignments with the dynamic model using &lt;code>tabulate gid_static gid_dynamic&lt;/code>. How many countries switch groups?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>Su, L., Shi, Z., and Phillips, P. C. B. (2016). &lt;a href="https://doi.org/10.3982/ECTA12560" target="_blank" rel="noopener">Identifying latent structures in panel data&lt;/a>. &lt;em>Econometrica&lt;/em>, 84(6), 2215&amp;ndash;2264.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Huang, W., Wang, Y., and Zhou, L. (2024). &lt;a href="https://doi.org/10.1177/1536867X241233664" target="_blank" rel="noopener">Identify latent group structures in panel data: The classifylasso command&lt;/a>. &lt;em>Stata Journal&lt;/em>, 24(1), 173&amp;ndash;203.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Acemoglu, D., Naidu, S., Restrepo, P., and Robinson, J. A. (2019). &lt;a href="https://doi.org/10.1086/700936" target="_blank" rel="noopener">Democracy does cause growth&lt;/a>. &lt;em>Journal of Political Economy&lt;/em>, 127(1), 47&amp;ndash;100.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Dhaene, G. and Jochmans, K. (2015). &lt;a href="https://doi.org/10.1093/restud/rdv007" target="_blank" rel="noopener">Split-panel jackknife estimation of fixed-effect models&lt;/a>. &lt;em>Review of Economic Studies&lt;/em>, 82(3), 991&amp;ndash;1030.&lt;/p>
&lt;/li>
&lt;/ol></description></item><item><title>What Does TWFE Actually Do? Manual Demeaning and the FWL Theorem</title><link>https://carlos-mendez.org/post/r_demeaning_twfe/</link><pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/r_demeaning_twfe/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Two-way fixed effects (TWFE) is one of the most widely used estimators in applied economics. Packages like &lt;code>fixest&lt;/code> make it easy to estimate TWFE models with a single line of code. But what does the estimator actually &lt;em>do&lt;/em> to the data? Why do time-invariant regressors like geography or colonial origin get dropped? And if you run &lt;code>lm()&lt;/code> on manually demeaned data, should you get the same answer?&lt;/p>
&lt;p>This tutorial answers these questions by taking TWFE apart. We estimate a standard growth regression with country and time fixed effects, then replicate the exact same coefficients by hand &amp;mdash; subtracting country means, time means, and adding back the grand mean before running ordinary least squares. The result is not an approximation: the coefficients match to 12 significant digits. The theoretical foundation for this equivalence is the &lt;strong>Frisch-Waugh-Lovell (FWL) theorem&lt;/strong>, a fundamental result in econometrics that connects controlling for variables in a regression to projecting them out by residualization.&lt;/p>
&lt;p>We use a balanced panel of 150 countries observed over 8 time periods from the Barro convergence dataset. Along the way, we also discover why standard errors from naive &lt;code>lm()&lt;/code> on demeaned data are wrong &amp;mdash; and why you should always use a dedicated panel estimator for inference.&lt;/p>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand what two-way fixed effects does mechanically to the data and why time-invariant regressors are dropped&lt;/li>
&lt;li>Implement the two-way demeaning formula step by step: subtract country means, subtract time means, add back the grand mean&lt;/li>
&lt;li>Verify the Frisch-Waugh-Lovell theorem empirically by comparing &lt;code>feols()&lt;/code> and &lt;code>lm()&lt;/code> coefficients&lt;/li>
&lt;li>Interpret why naive standard errors from &lt;code>lm()&lt;/code> on demeaned data are incorrect and how &lt;code>fixest&lt;/code> corrects them&lt;/li>
&lt;li>Visualize the demeaning transformation to build intuition about within-variation identification&lt;/li>
&lt;/ul>
&lt;h2 id="2-the-frisch-waugh-lovell-theorem">2. The Frisch-Waugh-Lovell Theorem&lt;/h2>
&lt;p>Before diving into code, let us build the conceptual foundation. The FWL theorem answers a simple question: if you want to estimate the effect of $X$ on $Y$ while controlling for a set of variables $Z$, do you need to include everything in one big regression?&lt;/p>
&lt;p>Think of it like noise-canceling headphones. Instead of listening to music with the engine noise mixed in, the headphones first &lt;em>subtract out&lt;/em> the engine noise from what you hear. The result is the same music you would hear in a silent room. The FWL theorem says: instead of including all control variables in one regression, you can first &amp;ldquo;subtract them out&amp;rdquo; from both $Y$ and $X$, and then regress the residuals on each other. The coefficient on $X$ will be identical either way.&lt;/p>
&lt;h3 id="applying-fwl-to-two-way-fixed-effects">Applying FWL to two-way fixed effects&lt;/h3>
&lt;p>In a TWFE model, the &amp;ldquo;controls&amp;rdquo; $Z$ are the full set of country dummies and time dummies. Including all these dummies is equivalent to subtracting group means. For a variable $x_{it}$ observed for country $i$ in period $t$, the &lt;strong>two-way demeaned&lt;/strong> version is:&lt;/p>
&lt;p>$$\tilde{x}_{it} = x_{it} - \bar{x}_{i \cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot \cdot}$$&lt;/p>
&lt;p>In words, this formula says: take the observed value, subtract the country average (to remove persistent country differences), subtract the time-period average (to remove common shocks), and add back the overall average (to correct for double-subtracting the grand mean).&lt;/p>
&lt;p>Here is what each symbol means:&lt;/p>
&lt;ul>
&lt;li>$x_{it}$ is the observed value for country $i$ at time $t$ &amp;mdash; in code, this is a single cell in the panel dataset&lt;/li>
&lt;li>$\bar{x}_{i \cdot}$ is the &lt;strong>country mean&lt;/strong> &amp;mdash; the average of $x$ across all periods for country $i$&lt;/li>
&lt;li>$\bar{x}_{\cdot t}$ is the &lt;strong>time mean&lt;/strong> &amp;mdash; the average of $x$ across all countries in period $t$&lt;/li>
&lt;li>$\bar{x}_{\cdot \cdot}$ is the &lt;strong>grand mean&lt;/strong> &amp;mdash; the overall average of $x$ across all observations&lt;/li>
&lt;/ul>
&lt;h3 id="why-add-back-the-grand-mean">Why add back the grand mean?&lt;/h3>
&lt;p>When we subtract both the country mean and the time mean, the grand mean gets subtracted &lt;em>twice&lt;/em> &amp;mdash; once as part of $\bar{x}_{i \cdot}$ and once as part of $\bar{x}_{\cdot t}$. Adding $\bar{x}_{\cdot \cdot}$ back corrects for this double subtraction. Think of it like a Venn diagram with two overlapping circles. If you subtract both circles entirely, the overlap region gets removed twice. Adding the overlap back once restores the correct amount. Without this correction, the demeaned variables would not be centered at zero, and the equivalence with TWFE would break.&lt;/p>
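&lt;p>A tiny numeric sketch makes the double subtraction concrete. The snippet below uses Python (for illustration only; the tutorial&amp;rsquo;s code is in R) on a hypothetical 2-by-2 panel:&lt;/p>

```python
import numpy as np

# Toy 2-entity x 2-period panel (rows = entities, columns = periods)
x = np.array([[1.0, 3.0],
              [5.0, 9.0]])
entity_means = x.mean(axis=1, keepdims=True)  # per-entity averages
time_means = x.mean(axis=0, keepdims=True)    # per-period averages
grand_mean = x.mean()                          # overall average = 4.5

wrong = x - entity_means - time_means              # grand mean removed twice
right = x - entity_means - time_means + grand_mean # corrected

print(wrong.mean())  # -4.5: off by exactly one grand mean
print(right.mean())  # 0.0: properly centered
```

&lt;p>Without the add-back, every demeaned value is off by exactly one grand mean, so the transformed data is not centered at zero.&lt;/p>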
&lt;p>The FWL theorem guarantees this equivalence formally:&lt;/p>
&lt;p>$$\hat{\beta}_{\text{TWFE}} = \hat{\beta}_{\text{OLS on demeaned data}}$$&lt;/p>
&lt;p>In words, the slope coefficients from a regression that includes a full set of entity and time dummies are exactly equal to the slopes from OLS applied to the two-way demeaned data. Not approximately &amp;mdash; exactly. Let us verify this with real data.&lt;/p>
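&lt;p>Before turning to the real panel, the theorem can be sanity-checked on synthetic data. The following Python sketch is purely illustrative (made-up data and names; the tutorial&amp;rsquo;s own verification uses R on the Barro panel): it compares the coefficient on $x$ from the long regression with the slope from regressing residualized $y$ on residualized $x$.&lt;/p>

```python
import numpy as np

# Synthetic FWL check: coefficient on x from the long regression equals
# the slope from a simple regression of residualized y on residualized x.
rng = np.random.default_rng(1)
n = 200
z = rng.normal(size=(n, 3))                                # controls Z
x = z @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=n)
y = 2.0 * x + z @ np.array([1.0, 0.3, -0.7]) + rng.normal(size=n)

def residualize(v, Z):
    """Residual of v after OLS on Z plus a constant."""
    Zc = np.column_stack([np.ones(len(v)), Z])
    beta, *_ = np.linalg.lstsq(Zc, v, rcond=None)
    return v - Zc @ beta

# Long regression: y on constant, x, and Z
X_long = np.column_stack([np.ones(n), x, z])
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)

# FWL route: partial Z out of both y and x, then a simple regression
y_t = residualize(y, z)
x_t = residualize(x, z)
b_fwl = (x_t @ y_t) / (x_t @ x_t)

print(b_long[1] - b_fwl)  # ~0, equal to machine precision
```

&lt;p>The two routes agree to floating-point precision, exactly as the theorem predicts; the demeaning used by TWFE is the special case where $Z$ is a full set of entity and time dummies.&lt;/p>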
&lt;h2 id="3-setup">3. Setup&lt;/h2>
&lt;p>We need &lt;code>fixest&lt;/code> for TWFE estimation and &lt;code>tidyverse&lt;/code> for data wrangling and visualization. The &lt;code>scales&lt;/code> package provides axis formatting utilities.&lt;/p>
&lt;pre>&lt;code class="language-r">library(fixest)
library(tidyverse)
library(scales)
set.seed(42)
# Site color palette
STEEL_BLUE &amp;lt;- &amp;quot;#6a9bcc&amp;quot;
WARM_ORANGE &amp;lt;- &amp;quot;#d97757&amp;quot;
NEAR_BLACK &amp;lt;- &amp;quot;#141413&amp;quot;
TEAL &amp;lt;- &amp;quot;#00d4c8&amp;quot;
# Variables to demean
VARS_TO_DEMEAN &amp;lt;- c(&amp;quot;growth&amp;quot;, &amp;quot;ln_y_initial&amp;quot;, &amp;quot;log_s_k&amp;quot;,
                    &amp;quot;log_n_gd&amp;quot;, &amp;quot;log_hcap&amp;quot;, &amp;quot;gov_cons&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>We define the six variables that will be demeaned: the dependent variable (&lt;code>growth&lt;/code>) and all five regressors. Keeping them in a vector allows us to apply the demeaning formula programmatically rather than copying and pasting for each variable.&lt;/p>
&lt;h2 id="4-data-loading-and-panel-structure">4. Data Loading and Panel Structure&lt;/h2>
&lt;p>We load a balanced panel dataset with 150 countries observed over 8 time periods. The data comes from a Barro convergence exercise where the key question is whether poorer countries grow faster (conditional convergence). We convert &lt;code>id&lt;/code> and &lt;code>time&lt;/code> to factors so R treats them as categorical grouping variables.&lt;/p>
&lt;pre>&lt;code class="language-r">panel_data &amp;lt;- read.csv(&amp;quot;referenceMaterials/barro_convergence_panel.csv&amp;quot;)
panel_data$id &amp;lt;- factor(panel_data$id)
panel_data$time &amp;lt;- factor(panel_data$time)
cat(&amp;quot;Countries:&amp;quot;, nlevels(panel_data$id), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Time periods:&amp;quot;, nlevels(panel_data$time), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Total observations:&amp;quot;, nrow(panel_data), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Balanced panel:&amp;quot;, all(table(panel_data$id) == nlevels(panel_data$time)), &amp;quot;\n&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Countries: 150
Time periods: 8
Total observations: 1200
Balanced panel: TRUE
&lt;/code>&lt;/pre>
&lt;p>The dataset is a perfectly balanced panel of 150 countries observed across 8 time periods, yielding 1,200 total observations. A balanced panel means every country appears in every period with no missing cells &amp;mdash; the ideal setting for demonstrating the demeaning formula. The key variables are:&lt;/p>
&lt;ul>
&lt;li>&lt;code>growth&lt;/code>: annualized GDP per capita growth rate (dependent variable)&lt;/li>
&lt;li>&lt;code>ln_y_initial&lt;/code>: log of initial income (convergence term)&lt;/li>
&lt;li>&lt;code>log_s_k&lt;/code>: log of the investment share&lt;/li>
&lt;li>&lt;code>log_n_gd&lt;/code>: log of population growth plus depreciation&lt;/li>
&lt;li>&lt;code>log_hcap&lt;/code>: log of human capital&lt;/li>
&lt;li>&lt;code>gov_cons&lt;/code>: government consumption share&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="r_demeaning_twfe_panel_structure.png" alt="Panel structure: 150 countries across 8 time periods, all cells filled.">
&lt;em>Panel structure heatmap showing all 150 countries observed across 8 time periods with no missing cells.&lt;/em>&lt;/p>
&lt;p>The heatmap confirms the balanced structure. Every one of the 150 countries is observed in all 8 time periods. This balance simplifies our demeaning procedure because we can use the closed-form formula directly, without the iterative projection that unbalanced panels would require.&lt;/p>
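&lt;p>For intuition on what &amp;ldquo;iterative projection&amp;rdquo; means, here is an illustrative Python sketch (toy data, not the Barro panel, and Python rather than the tutorial&amp;rsquo;s R): when cells are missing, the closed-form formula no longer removes both fixed effects in one shot, but alternately sweeping out entity means and time means converges to the two-way within transformation.&lt;/p>

```python
import numpy as np
import pandas as pd

# Sketch of the alternating-projections idea needed for UNBALANCED panels
# (hypothetical toy data; the tutorial's balanced panel avoids this).
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "id": np.repeat(np.arange(6), 4),
    "time": np.tile(np.arange(4), 6),
    "x": rng.normal(size=24),
}).drop(index=[0, 5, 13])   # remove three cells, making the panel unbalanced

v = df["x"].to_numpy(dtype=float)
for _ in range(2000):       # alternate entity demeaning and time demeaning
    v = v - df.assign(v=v).groupby("id")["v"].transform("mean").to_numpy()
    v = v - df.assign(v=v).groupby("time")["v"].transform("mean").to_numpy()

df["x_dm"] = v              # entity and time means of x_dm are now both ~0
```

&lt;p>On a balanced panel a single sweep already reproduces the closed-form result $x_{it} - \bar{x}_{i \cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot \cdot}$, which is why the direct formula suffices for our data.&lt;/p>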
&lt;h2 id="5-twfe-estimation-with-fixest">5. TWFE Estimation with fixest&lt;/h2>
&lt;p>The &lt;code>fixest&lt;/code> package makes TWFE estimation straightforward. The formula uses &lt;code>|&lt;/code> to separate the regressors (left) from the fixed effects dimensions (right). Writing &lt;code>| id + time&lt;/code> tells &lt;code>feols()&lt;/code> to absorb both country and time fixed effects. Internally, &lt;code>fixest&lt;/code> performs an efficient iterative demeaning algorithm to remove the fixed effects before estimating the slope coefficients.&lt;/p>
&lt;pre>&lt;code class="language-r">twfe_model &amp;lt;- feols(
  growth ~ ln_y_initial + log_s_k + log_n_gd + log_hcap + gov_cons | id + time,
  data = panel_data
)
summary(twfe_model)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">OLS estimation, Dep. Var.: growth
Observations: 1,200
Fixed-effects: id: 150, time: 8
Standard-errors: Clustered (id)
Estimate Std. Error t value Pr(&amp;gt;|t|)
ln_y_initial -0.055286 0.003744 -14.765156 &amp;lt; 2.2e-16 ***
log_s_k 0.019725 0.007583 2.601311 0.010223 *
log_n_gd -0.049614 0.022168 -2.238117 0.026696 *
log_hcap 0.009081 0.014564 0.623549 0.533877
gov_cons -0.102795 0.046398 -2.215501 0.028243 *
RMSE: 0.020517 Adj. R2: 0.755103
Within R2: 0.176777
&lt;/code>&lt;/pre>
&lt;p>The TWFE model reveals strong conditional beta-convergence &amp;mdash; the hypothesis that poorer countries tend to grow faster, so income levels converge over time. The coefficient on log initial income is -0.055 (t = -14.77, p &amp;lt; 2.2e-16), meaning that a 1% higher initial income is associated with 0.055 percentage points slower subsequent growth, after controlling for the other covariates. Investment has the expected positive effect (0.020, p = 0.010), population growth has the expected negative effect (-0.050, p = 0.027), and government consumption is significantly negative (-0.103, p = 0.028). Human capital is positive but not statistically significant (0.009, p = 0.534). The model explains 75.5% of total variation (Adj. R-squared = 0.755), though only 17.7% of the within-variation (Within R-squared = 0.177) &amp;mdash; typical for panel models where fixed effects absorb most cross-country heterogeneity.&lt;/p>
&lt;p>Now let us replicate these coefficients by hand.&lt;/p>
&lt;h2 id="6-manual-demeaning-----step-by-step">6. Manual Demeaning &amp;mdash; Step by Step&lt;/h2>
&lt;p>We now walk through the demeaning procedure one step at a time. The goal is to transform every variable so that the country and time effects are removed. We will then run plain OLS on the result and verify that the coefficients match.&lt;/p>
&lt;h3 id="step-1-country-means">Step 1: Country means&lt;/h3>
&lt;p>For each country, we compute the average of each variable across all time periods. This gives us one mean per country per variable &amp;mdash; capturing persistent country characteristics like geography, institutions, or long-run income level.&lt;/p>
&lt;pre>&lt;code class="language-r">country_means &amp;lt;- panel_data |&amp;gt;
  group_by(id) |&amp;gt;
  summarise(across(all_of(VARS_TO_DEMEAN), mean), .groups = &amp;quot;drop&amp;quot;)
&lt;/code>&lt;/pre>
&lt;h3 id="step-2-time-means">Step 2: Time means&lt;/h3>
&lt;p>For each time period, we compute the average of each variable across all countries. These time means capture common shocks or trends that affect all countries in a given period &amp;mdash; for instance, a global recession or a worldwide productivity boom.&lt;/p>
&lt;pre>&lt;code class="language-r">time_means &amp;lt;- panel_data |&amp;gt;
  group_by(time) |&amp;gt;
  summarise(across(all_of(VARS_TO_DEMEAN), mean), .groups = &amp;quot;drop&amp;quot;)
&lt;/code>&lt;/pre>
&lt;h3 id="step-3-grand-mean">Step 3: Grand mean&lt;/h3>
&lt;p>The grand mean is simply the overall average of each variable across all countries and all time periods. It is a single number per variable, and we need it to correct for the double subtraction.&lt;/p>
&lt;pre>&lt;code class="language-r">grand_means &amp;lt;- colMeans(panel_data[VARS_TO_DEMEAN])
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> growth ln_y_initial log_s_k log_n_gd log_hcap gov_cons
-0.1243637 5.3643127 -1.5699117 -2.6569021 0.6645657 0.1461335
&lt;/code>&lt;/pre>
&lt;h3 id="step-4-apply-the-demeaning-formula">Step 4: Apply the demeaning formula&lt;/h3>
&lt;p>Now we bring everything together. We merge the country means and time means back into the main dataset, then apply the formula $\tilde{x}_{it} = x_{it} - \bar{x}_{i \cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot \cdot}$ programmatically to each variable.&lt;/p>
&lt;pre>&lt;code class="language-r"># Merge means
panel_dm &amp;lt;- panel_data |&amp;gt;
  left_join(
    country_means |&amp;gt; rename_with(~ paste0(.x, &amp;quot;_cmean&amp;quot;), all_of(VARS_TO_DEMEAN)),
    by = &amp;quot;id&amp;quot;
  ) |&amp;gt;
  left_join(
    time_means |&amp;gt; rename_with(~ paste0(.x, &amp;quot;_tmean&amp;quot;), all_of(VARS_TO_DEMEAN)),
    by = &amp;quot;time&amp;quot;
  )
# Apply demeaning formula
for (v in VARS_TO_DEMEAN) {
  panel_dm[[paste0(v, &amp;quot;_dm&amp;quot;)]] &amp;lt;-
    panel_dm[[v]] -
    panel_dm[[paste0(v, &amp;quot;_cmean&amp;quot;)]] -
    panel_dm[[paste0(v, &amp;quot;_tmean&amp;quot;)]] +
    grand_means[v]
}
&lt;/code>&lt;/pre>
&lt;p>Let us verify that the demeaning worked correctly. If the formula is implemented right, the mean of each demeaned variable should be approximately zero.&lt;/p>
&lt;pre>&lt;code class="language-text">Mean of demeaned variables (should be ~0):
growth_dm : -8.114169e-17
ln_y_initial_dm : 8.295170e-15
log_s_k_dm : -1.482923e-15
log_n_gd_dm : 1.599953e-15
log_hcap_dm : 5.384582e-17
gov_cons_dm : 1.832302e-16
&lt;/code>&lt;/pre>
&lt;p>All six demeaned variables have means on the order of $10^{-15}$ to $10^{-17}$ &amp;mdash; effectively zero within floating-point precision. The demeaning formula is implemented correctly: the within-variation that remains is purely the deviation from both entity-specific and time-specific patterns.&lt;/p>
&lt;h2 id="7-ols-on-the-demeaned-data">7. OLS on the Demeaned Data&lt;/h2>
&lt;p>With the demeaning complete, we run a standard OLS regression on the demeaned variables using base R&amp;rsquo;s &lt;code>lm()&lt;/code>. We deliberately use &lt;code>lm()&lt;/code> rather than &lt;code>feols()&lt;/code> to emphasize that this is plain ordinary least squares &amp;mdash; no fixed effects machinery is involved.&lt;/p>
&lt;pre>&lt;code class="language-r">manual_model &amp;lt;- lm(
  growth_dm ~ ln_y_initial_dm + log_s_k_dm + log_n_gd_dm + log_hcap_dm + gov_cons_dm,
  data = panel_dm
)
summary(manual_model)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Coefficients:
Estimate Std. Error t value Pr(&amp;gt;|t|)
(Intercept) 5.035e-16 5.938e-04 0.000 1.00000
ln_y_initial_dm -5.529e-02 3.618e-03 -15.282 &amp;lt; 2e-16 ***
log_s_k_dm 1.972e-02 6.846e-03 2.881 0.00403 **
log_n_gd_dm -4.961e-02 1.820e-02 -2.726 0.00651 **
log_hcap_dm 9.081e-03 1.370e-02 0.663 0.50751
gov_cons_dm -1.028e-01 4.411e-02 -2.331 0.01994 *
Residual standard error: 0.02057 on 1194 degrees of freedom
Multiple R-squared: 0.1768
&lt;/code>&lt;/pre>
&lt;p>Two things stand out. First, the &lt;strong>intercept is 5.03 x 10^-16&lt;/strong> &amp;mdash; effectively zero. After proper two-way demeaning, the mean of all demeaned variables is near zero, so there is nothing left for the intercept to capture. This is a good sanity check: if the grand mean correction had been omitted, the intercept would be non-zero. Second, the &lt;strong>slope coefficients&lt;/strong> look identical to those from &lt;code>feols()&lt;/code>. But &amp;ldquo;look identical&amp;rdquo; is not the same as &amp;ldquo;are identical.&amp;rdquo; The next section proves they are.&lt;/p>
&lt;h2 id="8-coefficient-comparison-the-proof">8. Coefficient Comparison: The Proof&lt;/h2>
&lt;p>We now place the coefficients from both approaches side by side and compute their difference. If the FWL theorem holds, the slope coefficients must be identical up to floating-point precision.&lt;/p>
&lt;pre>&lt;code class="language-r">twfe_coefs &amp;lt;- coef(twfe_model)
manual_coefs &amp;lt;- coef(manual_model)[-1] # drop intercept
names(manual_coefs) &amp;lt;- names(twfe_coefs)
comparison &amp;lt;- data.frame(
  feols_TWFE = round(twfe_coefs, 12),
  Manual_OLS = round(manual_coefs, 12),
  Difference = twfe_coefs - manual_coefs
)
all.equal(unname(twfe_coefs), unname(manual_coefs))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Side-by-side coefficient comparison:
variable feols_TWFE manual_OLS difference
ln_y_initial -0.055286009819 -0.055286009819 -4.163336342e-17
log_s_k 0.019724899416 0.019724899416 3.469446952e-18
log_n_gd -0.049613972524 -0.049613972524 -2.775557562e-16
log_hcap 0.009081150621 0.009081150621 3.469446952e-17
gov_cons -0.102795317426 -0.102795317426 -3.053113318e-16
Maximum absolute difference: 3.053113e-16
all.equal() test: TRUE
&lt;/code>&lt;/pre>
&lt;p>This is the central result of the tutorial. All five slope coefficients are identical to at least 12 significant digits. The largest difference is 3.05 x 10^-16 &amp;mdash; on the order of IEEE 754 double-precision machine epsilon (~2.2 x 10^-16). R&amp;rsquo;s &lt;code>all.equal()&lt;/code> function confirms equality within its default tolerance. This is not an approximation: it is an exact algebraic identity guaranteed by the Frisch-Waugh-Lovell theorem.&lt;/p>
&lt;p>&lt;img src="r_demeaning_twfe_coef_comparison.png" alt="TWFE and manual demeaning coefficients overlap perfectly for all five variables.">
&lt;em>Coefficient comparison: feols TWFE (blue circles) and manual demeaning OLS (orange triangles) occupy the exact same positions.&lt;/em>&lt;/p>
&lt;p>The dot plot makes the equivalence visually concrete. For each of the five covariates, the steel blue circle (feols TWFE) and warm orange triangle (manual demeaning OLS) occupy the exact same position. Government consumption has the largest coefficient in magnitude at -0.103, while the convergence parameter (log initial income) sits at -0.055. The dashed zero line helps distinguish positive from negative effects.&lt;/p>
&lt;h2 id="9-visualizing-what-demeaning-does">9. Visualizing What Demeaning Does&lt;/h2>
&lt;p>The coefficient equivalence is proven, but what does demeaning &lt;em>look like&lt;/em>? How does it change the data? The following visualizations build intuition about the transformation.&lt;/p>
&lt;p>&lt;img src="r_demeaning_twfe_scatter_before_after.png" alt="Raw data shows wide cross-country spread; demeaned data collapses to a narrow range around zero.">
&lt;em>Before vs after two-way demeaning: the wide cross-country spread (left) collapses to a narrow range around zero (right).&lt;/em>&lt;/p>
&lt;p>The faceted scatter plot tells the story. In the left panel (raw data), 10 countries are plotted with log initial income on the x-axis and growth on the y-axis. Each country&amp;rsquo;s observations form a distinct cluster at different income levels &amp;mdash; the x-axis spans roughly 3 to 9. In the right panel (after demeaning), the same data is compressed to approximately -0.5 to 0.3 around zero. The between-country income differences and common time trends have been stripped away, leaving only the &lt;strong>within-variation&lt;/strong> &amp;mdash; the deviations from each country&amp;rsquo;s own average and each period&amp;rsquo;s common trend. This is the variation that identifies the TWFE coefficient.&lt;/p>
&lt;h3 id="decomposing-the-formula-for-one-country">Decomposing the formula for one country&lt;/h3>
&lt;p>To see exactly how the formula works, let us trace each component for Country 1&amp;rsquo;s growth rate across all 8 periods.&lt;/p>
&lt;p>&lt;img src="r_demeaning_twfe_decomposition.png" alt="Observed values, country mean, time means, grand mean, and the demeaned residual for Country 1.">
&lt;em>Demeaning decomposition for Country 1: observed growth (blue), country mean (orange dashed), time means (teal), grand mean (gray), and the demeaned residual (black).&lt;/em>&lt;/p>
&lt;p>The decomposition makes the formula concrete. The observed growth values (blue line) decline from about -0.18 to -0.07. The country mean (orange dashed line) is a flat horizontal at -0.127 &amp;mdash; this is $\bar{x}_{i \cdot}$. The time means (teal dot-dash line) capture the common cross-country trend, declining from -0.189 to -0.076 &amp;mdash; this is $\bar{x}_{\cdot t}$. The grand mean (gray dotted) sits at -0.124 &amp;mdash; this is $\bar{x}_{\cdot \cdot}$. The demeaned series (black line) is the residual: $\tilde{x}_{it} = x_{it} - \bar{x}_{i \cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot \cdot}$. It fluctuates around zero, capturing only the within-country, within-period deviations that TWFE uses for identification.&lt;/p>
&lt;h2 id="10-a-caveat-standard-errors-differ">10. A Caveat: Standard Errors Differ&lt;/h2>
&lt;p>While the coefficients are identical, the &lt;strong>standard errors&lt;/strong> from &lt;code>lm()&lt;/code> on demeaned data are wrong. This is a critical practical point that many textbooks gloss over.&lt;/p>
&lt;pre>&lt;code class="language-r">se_naive &amp;lt;- summary(manual_model)$coefficients[-1, &amp;quot;Std. Error&amp;quot;]
se_feols_iid &amp;lt;- se(twfe_model, se = &amp;quot;iid&amp;quot;)
se_feols_cl &amp;lt;- se(twfe_model) # default: clustered by first FE
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Standard error comparison:
variable se_naive_lm se_feols_iid se_feols_cluster
ln_y_initial 0.00361766 0.00388000 0.00374436
log_s_k 0.00684559 0.00734199 0.00758268
log_n_gd 0.01820117 0.01952104 0.02216773
log_hcap 0.01369872 0.01469209 0.01456365
gov_cons 0.04410809 0.04730660 0.04639822
&lt;/code>&lt;/pre>
&lt;p>Why do they differ? The &lt;code>lm()&lt;/code> function does not know that 157 degrees of freedom were consumed by estimating 150 country effects and 8 time effects (minus 1 for normalization). It uses $df = N \times T - K - 1 = 1{,}194$ residual degrees of freedom (five slopes plus an intercept, as the &lt;code>lm()&lt;/code> output above reports) when the correct value is $df = N \times T - (N + T - 1) - K = 1{,}038$. This makes naive SEs systematically too small &amp;mdash; they understate uncertainty by 7&amp;ndash;22% depending on the variable.&lt;/p>
&lt;p>&lt;img src="r_demeaning_twfe_se_comparison.png" alt="Naive lm() SEs are systematically smaller than both feols variants.">
&lt;em>Standard error comparison: naive lm() (gray) systematically underestimates uncertainty compared to feols IID (orange) and clustered (blue).&lt;/em>&lt;/p>
&lt;p>The grouped bar chart makes the pattern clear. For every variable, the gray bars (naive &lt;code>lm()&lt;/code>) are shorter than the orange (feols IID) and blue (feols clustered) bars. The gap is most visible for &lt;code>log(n+g+d)&lt;/code>, where the naive SE is 0.0182 versus 0.0222 for clustered &amp;mdash; a 22% understatement. The feols IID SEs correct for the degrees-of-freedom adjustment, while the clustered SEs additionally account for within-entity serial correlation. The practical lesson: &lt;strong>always use a dedicated panel estimator for inference&lt;/strong>, even though &lt;code>lm()&lt;/code> on demeaned data gives the correct point estimates.&lt;/p>
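&lt;p>The degrees-of-freedom arithmetic can be checked directly. Here is a short illustrative Python calculation (using the panel dimensions and the reported &lt;code>ln_y_initial&lt;/code> standard error from above; not part of the tutorial&amp;rsquo;s R code):&lt;/p>

```python
import math

# Degrees-of-freedom accounting for naive lm() vs feols IID standard errors,
# using the tutorial's panel dimensions (illustrative arithmetic).
N, T, K = 150, 8, 5             # countries, periods, slope coefficients
nobs = N * T                    # 1,200 observations

df_naive = nobs - K - 1         # lm() on demeaned data: 5 slopes + intercept
df_fe = nobs - (N + T - 1) - K  # absorbed country and time effects counted

inflation = math.sqrt(df_naive / df_fe)
print(df_naive, df_fe)          # 1194 1038
print(round(inflation, 4))      # 1.0725

# Scaling the naive SE for ln_y_initial reproduces the feols IID SE:
print(round(0.00361766 * inflation, 5))  # 0.00388
```

&lt;p>Multiplying the naive standard error by $\sqrt{df_{\text{naive}} / df_{\text{FE}}} \approx 1.07$ reproduces the &lt;code>feols&lt;/code> IID standard error to the reported precision: the gap between the two IID variants is the degrees-of-freedom correction.&lt;/p>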
&lt;h2 id="11-discussion">11. Discussion&lt;/h2>
&lt;p>This tutorial has demonstrated a fundamental equivalence in econometrics. TWFE is not a special estimator &amp;mdash; it is ordinary least squares applied to data that has been demeaned by entity and time. The &lt;code>fixest&lt;/code> package automates this process efficiently, but the underlying operation is straightforward subtraction. The FWL theorem guarantees the equivalence mathematically, and our empirical verification confirms it to machine precision.&lt;/p>
&lt;p>Three practical insights emerge:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Demeaning reveals what FE can and cannot identify.&lt;/strong> Any variable that does not vary within a country over time (like geography or colonial history) has a country mean equal to itself. After demeaning, such a variable becomes zero everywhere and drops out of the regression. This is why fixed effects models cannot estimate the effect of time-invariant characteristics.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The grand mean correction is not optional.&lt;/strong> Omitting the $+ \bar{x}_{\cdot \cdot}$ term in the demeaning formula would double-subtract the overall level, producing a non-zero intercept and subtly wrong demeaned values. The correction is algebraically necessary for the FWL equivalence to hold.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Correct coefficients do not mean correct inference.&lt;/strong> The &lt;code>lm()&lt;/code> standard errors are too small because they ignore the degrees of freedom consumed by the absorbed fixed effects. In applied work, this means artificially narrow confidence intervals and inflated t-statistics. Always use &lt;code>feols()&lt;/code> or an equivalent panel estimator for standard errors and hypothesis testing.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="12-summary-and-next-steps">12. Summary and Next Steps&lt;/h2>
&lt;p>&lt;strong>Key takeaways:&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>TWFE estimation via &lt;code>feols()&lt;/code> and OLS on manually demeaned data produce identical coefficients &amp;mdash; the maximum difference across 5 coefficients is $3.05 \times 10^{-16}$, confirming the FWL theorem.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The demeaning formula subtracts entity means and time means, then adds back the grand mean to correct for double subtraction. After demeaning, all variable means are effectively zero (on the order of $10^{-15}$).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The Within R-squared of 0.177 versus the overall Adjusted R-squared of 0.755 shows that most variation in growth is absorbed by the fixed effects, not by the regressors.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Naive &lt;code>lm()&lt;/code> standard errors understate uncertainty by 7&amp;ndash;22% because they ignore the 157 degrees of freedom consumed by the fixed effects. Always use a dedicated panel estimator for inference.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Limitations:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>The dataset is simulated, so coefficient values reflect the data-generating process rather than real-world economic dynamics.&lt;/li>
&lt;li>The tutorial assumes a balanced panel. With unbalanced panels, the simple closed-form demeaning still works algebraically, but &lt;code>fixest&lt;/code> uses a more efficient iterative algorithm.&lt;/li>
&lt;li>The SE comparison covers only IID and entity-clustered SEs. Other corrections (heteroskedasticity-robust, Driscoll-Kraay for cross-sectional dependence) may be relevant in applied work.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Next steps:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Apply the demeaning logic to understand why specific variables drop out of your own FE models.&lt;/li>
&lt;li>Explore heterogeneous treatment effects with interaction-weighted TWFE estimators.&lt;/li>
&lt;li>Read Cunningham (2021), &lt;em>Causal Inference: The Mixtape&lt;/em>, Chapter 9, for the connection between TWFE demeaning and difference-in-differences designs.&lt;/li>
&lt;/ul>
&lt;h2 id="13-exercises">13. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Omit the grand mean correction.&lt;/strong> Modify the demeaning formula to skip the $+ \bar{x}_{\cdot \cdot}$ term. Run &lt;code>lm()&lt;/code> on the incorrectly demeaned data. What happens to the intercept? Do the slope coefficients still match the TWFE estimates? Why or why not?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>One-way demeaning.&lt;/strong> Repeat the exercise using only entity demeaning (subtract country means, skip time means). Compare the coefficients to a one-way FE model (&lt;code>feols(growth ~ ... | id)&lt;/code>). Verify the equivalence and examine how the coefficients change compared to the two-way specification.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Visualize a different variable.&lt;/strong> Recreate the demeaning decomposition plot (Section 9) for &lt;code>log_s_k&lt;/code> (investment share) instead of &lt;code>growth&lt;/code>. Does the country mean, time mean, or within-variation dominate for this variable? What does this tell you about the source of variation that identifies its coefficient?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="14-references">14. References&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>Frisch, R. and Waugh, F.V. (1933). &amp;ldquo;Partial Time Regressions as Compared with Individual Trends.&amp;rdquo; &lt;em>Econometrica&lt;/em>, 1(4), 387&amp;ndash;401.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Lovell, M.C. (1963). &amp;ldquo;Seasonal Adjustment of Economic Time Series and Multiple Regression Analysis.&amp;rdquo; &lt;em>Journal of the American Statistical Association&lt;/em>, 58(304), 993&amp;ndash;1010.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Berge, L. (2018). &lt;em>fixest: Fast Fixed-Effects Estimations&lt;/em>. R package. &lt;a href="https://cran.r-project.org/package=fixest" target="_blank" rel="noopener">CRAN&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Cunningham, S. (2021). &lt;em>Causal Inference: The Mixtape&lt;/em>. Yale University Press. &lt;a href="https://mixtape.scunning.com/" target="_blank" rel="noopener">Online edition&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Barro, R.J. and Sala-i-Martin, X. (2004). &lt;em>Economic Growth&lt;/em>. 2nd edition. MIT Press.&lt;/p>
&lt;/li>
&lt;/ol></description></item><item><title>Standard Errors in Panel Data: A Beginner's Guide in Python</title><link>https://carlos-mendez.org/post/python_panel_ses/</link><pubDate>Tue, 31 Mar 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_panel_ses/</guid><description>&lt;p>&lt;a href="https://colab.research.google.com/github/cmg777/starter-academic-v501/blob/master/content/post/python_panel_ses/notebook.ipynb" target="_blank">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/p>
&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Imagine you run a regression and find that R&amp;amp;D spending significantly boosts firm performance, with a t-statistic of 30. Sounds like a rock-solid result. But what if that impressive t-statistic is an illusion &amp;mdash; a consequence of using the wrong formula for your standard errors? In panel data, where the same firms are observed year after year, this is not a hypothetical worry. The repeated observations within each firm create &lt;em>correlation patterns&lt;/em> that violate the assumptions behind ordinary standard errors, and ignoring these patterns can make your estimates look far more precise than they actually are.&lt;/p>
&lt;p>Standard errors are the bridge between a point estimate and a statistical conclusion. If that bridge is built on the wrong assumptions, the conclusion collapses. In a classic cross-sectional regression with independent observations, conventional standard errors work well. But panel data &amp;mdash; where firm 1 in 2015 is related to firm 1 in 2016 &amp;mdash; breaks the independence assumption. A firm that performs well one year tends to perform well the next. Errors within the same firm are correlated, and this &lt;em>within-cluster correlation&lt;/em> means conventional standard errors understate the true uncertainty surrounding your estimates.&lt;/p>
&lt;p>The solution is to use standard error estimators that account for the structure of the data. In this tutorial, we build a simulated panel of 100 firms over 10 years with a &lt;em>known true effect&lt;/em>, then systematically compare six approaches to standard error estimation: conventional, White (heteroskedasticity-robust), entity-clustered, time-clustered, two-way clustered, and Driscoll-Kraay. Along the way, we discover two critical lessons. First, no standard error estimator can rescue a biased estimator &amp;mdash; fixed effects are needed to remove omitted variable bias. Second, even after fixing bias, the &lt;em>choice&lt;/em> of standard error estimator determines whether our confidence intervals have the coverage they promise. The tutorial is inspired by and builds upon the excellent reference by &lt;a href="https://vincent.codes.finance/posts/panel-ols-standard-errors/" target="_blank" rel="noopener">Gregoire (2024)&lt;/a>, while using original simulated data and explanations.&lt;/p>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand why within-cluster correlation invalidates conventional standard errors in panel data&lt;/li>
&lt;li>Implement six standard error estimators using Python&amp;rsquo;s &lt;code>linearmodels&lt;/code> package&lt;/li>
&lt;li>Compare how different SE choices affect t-statistics and inference for the same regression&lt;/li>
&lt;li>Assess empirical rejection rates via Monte Carlo simulation to identify which SEs correctly control size &amp;mdash; that is, reject a true null hypothesis no more than 5% of the time&lt;/li>
&lt;li>Distinguish between the bias problem (which SEs cannot fix) and the inference problem (which SEs can fix)&lt;/li>
&lt;/ul>
&lt;h2 id="2-setup-and-imports">2. Setup and imports&lt;/h2>
&lt;p>Before running the analysis, install the required package if needed:&lt;/p>
&lt;pre>&lt;code class="language-bash">pip install linearmodels
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>linearmodels&lt;/code> library, developed by &lt;a href="https://bashtage.github.io/linearmodels/" target="_blank" rel="noopener">Kevin Sheppard&lt;/a>, extends &lt;code>statsmodels&lt;/code> with specialized panel data estimators. It provides &lt;a href="https://bashtage.github.io/linearmodels/panel/panel/linearmodels.panel.model.PanelOLS.html" target="_blank" rel="noopener">PanelOLS&lt;/a> for fixed effects regressions with flexible covariance options. The &lt;code>from_formula()&lt;/code> method accepts R-style formulas where &lt;code>EntityEffects&lt;/code> and &lt;code>TimeEffects&lt;/code> keywords absorb group-level fixed effects.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
from linearmodels.panel import PanelOLS
# Reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
# Site color palette
STEEL_BLUE = &amp;quot;#6a9bcc&amp;quot;
WARM_ORANGE = &amp;quot;#d97757&amp;quot;
NEAR_BLACK = &amp;quot;#141413&amp;quot;
TEAL = &amp;quot;#00d4c8&amp;quot;
&lt;/code>&lt;/pre>
&lt;details>
&lt;summary>&lt;strong>Dark theme figure styling&lt;/strong> (click to expand)&lt;/summary>
&lt;pre>&lt;code class="language-python"># Dark theme palette (consistent with site navbar/dark sections)
DARK_NAVY = &amp;quot;#0f1729&amp;quot;
GRID_LINE = &amp;quot;#1f2b5e&amp;quot;
LIGHT_TEXT = &amp;quot;#c8d0e0&amp;quot;
WHITE_TEXT = &amp;quot;#e8ecf2&amp;quot;
# Plot defaults — minimal, spine-free, dark background
plt.rcParams.update({
    &amp;quot;figure.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.edgecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.linewidth&amp;quot;: 0,
    &amp;quot;axes.labelcolor&amp;quot;: LIGHT_TEXT,
    &amp;quot;axes.titlecolor&amp;quot;: WHITE_TEXT,
    &amp;quot;axes.spines.top&amp;quot;: False,
    &amp;quot;axes.spines.right&amp;quot;: False,
    &amp;quot;axes.spines.left&amp;quot;: False,
    &amp;quot;axes.spines.bottom&amp;quot;: False,
    &amp;quot;axes.grid&amp;quot;: True,
    &amp;quot;grid.color&amp;quot;: GRID_LINE,
    &amp;quot;grid.linewidth&amp;quot;: 0.6,
    &amp;quot;grid.alpha&amp;quot;: 0.8,
    &amp;quot;xtick.color&amp;quot;: LIGHT_TEXT,
    &amp;quot;ytick.color&amp;quot;: LIGHT_TEXT,
    &amp;quot;xtick.major.size&amp;quot;: 0,
    &amp;quot;ytick.major.size&amp;quot;: 0,
    &amp;quot;text.color&amp;quot;: WHITE_TEXT,
    &amp;quot;font.size&amp;quot;: 12,
    &amp;quot;legend.frameon&amp;quot;: False,
    &amp;quot;legend.fontsize&amp;quot;: 11,
    &amp;quot;legend.labelcolor&amp;quot;: LIGHT_TEXT,
    &amp;quot;figure.edgecolor&amp;quot;: DARK_NAVY,
    &amp;quot;savefig.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;savefig.edgecolor&amp;quot;: DARK_NAVY,
})
&lt;/code>&lt;/pre>
&lt;/details>
&lt;h2 id="3-the-data-generating-process">3. The data generating process&lt;/h2>
&lt;h3 id="31-why-simulated-data">3.1 Why simulated data?&lt;/h3>
&lt;p>When studying standard errors, simulated data has a decisive advantage over real data: we &lt;em>know the true answer&lt;/em>. If the true effect of R&amp;amp;D on performance is exactly 0.5, we can check whether each standard error estimator produces confidence intervals that contain 0.5 roughly 95% of the time. With real data, we never know the truth, so we cannot directly evaluate whether our SEs are working correctly.&lt;/p>
&lt;p>Think of it like testing a thermometer. You would not test it in unknown conditions &amp;mdash; you would dip it in ice water (0 degrees C) and boiling water (100 degrees C) to see if it reads correctly. Simulated data serves as our &amp;ldquo;known temperature.&amp;rdquo;&lt;/p>
&lt;h3 id="32-the-dgp">3.2 The DGP&lt;/h3>
&lt;p>Our data generating process creates a panel of 100 firms observed over 10 years. The key feature is that &lt;em>firm ability&lt;/em> &amp;mdash; an unobserved characteristic that differs across firms but stays constant over time &amp;mdash; affects both R&amp;amp;D intensity and firm performance. This creates omitted variable bias in pooled regressions, exactly the scenario that motivates fixed effects.&lt;/p>
&lt;p>The true model is:&lt;/p>
&lt;p>$$y_{it} = 2.0 + 0.5 \cdot x_{it} + \mu_i + \lambda_t + \varepsilon_{it}$$&lt;/p>
&lt;p>In words, firm performance ($y$) equals a constant (2.0) plus the true causal effect of R&amp;amp;D intensity ($x$) times 0.5, plus a firm-specific effect ($\mu_i$), a time-specific effect ($\lambda_t$), and an idiosyncratic error ($\varepsilon_{it}$). The firm effect $\mu_i$ is correlated with $x_{it}$ &amp;mdash; more capable firms invest more in R&amp;amp;D &amp;mdash; which means pooled OLS will overestimate the true effect. The errors follow an AR(1) &amp;mdash; or &lt;em>first-order autoregressive&lt;/em> &amp;mdash; process within each firm, meaning each year&amp;rsquo;s error depends on the previous year&amp;rsquo;s error (with autocorrelation coefficient $\rho = 0.5$). This creates the within-cluster serial correlation that makes standard error choice critical.&lt;/p>
&lt;p>In code, $y$ corresponds to our &lt;code>y&lt;/code> column, $x$ is &lt;code>x&lt;/code> (R&amp;amp;D intensity), and $\mu_i$ is the unobserved firm fixed effect that we will absorb with &lt;code>EntityEffects&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python">def simulate_panel(n_firms=100, n_years=10, seed=42):
&amp;quot;&amp;quot;&amp;quot;Simulate a panel dataset with firm and time effects.
True DGP:
y_it = 2.0 + 0.5 * x_it + mu_i + lambda_t + eps_it
Where mu_i is correlated with x_it (firm ability drives both
R&amp;amp;D and performance), and eps_it has AR(1) serial correlation
within firms (rho = 0.5).
The TRUE causal effect of x on y is beta = 0.5.
&amp;quot;&amp;quot;&amp;quot;
rng = np.random.default_rng(seed)
firms = np.repeat(np.arange(1, n_firms + 1), n_years)
years = np.tile(np.arange(2010, 2010 + n_years), n_firms)
# Firm-level unobserved heterogeneity (ability)
firm_ability = rng.normal(0, 2, n_firms)
mu = np.repeat(firm_ability, n_years)
# Time effects (business cycle)
time_shocks = rng.normal(0, 0.5, n_years)
lam = np.tile(time_shocks, n_firms)
# Treatment: R&amp;amp;D intensity (correlated with firm ability)
x = 3.0 + 0.8 * mu + rng.normal(0, 1.5, n_firms * n_years)
# Idiosyncratic errors with within-firm AR(1) serial correlation
eps = np.zeros(n_firms * n_years)
rho_ar = 0.5
for i in range(n_firms):
start = i * n_years
eps[start] = rng.normal(0, 1.5)
for t in range(1, n_years):
eps[start + t] = rho_ar * eps[start + t - 1] + rng.normal(0, 1.5)
# True model
y = 2.0 + 0.5 * x + mu + lam + eps
return pd.DataFrame({&amp;quot;firm&amp;quot;: firms, &amp;quot;year&amp;quot;: years, &amp;quot;y&amp;quot;: y, &amp;quot;x&amp;quot;: x})
df = simulate_panel(n_firms=100, n_years=10, seed=42)
print(f&amp;quot;Dataset shape: {df.shape}&amp;quot;)
print(f&amp;quot;Number of firms: {df['firm'].nunique()}&amp;quot;)
print(f&amp;quot;Number of years: {df['year'].nunique()}&amp;quot;)
print(df.head())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Dataset shape: (1000, 4)
Number of firms: 100
Number of years: 10
firm year y x
1 2010 6.721042 4.139183
1 2011 5.889161 3.844151
1 2012 2.355109 2.596322
1 2013 2.589589 1.318461
1 2014 3.569626 3.595742
&lt;/code>&lt;/pre>
&lt;p>The simulated panel contains 1,000 observations &amp;mdash; 100 firms, each observed over 10 years from 2010 to 2019. Firm 1&amp;rsquo;s performance (&lt;code>y&lt;/code>) ranges from about 2.4 to 6.7 across the decade, and its R&amp;amp;D intensity (&lt;code>x&lt;/code>) varies between 1.3 and 4.1. These year-to-year fluctuations within a single firm represent the &lt;em>within-firm variation&lt;/em> that fixed effects regressions exploit, while the systematic differences across firms (some consistently high, others consistently low) represent the &lt;em>between-firm variation&lt;/em> that firm fixed effects absorb.&lt;/p>
&lt;pre>&lt;code class="language-python">print(df.describe().round(4))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> firm year y x
count 1000.0000 1000.0000 1000.0000 1000.0000
mean 50.5000 2014.5000 2.9699 2.8984
std 28.8805 2.8737 2.9686 1.9783
min 1.0000 2010.0000 -7.0880 -3.0834
25% 25.7500 2012.0000 0.9376 1.5721
50% 50.5000 2014.5000 2.9351 2.9669
75% 75.2500 2017.0000 5.0383 4.1769
max 100.0000 2019.0000 13.5170 9.1612
&lt;/code>&lt;/pre>
&lt;p>Firm performance (&lt;code>y&lt;/code>) averages 2.97 with a standard deviation of 2.97, spanning from -7.09 to 13.52. R&amp;amp;D intensity (&lt;code>x&lt;/code>) averages 2.90 with a standard deviation of 1.98. The wide ranges in both variables reflect the combination of genuine within-firm fluctuations and the large cross-firm differences injected by firm fixed effects. Next, we decompose this total variation to understand how much comes from differences &lt;em>between&lt;/em> firms versus changes &lt;em>within&lt;/em> firms over time.&lt;/p>
&lt;h2 id="4-exploring-the-panel-structure">4. Exploring the panel structure&lt;/h2>
&lt;p>Before estimating any model, we need to understand the structure of our panel data. A key diagnostic is the &lt;em>decomposition of variance&lt;/em> into between-firm and within-firm components. This tells us where the action is &amp;mdash; and why pooled OLS can go wrong.&lt;/p>
&lt;h3 id="41-between-vs-within-variation">4.1 Between vs. within variation&lt;/h3>
&lt;p>Think of variation in firm performance like variation in student test scores within a school. Some variation comes from differences &lt;em>between&lt;/em> students (some students are consistently stronger than others) and some comes from variation &lt;em>within&lt;/em> students over time (a student scores differently on different exams). In panel data, the &amp;ldquo;between&amp;rdquo; component captures persistent firm-level differences, while the &amp;ldquo;within&amp;rdquo; component captures how each firm deviates from its own average over time.&lt;/p>
&lt;pre>&lt;code class="language-python"># Panel balance check
obs_per_firm = df.groupby(&amp;quot;firm&amp;quot;).size()
print(f&amp;quot;Observations per firm: min={obs_per_firm.min()}, &amp;quot;
      f&amp;quot;max={obs_per_firm.max()}, mean={obs_per_firm.mean():.1f}&amp;quot;)
print(f&amp;quot;Panel is {'balanced' if obs_per_firm.nunique() == 1 else 'unbalanced'}&amp;quot;)
# Within vs between variation
overall_std_y = df[&amp;quot;y&amp;quot;].std()
between_std_y = df.groupby(&amp;quot;firm&amp;quot;)[&amp;quot;y&amp;quot;].mean().std()
within_std_y = df.groupby(&amp;quot;firm&amp;quot;)[&amp;quot;y&amp;quot;].transform(lambda g: g - g.mean()).std()
print(f&amp;quot;\nVariation in y:&amp;quot;)
print(f&amp;quot; Overall std: {overall_std_y:.4f}&amp;quot;)
print(f&amp;quot; Between std: {between_std_y:.4f}&amp;quot;)
print(f&amp;quot; Within std: {within_std_y:.4f}&amp;quot;)
overall_std_x = df[&amp;quot;x&amp;quot;].std()
between_std_x = df.groupby(&amp;quot;firm&amp;quot;)[&amp;quot;x&amp;quot;].mean().std()
within_std_x = df.groupby(&amp;quot;firm&amp;quot;)[&amp;quot;x&amp;quot;].transform(lambda g: g - g.mean()).std()
print(f&amp;quot;\nVariation in x:&amp;quot;)
print(f&amp;quot; Overall std: {overall_std_x:.4f}&amp;quot;)
print(f&amp;quot; Between std: {between_std_x:.4f}&amp;quot;)
print(f&amp;quot; Within std: {within_std_x:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Observations per firm: min=10, max=10, mean=10.0
Panel is balanced
Variation in y:
Overall std: 2.9686
Between std: 2.4645
Within std: 1.6715
Variation in x:
Overall std: 1.9783
Between std: 1.3751
Within std: 1.4282
&lt;/code>&lt;/pre>
&lt;p>The decomposition reveals an important pattern. For firm performance (&lt;code>y&lt;/code>), the between-firm standard deviation (2.46) is substantially larger than the within-firm standard deviation (1.67). This means that &lt;em>persistent differences across firms&lt;/em> account for more of the total variation than year-to-year fluctuations within individual firms. For R&amp;amp;D intensity (&lt;code>x&lt;/code>), the split is more even: between-firm variation (1.38) is comparable in size to within-firm variation (1.43). Since firm fixed effects absorb all between-firm variation, this tells us that fixed effects will have a large impact on the regression &amp;mdash; they are removing a dominant source of variation that is confounded with the treatment.&lt;/p>
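&lt;p>As a quick consistency check, the between and within components should add up to the overall variance (the law of total variance). This holds exactly in a balanced panel when all variances use the population convention (&lt;code>ddof=0&lt;/code>); pandas&amp;rsquo; default &lt;code>ddof=1&lt;/code> introduces small discrepancies. A minimal sketch on a toy panel (illustrative data, not the tutorial&amp;rsquo;s simulation):&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd

# Toy balanced panel: 4 firms x 5 years
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'firm': np.repeat(np.arange(4), 5),
    'y': np.repeat(rng.normal(0, 2, 4), 5) + rng.normal(0, 1, 20),
})

# Population-variance (ddof=0) decomposition: overall = between + within
overall_var = toy['y'].var(ddof=0)
firm_means = toy.groupby('firm')['y'].transform('mean')
between_var = firm_means.var(ddof=0)              # variance of firm means
within_var = (toy['y'] - firm_means).var(ddof=0)  # variance around own firm mean

print(np.isclose(overall_var, between_var + within_var))  # True
&lt;/code>&lt;/pre>
&lt;p>The cross term vanishes because within-firm deviations average to zero inside each firm, so the two components partition the overall variance exactly.&lt;/p>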
&lt;h3 id="42-within-firm-correlations">4.2 Within-firm correlations&lt;/h3>
&lt;pre>&lt;code class="language-python">within_corr = (
df.groupby(&amp;quot;firm&amp;quot;)
.apply(lambda g: g[&amp;quot;y&amp;quot;].corr(g[&amp;quot;x&amp;quot;]), include_groups=False)
)
print(f&amp;quot;Within-firm correlation (y, x):&amp;quot;)
print(f&amp;quot; Mean: {within_corr.mean():.4f}&amp;quot;)
print(f&amp;quot; Median: {within_corr.median():.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Within-firm correlation (y, x):
Mean: 0.4100
Median: 0.4624
&lt;/code>&lt;/pre>
&lt;p>The average within-firm correlation between R&amp;amp;D and performance is 0.41, with a median of 0.46. This moderate positive correlation is what we expect given the true effect ($\beta = 0.5$): years in which a firm invests more in R&amp;amp;D tend to be years in which that firm performs better. The correlation sits well below 1 because the AR(1) errors add noise; with $\beta = 0.5$ and the within-firm standard deviations from Section 4.1, the implied correlation is roughly $0.5 \times 1.43 / 1.67 \approx 0.43$, close to what we observe.&lt;/p>
&lt;pre>&lt;code class="language-python"># Figure: Panel structure and within-firm correlations
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
fig.patch.set_linewidth(0)
# Left: x vs y colored by firm (sample 10 firms)
rng_plot = np.random.default_rng(99)
sample_firms = sorted(rng_plot.choice(df[&amp;quot;firm&amp;quot;].unique(), 10, replace=False))
colors_sample = [STEEL_BLUE, WARM_ORANGE, TEAL, &amp;quot;#e8956a&amp;quot;, &amp;quot;#c4623d&amp;quot;,
&amp;quot;#8fbfcc&amp;quot;, &amp;quot;#e0a57a&amp;quot;, &amp;quot;#5cc8c0&amp;quot;, &amp;quot;#b0c4de&amp;quot;, &amp;quot;#f0c8a0&amp;quot;]
for i, fid in enumerate(sample_firms):
sub = df[df[&amp;quot;firm&amp;quot;] == fid]
axes[0].scatter(sub[&amp;quot;x&amp;quot;], sub[&amp;quot;y&amp;quot;], color=colors_sample[i % len(colors_sample)],
alpha=0.7, s=30, edgecolors=DARK_NAVY, linewidths=0.5)
axes[0].set_xlabel(&amp;quot;R&amp;amp;D intensity (x)&amp;quot;)
axes[0].set_ylabel(&amp;quot;Firm performance (y)&amp;quot;)
axes[0].set_title(&amp;quot;10 sampled firms: x vs y&amp;quot;, fontweight=&amp;quot;bold&amp;quot;)
# Right: within-firm correlation distribution
axes[1].hist(within_corr, bins=20, color=STEEL_BLUE, edgecolor=DARK_NAVY, alpha=0.85)
axes[1].axvline(within_corr.mean(), color=WARM_ORANGE, linewidth=2,
linestyle=&amp;quot;--&amp;quot;, label=f&amp;quot;Mean = {within_corr.mean():.2f}&amp;quot;)
axes[1].set_xlabel(&amp;quot;Within-firm correlation (y, x)&amp;quot;)
axes[1].set_ylabel(&amp;quot;Number of firms&amp;quot;)
axes[1].set_title(&amp;quot;Distribution of within-firm correlations&amp;quot;, fontweight=&amp;quot;bold&amp;quot;)
axes[1].legend()
plt.tight_layout()
plt.savefig(&amp;quot;panel_ses_eda.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="panel_ses_eda.png" alt="Panel structure scatter plots showing 10 sampled firms and distribution of within-firm correlations.">&lt;/p>
&lt;p>The left panel shows how the 10 sampled firms form distinct &lt;em>clusters&lt;/em> in the scatter plot &amp;mdash; each firm occupies a different region of the x-y space. This visual clustering is the between-firm variation that fixed effects remove. The right panel shows that most firms have a positive within-firm correlation between R&amp;amp;D and performance, with the distribution centered around 0.41. A few firms have near-zero or negative correlations, reflecting the random noise in the simulation. These within-firm relationships are what fixed effects regressions actually estimate.&lt;/p>
&lt;p>Now that we understand the panel structure, we are ready to set up the MultiIndex that &lt;code>linearmodels&lt;/code> requires and begin estimating models.&lt;/p>
&lt;h2 id="5-setting-up-the-multiindex">5. Setting up the MultiIndex&lt;/h2>
&lt;p>The &lt;code>linearmodels&lt;/code> package requires panel data to be stored in a pandas DataFrame with a &lt;a href="https://pandas.pydata.org/docs/user_guide/advanced.html" target="_blank" rel="noopener">MultiIndex&lt;/a>: the entity (firm) as the first level and the time period (year) as the second. This structure tells the package which observations belong to the same firm and how they are ordered in time &amp;mdash; information it needs to compute clustered standard errors and absorb fixed effects.&lt;/p>
&lt;pre>&lt;code class="language-python">df_panel = df.set_index([&amp;quot;firm&amp;quot;, &amp;quot;year&amp;quot;])
print(f&amp;quot;MultiIndex levels: {df_panel.index.names}&amp;quot;)
print(df_panel.head(3))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">MultiIndex levels: ['firm', 'year']
y x
firm year
1 2010 6.721042 4.139183
2011 5.889161 3.844151
2012 2.355109 2.596322
&lt;/code>&lt;/pre>
&lt;p>The MultiIndex now encodes the panel structure directly in the DataFrame. Firm 1&amp;rsquo;s three displayed observations span 2010&amp;ndash;2012, and &lt;code>linearmodels&lt;/code> uses this ordering to know which observations to group when computing entity-clustered standard errors. With the data properly indexed, we can now estimate our first model.&lt;/p>
&lt;h2 id="6-pooled-ols-----the-naive-baseline">6. Pooled OLS &amp;mdash; the naive baseline&lt;/h2>
&lt;h3 id="61-conventional-standard-errors">6.1 Conventional standard errors&lt;/h3>
&lt;p>We begin with the simplest possible approach: pooled OLS with conventional standard errors. This estimator ignores the panel structure entirely &amp;mdash; it treats all 1,000 observations as if they were independent draws, like 1,000 different firms each observed once. We use &lt;a href="https://bashtage.github.io/linearmodels/panel/panel/linearmodels.panel.model.PanelOLS.from_formula.html" target="_blank" rel="noopener">PanelOLS.from_formula()&lt;/a> with &lt;code>cov_type=&amp;quot;unadjusted&amp;quot;&lt;/code> to request conventional (homoskedastic) standard errors. The formula &lt;code>&amp;quot;y ~ 1 + x&amp;quot;&lt;/code> specifies a regression of firm performance on R&amp;amp;D intensity with an intercept.&lt;/p>
&lt;pre>&lt;code class="language-python">mod_pooled = PanelOLS.from_formula(&amp;quot;y ~ 1 + x&amp;quot;, data=df_panel)
res_pooled = mod_pooled.fit(cov_type=&amp;quot;unadjusted&amp;quot;)
beta_pooled = res_pooled.params[&amp;quot;x&amp;quot;]
se_pooled = res_pooled.std_errors[&amp;quot;x&amp;quot;]
t_pooled = res_pooled.tstats[&amp;quot;x&amp;quot;]
print(f&amp;quot;Coefficient on x: {beta_pooled:.4f}&amp;quot;)
print(f&amp;quot;Conventional SE: {se_pooled:.4f}&amp;quot;)
print(f&amp;quot;t-statistic: {t_pooled:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Coefficient on x: 1.0318
Conventional SE: 0.0345
t-statistic: 29.9151
&lt;/code>&lt;/pre>
&lt;p>The pooled OLS coefficient is 1.03 &amp;mdash; &lt;em>more than double&lt;/em> the true value of 0.5. This is omitted variable bias in action. Because high-ability firms both invest more in R&amp;amp;D and perform better, the regression attributes to R&amp;amp;D what is actually driven by unobserved ability. The conventional standard error of 0.0345 looks impressively small, yielding a t-statistic of 29.9. But this precision is doubly misleading: the point estimate itself is biased, and the standard error is too small because it ignores within-firm error correlation.&lt;/p>
&lt;p>This is the first major lesson: &lt;strong>a biased estimator with small standard errors is worse than a noisy but unbiased one&lt;/strong>. The conventional SE tells us we can be very confident that the effect is around 1.03 &amp;mdash; but 1.03 is the &lt;em>wrong answer&lt;/em>. No standard error correction can fix this; we need a different estimator (fixed effects) to address the bias. We will get there in Section 9. But first, let us see what happens when we try progressively better standard errors on the same biased pooled model.&lt;/p>
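&lt;p>To preview why an estimator change, rather than a better SE, is what fixes the bias, the within transformation can be sketched in a few lines of numpy. This is a scaled-down stand-in for the tutorial&amp;rsquo;s DGP (firm effect only, no time effects or AR(1) errors), so the exact numbers differ from the results above:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(42)
n_firms, n_years = 200, 10
firm = np.repeat(np.arange(n_firms), n_years)
mu = np.repeat(rng.normal(0, 2, n_firms), n_years)        # firm ability
x = 3.0 + 0.8 * mu + rng.normal(0, 1.5, n_firms * n_years)
y = 2.0 + 0.5 * x + mu + rng.normal(0, 1.5, n_firms * n_years)

def slope(a, b):
    # Simple OLS slope of b on a (with intercept)
    a_c, b_c = a - a.mean(), b - b.mean()
    return (a_c @ b_c) / (a_c @ a_c)

beta_pooled = slope(x, y)                                 # contaminated by mu

# Within transformation: subtract each firm's own mean first
x_dm = x - np.bincount(firm, weights=x)[firm] / n_years
y_dm = y - np.bincount(firm, weights=y)[firm] / n_years
beta_within = slope(x_dm, y_dm)                           # close to the true 0.5

print(round(beta_pooled, 2), round(beta_within, 2))
&lt;/code>&lt;/pre>
&lt;p>Clustered or robust standard errors applied to &lt;code>beta_pooled&lt;/code> would widen its confidence interval, not move its value; only the within transformation (what &lt;code>EntityEffects&lt;/code> automates) removes the correlation between the regressor and the omitted firm effect.&lt;/p>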
&lt;h3 id="62-white-heteroskedasticity-robust-standard-errors">6.2 White (heteroskedasticity-robust) standard errors&lt;/h3>
&lt;p>The next step up from conventional SEs is the &lt;em>White estimator&lt;/em>, also called &lt;em>heteroskedasticity-consistent&lt;/em> (HC) standard errors. While conventional SEs assume all errors have the same variance, the White estimator allows the error variance to differ across observations. Think of it as replacing a one-size-fits-all uncertainty measure with one tailored to each data point. In &lt;code>linearmodels&lt;/code>, we request it with &lt;code>cov_type=&amp;quot;robust&amp;quot;&lt;/code>.&lt;/p>
&lt;p>The White covariance estimator is:&lt;/p>
&lt;p>$$\hat{\Sigma}_{\text{White}} = (X'X)^{-1} \left( \sum_{i=1}^{N} X_i' \hat{e}_i^2 X_i \right) (X'X)^{-1}$$&lt;/p>
&lt;p>In words, this replaces the constant variance assumption with the squared residuals $\hat{e}_i^2$ from each observation, producing standard errors that are robust to heteroskedasticity &amp;mdash; situations where the spread of errors varies with the level of $X$.&lt;/p>
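&lt;p>The formula maps directly onto a few lines of numpy. The sketch below computes conventional and White (HC0, i.e. without small-sample corrections) standard errors for a toy heteroskedastic regression; &lt;code>linearmodels&lt;/code> applies the same sandwich logic with an additional finite-sample adjustment, so its numbers would differ slightly:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(0, 1, n)
e = rng.normal(0, 1, n) * (1 + 0.5 * np.abs(x))   # error spread grows with |x|
y = 1.0 + 2.0 * x + e

X = np.column_stack([np.ones(n), x])              # design matrix with intercept
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Conventional: sigma^2 (X'X)^{-1}, one variance for all observations
sigma2 = resid @ resid / (n - 2)
se_conv = np.sqrt(sigma2 * np.diag(XtX_inv))

# White sandwich: (X'X)^{-1} (sum_i X_i' e_i^2 X_i) (X'X)^{-1}
meat = X.T @ (X * resid[:, None] ** 2)
cov_white = XtX_inv @ meat @ XtX_inv
se_white = np.sqrt(np.diag(cov_white))

print(se_conv[1].round(4), se_white[1].round(4))  # robust SE on x is larger here
&lt;/code>&lt;/pre>
&lt;p>Because the error variance rises with $|x|$, the squared residuals are largest exactly where the regressor is most informative, and the robust SE on the slope exceeds the conventional one.&lt;/p>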
&lt;pre>&lt;code class="language-python">res_white = mod_pooled.fit(cov_type=&amp;quot;robust&amp;quot;)
se_white = res_white.std_errors[&amp;quot;x&amp;quot;]
t_white = res_white.tstats[&amp;quot;x&amp;quot;]
print(f&amp;quot;White SE: {se_white:.4f}&amp;quot;)
print(f&amp;quot;t-statistic: {t_white:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">White SE: 0.0361
t-statistic: 28.5897
&lt;/code>&lt;/pre>
&lt;p>The White SE (0.0361) is only slightly larger than the conventional SE (0.0345), and the t-statistic barely budges from 29.9 to 28.6. This is because heteroskedasticity is not the main problem here &amp;mdash; &lt;em>within-cluster correlation&lt;/em> is. The White estimator treats each observation as independent, just with potentially different variances. It does not account for the fact that firm 1&amp;rsquo;s error in 2015 is correlated with firm 1&amp;rsquo;s error in 2016. For panel data with serial correlation, we need standard errors that account for this clustering.&lt;/p>
&lt;h2 id="7-clustered-standard-errors">7. Clustered standard errors&lt;/h2>
&lt;h3 id="71-the-intuition-behind-clustering">7.1 The intuition behind clustering&lt;/h3>
&lt;p>Clustering is the workhorse correction for panel data standard errors. The idea is simple: if errors within a firm are correlated, then 10 observations from the same firm do not contain as much &lt;em>independent&lt;/em> information as 10 observations from 10 different firms. Clustering acknowledges this by allowing arbitrary correlation among all observations within the same cluster.&lt;/p>
&lt;p>Think of surveying students in classrooms. If you survey 100 students from 10 classrooms (10 per classroom), you do not have 100 independent data points &amp;mdash; students in the same classroom share the same teacher, curriculum, and classroom environment. The effective sample size is closer to 10 (the number of classrooms) than 100 (the number of students). Clustering adjusts the standard errors to reflect this reduced effective sample size.&lt;/p>
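&lt;p>The &amp;ldquo;effective sample size&amp;rdquo; intuition can be quantified with the classic design-effect formula $n_{\text{eff}} = n / (1 + (m-1)\rho)$, where $m$ is the cluster size and $\rho$ is the intraclass correlation. A minimal sketch (the helper function is our own, not a &lt;code>linearmodels&lt;/code> API):&lt;/p>

```python
# Effective sample size under equal-sized clusters via the design effect.
def effective_sample_size(n_obs, cluster_size, icc):
    """n_eff = n / (1 + (m - 1) * rho) for intraclass correlation rho."""
    design_effect = 1 + (cluster_size - 1) * icc
    return n_obs / design_effect

# 100 students in 10 classrooms of 10 each:
print(effective_sample_size(100, 10, 0.0))   # 100.0: independent students
print(effective_sample_size(100, 10, 1.0))   # 10.0: perfectly correlated
print(effective_sample_size(100, 10, 0.5))   # ~18.2: in between
```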
&lt;h3 id="72-entity-clustered-ses">7.2 Entity-clustered SEs&lt;/h3>
&lt;p>Entity clustering allows arbitrary correlation among all observations within the same firm. We request it by setting &lt;code>cluster_entity=True&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python"># Entity-clustered
res_cl_entity = mod_pooled.fit(cov_type=&amp;quot;clustered&amp;quot;, cluster_entity=True)
se_cl_entity = res_cl_entity.std_errors[&amp;quot;x&amp;quot;]
t_cl_entity = res_cl_entity.tstats[&amp;quot;x&amp;quot;]
print(f&amp;quot;Entity-clustered SE: {se_cl_entity:.4f}&amp;quot;)
print(f&amp;quot;t-statistic: {t_cl_entity:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Entity-clustered SE: 0.0621
t-statistic: 16.6233
&lt;/code>&lt;/pre>
&lt;p>Entity-clustered SEs (0.0621) are 80% larger than conventional SEs (0.0345) and 72% larger than the White SEs (0.0361). The t-statistic drops from 29.9 to 16.6 &amp;mdash; still highly significant in this case, but the inflation in standard errors demonstrates how much conventional SEs understate uncertainty when within-firm correlation is present. In a setting with a weaker true effect, this correction could flip a &amp;ldquo;significant&amp;rdquo; result to &amp;ldquo;insignificant.&amp;rdquo;&lt;/p>
&lt;h3 id="73-time-clustered-ses">7.3 Time-clustered SEs&lt;/h3>
&lt;p>Time clustering allows correlation among all firms &lt;em>within the same year&lt;/em>. This matters when firms face common shocks &amp;mdash; a recession, a regulatory change, or a market-wide technology shift that affects all firms simultaneously.&lt;/p>
&lt;pre>&lt;code class="language-python"># Time-clustered
res_cl_time = mod_pooled.fit(cov_type=&amp;quot;clustered&amp;quot;, cluster_time=True)
se_cl_time = res_cl_time.std_errors[&amp;quot;x&amp;quot;]
t_cl_time = res_cl_time.tstats[&amp;quot;x&amp;quot;]
print(f&amp;quot;Time-clustered SE: {se_cl_time:.4f}&amp;quot;)
print(f&amp;quot;t-statistic: {t_cl_time:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Time-clustered SE: 0.0168
t-statistic: 61.2757
&lt;/code>&lt;/pre>
&lt;p>Time-clustered SEs (0.0168) are actually &lt;em>smaller&lt;/em> than conventional SEs, and the t-statistic jumps to 61.3. This happens because our DGP has only weak time effects ($\lambda_t \sim N(0, 0.5)$) but strong firm effects. There is a second problem: the asymptotic theory that justifies clustered SEs (the mathematical guarantees that hold as the number of clusters grows) requires many clusters, and time clustering here has only 10 year-clusters to work with. As a rule of thumb, cluster on the dimension that has at least 40&amp;ndash;50 groups. Here, entity clustering (100 firms) is far more appropriate than time clustering (10 years).&lt;/p>
&lt;h3 id="74-two-way-clustered-ses">7.4 Two-way clustered SEs&lt;/h3>
&lt;p>Two-way clustering allows correlation along &lt;em>both&lt;/em> dimensions simultaneously &amp;mdash; within firms over time and across firms within the same year. This is the most conservative approach, proposed by &lt;a href="https://doi.org/10.1198/jbes.2010.07136" target="_blank" rel="noopener">Cameron, Gelbach, and Miller (2011)&lt;/a>. In &lt;code>linearmodels&lt;/code>, set both &lt;code>cluster_entity=True&lt;/code> and &lt;code>cluster_time=True&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python"># Two-way clustered
res_cl_both = mod_pooled.fit(cov_type=&amp;quot;clustered&amp;quot;,
                             cluster_entity=True, cluster_time=True)
se_cl_both = res_cl_both.std_errors[&amp;quot;x&amp;quot;]
t_cl_both = res_cl_both.tstats[&amp;quot;x&amp;quot;]
print(f&amp;quot;Two-way clustered SE: {se_cl_both:.4f}&amp;quot;)
print(f&amp;quot;t-statistic: {t_cl_both:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Two-way clustered SE: 0.0532
t-statistic: 19.3829
&lt;/code>&lt;/pre>
&lt;p>The two-way clustered SE (0.0532) falls between the entity-clustered (0.0621) and time-clustered (0.0168) estimates. This makes sense: the two-way estimator combines information from both clustering dimensions. Since the time dimension contributes little (weak time effects, few clusters), the two-way SE is somewhat smaller than entity-only clustering. In practice, two-way clustering is recommended when both dimensions have enough clusters and both types of correlation are plausible.&lt;/p>
&lt;h2 id="8-a-side-by-side-comparison-so-far">8. A side-by-side comparison so far&lt;/h2>
&lt;p>Before introducing fixed effects, let us pause to see all the pooled OLS standard errors side by side. Remember: the point estimate (1.0318) is the same for all of them &amp;mdash; only the standard errors and hence the confidence intervals differ.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Model / SE Type&lt;/th>
&lt;th>Coefficient&lt;/th>
&lt;th>Std. Error&lt;/th>
&lt;th>t-stat&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Pooled OLS (conventional)&lt;/td>
&lt;td>1.0318&lt;/td>
&lt;td>0.0345&lt;/td>
&lt;td>29.92&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Pooled OLS (White/HC)&lt;/td>
&lt;td>1.0318&lt;/td>
&lt;td>0.0361&lt;/td>
&lt;td>28.59&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Pooled OLS (cluster: entity)&lt;/td>
&lt;td>1.0318&lt;/td>
&lt;td>0.0621&lt;/td>
&lt;td>16.62&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Pooled OLS (cluster: time)&lt;/td>
&lt;td>1.0318&lt;/td>
&lt;td>0.0168&lt;/td>
&lt;td>61.28&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Pooled OLS (cluster: both)&lt;/td>
&lt;td>1.0318&lt;/td>
&lt;td>0.0532&lt;/td>
&lt;td>19.38&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The entity-clustered SE is 1.8 times the conventional SE. But recall that all these models estimate the &lt;em>wrong&lt;/em> coefficient (1.03 vs. the true 0.5). Correcting standard errors on a biased estimator is like putting better tires on a car driving in the wrong direction. Next, we fix the direction with fixed effects.&lt;/p>
&lt;h2 id="9-entity-fixed-effects-with-clustered-ses">9. Entity fixed effects with clustered SEs&lt;/h2>
&lt;h3 id="91-why-fixed-effects-solve-the-bias">9.1 Why fixed effects solve the bias&lt;/h3>
&lt;p>Fixed effects regression removes all time-invariant differences between firms before estimating the coefficient. Mathematically, it subtracts each firm&amp;rsquo;s time-average from its observations &amp;mdash; a process called &lt;em>demeaning&lt;/em>. After demeaning, the unobserved firm ability $\mu_i$ vanishes because it is constant over time, and we estimate $\beta$ using only the within-firm variation in $x$ and $y$. This eliminates the omitted variable bias that inflated the pooled OLS estimate.&lt;/p>
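&lt;p>To see the demeaning logic in isolation, here is a toy sketch (simulated data with made-up parameters, separate from the tutorial&amp;rsquo;s DGP) showing that the within transformation removes a firm effect that biases the pooled slope:&lt;/p>

```python
# Within transformation by hand: subtract each firm's mean, then run OLS.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_firms, n_years, beta = 50, 10, 0.5
firm = np.repeat(np.arange(n_firms), n_years)
mu = rng.normal(scale=2.0, size=n_firms)[firm]      # firm "ability"
x = 0.8 * mu + rng.normal(size=n_firms * n_years)   # x correlated with ability
y = beta * x + mu + rng.normal(size=n_firms * n_years)

df = pd.DataFrame({"firm": firm, "x": x, "y": y})
# Demean within firm: mu drops out because it is constant over time
xd = df["x"] - df.groupby("firm")["x"].transform("mean")
yd = df["y"] - df.groupby("firm")["y"].transform("mean")

beta_pooled = np.polyfit(df["x"], df["y"], 1)[0]    # biased upward
beta_within = np.polyfit(xd, yd, 1)[0]              # close to the true 0.5
print(beta_pooled, beta_within)
```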
&lt;p>In &lt;code>linearmodels&lt;/code>, adding &lt;code>EntityEffects&lt;/code> to the formula absorbs firm fixed effects:&lt;/p>
&lt;pre>&lt;code class="language-python">mod_fe = PanelOLS.from_formula(&amp;quot;y ~ 1 + x + EntityEffects&amp;quot;, data=df_panel)
res_fe_cl = mod_fe.fit(cov_type=&amp;quot;clustered&amp;quot;, cluster_entity=True)
beta_fe = res_fe_cl.params[&amp;quot;x&amp;quot;]
se_fe_cl = res_fe_cl.std_errors[&amp;quot;x&amp;quot;]
t_fe_cl = res_fe_cl.tstats[&amp;quot;x&amp;quot;]
print(f&amp;quot;FE coefficient on x: {beta_fe:.4f}&amp;quot;)
print(f&amp;quot;Entity-clustered SE: {se_fe_cl:.4f}&amp;quot;)
print(f&amp;quot;t-statistic: {t_fe_cl:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">FE coefficient on x: 0.4829
Entity-clustered SE: 0.0357
t-statistic: 13.5250
&lt;/code>&lt;/pre>
&lt;p>The fixed effects coefficient (0.4829) is dramatically closer to the true value of 0.5 than the pooled estimate (1.0318). The remaining gap of 0.017 is sampling noise, not systematic bias. The entity-clustered SE of 0.0357 is actually &lt;em>smaller&lt;/em> than the pooled entity-clustered SE (0.0621) because fixed effects remove the between-firm variation that was inflating the residuals.&lt;/p>
&lt;h3 id="92-two-way-fixed-effects">9.2 Two-way fixed effects&lt;/h3>
&lt;p>We can also absorb time fixed effects by adding &lt;code>TimeEffects&lt;/code>, which removes year-specific shocks common to all firms. This controls for business cycle effects, regulatory changes, or any other year-level phenomenon.&lt;/p>
&lt;pre>&lt;code class="language-python">mod_twfe = PanelOLS.from_formula(&amp;quot;y ~ 1 + x + EntityEffects + TimeEffects&amp;quot;,
                                 data=df_panel)
res_twfe = mod_twfe.fit(cov_type=&amp;quot;clustered&amp;quot;, cluster_entity=True)
beta_twfe = res_twfe.params[&amp;quot;x&amp;quot;]
se_twfe = res_twfe.std_errors[&amp;quot;x&amp;quot;]
t_twfe = res_twfe.tstats[&amp;quot;x&amp;quot;]
print(f&amp;quot;TWFE coefficient on x: {beta_twfe:.4f}&amp;quot;)
print(f&amp;quot;Entity-clustered SE: {se_twfe:.4f}&amp;quot;)
print(f&amp;quot;t-statistic: {t_twfe:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">TWFE coefficient on x: 0.4796
Entity-clustered SE: 0.0376
t-statistic: 12.7392
&lt;/code>&lt;/pre>
&lt;p>Adding time fixed effects barely changes the estimate (0.4796 vs. 0.4829) and slightly increases the standard error (0.0376 vs. 0.0357). This makes sense: the time effects in our DGP are small ($\lambda_t \sim N(0, 0.5)$), so absorbing them provides only a minor correction while consuming 9 additional degrees of freedom. In real applications where macroeconomic shocks are substantial, two-way FE can make a bigger difference.&lt;/p>
&lt;h2 id="10-driscoll-kraay-standard-errors">10. Driscoll-Kraay standard errors&lt;/h2>
&lt;p>&lt;a href="https://doi.org/10.1162/003465398557549" target="_blank" rel="noopener">Driscoll and Kraay (1998)&lt;/a> proposed a standard error estimator that accounts for both cross-sectional correlation (across firms within a period) and temporal dependence (within firms over time), using a kernel-based approach similar to Newey-West but applied to cross-sectional averages. In &lt;code>linearmodels&lt;/code>, we request it with &lt;code>cov_type=&amp;quot;kernel&amp;quot;&lt;/code> and a Bartlett kernel (equivalent to &lt;a href="https://doi.org/10.2307/1913610" target="_blank" rel="noopener">Newey and West (1987)&lt;/a> weighting). The &lt;code>bandwidth&lt;/code> parameter controls how many time lags of correlation the estimator accounts for &amp;mdash; a bandwidth of 3 means it incorporates correlations up to 3 years apart, with declining weights for longer lags.&lt;/p>
&lt;pre>&lt;code class="language-python">res_dk = mod_pooled.fit(cov_type=&amp;quot;kernel&amp;quot;, kernel=&amp;quot;bartlett&amp;quot;, bandwidth=3)
se_dk = res_dk.std_errors[&amp;quot;x&amp;quot;]
t_dk = res_dk.tstats[&amp;quot;x&amp;quot;]
print(f&amp;quot;Driscoll-Kraay SE (BW=3): {se_dk:.4f}&amp;quot;)
print(f&amp;quot;t-statistic: {t_dk:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Driscoll-Kraay SE (BW=3): 0.0158
t-statistic: 65.4073
&lt;/code>&lt;/pre>
&lt;p>The Driscoll-Kraay SE (0.0158) is the smallest we have seen &amp;mdash; even smaller than conventional SEs. This reflects the estimator&amp;rsquo;s focus on cross-sectional dependence, which is weak in our simulation (firms are independent given their fixed effects). In applications with strong cross-sectional correlation &amp;mdash; for example, banks exposed to the same macroeconomic shock &amp;mdash; Driscoll-Kraay SEs can be substantially larger. The key feature is robustness to &lt;em>cross-sectional dependence&lt;/em> that entity clustering alone cannot handle.&lt;/p>
&lt;h2 id="11-full-comparison">11. Full comparison&lt;/h2>
&lt;h3 id="111-summary-table">11.1 Summary table&lt;/h3>
&lt;p>Now we can see all eight model-SE combinations in a single table. The true coefficient is $\beta = 0.5$. The &amp;ldquo;Reject H0&amp;rdquo; column tests the default null H0: $\beta = 0$ (not H0: $\beta = 0.5$). In Section 12, the Monte Carlo explicitly tests against the true value.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Model / SE Type&lt;/th>
&lt;th>Coefficient&lt;/th>
&lt;th>Std. Error&lt;/th>
&lt;th>t-stat&lt;/th>
&lt;th>Reject H0 (5%)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Pooled OLS (conventional)&lt;/td>
&lt;td>1.0318&lt;/td>
&lt;td>0.0345&lt;/td>
&lt;td>29.92&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Pooled OLS (White/HC)&lt;/td>
&lt;td>1.0318&lt;/td>
&lt;td>0.0361&lt;/td>
&lt;td>28.59&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Pooled OLS (cluster: entity)&lt;/td>
&lt;td>1.0318&lt;/td>
&lt;td>0.0621&lt;/td>
&lt;td>16.62&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Pooled OLS (cluster: time)&lt;/td>
&lt;td>1.0318&lt;/td>
&lt;td>0.0168&lt;/td>
&lt;td>61.28&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Pooled OLS (cluster: both)&lt;/td>
&lt;td>1.0318&lt;/td>
&lt;td>0.0532&lt;/td>
&lt;td>19.38&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Entity FE (cluster: entity)&lt;/td>
&lt;td>0.4829&lt;/td>
&lt;td>0.0357&lt;/td>
&lt;td>13.53&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Two-way FE (cluster: entity)&lt;/td>
&lt;td>0.4796&lt;/td>
&lt;td>0.0376&lt;/td>
&lt;td>12.74&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Pooled OLS (Driscoll-Kraay)&lt;/td>
&lt;td>1.0318&lt;/td>
&lt;td>0.0158&lt;/td>
&lt;td>65.41&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Two patterns stand out. First, all pooled models estimate a coefficient around 1.03 &amp;mdash; more than double the true 0.5 &amp;mdash; while both FE models recover estimates close to 0.5. This is the bias-versus-variance distinction: &lt;strong>standard errors address precision, not accuracy&lt;/strong>. Second, among the FE models (which have the right coefficient), entity-clustered SEs are appropriately sized relative to the true uncertainty.&lt;/p>
&lt;h3 id="112-standard-error-comparison">11.2 Standard error comparison&lt;/h3>
&lt;pre>&lt;code class="language-python"># Figure: SE comparison bar chart (code in script.py)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="panel_ses_comparison.png" alt="Bar chart comparing standard error estimates across all eight model-SE combinations.">&lt;/p>
&lt;p>The bar chart reveals the full spectrum of standard error estimates. Entity-clustered SEs on the pooled model (0.0621) are the largest &amp;mdash; they correctly reflect high within-firm correlation but sit atop a biased estimate. The FE models' entity-clustered SEs (0.036&amp;ndash;0.038) are smaller because fixed effects absorbed the between-firm variation that inflated residuals. At the other extreme, Driscoll-Kraay (0.0158) and time-clustered (0.0168) SEs are the smallest, reflecting the weak cross-sectional and time-level correlation in our data.&lt;/p>
&lt;h3 id="113-confidence-intervals">11.3 Confidence intervals&lt;/h3>
&lt;pre>&lt;code class="language-python"># Figure: Confidence intervals across methods (code in script.py)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="panel_ses_ci.png" alt="Confidence interval plot showing 95% CIs across all eight methods, with a dashed line at the true beta of 0.5.">&lt;/p>
&lt;p>The confidence interval plot delivers the tutorial&amp;rsquo;s core visual message. The teal dashed line at $\beta = 0.5$ is the truth. All five pooled OLS intervals (blue) are far to the right &amp;mdash; none come close to covering the true value, regardless of which SE estimator we use. The two FE intervals (orange) are centered near 0.5 and easily cover it. The lesson is unmistakable: &lt;strong>standard errors cannot rescue a biased point estimate&lt;/strong>, but combined with a consistent estimator, they produce intervals with correct coverage.&lt;/p>
&lt;h2 id="12-monte-carlo-simulation-----which-ses-get-the-right-rejection-rate">12. Monte Carlo simulation &amp;mdash; which SEs get the right rejection rate?&lt;/h2>
&lt;h3 id="121-the-experiment">12.1 The experiment&lt;/h3>
&lt;p>The confidence interval plot above shows one simulation. But how do we know whether those intervals &lt;em>typically&lt;/em> contain the true value? A single simulation could be lucky or unlucky. To rigorously evaluate each SE estimator, we need a &lt;em>Monte Carlo simulation&lt;/em>: generate hundreds of independent datasets from the same DGP, estimate the model on each, and check how often the 95% confidence interval covers the true $\beta = 0.5$.&lt;/p>
&lt;p>If an SE estimator is correctly sized, its 95% CI should cover the truth 95% of the time, meaning it &lt;em>rejects&lt;/em> the true null hypothesis only 5% of the time. An SE that is too small produces intervals that are too narrow, leading to &lt;em>over-rejection&lt;/em> &amp;mdash; false positives in more than 5% of simulations.&lt;/p>
&lt;p>We focus on Entity FE models because they produce unbiased estimates. This isolates the SE question: given that the point estimate is right on average, do the standard errors correctly quantify the remaining uncertainty?&lt;/p>
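&lt;p>The bookkeeping behind an empirical rejection rate can be sketched with a numpy-only toy version. Note this uses plain OLS on iid data purely to illustrate the computation; the tutorial&amp;rsquo;s actual loop runs the &lt;code>linearmodels&lt;/code> FE estimators:&lt;/p>

```python
# Toy rejection-rate computation: test the TRUE null H0: beta = 0.5 at 5%.
import numpy as np

rng = np.random.default_rng(123)
true_beta, n, n_sim, crit = 0.5, 200, 500, 1.96
rejections = 0
for _ in range(n_sim):
    x = rng.normal(size=n)
    y = true_beta * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    se = np.sqrt(resid @ resid / (n - 2) * XtX_inv[1, 1])
    if abs((beta_hat[1] - true_beta) / se) > crit:
        rejections += 1

rate = rejections / n_sim
print(rate)   # a correctly sized test should land near the nominal 0.05
```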
&lt;h3 id="122-results">12.2 Results&lt;/h3>
&lt;pre>&lt;code class="language-python">N_SIM = 500
# ... (Monte Carlo loop runs Entity FE with 6 different SE types) ...
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Empirical rejection rates at 5% level (H0: beta=0.5 is true):
FE + conventional : 0.060 (30/500) ~correct
FE + White (HC) : 0.064 (32/500) ~correct
FE + cluster: entity : 0.066 (33/500) ~correct
FE + cluster: time : 0.090 (45/500)
FE + cluster: both : 0.078 (39/500) ~correct
TWFE + cluster: entity : 0.032 (16/500) ~correct
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-python"># Figure: Monte Carlo rejection rates (code in script.py)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="panel_ses_montecarlo.png" alt="Monte Carlo rejection rates for six FE model and SE combinations, with a dashed line at the nominal 5% level.">&lt;/p>
&lt;p>The Monte Carlo results across 500 simulations reveal meaningful differences. Entity FE with entity-clustered SEs rejects at 6.6% &amp;mdash; close to the nominal 5% and well within the range expected from simulation noise. Conventional SEs (6.0%) and White SEs (6.4%) also perform well here because, &lt;em>after&lt;/em> absorbing firm fixed effects, the remaining within-firm errors are approximately homoskedastic with moderate serial correlation that 100 clusters can handle.&lt;/p>
&lt;p>The outlier is FE with time-clustered SEs at 9.0% &amp;mdash; nearly double the nominal rate. This over-rejection occurs because time clustering with only 10 year-clusters violates the large-cluster asymptotic assumption. With 10 clusters, the finite-sample correction is insufficient, and the SEs are too small. TWFE with entity-clustered SEs (3.2%) is slightly conservative, meaning its confidence intervals are a bit wider than necessary &amp;mdash; a benign property compared to over-rejection.&lt;/p>
&lt;h3 id="123-standard-error-ratios">12.3 Standard error ratios&lt;/h3>
&lt;pre>&lt;code class="language-python"># Figure: SE ratios relative to entity-clustered (code in script.py)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="panel_ses_ratios.png" alt="SE ratios relative to entity-clustered standard errors as the benchmark.">&lt;/p>
&lt;p>This figure normalizes all standard errors to the entity-clustered SE (the recommended default). Ratios below 1.0 indicate SEs that are &lt;em>smaller&lt;/em> than entity-clustered &amp;mdash; and therefore potentially over-confident. Conventional SEs and White SEs on the pooled model are about 0.55&amp;ndash;0.58 times the entity-clustered SE, confirming they understate uncertainty by roughly 40%. The FE-based entity-clustered SE (0.57x) is smaller because fixed effects reduce residual variance &amp;mdash; this is a genuine precision gain, not an artifact of ignoring correlation.&lt;/p>
&lt;h2 id="13-discussion">13. Discussion&lt;/h2>
&lt;h3 id="131-answering-the-case-study-question">13.1 Answering the case study question&lt;/h3>
&lt;p>We asked: &lt;em>when firms are observed over multiple years, how does our choice of standard error estimator change what we conclude about the effect of R&amp;amp;D spending on firm performance?&lt;/em> The answer has two parts.&lt;/p>
&lt;p>&lt;strong>First, the bias problem.&lt;/strong> Pooled OLS estimates R&amp;amp;D&amp;rsquo;s effect at 1.03 &amp;mdash; more than double the true 0.5. This bias comes from omitted firm ability, not from standard error choice. Entity fixed effects reduce the estimate to 0.48, close to the truth. No standard error correction can fix a biased coefficient.&lt;/p>
&lt;p>&lt;strong>Second, the inference problem.&lt;/strong> Even after fixing bias with FE, standard error choice matters. In our Monte Carlo, time-clustered SEs on FE models rejected the true null at 9.0% instead of 5%. Entity-clustered SEs maintained correct size at 6.6%. For a practitioner, using the wrong SEs could mean reporting a &amp;ldquo;significant&amp;rdquo; finding that is actually a false positive.&lt;/p>
&lt;h3 id="132-practical-guidance">13.2 Practical guidance&lt;/h3>
&lt;p>Following the recommendations of &lt;a href="https://doi.org/10.1093/rfs/hhn053" target="_blank" rel="noopener">Petersen (2009)&lt;/a>, here is a decision framework:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Always start with fixed effects&lt;/strong> if the panel has entity-level unobserved heterogeneity. Without FE, standard error corrections address precision but not bias.&lt;/li>
&lt;li>&lt;strong>Cluster on the dimension with more groups.&lt;/strong> Entity clustering (100 firms) is more reliable than time clustering (10 years) because clustered SEs rely on large-cluster asymptotics.&lt;/li>
&lt;li>&lt;strong>Two-way clustering is the safe default&lt;/strong> when both dimensions have enough clusters (rule of thumb: at least 40&amp;ndash;50 each). It accounts for both types of dependence simultaneously.&lt;/li>
&lt;li>&lt;strong>Driscoll-Kraay is specialized.&lt;/strong> Use it when cross-sectional dependence is strong and the number of time periods is large (e.g., long macroeconomic panels).&lt;/li>
&lt;/ol>
&lt;h2 id="14-summary-and-next-steps">14. Summary and next steps&lt;/h2>
&lt;p>&lt;strong>Key takeaways:&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Standard errors cannot fix bias.&lt;/strong> Pooled OLS overestimated the R&amp;amp;D effect at 1.03 (true: 0.5) regardless of which SE estimator was applied. Entity fixed effects recovered an estimate of 0.48 &amp;mdash; close to the truth. Always address the &lt;em>model&lt;/em> before worrying about the &lt;em>standard errors&lt;/em>.&lt;/li>
&lt;li>&lt;strong>Clustering dimension matters.&lt;/strong> Entity-clustered SEs (0.0621) were 80% larger than conventional SEs (0.0345) on the pooled model, reflecting the within-firm correlation that conventional SEs ignore. Time-clustered SEs (0.0168) were misleadingly small because only 10 year-clusters provided too few groups for reliable asymptotic inference.&lt;/li>
&lt;li>&lt;strong>Monte Carlo validation is essential.&lt;/strong> Entity-clustered SEs on the FE model rejected the true null at 6.6% (close to the nominal 5%), while time-clustered SEs rejected at 9.0% &amp;mdash; nearly double the expected rate. Simulation is the only way to verify that your SE choice controls size in your specific data structure.&lt;/li>
&lt;li>&lt;strong>The FE + entity-clustered combination is the reliable default.&lt;/strong> It addresses both bias (via FE) and inference (via clustering). Two-way clustering adds insurance against cross-sectional correlation when both dimensions have enough groups.&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Limitations:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Our simulation uses balanced panels. With unbalanced panels (firms entering and exiting), some SE estimators require additional adjustments.&lt;/li>
&lt;li>We used 100 firms and 10 years. Results may differ with fewer clusters or different cluster-size ratios.&lt;/li>
&lt;li>The DGP has a simple AR(1) error structure. Real data may have more complex dependence patterns.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Next steps:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Apply these techniques to a real firm-level dataset (e.g., Compustat) and compare SE estimates.&lt;/li>
&lt;li>Explore bootstrap-based approaches for clustered inference with few clusters (wild cluster bootstrap).&lt;/li>
&lt;li>Study the Cameron-Gelbach-Miller multi-way clustering theory for panels with more than two clustering dimensions.&lt;/li>
&lt;/ul>
&lt;h2 id="15-exercises">15. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Modify the DGP.&lt;/strong> Change the AR(1) coefficient from 0.5 to 0.9 (stronger serial correlation) and re-run the Monte Carlo. Which SE estimators are most affected? Does entity-clustering still control size at 5%?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Reduce the number of firms.&lt;/strong> Set &lt;code>n_firms=20&lt;/code> (keeping &lt;code>n_years=10&lt;/code>) and re-run the Monte Carlo. With only 20 entity clusters, do entity-clustered SEs still perform well? At what cluster count do they start to break down?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Add cross-sectional dependence.&lt;/strong> Modify &lt;code>simulate_panel()&lt;/code> so that each year has a common shock ($\delta_t$) that enters &lt;em>all&lt;/em> firms' errors: &lt;code>eps[start + t] += delta_t&lt;/code>. Re-run the analysis and check whether entity-clustered SEs still control size, or whether Driscoll-Kraay / two-way clustering becomes necessary.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://vincent.codes.finance/posts/panel-ols-standard-errors/" target="_blank" rel="noopener">Gregoire, V. (2024). Panel OLS Standard Errors. &lt;em>Vincent Codes Finance&lt;/em>.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://bashtage.github.io/linearmodels/panel/index.html" target="_blank" rel="noopener">linearmodels &amp;mdash; Kevin Sheppard. Panel Data Models Documentation.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.2307/1912934" target="_blank" rel="noopener">White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. &lt;em>Econometrica&lt;/em>, 48(4), 817&amp;ndash;838.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1198/jbes.2010.07136" target="_blank" rel="noopener">Cameron, A. C., Gelbach, J. B., &amp;amp; Miller, D. L. (2011). Robust Inference with Multiway Clustering. &lt;em>Journal of Business &amp;amp; Economic Statistics&lt;/em>, 29(2), 238&amp;ndash;249.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1162/003465398557549" target="_blank" rel="noopener">Driscoll, J. C. &amp;amp; Kraay, A. C. (1998). Consistent Covariance Matrix Estimation with Spatially Dependent Panel Data. &lt;em>Review of Economics and Statistics&lt;/em>, 80(4), 549&amp;ndash;560.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.2307/1913610" target="_blank" rel="noopener">Newey, W. K. &amp;amp; West, K. D. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. &lt;em>Econometrica&lt;/em>, 55(3), 703&amp;ndash;708.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1093/rfs/hhn053" target="_blank" rel="noopener">Petersen, M. A. (2009). Estimating Standard Errors in Finance Panel Data Sets. &lt;em>Review of Financial Studies&lt;/em>, 22(1), 435&amp;ndash;480.&lt;/a>&lt;/li>
&lt;/ol></description></item><item><title>Dynamic Panel BMA: Which Factors Truly Drive Economic Growth?</title><link>https://carlos-mendez.org/post/r_dynamic_bma/</link><pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/r_dynamic_bma/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Imagine you are advising a government on how to accelerate long-run economic growth. Your team has compiled a panel dataset covering 73 countries across four decades, with nine candidate drivers: investment, education, population growth, trade openness, government spending, life expectancy, democracy, investment prices, and population size. The natural question is: &lt;strong>which of these factors truly drive economic growth &amp;mdash; and can we trust our answers when today&amp;rsquo;s GDP might itself be shaped by those same factors?&lt;/strong>&lt;/p>
&lt;p>What is BMA? Imagine trying to predict salaries using education, experience, age, and industry. You could build one model with all four variables, or drop industry, or use only experience and education. With just 4 candidates, there are $2^4 = 16$ possible models. Which is correct? &lt;strong>Bayesian Model Averaging (BMA)&lt;/strong> does not pick one &amp;mdash; it averages predictions from all 16, giving more weight to models that fit the data well. This avoids betting everything on one specification that might be wrong.&lt;/p>
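&lt;p>The averaging idea can be seen in miniature. The sketch below is in Python for illustration only (the tutorial itself uses the &lt;code>bdsm&lt;/code> R package) and uses a standard BIC approximation to the model weights rather than the package&amp;rsquo;s actual likelihood:&lt;/p>

```python
# Miniature BMA: enumerate every subset of K candidate regressors, score each
# model by BIC, and turn scores into approximate posterior model weights.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(7)
n, K = 300, 4
X = rng.normal(size=(n, K))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)   # only the first regressor matters

def bic(cols):
    """BIC of an OLS model with an intercept plus the given columns."""
    Xm = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    rss = np.sum((y - Xm @ beta) ** 2)
    return n * np.log(rss / n) + Xm.shape[1] * np.log(n)

models = [cols for r in range(K + 1) for cols in combinations(range(K), r)]
scores = np.array([bic(m) for m in models])            # 2**4 = 16 models
weights = np.exp(-0.5 * (scores - scores.min()))
weights /= weights.sum()                               # approximate posteriors

# Posterior inclusion probability (PIP) of each regressor
pip = [sum(w for m, w in zip(models, weights) if j in m) for j in range(K)]
print(pip)   # PIP of the true regressor should be near 1, the rest small
```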
&lt;p>The worry at the end of that question is &lt;em>reverse causality&lt;/em>: the possibility that GDP growth causes higher investment rather than the other way around. Cross-sectional BMA handles model uncertainty well, but it assumes regressors are strictly exogenous. When that assumption fails, BMA can confidently point to the wrong variables.&lt;/p>
&lt;p>This tutorial introduces the &lt;a href="https://cran.r-project.org/web/packages/bdsm/index.html" target="_blank" rel="noopener">bdsm&lt;/a> (Bayesian Dynamic Systems Modeling) R package, which extends BMA to dynamic panel data with weakly exogenous regressors. Built on the methodology of Moral-Benito (2012, 2013, 2016), it simultaneously addresses model uncertainty and reverse causality by incorporating a lagged dependent variable, entity fixed effects, and time fixed effects into the BMA framework.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Companion tutorial.&lt;/strong> For a cross-sectional perspective using BMA, LASSO, and WALS on synthetic data, see the &lt;a href="https://carlos-mendez.org/post/r_bma_lasso_wals/">R tutorial on variable selection&lt;/a>. The current tutorial builds on those foundations by moving from cross-sectional to panel data and from strict to weak exogeneity.&lt;/p>
&lt;/blockquote>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand why cross-sectional BMA can be misleading when regressors are endogenous, and how dynamic panel BMA addresses this&lt;/li>
&lt;li>Prepare panel data for the Bayesian DSM package using &lt;code>join_lagged_col()&lt;/code> and &lt;code>feature_standardization()&lt;/code>&lt;/li>
&lt;li>Run Bayesian Model Averaging with &lt;code>bma()&lt;/code> and interpret Posterior Inclusion Probabilities (PIPs &amp;mdash; how often a variable appears in the best-fitting models), posterior means, and model probabilities&lt;/li>
&lt;li>Assess the sensitivity of results to prior specification by varying the expected model size (how many variables the prior expects to matter) and applying dilution priors (which adjust for correlated variables)&lt;/li>
&lt;li>Analyze jointness (which variables tend to appear in models together) to discover which growth determinants are complements versus substitutes&lt;/li>
&lt;/ul>
&lt;p>The package also includes a smaller 3-regressor example (&lt;code>small_model_space&lt;/code>) for practice &amp;mdash; see the companion R script for details.&lt;/p>
&lt;p>&lt;strong>Data Prep&lt;/strong> (lag DV, demean, standardize) &lt;strong>→ Model Space&lt;/strong> (estimate all 2&lt;sup>K&lt;/sup> models)
&lt;strong>→ BMA&lt;/strong> (PIPs, posterior means) &lt;strong>→ Sensitivity&lt;/strong> (vary priors, EMS, dilution) &lt;strong>→ Jointness&lt;/strong> (complements vs. substitutes) &lt;strong>→ Findings&lt;/strong> (robust growth determinants)&lt;/p>
&lt;h2 id="2-setup">2. Setup&lt;/h2>
&lt;p>We need the Bayesian Dynamic Systems Modeling package for dynamic panel BMA and &lt;code>tidyverse&lt;/code> for data manipulation. The &lt;code>parallel&lt;/code> package (included with base R) enables parallel computing for the model space estimation step.&lt;/p>
&lt;pre>&lt;code class="language-r"># Install bdsm if needed
if (!requireNamespace(&amp;quot;bdsm&amp;quot;, quietly = TRUE)) {
  install.packages(&amp;quot;bdsm&amp;quot;)
}
# Load packages
library(bdsm)
library(tidyverse)
library(parallel)
set.seed(42)
&lt;/code>&lt;/pre>
&lt;h2 id="3-why-dynamic-panel-bma">3. Why Dynamic Panel BMA?&lt;/h2>
&lt;h3 id="31-the-endogeneity-problem">3.1 The endogeneity problem&lt;/h3>
&lt;p>Standard BMA assumes that all regressors are &lt;em>strictly exogenous&lt;/em> &amp;mdash; meaning they are determined outside the model and are uncorrelated with the error term at any point in time. In growth economics, this assumption almost never holds.&lt;/p>
&lt;p>Think of it this way: imagine judging a runner&amp;rsquo;s training program by their final race time, but faster runners also &lt;em>chose&lt;/em> better programs. You cannot tell whether the program caused the speed or the speed attracted the program. This is &lt;strong>reverse causality&lt;/strong>, and it contaminates cross-sectional regressions. Countries that grow faster invest more, trade more, urbanize faster, and attract more education spending &amp;mdash; not just the other way around.&lt;/p>
&lt;p>When BMA is applied to cross-sectional data with endogenous regressors, it can confidently assign high inclusion probabilities to variables that appear important only because they are &lt;em>consequences&lt;/em> of growth rather than &lt;em>causes&lt;/em> of it. The model averaging machinery works perfectly &amp;mdash; but the individual models it averages over are biased.&lt;/p>
&lt;p>The solution is to include &lt;em>last period&amp;rsquo;s GDP&lt;/em> as a regressor. By controlling for where a country &lt;em>was&lt;/em>, we isolate which new factors push it forward &amp;mdash; breaking the feedback loop. The next section shows why this dynamic structure arises naturally from economic growth theory.&lt;/p>
&lt;h3 id="32-from-the-solow-model-to-a-dynamic-equation">3.2 From the Solow model to a dynamic equation&lt;/h3>
&lt;p>Why does a dynamic equation &amp;mdash; one with lagged GDP on the right-hand side &amp;mdash; arise naturally in growth economics? The answer comes from the &lt;strong>Solow growth model&lt;/strong> and its convergence prediction. The Solow model predicts that poorer countries should grow faster than richer ones, conditional on their structural characteristics (&lt;strong>beta convergence&lt;/strong>). Through a series of algebraic steps &amp;mdash; defining a persistence parameter, substituting observable country characteristics for the unobserved steady state, and adding fixed effects &amp;mdash; the convergence equation yields the following dynamic panel model:&lt;/p>
&lt;p>$$\ln y_{it} = \alpha \ln y_{i,t-1} + \beta' x_{it} + \eta_i + \zeta_t + v_{it}$$&lt;/p>
&lt;p>This is the &lt;strong>dynamic panel model&lt;/strong> that the Bayesian DSM package estimates. The coefficient $\alpha$ has a direct economic interpretation: it measures the &lt;strong>persistence of GDP&lt;/strong> across periods. A value of $\alpha$ close to 1 means slow convergence &amp;mdash; countries stay near their current income level for a long time. A value close to 0 means fast convergence &amp;mdash; countries quickly reach their steady state. Our BMA results will reveal $\alpha \approx 0.92$, indicating very slow convergence: after a decade, countries have closed only about 8% of the gap between their current GDP and their steady state.&lt;/p>
&lt;p>The key insight is that the lagged dependent variable is not an ad hoc addition &amp;mdash; it arises directly from the Solow model&amp;rsquo;s convergence prediction. Any study of growth determinants that omits lagged GDP is implicitly assuming $\alpha = 0$, which means assuming &lt;em>instantaneous convergence&lt;/em> &amp;mdash; a prediction strongly rejected by the data. For the full step-by-step derivation from the Solow convergence equation, see Appendix B.&lt;/p>
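&lt;p>The economic meaning of $\alpha$ can be checked with two lines of arithmetic (a back-of-the-envelope sketch, not package output). It treats the income gap as decaying geometrically at rate $\alpha$ per decade, which is exactly what the AR(1) structure of the dynamic equation implies:&lt;/p>
&lt;pre>&lt;code class="language-r"># Persistence alpha = 0.92 implies how fast the income gap closes
alpha = 0.92
gap_closed_per_decade = 1 - alpha   # share of the gap closed each decade
half_life = log(0.5) / log(alpha)   # decades until half the gap is gone
round(gap_closed_per_decade, 2)     # 0.08, i.e. about 8% per decade
round(half_life, 1)                 # about 8.3 decades (~83 years)
&lt;/code>&lt;/pre>
&lt;p>A half-life of roughly eight decades underlines just how slow convergence at $\alpha \approx 0.92$ really is.&lt;/p>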
&lt;h3 id="33-weak-exogeneity-and-the-role-of-each-component">3.3 Weak exogeneity and the role of each component&lt;/h3>
&lt;p>Each component of the dynamic panel equation plays a distinct role:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Lagged dependent variable&lt;/strong> ($y_{it-1}$): Think of this as a student&amp;rsquo;s previous exam score &amp;mdash; it captures all the accumulated history that got a country to its current level. After controlling for where a country &lt;em>was&lt;/em>, we can ask: among countries at the same starting point, which factors predict who grows faster?&lt;/li>
&lt;li>&lt;strong>Entity fixed effects&lt;/strong> ($\eta_i$): Like grading on a curve within each classroom &amp;mdash; these absorb time-invariant country traits such as geography, colonial history, and institutional heritage. We compare each country to its own average, not to other countries.&lt;/li>
&lt;li>&lt;strong>Time fixed effects&lt;/strong> ($\zeta_t$): These remove global shocks that affect all countries simultaneously, such as oil crises or the Asian financial crisis.&lt;/li>
&lt;/ul>
&lt;p>The key assumption is &lt;strong>weak exogeneity&lt;/strong>: current regressors can be correlated with &lt;em>past&lt;/em> shocks but not with the &lt;em>current&lt;/em> shock $v_{it}$. This is much weaker than strict exogeneity &amp;mdash; it allows past GDP growth to influence current investment (feedback effects) while requiring only that the current unexpected shock to GDP does not simultaneously cause changes in investment. In practical terms, weak exogeneity permits the realistic feedback loops that plague growth regressions while still allowing consistent estimation.&lt;/p>
&lt;p>To see what this assumption does and does not allow, consider a concrete example. Suppose an oil price shock in 1985 affects both GDP and trade openness simultaneously. Weak exogeneity tolerates this: the common shock is absorbed by the time fixed effect $\zeta_t$, and regressors may be freely correlated with the fixed effects. What it rules out is that the &lt;em>unexplained&lt;/em> part of today&amp;rsquo;s GDP shock &amp;mdash; the idiosyncratic error $v_{it}$ &amp;mdash; directly causes today&amp;rsquo;s investment to change within the same period.&lt;/p>
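&lt;p>The distinction can be made tangible with a small simulation (an illustrative sketch with made-up parameters, not part of the package). We generate a series in which the regressor reacts to &lt;em>last&lt;/em> period&amp;rsquo;s outcome but never to the current shock &amp;mdash; so weak exogeneity holds by construction while strict exogeneity fails:&lt;/p>
&lt;pre>&lt;code class="language-r">set.seed(42)
n = 5000
v = rnorm(n)            # idiosyncratic outcome shocks
y = numeric(n)
x = numeric(n)
for (t in 2:n) {
  x[t] = 0.5 * y[t - 1] + rnorm(1)          # regressor reacts to PAST outcomes only
  y[t] = 0.7 * y[t - 1] + 0.3 * x[t] + v[t] # outcome with feedback loop
}
# Correlated with past shocks (feedback), uncorrelated with the current shock
cor(x[3:n], v[2:(n - 1)])   # clearly positive: strict exogeneity fails
cor(x[2:n], v[2:n])         # near zero: weak exogeneity holds
&lt;/code>&lt;/pre>
&lt;p>The first correlation is substantial because last period&amp;rsquo;s shock feeds into last period&amp;rsquo;s outcome, which the regressor responds to; the second is essentially zero because nothing in the construction lets the current shock reach the current regressor.&lt;/p>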
&lt;h3 id="34-from-cross-sectional-to-dynamic-panel-bma">3.4 From cross-sectional to dynamic panel BMA&lt;/h3>
&lt;p>&lt;strong>Cross-sectional BMA&lt;/strong> uses a single time snapshot, assumes strict exogeneity, includes no lagged dependent variable, and has no fixed effects. &lt;strong>Dynamic panel BMA&lt;/strong> uses multiple time periods, requires only weak exogeneity, includes a lagged dependent variable, and controls for entity and time fixed effects. Both approaches address model uncertainty by averaging across all possible model specifications.&lt;/p>
&lt;p>In the &lt;a href="https://carlos-mendez.org/post/r_bma_lasso_wals/">companion cross-sectional tutorial&lt;/a>, we averaged across 4,096 models of CO&lt;sub>2&lt;/sub> emissions using synthetic data. Here we apply the same BMA principle &amp;mdash; weighting models by how well they fit the data &amp;mdash; but to a panel of 73 countries over four decades, using the methodology that handles the endogeneity that cross-sectional BMA cannot.&lt;/p>
&lt;h2 id="4-the-dataset">4. The Dataset&lt;/h2>
&lt;h3 id="41-loading-the-data">4.1 Loading the data&lt;/h3>
&lt;p>The package includes two versions of the Moral-Benito (2016) economic growth dataset. The &lt;code>economic_growth&lt;/code> version has the lagged dependent variable already merged into the panel structure (with NAs in the initial period), while &lt;code>original_economic_growth&lt;/code> keeps it as a separate column.&lt;/p>
&lt;pre>&lt;code class="language-r">data(&amp;quot;economic_growth&amp;quot;)
data(&amp;quot;original_economic_growth&amp;quot;)
cat(&amp;quot;economic_growth:&amp;quot;, dim(economic_growth), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Countries:&amp;quot;, length(unique(economic_growth$country)), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Years:&amp;quot;, sort(unique(economic_growth$year)), &amp;quot;\n&amp;quot;)
head(economic_growth, 5)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">economic_growth: 365 12
Countries: 73
Years: 1960 1970 1980 1990 2000
# A tibble: 5 x 12
year country gdp ish sed pgrw pop ipr opem gsh lnlex polity
&amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
1 1960 1 8.25 NA NA NA NA NA NA NA NA NA
2 1970 1 8.37 0.122 0.139 0.0235 10.9 61.1 1.08 0.191 3.88 0.15
3 1980 1 8.54 0.207 0.141 0.0300 13.9 92.3 1.06 0.203 4.00 0.15
4 1990 1 8.63 0.203 0.28 0.0303 18.9 100. 0.898 0.232 4.10 0.15
5 2000 1 8.66 0.115 0.774 0.0215 25.3 81.2 0.636 0.219 4.21 0.575
&lt;/code>&lt;/pre>
&lt;p>The panel covers 73 countries observed at 10-year intervals from 1960 to 2000, yielding 5 periods per country (365 total rows, including the initial 1960 observation). The 1960 row for each country contains only the initial GDP level &amp;mdash; all regressors are NA because there is no &amp;ldquo;previous decade&amp;rdquo; to compute changes from. The four subsequent decades (1970&amp;ndash;2000) contain the 292 usable observations.&lt;/p>
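&lt;p>The row counts follow directly from the panel dimensions:&lt;/p>
&lt;pre>&lt;code class="language-r"># Panel bookkeeping: 5 decades per country, first decade has no regressors
n_countries = 73
n_periods   = 5
n_countries * n_periods         # 365 total rows
n_countries * (n_periods - 1)   # 292 usable observations
&lt;/code>&lt;/pre>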
&lt;h3 id="42-variable-descriptions">4.2 Variable descriptions&lt;/h3>
&lt;p>The dataset contains the dependent variable (log GDP per capita) and 9 candidate growth determinants:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Description&lt;/th>
&lt;th style="text-align:center">Expected sign&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>gdp&lt;/code>&lt;/td>
&lt;td>Log real GDP per capita (dependent variable)&lt;/td>
&lt;td style="text-align:center">&amp;mdash;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>ish&lt;/code>&lt;/td>
&lt;td>Investment share of GDP&lt;/td>
&lt;td style="text-align:center">+&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>sed&lt;/code>&lt;/td>
&lt;td>Secondary school enrollment rate&lt;/td>
&lt;td style="text-align:center">+&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>pgrw&lt;/code>&lt;/td>
&lt;td>Population growth rate&lt;/td>
&lt;td style="text-align:center">&amp;ndash;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>pop&lt;/code>&lt;/td>
&lt;td>Population (millions)&lt;/td>
&lt;td style="text-align:center">?&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>ipr&lt;/code>&lt;/td>
&lt;td>Investment price (relative to US)&lt;/td>
&lt;td style="text-align:center">&amp;ndash;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>opem&lt;/code>&lt;/td>
&lt;td>Trade openness ((imports + exports) / GDP)&lt;/td>
&lt;td style="text-align:center">+&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>gsh&lt;/code>&lt;/td>
&lt;td>Government consumption share of GDP&lt;/td>
&lt;td style="text-align:center">&amp;ndash;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>lnlex&lt;/code>&lt;/td>
&lt;td>Log life expectancy at birth&lt;/td>
&lt;td style="text-align:center">+&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>polity&lt;/code>&lt;/td>
&lt;td>Democracy index (0 = autocracy, 1 = democracy)&lt;/td>
&lt;td style="text-align:center">?&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>These variables are standard in the empirical growth literature, following Sala-i-Martin, Doppelhofer, and Miller (2004). Investment share and education are expected to have positive effects on growth, while population growth and government consumption are typically associated with slower growth. The signs for population and democracy are theoretically ambiguous.&lt;/p>
&lt;p>The 292 usable observations span 73 countries over four decades. Log GDP per capita ranges from 6.02 to 10.45, reflecting substantial income inequality &amp;mdash; the richest country is roughly 80 times wealthier than the poorest in per capita terms. Investment share averages 16.9% of GDP but ranges from 1.2% to 65.3%, indicating enormous variation in capital accumulation across countries and decades. Population growth averages 1.9% per decade, with one country experiencing slight population decline (&amp;ndash;0.6%).&lt;/p>
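&lt;p>Because GDP enters in logs, the &amp;ldquo;roughly 80 times&amp;rdquo; claim follows from exponentiating the range (a quick sanity check on the numbers quoted above):&lt;/p>
&lt;pre>&lt;code class="language-r"># Log GDP per capita ranges from 6.02 to 10.45; the income ratio is
exp(10.45 - 6.02)   # about 84, i.e. roughly an 80-fold gap
&lt;/code>&lt;/pre>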
&lt;h2 id="5-data-preparation">5. Data Preparation&lt;/h2>
&lt;p>The Bayesian DSM package requires two data preprocessing steps before estimation: standardization (scaling) and demeaning (removing entity and time fixed effects). These steps ensure numerical stability and allow the model to focus on within-country, within-period variation.&lt;/p>
&lt;h3 id="51-understanding-the-data-structure">5.1 Understanding the data structure&lt;/h3>
&lt;p>If your data has the lagged dependent variable as a separate column (like &lt;code>original_economic_growth&lt;/code>), you first need to merge it into the panel structure using &lt;a href="https://cran.r-project.org/web/packages/bdsm/vignettes/bdsm_vignette.Rnw" target="_blank" rel="noopener">&lt;code>join_lagged_col()&lt;/code>&lt;/a>. This function creates the initial period row with NAs:&lt;/p>
&lt;pre>&lt;code class="language-r"># Demonstration: converting original format to package format
eg_joined &amp;lt;- join_lagged_col(
  df = original_economic_growth,
  col = gdp,
  col_lagged = lag_gdp,
  timestamp_col = year,
  entity_col = country,
  timestep = 10 # 10-year intervals
)
cat(&amp;quot;Result:&amp;quot;, dim(eg_joined), &amp;quot;\n&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Result: 365 12
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>economic_growth&lt;/code> dataset already has this structure, so we can use it directly.&lt;/p>
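&lt;p>Conceptually, merging a lag into the panel amounts to shifting the dependent variable down one period within each country. A base-R sketch on a made-up two-country panel (the package&amp;rsquo;s &lt;code>join_lagged_col()&lt;/code> additionally creates the initial-period rows with NA regressors):&lt;/p>
&lt;pre>&lt;code class="language-r"># Toy panel (made-up numbers), initial-period rows already present
toy = data.frame(
  country = rep(1:2, each = 2),
  year    = rep(c(1970, 1980), 2),
  gdp     = c(8.3, 8.5, 7.1, 7.4)
)
# Shift gdp one period within each country; the first period gets NA
toy$lag_gdp = ave(toy$gdp, toy$country,
                  FUN = function(g) c(NA, head(g, -1)))
toy
&lt;/code>&lt;/pre>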
&lt;h3 id="52-standardization-and-demeaning">5.2 Standardization and demeaning&lt;/h3>
&lt;p>Data preparation involves two calls to &lt;a href="https://cran.r-project.org/web/packages/bdsm/vignettes/bdsm_vignette.Rnw" target="_blank" rel="noopener">&lt;code>feature_standardization()&lt;/code>&lt;/a>. The first call &lt;em>standardizes&lt;/em> all regressors to have mean zero and unit variance &amp;mdash; this puts all variables on the same scale so that the BMA coefficients are directly comparable. The second call &lt;em>demeans&lt;/em> by time period to remove time fixed effects.&lt;/p>
&lt;p>Think of demeaning by time as subtracting the global average for each decade. If every country&amp;rsquo;s GDP grew in the 1990s due to the tech boom, demeaning removes that common trend. What remains is each country&amp;rsquo;s deviation from the global pattern &amp;mdash; the variation that country-specific factors must explain.&lt;/p>
&lt;pre>&lt;code class="language-r"># Step 1: Standardize all regressors (mean=0, sd=1)
# Makes variables comparable: GDP and population are on vastly different scales
data_std &amp;lt;- feature_standardization(
  df = economic_growth,
  excluded_cols = c(country, year, gdp)
)
# Step 2: Demean by time period (remove time fixed effects)
# Subtracts each decade's global average, isolating country-specific variation
data_prepared &amp;lt;- feature_standardization(
  df = data_std,
  group_by_col = year,
  excluded_cols = country,
  scale = FALSE
)
head(data_prepared, 5)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"># A tibble: 5 x 12
year country gdp ish sed pgrw pop ipr opem gsh lnlex polity
&amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
1 1960 1 0.292 NA NA NA NA NA NA NA NA NA
2 1970 1 0.121 -0.493 -0.534 0.163 -0.151 -0.271 1.13 0.0496 -0.549 -0.578
3 1980 1 0.0573 0.241 -0.697 0.942 -0.181 0.0635 1.07 -0.0226 0.0167 -0.578
4 1990 1 0.0724 0.456 -0.932 1.09 -0.203 0.208 0.724 0.101 -0.0655 -0.578
5 2000 1 -0.0823 -0.505 -0.778 0.465 -0.218 -0.0620 -0.120 0.120 -0.107 0.112
&lt;/code>&lt;/pre>
&lt;p>After preparation, all regressor values are centered around zero. Country 1&amp;rsquo;s investment share (&lt;code>ish&lt;/code>) was 0.49 standard deviations below the global average in 1970 but 0.46 standard deviations above average in 1990, showing meaningful within-country variation over time. The GDP column retains its original scale because it is the dependent variable.&lt;/p>
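&lt;p>What the two &lt;code>feature_standardization()&lt;/code> calls accomplish can be reproduced in base R on a toy column (an illustrative sketch with made-up data, not the package&amp;rsquo;s implementation):&lt;/p>
&lt;pre>&lt;code class="language-r"># Toy regressor: 2 countries x 3 years (made-up values)
toy = data.frame(
  country = rep(1:2, each = 3),
  year    = rep(c(1970, 1980, 1990), 2),
  ish     = c(0.10, 0.20, 0.15, 0.30, 0.25, 0.35)
)
# Step 1: standardize across the whole panel (mean 0, sd 1)
toy$ish_std = as.numeric(scale(toy$ish))
# Step 2: demean by year (subtract each year's cross-country average)
toy$ish_dm = toy$ish_std - ave(toy$ish_std, toy$year)
# Each year's demeaned values now average to zero
tapply(toy$ish_dm, toy$year, mean)
&lt;/code>&lt;/pre>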
&lt;h2 id="6-estimating-the-full-model-space">6. Estimating the Full Model Space&lt;/h2>
&lt;p>With 9 candidate regressors, there are $2^9 = 512$ possible regression models. The package estimates every single one via numerical optimization of the &lt;em>marginal likelihood&lt;/em> &amp;mdash; the probability of observing the data given a particular model, after integrating out all parameter uncertainty. Think of this as a cooking competition with 512 recipes &amp;mdash; each uses a different combination of 9 ingredients, and the marginal likelihood scores each recipe by balancing flavor (fit) against unnecessary complexity (overfitting).&lt;/p>
&lt;p>To be concrete: model 1 might include only investment and education. Model 2 adds trade openness. Model 3 uses education and democracy but drops investment. Each of the 512 combinations gets its own likelihood estimated separately, and BMA weights them by how well they fit the data.&lt;/p>
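&lt;p>The model space itself is easy to enumerate: every subset of the 9 regressors is one candidate model. A quick sketch of the bookkeeping (not how the package stores it internally):&lt;/p>
&lt;pre>&lt;code class="language-r">regressors = c('ish', 'sed', 'pgrw', 'pop', 'ipr', 'opem',
               'gsh', 'lnlex', 'polity')
# One row per model: TRUE means the regressor is included
model_grid = expand.grid(rep(list(c(FALSE, TRUE)), length(regressors)))
names(model_grid) = regressors
nrow(model_grid)                # 512 = 2^9 candidate models
sum(rowSums(model_grid) == 2)   # 36 models use exactly two regressors
&lt;/code>&lt;/pre>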
&lt;p>The &lt;a href="https://cran.r-project.org/web/packages/bdsm/vignettes/bdsm_vignette.Rnw" target="_blank" rel="noopener">&lt;code>optim_model_space()&lt;/code>&lt;/a> function handles this computation. For the full 9-regressor case, this is the most computationally intensive step &amp;mdash; it can take several minutes depending on the machine. The package helpfully includes a precomputed &lt;code>full_model_space&lt;/code> object so we can skip the wait:&lt;/p>
&lt;pre>&lt;code class="language-r"># Load precomputed model space (or compute from scratch)
data(&amp;quot;full_model_space&amp;quot;)
# To compute from scratch (takes several minutes):
# full_model_space &amp;lt;- optim_model_space(
#   df = data_prepared,
#   dep_var_col = gdp,
#   timestamp_col = year,
#   entity_col = country,
#   init_value = 0.5
# )
cat(&amp;quot;Parameters matrix:&amp;quot;, dim(full_model_space$params), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Statistics matrix:&amp;quot;, dim(full_model_space$stats), &amp;quot;\n&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Parameters matrix: 106 512
Statistics matrix: 22 512
&lt;/code>&lt;/pre>
&lt;p>The result is a list with two elements. The &lt;code>$params&lt;/code> matrix contains 106 estimated parameters for each of the 512 models &amp;mdash; these include the structural parameters ($\alpha$, $\beta$), reduced-form parameters, and variance components. The &lt;code>$stats&lt;/code> matrix stores 22 statistics per model, including the log-likelihood, BIC, regular standard errors, and robust (heteroskedasticity-consistent) standard errors.&lt;/p>
&lt;p>Why use marginal likelihood instead of R-squared? Unlike R-squared, which always improves when you add variables, the marginal likelihood penalizes complexity. It accounts for the fact that more parameters make it easier to fit noise. A model with 9 regressors that barely improves fit over a 5-regressor model will receive a &lt;em>lower&lt;/em> marginal likelihood score &amp;mdash; the extra parameters were not worth the complexity cost.&lt;/p>
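&lt;p>The complexity penalty is easy to demonstrate with simulated data (an illustrative sketch; the package optimizes a marginal likelihood, while here base R&amp;rsquo;s &lt;code>BIC()&lt;/code> serves as a stand-in):&lt;/p>
&lt;pre>&lt;code class="language-r">set.seed(1)
n = 200
x = rnorm(n)
y = 1 + 0.5 * x + rnorm(n)
junk = matrix(rnorm(n * 5), n, 5)   # five pure-noise regressors
fit_small = lm(y ~ x)
fit_big   = lm(y ~ x + junk)
# R-squared can only go up when regressors are added...
c(summary(fit_small)$r.squared, summary(fit_big)$r.squared)
# ...but BIC adds a log(n) penalty per extra parameter, so the
# junk-augmented model typically scores worse (higher BIC)
c(BIC(fit_small), BIC(fit_big))
&lt;/code>&lt;/pre>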
&lt;p>Before jumping into BMA, let us first establish a benchmark using a standard regression approach &amp;mdash; this will help us appreciate what BMA adds.&lt;/p>
&lt;h2 id="7-benchmark-kitchen-sink-fixed-effects">7. Benchmark: Kitchen-Sink Fixed Effects&lt;/h2>
&lt;p>Before running BMA, it is useful to establish a benchmark. What happens if we simply throw all 9 regressors into a single fixed effects regression? This &amp;ldquo;kitchen-sink&amp;rdquo; approach is the default in applied work &amp;mdash; but it commits to one model specification and ignores the uncertainty about which variables belong.&lt;/p>
&lt;pre>&lt;code class="language-r"># Kitchen-sink FE regression with all 9 regressors
fe_full &amp;lt;- lm(gdp ~ lag_gdp + ish + sed + pgrw + pop + ipr +
                opem + gsh + lnlex + polity +
                factor(country) + factor(year),
              data = original_economic_growth)
summary(fe_full)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">FE regression coefficients:
Estimate Std. Error t value Pr(&amp;gt;|t|)
lag_gdp 0.6188 0.0501 12.3521 0.0000
ish 0.4646 0.2331 1.9934 0.0475
sed 0.0162 0.0337 0.4798 0.6319
pgrw -2.3352 2.1409 -1.0907 0.2767
pop 0.0016 0.0004 4.5092 0.0000
ipr -0.0003 0.0003 -1.0817 0.2806
opem 0.1199 0.0379 3.1652 0.0018
gsh -0.7448 0.2700 -2.7585 0.0063
lnlex 0.1153 0.2440 0.4727 0.6369
polity -0.1656 0.0570 -2.9065 0.0041
Significant at 5%: lag_gdp, ish, pop, opem, gsh, polity
R-squared: 0.988
N observations: 292
&lt;/code>&lt;/pre>
&lt;p>The kitchen-sink model finds 6 of 10 variables significant at the 5% level: lagged GDP, investment share, population, trade openness, government share, and democracy. Education, population growth, investment price, and life expectancy are insignificant. But this result depends entirely on this particular specification &amp;mdash; drop one variable or add another, and the significance pattern may change. This is the model uncertainty problem that BMA is designed to solve.&lt;/p>
&lt;p>The lagged GDP coefficient of 0.619 is notably lower than the BMA posterior mean (0.919), suggesting that the kitchen-sink model&amp;rsquo;s coefficient estimates are pulled by multicollinearity among the 9 regressors. BMA handles this by averaging over specifications that include different subsets.&lt;/p>
&lt;p>Notice how the FE model forces a binary judgment: education is &amp;lsquo;insignificant&amp;rsquo; (p = 0.63) and trade is &amp;lsquo;significant&amp;rsquo; (p = 0.002). BMA replaces this all-or-nothing verdict with a nuanced probability scale: education has PIP = 0.72 (moderate evidence) and trade has PIP = 0.77 (positive evidence). The difference between &amp;lsquo;insignificant&amp;rsquo; and &amp;lsquo;moderate evidence&amp;rsquo; matters for policy &amp;mdash; a policymaker who ignores education entirely because of a p-value threshold may be discarding useful information.&lt;/p>
&lt;p>The kitchen-sink model commits to one specification and produces one set of p-values. But we saw that which variables look &amp;lsquo;significant&amp;rsquo; depends entirely on which others are in the model. Drop one variable, and the significance pattern reshuffles. BMA solves this by never committing to a single specification &amp;mdash; it averages over all 512, letting the data decide which matter most.&lt;/p>
&lt;h2 id="8-bayesian-model-averaging">8. Bayesian Model Averaging&lt;/h2>
&lt;h3 id="81-running-bma">8.1 Running BMA&lt;/h3>
&lt;p>Now we can perform Bayesian Model Averaging across all 512 models. The &lt;a href="https://cran.r-project.org/web/packages/bdsm/vignettes/bdsm_vignette.Rnw" target="_blank" rel="noopener">&lt;code>bma()&lt;/code>&lt;/a> function takes the precomputed model space and the prepared data, weights each model by its posterior probability, and computes weighted averages of the coefficients:&lt;/p>
&lt;p>&lt;em>Focus on two columns: &lt;strong>PIP&lt;/strong> (how important is this variable?) and &lt;strong>%(+)&lt;/strong> (is its effect consistently positive or negative?).&lt;/em>&lt;/p>
&lt;pre>&lt;code class="language-r">bma_results &amp;lt;- bma(full_model_space, df = data_prepared, round = 3)
# Binomial prior results
print(bma_results[[1]])
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> PIP PM PSD PSDR PMcon PSDcon PSDRcon %(+)
gdp_lag NA 0.919 0.077 0.109 0.919 0.077 0.109 100.000
ish 0.773 0.063 0.045 0.062 0.082 0.034 0.059 100.000
sed 0.717 0.030 0.057 0.074 0.042 0.064 0.084 69.922
pgrw 0.714 0.018 0.030 0.052 0.025 0.033 0.060 99.609
pop 0.990 0.119 0.065 0.082 0.121 0.064 0.081 100.000
ipr 0.656 -0.034 0.033 0.044 -0.051 0.027 0.046 0.000
opem 0.766 0.034 0.030 0.033 0.044 0.026 0.031 100.000
gsh 0.751 -0.015 0.041 0.091 -0.020 0.046 0.104 30.859
lnlex 0.864 0.088 0.075 0.098 0.102 0.071 0.099 100.000
polity 0.678 -0.057 0.046 0.053 -0.084 0.030 0.044 0.000
&lt;/code>&lt;/pre>
&lt;p>The binomial prior results reveal a clear hierarchy among the 9 candidate regressors. Population size (&lt;code>pop&lt;/code>) dominates with PIP = 0.990 &amp;mdash; appearing in virtually every high-quality model &amp;mdash; followed by life expectancy (&lt;code>lnlex&lt;/code>) at 0.864 and investment share (&lt;code>ish&lt;/code>) at 0.773. At the other end, investment price (&lt;code>ipr&lt;/code>) at 0.656 and democracy (&lt;code>polity&lt;/code>) at 0.678 show the weakest evidence, though even these exceed 0.5. The lagged GDP coefficient of 0.919 confirms strong persistence: a country&amp;rsquo;s current GDP is heavily determined by its past GDP.&lt;/p>
&lt;h3 id="82-understanding-the-bma-statistics">8.2 Understanding the BMA statistics&lt;/h3>
&lt;p>Each column in the BMA output captures a different aspect of the evidence:&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Beginner tip:&lt;/strong> For a first reading, focus on three columns: &lt;strong>PIP&lt;/strong> (does this variable matter?), &lt;strong>PM&lt;/strong> (what is its average effect?), and &lt;strong>%(+)&lt;/strong> (is the effect consistently positive or negative?). The remaining columns (PSDR, PMcon, PSDcon, PSDRcon) are useful for advanced robustness checks but can be skipped on a first pass.&lt;/p>
&lt;/blockquote>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Statistic&lt;/th>
&lt;th>Full name&lt;/th>
&lt;th>Interpretation&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>PIP&lt;/strong>&lt;/td>
&lt;td>Posterior Inclusion Probability&lt;/td>
&lt;td>Fraction of posterior probability mass in models that include this variable. Think of it as a &lt;strong>batting average&lt;/strong>: PIP = 0.99 means the variable appeared in 99% of high-scoring models&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>PM&lt;/strong>&lt;/td>
&lt;td>Posterior Mean&lt;/td>
&lt;td>Weighted average of the coefficient across all models (including zeros from models that exclude the variable)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>PSD&lt;/strong>&lt;/td>
&lt;td>Posterior Standard Deviation&lt;/td>
&lt;td>Uncertainty around PM, incorporating both within-model and across-model variation&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>PSDR&lt;/strong>&lt;/td>
&lt;td>Robust Posterior SD&lt;/td>
&lt;td>PSD computed from heteroskedasticity-robust standard errors&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>PMcon&lt;/strong>&lt;/td>
&lt;td>Conditional Posterior Mean&lt;/td>
&lt;td>Average coefficient only across models that &lt;em>include&lt;/em> the variable&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>PSDcon&lt;/strong>&lt;/td>
&lt;td>Conditional PSD&lt;/td>
&lt;td>Uncertainty conditional on inclusion&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>PSDRcon&lt;/strong>&lt;/td>
&lt;td>Conditional Robust PSD&lt;/td>
&lt;td>Robust uncertainty conditional on inclusion&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>%(+)&lt;/strong>&lt;/td>
&lt;td>Positive sign share&lt;/td>
&lt;td>Percentage of models where the coefficient is positive. Values near 0% or 100% indicate stable sign&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The central quantity driving all these statistics is the &lt;strong>posterior model probability&lt;/strong> (PMP). Each model $M_j$ receives a weight proportional to its marginal likelihood times its prior probability:&lt;/p>
&lt;p>$$\mathbb{P}(M_j | \text{data}) = \frac{\exp(-\frac{1}{2} BIC_j) \cdot \mathbb{P}(M_j)}{\sum_{i=1}^{2^K} \exp(-\frac{1}{2} BIC_i) \cdot \mathbb{P}(M_i)}$$&lt;/p>
&lt;p>In words, this equation says that each model&amp;rsquo;s posterior probability is its prior probability times a data-fit term (approximated by the BIC), divided by the sum across all $2^K$ models to ensure the probabilities add to 1. Models that fit the data well without too many parameters receive higher posterior probability. The PIP for a variable is then the sum of PMPs across all models that include it.&lt;/p>
&lt;p>To make this concrete: if model A has BIC = &amp;ndash;800 and model B has BIC = &amp;ndash;798, model A fits the data better. The gap of 2 BIC points translates into posterior odds of $e^{2/2} = e \approx 2.72$ in favor of A, so after normalizing, model A receives about 73% of the posterior probability while model B gets 27%. The PIP of a variable included only in model A would then be at least 0.73.&lt;/p>
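&lt;p>Posterior model probabilities can be computed from BIC values in a few lines. Because $\exp(-\frac{1}{2}BIC)$ overflows for large |BIC|, the standard trick is to subtract the maximum on the log scale before exponentiating (a generic sketch, assuming a flat prior over models):&lt;/p>
&lt;pre>&lt;code class="language-r">bic = c(A = -800, B = -798)
logw = -bic / 2                 # log weights under a flat model prior
logw = logw - max(logw)         # stabilize before exponentiating
pmp  = exp(logw) / sum(exp(logw))
round(pmp, 3)                   # A gets about 0.731, B about 0.269
&lt;/code>&lt;/pre>
&lt;p>The same log-sum-exp normalization scales to all $2^K$ models at once.&lt;/p>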
&lt;p>The ratio &lt;strong>|PM/PSD|&lt;/strong> &amp;mdash; the posterior mean divided by its posterior standard deviation &amp;mdash; is a key robustness criterion. Raftery (1995) considers a variable &lt;em>robust&lt;/em> when |PM/PSD| &amp;gt; 1. More stringent thresholds include |PM/PSD| &amp;gt; 1.3 (Masanjala and Papageorgiou, 2008) and |PM/PSD| &amp;gt; 2 (Sala-i-Martin et al., 2004).&lt;/p>
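&lt;p>Applying the |PM/PSD| criterion to the posterior means and standard deviations printed in Section 8.1 takes one line (values transcribed from the table above &amp;mdash; a quick check, not additional package output):&lt;/p>
&lt;pre>&lt;code class="language-r">pm  = c(ish = 0.063, sed = 0.030, pgrw = 0.018, pop = 0.119, ipr = -0.034,
        opem = 0.034, gsh = -0.015, lnlex = 0.088, polity = -0.057)
psd = c(0.045, 0.057, 0.030, 0.065, 0.033,
        0.030, 0.041, 0.075, 0.046)
round(abs(pm / psd), 2)   # Raftery (1995): values above 1 indicate robustness
&lt;/code>&lt;/pre>
&lt;p>By this arithmetic, six of the nine regressors clear the loosest bar of |PM/PSD| &amp;gt; 1 (population highest at about 1.83), while none reach the Sala-i-Martin et al. (2004) threshold of 2.&lt;/p>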
&lt;h3 id="83-interpreting-pips-with-rafterys-classification">8.3 Interpreting PIPs with Raftery&amp;rsquo;s classification&lt;/h3>
&lt;p>Raftery (1995) provides a standard classification for the strength of evidence based on PIP values:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>PIP range&lt;/th>
&lt;th>Evidence&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&amp;gt; 0.99&lt;/td>
&lt;td>Very strong&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>0.95 &amp;ndash; 0.99&lt;/td>
&lt;td>Strong&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>0.75 &amp;ndash; 0.95&lt;/td>
&lt;td>Positive&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>0.50 &amp;ndash; 0.75&lt;/td>
&lt;td>Weak&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Under the binomial prior, &lt;code>pop&lt;/code> (PIP = 0.990) reaches &lt;em>strong&lt;/em> evidence &amp;mdash; just short of the &amp;ldquo;very strong&amp;rdquo; threshold at 0.99. Life expectancy (&lt;code>lnlex&lt;/code> at 0.864), investment share (&lt;code>ish&lt;/code> at 0.773), trade openness (&lt;code>opem&lt;/code> at 0.766), and government share (&lt;code>gsh&lt;/code> at 0.751) fall in the &lt;em>positive&lt;/em> evidence range. The remaining four variables &amp;mdash; education, population growth, investment price, and democracy &amp;mdash; show &lt;em>weak&lt;/em> evidence (0.65&amp;ndash;0.72). No variable has PIP below 0.5, suggesting the data supports relatively large models.&lt;/p>
&lt;p>The &lt;strong>sign stability&lt;/strong> column (%(+)) provides an additional robustness check. Six of the nine regressors have perfectly stable signs: investment share, population, trade openness, and life expectancy are always positive (100%), while investment price and democracy are always negative (0%). Population growth is positive in 99.6% of models &amp;mdash; effectively stable. Government share has %(+) = 30.9%, meaning its sign is negative in about 70% of models &amp;mdash; moderately unstable. Education has %(+) = 69.9%, with a positive coefficient in about 70% of models but negative in 30%.&lt;/p>
&lt;p>The following chart visualizes the PIPs with color-coded evidence tiers. We first define a dark-theme palette and extract the BMA statistics into a data frame, then build the plot:&lt;/p>
&lt;pre>&lt;code class="language-r"># Dark theme palette (matching site navbar/footer)
DARK_BG &amp;lt;- &amp;quot;#0f1729&amp;quot;
LIGHT_TEXT &amp;lt;- &amp;quot;#c8d0e0&amp;quot;
LIGHTER_TEXT &amp;lt;- &amp;quot;#e8ecf2&amp;quot;
# Extract BMA statistics into a data frame
bma_tab &amp;lt;- bma_results[[1]]
pip_df &amp;lt;- data.frame(
  variable = rownames(bma_tab)[-1],
  pip = bma_tab[-1, &amp;quot;PIP&amp;quot;],
  pm = bma_tab[-1, &amp;quot;PM&amp;quot;],
  psd = bma_tab[-1, &amp;quot;PSD&amp;quot;],
  sign_pos = bma_tab[-1, &amp;quot;%(+)&amp;quot;]
)
# Readable labels and robustness classification
var_labels &amp;lt;- c(ish = &amp;quot;Investment share&amp;quot;, sed = &amp;quot;Education&amp;quot;,
                pgrw = &amp;quot;Population growth&amp;quot;, pop = &amp;quot;Population&amp;quot;,
                ipr = &amp;quot;Investment price&amp;quot;, opem = &amp;quot;Trade openness&amp;quot;,
                gsh = &amp;quot;Government share&amp;quot;, lnlex = &amp;quot;Life expectancy&amp;quot;,
                polity = &amp;quot;Democracy&amp;quot;)
pip_df$label &amp;lt;- var_labels[pip_df$variable]
pip_df$robustness &amp;lt;- cut(pip_df$pip,
breaks = c(0, 0.50, 0.75, 1),
labels = c(&amp;quot;Weak (PIP &amp;lt; 0.50)&amp;quot;, &amp;quot;Moderate (0.50-0.75)&amp;quot;,
&amp;quot;Positive (PIP &amp;gt;= 0.75)&amp;quot;),
include.lowest = TRUE)
# PIP bar chart
ggplot(pip_df, aes(x = reorder(label, pip), y = pip,
fill = robustness)) +
geom_col(width = 0.65) +
geom_hline(yintercept = 0.75, linetype = &amp;quot;dashed&amp;quot;,
color = LIGHT_TEXT) +
geom_hline(yintercept = 0.50, linetype = &amp;quot;dotted&amp;quot;,
color = LIGHT_TEXT, alpha = 0.6) +
coord_flip() +
scale_fill_manual(values = c(
&amp;quot;Positive (PIP &amp;gt;= 0.75)&amp;quot; = &amp;quot;#6a9bcc&amp;quot;,
&amp;quot;Moderate (0.50-0.75)&amp;quot; = &amp;quot;#00d4c8&amp;quot;,
&amp;quot;Weak (PIP &amp;lt; 0.50)&amp;quot; = &amp;quot;#d97757&amp;quot;)) +
labs(x = NULL, y = &amp;quot;Posterior Inclusion Probability (PIP)&amp;quot;,
fill = &amp;quot;Evidence strength&amp;quot;,
title = &amp;quot;BMA: Posterior Inclusion Probabilities&amp;quot;,
subtitle = &amp;quot;Binomial prior (EMS = 4.5), 512 models averaged&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_dynamic_bma_pip.png" alt="Posterior Inclusion Probabilities for all 9 regressors, sorted by PIP with threshold lines.">&lt;/p>
&lt;p>Population dominates the chart at PIP = 0.990, followed by life expectancy at 0.864. Five variables clear the 0.75 &amp;ldquo;positive evidence&amp;rdquo; threshold, while the remaining four &amp;mdash; democracy, education, population growth, and investment price &amp;mdash; fall in the &amp;ldquo;moderate&amp;rdquo; zone between 0.50 and 0.75. Compared to the kitchen-sink benchmark where 6 of 10 variables were significant at 5%, BMA paints a more nuanced picture: it grades each variable on a continuous scale of importance rather than imposing a binary significant/insignificant cutoff.&lt;/p>
&lt;h2 id="9-visualizing-model-probabilities">9. Visualizing Model Probabilities&lt;/h2>
&lt;h3 id="91-prior-versus-posterior-model-probabilities">9.1 Prior versus posterior model probabilities&lt;/h3>
&lt;p>The &lt;a href="https://cran.r-project.org/web/packages/bdsm/vignettes/bdsm_vignette.Rnw" target="_blank" rel="noopener">&lt;code>model_pmp()&lt;/code>&lt;/a> function visualizes how the data transforms our prior beliefs about which models are best. The prior assigns probability to each of the 512 models, and the data concentrates posterior mass on the models that fit best:&lt;/p>
&lt;pre>&lt;code class="language-r">pmp_plots &amp;lt;- model_pmp(bma_results)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_bdsm_03_model_pmp_combined.png" alt="Prior and posterior model probabilities across all 512 models.">&lt;/p>
&lt;p>The prior (dashed line) is relatively flat, reflecting the uniform prior assumption. The posterior (solid line) concentrates dramatically: a handful of models capture the bulk of the posterior mass, while most models receive negligible probability. This concentration is the signature of informative data &amp;mdash; the 73-country, 4-decade panel provides enough information to strongly favor certain model specifications.&lt;/p>
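&lt;p>The mechanics behind this concentration are plain Bayes: each model&amp;rsquo;s posterior probability is its prior probability times its marginal likelihood, renormalized over the model space. A toy sketch with illustrative numbers (not the actual 512-model output):&lt;/p>
&lt;pre>&lt;code class="language-r"># Toy model space: flat prior over 4 models plus log marginal likelihoods
prior &amp;lt;- rep(1 / 4, 4)
logml &amp;lt;- c(-100, -102, -106, -110)
# Posterior model probabilities: prior x likelihood, normalized
# (subtract max(logml) before exponentiating for numerical stability)
w &amp;lt;- prior * exp(logml - max(logml))
pmp &amp;lt;- w / sum(w)
round(pmp, 3) # a flat prior turns into a sharply concentrated posterior
&lt;/code>&lt;/pre>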
&lt;h3 id="92-model-sizes">9.2 Model sizes&lt;/h3>
&lt;p>The &lt;a href="https://cran.r-project.org/web/packages/bdsm/vignettes/bdsm_vignette.Rnw" target="_blank" rel="noopener">&lt;code>model_sizes()&lt;/code>&lt;/a> function shows the distribution of prior and posterior probabilities across model sizes (number of included regressors, excluding the lagged dependent variable):&lt;/p>
&lt;pre>&lt;code class="language-r">size_plots &amp;lt;- model_sizes(bma_results)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_bdsm_05_model_sizes.png" alt="Prior and posterior distribution over model sizes.">&lt;/p>
&lt;p>The expected model sizes confirm this visually:&lt;/p>
&lt;pre>&lt;code class="language-r">print(bma_results[[16]])
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Prior models size Posterior model size
Binomial 4.5 6.908
Binomial-beta 4.5 8.556
&lt;/code>&lt;/pre>
&lt;p>The posterior strongly favors larger models. While the binomial prior centers mass on models with 4&amp;ndash;5 regressors (EMS = 4.5), the posterior shifts toward 7 regressors (6.908). Under the binomial-beta prior, the shift is even more dramatic: the posterior expected model size reaches 8.556, meaning the data wants to include nearly all 9 candidate regressors. This is consistent with the finding that all variables have PIP above 0.65 &amp;mdash; the data sees signal in most candidates.&lt;/p>
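&lt;p>The posterior expected model size reported above is just a PMP-weighted average of model sizes. A minimal sketch with toy numbers:&lt;/p>
&lt;pre>&lt;code class="language-r"># Toy posterior: sizes of 4 models and their posterior probabilities
size &amp;lt;- c(9, 8, 8, 7)
pmp &amp;lt;- c(0.50, 0.25, 0.15, 0.10)
# Expected model size: sum over models of size x PMP
ems_post &amp;lt;- sum(size * pmp)
ems_post # 8.4
&lt;/code>&lt;/pre>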
&lt;h2 id="10-examining-top-models">10. Examining Top Models&lt;/h2>
&lt;p>The &lt;a href="https://cran.r-project.org/web/packages/bdsm/vignettes/bdsm_vignette.Rnw" target="_blank" rel="noopener">&lt;code>best_models()&lt;/code>&lt;/a> function lets us inspect the specific variable combinations and coefficient estimates in the top-ranked models:&lt;/p>
&lt;pre>&lt;code class="language-r">best8 &amp;lt;- best_models(bma_results, criterion = 1, best = 8)
print(best8[[1]]) # Inclusion matrix
&lt;/code>&lt;/pre>
&lt;p>&lt;em>Reading the inclusion matrix: each column is a model (ranked by fit), each row is a variable. A value of 1 means the variable is included in that model. Look for variables that appear in every top model &amp;mdash; those are the most robust.&lt;/em>&lt;/p>
&lt;pre>&lt;code class="language-text"> 'No. 1' 'No. 2' 'No. 3' 'No. 4' 'No. 5' 'No. 6' 'No. 7' 'No. 8'
gdp_lag 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
ish 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000
sed 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000
pgrw 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000
pop 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
ipr 1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000
opem 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000
gsh 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000
lnlex 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
polity 1.000 1.000 0.000 1.000 1.000 1.000 1.000 1.000
PMP 0.089 0.044 0.042 0.036 0.035 0.029 0.026 0.025
&lt;/code>&lt;/pre>
&lt;p>A striking pattern emerges: the top model includes &lt;em>all 9 regressors&lt;/em> (PMP = 8.9%), and the next 7 best models are each formed by dropping exactly one variable from the full set. This &amp;ldquo;kitchen sink minus one&amp;rdquo; pattern confirms that the data supports large models.&lt;/p>
&lt;p>Two variables are never dropped across the top 8 models: &lt;code>pop&lt;/code> and &lt;code>lnlex&lt;/code> &amp;mdash; they appear in all 8, consistent with their high PIPs of 0.990 and 0.864. The variables dropped in models 2&amp;ndash;8 are &lt;code>ipr&lt;/code>, &lt;code>polity&lt;/code>, &lt;code>sed&lt;/code>, &lt;code>pgrw&lt;/code>, &lt;code>gsh&lt;/code>, &lt;code>opem&lt;/code>, and &lt;code>ish&lt;/code> &amp;mdash; precisely the variables with the lowest PIPs.&lt;/p>
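&lt;p>The link between this inclusion matrix and the PIPs can be checked by hand: a variable&amp;rsquo;s PIP is the sum of the PMPs of all models that include it. The sketch below applies that identity to a hypothetical three-model subset with renormalized PMPs, so the numbers are illustrative only:&lt;/p>
&lt;pre>&lt;code class="language-r"># Inclusion indicators for 2 variables across 3 models (1 = included)
inc &amp;lt;- rbind(pop = c(1, 1, 1),
             ipr = c(1, 0, 1))
# PMPs of the 3 models, renormalized to sum to 1
pmp &amp;lt;- c(0.089, 0.044, 0.042)
pmp &amp;lt;- pmp / sum(pmp)
# PIP = inclusion-weighted sum of PMPs
pip &amp;lt;- inc %*% pmp
round(pip, 3) # pop: 1.000, ipr: 0.749
&lt;/code>&lt;/pre>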
&lt;p>We can also examine the coefficient estimates in the best model using the knitr-formatted output:&lt;/p>
&lt;pre>&lt;code class="language-r"># Estimation results for the best model (knitr format)
print(best8[[5]])
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Best model (No. 1) estimates:
gdp_lag 0.954 (0.076)*** pop 0.065 (0.056)
ish 0.079 (0.032)** ipr -0.056 (0.027)**
sed 0.034 (0.065) opem 0.043 (0.025)*
pgrw 0.025 (0.033) gsh -0.043 (0.050)
lnlex 0.151 (0.060)** polity -0.092 (0.032)***
&lt;/code>&lt;/pre>
&lt;p>In the best model (No. 1), the lagged GDP coefficient is 0.954 (SE = 0.076, significant at 1%), confirming the very slow convergence we derived from the Solow model. Investment share has a positive and significant coefficient of 0.079, while democracy has a negative and highly significant coefficient of &amp;ndash;0.092. Life expectancy is positive and significant at 0.151. Education, despite being included in 7 of the top 8 models, has a small coefficient (0.034) with a large standard error (0.065) &amp;mdash; explaining its moderate PIP despite frequent inclusion.&lt;/p>
&lt;p>This combination &amp;mdash; high inclusion rate but imprecise coefficient &amp;mdash; happens when most models agree that education &lt;em>belongs&lt;/em> in the model but disagree about its magnitude. Some estimate a positive effect of +0.08, others a negative effect of &amp;ndash;0.02. The variable is probably relevant, but the data does not pin down its direction.&lt;/p>
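&lt;p>This distinction is what separates the conditional columns (PMcon, PSDcon) from the unconditional ones (PM, PSD) in the BMA tables. Below is a sketch of the textbook BMA moment formulas, which treat an excluded coefficient as zero; the inputs are illustrative, not package internals:&lt;/p>
&lt;pre>&lt;code class="language-r"># Illustrative inputs: PIP, conditional mean, conditional SD
pip &amp;lt;- 0.718
pm_con &amp;lt;- 0.081
sd_con &amp;lt;- 0.034
# Unconditional mean: conditional mean shrunk toward zero by the PIP
pm &amp;lt;- pip * pm_con
# Unconditional variance: PIP-weighted second moment minus squared mean
v &amp;lt;- pip * (sd_con^2 + pm_con^2) - pm^2
c(PM = round(pm, 3), PSD = round(sqrt(v), 3)) # PM 0.058, PSD 0.046
&lt;/code>&lt;/pre>
&lt;p>Frequent inclusion with a wobbly magnitude inflates the between-model part of this variance, which is exactly the education pattern.&lt;/p>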
&lt;p>Beyond these top models, how do the coefficients distribute across all 512 specifications? The next section examines the full posterior distributions.&lt;/p>
&lt;h2 id="11-coefficient-distributions">11. Coefficient Distributions&lt;/h2>
&lt;p>Before examining individual coefficient distributions, it is helpful to see all posterior means and their uncertainty at a glance. We compute approximate 95% credible intervals as the posterior mean plus or minus two posterior standard deviations:&lt;/p>
&lt;pre>&lt;code class="language-r"># Approximate 95% credible intervals
pip_df$ci_low &amp;lt;- pip_df$pm - 2 * pip_df$psd
pip_df$ci_high &amp;lt;- pip_df$pm + 2 * pip_df$psd
# Coefficient point-range plot
ggplot(pip_df, aes(x = reorder(label, pip), y = pm,
                   color = robustness)) +
  geom_hline(yintercept = 0, linetype = &amp;quot;solid&amp;quot;,
             color = LIGHT_TEXT, alpha = 0.4) +
  geom_pointrange(aes(ymin = ci_low, ymax = ci_high),
                  size = 0.6, linewidth = 0.8) +
  coord_flip() +
  scale_color_manual(values = c(
    &amp;quot;Positive (PIP &amp;gt;= 0.75)&amp;quot; = &amp;quot;#6a9bcc&amp;quot;,
    &amp;quot;Moderate (0.50-0.75)&amp;quot; = &amp;quot;#00d4c8&amp;quot;,
    &amp;quot;Weak (PIP &amp;lt; 0.50)&amp;quot; = &amp;quot;#d97757&amp;quot;)) +
  labs(x = NULL, y = &amp;quot;Posterior Mean Coefficient&amp;quot;,
       color = &amp;quot;Evidence strength&amp;quot;,
       title = &amp;quot;BMA: Posterior Coefficient Estimates&amp;quot;,
       subtitle = &amp;quot;Points = posterior mean, bars = PM +/- 2*PSD&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_dynamic_bma_coef.png" alt="Posterior coefficient estimates with approximate 95% credible intervals for all 9 regressors.">&lt;/p>
&lt;p>Population and life expectancy have the largest positive posterior means, with credible intervals that do not cross zero &amp;mdash; consistent with their high PIPs. Democracy (polity) has a clearly negative effect, also with an interval that excludes zero. Investment price is negative but with a wider interval. Education and government share have credible intervals that straddle zero, reflecting sign instability. Compared to the kitchen-sink FE model, BMA produces posterior means that account for model uncertainty: the intervals are wider than standard confidence intervals because they incorporate variation &lt;em>across&lt;/em> model specifications, not just &lt;em>within&lt;/em> a single specification.&lt;/p>
&lt;p>The &lt;a href="https://cran.r-project.org/web/packages/bdsm/vignettes/bdsm_vignette.Rnw" target="_blank" rel="noopener">&lt;code>coef_hist()&lt;/code>&lt;/a> function provides more detailed views of the full posterior distribution of each coefficient across all 512 models, weighted by posterior model probability:&lt;/p>
&lt;pre>&lt;code class="language-r">coef_plots &amp;lt;- coef_hist(bma_results)
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Population&lt;/strong> &amp;mdash; the most robust determinant:&lt;/p>
&lt;pre>&lt;code class="language-r">print(coef_plots[[5]])
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_bdsm_09_coef_hist_pop.png" alt="Posterior coefficient distribution for population.">&lt;/p>
&lt;p>Population has a tight, entirely positive distribution centered around 0.12, confirming strong and stable evidence for a positive effect on growth.&lt;/p>
&lt;p>These results hold under the default binomial prior. But how sensitive are they to our choice of prior? The next section stress-tests the findings.&lt;/p>
&lt;h2 id="12-sensitivity-to-prior-specification">12. Sensitivity to Prior Specification&lt;/h2>
&lt;p>A critical step in any BMA analysis is checking whether the results change when we alter our prior beliefs. If a variable&amp;rsquo;s PIP is high under one prior but low under another, we should be cautious about declaring it a robust determinant. The following chart compares PIPs across three prior specifications at a glance:&lt;/p>
&lt;pre>&lt;code class="language-r"># Extract PIPs from three prior specifications
bma_tab_bb &amp;lt;- bma_results[[2]] # Binomial-beta
bma_tab_ems2 &amp;lt;- bma_ems2[[1]] # Skeptical (EMS = 2)
sens_df &amp;lt;- data.frame(
  label = pip_df$label,
  Binomial = pip_df$pip,
  BinBeta = bma_tab_bb[-1, &amp;quot;PIP&amp;quot;],
  EMS2 = bma_tab_ems2[-1, &amp;quot;PIP&amp;quot;])
# Pivot to long format for ggplot
sens_long &amp;lt;- sens_df %&amp;gt;%
  pivot_longer(cols = c(Binomial, BinBeta, EMS2),
               names_to = &amp;quot;prior&amp;quot;, values_to = &amp;quot;pip&amp;quot;) %&amp;gt;%
  mutate(prior = factor(prior,
                        levels = c(&amp;quot;EMS2&amp;quot;, &amp;quot;Binomial&amp;quot;, &amp;quot;BinBeta&amp;quot;),
                        labels = c(&amp;quot;Skeptical (EMS=2)&amp;quot;,
                                   &amp;quot;Binomial (EMS=4.5)&amp;quot;,
                                   &amp;quot;Binomial-Beta&amp;quot;)))
# Connecting segments showing the range across priors
seg_df &amp;lt;- sens_df %&amp;gt;%
  mutate(pip_min = pmin(Binomial, BinBeta, EMS2),
         pip_max = pmax(Binomial, BinBeta, EMS2))
# Dumbbell chart
ggplot() +
  geom_vline(xintercept = 0.75, linetype = &amp;quot;dashed&amp;quot;,
             color = LIGHT_TEXT) +
  geom_vline(xintercept = 0.50, linetype = &amp;quot;dotted&amp;quot;,
             color = LIGHT_TEXT, alpha = 0.6) +
  geom_segment(data = seg_df,
               aes(x = pip_min, xend = pip_max,
                   y = reorder(label, Binomial),
                   yend = reorder(label, Binomial)),
               color = LIGHT_TEXT, alpha = 0.3, linewidth = 1.5) +
  geom_point(data = sens_long,
             aes(x = pip, y = reorder(label, pip), color = prior),
             size = 3.5) +
  scale_color_manual(values = c(
    &amp;quot;Skeptical (EMS=2)&amp;quot; = &amp;quot;#d97757&amp;quot;,
    &amp;quot;Binomial (EMS=4.5)&amp;quot; = &amp;quot;#6a9bcc&amp;quot;,
    &amp;quot;Binomial-Beta&amp;quot; = &amp;quot;#00d4c8&amp;quot;)) +
  labs(x = &amp;quot;Posterior Inclusion Probability (PIP)&amp;quot;, y = NULL,
       color = &amp;quot;Model prior&amp;quot;,
       title = &amp;quot;Prior Sensitivity: How Robust Are the PIPs?&amp;quot;,
       subtitle = &amp;quot;Same data, three different prior specifications&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_dynamic_bma_sensitivity.png" alt="Prior sensitivity: PIPs under three different prior specifications.">&lt;/p>
&lt;p>The width of each horizontal segment shows how much a variable&amp;rsquo;s PIP changes across priors. Population is rock-solid: its PIP barely moves (0.964&amp;ndash;0.998) regardless of the prior. Life expectancy shows moderate sensitivity (0.637&amp;ndash;0.974). The bottom four variables &amp;mdash; democracy, education, population growth, and investment price &amp;mdash; are the most sensitive, with PIPs ranging from 0.34 to 0.94 depending on the prior. This visual makes the key message immediately clear: &lt;strong>only population and life expectancy are robust across all prior specifications&lt;/strong>.&lt;/p>
&lt;h3 id="121-binomial-versus-binomial-beta-prior">12.1 Binomial versus binomial-beta prior&lt;/h3>
&lt;p>The default analysis already computes both priors. The &lt;strong>binomial prior&lt;/strong> assigns each variable an independent probability of inclusion equal to EMS/K (where EMS is the expected model size and K is the number of regressors). The &lt;strong>binomial-beta prior&lt;/strong> is more flexible &amp;mdash; it places a prior on the inclusion probability itself, allowing the data to determine how many variables should be included.&lt;/p>
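&lt;p>Under the binomial prior, the prior probability of any specific model therefore depends only on how many regressors it includes. A quick sketch:&lt;/p>
&lt;pre>&lt;code class="language-r"># Binomial model prior: each of K regressors enters independently
# with probability theta = EMS / K
K &amp;lt;- 9
EMS &amp;lt;- 4.5
theta &amp;lt;- EMS / K # 0.5, so every model is equally likely a priori
# Prior probability of a model that includes k regressors
prior_model &amp;lt;- function(k) theta^k * (1 - theta)^(K - k)
prior_model(3) # 1/512, the same for every k when theta = 0.5
&lt;/code>&lt;/pre>
&lt;p>With EMS = K/2 the binomial prior is uniform over all 512 models, which matches the flat prior line in the Section 9 chart.&lt;/p>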
&lt;p>Under the binomial-beta prior, all PIPs increase substantially. Population reaches 0.998, life expectancy reaches 0.974, and even the lowest-ranked variable (investment price) reaches 0.924. The posterior expected model size jumps to 8.556 &amp;mdash; the binomial-beta prior allows the data to express its preference for large models even more strongly than the binomial prior.&lt;/p>
&lt;p>Comparing PIPs across the two priors:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th style="text-align:center">PIP (Binomial)&lt;/th>
&lt;th style="text-align:center">PIP (Binomial-Beta)&lt;/th>
&lt;th style="text-align:center">Sign&lt;/th>
&lt;th style="text-align:center">Evidence strength&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>pop&lt;/td>
&lt;td style="text-align:center">0.990&lt;/td>
&lt;td style="text-align:center">0.998&lt;/td>
&lt;td style="text-align:center">+&lt;/td>
&lt;td style="text-align:center">Very strong&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>lnlex&lt;/td>
&lt;td style="text-align:center">0.864&lt;/td>
&lt;td style="text-align:center">0.974&lt;/td>
&lt;td style="text-align:center">+&lt;/td>
&lt;td style="text-align:center">Strong&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>ish&lt;/td>
&lt;td style="text-align:center">0.773&lt;/td>
&lt;td style="text-align:center">0.954&lt;/td>
&lt;td style="text-align:center">+&lt;/td>
&lt;td style="text-align:center">Positive → Strong&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>opem&lt;/td>
&lt;td style="text-align:center">0.766&lt;/td>
&lt;td style="text-align:center">0.952&lt;/td>
&lt;td style="text-align:center">+&lt;/td>
&lt;td style="text-align:center">Positive → Strong&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>gsh&lt;/td>
&lt;td style="text-align:center">0.751&lt;/td>
&lt;td style="text-align:center">0.948&lt;/td>
&lt;td style="text-align:center">&amp;ndash;/+&lt;/td>
&lt;td style="text-align:center">Positive → Strong&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>sed&lt;/td>
&lt;td style="text-align:center">0.717&lt;/td>
&lt;td style="text-align:center">0.938&lt;/td>
&lt;td style="text-align:center">+/&amp;ndash;&lt;/td>
&lt;td style="text-align:center">Weak → Strong&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>pgrw&lt;/td>
&lt;td style="text-align:center">0.714&lt;/td>
&lt;td style="text-align:center">0.938&lt;/td>
&lt;td style="text-align:center">+&lt;/td>
&lt;td style="text-align:center">Weak → Strong&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>polity&lt;/td>
&lt;td style="text-align:center">0.678&lt;/td>
&lt;td style="text-align:center">0.929&lt;/td>
&lt;td style="text-align:center">&amp;ndash;&lt;/td>
&lt;td style="text-align:center">Weak → Strong&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>ipr&lt;/td>
&lt;td style="text-align:center">0.656&lt;/td>
&lt;td style="text-align:center">0.924&lt;/td>
&lt;td style="text-align:center">&amp;ndash;&lt;/td>
&lt;td style="text-align:center">Weak → Strong&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The ranking is stable across priors &amp;mdash; &lt;code>pop&lt;/code> and &lt;code>lnlex&lt;/code> remain the top two, and &lt;code>ipr&lt;/code> and &lt;code>polity&lt;/code> remain the bottom two. However, the absolute PIP values depend heavily on the prior, with the binomial-beta prior being far more inclusive. This is expected: the binomial-beta prior concentrates mass on larger models when the data supports them.&lt;/p>
&lt;h3 id="122-varying-expected-model-size">12.2 Varying expected model size&lt;/h3>
&lt;p>The expected model size (EMS) controls how many regressors the prior expects to be relevant. The default EMS = K/2 = 4.5. Let us see what happens with a skeptical prior (EMS = 2, expecting only 2 of 9 regressors to matter) and a generous prior (EMS = 8):&lt;/p>
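&lt;p>Before turning to the results, it helps to see what these EMS values imply. Under the binomial prior the number of included regressors follows a Binomial(K, EMS/K) distribution, so the implied prior over model sizes can be sketched in base R:&lt;/p>
&lt;pre>&lt;code class="language-r"># Implied prior distribution over model sizes for three EMS choices
K &amp;lt;- 9
ems_values &amp;lt;- c(skeptical = 2, default = 4.5, generous = 8)
size_priors &amp;lt;- sapply(ems_values, function(ems)
  dbinom(0:K, size = K, prob = ems / K))
rownames(size_priors) &amp;lt;- paste0(&amp;quot;k=&amp;quot;, 0:K)
round(size_priors, 3)
# The skeptical prior piles mass on 1-3 regressors,
# the generous prior on 7-9
&lt;/code>&lt;/pre>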
&lt;p>With the skeptical EMS = 2 prior, only &lt;code>pop&lt;/code> (PIP = 0.964) and &lt;code>lnlex&lt;/code> (PIP = 0.637) remain above 0.5 under the binomial prior. Investment share drops to 0.483 and democracy falls to 0.372. This tells us that population and life expectancy are the most robust determinants &amp;mdash; they survive even when the prior is heavily biased toward sparse models.&lt;/p>
&lt;p>With EMS = 8, all PIPs exceed 0.94 &amp;mdash; nearly identical to the binomial-beta results, confirming that the data&amp;rsquo;s preference for large models is consistent across prior specifications.&lt;/p>
&lt;p>Full output tables for each prior specification are in Appendix C.&lt;/p>
&lt;h3 id="123-dilution-prior">12.3 Dilution prior&lt;/h3>
&lt;p>Imagine two variables that measure almost the same thing &amp;mdash; say, &amp;lsquo;years of schooling&amp;rsquo; and &amp;lsquo;literacy rate.&amp;rsquo; Including both in a model is redundant, and any model that includes both gets an inflated likelihood simply because it has two ways to capture the same variation.&lt;/p>
&lt;p>When regressors are correlated with each other, standard priors can overcount evidence by giving high probability to models that include near-duplicate variables. The &lt;strong>dilution prior&lt;/strong> (George, 2010) penalizes models whose regressors are highly correlated, adjusting the model prior by the determinant of the correlation matrix:&lt;/p>
&lt;p>$$\mathbb{P}_D(M_j) \propto \mathbb{P}(M_j) \cdot |COR_j|^{\omega}$$&lt;/p>
&lt;p>In words, this formula says that the diluted prior for model $j$ equals the standard prior multiplied by a penalty term. The penalty is the determinant of the correlation matrix among model $j$&amp;rsquo;s regressors, raised to the power $\omega$. When regressors are highly correlated, this determinant is close to zero, pushing the diluted prior toward zero. The parameter $\omega$ controls the strength of the penalty (default = 0.5).&lt;/p>
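&lt;p>The determinant penalty is easy to see numerically: with a near-duplicate pair of regressors the correlation matrix is close to singular, so its determinant, and hence the diluted prior weight, collapses toward zero. A minimal illustration with hypothetical correlations:&lt;/p>
&lt;pre>&lt;code class="language-r"># Correlation matrices for two hypothetical two-regressor models
cor_low &amp;lt;- matrix(c(1, 0.1, 0.1, 1), 2, 2) # weakly correlated pair
cor_high &amp;lt;- matrix(c(1, 0.95, 0.95, 1), 2, 2) # near-duplicate pair
omega &amp;lt;- 0.5
# Dilution weight |COR|^omega applied to each model's prior
det(cor_low)^omega # 0.995: essentially no penalty
det(cor_high)^omega # 0.312: prior weight cut by more than two thirds
&lt;/code>&lt;/pre>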
&lt;pre>&lt;code class="language-r"># Dilution prior with default omega = 0.5
bma_dil &amp;lt;- bma(full_model_space, df = data_prepared,
round = 3, dilution = 1)
print(bma_dil[[1]])
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> PIP PM PSD PSDR PMcon PSDcon PSDRcon %(+)
gdp_lag NA 0.919 0.077 0.107 0.919 0.077 0.107 100.000
ish 0.718 0.058 0.046 0.062 0.081 0.034 0.059 100.000
sed 0.640 0.026 0.055 0.070 0.041 0.064 0.084 69.922
pgrw 0.653 0.017 0.030 0.050 0.026 0.034 0.060 99.609
pop 0.989 0.125 0.065 0.082 0.126 0.064 0.081 100.000
ipr 0.638 -0.033 0.033 0.044 -0.052 0.027 0.045 0.000
opem 0.743 0.034 0.030 0.033 0.046 0.026 0.031 100.000
gsh 0.740 -0.013 0.040 0.090 -0.017 0.046 0.104 30.859
lnlex 0.808 0.081 0.075 0.098 0.100 0.071 0.099 100.000
polity 0.598 -0.049 0.047 0.053 -0.083 0.030 0.044 0.000
&lt;/code>&lt;/pre>
&lt;p>The dilution prior modestly reduces PIPs compared to the standard binomial prior &amp;mdash; for example, &lt;code>ish&lt;/code> drops from 0.773 to 0.718, and &lt;code>polity&lt;/code> drops from 0.678 to 0.598. The posterior expected model size decreases from 6.91 to 6.53. Importantly, the ranking remains unchanged: &lt;code>pop&lt;/code> and &lt;code>lnlex&lt;/code> stay at the top, and the sign stability is unaffected. The dilution prior provides a useful robustness check against multicollinearity inflation.&lt;/p>
&lt;pre>&lt;code class="language-r">sizes_dil &amp;lt;- model_sizes(bma_dil)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_bdsm_16_sizes_dilution.png" alt="Model sizes under the dilution prior.">&lt;/p>
&lt;p>Having examined the evidence from every angle &amp;mdash; PIPs, coefficients, and sensitivity &amp;mdash; let us now synthesize the findings.&lt;/p>
&lt;h2 id="13-summary-of-findings">13. Summary of Findings&lt;/h2>
&lt;h3 id="131-the-robust-determinants">13.1 The robust determinants&lt;/h3>
&lt;p>Combining evidence across all prior specifications, we can classify each regressor by its robustness:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th style="text-align:center">PIP (Bin.)&lt;/th>
&lt;th style="text-align:center">PIP (Bin-Beta)&lt;/th>
&lt;th style="text-align:center">PIP (EMS=2)&lt;/th>
&lt;th style="text-align:center">Sign&lt;/th>
&lt;th style="text-align:center">Verdict&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>pop&lt;/td>
&lt;td style="text-align:center">0.990&lt;/td>
&lt;td style="text-align:center">0.998&lt;/td>
&lt;td style="text-align:center">0.964&lt;/td>
&lt;td style="text-align:center">+&lt;/td>
&lt;td style="text-align:center">&lt;strong>Robust&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>lnlex&lt;/td>
&lt;td style="text-align:center">0.864&lt;/td>
&lt;td style="text-align:center">0.974&lt;/td>
&lt;td style="text-align:center">0.637&lt;/td>
&lt;td style="text-align:center">+&lt;/td>
&lt;td style="text-align:center">&lt;strong>Robust&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>ish&lt;/td>
&lt;td style="text-align:center">0.773&lt;/td>
&lt;td style="text-align:center">0.954&lt;/td>
&lt;td style="text-align:center">0.483&lt;/td>
&lt;td style="text-align:center">+&lt;/td>
&lt;td style="text-align:center">Positive&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>opem&lt;/td>
&lt;td style="text-align:center">0.766&lt;/td>
&lt;td style="text-align:center">0.952&lt;/td>
&lt;td style="text-align:center">0.468&lt;/td>
&lt;td style="text-align:center">+&lt;/td>
&lt;td style="text-align:center">Positive&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>gsh&lt;/td>
&lt;td style="text-align:center">0.751&lt;/td>
&lt;td style="text-align:center">0.948&lt;/td>
&lt;td style="text-align:center">0.459&lt;/td>
&lt;td style="text-align:center">&amp;ndash;&lt;/td>
&lt;td style="text-align:center">Positive&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>sed&lt;/td>
&lt;td style="text-align:center">0.717&lt;/td>
&lt;td style="text-align:center">0.938&lt;/td>
&lt;td style="text-align:center">0.420&lt;/td>
&lt;td style="text-align:center">+/&amp;ndash;&lt;/td>
&lt;td style="text-align:center">Sensitive&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>pgrw&lt;/td>
&lt;td style="text-align:center">0.714&lt;/td>
&lt;td style="text-align:center">0.938&lt;/td>
&lt;td style="text-align:center">0.414&lt;/td>
&lt;td style="text-align:center">+&lt;/td>
&lt;td style="text-align:center">Sensitive&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>polity&lt;/td>
&lt;td style="text-align:center">0.678&lt;/td>
&lt;td style="text-align:center">0.929&lt;/td>
&lt;td style="text-align:center">0.372&lt;/td>
&lt;td style="text-align:center">&amp;ndash;&lt;/td>
&lt;td style="text-align:center">Sensitive&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>ipr&lt;/td>
&lt;td style="text-align:center">0.656&lt;/td>
&lt;td style="text-align:center">0.924&lt;/td>
&lt;td style="text-align:center">0.344&lt;/td>
&lt;td style="text-align:center">&amp;ndash;&lt;/td>
&lt;td style="text-align:center">Sensitive&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;blockquote>
&lt;p>&lt;strong>Bottom line:&lt;/strong> If you are advising a government on growth policy, population dynamics and public health (life expectancy) are the two levers with the strongest evidence across all modeling assumptions. Investment and trade openness show promise under the default prior but become ambiguous under skeptical specifications. Education and democracy &amp;mdash; despite their intuitive appeal &amp;mdash; are fragile in this framework.&lt;/p>
&lt;/blockquote>
&lt;p>Only two variables &amp;mdash; &lt;strong>population&lt;/strong> and &lt;strong>life expectancy&lt;/strong> &amp;mdash; survive as robust determinants across all prior specifications, maintaining PIP above 0.5 even under the most skeptical prior (EMS = 2). Both have stable positive signs and their coefficients are precisely estimated. Investment share and trade openness show positive evidence under the default prior but become ambiguous under the skeptical prior.&lt;/p>
&lt;h3 id="132-connecting-to-cross-sectional-results">13.2 Connecting to cross-sectional results&lt;/h3>
&lt;p>In the &lt;a href="https://carlos-mendez.org/post/r_bma_lasso_wals/">companion cross-sectional tutorial&lt;/a>, we found that BMA, LASSO, and WALS converged on the same set of robust variables for CO&lt;sub>2&lt;/sub> emissions in synthetic data. The dynamic panel BMA analysis here reveals an important nuance: &lt;strong>controlling for reverse causality through the lagged dependent variable and fixed effects changes the landscape of robust determinants&lt;/strong>. The strong persistence of GDP (lagged coefficient = 0.92) absorbs much of the cross-sectional variation, leaving fewer variables with strong independent explanatory power. This is exactly the kind of insight that cross-sectional BMA misses.&lt;/p>
&lt;h2 id="14-conclusion">14. Conclusion&lt;/h2>
&lt;h3 id="141-key-takeaways">14.1 Key takeaways&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Method insight:&lt;/strong> Dynamic panel BMA handles endogeneity that cross-sectional BMA cannot. By including a lagged dependent variable ($\alpha$ = 0.92) and entity/time fixed effects, the Bayesian DSM package allows BMA to work with weakly exogenous regressors, avoiding the bias that plagues standard growth regressions.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Data insight:&lt;/strong> Of 9 candidate growth determinants, only population size (PIP = 0.990) and life expectancy (PIP = 0.864) are robust across all prior specifications. This confirms the &amp;ldquo;fragility&amp;rdquo; of growth determinants documented by Sala-i-Martin et al. (2004) &amp;mdash; most variables that appear important in one specification become ambiguous under different priors.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Sensitivity insight:&lt;/strong> Results are moderately sensitive to prior choice. Under the skeptical EMS = 2 prior, only &lt;code>pop&lt;/code> (PIP = 0.964) remains very strong, while even &lt;code>lnlex&lt;/code> drops to 0.637. The binomial-beta prior pushes all variables above PIP = 0.92, reflecting the data&amp;rsquo;s preference for large models (posterior EMS = 8.6).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Jointness insight:&lt;/strong> All regressor pairs are complements (HCGHM &amp;gt; 0), with the strongest complementarity between population and life expectancy (0.71). No substitution effects were detected, suggesting these growth determinants capture distinct dimensions of the development process. See Appendix A for the full jointness analysis.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h3 id="142-limitations-and-next-steps">14.2 Limitations and next steps&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Computation cost:&lt;/strong> The &lt;code>optim_model_space()&lt;/code> step estimates all $2^K$ models via numerical optimization. With 9 regressors (512 models), this is feasible. With 15+ regressors ($2^{15}$ = 32,768 models), computation time grows exponentially. For larger variable sets, Markov Chain Monte Carlo (MCMC) sampling over the model space may be necessary.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Weak exogeneity assumption:&lt;/strong> While weaker than strict exogeneity, the weak exogeneity assumption still requires that current regressors are uncorrelated with current shocks. If contemporaneous feedback is strong (e.g., a GDP shock immediately changes investment in the same period), the estimates may still be biased.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Extensions:&lt;/strong> The package offers additional features not covered here, including parallel computing for faster model space estimation (&lt;code>cl&lt;/code> parameter in &lt;code>optim_model_space()&lt;/code>), robust standard errors for heteroskedasticity, and the full suite of reduced-form parameters for understanding the dynamic feedback structure.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="143-exercises">14.3 Exercises&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Vary the dilution parameter.&lt;/strong> Run &lt;code>bma()&lt;/code> with &lt;code>dilution = 1&lt;/code> and &lt;code>dil.Par = 2&lt;/code> (stronger dilution). How do the PIPs change compared to &lt;code>dil.Par = 0.5&lt;/code>? Which variables are most affected by multicollinearity adjustment?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Examine the small model space.&lt;/strong> Use &lt;code>small_model_space&lt;/code> with only &lt;code>ish&lt;/code>, &lt;code>sed&lt;/code>, and &lt;code>pgrw&lt;/code>. Run the full BMA workflow (including &lt;code>model_pmp()&lt;/code>, &lt;code>model_sizes()&lt;/code>, &lt;code>best_models()&lt;/code>, and &lt;code>jointness()&lt;/code>). Do the PIP rankings change when the competition among regressors is limited to 3?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Compare standard and robust standard errors.&lt;/strong> Run &lt;code>best_models()&lt;/code> with &lt;code>robust = TRUE&lt;/code> and compare the coefficient significance to the default (regular SE). Are there variables that lose or gain significance under robust inference?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="appendix-a-jointness-analysis">Appendix A: Jointness Analysis&lt;/h2>
&lt;h3 id="what-is-jointness">What is jointness?&lt;/h3>
&lt;p>So far we have examined each regressor individually. But growth determinants do not work in isolation &amp;mdash; they interact. &lt;strong>Jointness&lt;/strong> measures whether two regressors tend to appear in models &lt;em>together&lt;/em> (complements) or &lt;em>separately&lt;/em> (substitutes).&lt;/p>
&lt;p>Think of peanut butter and jelly: each is fine alone, but they show up together so often that their inclusion is correlated. In growth regressions, investment and trade openness might be complements &amp;mdash; countries that invest heavily also trade more, and models that capture one effect benefit from including the other. Conversely, two measures of education (enrollment and literacy) might be substitutes &amp;mdash; including one makes the other redundant.&lt;/p>
&lt;h3 id="three-jointness-measures">Three jointness measures&lt;/h3>
&lt;p>The package implements three jointness measures. The &lt;a href="https://cran.r-project.org/web/packages/bdsm/vignettes/bdsm_vignette.Rnw" target="_blank" rel="noopener">&lt;code>jointness()&lt;/code>&lt;/a> function computes pairwise relationships between all regressors:&lt;/p>
&lt;p>&lt;strong>Hofmarcher et al. (HCGHM)&lt;/strong> ranges from &amp;ndash;1 (perfect substitutes) to +1 (perfect complements), with 0 indicating independence. This is the recommended default measure.&lt;/p>
&lt;p>&lt;strong>Ley-Steel (LS)&lt;/strong> ranges from 0 to infinity, where higher values indicate stronger complementarity.&lt;/p>
&lt;p>&lt;strong>Doppelhofer-Weeks (DW)&lt;/strong> classifies relationships as: below &amp;ndash;2 (strong substitutes), &amp;ndash;2 to &amp;ndash;1 (significant substitutes), &amp;ndash;1 to 1 (unrelated), 1 to 2 (significant complements), above 2 (strong complements).&lt;/p>
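&lt;p>The LS and DW variants are computed through the same interface; a minimal sketch, assuming &lt;code>bma_results&lt;/code> from the main workflow and that the &lt;code>measure&lt;/code> argument accepts the codes &lt;code>LS&lt;/code> and &lt;code>DW&lt;/code>:&lt;/p>
&lt;pre>&lt;code class="language-r"># Pairwise jointness under the two alternative measures
jointness(bma_results, measure = &amp;quot;LS&amp;quot;)  # 0 to infinity; higher = stronger complements
jointness(bma_results, measure = &amp;quot;DW&amp;quot;)  # thresholds at -2, -1, 1, 2
&lt;/code>&lt;/pre>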
&lt;h3 id="jointness-matrices">Jointness matrices&lt;/h3>
&lt;p>The HCGHM jointness matrix (above diagonal = binomial prior, below diagonal = binomial-beta prior):&lt;/p>
&lt;pre>&lt;code class="language-r">jointness(bma_results, measure = &amp;quot;HCGHM&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> ish sed pgrw pop ipr opem gsh lnlex polity
ish NA 0.216 0.207 0.530 0.150 0.262 0.243 0.366 0.181
sed 0.805 NA 0.154 0.421 0.115 0.199 0.189 0.288 0.125
pgrw 0.805 0.778 NA 0.416 0.124 0.198 0.186 0.283 0.131
pop 0.905 0.874 0.874 NA 0.304 0.517 0.489 0.711 0.346
ipr 0.781 0.756 0.758 0.845 NA 0.153 0.138 0.209 0.102
opem 0.829 0.801 0.802 0.902 0.780 NA 0.241 0.372 0.169
gsh 0.821 0.794 0.794 0.893 0.772 0.819 NA 0.340 0.154
lnlex 0.864 0.835 0.835 0.944 0.810 0.863 0.853 NA 0.227
polity 0.790 0.763 0.764 0.855 0.744 0.787 0.779 0.817 NA
&lt;/code>&lt;/pre>
&lt;p>All HCGHM values are positive, meaning every pair of regressors acts as complements rather than substitutes. The strongest complementarity under the binomial prior (above diagonal) is between &lt;code>pop&lt;/code> and &lt;code>lnlex&lt;/code> at 0.711 &amp;mdash; population size and life expectancy tend to appear in the best models together. The &lt;code>pop&lt;/code>-&lt;code>ish&lt;/code> pair (0.530) and &lt;code>pop&lt;/code>-&lt;code>opem&lt;/code> pair (0.517) are also moderately complementary. Investment price (&lt;code>ipr&lt;/code>) shows the weakest complementarity with other variables, consistent with its lowest PIP.&lt;/p>
&lt;p>Under the binomial-beta prior (below diagonal), all jointness values increase substantially &amp;mdash; reaching 0.944 for the &lt;code>pop&lt;/code>-&lt;code>lnlex&lt;/code> pair. This is because the binomial-beta prior favors larger models, making it more likely that any two variables appear together.&lt;/p>
&lt;p>The Doppelhofer-Weeks measure confirms these patterns: all pairwise DW values fall between &amp;ndash;1 and +1, with the strongest relationship again between population and life expectancy (DW = 0.153).&lt;/p>
&lt;h2 id="appendix-b-solow-convergence-derivation">Appendix B: Solow Convergence Derivation&lt;/h2>
&lt;p>The Solow model predicts that poorer countries should grow faster than richer ones, conditional on their structural characteristics. This is called &lt;strong>beta convergence&lt;/strong>. Mathematically, the model implies that around the steady state, log GDP per capita evolves according to (Barro and Sala-i-Martin, 2004):&lt;/p>
&lt;p>$$\ln y_{it} = (1 - e^{-\lambda \tau}) \ln y^*_i + e^{-\lambda \tau} \ln y_{i,t-1}$$&lt;/p>
&lt;p>In words, a country&amp;rsquo;s current GDP ($\ln y_{it}$) is a weighted average of two forces: its long-run steady-state level ($\ln y^*_i$), determined by fundamentals like savings and technology, and its GDP in the previous period ($\ln y_{i,t-1}$), which captures where the country currently stands. The parameter $\lambda$ is the &lt;strong>speed of convergence&lt;/strong> &amp;mdash; how fast countries close the gap to their steady state &amp;mdash; and $\tau$ is the time between observations (10 years in our data).&lt;/p>
&lt;p>Now define $\alpha = e^{-\lambda \tau}$. The convergence equation becomes:&lt;/p>
&lt;p>$$\ln y_{it} = \alpha \ln y_{i,t-1} + (1 - \alpha) \ln y^*_i$$&lt;/p>
&lt;p>This is already a dynamic equation &amp;mdash; current GDP depends on lagged GDP. The next step is to recognize that the steady state $\ln y^*_i$ is not observed directly. Instead, it depends on country characteristics such as investment rates, education, trade openness, and institutional quality. Writing these as $\beta' x_{it}$, and adding country fixed effects ($\eta_i$) for unobserved fundamentals, time effects ($\zeta_t$) for global shocks, and an error term ($v_{it}$), we arrive at the dynamic panel equation presented in Section 3.2.&lt;/p>
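&lt;p>The mapping $\alpha = e^{-\lambda \tau}$ can be inverted to recover the convergence speed from an estimated persistence coefficient. A back-of-the-envelope sketch in R, plugging in the posterior mean of &lt;code>gdp_lag&lt;/code> ($\approx 0.94$, see Appendix C) and $\tau = 10$:&lt;/p>
&lt;pre>&lt;code class="language-r"># Invert alpha = exp(-lambda * tau) to recover the speed of convergence
alpha &amp;lt;- 0.943                    # persistence of log GDP (posterior mean of gdp_lag)
tau &amp;lt;- 10                         # years between observations
lambda &amp;lt;- -log(alpha) / tau       # speed of convergence (per year)
half_life &amp;lt;- log(2) / lambda      # years needed to close half the gap to steady state
round(c(lambda = lambda, half_life = half_life), 3)
&lt;/code>&lt;/pre>
&lt;p>A persistence of 0.943 over a decade implies a convergence speed of roughly 0.6% per year &amp;mdash; a half-life of over a century, consistent with the slow conditional convergence typically found in growth regressions.&lt;/p>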
&lt;h2 id="appendix-c-full-sensitivity-output">Appendix C: Full Sensitivity Output&lt;/h2>
&lt;h3 id="binomial-beta-prior">Binomial-beta prior&lt;/h3>
&lt;pre>&lt;code class="language-r"># Binomial-beta results (already computed)
print(bma_results[[2]])
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> PIP PM PSD PSDR PMcon PSDcon PSDRcon %(+)
gdp_lag NA 0.943 0.078 0.130 0.943 0.078 0.130 100.000
ish 0.954 0.076 0.036 0.066 0.079 0.032 0.065 100.000
sed 0.938 0.035 0.063 0.094 0.037 0.064 0.097 69.922
pgrw 0.938 0.024 0.033 0.059 0.026 0.033 0.061 99.609
pop 0.998 0.080 0.062 0.083 0.080 0.062 0.083 100.000
ipr 0.924 -0.050 0.030 0.052 -0.054 0.027 0.052 0.000
opem 0.952 0.041 0.026 0.034 0.043 0.025 0.034 100.000
gsh 0.948 -0.034 0.049 0.120 -0.036 0.049 0.123 30.859
lnlex 0.974 0.134 0.069 0.105 0.138 0.066 0.104 100.000
polity 0.929 -0.084 0.038 0.053 -0.090 0.031 0.049 0.000
&lt;/code>&lt;/pre>
&lt;h3 id="skeptical-prior-ems--2">Skeptical prior (EMS = 2)&lt;/h3>
&lt;pre>&lt;code class="language-r"># Skeptical prior: EMS = 2
bma_ems2 &amp;lt;- bma(full_model_space, df = data_prepared, round = 3, EMS = 2)
print(bma_ems2[[1]])
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> PIP PM PSD PSDR PMcon PSDcon PSDRcon %(+)
gdp_lag NA 0.922 0.081 0.102 0.922 0.081 0.102 100.000
ish 0.483 0.042 0.050 0.059 0.088 0.034 0.057 100.000
sed 0.420 0.015 0.046 0.057 0.036 0.065 0.084 69.922
pgrw 0.414 0.009 0.025 0.040 0.023 0.034 0.061 99.609
pop 0.964 0.144 0.066 0.082 0.149 0.061 0.079 100.000
ipr 0.344 -0.019 0.031 0.037 -0.055 0.028 0.045 0.000
opem 0.468 0.024 0.032 0.033 0.052 0.026 0.030 100.000
gsh 0.459 -0.003 0.032 0.071 -0.007 0.047 0.105 30.859
lnlex 0.637 0.051 0.068 0.087 0.081 0.069 0.097 100.000
polity 0.372 -0.029 0.042 0.046 -0.079 0.031 0.043 0.000
&lt;/code>&lt;/pre>
&lt;hr>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://doi.org/10.1162/REST_a_00154" target="_blank" rel="noopener">Moral-Benito, E. (2012). Determinants of Economic Growth: A Bayesian Panel Data Approach. &lt;em>Review of Economics and Statistics&lt;/em>, 94(2), 566&amp;ndash;579.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1080/07350015.2013.818003" target="_blank" rel="noopener">Moral-Benito, E. (2013). Likelihood-Based Estimation of Dynamic Panels with Predetermined Regressors. &lt;em>Journal of Business and Economic Statistics&lt;/em>, 31(4), 451&amp;ndash;472.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1002/jae.2429" target="_blank" rel="noopener">Moral-Benito, E. (2016). Growth Empirics in Panel Data Under Model Uncertainty and Weak Exogeneity. &lt;em>Journal of Applied Econometrics&lt;/em>, 31(3), 582&amp;ndash;602.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://cran.r-project.org/web/packages/bdsm/index.html" target="_blank" rel="noopener">Wyszynski, M., Beck, K., and Dubel, M. (2025). Bayesian Dynamic Systems Modeling. R package version 0.3.0. CRAN.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1257/0002828042002570" target="_blank" rel="noopener">Sala-i-Martin, X., Doppelhofer, G., and Miller, R.I. (2004). Determinants of Long-Term Growth: A Bayesian Averaging of Classical Estimates (BACE) Approach. &lt;em>American Economic Review&lt;/em>, 94(4), 813&amp;ndash;835.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1002/jae.623" target="_blank" rel="noopener">Fernandez, C., Ley, E., and Steel, M.F.J. (2001). Model Uncertainty in Cross-Country Growth Regressions. &lt;em>Journal of Applied Econometrics&lt;/em>, 16(5), 563&amp;ndash;576.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1002/jae.1046" target="_blank" rel="noopener">Doppelhofer, G. and Weeks, M. (2009). Jointness of Growth Determinants. &lt;em>Journal of Applied Econometrics&lt;/em>, 24(2), 209&amp;ndash;244.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1002/jae.1057" target="_blank" rel="noopener">Ley, E. and Steel, M.F.J. (2009). On the Effect of Prior Assumptions in Bayesian Model Averaging with Applications to Growth Regression. &lt;em>Journal of Applied Econometrics&lt;/em>, 24(4), 651&amp;ndash;674.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.2307/271063" target="_blank" rel="noopener">Raftery, A.E. (1995). Bayesian Model Selection in Social Research. &lt;em>Sociological Methodology&lt;/em>, 25, 111&amp;ndash;163.&lt;/a>&lt;/li>
&lt;/ol></description></item><item><title>Spatial Dynamic Panel Data Modeling in R: Cigarette Demand Across US States</title><link>https://carlos-mendez.org/post/r_sdpdmod/</link><pubDate>Fri, 27 Mar 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/r_sdpdmod/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>When a state raises its cigarette tax, smokers near the border may simply drive to a neighboring state with lower prices. This cross-border shopping effect means that cigarette consumption in one state depends not only on its own prices and income but also on the prices and consumption patterns of its neighbors. Ignoring these &lt;strong>spatial spillovers&lt;/strong> leads to biased estimates of how prices and income affect cigarette demand &amp;mdash; a problem that standard panel data methods cannot address.&lt;/p>
&lt;p>The &lt;a href="https://cran.r-project.org/package=SDPDmod" target="_blank" rel="noopener">SDPDmod&lt;/a> R package (Simonovska, 2025) provides an integrated workflow for spatial panel data modeling. It offers three core capabilities: (1) &lt;strong>Bayesian model comparison&lt;/strong> across six spatial specifications using log-marginal posterior probabilities, (2) &lt;strong>maximum likelihood estimation&lt;/strong> of spatial autoregressive (SAR) and spatial Durbin (SDM) models with optional Lee-Yu bias correction for fixed effects, and (3) &lt;strong>impact decomposition&lt;/strong> into direct, indirect (spillover), and total effects &amp;mdash; including short-run and long-run effects for dynamic models. This tutorial applies all three capabilities to the classic Cigar dataset: cigarette consumption across 46 US states from 1963 to 1992.&lt;/p>
&lt;p>The tutorial follows a progressive approach. We start with the simplest spatial model (SAR) and build toward the most general specification (dynamic SDM with Lee-Yu correction). At each step, we interpret the results in terms of the cigarette market and compare them to simpler models. By the end, you will see how spatial spillovers and habit persistence jointly shape cigarette demand &amp;mdash; and why models that ignore either one can produce misleading policy conclusions.&lt;/p>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Load and row-normalize the &lt;code>usa46&lt;/code> binary contiguity matrix from SDPDmod&lt;/li>
&lt;li>Prepare the Cigar panel dataset with log-transformed real prices and income&lt;/li>
&lt;li>Use &lt;code>blmpSDPD()&lt;/code> for Bayesian model comparison across OLS, SAR, SDM, SEM, SDEM, and SLX specifications&lt;/li>
&lt;li>Estimate static SAR and SDM models using &lt;code>SDPDm()&lt;/code> with individual and two-way fixed effects&lt;/li>
&lt;li>Apply the Lee-Yu transformation to correct incidental parameter bias in spatial panels&lt;/li>
&lt;li>Estimate dynamic spatial models with temporal and spatiotemporal lags&lt;/li>
&lt;li>Decompose effects into direct, indirect, and total using &lt;code>impactsSDPDm()&lt;/code>, distinguishing short-run from long-run effects&lt;/li>
&lt;/ul>
&lt;h2 id="2-the-modeling-pipeline">2. The Modeling Pipeline&lt;/h2>
&lt;p>The tutorial follows a six-stage pipeline, moving from data preparation through increasingly rich spatial panel models:&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
A[&amp;quot;Data &amp;amp; W&amp;lt;br/&amp;gt;(Section 3-4)&amp;quot;] --&amp;gt; B[&amp;quot;Bayesian&amp;lt;br/&amp;gt;Comparison&amp;lt;br/&amp;gt;(Section 5)&amp;quot;]
B --&amp;gt; B2[&amp;quot;Non-Spatial&amp;lt;br/&amp;gt;Baseline&amp;lt;br/&amp;gt;(Section 6)&amp;quot;]
B2 --&amp;gt; C[&amp;quot;Static SAR&amp;lt;br/&amp;gt;(Section 7)&amp;quot;]
C --&amp;gt; D[&amp;quot;Static SDM&amp;lt;br/&amp;gt;(Section 8)&amp;quot;]
D --&amp;gt; E[&amp;quot;Dynamic SDM&amp;lt;br/&amp;gt;(Section 9)&amp;quot;]
E --&amp;gt; F[&amp;quot;Impact&amp;lt;br/&amp;gt;Decomposition&amp;lt;br/&amp;gt;(Section 10)&amp;quot;]
style A fill:#6a9bcc,stroke:#141413,color:#fff
style B fill:#d97757,stroke:#141413,color:#fff
style B2 fill:#141413,stroke:#141413,color:#fff
style C fill:#6a9bcc,stroke:#141413,color:#fff
style D fill:#6a9bcc,stroke:#141413,color:#fff
style E fill:#d97757,stroke:#141413,color:#fff
style F fill:#00d4c8,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>Each stage builds on the previous one. The Bayesian comparison tells us &lt;em>which&lt;/em> model family fits the data best. The static models establish baseline spatial effects. The dynamic models add habit persistence and separate short-run from long-run responses. The impact decomposition translates all of this into policy-relevant direct and spillover effects.&lt;/p>
&lt;h2 id="3-setup-and-data-preparation">3. Setup and Data Preparation&lt;/h2>
&lt;h3 id="31-install-and-load-packages">3.1 Install and load packages&lt;/h3>
&lt;p>The analysis requires five packages: &lt;code>SDPDmod&lt;/code> for spatial panel modeling, &lt;code>plm&lt;/code> for the Cigar dataset, &lt;code>ggplot2&lt;/code> and &lt;code>reshape2&lt;/code> for visualization, and &lt;code>dplyr&lt;/code> for data manipulation.&lt;/p>
&lt;pre>&lt;code class="language-r"># Install packages if needed
cran_packages &amp;lt;- c(&amp;quot;SDPDmod&amp;quot;, &amp;quot;plm&amp;quot;, &amp;quot;ggplot2&amp;quot;, &amp;quot;reshape2&amp;quot;, &amp;quot;dplyr&amp;quot;)
missing &amp;lt;- cran_packages[!sapply(cran_packages, requireNamespace, quietly = TRUE)]
if (length(missing) &amp;gt; 0) install.packages(missing)
library(SDPDmod)
library(plm)
library(ggplot2)
library(reshape2)
library(dplyr)
&lt;/code>&lt;/pre>
&lt;h3 id="32-load-and-prepare-the-cigar-dataset">3.2 Load and prepare the Cigar dataset&lt;/h3>
&lt;p>The &lt;a href="https://cran.r-project.org/web/packages/plm/vignettes/A_plmPackage.html" target="_blank" rel="noopener">Cigar dataset&lt;/a> (Baltagi, 1992) contains panel data on cigarette consumption in 46 US states from 1963 to 1992. The key variables are &lt;code>sales&lt;/code> (packs per capita), &lt;code>price&lt;/code> (average price per pack in cents), &lt;code>ndi&lt;/code> (per capita disposable income), &lt;code>pimin&lt;/code> (minimum price in adjoining states), and &lt;code>cpi&lt;/code> (consumer price index). We create log-transformed real values to work with &lt;strong>elasticities&lt;/strong> &amp;mdash; in a log-log model, each coefficient represents the percentage change in consumption for a one-percent change in the corresponding variable.&lt;/p>
&lt;pre>&lt;code class="language-r"># Load Cigar dataset
data(&amp;quot;Cigar&amp;quot;, package = &amp;quot;plm&amp;quot;)
data1 &amp;lt;- Cigar
# Create log-transformed variables
data1$logc &amp;lt;- log(data1$sales) # log cigarette packs per capita
data1$logp &amp;lt;- log(data1$price / data1$cpi) # log real price
data1$logy &amp;lt;- log(data1$ndi / data1$cpi) # log real per capita income
# Inspect panel structure
cat(&amp;quot;States:&amp;quot;, length(unique(data1$state)), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Years:&amp;quot;, length(unique(data1$year)), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Observations:&amp;quot;, nrow(data1), &amp;quot;\n&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">States: 46
Years: 30
Observations: 1380
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-r">head(data1[, c(&amp;quot;state&amp;quot;, &amp;quot;year&amp;quot;, &amp;quot;sales&amp;quot;, &amp;quot;price&amp;quot;, &amp;quot;ndi&amp;quot;, &amp;quot;logc&amp;quot;, &amp;quot;logp&amp;quot;, &amp;quot;logy&amp;quot;)])
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> state year sales price ndi logc logp logy
1 1 63 93.9 28.6 1558.305 4.542230 -0.06759329 3.930354
2 1 64 95.4 29.8 1684.073 4.558079 -0.03947881 3.994983
3 1 65 98.5 29.8 1809.842 4.590057 -0.05547915 4.051007
4 1 66 96.4 31.5 1915.160 4.568506 -0.02817088 4.079398
5 1 67 95.5 31.6 2023.546 4.559126 -0.05539878 4.104051
6 1 68 88.4 35.6 2202.486 4.481872 0.02272825 4.147724
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-r">summary(data1[, c(&amp;quot;logc&amp;quot;, &amp;quot;logp&amp;quot;, &amp;quot;logy&amp;quot;)])
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> logc logp logy
Min. :3.978 Min. :-0.60981 Min. :3.766
1st Qu.:4.681 1st Qu.:-0.20492 1st Qu.:4.423
Median :4.797 Median :-0.10079 Median :4.557
Mean :4.793 Mean :-0.10642 Mean :4.545
3rd Qu.:4.892 3rd Qu.:-0.01225 3rd Qu.:4.686
Max. :5.697 Max. : 0.36399 Max. :5.117
&lt;/code>&lt;/pre>
&lt;p>The panel is balanced with 46 states observed over 30 years (1,380 total observations). Log cigarette consumption (&lt;code>logc&lt;/code>) has a mean of 4.793, corresponding to about 121 packs per capita per year. Real prices (&lt;code>logp&lt;/code>) average -0.106 in log terms, and real per capita income (&lt;code>logy&lt;/code>) averages 4.545. The variation across states and over time in both prices and income is what allows us to identify price and income elasticities &amp;mdash; and the spatial structure across neighboring states is what motivates the spatial models.&lt;/p>
&lt;p>The dataset also includes &lt;code>pimin&lt;/code>, the minimum cigarette price in adjoining states. This variable is inherently spatial &amp;mdash; it measures price competition from neighbors. We do not include &lt;code>pimin&lt;/code> directly in our models because the SDM&amp;rsquo;s spatially lagged price term &lt;code>W*logp&lt;/code> captures the same channel more flexibly. To see why, note that &lt;code>log(pimin/cpi)&lt;/code> and the spatial lag of &lt;code>logp&lt;/code> have a correlation of 0.92 &amp;mdash; they measure essentially the same thing, but the spatial lag uses the full contiguity structure rather than just the cheapest neighbor.&lt;/p>
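&lt;p>That correlation can be checked directly. A minimal sketch, assuming &lt;code>data1&lt;/code> is sorted by state and then year (as the Cigar data ships) and that the row order of &lt;code>W&lt;/code> matches the state coding:&lt;/p>
&lt;pre>&lt;code class="language-r"># Reshape logp into a 30-year x 46-state matrix (data sorted by state, then year)
P &amp;lt;- matrix(data1$logp, nrow = 30)
lagP &amp;lt;- P %*% t(W)                          # spatial lag of log real price, year by year
Pmin &amp;lt;- matrix(log(data1$pimin / data1$cpi), nrow = 30)
cor(as.vector(lagP), as.vector(Pmin))       # close to 0.92, per the discussion above
&lt;/code>&lt;/pre>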
&lt;h3 id="33-exploratory-visualization">3.3 Exploratory visualization&lt;/h3>
&lt;p>Before building models, we plot the raw data. The spaghetti plot below shows cigarette sales per capita for all 46 states over time, with five states highlighted for comparison.&lt;/p>
&lt;pre>&lt;code class="language-r"># Highlight selected states
highlight_states &amp;lt;- c(&amp;quot;CA&amp;quot;, &amp;quot;NY&amp;quot;, &amp;quot;NC&amp;quot;, &amp;quot;KY&amp;quot;, &amp;quot;UT&amp;quot;)
ggplot(data1, aes(x = year + 1900, y = sales, group = state_abbr)) +
geom_line(data = subset(data1, !(state_abbr %in% highlight_states)),
color = &amp;quot;gray80&amp;quot;, linewidth = 0.3) +
geom_line(data = subset(data1, state_abbr %in% highlight_states),
aes(color = state_abbr), linewidth = 1) +
labs(title = &amp;quot;Cigarette Sales per Capita Across 46 US States (1963-1992)&amp;quot;,
x = &amp;quot;Year&amp;quot;, y = &amp;quot;Packs per Capita&amp;quot;, color = &amp;quot;State&amp;quot;) +
theme_minimal()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_SDPDmod_fig4_eda_spaghetti.png" alt="Cigarette sales per capita across 46 US states from 1963 to 1992, with five states highlighted">&lt;/p>
&lt;p>Two patterns jump out. First, &lt;strong>temporal persistence is striking&lt;/strong>: states that consumed heavily in the 1960s (like Kentucky, a major tobacco-producing state with over 150 packs per capita) remained high consumers throughout the period, while low-consumption states like Utah stayed low. This visual persistence foreshadows the dominant role of the lagged dependent variable ($\tau \approx 0.86$) in the dynamic models. Second, there is a &lt;strong>general downward trend&lt;/strong> after the late 1970s, visible across nearly all states, reflecting the cumulative effect of anti-smoking campaigns, health awareness, and rising taxes. Time fixed effects in our panel models will absorb this common trend, isolating the within-state, within-year variation that identifies price and income elasticities.&lt;/p>
&lt;h3 id="34-load-and-row-normalize-the-spatial-weight-matrix">3.4 Load and row-normalize the spatial weight matrix&lt;/h3>
&lt;p>A spatial weight matrix $W$ encodes which states are neighbors. The &lt;code>usa46&lt;/code> matrix included in SDPDmod is a binary contiguity matrix: $w_{ij} = 1$ if states $i$ and $j$ share a border, and $w_{ij} = 0$ otherwise. Row-normalization converts these binary entries into weights that sum to one for each row, so the spatial lag $Wy$ equals the &lt;em>weighted average&lt;/em> of neighboring states' values.&lt;/p>
&lt;pre>&lt;code class="language-r"># Load binary contiguity matrix of 46 US states
data(&amp;quot;usa46&amp;quot;, package = &amp;quot;SDPDmod&amp;quot;)
cat(&amp;quot;Dimensions:&amp;quot;, dim(usa46), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Non-zero entries:&amp;quot;, sum(usa46 != 0), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Average neighbors per state:&amp;quot;, round(mean(rowSums(usa46)), 2), &amp;quot;\n&amp;quot;)
# Row-normalize
W &amp;lt;- rownor(usa46)
cat(&amp;quot;Row-normalized:&amp;quot;, isrownor(W), &amp;quot;\n&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Dimensions: 46 46
Non-zero entries: 188
Average neighbors per state: 4.09
Row-normalized: TRUE
&lt;/code>&lt;/pre>
&lt;p>The matrix has 188 non-zero entries out of 2,116 cells (8.9% density), meaning the average state shares a border with about 4 neighbors. After row-normalization, the spatial lag of any variable equals the simple average of that variable across a state&amp;rsquo;s contiguous neighbors. For example, the spatial lag of cigarette consumption for a state with 4 neighbors equals the average consumption in those 4 neighboring states.&lt;/p>
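&lt;p>The mechanics are easy to see on a toy example. A minimal base-R sketch replicating what &lt;code>rownor()&lt;/code> does:&lt;/p>
&lt;pre>&lt;code class="language-r"># Toy 3-unit contiguity matrix: unit 1 borders units 2 and 3
A &amp;lt;- matrix(c(0, 1, 1,
              1, 0, 0,
              1, 0, 0), nrow = 3, byrow = TRUE)
W_toy &amp;lt;- A / rowSums(A)        # row-normalize: each row now sums to 1
y &amp;lt;- c(10, 20, 30)
as.numeric(W_toy %*% y)        # spatial lag: 25 (mean of 20 and 30), then 10, 10
&lt;/code>&lt;/pre>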
&lt;h2 id="4-visualizing-the-spatial-weight-matrix">4. Visualizing the Spatial Weight Matrix&lt;/h2>
&lt;p>Before estimating spatial models, it helps to visualize the neighborhood structure. The heatmap below shows the binary contiguity matrix, with each colored cell indicating a pair of neighboring states.&lt;/p>
&lt;pre>&lt;code class="language-r"># Use state abbreviations for the axes
rownames(usa46) &amp;lt;- state_abbr
colnames(usa46) &amp;lt;- state_abbr
usa46_df &amp;lt;- melt(usa46)
colnames(usa46_df) &amp;lt;- c(&amp;quot;State_i&amp;quot;, &amp;quot;State_j&amp;quot;, &amp;quot;Connection&amp;quot;)
usa46_df$Connection &amp;lt;- factor(usa46_df$Connection, levels = c(0, 1),
labels = c(&amp;quot;Not neighbors&amp;quot;, &amp;quot;Neighbors&amp;quot;))
ggplot(usa46_df, aes(x = State_j, y = State_i, fill = Connection)) +
geom_tile(color = &amp;quot;white&amp;quot;, linewidth = 0.1) +
scale_fill_manual(values = c(&amp;quot;Not neighbors&amp;quot; = &amp;quot;gray95&amp;quot;,
&amp;quot;Neighbors&amp;quot; = &amp;quot;#6a9bcc&amp;quot;)) +
labs(title = &amp;quot;Binary Contiguity Matrix of 46 US States&amp;quot;,
x = &amp;quot;State j&amp;quot;, y = &amp;quot;State i&amp;quot;) +
theme_minimal()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_SDPDmod_fig1_weight_matrix.png" alt="Binary contiguity matrix heatmap showing neighborhood structure of 46 US states">&lt;/p>
&lt;p>The sparse pattern confirms that most state pairs are &lt;em>not&lt;/em> neighbors &amp;mdash; only 8.9% of cells are colored. With state abbreviations on the axes, you can verify specific neighborhood relationships: California (CA) neighbors Arizona (AZ), Nevada (NV), and Oregon (OR); Missouri (MO) has the most neighbors at 8. The sparsity is typical of contiguity-based weight matrices and means that spatial effects operate through a relatively small number of direct neighbor relationships. The row-normalized version ensures that each state&amp;rsquo;s spatial lag is an equally weighted average of its neighbors, regardless of whether a state has 2 neighbors or 8.&lt;/p>
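&lt;p>The neighbor counts behind these observations come straight from the row sums of the binary matrix. A quick check, again assuming &lt;code>state_abbr&lt;/code> holds the 46 state codes in matrix order:&lt;/p>
&lt;pre>&lt;code class="language-r">neigh &amp;lt;- rowSums(usa46)
state_abbr[which.max(neigh)]   # most-connected state (Missouri, per the text)
summary(neigh)                 # distribution of neighbor counts
&lt;/code>&lt;/pre>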
&lt;h3 id="42-alternative-weight-matrices">4.2 Alternative weight matrices&lt;/h3>
&lt;p>The SDPDmod package provides several functions for constructing weight matrices from scratch: &lt;code>mOrdNbr()&lt;/code> for higher-order contiguity from shapefiles, &lt;code>mNearestN()&lt;/code> for k-nearest neighbors, &lt;code>InvDistMat()&lt;/code> for inverse distance, and &lt;code>DistWMat()&lt;/code> as a unified wrapper. Since our results may depend on the choice of $W$, we construct a &lt;strong>2nd-order contiguity matrix&lt;/strong> as a robustness check. This matrix treats two states as neighbors if they can be reached from each other in two steps &amp;mdash; that is, if they share at least one common neighbor (friends-of-friends).&lt;/p>
&lt;pre>&lt;code class="language-r"># 2nd-order contiguity: states reachable in 2 steps
W2_raw &amp;lt;- (usa46 %*% usa46) &amp;gt; 0 # TRUE if states i and j share at least one common neighbor
W2_combined &amp;lt;- W2_raw * 1
diag(W2_combined) &amp;lt;- 0 # remove self-connections
W2 &amp;lt;- rownor(W2_combined)
cat(&amp;quot;Original W non-zero entries:&amp;quot;, sum(usa46 != 0), &amp;quot;\n&amp;quot;)
cat(&amp;quot;2nd-order W non-zero entries:&amp;quot;, sum(W2_combined != 0), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Avg neighbors (original):&amp;quot;, round(mean(rowSums(usa46)), 2), &amp;quot;\n&amp;quot;)
cat(&amp;quot;Avg neighbors (2nd-order):&amp;quot;, round(mean(rowSums(W2_combined)), 2), &amp;quot;\n&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Original W non-zero entries: 188
2nd-order W non-zero entries: 486
Avg neighbors (original): 4.09
Avg neighbors (2nd-order): 10.57
&lt;/code>&lt;/pre>
&lt;p>The 2nd-order matrix is much denser: 486 non-zero entries versus 188, with an average of 10.6 neighbors per state instead of 4.1. This broader definition of &amp;ldquo;neighbor&amp;rdquo; captures indirect spatial relationships &amp;mdash; for example, New York and Ohio do not share a border, but they share Pennsylvania as a common neighbor. We will use this alternative $W$ for a robustness check in Section 11.&lt;/p>
&lt;h2 id="5-bayesian-model-comparison-with-blmpsdpd">5. Bayesian Model Comparison with &lt;code>blmpSDPD()&lt;/code>&lt;/h2>
&lt;h3 id="51-the-spatial-model-family">5.1 The spatial model family&lt;/h3>
&lt;p>Before estimating any single model, we use Bayesian model comparison to let the data tell us which spatial specification fits best. The SDPDmod package supports six models that differ in &lt;em>where&lt;/em> spatial dependence enters the equation. The general spatial panel model takes the form:&lt;/p>
&lt;p>$$y_t = \rho W y_t + X_t \beta + W X_t \theta + u_t, \quad u_t = \lambda W u_t + \epsilon_t$$&lt;/p>
&lt;p>In words, the outcome $y_t$ can depend on neighbors' outcomes (through $\rho$), on spatially lagged covariates (through $\theta$), and spatial correlation can appear in the error term (through $\lambda$). Different restrictions on these parameters yield different models:&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
GNS[&amp;quot;General Nesting&amp;lt;br/&amp;gt;ρ, θ, λ&amp;quot;] --&amp;gt;|&amp;quot;λ = 0&amp;quot;| SDM[&amp;quot;SDM&amp;lt;br/&amp;gt;ρ, θ&amp;quot;]
GNS --&amp;gt;|&amp;quot;θ = 0&amp;quot;| SAC[&amp;quot;SAC&amp;lt;br/&amp;gt;ρ, λ&amp;quot;]
GNS --&amp;gt;|&amp;quot;ρ = 0&amp;quot;| SDEM[&amp;quot;SDEM&amp;lt;br/&amp;gt;θ, λ&amp;quot;]
SDM --&amp;gt;|&amp;quot;θ = 0&amp;quot;| SAR[&amp;quot;SAR&amp;lt;br/&amp;gt;ρ&amp;quot;]
SDM --&amp;gt;|&amp;quot;ρ = 0&amp;quot;| SLX[&amp;quot;SLX&amp;lt;br/&amp;gt;θ&amp;quot;]
SAC --&amp;gt;|&amp;quot;λ = 0&amp;quot;| SAR
SDEM --&amp;gt;|&amp;quot;ρ = 0&amp;quot;| SEM[&amp;quot;SEM&amp;lt;br/&amp;gt;λ&amp;quot;]
SDEM --&amp;gt;|&amp;quot;λ = 0&amp;quot;| SLX
SAR --&amp;gt;|&amp;quot;ρ = 0&amp;quot;| OLS[&amp;quot;OLS&amp;lt;br/&amp;gt;No spatial&amp;quot;]
SEM --&amp;gt;|&amp;quot;λ = 0&amp;quot;| OLS
SLX --&amp;gt;|&amp;quot;θ = 0&amp;quot;| OLS
style SDM fill:#d97757,stroke:#141413,color:#fff
style SAR fill:#6a9bcc,stroke:#141413,color:#fff
style SEM fill:#6a9bcc,stroke:#141413,color:#fff
style SDEM fill:#6a9bcc,stroke:#141413,color:#fff
style SLX fill:#6a9bcc,stroke:#141413,color:#fff
style OLS fill:#141413,stroke:#141413,color:#fff
style GNS fill:#00d4c8,stroke:#141413,color:#fff
style SAC fill:#00d4c8,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Model&lt;/th>
&lt;th>Equation&lt;/th>
&lt;th>Key Parameters&lt;/th>
&lt;th>Interpretation&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>OLS&lt;/td>
&lt;td>$y_t = X_t \beta + \epsilon_t$&lt;/td>
&lt;td>None spatial&lt;/td>
&lt;td>No spatial dependence&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>SAR&lt;/td>
&lt;td>$y_t = \rho W y_t + X_t \beta + \epsilon_t$&lt;/td>
&lt;td>$\rho$&lt;/td>
&lt;td>Neighbors' outcomes affect own outcome&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>SEM&lt;/td>
&lt;td>$y_t = X_t \beta + u_t$, $u_t = \lambda W u_t + \epsilon_t$&lt;/td>
&lt;td>$\lambda$&lt;/td>
&lt;td>Spatial correlation in unobservables&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>SLX&lt;/td>
&lt;td>$y_t = X_t \beta + W X_t \theta + \epsilon_t$&lt;/td>
&lt;td>$\theta$&lt;/td>
&lt;td>Neighbors' covariates affect own outcome&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>SDM&lt;/td>
&lt;td>$y_t = \rho W y_t + X_t \beta + W X_t \theta + \epsilon_t$&lt;/td>
&lt;td>$\rho, \theta$&lt;/td>
&lt;td>Both neighbors' outcomes and covariates matter&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>SDEM&lt;/td>
&lt;td>$y_t = X_t \beta + W X_t \theta + u_t$, $u_t = \lambda W u_t + \epsilon_t$&lt;/td>
&lt;td>$\theta, \lambda$&lt;/td>
&lt;td>Spatially lagged X plus spatial errors&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The &lt;a href="https://rdrr.io/cran/SDPDmod/man/blmpSDPD.html" target="_blank" rel="noopener">&lt;code>blmpSDPD()&lt;/code>&lt;/a> function computes Bayesian log-marginal posterior probabilities for each model. Unlike classical hypothesis tests that compare models pairwise, this approach assigns a probability to every candidate model simultaneously, making it straightforward to assess which specification the data favors.&lt;/p>
&lt;h3 id="52-static-comparison-with-individual-fixed-effects">5.2 Static comparison with individual fixed effects&lt;/h3>
&lt;p>We begin by comparing all six models under a static specification with individual (state) fixed effects only. This controls for time-invariant differences across states &amp;mdash; such as tobacco culture or geographic remoteness &amp;mdash; but does not control for common time trends like federal tax changes.&lt;/p>
&lt;pre>&lt;code class="language-r">res_ind &amp;lt;- blmpSDPD(formula = logc ~ logp + logy, data = data1, W = W,
index = c(&amp;quot;state&amp;quot;, &amp;quot;year&amp;quot;),
model = list(&amp;quot;ols&amp;quot;, &amp;quot;sar&amp;quot;, &amp;quot;sdm&amp;quot;, &amp;quot;sem&amp;quot;, &amp;quot;sdem&amp;quot;, &amp;quot;slx&amp;quot;),
effect = &amp;quot;individual&amp;quot;)
res_ind
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Log-marginal posteriors:
ols sar sdm sem sdem slx
1 884.7551 938.6934 1046.487 993.192 1039.671 930.0585
Model probabilities:
ols sar sdm sem sdem slx
1 0 0 0.9989 0 0.0011 0
&lt;/code>&lt;/pre>
&lt;p>With individual fixed effects, the SDM receives a posterior probability of 99.89%, dominating all other specifications. The SDEM gets only 0.11%, and the remaining models receive essentially zero probability. This overwhelming support for the SDM indicates that both the spatial lag of the dependent variable ($\rho W y$) and the spatial lags of covariates ($W X \theta$) are important for explaining cigarette consumption &amp;mdash; neighbors' prices and income matter above and beyond neighbors' consumption levels.&lt;/p>
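&lt;p>The probabilities in the second row are simply a softmax of the log-marginal posteriors in the first row. As a quick sanity check (hand arithmetic on the printed values above, not part of the &lt;code>blmpSDPD()&lt;/code> output), we can reproduce them directly:&lt;/p>
&lt;pre>&lt;code class="language-r">lmp &amp;lt;- c(ols = 884.7551, sar = 938.6934, sdm = 1046.487,
         sem = 993.192, sdem = 1039.671, slx = 930.0585)
# subtract the maximum before exponentiating to avoid numerical overflow
probs &amp;lt;- exp(lmp - max(lmp)) / sum(exp(lmp - max(lmp)))
round(probs, 4)
#    ols    sar    sdm    sem   sdem    slx
# 0.0000 0.0000 0.9989 0.0000 0.0011 0.0000
&lt;/code>&lt;/pre>
&lt;p>Because the log-marginal posteriors differ by several units, the exponentiation concentrates essentially all probability mass on the best model.&lt;/p>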
&lt;h3 id="53-static-comparison-with-two-way-fixed-effects">5.3 Static comparison with two-way fixed effects&lt;/h3>
&lt;p>Adding time fixed effects controls for common shocks that affect all states simultaneously, such as national anti-smoking campaigns or federal excise tax changes. This typically absorbs much of the cross-sectional variation, so we might expect the model rankings to shift.&lt;/p>
&lt;pre>&lt;code class="language-r">res_tw &amp;lt;- blmpSDPD(formula = logc ~ logp + logy, data = data1, W = W,
index = c(&amp;quot;state&amp;quot;, &amp;quot;year&amp;quot;),
model = list(&amp;quot;ols&amp;quot;, &amp;quot;sar&amp;quot;, &amp;quot;sdm&amp;quot;, &amp;quot;sem&amp;quot;, &amp;quot;sdem&amp;quot;, &amp;quot;slx&amp;quot;),
effect = &amp;quot;twoways&amp;quot;,
prior = &amp;quot;beta&amp;quot;) # beta prior concentrates probability near moderate rho values
res_tw
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Log-marginal posteriors:
ols sar sdm sem sdem slx
1 1076.602 1095.993 1100.727 1099.415 1100.621 1080.323
Model probabilities:
ols sar sdm sem sdem slx
1 0 0.004 0.4592 0.1237 0.4131 0
&lt;/code>&lt;/pre>
&lt;p>With two-way fixed effects and a beta prior, the race tightens considerably. The SDM still leads with 45.92% probability, but the SDEM is close behind at 41.31%. The SEM receives 12.37%, while the SAR drops to just 0.4%. This tells us that spatial effects in the covariates ($\theta$) remain important, but there is genuine uncertainty about whether the spatial lag of the dependent variable ($\rho$) or the spatial error term ($\lambda$) best captures the remaining spatial dependence.&lt;/p>
&lt;h3 id="54-dynamic-comparison-with-two-way-fixed-effects">5.4 Dynamic comparison with two-way fixed effects&lt;/h3>
&lt;p>Cigarette consumption is highly persistent over time &amp;mdash; smokers who consumed heavily last year tend to do so again this year. Dynamic models add the lagged dependent variable $y_{t-1}$ and potentially its spatial lag $W y_{t-1}$ to capture this habit persistence.&lt;/p>
&lt;pre>&lt;code class="language-r">res_dyn &amp;lt;- blmpSDPD(formula = logc ~ logp + logy, data = data1, W = W,
index = c(&amp;quot;state&amp;quot;, &amp;quot;year&amp;quot;),
model = list(&amp;quot;sar&amp;quot;, &amp;quot;sdm&amp;quot;, &amp;quot;sem&amp;quot;, &amp;quot;sdem&amp;quot;, &amp;quot;slx&amp;quot;),
effect = &amp;quot;twoways&amp;quot;,
ldet = &amp;quot;mc&amp;quot;, # Monte Carlo approximation for the log-determinant (faster for dynamic models)
dynamic = TRUE,
prior = &amp;quot;uniform&amp;quot;) # uniform prior assigns equal weight to all valid rho values
res_dyn
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Log-marginal posteriors:
sar sdm sem sdem slx
1 1987.651 1986.906 1987.799 1986.924 1987.388
Model probabilities:
sar sdm sem sdem slx
1 0.2573 0.1221 0.2984 0.1243 0.1979
&lt;/code>&lt;/pre>
&lt;p>The dynamic comparison produces a dramatically different picture: all five models receive similar probabilities, with the SEM slightly ahead at 29.84%, followed by SAR at 25.73% and SLX at 19.79%. The log-marginal posteriors are nearly identical (within 1 unit), reflecting the fact that once temporal dynamics are included, the remaining spatial signal is much weaker. The lagged dependent variable absorbs much of the persistence that spatial models previously captured.&lt;/p>
&lt;h3 id="55-summary-of-model-comparison">5.5 Summary of model comparison&lt;/h3>
&lt;p>The figure below summarizes the posterior probabilities across all three specification comparisons (see &lt;code>analysis.R&lt;/code> for the full figure code).&lt;/p>
&lt;p>&lt;img src="r_SDPDmod_fig2_model_comparison.png" alt="Bayesian model probabilities across three specifications: static individual FE, static two-way FE, and dynamic two-way FE">&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Specification&lt;/th>
&lt;th>Top Model&lt;/th>
&lt;th>Probability&lt;/th>
&lt;th>Runner-up&lt;/th>
&lt;th>Probability&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Static, Individual FE&lt;/td>
&lt;td>SDM&lt;/td>
&lt;td>99.89%&lt;/td>
&lt;td>SDEM&lt;/td>
&lt;td>0.11%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Static, Two-way FE&lt;/td>
&lt;td>SDM&lt;/td>
&lt;td>45.92%&lt;/td>
&lt;td>SDEM&lt;/td>
&lt;td>41.31%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Dynamic, Two-way FE&lt;/td>
&lt;td>SEM&lt;/td>
&lt;td>29.84%&lt;/td>
&lt;td>SAR&lt;/td>
&lt;td>25.73%&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The Bayesian comparison reveals three key insights. First, spatial dependence is unambiguously present &amp;mdash; OLS and SLX never win. Second, the SDM is the preferred static model, which means both the spatial lag of $y$ and the spatial lags of $X$ contribute to explaining cigarette consumption. Third, adding dynamics substantially weakens the ability to discriminate among spatial specifications, because the lagged dependent variable captures much of the temporal persistence that spatial lags previously absorbed. Given that the SDM leads in two of three comparisons and nests the SAR as a special case, we will estimate both the SAR and SDM in the sections that follow, with and without dynamics.&lt;/p>
&lt;h2 id="6-non-spatial-baseline">6. Non-Spatial Baseline&lt;/h2>
&lt;p>Before introducing spatial models, we establish a benchmark using a standard &lt;strong>two-way fixed effects&lt;/strong> panel regression with no spatial terms. This is the model that most applied researchers would start with &amp;mdash; it controls for state-specific and year-specific unobserved heterogeneity but assumes that each state&amp;rsquo;s consumption depends only on its own prices and income, with no spillovers from neighbors.&lt;/p>
&lt;pre>&lt;code class="language-r">pdata &amp;lt;- pdata.frame(data1, index = c(&amp;quot;state&amp;quot;, &amp;quot;year&amp;quot;))
mod_fe &amp;lt;- plm(logc ~ logp + logy, data = pdata, model = &amp;quot;within&amp;quot;,
effect = &amp;quot;twoways&amp;quot;)
summary(mod_fe)$coefficients
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -1.0348844 0.04151906 -24.92553 1.881060e-112
logy 0.5285428 0.04658276 11.34632 1.603837e-28
&lt;/code>&lt;/pre>
&lt;p>The non-spatial two-way FE model estimates a price elasticity of -1.035 and an income elasticity of 0.529, both highly significant. The within R-squared is 0.394, meaning that price and income explain about 39% of the within-state, within-year variation in cigarette consumption after removing fixed effects. These estimates serve as the benchmark against which we measure the value added by spatial models. As we will see, the SAR and SDM models produce similar &lt;em>direct&lt;/em> price effects (around -1.00) but reveal substantial &lt;em>indirect&lt;/em> (spillover) effects that the non-spatial model entirely misses &amp;mdash; the total price elasticity in the SDM is -1.23, about 19% larger than the non-spatial estimate.&lt;/p>
&lt;h2 id="7-static-sar-model-estimation">7. Static SAR Model Estimation&lt;/h2>
&lt;h3 id="71-sar-with-individual-fixed-effects">7.1 SAR with individual fixed effects&lt;/h3>
&lt;p>The Spatial Autoregressive (SAR) model adds a single spatial parameter $\rho$ that captures how much a state&amp;rsquo;s cigarette consumption depends on the weighted average of its neighbors' consumption. The model is:&lt;/p>
&lt;p>$$y_t = \rho W y_t + X_t \beta + \mu_i + \epsilon_t$$&lt;/p>
&lt;p>In words, cigarette consumption in state $i$ depends on (1) the average consumption of neighboring states (weighted by $W$, with strength $\rho$), (2) the state&amp;rsquo;s own price and income ($X_t \beta$), and (3) a state-specific intercept ($\mu_i$). The &lt;a href="https://rdrr.io/cran/SDPDmod/man/SDPDm.html" target="_blank" rel="noopener">&lt;code>SDPDm()&lt;/code>&lt;/a> function estimates this model by maximum likelihood. The &lt;code>index&lt;/code> argument specifies the panel identifiers, &lt;code>model = &amp;quot;sar&amp;quot;&lt;/code> selects the spatial lag specification, and &lt;code>effect = &amp;quot;individual&amp;quot;&lt;/code> includes state fixed effects.&lt;/p>
&lt;pre>&lt;code class="language-r">mod_sar_ind &amp;lt;- SDPDm(formula = logc ~ logp + logy, data = data1, W = W,
index = c(&amp;quot;state&amp;quot;, &amp;quot;year&amp;quot;),
model = &amp;quot;sar&amp;quot;,
effect = &amp;quot;individual&amp;quot;)
summary(mod_sar_ind)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">sar panel model with individual fixed effects
Spatial autoregressive coefficient:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
rho 0.297576 0.028444 10.462 &amp;lt; 2.2e-16 ***
Coefficients:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -0.5320053 0.0254445 -20.9085 &amp;lt;2e-16 ***
logy -0.0007088 0.0152139 -0.0466 0.9628
&lt;/code>&lt;/pre>
&lt;p>The spatial autoregressive coefficient $\rho = 0.298$ is highly significant ($t = 10.46$), confirming strong spatial dependence in cigarette consumption. A state&amp;rsquo;s consumption is positively influenced by its neighbors' consumption levels. The price elasticity is -0.532 ($t = -20.91$), meaning a 1% increase in real price reduces consumption by about 0.53%. However, the income coefficient is essentially zero (-0.001, $p = 0.96$), suggesting that with only state fixed effects, income variation does not significantly predict consumption &amp;mdash; likely because state fixed effects absorb cross-sectional income differences, while the within-state time variation in income is confounded with common time trends.&lt;/p>
&lt;h3 id="72-sar-with-two-way-fixed-effects">7.2 SAR with two-way fixed effects&lt;/h3>
&lt;p>Adding time fixed effects controls for year-specific shocks common to all states and typically changes the coefficient estimates substantially.&lt;/p>
&lt;pre>&lt;code class="language-r">mod_sar_tw &amp;lt;- SDPDm(formula = logc ~ logp + logy, data = data1, W = W,
index = c(&amp;quot;state&amp;quot;, &amp;quot;year&amp;quot;),
model = &amp;quot;sar&amp;quot;,
effect = &amp;quot;twoways&amp;quot;)
summary(mod_sar_tw)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">sar panel model with twoways fixed effects
Spatial autoregressive coefficient:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
rho 0.18659 0.02863 6.5173 7.159e-11 ***
Coefficients:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -0.994860 0.039906 -24.930 &amp;lt; 2.2e-16 ***
logy 0.463555 0.046019 10.073 &amp;lt; 2.2e-16 ***
&lt;/code>&lt;/pre>
&lt;p>With two-way fixed effects, three things change. First, the spatial coefficient drops from 0.298 to 0.187 &amp;mdash; still highly significant but weaker, because time fixed effects absorb some of the common spatial trends. Second, the price elasticity nearly doubles from -0.53 to -0.99, suggesting that the individual-FE-only model was biased by confounding time trends with prices. Third, income becomes strongly significant (0.464, $t = 10.07$): once common time trends are removed, higher real income is associated with &lt;em>more&lt;/em> cigarette consumption, consistent with cigarettes being a normal good at the state level.&lt;/p>
&lt;h3 id="73-impact-decomposition-for-static-sar">7.3 Impact decomposition for static SAR&lt;/h3>
&lt;p>In spatial models, the raw coefficients $\beta$ do not directly tell us how a change in one state&amp;rsquo;s price affects its own consumption. Because of the spatial feedback loop &amp;mdash; my consumption affects my neighbor&amp;rsquo;s, which in turn affects mine &amp;mdash; the actual effect is larger than $\beta$ alone. The &lt;a href="https://rdrr.io/cran/SDPDmod/man/impactsSDPDm.html" target="_blank" rel="noopener">&lt;code>impactsSDPDm()&lt;/code>&lt;/a> function decomposes the total effect into a &lt;strong>direct effect&lt;/strong> (impact on own state) and an &lt;strong>indirect effect&lt;/strong> (spillover to and from neighbors).&lt;/p>
&lt;pre>&lt;code class="language-r">imp_sar_tw &amp;lt;- impactsSDPDm(mod_sar_tw)
summary(imp_sar_tw)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Impact estimates for spatial (static) model
Direct:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -1.001155 0.038855 -25.767 &amp;lt; 2.2e-16 ***
logy 0.465947 0.044678 10.429 &amp;lt; 2.2e-16 ***
Indirect:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -0.223484 0.040877 -5.4672 4.571e-08 ***
logy 0.103540 0.018939 5.4670 4.578e-08 ***
Total:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -1.224639 0.060815 -20.137 &amp;lt; 2.2e-16 ***
logy 0.569487 0.052965 10.752 &amp;lt; 2.2e-16 ***
&lt;/code>&lt;/pre>
&lt;p>The impact decomposition reveals that a 1% increase in a state&amp;rsquo;s own real price reduces its consumption by 1.00% directly, plus an additional 0.22% through spatial feedback &amp;mdash; for a total price elasticity of -1.22. Think of it this way: when one state raises prices, its consumption drops, which in turn reduces the &amp;ldquo;pull&amp;rdquo; on neighboring states' consumption through the spatial lag, creating a ripple effect that feeds back to the original state. Similarly, a 1% income increase raises own-state consumption by 0.47% directly and by 0.10% through neighbors, for a total income elasticity of 0.57. The indirect effects are about 18% of the total effect, indicating economically meaningful spatial spillovers.&lt;/p>
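&lt;p>To see where these numbers come from: in the SAR model the matrix of partial effects is $S = (I - \rho W)^{-1} \beta$, the direct effect is the average diagonal of $S$, and the indirect effect is the average off-diagonal row sum (LeSage and Pace, 2009). The sketch below reproduces this arithmetic using the reported estimates with a toy 4-unit contiguity matrix, so the direct/indirect split is illustrative only (the actual analysis uses the 46-state $W$); the average total effect, however, equals $\beta/(1-\rho)$ exactly for any row-normalized $W$.&lt;/p>
&lt;pre>&lt;code class="language-r">rho  &amp;lt;- 0.18659      # reported spatial coefficient (SAR, two-way FE)
beta &amp;lt;- -0.994860    # reported price coefficient

# toy 4-unit &amp;quot;chain&amp;quot; contiguity matrix, row-normalized (illustrative only)
W_toy &amp;lt;- matrix(c(0, 1, 0, 0,
                  1, 0, 1, 0,
                  0, 1, 0, 1,
                  0, 0, 1, 0), nrow = 4, byrow = TRUE)
W_toy &amp;lt;- W_toy / rowSums(W_toy)

S &amp;lt;- solve(diag(4) - rho * W_toy) * beta  # partial effects dy_i / dx_j
direct   &amp;lt;- mean(diag(S))                 # average own-state effect
total    &amp;lt;- mean(rowSums(S))              # equals beta / (1 - rho) = -1.223
indirect &amp;lt;- total - direct                # average spillover
&lt;/code>&lt;/pre>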
&lt;h2 id="8-static-sdm-with-lee-yu-correction">8. Static SDM with Lee-Yu Correction&lt;/h2>
&lt;h3 id="81-sdm-with-two-way-fixed-effects">8.1 SDM with two-way fixed effects&lt;/h3>
&lt;p>The Spatial Durbin Model (SDM) extends the SAR by adding spatially lagged covariates $W X$, allowing neighbors' prices and income to directly affect a state&amp;rsquo;s consumption (beyond the indirect channel through $\rho W y$):&lt;/p>
&lt;p>$$y_t = \rho W y_t + X_t \beta + W X_t \theta + \mu_i + \gamma_t + \epsilon_t$$&lt;/p>
&lt;p>In words, this says that cigarette consumption depends on neighbors' consumption ($\rho$), own prices and income ($\beta$), &lt;em>and&lt;/em> neighbors' prices and income ($\theta$). Here $\mu_i$ captures state fixed effects and $\gamma_t$ captures time fixed effects. The SDM is the natural model when we believe that cross-border shopping responds directly to neighboring states' prices &amp;mdash; not just indirectly through neighbors' consumption levels.&lt;/p>
&lt;pre>&lt;code class="language-r">mod_sdm_tw &amp;lt;- SDPDm(formula = logc ~ logp + logy, data = data1, W = W,
index = c(&amp;quot;state&amp;quot;, &amp;quot;year&amp;quot;),
model = &amp;quot;sdm&amp;quot;,
effect = &amp;quot;twoways&amp;quot;)
summary(mod_sdm_tw)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">sdm panel model with twoways fixed effects
Spatial autoregressive coefficient:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
rho 0.222591 0.032825 6.7812 1.192e-11 ***
Coefficients:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -1.002878 0.040094 -25.0134 &amp;lt; 2.2e-16 ***
logy 0.600876 0.057207 10.5036 &amp;lt; 2.2e-16 ***
W*logp 0.048490 0.080807 0.6001 0.5484546
W*logy -0.292794 0.078158 -3.7462 0.0001795 ***
&lt;/code>&lt;/pre>
&lt;p>The SDM reveals an interesting asymmetry. The spatial lag of price (&lt;code>W*logp = 0.049&lt;/code>) is not significant ($p = 0.55$), meaning that neighboring states' prices do not directly affect own consumption once the spatial lag of consumption ($\rho = 0.223$) is accounted for. However, the spatial lag of income (&lt;code>W*logy = -0.293&lt;/code>) is highly significant ($t = -3.75$): when neighboring states become richer, own-state consumption &lt;em>decreases&lt;/em>. This negative spillover in income may reflect a substitution effect &amp;mdash; as neighbors' incomes rise, their consumers may shift toward premium or out-of-state purchasing channels, reducing the spatial demand that pulls up consumption in the focal state.&lt;/p>
&lt;h3 id="82-sdm-with-lee-yu-bias-correction">8.2 SDM with Lee-Yu bias correction&lt;/h3>
&lt;p>Fixed effects in spatial panels create an &lt;strong>incidental parameter problem&lt;/strong>: the large number of fixed effects (46 states + 30 years = 76 parameters) introduces a small-sample bias in the maximum likelihood estimator, particularly for the spatial autoregressive coefficient $\rho$ and the variance $\sigma^2$. The Lee-Yu transformation (Lee and Yu, 2010) corrects this bias by orthogonally transforming the data to concentrate out the fixed effects before estimation.&lt;/p>
&lt;pre>&lt;code class="language-r">mod_sdm_ly &amp;lt;- SDPDm(formula = logc ~ logp + logy, data = data1, W = W,
index = c(&amp;quot;state&amp;quot;, &amp;quot;year&amp;quot;),
model = &amp;quot;sdm&amp;quot;,
effect = &amp;quot;twoways&amp;quot;,
LYtrans = TRUE)
summary(mod_sdm_ly)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">sdm panel model with twoways fixed effects
Spatial autoregressive coefficient:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
rho 0.262211 0.032081 8.1735 2.996e-16 ***
Coefficients:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -1.001334 0.041121 -24.3509 &amp;lt; 2.2e-16 ***
logy 0.602729 0.058673 10.2726 &amp;lt; 2.2e-16 ***
W*logp 0.090779 0.082185 1.1046 0.2693
W*logy -0.313251 0.079982 -3.9165 8.983e-05 ***
&lt;/code>&lt;/pre>
&lt;p>The Lee-Yu correction increases $\rho$ from 0.223 to 0.262 &amp;mdash; an 18% upward correction, indicating that the uncorrected estimator underestimated spatial dependence. The slope coefficients change only marginally (the price coefficient moves from -1.003 to -1.001), which is expected with $T = 30$ years. For short panels ($T &amp;lt; 10$), the Lee-Yu correction would matter much more. We will use the Lee-Yu corrected version as our preferred static SDM.&lt;/p>
&lt;h3 id="83-comparison-sar-vs-sdm">8.3 Comparison: SAR vs. SDM&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Parameter&lt;/th>
&lt;th>FE (no spatial)&lt;/th>
&lt;th>SAR (Ind FE)&lt;/th>
&lt;th>SAR (TW FE)&lt;/th>
&lt;th>SDM (TW FE)&lt;/th>
&lt;th>SDM (TW FE, LY)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>$\rho$&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>0.298&lt;/td>
&lt;td>0.187&lt;/td>
&lt;td>0.223&lt;/td>
&lt;td>0.262&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>logp&lt;/td>
&lt;td>-1.035&lt;/td>
&lt;td>-0.532&lt;/td>
&lt;td>-0.995&lt;/td>
&lt;td>-1.003&lt;/td>
&lt;td>-1.001&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>logy&lt;/td>
&lt;td>0.529&lt;/td>
&lt;td>-0.001&lt;/td>
&lt;td>0.464&lt;/td>
&lt;td>0.601&lt;/td>
&lt;td>0.603&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>W*logp&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>0.049&lt;/td>
&lt;td>0.091&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>W*logy&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>-0.293&lt;/td>
&lt;td>-0.313&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\hat{\sigma}^2$&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>0.0067&lt;/td>
&lt;td>0.0051&lt;/td>
&lt;td>0.0050&lt;/td>
&lt;td>0.0052&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Two patterns stand out. First, the price coefficient is remarkably stable across the SDM specifications (around -1.00), while it was biased in the SAR with individual FE only (-0.53). Second, adding the SDM terms increases the income coefficient from 0.46 (SAR) to 0.60 (SDM), because the negative spatial lag of income (&lt;code>W*logy&lt;/code> $\approx$ -0.31) absorbs part of the spatial income effect that the SAR was attributing to the spatial lag $\rho$.&lt;/p>
&lt;h3 id="84-impact-decomposition-for-static-sdm">8.4 Impact decomposition for static SDM&lt;/h3>
&lt;p>The impact decomposition for the SDM differs fundamentally from the SAR because the $W X$ terms create additional channels for indirect effects.&lt;/p>
&lt;pre>&lt;code class="language-r">imp_sdm_ly &amp;lt;- impactsSDPDm(mod_sdm_ly)
summary(imp_sdm_ly)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Impact estimates for spatial (static) model
Direct:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -1.010329 0.040149 -25.164 &amp;lt; 2.2e-16 ***
logy 0.588471 0.054940 10.711 &amp;lt; 2.2e-16 ***
Indirect:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -0.21925 0.09439 -2.3228 0.02019 *
logy -0.19721 0.09108 -2.1652 0.03037 *
Total:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -1.229575 0.105631 -11.6403 &amp;lt; 2.2e-16 ***
logy 0.391262 0.086184 4.5398 5.63e-06 ***
&lt;/code>&lt;/pre>
&lt;p>The SDM impact decomposition tells a richer story than the SAR. For price, the results are similar: a direct effect of -1.01 and an indirect (spillover) effect of -0.22, summing to a total price elasticity of -1.23. However, for income, the SDM flips the sign of the indirect effect: it is now &lt;em>negative&lt;/em> (-0.20) instead of positive (0.10 in the SAR). This means that when neighboring states' incomes rise, the focal state&amp;rsquo;s consumption actually &lt;em>decreases&lt;/em> &amp;mdash; consistent with the significant negative &lt;code>W*logy&lt;/code> coefficient we saw earlier. The total income elasticity in the SDM (0.39) is therefore lower than in the SAR (0.57), because the positive direct effect (0.59) is partially offset by the negative spillover (-0.20). This sign reversal of the income spillover is an important finding that the SAR cannot detect.&lt;/p>
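&lt;p>The sign reversal can be traced directly to the coefficients. Under a row-normalized $W$, the average total effect of a covariate in the SDM is $(\beta + \theta)/(1 - \rho)$, so the significantly negative &lt;code>W*logy&lt;/code> coefficient pulls the total income elasticity well below its direct effect. A back-of-the-envelope check with the Lee-Yu estimates (hand arithmetic, not part of the package output):&lt;/p>
&lt;pre>&lt;code class="language-r">rho &amp;lt;- 0.262211

# income: own coefficient plus spatial-lag coefficient (Lee-Yu SDM)
beta_logy  &amp;lt;- 0.602729
theta_logy &amp;lt;- -0.313251
(beta_logy + theta_logy) / (1 - rho)  # ~0.392, near the reported total of 0.391

# price: theta is small and insignificant, so the total stays near beta / (1 - rho)
beta_logp  &amp;lt;- -1.001334
theta_logp &amp;lt;- 0.090779
(beta_logp + theta_logp) / (1 - rho)  # ~-1.234, near the reported total of -1.230
&lt;/code>&lt;/pre>
&lt;p>The small differences from the reported totals arise because &lt;code>impactsSDPDm()&lt;/code> simulates the full $(I - \rho W)^{-1}(\beta I + \theta W)$ matrix rather than using this scalar shortcut.&lt;/p>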
&lt;h2 id="9-dynamic-spatial-panel-models">9. Dynamic Spatial Panel Models&lt;/h2>
&lt;h3 id="91-why-dynamics-habit-persistence-in-cigarette-consumption">9.1 Why dynamics? Habit persistence in cigarette consumption&lt;/h3>
&lt;p>Cigarette consumption is strongly habit-forming. Nicotine addiction creates a direct link between past and present consumption: last year&amp;rsquo;s smokers are very likely to be this year&amp;rsquo;s smokers. Ignoring this temporal persistence in a static model means that the spatial coefficient $\rho$ must absorb &lt;em>both&lt;/em> spatial spillovers and the serial correlation in consumption patterns, leading to biased estimates of the true spatial effect. Dynamic models explicitly include the lagged dependent variable $y_{t-1}$ (with coefficient $\tau$, capturing &lt;strong>habit persistence&lt;/strong>) and optionally its spatial lag $W y_{t-1}$ (with coefficient $\eta$, capturing &lt;strong>spatiotemporal diffusion&lt;/strong>):&lt;/p>
&lt;p>$$y_t = \rho W y_t + \tau y_{t-1} + \eta W y_{t-1} + X_t \beta + W X_t \theta + \mu_i + \gamma_t + \epsilon_t$$&lt;/p>
&lt;p>In words, this equation says that today&amp;rsquo;s cigarette consumption depends on: neighbors' current consumption ($\rho$), own past consumption ($\tau$, habit persistence), neighbors' past consumption ($\eta$, spatiotemporal diffusion), own prices and income ($\beta$), and neighbors' prices and income ($\theta$). Here $y_{t-1}$ corresponds to &lt;code>logc(t-1)&lt;/code> in the output, and $Wy_{t-1}$ corresponds to &lt;code>W*logc(t-1)&lt;/code>.&lt;/p>
&lt;h3 id="92-dynamic-sar-with-temporal-lag-only">9.2 Dynamic SAR with temporal lag only&lt;/h3>
&lt;p>We start by adding only the temporal lag $y_{t-1}$ without the spatiotemporal lag $W y_{t-1}$, to isolate the effect of habit persistence on the spatial coefficient.&lt;/p>
&lt;pre>&lt;code class="language-r">mod_dsar_tl &amp;lt;- SDPDm(formula = logc ~ logp + logy, data = data1, W = W,
index = c(&amp;quot;state&amp;quot;, &amp;quot;year&amp;quot;),
model = &amp;quot;sar&amp;quot;,
effect = &amp;quot;twoways&amp;quot;,
LYtrans = TRUE,
dynamic = TRUE,
tlaginfo = list(ind = NULL, tl = TRUE, stl = FALSE))
summary(mod_dsar_tl)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">sar dynamic panel model with twoways fixed effects
Spatial autoregressive coefficient:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
rho 0.0095932 0.0169929 0.5645 0.5724
Coefficients:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logc(t-1) 0.866212 0.012785 67.7523 &amp;lt; 2.2e-16 ***
logp -0.254617 0.023047 -11.0478 &amp;lt; 2.2e-16 ***
logy 0.084437 0.023719 3.5598 0.0003711 ***
&lt;/code>&lt;/pre>
&lt;p>This result is striking. The temporal lag coefficient $\tau = 0.866$ is enormous ($t = 67.75$), confirming that cigarette consumption is extremely persistent &amp;mdash; about 87% of last year&amp;rsquo;s consumption carries over to this year. More remarkably, the spatial autoregressive coefficient $\rho$ collapses from 0.262 (static SDM) to just 0.010 and becomes &lt;em>non-significant&lt;/em> ($p = 0.57$). This suggests that what appeared to be contemporaneous spatial dependence in the static model was largely a proxy for temporal persistence: states that consumed heavily in the past continue to do so, and neighboring states happen to share similar histories. The short-run price elasticity also drops sharply from -1.00 to -0.25, because the lagged dependent variable now captures the cumulative effect of past prices.&lt;/p>
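&lt;p>One way to appreciate the size of $\tau = 0.866$: treating consumption as a simple AR(1) process (and ignoring the now-negligible spatial feedback), a shock decays geometrically at rate $\tau$, so the number of years until it halves is $\log(0.5)/\log(\tau)$. A quick illustrative calculation:&lt;/p>
&lt;pre>&lt;code class="language-r">tau &amp;lt;- 0.866212                   # estimated temporal lag coefficient
half_life &amp;lt;- log(0.5) / log(tau)  # years until a consumption shock halves
round(half_life, 1)               # 4.8
&lt;/code>&lt;/pre>
&lt;p>A shock to consumption thus takes almost five years to dissipate by half, which is why static estimates that omit $y_{t-1}$ absorb so much of this persistence into $\rho$.&lt;/p>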
&lt;h3 id="93-dynamic-sar-with-temporal-and-spatiotemporal-lags">9.3 Dynamic SAR with temporal and spatiotemporal lags&lt;/h3>
&lt;p>Adding the spatiotemporal lag $W y_{t-1}$ allows us to test whether neighboring states' &lt;em>past&lt;/em> consumption patterns affect current consumption.&lt;/p>
&lt;pre>&lt;code class="language-r">mod_dsar_full &amp;lt;- SDPDm(formula = logc ~ logp + logy, data = data1, W = W,
index = c(&amp;quot;state&amp;quot;, &amp;quot;year&amp;quot;),
model = &amp;quot;sar&amp;quot;,
effect = &amp;quot;twoways&amp;quot;,
LYtrans = TRUE,
dynamic = TRUE,
tlaginfo = list(ind = NULL, tl = TRUE, stl = TRUE))
summary(mod_dsar_full)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">sar dynamic panel model with twoways fixed effects
Spatial autoregressive coefficient:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
rho 0.703004 0.021363 32.907 &amp;lt; 2.2e-16 ***
Coefficients:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logc(t-1) 0.882056 0.013012 67.789 &amp;lt; 2e-16 ***
W*logc(t-1) -0.727317 0.026033 -27.938 &amp;lt; 2e-16 ***
logp -0.243591 0.023337 -10.438 &amp;lt; 2e-16 ***
logy 0.055595 0.023933 2.323 0.02018 *
&lt;/code>&lt;/pre>
&lt;p>Adding the spatiotemporal lag dramatically changes the picture. The spatial coefficient $\rho$ jumps to 0.703, and the spatiotemporal lag $\eta = -0.727$ is strongly negative ($t = -27.94$). The temporal lag $\tau = 0.882$ remains dominant. The large $\rho$ combined with the nearly equal-and-opposite $\eta$ suggests a complex dynamic pattern: states with high &lt;em>current&lt;/em> neighbor consumption tend to have higher own consumption ($\rho &amp;gt; 0$), but states whose neighbors consumed heavily &lt;em>last year&lt;/em> tend to have &lt;em>lower&lt;/em> current consumption ($\eta &amp;lt; 0$). However, the near-cancellation of $\rho$ and $\eta$ may also indicate multicollinearity between $Wy_t$ and $Wy_{t-1}$, making the individual coefficients hard to interpret reliably. The dynamic SDM in Section 9.4, which adds covariates' spatial lags, provides a more stable decomposition.&lt;/p>
&lt;h3 id="94-dynamic-sdm-with-both-lags-and-lee-yu-correction">9.4 Dynamic SDM with both lags and Lee-Yu correction&lt;/h3>
&lt;p>The most general model combines all elements: spatial lag of $y$, temporal lag, spatiotemporal lag, and spatial lags of $X$, all with Lee-Yu bias correction.&lt;/p>
&lt;pre>&lt;code class="language-r">mod_dsdm &amp;lt;- SDPDm(formula = logc ~ logp + logy, data = data1, W = W,
index = c(&amp;quot;state&amp;quot;, &amp;quot;year&amp;quot;),
model = &amp;quot;sdm&amp;quot;,
effect = &amp;quot;twoways&amp;quot;,
LYtrans = TRUE,
dynamic = TRUE,
tlaginfo = list(ind = NULL, tl = TRUE, stl = TRUE))
summary(mod_dsdm)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">sdm dynamic panel model with twoways fixed effects
Spatial autoregressive coefficient:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
rho 0.162189 0.036753 4.4129 1.02e-05 ***
Coefficients:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logc(t-1) 0.864412 0.012879 67.1163 &amp;lt; 2.2e-16 ***
W*logc(t-1) -0.096270 0.038810 -2.4805 0.0131186 *
logp -0.270872 0.023145 -11.7031 &amp;lt; 2.2e-16 ***
logy 0.104262 0.029783 3.5007 0.0004641 ***
W*logp 0.195595 0.043870 4.4585 8.254e-06 ***
W*logy -0.032464 0.039520 -0.8215 0.4113891
&lt;/code>&lt;/pre>
&lt;p>The dynamic SDM produces the most nuanced picture. Habit persistence remains dominant ($\tau = 0.864$, $t = 67.12$). The spatial coefficient $\rho = 0.162$ is significant but much smaller than in the static model ($\rho = 0.262$), confirming that static models overstate contemporaneous spatial dependence by conflating it with temporal persistence. The spatiotemporal lag is weakly significant ($\eta = -0.096$, $p = 0.013$). Notably, the spatial lag of price (&lt;code>W*logp = 0.196&lt;/code>) is now &lt;em>positive&lt;/em> and significant ($t = 4.46$), a reversal from the static SDM where it was not significant. This positive coefficient means that when neighboring states' prices rise, own-state consumption &lt;em>increases&lt;/em> &amp;mdash; precisely the cross-border shopping effect we hypothesized. Smokers respond to neighbors' price increases by purchasing more in their own (now relatively cheaper) state. The spatial lag of income (&lt;code>W*logy = -0.032&lt;/code>) is no longer significant once dynamics are included.&lt;/p>
&lt;h3 id="95-impact-decomposition-short-run-and-long-run-effects">9.5 Impact decomposition: short-run and long-run effects&lt;/h3>
&lt;p>For dynamic models, &lt;code>impactsSDPDm()&lt;/code> separates effects into &lt;strong>short-run&lt;/strong> (immediate, one-period) and &lt;strong>long-run&lt;/strong> (cumulative, steady-state) impacts. The long-run effects account for the feedback loop through the lagged dependent variable: a price change today affects consumption today, which affects consumption next year (through $\tau$), which feeds back again, and so on until a new equilibrium is reached.&lt;/p>
&lt;pre>&lt;code class="language-r">imp_dsdm &amp;lt;- impactsSDPDm(mod_dsdm)
summary(imp_dsdm)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Impact estimates for spatial dynamic model
========================================================
Short-term
Direct:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -0.261569 0.022830 -11.457 &amp;lt; 2.2e-16 ***
logy 0.101759 0.029667 3.430 0.0006035 ***
Indirect:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp 0.178932 0.046861 3.8183 0.0001344 ***
logy -0.015109 0.042210 -0.3579 0.7203812
Total:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -0.082637 0.052143 -1.5848 0.1130
logy 0.086650 0.037890 2.2868 0.0222 *
========================================================
Long-term
Direct:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -1.92836 0.20580 -9.3702 &amp;lt; 2.2e-16 ***
logy 0.80149 0.22655 3.5378 0.0004034 ***
Indirect:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp 0.91054 0.58271 1.5626 0.1181
logy 0.48361 1.54612 0.3128 0.7544
Total:
Estimate Std. Error t-value Pr(&amp;gt;|t|)
logp -1.01783 0.66733 -1.5252 0.1272
logy 1.28510 1.59825 0.8041 0.4214
&lt;/code>&lt;/pre>
&lt;p>The gap between short-run and long-run effects is dramatic. The &lt;strong>short-run direct price elasticity&lt;/strong> is only -0.26, meaning that a 1% price increase immediately reduces consumption by just 0.26%. But the &lt;strong>long-run direct price elasticity&lt;/strong> is -1.93 &amp;mdash; more than seven times larger &amp;mdash; because the habit persistence mechanism ($\tau = 0.864$) amplifies the initial shock over time. Think of it as a snowball effect: a small reduction today accumulates year after year because lower consumption this year leads to lower consumption next year, and so on.&lt;/p>
&lt;p>The short-run indirect (spillover) effect of price is &lt;em>positive&lt;/em> (0.179): when a state raises its prices, neighboring states' consumption increases in the short run, consistent with cross-border shopping. This positive spillover partly offsets the direct negative effect, making the short-run &lt;em>total&lt;/em> price elasticity (-0.083) small and statistically non-significant. In the long run, the indirect price effect remains positive (0.911) but becomes imprecisely estimated and non-significant, while the direct effect (-1.928) dominates. The long-run total effects for both price and income are estimated with large standard errors, reflecting the uncertainty inherent in extrapolating dynamic effects to the steady state. The non-significance of these long-run totals means that, despite large point estimates, we cannot reliably predict the net cumulative impact of price or income changes across the full spatial system. Note that the long-run effects assume the system reaches a stable equilibrium, which requires the stationarity condition $|\tau + \rho \eta| &amp;lt; 1$ to hold.&lt;/p>
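As a back-of-envelope check, the long-run amplification can be approximated from the short-run estimate and the habit-persistence coefficient alone. This sketch ignores the spatial feedback that the exact table values account for, so it only approximates the reported long-run direct effect:

```python
# Back-of-envelope check of the long-run amplification, using the dynamic
# SDM estimates reported above. This ignores spatial feedback, so it only
# approximates the exact long-run direct effect in the table.
tau, rho, eta = 0.864, 0.162, -0.096     # temporal, spatial, spatiotemporal
sr_direct_logp = -0.261569               # short-run direct price effect

# Stationarity condition for a steady state: |tau + rho*eta| < 1
assert abs(tau + rho * eta) < 1

# Geometric accumulation: sr * (1 + tau + tau^2 + ...) = sr / (1 - tau)
lr_approx = sr_direct_logp / (1 - tau)
print(round(lr_approx, 2))               # close to the reported -1.93
```

The small gap between this approximation and the table's -1.93 comes from the spatial terms omitted here.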
&lt;h3 id="96-comparison-of-dynamic-specifications">9.6 Comparison of dynamic specifications&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Parameter&lt;/th>
&lt;th>Static SDM (LY)&lt;/th>
&lt;th>Dyn SAR (tl)&lt;/th>
&lt;th>Dyn SAR (tl+stl)&lt;/th>
&lt;th>Dyn SDM (LY)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>$\rho$&lt;/td>
&lt;td>0.262&lt;/td>
&lt;td>0.010&lt;/td>
&lt;td>0.703&lt;/td>
&lt;td>0.162&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\tau$ (logc_{t-1})&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>0.866&lt;/td>
&lt;td>0.882&lt;/td>
&lt;td>0.864&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\eta$ (W*logc_{t-1})&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>-0.727&lt;/td>
&lt;td>-0.096&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>logp&lt;/td>
&lt;td>-1.001&lt;/td>
&lt;td>-0.255&lt;/td>
&lt;td>-0.244&lt;/td>
&lt;td>-0.271&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>logy&lt;/td>
&lt;td>0.603&lt;/td>
&lt;td>0.084&lt;/td>
&lt;td>0.056&lt;/td>
&lt;td>0.104&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>W*logp&lt;/td>
&lt;td>0.091&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>0.196&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>W*logy&lt;/td>
&lt;td>-0.313&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>-0.032&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\hat{\sigma}^2$&lt;/td>
&lt;td>0.0052&lt;/td>
&lt;td>0.0012&lt;/td>
&lt;td>0.0012&lt;/td>
&lt;td>0.0012&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The table reveals that temporal dynamics fundamentally reshape the spatial story. The temporal lag coefficient ($\tau \approx 0.86$) is remarkably stable across all dynamic specifications, confirming that habit persistence is the dominant force. The spatial coefficient $\rho$ varies widely depending on whether the spatiotemporal lag is included, highlighting the sensitivity of spatial inference to the dynamic specification. The short-run price and income elasticities in the dynamic models are a fraction of the static estimates &amp;mdash; roughly one-quarter for price and one-sixth for income &amp;mdash; because the lagged dependent variable now carries the cumulative effect.&lt;/p>
&lt;h2 id="10-effect-decomposition-summary">10. Effect Decomposition Summary&lt;/h2>
&lt;p>The figure below compares the direct, indirect, and total effects of price and income across three model-horizon combinations: the static SDM, and the short-run and long-run effects from the dynamic SDM.&lt;/p>
&lt;pre>&lt;code class="language-r"># See analysis.R for the full figure code
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_SDPDmod_fig3_impact_decomposition.png" alt="Effect decomposition comparing direct, indirect, and total effects for price and income across the static SDM and the dynamic SDM (short-run and long-run)">&lt;/p>
&lt;p>Four patterns stand out from this comparison. First, the &lt;strong>static SDM overstates the short-run response&lt;/strong> to price changes: its direct price effect (-1.01) is nearly four times larger than the dynamic short-run direct effect (-0.26). A policymaker using the static estimate to predict the immediate revenue impact of a cigarette tax increase would be far too optimistic about consumption reductions.&lt;/p>
&lt;p>Second, &lt;strong>spatial spillovers change sign between static and dynamic models&lt;/strong>. In the static SDM, the indirect price effect is negative (-0.22), meaning price increases reduce neighbors' consumption. In the dynamic SDM&amp;rsquo;s short run, it is &lt;em>positive&lt;/em> (0.18), consistent with cross-border shopping: when one state raises prices, its neighbors' sales increase as smokers cross the border. This sign reversal underscores the importance of properly specifying temporal dynamics.&lt;/p>
&lt;p>Third, &lt;strong>long-run effects are much larger but imprecisely estimated&lt;/strong>. The long-run direct price elasticity (-1.93) is the largest estimate in the analysis, reflecting decades of accumulated habit adjustments. However, the wide confidence intervals on long-run total effects mean that precise long-run predictions require caution.&lt;/p>
&lt;p>Fourth, &lt;strong>income effects are more robust&lt;/strong>. The direct income elasticity is positive and significant in all specifications (ranging from 0.10 in the short run to 0.80 in the long run), confirming that cigarettes behave as a normal good. The indirect income effects are less stable and generally not significant in the dynamic specification.&lt;/p>
&lt;h2 id="11-discussion">11. Discussion&lt;/h2>
&lt;p>This tutorial demonstrates three key findings about spatial dynamics in cigarette demand. First, &lt;strong>spatial dependence is real and economically meaningful&lt;/strong>, but its magnitude depends critically on the model specification. The Bayesian comparison (Section 5) unanimously rejects non-spatial models, and the total price elasticity in the static SDM (-1.23) is 22% larger than the direct effect alone (-1.01). A state that ignores spatial spillovers when evaluating a cigarette tax increase will underestimate both the consumption reduction in its own state and the cross-border effects on neighbors.&lt;/p>
&lt;p>Second, &lt;strong>habit persistence dominates the dynamic structure&lt;/strong>. The temporal lag coefficient ($\tau \approx 0.86$) is by far the largest and most precisely estimated parameter in every dynamic model. Once dynamics are included, the contemporaneous spatial coefficient weakens dramatically, and what appeared to be spatial dependence in the static model is revealed to be largely temporal persistence. This does not mean spatial effects are absent &amp;mdash; they remain significant at $\rho = 0.16$ in the dynamic SDM &amp;mdash; but they are much smaller than the static model suggests.&lt;/p>
&lt;p>Third, &lt;strong>the dynamic SDM uncovers a cross-border shopping effect&lt;/strong> that the static model misses. The positive and significant &lt;code>W*logp&lt;/code> coefficient (0.196) in the dynamic SDM means that when neighboring states raise prices, own-state consumption &lt;em>increases&lt;/em> in the short run. This is the signature of cross-border purchasing. The effect is masked in the static model because the spatial lag $\rho Wy$ absorbs it, and it only emerges when the temporal dynamics are properly specified.&lt;/p>
&lt;p>A fourth finding relates to &lt;strong>robustness to the weight matrix&lt;/strong>. Re-estimating the static SDM with a 2nd-order contiguity matrix (which expands the average number of neighbors from 4.1 to 10.6) yields a stronger spatial coefficient ($\rho = 0.449$ vs. 0.262) and a significant &lt;code>W*logp&lt;/code> coefficient (0.337, $p = 0.009$) that was not significant with the 1st-order matrix. This suggests that cross-border shopping effects may extend beyond immediately adjacent states, and that the choice of spatial weight matrix matters substantively for policy conclusions.&lt;/p>
&lt;p>From a software perspective, the SDPDmod package provides a streamlined R workflow that covers the complete spatial panel modeling pipeline &amp;mdash; from Bayesian model selection through estimation to impact decomposition &amp;mdash; in a coherent framework. The &lt;code>blmpSDPD()&lt;/code> function is particularly valuable for applied researchers, as it replaces the ad hoc sequence of Wald tests with a principled, simultaneous comparison of all candidate models.&lt;/p>
&lt;h2 id="12-summary-and-next-steps">12. Summary and Next Steps&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Spatial models matter for tobacco policy:&lt;/strong> the total price elasticity (-1.23 in the static SDM) is 22% larger than the direct effect alone, meaning unilateral state tax increases generate spillovers to neighboring states that standard panel models miss.&lt;/li>
&lt;li>&lt;strong>Bayesian model comparison provides principled model selection:&lt;/strong> the SDM is overwhelmingly preferred in static specifications (99.89% probability with individual FE), but adding dynamics reduces the ability to discriminate among spatial models, with all specifications receiving similar posterior probabilities.&lt;/li>
&lt;li>&lt;strong>Habit persistence is the dominant dynamic force:&lt;/strong> the temporal lag coefficient $\tau \approx 0.86$ dwarfs the contemporaneous spatial effect ($\rho = 0.16$), and static models conflate short-run and long-run responses. The short-run price elasticity (-0.26) is one-quarter of the static estimate (-1.01).&lt;/li>
&lt;li>&lt;strong>Cross-border shopping emerges in the dynamic SDM:&lt;/strong> the positive spatial lag of price (&lt;code>W*logp = 0.20&lt;/code>) means that neighboring states' price increases boost own consumption in the short run &amp;mdash; the clearest evidence of border-crossing behavior.&lt;/li>
&lt;/ul>
&lt;p>For further study, see the companion &lt;a href="https://carlos-mendez.org/post/stata_sp_regression_panel/">Stata spatial panel tutorial&lt;/a> that applies &lt;code>xsmle&lt;/code> to the same dataset, and the &lt;a href="https://carlos-mendez.org/post/stata_sp_regression_cross_section/">Stata cross-sectional spatial tutorial&lt;/a> for a simpler introduction to spatial models without the temporal dimension. The SDPDmod package is documented in Simonovska (2025) and available on &lt;a href="https://cran.r-project.org/package=SDPDmod" target="_blank" rel="noopener">CRAN&lt;/a>.&lt;/p>
&lt;h2 id="13-exercises">13. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Build your own W.&lt;/strong> In Section 4.2 we constructed a 2nd-order contiguity matrix. Re-run &lt;code>blmpSDPD()&lt;/code> with this alternative &lt;code>W2&lt;/code> instead of the original &lt;code>W&lt;/code>. Does the Bayesian model comparison still favor the SDM? How do the model probabilities change when the definition of &amp;ldquo;neighbor&amp;rdquo; is broader?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Include pimin directly.&lt;/strong> Add &lt;code>lpm = log(pimin/cpi)&lt;/code> as an additional covariate in the SAR model: &lt;code>logc ~ logp + logy + lpm&lt;/code>. Compare the results to the SDM&amp;rsquo;s &lt;code>W*logp&lt;/code> coefficient. Does &lt;code>lpm&lt;/code> remain significant alongside the spatial lag of the dependent variable? Why or why not?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>SAR vs. SDM indirect effects.&lt;/strong> Compare the impact decomposition from the static SAR (Section 7.3) and static SDM (Section 8.4). The indirect income effect &lt;em>reverses sign&lt;/em> (positive in SAR, negative in SDM). Write a paragraph explaining this reversal in terms of the cross-border shopping mechanism.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Subsample analysis.&lt;/strong> Split the data into two periods (1963&amp;ndash;1977 and 1978&amp;ndash;1992). Re-estimate the dynamic SDM for each period. Does the habit persistence coefficient ($\tau$) change over time? Has the spatial coefficient ($\rho$) strengthened or weakened as anti-smoking policies intensified?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="14-references">14. References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://doi.org/10.1007/s10614-025-11056-2" target="_blank" rel="noopener">Simonovska, R. (2025). SDPDmod: An R Package for Spatial Dynamic Panel Data Modeling. &lt;em>Computational Economics&lt;/em>.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1016/0954-349X%2892%2990010-4" target="_blank" rel="noopener">Baltagi, B. H. &amp;amp; Levin, D. (1992). Cigarette Taxation: Raising Revenues and Reducing Consumption. &lt;em>Structural Change and Economic Dynamics&lt;/em>, 3(2), 321&amp;ndash;335.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1016/j.jeconom.2009.08.001" target="_blank" rel="noopener">Lee, L.-F. &amp;amp; Yu, J. (2010). Estimation of Spatial Autoregressive Panel Data Models with Fixed Effects. &lt;em>Journal of Econometrics&lt;/em>, 154(2), 165&amp;ndash;185.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1016/j.spasta.2014.02.002" target="_blank" rel="noopener">LeSage, J. P. (2014). Spatial Econometric Panel Data Model Specification: A Bayesian Approach. &lt;em>Spatial Statistics&lt;/em>, 9, 122&amp;ndash;145.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1007/978-3-642-40340-8" target="_blank" rel="noopener">Elhorst, J. P. (2014). &lt;em>Spatial Econometrics: From Cross-Sectional Data to Spatial Panels.&lt;/em> Springer.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1201/9781420064254" target="_blank" rel="noopener">LeSage, J. P. &amp;amp; Pace, R. K. (2009). &lt;em>Introduction to Spatial Econometrics.&lt;/em> Chapman &amp;amp; Hall/CRC.&lt;/a>&lt;/li>
&lt;/ol></description></item><item><title>Spatial Dynamic Panels with Common Factors in Stata: Credit Risk in US Banking</title><link>https://carlos-mendez.org/post/stata_spxtivdfreg/</link><pubDate>Fri, 27 Mar 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/stata_spxtivdfreg/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>The 2007&amp;ndash;2009 Global Financial Crisis revealed that credit risk does not stay contained within individual banks. Non-performing loans surged across the US banking system through two distinct channels &amp;mdash; &lt;strong>spatial spillovers&lt;/strong> from balance-sheet interdependencies among interconnected banks, and &lt;strong>common factors&lt;/strong> from macroeconomic shocks (interest rate changes, housing market collapses, unemployment spikes) that hit all banks simultaneously. Ignoring either channel leads to biased estimates of credit risk determinants and misleading policy prescriptions. Standard spatial panel packages in Stata &amp;mdash; such as &lt;code>xsmle&lt;/code> and &lt;code>spxtregress&lt;/code> &amp;mdash; can model spatial spillovers but cannot account for unobserved common factors, leaving a critical gap in the econometrician&amp;rsquo;s toolkit.&lt;/p>
&lt;p>The &lt;code>spxtivdfreg&lt;/code> package (Kripfganz &amp;amp; Sarafidis, 2025) fills this gap by implementing a &lt;strong>defactored instrumental variables&lt;/strong> estimator that simultaneously handles four sources of endogeneity: spatial lags of the dependent variable, temporal lags (dynamic persistence), endogenous regressors, and unobserved common factors. The estimator first removes common factors from the data using a principal-components-based defactoring procedure, then applies IV/GMM estimation to the defactored model. This approach avoids the incidental parameters bias that plagues maximum likelihood methods and does not require bias corrections like the Lee-Yu adjustment used in &lt;code>xsmle&lt;/code>.&lt;/p>
&lt;p>This tutorial replicates the empirical application from Kripfganz and Sarafidis (2025), which models non-performing loan ratios across 350 US commercial banks over the period 2006:Q1 to 2014:Q4 &amp;mdash; a sample that spans the entire GFC episode. We estimate the full spatial dynamic panel model with common factors, demonstrate what happens when common factors or the spatial lag are omitted, compute short-run and long-run spillover effects, and compare homogeneous and heterogeneous slope specifications.&lt;/p>
&lt;h3 id="learning-objectives">Learning objectives&lt;/h3>
&lt;ul>
&lt;li>Understand the four sources of endogeneity in spatial dynamic panel models: spatial lag, temporal lag, endogenous regressors, and common factors&lt;/li>
&lt;li>Estimate the full spatial dynamic panel model with common factors using &lt;code>spxtivdfreg&lt;/code>&lt;/li>
&lt;li>Compare estimation results with and without common factors to assess the consequences of ignoring latent macroeconomic shocks&lt;/li>
&lt;li>Compare estimation results with and without the spatial lag to evaluate the importance of bank interconnectedness&lt;/li>
&lt;li>Compute and interpret short-run and long-run direct, indirect, and total effects using &lt;code>estat impact&lt;/code>&lt;/li>
&lt;li>Estimate heterogeneous slope models with the mean-group (MG) estimator to assess cross-bank parameter heterogeneity&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="2-the-modeling-framework">2. The modeling framework&lt;/h2>
&lt;p>Credit risk in a banking system is shaped by forces operating at three different levels: the individual bank (its own financial ratios and management quality), the network of interconnected banks (spatial spillovers through lending relationships, common borrowers, and contagion), and the macroeconomy (interest rates, GDP growth, and other aggregate shocks that affect all banks). The spatial dynamic panel model with common factors captures all three levels in a single equation.&lt;/p>
&lt;p>The diagram below illustrates the four sources of endogeneity that the &lt;code>spxtivdfreg&lt;/code> estimator must address simultaneously.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
Y[&amp;quot;&amp;lt;b&amp;gt;NPL&amp;lt;sub&amp;gt;it&amp;lt;/sub&amp;gt;&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Non-performing&amp;lt;br/&amp;gt;loan ratio&amp;quot;]
WY[&amp;quot;&amp;lt;b&amp;gt;W · NPL&amp;lt;sub&amp;gt;t&amp;lt;/sub&amp;gt;&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Spatial lag&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Bank interdependence&amp;lt;/i&amp;gt;&amp;quot;]
LY[&amp;quot;&amp;lt;b&amp;gt;NPL&amp;lt;sub&amp;gt;i,t-1&amp;lt;/sub&amp;gt;&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Temporal lag&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Risk persistence&amp;lt;/i&amp;gt;&amp;quot;]
X[&amp;quot;&amp;lt;b&amp;gt;INEFF&amp;lt;sub&amp;gt;it&amp;lt;/sub&amp;gt;&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Endogenous&amp;lt;br/&amp;gt;regressor&amp;quot;]
F[&amp;quot;&amp;lt;b&amp;gt;f&amp;lt;sub&amp;gt;t&amp;lt;/sub&amp;gt;&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Common factors&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Macro shocks&amp;lt;/i&amp;gt;&amp;quot;]
Z[&amp;quot;&amp;lt;b&amp;gt;Z&amp;lt;sub&amp;gt;it&amp;lt;/sub&amp;gt;&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Instruments&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;INTEREST, lags&amp;lt;/i&amp;gt;&amp;quot;]
WY --&amp;gt;|&amp;quot;ψ&amp;quot;| Y
LY --&amp;gt;|&amp;quot;ρ&amp;quot;| Y
X --&amp;gt;|&amp;quot;β&amp;quot;| Y
F -.-&amp;gt;|&amp;quot;λ&amp;lt;sub&amp;gt;i&amp;lt;/sub&amp;gt;&amp;quot;| Y
Z -.-&amp;gt;|&amp;quot;IV&amp;quot;| X
style Y fill:#d97757,stroke:#141413,color:#fff
style WY fill:#6a9bcc,stroke:#141413,color:#fff
style LY fill:#6a9bcc,stroke:#141413,color:#fff
style X fill:#00d4c8,stroke:#141413,color:#141413
style F fill:#141413,stroke:#d97757,color:#fff
style Z fill:#6a9bcc,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>The spatial lag ($W \cdot NPL$) creates endogeneity because bank $i$&amp;rsquo;s credit risk depends on bank $j$&amp;rsquo;s credit risk, and vice versa &amp;mdash; a simultaneity problem. The temporal lag ($NPL_{i,t-1}$) is endogenous because it correlates with the bank-specific fixed effect. The endogenous regressor (operational inefficiency, $INEFF$) is correlated with the error term. And the common factors ($f_t$) enter both the regressors and the error, inducing cross-sectional dependence and omitted variable bias.&lt;/p>
&lt;p>The model is specified as:&lt;/p>
&lt;p>$$NPL_{it} = \psi \sum_{j=1}^{N} w_{ij} \, NPL_{jt} + \rho \, NPL_{i,t-1} + x_{it} \beta + \alpha_i + \lambda_i' f_t + \varepsilon_{it}$$&lt;/p>
&lt;p>In words, this equation says that the non-performing loan ratio of bank $i$ at time $t$ depends on: the &lt;strong>spatial lag&lt;/strong> $\psi W \cdot NPL$ (the weighted average NPL of interconnected banks), the &lt;strong>temporal lag&lt;/strong> $\rho \, NPL_{i,t-1}$ (the bank&amp;rsquo;s own past credit risk, capturing persistence), the &lt;strong>bank-specific covariates&lt;/strong> $x_{it} \beta$ (financial ratios like capital adequacy, profitability, and liquidity), the &lt;strong>individual fixed effect&lt;/strong> $\alpha_i$ (time-invariant bank characteristics), and the &lt;strong>interactive fixed effect&lt;/strong> $\lambda_i' f_t$ (unobserved common factors with heterogeneous loadings).&lt;/p>
&lt;h3 id="variable-mapping">Variable mapping&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Symbol&lt;/th>
&lt;th>Meaning&lt;/th>
&lt;th>Stata variable&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>$NPL_{it}$&lt;/td>
&lt;td>Non-performing loans / total loans (%)&lt;/td>
&lt;td>&lt;code>NPL&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\psi$&lt;/td>
&lt;td>Spatial autoregressive parameter&lt;/td>
&lt;td>&lt;code>[W]NPL&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\rho$&lt;/td>
&lt;td>Temporal autoregressive parameter&lt;/td>
&lt;td>&lt;code>L1.NPL&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$x_{it}$&lt;/td>
&lt;td>Bank-specific covariates&lt;/td>
&lt;td>&lt;code>INEFF&lt;/code>, &lt;code>CAR&lt;/code>, &lt;code>SIZE&lt;/code>, &amp;hellip;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\alpha_i$&lt;/td>
&lt;td>Bank fixed effect (absorbed)&lt;/td>
&lt;td>&lt;code>absorb(ID)&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\lambda_i' f_t$&lt;/td>
&lt;td>Interactive fixed effect (defactored)&lt;/td>
&lt;td>estimated by &lt;code>spxtivdfreg&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$w_{ij}$&lt;/td>
&lt;td>Spatial weight (interconnection)&lt;/td>
&lt;td>&lt;code>W.csv&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="comparison-with-existing-stata-packages">Comparison with existing Stata packages&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Feature&lt;/th>
&lt;th>&lt;code>spxtivdfreg&lt;/code>&lt;/th>
&lt;th>&lt;code>xsmle&lt;/code>&lt;/th>
&lt;th>&lt;code>spxtregress&lt;/code>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Estimation method&lt;/td>
&lt;td>IV/GMM (defactored)&lt;/td>
&lt;td>Maximum likelihood&lt;/td>
&lt;td>Quasi-ML&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Common factors&lt;/td>
&lt;td>Yes (estimated)&lt;/td>
&lt;td>No&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Endogenous regressors&lt;/td>
&lt;td>Yes (IV)&lt;/td>
&lt;td>No&lt;/td>
&lt;td>Limited&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Dynamic (temporal lag)&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>Yes (&lt;code>dlag&lt;/code>)&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Bias correction needed&lt;/td>
&lt;td>No&lt;/td>
&lt;td>Yes (Lee-Yu)&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Heterogeneous slopes (MG)&lt;/td>
&lt;td>Yes (&lt;code>mg&lt;/code> option)&lt;/td>
&lt;td>No&lt;/td>
&lt;td>No&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The key advantage of &lt;code>spxtivdfreg&lt;/code> is its ability to handle unobserved common factors &amp;mdash; latent macroeconomic shocks that affect all banks but with heterogeneous intensity. Maximum likelihood methods in &lt;code>xsmle&lt;/code> assume cross-sectional independence conditional on the spatial weight matrix, which is violated when common factors are present. The defactored IV approach removes these factors before estimation, producing consistent estimates even in the presence of strong cross-sectional dependence.&lt;/p>
&lt;hr>
&lt;h2 id="3-setup-and-data-loading">3. Setup and data loading&lt;/h2>
&lt;p>Before running any spatial dynamic panel models, we need three Stata packages: &lt;code>xtivdfreg&lt;/code> (the core estimation engine), &lt;code>reghdfe&lt;/code> (for absorbing fixed effects), and &lt;code>ftools&lt;/code> (a dependency of &lt;code>reghdfe&lt;/code>). The &lt;code>spxtivdfreg&lt;/code> command is the spatial panel wrapper around &lt;code>xtivdfreg&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Install packages (if not already installed)
capture which xtivdfreg
if _rc {
ssc install xtivdfreg
}
capture which reghdfe
if _rc {
ssc install reghdfe
}
capture which ftools
if _rc {
ssc install ftools
}
&lt;/code>&lt;/pre>
&lt;h3 id="31-data-loading-and-panel-setup">3.1 Data loading and panel setup&lt;/h3>
&lt;p>The dataset contains quarterly financial ratios for 350 US commercial banks from 2006:Q1 to 2014:Q4, yielding 36 quarters and 12,600 total observations. After absorbing fixed effects and creating lags, the effective estimation sample is 12,250 observations (350 banks times 35 periods).&lt;/p>
&lt;pre>&lt;code class="language-stata">clear all
use &amp;quot;https://github.com/cmg777/starter-academic-v501/raw/master/content/post/stata_spxtivdfreg/references/v113i06.dta&amp;quot;, clear
xtset ID TIME
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Panel variable: ID (strongly balanced)
Time variable: TIME, 1 to 36
Delta: 1 unit
&lt;/code>&lt;/pre>
&lt;p>The panel is strongly balanced &amp;mdash; all 350 banks are observed in all 36 quarters. The &lt;code>xtset&lt;/code> command declares &lt;code>ID&lt;/code> as the bank identifier and &lt;code>TIME&lt;/code> as the quarterly time index.&lt;/p>
&lt;p>The sample period is rich with major macro-financial events that all banks experienced &amp;mdash; precisely the kind of aggregate shocks that common factors are designed to capture:&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
A[&amp;quot;&amp;lt;b&amp;gt;2006--2007&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Pre-crisis&amp;lt;br/&amp;gt;Housing bubble&amp;lt;br/&amp;gt;Low NPL ratios&amp;quot;]
B[&amp;quot;&amp;lt;b&amp;gt;2007--2009&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Global Financial&amp;lt;br/&amp;gt;Crisis&amp;lt;br/&amp;gt;NPL surge&amp;quot;]
C[&amp;quot;&amp;lt;b&amp;gt;2010--2011&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Dodd-Frank Act&amp;lt;br/&amp;gt;Stress tests&amp;lt;br/&amp;gt;Capital rebuilding&amp;quot;]
D[&amp;quot;&amp;lt;b&amp;gt;2012--2014&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Recovery&amp;lt;br/&amp;gt;Basel III phase-in&amp;lt;br/&amp;gt;NPL normalization&amp;quot;]
A --&amp;gt; B
B --&amp;gt; C
C --&amp;gt; D
style A fill:#6a9bcc,stroke:#141413,color:#fff
style B fill:#d97757,stroke:#141413,color:#fff
style C fill:#141413,stroke:#d97757,color:#fff
style D fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
&lt;p>These regime shifts (housing bubble, financial crisis, regulatory tightening, recovery) are exactly the unobserved common factors that the &lt;code>spxtivdfreg&lt;/code> estimator extracts. Standard two-way fixed effects would capture them only if they affected all 350 banks equally &amp;mdash; but the interactive fixed effect structure $\lambda_i' f_t$ allows each bank to respond with different intensity to the same aggregate shock.&lt;/p>
&lt;h3 id="32-summary-statistics">3.2 Summary statistics&lt;/h3>
&lt;pre>&lt;code class="language-stata">summarize NPL INEFF CAR SIZE BUFFER PROFIT QUALITY LIQUIDITY INTEREST
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
NPL | 12,600 1.7283 2.1067 0 23.0378
INEFF | 12,600 .6425 .1726 .2007 2.9037
CAR | 12,600 13.5550 5.6198 1.3800 86.8400
SIZE | 12,600 14.6883 1.4234 11.9466 20.4618
BUFFER | 12,600 5.5550 5.2691 -6.6200 78.8400
PROFIT | 12,600 .8001 5.0380 -132.0700 40.9900
QUALITY | 12,600 .2827 .6245 -4.9482 27.8659
LIQUIDITY | 12,600 .7699 .2224 .0122 2.3217
INTEREST | 12,600 -1.9074 .9328 -5.1644 2.5187
&lt;/code>&lt;/pre>
&lt;p>Mean NPL is 1.73%, reflecting the mixture of pre-crisis, crisis, and post-crisis quarters in the sample. The standard deviation of 2.11 percentage points indicates substantial variation both across banks and over time &amp;mdash; some banks had NPL ratios as high as 23%. Mean LIQUIDITY (loan-to-deposit ratio) is 0.77, meaning the average bank lent out 77 cents for every dollar of deposits. The wide range of CAR (1.38% to 86.84%) reflects the heterogeneity in capital structures across US commercial banks.&lt;/p>
&lt;h3 id="33-variables">3.3 Variables&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Mean&lt;/th>
&lt;th>Std. Dev.&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>NPL&lt;/code>&lt;/td>
&lt;td>Non-performing loans / total loans (%)&lt;/td>
&lt;td>1.728&lt;/td>
&lt;td>2.107&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>INEFF&lt;/code>&lt;/td>
&lt;td>Operational inefficiency (endogenous)&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>CAR&lt;/code>&lt;/td>
&lt;td>Capital adequacy ratio&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>SIZE&lt;/code>&lt;/td>
&lt;td>ln(total assets)&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>BUFFER&lt;/code>&lt;/td>
&lt;td>Capital buffer (leverage ratio minus 8%)&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>PROFIT&lt;/code>&lt;/td>
&lt;td>Return on equity, annualized&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>QUALITY&lt;/code>&lt;/td>
&lt;td>Loan loss provisions / assets (%)&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>LIQUIDITY&lt;/code>&lt;/td>
&lt;td>Loan-to-deposit ratio&lt;/td>
&lt;td>0.770&lt;/td>
&lt;td>0.222&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>INTEREST&lt;/code>&lt;/td>
&lt;td>Interest expenses / deposits (instrument for INEFF)&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The dependent variable &lt;code>NPL&lt;/code> measures credit risk as the share of non-performing loans in total loans, expressed in percentage points; its mean and dispersion were discussed in Section 3.2. The variable &lt;code>INEFF&lt;/code> (operational inefficiency) is treated as &lt;strong>endogenous&lt;/strong> and instrumented using &lt;code>INTEREST&lt;/code> (interest expenses relative to deposits) along with lagged values of the exogenous regressors.&lt;/p>
&lt;h3 id="33-the-spatial-weight-matrix">3.3 The spatial weight matrix&lt;/h3>
&lt;p>The spatial weight matrix $W$ is a 350-by-350 matrix that defines the network structure among banks. Unlike geographic contiguity matrices used in regional analysis, this matrix is constructed from &lt;strong>economic distance&lt;/strong> &amp;mdash; specifically, Spearman&amp;rsquo;s rank correlation of bank debt-to-asset ratios. Two banks are defined as &amp;ldquo;neighbors&amp;rdquo; if their debt ratio correlation exceeds the 95th percentile of the empirical distribution.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Download the W matrix to the current working directory
copy &amp;quot;https://github.com/cmg777/starter-academic-v501/raw/master/content/post/stata_spxtivdfreg/references/W.csv&amp;quot; &amp;quot;W.csv&amp;quot;, replace
* The W matrix (350 x 350, row-standardized, 6,300 nonzero entries) is loaded
* automatically by spxtivdfreg via the spmatrix(&amp;quot;W.csv&amp;quot;, import) option
&lt;/code>&lt;/pre>
&lt;p>The matrix is row-standardized so that each row sums to one, meaning the spatial lag of a variable equals the &lt;strong>weighted average&lt;/strong> among a bank&amp;rsquo;s neighbors. With 6,300 nonzero entries across 350 banks, the average bank has approximately 18 neighbors &amp;mdash; banks whose debt structures are sufficiently correlated to suggest economic interdependence. To illustrate: suppose Bank A and Bank B have a Spearman rank correlation of 0.92 in their quarterly debt ratios, while the 95th percentile threshold is 0.87. Since 0.92 exceeds 0.87, Bank A and Bank B are classified as neighbors ($w_{AB} &amp;gt; 0$). After row-standardization, $w_{AB}$ equals $1/18$ if Bank A has 18 neighbors. This economic-distance approach captures financial contagion channels that geographic proximity alone would miss, since two banks on opposite coasts can be highly interconnected through similar lending portfolios.&lt;/p>
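&lt;p>To make the construction concrete, the sketch below builds an economic-distance weight matrix from synthetic debt-ratio series using the same recipe: Spearman rank correlations, a 95th-percentile threshold, and row-standardization. The data, the 20-bank dimension, and all variable names are illustrative stand-ins, not the actual 350-bank panel.&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic quarterly debt-to-asset ratios for 20 banks over 35 quarters
# (illustrative stand-in for the 350-bank panel)
n_banks, n_quarters = 20, 35
debt_ratios = rng.random((n_banks, n_quarters))

# Spearman correlation = Pearson correlation of within-bank ranks
ranks = np.argsort(np.argsort(debt_ratios, axis=1), axis=1).astype(float)
rho = np.corrcoef(ranks)  # 20 x 20 rank-correlation matrix

# Two banks are neighbors if their correlation exceeds the 95th percentile
# of the off-diagonal empirical distribution
off_diag = rho[~np.eye(n_banks, dtype=bool)]
threshold = np.percentile(off_diag, 95)
W = ((rho > threshold) & ~np.eye(n_banks, dtype=bool)).astype(float)

# Row-standardize so each row sums to one (isolated banks keep a zero row)
row_sums = W.sum(axis=1, keepdims=True)
W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

print(W.shape, W.sum(axis=1).max())
```

With this normalization, the spatial lag $W y$ is the simple average of each bank's neighbors, exactly as in the tutorial's 350-by-350 matrix.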
&lt;hr>
&lt;h2 id="4-full-model-with-common-factors">4. Full model with common factors&lt;/h2>
&lt;p>We now estimate the full spatial dynamic panel model with unobserved common factors. The &lt;code>spxtivdfreg&lt;/code> command takes the dependent variable (&lt;code>NPL&lt;/code>) and the regressors, with options specifying the model structure: &lt;code>absorb(ID)&lt;/code> absorbs bank fixed effects, &lt;code>splag&lt;/code> includes the spatial lag of NPL, &lt;code>tlags(1)&lt;/code> adds the first temporal lag, &lt;code>spmatrix(&amp;quot;W.csv&amp;quot;, import)&lt;/code> loads the weight matrix, and &lt;code>iv(...)&lt;/code> specifies the instrumental variables. The &lt;code>std&lt;/code> option standardizes the variables before extracting principal components for the factor estimation, which improves numerical stability when covariates have very different scales.&lt;/p>
&lt;pre>&lt;code class="language-stata">spxtivdfreg NPL INEFF CAR SIZE BUFFER PROFIT QUALITY LIQUIDITY, ///
absorb(ID) splag tlags(1) spmatrix(&amp;quot;W.csv&amp;quot;, import) ///
iv(INTEREST CAR SIZE BUFFER PROFIT QUALITY LIQUIDITY, splags lag(1)) std
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Defactored instrumental variables estimation
Group variable: ID Number of obs = 12,250
Time variable: TIME Number of groups = 350
Number of instruments = 28 Obs per group:
Number of factors in X = 2 min = 35
Number of factors in u = 1 avg = 35.0
max = 35
Second-stage estimator (model with homogeneous slope coefficients)
--------------------------------------------------------------------------
Robust
NPL | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
------+-------------------------------------------------------------------
NPL |
L1. | .2898521 .0543794 5.33 0.000 .1832704 .3964339
|
INEFF | .4473777 .1045636 4.28 0.000 .2424368 .6523186
CAR | .0305078 .0057852 5.27 0.000 .019169 .0418465
SIZE | .2225966 .0941614 2.36 0.018 .0380436 .4071496
BUFFER| -.0545049 .0118678 -4.59 0.000 -.0777653 -.0312445
PROFIT| -.0053351 .0018411 -2.90 0.004 -.0089437 -.0017266
QUALITY| .1830412 .0307657 5.95 0.000 .1227415 .2433408
LIQUIDITY| 2.452391 .2696471 9.09 0.000 1.923892 2.980889
_cons | -4.510715 1.311453 -3.44 0.001 -7.081115 -1.940315
------+-------------------------------------------------------------------
W |
NPL | .3943206 .0848856 4.65 0.000 .2279479 .5606932
------+-------------------------------------------------------------------
sigma_f | .64162366 (std. dev. of factor error component)
sigma_e | .90381799 (std. dev. of idiosyncratic error component)
rho | .33509009 (fraction of variance due to factors)
--------------------------------------------------------------------------
Hansen test: chi2(19) = 18.8250, Prob &amp;gt; chi2 = 0.4681
&lt;/code>&lt;/pre>
&lt;p>The estimator identifies &lt;strong>2 common factors in the regressors&lt;/strong> and &lt;strong>1 common factor in the error term&lt;/strong>, capturing latent macroeconomic forces that drive credit risk across the banking system. These factors represent unobserved aggregate shocks &amp;mdash; such as Federal Reserve interest rate decisions, housing market fluctuations, and changes in regulatory stringency &amp;mdash; that affect all banks simultaneously but with bank-specific intensities (heterogeneous factor loadings $\lambda_i$).&lt;/p>
&lt;p>The &lt;strong>spatial autoregressive parameter&lt;/strong> $\psi = 0.394$ (z = 4.65, p &amp;lt; 0.001) indicates strong positive spatial spillovers: when the average NPL ratio of a bank&amp;rsquo;s neighbors increases by 1 percentage point, the bank&amp;rsquo;s own NPL ratio increases by 0.39 percentage points, holding all else constant. This captures financial contagion through interconnected lending networks &amp;mdash; when one bank&amp;rsquo;s borrowers default, it can trigger a cascade of defaults among economically linked banks.&lt;/p>
&lt;p>The &lt;strong>temporal persistence parameter&lt;/strong> $\rho = 0.290$ (z = 5.33, p &amp;lt; 0.001) shows that credit risk is moderately persistent: about 29% of a bank&amp;rsquo;s current NPL ratio is inherited from the previous quarter. This reflects the gradual resolution of non-performing loans through workout processes, foreclosures, and write-offs.&lt;/p>
&lt;p>Among the covariates, &lt;strong>LIQUIDITY&lt;/strong> has the largest effect at 2.452 (z = 9.09, p &amp;lt; 0.001), meaning that a 1 percentage point increase in the loan-to-deposit ratio is associated with a 2.45 percentage point increase in non-performing loans. Banks that extend more credit relative to their deposit base face higher credit risk. &lt;strong>INEFF&lt;/strong> (operational inefficiency) enters with a coefficient of 0.447 (z = 4.28, p &amp;lt; 0.001), confirming that poorly managed banks experience higher default rates &amp;mdash; a finding consistent with the &amp;ldquo;bad management&amp;rdquo; hypothesis in the banking literature. &lt;strong>BUFFER&lt;/strong> enters negatively at -0.055 (z = -4.59, p &amp;lt; 0.001), indicating that better-capitalized banks (those with larger capital buffers above the 8% regulatory minimum) have lower credit risk.&lt;/p>
&lt;p>The &lt;strong>variance decomposition&lt;/strong> at the bottom of the output reveals that common factors explain a substantial share of the error variance: $\sigma_f = 0.642$ and $\sigma_e = 0.904$, yielding $\rho_{factor} = 0.335$. This means that &lt;strong>33.5% of the residual variance&lt;/strong> is attributable to unobserved common factors &amp;mdash; macroeconomic shocks that a model without factors would absorb into biased coefficient estimates.&lt;/p>
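&lt;p>As a quick arithmetic check, the reported variance share can be reproduced directly from the two standard deviations in the output (a back-of-the-envelope sketch, not part of the &lt;code>spxtivdfreg&lt;/code> code):&lt;/p>

```python
# Reproduce rho = sigma_f^2 / (sigma_f^2 + sigma_e^2) from the output above
sigma_f = 0.64162366   # std. dev. of factor error component
sigma_e = 0.90381799   # std. dev. of idiosyncratic error component

rho_factor = sigma_f**2 / (sigma_f**2 + sigma_e**2)
print(round(rho_factor, 5))  # ≈ 0.33509, matching the reported rho
```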
&lt;p>The &lt;strong>Hansen J-test&lt;/strong> for overidentifying restrictions yields chi2(19) = 18.825 with p = 0.468, which &lt;strong>does not reject&lt;/strong> the null hypothesis that the instruments are valid. This provides confidence that the IV strategy &amp;mdash; using &lt;code>INTEREST&lt;/code> and lagged values of exogenous regressors as instruments &amp;mdash; is appropriate.&lt;/p>
&lt;hr>
&lt;h2 id="5-what-happens-without-common-factors">5. What happens without common factors?&lt;/h2>
&lt;p>To assess the consequences of ignoring latent macroeconomic shocks, we re-estimate the model with the &lt;code>factmax(0)&lt;/code> option, which forces the estimator to set the number of common factors to zero. This specification is equivalent to a standard spatial dynamic panel model without interactive fixed effects.&lt;/p>
&lt;pre>&lt;code class="language-stata">spxtivdfreg NPL INEFF CAR SIZE BUFFER PROFIT QUALITY LIQUIDITY, ///
absorb(ID) splag tlags(1) spmatrix(&amp;quot;W.csv&amp;quot;, import) ///
iv(INTEREST CAR SIZE BUFFER PROFIT QUALITY LIQUIDITY, splags lag(1)) std factmax(0)
&lt;/code>&lt;/pre>
&lt;p>The table below compares the coefficient estimates from the full model (with factors) and the restricted model (without factors).&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th style="text-align:center">With factors&lt;/th>
&lt;th style="text-align:center">Without factors&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>$\psi$ (W*NPL)&lt;/td>
&lt;td style="text-align:center">0.394*** (0.085)&lt;/td>
&lt;td style="text-align:center">0.288*** (0.038)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\rho$ (L1.NPL)&lt;/td>
&lt;td style="text-align:center">0.290*** (0.054)&lt;/td>
&lt;td style="text-align:center">0.594*** (0.034)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>INEFF&lt;/td>
&lt;td style="text-align:center">0.447*** (0.105)&lt;/td>
&lt;td style="text-align:center">0.366*** (0.107)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>CAR&lt;/td>
&lt;td style="text-align:center">0.031*** (0.006)&lt;/td>
&lt;td style="text-align:center">0.017*** (0.004)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>SIZE&lt;/td>
&lt;td style="text-align:center">0.223** (0.094)&lt;/td>
&lt;td style="text-align:center">0.089 (0.061)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>BUFFER&lt;/td>
&lt;td style="text-align:center">-0.055*** (0.012)&lt;/td>
&lt;td style="text-align:center">-0.025** (0.010)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>PROFIT&lt;/td>
&lt;td style="text-align:center">-0.005*** (0.002)&lt;/td>
&lt;td style="text-align:center">-0.006*** (0.002)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>QUALITY&lt;/td>
&lt;td style="text-align:center">0.183*** (0.031)&lt;/td>
&lt;td style="text-align:center">0.283*** (0.029)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>LIQUIDITY&lt;/td>
&lt;td style="text-align:center">2.452*** (0.270)&lt;/td>
&lt;td style="text-align:center">0.843*** (0.180)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Factors ($r_x$, $r_u$)&lt;/td>
&lt;td style="text-align:center">2, 1&lt;/td>
&lt;td style="text-align:center">0, 0&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>J-test&lt;/td>
&lt;td style="text-align:center">18.825 [0.468]&lt;/td>
&lt;td style="text-align:center">48.151 [0.000]&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The differences are striking and systematic. Without common factors, the &lt;strong>temporal persistence doubles&lt;/strong> from $\rho = 0.290$ to $\rho = 0.594$. This inflation occurs because unobserved common factors are serially correlated (macroeconomic conditions evolve gradually), and when they are excluded from the model, the temporal lag absorbs their persistence. In other words, the model without factors confuses macroeconomic persistence with bank-level credit risk persistence.&lt;/p>
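&lt;p>The mechanism can be illustrated with a small Monte Carlo sketch (all parameters are invented for illustration, not taken from the NPL data): a panel with true persistence 0.3 and a serially correlated common factor, estimated by pooled OLS that ignores the factor, yields a substantially inflated autoregressive coefficient.&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, rho_true, phi = 200, 60, 0.3, 0.8  # units, periods, true AR, factor AR

# Serially correlated common factor and heterogeneous loadings
f = np.zeros(T)
for t in range(1, T):
    f[t] = phi * f[t - 1] + rng.standard_normal()
lam = rng.uniform(0.5, 1.5, size=N)

# y_it = rho * y_i,t-1 + lambda_i * f_t + e_it
y = np.zeros((N, T))
for t in range(1, T):
    y[:, t] = rho_true * y[:, t - 1] + lam * f[t] + rng.standard_normal(N)

# Pooled OLS of y_t on y_t-1, omitting the factor: because y_t-1 also
# contains lambda_i * f_t-1 and f is persistent, the lag coefficient
# absorbs the factor's serial correlation and is biased upward
y_lag = y[:, :-1].ravel()
y_cur = y[:, 1:].ravel()
rho_hat = (y_lag @ y_cur) / (y_lag @ y_lag)
print(round(rho_hat, 2))  # well above the true value of 0.3
```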
&lt;p>The &lt;strong>spatial autoregressive parameter drops&lt;/strong> from $\psi = 0.394$ to $\psi = 0.288$ &amp;mdash; a 27% decrease. This is counterintuitive at first glance: one might expect omitting factors to inflate the spatial parameter (since common factors create cross-sectional dependence that could be mistaken for spatial spillovers). However, the inflated temporal lag in the no-factor model absorbs some of the spatial dynamics, compressing $\psi$ downward. The lesson is that omitting common factors distorts &lt;strong>all&lt;/strong> coefficient estimates in complex and non-obvious ways.&lt;/p>
&lt;p>The &lt;strong>LIQUIDITY coefficient collapses&lt;/strong> from 2.452 to 0.843 &amp;mdash; a 66% reduction. This suggests that much of the effect of liquidity on credit risk operates through common factors: during the GFC, aggregate liquidity conditions deteriorated system-wide, and banks with high loan-to-deposit ratios were disproportionately affected. Without factors to absorb these aggregate movements, the LIQUIDITY coefficient is biased downward.&lt;/p>
&lt;p>Most critically, the &lt;strong>Hansen J-test rejects&lt;/strong> in the no-factor model: chi2 = 48.151 with p &amp;lt; 0.001. This rejection means that the instruments are not valid under the no-factor specification &amp;mdash; the model is misspecified. The common factors that enter both the regressors and the error term invalidate the exclusion restriction when they are not accounted for. This provides a formal statistical justification for including common factors: the J-test passes (p = 0.468) with factors and fails (p &amp;lt; 0.001) without them.&lt;/p>
&lt;p>&lt;strong>SIZE&lt;/strong> becomes statistically insignificant without factors (coefficient = 0.089, standard error = 0.061), whereas it is significant at the 5% level in the full model (0.223, standard error = 0.094). This reversal illustrates how omitting common factors can mask genuine relationships: larger banks are more exposed to systematic macro shocks (they have larger factor loadings), and without factors in the model, this exposure is incorrectly attributed to noise rather than to bank size.&lt;/p>
&lt;hr>
&lt;h2 id="6-what-happens-without-the-spatial-lag">6. What happens without the spatial lag?&lt;/h2>
&lt;p>To isolate the contribution of spatial spillovers, we now estimate a model that includes common factors but removes the spatially lagged dependent variable. This is done by dropping the &lt;code>splag&lt;/code> option. Without the spatial lag, the model reduces to a dynamic panel with common factors &amp;mdash; equivalent to the &lt;code>xtivdfreg&lt;/code> command.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Without spatial lag (spxtivdfreg without splag option)
spxtivdfreg NPL INEFF CAR SIZE BUFFER PROFIT QUALITY LIQUIDITY, ///
absorb(ID) tlags(1) spmatrix(&amp;quot;W.csv&amp;quot;, import) ///
iv(INTEREST CAR SIZE BUFFER PROFIT QUALITY LIQUIDITY, lag(1)) std
* Equivalent specification with xtivdfreg
xtivdfreg NPL L.NPL INEFF CAR SIZE BUFFER PROFIT QUALITY LIQUIDITY, ///
absorb(ID) ///
iv(INTEREST CAR SIZE BUFFER PROFIT QUALITY LIQUIDITY, lag(1)) std
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th style="text-align:center">Full model&lt;/th>
&lt;th style="text-align:center">Without spatial lag&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>$\psi$ (W*NPL)&lt;/td>
&lt;td style="text-align:center">0.394*** (0.085)&lt;/td>
&lt;td style="text-align:center">&amp;mdash;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\rho$ (L1.NPL)&lt;/td>
&lt;td style="text-align:center">0.290*** (0.054)&lt;/td>
&lt;td style="text-align:center">0.323*** (0.055)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>INEFF&lt;/td>
&lt;td style="text-align:center">0.447*** (0.105)&lt;/td>
&lt;td style="text-align:center">0.638*** (0.116)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>CAR&lt;/td>
&lt;td style="text-align:center">0.031*** (0.006)&lt;/td>
&lt;td style="text-align:center">0.030*** (0.006)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>SIZE&lt;/td>
&lt;td style="text-align:center">0.223** (0.094)&lt;/td>
&lt;td style="text-align:center">0.346*** (0.096)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>BUFFER&lt;/td>
&lt;td style="text-align:center">-0.055*** (0.012)&lt;/td>
&lt;td style="text-align:center">-0.045*** (0.016)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>PROFIT&lt;/td>
&lt;td style="text-align:center">-0.005*** (0.002)&lt;/td>
&lt;td style="text-align:center">-0.004** (0.002)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>QUALITY&lt;/td>
&lt;td style="text-align:center">0.183*** (0.031)&lt;/td>
&lt;td style="text-align:center">0.183*** (0.036)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>LIQUIDITY&lt;/td>
&lt;td style="text-align:center">2.452*** (0.270)&lt;/td>
&lt;td style="text-align:center">2.534*** (0.311)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Factors ($r_x$, $r_u$)&lt;/td>
&lt;td style="text-align:center">2, 1&lt;/td>
&lt;td style="text-align:center">2, 1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>J-test&lt;/td>
&lt;td style="text-align:center">18.825 [0.468]&lt;/td>
&lt;td style="text-align:center">8.174 [0.226]&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>When the spatial lag is removed, the &lt;strong>temporal persistence increases&lt;/strong> from $\rho = 0.290$ to $\rho = 0.323$ &amp;mdash; the temporal lag partially absorbs the missing spatial dynamics. The &lt;strong>INEFF coefficient inflates&lt;/strong> from 0.447 to 0.638 (a 43% increase), and &lt;strong>SIZE&lt;/strong> rises from 0.223 to 0.346 (a 55% increase). Without the spatial lag to capture bank interdependence, these covariates must do more work to explain the cross-sectional variation in credit risk, leading to upward bias.&lt;/p>
&lt;p>Importantly, both specifications pass the J-test (p = 0.468 and p = 0.226, respectively), meaning that both models have valid instruments. The choice between them must therefore be based on economic reasoning rather than diagnostic tests alone. The full model with the spatial lag is preferred because financial theory predicts bank interdependence, and the spatial autoregressive parameter $\psi = 0.394$ is highly significant (z = 4.65, p &amp;lt; 0.001).&lt;/p>
&lt;hr>
&lt;h2 id="7-short-run-and-long-run-effects">7. Short-run and long-run effects&lt;/h2>
&lt;p>In spatial dynamic panel models, the coefficient on a variable does not directly measure its total effect on the dependent variable. Because of the spatial lag ($\psi W \cdot NPL$) and the temporal lag ($\rho \, NPL_{i,t-1}$), a shock to any covariate propagates through the system both across banks (through the spatial multiplier) and over time (through dynamic accumulation). The &lt;code>estat impact&lt;/code> command decomposes these effects into &lt;strong>direct effects&lt;/strong> (the impact of a bank&amp;rsquo;s own covariate on its own NPL), &lt;strong>indirect effects&lt;/strong> (the impact transmitted through the network of interconnected banks), and &lt;strong>total effects&lt;/strong> (direct plus indirect).&lt;/p>
&lt;p>The long-run effects account for the full dynamic accumulation of a permanent change in a covariate. Setting $NPL_t = NPL_{t-1}$ in the model and solving for the steady state gives $NPL = ((1-\rho)I - \psi W)^{-1}\beta X$; because $W$ is row-standardized, the total (direct plus indirect) long-run effect of a covariate reduces to:&lt;/p>
&lt;p>$$\text{Total LR effect} = \frac{\beta}{1 - \rho - \psi}$$&lt;/p>
&lt;p>In words, this equation says that a permanent 1-unit increase in a covariate has a total long-run effect equal to its short-run coefficient $\beta$ amplified by the combined temporal-spatial multiplier $1/(1-\rho-\psi)$: the effect compounds over time as it feeds back through lagged NPL, and it spreads across the bank network through the spatial lag. For LIQUIDITY, $2.452/(1 - 0.290 - 0.394) \approx 7.76$, which matches the total long-run effect of 7.765 reported in Section 7.2. The diagram below illustrates this decomposition.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
B[&amp;quot;&amp;lt;b&amp;gt;Short-run&amp;lt;br/&amp;gt;coefficient&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;β = 2.452&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;(LIQUIDITY)&amp;lt;/i&amp;gt;&amp;quot;]
T[&amp;quot;&amp;lt;b&amp;gt;Temporal&amp;lt;br/&amp;gt;multiplier&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;1/(1−ρ)&amp;lt;br/&amp;gt;= 1/(1−0.290)&amp;lt;br/&amp;gt;= 1.408&amp;quot;]
D[&amp;quot;&amp;lt;b&amp;gt;Direct&amp;lt;br/&amp;gt;effect&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;3.547&amp;quot;]
S[&amp;quot;&amp;lt;b&amp;gt;Spatial&amp;lt;br/&amp;gt;multiplier&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;1/(1−ψ)&amp;lt;br/&amp;gt;= 1/(1−0.394)&amp;lt;br/&amp;gt;= 1.650&amp;quot;]
I[&amp;quot;&amp;lt;b&amp;gt;Indirect&amp;lt;br/&amp;gt;effect&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;4.218&amp;quot;]
Tot[&amp;quot;&amp;lt;b&amp;gt;Total&amp;lt;br/&amp;gt;effect&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;7.765&amp;quot;]
B --&amp;gt;|&amp;quot;× temporal&amp;quot;| T
T --&amp;gt;|&amp;quot;= direct&amp;quot;| D
D --&amp;gt;|&amp;quot;× spatial&amp;quot;| S
S --&amp;gt;|&amp;quot;= indirect&amp;quot;| I
D --&amp;gt; Tot
I --&amp;gt; Tot
style B fill:#6a9bcc,stroke:#141413,color:#fff
style T fill:#d97757,stroke:#141413,color:#fff
style D fill:#00d4c8,stroke:#141413,color:#141413
style S fill:#d97757,stroke:#141413,color:#fff
style I fill:#141413,stroke:#d97757,color:#fff
style Tot fill:#6a9bcc,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
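&lt;p>As a numeric check, solving the model's steady state with a row-standardized $W$ implies a total long-run multiplier of $1/(1-\rho-\psi)$. The sketch below plugs in the point estimates from Section 4 and reproduces the &lt;code>estat impact&lt;/code> totals (a back-of-the-envelope calculation, not a replication of the delta-method standard errors).&lt;/p>

```python
# Point estimates from the full model with common factors (Section 4)
rho = 0.2898521    # temporal lag, L1.NPL
psi = 0.3943206    # spatial lag, W*NPL
beta = 2.452391    # LIQUIDITY coefficient

# Steady state NPL = ((1-rho)I - psi*W)^(-1) beta X implies, for a
# row-standardized W, a total long-run effect of beta / (1 - rho - psi)
total_lr = beta / (1 - rho - psi)
print(round(total_lr, 3))  # ≈ 7.765, the reported total LR effect

# Short-run total: beta scaled by the spatial multiplier 1/(1 - psi)
total_sr = beta / (1 - psi)
print(round(total_sr, 3))  # ≈ 4.05, close to the reported 4.090
```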
&lt;pre>&lt;code class="language-stata">* Short-run effects (full model with factors)
estat impact, sr
&lt;/code>&lt;/pre>
&lt;h3 id="71-short-run-effects">7.1 Short-run effects&lt;/h3>
&lt;p>The short-run effects capture the immediate one-period impact of a covariate change, including the contemporaneous spatial spillover but not the dynamic accumulation over time.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th style="text-align:center">SR Direct&lt;/th>
&lt;th style="text-align:center">SR Indirect&lt;/th>
&lt;th style="text-align:center">SR Total&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>INEFF&lt;/td>
&lt;td style="text-align:center">0.457&lt;/td>
&lt;td style="text-align:center">0.289&lt;/td>
&lt;td style="text-align:center">0.746&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>CAR&lt;/td>
&lt;td style="text-align:center">0.031&lt;/td>
&lt;td style="text-align:center">0.020&lt;/td>
&lt;td style="text-align:center">0.051&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>SIZE&lt;/td>
&lt;td style="text-align:center">0.227&lt;/td>
&lt;td style="text-align:center">0.144&lt;/td>
&lt;td style="text-align:center">0.371&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>BUFFER&lt;/td>
&lt;td style="text-align:center">-0.056&lt;/td>
&lt;td style="text-align:center">-0.035&lt;/td>
&lt;td style="text-align:center">-0.091&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>PROFIT&lt;/td>
&lt;td style="text-align:center">-0.005&lt;/td>
&lt;td style="text-align:center">-0.003&lt;/td>
&lt;td style="text-align:center">-0.009&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>QUALITY&lt;/td>
&lt;td style="text-align:center">0.187&lt;/td>
&lt;td style="text-align:center">0.118&lt;/td>
&lt;td style="text-align:center">0.305&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>LIQUIDITY&lt;/td>
&lt;td style="text-align:center">2.505&lt;/td>
&lt;td style="text-align:center">1.585&lt;/td>
&lt;td style="text-align:center">4.090&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>In the short run, indirect effects are roughly 63% of direct effects &amp;mdash; the spatial multiplier $(I - \psi W)^{-1}$ amplifies every shock by about 1.63x. For LIQUIDITY, the short-run total is 4.09 &amp;mdash; already substantially larger than the regression coefficient (2.452) due to spatial amplification alone.&lt;/p>
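&lt;p>The spatial amplification noted above can be verified with a small matrix example: for any row-standardized $W$, the rows of $(I - \psi W)^{-1}$ sum to exactly $1/(1-\psi) \approx 1.65$, consistent with the roughly 1.63x ratio of total to direct effects in the table. The 4-bank network below is purely illustrative.&lt;/p>

```python
import numpy as np

psi = 0.3943206  # spatial autoregressive parameter from Section 4

# A small row-standardized weight matrix (4 illustrative banks)
W = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
])

# Spatial multiplier matrix (I - psi W)^(-1)
M = np.linalg.inv(np.eye(4) - psi * W)

# Each row sums to 1/(1 - psi): a shock common to all banks is
# amplified by the same factor regardless of the network's topology
print(M.sum(axis=1))   # all ≈ 1.651
print(1 / (1 - psi))   # ≈ 1.651
```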
&lt;pre>&lt;code class="language-stata">* Long-run effects (full model with factors)
estat impact, lr
&lt;/code>&lt;/pre>
&lt;h3 id="72-long-run-effects-with-common-factors">7.2 Long-run effects with common factors&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th style="text-align:center">Direct&lt;/th>
&lt;th style="text-align:center">Indirect&lt;/th>
&lt;th style="text-align:center">Total&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>INEFF&lt;/td>
&lt;td style="text-align:center">0.647*** (0.159)&lt;/td>
&lt;td style="text-align:center">0.769** (0.335)&lt;/td>
&lt;td style="text-align:center">1.417*** (0.427)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>CAR&lt;/td>
&lt;td style="text-align:center">0.044*** (0.009)&lt;/td>
&lt;td style="text-align:center">0.052** (0.024)&lt;/td>
&lt;td style="text-align:center">0.097*** (0.029)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>SIZE&lt;/td>
&lt;td style="text-align:center">0.322** (0.142)&lt;/td>
&lt;td style="text-align:center">0.383* (0.198)&lt;/td>
&lt;td style="text-align:center">0.705** (0.310)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>BUFFER&lt;/td>
&lt;td style="text-align:center">-0.079*** (0.018)&lt;/td>
&lt;td style="text-align:center">-0.094** (0.043)&lt;/td>
&lt;td style="text-align:center">-0.173*** (0.054)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>PROFIT&lt;/td>
&lt;td style="text-align:center">-0.008*** (0.002)&lt;/td>
&lt;td style="text-align:center">-0.009** (0.005)&lt;/td>
&lt;td style="text-align:center">-0.017*** (0.006)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>QUALITY&lt;/td>
&lt;td style="text-align:center">0.265*** (0.047)&lt;/td>
&lt;td style="text-align:center">0.315** (0.141)&lt;/td>
&lt;td style="text-align:center">0.580*** (0.167)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>LIQUIDITY&lt;/td>
&lt;td style="text-align:center">3.547*** (0.445)&lt;/td>
&lt;td style="text-align:center">4.218** (1.742)&lt;/td>
&lt;td style="text-align:center">7.765*** (1.904)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The long-run effects reveal that &lt;strong>indirect (spillover) effects are comparable to or larger than direct effects&lt;/strong> for every variable. For LIQUIDITY, the direct long-run effect is 3.547 and the indirect effect is 4.218, yielding a total of 7.765 &amp;mdash; meaning that a permanent 1 percentage point increase in the loan-to-deposit ratio across all banks would increase the system-wide NPL ratio by nearly 7.8 percentage points in the long run. The indirect effect exceeds the direct effect because the spatial multiplier amplifies shocks across a network in which the average bank has 18 neighbors.&lt;/p>
&lt;p>For INEFF (operational inefficiency), the total long-run effect is 1.417 &amp;mdash; more than three times the short-run coefficient of 0.447. A permanent deterioration in management quality cascades through the banking network as inefficient banks generate non-performing loans that spread to their interconnected counterparts through shared borrowers and counterparty risk.&lt;/p>
&lt;p>The BUFFER variable has a total long-run effect of -0.173, meaning that a 1 percentage point increase in capital buffers above the 8% regulatory minimum reduces system-wide NPL by 0.173 percentage points in the long run. Both the direct channel (-0.079, well-capitalized banks absorb losses better) and the indirect channel (-0.094, their stability reduces contagion to neighbors) contribute to this protective effect.&lt;/p>
&lt;h3 id="73-long-run-effects-without-common-factors">7.3 Long-run effects without common factors&lt;/h3>
&lt;p>To see how omitting common factors distorts spillover estimates, we compare the long-run effects from the full model (with factors) to those from the &lt;code>factmax(0)&lt;/code> specification.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Long-run effects (model without factors)
spxtivdfreg NPL INEFF CAR SIZE BUFFER PROFIT QUALITY LIQUIDITY, ///
absorb(ID) splag tlags(1) spmatrix(&amp;quot;W.csv&amp;quot;, import) ///
iv(INTEREST CAR SIZE BUFFER PROFIT QUALITY LIQUIDITY, splags lag(1)) std factmax(0)
estat impact, lr
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th style="text-align:center">With factors (Total)&lt;/th>
&lt;th style="text-align:center">Without factors (Total)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>INEFF&lt;/td>
&lt;td style="text-align:center">1.417***&lt;/td>
&lt;td style="text-align:center">3.117**&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>CAR&lt;/td>
&lt;td style="text-align:center">0.097***&lt;/td>
&lt;td style="text-align:center">0.145**&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>SIZE&lt;/td>
&lt;td style="text-align:center">0.705**&lt;/td>
&lt;td style="text-align:center">0.756 (n.s.)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>BUFFER&lt;/td>
&lt;td style="text-align:center">-0.173***&lt;/td>
&lt;td style="text-align:center">-0.212*&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>PROFIT&lt;/td>
&lt;td style="text-align:center">-0.017***&lt;/td>
&lt;td style="text-align:center">-0.053***&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>QUALITY&lt;/td>
&lt;td style="text-align:center">0.580***&lt;/td>
&lt;td style="text-align:center">2.407***&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>LIQUIDITY&lt;/td>
&lt;td style="text-align:center">7.765***&lt;/td>
&lt;td style="text-align:center">7.176**&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The comparison reveals &lt;strong>severe distortion&lt;/strong> in the no-factor model&amp;rsquo;s long-run effects. The total effect of QUALITY more than quadruples from 0.580 to 2.407, and INEFF more than doubles from 1.417 to 3.117. These inflated estimates arise because the no-factor model attributes macroeconomic variation to the covariates: when aggregate loan quality deteriorates during a recession, the no-factor model incorrectly assigns this entire movement to the bank-level QUALITY and INEFF variables rather than recognizing the common factor (the recession itself).&lt;/p>
&lt;p>Conversely, SIZE loses statistical significance in the no-factor model (total effect = 0.756, not significant), even though it is significant in the full model (0.705, p &amp;lt; 0.05). The common factors capture macro-financial conditions that disproportionately affect larger banks, and without these factors, the SIZE effect is masked by omitted variable bias.&lt;/p>
&lt;hr>
&lt;h2 id="8-heterogeneous-slopes-the-mean-group-estimator">8. Heterogeneous slopes: the mean-group estimator&lt;/h2>
&lt;p>The models estimated so far assume that all banks share the same slope coefficients &amp;mdash; that is, the effect of LIQUIDITY on NPL is identical for all 350 banks. This is a strong assumption. Banks differ in their business models, geographic markets, and risk management practices, and these differences may translate into heterogeneous responses to the same financial ratios. The &lt;code>mg&lt;/code> (mean-group) option in &lt;code>spxtivdfreg&lt;/code> relaxes this assumption by estimating bank-specific slopes and reporting their cross-sectional average.&lt;/p>
&lt;pre>&lt;code class="language-stata">spxtivdfreg NPL INEFF CAR SIZE BUFFER PROFIT QUALITY LIQUIDITY, ///
absorb(ID) splag tlags(1) spmatrix(&amp;quot;W.csv&amp;quot;, import) ///
iv(INTEREST CAR SIZE BUFFER PROFIT QUALITY LIQUIDITY, splags lag(1)) std mg
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th style="text-align:center">Homogeneous (pooled)&lt;/th>
&lt;th style="text-align:center">Heterogeneous (MG)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>$\psi$ (W*NPL)&lt;/td>
&lt;td style="text-align:center">0.394*** (0.085)&lt;/td>
&lt;td style="text-align:center">0.032 (0.051)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\rho$ (L1.NPL)&lt;/td>
&lt;td style="text-align:center">0.290*** (0.054)&lt;/td>
&lt;td style="text-align:center">0.301*** (0.015)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>INEFF&lt;/td>
&lt;td style="text-align:center">0.447*** (0.105)&lt;/td>
&lt;td style="text-align:center">0.759*** (0.158)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>CAR&lt;/td>
&lt;td style="text-align:center">0.031*** (0.006)&lt;/td>
&lt;td style="text-align:center">0.218*** (0.026)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>SIZE&lt;/td>
&lt;td style="text-align:center">0.223** (0.094)&lt;/td>
&lt;td style="text-align:center">2.004*** (0.339)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>BUFFER&lt;/td>
&lt;td style="text-align:center">-0.055*** (0.012)&lt;/td>
&lt;td style="text-align:center">-0.376*** (0.042)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>PROFIT&lt;/td>
&lt;td style="text-align:center">-0.005*** (0.002)&lt;/td>
&lt;td style="text-align:center">-0.018*** (0.006)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>QUALITY&lt;/td>
&lt;td style="text-align:center">0.183*** (0.031)&lt;/td>
&lt;td style="text-align:center">0.287** (0.139)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>LIQUIDITY&lt;/td>
&lt;td style="text-align:center">2.452*** (0.270)&lt;/td>
&lt;td style="text-align:center">6.330*** (0.506)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>_cons&lt;/td>
&lt;td style="text-align:center">-4.511*** (1.311)&lt;/td>
&lt;td style="text-align:center">-29.013*** (4.167)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The most striking result is that the &lt;strong>spatial autoregressive parameter becomes insignificant&lt;/strong> under the MG estimator: $\psi = 0.032$ (z = 0.62, p = 0.536). This suggests that the strong spatial spillovers found in the pooled model ($\psi = 0.394$) may partly reflect slope heterogeneity rather than genuine bank-to-bank contagion. When each bank is allowed its own coefficient on LIQUIDITY, SIZE, and other variables, the average spatial lag effect shrinks to near zero. This is a common finding in spatial econometrics: imposing homogeneous slopes in the presence of slope heterogeneity can create spurious spatial dependence.&lt;/p>
&lt;p>The &lt;strong>covariate coefficients increase substantially&lt;/strong> under the MG estimator. SIZE jumps from 0.223 to 2.004 (a nine-fold increase), BUFFER from -0.055 to -0.376 (a seven-fold increase), and CAR from 0.031 to 0.218 (a seven-fold increase). These larger MG coefficients suggest that the pooled model&amp;rsquo;s homogeneity restriction attenuates individual bank-level effects toward zero. The MG standard errors are generally smaller than the pooled standard errors for the temporal lag ($\rho$: 0.015 vs. 0.054) but larger for some covariates, reflecting the averaging of heterogeneous bank-specific estimates.&lt;/p>
&lt;p>The &lt;strong>temporal persistence&lt;/strong> remains stable: $\rho = 0.301$ (MG) versus $\rho = 0.290$ (pooled). This robustness suggests that credit risk persistence is a genuine phenomenon shared across all banks, not an artifact of slope heterogeneity. Whether a bank is large or small, well-managed or poorly managed, about 30% of its current NPL ratio is inherited from the previous quarter.&lt;/p>
&lt;p>The MG estimator is only $\sqrt{N}$-consistent (versus $\sqrt{NT}$-consistent for the pooled estimator), making it inherently less efficient and more susceptible to outliers. With 350 banks and 35 time periods, a handful of banks with extreme coefficient estimates can shift the MG average substantially. To investigate, individual bank-specific estimates can be inspected using the &lt;code>mg(101)&lt;/code> option (which displays estimates for the bank with ID 101) or extracted from the &lt;code>e(b_mg)&lt;/code> and &lt;code>e(se_mg)&lt;/code> matrices for further analysis &amp;mdash; for example, to compute trimmed or median estimates that are robust to outlier influence. However, further exploration of individual heterogeneity is beyond the scope of this tutorial.&lt;/p>
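&lt;p>To illustrate the last point, once the bank-specific estimates are extracted from &lt;code>e(b_mg)&lt;/code>, robust summaries can be computed in any language. The Python sketch below uses &lt;em>hypothetical&lt;/em> numbers (not the tutorial&amp;rsquo;s data) to show how a median or trimmed mean down-weights outlier banks:&lt;/p>
&lt;pre>&lt;code class="language-python"># Hedged sketch with hypothetical bank-specific estimates of the
# spatial-lag parameter; two outlier banks distort the plain average.
import statistics

psi_i = [0.03, 0.05, -0.02, 0.04, 0.01, 0.02, 1.90, 2.10]

mean_all = statistics.fmean(psi_i)                    # pulled up by the outliers
median_all = statistics.median(psi_i)                 # robust to them
trimmed_mean = statistics.fmean(sorted(psi_i)[2:-2])  # 25% trim per side

print(round(mean_all, 3))      # 0.516
print(round(median_all, 3))    # 0.035
print(round(trimmed_mean, 3))  # 0.035
&lt;/code>&lt;/pre>
&lt;p>The median and trimmed mean agree closely with each other and differ sharply from the plain mean, which is the signature of outlier influence.&lt;/p>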
&lt;hr>
&lt;h2 id="9-model-comparison-and-specification-guidance">9. Model comparison and specification guidance&lt;/h2>
&lt;p>The following table summarizes the four model specifications estimated in this tutorial, highlighting the key coefficient estimates and diagnostic tests.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th style="text-align:center">Full model&lt;/th>
&lt;th style="text-align:center">No factors&lt;/th>
&lt;th style="text-align:center">No spatial lag&lt;/th>
&lt;th style="text-align:center">Heterogeneous (MG)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>$\psi$ (spatial)&lt;/td>
&lt;td style="text-align:center">0.394***&lt;/td>
&lt;td style="text-align:center">0.288***&lt;/td>
&lt;td style="text-align:center">&amp;mdash;&lt;/td>
&lt;td style="text-align:center">0.032&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\rho$ (temporal)&lt;/td>
&lt;td style="text-align:center">0.290***&lt;/td>
&lt;td style="text-align:center">0.594***&lt;/td>
&lt;td style="text-align:center">0.323***&lt;/td>
&lt;td style="text-align:center">0.301***&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>LIQUIDITY&lt;/td>
&lt;td style="text-align:center">2.452***&lt;/td>
&lt;td style="text-align:center">0.843***&lt;/td>
&lt;td style="text-align:center">2.534***&lt;/td>
&lt;td style="text-align:center">6.330***&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Factors&lt;/td>
&lt;td style="text-align:center">$r_x$=2, $r_u$=1&lt;/td>
&lt;td style="text-align:center">0, 0&lt;/td>
&lt;td style="text-align:center">$r_x$=2, $r_u$=1&lt;/td>
&lt;td style="text-align:center">$r_x$=2, $r_u$=1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>J-test p-value&lt;/td>
&lt;td style="text-align:center">0.468&lt;/td>
&lt;td style="text-align:center">0.000&lt;/td>
&lt;td style="text-align:center">0.226&lt;/td>
&lt;td style="text-align:center">&amp;mdash;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Slopes&lt;/td>
&lt;td style="text-align:center">Homogeneous&lt;/td>
&lt;td style="text-align:center">Homogeneous&lt;/td>
&lt;td style="text-align:center">Homogeneous&lt;/td>
&lt;td style="text-align:center">Heterogeneous&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The decision diagram below provides a practical guide for choosing among these specifications.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
START[&amp;quot;&amp;lt;b&amp;gt;Start&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Spatial dynamic panel&amp;lt;br/&amp;gt;with suspected factors&amp;quot;]
JTEST[&amp;quot;&amp;lt;b&amp;gt;J-test&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Estimate with factors&amp;lt;br/&amp;gt;and without factors&amp;quot;]
FACTORS[&amp;quot;&amp;lt;b&amp;gt;Include factors&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;J-test fails without&amp;lt;br/&amp;gt;(p &amp;lt; 0.05)&amp;quot;]
NOFACT[&amp;quot;&amp;lt;b&amp;gt;No factors needed&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;J-test passes without&amp;lt;br/&amp;gt;(p ≥ 0.05)&amp;quot;]
SPLAG[&amp;quot;&amp;lt;b&amp;gt;Spatial lag?&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Is ψ significant?&amp;quot;]
FULL[&amp;quot;&amp;lt;b&amp;gt;Full model&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;spxtivdfreg with&amp;lt;br/&amp;gt;splag + factors&amp;quot;]
NOSPL[&amp;quot;&amp;lt;b&amp;gt;xtivdfreg&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Dynamic panel&amp;lt;br/&amp;gt;with factors only&amp;quot;]
MG[&amp;quot;&amp;lt;b&amp;gt;MG estimator&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Test slope&amp;lt;br/&amp;gt;heterogeneity&amp;quot;]
START --&amp;gt; JTEST
JTEST --&amp;gt;|&amp;quot;J rejects without factors&amp;quot;| FACTORS
JTEST --&amp;gt;|&amp;quot;J passes without factors&amp;quot;| NOFACT
FACTORS --&amp;gt; SPLAG
SPLAG --&amp;gt;|&amp;quot;ψ significant&amp;quot;| FULL
SPLAG --&amp;gt;|&amp;quot;ψ not significant&amp;quot;| NOSPL
FULL --&amp;gt; MG
style START fill:#141413,stroke:#d97757,color:#fff
style JTEST fill:#6a9bcc,stroke:#141413,color:#fff
style FACTORS fill:#00d4c8,stroke:#141413,color:#141413
style NOFACT fill:#d97757,stroke:#141413,color:#fff
style SPLAG fill:#6a9bcc,stroke:#141413,color:#fff
style FULL fill:#00d4c8,stroke:#141413,color:#141413
style NOSPL fill:#d97757,stroke:#141413,color:#fff
style MG fill:#6a9bcc,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>The J-test is the first and most important diagnostic: in our application, it unambiguously rejects the no-factor specification (p &amp;lt; 0.001), confirming that common factors must be included. With factors, the spatial lag is highly significant ($\psi = 0.394$, z = 4.65), supporting the full model. The MG estimator provides a robustness check that reveals potential slope heterogeneity, but its insignificant spatial lag should be interpreted cautiously &amp;mdash; it may indicate genuine absence of spillovers, or it may reflect the difficulty of estimating bank-specific spatial parameters with only 35 time periods.&lt;/p>
&lt;hr>
&lt;h2 id="10-discussion">10. Discussion&lt;/h2>
&lt;h3 id="methodological-implications">Methodological implications&lt;/h3>
&lt;p>The &lt;code>spxtivdfreg&lt;/code> package represents a significant advance in the spatial panel toolkit for Stata. By combining defactored IV estimation with spatial lag modeling, it addresses a long-standing limitation of existing packages: the inability to account for unobserved common factors. The results in this tutorial demonstrate that ignoring common factors leads to three specific problems: (1) inflated temporal persistence ($\rho$ doubling from 0.290 to 0.594), (2) distorted covariate effects (LIQUIDITY falling by 66% from 2.452 to 0.843), and (3) invalid instruments (J-test rejecting at p &amp;lt; 0.001). These are not minor specification issues &amp;mdash; they fundamentally change the economic story that emerges from the analysis.&lt;/p>
&lt;p>Readers who have worked through the companion &lt;a href="https://carlos-mendez.org/post/stata_sp_regression_panel/">spatial panel regression tutorial with &lt;code>xsmle&lt;/code>&lt;/a> may wonder: what would happen if we used &lt;code>xsmle&lt;/code> on this banking dataset? Since &lt;code>xsmle&lt;/code> uses maximum likelihood without common factors, its estimates would resemble the &amp;ldquo;Without factors&amp;rdquo; column in Section 5 &amp;mdash; with temporal persistence inflated to $\rho \approx 0.59$, spatial spillovers compressed to $\psi \approx 0.29$, and the LIQUIDITY effect attenuated by two-thirds. The J-test rejection (p &amp;lt; 0.001) confirms that this ML specification is misspecified. The &lt;code>spxtivdfreg&lt;/code> approach avoids these problems by defactoring the data before estimation.&lt;/p>
&lt;h3 id="empirical-implications">Empirical implications&lt;/h3>
&lt;p>The empirical application reveals that credit risk in US banking operates through multiple interacting channels. The short-run coefficient on LIQUIDITY (2.452) implies that a 10 percentage point increase in the loan-to-deposit ratio increases non-performing loans by about 0.25 percentage points in the current quarter. But the long-run total effect (7.765) is more than three times larger, reflecting the amplification through temporal persistence and spatial contagion. This means that the true cost of excessive lending is far larger than what contemporaneous cross-sectional regressions suggest.&lt;/p>
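&lt;p>The reported long-run effects can be reproduced, to a first approximation, from the short-run coefficients and the two autoregressive parameters. The scalar multiplier $1/(1 - \rho - \psi)$ used below is an assumption inferred from the reported numbers; the exact effect decomposition is computed by &lt;code>estat impact&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python"># Back-of-envelope check (assumed multiplier, not spxtivdfreg output):
# the long-run total effect of a covariate with short-run coefficient b
# is approximately b / (1 - rho - psi).
rho = 0.290   # temporal persistence
psi = 0.394   # spatial autoregressive parameter
multiplier = 1.0 / (1.0 - rho - psi)

beta_liquidity = 2.452
beta_buffer = -0.055

print(round(beta_liquidity * multiplier, 3))  # close to the reported 7.765
print(round(beta_buffer * multiplier, 3))     # close to the reported -0.173
&lt;/code>&lt;/pre>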
&lt;p>The common factors that the estimator identifies &amp;mdash; 2 in the regressors and 1 in the error &amp;mdash; capture aggregate forces such as Federal Reserve monetary policy, the collapse of the housing market, and the tightening of interbank lending during the crisis. These factors account for 33.5% of the residual variance, underscoring the importance of modeling macro-financial shocks explicitly rather than assuming they are absorbed by time fixed effects. Traditional two-way fixed effects would capture these factors only if they had &lt;strong>homogeneous&lt;/strong> effects across banks, but the interactive fixed effect structure $\lambda_i' f_t$ allows for &lt;strong>heterogeneous&lt;/strong> loadings &amp;mdash; some banks are more sensitive to interest rate shocks, others to housing market conditions.&lt;/p>
&lt;h3 id="policy-implications">Policy implications&lt;/h3>
&lt;p>For banking regulators, the indirect long-run effects are particularly informative. The total long-run effect of BUFFER on NPL is -0.173, meaning that a system-wide 1 percentage point increase in capital buffers above the 8% minimum would reduce non-performing loans by 0.17 percentage points across the network. This effect is roughly split between the direct channel (banks with more capital absorb losses better) and the indirect channel (their stability reduces contagion to connected banks). This decomposition supports macroprudential policies that target &lt;strong>system-wide&lt;/strong> capital requirements rather than bank-specific ones, since the spillover benefits of higher capital buffers are nearly as large as the direct benefits.&lt;/p>
&lt;hr>
&lt;h2 id="11-summary-and-next-steps">11. Summary and next steps&lt;/h2>
&lt;p>This tutorial demonstrated the complete workflow for estimating spatial dynamic panel models with unobserved common factors in Stata using the &lt;code>spxtivdfreg&lt;/code> package. The key takeaways are:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Common factors are essential.&lt;/strong> The J-test rejects the no-factor model (p &amp;lt; 0.001), and omitting factors inflates temporal persistence from $\rho = 0.290$ to $\rho = 0.594$ &amp;mdash; a doubling that confuses macroeconomic persistence with bank-level credit risk dynamics.&lt;/li>
&lt;li>&lt;strong>Spatial spillovers are economically significant.&lt;/strong> The spatial autoregressive parameter $\psi = 0.394$ implies that a 1 percentage point increase in neighbors' NPL raises a bank&amp;rsquo;s own NPL by 0.39 percentage points. Long-run indirect effects exceed direct effects for most variables.&lt;/li>
&lt;li>&lt;strong>Long-run total effects are large.&lt;/strong> For LIQUIDITY, the total long-run effect is 7.765 &amp;mdash; more than three times the short-run coefficient of 2.452 &amp;mdash; reflecting amplification through both temporal persistence and spatial contagion.&lt;/li>
&lt;li>&lt;strong>Slope heterogeneity matters for interpretation.&lt;/strong> The mean-group estimator drives the spatial lag to insignificance ($\psi = 0.032$, p = 0.536), suggesting that the pooled model&amp;rsquo;s strong spatial spillovers may partly reflect cross-bank heterogeneity in covariate effects.&lt;/li>
&lt;/ul>
&lt;p>For further study, the companion tutorial on &lt;a href="https://carlos-mendez.org/post/stata_sp_regression_panel/">spatial panel regression with xsmle&lt;/a> covers maximum likelihood estimation of static and dynamic spatial panels, including the Spatial Durbin Model with Wald specification tests and the Lee-Yu bias correction. For cross-sectional spatial models, see the &lt;a href="https://carlos-mendez.org/post/stata_sp_regression_cross_section/">cross-sectional spatial regression tutorial&lt;/a>. The original paper by Kripfganz and Sarafidis (2025) provides the full theoretical derivation and Monte Carlo simulations that establish the estimator&amp;rsquo;s properties.&lt;/p>
&lt;hr>
&lt;h2 id="12-exercises">12. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Endogeneity of INEFF.&lt;/strong> The full model treats &lt;code>INEFF&lt;/code> (operational inefficiency) as endogenous and uses &lt;code>INTEREST&lt;/code> (interest expenses / deposits) as an excluded instrument. Re-estimate the model treating &lt;code>INEFF&lt;/code> as exogenous by removing &lt;code>INTEREST&lt;/code> from the &lt;code>iv()&lt;/code> option and adding &lt;code>INEFF&lt;/code> to the exogenous instrument list. Does the coefficient on &lt;code>INEFF&lt;/code> change substantially? What does this tell you about the direction of endogeneity bias?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Alternative factor structure.&lt;/strong> The estimator automatically selects 2 factors in the regressors and 1 in the error. Use the &lt;code>factmax()&lt;/code> option to constrain the maximum number of factors to 1 or 3 and re-estimate the model. Compare the spatial parameter $\psi$, the J-test statistic, and the variance decomposition ($\rho_{factor}$). How sensitive are the results to the assumed number of common factors?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Short-run vs. long-run effects.&lt;/strong> Use &lt;code>estat impact, sr&lt;/code> to compute the short-run direct, indirect, and total effects and compare them to the long-run effects in Table 3. For which variable is the ratio of long-run to short-run total effect the largest? What does this ratio tell you about the relative importance of temporal persistence vs. spatial amplification for that variable?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://doi.org/10.18637/jss.v113.i06" target="_blank" rel="noopener">Kripfganz, S. &amp;amp; Sarafidis, V. (2025). Estimating spatial dynamic panel data models with unobserved common factors in Stata. &lt;em>Journal of Statistical Software&lt;/em>, 113(6).&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1177/1536867X211045558" target="_blank" rel="noopener">Kripfganz, S. &amp;amp; Sarafidis, V. (2021). Instrumental-variable estimation of large-T panel-data models with common factors. &lt;em>Stata Journal&lt;/em>, 21(3), 659&amp;ndash;686.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1080/07474938.2011.611458" target="_blank" rel="noopener">Sarafidis, V. &amp;amp; Wansbeek, T. (2012). Cross-sectional dependence in panel data analysis. &lt;em>Econometric Reviews&lt;/em>, 31(5), 483&amp;ndash;531.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1111/j.1468-0262.2006.00692.x" target="_blank" rel="noopener">Pesaran, M. H. (2006). Estimation and inference in large heterogeneous panels with a multifactor error structure. &lt;em>Econometrica&lt;/em>, 74(4), 967&amp;ndash;1012.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://link.springer.com/book/10.1007/978-3-642-40340-8" target="_blank" rel="noopener">Elhorst, J. P. (2014). &lt;em>Spatial Econometrics: From Cross-Sectional Data to Spatial Panels&lt;/em>. Springer.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1177/1536867X1701700109" target="_blank" rel="noopener">Belotti, F., Hughes, G., &amp;amp; Mortari, A. P. (2017). Spatial panel-data models using Stata. &lt;em>Stata Journal&lt;/em>, 17(1), 139&amp;ndash;180.&lt;/a>&lt;/li>
&lt;/ol></description></item><item><title>Difference-in-Differences for Policy Evaluation: A Tutorial using R</title><link>https://carlos-mendez.org/post/r_did/</link><pubDate>Thu, 26 Mar 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/r_did/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Does raising the minimum wage reduce employment among young workers? This question has been at the center of one of the longest-running debates in labor economics, and the &lt;strong>Difference-in-Differences (DID)&lt;/strong> method has been the primary tool for answering it. In this tutorial, we analyze how state-level minimum wage increases between 2001 and 2007 affected teen employment in the United States &amp;mdash; a period when the federal minimum wage was frozen at \$5.15 per hour, while individual states raised their own minimum wages at different times. This variation in treatment timing creates a natural experiment ideally suited for DID.&lt;/p>
&lt;p>For decades, applied researchers implemented DID using a simple &lt;strong>two-way fixed effects (TWFE)&lt;/strong> regression &amp;mdash; a panel regression with unit and time fixed effects. Recent research has revealed that this approach can produce severely biased estimates when there is &lt;strong>staggered treatment adoption&lt;/strong> (units treated at different times) and &lt;strong>treatment effect heterogeneity&lt;/strong> (effects that vary across groups or over time). The TWFE regression implicitly makes &amp;ldquo;forbidden comparisons&amp;rdquo; that use already-treated units as the comparison group, and it assigns negative weights to some group-time treatment effects. These problems are not theoretical curiosities &amp;mdash; they lead to meaningful differences in empirical estimates.&lt;/p>
&lt;p>This tutorial walks through the complete modern DID workflow. We begin with the traditional TWFE regression and demonstrate its limitations. We then introduce the &lt;strong>Callaway and Sant&amp;rsquo;Anna (2021)&lt;/strong> framework for estimating group-time average treatment effects, $ATT(g,t)$, that cleanly separate identification from estimation. We extend the analysis with covariates using doubly robust estimation, assess the sensitivity of results to violations of parallel trends using &lt;strong>HonestDiD&lt;/strong> (Rambachan and Roth, 2023), and explore how to handle heterogeneous treatment doses across states. The tutorial is based on Callaway&amp;rsquo;s (2022) chapter &amp;ldquo;Difference-in-Differences for Policy Evaluation&amp;rdquo; and the accompanying LSU workshop materials.&lt;/p>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand the parallel trends assumption and why TWFE regressions break down with staggered treatment adoption and treatment effect heterogeneity&lt;/li>
&lt;li>Estimate group-time average treatment effects using &lt;code>att_gt()&lt;/code> from the &lt;code>did&lt;/code> package and aggregate them into overall ATTs and event studies&lt;/li>
&lt;li>Diagnose TWFE bias through weight decomposition, identifying negative weights and pre-treatment contamination&lt;/li>
&lt;li>Apply doubly robust estimation with conditional parallel trends and assess robustness to base period and comparison group choices&lt;/li>
&lt;li>Conduct HonestDiD sensitivity analysis to evaluate how robust findings are to violations of parallel trends&lt;/li>
&lt;/ul>
&lt;h2 id="2-setup">2. Setup&lt;/h2>
&lt;pre>&lt;code class="language-r"># Install packages if needed
cran_packages &amp;lt;- c(&amp;quot;did&amp;quot;, &amp;quot;fixest&amp;quot;, &amp;quot;HonestDiD&amp;quot;, &amp;quot;DRDID&amp;quot;, &amp;quot;BMisc&amp;quot;,
&amp;quot;modelsummary&amp;quot;, &amp;quot;ggplot2&amp;quot;, &amp;quot;dplyr&amp;quot;, &amp;quot;pte&amp;quot;, &amp;quot;remotes&amp;quot;)
missing &amp;lt;- cran_packages[!sapply(cran_packages, requireNamespace, quietly = TRUE)]
if (length(missing) &amp;gt; 0) install.packages(missing)
# twfeweights is GitHub-only
if (!requireNamespace(&amp;quot;twfeweights&amp;quot;, quietly = TRUE)) {
remotes::install_github(&amp;quot;bcallaway11/twfeweights&amp;quot;)
}
# pte may also require GitHub install if not on CRAN
if (!requireNamespace(&amp;quot;pte&amp;quot;, quietly = TRUE)) {
remotes::install_github(&amp;quot;bcallaway11/pte&amp;quot;)
}
library(did)
library(fixest)
library(twfeweights)
library(HonestDiD)
library(DRDID)
library(BMisc)
library(modelsummary)
library(ggplot2)
library(dplyr)
&lt;/code>&lt;/pre>
&lt;h2 id="3-data-loading-and-exploration">3. Data Loading and Exploration&lt;/h2>
&lt;p>The dataset comes from Callaway and Sant&amp;rsquo;Anna (2021) and contains county-level panel data on teen employment and state minimum wages across the United States from 2001 to 2007. During this period, the federal minimum wage remained constant at \$5.15 per hour, while several states raised their state-level minimum wages above the federal floor at different points in time. States that raised their minimum wages form the &amp;ldquo;treated&amp;rdquo; groups, identified by the year their first increase took effect. States that never raised their minimum wage above the federal level during this period form the &amp;ldquo;never-treated&amp;rdquo; comparison group.&lt;/p>
&lt;pre>&lt;code class="language-r"># Load data from Callaway's GitHub repository
load(url(&amp;quot;https://github.com/bcallaway11/did_chapter/raw/master/mw_data_ch2.RData&amp;quot;))
# Filter: keep groups 0 (never-treated), 2004, 2006, 2007; drop Northeast region
mw_data_ch2 &amp;lt;- subset(mw_data_ch2,
(G %in% c(2004, 2006, 2007, 0)) &amp;amp; (region != &amp;quot;1&amp;quot;))
# Main analysis subset: drop G=2007, keep year &amp;gt;= 2003
data2 &amp;lt;- subset(mw_data_ch2, G != 2007 &amp;amp; year &amp;gt;= 2003)
head(data2[, c(&amp;quot;id&amp;quot;, &amp;quot;year&amp;quot;, &amp;quot;G&amp;quot;, &amp;quot;lemp&amp;quot;, &amp;quot;lpop&amp;quot;, &amp;quot;region&amp;quot;)])
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> id year G lemp lpop region
6 1001 2003 0 5.253534 10.07352 3
7 1001 2004 0 5.288267 10.06966 3
8 1001 2005 0 5.267858 10.06235 3
9 1001 2006 0 5.298317 10.05546 3
10 1001 2007 0 5.232025 10.04953 3
31 1003 2003 0 6.822197 11.16740 3
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-r"># Counties by treatment group
data2 %&amp;gt;%
filter(year == 2003) %&amp;gt;%
group_by(G) %&amp;gt;%
summarise(n_counties = n(), .groups = &amp;quot;drop&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> G n_counties
1 0 1417
2 2004 102
3 2006 226
&lt;/code>&lt;/pre>
&lt;p>The dataset contains 8,725 county-year observations spanning 1,745 counties over five years (2003&amp;ndash;2007). There are two treatment groups: 102 counties in states that first raised their minimum wage in 2004 (G=2004) and 226 counties in states that did so in 2006 (G=2006). The remaining 1,417 counties are in states that kept their minimum wage at the federal level throughout the period and serve as the never-treated comparison group. We drop the G=2007 group (states raising their minimum wage right before the federal increase) to maintain a cleaner analysis window, following the workshop approach.&lt;/p>
&lt;pre>&lt;code class="language-r"># Summary statistics
summary(data2[, c(&amp;quot;lemp&amp;quot;, &amp;quot;lpop&amp;quot;, &amp;quot;lavg_pay&amp;quot;)])
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> lemp lpop lavg_pay
Min. : 1.099 Min. : 6.397 Min. : 9.646
1st Qu.: 4.615 1st Qu.: 9.149 1st Qu.:10.117
Median : 5.517 Median : 9.931 Median :10.225
Mean : 5.594 Mean :10.030 Mean :10.245
3rd Qu.: 6.458 3rd Qu.:10.762 3rd Qu.:10.352
Max. :11.173 Max. :15.492 Max. :11.223
&lt;/code>&lt;/pre>
&lt;p>The outcome variable &lt;code>lemp&lt;/code> is log teen employment, with a mean of 5.59 (corresponding to roughly 270 teen workers per county). The covariates &lt;code>lpop&lt;/code> (log county population, mean 10.03) and &lt;code>lavg_pay&lt;/code> (log average county pay, mean 10.25) capture differences in county size and economic conditions that could affect employment trends. These covariates will become important when we condition the parallel trends assumption on observables in Section 7.&lt;/p>
&lt;h2 id="4-the-basic-did-framework">4. The Basic DID Framework&lt;/h2>
&lt;h3 id="41-did-intuition-and-parallel-trends">4.1 DID Intuition and Parallel Trends&lt;/h3>
&lt;p>The core idea behind Difference-in-Differences is simple: compare how outcomes change over time for the treated group relative to a comparison group. If the treated and comparison groups would have followed &lt;strong>parallel trends&lt;/strong> in the absence of treatment, then any divergence after treatment can be attributed to the treatment itself. Formally, the Average Treatment Effect on the Treated (ATT) is identified as:&lt;/p>
&lt;p>$$ATT = E[\Delta Y_{t^{\ast}} \mid D=1] - E[\Delta Y_{t^{\ast}} \mid D=0]$$&lt;/p>
&lt;p>where $\Delta Y_{t^{\ast}}$ is the change in outcomes from the pre-treatment period to the post-treatment period, $D=1$ indicates treated units, and $D=0$ indicates untreated units. The ATT equals the change in outcomes for the treated group, adjusted by the change in outcomes for the comparison group.&lt;/p>
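&lt;p>The estimand is simple arithmetic on four group means. The sketch below (Python, with made-up numbers) computes the two changes and their difference:&lt;/p>
&lt;pre>&lt;code class="language-python"># Hedged numeric illustration of the 2x2 DID estimand (made-up numbers):
# ATT = (change for treated group) - (change for comparison group)
treated_pre, treated_post = 5.60, 5.50
control_pre, control_post = 5.40, 5.45

att = (treated_post - treated_pre) - (control_post - control_pre)
print(round(att, 2))  # -0.15
&lt;/code>&lt;/pre>
&lt;p>The treated group fell by 0.10 while the comparison group rose by 0.05, so the implied treatment effect is $-0.15$.&lt;/p>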
&lt;pre>&lt;code class="language-mermaid">graph TD
subgraph &amp;quot;Before Treatment&amp;quot;
A[&amp;quot;Treated Group&amp;lt;br/&amp;gt;Pre-treatment Y&amp;quot;]
B[&amp;quot;Control Group&amp;lt;br/&amp;gt;Pre-treatment Y&amp;quot;]
end
subgraph &amp;quot;After Treatment&amp;quot;
C[&amp;quot;Treated Group&amp;lt;br/&amp;gt;Post-treatment Y&amp;quot;]
D[&amp;quot;Control Group&amp;lt;br/&amp;gt;Post-treatment Y&amp;quot;]
end
A --&amp;gt;|&amp;quot;ΔY treated&amp;quot;| C
B --&amp;gt;|&amp;quot;ΔY control&amp;quot;| D
C -.-&amp;gt;|&amp;quot;ATT = ΔY treated − ΔY control&amp;quot;| E[&amp;quot;Causal Effect&amp;quot;]
style A fill:#d97757,stroke:#141413,color:#fff
style C fill:#d97757,stroke:#141413,color:#fff
style B fill:#6a9bcc,stroke:#141413,color:#fff
style D fill:#6a9bcc,stroke:#141413,color:#fff
style E fill:#00d4c8,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>In the textbook case with exactly two periods and two groups, the TWFE regression $Y_{it} = \theta_t + \eta_i + \alpha D_{it} + v_{it}$ delivers an estimate of $\alpha$ that is numerically identical to the simple DID estimator, even in the presence of treatment effect heterogeneity. Here, $\theta_t$ represents time fixed effects (captured by &lt;code>year&lt;/code> in the regression), $\eta_i$ represents unit fixed effects (captured by &lt;code>id&lt;/code>), $D_{it}$ is the treatment indicator (&lt;code>post&lt;/code>), and $v_{it}$ are idiosyncratic unobservables.&lt;/p>
&lt;p>However, this equivalence breaks down when there are &lt;strong>multiple time periods&lt;/strong> and &lt;strong>variation in treatment timing&lt;/strong>. In our application, states raised their minimum wages at different times (2004 and 2006), creating a staggered treatment adoption design.&lt;/p>
&lt;p>The TWFE regression implicitly makes two types of comparisons: (1) &amp;ldquo;good comparisons&amp;rdquo; that compare treated groups to not-yet-treated groups, and (2) &amp;ldquo;bad comparisons&amp;rdquo; (sometimes called &amp;ldquo;forbidden comparisons&amp;rdquo;) that use already-treated groups as the comparison group. To see why this is problematic, imagine grading a student&amp;rsquo;s improvement by comparing them to classmates who already took the test last week &amp;mdash; those &amp;ldquo;comparison&amp;rdquo; students are themselves affected by the test, so they no longer represent a valid counterfactual. Similarly, already-treated units may themselves be experiencing treatment effects, contaminating the estimate.&lt;/p>
&lt;p>Moreover, under treatment effect heterogeneity, the TWFE coefficient $\alpha$ is a weighted average of underlying group-time treatment effects, and some of these weights can be &lt;strong>negative&lt;/strong>. It is as if you tried to compute an average score but accidentally gave some students a negative weight &amp;mdash; their positive performance would drag the average down. This means TWFE could, in principle, produce a negative estimate even when all true treatment effects are positive.&lt;/p>
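&lt;p>A stylized numeric example makes the danger concrete. With one negative weight, a weighted average of uniformly positive group-time effects can come out negative (the weights below are made up for illustration, not actual TWFE weights):&lt;/p>
&lt;pre>&lt;code class="language-python"># Stylized illustration (made-up numbers): the weights sum to one, but
# one of them is negative, so the weighted average flips sign even
# though every underlying effect is positive.
effects = [0.10, 0.12, 1.00]   # all true group-time effects positive
weights = [0.70, 0.55, -0.25]  # weights sum to 1, one is negative

twfe_alpha = sum(w * e for w, e in zip(weights, effects))
print(round(twfe_alpha, 3))  # -0.114
&lt;/code>&lt;/pre>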
&lt;h3 id="42-twfe-regression">4.2 TWFE Regression&lt;/h3>
&lt;p>Let us start with the traditional TWFE approach to establish a baseline estimate.&lt;/p>
&lt;pre>&lt;code class="language-r">twfe_res &amp;lt;- fixest::feols(lemp ~ post | id + year,
data = data2,
cluster = &amp;quot;id&amp;quot;)
summary(twfe_res)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">OLS estimation, Dep. Var.: lemp
Observations: 8,725
Fixed-effects: id: 1,745, year: 5
Standard-errors: Clustered (id)
Estimate Std. Error t value Pr(&amp;gt;|t|)
post -0.03812 0.008489 -4.49036 7.5762e-06 ***
---
RMSE: 0.116264 Adj. R2: 0.9926
Within R2: 0.003711
&lt;/code>&lt;/pre>
&lt;p>The TWFE regression estimates that minimum wage increases reduced log teen employment by 0.038 (SE = 0.008), which is statistically significant. Interpreted naively, this suggests that states raising their minimum wage experienced a 3.8% decline in teen employment relative to states that did not. However, this single coefficient attempts to summarize the entire treatment effect across two different treatment groups, multiple post-treatment periods, and varying lengths of exposure &amp;mdash; a task that, as we will show, is not well-served by TWFE under treatment effect heterogeneity.&lt;/p>
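&lt;p>A quick side note on interpretation: with a log outcome, the coefficient is only approximately a percentage change. The exact conversion is $100 \times (e^{b} - 1)$:&lt;/p>
&lt;pre>&lt;code class="language-python"># Converting a log-outcome coefficient to a percentage change.
import math

b = -0.03812  # TWFE estimate on post
print(round(100 * b, 1))                   # -3.8 (log-point approximation)
print(round(100 * (math.exp(b) - 1), 1))   # -3.7 (exact percentage change)
&lt;/code>&lt;/pre>
&lt;p>For coefficients this small the two agree to a tenth of a percentage point, so the 3.8% reading in the text is harmless.&lt;/p>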
&lt;p>&lt;img src="r_did_01_twfe_event_study.png" alt="TWFE Event Study based on the Sun-Abraham interaction-weighted estimator.">&lt;/p>
&lt;p>The TWFE event study above uses &lt;code>fixest::sunab()&lt;/code> to estimate dynamic treatment effects within the TWFE framework. The coefficients suggest a small pre-trend violation at event time $-3$ and increasingly negative post-treatment effects. While the Sun-Abraham correction improves upon the standard TWFE event study by addressing some of the weighting issues, we will see that the Callaway-Sant&amp;rsquo;Anna approach provides a more principled decomposition of the treatment effect.&lt;/p>
&lt;h2 id="5-group-time-att-the-callaway-santanna-approach">5. Group-Time ATT: The Callaway-Sant&amp;rsquo;Anna Approach&lt;/h2>
&lt;h3 id="51-estimating-attgt">5.1 Estimating ATT(g,t)&lt;/h3>
&lt;p>The Callaway and Sant&amp;rsquo;Anna (2021) framework addresses the limitations of TWFE by working with &lt;strong>group-time average treatment effects&lt;/strong>:&lt;/p>
&lt;p>$$ATT(g,t) = E[Y_t(g) - Y_t(0) \mid G = g]$$&lt;/p>
&lt;p>where $Y_t(g)$ is the potential outcome at time $t$ if first treated in period $g$, $Y_t(0)$ is the untreated potential outcome, and $G = g$ identifies units in treatment group $g$. In words, $ATT(g,t)$ is the average treatment effect for units first treated in period $g$, measured at time $t$. These building-block parameters are identified under the parallel trends assumption using clean comparisons: each treated group is compared only to units that are never treated (or not yet treated), avoiding the forbidden comparisons that plague TWFE.&lt;/p>
&lt;pre>&lt;code class="language-r">attgt &amp;lt;- did::att_gt(yname = &amp;quot;lemp&amp;quot;,
idname = &amp;quot;id&amp;quot;,
gname = &amp;quot;G&amp;quot;,
tname = &amp;quot;year&amp;quot;,
data = data2,
control_group = &amp;quot;nevertreated&amp;quot;,
base_period = &amp;quot;universal&amp;quot;)
tidy(attgt)[, 1:5]
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> term group time estimate std.error
ATT(2004,2003) 2004 2003 0.00000000 NA
ATT(2004,2004) 2004 2004 -0.03266653 0.02149279
ATT(2004,2005) 2004 2005 -0.06827991 0.02098524
ATT(2004,2006) 2004 2006 -0.12335404 0.02089502
ATT(2004,2007) 2004 2007 -0.13109136 0.02326712
ATT(2006,2003) 2006 2003 -0.03408910 0.01165128
ATT(2006,2004) 2006 2004 -0.01669977 0.00817406
ATT(2006,2005) 2006 2005 0.00000000 NA
ATT(2006,2006) 2006 2006 -0.01939335 0.00892409
ATT(2006,2007) 2006 2007 -0.06607568 0.00965073
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>att_gt()&lt;/code> function estimates each $ATT(g,t)$ separately. For the G=2004 group, the treatment effect grows over time: $-0.033$ on impact (2004), $-0.068$ one year later (2005), $-0.123$ two years later (2006), and $-0.131$ three years later (2007). This pattern suggests &lt;strong>treatment effect dynamics&lt;/strong> &amp;mdash; the negative employment effect of minimum wage increases deepens with longer exposure. For the G=2006 group, the on-impact effect is smaller ($-0.019$) and grows to $-0.066$ after one year. The pre-treatment estimates for G=2006 show a concerning value of $-0.034$ at event time $-3$ (year 2003), suggesting a possible violation of the parallel trends assumption for this group &amp;mdash; a point we will revisit in the sensitivity analysis.&lt;/p>
&lt;p>&lt;img src="r_did_02_attgt.png" alt="Group-time average treatment effects for each treatment cohort, estimated with the Callaway-Sant&amp;rsquo;Anna method.">&lt;/p>
&lt;h3 id="52-aggregation-overall-att-and-event-study">5.2 Aggregation: Overall ATT and Event Study&lt;/h3>
&lt;p>Group-time ATTs are informative but numerous. The &lt;code>aggte()&lt;/code> function aggregates them into summary parameters. The &lt;strong>overall ATT&lt;/strong> weights each $ATT(g,t)$ by the group size and the number of post-treatment periods:&lt;/p>
&lt;pre>&lt;code class="language-r">attO &amp;lt;- did::aggte(attgt, type = &amp;quot;group&amp;quot;)
summary(attO)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Overall summary of ATT's based on group/cohort aggregation:
ATT Std. Error [ 95% Conf. Int.]
-0.0571 0.008 -0.0727 -0.0415 *
Group Effects:
Group Estimate Std. Error [95% Simult. Conf. Band]
2004 -0.0888 0.0197 -0.1309 -0.0468 *
2006 -0.0427 0.0083 -0.0604 -0.0251 *
&lt;/code>&lt;/pre>
&lt;p>The overall ATT is $-0.057$ (SE = 0.008), substantially larger in magnitude than the TWFE estimate of $-0.038$. The Callaway-Sant&amp;rsquo;Anna framework reveals that TWFE &lt;strong>understated&lt;/strong> the negative employment effect by about one-third. The group-level results show that the G=2004 group experienced a larger average effect ($-0.089$) than the G=2006 group ($-0.043$), which makes sense: the G=2004 group has been treated for more periods, so its average reflects more of the growing dynamic effects.&lt;/p>
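&lt;p>In our notation (a sketch of the aggregation that &lt;code>aggte(type = &amp;quot;group&amp;quot;)&lt;/code> performs, following Callaway and Sant&amp;rsquo;Anna, 2021), each group effect averages that group&amp;rsquo;s post-treatment $ATT(g,t)$ values, and the overall ATT then weights the groups by their share of the treated population:&lt;/p>
&lt;p>$$ATT_g = \frac{1}{T - g + 1} \sum_{t=g}^{T} ATT(g,t), \qquad ATT^{O} = \sum_{g} P(G = g \mid \text{treated}) \cdot ATT_g$$&lt;/p>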
&lt;p>The &lt;strong>event study&lt;/strong> aggregation is equally informative:&lt;/p>
&lt;pre>&lt;code class="language-r">attes &amp;lt;- did::aggte(attgt, type = &amp;quot;dynamic&amp;quot;)
summary(attes)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Overall summary of ATT's based on event-study/dynamic aggregation:
ATT Std. Error [ 95% Conf. Int.]
-0.0862 0.0124 -0.1106 -0.0618 *
Dynamic Effects:
Event time Estimate Std. Error [95% Simult. Conf. Band]
-3 -0.0341 0.0119 -0.0623 -0.0059 *
-2 -0.0167 0.0076 -0.0348 0.0014
-1 0.0000 NA NA NA
0 -0.0235 0.0081 -0.0426 -0.0044 *
1 -0.0668 0.0086 -0.0870 -0.0465 *
2 -0.1234 0.0203 -0.1714 -0.0753 *
3 -0.1311 0.0230 -0.1855 -0.0767 *
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_did_03_cs_event_study.png" alt="Event study aggregation of group-time ATTs showing the trajectory of treatment effects relative to the treatment year.">&lt;/p>
&lt;p>The event study reveals a clear pattern: the on-impact effect at $e=0$ is $-0.024$, growing to $-0.067$ at $e=1$, $-0.123$ at $e=2$, and $-0.131$ at $e=3$. The post-treatment effects are all statistically significant and increasingly negative, consistent with the minimum wage having a cumulative negative effect on teen employment over time. However, the pre-trend at $e=-3$ is $-0.034$ and marginally significant, which raises a flag about the validity of the parallel trends assumption. The pre-trend at $e=-2$ is smaller ($-0.017$) and not significant. We will formally assess the robustness of these results to parallel trends violations using HonestDiD in Section 7.&lt;/p>
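&lt;p>The dynamic aggregation behind this table can be written analogously (a sketch in our notation): each event-time effect averages the group-time effects at exposure length $e$, weighting groups by size among those observed at that event time:&lt;/p>
&lt;p>$$ATT^{es}(e) = \sum_{g} P(G = g \mid G + e \le T) \cdot ATT(g, g + e)$$&lt;/p>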
&lt;h3 id="53-twfe-weight-decomposition">5.3 TWFE Weight Decomposition&lt;/h3>
&lt;p>Why does TWFE produce a different estimate than Callaway-Sant&amp;rsquo;Anna? Both the TWFE coefficient and the overall $ATT^O$ can be written as weighted averages of the same underlying $ATT(g,t)$ values:&lt;/p>
&lt;p>$$ATT^O = \sum_{g,t} w^O(g,t) \cdot ATT(g,t)$$&lt;/p>
&lt;p>The difference lies in the weights. The proper $ATT^O$ weights reflect group size and number of post-treatment periods, while the TWFE weights are driven by the estimation method and can assign nonzero weight to pre-treatment periods or even negative weight to some post-treatment cells. The &lt;code>twfeweights&lt;/code> package makes these weights explicit.&lt;/p>
&lt;pre>&lt;code class="language-r">tw_obj &amp;lt;- twfeweights::twfe_weights(attgt)
tw &amp;lt;- tw_obj$weights_df
wO_obj &amp;lt;- twfeweights::attO_weights(attgt)
wO &amp;lt;- wO_obj$weights_df
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">TWFE estimate from weights: -0.0381
ATT^O estimate from weights: -0.0571
TWFE post-treatment component: -0.0503
Pre-treatment contamination: 0.0122
Total TWFE bias: 0.019
Fraction of bias from pre-treatment: 0.6422
Fraction of bias from post-treatment weighting: 0.3578
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_did_04_twfe_weights.png" alt="TWFE weight scatter plot showing how each group-time ATT is weighted. Circles are TWFE weights; teal diamonds are the proper ATT-O weights for post-treatment cells.">&lt;/p>
&lt;p>The weight decomposition is revealing. The TWFE estimate ($-0.038$) differs from the proper overall ATT ($-0.057$) by a total bias of $0.019$ &amp;mdash; meaning TWFE attenuates the negative employment effect toward zero. Of this bias, &lt;strong>64.2%&lt;/strong> comes from pre-treatment contamination: the TWFE regression assigns nonzero weights to pre-treatment $ATT(g,t)$ values, which should receive zero weight in any proper treatment effect parameter. The remaining &lt;strong>35.8%&lt;/strong> of the bias comes from TWFE assigning different post-treatment weights than the proper $ATT^O$ weights. The figure shows this visually: the orange pre-treatment dots receive nonzero TWFE weights (horizontal position), and the post-treatment TWFE weights (blue circles) differ systematically from the proper $ATT^O$ weights (teal diamonds).&lt;/p>
&lt;h2 id="6-relaxing-parallel-trends">6. Relaxing Parallel Trends&lt;/h2>
&lt;h3 id="61-conditional-parallel-trends-with-covariates">6.1 Conditional Parallel Trends with Covariates&lt;/h3>
&lt;p>The unconditional parallel trends assumption may be too strong if treatment and comparison groups differ on observable characteristics that affect outcome trends. For example, states that raised their minimum wages may have larger populations or higher average pay levels, and these characteristics could correlate with employment trends even absent the minimum wage change. &lt;strong>Conditional parallel trends&lt;/strong> weakens the assumption: trends need only be parallel after conditioning on covariates. The &lt;code>did&lt;/code> package offers three estimation methods for this setting. Regression adjustment models the outcome as a function of covariates; inverse probability weighting (IPW) reweights the comparison group to match the treated group&amp;rsquo;s covariate distribution; and the &lt;strong>doubly robust&lt;/strong> (DR) estimator combines both approaches, remaining consistent if either the outcome model or the propensity score model is correctly specified &amp;mdash; like wearing both a belt and suspenders.&lt;/p>
&lt;pre>&lt;code class="language-r"># Regression adjustment
cs_reg &amp;lt;- att_gt(yname = &amp;quot;lemp&amp;quot;, tname = &amp;quot;year&amp;quot;, idname = &amp;quot;id&amp;quot;, gname = &amp;quot;G&amp;quot;,
xformla = ~lpop + lavg_pay,
control_group = &amp;quot;nevertreated&amp;quot;, base_period = &amp;quot;universal&amp;quot;,
est_method = &amp;quot;reg&amp;quot;, data = data2)
attO_reg &amp;lt;- aggte(cs_reg, type = &amp;quot;group&amp;quot;)
# Inverse probability weighting
cs_ipw &amp;lt;- att_gt(yname = &amp;quot;lemp&amp;quot;, tname = &amp;quot;year&amp;quot;, idname = &amp;quot;id&amp;quot;, gname = &amp;quot;G&amp;quot;,
xformla = ~lpop + lavg_pay,
control_group = &amp;quot;nevertreated&amp;quot;, base_period = &amp;quot;universal&amp;quot;,
est_method = &amp;quot;ipw&amp;quot;, data = data2)
attO_ipw &amp;lt;- aggte(cs_ipw, type = &amp;quot;group&amp;quot;)
# Doubly robust
cs_dr &amp;lt;- att_gt(yname = &amp;quot;lemp&amp;quot;, tname = &amp;quot;year&amp;quot;, idname = &amp;quot;id&amp;quot;, gname = &amp;quot;G&amp;quot;,
xformla = ~lpop + lavg_pay,
control_group = &amp;quot;nevertreated&amp;quot;, base_period = &amp;quot;universal&amp;quot;,
est_method = &amp;quot;dr&amp;quot;, data = data2)
attO_dr &amp;lt;- aggte(cs_dr, type = &amp;quot;group&amp;quot;)
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Overall ATT&lt;/th>
&lt;th>SE&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Unconditional&lt;/td>
&lt;td>$-0.057$&lt;/td>
&lt;td>0.008&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Regression adj.&lt;/td>
&lt;td>$-0.064$&lt;/td>
&lt;td>0.008&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>IPW&lt;/td>
&lt;td>$-0.065$&lt;/td>
&lt;td>0.008&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Doubly robust&lt;/td>
&lt;td>$-0.065$&lt;/td>
&lt;td>0.008&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Controlling for log population and log average pay increases the estimated negative employment effect from $-0.057$ to approximately $-0.065$ across all three conditional methods. The three estimation methods produce nearly identical estimates, which is reassuring. The fact that all three methods agree suggests that covariate adjustment is not introducing model-dependence artifacts.&lt;/p>
&lt;p>&lt;img src="r_did_05_dr_event_study.png" alt="Event study from the doubly robust estimator conditioning on log population and log average pay.">&lt;/p>
&lt;p>The doubly robust event study shows the same qualitative pattern as the unconditional analysis: near-zero pre-trends (the pre-trend at $e=-3$ shrinks from $-0.034$ to $-0.022$ and is no longer significant) and increasingly negative post-treatment effects ($-0.027$ at $e=0$, $-0.077$ at $e=1$, $-0.135$ at $e=2$, $-0.147$ at $e=3$). The improved pre-trend behavior after conditioning on covariates suggests that some of the apparent pre-trend violations in the unconditional analysis were driven by differences in county characteristics between treatment and comparison groups.&lt;/p>
&lt;h3 id="62-robustness-base-period-comparison-group-and-anticipation">6.2 Robustness: Base Period, Comparison Group, and Anticipation&lt;/h3>
&lt;p>The Callaway-Sant&amp;rsquo;Anna framework allows the researcher to make several important choices. We now check that our results are robust to these choices.&lt;/p>
&lt;p>&lt;strong>Varying base period:&lt;/strong> Instead of comparing all pre-treatment and post-treatment periods to a single universal base period ($t = g-1$), we can use a varying base period that compares each period $t$ to period $t-1$.&lt;/p>
&lt;pre>&lt;code class="language-r">cs_varying &amp;lt;- att_gt(yname = &amp;quot;lemp&amp;quot;, tname = &amp;quot;year&amp;quot;, idname = &amp;quot;id&amp;quot;, gname = &amp;quot;G&amp;quot;,
xformla = ~lpop + lavg_pay,
control_group = &amp;quot;nevertreated&amp;quot;, base_period = &amp;quot;varying&amp;quot;,
est_method = &amp;quot;dr&amp;quot;, data = data2)
attO_varying &amp;lt;- aggte(cs_varying, type = &amp;quot;group&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Varying base period ATT^O: -0.0646 (SE: 0.0081)
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Not-yet-treated comparison group:&lt;/strong> Instead of using only the never-treated group as the comparison, we can also include units that are not yet treated at time $t$.&lt;/p>
&lt;pre>&lt;code class="language-r">cs_nyt &amp;lt;- att_gt(yname = &amp;quot;lemp&amp;quot;, tname = &amp;quot;year&amp;quot;, idname = &amp;quot;id&amp;quot;, gname = &amp;quot;G&amp;quot;,
xformla = ~lpop + lavg_pay,
control_group = &amp;quot;notyettreated&amp;quot;, base_period = &amp;quot;universal&amp;quot;,
est_method = &amp;quot;dr&amp;quot;, data = data2)
attO_nyt &amp;lt;- aggte(cs_nyt, type = &amp;quot;group&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Not-yet-treated ATT^O: -0.0649 (SE: 0.008)
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Anticipation:&lt;/strong> If states announced their minimum wage increases before they took effect, workers and firms might adjust their behavior in anticipation. We allow for one period of anticipation by setting &lt;code>anticipation = 1&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-r">cs_antic &amp;lt;- att_gt(yname = &amp;quot;lemp&amp;quot;, tname = &amp;quot;year&amp;quot;, idname = &amp;quot;id&amp;quot;, gname = &amp;quot;G&amp;quot;,
xformla = ~lpop + lavg_pay,
control_group = &amp;quot;nevertreated&amp;quot;, base_period = &amp;quot;universal&amp;quot;,
est_method = &amp;quot;dr&amp;quot;, anticipation = 1, data = data2)
attO_antic &amp;lt;- aggte(cs_antic, type = &amp;quot;group&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">With anticipation (1 period) ATT^O: -0.0396 (SE: 0.0098)
&lt;/code>&lt;/pre>
&lt;p>The results are reassuringly stable across specifications. Switching to a varying base period ($-0.065$) or using the not-yet-treated comparison group ($-0.065$) produces virtually identical estimates to our baseline doubly robust result ($-0.065$). Allowing for one period of anticipation reduces the estimated ATT to $-0.040$ (SE = 0.010), which makes sense: if some of the treatment effect occurs before the official implementation date, excluding that period from the post-treatment window narrows the estimated effect. The consistency across the first three specifications gives us confidence that the main findings are not driven by specific methodological choices.&lt;/p>
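&lt;p>For a side-by-side comparison, the overall estimates from these robustness checks can be collected into a small table (a sketch using the objects created above; &lt;code>aggte()&lt;/code> results store the overall estimate in &lt;code>overall.att&lt;/code> and its standard error in &lt;code>overall.se&lt;/code>):&lt;/p>
&lt;pre>&lt;code class="language-r"># Collect the overall ATT and SE from each robustness specification
robust_tab &amp;lt;- data.frame(
  spec = c(&amp;quot;baseline DR&amp;quot;, &amp;quot;varying base period&amp;quot;,
           &amp;quot;not-yet-treated&amp;quot;, &amp;quot;anticipation = 1&amp;quot;),
  att  = c(attO_dr$overall.att, attO_varying$overall.att,
           attO_nyt$overall.att, attO_antic$overall.att),
  se   = c(attO_dr$overall.se, attO_varying$overall.se,
           attO_nyt$overall.se, attO_antic$overall.se)
)
robust_tab
&lt;/code>&lt;/pre>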
&lt;h2 id="7-sensitivity-analysis-when-parallel-trends-may-fail">7. Sensitivity Analysis: When Parallel Trends May Fail&lt;/h2>
&lt;p>Even after conditioning on covariates, the parallel trends assumption is not directly testable &amp;mdash; pre-trends close to zero are reassuring, but they do not guarantee that trends remain parallel in post-treatment periods. The &lt;strong>HonestDiD&lt;/strong> approach of Rambachan and Roth (2023) provides a principled sensitivity analysis: it asks how large violations of parallel trends can be before the post-treatment results break down. The &amp;ldquo;relative magnitude&amp;rdquo; variant compares the size of potential post-treatment violations to the observed size of pre-treatment deviations from parallel trends.&lt;/p>
&lt;p>The &lt;code>HonestDiD&lt;/code> package requires a small helper function to interface with the &lt;code>did&lt;/code> package&amp;rsquo;s event study objects. This helper (available in the companion R script and in &lt;a href="https://github.com/bcallaway11/did_chapter" target="_blank" rel="noopener">Callaway&amp;rsquo;s workshop materials&lt;/a>) extracts the influence function (a statistical tool for computing standard errors in complex estimators) and variance-covariance matrix from the event study, then passes them to &lt;code>HonestDiD&lt;/code>&amp;rsquo;s sensitivity routines. The parameter $\bar{M}$ bounds the ratio of the maximum post-treatment deviation from parallel trends to the maximum pre-treatment deviation &amp;mdash; in other words, it is a stress test asking &amp;ldquo;how much worse can things get after treatment compared to what we already see before treatment?&amp;rdquo;&lt;/p>
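&lt;p>Formally (a sketch of the relative magnitudes restriction in the notation of Rambachan and Roth, 2023), the post-treatment trend violations $\delta$ are allowed to vary within the set&lt;/p>
&lt;p>$$\Delta^{RM}(\bar{M}) = \left\{ \delta : |\delta_{t+1} - \delta_t| \le \bar{M} \max_{s \le -1} |\delta_{s+1} - \delta_s| \ \text{for all } t \ge 0 \right\}$$&lt;/p>
&lt;p>so that consecutive period-to-period deviations after treatment are at most $\bar{M}$ times the largest consecutive deviation observed before treatment.&lt;/p>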
&lt;pre>&lt;code class="language-r"># Helper function from Callaway's workshop (references/honest_did.R)
# Bridges the did package's AGGTEobj to HonestDiD's sensitivity functions
source(&amp;quot;references/honest_did.R&amp;quot;)
attgt_hd &amp;lt;- did::att_gt(yname = &amp;quot;lemp&amp;quot;, idname = &amp;quot;id&amp;quot;, gname = &amp;quot;G&amp;quot;,
tname = &amp;quot;year&amp;quot;, data = data2,
control_group = &amp;quot;nevertreated&amp;quot;,
base_period = &amp;quot;universal&amp;quot;)
cs_es_hd &amp;lt;- aggte(attgt_hd, type = &amp;quot;dynamic&amp;quot;)
hd_rm &amp;lt;- honest_did(es = cs_es_hd, e = 0, type = &amp;quot;relative_magnitude&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Original CI: [-0.0404, -0.0066]
Robust CIs:
lb ub Mbar
-0.0401 -0.00871 0.000
-0.0435 -0.00523 0.222
-0.0470 -0.00174 0.444
-0.0505 0.00523 0.667
-0.0575 0.01220 0.889
-0.0644 0.01920 1.111
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_did_06_honestdid.png" alt="HonestDiD sensitivity analysis showing how the confidence interval for the on-impact effect widens as the allowed magnitude of parallel trends violations increases.">&lt;/p>
&lt;p>The sensitivity analysis reveals that the on-impact effect ($e=0$) is robust to moderate violations of parallel trends, but not to large ones. The original 95% confidence interval is $[-0.040, -0.007]$, comfortably below zero. As $\bar{M}$ increases &amp;mdash; meaning we allow post-treatment violations of parallel trends to be larger relative to pre-treatment violations &amp;mdash; the confidence interval widens. The &lt;strong>breakdown point&lt;/strong> lies near $\bar{M} \approx 0.67$: at $\bar{M} = 0.444$ the robust confidence interval still excludes zero, but by $\bar{M} = 0.667$ it includes zero and we can no longer rule out a null effect. Given the moderate pre-trend violations we observed (especially at $e=-3$), this suggests that the results should be interpreted with some caution &amp;mdash; the evidence is suggestive of a negative employment effect, but it is not bulletproof.&lt;/p>
&lt;h2 id="8-more-complicated-treatment-regimes">8. More Complicated Treatment Regimes&lt;/h2>
&lt;h3 id="81-heterogeneous-treatment-doses">8.1 Heterogeneous Treatment Doses&lt;/h3>
&lt;p>So far, we have treated all minimum wage increases as a binary &amp;ldquo;treated or not&amp;rdquo; event. But states raised their minimum wages by very different amounts &amp;mdash; some by as little as \$0.10 above the federal floor, others by over \$1.00. A \$0.25 increase and a \$1.70 increase should not be expected to have the same employment effect. To account for this, we can normalize the treatment effect by the size of the minimum wage increase, computing an &lt;strong>ATT per dollar&lt;/strong>.&lt;/p>
&lt;pre>&lt;code class="language-r"># Use full data including G=2007 for more treated states
data3 &amp;lt;- subset(mw_data_ch2, year &amp;gt;= 2003)
treated_state_list &amp;lt;- unique(subset(data3, G != 0)$state_name)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_did_07_state_mw.png" alt="Minimum wage trajectories showing the heterogeneous timing and magnitude of state minimum wage increases above the federal floor.">&lt;/p>
&lt;p>The figure reveals substantial variation across states. Illinois raised its minimum wage early (2004) and by a relatively large amount, while Florida and Colorado made smaller increases later. This heterogeneity in treatment dose motivates the per-dollar normalization.&lt;/p>
&lt;h3 id="82-att-per-dollar-event-study">8.2 ATT Per Dollar Event Study&lt;/h3>
&lt;p>We compute state-specific ATTs using the doubly robust panel DID estimator from the &lt;code>DRDID&lt;/code> package, then divide each by the size of the minimum wage increase above the federal level.&lt;/p>
&lt;pre>&lt;code class="language-r"># For each treated state and post-treatment period, compute ATT
# using the doubly robust panel estimator, then normalize by dose
for (state in treated_state_list) {
g &amp;lt;- unique(subset(data3, state_name == state)$G)
for (period in 2004:2007) {
Y1 &amp;lt;- c(subset(data3, state_name == state &amp;amp; year == period)$lemp,
subset(data3, G == 0 &amp;amp; year == period)$lemp)
Y0 &amp;lt;- c(subset(data3, state_name == state &amp;amp; year == g - 1)$lemp,
subset(data3, G == 0 &amp;amp; year == g - 1)$lemp)
D &amp;lt;- c(rep(1, sum(data3$state_name == state &amp;amp; data3$year == period)),
rep(0, sum(data3$G == 0 &amp;amp; data3$year == period)))
attst &amp;lt;- DRDID::drdid_panel(Y1, Y0, D, covariates = NULL)
treat_amount &amp;lt;- unique(subset(data3, state_name == state &amp;amp;
year == period)$state_mw) - 5.15
att_per_dollar &amp;lt;- attst$ATT / treat_amount
}
}
# Note: this is a simplified excerpt. See analysis.R for the full
# implementation with result storage, event study aggregation, and plots.
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Overall ATT per dollar: -0.0297 (SE: 0.0155)
Event study ATT per dollar:
event_time att se ci_lower ci_upper
0 -0.028 0.020 -0.066 0.010
1 -0.055 0.012 -0.079 -0.031
2 -0.091 0.015 -0.120 -0.062
3 -0.097 0.017 -0.130 -0.064
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_did_08_att_per_dollar.png" alt="Event study of treatment effects normalized by the dollar amount of the minimum wage increase, showing the employment response per dollar of additional minimum wage.">&lt;/p>
&lt;p>The dose-normalized results tell a consistent story. The on-impact effect per dollar is $-0.028$ (not quite significant at the 5% level), but the effect grows substantially with exposure: $-0.055$ after one year, $-0.091$ after two years, and $-0.097$ after three years. These per-dollar estimates imply that a \$1 increase in the minimum wage is associated with a decline of 0.055 log points in teen employment after one year (approximately 5.3%) and 0.097 log points after three years (approximately 9.2%). The post-treatment estimates from $e=1$ onward are all statistically significant. The overall ATT per dollar of $-0.030$ (SE = 0.016) averages across all post-treatment periods, but the event study makes clear that the cumulative effects are substantially larger.&lt;/p>
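&lt;p>The percentage figures quoted above follow from the exact transformation of log points into percentage changes, $100 \cdot (e^{\hat{\beta}} - 1)$:&lt;/p>
&lt;p>$$100 \cdot (e^{-0.055} - 1) \approx -5.35\%, \qquad 100 \cdot (e^{-0.097} - 1) \approx -9.24\%$$&lt;/p>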
&lt;h2 id="9-alternative-identification-strategies">9. Alternative Identification Strategies&lt;/h2>
&lt;p>The DID framework relies on the parallel trends assumption. Alternative identification strategies relax this assumption in different ways. The &lt;code>pte&lt;/code> package implements a &lt;strong>lagged outcomes&lt;/strong> strategy, which conditions on lagged outcome values rather than assuming parallel trends. Instead of assuming that treated and untreated groups would have followed the same trend, this approach assumes that controlling for the previous period&amp;rsquo;s outcome level makes treatment assignment as good as random: conditional on last year&amp;rsquo;s employment level, counties in states that raised their minimum wage are assumed comparable to counties in states that did not.&lt;/p>
&lt;pre>&lt;code class="language-r">library(pte)
data2_lo &amp;lt;- data2
data2_lo$G2 &amp;lt;- data2_lo$G
lo_res &amp;lt;- pte::pte_default(yname = &amp;quot;lemp&amp;quot;, tname = &amp;quot;year&amp;quot;, idname = &amp;quot;id&amp;quot;,
gname = &amp;quot;G2&amp;quot;, data = data2_lo,
d_outcome = FALSE, lagged_outcome_cov = TRUE)
summary(lo_res)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Overall ATT: -0.061 (SE: 0.008, 95% CI: [-0.077, -0.045])
Dynamic Effects:
Event Time Estimate Std. Error [95% Conf. Band]
-2 0.014 0.008 -0.010 0.038
-1 0.010 0.007 -0.009 0.030
0 -0.024 0.009 -0.049 0.000
1 -0.074 0.008 -0.097 -0.050 *
2 -0.129 0.019 -0.185 -0.073 *
3 -0.140 0.023 -0.206 -0.074 *
&lt;/code>&lt;/pre>
&lt;p>The lagged outcomes strategy produces an overall ATT of $-0.061$ (SE = 0.008), very close to the DID estimates with covariates ($-0.065$). The pre-trends under this alternative identification strategy are close to zero (0.014 at $e=-2$ and 0.010 at $e=-1$, both insignificant), and the post-treatment trajectory ($-0.024$ on impact, $-0.074$ at $e=1$, $-0.129$ at $e=2$, $-0.140$ at $e=3$) closely mirrors the DID event study. The convergence of results across different identification strategies strengthens the case that the estimated negative employment effects are reflecting a genuine causal relationship rather than an artifact of any particular set of assumptions.&lt;/p>
&lt;h2 id="10-discussion-and-takeaways">10. Discussion and Takeaways&lt;/h2>
&lt;p>This tutorial demonstrates why &lt;strong>TWFE regressions are unreliable&lt;/strong> with staggered treatment adoption and treatment effect heterogeneity, and how modern DID methods provide a principled alternative. The TWFE coefficient of $-0.038$ understates the true overall ATT of $-0.057$ by about one-third, with 64% of the bias coming from pre-treatment contamination and the remaining 36% from improper post-treatment weighting. The Callaway-Sant&amp;rsquo;Anna framework cleanly separates identification from estimation by first computing group-time ATTs and then aggregating them into target parameters of interest.&lt;/p>
&lt;p>The substantive findings suggest that state-level minimum wage increases above the federal floor reduced teen employment, with effects that grew over time. The doubly robust estimator with covariates yields an overall ATT of $-0.065$ (SE = 0.008), and the dose-normalized analysis finds effects of approximately $-0.055$ per dollar after one year and $-0.097$ per dollar after three years. These results are robust across estimation methods (regression adjustment, IPW, doubly robust), comparison group definitions (never-treated, not-yet-treated), and base period choices (universal, varying).&lt;/p>
&lt;p>However, the results come with important caveats. The HonestDiD sensitivity analysis shows that the on-impact effect loses statistical significance when post-treatment parallel trends violations exceed about 67% of the pre-treatment deviations. The pre-treatment coefficient at $e=-3$ is moderately significant in the unconditional analysis, though it shrinks after covariate adjustment. These patterns suggest that while the evidence points toward negative employment effects, the magnitude should be interpreted with some caution. As Callaway (2022) notes, this application is primarily intended to illustrate the methodology rather than to settle the minimum wage debate.&lt;/p>
&lt;p>The modern DID toolkit demonstrated here &amp;mdash; &lt;code>did&lt;/code> for group-time ATTs, &lt;code>twfeweights&lt;/code> for diagnosing TWFE problems, &lt;code>HonestDiD&lt;/code> for sensitivity analysis, and &lt;code>DRDID&lt;/code> for doubly robust estimation &amp;mdash; provides applied researchers with a complete workflow for credible causal inference in staggered treatment settings. The key lesson is that DID is not just a regression &amp;mdash; it is an identification strategy that requires careful attention to the structure of the treatment, the comparison group, and the plausibility of the underlying assumptions.&lt;/p>
&lt;p>&lt;strong>Key takeaways:&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>TWFE understates the true ATT by ~33% ($-0.038$ vs $-0.057$), with 64% of the bias from pre-treatment contamination and 36% from improper post-treatment weighting&lt;/li>
&lt;li>The doubly robust ATT of $-0.065$ is stable across estimation methods (regression, IPW, DR), comparison groups (never-treated, not-yet-treated), and base periods (universal, varying)&lt;/li>
&lt;li>Employment effects accumulate over time: $-0.027$ on impact, growing to $-0.147$ after three years under the doubly robust specification&lt;/li>
&lt;li>The on-impact effect is robust to parallel trends violations up to 67% of pre-trend magnitude ($\bar{M} \approx 0.67$), but not beyond&lt;/li>
&lt;li>Per-dollar normalization reveals that a \$1 minimum wage increase reduces teen employment by approximately 5.3% after one year and 9.2% after three years&lt;/li>
&lt;/ol>
&lt;h2 id="11-exercises">11. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Expand the sample:&lt;/strong> Re-run the analysis using &lt;code>data3&lt;/code> (which includes the G=2007 group) and compare the results. Does including the additional treatment group change the overall ATT or the event study pattern?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Alternative covariates:&lt;/strong> Experiment with different covariate specifications in the doubly robust estimator. What happens if you include only &lt;code>lpop&lt;/code>? Only &lt;code>lavg_pay&lt;/code>? Does the choice of covariates meaningfully affect the pre-trends?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Smoothness sensitivity:&lt;/strong> Run the HonestDiD smoothness-based sensitivity analysis (&lt;code>type = &amp;quot;smoothness&amp;quot;&lt;/code>) in addition to the relative magnitude analysis. How do the two approaches compare in terms of the robustness of the results?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="12-references">12. References&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>Callaway, B. (2022). Difference-in-Differences for Policy Evaluation. In &lt;em>Handbook of Labor, Human Resources, and Population Economics&lt;/em>. Springer. &lt;a href="https://link.springer.com/referenceworkentry/10.1007/978-3-319-57365-6_352-1" target="_blank" rel="noopener">Published version&lt;/a> | &lt;a href="https://bcallaway11.github.io/files/Callaway-Chapter-2022/main.pdf" target="_blank" rel="noopener">Working paper&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Callaway, B. and Sant&amp;rsquo;Anna, P.H.C. (2021). Difference-in-Differences with Multiple Time Periods. &lt;em>Journal of Econometrics&lt;/em>, 225(2), 200&amp;ndash;230. &lt;a href="https://doi.org/10.1016/j.jeconom.2020.12.001" target="_blank" rel="noopener">doi:10.1016/j.jeconom.2020.12.001&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. &lt;em>Journal of Econometrics&lt;/em>, 225(2), 254&amp;ndash;277. &lt;a href="https://doi.org/10.1016/j.jeconom.2021.03.014" target="_blank" rel="noopener">doi:10.1016/j.jeconom.2021.03.014&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Rambachan, A. and Roth, J. (2023). A More Credible Approach to Parallel Trends. &lt;em>Review of Economic Studies&lt;/em>, 90(5), 2555&amp;ndash;2591. &lt;a href="https://doi.org/10.1093/restud/rdad018" target="_blank" rel="noopener">doi:10.1093/restud/rdad018&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>de Chaisemartin, C. and D&amp;rsquo;Haultfoeuille, X. (2020). Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects. &lt;em>American Economic Review&lt;/em>, 110(9), 2964&amp;ndash;2996.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Sun, L. and Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. &lt;em>Journal of Econometrics&lt;/em>, 225(2), 175&amp;ndash;199.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>did&lt;/code> package: &lt;a href="https://cran.r-project.org/package=did" target="_blank" rel="noopener">CRAN&lt;/a> | &lt;a href="https://github.com/bcallaway11/did" target="_blank" rel="noopener">GitHub&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>fixest&lt;/code> package: &lt;a href="https://cran.r-project.org/package=fixest" target="_blank" rel="noopener">CRAN&lt;/a> | &lt;a href="https://lrberge.github.io/fixest/" target="_blank" rel="noopener">Documentation&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>twfeweights&lt;/code> package: &lt;a href="https://github.com/bcallaway11/twfeweights" target="_blank" rel="noopener">GitHub&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>HonestDiD&lt;/code> package: &lt;a href="https://cran.r-project.org/package=HonestDiD" target="_blank" rel="noopener">CRAN&lt;/a> | &lt;a href="https://github.com/asheshrambachan/HonestDiD" target="_blank" rel="noopener">GitHub&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol></description></item><item><title>Evaluating a Cash Transfer Program (RCT) with Panel Data in Stata</title><link>https://carlos-mendez.org/post/stata_rct/</link><pubDate>Tue, 24 Mar 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/stata_rct/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Cash transfer programs are among the most common development interventions worldwide. Governments and international organizations spend billions of dollars each year providing direct cash transfers to low-income households. But how do we rigorously evaluate whether these programs actually work? This tutorial walks through the complete workflow of analyzing a &lt;strong>randomized controlled trial (RCT)&lt;/strong> with &lt;strong>panel data&lt;/strong> in Stata &amp;mdash; from verifying that randomization succeeded, to estimating treatment effects using increasingly sophisticated methods, to comparing results across all approaches.&lt;/p>
&lt;p>We use simulated data from a hypothetical cash transfer program targeting 2,000 households in a developing country. The key advantage of simulated data is that we know the &lt;strong>true treatment effect&lt;/strong> before we begin: the program increases household consumption by &lt;strong>12%&lt;/strong> (0.12 log points). This known ground truth gives us a perfect benchmark to evaluate how well each econometric method recovers the correct answer.&lt;/p>
&lt;p>The tutorial progresses from simple to sophisticated. We start with basic balance checks, then estimate treatment effects three different ways using only endline data &amp;mdash; regression adjustment (RA), inverse probability weighting (IPW), and doubly robust (DR) methods. Next, we unlock the full power of panel data with difference-in-differences (DiD) and its doubly robust extension (DRDID). Finally, we address the real-world complication of imperfect compliance.&lt;/p>
&lt;h3 id="learning-objectives">Learning objectives&lt;/h3>
&lt;ul>
&lt;li>Verify baseline balance using t-tests, standardized mean differences, and balance plots&lt;/li>
&lt;li>Distinguish between ATE and ATT and identify which estimand each method targets&lt;/li>
&lt;li>Understand three estimation strategies &amp;mdash; regression adjustment, inverse probability weighting, and doubly robust &amp;mdash; and when to use each&lt;/li>
&lt;li>Estimate treatment effects using all three approaches and compare their results&lt;/li>
&lt;li>Leverage panel data structure with difference-in-differences and understand why DiD estimates ATT&lt;/li>
&lt;li>Apply doubly robust difference-in-differences (DRDID) for modern panel data analysis&lt;/li>
&lt;li>Separate the effect of treatment offer from treatment receipt under imperfect compliance&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="2-study-design">2. Study design&lt;/h2>
&lt;p>This RCT evaluates a cash transfer program designed to boost household consumption. The study tracks 2,000 households across two survey waves &amp;mdash; a &lt;strong>baseline&lt;/strong> in 2021 (before the program) and an &lt;strong>endline&lt;/strong> in 2024 (after the program was implemented). The diagram below summarizes the experimental design.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
POP[&amp;quot;&amp;lt;b&amp;gt;2,000 Households&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Balanced panel&amp;lt;br/&amp;gt;(observed in 2021 and 2024)&amp;quot;]
STRAT[&amp;quot;&amp;lt;b&amp;gt;Stratified Randomization&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Within poverty strata&amp;quot;]
TRT[&amp;quot;&amp;lt;b&amp;gt;Treatment Group&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(~1,000 households)&amp;lt;br/&amp;gt;Offered cash transfer&amp;quot;]
CTL[&amp;quot;&amp;lt;b&amp;gt;Control Group&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(~1,000 households)&amp;lt;br/&amp;gt;No offer&amp;quot;]
COMP1[&amp;quot;85% receive&amp;lt;br/&amp;gt;the transfer&amp;quot;]
COMP2[&amp;quot;15% do not&amp;lt;br/&amp;gt;receive&amp;quot;]
COMP3[&amp;quot;5% receive&amp;lt;br/&amp;gt;the transfer&amp;quot;]
COMP4[&amp;quot;95% do not&amp;lt;br/&amp;gt;receive&amp;quot;]
BASE[&amp;quot;&amp;lt;b&amp;gt;Baseline 2021&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Pre-treatment survey&amp;quot;]
END[&amp;quot;&amp;lt;b&amp;gt;Endline 2024&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Post-treatment survey&amp;quot;]
POP --&amp;gt; BASE
BASE --&amp;gt; STRAT
STRAT --&amp;gt; TRT
STRAT --&amp;gt; CTL
TRT --&amp;gt; COMP1
TRT --&amp;gt; COMP2
CTL --&amp;gt; COMP3
CTL --&amp;gt; COMP4
COMP1 --&amp;gt; END
COMP2 --&amp;gt; END
COMP3 --&amp;gt; END
COMP4 --&amp;gt; END
style POP fill:#6a9bcc,stroke:#141413,color:#fff
style STRAT fill:#d97757,stroke:#141413,color:#fff
style TRT fill:#00d4c8,stroke:#141413,color:#141413
style CTL fill:#6a9bcc,stroke:#141413,color:#fff
style BASE fill:#6a9bcc,stroke:#141413,color:#fff
style END fill:#d97757,stroke:#141413,color:#fff
style COMP1 fill:#00d4c8,stroke:#141413,color:#141413
style COMP2 fill:#141413,stroke:#d97757,color:#fff
style COMP3 fill:#d97757,stroke:#141413,color:#fff
style COMP4 fill:#141413,stroke:#6a9bcc,color:#fff
&lt;/code>&lt;/pre>
&lt;p>The randomization was &lt;strong>stratified by poverty status&lt;/strong> (block randomization), ensuring that treatment and control groups started with similar proportions of poor and non-poor households. A critical real-world feature of this study is &lt;strong>imperfect compliance&lt;/strong> &amp;mdash; only 85% of households offered the treatment actually received the cash transfer, while 5% of control households received it through other channels.&lt;/p>
&lt;h3 id="variables">Variables&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Type&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>id&lt;/code>&lt;/td>
&lt;td>Household identifier&lt;/td>
&lt;td>Panel ID&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>year&lt;/code>&lt;/td>
&lt;td>Survey year (2021 or 2024)&lt;/td>
&lt;td>Time variable&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>post&lt;/code>&lt;/td>
&lt;td>Endline indicator (1 = 2024)&lt;/td>
&lt;td>Binary&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>treat&lt;/code>&lt;/td>
&lt;td>Random assignment to offer (intent-to-treat)&lt;/td>
&lt;td>Binary&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>D&lt;/code>&lt;/td>
&lt;td>Actual receipt of cash transfer&lt;/td>
&lt;td>Binary (endogenous)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>y&lt;/code>&lt;/td>
&lt;td>Log monthly consumption&lt;/td>
&lt;td>Continuous (outcome)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>age&lt;/code>&lt;/td>
&lt;td>Age of household head&lt;/td>
&lt;td>Continuous&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>female&lt;/code>&lt;/td>
&lt;td>Female-headed household&lt;/td>
&lt;td>Binary&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>poverty&lt;/code>&lt;/td>
&lt;td>Poverty status at baseline&lt;/td>
&lt;td>Binary&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>edu&lt;/code>&lt;/td>
&lt;td>Years of education&lt;/td>
&lt;td>Continuous&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>y0&lt;/code>&lt;/td>
&lt;td>Log monthly consumption at baseline (pre-treatment)&lt;/td>
&lt;td>Continuous&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;blockquote>
&lt;p>&lt;strong>Offer vs. receipt&lt;/strong> &amp;mdash; The variable &lt;code>treat&lt;/code> captures random assignment to the program offer. It is exogenous (determined by randomization) and unrelated to household characteristics. The variable &lt;code>D&lt;/code> captures actual receipt of the cash transfer. It is &lt;strong>endogenous&lt;/strong> &amp;mdash; households that chose to take up the program may differ systematically from those that did not. Most methods in this tutorial estimate the effect of the &lt;strong>offer&lt;/strong> (intent-to-treat). Section 10 addresses the effect of &lt;strong>receipt&lt;/strong>.&lt;/p>
&lt;/blockquote>
&lt;hr>
&lt;h2 id="3-analytical-roadmap">3. Analytical roadmap&lt;/h2>
&lt;p>The diagram below shows the progression of methods we will use. Each stage builds on the previous one, adding complexity and robustness.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
A[&amp;quot;&amp;lt;b&amp;gt;Balance&amp;lt;br/&amp;gt;Checks&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Section 5&amp;lt;/i&amp;gt;&amp;quot;]
B[&amp;quot;&amp;lt;b&amp;gt;Cross-sectional&amp;lt;br/&amp;gt;RA / IPW / DR&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Sections 7-8&amp;lt;/i&amp;gt;&amp;quot;]
C[&amp;quot;&amp;lt;b&amp;gt;Panel Data&amp;lt;br/&amp;gt;DiD / DR-DiD&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Section 9&amp;lt;/i&amp;gt;&amp;quot;]
D[&amp;quot;&amp;lt;b&amp;gt;Endogenous&amp;lt;br/&amp;gt;Treatment&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Section 10&amp;lt;/i&amp;gt;&amp;quot;]
A --&amp;gt; B
B --&amp;gt; C
C --&amp;gt; D
style A fill:#6a9bcc,stroke:#141413,color:#fff
style B fill:#d97757,stroke:#141413,color:#fff
style C fill:#00d4c8,stroke:#141413,color:#141413
style D fill:#141413,stroke:#d97757,color:#fff
&lt;/code>&lt;/pre>
&lt;p>We first establish that randomization worked (balance checks). Then we estimate treatment effects three ways using only endline data &amp;mdash; regression adjustment, inverse probability weighting, and doubly robust methods. Next, we leverage the full panel structure with difference-in-differences. Finally, we address imperfect compliance by separating the effect of the offer from the effect of receipt.&lt;/p>
&lt;hr>
&lt;h2 id="4-data-loading-and-exploration">4. Data loading and exploration&lt;/h2>
&lt;p>We begin by loading the simulated dataset from a public GitHub repository and examining its structure.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
des y age edu female poverty treat D
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Contains data
Observations: 4,000
Variables: 10
Variable Storage Display Value
name type format label Variable label
─────────────────────────────────────────────────────────────
y float %9.0g Log monthly consumption
age float %9.0g
edu float %9.0g
female float %9.0g
poverty float %9.0g
treat float %9.0g Assignment to offer (Z)
D float %9.0g Receipt of cash transfer
&lt;/code>&lt;/pre>
&lt;p>The dataset contains 4,000 observations &amp;mdash; 2,000 households observed at two time points (baseline 2021 and endline 2024). The outcome variable &lt;code>y&lt;/code> is log monthly consumption, &lt;code>treat&lt;/code> is the random assignment indicator, and &lt;code>D&lt;/code> is the actual receipt indicator.&lt;/p>
&lt;p>Now let us examine summary statistics at baseline and endline separately.&lt;/p>
&lt;pre>&lt;code class="language-stata">sum y age edu female poverty treat D if post==0
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Variable | Obs Mean Std. dev. Min Max
─────────────+─────────────────────────────────────────────────────────
y | 2,000 10.0154 .4348886 8.454445 11.48253
age | 2,000 35.126 9.650839 18 68
edu | 2,000 12.0275 1.9889 6 18
female | 2,000 .5085 .5000528 0 1
poverty | 2,000 .3125 .4636283 0 1
treat | 2,000 .518 .4998009 0 1
D | 2,000 0 0 0 0
&lt;/code>&lt;/pre>
&lt;p>At baseline, mean log consumption is approximately 10.02, the average household head is 35 years old with 12 years of education, about 51% of households are female-headed, and 31% are in poverty. Treatment assignment (&lt;code>treat&lt;/code>) averages about 52%, close to the 50/50 split expected from randomization. Crucially, the receipt variable &lt;code>D&lt;/code> is zero for all households at baseline &amp;mdash; the program had not yet been implemented.&lt;/p>
&lt;pre>&lt;code class="language-stata">sum y age edu female poverty treat D if post==1
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Variable | Obs Mean Std. dev. Min Max
─────────────+─────────────────────────────────────────────────────────
y | 2,000 10.1137 .4382183 8.638689 11.55002
age | 2,000 35.126 9.650839 18 68
edu | 2,000 12.0275 1.9889 6 18
female | 2,000 .5085 .5000528 0 1
poverty | 2,000 .3125 .4636283 0 1
treat | 2,000 .518 .4998009 0 1
D | 2,000 .4615 .4986402 0 1
&lt;/code>&lt;/pre>
&lt;p>At endline, mean consumption has risen to approximately 10.11, reflecting both the natural time trend and the treatment effect. The receipt variable &lt;code>D&lt;/code> is now non-zero &amp;mdash; about 46% of all households received the cash transfer (combining treated households who took up the program and control households who received it through other channels).&lt;/p>
&lt;p>Finally, we declare the panel structure so Stata knows we have repeated observations.&lt;/p>
&lt;pre>&lt;code class="language-stata">xtset id year
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Panel variable: id (strongly balanced)
Time variable: year, 2021 to 2024, but with gaps
Delta: 1 unit
&lt;/code>&lt;/pre>
&lt;p>The panel is &lt;strong>strongly balanced&lt;/strong> &amp;mdash; all 2,000 households appear in both survey waves, with no attrition. This is an ideal scenario that simplifies our analysis.&lt;/p>
&lt;hr>
&lt;h2 id="5-baseline-balance-checks">5. Baseline balance checks&lt;/h2>
&lt;p>Before estimating any treatment effects, we must verify that randomization produced comparable treatment and control groups at baseline. This is the most fundamental quality check in any RCT.&lt;/p>
&lt;h3 id="51-t-tests-and-proportion-tests">5.1 T-tests and proportion tests&lt;/h3>
&lt;p>We compare the treatment and control groups on all baseline characteristics using two-sample t-tests for continuous variables and proportion tests for binary variables.&lt;/p>
&lt;pre>&lt;code class="language-stata">ttest y if post==0, by(treat)
ttest age if post==0, by(treat)
ttest edu if post==0, by(treat)
prtest female if post==0, by(treat)
prtest poverty if post==0, by(treat)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Variable | Control Mean Treat Mean Diff p-value
────────────+──────────────────────────────────────────────
y | 10.025 10.006 0.019 0.330
age | 35.335 34.931 0.404 0.350
edu | 11.974 12.077 -0.103 0.247
female | 0.484 0.531 -0.046 0.038 **
poverty | 0.307 0.318 -0.011 0.612
&lt;/code>&lt;/pre>
&lt;p>Most variables show no statistically significant differences between the treatment and control groups. However, the variable &lt;code>female&lt;/code> has a p-value of 0.038 &amp;mdash; a statistically significant imbalance. The treatment group has about 4.6 percentage points more female-headed households than the control group. This imbalance occurred purely by chance but must be addressed in our estimation.&lt;/p>
&lt;h3 id="52-balance-table-with-standardized-mean-differences">5.2 Balance table with standardized mean differences&lt;/h3>
&lt;p>P-values are sensitive to sample size &amp;mdash; a large sample can make tiny differences &amp;ldquo;significant.&amp;rdquo; Standardized mean differences (SMDs) provide a scale-free measure of imbalance that is more informative. The SMD is computed as the difference in group means divided by the pooled standard deviation &amp;mdash; this puts all variables on the same scale regardless of their units. The common rule of thumb is that SMDs below 10% indicate adequate balance.&lt;/p>
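&lt;p>Because the formula is so simple, the SMD is easy to verify by hand. The sketch below (plain Python rather than Stata, using the female-headed shares from the t-test table above; treat the exact numbers as illustrative) computes the SMD for a binary covariate:&lt;/p>
&lt;pre>&lt;code class="language-python">import math

def smd(mean_t, mean_c, sd_t, sd_c):
    # Standardized mean difference: difference in group means
    # divided by the pooled standard deviation.
    pooled_sd = math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)
    return (mean_t - mean_c) / pooled_sd

# For a binary covariate, sd = sqrt(p * (1 - p)).
p_t, p_c = 0.531, 0.484  # female-headed shares in treatment and control
sd_t = math.sqrt(p_t * (1 - p_t))
sd_c = math.sqrt(p_c * (1 - p_c))

print(round(smd(p_t, p_c, sd_t, sd_c), 3))  # 0.094, just below the 10% threshold
&lt;/code>&lt;/pre>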
&lt;pre>&lt;code class="language-stata">capture ssc install ietoolkit, replace
iebaltab y age edu female poverty if post==0, grpvar(treat)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> (1) (2) (1)-(2)
Control Treatment Difference
y 10.025 10.006 0.019
(0.014) (0.014) (0.019)
age 35.335 34.931 0.404
(0.316) (0.295) (0.432)
edu 11.974 12.077 -0.103
(0.063) (0.063) (0.089)
female 0.484 0.531 -0.046**
(0.016) (0.016) (0.022)
poverty 0.307 0.318 -0.011
(0.015) (0.014) (0.021)
N 964 1,036
&lt;/code>&lt;/pre>
&lt;p>The balance table confirms our t-test findings. With 964 control and 1,036 treatment households, all variables are well balanced except &lt;code>female&lt;/code>, which shows a statistically significant difference (marked with **). The outcome variable &lt;code>y&lt;/code> has a negligible difference of 0.019 at baseline &amp;mdash; the groups started with essentially identical consumption levels.&lt;/p>
&lt;h3 id="53-visual-balance-plot">5.3 Visual balance plot&lt;/h3>
&lt;p>A balance plot provides a visual overview of all SMDs at once, making it easy to spot problematic variables.&lt;/p>
&lt;pre>&lt;code class="language-stata">net install balanceplot, from(&amp;quot;https://tdmize.github.io/data&amp;quot;) replace
balanceplot y age edu i.female i.poverty, group(treat) table nodropdv
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_rct_balance_plot.png" alt="Balance plot showing standardized mean differences for all covariates. All variables fall within the 10% threshold, with female closest at approximately 9.3%.">&lt;/p>
&lt;p>The balance plot shows that all SMDs fall below the 10% threshold (indicated by the dashed vertical lines). The variable &lt;code>female&lt;/code> has the largest SMD at approximately 9.3% &amp;mdash; close to but still below the conventional threshold. The remaining variables &amp;mdash; consumption, age, education, and poverty &amp;mdash; all have SMDs well below 5%. Overall, randomization was successful, but we should control for &lt;code>female&lt;/code> (and other covariates) in our estimation to improve precision.&lt;/p>
&lt;h3 id="54-aipw-as-a-formal-balance-test">5.4 AIPW as a formal balance test&lt;/h3>
&lt;p>As a final and more formal balance check, we can use the Augmented Inverse Probability Weighting (AIPW) estimator on &lt;strong>baseline data only&lt;/strong>. If randomization was successful, the estimated &amp;ldquo;treatment effect&amp;rdquo; at baseline should be zero &amp;mdash; since the program had not yet been implemented, there should be no difference between groups.&lt;/p>
&lt;pre>&lt;code class="language-stata">preserve
keep if post==0
teffects aipw (y age edu i.female i.poverty) (treat age edu i.female i.poverty)
&lt;/code>&lt;/pre>
&lt;blockquote>
&lt;p>&lt;strong>Tip:&lt;/strong> The &lt;code>preserve&lt;/code> command saves a snapshot of the current data. After the balance analysis, use &lt;code>restore&lt;/code> to return to the full dataset. The companion do-file handles this automatically.&lt;/p>
&lt;/blockquote>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : augmented IPW
Outcome model : linear
Treatment model: logit
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATE |
treat |
(1 vs 0) | -.0244086 .018861 -1.29 0.196 -.0613754 .0125582
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.02792 .0138363 724.75 0.000 10.0008 10.05504
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The AIPW-estimated &amp;ldquo;ATE&amp;rdquo; at baseline is -0.024 with a p-value of 0.196 &amp;mdash; not statistically significant. This confirms that there is no detectable pre-treatment difference between the groups after adjusting for covariates. The treatment and control groups were statistically comparable before the program began.&lt;/p>
&lt;p>Now we run the diagnostic checks for the AIPW model.&lt;/p>
&lt;pre>&lt;code class="language-stata">tebalance overid
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Overidentification test for covariate balance
H0: Covariates are balanced
chi2(5) = 3.216
Prob &amp;gt; chi2 = 0.6670
&lt;/code>&lt;/pre>
&lt;p>The overidentification test fails to reject the null hypothesis of covariate balance (p = 0.667). There is no statistical evidence of residual imbalance after weighting.&lt;/p>
&lt;pre>&lt;code class="language-stata">tebalance summarize
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> |Standardized differences Variance ratio
| Raw Weighted Raw Weighted
----------------+------------------------------------------------
age | -.0417918 .0002505 .9318894 .9446877
edu | .0519015 -6.96e-06 1.071677 1.078214
female |
1 | .0929611 6.51e-06 .9970775 .9999996
poverty |
1 | .0226764 .0002864 1.018475 1.000233
&lt;/code>&lt;/pre>
&lt;p>The balance summary reveals that the raw standardized differences (before weighting) show the &lt;code>female&lt;/code> imbalance at 0.093, consistent with our earlier findings. After weighting, all standardized differences shrink to near zero (all below 0.001) &amp;mdash; excellent balance. The variance ratios are all close to 1.0, indicating similar spread across groups.&lt;/p>
&lt;pre>&lt;code class="language-stata">tebalance density y
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_rct_density_y.png" alt="Density plot showing the distribution of log consumption for treatment and control groups, before and after AIPW weighting. The weighted distributions overlap almost perfectly.">&lt;/p>
&lt;p>The density plot confirms that after AIPW weighting, the distributions of log consumption in the treatment and control groups overlap almost perfectly. Any small pre-existing differences in the outcome variable have been eliminated by the weighting scheme.&lt;/p>
&lt;pre>&lt;code class="language-stata">teffects overlap
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_rct_overlap_baseline.png" alt="Overlap plot showing kernel densities of estimated propensity scores for treatment and control groups. Both distributions span approximately 0.43 to 0.55 with substantial overlap.">&lt;/p>
&lt;p>The overlap plot shows that propensity scores for both groups are concentrated between approximately 0.43 and 0.55 &amp;mdash; well within the range where matching and weighting are feasible. There are no extreme propensity scores near 0 or 1, confirming that the common support condition is satisfied. This is expected in a well-designed RCT where treatment probability is approximately 0.50 for all households.&lt;/p>
&lt;pre>&lt;code class="language-stata">restore
&lt;/code>&lt;/pre>
&lt;p>This AIPW-based balance analysis also serves a pedagogical purpose: it introduces the concept of &lt;strong>doubly robust&lt;/strong> estimation before we use it for treatment effect estimation in Section 8.&lt;/p>
&lt;hr>
&lt;h2 id="6-what-are-we-estimating-ate-vs-att">6. What are we estimating? ATE vs. ATT&lt;/h2>
&lt;p>Before diving into estimation, we need to be precise about &lt;strong>what&lt;/strong> we are trying to estimate. There are two fundamental causal quantities in program evaluation.&lt;/p>
&lt;p>The &lt;strong>Average Treatment Effect (ATE)&lt;/strong> answers the policymaker&amp;rsquo;s question: &lt;em>&amp;ldquo;What would happen if we scaled this program to the entire population?&amp;rdquo;&lt;/em>&lt;/p>
&lt;p>$$ATE = E[Y(1) - Y(0)]$$&lt;/p>
&lt;p>where $Y(1)$ is the potential outcome under treatment and $Y(0)$ is the potential outcome under control, averaged over the &lt;strong>entire population&lt;/strong> (both treated and untreated).&lt;/p>
&lt;p>The &lt;strong>Average Treatment Effect on the Treated (ATT)&lt;/strong> answers the evaluator&amp;rsquo;s question: &lt;em>&amp;ldquo;Did the program benefit those who were assigned to it?&amp;rdquo;&lt;/em>&lt;/p>
&lt;p>$$ATT = E[Y(1) - Y(0) \mid T = 1]$$&lt;/p>
&lt;p>This averages the treatment effect only over the &lt;strong>treated group&lt;/strong> &amp;mdash; the households that were assigned to receive the cash transfer.&lt;/p>
&lt;p>In a well-designed RCT with &lt;strong>homogeneous treatment effects&lt;/strong> (the program affects everyone equally), ATE and ATT are the same. But when treatment effects are &lt;strong>heterogeneous&lt;/strong> (the program benefits some households more than others), they can differ. For example, if poorer households benefit more from cash transfers and the treatment group has a higher share of poor households, the ATT could be larger than the ATE.&lt;/p>
&lt;p>Understanding this distinction is critical because different methods target different estimands. Cross-sectional methods (RA, IPW, DR) can estimate &lt;strong>either&lt;/strong> ATE or ATT. Difference-in-differences inherently estimates the &lt;strong>ATT only&lt;/strong>. We will return to this point in Section 9.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note on RCTs&lt;/strong> &amp;mdash; In a randomized experiment, treatment assignment is independent of potential outcomes. This means that simple comparisons between treatment and control groups are already unbiased estimates of the ATE. When we add covariates (regression adjustment, IPW, doubly robust), we are not removing bias &amp;mdash; we are &lt;strong>improving precision&lt;/strong> by accounting for residual variation. This is different from observational studies, where covariate adjustment is needed to address confounding.&lt;/p>
&lt;/blockquote>
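&lt;p>To make the ATE/ATT distinction concrete, consider a toy calculation (hypothetical effect sizes, not estimates from this tutorial&amp;rsquo;s data): suppose poor households gain 0.20 log points from the transfer, non-poor households gain 0.05, and the treated group happens to contain a higher share of poor households.&lt;/p>
&lt;pre>&lt;code class="language-python"># Toy population: 100 poor and 100 non-poor households.
# Hypothetical individual effects: 0.20 if poor, 0.05 if not.
effects = [0.20] * 100 + [0.05] * 100

# ATE averages over the entire population.
ate = sum(effects) / len(effects)

# Suppose the treated group contains 70 poor and 30 non-poor households;
# the ATT averages only over that group.
treated_effects = [0.20] * 70 + [0.05] * 30
att = sum(treated_effects) / len(treated_effects)

print(round(ate, 3), round(att, 3))  # 0.125 0.155
&lt;/code>&lt;/pre>
&lt;p>Here the ATT (0.155) exceeds the ATE (0.125) purely because of who ended up in the treated group &amp;mdash; exactly the heterogeneity scenario described above.&lt;/p>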
&lt;hr>
&lt;h2 id="7-three-strategies-for-causal-estimation">7. Three strategies for causal estimation&lt;/h2>
&lt;p>We now understand &lt;em>what&lt;/em> we want to estimate (ATE and ATT from Section 6). The question becomes &lt;em>how&lt;/em> to estimate it. Three families of methods exist, each taking a fundamentally different approach to solving the missing-data problem at the heart of causal inference. Each method models a different part of the data-generating process, and understanding these differences is essential for interpreting results and choosing the right tool.&lt;/p>
&lt;h3 id="71-regression-adjustment-ra-----modeling-the-outcome">7.1 Regression Adjustment (RA) &amp;mdash; modeling the outcome&lt;/h3>
&lt;p>Regression adjustment solves the missing-data problem by &lt;strong>predicting the unobserved potential outcomes&lt;/strong>. It fits separate regression models for treated and untreated groups. For each household, it uses these models to predict two potential outcomes: what consumption would be if treated, $\hat{\mu}_1(X_i)$, and what consumption would be if untreated, $\hat{\mu}_0(X_i)$. Since we only observe one of these for each household, the model fills in the missing counterfactual. The treatment effect for each household is the difference between the two predictions, and the ATE is the average across all households.&lt;/p>
&lt;p>The Stata documentation describes this succinctly: &lt;em>&amp;ldquo;RA estimators use means of predicted outcomes for each treatment level to estimate each POM. ATEs and ATETs are differences in estimated POMs.&amp;rdquo;&lt;/em>&lt;/p>
&lt;p>&lt;strong>Analogy &amp;mdash; predicting exam scores.&lt;/strong> Imagine two study methods (A and B) being tested on students. You observe each student using only one method. RA fits a model predicting test scores based on student characteristics (prior GPA, hours studied) separately for method-A and method-B users. Then, for &lt;em>every&lt;/em> student, it predicts what their score would have been under &lt;em>both&lt;/em> methods &amp;mdash; even the one they did not use. The average difference in predicted scores is the treatment effect.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
DATA[&amp;quot;&amp;lt;b&amp;gt;Observed Data&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Each household observed&amp;lt;br/&amp;gt;under ONE treatment only&amp;quot;]
M0[&amp;quot;&amp;lt;b&amp;gt;Fit outcome model&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;using control group&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Y = f(age, edu, female, poverty)&amp;lt;/i&amp;gt;&amp;quot;]
M1[&amp;quot;&amp;lt;b&amp;gt;Fit outcome model&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;using treated group&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Y = f(age, edu, female, poverty)&amp;lt;/i&amp;gt;&amp;quot;]
P0[&amp;quot;Predict &amp;lt;b&amp;gt;Ŷ₀&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;for ALL households&amp;quot;]
P1[&amp;quot;Predict &amp;lt;b&amp;gt;Ŷ₁&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;for ALL households&amp;quot;]
ATE[&amp;quot;&amp;lt;b&amp;gt;ATE&amp;lt;/b&amp;gt; = Average of&amp;lt;br/&amp;gt;(Ŷ₁ − Ŷ₀)&amp;quot;]
DATA --&amp;gt; M0
DATA --&amp;gt; M1
M0 --&amp;gt; P0
M1 --&amp;gt; P1
P0 --&amp;gt; ATE
P1 --&amp;gt; ATE
style DATA fill:#141413,stroke:#6a9bcc,color:#fff
style M0 fill:#6a9bcc,stroke:#141413,color:#fff
style M1 fill:#6a9bcc,stroke:#141413,color:#fff
style P0 fill:#6a9bcc,stroke:#141413,color:#fff
style P1 fill:#6a9bcc,stroke:#141413,color:#fff
style ATE fill:#6a9bcc,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>The RA estimator.&lt;/strong> Formally, the ATE under regression adjustment is:&lt;/p>
&lt;p>$$\hat{\tau}_{RA}^{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) \right]$$&lt;/p>
&lt;p>where $\hat{\mu}_1(X)$ is the predicted outcome under treatment (fitted from treated observations) and $\hat{\mu}_0(X)$ is the predicted outcome under control (fitted from untreated observations), both evaluated at each household&amp;rsquo;s covariates $X_i$. In plain language: for each household, the model predicts what their consumption would be if they received the cash transfer and what it would be if they did not. The difference is the household&amp;rsquo;s estimated treatment effect. Averaging these across all $N$ households gives the ATE.&lt;/p>
&lt;p>For the ATT, we restrict the average to treated units only:&lt;/p>
&lt;p>$$\hat{\tau}_{RA}^{ATT} = \frac{1}{N_1} \sum_{i: T_i = 1} \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) \right]$$&lt;/p>
&lt;p>where $N_1$ is the number of treated households.&lt;/p>
&lt;p>&lt;strong>Mini example from our data.&lt;/strong> Consider Household A: a 40-year-old female in poverty with 10 years of education. The treated outcome model predicts her consumption at 10.17 log points. The untreated outcome model predicts 10.05. Her estimated individual treatment effect is $10.17 - 10.05 = 0.12$. Averaging such predictions over all 2,000 endline households gives the ATE.&lt;/p>
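&lt;p>The RA recipe &amp;mdash; fit an outcome model per arm, predict both potential outcomes for everyone, average the difference &amp;mdash; can be sketched in a few lines. The toy example below (plain Python, made-up numbers, a single binary covariate) uses cell means as a saturated outcome model:&lt;/p>
&lt;pre>&lt;code class="language-python"># Each record: (treated, poor, outcome in log points). Made-up values.
data = [
    (1, 1, 10.15), (1, 1, 10.25), (1, 0, 10.40), (1, 0, 10.30),
    (0, 1, 10.00), (0, 1, 9.90), (0, 0, 10.30), (0, 0, 10.20),
]

def mu(t, x):
    # Outcome model for arm t: with one binary covariate, the cell mean
    # is a saturated regression of y on x within that arm.
    ys = [y for (ti, xi, y) in data if ti == t and xi == x]
    return sum(ys) / len(ys)

# Predict both potential outcomes for every household, then average.
effects = [mu(1, x) - mu(0, x) for (t, x, y) in data]
ate = sum(effects) / len(effects)
print(round(ate, 3))  # 0.175
&lt;/code>&lt;/pre>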
&lt;p>&lt;strong>Stata implementation.&lt;/strong> The &lt;code>teffects ra&lt;/code> command fits linear outcome models by default. The first parenthesis specifies the outcome model (outcome variable + covariates), and the second specifies the treatment variable: &lt;code>teffects ra (y c.age c.edu i.female i.poverty) (treat), ate&lt;/code>.&lt;/p>
&lt;p>&lt;strong>What can go wrong &amp;mdash; model misspecification.&lt;/strong> RA&amp;rsquo;s Achilles heel is that it relies entirely on the outcome model being correctly specified. If consumption depends on age nonlinearly (for example, a U-shaped relationship), but we assume a linear model, the predictions $\hat{\mu}_1$ and $\hat{\mu}_0$ will be systematically wrong, biasing the ATE. As the Stata manual notes, RA works well when the outcome model is correct, but &amp;ldquo;relying on a correctly specified outcome model with little data is extremely risky.&amp;rdquo; RA gives the right answer &lt;strong>only if the outcome model is correct&lt;/strong>. If it is wrong, the ATE estimate can be biased even with infinite data.&lt;/p>
&lt;p>What if we are unsure about the functional form of the outcome model? Is there an approach that avoids modeling the outcome entirely?&lt;/p>
&lt;h3 id="72-inverse-probability-weighting-ipw-----modeling-the-treatment-assignment">7.2 Inverse Probability Weighting (IPW) &amp;mdash; modeling the treatment assignment&lt;/h3>
&lt;p>IPW takes the opposite approach. Instead of modeling consumption, it models the probability of being assigned to treatment &amp;mdash; the &lt;strong>propensity score&lt;/strong>, defined as $p(X) = \Pr(T = 1 \mid X)$. It then reweights observations so that the treatment and control groups become comparable. The Stata documentation explains: &lt;em>&amp;ldquo;IPW estimators use weighted averages of the observed outcome variable to estimate means of the potential outcomes. The weights account for the missing data inherent in the potential-outcome framework.&amp;rdquo;&lt;/em>&lt;/p>
&lt;p>The logic is elegant: in a perfectly randomized experiment, every household has the same 50% chance of treatment, and a simple comparison of means is unbiased. When chance imbalances arise (like our 9.3% gender SMD), the estimated propensity scores deviate slightly from 0.50. IPW corrects for these imbalances by making the reweighted sample look as if randomization had been perfect &amp;mdash; without ever modeling the outcome.&lt;/p>
&lt;p>&lt;strong>Analogy &amp;mdash; opinion polling.&lt;/strong> Election pollsters know their survey overrepresents some demographics. If 60% of respondents are college graduates but only 35% of voters are, pollsters give lower weight to each college graduate&amp;rsquo;s response and higher weight to non-graduates. IPW does the same thing for treatment groups &amp;mdash; it reweights households so the treated and control groups have the same covariate distribution.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
DATA[&amp;quot;&amp;lt;b&amp;gt;Observed Data&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Treatment and control groups&amp;lt;br/&amp;gt;may have imbalances&amp;quot;]
PS[&amp;quot;&amp;lt;b&amp;gt;Estimate propensity score&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;p(X) = Pr(T=1 | X)&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;via logistic regression&amp;lt;/i&amp;gt;&amp;quot;]
WT[&amp;quot;&amp;lt;b&amp;gt;Compute weights&amp;lt;/b&amp;gt;&amp;quot;]
WTR[&amp;quot;Treated: weight = 1/p(X)&amp;quot;]
WCT[&amp;quot;Control: weight = 1/(1−p(X))&amp;quot;]
ATE[&amp;quot;&amp;lt;b&amp;gt;ATE&amp;lt;/b&amp;gt; = Weighted mean(treated)&amp;lt;br/&amp;gt;− Weighted mean(control)&amp;quot;]
DATA --&amp;gt; PS
PS --&amp;gt; WT
WT --&amp;gt; WTR
WT --&amp;gt; WCT
WTR --&amp;gt; ATE
WCT --&amp;gt; ATE
style DATA fill:#141413,stroke:#d97757,color:#fff
style PS fill:#d97757,stroke:#141413,color:#fff
style WT fill:#d97757,stroke:#141413,color:#fff
style WTR fill:#d97757,stroke:#141413,color:#fff
style WCT fill:#d97757,stroke:#141413,color:#fff
style ATE fill:#d97757,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>The propensity score.&lt;/strong> The propensity score is estimated via logistic regression:&lt;/p>
&lt;p>$$\hat{p}(X_i) = \Pr(T_i = 1 \mid X_i) = \text{logit}^{-1}(\hat{\alpha} + \hat{\beta}' X_i)$$&lt;/p>
&lt;p>In plain language: we fit a logistic model predicting whether each household was assigned to treatment, based on their covariates (age, education, gender, poverty status). The predicted probability is their propensity score.&lt;/p>
&lt;p>&lt;strong>The IPW estimator.&lt;/strong> The ATE under IPW is:&lt;/p>
&lt;p>$$\hat{\tau}_{IPW}^{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{T_i \cdot Y_i}{\hat{p}(X_i)} - \frac{(1 - T_i) \cdot Y_i}{1 - \hat{p}(X_i)} \right]$$&lt;/p>
&lt;p>Each treated household&amp;rsquo;s outcome is divided by its probability of being treated &amp;mdash; this upweights treated households that &amp;ldquo;look like&amp;rdquo; control households (the Stata manual calls this placing &amp;ldquo;a larger weight on those observations for which $y_{1i}$ is observed even though its observation was not likely&amp;rdquo;). Each control household&amp;rsquo;s outcome is divided by its probability of being in the control group. The reweighting creates a pseudo-population where treatment assignment is independent of covariates.&lt;/p>
&lt;p>For the ATT, only the control group needs reweighting (because the treated group is already the reference population):&lt;/p>
&lt;p>$$\hat{\tau}_{IPW}^{ATT} = \frac{1}{N_1} \sum_{i=1}^{N} \left[ T_i \cdot Y_i - \frac{(1 - T_i) \cdot \hat{p}(X_i) \cdot Y_i}{1 - \hat{p}(X_i)} \right]$$&lt;/p>
&lt;p>&lt;strong>Mini example from our data.&lt;/strong> In our RCT, a female household in poverty might have $\hat{p}(X) = 0.52$ (slightly more likely to be treated due to the gender imbalance). If treated, her weight is $1/0.52 = 1.92$. If in the control group, her weight is $1/(1 - 0.52) = 2.08$. A male non-poor household might have $\hat{p}(X) = 0.49$, giving weights close to 2.0 in either group. These mild adjustments rebalance the groups to remove the chance gender imbalance.&lt;/p>
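&lt;p>The full IPW recipe &amp;mdash; estimate propensity scores, form weights, compare weighted means &amp;mdash; can be sketched in a few lines. The simulation below is illustrative, not the &lt;code>dataSIM4RCT&lt;/code> data: the sample size, coefficients, and single &lt;code>female&lt;/code> covariate are all made up. With one binary covariate, the logistic model is saturated, so the estimated propensity score is simply the share of treated households within each gender.&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(42)
n, tau = 20_000, 0.12   # hypothetical sample size and true effect (log points)

# Simulated RCT with a chance gender imbalance: females are slightly
# over-assigned to treatment, and gender also shifts the outcome.
female = rng.binomial(1, 0.5, n)
p_true = np.where(female == 1, 0.52, 0.48)
T = rng.binomial(1, p_true)
Y = 10 + 0.1 * female + tau * T + rng.normal(0, 0.4, n)  # log consumption

# Step 1: estimate p(X). A saturated logistic model with one binary
# covariate reduces to the within-gender share of treated households.
p_hat = np.where(female == 1, T[female == 1].mean(), T[female == 0].mean())

# Step 2: inverse-probability weights (treated: 1/p; control: 1/(1-p)).
w = np.where(T == 1, 1 / p_hat, 1 / (1 - p_hat))

# Step 3: ATE as the difference of weighted means.
ate = (np.average(Y[T == 1], weights=w[T == 1])
       - np.average(Y[T == 0], weights=w[T == 0]))
print(round(ate, 3))
```

&lt;p>The normalized weighted means used here are the H&amp;aacute;jek form of the estimator, which is far more stable than the raw sum in the ATE formula above when the outcome has a large mean.&lt;/p>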
&lt;p>&lt;strong>Why IPW matters even in RCTs.&lt;/strong> In a perfect RCT, the true propensity score is exactly 0.50 for everyone, and IPW does nothing. But finite samples produce chance imbalances. IPW uses the estimated propensity scores (which deviate slightly from 0.50) to correct for these imbalances without making any assumptions about how covariates affect the outcome.&lt;/p>
&lt;p>&lt;strong>Stata implementation.&lt;/strong> The &lt;code>teffects ipw&lt;/code> command fits a logistic treatment model by default. Note that the first parenthesis specifies only the outcome variable (no covariates &amp;mdash; IPW does not model the outcome), and the second specifies the treatment model: &lt;code>teffects ipw (y) (treat c.age c.edu i.female i.poverty), ate&lt;/code>.&lt;/p>
&lt;p>&lt;strong>What can go wrong &amp;mdash; extreme weights.&lt;/strong> IPW&amp;rsquo;s vulnerability is extreme propensity scores. If $\hat{p}(X) = 0.01$ for some household, the weight becomes $1/0.01 = 100$ &amp;mdash; that single household dominates the ATE estimate, causing high variance and instability. The Stata manual warns: &lt;em>&amp;ldquo;When propensity scores are extreme (near 0 or 1), the inverse weights become very large, producing unstable estimates.&amp;quot;&lt;/em> This happens when the treatment and control groups have poor &lt;strong>overlap&lt;/strong> &amp;mdash; some covariate combinations appear only in one group. In our well-designed RCT, all propensity scores are between 0.43 and 0.55 (we verified this in Section 5.4), so extreme weights are not a concern.&lt;/p>
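&lt;p>A tiny numeric illustration of the extreme-weight problem (the propensity scores below are made up): one household with $\hat{p}(X) = 0.01$ receives a weight of 100 and dominates the weighted average.&lt;/p>

```python
import numpy as np

# Hypothetical propensity scores for four treated households;
# the last one is extreme.
p = np.array([0.50, 0.45, 0.10, 0.01])
w = 1 / p                    # inverse-probability weights: 2, 2.2, 10, 100
share = w[-1] / w.sum()      # weight share of the extreme household
print(w, round(share, 2))
```

&lt;p>The extreme household carries roughly 88% of the total weight, so its outcome almost single-handedly determines the weighted mean.&lt;/p>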
&lt;p>RA works well if the outcome model is correct but can be biased if it is wrong. IPW works well if the propensity score model is correct but can be unstable if it is wrong. Is there a method that protects us against both types of misspecification?&lt;/p>
&lt;h3 id="73-doubly-robust-dr-----modeling-both">7.3 Doubly Robust (DR) &amp;mdash; modeling both&lt;/h3>
&lt;p>Doubly robust methods combine RA and IPW into a single estimator. They fit an outcome model &lt;strong>and&lt;/strong> estimate a propensity score. The key property &amp;mdash; the reason they are called &amp;ldquo;doubly robust&amp;rdquo; &amp;mdash; is that the estimator is consistent (converges to the true treatment effect with enough data) if &lt;strong>either&lt;/strong> the outcome model &lt;strong>or&lt;/strong> the propensity score model is correctly specified. You do not need both to be right &amp;mdash; just one.&lt;/p>
&lt;p>The Stata manual describes this property: &lt;em>&amp;ldquo;AIPW estimators model both the outcome and the treatment probability. A surprising fact is that only one of the two models must be correctly specified to consistently estimate the treatment effects.&amp;quot;&lt;/em>&lt;/p>
&lt;p>&lt;strong>Analogy &amp;mdash; backup power.&lt;/strong> Think of a house with two independent power sources: the electrical grid (the outcome model) and a solar panel system (the propensity score model). If the grid goes down (outcome model is misspecified), solar power keeps the lights on. If clouds block the solar panels (propensity score model is wrong), the grid still works. As long as at least one power source is functioning, the house stays lit. That is doubly robust estimation &amp;mdash; as long as at least one model is correct, the estimator gives the right answer.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
DATA[&amp;quot;&amp;lt;b&amp;gt;Observed Data&amp;lt;/b&amp;gt;&amp;quot;]
RA_C[&amp;quot;&amp;lt;b&amp;gt;RA component&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Predict Ŷ₁ and Ŷ₀&amp;lt;br/&amp;gt;for each household&amp;quot;]
IPW_C[&amp;quot;&amp;lt;b&amp;gt;IPW component&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Estimate propensity&amp;lt;br/&amp;gt;score p(X)&amp;quot;]
RESID[&amp;quot;&amp;lt;b&amp;gt;Prediction errors&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Y − Ŷ for each&amp;lt;br/&amp;gt;household&amp;quot;]
CORRECT[&amp;quot;&amp;lt;b&amp;gt;Bias-correction term&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;IPW-weighted residuals&amp;quot;]
DR[&amp;quot;&amp;lt;b&amp;gt;DR estimate&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;= RA prediction&amp;lt;br/&amp;gt;+ Bias correction&amp;quot;]
DATA --&amp;gt; RA_C
DATA --&amp;gt; IPW_C
RA_C --&amp;gt; RESID
IPW_C --&amp;gt; CORRECT
RESID --&amp;gt; CORRECT
RA_C --&amp;gt; DR
CORRECT --&amp;gt; DR
style DATA fill:#141413,stroke:#00d4c8,color:#fff
style RA_C fill:#6a9bcc,stroke:#141413,color:#fff
style IPW_C fill:#d97757,stroke:#141413,color:#fff
style RESID fill:#6a9bcc,stroke:#141413,color:#fff
style CORRECT fill:#d97757,stroke:#141413,color:#fff
style DR fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>The AIPW estimator.&lt;/strong> The most common doubly robust form is Augmented Inverse Probability Weighting (AIPW):&lt;/p>
&lt;p>$$\hat{\tau}_{DR}^{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) + \frac{T_i (Y_i - \hat{\mu}_1(X_i))}{\hat{p}(X_i)} - \frac{(1 - T_i)(Y_i - \hat{\mu}_0(X_i))}{1 - \hat{p}(X_i)} \right]$$&lt;/p>
&lt;p>This equation has two clearly interpretable components:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>RA component&lt;/strong> (first two terms): $\hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)$ &amp;mdash; the regression adjustment prediction, exactly as in Section 7.1&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Bias-correction component&lt;/strong> (last two terms): IPW-weighted residuals $(Y_i - \hat{\mu})$ &amp;mdash; the difference between actual and predicted outcomes, weighted by inverse propensity scores&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>In plain language: start with the RA prediction of each household&amp;rsquo;s treatment effect. Then ask: how far off was that prediction from reality? Weight those prediction errors by the inverse propensity score (for treated households) or the inverse of one minus it (for control households). If RA was already right, the errors average to zero and you just get RA. If RA was wrong but IPW is right, the weighted errors exactly cancel the RA bias.&lt;/p>
&lt;p>&lt;strong>Why the magic works &amp;mdash; four scenarios.&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Outcome model correct, propensity model wrong:&lt;/strong> The residuals $(Y_i - \hat{\mu})$ are zero on average, so the correction terms vanish. DR reduces to RA. Correct answer.&lt;/li>
&lt;li>&lt;strong>Propensity model correct, outcome model wrong:&lt;/strong> The IPW reweighting is valid, so the correction terms fix the RA bias. Correct answer.&lt;/li>
&lt;li>&lt;strong>Both models correct:&lt;/strong> Both components work together, producing the most efficient estimate.&lt;/li>
&lt;li>&lt;strong>Both models wrong:&lt;/strong> Neither safety net catches the error. The estimate can be biased. DR provides insurance, not invincibility.&lt;/li>
&lt;/ol>
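&lt;p>Scenario 2 &amp;mdash; wrong outcome model, correct propensity model &amp;mdash; can be checked with a small simulation. Everything below is hypothetical: an observational setup where a confounder $X$ drives both treatment and outcome, an outcome model deliberately misspecified by ignoring $X$ entirely (so the RA estimate collapses to a confounded difference in group means), and the true propensity score standing in for the &amp;ldquo;correct&amp;rdquo; treatment model.&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 50_000, 1.0            # hypothetical sample size and true effect

# Observational data: X raises both treatment probability and the outcome.
X = rng.normal(0, 1, n)
p = 1 / (1 + np.exp(-1.2 * X))  # true propensity score (the correct model)
T = rng.binomial(1, p)
Y = 2 * X + tau * T + rng.normal(0, 1, n)

# Misspecified outcome model: ignore X, so mu1 and mu0 are group means
# and the RA estimate is just the confounded difference in means.
mu1, mu0 = Y[T == 1].mean(), Y[T == 0].mean()
ra = mu1 - mu0

# AIPW: RA prediction plus IPW-weighted residuals. With a correct
# propensity score, the correction term cancels the outcome-model bias.
aipw = np.mean(mu1 - mu0
               + T * (Y - mu1) / p
               - (1 - T) * (Y - mu0) / (1 - p))
print(round(ra, 2), round(aipw, 2))
```

&lt;p>On a typical draw the misspecified RA estimate lands far above the true effect of 1.0, while the AIPW estimate stays close to it &amp;mdash; the safety net in action.&lt;/p>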
&lt;p>&lt;strong>AIPW vs. IPWRA in Stata.&lt;/strong> Stata offers two doubly robust commands. &lt;code>teffects aipw&lt;/code> augments the IPW estimator with an outcome-model correction (the equation above). &lt;code>teffects ipwra&lt;/code> applies propensity score weights to the regression adjustment &amp;mdash; arriving at the same property from the other direction. Both are doubly robust and produce nearly identical results in practice.&lt;/p>
&lt;p>&lt;strong>Stata implementation.&lt;/strong> Both commands require specifying the outcome model in the first parenthesis and the treatment model in the second: &lt;code>teffects ipwra (y c.age c.edu i.female i.poverty) (treat c.age c.edu i.female i.poverty), vce(robust)&lt;/code>.&lt;/p>
&lt;p>&lt;strong>What can go wrong.&lt;/strong> DR fails only when &lt;strong>both&lt;/strong> models are wrong. This is much less likely than either single model being wrong &amp;mdash; getting at least one model approximately right is much easier than getting both perfectly right. However, the Stata manual notes: &lt;em>&amp;ldquo;When both the outcome and the treatment model are misspecified, which estimator is more robust is a matter of debate.&amp;quot;&lt;/em> Using flexible specifications (polynomials, interactions) reduces the risk of both models failing simultaneously.&lt;/p>
&lt;h3 id="comparison-of-the-three-approaches">Comparison of the three approaches&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Feature&lt;/th>
&lt;th>RA&lt;/th>
&lt;th>IPW&lt;/th>
&lt;th>DR (AIPW/IPWRA)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Models the outcome?&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>No&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Models the treatment?&lt;/td>
&lt;td>No&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Key equation&lt;/td>
&lt;td>$\hat{\mu}_1(X) - \hat{\mu}_0(X)$&lt;/td>
&lt;td>$T \cdot Y / \hat{p}(X)$&lt;/td>
&lt;td>RA + IPW-weighted residuals&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Consistent if outcome model correct?&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Consistent if treatment model correct?&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Main vulnerability&lt;/td>
&lt;td>Outcome misspecification&lt;/td>
&lt;td>Extreme weights&lt;/td>
&lt;td>Both models wrong&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Stata command&lt;/td>
&lt;td>&lt;code>teffects ra&lt;/code>&lt;/td>
&lt;td>&lt;code>teffects ipw&lt;/code>&lt;/td>
&lt;td>&lt;code>teffects ipwra&lt;/code> / &lt;code>teffects aipw&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;pre>&lt;code class="language-mermaid">graph LR
RA[&amp;quot;&amp;lt;b&amp;gt;Regression Adjustment&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Models the outcome&amp;quot;]
IPW[&amp;quot;&amp;lt;b&amp;gt;Inverse Probability&amp;lt;br/&amp;gt;Weighting&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Models the treatment&amp;quot;]
DR[&amp;quot;&amp;lt;b&amp;gt;Doubly Robust&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Models both&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Consistent if either&amp;lt;br/&amp;gt;model is correct&amp;lt;/i&amp;gt;&amp;quot;]
RA --&amp;gt; DR
IPW --&amp;gt; DR
style RA fill:#6a9bcc,stroke:#141413,color:#fff
style IPW fill:#d97757,stroke:#141413,color:#fff
style DR fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
&lt;p>The doubly robust estimator combines the strengths of both RA and IPW. It is the &lt;strong>standard recommendation in modern causal inference&lt;/strong> because it provides an extra layer of protection against model misspecification. Now that we understand what each method does, what it assumes, and what can go wrong, let us apply all three to our cash transfer data and compare their results.&lt;/p>
&lt;hr>
&lt;h2 id="8-cross-sectional-estimation-at-endline-----ra-ipw-and-dr">8. Cross-sectional estimation at endline &amp;mdash; RA, IPW, and DR&lt;/h2>
&lt;p>We now estimate treatment effects using only endline data. For each method, we compute both the &lt;strong>ATE&lt;/strong> (the policymaker&amp;rsquo;s quantity) and the &lt;strong>ATT&lt;/strong> (the evaluator&amp;rsquo;s quantity).&lt;/p>
&lt;h3 id="81-simple-difference-in-means">8.1 Simple difference in means&lt;/h3>
&lt;p>The simplest approach is to compare mean outcomes between treated and control groups at endline.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1
reg y treat, robust
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Linear regression Number of obs = 2,000
F(1, 1998) = 35.43
Prob &amp;gt; F = 0.0000
R-squared = 0.0174
Root MSE = .43449
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. t P&amp;gt;|t| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
treat | .1157465 .0194443 5.95 0.000 .0776132 .1538798
_cons | 10.05374 .014001 718.07 0.000 10.02628 10.0812
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The simple difference in means yields an estimate of 0.116 (SE = 0.019, p &amp;lt; 0.001, 95% CI [0.078, 0.154]). Because the outcome is in logs, this means being offered the cash transfer increased household consumption by approximately 11.6%. This estimate is close to the true effect of 12% and is our benchmark for comparison. However, it does not adjust for the gender imbalance we discovered at baseline.&lt;/p>
&lt;h3 id="82-regression-adjustment-----ate-and-att">8.2 Regression Adjustment &amp;mdash; ATE and ATT&lt;/h3>
&lt;p>Regression adjustment models the outcome as a function of treatment and covariates, then computes predicted outcomes under treatment and control for each observation.&lt;/p>
&lt;pre>&lt;code class="language-stata">* RA: Average Treatment Effect (ATE)
teffects ra (y c.age c.edu i.female i.poverty) (treat), ate
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : regression adjustment
Outcome model : linear
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATE |
treat |
(1 vs 0) | .1125431 .0190927 5.89 0.000 .0751221 .1499641
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.05503 .0138703 724.93 0.000 10.02785 10.08222
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-stata">* RA: Average Treatment Effect on the Treated (ATT)
teffects ra (y c.age c.edu i.female i.poverty) (treat), atet
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : regression adjustment
Outcome model : linear
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATET |
treat |
(1 vs 0) | .1132537 .0191498 5.91 0.000 .0757208 .1507865
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.05623 .0140082 717.88 0.000 10.02878 10.08369
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The RA estimates are ATE = 0.113 (SE = 0.019, 95% CI [0.075, 0.150]) and ATT = 0.113 (SE = 0.019, 95% CI [0.076, 0.151]). The ATE and ATT are nearly identical, which confirms that treatment effects are approximately &lt;strong>homogeneous&lt;/strong> across households. The RA approach models the outcome with covariates (age, education, gender, poverty), which adjusts for the baseline gender imbalance and can improve precision.&lt;/p>
&lt;h3 id="83-inverse-probability-weighting-----ate-and-att">8.3 Inverse Probability Weighting &amp;mdash; ATE and ATT&lt;/h3>
&lt;p>IPW reweights observations based on their estimated probability of treatment, without modeling the outcome.&lt;/p>
&lt;pre>&lt;code class="language-stata">* IPW: Average Treatment Effect (ATE)
teffects ipw (y) (treat c.age c.edu i.female i.poverty), ate
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : inverse-probability weights
Outcome model : weighted mean
Treatment model: logit
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATE |
treat |
(1 vs 0) | .1126713 .0190886 5.90 0.000 .0752583 .1500844
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.05495 .0138651 725.20 0.000 10.02778 10.08213
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-stata">* IPW: Average Treatment Effect on the Treated (ATT)
teffects ipw (y) (treat c.age c.edu i.female i.poverty), atet
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : inverse-probability weights
Outcome model : weighted mean
Treatment model: logit
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATET |
treat |
(1 vs 0) | .1134031 .0191397 5.93 0.000 .0758899 .1509162
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.05608 .0140004 718.27 0.000 10.02864 10.08352
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The IPW estimates are ATE = 0.113 (SE = 0.019, 95% CI [0.075, 0.150]) and ATT = 0.113 (SE = 0.019, 95% CI [0.076, 0.151]). These are very close to the RA results, which is expected in a well-designed RCT where propensity scores are near 0.50 for all households. Notice that IPW does &lt;strong>not&lt;/strong> model the outcome &amp;mdash; it only models the treatment assignment process using the propensity score. The close agreement between RA and IPW gives us confidence that both the outcome model and the treatment model are approximately correct.&lt;/p>
&lt;h3 id="84-doubly-robust-----ate-and-att-ipwra">8.4 Doubly Robust &amp;mdash; ATE and ATT (IPWRA)&lt;/h3>
&lt;p>The doubly robust IPWRA estimator combines outcome modeling and propensity score weighting.&lt;/p>
&lt;pre>&lt;code class="language-stata">* IPWRA: Average Treatment Effect (ATE)
teffects ipwra (y c.age c.edu i.female i.poverty) ///
(treat c.age c.edu i.female i.poverty), vce(robust)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : IPW regression adjustment
Outcome model : linear
Treatment model: logit
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATE |
treat |
(1 vs 0) | .112639 .0190901 5.90 0.000 .0752231 .1500549
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.055 .0138677 725.07 0.000 10.02782 10.08218
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-stata">* IPWRA: Average Treatment Effect on the Treated (ATT)
teffects ipwra (y c.age c.edu i.female i.poverty) ///
(treat c.age c.edu i.female i.poverty), atet vce(robust)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : IPW regression adjustment
Outcome model : linear
Treatment model: logit
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATET |
treat |
(1 vs 0) | .1133162 .0191469 5.92 0.000 .0757889 .1508435
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.05617 .0140019 718.20 0.000 10.02873 10.08361
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The doubly robust IPWRA estimates are ATE = 0.113 (SE = 0.019, 95% CI [0.075, 0.150]) and ATT = 0.113 (SE = 0.019, 95% CI [0.076, 0.151]). These are very close to the RA and IPW estimates, confirming that all three approaches converge in this well-designed RCT. The DR method provides the most reliable cross-sectional estimate because it is protected against misspecification of either the outcome or treatment model.&lt;/p>
&lt;h3 id="85-doubly-robust-----aipw-alternative">8.5 Doubly Robust &amp;mdash; AIPW alternative&lt;/h3>
&lt;p>As a robustness check, we can also compute the doubly robust estimate using the AIPW formulation instead of IPWRA.&lt;/p>
&lt;pre>&lt;code class="language-stata">* AIPW: Average Treatment Effect (ATE)
teffects aipw (y c.age c.edu i.female i.poverty) ///
(treat c.age c.edu i.female i.poverty)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : augmented IPW
Outcome model : linear by ML
Treatment model: logit
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATE |
treat |
(1 vs 0) | .1126412 .0190903 5.90 0.000 .075225 .1500574
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.055 .013868 725.05 0.000 10.02782 10.08218
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The AIPW estimate of ATE = 0.113 (SE = 0.019, 95% CI [0.075, 0.150]) is virtually identical to the IPWRA result (0.113). Both are doubly robust &amp;mdash; the difference lies in the computational approach (AIPW augments the IPW estimator with a bias-correction term, while IPWRA applies IPW weights to the regression adjustment), but the asymptotic properties are the same and, as seen here, the estimates agree to four decimal places.&lt;/p>
&lt;h3 id="86-cross-sectional-comparison">8.6 Cross-sectional comparison&lt;/h3>
&lt;p>The table below summarizes all cross-sectional estimates.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Approach&lt;/th>
&lt;th>Estimand&lt;/th>
&lt;th style="text-align:center">Estimate&lt;/th>
&lt;th style="text-align:center">SE&lt;/th>
&lt;th style="text-align:center">95% CI&lt;/th>
&lt;th style="text-align:center">Contains 0.12?&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Simple regression&lt;/td>
&lt;td>None&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td style="text-align:center">0.116&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.078, 0.154]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Regression Adjustment&lt;/td>
&lt;td>Outcome model&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Regression Adjustment&lt;/td>
&lt;td>Outcome model&lt;/td>
&lt;td>ATT&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.076, 0.151]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Inverse Prob. Weighting&lt;/td>
&lt;td>Treatment model&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Inverse Prob. Weighting&lt;/td>
&lt;td>Treatment model&lt;/td>
&lt;td>ATT&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.076, 0.151]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>IPWRA (Doubly Robust)&lt;/td>
&lt;td>Both models&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>IPWRA (Doubly Robust)&lt;/td>
&lt;td>Both models&lt;/td>
&lt;td>ATT&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.076, 0.151]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>True effect&lt;/strong>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td style="text-align:center">&lt;strong>0.12&lt;/strong>&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Several patterns emerge from this comparison. First, &lt;strong>ATE and ATT are nearly identical&lt;/strong> for every method, confirming that treatment effects are homogeneous across households. Second, &lt;strong>RA, IPW, and DR all give remarkably similar results&lt;/strong> (all approximately 0.113) because, in this well-designed RCT, randomization ensures that both the outcome model and the propensity score model are approximately correct. Third, the simple difference in means (0.116) is slightly higher than the covariate-adjusted estimates (0.113); the shift in the point estimate reflects the correction for the chance gender imbalance, while the covariates also deliver a small precision gain (the robust SE falls from 0.0194 to 0.0191). Finally, all confidence intervals contain the true effect of 0.12 &amp;mdash; every method successfully recovers the correct answer.&lt;/p>
&lt;p>The real value of doubly robust methods becomes apparent in less ideal settings. When one model might be misspecified &amp;mdash; a common situation in practice &amp;mdash; DR methods provide insurance that RA or IPW alone cannot offer.&lt;/p>
&lt;hr>
&lt;h2 id="9-leveraging-panel-data-----difference-in-differences">9. Leveraging panel data &amp;mdash; Difference-in-Differences&lt;/h2>
&lt;p>All estimates in Section 8 used only endline data. But we have panel data &amp;mdash; the same 2,000 households observed before and after the intervention. Can we do better?&lt;/p>
&lt;h3 id="91-why-use-panel-data">9.1 Why use panel data?&lt;/h3>
&lt;p>Cross-sectional methods (RA, IPW, DR) compare treated and control groups at a single point in time &amp;mdash; the endline. They control for &lt;strong>observable&lt;/strong> covariates like age, education, and gender. But there may be &lt;strong>unobservable&lt;/strong> characteristics &amp;mdash; household motivation, geographic advantages, cultural factors &amp;mdash; that differ between groups and affect consumption. No amount of cross-sectional covariate adjustment can control for these, because we simply do not observe them.&lt;/p>
&lt;p>&lt;strong>Analogy &amp;mdash; comparing students across schools.&lt;/strong> Imagine comparing test scores between students at a charter school (treatment) and a traditional school (control). You can adjust for observable differences like family income and prior grades. But what about unmeasured factors &amp;mdash; parental involvement, neighborhood quality, student ambition? A cross-sectional comparison cannot disentangle the school effect from these hidden differences. Now suppose you observe the &lt;em>same students&lt;/em> before and after they switch schools. By comparing each student&amp;rsquo;s score change, you automatically cancel out all fixed student characteristics &amp;mdash; because they are the same at both time points. That is the power of panel data.&lt;/p>
&lt;p>Panel data methods like difference-in-differences (DiD) solve this problem by comparing each household &lt;strong>to itself&lt;/strong> over time. By looking at how each household&amp;rsquo;s consumption changed from baseline to endline, we effectively control for all &lt;strong>time-invariant unobservable characteristics&lt;/strong> (household fixed effects). This is a powerful advantage that cross-sectional methods cannot replicate.&lt;/p>
&lt;h4 id="the-did-estimator">The DiD estimator&lt;/h4>
&lt;p>The DiD estimator computes a simple but powerful quantity &amp;mdash; a &amp;ldquo;difference of differences&amp;rdquo;:&lt;/p>
&lt;p>$$\hat{\tau}_{DiD} = \underbrace{(\bar{Y}_{treat,post} - \bar{Y}_{treat,pre})}_{\text{Change for treated}} - \underbrace{(\bar{Y}_{control,post} - \bar{Y}_{control,pre})}_{\text{Change for control}}$$&lt;/p>
&lt;p>The first difference ($\bar{Y}_{treat,post} - \bar{Y}_{treat,pre}$) captures the treatment group&amp;rsquo;s change over time &amp;mdash; the treatment effect &lt;strong>plus&lt;/strong> any common time trend (e.g., economic growth that affects all households). The second difference ($\bar{Y}_{control,post} - \bar{Y}_{control,pre}$) captures the control group&amp;rsquo;s change &amp;mdash; the common time trend &lt;strong>only&lt;/strong>, since they did not receive treatment. Subtracting the second from the first removes the time trend, isolating the treatment effect.&lt;/p>
&lt;p>&lt;strong>Mini example from our data.&lt;/strong> Suppose the treated group&amp;rsquo;s average log consumption went from 10.01 at baseline to 10.17 at endline (change = +0.16). The control group went from 10.03 to 10.06 (change = +0.03). The DiD estimate is $0.16 - 0.03 = 0.13$ &amp;mdash; close to the true effect of 0.12. The control group&amp;rsquo;s +0.03 change captures the natural time trend that would have affected everyone, and subtracting it isolates the treatment effect.&lt;/p>
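&lt;p>The arithmetic of the mini example is just two subtractions (the group means below are the hypothetical values quoted above):&lt;/p>

```python
# Hypothetical group means of log consumption from the mini example.
treat_pre, treat_post = 10.01, 10.17
ctrl_pre, ctrl_post = 10.03, 10.06

# Difference of differences: (change for treated) - (change for control).
did = (treat_post - treat_pre) - (ctrl_post - ctrl_pre)
print(round(did, 2))
```

&lt;p>The control group&amp;rsquo;s +0.03 change is the estimate of the common time trend; subtracting it leaves the 0.13 treatment effect.&lt;/p>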
&lt;h4 id="the-parallel-trends-assumption">The parallel trends assumption&lt;/h4>
&lt;p>The key identifying assumption of DiD is the &lt;strong>parallel trends assumption (PTA)&lt;/strong>: absent the treatment, the treatment and control groups would have followed the same time trend. Formally:&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Notation note&lt;/strong> &amp;mdash; In the DiD literature and in the Sant&amp;rsquo;Anna and Zhao (2020) paper, $D$ denotes treatment group assignment (equivalent to our &lt;code>treat&lt;/code> variable). This differs from our data dictionary where &lt;code>D&lt;/code> is the receipt indicator. In this section and Section 9.4, we follow the paper&amp;rsquo;s convention: $D = 1$ means assigned to treatment, $D = 0$ means assigned to control.&lt;/p>
&lt;/blockquote>
&lt;p>$$E[Y_1(0) - Y_0(0) \mid D = 1] = E[Y_1(0) - Y_0(0) \mid D = 0]$$&lt;/p>
&lt;p>This says that the average change in &lt;em>untreated&lt;/em> potential outcomes is the same for the treated and control groups. Note that this does &lt;strong>not&lt;/strong> require the two groups to have the same &lt;em>level&lt;/em> of consumption &amp;mdash; only the same &lt;em>trend&lt;/em>. The treated group can start higher or lower, as long as their consumption would have evolved at the same rate as the control group in the absence of the program.&lt;/p>
&lt;p>In an RCT, the parallel trends assumption is very plausible because randomization ensures the groups were similar at baseline. Any pre-existing differences between groups occurred by chance and are unlikely to produce different time trends. This makes DiD a strong estimator in our setting.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
subgraph &amp;quot;Parallel Trends Assumption&amp;quot;
PRE[&amp;quot;&amp;lt;b&amp;gt;Baseline 2021&amp;lt;/b&amp;gt;&amp;quot;]
POST[&amp;quot;&amp;lt;b&amp;gt;Endline 2024&amp;lt;/b&amp;gt;&amp;quot;]
end
PRE --&amp;gt;|&amp;quot;Treated group&amp;lt;br/&amp;gt;change = effect + trend&amp;quot;| POST
PRE --&amp;gt;|&amp;quot;Control group&amp;lt;br/&amp;gt;change = trend only&amp;quot;| POST
style PRE fill:#6a9bcc,stroke:#141413,color:#fff
style POST fill:#d97757,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;h3 id="92-why-does-did-estimate-att-and-not-ate">9.2 Why does DiD estimate ATT and not ATE?&lt;/h3>
&lt;p>This is a point that many beginners miss, so it is worth explaining carefully.&lt;/p>
&lt;p>Recall from Section 6 that the ATT is $E[Y_1(1) - Y_1(0) \mid D = 1]$ &amp;mdash; the effect on those who were treated. Sant&amp;rsquo;Anna and Zhao (2020) make this explicit: the main challenge is computing $E[Y_1(0) \mid D = 1]$ &amp;mdash; what would the treated group&amp;rsquo;s consumption have been at endline &lt;em>without&lt;/em> the program?&lt;/p>
&lt;p>DiD solves this by using the control group&amp;rsquo;s time trend as a stand-in. Specifically, it constructs the counterfactual for the treated group as:&lt;/p>
&lt;p>$$\underbrace{E[Y_1(0) \mid D = 1]}_{\text{Counterfactual}} = \underbrace{E[Y_0 \mid D = 1]}_{\text{Treated at baseline}} + \underbrace{(E[Y_1 \mid D = 0] - E[Y_0 \mid D = 0])}_{\text{Control group&amp;rsquo;s time trend}}$$&lt;/p>
&lt;p>This counterfactual is &lt;strong>specific to the treated group&lt;/strong> &amp;mdash; it starts from their baseline level and adds the control group&amp;rsquo;s trend. DiD therefore estimates what happened to the treated group relative to this counterfactual. This is precisely the ATT.&lt;/p>
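&lt;p>A quick numeric check, reusing the group means from the mini example in Section 9.1, shows the identity in action (plain Python, illustrative only):&lt;/p>
&lt;pre>&lt;code class="language-python"># Counterfactual = treated baseline + control group trend
treat_pre, treat_post = 10.01, 10.17
ctrl_pre, ctrl_post = 10.03, 10.06

control_trend = ctrl_post - ctrl_pre            # what time alone would do
counterfactual = treat_pre + control_trend      # E[Y1(0) given D = 1]
att = treat_post - counterfactual               # ATT recovered by DiD
print(round(counterfactual, 2), round(att, 2))
&lt;/code>&lt;/pre>
&lt;p>The counterfactual endline mean is 10.04, and the implied ATT of 0.13 is exactly the difference of differences computed in Section 9.1.&lt;/p>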
&lt;p>&lt;strong>Why not the ATE?&lt;/strong> To estimate the ATE, we would also need the treatment effect for the untreated &amp;mdash; what would happen if we gave the program to those who did not receive it. DiD does not provide this, because the counterfactual it constructs runs in only one direction (control trend applied to treated baseline, not treated trend applied to control baseline).&lt;/p>
&lt;p>&lt;strong>In our RCT context&lt;/strong>, since treatment was randomly assigned, ATE and ATT are likely very similar (as we saw in Section 8). But in observational studies with heterogeneous treatment effects, this distinction matters greatly. A job-training program might have a larger effect on those who voluntarily enrolled (ATT) than it would have on randomly selected workers (ATE).&lt;/p>
&lt;h3 id="93-basic-did-with-panel-fixed-effects">9.3 Basic DiD with panel fixed effects&lt;/h3>
&lt;p>We now implement the basic DiD estimator using Stata&amp;rsquo;s &lt;code>xtdidregress&lt;/code> command, which handles the panel structure and computes clustered standard errors.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
* Create the treatment-post interaction
gen treat_post = treat * post
label var treat_post &amp;quot;Treated x Post (1 only for treated in 2024)&amp;quot;
* Declare panel structure
xtset id year
* Basic DiD with individual fixed effects
xtdidregress (y) (treat_post), group(id) time(year) vce(cluster id)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Number of obs = 4,000
Number of groups = 2,000
Outcome model : linear
Treatment model: none
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. t P&amp;gt;|t| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATET |
treat_post | .1347161 .0272737 4.94 0.000 .0812282 .188204
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The basic DiD estimate of the ATT is 0.135 (SE = 0.027, p &amp;lt; 0.001, 95% CI [0.081, 0.188]). This is slightly higher than the cross-sectional estimates (0.113&amp;ndash;0.116) but still contains the true effect of 0.12 within its confidence interval. The wider standard error (0.027 vs. 0.019) reflects the additional variability introduced by differencing within households. Standard errors are clustered at the household level to account for serial correlation within panels.&lt;/p>
&lt;p>The key advantage of this DiD estimate is that it controls for all &lt;strong>time-invariant unobservable characteristics&lt;/strong> of each household. In an RCT, randomization already handles confounding, so the cross-sectional and panel estimates are similar. But in observational settings, DiD&amp;rsquo;s ability to absorb household fixed effects can correct biases that cross-sectional methods cannot.&lt;/p>
&lt;h3 id="94-from-cross-sectional-dr-to-panel-dr-----doubly-robust-did-drdid">9.4 From cross-sectional DR to panel DR &amp;mdash; Doubly Robust DiD (DRDID)&lt;/h3>
&lt;p>In Section 7, we saw that doubly robust methods combine outcome modeling and propensity score modeling for cross-sectional data. &lt;strong>DRDID extends this logic to the panel setting.&lt;/strong> It combines the DiD framework (using pre/post variation) with doubly robust covariate adjustment.&lt;/p>
&lt;p>This approach was introduced by Sant&amp;rsquo;Anna and Zhao (2020) in a landmark paper published in the &lt;em>Journal of Econometrics&lt;/em>. They proposed estimators that are &amp;ldquo;consistent if either (but not necessarily both) a propensity score or outcome regression working models are correctly specified&amp;rdquo; &amp;mdash; bringing the doubly robust property from the cross-sectional world into the DiD framework.&lt;/p>
&lt;h4 id="why-do-we-need-drdid">Why do we need DRDID?&lt;/h4>
&lt;p>Recall from Section 9.2 that basic DiD relies on the &lt;strong>parallel trends assumption&lt;/strong> &amp;mdash; absent treatment, the treated and control groups would have followed the same time trend. But what if parallel trends holds only &lt;strong>conditional on covariates&lt;/strong>? For example, what if consumption trends differ between poor and non-poor households, but within each poverty group the trends are parallel?&lt;/p>
&lt;p>In this case, we need a &lt;strong>conditional&lt;/strong> parallel trends assumption:&lt;/p>
&lt;p>$$E[Y_1(0) - Y_0(0) \mid D = 1, X] = E[Y_1(0) - Y_0(0) \mid D = 0, X]$$&lt;/p>
&lt;p>This says that the average change in untreated potential outcomes is the same for treated and control groups &lt;em>who share the same covariates&lt;/em> $X$. Note that this allows for covariate-specific time trends (e.g., different consumption growth rates for poor and non-poor households) while still identifying the ATT.&lt;/p>
&lt;p>Under this conditional parallel trends assumption, there are two ways to estimate the ATT:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Outcome regression (OR) approach&lt;/strong> &amp;mdash; model how the outcome evolves over time for the control group, and use that model to predict the counterfactual evolution for the treated group&lt;/li>
&lt;li>&lt;strong>IPW approach&lt;/strong> &amp;mdash; reweight the control group so its covariate distribution matches the treated group, then compute the standard DiD&lt;/li>
&lt;/ul>
&lt;p>The problem is the same as in the cross-sectional case: OR requires a correctly specified outcome model, and IPW requires a correctly specified propensity score model. Sant&amp;rsquo;Anna and Zhao&amp;rsquo;s insight was that &lt;strong>you can combine both into a single estimator that works if either model is correct&lt;/strong>.&lt;/p>
&lt;h4 id="the-drdid-estimator-for-panel-data">The DRDID estimator for panel data&lt;/h4>
&lt;p>When panel data are available (as in our case &amp;mdash; same households observed at baseline and endline), the DRDID estimator takes a particularly clean form. Let $\Delta Y_i = Y_{i,post} - Y_{i,pre}$ denote each household&amp;rsquo;s change in consumption. The estimator is:&lt;/p>
&lt;p>$$\hat{\tau}_{DR}^{DiD} = \frac{1}{N} \sum_{i=1}^{N} \left[ w_1(D_i) - w_0(D_i, X_i) \right] \left[ \Delta Y_i - \hat{\mu}_{0,\Delta}(X_i) \right]$$&lt;/p>
&lt;p>where:&lt;/p>
&lt;ul>
&lt;li>$w_1(D_i) = D_i / \bar{D}$ assigns equal weight to each treated unit, where $\bar{D}$ is the fraction of units treated&lt;/li>
&lt;li>$w_0(D_i, X_i)$ reweights control units using the propensity score $\hat{p}(X)$, so they resemble the treated group&lt;/li>
&lt;li>$\hat{\mu}_{0,\Delta}(X_i) = \hat{\mu}_{0,post}(X_i) - \hat{\mu}_{0,pre}(X_i)$ is the predicted change in consumption for the control group, fitted from control-group data&lt;/li>
&lt;/ul>
&lt;p>In plain language: for each household, compute the change in consumption over time ($\Delta Y$) and subtract the model-predicted change for the control group ($\hat{\mu}_{0,\Delta}$). This residual captures the treatment effect plus any prediction error. Then reweight these residuals using IPW so that the control group matches the treated group&amp;rsquo;s covariate profile.&lt;/p>
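&lt;p>To make the formula concrete, here is a minimal Python sketch of the panel DRDID estimator on simulated two-period data. The data-generating process (one covariate, a logistic propensity score, a covariate-specific trend of $0.5X$) is our own illustrative assumption, and we plug in the &lt;em>true&lt;/em> component models where &lt;code>drdid&lt;/code> would plug in fitted ones:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(0)
n, tau = 20_000, 0.12                 # true ATT set to 0.12, as in the tutorial
x = rng.uniform(-1, 1, n)             # a single covariate
p = 1 / (1 + np.exp(-x))              # propensity score Pr(D=1 given X)
d = rng.binomial(1, p)
mu0_delta = 0.5 * x                   # E[dY given D=0, X]: covariate-specific trend
dy = mu0_delta + tau * d + rng.normal(0, 1, n)   # observed change dY

w1 = d / d.mean()                     # equal weight on each treated unit
w0 = p * (1 - d) / (1 - p)            # reweight controls toward treated X profile
w0 = w0 / w0.mean()
att = np.mean((w1 - w0) * (dy - mu0_delta))
print(round(att, 2))
&lt;/code>&lt;/pre>
&lt;p>The estimate lands close to the true ATT of 0.12. In practice both components are estimated: a regression of $\Delta Y$ on $X$ in the control group for $\hat{\mu}_{0,\Delta}$ and a logit for $\hat{p}(X)$, which is what the &lt;code>dripw&lt;/code> option of &lt;code>drdid&lt;/code> does.&lt;/p>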
&lt;h4 id="why-is-this-doubly-robust">Why is this doubly robust?&lt;/h4>
&lt;p>The doubly robust property works through the same logic as in the cross-sectional case (Section 7.3), but applied to &lt;strong>changes&lt;/strong> rather than levels:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>If the outcome model is correct&lt;/strong> ($\hat{\mu}_{0,\Delta}(X) = E[\Delta Y \mid D=0, X]$), then the residuals $\Delta Y_i - \hat{\mu}_{0,\Delta}(X_i)$ average to zero for the control group, regardless of the propensity score weights. The estimator reduces to an outcome-regression DiD. Correct answer.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>If the propensity score model is correct&lt;/strong> ($\hat{p}(X) = \Pr(D=1 \mid X)$), the IPW reweighting makes the control group comparable to the treated group, regardless of the outcome model. The correction term fixes any bias from a misspecified outcome model. Correct answer.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>If both are correct&lt;/strong>, the estimator achieves the &lt;strong>semiparametric efficiency bound&lt;/strong> &amp;mdash; it is the most precise estimator possible given the assumptions. Sant&amp;rsquo;Anna and Zhao proved this formally.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>If both are wrong&lt;/strong>, the estimator can be biased &amp;mdash; double robustness provides one layer of insurance, not two.&lt;/p>
&lt;/li>
&lt;/ol>
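&lt;p>A small simulation makes these cases tangible. We feed the estimator deliberately misspecified working models and check which combinations still recover the true ATT of 0.12. The data-generating process is our own illustrative assumption, not the tutorial&amp;rsquo;s data:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(3)
n, tau = 100_000, 0.12
x = rng.uniform(-1, 1, n)
p_true = 1 / (1 + np.exp(-2 * x))     # true propensity score
d = rng.binomial(1, p_true)
mu0_true = 0.5 * x                    # true E[dY given D=0, X]
dy = mu0_true + tau * d + rng.normal(0, 1, n)

def dr_att(p, mu0):
    # panel DR-DiD estimator for the supplied working models
    w1 = d / d.mean()
    w0 = p * (1 - d) / (1 - p)
    w0 = w0 / w0.mean()
    return np.mean((w1 - w0) * (dy - mu0))

ps_wrong = np.full(n, d.mean())       # propensity model that ignores X
or_wrong = np.zeros(n)                # outcome model that ignores X

print(round(dr_att(p_true, mu0_true), 2))    # both correct:  close to 0.12
print(round(dr_att(p_true, or_wrong), 2))    # only PS right: close to 0.12
print(round(dr_att(ps_wrong, mu0_true), 2))  # only OR right: close to 0.12
print(round(dr_att(ps_wrong, or_wrong), 2))  # both wrong:    biased
&lt;/code>&lt;/pre>
&lt;p>Only the last line, where both working models ignore the covariate, drifts well above 0.12 &amp;mdash; double robustness is one layer of insurance, not two.&lt;/p>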
&lt;pre>&lt;code class="language-mermaid">graph TD
DY[&amp;quot;&amp;lt;b&amp;gt;Panel data&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;ΔY = Y_post − Y_pre&amp;lt;br/&amp;gt;for each household&amp;quot;]
OR[&amp;quot;&amp;lt;b&amp;gt;Outcome model&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Predict control group's&amp;lt;br/&amp;gt;consumption change&amp;lt;br/&amp;gt;μ̂₀,Δ(X)&amp;quot;]
PS[&amp;quot;&amp;lt;b&amp;gt;Propensity score&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Estimate p(X)&amp;lt;br/&amp;gt;= Pr(D=1 | X)&amp;quot;]
RES[&amp;quot;&amp;lt;b&amp;gt;Residuals&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;ΔY − μ̂₀,Δ(X)&amp;quot;]
IPW_W[&amp;quot;&amp;lt;b&amp;gt;IPW reweighting&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Make controls look&amp;lt;br/&amp;gt;like treated group&amp;quot;]
DRDID[&amp;quot;&amp;lt;b&amp;gt;DR-DiD estimate&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;ATT = weighted average&amp;lt;br/&amp;gt;of residuals&amp;quot;]
DY --&amp;gt; RES
OR --&amp;gt; RES
PS --&amp;gt; IPW_W
RES --&amp;gt; DRDID
IPW_W --&amp;gt; DRDID
style DY fill:#141413,stroke:#00d4c8,color:#fff
style OR fill:#6a9bcc,stroke:#141413,color:#fff
style PS fill:#d97757,stroke:#141413,color:#fff
style RES fill:#6a9bcc,stroke:#141413,color:#fff
style IPW_W fill:#d97757,stroke:#141413,color:#fff
style DRDID fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
&lt;h4 id="what-drdid-adds-over-basic-did-and-twfe">What DRDID adds over basic DiD and TWFE&lt;/h4>
&lt;p>Sant&amp;rsquo;Anna and Zhao (2020) also showed that the standard two-way fixed effects (TWFE) estimator &amp;mdash; the workhorse of applied economics &amp;mdash; can produce misleading results when treatment effects are heterogeneous across covariates. Specifically, the TWFE estimator implicitly assumes (i) that treatment effects are the same for all covariate values, and (ii) that there are no covariate-specific time trends. When these assumptions fail, &amp;ldquo;the estimand is, in general, different from the ATT, and policy evaluation based on it may be misleading.&amp;rdquo; DRDID avoids both of these pitfalls by allowing for flexible outcome models and covariate-specific trends.&lt;/p>
&lt;h4 id="stata-implementation">Stata implementation&lt;/h4>
&lt;p>The &lt;code>drdid&lt;/code> package (Rios-Avila, Sant&amp;rsquo;Anna, and Callaway) implements the estimators from the paper.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Install the drdid package (only needed once)
ssc install drdid, replace
* Doubly Robust DiD with DRIPW estimator
drdid y c.age c.edu i.female i.poverty, ivar(id) time(year) treatment(treat) dripw
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Doubly robust difference-in-differences estimator
Outcome model : least squares
Treatment model: inverse probability
──────────────────────────────────────────────────────────────────────────────
| Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATET | .1374784 .027387 5.02 0.000 .0838008 .191156
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The DRDID estimate of the ATT is 0.137 (SE = 0.027, p &amp;lt; 0.001, 95% CI [0.084, 0.191]). The &lt;code>dripw&lt;/code> option specifies the Doubly Robust Inverse Probability Weighting estimator, which uses a linear least squares model for the outcome evolution of the control group and a logistic model for the propensity score. The result is slightly higher than basic DiD (0.135) and close to the true effect of 0.12.&lt;/p>
&lt;p>&lt;strong>Alternative: Stata 17+ built-in command.&lt;/strong> Stata 17 and later versions include a built-in doubly robust DiD estimator that does not require installing external packages.&lt;/p>
&lt;pre>&lt;code class="language-stata">xthdidregress aipw (y c.age c.edu i.female i.poverty) ///
(treat_post c.age c.edu i.female i.poverty), group(id)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Heterogeneous-treatment-effects regression Number of obs = 4,000
Number of panels = 2,000
Estimator: Augmented IPW
Panel variable: id
Treatment level: id
Control group: Never treated
(Std. err. adjusted for 2,000 clusters in id)
──────────────────────────────────────────────────────────────────────────────
| Robust
Cohort | ATET std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
year |
2024 | .1374784 .027387 5.02 0.000 .0838008 .191156
──────────────────────────────────────────────────────────────────────────────
Note: ATET computed using covariates.
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>xthdidregress aipw&lt;/code> command produces the same ATT estimate of 0.137 (SE = 0.027, 95% CI [0.084, 0.191]) as the &lt;code>drdid&lt;/code> package &amp;mdash; confirming that both implement the same doubly robust DiD methodology. The output labels the result as &amp;ldquo;Cohort year 2024&amp;rdquo; because &lt;code>xthdidregress&lt;/code> is designed for settings with staggered treatment adoption across multiple cohorts; in our two-period design, there is only one treatment cohort (households treated in 2024). As the Stata manual explains, &amp;ldquo;AIPW models both treatment and outcome. If at least one of the models is correctly specified, it provides consistent estimates, a property called double robustness.&amp;rdquo;&lt;/p>
&lt;p>The agreement between &lt;code>drdid&lt;/code> (community package) and &lt;code>xthdidregress aipw&lt;/code> (built-in) provides a useful robustness check &amp;mdash; researchers can verify their results using both implementations.&lt;/p>
&lt;h4 id="panel-data-vs-repeated-cross-sections">Panel data vs. repeated cross-sections&lt;/h4>
&lt;p>An important result from Sant&amp;rsquo;Anna and Zhao (2020) is that panel data are &lt;strong>strictly more efficient&lt;/strong> than repeated cross-sections for estimating the ATT under the DiD framework. The intuition is straightforward: with panel data, we observe each household&amp;rsquo;s individual change over time ($\Delta Y_i$), which eliminates household-level variation. With repeated cross-sections, we can only compare group averages at different time points, which introduces additional noise. The efficiency gain is larger when the sample sizes in the pre and post periods are more imbalanced.&lt;/p>
&lt;p>In our study, we have a balanced panel (same 2,000 households at baseline and endline), so we benefit from this efficiency advantage.&lt;/p>
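&lt;p>The efficiency claim is easy to see in a toy Monte Carlo: when household effects are large, differencing the &lt;em>same&lt;/em> households removes them, while comparing &lt;em>different&lt;/em> households across waves does not. The numbers below are illustrative, not drawn from the tutorial&amp;rsquo;s data:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(1)
n, reps, trend = 500, 2000, 0.12
panel, rcs = [], []
for _ in range(reps):
    a = rng.normal(0, 1, n)                   # household fixed effects (large)
    y_pre = a + rng.normal(0, 0.3, n)
    y_post = a + trend + rng.normal(0, 0.3, n)
    panel.append(np.mean(y_post - y_pre))     # same households, differenced
    # repeated cross-sections: a fresh sample of households in the post wave
    a2 = rng.normal(0, 1, n)
    y_post2 = a2 + trend + rng.normal(0, 0.3, n)
    rcs.append(np.mean(y_post2) - np.mean(y_pre))
print(round(np.std(panel), 3), round(np.std(rcs), 3))
&lt;/code>&lt;/pre>
&lt;p>Both estimators center on the true trend of 0.12, but the panel version is several times more precise because the household effects cancel within each difference.&lt;/p>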
&lt;h3 id="95-cross-sectional-vs-panel-comparison">9.5 Cross-sectional vs. panel comparison&lt;/h3>
&lt;p>The table below compares our best cross-sectional estimates with the panel-based DiD estimates.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Approach&lt;/th>
&lt;th>Estimand&lt;/th>
&lt;th>Data Used&lt;/th>
&lt;th style="text-align:center">Estimate&lt;/th>
&lt;th style="text-align:center">SE&lt;/th>
&lt;th style="text-align:center">95% CI&lt;/th>
&lt;th style="text-align:center">Contains 0.12?&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Simple regression&lt;/td>
&lt;td>None&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td>Endline only&lt;/td>
&lt;td style="text-align:center">0.116&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.078, 0.154]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RA&lt;/td>
&lt;td>Outcome model&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td>Endline only&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>IPW&lt;/td>
&lt;td>Treatment model&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td>Endline only&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>DR (IPWRA)&lt;/td>
&lt;td>Both models&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td>Endline only&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Basic DiD&lt;/td>
&lt;td>Panel FE&lt;/td>
&lt;td>&lt;strong>ATT&lt;/strong>&lt;/td>
&lt;td>&lt;strong>Both waves&lt;/strong>&lt;/td>
&lt;td style="text-align:center">0.135&lt;/td>
&lt;td style="text-align:center">0.027&lt;/td>
&lt;td style="text-align:center">[0.081, 0.188]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>DR-DiD (&lt;code>drdid&lt;/code>)&lt;/td>
&lt;td>Both + Panel&lt;/td>
&lt;td>&lt;strong>ATT&lt;/strong>&lt;/td>
&lt;td>&lt;strong>Both waves&lt;/strong>&lt;/td>
&lt;td style="text-align:center">0.137&lt;/td>
&lt;td style="text-align:center">0.027&lt;/td>
&lt;td style="text-align:center">[0.084, 0.191]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>DR-DiD (&lt;code>xthdidregress&lt;/code>)&lt;/td>
&lt;td>Both + Panel&lt;/td>
&lt;td>&lt;strong>ATT&lt;/strong>&lt;/td>
&lt;td>&lt;strong>Both waves&lt;/strong>&lt;/td>
&lt;td style="text-align:center">0.137&lt;/td>
&lt;td style="text-align:center">0.027&lt;/td>
&lt;td style="text-align:center">[0.084, 0.191]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>True effect&lt;/strong>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td style="text-align:center">&lt;strong>0.12&lt;/strong>&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Several important patterns emerge from this comparison. Cross-sectional methods estimate &lt;strong>ATE&lt;/strong> using only endline data, while DiD methods estimate &lt;strong>ATT&lt;/strong> using both survey waves. The two DR-DiD implementations (&lt;code>drdid&lt;/code> and &lt;code>xthdidregress aipw&lt;/code>) produce identical results, confirming methodological consistency. The DiD estimates (0.135&amp;ndash;0.137) are slightly higher than the cross-sectional estimates (0.113), but &lt;strong>all confidence intervals contain the true effect of 0.12&lt;/strong>. DiD&amp;rsquo;s wider standard errors (0.027 vs. 0.019) reflect the additional variability from differencing within households.&lt;/p>
&lt;p>The key value of DiD is &lt;strong>not&lt;/strong> tighter standard errors &amp;mdash; it is &lt;strong>robustness to time-invariant unobservables.&lt;/strong> In observational settings where randomization does not hold, DiD can correct biases that cross-sectional methods cannot address. In this RCT, randomization already handles confounding, so the estimates are similar. DRDID adds doubly robust protection on top of DiD, making it the most robust panel method available.&lt;/p>
&lt;hr>
&lt;h2 id="10-offer-vs-receipt-----endogenous-treatment-advanced">10. Offer vs. receipt &amp;mdash; endogenous treatment (advanced)&lt;/h2>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> This section addresses the advanced topic of imperfect compliance and endogenous treatment. Readers new to causal inference may wish to skip this section on a first reading and return to it later.&lt;/p>
&lt;/blockquote>
&lt;h3 id="101-the-compliance-problem">10.1 The compliance problem&lt;/h3>
&lt;p>All estimates in Sections 8 and 9 measure the effect of &lt;strong>being offered&lt;/strong> the cash transfer (&lt;code>treat&lt;/code>), not the effect of &lt;strong>actually receiving&lt;/strong> it (&lt;code>D&lt;/code>). This is the intent-to-treat (ITT) approach &amp;mdash; it captures the policy-relevant effect of the offer, regardless of whether households complied.&lt;/p>
&lt;p>But what about the effect of actual receipt? This is more complex because compliance is &lt;strong>not random&lt;/strong>. Only 85% of treated households received the transfer, and 5% of control households received it through other channels. The households that chose to take up the program may differ systematically from those that did not &amp;mdash; they may be more motivated, more financially constrained, or better connected. Naively comparing receivers to non-receivers would introduce &lt;strong>selection bias&lt;/strong>.&lt;/p>
&lt;p>The solution is to use the random assignment (&lt;code>treat&lt;/code>) as an &lt;strong>instrumental variable&lt;/strong> for actual receipt (&lt;code>D&lt;/code>). Because &lt;code>treat&lt;/code> was randomly assigned, it is independent of household characteristics and satisfies the requirements for a valid instrument. This allows us to isolate the causal effect of receipt, at least for the subset of households whose receipt was determined by the offer (the &amp;ldquo;compliers&amp;rdquo;).&lt;/p>
&lt;p>&lt;strong>Analogy &amp;mdash; prescriptions and pills.&lt;/strong> Imagine a doctor randomly prescribes a medication to some patients, but not all patients fill their prescription. We cannot simply compare those who took the pill to those who did not, because pill-takers may be more health-conscious. Instead, we use the random prescription (the &amp;ldquo;offer&amp;rdquo;) as a nudge &amp;mdash; it strongly predicts whether you take the pill but does not directly affect your health except through the pill. That is the instrumental variable approach: using the random offer to estimate the causal effect of actual receipt.&lt;/p>
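&lt;p>The simplest version of this idea is the Wald estimator: divide the effect of the offer on the outcome (the ITT) by the effect of the offer on receipt (the first stage). A back-of-envelope calculation with the numbers from this tutorial (the simple-regression ITT of 0.116 and the take-up rates above):&lt;/p>
&lt;pre>&lt;code class="language-python"># ITT estimate from Section 8 and take-up rates from the study design
itt = 0.116            # effect of being offered the transfer
takeup_treated = 0.85  # share of offered households that received it
takeup_control = 0.05  # share of control households that received it

first_stage = takeup_treated - takeup_control   # offer effect on receipt
late = itt / first_stage                        # receipt effect (compliers)
print(round(late, 3))
&lt;/code>&lt;/pre>
&lt;p>The result, about 0.145, is the complier average effect of receipt, and it previews the &lt;code>etregress&lt;/code> estimate in the next subsection.&lt;/p>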
&lt;h3 id="102-endogenous-treatment-regression">10.2 Endogenous treatment regression&lt;/h3>
&lt;p>Stata&amp;rsquo;s &lt;code>etregress&lt;/code> command estimates the effect of an endogenous treatment variable, using the random assignment as an excluded instrument.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1
* Endogenous treatment regression
etregress y c.age i.female i.poverty c.edu, ///
treat(D = treat c.age i.female i.poverty c.edu) vce(robust)
* Mark estimation sample
gen byte esample = e(sample)
* ATE of receipt
margins r.D if esample==1
* ATT of receipt
margins, predict(cte) subpop(if D==1 &amp;amp; esample==1)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Linear regression with endogenous treatment Number of obs = 2,000
Estimator: Maximum likelihood Wald chi2(5) = 92.23
Log pseudolikelihood = -1797.6297 Prob &amp;gt; chi2 = 0.0000
──────────────────────────────────────────────────────────────────────────────
| Robust
| Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
y |
age | .003187 .0010016 3.18 0.001 .001224 .0051501
1.female | .0801465 .0189552 4.23 0.000 .042995 .117298
1.poverty | -.1030302 .0205984 -5.00 0.000 -.1434023 -.062658
edu | .0182634 .0045243 4.04 0.000 .0093959 .0271308
1.D | .1471 .0246775 5.96 0.000 .0987329 .1954671
_cons | 9.705642 .0694641 139.72 0.000 9.569495 9.841789
─────────────+────────────────────────────────────────────────────────────────
D |
treat | 2.55806 .0802103 31.89 0.000 2.40085 2.715269
_cons | -1.844408 .2847883 -6.48 0.000 -2.402582 -1.286233
─────────────+────────────────────────────────────────────────────────────────
/athrho | -.0060068 .0481062 -0.12 0.901 -.1002933 .0882796
sigma | .4245195 .0066426 .411698 .4377404
──────────────────────────────────────────────────────────────────────────────
Wald test of indep. eqns. (rho = 0): chi2(1) = 0.02 Prob &amp;gt; chi2 = 0.9006
ATE of receipt (margins r.D):
──────────────────────────────────────────────────────────────────────────────
D | Contrast std. err. [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
(1 vs 0) | .1471 .0246775 .0987329 .1954671
──────────────────────────────────────────────────────────────────────────────
ATT of receipt (margins, predict(cte)):
──────────────────────────────────────────────────────────────────────────────
_cons | Margin std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
| .1471 .0246775 5.96 0.000 .0987329 .1954671
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>etregress&lt;/code> output reveals several important findings. The coefficient on &lt;code>D&lt;/code> (receipt) is 0.147 (SE = 0.025, p &amp;lt; 0.001, 95% CI [0.099, 0.195]), which is the estimated effect of actually receiving the cash transfer. This is larger than the offer-based estimates (0.113&amp;ndash;0.116) because not everyone who was offered the program received it &amp;mdash; the per-recipient effect is naturally larger than the per-offer effect. The Wald test of independent equations (rho = 0) has p = 0.901, indicating no evidence of endogeneity &amp;mdash; consistent with a well-designed RCT where unobservable factors do not drive both treatment receipt and consumption. The &lt;code>margins&lt;/code> commands confirm that both the ATE and ATT of receipt are 0.147 (identical in this case because the model assumes a constant treatment effect).&lt;/p>
&lt;h3 id="103-doubly-robust-estimation-of-receipt-effect">10.3 Doubly robust estimation of receipt effect&lt;/h3>
&lt;p>We can also estimate the receipt effect using a doubly robust approach, incorporating the baseline outcome &lt;code>y0&lt;/code> as an additional control variable (an ANCOVA-style adjustment) and including &lt;code>treat&lt;/code> (the random assignment) as a covariate in the treatment model for &lt;code>D&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1
* Doubly robust ATE of receipt, controlling for baseline outcome
teffects ipwra (y y0 c.age i.female i.poverty c.edu) ///
(D c.age i.female i.poverty c.edu treat), vce(robust)
* Diagnostic checks
tebalance summarize age edu i.female i.poverty
tebalance summarize, baseline
tebalance density y0
tebalance density age
teffects overlap
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : IPW regression adjustment
Outcome model : linear
Treatment model: logit
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATE |
D |
(1 vs 0) | .1172686 .0322495 3.64 0.000 .0540608 .1804764
─────────────+────────────────────────────────────────────────────────────────
POmean |
D |
0 | 10.03361 .0171459 585.19 0.000 10 10.06722
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The doubly robust estimate of the ATE of receipt is 0.117 (SE = 0.032, 95% CI [0.054, 0.180]). This is slightly lower than the &lt;code>etregress&lt;/code> estimate (0.147) and closer to the true effect of 0.12. The wider standard error (0.032 vs. 0.025) reflects the additional flexibility of the doubly robust approach. This specification includes &lt;code>y0&lt;/code> (the baseline outcome) in the outcome model, which controls for pre-treatment differences in consumption levels. The variable &lt;code>treat&lt;/code> appears in the treatment model for &lt;code>D&lt;/code> because random assignment is the strongest predictor of receipt.&lt;/p>
&lt;p>The diagnostic graphs below verify adequate covariate balance and propensity score overlap for the receipt model.&lt;/p>
&lt;p>&lt;img src="stata_rct_density_y0_receipt.png" alt="Density plot of baseline consumption (y0) for receivers and non-receivers, before and after IPWRA weighting.">&lt;/p>
&lt;p>&lt;img src="stata_rct_overlap_receipt.png" alt="Overlap plot showing propensity score distributions for receivers and non-receivers of the cash transfer.">&lt;/p>
&lt;p>The density and overlap plots confirm that the IPWRA weighting achieves good balance between receivers and non-receivers. After weighting, the effective sample sizes are approximately 999 treated and 1,001 control (rebalanced from the raw 923 receivers and 1,077 non-receivers). The weighted covariate means are closely aligned &amp;mdash; for example, the weighted mean age is 35.0 for receivers versus 35.2 for non-receivers, and the weighted poverty rate is 31.1% versus 31.4%. The propensity scores show sufficient overlap for reliable estimation.&lt;/p>
&lt;hr>
&lt;h2 id="11-comparing-all-estimates-----the-big-picture">11. Comparing all estimates &amp;mdash; the big picture&lt;/h2>
&lt;p>The table below brings together all estimates from the tutorial, providing a comprehensive overview of how different methods, estimands, and data structures relate to each other.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>#&lt;/th>
&lt;th>Method&lt;/th>
&lt;th>Approach&lt;/th>
&lt;th>Estimand&lt;/th>
&lt;th>Data&lt;/th>
&lt;th style="text-align:center">Estimate&lt;/th>
&lt;th style="text-align:center">SE&lt;/th>
&lt;th style="text-align:center">95% CI&lt;/th>
&lt;th style="text-align:center">Contains 0.12?&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>1&lt;/td>
&lt;td>Simple regression&lt;/td>
&lt;td>None&lt;/td>
&lt;td>ATE (offer)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.116&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.078, 0.154]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>2&lt;/td>
&lt;td>Regression Adjustment&lt;/td>
&lt;td>Outcome model&lt;/td>
&lt;td>ATE (offer)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>3&lt;/td>
&lt;td>Regression Adjustment&lt;/td>
&lt;td>Outcome model&lt;/td>
&lt;td>ATT (offer)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.076, 0.151]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>4&lt;/td>
&lt;td>Inverse Prob. Weighting&lt;/td>
&lt;td>Treatment model&lt;/td>
&lt;td>ATE (offer)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>5&lt;/td>
&lt;td>Inverse Prob. Weighting&lt;/td>
&lt;td>Treatment model&lt;/td>
&lt;td>ATT (offer)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.076, 0.151]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>6&lt;/td>
&lt;td>IPWRA (Doubly Robust)&lt;/td>
&lt;td>Both models&lt;/td>
&lt;td>ATE (offer)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>7&lt;/td>
&lt;td>IPWRA (Doubly Robust)&lt;/td>
&lt;td>Both models&lt;/td>
&lt;td>ATT (offer)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.076, 0.151]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>8&lt;/td>
&lt;td>Basic DiD&lt;/td>
&lt;td>Panel FE&lt;/td>
&lt;td>ATT (offer)&lt;/td>
&lt;td>Panel&lt;/td>
&lt;td style="text-align:center">0.135&lt;/td>
&lt;td style="text-align:center">0.027&lt;/td>
&lt;td style="text-align:center">[0.081, 0.188]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>9&lt;/td>
&lt;td>DR-DiD (&lt;code>drdid&lt;/code>)&lt;/td>
&lt;td>Both + Panel&lt;/td>
&lt;td>ATT (offer)&lt;/td>
&lt;td>Panel&lt;/td>
&lt;td style="text-align:center">0.137&lt;/td>
&lt;td style="text-align:center">0.027&lt;/td>
&lt;td style="text-align:center">[0.084, 0.191]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>10&lt;/td>
&lt;td>DR-DiD (&lt;code>xthdidregress&lt;/code>)&lt;/td>
&lt;td>Both + Panel&lt;/td>
&lt;td>ATT (offer)&lt;/td>
&lt;td>Panel&lt;/td>
&lt;td style="text-align:center">0.137&lt;/td>
&lt;td style="text-align:center">0.027&lt;/td>
&lt;td style="text-align:center">[0.084, 0.191]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>11&lt;/td>
&lt;td>Endogenous treatment (&lt;code>etregress&lt;/code>)&lt;/td>
&lt;td>IV&lt;/td>
&lt;td>ATE (receipt)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.147&lt;/td>
&lt;td style="text-align:center">0.025&lt;/td>
&lt;td style="text-align:center">[0.099, 0.195]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>12&lt;/td>
&lt;td>DR receipt (&lt;code>teffects ipwra&lt;/code>)&lt;/td>
&lt;td>Both models&lt;/td>
&lt;td>ATE (receipt)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.117&lt;/td>
&lt;td style="text-align:center">0.032&lt;/td>
&lt;td style="text-align:center">[0.054, 0.180]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;/td>
&lt;td>&lt;strong>True effect&lt;/strong>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td style="text-align:center">&lt;strong>0.12&lt;/strong>&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="four-key-takeaways">Four key takeaways&lt;/h3>
&lt;p>&lt;strong>1. RA vs. IPW vs. DR.&lt;/strong> In this well-designed RCT, all three cross-sectional approaches give remarkably similar results (0.113&amp;ndash;0.116). This convergence occurs because randomization ensures that both the outcome model and the propensity score model are approximately correct. The differences are small &amp;mdash; but in observational studies, where one model might be misspecified, the choice of method matters much more. Doubly robust methods are the safest bet because they remain consistent if either model is correct.&lt;/p>
&lt;p>&lt;strong>2. ATE vs. ATT.&lt;/strong> For all cross-sectional methods, ATE and ATT are nearly identical (0.113&amp;ndash;0.116). This confirms that treatment effects are roughly homogeneous across households in this simulation. When treatment effects are heterogeneous &amp;mdash; for example, if the program benefits poorer households more &amp;mdash; ATE and ATT can diverge. The researcher must choose the estimand that matches their policy question: ATE for scaling decisions, ATT for program evaluation.&lt;/p>
&lt;p>&lt;strong>3. Cross-sectional vs. DiD.&lt;/strong> DiD estimates (0.135&amp;ndash;0.137) are slightly higher than cross-sectional estimates (0.113&amp;ndash;0.116), but all confidence intervals contain the true effect of 0.12. DiD&amp;rsquo;s main advantage is controlling for &lt;strong>time-invariant unobservable&lt;/strong> household characteristics &amp;mdash; less important in an RCT (where randomization handles confounding) but critical in quasi-experimental settings. DRDID extends the doubly robust logic to the panel setting, providing the most robust estimator in our toolkit. DiD inherently estimates the &lt;strong>ATT&lt;/strong> because its counterfactual is constructed specifically for the treated group.&lt;/p>
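&lt;p>The DiD counterfactual logic reduces to a simple difference of differences. The following Python sketch uses made-up cell means (not numbers from this tutorial) to show the canonical 2&amp;times;2 computation:&lt;/p>
&lt;pre>&lt;code class="language-python"># Hypothetical cell means of log consumption (illustrative only)
treated_pre, treated_post = 9.90, 10.16
control_pre, control_post = 9.92, 10.04

# ATT = (change for treated) - (change for controls)
att = (treated_post - treated_pre) - (control_post - control_pre)
print(round(att, 2))
&lt;/code>&lt;/pre>
&lt;p>The control group&amp;rsquo;s trend (0.12 here) stands in for what the treated group would have experienced without the program; subtracting it isolates the ATT.&lt;/p>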
&lt;p>&lt;strong>4. Offer vs. receipt.&lt;/strong> The effect of actually receiving the cash transfer (0.117&amp;ndash;0.147) is larger than the effect of being offered it (0.113&amp;ndash;0.116), because imperfect compliance dilutes the offer-based estimates. The doubly robust receipt estimate (0.117) is closest to the true effect of 0.12, while the endogenous treatment model (0.147) is slightly higher. All confidence intervals contain 0.12.&lt;/p>
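&lt;p>The dilution from imperfect compliance follows the instrumental-variables (Wald) logic: with one-sided noncompliance, the offer effect (ITT) is approximately the receipt effect among compliers scaled down by the compliance rate. A back-of-the-envelope Python sketch, where the compliance rate is a hypothetical value rather than one computed from the tutorial&amp;rsquo;s data:&lt;/p>
&lt;pre>&lt;code class="language-python">itt = 0.113        # offer effect from the table above
compliance = 0.92  # hypothetical share of offered households that received

# Wald estimate of the effect of receipt among compliers
late = itt / compliance
print(round(late, 3))
&lt;/code>&lt;/pre>
&lt;p>Scaling the ITT up by the compliance rate yields a receipt effect slightly above the offer effect, consistent with the pattern in the table.&lt;/p>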
&lt;hr>
&lt;h2 id="12-summary-and-key-takeaways">12. Summary and key takeaways&lt;/h2>
&lt;p>The cash transfer program increased household consumption by approximately &lt;strong>11&amp;ndash;14%&lt;/strong> across all estimation methods, close to the true effect of &lt;strong>12%&lt;/strong>. Every confidence interval contained the true value, demonstrating that all methods successfully recovered the correct answer.&lt;/p>
&lt;h3 id="seven-methodological-lessons">Seven methodological lessons&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Always verify baseline balance&lt;/strong> before estimating treatment effects. Even with randomization, chance imbalances can occur &amp;mdash; as we saw with the gender variable (SMD = 9.3%).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Be explicit about your estimand.&lt;/strong> ATE answers the policymaker&amp;rsquo;s question (&amp;ldquo;What if we scale this up?&amp;rdquo;), while ATT answers the evaluator&amp;rsquo;s question (&amp;ldquo;Did it help the participants?&amp;rdquo;). Different methods target different estimands.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Regression adjustment models the outcome; IPW models treatment assignment; doubly robust does both.&lt;/strong> These three approaches represent fundamentally different strategies for causal estimation. Understanding what each models &amp;mdash; and what can go wrong &amp;mdash; is essential for choosing the right method.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>In a well-designed RCT, all three approaches converge.&lt;/strong> But doubly robust methods provide insurance against model misspecification, making them the standard recommendation in modern causal inference.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Panel data controls for time-invariant unobservables&lt;/strong> that cross-sectional methods cannot address. By comparing each household to itself over time, DiD absorbs household fixed effects &amp;mdash; motivation, geography, family culture &amp;mdash; that are invisible to cross-sectional approaches.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>DiD inherently estimates the ATT&lt;/strong> because its counterfactual is specific to the treated group. The control group&amp;rsquo;s time trend provides a counterfactual for what the treated group would have experienced without the program &amp;mdash; but it does not tell us what would happen if the program were given to the untreated.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Doubly robust DiD (DRDID)&lt;/strong> extends the DR logic to the panel setting. It combines the power of DiD (controlling for household fixed effects) with the robustness of doubly robust estimation (protection against model misspecification), making it the most robust panel estimator available.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h3 id="limitations">Limitations&lt;/h3>
&lt;ul>
&lt;li>This tutorial uses &lt;strong>simulated data&lt;/strong> with known parameters. Real-world data may exhibit more complex compliance patterns, heterogeneous effects, and missing data.&lt;/li>
&lt;li>The panel has only &lt;strong>two periods&lt;/strong> (baseline and endline), limiting our ability to test for pre-treatment trends or estimate dynamic treatment effects.&lt;/li>
&lt;li>Treatment effects are &lt;strong>homogeneous&lt;/strong> by construction. In practice, researchers should explore heterogeneity across subgroups.&lt;/li>
&lt;/ul>
&lt;h3 id="next-steps">Next steps&lt;/h3>
&lt;ul>
&lt;li>Apply these methods to &lt;strong>real-world RCT data&lt;/strong> from actual cash transfer programs&lt;/li>
&lt;li>Explore &lt;strong>heterogeneous treatment effects&lt;/strong> by gender, poverty status, or education level&lt;/li>
&lt;li>Extend to &lt;strong>multi-period panels&lt;/strong> with staggered treatment adoption, using modern DiD methods (Callaway and Sant&amp;rsquo;Anna, 2021)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="13-exercises">13. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Heterogeneous effects by gender.&lt;/strong> Estimate treatment effects separately for male-headed and female-headed households using IPWRA. Are the effects different? Does ATE still equal ATT when you restrict to subgroups?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Model misspecification.&lt;/strong> Compare the RA, IPW, and DR estimates when you deliberately misspecify the outcome model by omitting &lt;code>edu&lt;/code> and &lt;code>age&lt;/code> from the covariate list. Which method is most robust to this misspecification? What does this tell you about the value of doubly robust estimation?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Basic DiD vs. doubly robust DiD.&lt;/strong> Re-run the DiD analysis using the basic &lt;code>xtdidregress&lt;/code> command (no covariates) and compare it with the &lt;code>drdid&lt;/code> results (with covariates). How much do the estimates differ? What does this tell you about the role of covariate adjustment in DiD?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://www.stata.com/manuals/teteffects.pdf" target="_blank" rel="noopener">Stata &lt;code>teffects&lt;/code> documentation &amp;mdash; Treatment-effects estimation&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1016/j.jeconom.2020.06.003" target="_blank" rel="noopener">Sant&amp;rsquo;Anna, P.H.C. &amp;amp; Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators. &lt;em>Journal of Econometrics&lt;/em>, 219(1), 101&amp;ndash;122&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1017/CBO9781139025751" target="_blank" rel="noopener">Imbens, G. &amp;amp; Rubin, D. (2015). &lt;em>Causal Inference for Statistics, Social, and Biomedical Sciences&lt;/em>. Cambridge University Press&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://friosavila.github.io/stpackages/drdid.html" target="_blank" rel="noopener">Rios-Avila, F., Sant&amp;rsquo;Anna, P.H.C., &amp;amp; Callaway, B. &lt;code>drdid&lt;/code> &amp;mdash; Doubly Robust DID estimators for Stata&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://dimewiki.worldbank.org/iebaltab" target="_blank" rel="noopener">World Bank &lt;code>ietoolkit&lt;/code> / &lt;code>iebaltab&lt;/code> documentation&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tdmize.github.io/data/" target="_blank" rel="noopener">Mize, T. &lt;code>balanceplot&lt;/code> &amp;mdash; Stata module for covariate balance visualization&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://youtu.be/Gr_fu5deDMk" target="_blank" rel="noopener">RCT Analysis: Cash Transfers, Panel Data, and Doubly Robust Estimation (YouTube)&lt;/a>&lt;/li>
&lt;/ol></description></item><item><title>High-Dimensional Fixed Effects Regression: An Introduction in Python</title><link>https://carlos-mendez.org/post/python_pyfixest/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_pyfixest/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Imagine you want to know whether union membership raises wages. You run a regression and find a strong positive association: union workers earn 18% more. But wait &amp;mdash; what if the workers who join unions are also more motivated, more experienced, or work in industries that pay well regardless? That 18% could be mostly &lt;em>selection&lt;/em>, not a genuine union effect. This is one of the most pervasive problems in empirical research: &lt;strong>omitted variable bias&lt;/strong>. Any time your data is grouped &amp;mdash; by individual, firm, country, or time period &amp;mdash; unobserved characteristics that differ across groups can contaminate your estimates, leading to conclusions that look solid but are fundamentally misleading.&lt;/p>
&lt;p>&lt;strong>Fixed effects regression&lt;/strong> is the workhorse solution. By absorbing all time-invariant group-level heterogeneity &amp;mdash; a worker&amp;rsquo;s innate ability, a firm&amp;rsquo;s management culture, a country&amp;rsquo;s institutional quality &amp;mdash; fixed effects eliminate an entire class of confounders in one step. The result is striking: in the wage panel we analyze below, the apparent union premium drops from 18% to just 7% once we account for individual fixed effects, revealing that more than half the raw association was driven by who selects into unions, not what unions do. This kind of dramatic correction is routine in applied research, which is why fixed effects appear in virtually every empirical paper that uses panel data.&lt;/p>
&lt;p>Modern implementations make this computationally painless. Rather than estimating thousands of dummy variables, they use a &lt;em>demeaning&lt;/em> algorithm that sweeps out group means before estimation. &lt;a href="https://pyfixest.org/" target="_blank" rel="noopener">PyFixest&lt;/a> brings this approach to Python with a concise formula syntax inspired by R&amp;rsquo;s &lt;code>fixest&lt;/code> package &amp;mdash; the most popular fixed effects library in the R ecosystem. In this tutorial we use PyFixest to build from simple OLS through one-way and two-way fixed effects, compare inference methods, perform instrumental variable estimation, analyze a real wage panel, and run event study designs for difference-in-differences &amp;mdash; all with a few lines of code. Along the way, we will see &lt;em>why&lt;/em> fixed effects work (by manually reproducing them via demeaning), discover what they &lt;em>cannot&lt;/em> do (estimate time-invariant effects like education), learn when standard TWFE breaks down in staggered treatment designs, and apply the CRE/Mundlak approach to recover the very coefficients that one-way FE absorb.&lt;/p>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand why unobserved group heterogeneity biases OLS and how fixed effects remove that bias&lt;/li>
&lt;li>Implement one-way and two-way fixed effects regressions using PyFixest&amp;rsquo;s formula syntax&lt;/li>
&lt;li>Compare multiple model specifications efficiently using PyFixest&amp;rsquo;s stepwise operators&lt;/li>
&lt;li>Assess robustness by computing standard errors under different clustering assumptions&lt;/li>
&lt;li>Decompose panel variation into between and within components to diagnose what FE can and cannot estimate&lt;/li>
&lt;li>Frame a real wage panel through the Mincer equation and its panel extensions&lt;/li>
&lt;li>Recover time-invariant coefficients (education, race) using the CRE/Mundlak approach&lt;/li>
&lt;li>Apply fixed effects to event study designs with staggered treatment adoption&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Content outline.&lt;/strong> Sections 2&amp;ndash;4 set up the environment and establish an OLS baseline. Sections 5&amp;ndash;6 introduce fixed effects &amp;mdash; first through PyFixest&amp;rsquo;s absorption syntax, then by reproducing the same result manually via demeaning, building intuition for what FE actually does to the data. Section 7 shows how to compare multiple specifications in a single call, and Section 8 explores how standard error choices affect inference. Section 9 extends to two-way FE, and Section 10 combines FE with instrumental variables. Section 11 is the core case study: a real wage panel framed by the Mincer equation, where we decompose within and between variation, see how one-way FE absorb time-invariant variables like education, stress-test the common trends assumption with group-specific time effects, and recover education&amp;rsquo;s coefficient through the CRE/Mundlak approach. Section 12 applies FE to event study designs, with a careful discussion of why period −1 serves as the universal baseline. Throughout, each section builds on the previous &amp;mdash; the manual demeaning in Section 6 explains why education vanishes in Section 11, and the stepwise comparison in Section 7 foreshadows the specification table in Section 11.&lt;/p>
&lt;h2 id="2-setup-and-imports">2. Setup and imports&lt;/h2>
&lt;p>Before running the analysis, install the required packages if needed:&lt;/p>
&lt;pre>&lt;code class="language-python">pip install pyfixest
&lt;/code>&lt;/pre>
&lt;p>The following code imports PyFixest and standard data science libraries. PyFixest provides &lt;a href="https://pyfixest.org/reference/estimation.feols.html" target="_blank" rel="noopener">feols()&lt;/a> as its main estimation function, which accepts R-style formulas with a pipe &lt;code>|&lt;/code> separator for fixed effects.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pyfixest as pf
# Reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
# Site color palette
STEEL_BLUE = &amp;quot;#6a9bcc&amp;quot;
WARM_ORANGE = &amp;quot;#d97757&amp;quot;
NEAR_BLACK = &amp;quot;#141413&amp;quot;
TEAL = &amp;quot;#00d4c8&amp;quot;
&lt;/code>&lt;/pre>
&lt;details>
&lt;summary>&lt;strong>Dark theme figure styling&lt;/strong> (click to expand)&lt;/summary>
&lt;pre>&lt;code class="language-python"># Dark theme palette (consistent with site navbar/dark sections)
DARK_NAVY = &amp;quot;#0f1729&amp;quot;
GRID_LINE = &amp;quot;#1f2b5e&amp;quot;
LIGHT_TEXT = &amp;quot;#c8d0e0&amp;quot;
WHITE_TEXT = &amp;quot;#e8ecf2&amp;quot;
# Plot defaults — minimal, spine-free, dark background
plt.rcParams.update({
    &amp;quot;figure.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.edgecolor&amp;quot;: DARK_NAVY,
    &amp;quot;axes.linewidth&amp;quot;: 0,
    &amp;quot;axes.labelcolor&amp;quot;: LIGHT_TEXT,
    &amp;quot;axes.titlecolor&amp;quot;: WHITE_TEXT,
    &amp;quot;axes.spines.top&amp;quot;: False,
    &amp;quot;axes.spines.right&amp;quot;: False,
    &amp;quot;axes.spines.left&amp;quot;: False,
    &amp;quot;axes.spines.bottom&amp;quot;: False,
    &amp;quot;axes.grid&amp;quot;: True,
    &amp;quot;grid.color&amp;quot;: GRID_LINE,
    &amp;quot;grid.linewidth&amp;quot;: 0.6,
    &amp;quot;grid.alpha&amp;quot;: 0.8,
    &amp;quot;xtick.color&amp;quot;: LIGHT_TEXT,
    &amp;quot;ytick.color&amp;quot;: LIGHT_TEXT,
    &amp;quot;xtick.major.size&amp;quot;: 0,
    &amp;quot;ytick.major.size&amp;quot;: 0,
    &amp;quot;text.color&amp;quot;: WHITE_TEXT,
    &amp;quot;font.size&amp;quot;: 12,
    &amp;quot;legend.frameon&amp;quot;: False,
    &amp;quot;legend.fontsize&amp;quot;: 11,
    &amp;quot;legend.labelcolor&amp;quot;: LIGHT_TEXT,
    &amp;quot;figure.edgecolor&amp;quot;: DARK_NAVY,
    &amp;quot;savefig.facecolor&amp;quot;: DARK_NAVY,
    &amp;quot;savefig.edgecolor&amp;quot;: DARK_NAVY,
})
&lt;/code>&lt;/pre>
&lt;/details>
&lt;h2 id="3-data-loading-and-exploration">3. Data loading and exploration&lt;/h2>
&lt;h3 id="31-loading-the-dataset">3.1 Loading the dataset&lt;/h3>
&lt;p>PyFixest includes a built-in synthetic dataset designed for demonstrating fixed effects regression. We load it with &lt;a href="https://pyfixest.org/reference/utils.get_data.html" target="_blank" rel="noopener">pf.get_data()&lt;/a>, which returns a DataFrame with outcome variables (&lt;code>Y&lt;/code>, &lt;code>Y2&lt;/code>), covariates (&lt;code>X1&lt;/code>, &lt;code>X2&lt;/code>), fixed effect identifiers (&lt;code>f1&lt;/code>, &lt;code>f2&lt;/code>, &lt;code>f3&lt;/code>, &lt;code>group_id&lt;/code>), instruments (&lt;code>Z1&lt;/code>, &lt;code>Z2&lt;/code>), and sampling weights.&lt;/p>
&lt;pre>&lt;code class="language-python">data = pf.get_data()
print(f&amp;quot;Dataset shape: {data.shape}&amp;quot;)
print(f&amp;quot;\nColumn names: {list(data.columns)}&amp;quot;)
print(data.head())
print(data.describe().round(3))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Dataset shape: (1000, 11)
Column names: ['Y', 'Y2', 'X1', 'X2', 'f1', 'f2', 'f3', 'group_id', 'Z1', 'Z2', 'weights']
Y Y2 X1 X2 ... group_id Z1 Z2 weights
0 NaN 2.357103 0.0 0.457858 ... 9.0 -0.330607 1.054826 0.661478
1 -1.458643 5.163147 NaN -4.998406 ... 8.0 NaN -4.113690 0.772732
2 0.169132 0.751140 2.0 1.558480 ... 16.0 1.207778 0.465282 0.990929
3 3.319513 -2.656368 1.0 1.560402 ... 3.0 2.869997 0.467570 0.021123
4 0.134420 -1.866416 2.0 -3.472232 ... 14.0 0.835819 -3.115669 0.790815
Y Y2 X1 ... Z1 Z2 weights
count 999.000 1000.000 999.000 ... 999.000 1000.000 1000.000
mean -0.127 -0.309 1.043 ... 1.040 -0.113 0.495
std 2.305 5.584 0.808 ... 1.307 3.172 0.291
min -6.536 -16.974 0.000 ... -2.825 -11.576 0.000
25% -1.732 -4.029 0.000 ... 0.121 -2.252 0.248
50% -0.211 -0.459 1.000 ... 1.040 -0.064 0.469
75% 1.576 3.528 2.000 ... 1.946 2.028 0.746
max 6.907 17.156 2.000 ... 4.601 11.420 1.000
&lt;/code>&lt;/pre>
&lt;p>The dataset has 1,000 observations across 11 columns. The outcome &lt;code>Y&lt;/code> has a mean of -0.127 and standard deviation of 2.305, while &lt;code>X1&lt;/code> takes discrete values 0, 1, and 2. A few observations have missing values (1 missing in &lt;code>Y&lt;/code>, &lt;code>X1&lt;/code>, &lt;code>f1&lt;/code>, and &lt;code>Z1&lt;/code>), which PyFixest handles automatically by dropping incomplete cases. The &lt;code>group_id&lt;/code> variable identifies the group each observation belongs to, and this is the dimension we will control for with fixed effects.&lt;/p>
&lt;h3 id="32-visualizing-group-structure">3.2 Visualizing group structure&lt;/h3>
&lt;p>Before estimating any model, it helps to see how the relationship between &lt;code>X1&lt;/code> and &lt;code>Y&lt;/code> varies across groups. If groups have different average levels of &lt;code>Y&lt;/code>, standard OLS will mix within-group variation (what we care about) with between-group variation (which may reflect confounders).&lt;/p>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(10, 6))
groups = data[&amp;quot;group_id&amp;quot;].unique()
n_groups = len(groups)
cmap = plt.cm.tab20
for i, g in enumerate(sorted(groups)):
    subset = data[data[&amp;quot;group_id&amp;quot;] == g]
    ax.scatter(subset[&amp;quot;X1&amp;quot;], subset[&amp;quot;Y&amp;quot;], alpha=0.5, s=20,
               color=cmap(i / n_groups),
               label=f&amp;quot;Group {g}&amp;quot; if i &amp;lt; 5 else None)
ax.set_xlabel(&amp;quot;X1&amp;quot;, fontsize=13)
ax.set_ylabel(&amp;quot;Y&amp;quot;, fontsize=13)
ax.set_title(&amp;quot;Outcome (Y) vs Covariate (X1) by Group&amp;quot;, fontsize=15, fontweight=&amp;quot;bold&amp;quot;)
ax.legend(title=&amp;quot;Group (first 5)&amp;quot;, fontsize=9)
plt.savefig(&amp;quot;pyfixest_scatter_by_group.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
            facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_scatter_by_group.png" alt="Scatter plot of Y versus X1 colored by group membership, showing different intercepts across groups.">&lt;/p>
&lt;p>The scatter plot reveals that different groups have distinct average levels of &lt;code>Y&lt;/code> &amp;mdash; some clusters sit higher and others lower on the vertical axis. Within each group, however, &lt;code>Y&lt;/code> tends to decrease as &lt;code>X1&lt;/code> increases. This visual separation between groups is exactly the kind of heterogeneity that fixed effects regression absorbs, allowing us to isolate the within-group relationship between &lt;code>X1&lt;/code> and &lt;code>Y&lt;/code>.&lt;/p>
&lt;h2 id="4-simple-ols-baseline-no-fixed-effects">4. Simple OLS baseline (no fixed effects)&lt;/h2>
&lt;p>To establish a benchmark, we first estimate a standard OLS regression of &lt;code>Y&lt;/code> on &lt;code>X1&lt;/code> without any fixed effects. The model is:&lt;/p>
&lt;p>$$Y_i = \beta_0 + \beta_1 X_{1i} + \epsilon_i$$&lt;/p>
&lt;p>In words, we assume the outcome $Y$ is a linear function of $X_1$ plus random noise $\epsilon$. This gives us the overall association, mixing both within-group and between-group variation. We use heteroskedasticity-robust standard errors (&lt;code>HC1&lt;/code>) to account for non-constant variance.&lt;/p>
&lt;pre>&lt;code class="language-python">fit_ols = pf.feols(&amp;quot;Y ~ X1&amp;quot;, data=data, vcov=&amp;quot;HC1&amp;quot;)
print(fit_ols.summary())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: Y, Fixed effects: 0
Inference: HC1
Observations: 998
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.919 | 0.112 | 8.223 | 0.000 | 0.699 | 1.138 |
| X1 | -1.000 | 0.082 | -12.134 | 0.000 | -1.162 | -0.838 |
---
RMSE: 2.158 R2: 0.123
&lt;/code>&lt;/pre>
&lt;p>The pooled OLS estimates a coefficient of -1.000 on &lt;code>X1&lt;/code> (SE = 0.082, p &amp;lt; 0.001), with an R-squared of 0.123. This means that a one-unit increase in &lt;code>X1&lt;/code> is associated with a 1.0-point decrease in &lt;code>Y&lt;/code> on average. However, this estimate ignores group-level differences &amp;mdash; it could be biased if &lt;code>X1&lt;/code> correlates with unobserved group characteristics. The model explains only 12.3% of the total variation in &lt;code>Y&lt;/code>, leaving substantial unexplained heterogeneity. Let us now see how fixed effects change the picture.&lt;/p>
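&lt;p>Before turning to fixed effects, it helps to see the bias mechanism in isolation. The following self-contained sketch uses simulated data with assumed parameters (separate from PyFixest&amp;rsquo;s example dataset): an unobserved group effect correlates with the covariate, so pooled OLS overstates the true slope of 1.0, while demeaning within groups recovers it.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_groups, per_group, beta = 50, 40, 1.0
g = np.repeat(np.arange(n_groups), per_group)
alpha = rng.normal(size=n_groups)[g]          # unobserved group effect
x = alpha + rng.normal(size=g.size)           # covariate correlated with alpha
y = beta * x + 2 * alpha + rng.normal(size=g.size)

df = pd.DataFrame({&amp;quot;g&amp;quot;: g, &amp;quot;x&amp;quot;: x, &amp;quot;y&amp;quot;: y})
b_pooled = np.polyfit(df[&amp;quot;x&amp;quot;], df[&amp;quot;y&amp;quot;], 1)[0]

# Within transformation: subtract group means, then regress
xd = df[&amp;quot;x&amp;quot;] - df.groupby(&amp;quot;g&amp;quot;)[&amp;quot;x&amp;quot;].transform(&amp;quot;mean&amp;quot;)
yd = df[&amp;quot;y&amp;quot;] - df.groupby(&amp;quot;g&amp;quot;)[&amp;quot;y&amp;quot;].transform(&amp;quot;mean&amp;quot;)
b_within = np.polyfit(xd, yd, 1)[0]
print(round(b_pooled, 2), round(b_within, 2))
&lt;/code>&lt;/pre>
&lt;p>In this construction the pooled slope is inflated toward 2.0 by the confounder, while the within slope sits near the true 1.0. The gap between the two is the omitted variable bias that fixed effects remove.&lt;/p>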
&lt;h2 id="5-one-way-fixed-effects">5. One-way fixed effects&lt;/h2>
&lt;p>The following diagram illustrates the core problem fixed effects solve. When an unobserved group characteristic correlates with both the covariate and the outcome, it creates a &lt;em>backdoor path&lt;/em> that biases OLS. Fixed effects block this path by absorbing all group-level variation.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
A[&amp;quot;&amp;lt;b&amp;gt;Group Characteristics&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(unobserved)&amp;quot;] --&amp;gt;|&amp;quot;correlates&amp;quot;| X[&amp;quot;&amp;lt;b&amp;gt;X1&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(covariate)&amp;quot;]
A --&amp;gt;|&amp;quot;affects&amp;quot;| Y[&amp;quot;&amp;lt;b&amp;gt;Y&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(outcome)&amp;quot;]
X --&amp;gt;|&amp;quot;causal effect β = ?&amp;quot;| Y
FE[&amp;quot;&amp;lt;b&amp;gt;Fixed Effects&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(absorbs A)&amp;quot;] -.-&amp;gt;|&amp;quot;blocks backdoor&amp;quot;| A
style A fill:#d97757,stroke:#141413,color:#fff
style X fill:#6a9bcc,stroke:#141413,color:#fff
style Y fill:#00d4c8,stroke:#141413,color:#fff
style FE fill:#1a3a8a,stroke:#141413,color:#fff,stroke-dasharray: 5 5
&lt;/code>&lt;/pre>
&lt;h3 id="51-absorbing-group-heterogeneity">5.1 Absorbing group heterogeneity&lt;/h3>
&lt;p>Fixed effects regression controls for all time-invariant group characteristics by effectively adding a separate intercept for each group. In PyFixest, we specify fixed effects after a pipe &lt;code>|&lt;/code> in the formula. The syntax &lt;code>Y ~ X1 | group_id&lt;/code> means: regress &lt;code>Y&lt;/code> on &lt;code>X1&lt;/code>, absorbing &lt;code>group_id&lt;/code> fixed effects. Think of this as asking: &amp;ldquo;within each group, what is the relationship between &lt;code>X1&lt;/code> and &lt;code>Y&lt;/code>?&amp;rdquo;&lt;/p>
&lt;pre>&lt;code class="language-python">fit_fe1 = pf.feols(&amp;quot;Y ~ X1 | group_id&amp;quot;, data=data, vcov=&amp;quot;HC1&amp;quot;)
print(fit_fe1.summary())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: Y, Fixed effects: group_id
Inference: HC1
Observations: 998
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1 | -1.019 | 0.083 | -12.234 | 0.000 | -1.182 | -0.856 |
---
RMSE: 2.141 R2: 0.137 R2 Within: 0.126
&lt;/code>&lt;/pre>
&lt;p>With &lt;code>group_id&lt;/code> fixed effects absorbed, the coefficient on &lt;code>X1&lt;/code> shifts slightly to -1.019 (SE = 0.083). The within R-squared of 0.126 tells us how much of the within-group variation in &lt;code>Y&lt;/code> is explained by &lt;code>X1&lt;/code> after removing group means. Compared to the pooled OLS estimate of -1.000, the fixed effects estimate is similar in this synthetic dataset, suggesting that &lt;code>X1&lt;/code> does not strongly correlate with group-level unobservables here. In real data, the shift can be dramatic &amp;mdash; that gap is the omitted variable bias that fixed effects remove.&lt;/p>
&lt;h3 id="52-equivalence-with-dummy-variables">5.2 Equivalence with dummy variables&lt;/h3>
&lt;p>Under the hood, fixed effects absorption produces the same point estimates as including explicit dummy variables for each group. PyFixest&amp;rsquo;s &lt;code>C()&lt;/code> operator creates these dummies. The key advantage of absorption is computational: with thousands of groups, estimating thousands of dummy coefficients is slow and memory-intensive, while demeaning is fast.&lt;/p>
&lt;pre>&lt;code class="language-python">fit_dummy = pf.feols(&amp;quot;Y ~ X1 + C(group_id)&amp;quot;, data=data, vcov=&amp;quot;HC1&amp;quot;)
print(f&amp;quot;X1 coefficient (FE absorption): {fit_fe1.coef()['X1']:.4f}&amp;quot;)
print(f&amp;quot;X1 coefficient (dummy vars): {fit_dummy.coef()['X1']:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">X1 coefficient (FE absorption): -1.0190
X1 coefficient (dummy vars): -1.0190
&lt;/code>&lt;/pre>
&lt;p>Both approaches yield identical coefficients of -1.0190 on &lt;code>X1&lt;/code>, confirming that FE absorption and dummy variable inclusion are algebraically equivalent. The absorption approach simply avoids estimating and storing the hundreds or thousands of group intercepts that are typically not of interest &amp;mdash; what econometricians call &lt;em>nuisance parameters&lt;/em>.&lt;/p>
&lt;h2 id="6-understanding-fixed-effects-via-manual-demeaning">6. Understanding fixed effects via manual demeaning&lt;/h2>
&lt;h3 id="61-the-within-transformation">6.1 The within transformation&lt;/h3>
&lt;p>To build intuition for what fixed effects actually do, we can perform the &lt;em>within transformation&lt;/em> manually. For each observation, we subtract its group mean from both &lt;code>Y&lt;/code> and &lt;code>X1&lt;/code>. This removes all between-group variation, leaving only the deviations from each group&amp;rsquo;s average. Regressing the demeaned &lt;code>Y&lt;/code> on the demeaned &lt;code>X1&lt;/code> recovers the same coefficient as the FE estimator. It is like centering each group at the origin &amp;mdash; the only variation left is how individuals within a group differ from their group&amp;rsquo;s typical level.&lt;/p>
&lt;p>The fixed effects estimator solves:&lt;/p>
&lt;p>$$\hat{\beta}_{FE} = \left(\sum_{i=1}^{N} \ddot{X}_i' \ddot{X}_i\right)^{-1} \sum_{i=1}^{N} \ddot{X}_i' \ddot{Y}_i$$&lt;/p>
&lt;p>where $\ddot{X}_{it} = X_{it} - \bar{X}_i$ and $\ddot{Y}_{it} = Y_{it} - \bar{Y}_i$ are the demeaned variables. In words, the FE estimator uses only within-group deviations from group means, eliminating any bias from group-level confounders.&lt;/p>
&lt;pre>&lt;code class="language-python"># Manual demeaning (within transformation)
data_dm = data.copy()
for col in [&amp;quot;Y&amp;quot;, &amp;quot;X1&amp;quot;]:
    group_means = data_dm.groupby(&amp;quot;group_id&amp;quot;)[col].transform(&amp;quot;mean&amp;quot;)
    data_dm[f&amp;quot;{col}_dm&amp;quot;] = data_dm[col] - group_means
fit_demeaned = pf.feols(&amp;quot;Y_dm ~ X1_dm&amp;quot;, data=data_dm, vcov=&amp;quot;HC1&amp;quot;)
print(f&amp;quot;X1 coefficient (FE absorption): {fit_fe1.coef()['X1']:.4f}&amp;quot;)
print(f&amp;quot;X1 coefficient (manual demean): {fit_demeaned.coef()['X1_dm']:.4f}&amp;quot;)
print(f&amp;quot;X1 coefficient (OLS, no FE): {fit_ols.coef()['X1']:.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">X1 coefficient (FE absorption): -1.0190
X1 coefficient (manual demean): -1.0190
X1 coefficient (OLS, no FE): -1.0001
&lt;/code>&lt;/pre>
&lt;p>The manual demeaning produces a coefficient of -1.0190, exactly matching the FE absorption result, while pooled OLS gave -1.0001. This confirms that fixed effects regression is mathematically equivalent to subtracting group means from every variable before running OLS. The gap between -1.019 (FE) and -1.000 (OLS) is the confounding contributed by between-group variation; it is small in this synthetic dataset, consistent with the weak correlation between &lt;code>X1&lt;/code> and the group effects noted earlier.&lt;/p>
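&lt;p>The same equivalence can be verified from first principles without PyFixest. The following minimal numpy sketch (simulated toy data, not this tutorial&amp;rsquo;s dataset) builds a panel where the regressor is correlated with group effects, then computes the slope once with explicit group dummies and once via the within transformation:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_per = 5, 40
g = np.repeat(np.arange(n_groups), n_per)         # group ids
alpha = rng.normal(0, 2, n_groups)                # group intercepts
x = rng.normal(size=g.size) + 0.5 * alpha[g]      # x correlated with group effect
y = alpha[g] - 1.0 * x + rng.normal(size=g.size)  # true slope = -1

# (a) OLS with explicit group dummies (no separate global intercept)
D = np.eye(n_groups)[g]
beta_dummy = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]

# (b) within transformation: subtract group means, then slope of y_dm on x_dm
def demean(v):
    means = np.bincount(g, weights=v) / np.bincount(g)
    return v - means[g]

x_dm, y_dm = demean(x), demean(y)
beta_within = (x_dm @ y_dm) / (x_dm @ x_dm)
print(beta_dummy, beta_within)  # identical up to floating point
```

Because `x` is built to correlate with the group effects, a pooled OLS slope on this data would be badly biased, while both fixed-effects computations recover a slope near -1.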
&lt;h3 id="62-visualizing-the-demeaning">6.2 Visualizing the demeaning&lt;/h3>
&lt;pre>&lt;code class="language-python">fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# Left: Raw data
for i, g in enumerate(sorted(groups)[:5]):
    subset = data[data[&amp;quot;group_id&amp;quot;] == g]
    axes[0].scatter(subset[&amp;quot;X1&amp;quot;], subset[&amp;quot;Y&amp;quot;], alpha=0.4, s=20,
                    color=cmap(i / n_groups))
axes[0].set_xlabel(&amp;quot;X1 (raw)&amp;quot;, fontsize=13)
axes[0].set_ylabel(&amp;quot;Y (raw)&amp;quot;, fontsize=13)
axes[0].set_title(&amp;quot;Raw Data: Between + Within Variation&amp;quot;, fontsize=13, fontweight=&amp;quot;bold&amp;quot;)
# Right: Demeaned data
axes[1].scatter(data_dm[&amp;quot;X1_dm&amp;quot;], data_dm[&amp;quot;Y_dm&amp;quot;], alpha=0.4, s=20, color=STEEL_BLUE)
x_range = np.linspace(data_dm[&amp;quot;X1_dm&amp;quot;].min(), data_dm[&amp;quot;X1_dm&amp;quot;].max(), 100)
y_pred = fit_demeaned.coef()[&amp;quot;X1_dm&amp;quot;] * x_range
axes[1].plot(x_range, y_pred, color=WARM_ORANGE, linewidth=2.5,
             label=f&amp;quot;FE slope = {fit_demeaned.coef()['X1_dm']:.3f}&amp;quot;)
axes[1].set_xlabel(&amp;quot;X1 (demeaned)&amp;quot;, fontsize=13)
axes[1].set_ylabel(&amp;quot;Y (demeaned)&amp;quot;, fontsize=13)
axes[1].set_title(&amp;quot;Demeaned Data: Within-Group Variation Only&amp;quot;, fontsize=13, fontweight=&amp;quot;bold&amp;quot;)
axes[1].legend(fontsize=11)
plt.savefig(&amp;quot;pyfixest_demeaning.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
            facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_demeaning.png" alt="Side-by-side comparison of raw data (left) showing scattered clusters at different vertical levels, and demeaned data (right) centered at the origin with a clear negative slope.">&lt;/p>
&lt;p>The left panel shows the raw data with groups scattered at different vertical levels &amp;mdash; this between-group variation is what confounds the OLS estimate. The right panel shows the demeaned data: all groups are now centered at the origin, and the clear negative slope of -1.019 reflects the pure within-group relationship. This visual makes the FE intuition concrete: by removing group averages, we eliminate confounding from any variable that is constant within groups. Now let us explore how to estimate multiple specifications efficiently.&lt;/p>
&lt;h2 id="7-multiple-estimation-with-stepwise-operators">7. Multiple estimation with stepwise operators&lt;/h2>
&lt;h3 id="71-cumulative-stepwise-fixed-effects">7.1 Cumulative stepwise fixed effects&lt;/h3>
&lt;p>One of PyFixest&amp;rsquo;s most powerful features is its formula operators for estimating multiple models in a single call. The &lt;code>csw0()&lt;/code> operator adds fixed effects &lt;em>cumulatively&lt;/em>: &lt;code>csw0(f1, f2)&lt;/code> estimates three models &amp;mdash; no FE, then &lt;code>f1&lt;/code> only, then &lt;code>f1 + f2&lt;/code> &amp;mdash; in one line. This is far more efficient than writing three separate calls and makes it easy to see how results change as we add controls.&lt;/p>
&lt;pre>&lt;code class="language-python">fit_multi = pf.feols(&amp;quot;Y ~ X1 | csw0(f1, f2)&amp;quot;, data=data, vcov=&amp;quot;HC1&amp;quot;)
# Print summary for each model
models = fit_multi.all_fitted_models
for key in models:
    m = models[key]
    print(f&amp;quot;\nModel: {key}&amp;quot;)
    m.summary()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Model: Y~X1
Estimation: OLS
Dep. var.: Y, Fixed effects: 0
Inference: HC1
Observations: 998
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) |
|:--------------|-----------:|-------------:|----------:|-----------:|
| Intercept | 0.919 | 0.112 | 8.223 | 0.000 |
| X1 | -1.000 | 0.082 | -12.134 | 0.000 |
---
RMSE: 2.158 R2: 0.123
Model: Y~X1|f1
Estimation: OLS
Dep. var.: Y, Fixed effects: f1
Inference: HC1
Observations: 997
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) |
|:--------------|-----------:|-------------:|----------:|-----------:|
| X1 | -0.949 | 0.067 | -14.094 | 0.000 |
---
RMSE: 1.73 R2: 0.437 R2 Within: 0.161
Model: Y~X1|f1+f2
Estimation: OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference: HC1
Observations: 997
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) |
|:--------------|-----------:|-------------:|----------:|-----------:|
| X1 | -0.919 | 0.060 | -15.440 | 0.000 |
---
RMSE: 1.441 R2: 0.609 R2 Within: 0.200
&lt;/code>&lt;/pre>
&lt;p>The coefficient on &lt;code>X1&lt;/code> shifts from -1.000 (no FE) to -0.949 (with &lt;code>f1&lt;/code>) to -0.919 (with &lt;code>f1 + f2&lt;/code>), while the overall R-squared jumps from 0.123 to 0.437 to 0.609. Adding &lt;code>f1&lt;/code> alone explains an additional 31 percentage points of variation &amp;mdash; a massive improvement that shows how much group-level heterogeneity &lt;code>f1&lt;/code> captures. Adding &lt;code>f2&lt;/code> on top of &lt;code>f1&lt;/code> brings R-squared to 0.609, meaning the two fixed effect dimensions together account for over 60% of the total variation in &lt;code>Y&lt;/code>. The standard error on &lt;code>X1&lt;/code> also shrinks from 0.082 to 0.060, reflecting the precision gain from reducing residual noise.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Specification&lt;/th>
&lt;th>X1 Coef.&lt;/th>
&lt;th>SE&lt;/th>
&lt;th>R-squared&lt;/th>
&lt;th>R-squared Within&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>No FE&lt;/td>
&lt;td>-1.000&lt;/td>
&lt;td>0.082&lt;/td>
&lt;td>0.123&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>FE: f1&lt;/td>
&lt;td>-0.949&lt;/td>
&lt;td>0.067&lt;/td>
&lt;td>0.437&lt;/td>
&lt;td>0.161&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>FE: f1 + f2&lt;/td>
&lt;td>-0.919&lt;/td>
&lt;td>0.060&lt;/td>
&lt;td>0.609&lt;/td>
&lt;td>0.200&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="72-visualizing-coefficient-stability">7.2 Visualizing coefficient stability&lt;/h3>
&lt;p>The table above shows the numbers, but a figure makes the comparison more immediate. Plotting the coefficient with its 95% confidence interval across specifications reveals both the stability of the point estimate and the precision gain from adding fixed effects.&lt;/p>
&lt;pre>&lt;code class="language-python"># Coefficient comparison across specifications
model_names = [&amp;quot;No FE&amp;quot;, &amp;quot;FE: f1&amp;quot;, &amp;quot;FE: f1 + f2&amp;quot;]
coefs = [models[k].coef()[&amp;quot;X1&amp;quot;] for k in models]
ses = [models[k].se()[&amp;quot;X1&amp;quot;] for k in models]
fig, ax = plt.subplots(figsize=(8, 5))
y_pos = np.arange(len(model_names))
ax.barh(y_pos, coefs, xerr=[1.96 * s for s in ses], height=0.5,
        color=[STEEL_BLUE, WARM_ORANGE, TEAL], edgecolor=DARK_NAVY, capsize=5)
ax.set_yticks(y_pos)
ax.set_yticklabels(model_names, fontsize=12)
ax.set_xlabel(&amp;quot;Coefficient on X1&amp;quot;, fontsize=13)
ax.set_title(&amp;quot;Effect of X1 Across Fixed Effect Specifications&amp;quot;, fontsize=14, fontweight=&amp;quot;bold&amp;quot;)
ax.axvline(x=0, color=NEAR_BLACK, linewidth=0.8, linestyle=&amp;quot;--&amp;quot;, alpha=0.5)
plt.savefig(&amp;quot;pyfixest_coef_comparison.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
            facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_coef_comparison.png" alt="Horizontal bar chart comparing X1 coefficient estimates across no FE, one-way FE, and two-way FE specifications, all showing negative effects near -1.0 with narrowing confidence intervals.">&lt;/p>
&lt;p>The coefficient comparison chart shows that the point estimate on &lt;code>X1&lt;/code> remains stable around -1.0 across all three specifications, with confidence intervals narrowing as we add fixed effects. This stability suggests the estimate is robust to the inclusion of group-level controls. In applied research, large shifts across specifications would signal omitted variable concerns, making this type of comparison essential for assessing credibility.&lt;/p>
&lt;h2 id="8-inference-choosing-the-right-standard-errors">8. Inference: choosing the right standard errors&lt;/h2>
&lt;h3 id="81-comparing-standard-error-estimators">8.1 Comparing standard error estimators&lt;/h3>
&lt;p>The choice of standard errors can dramatically change statistical inference, even when point estimates remain the same. Standard (iid) errors assume all observations are independent and identically distributed. Heteroskedasticity-robust (HC1) errors relax the constant-variance assumption. Cluster-robust (CRV) errors account for arbitrary correlation within groups &amp;mdash; essential when observations within a group are not independent, like repeated measurements of the same individual. Think of it like estimating average height: if you measure the same person ten times, those ten measurements are not ten independent observations, and your standard error should reflect that.&lt;/p>
&lt;pre>&lt;code class="language-python">se_types = {
    &amp;quot;iid&amp;quot;: &amp;quot;iid&amp;quot;,
    &amp;quot;HC1 (robust)&amp;quot;: &amp;quot;HC1&amp;quot;,
    &amp;quot;CRV1 (group_id)&amp;quot;: {&amp;quot;CRV1&amp;quot;: &amp;quot;group_id&amp;quot;},
    &amp;quot;CRV1 (group_id + f2)&amp;quot;: {&amp;quot;CRV1&amp;quot;: &amp;quot;group_id + f2&amp;quot;},
    &amp;quot;CRV3 (group_id)&amp;quot;: {&amp;quot;CRV3&amp;quot;: &amp;quot;group_id&amp;quot;},
}
print(f&amp;quot;{'SE Type':&amp;lt;22} {'SE(X1)':&amp;lt;10} {'t-stat':&amp;lt;10} {'p-value':&amp;lt;10}&amp;quot;)
print(&amp;quot;-&amp;quot; * 52)
for name, vcov in se_types.items():
    fit_tmp = pf.feols(&amp;quot;Y ~ X1 | group_id&amp;quot;, data=data, vcov=vcov)
    print(f&amp;quot;{name:&amp;lt;22} {fit_tmp.se()['X1']:&amp;lt;10.4f} &amp;quot;
          f&amp;quot;{fit_tmp.tstat()['X1']:&amp;lt;10.3f} {fit_tmp.pvalue()['X1']:&amp;lt;10.4f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">SE Type SE(X1) t-stat p-value
----------------------------------------------------
iid 0.0858 -11.875 0.0000
HC1 (robust) 0.0833 -12.234 0.0000
CRV1 (group_id) 0.1172 -8.696 0.0000
CRV1 (group_id + f2) 0.1207 -8.445 0.0000
CRV3 (group_id) 0.1247 -8.174 0.0000
&lt;/code>&lt;/pre>
&lt;p>The standard error on &lt;code>X1&lt;/code> ranges from 0.0833 (HC1) to 0.1247 (CRV3), a 50% increase depending on the assumption about error correlation. While all p-values remain below 0.001 in this case, the t-statistic drops from 12.2 to 8.2 &amp;mdash; a substantial difference that could determine significance for weaker effects. Cluster-robust SEs (CRV1) inflate to 0.1172 because they account for within-group correlation. The CRV3 estimator, which provides a more conservative finite-sample correction, gives the largest SE of 0.1247. In practice, you should cluster at the level where you believe errors are correlated.&lt;/p>
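&lt;p>The mechanics behind the cluster-robust inflation can be sketched by hand. Here is a minimal numpy implementation of the CRV1 sandwich formula on simulated data (not this tutorial&amp;rsquo;s dataset), where both the regressor and the errors carry a within-cluster component so the clustered SE visibly exceeds the iid one:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(1)
G, n_per = 30, 20
g = np.repeat(np.arange(G), n_per)
n = g.size
x = 0.7 * rng.normal(size=G)[g] + rng.normal(size=n)  # cluster component in x
u = rng.normal(size=G)[g] + rng.normal(size=n)        # cluster component in errors
y = 2.0 * x + u

X = np.column_stack([np.ones(n), x])
k = X.shape[1]
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
bread = np.linalg.inv(X.T @ X)

# iid standard errors for comparison
sigma2 = resid @ resid / (n - k)
se_iid = np.sqrt(np.diag(sigma2 * bread))

# CRV1 "meat": sum over clusters of (X_g' u_g)(X_g' u_g)'
meat = np.zeros((k, k))
for j in range(G):
    s = X[g == j].T @ resid[g == j]
    meat += np.outer(s, s)
c = G / (G - 1) * (n - 1) / (n - k)  # CRV1 small-sample correction
se_crv1 = np.sqrt(np.diag(c * bread @ meat @ bread))
print(se_iid[1], se_crv1[1])  # clustered SE on the slope is markedly larger
```

The iid formula treats all 600 observations as independent; the sandwich sums score contributions cluster by cluster, letting errors correlate arbitrarily within each of the 30 clusters, which is exactly why the SE grows.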
&lt;h3 id="82-visualizing-the-se-tradeoff">8.2 Visualizing the SE tradeoff&lt;/h3>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(9, 5))
se_names = list(se_types.keys())
se_vals = []
for name, vcov in se_types.items():
    fit_tmp = pf.feols(&amp;quot;Y ~ X1 | group_id&amp;quot;, data=data, vcov=vcov)
    se_vals.append(fit_tmp.se()[&amp;quot;X1&amp;quot;])
colors = [STEEL_BLUE, WARM_ORANGE, TEAL, &amp;quot;#e8956a&amp;quot;, &amp;quot;#f0a88c&amp;quot;]
bars = ax.bar(range(len(se_names)), se_vals, color=colors, edgecolor=DARK_NAVY, width=0.6)
ax.set_xticks(range(len(se_names)))
ax.set_xticklabels(se_names, rotation=25, ha=&amp;quot;right&amp;quot;, fontsize=10)
ax.set_ylabel(&amp;quot;Standard Error of X1&amp;quot;, fontsize=13)
ax.set_title(&amp;quot;Standard Errors Under Different Assumptions&amp;quot;, fontsize=14, fontweight=&amp;quot;bold&amp;quot;)
for i, v in enumerate(se_vals):
    ax.text(i, v + 0.002, f&amp;quot;{v:.4f}&amp;quot;, ha=&amp;quot;center&amp;quot;, fontsize=10, fontweight=&amp;quot;bold&amp;quot;)
plt.savefig(&amp;quot;pyfixest_se_comparison.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
            facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_se_comparison.png" alt="Bar chart showing standard errors increasing from iid (0.0858) to CRV3 (0.1247), illustrating how clustering assumptions inflate uncertainty.">&lt;/p>
&lt;p>The bar chart makes the progression vivid: moving from iid to cluster-robust standard errors increases uncertainty by nearly 50%. The iid and HC1 estimates are similar because heteroskedasticity is not a major concern here. The real jump occurs when we account for within-group correlation (CRV1), and the CRV3 bias-corrected estimator is the most conservative. For applied work with grouped data, defaulting to cluster-robust errors is the safest choice &amp;mdash; underestimating standard errors leads to falsely significant results.&lt;/p>
&lt;h2 id="9-two-way-fixed-effects">9. Two-way fixed effects&lt;/h2>
&lt;p>When data has two grouping dimensions &amp;mdash; for example, firms and years, or workers and occupations &amp;mdash; two-way fixed effects absorb unobserved heterogeneity along both dimensions. In PyFixest, we simply list both FE variables after the pipe: &lt;code>Y ~ X1 + X2 | f1 + f2&lt;/code>. This absorbs all factors that are constant within each level of &lt;code>f1&lt;/code> and each level of &lt;code>f2&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python">fit_twoway = pf.feols(&amp;quot;Y ~ X1 + X2 | f1 + f2&amp;quot;, data=data, vcov=&amp;quot;HC1&amp;quot;)
fit_twoway.summary()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference: HC1
Observations: 997
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1 | -0.924 | 0.056 | -16.375 | 0.000 | -1.035 | -0.813 |
| X2 | -0.174 | 0.015 | -11.246 | 0.000 | -0.204 | -0.144 |
---
RMSE: 1.346 R2: 0.659 R2 Within: 0.303
&lt;/code>&lt;/pre>
&lt;p>Adding both &lt;code>f1&lt;/code> and &lt;code>f2&lt;/code> as fixed effects plus the additional covariate &lt;code>X2&lt;/code> yields an R-squared of 0.659 and a within R-squared of 0.303. The coefficient on &lt;code>X1&lt;/code> is -0.924 (SE = 0.056) and &lt;code>X2&lt;/code> is -0.174 (SE = 0.015), both highly significant. The within R-squared of 0.303 means that &lt;code>X1&lt;/code> and &lt;code>X2&lt;/code> together explain about 30% of the variation in &lt;code>Y&lt;/code> after absorbing both dimensions of fixed effects &amp;mdash; a substantial improvement over the 20% with &lt;code>X1&lt;/code> alone in the previous section.&lt;/p>
&lt;h2 id="10-instrumental-variables-with-fixed-effects">10. Instrumental variables with fixed effects&lt;/h2>
&lt;p>Sometimes the explanatory variable itself is &lt;em>endogenous&lt;/em> &amp;mdash; correlated with the error term due to measurement error, simultaneity, or omitted variables that fixed effects do not capture. Instrumental variables (IV) estimation addresses this by using external variables (instruments) that affect the outcome only through the endogenous variable. Think of instruments as a natural experiment embedded in the data: &lt;code>Z&lt;/code> affects &lt;code>X&lt;/code> but has no direct path to &lt;code>Y&lt;/code>, so any association between &lt;code>Z&lt;/code> and &lt;code>Y&lt;/code> must flow through &lt;code>X&lt;/code>. In PyFixest, the IV syntax uses a second pipe: &lt;code>Y2 ~ 1 | f1 + f2 | X1 ~ Z1 + Z2&lt;/code>. This reads: outcome &lt;code>Y2&lt;/code>, no exogenous controls (just the intercept &lt;code>1&lt;/code>), fixed effects &lt;code>f1 + f2&lt;/code>, and endogenous variable &lt;code>X1&lt;/code> instrumented by &lt;code>Z1&lt;/code> and &lt;code>Z2&lt;/code>.&lt;/p>
&lt;p>The IV estimator recovers the coefficient on &lt;code>X1&lt;/code> by first predicting &lt;code>X1&lt;/code> using the instruments, then using these predictions in the second-stage regression:&lt;/p>
&lt;p>$$\text{First stage: } X_1 = \pi_0 + \pi_1 Z_1 + \pi_2 Z_2 + \alpha_i + \gamma_t + \nu$$&lt;/p>
&lt;p>$$\text{Second stage: } Y_2 = \beta X_1^{predicted} + \alpha_i + \gamma_t + \epsilon$$&lt;/p>
&lt;p>In words, the first stage isolates the variation in &lt;code>X1&lt;/code> that is driven by the instruments &lt;code>Z1&lt;/code> and &lt;code>Z2&lt;/code>, stripping away the endogenous component. The second stage then uses only this &amp;ldquo;clean&amp;rdquo; variation to estimate the effect of &lt;code>X1&lt;/code> on &lt;code>Y2&lt;/code>. Here, $\alpha_i$ corresponds to the &lt;code>f1&lt;/code> fixed effects, $\gamma_t$ corresponds to the &lt;code>f2&lt;/code> fixed effects, and $\beta$ is the causal parameter of interest that we recover from the &lt;code>X1&lt;/code> coefficient in PyFixest&amp;rsquo;s output.&lt;/p>
&lt;pre>&lt;code class="language-python">fit_iv = pf.feols(&amp;quot;Y2 ~ 1 | f1 + f2 | X1 ~ Z1 + Z2&amp;quot;, data=data)
fit_iv.summary()
print(f&amp;quot;\nFirst-stage F-statistic: {fit_iv._f_stat_1st_stage:.2f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: IV
Dep. var.: Y2, Fixed effects: f1+f2
Inference: iid
Observations: 998
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| X1 | -1.600 | 0.336 | -4.768 | 0.000 | -2.259 | -0.942 |
---
First-stage F-statistic: 311.54
&lt;/code>&lt;/pre>
&lt;p>The IV estimate of &lt;code>X1&lt;/code> is -1.600 (SE = 0.336), substantially larger in magnitude than the OLS estimate of approximately -1.0. This divergence suggests that the OLS coefficient on &lt;code>X1&lt;/code> is attenuated &amp;mdash; a classic sign of measurement error or endogeneity that biases OLS toward zero. The first-stage F-statistic of 311.54 is well above the conventional threshold of 10, indicating that &lt;code>Z1&lt;/code> and &lt;code>Z2&lt;/code> are strong instruments. Strong instruments mean the IV estimate is reliable; with weak instruments, IV can perform worse than OLS. Note that with heterogeneous treatment effects, IV identifies the &lt;em>Local Average Treatment Effect&lt;/em> (LATE) &amp;mdash; the effect for units whose treatment status is shifted by the instruments &amp;mdash; rather than the Average Treatment Effect (ATE) for the entire population.&lt;/p>
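&lt;p>The two-stage logic above can be replicated by hand. The following numpy sketch of 2SLS uses a single instrument on simulated data (hypothetical variables, not this tutorial&amp;rsquo;s dataset): a common unobserved shock makes OLS biased, while instrumenting recovers the true coefficient:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
z = rng.normal(size=n)                    # instrument: shifts x, excluded from y
e_common = rng.normal(size=n)             # unobserved shock hitting both x and y
x = 0.8 * z + e_common + rng.normal(size=n)
y = -1.5 * x + 2.0 * e_common + rng.normal(size=n)  # true effect of x is -1.5

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

# OLS: biased because e_common sits in both x and the error term
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]

# 2SLS by hand: first stage x on z, second stage y on the fitted values
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
beta_iv = np.linalg.lstsq(np.column_stack([np.ones(n), x_hat]), y, rcond=None)[0][1]
print(beta_ols, beta_iv)  # OLS is pulled toward zero; 2SLS lands near -1.5
```

Note this manual version gets the point estimate right but not the standard errors: valid 2SLS inference must use residuals computed with the actual `x`, not the fitted values, which is one reason to rely on PyFixest&amp;rsquo;s built-in IV syntax in practice.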
&lt;h2 id="11-panel-data-application-wage-determinants">11. Panel data application: wage determinants&lt;/h2>
&lt;h3 id="111-the-wage-panel-variables-and-structure">11.1 The wage panel: variables and structure&lt;/h3>
&lt;p>To see fixed effects in action with real data, we analyze the Vella and Verbeek (1998) panel of 545 young men observed over 8 years (1980&amp;ndash;1987) from the National Longitudinal Survey of Youth (NLSY). This dataset, used in many econometrics textbooks, is ideal for studying wage determinants because it tracks the same workers as they enter the labor market, gain experience, change jobs, and make decisions about union membership and marriage. The key challenge is that unobserved individual ability differs across workers and correlates with both wages and these covariates &amp;mdash; a classic case for one-way fixed effects.&lt;/p>
&lt;pre>&lt;code class="language-python">url = &amp;quot;https://raw.githubusercontent.com/bashtage/linearmodels/main/linearmodels/datasets/wage_panel/wage_panel.csv.bz2&amp;quot;
wage_df = pd.read_csv(url, compression=&amp;quot;bz2&amp;quot;)
print(f&amp;quot;Wage panel shape: {wage_df.shape}&amp;quot;)
print(wage_df.describe().round(3))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Wage panel shape: (4360, 12)
nr year black exper hisp ... educ union lwage expersq occupation
count 4360.000 4360.000 4360.000 4360.000 4360.000 ... 4360.000 4360.000 4360.000 4360.000 4360.000
mean 5262.059 1983.500 0.116 6.500 0.161 ... 11.768 0.244 1.649 50.425 4.989
std 3496.150 2.292 0.320 2.292 0.367 ... 1.353 0.430 0.533 40.782 2.320
min 13.000 1980.000 0.000 1.000 0.000 ... 3.000 0.000 -3.579 1.000 1.000
25% 2329.000 1981.750 0.000 4.750 0.000 ... 11.000 0.000 1.351 16.000 4.000
50% 4569.000 1983.500 0.000 6.500 0.000 ... 12.000 0.000 1.671 36.000 5.000
75% 8406.000 1985.250 0.000 8.250 0.000 ... 12.000 0.000 1.991 81.000 6.000
max 12548.000 1987.000 1.000 12.000 1.000 ... 16.000 1.000 4.052 324.000 9.000
&lt;/code>&lt;/pre>
&lt;p>The panel contains 4,360 observations (545 individuals over 8 years) with 12 variables. Before running any model, it is important to understand how each variable is defined and measured.&lt;/p>
&lt;p>&lt;strong>Outcome variable:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;code>lwage&lt;/code> &amp;mdash; the natural logarithm of hourly wage. The log transformation means that coefficients are interpreted as approximate percentage changes. The mean of 1.649 corresponds to about \$5.20 per hour in 1980s dollars ($e^{1.649} \approx 5.20$). The standard deviation of 0.533 indicates substantial wage dispersion: the gap between a worker at the 25th percentile (\$3.86/hr) and the 75th percentile (\$7.32/hr) is roughly a doubling of wages.&lt;/li>
&lt;/ul>
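&lt;p>The log-to-dollar conversions quoted above are quick to verify directly from the summary statistics:&lt;/p>

```python
import numpy as np

# Log-wage values taken from the describe() output above
mean_lwage, p25, p75 = 1.649, 1.351, 1.991
print(np.exp(mean_lwage))        # about 5.20 dollars/hour at the mean
print(np.exp(p25), np.exp(p75))  # about 3.86 and 7.32 at the quartiles
print(np.exp(p75 - p25))         # about 1.9: roughly a doubling across the IQR
```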
&lt;p>&lt;strong>Time-varying covariates&lt;/strong> (change within a worker over time):&lt;/p>
&lt;ul>
&lt;li>&lt;code>hours&lt;/code> &amp;mdash; annual hours worked. Mean of 2,191 (roughly 42 hours per week for 52 weeks). Ranges from 120 to 4,992, capturing both part-time spells and heavy overtime. We include hours to control for labor supply differences that affect hourly wage calculations.&lt;/li>
&lt;li>&lt;code>union&lt;/code> &amp;mdash; binary indicator (1 = covered by a union contract in the current year, 0 = not covered). About 24.4% of person-year observations are union-covered. Workers can move in and out of union jobs across years, and this within-worker variation in union status is what one-way FE use to identify the union wage premium.&lt;/li>
&lt;li>&lt;code>married&lt;/code> &amp;mdash; binary indicator (1 = currently married, 0 = not married). About 43.9% of observations are married. Since these are young men tracked from their early twenties, many transition from single to married during the panel, providing within-worker variation.&lt;/li>
&lt;li>&lt;code>exper&lt;/code> &amp;mdash; years of potential labor market experience, defined as age minus years of education minus 6. Ranges from 1 to 12 years. In this balanced panel where every worker is observed in every year, experience increases by exactly 1 each year, making it perfectly collinear with entity + year fixed effects. We therefore use &lt;code>expersq&lt;/code> instead in FE models.&lt;/li>
&lt;li>&lt;code>expersq&lt;/code> &amp;mdash; experience squared ($exper^2$). Captures the well-documented concavity in the experience&amp;ndash;earnings profile: wages rise with experience but at a diminishing rate. Unlike &lt;code>exper&lt;/code>, the squared term is a nonlinear function of time, so it is not collinear with entity + year FE and can be estimated.&lt;/li>
&lt;li>&lt;code>occupation&lt;/code> &amp;mdash; occupational category, coded 1 through 9 (9 distinct categories). Workers can and do switch occupations across years. This variable can be used as an additional fixed effect dimension.&lt;/li>
&lt;/ul>
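&lt;p>The collinearity point about &lt;code>exper&lt;/code> is worth demonstrating. In a balanced panel, experience is exactly a worker effect plus a year effect, so two-way demeaning annihilates it, while the squared term survives. A small numpy sketch on a toy panel (not the NLSY data):&lt;/p>

```python
import numpy as np

# Balanced toy panel: 4 workers x 3 years, exper rises by 1 each year
workers = np.repeat(np.arange(4), 3)
years = np.tile(np.arange(3), 4)
start = np.array([2.0, 5.0, 1.0, 7.0])  # experience in year 0
exper = start[workers] + years          # worker effect + year effect, exactly

def demean_by(v, idx):
    means = np.bincount(idx, weights=v) / np.bincount(idx)
    return v - means[idx]

# Two-way within transformation (balanced panel: one pass each way suffices)
resid_exper = demean_by(demean_by(exper, workers), years)
resid_expersq = demean_by(demean_by(exper**2, workers), years)
print(np.allclose(resid_exper, 0))    # True: exper is fully absorbed
print(np.allclose(resid_expersq, 0))  # False: the squared term survives
```

The surviving part of `exper**2` is the interaction of the worker and year deviations, which is why `expersq` remains estimable under entity + year fixed effects while `exper` does not.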
&lt;p>&lt;strong>Time-invariant covariates&lt;/strong> (fixed for each worker across all years):&lt;/p>
&lt;ul>
&lt;li>&lt;code>educ&lt;/code> &amp;mdash; years of completed schooling at the start of the panel. Mean of 11.77 years (just below a high school diploma), ranging from 3 to 16 years. Because the sample tracks young men who have already finished their schooling, education does not change over time. The median of 12 years (exactly a high school diploma) and the 75th percentile of 12 years indicate that most workers in this sample have a high school education, with a smaller group holding college degrees.&lt;/li>
&lt;li>&lt;code>black&lt;/code> &amp;mdash; binary indicator (1 = Black, 0 = non-Black). About 11.6% of workers are Black. Because race does not change over time, one-way FE absorb any wage differences associated with being Black.&lt;/li>
&lt;li>&lt;code>hisp&lt;/code> &amp;mdash; binary indicator (1 = Hispanic, 0 = non-Hispanic). About 16.1% of workers are Hispanic. Like &lt;code>black&lt;/code>, this is absorbed by one-way FE.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Panel identifiers:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;code>nr&lt;/code> &amp;mdash; unique worker identifier (545 distinct workers). This defines the entity dimension for fixed effects.&lt;/li>
&lt;li>&lt;code>year&lt;/code> &amp;mdash; calendar year, taking values 1980 through 1987. The panel is balanced: every worker appears in every year, giving exactly $545 \times 8 = 4,360$ observations.&lt;/li>
&lt;/ul>
&lt;p>The distinction between time-varying and time-invariant variables is the most consequential feature of this dataset for fixed effects analysis. Time-invariant variables will be perfectly collinear with entity dummies and cannot be estimated under one-way FE. Time-varying variables survive the within transformation and their effects can be identified. We verify this classification empirically:&lt;/p>
&lt;pre>&lt;code class="language-python">invariance = wage_df.groupby(&amp;quot;nr&amp;quot;)[[&amp;quot;educ&amp;quot;, &amp;quot;black&amp;quot;, &amp;quot;hisp&amp;quot;]].nunique()
print(&amp;quot;Max unique values per worker:&amp;quot;)
print(invariance.max())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Max unique values per worker:
educ 1
black 1
hisp 1
dtype: int64
&lt;/code>&lt;/pre>
&lt;p>Each worker has exactly one value of education, race, and ethnicity across all eight years &amp;mdash; confirming these are truly time-invariant. By contrast, occupation is time-varying:&lt;/p>
&lt;pre>&lt;code class="language-python">occ_changes = wage_df.groupby(&amp;quot;nr&amp;quot;)[&amp;quot;occupation&amp;quot;].nunique()
print(f&amp;quot;Workers who change occupation: {(occ_changes &amp;gt; 1).sum()} / {len(occ_changes)}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Workers who change occupation: 484 / 545
&lt;/code>&lt;/pre>
&lt;p>Nearly 89% of workers switch occupations at least once during the panel. This high rate of switching makes occupation a valid candidate for a fixed effect dimension of its own (Section 11.5). By contrast, a variable like education, which never changes within a worker, would produce a column of zeros after demeaning and must be dropped &amp;mdash; a point we return to in Sections 11.3 and 11.4.&lt;/p>
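&lt;p>The &amp;ldquo;column of zeros&amp;rdquo; point is easy to see directly. A tiny pandas sketch with a toy panel (hypothetical numbers, mirroring the wage panel&amp;rsquo;s structure):&lt;/p>

```python
import pandas as pd

toy = pd.DataFrame({
    "nr":    [1, 1, 2, 2, 3, 3],        # three workers, two years each
    "educ":  [12, 12, 16, 16, 10, 10],  # never changes within a worker
    "union": [0, 1, 1, 1, 0, 0],        # changes for worker 1
})
demeaned = toy[["educ", "union"]] - toy.groupby("nr")[["educ", "union"]].transform("mean")
print(demeaned)  # educ demeans to exactly zero; union retains variation
```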
&lt;h3 id="112-within-vs-between-variation">11.2 Within vs between variation&lt;/h3>
&lt;p>Before estimating any model, it helps to decompose the variation in each variable into &lt;em>between-worker&lt;/em> variation (permanent differences across workers) and &lt;em>within-worker&lt;/em> variation (changes over a worker&amp;rsquo;s career). This decomposition foreshadows what one-way fixed effects can and cannot estimate.&lt;/p>
&lt;pre>&lt;code class="language-python">cols = [&amp;quot;lwage&amp;quot;, &amp;quot;hours&amp;quot;, &amp;quot;union&amp;quot;, &amp;quot;married&amp;quot;, &amp;quot;expersq&amp;quot;, &amp;quot;educ&amp;quot;]
between = wage_df.groupby(&amp;quot;nr&amp;quot;)[cols].mean().std()
for col in cols:
wage_df[f&amp;quot;{col}_within&amp;quot;] = wage_df[col] - wage_df.groupby(&amp;quot;nr&amp;quot;)[col].transform(&amp;quot;mean&amp;quot;)
within = wage_df[[f&amp;quot;{c}_within&amp;quot; for c in cols]].std()
variation = pd.DataFrame({&amp;quot;Between&amp;quot;: between, &amp;quot;Within&amp;quot;: within}).round(4)
print(variation)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Between Within
lwage 0.3907 0.3623
hours 381.7831 418.6057
union 0.3294 0.2760
married 0.3766 0.3236
expersq 26.3513 31.1431
educ 1.7476 0.0000
&lt;/code>&lt;/pre>
&lt;p>The raw standard deviations differ wildly across variables (hours is in the hundreds, union is a fraction), so we normalize by computing each variable&amp;rsquo;s &lt;em>within share&lt;/em> &amp;mdash; the fraction of total variation that comes from within-worker changes over time. This puts all variables on the same 0&amp;ndash;100% scale:&lt;/p>
&lt;pre>&lt;code class="language-python">total = np.sqrt(between**2 + within**2)
within_share = (within / total).fillna(0)  # educ has zero within variation, so its share is 0
between_share = 1 - within_share
fig, ax = plt.subplots(figsize=(10, 5))
y_pos = np.arange(len(cols))
bar_height = 0.55
# Stacked horizontal bars: between (left) + within (right) = 100%
ax.barh(y_pos, between_share.values, bar_height,
        label=&amp;quot;Between (cross-worker)&amp;quot;, color=STEEL_BLUE, edgecolor=DARK_NAVY)
ax.barh(y_pos, within_share.values, bar_height, left=between_share.values,
        label=&amp;quot;Within (over career)&amp;quot;, color=WARM_ORANGE, edgecolor=DARK_NAVY)
ax.set_yticks(y_pos)
ax.set_yticklabels(cols)  # name each bar after its variable
ax.legend()
plt.savefig(&amp;quot;pyfixest_within_between.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
            facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_within_between.png" alt="Stacked horizontal bar chart showing the within vs between share of total variation for key wage panel variables, with education at 100% between variation.">&lt;/p>
&lt;p>The decomposition reveals a critical pattern. Education is 100% between-worker variation &amp;mdash; its within share is exactly 0% &amp;mdash; because no worker changes their education level during the panel. This means one-way FE literally cannot estimate education&amp;rsquo;s effect: the demeaned education column is all zeros. Log wages have a 68% within share and 32% between share, meaning most wage variation comes from changes over a worker&amp;rsquo;s career rather than permanent differences across workers. Variables with substantial within shares &amp;mdash; union (64%), married (65%), hours (74%), expersq (76%) &amp;mdash; can be estimated under one-way FE because they change over a worker&amp;rsquo;s career. The higher the within share, the more statistical power one-way FE retains for that variable.&lt;/p>
&lt;h3 id="113-the-mincer-equation-and-its-panel-extensions">11.3 The Mincer equation and its panel extensions&lt;/h3>
&lt;p>Before estimating any models, it helps to lay out the econometric framework that organizes all subsequent specifications. The &lt;strong>classic Mincer equation&lt;/strong> (Mincer, 1974) is the workhorse model of labor economics:&lt;/p>
&lt;p>$$\ln(wage_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + \epsilon_i$$&lt;/p>
&lt;p>This log-linear specification models wages as a function of years of schooling and experience, with experience entering quadratically to capture concave returns &amp;mdash; each additional year of experience raises wages, but by a diminishing amount. It is a cross-sectional model, estimating the average relationship across all workers at a single point in time.&lt;/p>
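&lt;p>The concavity is easy to see mechanically. Differentiating the quadratic gives a marginal return of $\beta_2 + 2\beta_3 \cdot exper$, which declines linearly in experience and crosses zero at the peak $exper^* = -\beta_2 / (2\beta_3)$. The quick sketch below uses illustrative placeholder coefficients, not estimates from this data:&lt;/p>
&lt;pre>&lt;code class="language-python"># Illustrative (not estimated) Mincer coefficients
beta2, beta3 = 0.08, -0.002

def marginal_return(exper):
    # d ln(wage) / d exper for the quadratic profile
    return beta2 + 2 * beta3 * exper

peak = -beta2 / (2 * beta3)
print(f"at  5 years: {marginal_return(5):+.3f}")   # +0.060
print(f"at 25 years: {marginal_return(25):+.3f}")  # -0.020
print(f"profile peaks at {peak:.0f} years")        # 20 years
&lt;/code>&lt;/pre>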
&lt;p>The &lt;strong>extended Mincer equation&lt;/strong> adds controls for union membership, marital status, hours worked, and demographic characteristics:&lt;/p>
&lt;p>$$\ln(wage_{it}) = \beta_0 + \beta_1 educ_i + \beta_2 expersq_{it} + \beta_3 union_{it} + \beta_4 married_{it} + \beta_5 hours_{it} + \beta_6 black_i + \beta_7 hisp_i + \epsilon_{it}$$&lt;/p>
&lt;p>The &lt;strong>panel FE extension&lt;/strong> replaces explicit controls for time-invariant characteristics with entity and time fixed effects:&lt;/p>
&lt;p>$$\ln(wage_{it}) = \beta X_{it} + \gamma Z_i + \alpha_i + \delta_t + \epsilon_{it}$$&lt;/p>
&lt;p>where $X_{it}$ denotes time-varying covariates (union, married, hours, experience), $Z_i$ denotes time-invariant characteristics (education, race), $\alpha_i$ captures one-way fixed effects (one intercept per worker), and $\delta_t$ captures year fixed effects. The key insight: when we include $\alpha_i$, the time-invariant variables $Z_i$ become perfectly collinear with the entity dummies and are absorbed. We gain protection against omitted variable bias from all unobserved time-invariant confounders, but we lose the ability to estimate $\gamma$.&lt;/p>
&lt;p>The &lt;strong>CRE/Mundlak extension&lt;/strong> &amp;mdash; the Mundlak (1978) device &amp;mdash; offers a way to recover $\gamma$:&lt;/p>
&lt;p>$$\ln(wage_{it}) = \beta X_{it} + \gamma Z_i + \pi \bar{X}_i + \epsilon_{it}$$&lt;/p>
&lt;p>where $\bar{X}_i$ are individual means of the time-varying variables. This replaces entity dummies with individual means, which model the correlation between unobserved heterogeneity and the covariates. The result: $\hat{\beta} \approx \hat{\beta}_{FE}$ for the time-varying variables, while $\gamma$ is now estimable because we no longer include entity dummies that absorb it.&lt;/p>
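&lt;p>The claim that $\hat{\beta} \approx \hat{\beta}_{FE}$ is in fact exact in a balanced panel, and it is easy to verify on simulated data. The sketch below uses synthetic data (not the wage panel) and plain NumPy least squares: the within estimator and OLS on $(1, x_{it}, \bar{x}_i)$ return the same slope on $x$ even though the heterogeneity is built to correlate with $\bar{x}_i$:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(0)
n, t = 200, 6
x = rng.normal(size=(n, t))
alpha = 2.0 * x.mean(axis=1) + rng.normal(size=n)   # heterogeneity correlated with x-bar
y = 0.5 * x + alpha[:, None] + rng.normal(size=(n, t))

# Within (FE) estimator: demean y and x by individual, then compute the slope
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_fe = (xd * yd).sum() / (xd ** 2).sum()

# Mundlak regression: y on an intercept, x, and the individual mean of x
xbar = np.repeat(x.mean(axis=1), t)
X = np.column_stack([np.ones(n * t), x.ravel(), xbar])
beta_mundlak = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][1]

print(beta_fe, beta_mundlak)   # identical up to floating-point error
&lt;/code>&lt;/pre>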
&lt;p>Sections 11.4&amp;ndash;11.7 estimate these models progressively: pooled OLS and one-way FE (11.4), two-way and three-way FE (11.5), interactive fixed effects with group-specific year effects (11.6), and CRE/Mundlak (11.7).&lt;/p>
&lt;h3 id="114-from-pooled-ols-to-one-way-fe-the-education-tradeoff">11.4 From pooled OLS to one-way FE: the education tradeoff&lt;/h3>
&lt;p>We begin with the extended Mincer equation estimated by pooled OLS, which includes both time-varying and time-invariant variables:&lt;/p>
&lt;pre>&lt;code class="language-python">fit_pooled = pf.feols(
    &amp;quot;lwage ~ educ + expersq + union + married + hours + black + hisp&amp;quot;,
    data=wage_df, vcov=&amp;quot;HC1&amp;quot;
)
print(fit_pooled.summary())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: lwage, Fixed effects: 0
Inference: HC1
Observations: 4360
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.265 | 0.069 | 3.823 | 0.000 | 0.129 | 0.402 |
| educ | 0.106 | 0.005 | 22.924 | 0.000 | 0.097 | 0.115 |
| expersq | 0.003 | 0.000 | 16.930 | 0.000 | 0.003 | 0.004 |
| union | 0.183 | 0.016 | 11.205 | 0.000 | 0.151 | 0.215 |
| married | 0.141 | 0.015 | 9.308 | 0.000 | 0.111 | 0.171 |
| hours | -0.000 | 0.000 | -3.139 | 0.002 | -0.000 | -0.000 |
| black | -0.135 | 0.024 | -5.549 | 0.000 | -0.182 | -0.087 |
| hisp | 0.013 | 0.020 | 0.670 | 0.503 | -0.025 | 0.052 |
---
RMSE: 0.484 R2: 0.175
&lt;/code>&lt;/pre>
&lt;p>Pooled OLS estimates a 10.6% return to each year of education, an 18.3% union premium, and a 14.1% marriage premium. Black workers earn about 13.5% less, while the Hispanic coefficient is small and insignificant. The R-squared is 0.175 &amp;mdash; these variables explain less than a fifth of wage variation.&lt;/p>
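&lt;p>One caveat on reading these numbers: coefficients in a log-wage regression are log points, and the exact percentage effect is $e^{\beta} - 1$. The quick check below shows the approximation is close for small coefficients but drifts for larger ones; the 0.183 union coefficient is closer to a 20% premium than 18.3%:&lt;/p>
&lt;pre>&lt;code class="language-python">import math

for name, b in [("educ", 0.106), ("union", 0.183), ("married", 0.141)]:
    exact = math.exp(b) - 1
    print(f"{name}: {b:.3f} log points → {100 * exact:.1f}% exact effect")
&lt;/code>&lt;/pre>
&lt;p>We follow the common convention of reading log points directly as percentages, keeping this approximation in mind.&lt;/p>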
&lt;p>Now we estimate the one-way FE model, which absorbs all time-invariant worker characteristics:&lt;/p>
&lt;pre>&lt;code class="language-python">fit_entity = pf.feols(&amp;quot;lwage ~ expersq + union + married + hours | nr&amp;quot;,
                      data=wage_df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;nr&amp;quot;})
print(fit_entity.summary())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: lwage, Fixed effects: nr
Inference: CRV1
Observations: 4360
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| expersq | 0.004 | 0.000 | 16.537 | 0.000 | 0.003 | 0.004 |
| union | 0.078 | 0.024 | 3.319 | 0.001 | 0.032 | 0.125 |
| married | 0.115 | 0.022 | 5.217 | 0.000 | 0.071 | 0.158 |
| hours | -0.000 | 0.000 | -3.807 | 0.000 | -0.000 | -0.000 |
---
RMSE: 0.335 R2: 0.605 R2 Within: 0.145
&lt;/code>&lt;/pre>
&lt;p>One-way fixed effects dramatically improve model fit: R-squared jumps from 0.175 (pooled OLS) to 0.605, meaning worker-level heterogeneity accounts for over 40 percentage points of explained variation. The union premium drops from 18.3% to 7.8% (SE = 0.024) &amp;mdash; more than half the pooled estimate was driven by selection (workers who join unions differ systematically from those who do not). The marriage premium falls from 14.1% to 11.5% (SE = 0.022), a smaller reduction suggesting that marital status is less confounded by unobserved ability. The &lt;code>expersq&lt;/code> coefficient of 0.004 captures the concavity of the experience&amp;ndash;earnings profile within workers over time. Notice that &lt;code>educ&lt;/code>, &lt;code>black&lt;/code>, and &lt;code>hisp&lt;/code> are absent: these time-invariant variables are perfectly collinear with the 545 worker dummies and cannot be estimated under one-way FE.&lt;/p>
&lt;p>To see what happens when we try to include a time-invariant variable alongside one-way FE:&lt;/p>
&lt;pre>&lt;code class="language-python">import warnings
with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter(&amp;quot;always&amp;quot;)
    fit_educ = pf.feols(&amp;quot;lwage ~ expersq + union + married + educ | nr&amp;quot;,
                        data=wage_df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;nr&amp;quot;})
print(f&amp;quot;Coefficients estimated: {list(fit_educ.coef().index)}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Coefficients estimated: ['expersq', 'union', 'married']
&lt;/code>&lt;/pre>
&lt;p>Education is silently dropped. This is not a bug &amp;mdash; it is a fundamental consequence of the within transformation (Section 6):&lt;/p>
&lt;p>$$\ddot{educ}_{it} = educ_i - \bar{educ}_i = 0 \quad \text{for all } t$$&lt;/p>
&lt;p>Because a worker&amp;rsquo;s education does not change over the eight years of the panel, the demeaned value is exactly zero for every observation. A column of zeros is perfectly collinear with the entity dummies, so it must be dropped. The same applies to &lt;code>black&lt;/code> and &lt;code>hisp&lt;/code>.&lt;/p>
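&lt;p>A toy example with hypothetical numbers makes the mechanics visible. In the two-worker panel below, demeaning zeroes out &lt;code>educ&lt;/code> exactly while preserving within-worker variation in &lt;code>union&lt;/code>:&lt;/p>
&lt;pre>&lt;code class="language-python">import pandas as pd

toy = pd.DataFrame({
    "nr":    [1, 1, 1, 2, 2, 2],
    "educ":  [12, 12, 12, 16, 16, 16],   # time-invariant within worker
    "union": [0, 1, 1, 0, 0, 1],         # time-varying
})
demeaned = toy[["educ", "union"]] - toy.groupby("nr")[["educ", "union"]].transform("mean")
print(demeaned)
print((demeaned["educ"] == 0).all())   # True: a column of exact zeros
&lt;/code>&lt;/pre>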
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Pooled OLS&lt;/th>
&lt;th>One-Way FE&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>educ&lt;/td>
&lt;td>0.106&lt;/td>
&lt;td>dropped&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>expersq&lt;/td>
&lt;td>0.003&lt;/td>
&lt;td>0.004&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>union&lt;/td>
&lt;td>0.183&lt;/td>
&lt;td>0.078&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>married&lt;/td>
&lt;td>0.141&lt;/td>
&lt;td>0.115&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>hours&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>black&lt;/td>
&lt;td>-0.135&lt;/td>
&lt;td>dropped&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>hisp&lt;/td>
&lt;td>0.013&lt;/td>
&lt;td>dropped&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>R-squared&lt;/td>
&lt;td>0.175&lt;/td>
&lt;td>0.605&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>This table crystallizes the fundamental tradeoff. Pooled OLS estimates everything &amp;mdash; education, race, union, marriage &amp;mdash; but its estimates are biased by unobserved ability. One-Way FE eliminates the ability bias, and the union premium drops from 18.3% to 7.8%, revealing that more than half the raw association was selection. But the price is steep: education, Black, and Hispanic are all absorbed into the individual intercepts. We cannot estimate the return to schooling or the racial wage gap under one-way FE. Sections 11.5&amp;ndash;11.6 push further with additional FE dimensions, and Section 11.7 shows how CRE partially resolves this tradeoff.&lt;/p>
&lt;h3 id="115-two-way-and-three-way-fixed-effects">11.5 Two-way and three-way fixed effects&lt;/h3>
&lt;p>Adding year fixed effects to one-way FE creates a two-way FE (TWFE) model that absorbs both individual heterogeneity and common time trends:&lt;/p>
&lt;pre>&lt;code class="language-python">fit_panel = pf.feols(&amp;quot;lwage ~ expersq + union + married + hours | nr + year&amp;quot;,
                     data=wage_df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;nr + year&amp;quot;})
&lt;/code>&lt;/pre>
&lt;p>We can go further by adding occupation as a third fixed effect dimension. As we saw in Section 11.1, nearly 89% of workers switch occupations during the panel, so occupation is a valid time-varying dimension:&lt;/p>
&lt;pre>&lt;code class="language-python">fit_threeway = pf.feols(
    &amp;quot;lwage ~ expersq + union + married + hours | nr + year + C(occupation)&amp;quot;,
    data=wage_df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;nr&amp;quot;}
)
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Pooled OLS&lt;/th>
&lt;th>One-Way FE&lt;/th>
&lt;th>Two-Way FE&lt;/th>
&lt;th>Three-Way FE&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>expersq&lt;/td>
&lt;td>0.003&lt;/td>
&lt;td>0.004&lt;/td>
&lt;td>-0.006&lt;/td>
&lt;td>-0.006&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>union&lt;/td>
&lt;td>0.183&lt;/td>
&lt;td>0.078&lt;/td>
&lt;td>0.073&lt;/td>
&lt;td>0.075&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>married&lt;/td>
&lt;td>0.141&lt;/td>
&lt;td>0.115&lt;/td>
&lt;td>0.048&lt;/td>
&lt;td>0.047&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>hours&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>R-squared&lt;/td>
&lt;td>0.175&lt;/td>
&lt;td>0.605&lt;/td>
&lt;td>0.631&lt;/td>
&lt;td>0.632&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;pre>&lt;code class="language-python">fig, axes = plt.subplots(2, 2, figsize=(12, 8))
panel_models = {&amp;quot;Pooled OLS&amp;quot;: fit_pooled, &amp;quot;One-Way FE&amp;quot;: fit_entity,
&amp;quot;Two-Way FE&amp;quot;: fit_panel, &amp;quot;Three-Way FE&amp;quot;: fit_threeway}
panel_vars = [&amp;quot;expersq&amp;quot;, &amp;quot;union&amp;quot;, &amp;quot;married&amp;quot;, &amp;quot;hours&amp;quot;]
panel_colors = [STEEL_BLUE, WARM_ORANGE, TEAL, &amp;quot;#e8956a&amp;quot;]
for idx, var in enumerate(panel_vars):
ax = axes.flatten()[idx]
model_names_p = list(panel_models.keys())
coefs_p = [panel_models[m].coef()[var] for m in model_names_p]
ses_p = [panel_models[m].se()[var] for m in model_names_p]
ax.bar(range(4), coefs_p, yerr=[1.96 * s for s in ses_p],
color=panel_colors, edgecolor=DARK_NAVY, width=0.5, capsize=4)
ax.set_xticks(range(4))
ax.set_xticklabels(model_names_p, fontsize=8, rotation=15)
ax.set_title(var, fontsize=12, fontweight=&amp;quot;bold&amp;quot;)
ax.axhline(y=0, color=NEAR_BLACK, linewidth=0.5, linestyle=&amp;quot;--&amp;quot;, alpha=0.5)
fig.suptitle(&amp;quot;Coefficient Estimates Across FE Specifications&amp;quot;,
fontsize=14, fontweight=&amp;quot;bold&amp;quot;, y=1.02)
plt.savefig(&amp;quot;pyfixest_wage_extended.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_wage_extended.png" alt="Four-panel chart comparing coefficient estimates across pooled OLS, one-way FE, two-way FE, and three-way FE specifications.">&lt;/p>
&lt;p>The results show diminishing returns to additional FE dimensions. The big action was one-way FE: R-squared jumps from 0.175 to 0.605, and the union premium drops from 18.3% to 7.8%. Adding year effects (TWFE) pushes R-squared to 0.631 and the union premium stabilizes at 7.3%. Adding occupation as a third dimension barely moves anything &amp;mdash; R-squared rises to 0.632 and the union premium is 7.5%. The &lt;code>expersq&lt;/code> coefficient flips sign with TWFE (-0.006) because year effects absorb common trends in experience and wages. The stability of the union and marriage coefficients across the last three specifications suggests these estimates are robust to additional controls for time trends and occupational sorting.&lt;/p>
&lt;h3 id="116-interactive-fixed-effects">11.6 Interactive fixed effects&lt;/h3>
&lt;p>Sections 11.4&amp;ndash;11.5 used &lt;em>additive&lt;/em> fixed effects (&lt;code>nr + year&lt;/code>), where every individual shares the same set of year effects. &lt;strong>Interactive&lt;/strong> (or &lt;em>interacted&lt;/em>) fixed effects generalize this by allowing one FE dimension to vary across levels of another &amp;mdash; producing group-specific intercepts for each time period. Instead of a single set of year dummies shared by all workers, we estimate separate year effects for each demographic group.&lt;/p>
&lt;p>Why does this matter? Black and non-Black workers may face different labor market trends during the 1980s. If macroeconomic shocks hit these groups differently, a common set of year effects would be misspecified. We can test this by allowing year effects to vary by race:&lt;/p>
&lt;p>$$\ln(wage_{it}) = \beta X_{it} + \alpha_i + \gamma_{t,g(i)} + \epsilon_{it}$$&lt;/p>
&lt;p>where $g(i) \in \{Black, non\text{-}Black\}$, so we estimate separate year effects for each racial group.&lt;/p>
&lt;p>Pyfixest implements interactive FE with the &lt;strong>caret operator&lt;/strong> (&lt;code>^&lt;/code>): the syntax &lt;code>year^black&lt;/code> in the fixed-effects slot creates a separate year dummy for each value of &lt;code>black&lt;/code>. This mirrors R&amp;rsquo;s fixest package. The equivalent manual approach is to concatenate the columns (&lt;code>wage_df[&amp;quot;year_black&amp;quot;] = wage_df[&amp;quot;year&amp;quot;].astype(str) + &amp;quot;_&amp;quot; + wage_df[&amp;quot;black&amp;quot;].astype(str)&lt;/code>) and absorb the resulting string variable, but the caret operator is preferred because it keeps the interaction structure visible in the formula.&lt;/p>
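&lt;p>As a small illustration with hypothetical values, both syntaxes define the same grouping: each distinct (year, black) pair becomes one fixed-effect level, so a full design yields years × groups levels:&lt;/p>
&lt;pre>&lt;code class="language-python">import pandas as pd

toy = pd.DataFrame({
    "year":  [1980, 1980, 1981, 1981, 1980, 1981],
    "black": [0, 1, 0, 1, 0, 0],
})
toy["year_black"] = toy["year"].astype(str) + "_" + toy["black"].astype(str)
print(sorted(toy["year_black"].unique()))
# ['1980_0', '1980_1', '1981_0', '1981_1'] → 2 years × 2 groups = 4 levels
&lt;/code>&lt;/pre>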
&lt;pre>&lt;code class="language-python"># Pyfixest caret operator for interacted fixed effects
fit_gtrends = pf.feols(&amp;quot;lwage ~ expersq + union + married + hours | nr + year^black&amp;quot;,
                       data=wage_df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;nr&amp;quot;})
print(fit_gtrends.summary())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: lwage, Fixed effects: nr+year^black
Inference: CRV1
Observations: 4360
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| expersq | -0.006 | 0.001 | -5.878 | 0.000 | -0.008 | -0.004 |
| union | 0.074 | 0.024 | 3.129 | 0.002 | 0.028 | 0.121 |
| married | 0.045 | 0.020 | 2.262 | 0.024 | 0.006 | 0.084 |
| hours | -0.000 | 0.000 | -0.393 | 0.694 | -0.001 | 0.001 |
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Two-Way FE (additive)&lt;/th>
&lt;th>Interactive FE (year × race)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>expersq&lt;/td>
&lt;td>-0.006&lt;/td>
&lt;td>-0.006&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>union&lt;/td>
&lt;td>0.073&lt;/td>
&lt;td>0.074&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>married&lt;/td>
&lt;td>0.048&lt;/td>
&lt;td>0.045&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>hours&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(9, 5))
vars_plot = [&amp;quot;expersq&amp;quot;, &amp;quot;union&amp;quot;, &amp;quot;married&amp;quot;, &amp;quot;hours&amp;quot;]
x = np.arange(len(vars_plot))
width = 0.35
twfe_coefs = [fit_panel.coef()[v] for v in vars_plot]
gtrend_coefs = [fit_gtrends.coef()[v] for v in vars_plot]
ax.bar(x - width/2, twfe_coefs, width, label=&amp;quot;Two-Way FE&amp;quot;, color=STEEL_BLUE, edgecolor=DARK_NAVY)
ax.bar(x + width/2, gtrend_coefs, width, label=&amp;quot;Interactive FE&amp;quot;, color=WARM_ORANGE, edgecolor=DARK_NAVY)
ax.set_xticks(x)
ax.set_xticklabels(vars_plot, fontsize=11)
ax.set_ylabel(&amp;quot;Coefficient Estimate&amp;quot;, fontsize=13)
ax.set_title(&amp;quot;Additive vs Interactive Fixed Effects&amp;quot;, fontsize=14, fontweight=&amp;quot;bold&amp;quot;)
ax.legend(fontsize=11)
ax.axhline(y=0, color=NEAR_BLACK, linewidth=0.5, linestyle=&amp;quot;--&amp;quot;, alpha=0.5)
plt.savefig(&amp;quot;pyfixest_group_trends.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
            facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_group_trends.png" alt="Side-by-side bar chart comparing additive TWFE and interactive fixed effect coefficient estimates.">&lt;/p>
&lt;p>The coefficients are nearly identical under both specifications. Moving from additive to interactive fixed effects barely changes the estimated returns to union membership (7.3% → 7.4%), marriage (4.8% → 4.5%), or experience. This stability indicates that year effects are similar across racial groups &amp;mdash; the additive TWFE specification is not misspecified by imposing common year effects. The interactive model uses 545 one-way FE plus 16 group-year FE (8 years × 2 groups) = 561 FE parameters to explain 4,360 observations &amp;mdash; well short of saturation. Had the coefficients shifted substantially, that would have signaled that Black and non-Black workers face sufficiently different macro trends to warrant group-specific year effects, and that the standard additive TWFE was masking this heterogeneity.&lt;/p>
&lt;h3 id="117-recovering-time-invariant-effects-the-cremundlak-approach">11.7 Recovering time-invariant effects: the CRE/Mundlak approach&lt;/h3>
&lt;p>Sections 11.4&amp;ndash;11.6 revealed a fundamental tradeoff in panel econometrics. One-way FE eliminate omitted variable bias from all unobserved time-invariant confounders &amp;mdash; a powerful guarantee &amp;mdash; but they absorb education, race, and ethnicity in the process. Pooled OLS estimates coefficients for everything, but those estimates are biased whenever unobserved worker traits correlate with the covariates. We want the best of both worlds: the bias protection of FE with the ability to estimate time-invariant effects.&lt;/p>
&lt;p>Imagine you could describe each worker&amp;rsquo;s &amp;ldquo;type&amp;rdquo; not with a unique ID but with a summary of their career trajectory &amp;mdash; their average union participation rate, average hours worked, average marital status, and so on. Two workers with similar career averages are arguably similar in unobserved ways too: a worker who spends 80% of their career in a union likely differs systematically from one who never joins. The &lt;strong>Correlated Random Effects&lt;/strong> (CRE) model &amp;mdash; also called the &lt;strong>Mundlak (1978) device&lt;/strong> &amp;mdash; operationalizes this intuition by replacing the 545 entity dummies with a handful of individual-mean variables that capture the same correlation structure.&lt;/p>
&lt;p>&lt;strong>The CRE equation.&lt;/strong> Recall from Section 11.3 that the CRE equation replaces entity dummies $\alpha_i$ with individual means $\bar{X}_i$ of the time-varying variables:&lt;/p>
&lt;p>$$\ln(wage_{it}) = \beta X_{it} + \gamma Z_i + \pi \bar{X}_i + \epsilon_{it}$$&lt;/p>
&lt;p>In words, this equation says that a worker&amp;rsquo;s log wage depends on three components: (1) their current values of time-varying covariates ($X_{it}$), (2) their permanent characteristics ($Z_i$ like education and race), and (3) a set of correction terms ($\bar{X}_i$) that capture the &lt;em>average&lt;/em> level of each time-varying variable across their career. In our code, $X_{it}$ corresponds to &lt;code>expersq&lt;/code>, &lt;code>union&lt;/code>, &lt;code>married&lt;/code>, and &lt;code>hours&lt;/code> in each year; $Z_i$ corresponds to &lt;code>educ&lt;/code>, &lt;code>black&lt;/code>, and &lt;code>hisp&lt;/code>; and $\bar{X}_i$ corresponds to the &lt;code>*_mean&lt;/code> columns we compute below.&lt;/p>
&lt;p>&lt;strong>Why does including $\bar{X}_i$ work?&lt;/strong> The individual means proxy for the unobserved individual effect $\alpha_i$. Consider union membership: if workers who join unions more often (high $\overline{union}_i$) also have higher unobserved ability or motivation, then $\overline{union}_i$ captures that correlation. Once we control for it, the remaining within-person variation in union status is &amp;ldquo;clean&amp;rdquo; &amp;mdash; and the time-invariant variables are no longer collinear with entity dummies (because there are no entity dummies).&lt;/p>
&lt;p>&lt;strong>Contrast with FE.&lt;/strong> One-way FE assumes $\alpha_i$ can be &lt;em>anything&lt;/em> &amp;mdash; completely unrestricted. CRE assumes $\alpha_i = \pi \bar{X}_i + \text{error}$ &amp;mdash; the individual effect is a linear function of the career averages. This is a stronger assumption, but it buys back education and race. The payoff: $\hat{\beta}$ for time-varying variables should approximately match the one-way FE estimates (because the means absorb the same correlation), while $\gamma$ for time-invariant variables is now estimable.&lt;/p>
&lt;pre>&lt;code class="language-python">mundlak_vars = [&amp;quot;union&amp;quot;, &amp;quot;married&amp;quot;, &amp;quot;hours&amp;quot;, &amp;quot;expersq&amp;quot;]
for var in mundlak_vars:
    wage_df[f&amp;quot;{var}_mean&amp;quot;] = wage_df.groupby(&amp;quot;nr&amp;quot;)[var].transform(&amp;quot;mean&amp;quot;)
fit_mundlak = pf.feols(
    &amp;quot;lwage ~ expersq + union + married + hours + educ + black + hisp &amp;quot;
    &amp;quot;+ expersq_mean + union_mean + married_mean + hours_mean&amp;quot;,
    data=wage_df, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;nr&amp;quot;}
)
print(fit_mundlak.summary())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Estimation: OLS
Dep. var.: lwage, Fixed effects: 0
Inference: CRV1
Observations: 4360
| Coefficient | Estimate | Std. Error | t value | Pr(&amp;gt;|t|) | 2.5% | 97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept | 0.276 | 0.073 | 3.798 | 0.000 | 0.133 | 0.418 |
| expersq | 0.004 | 0.000 | 13.284 | 0.000 | 0.004 | 0.005 |
| union | 0.078 | 0.019 | 4.050 | 0.000 | 0.040 | 0.116 |
| married | 0.115 | 0.017 | 6.664 | 0.000 | 0.081 | 0.149 |
| hours | -0.000 | 0.000 | -0.007 | 0.994 | -0.000 | 0.000 |
| educ | 0.094 | 0.005 | 17.295 | 0.000 | 0.083 | 0.104 |
| black | -0.140 | 0.024 | -5.930 | 0.000 | -0.187 | -0.094 |
| hisp | 0.009 | 0.019 | 0.469 | 0.639 | -0.028 | 0.045 |
| expersq_mean | -0.003 | 0.001 | -3.498 | 0.001 | -0.005 | -0.001 |
| union_mean | 0.179 | 0.037 | 4.838 | 0.000 | 0.106 | 0.251 |
| married_mean | -0.041 | 0.042 | -0.969 | 0.333 | -0.123 | 0.042 |
| hours_mean | 0.002 | 0.001 | 3.109 | 0.002 | 0.001 | 0.003 |
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>One-Way FE&lt;/th>
&lt;th>CRE&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>expersq&lt;/td>
&lt;td>0.004&lt;/td>
&lt;td>0.004&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>union&lt;/td>
&lt;td>0.078&lt;/td>
&lt;td>0.078&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>married&lt;/td>
&lt;td>0.115&lt;/td>
&lt;td>0.115&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>hours&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;td>-0.000&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>educ&lt;/td>
&lt;td>dropped&lt;/td>
&lt;td>0.094&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>black&lt;/td>
&lt;td>dropped&lt;/td>
&lt;td>-0.140&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>hisp&lt;/td>
&lt;td>dropped&lt;/td>
&lt;td>0.009&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(10, 6))
compare_vars = [&amp;quot;expersq&amp;quot;, &amp;quot;union&amp;quot;, &amp;quot;married&amp;quot;, &amp;quot;hours&amp;quot;, &amp;quot;educ&amp;quot;, &amp;quot;black&amp;quot;, &amp;quot;hisp&amp;quot;]
x = np.arange(len(compare_vars))
width = 0.25
pooled_vals = [fit_pooled.coef()[v] for v in compare_vars]
entity_vals = [fit_entity.coef()[v] if v in fit_entity.coef().index else 0 for v in compare_vars]
mundlak_vals = [fit_mundlak.coef()[v] if v in fit_mundlak.coef().index else 0 for v in compare_vars]
ax.bar(x - width, pooled_vals, width, label=&amp;quot;Pooled OLS&amp;quot;, color=STEEL_BLUE, edgecolor=DARK_NAVY)
ax.bar(x, entity_vals, width, label=&amp;quot;One-Way FE&amp;quot;, color=WARM_ORANGE, edgecolor=DARK_NAVY)
ax.bar(x + width, mundlak_vals, width, label=&amp;quot;CRE&amp;quot;, color=TEAL, edgecolor=DARK_NAVY)
ax.set_xticks(x)
ax.set_xticklabels(compare_vars, fontsize=10, rotation=15)
ax.set_ylabel(&amp;quot;Coefficient Estimate&amp;quot;, fontsize=13)
ax.set_title(&amp;quot;Pooled OLS vs One-Way FE vs CRE&amp;quot;, fontsize=14, fontweight=&amp;quot;bold&amp;quot;)
ax.legend(fontsize=11)
ax.axhline(y=0, color=NEAR_BLACK, linewidth=0.5, linestyle=&amp;quot;--&amp;quot;, alpha=0.5)
plt.savefig(&amp;quot;pyfixest_mundlak.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
            facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_mundlak.png" alt="Grouped bar chart comparing Pooled OLS, One-Way FE, and CRE coefficient estimates, showing CRE recovers education while matching one-way FE on time-varying variables.">&lt;/p>
&lt;p>The CRE model bridges one-way FE and pooled OLS. For time-varying variables (union, married, hours, expersq), the CRE coefficients closely match the one-way FE estimates &amp;mdash; confirming that the individual means successfully proxy for entity dummies. For time-invariant variables, CRE recovers what one-way FE cannot: education&amp;rsquo;s coefficient is 0.094 per year of schooling (a 9.4% return), and the Black wage gap is -0.140 (14.0% lower wages). These are close to the pooled OLS estimates, but now they are estimated in a framework that controls for the correlation between unobserved heterogeneity and the covariates (via the individual means).&lt;/p>
&lt;p>The CRE correction terms ($\pi$ coefficients) are informative in their own right. The &lt;code>union_mean&lt;/code> coefficient of 0.179 is large and highly significant ($p &amp;lt; 0.001$): workers with persistently higher union participation earn substantially more &lt;em>on average&lt;/em>, even after controlling for the within-person union effect (0.078). This gap &amp;mdash; 0.179 versus 0.078 &amp;mdash; is evidence of positive selection into unions: workers who join unions more often tend to have higher unobserved ability or to work in higher-paying industries. The &lt;code>hours_mean&lt;/code> coefficient (0.002, $p = 0.002$) suggests that workers who consistently work longer hours earn more per hour on average, while &lt;code>married_mean&lt;/code> is small and insignificant, indicating that selection into marriage is not strongly associated with unobserved wage determinants once other factors are controlled.&lt;/p>
&lt;p>The caveat is that CRE relies on the assumption that unobserved heterogeneity correlates with covariates &lt;em>only through their individual means&lt;/em> &amp;mdash; a stronger assumption than one-way FE, which makes no such restriction. However, this assumption is testable. The CRE correction terms provide a built-in Hausman-type test: if $\pi = 0$ jointly (all correction terms are zero), the random effects and one-way FE estimators converge to the same estimates, and the simpler, more efficient random effects model is preferred. In our case, the large and significant &lt;code>union_mean&lt;/code> and &lt;code>hours_mean&lt;/code> coefficients strongly reject $\pi = 0$, confirming that unobserved heterogeneity &lt;em>does&lt;/em> correlate with the covariates and that FE or CRE is needed over pooled OLS. Exercise 6 asks you to formalize this test.&lt;/p>
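&lt;p>As a rough preview of that exercise, the reported t-statistics already imply the joint rejection. Treating the four correction terms as uncorrelated (the exact Wald test would use their full covariance matrix), a back-of-the-envelope statistic is simply the sum of squared t-values, compared against a chi-squared distribution with four degrees of freedom:&lt;/p>
&lt;pre>&lt;code class="language-python"># t-statistics of the correction terms, as reported above
t_stats = {"expersq_mean": -3.498, "union_mean": 4.838,
           "married_mean": -0.969, "hours_mean": 3.109}
W = sum(t ** 2 for t in t_stats.values())
crit = 9.49   # chi-squared(4) critical value at the 5% level
print(f"W = {W:.1f} vs critical value {crit}")
print("Reject pi = 0 jointly:", W > crit)   # rejected by a wide margin
&lt;/code>&lt;/pre>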
&lt;h3 id="118-what-fixed-effects-absorb-vs-what-survives">11.8 What fixed effects absorb vs. what survives&lt;/h3>
&lt;p>The wage panel illustrates a general principle: one-way fixed effects absorb everything about a person that does not change over the observation window. Variables that &lt;em>do&lt;/em> change over time &amp;mdash; like union status, marital status, and occupation &amp;mdash; survive the within transformation and can be estimated. The CRE/Mundlak approach (Section 11.7) partially resolves the tradeoff by recovering time-invariant coefficients. The diagram below summarizes this partition and recovery:&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
subgraph &amp;quot;Absorbed by One-Way FE&amp;quot;
ED[&amp;quot;&amp;lt;b&amp;gt;Education&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(time-invariant)&amp;quot;]
AB[&amp;quot;&amp;lt;b&amp;gt;Ability&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(unobserved)&amp;quot;]
RC[&amp;quot;&amp;lt;b&amp;gt;Race&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(time-invariant)&amp;quot;]
end
subgraph &amp;quot;Estimated (time-varying)&amp;quot;
UN[&amp;quot;&amp;lt;b&amp;gt;Union&amp;lt;/b&amp;gt;&amp;quot;]
MA[&amp;quot;&amp;lt;b&amp;gt;Married&amp;lt;/b&amp;gt;&amp;quot;]
OC[&amp;quot;&amp;lt;b&amp;gt;Occupation&amp;lt;/b&amp;gt;&amp;quot;]
end
subgraph &amp;quot;Recovery strategies&amp;quot;
MK[&amp;quot;&amp;lt;b&amp;gt;CRE/Mundlak&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(individual means)&amp;quot;]
end
UN --&amp;gt; W[&amp;quot;&amp;lt;b&amp;gt;Log Wage&amp;lt;/b&amp;gt;&amp;quot;]
MA --&amp;gt; W
OC --&amp;gt; W
ED -.-&amp;gt; W
AB -.-&amp;gt; W
MK -.-&amp;gt;|&amp;quot;recovers γ&amp;quot;| ED
MK -.-&amp;gt;|&amp;quot;recovers γ&amp;quot;| RC
style ED fill:#d97757,stroke:#141413,color:#fff,stroke-dasharray: 5 5
style AB fill:#d97757,stroke:#141413,color:#fff,stroke-dasharray: 5 5
style RC fill:#d97757,stroke:#141413,color:#fff,stroke-dasharray: 5 5
style UN fill:#6a9bcc,stroke:#141413,color:#fff
style MA fill:#6a9bcc,stroke:#141413,color:#fff
style OC fill:#6a9bcc,stroke:#141413,color:#fff
style W fill:#00d4c8,stroke:#141413,color:#fff
style MK fill:#1a3a8a,stroke:#141413,color:#fff,stroke-dasharray: 5 5
&lt;/code>&lt;/pre>
&lt;p>The dashed arrows from the orange (absorbed) variables indicate that their effects on wages are &lt;em>real&lt;/em> but &lt;em>unestimable&lt;/em> under one-way FE &amp;mdash; they are folded into each worker&amp;rsquo;s individual intercept. The solid arrows from the blue (estimated) variables show the effects we can identify: changes in union status, marital status, and occupation that occur within a worker&amp;rsquo;s career. The dark blue CRE/Mundlak node represents the recovery strategy from Section 11.7: by substituting individual means for entity dummies, we recover the coefficients $\gamma$ for education and race while producing time-varying estimates that closely match one-way FE. This partially resolves the tradeoff from Section 11.4, though at the cost of a stronger modeling assumption.&lt;/p>
&lt;h2 id="12-event-study-difference-in-differences">12. Event study: difference-in-differences&lt;/h2>
&lt;h3 id="121-staggered-treatment-adoption">12.1 Staggered treatment adoption&lt;/h3>
&lt;p>Event studies are a popular extension of fixed effects that estimate dynamic treatment effects around the time of an intervention. In a &lt;em>staggered&lt;/em> design, different groups (states, firms, individuals) receive treatment at different times &amp;mdash; for example, states adopting a minimum wage increase in different years. The standard approach uses TWFE with relative-time indicators. However, this can produce biased estimates when treatment timing varies across groups and effects are heterogeneous. The DID2S estimator (Gardner, 2022) addresses this by separating the estimation into two stages: first estimating fixed effects from untreated observations, then recovering treatment effects from the residuals. The target estimand in this design is the &lt;em>Average Treatment Effect on the Treated&lt;/em> (ATT) &amp;mdash; the average effect for units that actually received treatment.&lt;/p>
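&lt;p>Before turning to the package, the two-stage logic is worth seeing by hand. The sketch below is a simplified illustration on simulated data, not Gardner&amp;rsquo;s full estimator: stage one fits unit and year effects using only untreated observations, and stage two reads the ATT off the residuals of the treated observations:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
units, years, tau = 50, 10, 2.0
df = pd.DataFrame({'unit': np.repeat(np.arange(units), years),
                   'year': np.tile(np.arange(years), units)})
g = rng.integers(4, 12, size=units)   # adoption year; 10 or 11 means never treated
df['treat'] = (df['year'].to_numpy() >= g[df['unit'].to_numpy()]).astype(int)
u_fe, t_fe = rng.normal(size=units), np.linspace(0.0, 1.0, years)
df['y'] = (u_fe[df['unit'].to_numpy()] + t_fe[df['year'].to_numpy()]
           + tau * df['treat'] + rng.normal(0.0, 0.1, len(df)))

# Stage 1: two-way fixed effects fit on UNTREATED observations only
dummies = lambda s, p: pd.get_dummies(s, prefix=p, dtype=float)
D_all = pd.concat([dummies(df['unit'], 'u'), dummies(df['year'], 't')], axis=1)
untreated = df['treat'] == 0
coef, *_ = np.linalg.lstsq(D_all[untreated].to_numpy(),
                           df.loc[untreated, 'y'].to_numpy(), rcond=None)

# Stage 2: residualize ALL observations, then average residuals of the treated
df['resid'] = df['y'] - D_all.to_numpy() @ coef
att = df.loc[df['treat'] == 1, 'resid'].mean()
print(att)   # recovers the true tau of 2.0 up to noise
&lt;/code>&lt;/pre>
&lt;p>Because the counterfactual is estimated only from untreated cells, already-treated units never serve as controls, which is precisely the contamination that biases TWFE under staggered adoption.&lt;/p>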
&lt;p>PyFixest provides both approaches. We use a simulated dataset with staggered treatment adoption across states:&lt;/p>
&lt;pre>&lt;code class="language-python">df_het = pd.read_csv(
&amp;quot;https://raw.githubusercontent.com/py-econometrics/pyfixest/master/pyfixest/did/data/df_het.csv&amp;quot;
)
print(f&amp;quot;DiD dataset shape: {df_het.shape}&amp;quot;)
print(f&amp;quot;Columns: {list(df_het.columns)}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">DiD dataset shape: (46500, 14)
Columns: ['unit', 'state', 'group', 'unit_fe', 'g', 'year', 'year_fe', 'treat',
'rel_year', 'rel_year_binned', 'error', 'te', 'te_dynamic', 'dep_var']
&lt;/code>&lt;/pre>
&lt;p>The event study dataset contains 46,500 observations across units nested in states, with a binary treatment indicator and relative time variable measuring periods before and after treatment onset. The &lt;code>dep_var&lt;/code> column is the outcome we want to explain, and &lt;code>rel_year&lt;/code> measures the distance in years from each unit&amp;rsquo;s treatment date (negative values are pre-treatment). This structure is typical of policy evaluation studies where different states adopt a policy at different times.&lt;/p>
&lt;h3 id="122-year-1-as-the-universal-baseline">12.2 Year −1 as the universal baseline&lt;/h3>
&lt;p>Both estimators use &lt;code>ref=-1.0&lt;/code>, setting the last pre-treatment period as the baseline. This choice is not arbitrary &amp;mdash; it is the conventional and most informative reference point for three reasons:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Closest to treatment onset.&lt;/strong> Period −1 is the last observation before treatment begins. Using it as the baseline minimizes the extrapolation distance: we compare each period&amp;rsquo;s outcome to the most recent untreated state, rather than to some distant past.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Universal across cohorts.&lt;/strong> In staggered designs, different states adopt treatment in different calendar years. But &lt;code>rel_year = -1&lt;/code> has the same meaning for every cohort: &amp;ldquo;the last year before this group was treated.&amp;rdquo; It aligns all cohorts to a common relative-time clock, making the coefficients directly comparable.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Transparent parallel trends test.&lt;/strong> Pre-treatment coefficients (periods −20 through −2) measure deviations from the baseline. If these coefficients are near zero, the treated and control groups were on parallel trajectories &lt;em>before&lt;/em> treatment &amp;mdash; validating the key identifying assumption. Choosing −1 as the baseline makes this test as transparent as possible: any non-zero pre-treatment coefficient is a direct signal of differential pre-trends.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>How to read the event study plot.&lt;/strong> Each coefficient represents the difference in outcomes between treatment and control groups, relative to their difference at period −1. Pre-treatment coefficients near zero validate parallel trends. The coefficient at period 0 is the immediate treatment effect. Post-treatment coefficients show how the effect evolves over time. If we had chosen a different baseline (say, period −5), all coefficients would shift by a constant &amp;mdash; the &lt;em>shape&lt;/em> of the event study would be identical, but the levels would change. The convention of using −1 simply makes the plot easiest to interpret.&lt;/p>
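&lt;p>The constant-shift property is easy to verify. In the sketch below (made-up period means), each event study coefficient is the gap between a period&amp;rsquo;s mean and the baseline period&amp;rsquo;s mean, so moving the baseline from −1 to −5 shifts every coefficient by the same constant:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(0)
periods = list(range(-5, 4))                  # relative years -5 .. 3
mean = {t: rng.normal() for t in periods}     # average outcome gap per period

coef_ref1 = {t: mean[t] - mean[-1] for t in periods if t != -1}
coef_ref5 = {t: mean[t] - mean[-5] for t in periods if t != -5}

shifts = [coef_ref5[t] - coef_ref1[t] for t in periods if t not in (-1, -5)]
print(shifts)   # every entry equals mean[-1] - mean[-5]
&lt;/code>&lt;/pre>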
&lt;h3 id="123-twfe-vs-did2s">12.3 TWFE vs DID2S&lt;/h3>
&lt;p>We estimate event study coefficients using both TWFE and DID2S, with period -1 (the year before treatment) as the reference category. The &lt;code>i()&lt;/code> operator in PyFixest creates indicator variables for each relative year, analogous to the &lt;code>i()&lt;/code> function in R&amp;rsquo;s &lt;code>fixest&lt;/code> package.&lt;/p>
&lt;pre>&lt;code class="language-python"># TWFE event study
fit_twfe = pf.feols(
&amp;quot;dep_var ~ i(rel_year, ref=-1.0) | state + year&amp;quot;,
data=df_het, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;state&amp;quot;},
)
# DID2S (Gardner 2022) -- two-stage estimator
fit_did2s = pf.did2s(
df_het, yname=&amp;quot;dep_var&amp;quot;,
first_stage=&amp;quot;~ 0 | state + year&amp;quot;,
second_stage=&amp;quot;~ i(rel_year, ref=-1.0)&amp;quot;,
treatment=&amp;quot;treat&amp;quot;, cluster=&amp;quot;state&amp;quot;,
)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-python"># Extract coefficients from both estimators for plotting
import re
def parse_rel_years(coef_dict, se_dict):
years, vals, ses_list = [], [], []
for k in coef_dict.index:
match = re.search(r'\[T\.(-?\d+\.?\d*)\]', str(k))
if match:
years.append(float(match.group(1)))
vals.append(coef_dict[k])
ses_list.append(se_dict[k])
return years, vals, ses_list
twfe_years, twfe_vals, twfe_ses = parse_rel_years(fit_twfe.coef(), fit_twfe.se())
did2s_years, did2s_vals, did2s_ses = parse_rel_years(fit_did2s.coef(), fit_did2s.se())
&lt;/code>&lt;/pre>
&lt;p>PyFixest stores event study coefficients with names like &lt;code>[T.-5.0]&lt;/code>, &lt;code>[T.0.0]&lt;/code>, etc. The helper function above extracts the relative year from each coefficient name and pairs it with the estimate and standard error, giving us arrays ready for plotting.&lt;/p>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(12, 6))
offset = 0.15
ax.errorbar([y - offset for y in twfe_years], twfe_vals,
yerr=[1.96*s for s in twfe_ses],
fmt='o', color=STEEL_BLUE, capsize=3, label='TWFE')
ax.errorbar([y + offset for y in did2s_years], did2s_vals,
yerr=[1.96*s for s in did2s_ses],
fmt='s', color=WARM_ORANGE, capsize=3, label='DID2S (Gardner 2022)')
ax.axhline(y=0, color=LIGHT_TEXT, linewidth=0.8, linestyle=&amp;quot;--&amp;quot;, alpha=0.5)
ax.axvline(x=-0.5, color=LIGHT_TEXT, linewidth=1, linestyle=&amp;quot;--&amp;quot;, alpha=0.6)
ax.plot(-1, 0, 'D', color=TEAL, markersize=10, zorder=5,
label=&amp;quot;Baseline (t = −1)&amp;quot;)
ax.set_xlabel(&amp;quot;Relative Year&amp;quot;, fontsize=13)
ax.set_ylabel(&amp;quot;Coefficient Estimate&amp;quot;, fontsize=13)
ax.set_title(&amp;quot;Event Study: TWFE vs DID2S&amp;quot;, fontsize=14, fontweight=&amp;quot;bold&amp;quot;)
ax.legend(fontsize=11)
plt.savefig(&amp;quot;pyfixest_event_study.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="pyfixest_event_study.png" alt="Event study plot comparing TWFE and DID2S coefficient estimates across relative years, showing flat pre-trends and rising post-treatment effects.">&lt;/p>
&lt;p>Both estimators show near-zero pre-treatment coefficients (validating the parallel trends assumption) and a sharp jump at treatment onset. The immediate treatment effect at period 0 is approximately 1.3&amp;ndash;1.4, growing steadily to about 2.8 by period 20. The TWFE estimates (blue circles) are slightly larger than DID2S (orange squares) in post-treatment periods &amp;mdash; this upward bias is the well-documented problem with TWFE under staggered adoption and heterogeneous effects. The DID2S estimator corrects this by using only untreated observations to estimate the counterfactual, producing cleaner estimates of the dynamic treatment path.&lt;/p>
&lt;h2 id="13-hypothesis-testing-wald-test">13. Hypothesis testing: Wald test&lt;/h2>
&lt;p>PyFixest supports joint hypothesis testing via &lt;a href="https://pyfixest.org/reference/estimation.feols_.Feols.wald_test.html" target="_blank" rel="noopener">Wald tests&lt;/a>, which assess whether multiple coefficients are simultaneously equal to zero. This is useful when you want to test whether a group of related variables jointly matters, not just one at a time.&lt;/p>
&lt;pre>&lt;code class="language-python">fit_wald = pf.feols(&amp;quot;Y ~ X1 + X2 | f1&amp;quot;, data=data, vcov=&amp;quot;HC1&amp;quot;)
R = np.eye(2) # Test both X1=0 and X2=0 jointly
wald_result = fit_wald.wald_test(R=R)
print(&amp;quot;Wald test (joint null: X1=0, X2=0):&amp;quot;)
print(wald_result)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Wald test (joint null: X1=0, X2=0):
statistic 1.554006e+02
pvalue 1.110223e-16
&lt;/code>&lt;/pre>
&lt;p>The Wald test statistic is 155.4 with a p-value effectively zero (below $10^{-16}$), overwhelmingly rejecting the null hypothesis that both &lt;code>X1&lt;/code> and &lt;code>X2&lt;/code> have zero effect on &lt;code>Y&lt;/code>. This joint test is more informative than individual t-tests because it accounts for the correlation between the two coefficient estimates. In practice, Wald tests are essential for testing hypotheses about groups of variables, such as whether all interaction terms or all year dummies are jointly significant.&lt;/p>
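&lt;p>Under the hood, the Wald statistic is the quadratic form $W = (R\hat{\beta})'(R\hat{V}R')^{-1}(R\hat{\beta})$, which is $\chi^2_q$ under the null with $q$ restrictions. A minimal sketch on simulated data, using a classical homoskedastic covariance for brevity (the PyFixest call above uses HC1 instead):&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(7)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, -0.5, 0.8]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
V = np.linalg.inv(X.T @ X) * (resid @ resid) / (n - 3)   # classical vcov

R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])          # joint null: both slopes are zero
b = R @ beta
W = b @ np.linalg.inv(R @ V @ R.T) @ b   # compare to a chi-squared with 2 df
print(W)
&lt;/code>&lt;/pre>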
&lt;h2 id="14-wild-cluster-bootstrap">14. Wild cluster bootstrap&lt;/h2>
&lt;p>When the number of clusters is small (roughly below 50), cluster-robust standard errors can be unreliable. The &lt;em>wild cluster bootstrap&lt;/em> provides more accurate inference in this setting by simulating the distribution of the test statistic under the null hypothesis. PyFixest integrates with the &lt;code>wildboottest&lt;/code> package to make this straightforward:&lt;/p>
&lt;pre>&lt;code class="language-python">fit_boot = pf.feols(&amp;quot;Y ~ X1 | group_id&amp;quot;, data=data, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;group_id&amp;quot;})
boot_result = fit_boot.wildboottest(param=&amp;quot;X1&amp;quot;, reps=999, seed=42)
print(boot_result)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">param X1
t value -8.616818459577098
Pr(&amp;gt;|t|) 0.0
bootstrap_type 11
inference CRV(group_id)
impose_null True
&lt;/code>&lt;/pre>
&lt;p>The wild bootstrap t-statistic of -8.62 and p-value of 0.0 confirm that the effect of &lt;code>X1&lt;/code> remains highly significant even under the more conservative bootstrap inference. The &lt;code>impose_null=True&lt;/code> setting means the bootstrap simulates data under the null hypothesis of no effect, which generally provides better size control in finite samples. With only ~20 groups in this dataset, the bootstrap p-value is more trustworthy than the asymptotic cluster-robust p-value.&lt;/p>
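&lt;p>The mechanics are worth sketching once by hand. The toy version below (simulated data, a simple bivariate regression rather than the FE model above) imposes the null, flips the sign of each cluster&amp;rsquo;s restricted residuals with Rademacher weights, and compares the original slope to the bootstrap distribution:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(3)
G, m = 12, 30                              # few clusters, 30 obs each
cl = np.repeat(np.arange(G), m)
x = rng.normal(size=G * m) + 0.5 * rng.normal(size=G)[cl]
y = 1.0 * x + 0.5 * rng.normal(size=G)[cl] + rng.normal(size=G * m)

def slope(y, x):
    xd = x - x.mean()
    return (xd @ y) / (xd @ xd)

b_hat = slope(y, x)
resid0 = y - y.mean()         # residuals with the null (slope = 0) imposed
count, reps = 0, 999
for _ in range(reps):
    w = rng.choice([-1.0, 1.0], size=G)    # one Rademacher draw per cluster
    y_star = y.mean() + w[cl] * resid0     # resampled outcome under the null
    if abs(slope(y_star, x)) >= abs(b_hat):
        count += 1
p_value = (count + 1) / (reps + 1)
print(p_value)
&lt;/code>&lt;/pre>
&lt;p>Because weights are drawn at the cluster level, the bootstrap preserves arbitrary within-cluster correlation, which is exactly what asymptotic CRV inference struggles with when clusters are few.&lt;/p>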
&lt;h2 id="15-discussion">15. Discussion&lt;/h2>
&lt;p>This tutorial posed a simple question: how do unobserved group-level characteristics bias regression estimates, and how can we account for them? The answer, demonstrated across multiple settings, is that fixed effects regression removes this bias by focusing on within-group variation only.&lt;/p>
&lt;p>The synthetic data showed that OLS estimates shift from -1.000 to -1.019 when absorbing group fixed effects &amp;mdash; a modest change in this controlled setting, but one that demonstrates the mechanism. The real-world wage panel told a more dramatic story: the union wage premium dropped from 18.3% (pooled OLS) to 7.3% (two-way FE), revealing that more than half of the apparent union premium reflects worker selection rather than a genuine union effect. This has direct implications for labor economists and policymakers: overestimating the union premium leads to overestimating the economic impact of declining unionization.&lt;/p>
&lt;p>Framing the wage panel through the Mincer equation (Section 11.3) provided a unifying thread for the entire analysis. The classic Mincer specification &amp;mdash; log wages as a function of education, experience, and experience squared &amp;mdash; is the starting point for virtually all empirical wage research. By extending it with additional controls and then progressively adding fixed effects, we traced a clear arc from pooled cross-sectional estimation to panel methods that account for unobserved heterogeneity. The within-versus-between decomposition (Section 11.2) made this arc concrete: education has zero within-worker variation, so one-way FE cannot estimate its effect, while variables like union status and marital status have substantial within-worker variation and can be identified.&lt;/p>
&lt;p>The wage panel also highlighted a fundamental tradeoff in fixed effects estimation: the very mechanism that removes ability bias &amp;mdash; absorbing all time-invariant individual characteristics &amp;mdash; also prevents estimation of time-invariant variables like education. This is not a limitation to be worked around but a defining feature of the method. The CRE/Mundlak approach (Section 11.7) offers a principled resolution: by including individual means of time-varying variables as additional regressors, it proxies for the unobserved heterogeneity that one-way FE would absorb, recovering education&amp;rsquo;s coefficient (0.094 per year of schooling) while producing time-varying estimates that closely match one-way FE. The key assumption &amp;mdash; that unobserved heterogeneity correlates with covariates only through their individual means &amp;mdash; is stronger than FE&amp;rsquo;s assumption of no time-varying confounding, but it is the price of recovering time-invariant effects.&lt;/p>
&lt;p>The three-way FE extension (adding occupation fixed effects) showed that occupation sorting explains negligible additional wage variation beyond individual and time effects, confirming that the dominant source of wage heterogeneity is persistent individual characteristics. The group-specific time trends analysis (Section 11.6) showed that allowing Black and non-Black workers to have different year effects produces estimates nearly identical to standard TWFE, supporting the common trends assumption in this particular panel. This is a useful diagnostic in practice: if group-specific trends substantially change the coefficients, the researcher should worry about whether the standard TWFE results are confounded by differential macro trends.&lt;/p>
&lt;p>PyFixest makes the entire workflow &amp;mdash; from simple OLS through two-way FE, IV, CRE/Mundlak, and event studies &amp;mdash; accessible with a concise formula syntax. The ability to estimate multiple specifications in one call (&lt;code>csw0&lt;/code>) and compare inference methods (iid, HC1, CRV1, CRV3, wild bootstrap) means researchers can quickly build a comprehensive picture of how sensitive their results are to modeling choices.&lt;/p>
&lt;h2 id="16-summary-and-next-steps">16. Summary and next steps&lt;/h2>
&lt;p>&lt;strong>Key takeaways:&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Fixed effects remove group-level confounding.&lt;/strong> In the wage panel, individual FE reduced the apparent union premium from 18.3% to 7.8%, revealing that over half the raw premium reflects selection on unobserved ability. Without FE, policy conclusions about unionization would be substantially biased.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The within-between decomposition diagnoses what FE can estimate.&lt;/strong> Decomposing each variable&amp;rsquo;s variation into between-worker and within-worker components reveals which coefficients survive one-way FE. Education has zero within variation and is absorbed; union status and marital status have substantial within shares (64% and 65%) and can be estimated. This diagnostic should precede any panel analysis.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The Mincer equation provides a unifying framework for wage regressions.&lt;/strong> Framing the analysis through the classic Mincer specification &amp;mdash; and its extensions to panel data &amp;mdash; makes the progression from pooled OLS to one-way FE to CRE/Mundlak a coherent arc rather than a collection of ad hoc specifications.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Standard errors matter as much as point estimates.&lt;/strong> Clustering standard errors inflated the SE on &lt;code>X1&lt;/code> by 50% compared to iid errors (0.1247 vs 0.0833). With weaker effects, this difference could flip a result from significant to insignificant &amp;mdash; always cluster at the appropriate level.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Multiple specifications are a robustness check, not a fishing exercise.&lt;/strong> The coefficient on &lt;code>X1&lt;/code> remained stable around -1.0 across no FE, one-way FE, and two-way FE. In the wage panel, the union premium stabilized at 7.3&amp;ndash;7.8% across one-way FE, two-way FE, three-way FE, and group-specific time trends &amp;mdash; strong evidence that these estimates are robust.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Group-specific time trends test the common trends assumption.&lt;/strong> Allowing Black and non-Black workers to have different year effects produced estimates nearly identical to standard TWFE, supporting the assumption that both groups faced similar macroeconomic trends during 1980&amp;ndash;1987. When this test fails, standard TWFE results may be unreliable.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>One-Way FE cannot estimate time-invariant effects, but CRE can recover them.&lt;/strong> Education was silently dropped from the one-way FE model because the within transformation reduces any constant variable to zero. The CRE model partially resolves this tradeoff by substituting individual means of time-varying variables for entity dummies, recovering education&amp;rsquo;s coefficient (0.094 per year) while producing time-varying estimates that match one-way FE. The cost is a stronger modeling assumption &amp;mdash; that unobserved heterogeneity correlates with covariates only through their individual means.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>TWFE event studies can be biased with staggered adoption.&lt;/strong> The DID2S estimator produced cleaner estimates by separating counterfactual estimation from treatment effect recovery. When treatment timing varies, always compare TWFE with a robust alternative like DID2S.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The event study baseline is not arbitrary.&lt;/strong> Setting &lt;code>ref=-1&lt;/code> (the last pre-treatment period) is the convention because it provides the most transparent test of parallel trends and minimizes extrapolation from the baseline to treatment onset. All cohorts in a staggered design share this reference point, making it the natural common clock.&lt;/p>
&lt;/li>
&lt;/ol>
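&lt;p>The within-share diagnostic in takeaway 2 takes only a few lines to compute. A sketch on simulated data, with hypothetical variable names mirroring the wage panel:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n, t = 100, 8
ids = np.repeat(np.arange(n), t)
df = pd.DataFrame({
    'id': ids,
    'educ': np.repeat(rng.integers(10, 17, n), t).astype(float),  # time-invariant
    'union': rng.integers(0, 2, n * t).astype(float),             # time-varying
})

def within_share(df, col):
    dev = df[col] - df.groupby('id')[col].transform('mean')
    return dev.var() / df[col].var()

print(within_share(df, 'educ'), within_share(df, 'union'))
# educ share is exactly 0: one-way FE would absorb it
&lt;/code>&lt;/pre>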
&lt;p>&lt;strong>Limitations:&lt;/strong> Fixed effects only remove time-invariant confounders. If a relevant confounder changes over time within groups, FE cannot address it. Additionally, FE estimation discards all between-group variation, which reduces statistical power and makes it impossible to estimate the effects of time-invariant variables &amp;mdash; as we saw directly in Section 11.2, where education&amp;rsquo;s within share was exactly zero. CRE offers a partial resolution, but its assumption that unobserved heterogeneity correlates with covariates only through individual means may not hold in all settings &amp;mdash; if ability correlates with the &lt;em>trajectory&lt;/em> of union membership rather than its mean, the CRE estimates would still be biased. The group-specific time trends test (Section 11.6) is a useful diagnostic but is not definitive: passing it does not prove that common trends hold, only that the data are consistent with the assumption along the dimension tested. Finally, the datasets here are synthetic or well-studied &amp;mdash; in messy real-world data, the parallel trends assumption underlying event studies may not hold.&lt;/p>
&lt;p>&lt;strong>Next steps:&lt;/strong> The CRE/Mundlak approach demonstrated in Section 11.7 can be extended in several directions: Wooldridge (2010, Ch. 10) develops the correlated random effects framework more formally, including CRE probit and tobit models for limited dependent variables. Hausman-Taylor estimation offers an alternative strategy for recovering time-invariant coefficients under different identifying assumptions. Beyond the wage panel, explore PyFixest&amp;rsquo;s support for Poisson regression (&lt;code>pf.fepois&lt;/code>) for count data, quantile regression (&lt;code>pf.quantreg&lt;/code>) for distributional effects, and the &lt;code>pf.event_study()&lt;/code> common API for streamlined event study estimation with multiple estimators. For more advanced inference, investigate randomization inference via &lt;code>fit.ritest()&lt;/code> and multiple testing corrections with &lt;code>pf.bonferroni()&lt;/code> and &lt;code>pf.rwolf()&lt;/code>.&lt;/p>
&lt;h2 id="17-exercises">17. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Varying the clustering level.&lt;/strong> Re-estimate the one-way FE model (&lt;code>Y ~ X1 | group_id&lt;/code>) with different clustering variables: &lt;code>f1&lt;/code>, &lt;code>f2&lt;/code>, and &lt;code>f3&lt;/code>. How do the standard errors change? Which clustering level produces the most conservative inference, and why?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Weak instruments.&lt;/strong> Modify the IV specification to use only &lt;code>Z1&lt;/code> as an instrument (instead of both &lt;code>Z1&lt;/code> and &lt;code>Z2&lt;/code>). How does the first-stage F-statistic change? How does the IV coefficient and its standard error respond to the weaker first stage?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>CRE with additional means.&lt;/strong> In Section 11.7, we included individual means only for the time-varying regressors. What happens if you also include year fixed effects alongside the CRE correction terms (i.e., add &lt;code>| year&lt;/code> to the CRE specification)? Do the time-varying coefficients shift closer to the TWFE estimates? Does the education coefficient change?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Group-specific trends by other dimensions.&lt;/strong> Section 11.6 allowed year effects to vary by race (&lt;code>black&lt;/code>). Repeat this analysis using &lt;code>hisp&lt;/code> instead, or using a union-status interaction (&lt;code>C(year):C(union)&lt;/code>). Do the results differ from the standard TWFE specification? What does this tell you about the common trends assumption along different group dimensions?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Within-between decomposition on new data.&lt;/strong> Download a panel dataset of your choice (e.g., Penn World Table, World Development Indicators) and compute the within-versus-between decomposition for all variables. Which variables have the highest within share? What does this predict about which coefficients will survive one-way FE? Verify by estimating both pooled OLS and one-way FE models.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Hausman test via CRE.&lt;/strong> The CRE model provides a simple Hausman-type test: if the coefficients on the individual means ($\bar{X}_i$) are jointly zero, then pooled OLS and one-way FE yield the same estimates, and random effects is efficient. Test whether the four CRE correction terms (union_mean, married_mean, hours_mean, expersq_mean) are jointly significant using a Wald test. What does the result imply about the choice between random effects and fixed effects for this panel?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="18-references">18. References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="http://scorreia.com/research/hdfe.pdf" target="_blank" rel="noopener">Correia, S. (2016). A Feasible Estimator for Linear Models with Multi-Way Fixed Effects. Working Paper.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1016/j.jeconom.2021.10.004" target="_blank" rel="noopener">Gardner, J. (2022). Two-Stage Differences in Differences. Journal of Econometrics.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/py-econometrics/pyfixest" target="_blank" rel="noopener">Fischer, A. and Schar, S. (2024). PyFixest: Fast High-Dimensional Fixed Effects Estimation in Python.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://pyfixest.org/quickstart.html" target="_blank" rel="noopener">PyFixest Documentation &amp;ndash; Quickstart Guide.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1002/%28SICI%291099-1255%28199803/04%2913:2%3c163::AID-JAE460%3e3.0.CO;2-Y" target="_blank" rel="noopener">Vella, F. and Verbeek, M. (1998). Whose Wages Do Unions Raise? A Dynamic Model of Unionism and Wage Rate Determination for Young Men. Journal of Applied Econometrics.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.3368/jhr.50.2.317" target="_blank" rel="noopener">Cameron, A.C. and Miller, D.L. (2015). A Practitioner&amp;rsquo;s Guide to Cluster-Robust Inference. Journal of Human Resources.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.nber.org/books-and-chapters/schooling-experience-and-earnings" target="_blank" rel="noopener">Mincer, J. (1974). &lt;em>Schooling, Experience, and Earnings.&lt;/em> Columbia University Press.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.2307/1913646" target="_blank" rel="noopener">Mundlak, Y. (1978). On the Pooling of Time Series and Cross Section Data. &lt;em>Econometrica&lt;/em>, 46(1), 69&amp;ndash;85.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://mitpress.mit.edu/9780262232586/" target="_blank" rel="noopener">Wooldridge, J.M. (2010). &lt;em>Econometric Analysis of Cross Section and Panel Data.&lt;/em> 2nd ed. MIT Press.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1080/00401706.2013.806694" target="_blank" rel="noopener">Olea, J.L.M. and Pflueger, C. (2013). A Robust Test for Weak Instruments. Journal of Business &amp;amp; Economic Statistics.&lt;/a>&lt;/li>
&lt;/ol></description></item><item><title>Introduction to Difference-in-Differences in Python</title><link>https://carlos-mendez.org/post/python_did/</link><pubDate>Thu, 19 Mar 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_did/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>An education ministry rolls out AI tutoring bots in some cities but not others. Did the AI tools actually improve learning, or were those cities already on an upward trajectory? This is the core challenge of &lt;strong>policy evaluation&lt;/strong>: separating the genuine effect of an intervention from pre-existing trends and selection differences between treated and untreated groups. The seminal study by &lt;a href="https://www.jstor.org/stable/2118030" target="_blank" rel="noopener">Card and Krueger (1994)&lt;/a> pioneered this approach in a different context &amp;mdash; examining how a minimum wage increase in New Jersey affected fast-food employment compared to neighboring Pennsylvania.&lt;/p>
&lt;p>&lt;strong>Difference-in-Differences (DiD)&lt;/strong> is the workhorse method for answering such questions. The idea is elegantly simple: compare the change in outcomes over time between a group that received treatment and a group that did not. If both groups were evolving similarly before treatment &amp;mdash; the &lt;em>parallel trends&lt;/em> assumption &amp;mdash; then the difference in their changes isolates the causal effect. Think of it as using the control group as a mirror: it shows what would have happened to the treated group had the policy never been implemented.&lt;/p>
&lt;p>The &lt;strong>&lt;a href="https://diff-diff.readthedocs.io/en/stable/" target="_blank" rel="noopener">diff-diff&lt;/a>&lt;/strong> Python package, developed by &lt;a href="https://github.com/igerber/diff-diff" target="_blank" rel="noopener">Gerber (2026)&lt;/a>, provides a unified, scikit-learn-style API for 13+ DiD estimators validated against their R counterparts. These range from the classic 2x2 design to modern methods for staggered adoption. In this tutorial, we start with the simplest case, build up to event studies and multi-cohort designs, and finish with sensitivity analysis that quantifies how robust the findings are to violations of parallel trends. All examples use synthetic &lt;strong>panel data&lt;/strong> &amp;mdash; datasets where the same units (cities, firms, individuals) are observed repeatedly over multiple time periods &amp;mdash; with known true effects, so every estimate can be verified against ground truth.&lt;/p>
&lt;p>&lt;strong>Learning objectives:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Understand the logic of the 2x2 DiD design and why it identifies causal effects under parallel trends&lt;/li>
&lt;li>Estimate the Average Treatment Effect on the Treated (ATT) using classic DiD&lt;/li>
&lt;li>Test the parallel trends assumption with pre-treatment trend comparisons&lt;/li>
&lt;li>Interpret event study plots that reveal dynamic treatment effects over time&lt;/li>
&lt;li>Recognize why Two-Way Fixed Effects fails under staggered adoption and how Callaway-Sant&amp;rsquo;Anna corrects for it&lt;/li>
&lt;li>Assess robustness of causal conclusions using Bacon decomposition diagnostics and HonestDiD sensitivity analysis&lt;/li>
&lt;/ul>
&lt;p>&lt;a href="https://colab.research.google.com/github/cmg777/starter-academic-v501/blob/master/content/post/python_did/notebook.ipynb" target="_blank">&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">&lt;/a>&lt;/p>
&lt;h2 id="conceptual-framework-what-is-difference-in-differences">Conceptual framework: What is Difference-in-Differences?&lt;/h2>
&lt;p>Imagine a school district deploys AI tutoring bots in some schools but not others, and you want to know whether the AI tools improved learning outcomes. You could compare learning scores at AI-equipped schools versus non-equipped schools after deployment. But AI-equipped schools might have had stronger students to begin with &amp;mdash; perhaps the district piloted the technology in its highest-performing schools. A simple post-treatment comparison confounds the AI effect with pre-existing differences. Alternatively, you could compare a single school before and after the AI rollout &amp;mdash; but learning scores might have been rising everywhere due to a new curriculum or improved teacher training, not the AI tools.&lt;/p>
&lt;p>DiD combines these two simpler comparisons so that selection bias and common time effects are each eliminated in turn. The logic proceeds through &lt;strong>successive differencing&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>First difference&lt;/strong>: Compare a unit before and after treatment. This eliminates time-invariant differences between groups (e.g., one school always scores higher than another), but confounds the treatment effect with common time trends (e.g., district-wide learning improvements from a new curriculum).&lt;/li>
&lt;li>&lt;strong>Second difference&lt;/strong>: Difference the first differences between treated and control groups. This eliminates the common time trends, leaving only the treatment effect.&lt;/li>
&lt;/ul>
&lt;pre>&lt;code class="language-mermaid">graph TB
subgraph &amp;quot;Before Treatment&amp;quot;
A[&amp;quot;&amp;lt;b&amp;gt;Treated Group&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Pre-treatment outcome&amp;quot;]
B[&amp;quot;&amp;lt;b&amp;gt;Control Group&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Pre-treatment outcome&amp;quot;]
end
subgraph &amp;quot;After Treatment&amp;quot;
C[&amp;quot;&amp;lt;b&amp;gt;Treated Group&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Post-treatment outcome&amp;quot;]
D[&amp;quot;&amp;lt;b&amp;gt;Control Group&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Post-treatment outcome&amp;quot;]
end
A --&amp;gt;|&amp;quot;Change in&amp;lt;br/&amp;gt;treated&amp;quot;| C
B --&amp;gt;|&amp;quot;Change in&amp;lt;br/&amp;gt;control&amp;quot;| D
style A fill:#d97757,stroke:#141413,color:#fff
style C fill:#d97757,stroke:#141413,color:#fff
style B fill:#6a9bcc,stroke:#141413,color:#fff
style D fill:#6a9bcc,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;h3 id="the-did-estimator">The DiD estimator&lt;/h3>
&lt;p>The 2x2 DiD estimator formalizes this double comparison. Let $k$ denote the treated group and $U$ the untreated group:&lt;/p>
&lt;p>$$\hat{\delta}^{2 \times 2}_{kU} = \big( \bar{Y}_k^{Post} - \bar{Y}_k^{Pre} \big) - \big( \bar{Y}_U^{Post} - \bar{Y}_U^{Pre} \big)$$&lt;/p>
&lt;p>In words: take the before-and-after change in the treated group, subtract the before-and-after change in the control group, and the remainder is the treatment effect. Here $\bar{Y}_k^{Post}$ is the average outcome for treated units in the post-treatment period (rows where &lt;code>treated = 1&lt;/code> and &lt;code>post = 1&lt;/code>), and similarly for the other three terms.&lt;/p>
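&lt;p>The formula is just arithmetic on four cell means. As a minimal sketch with illustrative numbers (the values below are made up for illustration, not taken from the dataset estimated later):&lt;/p>

```python
# Sketch: plug illustrative cell means into the 2x2 DiD formula.
# All four numbers are invented for illustration.
y_treated_pre = 11.0   # Ybar_k^Pre
y_treated_post = 18.7  # Ybar_k^Post
y_control_pre = 10.6   # Ybar_U^Pre
y_control_post = 13.1  # Ybar_U^Post

# First differences: before-and-after change within each group
change_treated = y_treated_post - y_treated_pre  # 7.7
change_control = y_control_post - y_control_pre  # 2.5

# Second difference: the DiD estimate
did_estimate = change_treated - change_control
print(f"DiD estimate: {did_estimate:.1f}")  # prints 5.2
```

&lt;p>This same two-step arithmetic is what the regression formulation later in this section automates.&lt;/p>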
&lt;h3 id="what-did-actually-estimates-the-potential-outcomes-framework">What DiD actually estimates: The potential outcomes framework&lt;/h3>
&lt;p>The sample-means formula above tells us &lt;em>how to compute&lt;/em> DiD from data, but it does not tell us &lt;em>what causal quantity&lt;/em> DiD recovers or &lt;em>under what assumptions&lt;/em> it is valid. To answer these deeper questions, we need the &lt;strong>potential outcomes framework&lt;/strong> (&lt;a href="https://doi.org/10.1037/h0037350" target="_blank" rel="noopener">Rubin, 1974&lt;/a>).&lt;/p>
&lt;p>The key idea is that every unit has &lt;em>two&lt;/em> potential outcomes at every point in time, but we only ever observe one of them:&lt;/p>
&lt;ul>
&lt;li>$Y^1_{i}$ &amp;mdash; the outcome unit $i$ would experience &lt;strong>with&lt;/strong> treatment&lt;/li>
&lt;li>$Y^0_{i}$ &amp;mdash; the outcome unit $i$ would experience &lt;strong>without&lt;/strong> treatment&lt;/li>
&lt;/ul>
&lt;p>For a treated city, we observe $Y^1$ (what actually happened after adopting AI tutoring) but never $Y^0$ (what &lt;em>would have&lt;/em> happened had the city not adopted AI). For a control city, we observe $Y^0$ but never $Y^1$. This is the &lt;strong>fundamental problem of causal inference&lt;/strong>: for any individual unit, the causal effect $Y^1_{i} - Y^0_{i}$ is unobservable because one potential outcome is always missing.&lt;/p>
&lt;p>Since we cannot measure individual effects, we aim for the &lt;strong>Average Treatment Effect on the Treated (ATT)&lt;/strong> &amp;mdash; the average causal effect across all treated units in the post-treatment period:&lt;/p>
&lt;p>$$ATT = E[Y^1_k - Y^0_k | Post]$$&lt;/p>
&lt;p>In words: what is the average difference between what treated units actually experienced and what they &lt;em>would have&lt;/em> experienced without treatment, measured in the post-treatment period? Here $E[\cdot]$ denotes the expected value (population average), $k$ indexes the treated group, and the conditioning on $Post$ restricts attention to the post-treatment period. In our data, $E[Y^1_k | Post]$ corresponds to the average &lt;code>outcome&lt;/code> for rows where &lt;code>treated = 1&lt;/code> and &lt;code>post = 1&lt;/code> &amp;mdash; that is, $\bar{Y}_k^{Post}$ from the previous formula.&lt;/p>
&lt;p>The challenge is that $E[Y^0_k | Post]$ &amp;mdash; the average untreated outcome for the treated group after treatment &amp;mdash; is a &lt;strong>counterfactual&lt;/strong> that we never observe. Treated cities received the policy, so we cannot see what their outcomes would have been without it. This is where DiD&amp;rsquo;s clever trick comes in.&lt;/p>
&lt;h3 id="from-sample-means-to-potential-outcomes">From sample means to potential outcomes&lt;/h3>
&lt;p>Let us now connect the sample-means formula to potential outcomes by rewriting each $\bar{Y}$ term. For the &lt;strong>control group&lt;/strong>, which never receives treatment, the observed outcome always equals the untreated potential outcome: $Y_U = Y^0_U$ in both periods. For the &lt;strong>treated group&lt;/strong>, the observed outcome equals the untreated potential outcome before treatment ($Y_k = Y^0_k$ when $Pre$) and the treated potential outcome after ($Y_k = Y^1_k$ when $Post$). Substituting these into the DiD formula:&lt;/p>
&lt;p>$$\hat{\delta}^{2 \times 2}_{kU} = \big( \underbrace{\bar{Y}_k^{Post}}_{= E[Y^1_k | Post]} - \underbrace{\bar{Y}_k^{Pre}}_{= E[Y^0_k | Pre]} \big) - \big( \underbrace{\bar{Y}_U^{Post}}_{= E[Y^0_U | Post]} - \underbrace{\bar{Y}_U^{Pre}}_{= E[Y^0_U | Pre]} \big)$$&lt;/p>
&lt;p>On the left of the outer subtraction, the treated group&amp;rsquo;s pre-treatment mean uses $Y^0_k$ (no treatment yet) and post-treatment mean uses $Y^1_k$ (treatment is active). On the right, both control group means use $Y^0_U$ (never treated). Now we apply a standard algebraic trick: &lt;strong>add and subtract&lt;/strong> the unobserved counterfactual $E[Y^0_k | Post]$ inside the first parenthesis:&lt;/p>
&lt;p>$$= \big( E[Y^1_k | Post] - E[Y^0_k | Post] + E[Y^0_k | Post] - E[Y^0_k | Pre] \big) - \big( E[Y^0_U | Post] - E[Y^0_U | Pre] \big)$$&lt;/p>
&lt;p>Rearranging by grouping the first two terms and the last three:&lt;/p>
&lt;p>$$= \underbrace{E[Y^1_k | Post] - E[Y^0_k | Post]}_{ATT} + \underbrace{\big( E[Y^0_k | Post] - E[Y^0_k | Pre] \big) - \big( E[Y^0_U | Post] - E[Y^0_U | Pre] \big)}_{Bias}$$&lt;/p>
&lt;p>This is the fundamental decomposition of the DiD estimator (&lt;a href="https://mixtape.scunning.com/09-difference_in_differences" target="_blank" rel="noopener">Cunningham, 2021&lt;/a>). The first term is the &lt;strong>ATT&lt;/strong> &amp;mdash; the causal quantity we want. The second term is the &lt;strong>non-parallel trends bias&lt;/strong> &amp;mdash; the difference in how the two groups&amp;rsquo; untreated outcomes would have evolved over time. The bias term compares the untreated trajectory of the treated group ($E[Y^0_k | Post] - E[Y^0_k | Pre]$) against the untreated trajectory of the control group ($E[Y^0_U | Post] - E[Y^0_U | Pre]$). If the bias term is zero, the DiD estimator cleanly identifies the ATT.&lt;/p>
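&lt;p>The add-and-subtract step can be checked numerically. Here we assume hypothetical values for the potential-outcome means (the labels and numbers are ours, chosen so that the untreated trends happen to be parallel):&lt;/p>

```python
# Hypothetical potential-outcome means (illustrative values only)
E_y1_k_post = 18.0  # treated group, with treatment, post period (observed)
E_y0_k_post = 13.5  # treated group, without treatment, post (counterfactual)
E_y0_k_pre = 11.0   # treated group, pre period (untreated, observed)
E_y0_U_post = 13.0  # control group, post period (observed)
E_y0_U_pre = 10.5   # control group, pre period (observed)

# The DiD estimator uses only the observable pieces
did = (E_y1_k_post - E_y0_k_pre) - (E_y0_U_post - E_y0_U_pre)

# Its decomposition into ATT plus non-parallel-trends bias
att = E_y1_k_post - E_y0_k_post
bias = (E_y0_k_post - E_y0_k_pre) - (E_y0_U_post - E_y0_U_pre)

print(f"DiD = {did}, ATT = {att}, bias = {bias}")
```

&lt;p>Because both untreated trajectories were set to rise by 2.5 units, the bias term is zero and DiD recovers the ATT exactly; changing the counterfactual trend breaks that equality.&lt;/p>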
&lt;h3 id="parallel-trends-assumption">Parallel trends assumption&lt;/h3>
&lt;p>The bias term vanishes when the treated and control groups would have followed the same trajectory absent treatment:&lt;/p>
&lt;p>$$E[Y^0_k | Post] - E[Y^0_k | Pre] = E[Y^0_U | Post] - E[Y^0_U | Pre]$$&lt;/p>
&lt;p>This is the &lt;strong>parallel trends assumption&lt;/strong>. It does not require the groups to have the same outcome levels &amp;mdash; only the same &lt;em>trends&lt;/em>. Two cities can have different learning scores, but if their learning scores were rising at the same speed before the AI rollout, DiD can credibly estimate the policy&amp;rsquo;s impact. Importantly, this assumption is &lt;strong>fundamentally untestable&lt;/strong> because the counterfactual outcome $E[Y^0_k | Post]$ &amp;mdash; what would have happened to the treated group absent treatment &amp;mdash; is never observed. We can check whether trends were parallel in the pre-treatment period, but this does not guarantee they would have remained parallel afterward. This limitation is why Section 11 introduces HonestDiD sensitivity analysis.&lt;/p>
&lt;h3 id="regression-formulation">Regression formulation&lt;/h3>
&lt;p>In practice, DiD is implemented as a regression with an interaction term:&lt;/p>
&lt;p>$$Y_{it} = \alpha + \gamma \cdot Treated_i + \lambda \cdot Post_t + \delta \cdot (Treated_i \times Post_t) + \varepsilon_{it}$$&lt;/p>
&lt;p>where $Treated_i$ is the group indicator (our &lt;code>treated&lt;/code> column), $Post_t$ is the time indicator (our &lt;code>post&lt;/code> column), and $\delta$ is the DiD treatment effect. The coefficient $\gamma$ captures the pre-existing level difference between groups, and $\lambda$ captures the common time trend. This regression mechanically constructs the counterfactual using the control group&amp;rsquo;s trajectory &amp;mdash; it always estimates the $\delta$ coefficient as the extra change in the treated group, which is only valid if the counterfactual trend truly equals the control group&amp;rsquo;s trend.&lt;/p>
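&lt;p>The equivalence between the interaction coefficient and the double difference of cell means can be verified with plain least squares. This sketch uses only NumPy on a small simulated design of our own; any OLS routine would do:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(1)
treated = np.repeat([0.0, 0.0, 1.0, 1.0], 50)  # 100 control, 100 treated rows
post = np.tile([0.0, 1.0], 100)                # alternating pre/post rows
y = (10 + 1.0 * treated + 2.5 * post + 5.0 * treated * post
     + rng.normal(0, 1, 200))

# Design matrix: intercept, Treated, Post, Treated x Post
X = np.column_stack([np.ones_like(y), treated, post, treated * post])
alpha, gamma, lam, delta = np.linalg.lstsq(X, y, rcond=None)[0]

# The interaction coefficient equals the double difference of cell means
m = lambda g, p: y[(treated == g) & (post == p)].mean()
double_diff = (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))
print(f"delta (OLS): {delta:.3f}  double difference: {double_diff:.3f}")
```

&lt;p>In the saturated 2x2 case the two numbers agree to machine precision; the regression form matters in practice because it also delivers standard errors and extends naturally to covariates.&lt;/p>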
&lt;p>&lt;strong>Estimand clarity:&lt;/strong> DiD targets the &lt;strong>Average Treatment Effect on the Treated (ATT)&lt;/strong> &amp;mdash; the average impact of treatment on those units that actually received it. This differs from the Average Treatment Effect (ATE), which averages over the entire population including units that were never treated. The ATT answers: &amp;ldquo;For the units that received the policy, how much did it change their outcomes?&amp;rdquo; This is typically the policy-relevant question, since the decision-maker wants to know whether the intervention helped the people it was aimed at.&lt;/p>
&lt;p>Now that we understand the logic, let us implement it step by step using the &lt;code>diff-diff&lt;/code> package.&lt;/p>
&lt;h2 id="setup-and-imports">Setup and imports&lt;/h2>
&lt;p>Before running the analysis, install the required package:&lt;/p>
&lt;pre>&lt;code class="language-python"># Run in terminal (or use !pip install in a notebook)
pip install diff-diff
&lt;/code>&lt;/pre>
&lt;p>The following code imports all necessary libraries and sets configuration variables. The &lt;code>diff-diff&lt;/code> package provides &lt;a href="https://diff-diff.readthedocs.io/en/stable/" target="_blank" rel="noopener">&lt;code>generate_did_data()&lt;/code>&lt;/a> to create synthetic panel data with known treatment effects, &lt;a href="https://diff-diff.readthedocs.io/en/stable/" target="_blank" rel="noopener">&lt;code>DifferenceInDifferences()&lt;/code>&lt;/a> for the classic 2x2 estimator, and several advanced estimators for multi-period and staggered designs.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from diff_diff import (
DifferenceInDifferences,
MultiPeriodDiD,
CallawaySantAnna,
BaconDecomposition,
HonestDiD,
generate_did_data,
generate_staggered_data,
check_parallel_trends,
)
# Reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
# Site color palette
STEEL_BLUE = &amp;quot;#6a9bcc&amp;quot;
WARM_ORANGE = &amp;quot;#d97757&amp;quot;
NEAR_BLACK = &amp;quot;#141413&amp;quot;
TEAL = &amp;quot;#00d4c8&amp;quot;
# Dark-theme palette
DARK_NAVY = &amp;quot;#0f1729&amp;quot;
GRID_LINE = &amp;quot;#1f2b5e&amp;quot;
LIGHT_TEXT = &amp;quot;#c8d0e0&amp;quot;
WHITE_TEXT = &amp;quot;#e8ecf2&amp;quot;
&lt;/code>&lt;/pre>
&lt;h2 id="classic-2x2-did-design">Classic 2x2 DiD design&lt;/h2>
&lt;p>The simplest DiD setup has two groups (treated and control) observed at two time points (before and after treatment). We start here because the 2x2 case makes the mechanics of DiD transparent before moving to more complex designs.&lt;/p>
&lt;h3 id="generating-synthetic-panel-data">Generating synthetic panel data&lt;/h3>
&lt;p>We use &lt;a href="https://diff-diff.readthedocs.io/en/stable/" target="_blank" rel="noopener">&lt;code>generate_did_data()&lt;/code>&lt;/a> to create a synthetic panel where the true treatment effect is exactly 5.0 units. This known ground truth lets us verify that the estimator recovers the correct answer. The function creates a balanced panel with &lt;code>n_units&lt;/code> units observed over &lt;code>n_periods&lt;/code> periods, where &lt;code>treatment_fraction&lt;/code> of units receive treatment starting at &lt;code>treatment_period&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python">data_2x2 = generate_did_data(
n_units=100,
n_periods=10,
treatment_effect=5.0,
treatment_period=5,
treatment_fraction=0.5,
seed=RANDOM_SEED,
)
print(f&amp;quot;Dataset shape: {data_2x2.shape}&amp;quot;)
print(f&amp;quot;Columns: {data_2x2.columns.tolist()}&amp;quot;)
print(f&amp;quot;\nTreatment groups:&amp;quot;)
print(data_2x2.groupby(&amp;quot;treated&amp;quot;)[&amp;quot;unit&amp;quot;].nunique().rename(
{0: &amp;quot;Control&amp;quot;, 1: &amp;quot;Treated&amp;quot;}))
print(f&amp;quot;\nPeriods: {sorted(int(p) for p in data_2x2['period'].unique())}&amp;quot;)
print(f&amp;quot;Treatment period: 5 (post = 1 for periods &amp;gt;= 5)&amp;quot;)
print(f&amp;quot;True treatment effect: 5.0&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>Dataset shape: (1000, 6)
Columns: ['unit', 'period', 'treated', 'post', 'outcome', 'true_effect']
Treatment groups:
treated
Control 50
Treated 50
Name: unit, dtype: int64
Periods: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Treatment period: 5 (post = 1 for periods &amp;gt;= 5)
True treatment effect: 5.0
&lt;/code>&lt;/pre>
&lt;p>The synthetic panel contains 1,000 observations: 100 units observed across 10 periods (0 through 9). Half the units (50) are assigned to treatment, which begins at period 5. The dataset includes a &lt;code>true_effect&lt;/code> column that equals 0.0 in pre-treatment periods and 5.0 in post-treatment periods for treated units, providing a built-in benchmark. The &lt;code>post&lt;/code> indicator is 1 for periods 5&amp;ndash;9 and 0 for periods 0&amp;ndash;4, matching the binary time dimension of the classic 2x2 framework.&lt;/p>
&lt;h3 id="exploring-the-2x2-dataset">Exploring the 2x2 dataset&lt;/h3>
&lt;p>Before estimating any model, we inspect the raw data to understand its structure. The &lt;code>.head()&lt;/code> method shows the first rows so we can see how each observation is organized as a unit-period pair.&lt;/p>
&lt;pre>&lt;code class="language-python">data_2x2.head(10)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code> unit period treated post outcome true_effect
0 0 1 0 10.231272 0.0
0 1 1 0 12.408662 0.0
0 2 1 0 11.253170 0.0
0 3 1 0 12.846950 0.0
0 4 1 0 11.675816 0.0
0 5 1 1 17.903997 5.0
0 6 1 1 17.659412 5.0
0 7 1 1 18.770401 5.0
0 8 1 1 20.449742 5.0
0 9 1 1 18.382114 5.0
&lt;/code>&lt;/pre>
&lt;p>Each row is one unit in one period. The &lt;code>unit&lt;/code> column identifies the individual, &lt;code>period&lt;/code> tracks time, &lt;code>treated&lt;/code> indicates group assignment (time-invariant), and &lt;code>post&lt;/code> flags observations after the treatment period. The &lt;code>outcome&lt;/code> column is what we aim to explain, and &lt;code>true_effect&lt;/code> is the ground truth we will try to recover. This unit-period structure is the hallmark of &lt;strong>panel data&lt;/strong> &amp;mdash; repeated observations on the same units over time.&lt;/p>
&lt;p>Summary statistics confirm the design parameters:&lt;/p>
&lt;pre>&lt;code class="language-python">data_2x2.describe()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code> unit period treated post outcome true_effect
count 1000.000000 1000.000000 1000.00000 1000.00000 1000.000000 1000.000000
mean 49.500000 4.500000 0.50000 0.50000 13.380874 1.250000
std 28.880514 2.873719 0.50025 0.50025 3.752000 2.166147
min 0.000000 0.000000 0.00000 0.00000 4.965883 0.000000
25% 24.750000 2.000000 0.00000 0.00000 10.716817 0.000000
50% 49.500000 4.500000 0.50000 0.50000 12.558536 0.000000
75% 74.250000 7.000000 1.00000 1.00000 15.926784 1.250000
max 99.000000 9.000000 1.00000 1.00000 24.294992 5.000000
&lt;/code>&lt;/pre>
&lt;p>The means of &lt;code>treated&lt;/code> and &lt;code>post&lt;/code> are both exactly 0.50, confirming a perfectly balanced design: half the units are treated, and half the time periods are post-treatment. The outcome ranges from about 5.0 to 24.3 with a mean of 13.4, reflecting the combination of time trends, unit effects, and treatment effects. The &lt;code>true_effect&lt;/code> mean of 1.25 comes from the fact that only 25% of observations (treated units in post-treatment periods) have a non-zero effect of 5.0.&lt;/p>
&lt;p>A crosstab reveals the 2x2 structure that gives DiD its name:&lt;/p>
&lt;pre>&lt;code class="language-python">pd.crosstab(data_2x2[&amp;quot;treated&amp;quot;], data_2x2[&amp;quot;post&amp;quot;], margins=True)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>post 0 1 All
treated
0 250 250 500
1 250 250 500
All 500 500 1000
&lt;/code>&lt;/pre>
&lt;p>This is the core of the 2x2 design: 250 observations in each of the four cells (control-pre, control-post, treated-pre, treated-post). The balanced allocation means each cell has equal weight in the estimator, which maximizes statistical power. In observational studies, these cell sizes are rarely equal, but the DiD estimator adjusts for imbalance automatically.&lt;/p>
&lt;p>Finally, we examine how the outcome varies across the four cells:&lt;/p>
&lt;pre>&lt;code class="language-python">data_2x2.groupby([&amp;quot;treated&amp;quot;, &amp;quot;post&amp;quot;])[&amp;quot;outcome&amp;quot;].describe()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code> count mean std min 25% 50% 75% max
treated post
0 0 250.0 10.614957 1.871283 5.670539 9.261649 10.781139 11.866492 15.825691
1 250.0 13.086386 1.968271 8.158302 11.777457 13.149548 14.600075 18.372485
1 0 250.0 11.114546 2.015353 4.965883 9.909285 11.065526 12.494486 16.804462
1 250.0 18.707609 1.905034 13.182572 17.296981 18.870692 20.070330 24.294992
&lt;/code>&lt;/pre>
&lt;p>In the pre-treatment period, both groups have similar mean outcomes: 10.61 for the control group and 11.11 for the treated group &amp;mdash; a negligible difference of 0.50 that suggests the groups started on comparable footing. In the post-treatment period, the control group mean rises to 13.09 (a gain of 2.47), while the treated group mean jumps to 18.71 (a gain of 7.59). The extra gain for the treated group (7.59 - 2.47 = 5.12) closely approximates the treatment effect that DiD will formally estimate. The raw numbers already hint that something happened to the treated group beyond the natural time trend.&lt;/p>
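&lt;p>The hand calculation above is easy to script. The sketch below rebuilds the double difference from a &lt;code>groupby&lt;/code> table on a small synthetic panel of our own making; the arithmetic only needs the four cell means, so any panel with &lt;code>treated&lt;/code>, &lt;code>post&lt;/code>, and &lt;code>outcome&lt;/code> columns works the same way:&lt;/p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n_units, n_periods = 40, 10
df = pd.DataFrame({
    'unit': np.repeat(np.arange(n_units), n_periods),
    'period': np.tile(np.arange(n_periods), n_units),
})
df['treated'] = (df['unit'] < n_units // 2).astype(int)
df['post'] = (df['period'] >= 5).astype(int)
df['outcome'] = (10 + df['treated'] + 0.5 * df['period']
                 + 5.0 * df['treated'] * df['post']
                 + rng.normal(0, 1, len(df)))

# Four cell means, then the double difference by hand
cells = df.groupby(['treated', 'post'])['outcome'].mean()
did_by_hand = ((cells.loc[(1, 1)] - cells.loc[(1, 0)])
               - (cells.loc[(0, 1)] - cells.loc[(0, 0)]))
print(f"Hand-computed DiD: {did_by_hand:.2f}")
```

&lt;p>With a built-in effect of 5.0, the hand-computed double difference lands close to 5, mirroring the back-of-the-envelope calculation from the summary table.&lt;/p>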
&lt;p>The box plot below visualizes these distributions:&lt;/p>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(9, 5))
fig.patch.set_linewidth(0)
groups = [
(&amp;quot;Control, Pre&amp;quot;, data_2x2[(data_2x2[&amp;quot;treated&amp;quot;] == 0) &amp;amp; (data_2x2[&amp;quot;post&amp;quot;] == 0)][&amp;quot;outcome&amp;quot;]),
(&amp;quot;Control, Post&amp;quot;, data_2x2[(data_2x2[&amp;quot;treated&amp;quot;] == 0) &amp;amp; (data_2x2[&amp;quot;post&amp;quot;] == 1)][&amp;quot;outcome&amp;quot;]),
(&amp;quot;Treated, Pre&amp;quot;, data_2x2[(data_2x2[&amp;quot;treated&amp;quot;] == 1) &amp;amp; (data_2x2[&amp;quot;post&amp;quot;] == 0)][&amp;quot;outcome&amp;quot;]),
(&amp;quot;Treated, Post&amp;quot;, data_2x2[(data_2x2[&amp;quot;treated&amp;quot;] == 1) &amp;amp; (data_2x2[&amp;quot;post&amp;quot;] == 1)][&amp;quot;outcome&amp;quot;]),
]
bp = ax.boxplot(
[g[1] for g in groups],
tick_labels=[g[0] for g in groups],
patch_artist=True,
widths=0.5,
medianprops=dict(color=WHITE_TEXT, linewidth=2),
)
box_colors = [STEEL_BLUE, STEEL_BLUE, WARM_ORANGE, WARM_ORANGE]
for patch, color in zip(bp[&amp;quot;boxes&amp;quot;], box_colors):
patch.set_facecolor(color)
patch.set_alpha(0.6)
ax.set_ylabel(&amp;quot;Outcome&amp;quot;)
ax.set_title(&amp;quot;Outcome Distribution by Treatment Group and Period&amp;quot;)
plt.savefig(&amp;quot;did_outcome_distribution.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="did_outcome_distribution.png" alt="Box plot showing outcome distributions for control and treated groups in pre and post periods. Both groups start with similar distributions, but the treated group shifts markedly upward in the post period.">&lt;/p>
&lt;p>The box plot makes the treatment effect visible at a glance. In the pre-treatment period, control (steel blue) and treated (warm orange) boxes overlap almost completely, centered around 10.6&amp;ndash;11.1. Both groups shift upward in the post period due to the natural time trend, but the treated group shifts &lt;em>more&lt;/em> &amp;mdash; its median jumps to around 18.9, compared to 13.1 for the control. The extra upward shift for the treated group is the treatment effect that DiD will formally estimate. Notice also that the spread (box height) remains similar across all four groups, suggesting that treatment affects the level but not the variability of outcomes.&lt;/p>
&lt;h3 id="visualizing-parallel-trends">Visualizing parallel trends&lt;/h3>
&lt;p>Before estimating the treatment effect, we check whether the treated and control groups followed similar trajectories in the pre-treatment period. This visual inspection is the first step in assessing whether the parallel trends assumption is plausible. If the two groups were diverging before treatment, any post-treatment difference could reflect pre-existing trends rather than a causal effect.&lt;/p>
&lt;pre>&lt;code class="language-python">treated_means = data_2x2[data_2x2[&amp;quot;treated&amp;quot;] == 1].groupby(&amp;quot;period&amp;quot;)[&amp;quot;outcome&amp;quot;].mean()
control_means = data_2x2[data_2x2[&amp;quot;treated&amp;quot;] == 0].groupby(&amp;quot;period&amp;quot;)[&amp;quot;outcome&amp;quot;].mean()
fig, ax = plt.subplots(figsize=(9, 5))
fig.patch.set_linewidth(0)
ax.plot(control_means.index, control_means.values, &amp;quot;o-&amp;quot;,
color=STEEL_BLUE, linewidth=2, markersize=7, label=&amp;quot;Control group&amp;quot;)
ax.plot(treated_means.index, treated_means.values, &amp;quot;s-&amp;quot;,
color=WARM_ORANGE, linewidth=2, markersize=7, label=&amp;quot;Treated group&amp;quot;)
ax.axvline(x=4.5, color=LIGHT_TEXT, linestyle=&amp;quot;--&amp;quot;, linewidth=1.5,
alpha=0.7, label=&amp;quot;Treatment onset&amp;quot;)
ax.set_xlabel(&amp;quot;Period&amp;quot;)
ax.set_ylabel(&amp;quot;Average Outcome&amp;quot;)
ax.set_title(&amp;quot;Parallel Trends: Treatment vs Control Groups&amp;quot;)
ax.legend(loc=&amp;quot;upper left&amp;quot;)
ax.set_xticks(range(10))
plt.savefig(&amp;quot;did_parallel_trends.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="did_parallel_trends.png" alt="Parallel trends plot showing treatment and control groups tracking closely in pre-treatment periods 0-4, then diverging sharply after treatment onset at period 5.">&lt;/p>
&lt;p>The two groups move in lockstep during periods 0 through 4, confirming that the parallel trends assumption holds in this synthetic dataset. Both lines fluctuate around similar values with no visible divergence before period 5. After treatment onset, the treated group (warm orange) jumps upward while the control group (steel blue) continues its prior trajectory. The gap between the two lines in the post-treatment period visually represents the treatment effect &amp;mdash; roughly 5 units, consistent with the true effect built into the data.&lt;/p>
&lt;h3 id="estimating-the-treatment-effect">Estimating the treatment effect&lt;/h3>
&lt;p>With parallel trends confirmed visually, we apply the classic DiD estimator. The &lt;a href="https://diff-diff.readthedocs.io/en/stable/" target="_blank" rel="noopener">&lt;code>DifferenceInDifferences()&lt;/code>&lt;/a> class implements the 2x2 design with analytical standard errors. The &lt;code>.fit()&lt;/code> method takes the data along with column names for the outcome, treatment indicator, and time indicator (pre/post).&lt;/p>
&lt;pre>&lt;code class="language-python">did = DifferenceInDifferences()
results_2x2 = did.fit(data_2x2, outcome=&amp;quot;outcome&amp;quot;,
treatment=&amp;quot;treated&amp;quot;, time=&amp;quot;post&amp;quot;)
results_2x2.print_summary()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>======================================================================
Difference-in-Differences Estimation Results
======================================================================
Observations: 1000
Treated units: 500
Control units: 500
R-squared: 0.7332
----------------------------------------------------------------------
Parameter Estimate Std. Err. t-stat P&amp;gt;|t|
----------------------------------------------------------------------
ATT 5.1216 0.2455 20.863 0.0000 ***
----------------------------------------------------------------------
95% Confidence Interval: [4.6399, 5.6034]
Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
======================================================================
&lt;/code>&lt;/pre>
&lt;p>The estimated ATT is 5.12, close to the true effect of 5.0, with a standard error of 0.25. The t-statistic of 20.86 and p-value near zero confirm that the effect is highly statistically significant. The 95% confidence interval [4.64, 5.60] comfortably contains the true value of 5.0, demonstrating that the classic DiD estimator successfully recovers the known treatment effect. The small deviation from 5.0 (an overestimate of 0.12) reflects sampling variability, not estimator bias &amp;mdash; with 100 units and 10 periods, some random noise is expected.&lt;/p>
&lt;h3 id="visualizing-the-counterfactual">Visualizing the counterfactual&lt;/h3>
&lt;p>DiD&amp;rsquo;s power lies in constructing a &lt;strong>counterfactual&lt;/strong> &amp;mdash; what would have happened to the treated group without treatment. We build this by projecting the control group&amp;rsquo;s post-treatment trajectory, shifted up by the pre-treatment gap between the groups. The shaded area between the actual treated outcomes and this counterfactual line represents the estimated causal effect.&lt;/p>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(9, 5))
fig.patch.set_linewidth(0)
ax.plot(control_means.index, control_means.values, &amp;quot;o-&amp;quot;,
color=STEEL_BLUE, linewidth=2, markersize=7, label=&amp;quot;Control group&amp;quot;)
ax.plot(treated_means.index, treated_means.values, &amp;quot;s-&amp;quot;,
color=WARM_ORANGE, linewidth=2, markersize=7, label=&amp;quot;Treated group&amp;quot;)
# Counterfactual: treated group without treatment
pre_diff = treated_means.loc[:4].mean() - control_means.loc[:4].mean()
counterfactual = control_means.loc[5:] + pre_diff
ax.plot(counterfactual.index, counterfactual.values, &amp;quot;s--&amp;quot;,
color=TEAL, linewidth=2, markersize=7,
label=&amp;quot;Counterfactual (no treatment)&amp;quot;)
ax.fill_between(counterfactual.index, counterfactual.values,
treated_means.loc[5:].values, alpha=0.2, color=TEAL,
label=f&amp;quot;Treatment effect (ATT ≈ {results_2x2.att:.1f})&amp;quot;)
ax.axvline(x=4.5, color=LIGHT_TEXT, linestyle=&amp;quot;--&amp;quot;, linewidth=1.5, alpha=0.7)
ax.set_xlabel(&amp;quot;Period&amp;quot;)
ax.set_ylabel(&amp;quot;Average Outcome&amp;quot;)
ax.set_title(&amp;quot;DiD Treatment Effect: Observed vs Counterfactual&amp;quot;)
ax.legend(loc=&amp;quot;upper left&amp;quot;)
ax.set_xticks(range(10))
plt.savefig(&amp;quot;did_treatment_effect.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="did_treatment_effect.png" alt="Counterfactual plot showing the treated group diverging from its projected path after treatment. The teal shaded area between the actual and counterfactual lines represents the causal effect.">&lt;/p>
&lt;p>The teal dashed line traces where the treated group would have been without the intervention, constructed by shifting the control group&amp;rsquo;s post-treatment path to match the treated group&amp;rsquo;s pre-treatment level. The shaded gap between the actual treated outcomes (warm orange) and this counterfactual (teal) is the estimated causal effect &amp;mdash; approximately 5.1 units per period. This visualization makes the DiD logic tangible: the control group&amp;rsquo;s trajectory serves as the mirror image of the treated group&amp;rsquo;s no-treatment path, and the extra gain above that mirror is what the policy caused.&lt;/p>
&lt;h2 id="testing-parallel-trends">Testing parallel trends&lt;/h2>
&lt;p>The visual check suggested parallel trends hold, but a formal statistical test provides more rigorous evidence. The &lt;a href="https://diff-diff.readthedocs.io/en/stable/" target="_blank" rel="noopener">&lt;code>check_parallel_trends()&lt;/code>&lt;/a> function compares the pre-treatment time trends of the treated and control groups by estimating a linear slope for each group across the pre-treatment periods, then testing whether the two slopes are statistically different.&lt;/p>
&lt;pre>&lt;code class="language-python">pt_result = check_parallel_trends(
data_2x2,
outcome=&amp;quot;outcome&amp;quot;,
time=&amp;quot;period&amp;quot;,
treatment_group=&amp;quot;treated&amp;quot;,
pre_periods=[0, 1, 2, 3, 4],
)
print(f&amp;quot;Treated group pre-trend slope: {pt_result['treated_trend']:.4f}&amp;quot;
f&amp;quot; (SE = {pt_result['treated_trend_se']:.4f})&amp;quot;)
print(f&amp;quot;Control group pre-trend slope: {pt_result['control_trend']:.4f}&amp;quot;
f&amp;quot; (SE = {pt_result['control_trend_se']:.4f})&amp;quot;)
print(f&amp;quot;Trend difference: {pt_result['trend_difference']:.4f}&amp;quot;
f&amp;quot; (SE = {pt_result['trend_difference_se']:.4f})&amp;quot;)
print(f&amp;quot;t-statistic: {pt_result['t_statistic']:.4f}&amp;quot;)
print(f&amp;quot;p-value: {pt_result['p_value']:.4f}&amp;quot;)
print(f&amp;quot;Parallel trends plausible: {pt_result['parallel_trends_plausible']}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>Treated group pre-trend slope: 0.5262 (SE = 0.0839)
Control group pre-trend slope: 0.4047 (SE = 0.0798)
Trend difference: 0.1216 (SE = 0.1158)
t-statistic: 1.0497
p-value: 0.2938
Parallel trends plausible: True
&lt;/code>&lt;/pre>
&lt;p>The pre-treatment trend slopes are 0.53 for the treated group and 0.40 for the control group &amp;mdash; a difference of 0.12 with a p-value of 0.29. Since p &amp;gt; 0.05, we fail to reject the null hypothesis that the trends are equal, supporting the parallel trends assumption. However, a critical caveat: &lt;em>failing to reject is not the same as confirming&lt;/em>. The test has limited power, especially with only 5 pre-treatment periods. Even if the trends differed slightly, this test might not detect it. Moreover, &lt;a href="https://doi.org/10.1257/aeri.20210236" target="_blank" rel="noopener">Roth (2022)&lt;/a> shows that conditioning on passing a pre-test can distort subsequent inference &amp;mdash; estimated effects may be biased toward zero and confidence intervals may have incorrect coverage. This is why Section 11 introduces HonestDiD, which asks: &amp;ldquo;How wrong could parallel trends be before our conclusion changes?&amp;rdquo; That question is more informative than a binary pass/fail test.&lt;/p>
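&lt;p>The mechanics behind this test can be approximated by hand (our sketch, not the package's implementation): fit a least-squares line to each group's pre-period means and compare the slopes. The group means below are illustrative numbers, not the tutorial's actual data:&lt;/p>

```python
import numpy as np

# Illustrative pre-treatment group means for periods 0-4 (made-up values)
pre_periods = np.arange(5)
treated_pre = np.array([11.0, 11.6, 12.0, 12.6, 13.1])
control_pre = np.array([10.6, 11.0, 11.5, 11.8, 12.2])

# Slope of each group's linear pre-trend
slope_treated = np.polyfit(pre_periods, treated_pre, 1)[0]
slope_control = np.polyfit(pre_periods, control_pre, 1)[0]
print(f"Treated slope:  {slope_treated:.2f}")
print(f"Control slope:  {slope_control:.2f}")
print(f"Difference:     {slope_treated - slope_control:.2f}")
```

&lt;p>A formal test additionally needs a standard error for the slope difference (for example, clustered by unit), which is what the package reports; a small p-value would flag diverging pre-trends. As the surrounding discussion stresses, a large p-value is reassuring but not confirmatory.&lt;/p>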
&lt;h2 id="event-study-dynamic-treatment-effects">Event study: Dynamic treatment effects&lt;/h2>
&lt;p>The 2x2 estimator produces a single ATT that averages across all post-treatment periods. But treatment effects often change over time &amp;mdash; they might build up gradually, appear immediately, or fade out. An &lt;strong>event study&lt;/strong> (also called dynamic DiD) estimates separate effects for each period relative to treatment, revealing the full trajectory.&lt;/p>
&lt;p>The event study extends the basic DiD regression by replacing the single treatment effect $\delta$ with a set of period-specific coefficients &amp;mdash; one for each period before and after treatment:&lt;/p>
&lt;p>$$Y_{it} = \gamma_i + \lambda_t + \sum_{k=-K+1}^{-2} \beta_k^{lead} D_{it}^k + \sum_{k=0}^{L} \beta_k^{lag} D_{it}^k + \varepsilon_{it}$$&lt;/p>
&lt;p>Let us unpack each component of this equation:&lt;/p>
&lt;ul>
&lt;li>$Y_{it}$ is the outcome for unit $i$ at time $t$ &amp;mdash; the variable we are trying to explain (our &lt;code>outcome&lt;/code> column).&lt;/li>
&lt;li>$\gamma_i$ are &lt;strong>unit fixed effects&lt;/strong> &amp;mdash; a separate intercept for each unit that absorbs all time-invariant characteristics. For example, if one city always has higher learning scores than another due to demographics or school funding levels, $\gamma_i$ captures that permanent difference. In practice, this is equivalent to demeaning each unit&amp;rsquo;s outcome by its own time-average.&lt;/li>
&lt;li>$\lambda_t$ are &lt;strong>time fixed effects&lt;/strong> &amp;mdash; a separate intercept for each period that absorbs shocks common to all units at a given time. If a national curriculum reform in period 3 raises learning outcomes for everyone equally, $\lambda_t$ captures that common shift. Together with unit fixed effects, this implements the &amp;ldquo;two-way&amp;rdquo; in TWFE.&lt;/li>
&lt;li>$D_{it}^k$ is a &lt;strong>relative-time indicator&lt;/strong> (also called an event-time dummy): it equals 1 when unit $i$ at time $t$ is exactly $k$ periods away from its treatment onset, and 0 otherwise. For a unit first treated at period 5, we have $D_{i,3}^{-2} = 1$ (two periods before treatment), $D_{i,5}^{0} = 1$ (the treatment period itself), $D_{i,7}^{2} = 1$ (two periods after treatment), and so on.&lt;/li>
&lt;li>$\beta_k^{lead}$ (for $k = -K+1, \ldots, -2$) are the &lt;strong>lead coefficients&lt;/strong> &amp;mdash; pre-treatment effects at each period before treatment. These serve as &lt;strong>placebo tests&lt;/strong>: if the treated and control groups were evolving similarly before the intervention, all lead coefficients should be close to zero and statistically insignificant. A significant lead coefficient signals a pre-existing divergence, which would undermine the parallel trends assumption. The summation starts at $k = -K+1$ (the earliest available lead) and stops at $k = -2$, because the period immediately before treatment ($k = -1$) is &lt;strong>omitted as the reference period&lt;/strong> and normalized to zero. All other coefficients are estimated relative to this baseline.&lt;/li>
&lt;li>$\beta_k^{lag}$ (for $k = 0, 1, \ldots, L$) are the &lt;strong>lag coefficients&lt;/strong> &amp;mdash; post-treatment effects at each period after treatment onset. The coefficient $\beta_0^{lag}$ captures the &lt;strong>instantaneous effect&lt;/strong> at the moment treatment begins, $\beta_1^{lag}$ captures the effect one period later, and so on through $\beta_L^{lag}$ at $L$ periods after treatment. These coefficients trace out the &lt;strong>dynamic treatment effect trajectory&lt;/strong>: they reveal whether the effect appears immediately or builds up gradually, whether it persists or fades out, and whether it stabilizes at a constant level or continues to grow.&lt;/li>
&lt;li>$\varepsilon_{it}$ is the error term, capturing all unobserved factors not absorbed by the fixed effects or treatment indicators.&lt;/li>
&lt;/ul>
&lt;p>The key insight is that this single equation simultaneously tests the identifying assumption &lt;em>and&lt;/em> estimates the treatment effect. The leads ($\beta_k^{lead}$) test parallel trends period by period, while the lags ($\beta_k^{lag}$) reveal how the treatment effect evolves over time. In our tutorial, treatment begins at period 5 and the reference period is 4 ($k = -1$), so we have 4 lead coefficients at $k = -5, -4, -3, -2$ (corresponding to periods 0&amp;ndash;3) and, with $L = 4$, five lag coefficients at $k = 0, 1, 2, 3, 4$ (corresponding to periods 5&amp;ndash;9).&lt;/p>
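&lt;p>Constructing the relative-time dummies $D_{it}^k$ is mechanical: subtract each unit&amp;rsquo;s treatment onset from the calendar period and build one indicator per event time, skipping $k = -1$. A minimal sketch for a single hypothetical unit (not the library&amp;rsquo;s internals):&lt;/p>

```python
import numpy as np

# Hypothetical single unit first treated at period 5, observed over periods 0..9
first_treat = 5
periods = np.arange(10)
rel_time = periods - first_treat                 # event time k = t - first_treat

# One indicator per event time, omitting k = -1 as the reference period
ks = [k for k in range(-5, 5) if k != -1]
D = np.column_stack([(rel_time == k).astype(int) for k in ks])

print(D[3].tolist())  # period 3 is k = -2: [0, 0, 0, 1, 0, 0, 0, 0, 0]
print(D[4].tolist())  # period 4 is k = -1, the omitted baseline: all zeros
```

&lt;p>The all-zero row for the reference period is what normalizes $\beta_{-1}$ to zero: every other coefficient is measured relative to that baseline.&lt;/p>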
&lt;p>The &lt;a href="https://diff-diff.readthedocs.io/en/stable/" target="_blank" rel="noopener">&lt;code>MultiPeriodDiD()&lt;/code>&lt;/a> estimator fits this specification, using one pre-treatment period as the reference point.&lt;/p>
&lt;pre>&lt;code class="language-python">event = MultiPeriodDiD()
results_event = event.fit(
data_2x2,
outcome=&amp;quot;outcome&amp;quot;,
treatment=&amp;quot;treated&amp;quot;,
time=&amp;quot;period&amp;quot;,
post_periods=[5, 6, 7, 8, 9],
reference_period=4,
)
results_event.print_summary()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>================================================================================
Multi-Period Difference-in-Differences Estimation Results
================================================================================
Observations: 1000
Treated observations: 500
Control observations: 500
Pre-treatment periods: 5
Post-treatment periods: 5
R-squared: 0.7648
--------------------------------------------------------------------------------
Pre-Period Effects (Parallel Trends Test)
--------------------------------------------------------------------------------
Period Estimate Std. Err. t-stat P&amp;gt;|t| Sig.
--------------------------------------------------------------------------------
0 -0.5167 0.5121 -1.009 0.3132
1 -0.5050 0.5031 -1.004 0.3157
2 -0.2804 0.5228 -0.536 0.5919
3 -0.3227 0.5187 -0.622 0.5340
[ref: 4] 0.0000 --- --- ---
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Post-Period Treatment Effects
--------------------------------------------------------------------------------
Period Estimate Std. Err. t-stat P&amp;gt;|t| Sig.
--------------------------------------------------------------------------------
5 4.6509 0.5162 9.011 0.0000 ***
6 4.8285 0.5227 9.238 0.0000 ***
7 4.6907 0.5068 9.255 0.0000 ***
8 4.7888 0.4908 9.757 0.0000 ***
9 5.0244 0.5203 9.657 0.0000 ***
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Average Treatment Effect (across post-periods)
--------------------------------------------------------------------------------
Parameter Estimate Std. Err. t-stat P&amp;gt;|t| Sig.
--------------------------------------------------------------------------------
Avg ATT 4.7967 0.3923 12.227 0.0000 ***
--------------------------------------------------------------------------------
95% Confidence Interval: [4.0269, 5.5665]
Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
================================================================================
&lt;/code>&lt;/pre>
&lt;p>The pre-treatment coefficients (periods 0&amp;ndash;3) are all small and statistically insignificant, ranging from -0.52 to -0.28 with p-values well above 0.05. This confirms that the treated and control groups were evolving similarly before the intervention &amp;mdash; the period-by-period placebo test passes. In contrast, all five post-treatment effects (periods 5&amp;ndash;9) are large and highly significant, ranging from 4.65 to 5.02 with t-statistics above 9.0. The average ATT across post periods is 4.80 with a 95% CI of [4.03, 5.57], consistent with the true effect of 5.0. The effects are remarkably stable over time, indicating no fade-out or build-up &amp;mdash; the treatment shifts outcomes by roughly 5 units immediately and maintains that shift.&lt;/p>
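&lt;p>As a quick sanity check, the reported average ATT is, up to rounding, consistent with the unweighted mean of the five post-period estimates in the table above:&lt;/p>

```python
# Post-period estimates copied from the summary table above
post_effects = [4.6509, 4.8285, 4.6907, 4.7888, 5.0244]
avg_att = sum(post_effects) / len(post_effects)
print(round(avg_att, 4))  # 4.7967, matching the reported Avg ATT
```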
&lt;p>The event study plot below makes these dynamics visible:&lt;/p>
&lt;pre>&lt;code class="language-python">es_df = results_event.to_dataframe()
fig, ax = plt.subplots(figsize=(9, 5))
fig.patch.set_linewidth(0)
pre = es_df[~es_df[&amp;quot;is_post&amp;quot;]]
post = es_df[es_df[&amp;quot;is_post&amp;quot;]]
ax.errorbar(pre[&amp;quot;period&amp;quot;], pre[&amp;quot;effect&amp;quot;], yerr=1.96 * pre[&amp;quot;se&amp;quot;],
fmt=&amp;quot;o&amp;quot;, color=STEEL_BLUE, capsize=4, linewidth=2,
markersize=8, label=&amp;quot;Pre-treatment&amp;quot;)
ax.errorbar(post[&amp;quot;period&amp;quot;], post[&amp;quot;effect&amp;quot;], yerr=1.96 * post[&amp;quot;se&amp;quot;],
fmt=&amp;quot;s&amp;quot;, color=WARM_ORANGE, capsize=4, linewidth=2,
markersize=8, label=&amp;quot;Post-treatment&amp;quot;)
# Reference period
ax.plot(4, 0, &amp;quot;D&amp;quot;, color=WHITE_TEXT, markersize=10, zorder=5,
label=&amp;quot;Reference period&amp;quot;)
ax.axhline(y=0, color=LIGHT_TEXT, linewidth=1, alpha=0.5)
ax.axvline(x=4.5, color=LIGHT_TEXT, linestyle=&amp;quot;--&amp;quot;, linewidth=1.5, alpha=0.5)
ax.axhline(y=5.0, color=TEAL, linestyle=&amp;quot;:&amp;quot;, linewidth=1.5, alpha=0.7,
label=&amp;quot;True effect (5.0)&amp;quot;)
ax.set_xlabel(&amp;quot;Period&amp;quot;)
ax.set_ylabel(&amp;quot;Estimated Effect&amp;quot;)
ax.set_title(&amp;quot;Event Study: Dynamic Treatment Effects&amp;quot;)
ax.legend(loc=&amp;quot;upper left&amp;quot;)
ax.set_xticks(range(10))
plt.savefig(&amp;quot;did_event_study.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="did_event_study.png" alt="Event study plot with pre-treatment coefficients clustered near zero and post-treatment coefficients jumping to approximately 5.0. Confidence intervals shown for each period.">&lt;/p>
&lt;p>The event study plot tells the DiD story at a glance. Pre-treatment coefficients (steel blue circles) hover near the zero line, their confidence intervals all crossing zero &amp;mdash; this is the visual signature of valid parallel trends. At the treatment cutoff (dashed vertical line), the estimates jump sharply to around 5.0 (warm orange squares), and the teal dotted line at 5.0 shows that every post-treatment estimate is close to the true effect. The confidence intervals in the post-treatment period are narrow and well above zero, confirming both statistical significance and accuracy.&lt;/p>
&lt;p>With the classic 2x2 case established, the next question is: what happens when different units adopt treatment at different times?&lt;/p>
&lt;h2 id="staggered-adoption-why-twfe-fails">Staggered adoption: Why TWFE fails&lt;/h2>
&lt;p>In many real-world policies, treatment does not begin simultaneously for all units. AI tutoring platforms roll out city by city, digital infrastructure investments phase in over years, and educational technology grants expand district by district. This is &lt;strong>staggered adoption&lt;/strong> &amp;mdash; different units start treatment at different times.&lt;/p>
&lt;p>The traditional approach is &lt;strong>Two-Way Fixed Effects (TWFE)&lt;/strong> regression, which estimates a single treatment coefficient using unit and time fixed effects:&lt;/p>
&lt;p>$$Y_{it} = \gamma_i + \lambda_t + \delta \cdot D_{it} + \varepsilon_{it}$$&lt;/p>
&lt;p>Here $\gamma_i$ absorbs all time-invariant unit characteristics (unit fixed effects), $\lambda_t$ absorbs all common time shocks (time fixed effects), $D_{it}$ is a treatment indicator that equals 1 when unit $i$ is treated at time $t$, and $\delta$ is the single treatment effect that TWFE estimates. With a single treatment period, $\delta$ correctly recovers the ATT. But with staggered timing, the single coefficient $\delta$ is a weighted average of many underlying 2x2 comparisons &amp;mdash; and some of those comparisons are problematic.&lt;/p>
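&lt;p>The mechanics of $\hat{\delta}$ can be sketched with the two-way within transformation, which sweeps out both sets of fixed effects before a simple slope calculation. On a noiseless toy panel (hypothetical numbers: two units, four periods, one unit treated from period 2), the estimator recovers the true $\delta = 3$ exactly:&lt;/p>

```python
import numpy as np

# Toy panel: outcome = unit effect + time effect + 3 * D, no noise,
# so the within estimator should recover delta = 3 exactly.
a = np.array([1.0, 2.0])             # unit fixed effects
b = np.array([0.0, 1.0, 2.0, 3.0])   # time fixed effects
D = np.array([[0.0, 0.0, 1.0, 1.0],  # unit 0 treated from period 2
              [0.0, 0.0, 0.0, 0.0]]) # unit 1 never treated
y = a[:, None] + b[None, :] + 3.0 * D

def within(x):
    """Two-way within transformation: subtract unit and time means, add grand mean."""
    return x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + x.mean()

delta = (within(D) * within(y)).sum() / (within(D) ** 2).sum()
print(delta)  # 3.0
```

&lt;p>With staggered timing and heterogeneous effects, this same arithmetic silently mixes the clean and forbidden comparisons described next.&lt;/p>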
&lt;p>The problem is that TWFE makes &lt;strong>forbidden comparisons&lt;/strong>: it implicitly uses already-treated units as controls for newly-treated units. If treatment effects grow over time, these forbidden comparisons produce negative bias, pulling the overall estimate downward. Think of it this way: if early adopters have been benefiting from treatment for three years and their outcomes have grown substantially, TWFE compares newly-treated units to these high-performing early adopters. The newly-treated units look &lt;em>worse&lt;/em> by comparison, even though they are genuinely benefiting from treatment. In extreme cases with heterogeneous treatment effects across cohorts, TWFE can even assign &lt;strong>negative weights&lt;/strong> to some 2x2 comparisons, potentially flipping the sign of the estimate opposite to every unit&amp;rsquo;s true treatment effect (this does not occur in our example, but is documented in &lt;a href="https://doi.org/10.1257/aer.20181169" target="_blank" rel="noopener">de Chaisemartin &amp;amp; D&amp;rsquo;Haultfoeuille, 2020&lt;/a>).&lt;/p>
&lt;h3 id="generating-staggered-adoption-data">Generating staggered adoption data&lt;/h3>
&lt;p>The &lt;a href="https://diff-diff.readthedocs.io/en/stable/" target="_blank" rel="noopener">&lt;code>generate_staggered_data()&lt;/code>&lt;/a> function creates a panel with multiple treatment cohorts &amp;mdash; groups of units that begin treatment in different periods &amp;mdash; plus a never-treated group.&lt;/p>
&lt;pre>&lt;code class="language-python">data_stag = generate_staggered_data(
n_units=300,
n_periods=10,
seed=RANDOM_SEED,
)
print(f&amp;quot;Dataset shape: {data_stag.shape}&amp;quot;)
cohorts = data_stag.groupby(&amp;quot;first_treat&amp;quot;)[&amp;quot;unit&amp;quot;].nunique()
print(f&amp;quot;\nCohort sizes:&amp;quot;)
for ft, n in cohorts.items():
label = &amp;quot;Never-treated&amp;quot; if ft == 0 else f&amp;quot;First treated in period {ft}&amp;quot;
print(f&amp;quot; {label}: {n} units&amp;quot;)
print(f&amp;quot;\nTotal units: {cohorts.sum()}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>Dataset shape: (3000, 7)
Cohort sizes:
Never-treated: 90 units
First treated in period 3: 60 units
First treated in period 5: 75 units
First treated in period 7: 75 units
Total units: 300
&lt;/code>&lt;/pre>
&lt;p>The staggered panel has 3,000 observations (300 units across 10 periods). Three treatment cohorts adopt at different times: 60 units start treatment in period 3, 75 in period 5, and 75 in period 7. Another 90 units are never treated, serving as a clean control group. The &lt;code>first_treat&lt;/code> column records when each unit first received treatment (0 for never-treated). This staggered structure is where naive TWFE breaks down, as the next section demonstrates.&lt;/p>
&lt;h3 id="exploring-the-staggered-dataset">Exploring the staggered dataset&lt;/h3>
&lt;p>The staggered dataset has a richer structure than the 2x2 case. Inspecting the first rows reveals additional columns:&lt;/p>
&lt;pre>&lt;code class="language-python">data_stag.head(10)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code> unit period outcome first_treat treated treat true_effect
0 0 11.278161 0 0 0 0.0
0 1 11.835615 0 0 0 0.0
0 2 11.542112 0 0 0 0.0
0 3 11.716260 0 0 0 0.0
0 4 12.289791 0 0 0 0.0
0 5 10.978501 0 0 0 0.0
0 6 11.426795 0 0 0 0.0
0 7 11.433938 0 0 0 0.0
0 8 11.108223 0 0 0 0.0
0 9 12.035899 0 0 0 0.0
&lt;/code>&lt;/pre>
&lt;p>Unit 0 is never-treated, so all indicators stay at zero across all 10 periods. To understand the staggered structure, we need to see what happens to treated units. The columns have distinct roles:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>&lt;code>first_treat&lt;/code>&lt;/strong>: the period when a unit first receives treatment (0 = never treated)&lt;/li>
&lt;li>&lt;strong>&lt;code>treat&lt;/code>&lt;/strong>: &lt;strong>time-invariant&lt;/strong> group membership &amp;mdash; equals 1 for any unit &lt;em>ever&lt;/em> assigned to treatment, 0 for never-treated&lt;/li>
&lt;li>&lt;strong>&lt;code>treated&lt;/code>&lt;/strong>: &lt;strong>time-varying&lt;/strong> post-treatment indicator &amp;mdash; equals 0 before treatment onset and switches to 1 at &lt;code>first_treat&lt;/code>&lt;/li>
&lt;li>&lt;strong>&lt;code>true_effect&lt;/code>&lt;/strong>: the known ground-truth treatment effect at each period, used for verification&lt;/li>
&lt;/ul>
&lt;p>The distinction between &lt;code>treat&lt;/code> and &lt;code>treated&lt;/code> is crucial: &lt;code>treat&lt;/code> tells you &lt;em>who&lt;/em> is in the treatment group (a permanent label), while &lt;code>treated&lt;/code> tells you &lt;em>when&lt;/em> they are actually under treatment (a dynamic state). For never-treated units, both are always 0. For treated units, &lt;code>treat&lt;/code> is always 1, but &lt;code>treated&lt;/code> flips from 0 to 1 at the unit&amp;rsquo;s treatment onset.&lt;/p>
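&lt;p>Both indicators derive mechanically from &lt;code>first_treat&lt;/code>. A minimal sketch with toy values (not the data generator&amp;rsquo;s code): one never-treated unit, one cohort-3 unit, and one cohort-7 unit, observed at period 5:&lt;/p>

```python
import numpy as np

first_treat = np.array([0, 3, 7])  # 0 = never treated
period = 5

treat = (first_treat > 0).astype(int)                                # ever treated?
treated = ((first_treat > 0) & (period >= first_treat)).astype(int)  # under treatment now?
print(treat.tolist(), treated.tolist())  # [0, 1, 1] [0, 1, 0]
```

&lt;p>At period 5, the cohort-7 unit is labeled &lt;code>treat=1&lt;/code> but &lt;code>treated=0&lt;/code>: it belongs to the treatment group yet is still in its pre-treatment phase.&lt;/p>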
&lt;p>An early-treated unit from cohort 3 illustrates this structure:&lt;/p>
&lt;pre>&lt;code class="language-python">early_unit = data_stag[data_stag[&amp;quot;first_treat&amp;quot;] == 3][&amp;quot;unit&amp;quot;].iloc[0]
data_stag[data_stag[&amp;quot;unit&amp;quot;] == early_unit]
&lt;/code>&lt;/pre>
&lt;pre>&lt;code> unit period outcome first_treat treated treat true_effect
90 0 13.299816 3 0 1 0.0
90 1 12.897337 3 0 1 0.0
90 2 11.882534 3 0 1 0.0
90 3 14.724679 3 1 1 2.0
90 4 16.139340 3 1 1 2.2
90 5 14.433891 3 1 1 2.4
90 6 15.949127 3 1 1 2.6
90 7 15.832888 3 1 1 2.8
90 8 17.125174 3 1 1 3.0
90 9 16.685332 3 1 1 3.2
&lt;/code>&lt;/pre>
&lt;p>Unit 90 has &lt;code>treat=1&lt;/code> throughout (it belongs to the treatment group), but &lt;code>treated&lt;/code> flips from 0 to 1 at period 3 &amp;mdash; the moment it enters the post-treatment state. The &lt;code>true_effect&lt;/code> is 0 in the pre-treatment periods, then starts at 2.0 and grows by 0.2 each period, reaching 3.2 by period 9. This growing effect pattern is what makes staggered DiD challenging: the treatment effect for cohort 3 at period 7 (2.8) is very different from the effect at period 3 (2.0).&lt;/p>
&lt;p>Now compare with a late-treated unit from cohort 7:&lt;/p>
&lt;pre>&lt;code class="language-python">late_unit = data_stag[data_stag[&amp;quot;first_treat&amp;quot;] == 7][&amp;quot;unit&amp;quot;].iloc[0]
data_stag[data_stag[&amp;quot;unit&amp;quot;] == late_unit]
&lt;/code>&lt;/pre>
&lt;pre>&lt;code> unit period outcome first_treat treated treat true_effect
91 0 7.987886 7 0 1 0.0
91 1 8.168639 7 0 1 0.0
91 2 8.904022 7 0 1 0.0
91 3 7.984438 7 0 1 0.0
91 4 8.373931 7 0 1 0.0
91 5 7.543381 7 0 1 0.0
91 6 8.981115 7 0 1 0.0
91 7 10.105654 7 1 1 2.0
91 8 10.505532 7 1 1 2.2
91 9 11.074785 7 1 1 2.4
&lt;/code>&lt;/pre>
&lt;p>Unit 91 also has &lt;code>treat=1&lt;/code> throughout, but &lt;code>treated&lt;/code> does not flip until period 7 &amp;mdash; giving it a much longer pre-treatment phase (7 periods vs 3 for cohort 3) and only 3 post-treatment periods. Its &lt;code>true_effect&lt;/code> starts at 2.0 at period 7 and reaches only 2.4 by period 9, compared to cohort 3&amp;rsquo;s 3.2. This asymmetry &amp;mdash; early cohorts accumulating larger effects over more post-treatment periods &amp;mdash; is precisely what causes TWFE to produce biased estimates when it uses already-treated cohort 3 units as &amp;ldquo;controls&amp;rdquo; for cohort 7.&lt;/p>
&lt;p>Let us examine how the staggered structure differs from the 2x2 case in scale and treatment coverage. With multiple cohorts adopting at different times, the fraction of observations in post-treatment state is no longer 50%:&lt;/p>
&lt;pre>&lt;code class="language-python">data_stag.describe()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code> unit period outcome first_treat treated treat true_effect
count 3000.000000 3000.00000 3000.000000 3000.000000 3000.000000 3000.000000 3000.000000
mean 149.500000 4.50000 11.287067 3.600000 0.340000 0.700000 0.829000
std 86.616497 2.87276 2.528589 2.709695 0.473788 0.458334 1.173464
min 0.000000 0.00000 4.521385 0.000000 0.000000 0.000000 0.000000
25% 74.750000 2.00000 9.461867 0.000000 0.000000 0.000000 0.000000
50% 149.500000 4.50000 11.107083 4.000000 0.000000 1.000000 0.000000
75% 224.250000 7.00000 13.078036 5.500000 1.000000 1.000000 2.200000
max 299.000000 9.00000 20.616391 7.000000 1.000000 1.000000 3.200000
&lt;/code>&lt;/pre>
&lt;p>With 3,000 observations and 300 units, this panel is three times larger than the 2x2 case. The &lt;code>first_treat&lt;/code> variable has a mean of 3.60, reflecting the mix of never-treated (0) and cohorts treated at periods 3, 5, and 7. The &lt;code>treated&lt;/code> mean of 0.34 tells us that 34% of all unit-period observations are in a post-treatment state &amp;mdash; less than half because late cohorts contribute fewer treated periods than early cohorts.&lt;/p>
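&lt;p>The 0.34 follows directly from the cohort sizes: each cohort contributes one post-treatment observation per unit for every period from its onset onward:&lt;/p>

```python
# Cohort sizes from the summary above: first_treat -> number of units
cohorts = {3: 60, 5: 75, 7: 75}
n_units, n_periods = 300, 10

treated_obs = sum(n * (n_periods - onset) for onset, n in cohorts.items())
print(treated_obs, treated_obs / (n_units * n_periods))  # 1020 0.34
```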
&lt;p>A crosstab of the number of &lt;strong>treated&lt;/strong> (post-treatment) units by cohort and period reveals the staggered rollout:&lt;/p>
&lt;pre>&lt;code class="language-python">pd.crosstab(data_stag[&amp;quot;first_treat&amp;quot;], data_stag[&amp;quot;period&amp;quot;],
values=data_stag[&amp;quot;treated&amp;quot;], aggfunc=&amp;quot;sum&amp;quot;).fillna(0).astype(int)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>period 0 1 2 3 4 5 6 7 8 9
first_treat
0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 60 60 60 60 60 60 60
5 0 0 0 0 0 75 75 75 75 75
7 0 0 0 0 0 0 0 75 75 75
&lt;/code>&lt;/pre>
&lt;p>The staggered structure is immediately visible: zeros give way to treatment counts in a staircase pattern as each cohort enters the post-treatment state. At period 2, no units are yet treated. At period 3, 60 units from cohort 3 enter treatment. At period 5, cohort 5 adds 75 more, bringing the total to 135. By period 7, all 210 treated units are in post-treatment. The never-treated group (row 0) remains at zero throughout. This growing treated population &amp;mdash; and the fact that cohort 3 has been treated for 4 periods by the time cohort 7 starts &amp;mdash; is the asymmetry that makes TWFE unreliable. When TWFE uses cohort 3 as a &amp;ldquo;control&amp;rdquo; for cohort 7, it compares against units whose outcomes already incorporate a treatment effect of 2.8, not the untreated counterfactual.&lt;/p>
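&lt;p>The per-period totals implied by this crosstab can be reproduced from the cohort sizes alone &amp;mdash; a one-liner counts how many units are in post-treatment state at each period:&lt;/p>

```python
cohorts = {3: 60, 5: 75, 7: 75}  # first_treat -> cohort size
counts = [sum(n for onset, n in cohorts.items() if t >= onset) for t in range(10)]
print(counts)  # [0, 0, 0, 60, 60, 135, 135, 210, 210, 210]
```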
&lt;p>The pivoted outcome means by cohort and period reveal the staggered treatment pattern:&lt;/p>
&lt;pre>&lt;code class="language-python">data_stag.groupby([&amp;quot;first_treat&amp;quot;, &amp;quot;period&amp;quot;])[&amp;quot;outcome&amp;quot;].mean().unstack()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>period 0 1 2 3 4 5 6 7 8 9
first_treat
0 9.92 9.95 10.17 10.28 10.40 10.46 10.53 10.68 10.78 10.88
3 10.39 10.51 10.59 12.82 13.07 13.33 13.60 13.99 14.22 14.56
5 10.08 10.17 10.33 10.32 10.58 12.70 12.90 13.11 13.64 13.77
7 9.61 9.76 9.73 10.04 10.00 10.10 10.35 12.25 12.59 12.91
&lt;/code>&lt;/pre>
&lt;p>All four cohorts track closely in their pre-treatment periods (values near 9.6&amp;ndash;10.6 in periods 0&amp;ndash;2), confirming parallel pre-trends. The divergence is sharp and cohort-specific: cohort 3 jumps at period 3 (from 10.59 to 12.82), cohort 5 jumps at period 5 (from 10.58 to 12.70), and cohort 7 jumps at period 7 (from 10.35 to 12.25). The never-treated group follows a smooth, gentle upward trend throughout. By period 9, all treated cohorts have outcomes around 12.9&amp;ndash;14.6, substantially above the never-treated group&amp;rsquo;s 10.88 &amp;mdash; but they arrived at those levels at different times.&lt;/p>
&lt;p>The line plot below visualizes these divergent trajectories:&lt;/p>
&lt;pre>&lt;code class="language-python">cohort_means = data_stag.groupby([&amp;quot;first_treat&amp;quot;, &amp;quot;period&amp;quot;])[&amp;quot;outcome&amp;quot;].mean().unstack(level=0)
cohort_colors = {0: STEEL_BLUE, 3: WARM_ORANGE, 5: TEAL, 7: WHITE_TEXT}
cohort_labels = {0: &amp;quot;Never-treated&amp;quot;, 3: &amp;quot;Cohort 3&amp;quot;, 5: &amp;quot;Cohort 5&amp;quot;, 7: &amp;quot;Cohort 7&amp;quot;}
fig, ax = plt.subplots(figsize=(9, 5))
fig.patch.set_linewidth(0)
for ft in sorted(cohort_means.columns):
ax.plot(cohort_means.index, cohort_means[ft], &amp;quot;o-&amp;quot;,
color=cohort_colors[ft], linewidth=2, markersize=6,
label=cohort_labels[ft])
# Vertical lines at treatment onsets
for ft in [3, 5, 7]:
ax.axvline(x=ft - 0.5, color=cohort_colors[ft], linestyle=&amp;quot;--&amp;quot;,
linewidth=1.2, alpha=0.5)
ax.set_xlabel(&amp;quot;Period&amp;quot;)
ax.set_ylabel(&amp;quot;Mean Outcome&amp;quot;)
ax.set_title(&amp;quot;Staggered Adoption: Cohort Mean Outcomes Over Time&amp;quot;)
ax.legend(loc=&amp;quot;upper left&amp;quot;)
ax.set_xticks(range(10))
plt.savefig(&amp;quot;did_staggered_trends.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="did_staggered_trends.png" alt="Line plot showing four cohorts tracking together before treatment, then diverging upward at their respective treatment onset periods. Dashed vertical lines mark each cohort&amp;rsquo;s treatment timing.">&lt;/p>
&lt;p>The plot makes the staggered adoption pattern unmistakable. All four lines run in parallel during the early pre-treatment periods, then each treated cohort jumps upward at its treatment onset (marked by a dashed vertical line in the corresponding color). Cohort 3 (warm orange) diverges first at period 3, followed by cohort 5 (teal) at period 5, and cohort 7 (white) at period 7. The never-treated group (steel blue) continues its steady, gentle upward trend without any jump. This visualization explains &lt;em>why TWFE fails&lt;/em>: between periods 3 and 7, TWFE uses cohort 3 (already treated and elevated) as a comparison for cohort 7 (not yet treated). Since cohort 3&amp;rsquo;s outcomes are inflated by treatment, the comparison underestimates cohort 7&amp;rsquo;s true effect when it eventually adopts.&lt;/p>
&lt;h3 id="bacon-decomposition-diagnosing-twfe">Bacon decomposition: Diagnosing TWFE&lt;/h3>
&lt;p>The &lt;strong>Goodman-Bacon decomposition&lt;/strong> (&lt;a href="https://doi.org/10.1016/j.jeconom.2021.03.014" target="_blank" rel="noopener">Goodman-Bacon, 2021&lt;/a>) reveals exactly how TWFE constructs its estimate. The key insight is that the TWFE coefficient $\hat{\delta}$ is a weighted average of all possible 2x2 DiD comparisons between pairs of treatment cohorts:&lt;/p>
&lt;p>$$\hat{\delta}^{TWFE} = \sum_{k} s_{kU} \hat{\delta}_{kU} + \sum_{e \neq U} \sum_{l &amp;gt; e} \big( s_{el} \hat{\delta}_{el} + s_{le} \hat{\delta}_{le} \big)$$&lt;/p>
&lt;p>The first sum covers &lt;strong>clean comparisons&lt;/strong> between each treated cohort $k$ and the never-treated group $U$, weighted by $s_{kU}$. The double sum covers comparisons between pairs of treated cohorts: $\hat{\delta}_{el}$ compares earlier-treated ($e$) against later-treated ($l$) units, and $\hat{\delta}_{le}$ compares later-treated against earlier-treated units. The weights $s$ are proportional to each subsample&amp;rsquo;s size and the variance of the treatment indicator within each pair &amp;mdash; groups treated in the middle of the panel receive the most weight. Crucially, the weights sum to one, so the TWFE estimate is a proper weighted average.&lt;/p>
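&lt;p>The &amp;ldquo;middle of the panel&amp;rdquo; intuition is easy to see in a rough sketch: in a comparison against the never-treated group, the variance of the treatment dummy is $\bar{D}(1-\bar{D})$, which peaks when a cohort spends half the panel treated. Using our three onsets over 10 periods (a back-of-the-envelope illustration, not the full weight formula):&lt;/p>

```python
# Treated share of the 10 periods for each cohort, and the dummy variance Dbar*(1-Dbar)
var_weight = {}
for onset in [3, 5, 7]:
    dbar = (10 - onset) / 10
    var_weight[onset] = dbar * (1 - dbar)
    print(f"onset {onset}: Dbar = {dbar:.1f}, variance = {var_weight[onset]:.2f}")
# Cohort 5, treated for exactly half the panel, has the largest variance term,
# so it receives the most weight in its comparisons.
```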
&lt;p>The three types of comparisons have very different reliability:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Treated vs never-treated&lt;/strong> ($\hat{\delta}_{kU}$): Clean comparisons using permanently untreated units as controls. These are the gold standard.&lt;/li>
&lt;li>&lt;strong>Earlier vs later treated&lt;/strong> ($\hat{\delta}_{el}$): Uses not-yet-treated units as controls. Valid as long as treatment has not yet affected the later cohort.&lt;/li>
&lt;li>&lt;strong>Later vs earlier treated&lt;/strong> ($\hat{\delta}_{le}$): The &lt;strong>forbidden comparisons&lt;/strong>. Uses already-treated units as controls. If treatment effects evolve over time, these comparisons are contaminated because the &amp;ldquo;controls&amp;rdquo; are themselves experiencing treatment effects.&lt;/li>
&lt;/ol>
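&lt;p>A tiny noiseless example shows how a forbidden comparison goes wrong when effects grow over time. Keeping only the treatment effect (no trends, no noise) with the same path as our simulated data &amp;mdash; 2.0 at onset, growing 0.2 per period &amp;mdash; we compare cohort 7 against already-treated cohort 3, using periods 5&amp;ndash;6 as the &amp;ldquo;pre&amp;rdquo; window and 7&amp;ndash;9 as the &amp;ldquo;post&amp;rdquo; window (the window choice is illustrative):&lt;/p>

```python
def effect(t, onset):
    # Treatment effect path from the simulated data: 2.0 at onset, +0.2 per period
    return 2.0 + 0.2 * (t - onset) if t >= onset else 0.0

def mean_effect(ts, onset):
    return sum(effect(t, onset) for t in ts) / len(ts)

pre, post = [5, 6], [7, 8, 9]
# 2x2 DiD of later-treated (onset 7) against an already-treated "control" (onset 3)
did = ((mean_effect(post, 7) - mean_effect(pre, 7))
       - (mean_effect(post, 3) - mean_effect(pre, 3)))
print(round(did, 4))  # 1.7
```

&lt;p>The late cohort&amp;rsquo;s true average effect over periods 7&amp;ndash;9 is 2.2, but the forbidden comparison yields only 1.7: the &amp;ldquo;control&amp;rdquo; cohort&amp;rsquo;s own effect grew by 0.5 over the window, and that growth is subtracted off as if it were a common trend.&lt;/p>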
&lt;pre>&lt;code class="language-python">bacon = BaconDecomposition()
bacon_results = bacon.fit(
data_stag, outcome=&amp;quot;outcome&amp;quot;, unit=&amp;quot;unit&amp;quot;,
time=&amp;quot;period&amp;quot;, first_treat=&amp;quot;first_treat&amp;quot;,
)
bacon_results.print_summary()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>=====================================================================================
Goodman-Bacon Decomposition of Two-Way Fixed Effects
=====================================================================================
Total observations: 3000
Treatment timing groups: 3
Never-treated units: 90
Total 2x2 comparisons: 9
-------------------------------------------------------------------------------------
TWFE Decomposition
-------------------------------------------------------------------------------------
TWFE Estimate: 2.1822
Weighted Sum of 2x2 Estimates: 2.1052
Decomposition Error: 0.076977
-------------------------------------------------------------------------------------
Weight Breakdown by Comparison Type
-------------------------------------------------------------------------------------
Comparison Type Weight Avg Effect Contribution
-------------------------------------------------------------------------------------
Treated vs Never-treated 0.4331 2.3745 1.0284
Earlier vs Later treated 0.2836 2.1999 0.6238
Later vs Earlier (forbidden) 0.2834 1.5989 0.4531
-------------------------------------------------------------------------------------
Total 1.0000 2.1052
-------------------------------------------------------------------------------------
WARNING: 28.3% of weight is on 'forbidden' comparisons where
already-treated units serve as controls. This can bias TWFE
when treatment effects are heterogeneous over time.
Consider using Callaway-Sant'Anna or other robust estimators.
=====================================================================================
&lt;/code>&lt;/pre>
&lt;p>The decomposition reveals that 28.3% of TWFE&amp;rsquo;s weight falls on forbidden comparisons &amp;mdash; cases where already-treated units serve as controls. These forbidden comparisons produce an average effect of only 1.60, substantially lower than the 2.37 from clean treated-vs-never-treated comparisons. This downward pull drags the TWFE estimate to 2.18, below the true treatment effect. The clean comparisons (treated vs never-treated) account for 43.3% of the weight and produce the most reliable estimates, while the earlier-vs-later comparisons (28.4% weight) sit in between. The decomposition error of 0.08 reflects higher-order interaction terms that the 2x2 decomposition does not fully capture.&lt;/p>
&lt;p>The following plot visualizes the decomposition:&lt;/p>
&lt;pre>&lt;code class="language-python">bacon_df = bacon_results.to_dataframe()
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
fig.patch.set_linewidth(0)
# Left panel: scatter by comparison type
type_colors = {
&amp;quot;Treated vs Never-treated&amp;quot;: STEEL_BLUE,
&amp;quot;Earlier vs Later treated&amp;quot;: WARM_ORANGE,
&amp;quot;Later vs Earlier (forbidden)&amp;quot;: &amp;quot;#e8856c&amp;quot;,
&amp;quot;treated_vs_never&amp;quot;: STEEL_BLUE,
&amp;quot;earlier_vs_later&amp;quot;: WARM_ORANGE,
&amp;quot;later_vs_earlier&amp;quot;: &amp;quot;#e8856c&amp;quot;,
}
for comp_type in bacon_df[&amp;quot;comparison_type&amp;quot;].unique():
subset = bacon_df[bacon_df[&amp;quot;comparison_type&amp;quot;] == comp_type]
color = type_colors.get(comp_type, LIGHT_TEXT)
axes[0].scatter(subset[&amp;quot;weight&amp;quot;], subset[&amp;quot;estimate&amp;quot;],
s=80, color=color, alpha=0.7, edgecolors=DARK_NAVY,
label=comp_type)
axes[0].axhline(y=bacon_results.twfe_estimate, color=WHITE_TEXT,
linestyle=&amp;quot;--&amp;quot;, linewidth=1.5, alpha=0.7,
label=f&amp;quot;TWFE = {bacon_results.twfe_estimate:.2f}&amp;quot;)
axes[0].set_xlabel(&amp;quot;Weight&amp;quot;)
axes[0].set_ylabel(&amp;quot;2×2 DiD Estimate&amp;quot;)
axes[0].set_title(&amp;quot;Bacon Decomposition: Individual Comparisons&amp;quot;)
axes[0].legend(fontsize=9, loc=&amp;quot;lower right&amp;quot;)
# Right panel: bar chart of weights by type
type_summary = bacon_df.groupby(&amp;quot;comparison_type&amp;quot;).agg(
weight=(&amp;quot;weight&amp;quot;, &amp;quot;sum&amp;quot;),
avg_effect=(&amp;quot;estimate&amp;quot;, lambda x: np.average(
x, weights=bacon_df.loc[x.index, &amp;quot;weight&amp;quot;])),
).reset_index()
bar_colors = [type_colors.get(t, LIGHT_TEXT)
for t in type_summary[&amp;quot;comparison_type&amp;quot;]]
axes[1].barh(range(len(type_summary)), type_summary[&amp;quot;weight&amp;quot;],
color=bar_colors, edgecolor=DARK_NAVY, height=0.6)
axes[1].set_yticks(range(len(type_summary)))
axes[1].set_yticklabels(type_summary[&amp;quot;comparison_type&amp;quot;], fontsize=10)
axes[1].set_xlabel(&amp;quot;Total Weight&amp;quot;)
axes[1].set_title(&amp;quot;Weight Distribution by Comparison Type&amp;quot;)
for i, (w, e) in enumerate(zip(type_summary[&amp;quot;weight&amp;quot;],
type_summary[&amp;quot;avg_effect&amp;quot;])):
axes[1].text(w + 0.01, i, f&amp;quot;{w:.1%} (avg = {e:.2f})&amp;quot;,
va=&amp;quot;center&amp;quot;, fontsize=10)
plt.tight_layout()
plt.savefig(&amp;quot;did_bacon_decomposition.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="did_bacon_decomposition.png" alt="Two-panel Bacon decomposition plot. Left: scatter of individual 2x2 estimates colored by comparison type with TWFE reference line. Right: horizontal bars showing total weight by comparison type.">&lt;/p>
&lt;p>The left panel shows each individual 2x2 comparison as a point, colored by type. The forbidden comparisons (dark orange) cluster at lower effect estimates than the clean comparisons (steel blue), visually demonstrating how they pull TWFE downward. The right panel makes the weight problem stark: nearly a third of the total weight goes to comparisons where already-treated units masquerade as controls. For a policymaker relying on the TWFE estimate of 2.18, this contamination means the reported effect underestimates the true treatment impact.&lt;/p>
&lt;h2 id="callaway-santanna-the-modern-solution">Callaway-Sant&amp;rsquo;Anna: The modern solution&lt;/h2>
&lt;p>The &lt;strong>Callaway-Sant&amp;rsquo;Anna (CS) estimator&lt;/strong> (&lt;a href="https://doi.org/10.1016/j.jeconom.2020.12.001" target="_blank" rel="noopener">Callaway &amp;amp; Sant&amp;rsquo;Anna, 2021&lt;/a>) avoids forbidden comparisons entirely. Instead of a single pooled regression, CS starts from a fundamental building block &amp;mdash; the &lt;strong>group-time ATT&lt;/strong>:&lt;/p>
&lt;p>$$ATT(g, t) = E[Y_t(g) - Y_t(\infty) \mid G = g], \quad \text{for } t \geq g$$&lt;/p>
&lt;p>Here $g$ denotes the cohort (the period when a unit first becomes treated), $t$ is the current calendar period, $Y_t(g)$ is the potential outcome at time $t$ if first treated in period $g$, and $Y_t(\infty)$ is the potential outcome under perpetual non-treatment. The conditioning on $G = g$ restricts attention to units in cohort $g$. This yields a separate treatment effect estimate for each combination of cohort and calendar period, using only clean comparisons.&lt;/p>
&lt;p>With never-treated controls, the group-time ATT is identified as:&lt;/p>
&lt;p>$$ATT(g, t) = E[Y_t - Y_{g-1} \mid G = g] - E[Y_t - Y_{g-1} \mid G = \infty]$$&lt;/p>
&lt;p>In words: take the change in outcomes from the period just before treatment ($g - 1$) to the current period ($t$) for cohort $g$ units, and subtract the same change for never-treated units ($G = \infty$). This is a 2x2 DiD comparison that uses only the never-treated group as controls, eliminating all forbidden comparisons by construction.&lt;/p>
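&lt;p>To make the building block concrete, here is a minimal hand computation of $ATT(g, t)$ on an invented four-unit panel. All numbers, column names, and the helper function are illustrative only; the estimator introduced below performs this internally (with covariate adjustments on top):&lt;/p>

```python
import pandas as pd

# Invented panel: units 1-2 form cohort g = 3, units 3-4 are never treated
# (coded 0). Only the baseline period (g - 1 = 2) and the evaluation
# period (t = 4) are needed for ATT(g, t).
df = pd.DataFrame({
    "unit":        [1, 1, 2, 2, 3, 3, 4, 4],
    "period":      [2, 4, 2, 4, 2, 4, 2, 4],
    "first_treat": [3, 3, 3, 3, 0, 0, 0, 0],
    "outcome":     [1.0, 4.0, 2.0, 5.5, 1.5, 2.5, 2.5, 3.5],
})

def att_gt(data, g, t):
    """ATT(g, t) = E[Y_t - Y_{g-1} | G = g] - E[Y_t - Y_{g-1} | G = never]."""
    base = data[data["period"] == g - 1].set_index("unit")["outcome"]
    curr = data[data["period"] == t].set_index("unit")["outcome"]
    change = curr - base                      # first difference per unit
    cohort = data.drop_duplicates("unit").set_index("unit")["first_treat"]
    return change[cohort == g].mean() - change[cohort == 0].mean()

print(att_gt(df, g=3, t=4))  # (3.0 + 3.5)/2 - (1.0 + 1.0)/2 = 2.25
```

&lt;p>The treated units gained 3.25 on average while the never-treated units gained 1.0 over the same interval, so the clean 2x2 comparison attributes the difference of 2.25 to treatment.&lt;/p>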
&lt;h3 id="the-doubly-robust-estimator">The doubly robust estimator&lt;/h3>
&lt;p>In practice, Callaway and Sant&amp;rsquo;Anna implement a &lt;strong>doubly robust&lt;/strong> version of this estimator. Before diving into the formal equation, here is the core idea: the doubly robust estimator adjusts the comparison between treated and control units in &lt;em>two&lt;/em> ways simultaneously &amp;mdash; by reweighting the control group to look more similar to the treated group (inverse-probability weighting), and by directly modeling and subtracting the expected outcome change for controls (outcome regression). Think of it as wearing both a belt &lt;em>and&lt;/em> suspenders: if either adjustment is correctly specified, the estimate is valid, even if the other one is wrong. This double protection makes the estimator more reliable than methods that rely on a single modeling assumption.&lt;/p>
&lt;p>The formal equation combines inverse-probability weighting with an outcome regression adjustment:&lt;/p>
&lt;p>$$ATT(g, t) = \mathbb{E}\left[\left(\frac{G_g}{\mathbb{E}[G_g]} - \frac{\frac{p_g(X)}{1-p_g(X)}}{\mathbb{E}\left[\frac{p_g(X)}{1-p_g(X)}\right]}\right)\left(Y_t - Y_{g-1} - m_{g,t}^{nev}(X)\right)\right]$$&lt;/p>
&lt;p>This equation multiplies two terms inside the expectation &amp;mdash; a &lt;strong>weighting term&lt;/strong> (first parentheses) and an &lt;strong>outcome term&lt;/strong> (second parentheses). Let us unpack each one.&lt;/p>
&lt;p>&lt;strong>The weighting term:&lt;/strong> $\frac{G_g}{\mathbb{E}[G_g]} - \frac{\frac{p_g(X)}{1-p_g(X)}}{\mathbb{E}\left[\frac{p_g(X)}{1-p_g(X)}\right]}$&lt;/p>
&lt;p>This term determines &lt;em>how much each observation contributes&lt;/em> to the ATT estimate. It works differently for treated and control units:&lt;/p>
&lt;ul>
&lt;li>$G_g$ is a &lt;strong>group indicator&lt;/strong> that equals 1 if the unit belongs to cohort $g$ and 0 otherwise. Dividing by $\mathbb{E}[G_g]$ (the share of units in cohort $g$) normalizes so that treated units receive equal weight on average. For a treated unit in cohort $g$, the first fraction contributes a positive value; for never-treated units, $G_g = 0$ so the first fraction is zero.&lt;/li>
&lt;li>$p_g(X)$ is the &lt;strong>generalized propensity score&lt;/strong> &amp;mdash; the probability of being in cohort $g$ (rather than the never-treated group) given covariates $X$. This is estimated via logit regression of cohort membership on covariates. The ratio $\frac{p_g(X)}{1-p_g(X)}$ gives the odds of being in cohort $g$, and dividing by its expectation normalizes the weights. For never-treated units, this second fraction creates a &lt;strong>negative weight&lt;/strong> that is larger for control units whose covariates resemble the treated cohort &amp;mdash; effectively selecting the most comparable controls. For treated units, the two fractions partially cancel, leaving a net positive weight.&lt;/li>
&lt;/ul>
&lt;p>The intuition is similar to propensity score matching: if a never-treated city has covariates (population, per-student spending, teacher-student ratio) that look very much like a treated city, it receives a larger (more negative) weight, making it contribute more as a counterfactual. Cities with covariates far from the treated group receive near-zero weight. This &lt;strong>rebalances&lt;/strong> the control group so that the covariate distribution of the weighted controls matches that of the treated cohort.&lt;/p>
&lt;p>&lt;strong>The outcome term:&lt;/strong> $Y_t - Y_{g-1} - m_{g,t}^{nev}(X)$&lt;/p>
&lt;p>This term measures the &lt;strong>adjusted outcome change&lt;/strong> for each unit:&lt;/p>
&lt;ul>
&lt;li>$Y_t - Y_{g-1}$ is the raw change in outcomes from the baseline period ($g - 1$, the period just before cohort $g$ starts treatment) to the current period $t$. This is the same first difference used in any DiD estimator.&lt;/li>
&lt;li>$m_{g,t}^{nev}(X)$ is the &lt;strong>outcome regression adjustment&lt;/strong> &amp;mdash; the expected change $E[Y_t - Y_{g-1} \mid X, G = \infty]$ for never-treated units with covariates $X$. In practice, this is estimated by regressing the outcome change $\Delta Y = Y_t - Y_{g-1}$ on covariates $X$ using only the never-treated group. Subtracting $m_{g,t}^{nev}(X)$ removes the portion of the outcome change that would have occurred &lt;em>anyway&lt;/em> based on observable characteristics &amp;mdash; even without treatment. What remains is the treatment-induced change that cannot be explained by covariates alone.&lt;/li>
&lt;/ul>
&lt;p>Think of it this way: if cities with higher per-student spending tend to improve learning scores faster regardless of AI adoption, $m_{g,t}^{nev}(X)$ captures that covariate-driven growth trajectory. Subtracting it ensures that the estimated treatment effect is not confounded by differential growth rates across different types of cities.&lt;/p>
&lt;p>&lt;strong>Why &amp;ldquo;doubly robust&amp;rdquo;?&lt;/strong> The estimator combines &lt;em>both&lt;/em> adjustment strategies &amp;mdash; inverse-probability weighting (through the weighting term) and outcome regression (through $m_{g,t}^{nev}(X)$). The key advantage is that the ATT estimate is consistent if &lt;em>either&lt;/em> the propensity score model or the outcome regression model is correctly specified &amp;mdash; both do not need to be right simultaneously. If the propensity score model is wrong but the outcome regression is correct, the $m_{g,t}^{nev}(X)$ adjustment still removes confounding. If the outcome regression is wrong but the propensity score is correct, the reweighting still produces a valid comparison group. This double layer of protection makes the estimator more reliable in practice than methods relying on a single modeling assumption.&lt;/p>
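&lt;p>The mechanics can be sketched in a few lines of NumPy on simulated data (everything below is invented for illustration). With a single binary covariate, the propensity score and the outcome regression reduce to conditional frequencies and cell means, so no fitting library is needed; as in standard implementations, the inverse-odds part of the weight is applied to control units only:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: one binary covariate x drives both selection into
# cohort g and the trend in the untreated outcome change.
n = 2000
x = rng.integers(0, 2, n)                            # binary covariate
p_true = np.where(x == 1, 0.6, 0.3)                  # P(cohort g | x)
G = (rng.random(n) < p_true).astype(float)           # 1 = cohort g, 0 = never treated
tau = 2.0                                            # true ATT
dy = 1.0 + 0.5 * x + tau * G + rng.normal(0, 1, n)   # outcome change Y_t - Y_{g-1}

# Propensity score p_g(x): with one binary covariate this is just the
# conditional frequency of cohort membership within each x cell.
p_hat = np.where(x == 1, G[x == 1].mean(), G[x == 0].mean())

# Outcome regression m(x) = E[dy | x, G = never]: cell means among controls.
m_hat = np.where(x == 1,
                 dy[(x == 1) & (G == 0)].mean(),
                 dy[(x == 0) & (G == 0)].mean())

# Doubly robust ATT: weighting term times adjusted outcome term. The
# inverse-odds weight applies to controls only (the (1 - G) factor).
w_treated = G / G.mean()
odds = (1 - G) * p_hat / (1 - p_hat)
w_control = odds / odds.mean()
att_dr = np.mean((w_treated - w_control) * (dy - m_hat))
print(f"Doubly robust ATT: {att_dr:.2f} (true effect = {tau})")
```

&lt;p>In this toy design the weighted control term centers at zero, and what survives is the covariate-adjusted outcome change among treated units, which recovers the true effect of 2.0 up to sampling noise.&lt;/p>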
&lt;p>&lt;strong>Note on the no-covariate case:&lt;/strong> In this tutorial, we do not pass covariates to &lt;code>CallawaySantAnna()&lt;/code>. Without covariates, the propensity score $p_g(X)$ reduces to the unconditional probability of being in cohort $g$ (simply the group share), and $m_{g,t}^{nev}(X)$ reduces to the simple mean outcome change among never-treated units. The doubly robust estimator then collapses to the basic difference-in-means formula shown earlier. The full equation is presented here because it is the general form that practitioners encounter when working with real data and covariates.&lt;/p>
&lt;p>The group-time ATTs are then &lt;strong>aggregated&lt;/strong> into summary parameters. Any summary is a weighted average of the building blocks:&lt;/p>
&lt;p>$$\theta = \sum_{g} \sum_{t \geq g} w_{g,t} \cdot ATT(g, t), \quad \sum_{g,t} w_{g,t} = 1$$&lt;/p>
&lt;p>Two aggregations are especially useful. The &lt;strong>overall ATT&lt;/strong> weights by cohort size:&lt;/p>
&lt;p>$$\theta^{O} = \sum_{g} \theta(g) \cdot P(G = g), \quad \text{where } \theta(g) = \frac{1}{T - g + 1} \sum_{t=g}^{T} ATT(g, t)$$&lt;/p>
&lt;p>The &lt;strong>event study aggregation&lt;/strong> averages across cohorts at each relative time $e$ (periods since treatment onset):&lt;/p>
&lt;p>$$\theta_D(e) = \sum_{g} ATT(g, g + e) \cdot P(G = g \mid g + e \leq T)$$&lt;/p>
&lt;p>This event study aggregation is the CS analogue of the leads-and-lags event study, but free from the forbidden comparison contamination that plagues TWFE-based event studies.&lt;/p>
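&lt;p>The aggregation step is plain arithmetic once the $ATT(g, t)$ building blocks are in hand. The sketch below applies both formulas to an invented two-cohort example (all numbers made up):&lt;/p>

```python
import numpy as np

# Invented group-time ATTs for two cohorts (g = 2 and g = 3) over T = 4 periods
T = 4
att = {(2, 2): 1.0, (2, 3): 1.5, (2, 4): 2.0,
       (3, 3): 1.2, (3, 4): 1.6}
n_g = {2: 60, 3: 40}                                  # cohort sizes
P_g = {g: n / sum(n_g.values()) for g, n in n_g.items()}

# Overall ATT: average each cohort's post-treatment ATTs, then weight by share
theta_g = {g: np.mean([att[(g, t)] for t in range(g, T + 1)]) for g in n_g}
overall = sum(theta_g[g] * P_g[g] for g in n_g)

# Event study: at relative time e, average ATT(g, g + e) across the cohorts
# still observed at that horizon, renormalizing their shares to sum to one
def event_study(e):
    gs = [g for g in n_g if g + e <= T]
    total = sum(n_g[g] for g in gs)
    return sum(att[(g, g + e)] * n_g[g] for g in gs) / total

print(f"Overall ATT: {overall:.2f}")                  # 1.5*0.6 + 1.4*0.4 = 1.46
print([round(event_study(e), 2) for e in (0, 1, 2)])  # [1.08, 1.54, 2.0]
```

&lt;p>Note how the event study at $e = 2$ uses only cohort $g = 2$, because cohort $g = 3$ is never observed two periods after treatment; this compositional shift across horizons is why CS reports the conditioning weights $P(G = g \mid g + e \leq T)$ explicitly.&lt;/p>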
&lt;p>The &lt;a href="https://diff-diff.readthedocs.io/en/stable/" target="_blank" rel="noopener">&lt;code>CallawaySantAnna()&lt;/code>&lt;/a> class takes &lt;code>control_group&lt;/code> to specify which units serve as controls. Using &lt;code>&amp;quot;never_treated&amp;quot;&lt;/code> restricts comparisons to units that never received treatment, the cleanest possible counterfactual. The &lt;code>base_period=&amp;quot;universal&amp;quot;&lt;/code> option uses a single reference period ($g - 1$) for all relative time comparisons within each cohort, rather than letting each relative period use its own baseline. This ensures that the pre-treatment coefficients are proper placebo tests: each one measures the outcome change from $g - 1$ to an earlier period, so a coefficient near zero means the treated and control groups were evolving similarly over that specific interval. With a universal base period, the period immediately before treatment ($e = -1$) is normalized to zero by construction.&lt;/p>
&lt;pre>&lt;code class="language-python">cs = CallawaySantAnna(control_group=&amp;quot;never_treated&amp;quot;, base_period=&amp;quot;universal&amp;quot;)
results_cs = cs.fit(
data_stag, outcome=&amp;quot;outcome&amp;quot;, unit=&amp;quot;unit&amp;quot;,
time=&amp;quot;period&amp;quot;, first_treat=&amp;quot;first_treat&amp;quot;,
aggregate=&amp;quot;event_study&amp;quot;,
)
results_cs.print_summary()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>=====================================================================================
Callaway-Sant'Anna Staggered Difference-in-Differences Results
=====================================================================================
Total observations:   3000
Treated units:        210
Never-treated units:  90
Treatment cohorts:    3
Time periods:         10
Control group:        never_treated
Base period:          universal
-------------------------------------------------------------------------------------
Overall Average Treatment Effect on the Treated
-------------------------------------------------------------------------------------
  Parameter     Estimate    Std. Err.       t-stat        P&amp;gt;|t|   Sig.
-------------------------------------------------------------------------------------
        ATT       2.4136       0.0552       43.753       0.0000    ***
-------------------------------------------------------------------------------------
95% Confidence Interval: [2.3055, 2.5217]
-------------------------------------------------------------------------------------
Event Study (Dynamic) Effects
-------------------------------------------------------------------------------------
Rel. Period     Estimate    Std. Err.       t-stat        P&amp;gt;|t|   Sig.
-------------------------------------------------------------------------------------
         -7      -0.1344       0.1171       -1.148       0.2510
         -6      -0.0188       0.1126       -0.167       0.8671
         -5      -0.1435       0.0813       -1.766       0.0774    .
         -4      -0.0091       0.0744       -0.122       0.9028
         -3      -0.0697       0.0560       -1.244       0.2134
         -2      -0.0709       0.0631       -1.124       0.2610
         -1       0.0000          nan          nan          nan
          0       1.9713       0.0645       30.551       0.0000    ***
          1       2.1416       0.0577       37.124       0.0000    ***
          2       2.2969       0.0644       35.644       0.0000    ***
          3       2.6763       0.0796       33.642       0.0000    ***
          4       2.7925       0.0800       34.898       0.0000    ***
          5       3.0259       0.1227       24.669       0.0000    ***
          6       3.2663       0.1090       29.961       0.0000    ***
-------------------------------------------------------------------------------------
Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
=====================================================================================
&lt;/code>&lt;/pre>
&lt;p>The overall CS estimate of the ATT is 2.41 (SE = 0.06, p &amp;lt; 0.001), with a 95% CI of [2.31, 2.52]. This is higher than the TWFE estimate of 2.18, confirming that TWFE was biased downward by the forbidden comparisons. The event study reveals dynamic effects that grow over time: the effect starts at 1.97 in the first period after treatment and increases to 3.27 by six periods post-treatment. This pattern of growing effects is exactly the scenario where TWFE fails most dramatically &amp;mdash; the forbidden comparisons use units with large accumulated effects as controls for newly-treated units, producing a downward-biased average.&lt;/p>
&lt;p>With the universal base period, relative period -1 is the reference and is normalized to zero by construction. The remaining pre-treatment estimates all hover near zero &amp;mdash; the largest in magnitude is -0.14 at relative period -5 (p = 0.08), which does not reach significance at the 5% level. None of the six estimated pre-treatment coefficients is individually significant, providing clean support for the parallel trends assumption. This contrasts with the varying base period specification, where each pre-treatment coefficient uses a different baseline, making the placebo tests harder to interpret collectively.&lt;/p>
&lt;p>The event study plot visualizes these dynamics, showing how the treatment effect builds over time relative to treatment onset:&lt;/p>
&lt;pre>&lt;code class="language-python">cs_df = results_cs.to_dataframe(&amp;quot;event_study&amp;quot;)
fig, ax = plt.subplots(figsize=(9, 5))
fig.patch.set_linewidth(0)
pre_cs = cs_df[cs_df[&amp;quot;relative_period&amp;quot;] &amp;lt; 0]
post_cs = cs_df[cs_df[&amp;quot;relative_period&amp;quot;] &amp;gt;= 0]
ax.errorbar(pre_cs[&amp;quot;relative_period&amp;quot;], pre_cs[&amp;quot;effect&amp;quot;],
yerr=1.96 * pre_cs[&amp;quot;se&amp;quot;], fmt=&amp;quot;o&amp;quot;, color=STEEL_BLUE,
capsize=4, linewidth=2, markersize=8, label=&amp;quot;Pre-treatment&amp;quot;)
ax.errorbar(post_cs[&amp;quot;relative_period&amp;quot;], post_cs[&amp;quot;effect&amp;quot;],
yerr=1.96 * post_cs[&amp;quot;se&amp;quot;], fmt=&amp;quot;s&amp;quot;, color=TEAL,
capsize=4, linewidth=2, markersize=8, label=&amp;quot;Post-treatment&amp;quot;)
ax.axhline(y=0, color=LIGHT_TEXT, linewidth=1, alpha=0.5)
ax.axvline(x=-0.5, color=LIGHT_TEXT, linestyle=&amp;quot;--&amp;quot;, linewidth=1.5, alpha=0.5)
ax.set_xlabel(&amp;quot;Periods Relative to Treatment&amp;quot;)
ax.set_ylabel(&amp;quot;Estimated ATT&amp;quot;)
ax.set_title(&amp;quot;Callaway-Sant'Anna: Event Study for Staggered Adoption&amp;quot;)
ax.legend(loc=&amp;quot;upper left&amp;quot;)
plt.savefig(&amp;quot;did_staggered_att.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="did_staggered_att.png" alt="Callaway-Sant&amp;rsquo;Anna event study plot showing pre-treatment effects near zero (with period -1 normalized to zero) and post-treatment effects growing steadily from about 2.0 to 3.3.">&lt;/p>
&lt;p>The CS event study plot shows the hallmark pattern of a valid DiD analysis: pre-treatment coefficients (steel blue) cluster tightly around zero &amp;mdash; with relative period -1 pinned at exactly zero as the universal base period &amp;mdash; then post-treatment coefficients (teal) rise sharply and progressively. The upward slope in the post-treatment period reveals that the treatment effect accumulates over time, growing from roughly 2.0 immediately after treatment to 3.3 six periods later. This dynamic pattern would have been obscured by TWFE&amp;rsquo;s single pooled estimate and further distorted by its forbidden comparisons.&lt;/p>
&lt;h2 id="choosing-the-right-estimator">Choosing the right estimator&lt;/h2>
&lt;p>With multiple DiD estimators available, the choice depends on the data structure. The following decision flowchart guides the selection:&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
A[&amp;quot;&amp;lt;b&amp;gt;Panel data with&amp;lt;br/&amp;gt;treatment &amp;amp; control&amp;lt;/b&amp;gt;&amp;quot;] --&amp;gt; B{&amp;quot;Single treatment&amp;lt;br/&amp;gt;period?&amp;quot;}
B --&amp;gt;|Yes| C[&amp;quot;&amp;lt;b&amp;gt;Classic 2×2 DiD&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;DifferenceInDifferences()&amp;quot;]
B --&amp;gt;|No| D{&amp;quot;Staggered&amp;lt;br/&amp;gt;adoption?&amp;quot;}
D --&amp;gt;|&amp;quot;No&amp;lt;br/&amp;gt;(same timing)&amp;quot;| E[&amp;quot;&amp;lt;b&amp;gt;Multi-Period DiD&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;MultiPeriodDiD()&amp;quot;]
D --&amp;gt;|Yes| F{&amp;quot;Never-treated&amp;lt;br/&amp;gt;group available?&amp;quot;}
F --&amp;gt;|Yes| G[&amp;quot;&amp;lt;b&amp;gt;Callaway-Sant'Anna&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;CallawaySantAnna()&amp;quot;]
F --&amp;gt;|No| H[&amp;quot;&amp;lt;b&amp;gt;Sun-Abraham / Stacked DiD&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;SunAbraham() / StackedDiD()&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;(not covered here)&amp;lt;/i&amp;gt;&amp;quot;]
style A fill:#141413,stroke:#141413,color:#fff
style B fill:#6a9bcc,stroke:#141413,color:#fff
style C fill:#00d4c8,stroke:#141413,color:#fff
style D fill:#6a9bcc,stroke:#141413,color:#fff
style E fill:#00d4c8,stroke:#141413,color:#fff
style F fill:#6a9bcc,stroke:#141413,color:#fff
style G fill:#00d4c8,stroke:#141413,color:#fff
style H fill:#d97757,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>The following table summarizes when to use each estimator:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Scenario&lt;/th>
&lt;th>Estimator&lt;/th>
&lt;th>Advantage&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Single treatment time, 2 groups&lt;/td>
&lt;td>&lt;code>DifferenceInDifferences()&lt;/code>&lt;/td>
&lt;td>Simplest, most transparent&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Single treatment time, many periods&lt;/td>
&lt;td>&lt;code>MultiPeriodDiD()&lt;/code>&lt;/td>
&lt;td>Period-by-period effects, pre-trend test&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Staggered, never-treated available&lt;/td>
&lt;td>&lt;code>CallawaySantAnna()&lt;/code>&lt;/td>
&lt;td>Clean comparisons, flexible aggregation&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Staggered, no never-treated group&lt;/td>
&lt;td>&lt;code>SunAbraham()&lt;/code>&lt;/td>
&lt;td>Interaction-weighted, uses not-yet-treated&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Diagnosing TWFE bias&lt;/td>
&lt;td>&lt;code>BaconDecomposition()&lt;/code>&lt;/td>
&lt;td>Reveals forbidden comparison weights&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The decision logic is straightforward: if all treated units start at the same time, use the classic estimator or the multi-period event study. If treatment timing varies, use Callaway-Sant&amp;rsquo;Anna (or Sun-Abraham if no never-treated group exists). Always run Bacon decomposition on TWFE results to check for contamination from forbidden comparisons. The &lt;code>diff-diff&lt;/code> package also offers &lt;code>SyntheticDiD()&lt;/code>, &lt;code>ImputationDiD()&lt;/code>, and &lt;code>ContinuousDiD()&lt;/code> for specialized settings, but the estimators above cover the vast majority of applied research.&lt;/p>
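&lt;p>For readers who prefer the logic spelled out in code, the flowchart condenses into a small helper. The function and its boolean flags are hypothetical; the returned strings are the &lt;code>diff-diff&lt;/code> class names from the table above:&lt;/p>

```python
def choose_estimator(single_period: bool, staggered: bool,
                     has_never_treated: bool) -> str:
    """Return the estimator class name suggested by the decision flowchart."""
    if single_period:
        return "DifferenceInDifferences"   # classic 2x2
    if not staggered:
        return "MultiPeriodDiD"            # many periods, common treatment timing
    if has_never_treated:
        return "CallawaySantAnna"
    return "SunAbraham"                    # or StackedDiD

# The staggered-adoption dataset in this tutorial has a never-treated group:
print(choose_estimator(single_period=False, staggered=True,
                       has_never_treated=True))  # CallawaySantAnna
```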
&lt;h2 id="sensitivity-analysis-honestdid">Sensitivity analysis: HonestDiD&lt;/h2>
&lt;p>Every DiD analysis rests on parallel trends &amp;mdash; but this assumption is fundamentally &lt;strong>untestable&lt;/strong> for the post-treatment period. Pre-treatment trend tests (Section 6) check whether trends were parallel &lt;em>before&lt;/em> treatment, but they cannot guarantee that trends would have remained parallel &lt;em>after&lt;/em> treatment in the absence of the intervention. A new regulation might coincide with an economic downturn that affects treated regions differently, violating parallel trends even though pre-trends looked clean.&lt;/p>
&lt;p>&lt;strong>HonestDiD&lt;/strong> (&lt;a href="https://doi.org/10.1093/restud/rdad018" target="_blank" rel="noopener">Rambachan &amp;amp; Roth, 2023&lt;/a>) addresses this problem directly. Instead of assuming parallel trends hold exactly, it bounds the degree of violation using a &lt;strong>relative magnitudes restriction&lt;/strong>. Let $\delta_t = E[Y^0_t - Y^0_{t-1} \mid G = g] - E[Y^0_t - Y^0_{t-1} \mid G = \infty]$ denote the parallel trends violation at period $t$ &amp;mdash; the difference in untreated outcome trends between the treated cohort and the never-treated group. HonestDiD constrains the post-treatment violations relative to the largest pre-treatment violation:&lt;/p>
&lt;p>$$|\delta_t| \leq M \cdot \max_{t' &amp;lt; g} |\delta_{t'}|, \quad \text{for all } t \geq g$$&lt;/p>
&lt;p>The parameter $M$ controls the degree of allowed departure. At $M = 0$, the method assumes perfect parallel trends ($\delta_t = 0$ for all post-treatment periods) and recovers the standard CI. As $M$ increases, it allows for progressively larger post-treatment violations, widening the robust CI. The &lt;strong>breakdown value&lt;/strong> of $M$ is where the CI first includes zero &amp;mdash; the point at which the treatment conclusion becomes fragile.&lt;/p>
&lt;p>Think of $M$ as a stress test dial. Turning it up to $M = 1$ says: &amp;ldquo;The worst post-treatment violation could be as large as the worst thing we saw pre-treatment.&amp;rdquo; Turning it to $M = 5$ says: &amp;ldquo;The violation could be five times worse.&amp;rdquo; If the effect remains significant even at high $M$, the finding is genuinely robust.&lt;/p>
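&lt;p>To translate the restriction into outcome units, take the six estimated pre-treatment deviations from the CS event study output above. The worst absolute deviation is about 0.14, so the bound on any post-treatment violation scales linearly with $M$:&lt;/p>

```python
import numpy as np

# Pre-treatment deviations from the CS event study output above
pre_violations = np.array([-0.1344, -0.0188, -0.1435, -0.0091, -0.0697, -0.0709])
max_pre = np.abs(pre_violations).max()   # worst pre-treatment deviation: 0.1435

def violation_bound(M, max_pre_violation):
    """Largest post-treatment violation |delta_t| allowed at sensitivity level M."""
    return M * max_pre_violation

for M in (0.0, 1.0, 3.0, 15.0):
    print(f"M = {M:4.1f}: |delta_t| <= {violation_bound(M, max_pre):.3f}")
```

&lt;p>At $M = 15$ the restriction tolerates violations of roughly 2.15, nearly the size of the estimated treatment effect itself, which is why surviving that level counts as an extreme stress test.&lt;/p>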
&lt;pre>&lt;code class="language-python">M_values = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 7.0, 10.0, 12.0, 15.0]
sensitivity = []
for M in M_values:
honest = HonestDiD(method=&amp;quot;relative_magnitude&amp;quot;, M=M)
hres = honest.fit(results_cs)
sensitivity.append({
&amp;quot;M&amp;quot;: M,
&amp;quot;ci_lb&amp;quot;: hres.ci_lb,
&amp;quot;ci_ub&amp;quot;: hres.ci_ub,
&amp;quot;significant&amp;quot;: hres.ci_lb &amp;gt; 0,
})
print(f&amp;quot;M = {M:.1f}: CI = [{hres.ci_lb:.4f}, {hres.ci_ub:.4f}]&amp;quot;
f&amp;quot; {'significant' if hres.ci_lb &amp;gt; 0 else 'includes zero'}&amp;quot;)
sens_df = pd.DataFrame(sensitivity)
# Find breakdown point
breakdown_M = (sens_df[~sens_df[&amp;quot;significant&amp;quot;]][&amp;quot;M&amp;quot;].min()
if not sens_df[&amp;quot;significant&amp;quot;].all()
else sens_df[&amp;quot;M&amp;quot;].max())
print(f&amp;quot;\nBreakdown value of M: {breakdown_M:.1f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>M = 0.0: CI = [2.5324, 2.6592] significant
M = 0.5: CI = [2.4606, 2.7310] significant
M = 1.0: CI = [2.3889, 2.8028] significant
M = 1.5: CI = [2.3171, 2.8745] significant
M = 2.0: CI = [2.2453, 2.9463] significant
M = 3.0: CI = [2.1018, 3.0898] significant
M = 4.0: CI = [1.9583, 3.2334] significant
M = 5.0: CI = [1.8148, 3.3769] significant
M = 7.0: CI = [1.5277, 3.6639] significant
M = 10.0: CI = [1.0971, 4.0945] significant
M = 12.0: CI = [0.8101, 4.3816] significant
M = 15.0: CI = [0.3795, 4.8122] significant
Breakdown value of M: 15.0
&lt;/code>&lt;/pre>
&lt;p>At $M = 0$ (perfect parallel trends), the CI is narrow: [2.53, 2.66]. As $M$ increases, the CI widens symmetrically. At $M = 10$, the lower bound remains comfortably positive (1.10), and even at $M = 15$, it barely stays above zero (0.38). The breakdown value exceeds $M = 15$ &amp;mdash; the treatment effect remains statistically significant even if post-treatment violations of parallel trends are more than 15 times larger than the worst pre-treatment deviation. This is exceptionally robust &amp;mdash; in practice, a breakdown value above $M = 3$ is considered strong evidence that the finding is not driven by parallel trends violations. The improvement over the varying base period specification (which had a breakdown of $M = 12$) reflects the universal base period&amp;rsquo;s tighter pre-treatment estimates, which give HonestDiD a smaller &amp;ldquo;worst pre-treatment deviation&amp;rdquo; to scale against.&lt;/p>
&lt;p>The sensitivity plot maps the robust CI as a function of $M$, making the breakdown point visually apparent:&lt;/p>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(figsize=(9, 5))
fig.patch.set_linewidth(0)
ax.fill_between(sens_df[&amp;quot;M&amp;quot;], sens_df[&amp;quot;ci_lb&amp;quot;], sens_df[&amp;quot;ci_ub&amp;quot;],
alpha=0.25, color=STEEL_BLUE, label=&amp;quot;95% Robust CI&amp;quot;)
ax.plot(sens_df[&amp;quot;M&amp;quot;], sens_df[&amp;quot;ci_lb&amp;quot;], &amp;quot;-&amp;quot;, color=STEEL_BLUE, linewidth=2)
ax.plot(sens_df[&amp;quot;M&amp;quot;], sens_df[&amp;quot;ci_ub&amp;quot;], &amp;quot;-&amp;quot;, color=STEEL_BLUE, linewidth=2)
ax.axhline(y=0, color=LIGHT_TEXT, linewidth=1.5, alpha=0.7)
att_val = results_cs.overall_att
ax.axhline(y=att_val, color=TEAL, linestyle=&amp;quot;:&amp;quot;, linewidth=1.5,
alpha=0.7, label=f&amp;quot;Overall ATT = {att_val:.2f}&amp;quot;)
ax.axvline(x=breakdown_M, color=WARM_ORANGE, linestyle=&amp;quot;--&amp;quot;,
linewidth=2, alpha=0.8,
label=f&amp;quot;Breakdown (M = {breakdown_M:.1f})&amp;quot;)
ax.set_xlabel(&amp;quot;Sensitivity Parameter M\n&amp;quot;
&amp;quot;(maximum post-treatment violation relative to &amp;quot;
&amp;quot;largest pre-treatment violation)&amp;quot;)
ax.set_ylabel(&amp;quot;Treatment Effect (ATT)&amp;quot;)
ax.set_title(&amp;quot;HonestDiD Sensitivity Analysis: Robustness of the ATT&amp;quot;)
ax.legend(loc=&amp;quot;upper left&amp;quot;)
plt.savefig(&amp;quot;did_honest_sensitivity.png&amp;quot;, dpi=300, bbox_inches=&amp;quot;tight&amp;quot;,
facecolor=DARK_NAVY, edgecolor=DARK_NAVY, pad_inches=0)
plt.show()
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="did_honest_sensitivity.png" alt="HonestDiD sensitivity plot showing the 95% robust CI widening as M increases. The CI band is steel blue, the ATT is a teal dotted line, and the breakdown point at M=15 is marked with an orange dashed line.">&lt;/p>
&lt;p>The sensitivity plot tells the robustness story at a glance. The steel blue band shows the 95% robust CI expanding as $M$ grows &amp;mdash; allowing for larger violations of parallel trends. The teal dotted line marks the overall ATT of 2.41, which lies within the robust CI for every $M \geq 1$. The warm orange dashed line at $M = 15$ marks the boundary of our grid, with the lower CI bound still positive (0.38) at that point &amp;mdash; the true breakdown lies even further out. In practical terms, the treatment conclusion would only be overturned if post-treatment parallel trend violations were more than 15 times worse than anything observed in the pre-treatment data &amp;mdash; an extreme scenario that would require a dramatic structural break coinciding precisely with the treatment timing.&lt;/p>
&lt;p>Best practice is to always report the breakdown value alongside the point estimate. A finding with a breakdown at $M = 0.5$ is fragile &amp;mdash; even mild violations destroy the conclusion. A finding with a breakdown at $M = 15$ or above, as in this example, provides strong evidence that the effect is genuine regardless of moderate parallel trends violations.&lt;/p>
&lt;h2 id="discussion">Discussion&lt;/h2>
&lt;p>Returning to the motivating question &amp;mdash; did AI tutoring actually improve learning? &amp;mdash; the evidence from both the classic and modern DiD estimators is clear: treatment produced a genuine, statistically significant positive effect. In the 2x2 setting, the estimated ATT of 5.12 (95% CI: [4.64, 5.60]) closely matches the true effect of 5.0, confirming that the classic estimator works well when all units start treatment simultaneously. The event study further validates this finding by showing near-zero pre-treatment coefficients (the largest is -0.52 with p = 0.31) and stable post-treatment effects around 4.7&amp;ndash;5.0.&lt;/p>
&lt;p>The staggered adoption setting reveals a more nuanced picture. Naive TWFE estimation produces a biased estimate of 2.18, pulled downward by the 28.3% weight on forbidden comparisons where already-treated units serve as controls. The Callaway-Sant&amp;rsquo;Anna estimator corrects this bias, finding an overall ATT of 2.41 &amp;mdash; and the event study shows that the effect is not constant but grows over time, from 1.97 immediately after treatment to 3.27 six periods later. For an education policymaker, this dynamic pattern means the AI initiative&amp;rsquo;s full benefits take time to materialize: evaluating the program too early would underestimate its long-run impact.&lt;/p>
&lt;p>The HonestDiD sensitivity analysis provides the final piece of evidence. With a breakdown value exceeding $M = 15$, the treatment conclusion is robust to post-treatment parallel trends violations more than 15 times larger than anything observed pre-treatment. This level of robustness far exceeds the $M = 3$ threshold typically considered strong in applied research. Even a skeptic who doubts the parallel trends assumption would find it difficult to argue that the treatment had no effect.&lt;/p>
&lt;p>Two important caveats apply. First, these results use synthetic data with known true effects, so the estimators are guaranteed to work under their assumptions. Real-world applications face additional challenges &amp;mdash; measurement error in learning assessments, spillover effects between treated and control cities (e.g., students in control cities accessing AI tools on their own), and the possibility that AI adoption depends on unobserved factors correlated with learning outcomes. Second, the treatment effects in the staggered dataset grow linearly over time by construction. In practice, effects may follow more complex trajectories &amp;mdash; plateauing, fading out, or accelerating &amp;mdash; which would require careful specification of the event study window and aggregation weights.&lt;/p>
&lt;h2 id="summary-and-key-takeaways">Summary and key takeaways&lt;/h2>
&lt;p>This tutorial walked through the DiD toolkit from its simplest form to its most robust modern extensions. Four key takeaways emerge:&lt;/p>
&lt;p>&lt;strong>Method insight:&lt;/strong> DiD targets the &lt;strong>ATT&lt;/strong> by using untreated units as a counterfactual for how treated units would have evolved without intervention. The classic 2x2 estimator (ATT = 5.12, SE = 0.25) works well when all units start treatment simultaneously, but staggered adoption requires modern estimators like Callaway-Sant&amp;rsquo;Anna to avoid TWFE&amp;rsquo;s forbidden comparison bias.&lt;/p>
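&lt;p>The 2x2 computation described above reduces to one line of arithmetic. A minimal sketch with hypothetical group means (not the tutorial data):&lt;/p>

```python
# Minimal 2x2 difference-in-differences on hypothetical group means:
# ATT = (treated post - treated pre) - (control post - control pre)
treated_pre, treated_post = 10.0, 17.0
control_pre, control_post = 8.0, 10.0

att = (treated_post - treated_pre) - (control_post - control_pre)
print(att)  # 5.0
```

&lt;p>The control-group change (here 2.0) estimates the counterfactual drift; subtracting it from the treated-group change isolates the treatment effect.&lt;/p>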
&lt;p>&lt;strong>Data insight:&lt;/strong> The classic DiD recovered the true effect of 5.0 within sampling error (95% CI: [4.64, 5.60]). In the staggered setting, TWFE estimated 2.18 while the cleaner CS estimator found 2.41 &amp;mdash; a 10% upward correction driven by eliminating the 28.3% weight on forbidden comparisons that dragged TWFE down. The CS event study further revealed that treatment effects grow over time, from 1.97 immediately after treatment to 3.27 six periods later.&lt;/p>
&lt;p>&lt;strong>Practical limitation:&lt;/strong> Parallel trends is untestable for the post-treatment period. Pre-treatment tests (p = 0.29 in our example) can only fail to reject, not confirm. HonestDiD provides a principled solution by computing robust confidence intervals under bounded violations. Our breakdown value exceeding $M = 15$ means the conclusion survives violations more than 15 times the worst pre-treatment departure &amp;mdash; exceptionally strong robustness.&lt;/p>
&lt;p>&lt;strong>Next steps:&lt;/strong> This tutorial used synthetic data &amp;mdash; the 2x2 dataset with a constant treatment effect and the staggered dataset with effects that grow over time. Real-world applications should consider adding covariates to the CS estimator (via the &lt;code>covariates&lt;/code> argument), exploring continuous treatment intensity with &lt;code>ContinuousDiD()&lt;/code>, and comparing CS results against &lt;code>SunAbraham()&lt;/code> or &lt;code>ImputationDiD()&lt;/code> as robustness checks. The &lt;code>diff-diff&lt;/code> package supports all of these within the same API.&lt;/p>
&lt;h2 id="exercises">Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Null effect test.&lt;/strong> Modify the &lt;code>generate_did_data()&lt;/code> call to set &lt;code>treatment_effect=0.0&lt;/code>. Run the full 2x2 analysis and event study. Does the estimator correctly find a zero effect? What do the pre- and post-treatment event study coefficients look like?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Covariates in Callaway-Sant&amp;rsquo;Anna.&lt;/strong> Add covariates to the staggered data (e.g., unit-level characteristics) and pass them via the &lt;code>covariates&lt;/code> argument in &lt;code>CallawaySantAnna().fit()&lt;/code>. Compare the ATT with and without covariate adjustment. When does covariate adjustment matter most?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Sun-Abraham comparison.&lt;/strong> Estimate the staggered treatment effect using &lt;code>SunAbraham(control_group=&amp;quot;never_treated&amp;quot;)&lt;/code> instead of &lt;code>CallawaySantAnna()&lt;/code>. Compare the overall ATT and event study coefficients. Under what conditions do the two estimators differ?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>HonestDiD with finer M grid.&lt;/strong> Run the sensitivity analysis with &lt;code>M_values = np.arange(0, 15, 0.5)&lt;/code> to find the exact breakdown point. How does the breakdown change if you use &lt;code>method=&amp;quot;smoothness&amp;quot;&lt;/code> instead of &lt;code>&amp;quot;relative_magnitude&amp;quot;&lt;/code>?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://doi.org/10.1016/j.jeconom.2020.12.001" target="_blank" rel="noopener">Callaway, B. &amp;amp; Sant&amp;rsquo;Anna, P. H. C. (2021). Difference-in-Differences with Multiple Time Periods. &lt;em>Journal of Econometrics&lt;/em>, 225(2), 200&amp;ndash;230.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/igerber/diff-diff" target="_blank" rel="noopener">Gerber, I. (2026). diff-diff: Difference-in-Differences Causal Inference for Python. GitHub repository.&lt;/a> &amp;mdash; &lt;a href="https://diff-diff.readthedocs.io/en/stable/" target="_blank" rel="noopener">Documentation&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1016/j.jeconom.2021.03.014" target="_blank" rel="noopener">Goodman-Bacon, A. (2021). Difference-in-Differences with Variation in Treatment Timing. &lt;em>Journal of Econometrics&lt;/em>, 225(2), 254&amp;ndash;277.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1093/restud/rdad018" target="_blank" rel="noopener">Rambachan, A. &amp;amp; Roth, J. (2023). A More Credible Approach to Parallel Trends. &lt;em>Review of Economic Studies&lt;/em>, 90(5), 2555&amp;ndash;2591.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1257/aeri.20210236" target="_blank" rel="noopener">Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. &lt;em>American Economic Review: Insights&lt;/em>, 4(3), 305&amp;ndash;322.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1016/j.jeconom.2020.09.006" target="_blank" rel="noopener">Sun, L. &amp;amp; Abraham, S. (2021). Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects. &lt;em>Journal of Econometrics&lt;/em>, 225(2), 175&amp;ndash;199.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.jstor.org/stable/2118030" target="_blank" rel="noopener">Card, D. &amp;amp; Krueger, A. B. (1994). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania. &lt;em>American Economic Review&lt;/em>, 84(4), 772&amp;ndash;793.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://mixtape.scunning.com/09-difference_in_differences" target="_blank" rel="noopener">Cunningham, S. (2021). &lt;em>Causal Inference: The Mixtape&lt;/em>. Yale University Press. Chapter 9: Difference-in-Differences.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1257/aer.20181169" target="_blank" rel="noopener">de Chaisemartin, C. &amp;amp; D&amp;rsquo;Haultfoeuille, X. (2020). Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects. &lt;em>American Economic Review&lt;/em>, 110(9), 2964&amp;ndash;2996.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1037/h0037350" target="_blank" rel="noopener">Rubin, D. B. (1974). Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. &lt;em>Journal of Educational Psychology&lt;/em>, 66(5), 688&amp;ndash;701.&lt;/a>&lt;/li>
&lt;/ol></description></item><item><title>Heterogeneous treatment effects via two-stage DID</title><link>https://carlos-mendez.org/post/r_two_stage_did/</link><pubDate>Mon, 29 Jul 2024 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/r_two_stage_did/</guid><description>&lt;h2 id="homogeneous-treatment-effects">Homogeneous Treatment Effects&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>🎯 &lt;strong>Purpose&lt;/strong>:
Estimate treatment effects when the treatment is not randomly assigned.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>📉 &lt;strong>Parallel Trends Assumption&lt;/strong>:
In the absence of treatment, the treated and untreated groups would have followed parallel paths over time.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>🔄 &lt;strong>Two-Way Fixed-Effects (TWFE) Model&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Static Model&lt;/strong>:&lt;/li>
&lt;/ul>
&lt;p>$$
y_{igt} = \mu_g + \eta_t + \tau D_{gt} + \epsilon_{igt}
$$&lt;/p>
&lt;ul>
&lt;li>$ y_{igt} $: Outcome variable.&lt;/li>
&lt;li>$ i $: Individual.&lt;/li>
&lt;li>$ t $: Time.&lt;/li>
&lt;li>$ g $: Group.&lt;/li>
&lt;li>$ \mu_g $: Group fixed-effects.&lt;/li>
&lt;li>$ \eta_t $: Time fixed-effects.&lt;/li>
&lt;li>$ D_{gt} $: Indicator for treatment status.&lt;/li>
&lt;li>$ \tau $: Average treatment effect on the treated (ATT).&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>❗ &lt;strong>Limitations&lt;/strong>:
Assumes constant treatment effects across groups and time, which is often unrealistic.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="heterogeneous-treatment-effects">Heterogeneous Treatment Effects&lt;/h2>
&lt;ul>
&lt;li>🔄 &lt;strong>Enhanced TWFE Model&lt;/strong>:
$$
y_{igt} = \mu_g + \eta_t + \tau_{gt} D_{gt} + \epsilon_{igt}
$$
&lt;ul>
&lt;li>Allows treatment effects ($ \tau_{gt} $) to vary by group and time.&lt;/li>
&lt;li>Aggregates group-time average treatment effects into an overall average treatment effect ($ \tau $).&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="dynamic-event-study-twfe-model">Dynamic Event-Study TWFE Model&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>🔄 &lt;strong>Model&lt;/strong>:
$$
y_{igt} = \mu_g + \eta_t + \sum_{k=-L}^{-2} \tau_k D_{gt}^k + \sum_{k=0}^{K} \tau_k D_{gt}^k + \epsilon_{igt}
$$&lt;/p>
&lt;ul>
&lt;li>Allows for treatment effects to change over time.&lt;/li>
&lt;li>$ D_{gt}^k $: Lags and leads of treatment status.&lt;/li>
&lt;li>Coefficients ($ \tau_k $) represent the average effect of being treated for $ k $ periods.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>🎯 &lt;strong>Estimation Goals&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Objective&lt;/strong>: Estimate the average treatment effect of being exposed for $ k $ periods.&lt;/li>
&lt;li>&lt;strong>Average Treatment Effect&lt;/strong>:
$$
\tau_k = \sum_{g,t : t-g=k} \frac{N_{gt}}{N_k} \tau_{gt}
$$
&lt;ul>
&lt;li>$ N_{gt} $: Number of observations in group $ g $ and time $ t $.&lt;/li>
&lt;li>$ N_k $: Total number of observations with $ t - g = k $.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
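&lt;p>The aggregation above is a cell-size-weighted average of the group-time effects. A numeric sketch with made-up cell sizes and effects (all values hypothetical):&lt;/p>

```python
# tau_k as a weighted average of group-time effects tau_gt,
# weighted by cell sizes N_gt (hypothetical numbers).
cells = [(100, 2.0), (300, 3.0)]        # (N_gt, tau_gt) pairs with t - g = k
N_k = sum(n for n, _ in cells)
tau_k = sum(n * tau for n, tau in cells) / N_k
print(tau_k)  # 2.75
```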
&lt;h2 id="negative-weighting-problem">Negative Weighting Problem&lt;/h2>
&lt;ul>
&lt;li>❗ &lt;strong>Issue&lt;/strong>: With staggered adoption and heterogeneous effects, the TWFE estimate is a weighted average of group-time treatment effects in which some weights can be negative, so the overall estimate can be biased and may even fall outside the range of the underlying effects.&lt;/li>
&lt;li>🛠 &lt;strong>Solution by Gardner (2021)&lt;/strong>:
&lt;ul>
&lt;li>Use a two-stage approach to estimate group and time fixed-effects from untreated/not-yet-treated observations and then estimate treatment effects using residualized outcomes.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="two-stage-differences-in-differences">Two-stage differences in differences&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>🌱 &lt;strong>Gardner (2021) Approach&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>🔍 &lt;strong>Key Insight&lt;/strong>: Under parallel trends, group and time effects are identified from the untreated/not-yet-treated observations.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>📜 &lt;strong>Procedure&lt;/strong>:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>🥇 &lt;strong>First Stage&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Estimate the model:&lt;/p>
&lt;p>\begin{equation}
y_{igt} = \mu_g + \eta_t + \epsilon_{igt}
\end{equation}&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Using only untreated/not-yet-treated observations ($D_{gt} = 0$).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Obtain estimates for group and time effects ($\mu_g$ and $\eta_t$).&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>🥈 &lt;strong>Second Stage&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Regress adjusted outcomes ($y_{igt} - \hat{\mu}_g - \hat{\eta}_t$, using the first-stage estimates) on treatment status ($D_{gt}$) in the full sample to estimate treatment effects ($\tau$).&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>
&lt;p>🎯 &lt;strong>Rationale&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>The parallel trends assumption implies that residuals ($\epsilon_{igt}$) are uncorrelated with the treatment dummy, leading to a consistent estimator for the average treatment effect.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
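&lt;p>The two-stage procedure can be sketched end to end on simulated data. This is an illustrative implementation in plain NumPy, not the estimator from any particular package:&lt;/p>

```python
import numpy as np

# Two-stage DiD sketch: stage 1 fits group and time effects on
# untreated observations only; stage 2 regresses the residualized
# outcome on treatment status.
rng = np.random.default_rng(0)
G, T, tau_true = 20, 10, 2.0
g = np.repeat(np.arange(G), T)          # group index per observation
t = np.tile(np.arange(T), G)            # time index per observation
mu, eta = rng.normal(size=G), rng.normal(size=T)
# first 10 groups treated from period 5 onward
D = ((g // 10 == 0) * (t >= 5)).astype(float)
y = mu[g] + eta[t] + tau_true * D + rng.normal(scale=0.1, size=G * T)

# Stage 1: dummy-variable OLS on untreated cells; lstsq returns a
# minimum-norm solution despite the dummy-trap rank deficiency, and
# the fitted values are unique when groups and periods are connected.
X = np.hstack([np.eye(G)[g], np.eye(T)[t]])
m = D == 0
coef, *_ = np.linalg.lstsq(X[m], y[m], rcond=None)

# Stage 2: residualize and regress on D (no-intercept OLS slope)
y_adj = y - X @ coef
tau_hat = (y_adj @ D) / (D @ D)
print(round(tau_hat, 2))  # close to 2.0
```

&lt;p>Because the first stage never uses treated observations, the estimated fixed effects are uncontaminated by treatment, which is what removes the negative-weighting problem.&lt;/p>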
&lt;center>
&lt;div class="alert alert-note">
&lt;div>
Learn by coding using this &lt;a href="https://colab.research.google.com/drive/1A5zxj9SU8phTTCHBkt1fQkFX1xhFbycI?usp=sharing">Google Colab notebook&lt;/a>.
&lt;/div>
&lt;/div>
&lt;/center></description></item><item><title>Spatial Panel Regression in Stata: Cigarette Demand Across US States</title><link>https://carlos-mendez.org/post/stata_sp_regression_panel/</link><pubDate>Fri, 01 Dec 2023 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/stata_sp_regression_panel/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Cigarette taxation is a state-level policy instrument, but consumption in one state does not exist in isolation. When a state raises its tobacco tax, consumers near state borders may simply drive across to buy cheaper cigarettes in a neighboring state. This &lt;strong>cross-border shopping&lt;/strong> effect means that a state&amp;rsquo;s cigarette consumption depends not only on its own prices and income but also on the prices and income of its neighbors. Standard panel data models &amp;mdash; pooled OLS, fixed effects, and two-way fixed effects &amp;mdash; cannot capture these spatial spillovers because they treat each state as an independent observation.&lt;/p>
&lt;p>This tutorial introduces &lt;strong>spatial panel regression&lt;/strong> as a framework for modeling geographic interdependence in panel data. We use the classic Baltagi cigarette demand dataset, which tracks per-capita cigarette consumption, real prices, and real per-capita income across 46 US states from 1963 to 1992. Starting from non-spatial panel models as a baseline, we progressively build toward the &lt;strong>Spatial Durbin Model (SDM)&lt;/strong> &amp;mdash; a flexible specification that includes both the spatial lag of the dependent variable and spatial lags of the explanatory variables. We then use &lt;strong>Wald tests&lt;/strong> to determine whether simpler spatial models (SAR, SLX, or SEM) are adequate, and finally extend the framework to &lt;strong>dynamic spatial panels&lt;/strong> that account for habit persistence in cigarette consumption.&lt;/p>
&lt;p>All estimation is performed using the &lt;code>xsmle&lt;/code> package in Stata, which implements maximum likelihood estimation for a family of spatial panel models with fixed effects. The spatial weight matrix is a binary contiguity matrix that defines two states as neighbors if they share a common border, row-standardized so that the spatial lag of a variable equals the average value among a state&amp;rsquo;s neighbors.&lt;/p>
&lt;h3 id="learning-objectives">Learning objectives&lt;/h3>
&lt;ul>
&lt;li>Estimate non-spatial panel models (pooled OLS, region FE, time FE, two-way FE) and compare their price and income elasticities&lt;/li>
&lt;li>Construct and load a row-standardized spatial weight matrix for panel data in Stata&lt;/li>
&lt;li>Estimate the Spatial Durbin Model (SDM) with two-way fixed effects using the &lt;code>xsmle&lt;/code> package&lt;/li>
&lt;li>Apply the Lee and Yu bias correction for spatial panels with moderate time dimensions&lt;/li>
&lt;li>Use Wald tests to evaluate whether the SDM simplifies to SAR, SLX, or SEM&lt;/li>
&lt;li>Estimate dynamic spatial panel models with temporal and spatiotemporal lags to capture habit persistence&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="2-the-modeling-pipeline">2. The modeling pipeline&lt;/h2>
&lt;p>The tutorial follows a progressive approach &amp;mdash; each stage builds on the previous one by relaxing assumptions and adding complexity. The diagram below summarizes the path from data preparation through the final dynamic spatial models.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
A[&amp;quot;&amp;lt;b&amp;gt;Data &amp;amp; W&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Section 3&amp;lt;/i&amp;gt;&amp;lt;br/&amp;gt;Panel setup&amp;lt;br/&amp;gt;Weight matrix&amp;quot;]
B[&amp;quot;&amp;lt;b&amp;gt;Non-Spatial&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Section 4&amp;lt;/i&amp;gt;&amp;lt;br/&amp;gt;OLS, FE,&amp;lt;br/&amp;gt;Two-way FE&amp;quot;]
C[&amp;quot;&amp;lt;b&amp;gt;SDM&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Section 6&amp;lt;/i&amp;gt;&amp;lt;br/&amp;gt;Spatial Durbin&amp;lt;br/&amp;gt;+ Lee-Yu&amp;quot;]
D[&amp;quot;&amp;lt;b&amp;gt;Wald Tests&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Section 7&amp;lt;/i&amp;gt;&amp;lt;br/&amp;gt;SAR? SLX?&amp;lt;br/&amp;gt;SEM?&amp;quot;]
E[&amp;quot;&amp;lt;b&amp;gt;Dynamic&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Section 8&amp;lt;/i&amp;gt;&amp;lt;br/&amp;gt;Temporal &amp;amp;&amp;lt;br/&amp;gt;spatial lags&amp;quot;]
A --&amp;gt; B
B --&amp;gt; C
C --&amp;gt; D
D --&amp;gt; E
style A fill:#6a9bcc,stroke:#141413,color:#fff
style B fill:#d97757,stroke:#141413,color:#fff
style C fill:#00d4c8,stroke:#141413,color:#141413
style D fill:#141413,stroke:#d97757,color:#fff
style E fill:#6a9bcc,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>We first establish non-spatial benchmarks to understand the baseline price and income elasticities. Then we introduce the Spatial Durbin Model to capture spillovers, apply Wald tests to check whether a simpler spatial specification suffices, and finally add dynamic components to account for the habit-forming nature of cigarette consumption.&lt;/p>
&lt;hr>
&lt;h2 id="3-setup-and-data-loading">3. Setup and data loading&lt;/h2>
&lt;p>Before running any spatial models, we need three Stata packages: &lt;code>spmat&lt;/code> for spatial weight matrix management, &lt;code>xsmle&lt;/code> for spatial panel estimation, and &lt;code>spwmatrix&lt;/code> for weight matrix conversion. If you have not installed them, uncomment the &lt;code>net install&lt;/code> lines below.&lt;/p>
&lt;pre>&lt;code class="language-stata">clear all
macro drop _all
set more off
version 12
* Install packages (uncomment if needed)
*net install st0292, from(http://www.stata-journal.com/software/sj13-2)
*net install xsmle, from(http://fmwww.bc.edu/RePEc/bocode/x)
*net install spwmatrix, from(http://fmwww.bc.edu/RePEc/bocode/s)
&lt;/code>&lt;/pre>
&lt;h3 id="31-spatial-weight-matrix">3.1 Spatial weight matrix&lt;/h3>
&lt;p>The spatial weight matrix &lt;strong>W&lt;/strong> defines the neighborhood structure among the 46 US states. We use a binary contiguity matrix where two states are neighbors if they share a common border. The matrix is stored in a &lt;code>.dta&lt;/code> file and converted to an &lt;code>spmat&lt;/code> object with row-standardization &amp;mdash; meaning that each row sums to one, so the spatial lag of a variable equals the &lt;strong>weighted average&lt;/strong> among a state&amp;rsquo;s neighbors.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Load binary contiguity W matrix and convert to row-standardized spmat object
use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/cigar/Wct_bin.dta&amp;quot;, replace
spmat dta Wst m1-m46, norm(row) replace
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>spmat dta&lt;/code> command reads columns &lt;code>m1&lt;/code> through &lt;code>m46&lt;/code> from the loaded dataset and stores them as a spatial weight matrix object named &lt;code>Wst&lt;/code>. The &lt;code>norm(row)&lt;/code> option applies row-standardization, and &lt;code>replace&lt;/code> overwrites any existing matrix with the same name.&lt;/p>
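&lt;p>The row-standardization logic can be mirrored in a few lines of Python, using a toy 3-state contiguity matrix rather than the actual 46-state file:&lt;/p>

```python
import numpy as np

# Row-standardize a binary contiguity matrix so each row sums to one;
# the spatial lag W @ x is then the average of each unit's neighbors.
W_bin = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [1, 0, 0]], dtype=float)   # toy 3-state contiguity
W = W_bin / W_bin.sum(axis=1, keepdims=True)

x = np.array([10.0, 20.0, 30.0])
print(W @ x)  # [25. 10. 10.]
```

&lt;p>State 1 borders states 2 and 3, so its spatial lag is their average (25); states 2 and 3 each border only state 1, so their lag is simply its value.&lt;/p>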
&lt;h3 id="32-panel-data-setup">3.2 Panel data setup&lt;/h3>
&lt;p>The Baltagi cigarette demand dataset contains three variables measured across 46 US states and 30 years (1963&amp;ndash;1992): log per-capita cigarette consumption (&lt;code>logc&lt;/code>), log real cigarette price (&lt;code>logp&lt;/code>), and log real per-capita disposable income (&lt;code>logy&lt;/code>).&lt;/p>
&lt;pre>&lt;code class="language-stata">* Load panel data
use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/cigar/baltagi_cigar.dta&amp;quot;, clear
sort year state
xtset state year
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Panel variable: state (strongly balanced)
Time variable: year, 1963 to 1992
Delta: 1 unit
&lt;/code>&lt;/pre>
&lt;p>The panel is &lt;strong>strongly balanced&lt;/strong> &amp;mdash; all 46 states are observed in all 30 years, yielding 1,380 total observations. This balanced structure simplifies estimation and avoids the complications of missing data.&lt;/p>
&lt;h3 id="33-panel-summary-statistics">3.3 Panel summary statistics&lt;/h3>
&lt;p>The &lt;code>xtsum&lt;/code> command decomposes each variable&amp;rsquo;s variation into between-state and within-state components &amp;mdash; a key diagnostic for understanding what panel models can and cannot identify.&lt;/p>
&lt;pre>&lt;code class="language-stata">xtsum
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Variable | Mean Std. dev. Min Max | Observations
-----------------+--------------------------------------------+----------------
logc overall | 4.625563 .2538233 3.736352 5.399758 | N = 1380
between | .225498 4.057739 5.19628 | n = 46
within | .1254968 4.110718 5.070093 | T = 30
| |
logp overall | 3.648067 .3364439 2.579455 4.588055 | N = 1380
between | .1927783 3.22723 4.021831 | n = 46
within | .2798008 2.780289 4.372397 | T = 30
| |
logy overall | 1.615786 .248717 .8676362 2.253795 | N = 1380
between | .1363281 1.294913 2.063736 | n = 46
within | .2098697 1.035539 2.106283 | T = 30
&lt;/code>&lt;/pre>
&lt;h3 id="variables">Variables&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Mean&lt;/th>
&lt;th>Std. Dev.&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>logc&lt;/code>&lt;/td>
&lt;td>Log per-capita cigarette consumption (packs)&lt;/td>
&lt;td>4.626&lt;/td>
&lt;td>0.254&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>logp&lt;/code>&lt;/td>
&lt;td>Log real price per pack (cents)&lt;/td>
&lt;td>3.648&lt;/td>
&lt;td>0.336&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>logy&lt;/code>&lt;/td>
&lt;td>Log real per-capita disposable income&lt;/td>
&lt;td>1.616&lt;/td>
&lt;td>0.249&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Mean log consumption is 4.63, corresponding to roughly 102 packs per capita per year. The between-state standard deviation of &lt;code>logc&lt;/code> (0.225) is larger than the within-state standard deviation (0.125), indicating that cross-state differences in consumption levels are more pronounced than changes within a single state over time. For &lt;code>logp&lt;/code>, the pattern reverses &amp;mdash; within-state variation (0.280) exceeds between-state variation (0.193), reflecting the fact that real prices changed substantially over this 30-year period due to tax policy changes and inflation. This decomposition foreshadows why fixed effects models, which exploit within-state variation, may produce different elasticity estimates than pooled models.&lt;/p>
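&lt;p>The decomposition that &lt;code>xtsum&lt;/code> reports can be reproduced directly: overall variation splits into the dispersion of unit means (between) and the dispersion around those means (within). A sketch on simulated balanced-panel data; Stata applies slightly different degrees-of-freedom adjustments, but the logic is the same:&lt;/p>

```python
import numpy as np

# Between/within decomposition of a balanced panel variable:
# overall variance = variance of unit means + variance within units.
rng = np.random.default_rng(1)
x = rng.normal(size=(46, 30)) + rng.normal(size=(46, 1))  # 46 units, 30 periods

unit_means = x.mean(axis=1, keepdims=True)
between_var = unit_means.var()        # dispersion of unit means
within_var = (x - unit_means).var()   # dispersion around unit means
overall_var = x.var()

print(np.isclose(overall_var, between_var + within_var))  # True
```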
&lt;hr>
&lt;h2 id="4-non-spatial-panel-models">4. Non-spatial panel models&lt;/h2>
&lt;p>Before introducing spatial dependence, we estimate four standard panel specifications to establish baseline price and income elasticities. Each model relaxes a different assumption about unobserved heterogeneity, and comparing their estimates reveals how sensitive the results are to the treatment of state-level and time-level confounders.&lt;/p>
&lt;h3 id="41-pooled-ols">4.1 Pooled OLS&lt;/h3>
&lt;p>Pooled OLS treats all 1,380 observations as independent, ignoring the panel structure entirely. It provides a naive benchmark.&lt;/p>
&lt;pre>&lt;code class="language-stata">reg logc logp logy
estimates store pool
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Source | SS df MS Number of obs = 1,380
-------------+---------------------------------- F(2, 1377) = 199.28
Model | 21.564818 2 10.7824090 Prob &amp;gt; F = 0.0000
Residual | 74.518523 1,377 .054116576 R-squared = 0.2244
-------------+---------------------------------- Adj R-squared = 0.2233
Total | 96.083341 1,379 .069676098 Root MSE = .23284
------------------------------------------------------------------------------
logc | Coefficient Std. err. t P&amp;gt;|t| [95% conf. interval]
-------------+----------------------------------------------------------------
logp | -.3857227 .0309752 -12.45 0.000 -.4464987 -.3249467
logy | .3724439 .0264568 14.08 0.000 .3205328 .4243551
_cons | 4.396312 .0531992 82.64 0.000 4.291951 4.500674
------------------------------------------------------------------------------
&lt;/code>&lt;/pre>
&lt;p>Pooled OLS estimates a price elasticity of &lt;strong>-0.386&lt;/strong> and an income elasticity of &lt;strong>0.372&lt;/strong>, both statistically significant at the 1% level. However, the R-squared is only 0.224, and more importantly, this model assumes no systematic differences across states &amp;mdash; an untenable assumption given the large between-state variation we observed in the summary statistics.&lt;/p>
&lt;h3 id="42-region-fixed-effects">4.2 Region fixed effects&lt;/h3>
&lt;p>Region (state) fixed effects control for all time-invariant state characteristics &amp;mdash; geographic location, cultural attitudes toward smoking, historical tobacco production, and any other state-specific factor that does not change over the sample period.&lt;/p>
&lt;pre>&lt;code class="language-stata">xtreg logc logp logy, fe
estimates store rfe
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Fixed-effects (within) regression Number of obs = 1,380
Group variable: state Number of groups = 46
R-squared: Obs per group:
Within = 0.4059 min = 30
Between = 0.0681 avg = 30.0
Overall = 0.1050 max = 30
F(2,1332) = 455.52
corr(u_i, Xb) = -0.8072 Prob &amp;gt; F = 0.0000
------------------------------------------------------------------------------
logc | Coefficient Std. err. t P&amp;gt;|t| [95% conf. interval]
-------------+----------------------------------------------------------------
logp | -.2307217 .0276419 -8.35 0.000 -.2849426 -.1765008
logy | -.0145419 .0389849 -0.37 0.709 -.0910300 .0619462
_cons | 4.619736 .0542965 85.09 0.000 4.513180 4.726293
------------------------------------------------------------------------------
sigma_u | .21834832
sigma_e | .09498463
rho | .84090063 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(45, 1332) = 85.78 Prob &amp;gt; F = 0.0000
&lt;/code>&lt;/pre>
&lt;p>After controlling for state fixed effects, the price elasticity drops to &lt;strong>-0.231&lt;/strong> &amp;mdash; substantially smaller in magnitude than the pooled OLS estimate of -0.386. This difference reveals that much of the apparent price sensitivity in pooled OLS was driven by &lt;strong>cross-state composition effects&lt;/strong>: low-price states tend to have higher consumption for reasons unrelated to price (e.g., tobacco-producing states have both lower prices and stronger smoking cultures). The income elasticity becomes statistically insignificant at &lt;strong>-0.015&lt;/strong> (p = 0.709), suggesting that within-state income changes over time do not strongly predict consumption changes once state-level heterogeneity is absorbed. The F-test for joint significance of state fixed effects is overwhelming (F = 85.78, p &amp;lt; 0.001), confirming that state heterogeneity is substantial.&lt;/p>
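&lt;p>What &lt;code>xtreg, fe&lt;/code> does under the hood is the within transformation: demean each variable by its unit mean and run OLS on the deviations. A sketch on simulated data (not the cigarette panel) in which the regressor is deliberately correlated with the unit effects:&lt;/p>

```python
import numpy as np

# Within (fixed-effects) estimator sketch: demeaning by unit removes
# the unit effects, so OLS on deviations recovers the slope even when
# the regressor is correlated with those effects.
rng = np.random.default_rng(2)
N, T, beta_true = 46, 30, -0.4
alpha = rng.normal(size=(N, 1))                 # unit fixed effects
x = rng.normal(size=(N, T)) + alpha             # regressor correlated with alpha
y = alpha + beta_true * x + rng.normal(scale=0.1, size=(N, T))

x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
beta_hat = (x_dm * y_dm).sum() / (x_dm ** 2).sum()
print(round(beta_hat, 2))  # close to -0.4
```

&lt;p>Pooled OLS on the raw data would be pulled toward the unit effects; the within transformation removes that contamination, which is exactly why the region-FE elasticity differs from the pooled one.&lt;/p>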
&lt;h3 id="43-time-fixed-effects">4.3 Time fixed effects&lt;/h3>
&lt;p>Time fixed effects control for shocks common to all states in a given year &amp;mdash; federal anti-smoking campaigns, national health reports (such as the 1964 Surgeon General&amp;rsquo;s report), and macroeconomic fluctuations.&lt;/p>
&lt;pre>&lt;code class="language-stata">reg logc logp logy i.year
estimates store tfe
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Source | SS df MS Number of obs = 1,380
-------------+---------------------------------- F(31, 1348) = 41.04
Model | 48.7107267 31 1.57131054 Prob &amp;gt; F = 0.0000
Residual | 47.3726143 1,348 .03514290 R-squared = 0.5070
-------------+---------------------------------- Adj R-squared = 0.4957
Total | 96.083341 1,379 .069676098 Root MSE = .18747
------------------------------------------------------------------------------
logc | Coefficient Std. err. t P&amp;gt;|t| [95% conf. interval]
-------------+----------------------------------------------------------------
logp | -.8612867 .0389729 -22.10 0.000 -.9377676 -.7848058
logy | .8045032 .0466019 17.26 0.000 .7130647 .8959417
_cons | 3.958816 .0638297 62.02 0.000 3.833551 4.084081
------------------------------------------------------------------------------
&lt;/code>&lt;/pre>
&lt;p>With time fixed effects, the price elasticity jumps to &lt;strong>-0.861&lt;/strong> and the income elasticity to &lt;strong>0.805&lt;/strong> &amp;mdash; both much larger in magnitude than the pooled OLS estimates. By removing common year-level trends (such as the secular decline in smoking rates after the Surgeon General&amp;rsquo;s report), the model isolates cross-state differences in a given year. The R-squared increases to 0.507, a substantial improvement over pooled OLS.&lt;/p>
&lt;h3 id="44-two-way-fixed-effects">4.4 Two-way fixed effects&lt;/h3>
&lt;p>Two-way fixed effects combine state and time dummies, controlling simultaneously for state-specific time-invariant factors and year-specific common shocks. This is the most thorough non-spatial specification and serves as our benchmark.&lt;/p>
&lt;pre>&lt;code class="language-stata">xtreg logc logp logy i.year, fe
estimates store rtfe
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Fixed-effects (within) regression Number of obs = 1,380
Group variable: state Number of groups = 46
R-squared: Obs per group:
Within = 0.7891 min = 30
Between = 0.0121 avg = 30.0
Overall = 0.0456 max = 30
F(31,1303) = 157.60
corr(u_i, Xb) = -0.5688 Prob &amp;gt; F = 0.0000
------------------------------------------------------------------------------
logc | Coefficient Std. err. t P&amp;gt;|t| [95% conf. interval]
-------------+----------------------------------------------------------------
logp | -.4020279 .0272553 -14.75 0.000 -.4555018 -.3485541
logy | .1193476 .0478095 2.50 0.013 .0255202 .2131749
_cons | 4.515994 .0533810 84.59 0.000 4.411254 4.620733
------------------------------------------------------------------------------
sigma_u | .21428785
sigma_e | .05601281
rho | .93607854 (fraction of variance due to u_i)
------------------------------------------------------------------------------
&lt;/code>&lt;/pre>
&lt;p>The two-way FE model yields a price elasticity of &lt;strong>-0.402&lt;/strong> and an income elasticity of &lt;strong>0.119&lt;/strong>. The within R-squared is 0.789, a dramatic improvement over the region-only FE model (0.406), indicating that year effects absorb a large share of temporal variation. The price elasticity is roughly intermediate between the region-FE (-0.231) and time-FE (-0.861) estimates, illustrating how the choice of fixed effects changes the identifying variation and the resulting elasticity.&lt;/p>
&lt;h3 id="45-comparison-of-non-spatial-models">4.5 Comparison of non-spatial models&lt;/h3>
&lt;pre>&lt;code class="language-stata">estimates table pool rfe tfe rtfe, b(%7.2f) star(0.1 0.05 0.01) stf(%9.0f)
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>Pooled OLS&lt;/th>
&lt;th>Region FE&lt;/th>
&lt;th>Time FE&lt;/th>
&lt;th>Two-way FE&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>logp&lt;/code>&lt;/td>
&lt;td>-0.39***&lt;/td>
&lt;td>-0.23***&lt;/td>
&lt;td>-0.86***&lt;/td>
&lt;td>-0.40***&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>logy&lt;/code>&lt;/td>
&lt;td>0.37***&lt;/td>
&lt;td>-0.01&lt;/td>
&lt;td>0.80***&lt;/td>
&lt;td>0.12**&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>R-sq&lt;/td>
&lt;td>0.224&lt;/td>
&lt;td>0.406&lt;/td>
&lt;td>0.507&lt;/td>
&lt;td>0.789&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The four specifications tell a coherent story: price has a &lt;strong>consistently negative&lt;/strong> effect on cigarette consumption, but the magnitude varies from -0.23 (region FE) to -0.86 (time FE) depending on which sources of variation are exploited. The two-way FE estimate of -0.40 is the most credible non-spatial benchmark because it controls for both state heterogeneity and common time trends. However, all four models assume that each state&amp;rsquo;s consumption depends only on its &lt;strong>own&lt;/strong> price and income &amp;mdash; an assumption we will relax in the next section.&lt;/p>
&lt;hr>
&lt;h2 id="5-why-spatial-models">5. Why spatial models?&lt;/h2>
&lt;p>Even with two-way fixed effects, the models above ignore a potentially important channel: &lt;strong>spatial spillovers&lt;/strong>. If Virginia raises its cigarette tax, smokers in bordering states might change their behavior too &amp;mdash; either because they no longer cross into Virginia to buy cheaper cigarettes, or because Virginia&amp;rsquo;s policy signals a broader regional trend. Similarly, a rise in income in one state may increase consumption in neighboring states through commuting, trade, and social networks.&lt;/p>
&lt;p>The &lt;strong>Spatial Durbin Model (SDM)&lt;/strong> is a flexible framework that captures these spillovers through two channels:&lt;/p>
&lt;p>$$y_{it} = \rho \sum_{j=1}^{N} w_{ij} y_{jt} + x_{it} \beta + \sum_{j=1}^{N} w_{ij} x_{jt} \theta + \mu_i + \lambda_t + \varepsilon_{it}$$&lt;/p>
&lt;p>In words, this equation says that cigarette consumption in state $i$ at time $t$ depends on three spatial components: (1) the &lt;strong>spatial lag of the dependent variable&lt;/strong> $\rho W y$ &amp;mdash; how much a state&amp;rsquo;s consumption is influenced by its neighbors' consumption, (2) the &lt;strong>own effects&lt;/strong> of price and income $X \beta$, and (3) the &lt;strong>spatial lags of the explanatory variables&lt;/strong> $W X \theta$ &amp;mdash; how neighbors' prices and incomes spill over. The parameters $\mu_i$ and $\lambda_t$ are state and year fixed effects, respectively.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Symbol&lt;/th>
&lt;th>Meaning&lt;/th>
&lt;th>Code variable&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>$y_{it}$&lt;/td>
&lt;td>Log cigarette consumption in state $i$, year $t$&lt;/td>
&lt;td>&lt;code>logc&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\rho$&lt;/td>
&lt;td>Spatial autoregressive parameter (neighbor consumption effect)&lt;/td>
&lt;td>&lt;code>[Spatial]rho&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$w_{ij}$&lt;/td>
&lt;td>Element of the row-standardized weight matrix&lt;/td>
&lt;td>&lt;code>Wst&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$x_{it}$&lt;/td>
&lt;td>Own price and income&lt;/td>
&lt;td>&lt;code>logp&lt;/code>, &lt;code>logy&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\beta$&lt;/td>
&lt;td>Own-variable coefficients&lt;/td>
&lt;td>&lt;code>[Main]logp&lt;/code>, &lt;code>[Main]logy&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\theta$&lt;/td>
&lt;td>Spatial lag coefficients (neighbor effects of X)&lt;/td>
&lt;td>&lt;code>[Wx]logp&lt;/code>, &lt;code>[Wx]logy&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
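&lt;p>To build intuition for the $Wy$ and $WX$ terms, it helps to construct one spatial lag by hand. Below is a minimal sketch, assuming &lt;code>Wst&lt;/code> was created as an &lt;code>spmat&lt;/code> object (as &lt;code>xsmle&lt;/code> expects) and that 1980 is one of the sample years; the variable name &lt;code>W_logp&lt;/code> is our own.&lt;/p>
&lt;pre>&lt;code class="language-stata">* For a single cross-section of the panel, compute the spatial lag of
* logp by hand: for each state, W_logp is the weighted average of its
* neighbors' log prices -- the regressor that [Wx]logp multiplies.
preserve
keep if year == 1980
spmat lag double W_logp Wst logp
summarize logp W_logp
restore
&lt;/code>&lt;/pre>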
&lt;p>A key advantage of the SDM is that it &lt;strong>nests&lt;/strong> three simpler spatial models as special cases. This means we can start with the general SDM and then test whether the data supports reducing it to a simpler specification.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
SDM[&amp;quot;&amp;lt;b&amp;gt;Spatial Durbin Model (SDM)&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;y = ρWy + Xβ + WXθ + ε&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Most general&amp;lt;/i&amp;gt;&amp;quot;]
SAR[&amp;quot;&amp;lt;b&amp;gt;SAR&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;y = ρWy + Xβ + ε&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;θ = 0&amp;lt;/i&amp;gt;&amp;quot;]
SLX[&amp;quot;&amp;lt;b&amp;gt;SLX&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;y = Xβ + WXθ + ε&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;ρ = 0&amp;lt;/i&amp;gt;&amp;quot;]
SEM[&amp;quot;&amp;lt;b&amp;gt;SEM&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;y = Xβ + u, u = λWu + ε&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;θ + ρβ = 0&amp;lt;/i&amp;gt;&amp;quot;]
SDM --&amp;gt;|&amp;quot;θ = 0?&amp;quot;| SAR
SDM --&amp;gt;|&amp;quot;ρ = 0?&amp;quot;| SLX
SDM --&amp;gt;|&amp;quot;θ + ρβ = 0?&amp;quot;| SEM
style SDM fill:#00d4c8,stroke:#141413,color:#141413
style SAR fill:#6a9bcc,stroke:#141413,color:#fff
style SLX fill:#d97757,stroke:#141413,color:#fff
style SEM fill:#141413,stroke:#d97757,color:#fff
&lt;/code>&lt;/pre>
&lt;p>The &lt;strong>SAR&lt;/strong> (Spatial Autoregressive) model restricts $\theta = 0$, assuming that only neighbors' consumption (not their prices or incomes) matters. The &lt;strong>SLX&lt;/strong> (Spatial Lag of X) model restricts $\rho = 0$, assuming that neighbors' characteristics affect local consumption but there is no autoregressive feedback. The &lt;strong>SEM&lt;/strong> (Spatial Error Model) imposes the common factor restriction $\theta + \rho \beta = 0$, implying that spatial dependence operates entirely through correlated errors rather than substantive spillovers. In Section 7, we will use Wald tests to determine which, if any, of these restrictions the data supports.&lt;/p>
&lt;hr>
&lt;h2 id="6-spatial-durbin-model-sdm">6. Spatial Durbin Model (SDM)&lt;/h2>
&lt;h3 id="61-sdm-with-two-way-fixed-effects">6.1 SDM with two-way fixed effects&lt;/h3>
&lt;p>We now estimate the full Spatial Durbin Model with both state and year fixed effects. The &lt;code>xsmle&lt;/code> command performs maximum likelihood estimation for spatial panel models. The option &lt;code>type(both)&lt;/code> specifies two-way fixed effects, &lt;code>mod(sdm)&lt;/code> selects the Spatial Durbin specification, &lt;code>effects&lt;/code> requests the direct and indirect effects, and &lt;code>nsim(999)&lt;/code> computes their standard errors from 999 Monte Carlo simulations.&lt;/p>
&lt;pre>&lt;code class="language-stata">xsmle logc logp logy, fe type(both) wmat(Wst) mod(sdm) effects nsim(999) nolog
estimates store sdm1
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Spatial Durbin model with fixed-effects Number of obs = 1,380
Group variable: state Number of groups = 46
Time variable: year
Obs per group:
min = 30
avg = 30.0
max = 30
Wald chi2(4) = 379.19
Log-likelihood = 1971.5204 Prob &amp;gt; chi2 = 0.0000
------------------------------------------------------------------------------
logc | Coefficient Std. err. z P&amp;gt;|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Main |
logp | -.3068973 .0282114 -10.88 0.000 -.3621907 -.2516039
logy | .0781427 .0481269 1.62 0.104 -.0161843 .1724697
-------------+----------------------------------------------------------------
Wx |
logp | -.2060671 .0649703 -3.17 0.002 -.3334065 -.0787277
logy | .1803542 .0885162 2.04 0.042 .0068656 .3538428
-------------+----------------------------------------------------------------
Spatial |
rho | .2649571 .0327948 8.08 0.000 .2006804 .3292339
-------------+----------------------------------------------------------------
sigma2_e| .0027866
------------------------------------------------------------------------------
logp |
Direct | -.3131508 .0285649 -10.96 0.000 -.3691370 -.2571645
Indirect | -.3138174 .0812337 -3.86 0.000 -.4730325 -.1546023
Total | -.6269682 .0866710 -7.23 0.000 -.7968403 -.4570961
logy |
Direct | .0941302 .0488720 1.93 0.054 -.0016572 .1899176
Indirect | .2683417 .1099814 2.44 0.015 .0527821 .4839013
Total | .3624719 .1216523 2.98 0.003 .1240378 .6009060
&lt;/code>&lt;/pre>
&lt;p>The spatial autoregressive parameter $\rho$ is &lt;strong>0.265&lt;/strong> (z = 8.08, p &amp;lt; 0.001), indicating substantial positive spatial dependence &amp;mdash; states with higher-consuming neighbors tend to consume more themselves, even after controlling for own prices and income. The own price coefficient (&lt;code>[Main]logp&lt;/code>) is -0.307, while the spatial lag of neighbors' prices (&lt;code>[Wx]logp&lt;/code>) is -0.206, meaning that higher prices in neighboring states also reduce local consumption. This is consistent with the cross-border shopping hypothesis: when neighbors' prices rise, there are fewer opportunities for local consumers to shop across borders, reinforcing the local price effect.&lt;/p>
&lt;p>The &lt;strong>direct effect&lt;/strong> of price is -0.313, meaning that a 1% increase in a state&amp;rsquo;s own price reduces its consumption by 0.31%. The &lt;strong>indirect (spillover) effect&lt;/strong> of price is -0.314, nearly as large as the direct effect. This means that when all neighboring states raise prices by 1%, the resulting reduction in consumption in the focal state is comparable to the state raising its own price. The &lt;strong>total effect&lt;/strong> of price is -0.627 &amp;mdash; much larger than the two-way FE estimate of -0.402, revealing that non-spatial models substantially underestimate the true price sensitivity of cigarette demand.&lt;/p>
&lt;h3 id="62-lee-and-yu-bias-correction">6.2 Lee and Yu bias correction&lt;/h3>
&lt;p>In spatial panels with fixed effects, the maximum likelihood estimator suffers from the &lt;strong>incidental parameters problem&lt;/strong> &amp;mdash; the number of fixed effect parameters grows with the number of states, which introduces a bias term of order $1/T$. With $T = 30$ years, this bias may be non-negligible. Lee and Yu (2010) proposed a bias correction procedure that adjusts the ML estimates to eliminate the leading bias term.&lt;/p>
&lt;pre>&lt;code class="language-stata">xsmle logc logp logy, fe type(both) leeyu wmat(Wst) mod(sdm) effects nsim(999) nolog
estimates store sdm2
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Spatial Durbin model with fixed-effects (Lee-Yu) Number of obs = 1,334
Group variable: state Number of groups = 46
Time variable: year
Obs per group:
min = 29
avg = 29.0
max = 29
Wald chi2(4) = 392.50
Log-likelihood = 1932.4681 Prob &amp;gt; chi2 = 0.0000
------------------------------------------------------------------------------
logc | Coefficient Std. err. z P&amp;gt;|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Main |
logp | -.3044782 .0283901 -10.72 0.000 -.3601218 -.2488346
logy | .0770150 .0486311 1.58 0.113 -.0183001 .1723301
-------------+----------------------------------------------------------------
Wx |
logp | -.2083124 .0654876 -3.18 0.001 -.3366657 -.0799591
logy | .1869831 .0894718 2.09 0.037 .0116216 .3623446
-------------+----------------------------------------------------------------
Spatial |
rho | .2596348 .0332441 7.81 0.000 .1944776 .3247920
-------------+----------------------------------------------------------------
sigma2_e| .0027512
------------------------------------------------------------------------------
logp |
Direct | -.3104271 .0287814 -10.79 0.000 -.3668377 -.2540166
Indirect | -.3122946 .0825781 -3.78 0.000 -.4741447 -.1504446
Total | -.6227218 .0878439 -7.09 0.000 -.7948927 -.4505509
logy |
Direct | .0935487 .0494610 1.89 0.059 -.0033931 .1904905
Indirect | .2739264 .1115282 2.46 0.014 .0553351 .4925177
Total | .3674751 .1235608 2.97 0.003 .1253004 .6096498
&lt;/code>&lt;/pre>
&lt;p>The Lee-Yu correction uses $N \times (T-1) = 46 \times 29 = 1{,}334$ observations (one time period is lost in the transformation). The corrected estimates are very close to the uncorrected ones: $\rho$ changes from 0.265 to &lt;strong>0.260&lt;/strong>, the own price coefficient from -0.307 to -0.304, and the total price effect from -0.627 to &lt;strong>-0.623&lt;/strong>. This stability is reassuring &amp;mdash; with $T = 30$, the bias is already small. The closeness of the two sets of estimates provides confidence that the standard ML estimates are reliable for this dataset.&lt;/p>
&lt;h3 id="63-comparison">6.3 Comparison&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>SDM (standard)&lt;/th>
&lt;th>SDM (Lee-Yu)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>$\rho$&lt;/td>
&lt;td>0.265***&lt;/td>
&lt;td>0.260***&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>logp&lt;/code> (own)&lt;/td>
&lt;td>-0.307***&lt;/td>
&lt;td>-0.304***&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>logy&lt;/code> (own)&lt;/td>
&lt;td>0.078&lt;/td>
&lt;td>0.077&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>W*logp&lt;/code> (neighbors)&lt;/td>
&lt;td>-0.206***&lt;/td>
&lt;td>-0.208***&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>W*logy&lt;/code> (neighbors)&lt;/td>
&lt;td>0.180**&lt;/td>
&lt;td>0.187**&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Direct price effect&lt;/td>
&lt;td>-0.313***&lt;/td>
&lt;td>-0.310***&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Indirect price effect&lt;/td>
&lt;td>-0.314***&lt;/td>
&lt;td>-0.312***&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Total price effect&lt;/td>
&lt;td>-0.627***&lt;/td>
&lt;td>-0.623***&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The two sets of estimates are nearly identical, confirming that the incidental parameters bias is negligible with 30 time periods. For the remainder of this tutorial, we use the Lee-Yu corrected estimates as our preferred specification.&lt;/p>
&lt;hr>
&lt;h2 id="7-wald-specification-tests">7. Wald specification tests&lt;/h2>
&lt;p>The SDM is the most general model in the spatial panel family, nesting SAR, SLX, and SEM as special cases. Before accepting the full SDM, we should test whether the data supports a simpler specification. We do this by testing the parameter restrictions that define each nested model. If the restrictions are rejected, the simpler model is inadequate and we should retain the SDM.&lt;/p>
&lt;p>We first re-estimate the SDM with the Lee-Yu correction (the &lt;code>quietly&lt;/code> prefix suppresses output since we already displayed these results).&lt;/p>
&lt;pre>&lt;code class="language-stata">quietly xsmle logc logp logy, fe type(both) leeyu wmat(Wst) mod(sdm) effects nsim(999) nolog
&lt;/code>&lt;/pre>
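&lt;p>Since the same fit was already stored as &lt;code>sdm2&lt;/code> in Section 6, an equivalent and faster alternative is to restore the stored results rather than re-estimate, assuming they are still in memory:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Equivalent to the quiet re-estimation above: make the stored Lee-Yu
* SDM results the active estimates, so test and testnl can use them.
estimates restore sdm2
&lt;/code>&lt;/pre>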
&lt;h3 id="71-can-the-sdm-reduce-to-sar">7.1 Can the SDM reduce to SAR?&lt;/h3>
&lt;p>The SAR model restricts $\theta = 0$ &amp;mdash; that is, the spatial lags of the explanatory variables are zero. Under SAR, only neighbors' consumption matters, not their prices or incomes directly. We test this with a joint Wald test on the &lt;code>[Wx]&lt;/code> coefficients.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Wald test: Reduce to SAR? (NO if p &amp;lt; 0.05)
test ([Wx]logp = 0) ([Wx]logy = 0)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> ( 1) [Wx]logp = 0
( 2) [Wx]logy = 0
chi2( 2) = 12.87
Prob &amp;gt; chi2 = 0.0016
&lt;/code>&lt;/pre>
&lt;p>The Wald test &lt;strong>rejects&lt;/strong> the SAR restriction (chi2 = 12.87, p = 0.002). This means that neighbors' prices and incomes have direct effects on local consumption beyond their influence through the spatial lag of consumption. Dropping the $WX$ terms from the model would misspecify the spatial dependence structure.&lt;/p>
&lt;h3 id="72-can-the-sdm-reduce-to-slx">7.2 Can the SDM reduce to SLX?&lt;/h3>
&lt;p>The SLX model restricts $\rho = 0$ &amp;mdash; there is no spatial autoregressive feedback through the dependent variable. Under SLX, neighbors' characteristics affect local consumption directly, but the spatial multiplier effect (where shocks propagate through the network) is absent.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Wald test: Reduce to SLX? (NO if p &amp;lt; 0.05)
test ([Spatial]rho = 0)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> ( 1) [Spatial]rho = 0
chi2( 1) = 61.04
Prob &amp;gt; chi2 = 0.0000
&lt;/code>&lt;/pre>
&lt;p>The Wald test &lt;strong>overwhelmingly rejects&lt;/strong> the SLX restriction (chi2 = 61.04, p &amp;lt; 0.001). The spatial autoregressive parameter $\rho$ is far from zero, confirming that there is a genuine feedback mechanism: a shock to consumption in one state propagates to its neighbors, which in turn affects their neighbors, creating a spatial multiplier.&lt;/p>
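&lt;p>The size of this multiplier can be gauged from $\rho$ alone: with a row-standardized $W$, a uniform unit shock is amplified to $1/(1-\rho)$ in the aggregate. A back-of-envelope check with the Lee-Yu estimate:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Aggregate spatial multiplier implied by rho = 0.260: each unit
* shock grows to about 1.35 units once neighbor feedback is summed.
display 1 / (1 - 0.260)
&lt;/code>&lt;/pre>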
&lt;h3 id="73-can-the-sdm-reduce-to-sem">7.3 Can the SDM reduce to SEM?&lt;/h3>
&lt;p>The SEM (Spatial Error Model) imposes the common factor restriction $\theta + \rho \beta = 0$. Under this restriction, the spatial dependence is purely a &lt;strong>nuisance&lt;/strong> &amp;mdash; it enters through correlated error terms rather than through substantive economic spillovers. If SEM is adequate, the apparent spillover effects are an artifact of omitted spatially correlated variables, not genuine cross-border interactions.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Wald test: Reduce to SEM? (NO if p &amp;lt; 0.05)
testnl ([Wx]logp = -[Spatial]rho*[Main]logp) ([Wx]logy = -[Spatial]rho*[Main]logy)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> (1) [Wx]logp = -[Spatial]rho*[Main]logp
(2) [Wx]logy = -[Spatial]rho*[Main]logy
chi2( 2) = 8.49
Prob &amp;gt; chi2 = 0.0143
&lt;/code>&lt;/pre>
&lt;p>The Wald test &lt;strong>rejects&lt;/strong> the SEM common factor restriction (chi2 = 8.49, p = 0.014). The spatial dependence in cigarette demand is not merely a nuisance in the error term &amp;mdash; it reflects &lt;strong>substantive economic spillovers&lt;/strong> across state borders. This is exactly what economic theory predicts: cross-border shopping creates genuine causal links between neighboring states' prices and local consumption.&lt;/p>
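&lt;p>As a sanity check on this rejection, we can evaluate the common-factor quantity $\theta + \rho \beta$ for price directly from the coefficients still in memory; under SEM it should be close to zero. The equation-qualified names below follow the &lt;code>xsmle&lt;/code> output:&lt;/p>
&lt;pre>&lt;code class="language-stata">* theta + rho*beta for logp from the Lee-Yu SDM fit: roughly
* -0.208 + 0.260*(-0.304), clearly far from zero, matching the test.
display _b[Wx:logp] + _b[Spatial:rho]*_b[Main:logp]
&lt;/code>&lt;/pre>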
&lt;h3 id="74-summary-of-specification-tests">7.4 Summary of specification tests&lt;/h3>
&lt;pre>&lt;code class="language-mermaid">graph TD
SDM[&amp;quot;&amp;lt;b&amp;gt;Spatial Durbin Model (SDM)&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;RETAINED&amp;quot;]
SAR[&amp;quot;&amp;lt;b&amp;gt;SAR&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;θ = 0&amp;lt;br/&amp;gt;Rejected&amp;lt;br/&amp;gt;p = 0.002&amp;quot;]
SLX[&amp;quot;&amp;lt;b&amp;gt;SLX&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;ρ = 0&amp;lt;br/&amp;gt;Rejected&amp;lt;br/&amp;gt;p &amp;lt; 0.001&amp;quot;]
SEM[&amp;quot;&amp;lt;b&amp;gt;SEM&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;θ + ρβ = 0&amp;lt;br/&amp;gt;Rejected&amp;lt;br/&amp;gt;p = 0.014&amp;quot;]
SDM --&amp;gt;|&amp;quot;chi2 = 12.87&amp;quot;| SAR
SDM --&amp;gt;|&amp;quot;chi2 = 61.04&amp;quot;| SLX
SDM --&amp;gt;|&amp;quot;chi2 = 8.49&amp;quot;| SEM
style SDM fill:#00d4c8,stroke:#141413,color:#141413
style SAR fill:#d97757,stroke:#141413,color:#fff
style SLX fill:#d97757,stroke:#141413,color:#fff
style SEM fill:#d97757,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>All three Wald tests reject the restricted models. The SDM cannot be simplified to SAR (neighbors' X variables matter), SLX (the autoregressive feedback matters), or SEM (the spatial dependence is substantive, not a nuisance). The &lt;strong>full SDM is the appropriate specification&lt;/strong> for modeling cigarette demand across US states. This result confirms that spatial spillovers in cigarette consumption operate through multiple channels simultaneously: direct cross-border effects of neighbors' prices and incomes, and feedback effects through the spatial lag of consumption itself.&lt;/p>
&lt;hr>
&lt;h2 id="8-dynamic-spatial-panel-models">8. Dynamic spatial panel models&lt;/h2>
&lt;p>Cigarette consumption is well known to be &lt;strong>habit-forming&lt;/strong> &amp;mdash; past consumption is a strong predictor of current consumption because of nicotine addiction. Standard (static) spatial models ignore this temporal persistence, which may bias the spatial parameter estimates. Dynamic spatial panel models extend the SDM by including lagged values of consumption, allowing us to separate habit persistence from spatial spillovers.&lt;/p>
&lt;p>The &lt;code>xsmle&lt;/code> package supports three dynamic specifications through the &lt;code>dlag()&lt;/code> option:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;code>dlag()&lt;/code>&lt;/th>
&lt;th>Dynamic term added&lt;/th>
&lt;th>Interpretation&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>1&lt;/td>
&lt;td>$\tau \cdot y_{i,t-1}$&lt;/td>
&lt;td>Temporal lag: own past consumption&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>2&lt;/td>
&lt;td>$\psi \cdot \sum_j w_{ij} y_{j,t-1}$&lt;/td>
&lt;td>Spatiotemporal lag: neighbors' past consumption&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>3&lt;/td>
&lt;td>Both $\tau \cdot y_{i,t-1}$ and $\psi \cdot \sum_j w_{ij} y_{j,t-1}$&lt;/td>
&lt;td>Full dynamic: own + neighbors' past consumption&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The most general dynamic SDM (with &lt;code>dlag(3)&lt;/code>) extends the static equation from Section 5 by adding two lagged terms:&lt;/p>
&lt;p>$$y_{it} = \tau \, y_{i,t-1} + \psi \sum_{j=1}^{N} w_{ij} \, y_{j,t-1} + \rho \sum_{j=1}^{N} w_{ij} \, y_{jt} + x_{it} \beta + \sum_{j=1}^{N} w_{ij} \, x_{jt} \theta + \mu_i + \lambda_t + \varepsilon_{it}$$&lt;/p>
&lt;p>In words, this equation says that a state&amp;rsquo;s cigarette consumption depends on its &lt;strong>own past consumption&lt;/strong> ($\tau y_{i,t-1}$, capturing habit persistence), the &lt;strong>average past consumption of its neighbors&lt;/strong> ($\psi W y_{t-1}$, capturing spatiotemporal diffusion), and all the contemporaneous spatial terms from the static SDM. The parameter $\tau$ measures how strongly last year&amp;rsquo;s smoking predicts this year&amp;rsquo;s &amp;mdash; think of it as the &amp;ldquo;addiction coefficient.&amp;rdquo; The parameter $\psi$ captures whether neighbors' past behavior diffuses across borders over time.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Symbol&lt;/th>
&lt;th>Meaning&lt;/th>
&lt;th>Code variable&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>$\tau$&lt;/td>
&lt;td>Temporal lag (habit persistence)&lt;/td>
&lt;td>&lt;code>[Temporal]tau&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\psi$&lt;/td>
&lt;td>Spatiotemporal lag (neighbors' past consumption)&lt;/td>
&lt;td>&lt;code>[Temporal]psi&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$y_{i,t-1}$&lt;/td>
&lt;td>Own consumption last year&lt;/td>
&lt;td>&lt;code>dlag(1)&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$W y_{t-1}$&lt;/td>
&lt;td>Average neighbors' consumption last year&lt;/td>
&lt;td>&lt;code>dlag(2)&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="81-non-dynamic-sdm-baseline">8.1 Non-dynamic SDM (baseline)&lt;/h3>
&lt;p>We re-estimate the static SDM as a baseline for comparison with the dynamic specifications.&lt;/p>
&lt;pre>&lt;code class="language-stata">xsmle logc logp logy, fe type(both) wmat(Wst) mod(sdm) effects nsim(999) nolog
eststo SDM0
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Spatial Durbin model with fixed-effects Number of obs = 1,380
------------------------------------------------------------------------------
logc | Coefficient Std. err. z P&amp;gt;|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Main |
logp | -.3068973 .0282114 -10.88 0.000 -.3621907 -.2516039
logy | .0781427 .0481269 1.62 0.104 -.0161843 .1724697
Wx |
logp | -.2060671 .0649703 -3.17 0.002 -.3334065 -.0787277
logy | .1803542 .0885162 2.04 0.042 .0068656 .3538428
Spatial |
rho | .2649571 .0327948 8.08 0.000 .2006804 .3292339
------------------------------------------------------------------------------
&lt;/code>&lt;/pre>
&lt;h3 id="82-dynamic-sdm-with-temporal-lag-tau-cdot-y_it-1">8.2 Dynamic SDM with temporal lag ($\tau \cdot y_{i,t-1}$)&lt;/h3>
&lt;p>Adding the temporal lag of own consumption captures habit persistence &amp;mdash; the tendency for this year&amp;rsquo;s smoking to depend on last year&amp;rsquo;s smoking, holding prices and income constant.&lt;/p>
&lt;pre>&lt;code class="language-stata">xsmle logc logp logy, dlag(1) fe type(both) wmat(Wst) mod(sdm) effects nsim(999) nolog
eststo dySDM1
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Dynamic Spatial Durbin model with fixed-effects Number of obs = 1,334
------------------------------------------------------------------------------
logc | Coefficient Std. err. z P&amp;gt;|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Main |
logp | -.1516305 .0226714 -6.69 0.000 -.1960657 -.1071954
logy | .0285493 .0376124 0.76 0.448 -.0451697 .1022683
Wx |
logp | -.0714289 .0521683 -1.37 0.171 -.1736769 .0308190
logy | .0592735 .0706984 0.84 0.402 -.0792929 .1978399
Spatial |
rho | .1021753 .0307624 3.32 0.001 .0418821 .1624685
Temporal |
tau | .6543218 .0196285 33.33 0.000 .6158507 .6927928
------------------------------------------------------------------------------
&lt;/code>&lt;/pre>
&lt;p>The temporal lag coefficient $\tau$ is &lt;strong>0.654&lt;/strong> (z = 33.33, p &amp;lt; 0.001) &amp;mdash; a very strong habit persistence effect. Controlling for last year&amp;rsquo;s consumption dramatically reduces the other coefficients: the own price effect drops from -0.307 to &lt;strong>-0.152&lt;/strong>, and the spatial autoregressive parameter $\rho$ falls from 0.265 to &lt;strong>0.102&lt;/strong>. This means that much of the apparent spatial dependence in the static SDM was actually capturing &lt;strong>temporal autocorrelation&lt;/strong> that manifests spatially. The spatial lag of neighbors' prices (&lt;code>[Wx]logp&lt;/code>) becomes insignificant (p = 0.171), suggesting that once habit persistence is controlled for, the direct cross-border price spillover weakens considerably.&lt;/p>
&lt;h3 id="83-dynamic-sdm-with-spatiotemporal-lag-psi-cdot-w-cdot-y_it-1">8.3 Dynamic SDM with spatiotemporal lag ($\psi \cdot W \cdot y_{i,t-1}$)&lt;/h3>
&lt;p>Instead of own past consumption, this specification includes the spatial lag of past consumption &amp;mdash; how much neighbors smoked last year.&lt;/p>
&lt;pre>&lt;code class="language-stata">xsmle logc logp logy, dlag(2) fe type(both) wmat(Wst) mod(sdm) effects nsim(999) nolog
eststo dySDM2
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Dynamic Spatial Durbin model with fixed-effects Number of obs = 1,334
------------------------------------------------------------------------------
logc | Coefficient Std. err. z P&amp;gt;|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Main |
logp | -.2981475 .0280193 -10.64 0.000 -.3530643 -.2432307
logy | .0637218 .0478561 1.33 0.183 -.0300745 .1575181
Wx |
logp | -.1425379 .0647518 -2.20 0.028 -.2694490 -.0156268
logy | .1320869 .0888243 1.49 0.137 -.0420055 .3061793
Spatial |
rho | .1523264 .0369871 4.12 0.000 .0798330 .2248199
Temporal |
psi | .2712508 .0339714 7.98 0.000 .2046680 .3378335
------------------------------------------------------------------------------
&lt;/code>&lt;/pre>
&lt;p>The spatiotemporal lag coefficient $\psi$ is &lt;strong>0.271&lt;/strong> (z = 7.98, p &amp;lt; 0.001), indicating that neighbors' past consumption does have a positive effect on current consumption. However, this effect is weaker than the own temporal lag ($\tau = 0.654$ in the previous specification). The spatial autoregressive parameter drops to $\rho = 0.152$, and the own price coefficient stays close to the static SDM value at -0.298.&lt;/p>
&lt;h3 id="84-full-dynamic-sdm-tau-cdot-y_it-1--psi-cdot-w-cdot-y_it-1">8.4 Full dynamic SDM ($\tau \cdot y_{i,t-1} + \psi \cdot W \cdot y_{i,t-1}$)&lt;/h3>
&lt;p>The most general dynamic specification includes both the temporal lag and the spatiotemporal lag.&lt;/p>
&lt;pre>&lt;code class="language-stata">xsmle logc logp logy, dlag(3) fe type(both) wmat(Wst) mod(sdm) effects nsim(999) nolog
eststo dySDM3
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Dynamic Spatial Durbin model with fixed-effects Number of obs = 1,334
------------------------------------------------------------------------------
logc | Coefficient Std. err. z P&amp;gt;|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Main |
logp | -.1498627 .0226523 -6.62 0.000 -.1942603 -.1054651
logy | .0271398 .0376004 0.72 0.470 -.0465556 .1008351
Wx |
logp | -.0636842 .0524156 -1.21 0.224 -.1664169 .0390485
logy | .0471982 .0712803 0.66 0.508 -.0925087 .1869052
Spatial |
rho | .0803516 .0322458 2.49 0.013 .0171509 .1435524
Temporal |
tau | .6389621 .0208541 30.64 0.000 .5980889 .6798353
psi | .0494172 .0325896 1.52 0.130 -.0144571 .1132915
------------------------------------------------------------------------------
&lt;/code>&lt;/pre>
&lt;p>In the full dynamic model, the temporal lag dominates: $\tau = 0.639$ (z = 30.64, p &amp;lt; 0.001), while the spatiotemporal lag $\psi = 0.049$ is &lt;strong>not statistically significant&lt;/strong> (p = 0.130). This indicates that a state&amp;rsquo;s own past consumption is the primary driver of temporal persistence, and neighbors' past consumption does not add meaningful additional information once own habit persistence is controlled for. The spatial autoregressive parameter further drops to $\rho = 0.080$, and the spatial lags of price and income become insignificant.&lt;/p>
&lt;h3 id="85-comparison-of-dynamic-models">8.5 Comparison of dynamic models&lt;/h3>
&lt;pre>&lt;code class="language-stata">esttab SDM0 dySDM1 dySDM2 dySDM3, mtitle(&amp;quot;SDM&amp;quot; &amp;quot;dySDM1&amp;quot; &amp;quot;dySDM2&amp;quot; &amp;quot;dySDM3&amp;quot;)
&lt;/code>&lt;/pre>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>SDM (static)&lt;/th>
&lt;th>dySDM1 ($\tau$)&lt;/th>
&lt;th>dySDM2 ($\psi$)&lt;/th>
&lt;th>dySDM3 ($\tau + \psi$)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>logp&lt;/code> (own)&lt;/td>
&lt;td>-0.307***&lt;/td>
&lt;td>-0.152***&lt;/td>
&lt;td>-0.298***&lt;/td>
&lt;td>-0.150***&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>logy&lt;/code> (own)&lt;/td>
&lt;td>0.078&lt;/td>
&lt;td>0.029&lt;/td>
&lt;td>0.064&lt;/td>
&lt;td>0.027&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>W*logp&lt;/code>&lt;/td>
&lt;td>-0.206***&lt;/td>
&lt;td>-0.071&lt;/td>
&lt;td>-0.143**&lt;/td>
&lt;td>-0.064&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>W*logy&lt;/code>&lt;/td>
&lt;td>0.180**&lt;/td>
&lt;td>0.059&lt;/td>
&lt;td>0.132&lt;/td>
&lt;td>0.047&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\rho$&lt;/td>
&lt;td>0.265***&lt;/td>
&lt;td>0.102***&lt;/td>
&lt;td>0.152***&lt;/td>
&lt;td>0.080**&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\tau$ (own lag)&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>0.654***&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>0.639***&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$\psi$ (spatiotemporal lag)&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>0.271***&lt;/td>
&lt;td>0.049&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The comparison reveals a clear pattern. First, &lt;strong>habit persistence is the dominant dynamic force&lt;/strong>: $\tau$ is large and highly significant whether estimated alone (0.654) or jointly with $\psi$ (0.639), while $\psi$ loses significance once $\tau$ is included. Second, &lt;strong>controlling for habit persistence substantially attenuates spatial spillover estimates&lt;/strong>: the spatial autoregressive parameter $\rho$ falls from 0.265 (static) to 0.080 (full dynamic), and the spatial lags of price and income become insignificant. This suggests that the static SDM&amp;rsquo;s spillover estimates partly capture omitted temporal dynamics. Third, the &lt;strong>short-run price elasticity&lt;/strong> in the dynamic model (-0.150) is about half the static SDM estimate (-0.307), but the long-run price elasticity &amp;mdash; computed as $\beta / (1 - \tau)$ &amp;mdash; is approximately $-0.150 / (1 - 0.639) = -0.416$, close to the two-way FE benchmark of -0.402. The static SDM conflates short-run and long-run responses into a single coefficient.&lt;/p>
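&lt;p>Because the long-run elasticity $\beta / (1 - \tau)$ is a nonlinear combination of two estimated coefficients, it carries its own sampling uncertainty. A sketch of how to obtain it with a delta-method standard error via &lt;code>nlcom&lt;/code>, re-running the full dynamic fit first:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Long-run own-price elasticity beta/(1 - tau) from the full dynamic
* SDM, with a delta-method standard error.
quietly xsmle logc logp logy, dlag(3) fe type(both) wmat(Wst) mod(sdm) nolog
nlcom _b[Main:logp] / (1 - _b[Temporal:tau])
&lt;/code>&lt;/pre>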
&lt;hr>
&lt;h2 id="9-discussion">9. Discussion&lt;/h2>
&lt;p>This tutorial demonstrates that &lt;strong>spatial dependence matters&lt;/strong> for modeling cigarette demand across US states. The Wald tests in Section 7 conclusively reject all three restricted spatial models (SAR, SLX, SEM), confirming that the Spatial Durbin Model is the appropriate specification. The total price effect in the static SDM (-0.627) is more than 50% larger than the two-way FE estimate (-0.402), revealing that non-spatial models systematically understate the true price sensitivity of cigarette demand by ignoring cross-border spillovers.&lt;/p>
&lt;p>The dynamic extensions in Section 8 provide important nuance. Once habit persistence is controlled for ($\tau \approx 0.65$), the spatial autoregressive parameter drops by two-thirds (from 0.265 to 0.080), and many spatial lag coefficients lose statistical significance. This does not mean spatial dependence is unimportant &amp;mdash; rather, it means that the &lt;strong>static SDM conflates temporal and spatial dynamics&lt;/strong>. In the dynamic model, the short-run own price elasticity is -0.15 and the long-run elasticity is approximately -0.42, offering policymakers a clearer picture of how quickly cigarette taxation takes effect.&lt;/p>
&lt;p>From a policy perspective, these results carry a direct implication: &lt;strong>state-level tobacco taxation has cross-border spillover effects that policymakers must consider&lt;/strong>. When a single state raises its cigarette tax, the demand reduction is partially offset by cross-border shopping. However, when neighboring states raise taxes simultaneously, the total demand reduction is amplified. This supports the case for coordinated regional or federal tobacco taxation rather than isolated state-level policies. The finding that habit persistence is the dominant dynamic force ($\tau \approx 0.65$) also suggests that the full impact of a tax increase takes several years to materialize, as consumers slowly adjust their consumption habits.&lt;/p>
&lt;hr>
&lt;h2 id="10-summary-and-next-steps">10. Summary and next steps&lt;/h2>
&lt;p>This tutorial covered the complete workflow for spatial panel regression in Stata &amp;mdash; from loading a spatial weight matrix and estimating non-spatial benchmarks, through the full Spatial Durbin Model with Wald specification tests, to dynamic spatial extensions. The key takeaways are:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Non-spatial models understate price sensitivity.&lt;/strong> The two-way FE price elasticity is -0.40, but the SDM total effect is -0.63 &amp;mdash; a 57% increase that reflects cross-border spillovers ignored by standard panel models.&lt;/li>
&lt;li>&lt;strong>The SDM cannot be simplified.&lt;/strong> All three Wald tests reject the SAR, SLX, and SEM restrictions, meaning that spatial dependence operates through multiple channels simultaneously: neighbors' consumption ($\rho$), neighbors' prices ($\theta_{logp}$), and neighbors' income ($\theta_{logy}$).&lt;/li>
&lt;li>&lt;strong>Habit persistence dominates temporal dynamics.&lt;/strong> The temporal lag coefficient $\tau \approx 0.65$ is large and robust, while the spatiotemporal lag $\psi$ loses significance once $\tau$ is included. Static spatial models overstate contemporaneous spillovers by absorbing temporal autocorrelation.&lt;/li>
&lt;li>&lt;strong>Short-run vs. long-run elasticities differ substantially.&lt;/strong> The dynamic SDM&amp;rsquo;s short-run price elasticity (-0.15) is less than half its long-run counterpart (-0.42), information that is lost in static specifications.&lt;/li>
&lt;/ul>
&lt;p>For further study, consider applying these methods to other spatial datasets or exploring alternative spatial specifications. The companion tutorial on &lt;a href="https://carlos-mendez.org/post/stata_sp_regression_cross_section/">cross-sectional spatial regression&lt;/a> covers the spatial models available for single-period data, including the full taxonomy of SAR, SEM, SLX, SDM, SDEM, and SAC models. For datasets where unobserved common factors (macroeconomic shocks, regulatory changes) may drive cross-sectional dependence beyond what the spatial weight matrix captures, see the &lt;a href="https://carlos-mendez.org/post/stata_spxtivdfreg/">spatial dynamic panels with common factors&lt;/a> tutorial, which uses the &lt;code>spxtivdfreg&lt;/code> package to combine spatial lags with defactored IV estimation. For Python implementations of spatial econometrics, see the PySAL ecosystem and the &lt;code>spreg&lt;/code> package.&lt;/p>
&lt;hr>
&lt;h2 id="11-exercises">11. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Alternative weight matrix.&lt;/strong> Replace the binary contiguity matrix with an inverse-distance weight matrix. Re-estimate the SDM and compare the spatial autoregressive parameter $\rho$ and the indirect effects. Does the choice of weight matrix change the substantive conclusions about cross-border spillovers?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>SAR vs. SDM direct comparison.&lt;/strong> Estimate a SAR model (&lt;code>mod(sar)&lt;/code> in &lt;code>xsmle&lt;/code>) with two-way fixed effects and the Lee-Yu correction. Compare its price elasticity to the SDM. Given that the Wald test rejected the SAR restriction, how different are the elasticity estimates in practice?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Subsample analysis.&lt;/strong> Split the sample into two periods (1963&amp;ndash;1977 and 1978&amp;ndash;1992) and estimate the SDM separately for each. Did the spatial dependence structure of cigarette demand change over time? What historical events (e.g., the Surgeon General&amp;rsquo;s reports, the rise of anti-smoking legislation) might explain differences between the two periods?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://link.springer.com/book/10.1007/978-3-030-53953-5" target="_blank" rel="noopener">Baltagi, B. H. (2021). &lt;em>Econometric Analysis of Panel Data&lt;/em> (6th ed.). Springer.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://link.springer.com/book/10.1007/978-3-642-40340-8" target="_blank" rel="noopener">Elhorst, J. P. (2014). &lt;em>Spatial Econometrics: From Cross-Sectional Data to Spatial Panels&lt;/em>. Springer.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1201/9781420064254" target="_blank" rel="noopener">LeSage, J. P. &amp;amp; Pace, R. K. (2009). &lt;em>Introduction to Spatial Econometrics&lt;/em>. Chapman &amp;amp; Hall/CRC.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1016/j.jeconom.2009.08.001" target="_blank" rel="noopener">Lee, L. F. &amp;amp; Yu, J. (2010). Estimation of spatial autoregressive panel data models with fixed effects. &lt;em>Journal of Econometrics&lt;/em>, 154(2), 165&amp;ndash;185.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1177/1536867X1701700109" target="_blank" rel="noopener">Belotti, F., Hughes, G., &amp;amp; Mortari, A. P. (2017). Spatial panel-data models using Stata. &lt;em>Stata Journal&lt;/em>, 17(1), 139&amp;ndash;180.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/quarcs-lab/data-open/tree/master/cigar" target="_blank" rel="noopener">Baltagi cigarette demand dataset &amp;ndash; QUARCS Lab open data repository.&lt;/a>&lt;/li>
&lt;/ol></description></item><item><title>Staggered DiD (Ex1)</title><link>https://carlos-mendez.org/post/r_staggered_did/</link><pubDate>Sun, 03 Sep 2023 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/r_staggered_did/</guid><description>&lt;p>An introduction to difference in differences with multiple time periods and staggered treatment adoption. This tutorial is based on &lt;a href="https://github.com/Mixtape-Sessions/Advanced-DID/tree/main/Exercises/Exercise-1" target="_blank" rel="noopener">Exercise 1&lt;/a> of the Advanced DiD mixed tape session of Jonathan Roth. You can run and extend the analysis of this case study using &lt;a href="https://colab.research.google.com/drive/14LJEYHZTlw5wtIK0bR0lOza7lQiO0krc?usp=sharing" target="_blank" rel="noopener">Google Colab&lt;/a>.&lt;/p></description></item><item><title>Staggered DiD</title><link>https://carlos-mendez.org/post/r_staggered_did1/</link><pubDate>Sat, 02 Sep 2023 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/r_staggered_did1/</guid><description>&lt;p>An introduction to difference in differences with multiple time periods and staggered treatment adoption. You can run and extend the analysis of this case study using &lt;a href="https://colab.research.google.com/drive/1ucJmhyvb7pn01zyQji0xVZy_nZbo3_jB?usp=sharing" target="_blank" rel="noopener">Google Colab&lt;/a>.&lt;/p></description></item></channel></rss>