<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>event-study | Carlos Mendez</title><link>https://carlos-mendez.org/tag/event-study/</link><atom:link href="https://carlos-mendez.org/tag/event-study/index.xml" rel="self" type="application/rss+xml"/><description>event-study</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>© 2018–2026 Carlos Mendez. All rights reserved.</copyright><lastBuildDate>Fri, 12 Jun 2026 00:00:00 +0000</lastBuildDate><image><url>https://carlos-mendez.org/media/icon_huedfae549300b4ca5d201a9bd09a3ecd5_79625_512x512_fill_lanczos_center_3.png</url><title>event-study</title><link>https://carlos-mendez.org/tag/event-study/</link></image><item><title>Do Industrial Parks Work? Evaluating Place-Based Policy in Ethiopia with Difference-in-Differences</title><link>https://carlos-mendez.org/post/python_did_industrial_park/</link><pubDate>Fri, 12 Jun 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_did_industrial_park/</guid><description>&lt;h2 id="abstract">Abstract&lt;/h2>
&lt;p>Governments across the developing world spend billions on industrial parks (fenced zones with serviced land, power and customs to lure factories), yet whether these place-based subsidies actually lift the surrounding economy, and who inside it benefits, remains hotly contested. This tutorial asks whether Ethiopia&amp;rsquo;s industrial parks raised local economic activity, urbanization, household living standards, and women&amp;rsquo;s economic agency, and how each effect can be measured credibly when parks are not placed at random. It replicates Huang, Wang &amp;amp; Xu (2026) on synthetic calibrated data combining a satellite district-year panel of 139 woredas observed annually over 2005–2020 (2,224 rows; 17 park-hosting woredas treated on a staggered 2008–2020 rollout against 122 propensity-score-matched never-treated controls) with two Ethiopia DHS repeated cross-sections: 13,200 households and 17,900 individuals across five survey rounds. It estimates a static two-way fixed-effects difference-in-differences and an event study with pyfixest, cross-checks them against the modern Sun-Abraham, Borusyak/Gardner and Callaway-Sant&amp;rsquo;Anna staggered estimators plus a Goodman-Bacon decomposition with diff-diff, and runs survey-weighted repeated-cross-section DiD with Conley spatial standard errors. A park raises inverse-hyperbolic-sine nighttime light by +0.215 (p &amp;lt; 0.01), the four estimators (TWFE plus the three staggered-robust ones) agree within 0.046 units with 95.4% clean Bacon weight, and households gain durables (+0.229), housing (+0.248) and wealth (+0.383); crucially, average non-agricultural employment is insignificant (+0.091) yet the female effect is large (+0.140, p &amp;lt; 0.01). These findings imply that well-sited parks can reshape a local economy and women&amp;rsquo;s lives, but only a sex-disaggregated analysis reveals the effect on women.&lt;/p>
&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Industrial parks are one of the most popular instruments of modern development policy. The recipe is simple: the government clears a tract of land, installs power, water, roads and a one-stop customs office, and rents serviced plots to manufacturers — usually in textiles, garments or leather. Ethiopia bet heavily on this model, opening more than twenty parks across eighteen districts between 2008 and 2021. The hope was that factories would cluster, create jobs, and pull a largely rural region into a wage economy. But place-based subsidies are controversial precisely because they might do little more than relocate activity that would have happened anyway — or light up a fenced enclave while the surrounding districts see nothing.&lt;/p>
&lt;p>So the question this post tackles is genuinely two-sided: &lt;strong>do industrial parks raise local economic activity, and — just as important — for whom?&lt;/strong> A park could boost satellite-measured luminosity yet leave household living standards flat. It could create jobs on average, yet only for men. Measuring this credibly is hard, because the government did not flip a coin to decide where parks go — it chose districts near cities and roads, which were already growing faster. We need a research design that nets out those pre-existing differences, and that handles a &lt;em>staggered&lt;/em> rollout where parks opened in different years. That design is &lt;strong>difference-in-differences (DiD)&lt;/strong>, and the modern staggered-robust toolkit built around it.&lt;/p>
&lt;p>Why not just one DiD regression? Because the workhorse two-way fixed-effects (TWFE) estimator can mislead under staggered timing: it secretly uses already-treated districts as controls for later-treated ones, a &amp;ldquo;forbidden comparison&amp;rdquo; that can flip the sign of the estimate when effects grow over time. A central goal of this tutorial is to show that worry being &lt;em>checked&lt;/em> rather than ignored — we run four estimators side by side and decompose exactly where the TWFE number comes from. The estimand throughout is the &lt;strong>average treatment effect on the treated (ATT)&lt;/strong> — the effect on the districts (and people) that actually got a park — identified under a parallel-trends assumption, in an explicitly &lt;strong>observational&lt;/strong> setting where the parks were not randomly placed.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>A note on the data (please read this).&lt;/strong> This tutorial replicates &lt;strong>Huang, Wang &amp;amp; Xu (2026)&lt;/strong>, but it runs on &lt;strong>synthetic data built for teaching&lt;/strong>. The paper&amp;rsquo;s real inputs (harmonized nighttime lights, the GISD30 impervious-surface product, confidential Ethiopia DHS micro-data, the official park list) are licensed or restricted. Our dataset is &lt;em>calibrated&lt;/em> so that re-running the paper&amp;rsquo;s analyses reproduces its &lt;strong>findings&lt;/strong> — the signs, the significance stars, and the &lt;em>approximate&lt;/em> magnitudes of the key coefficients. Most results track the paper closely; a handful of magnitudes differ, and we tabulate exactly which in &lt;a href="#13-reproduction-audit-synthetic-data-vs-the-paper">Section 13&lt;/a>. Use this to learn the &lt;em>methods&lt;/em>, not to draw new conclusions about Ethiopia.&lt;/p>
&lt;/blockquote>
&lt;h3 id="11-learning-objectives">1.1 Learning objectives&lt;/h3>
&lt;p>By the end of this tutorial, you will be able to:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Frame&lt;/strong> a staggered place-based policy as a quasi-experiment, and explain why a treated-vs-never-treated comparison identifies the &lt;strong>ATT&lt;/strong> under parallel trends.&lt;/li>
&lt;li>&lt;strong>Estimate&lt;/strong> a static two-way fixed-effects difference-in-differences and a dynamic event study on satellite outcomes with &lt;a href="https://pyfixest.org/" target="_blank" rel="noopener">&lt;code>pyfixest&lt;/code>&lt;/a>.&lt;/li>
&lt;li>&lt;strong>Compare&lt;/strong> TWFE against the modern Sun-Abraham, Borusyak/Gardner and Callaway-Sant&amp;rsquo;Anna estimators, and &lt;strong>diagnose&lt;/strong> the staggered negative-weights problem with a Goodman-Bacon decomposition using &lt;a href="https://github.com/igerber/diff-diff" target="_blank" rel="noopener">&lt;code>diff-diff&lt;/code>&lt;/a>.&lt;/li>
&lt;li>&lt;strong>Apply&lt;/strong> survey-weighted repeated-cross-section DiD to DHS household welfare and individual employment, and read a heterogeneity split that turns a null average into a sharp finding.&lt;/li>
&lt;li>&lt;strong>Defend&lt;/strong> your inference when treatment is spatially clustered, using Conley spatial-HAC standard errors and restricted-control-pool checks.&lt;/li>
&lt;/ul>
&lt;h3 id="12-study-design">1.2 Study design&lt;/h3>
&lt;p>The diagram below maps the whole tutorial. Three data streams flow into one DiD design, that design is estimated by an escalating ladder of estimators, and the estimates answer three outcome families. Read it left to right: the staggered park rollout splits woredas (Ethiopia&amp;rsquo;s local districts) into treated and never-treated; we observe satellite, household, and individual outcomes; we climb from a naive 2×2 to the modern staggered-robust estimators; and we report effects on activity, welfare, and women&amp;rsquo;s empowerment.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
subgraph DATA[&amp;quot;Three data streams&amp;quot;]
A[&amp;quot;&amp;lt;b&amp;gt;Satellite&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;district x year&amp;lt;br/&amp;gt;panel&amp;quot;]
B[&amp;quot;&amp;lt;b&amp;gt;DHS household&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;repeated&amp;lt;br/&amp;gt;cross-section&amp;quot;]
C[&amp;quot;&amp;lt;b&amp;gt;DHS individual&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;repeated&amp;lt;br/&amp;gt;cross-section&amp;quot;]
end
subgraph DESIGN[&amp;quot;DiD design&amp;quot;]
D[&amp;quot;&amp;lt;b&amp;gt;Staggered rollout&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;17 treated woredas&amp;lt;br/&amp;gt;vs 122 controls&amp;quot;]
end
subgraph LADDER[&amp;quot;Estimator ladder&amp;quot;]
E[&amp;quot;Naive 2x2&amp;quot;]
F[&amp;quot;Static TWFE&amp;lt;br/&amp;gt;+ event study&amp;quot;]
G[&amp;quot;Sun-Abraham /&amp;lt;br/&amp;gt;Borusyak /&amp;lt;br/&amp;gt;Callaway-Sant'Anna&amp;quot;]
end
subgraph OUT[&amp;quot;Outcome families&amp;quot;]
H[&amp;quot;&amp;lt;b&amp;gt;Activity&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;lights, impervious&amp;quot;]
I[&amp;quot;&amp;lt;b&amp;gt;Welfare&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;durables, wealth&amp;quot;]
J[&amp;quot;&amp;lt;b&amp;gt;Empowerment&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;female jobs, agency&amp;quot;]
end
A --&amp;gt; D
B --&amp;gt; D
C --&amp;gt; D
D --&amp;gt; E --&amp;gt; F --&amp;gt; G
F --&amp;gt; H
B --&amp;gt; I
C --&amp;gt; J
style A fill:#6a9bcc,stroke:#141413,color:#fff
style B fill:#6a9bcc,stroke:#141413,color:#fff
style C fill:#6a9bcc,stroke:#141413,color:#fff
style D fill:#d97757,stroke:#141413,color:#fff
style G fill:#00d4c8,stroke:#141413,color:#fff
style J fill:#00d4c8,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>The key idea the diagram encodes is that one estimand — the ATT — threads through everything. The naive 2×2 is the cartoon version; TWFE and its event-study view are the workhorse; and the three modern estimators are the robustness insurance that the workhorse has not been led astray by staggered timing. Each box maps onto a section below, and the gender finding (the teal &amp;ldquo;Empowerment&amp;rdquo; box) is where the analysis lands.&lt;/p>
&lt;h3 id="13-where-are-the-industrial-parks-located">1.3 Where are the industrial parks located?&lt;/h3>
&lt;p>Ethiopia placed its parks deliberately — clustered around the capital, Addis Ababa, and the main transport corridors, yet reaching into peripheral regions of the country. Before we build any statistical machinery, it helps to see the real geography we are modeling.&lt;/p>
&lt;p>&lt;img src="map_industrial_parks.png" alt="Map of Ethiopia showing the locations of its industrial parks (red dots), the regional state capitals (blue stars), and the paved and primary road network.">&lt;/p>
&lt;p>&lt;em>Source: Appendix Figure A2 in Huang, Wang &amp;amp; Xu (2026), &amp;ldquo;The socioeconomic impacts of industrial parks in Ethiopia.&amp;rdquo; The map shows the paper&amp;rsquo;s real park locations for geographic context; this tutorial&amp;rsquo;s analysis runs on synthetic data calibrated to reproduce the paper&amp;rsquo;s results.&lt;/em>&lt;/p>
&lt;p>That deliberate clustering near cities and roads is exactly the kind of non-random placement our design has to handle — so before estimating anything, the next section pins down the vocabulary that makes the treated-versus-control comparison credible.&lt;/p>
&lt;h2 id="2-key-concepts">2. Key concepts&lt;/h2>
&lt;p>The post leans on a small vocabulary repeatedly, and the later sections assume you can move between these terms quickly. Each concept below has three parts. The &lt;strong>definition&lt;/strong> is always visible; the &lt;strong>example&lt;/strong> and &lt;strong>analogy&lt;/strong> sit behind clickable cards — open them when you need them, leave them closed for a quick scan. If a later section mentions &amp;ldquo;forbidden comparisons&amp;rdquo; or &amp;ldquo;repeated cross-section&amp;rdquo; and the term feels slippery, this is the section to re-read.&lt;/p>
&lt;p>&lt;strong>1. Staggered difference-in-differences.&lt;/strong>
Units adopt treatment at &lt;em>different&lt;/em> times, not all at once. We compare the change in outcomes for a treated group to the change for a not-yet-treated or never-treated group. With many adoption dates, the design is a stack of overlapping 2×2 comparisons.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>Ethiopia&amp;rsquo;s parks open across eight cohorts: 1 woreda in 2008, then 2 in 2014, 2 in 2015, 3 in 2016, 3 in 2017, 2 in 2018, 2 in 2019, and 2 in 2020 — 17 treated woredas in total, each turning on in its own year.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A city installs streetlights block by block over a decade. To judge their effect you cannot just compare &amp;ldquo;before any lights&amp;rdquo; to &amp;ldquo;after all lights&amp;rdquo; — you must line up each block against its own opening date and a block that never got lit.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>2. Parallel trends.&lt;/strong>
The identifying assumption of DiD: absent the park, treated and control woredas would have followed the &lt;em>same&lt;/em> path on average. Their &lt;em>levels&lt;/em> can differ; their &lt;em>trends&lt;/em> must match. We cannot prove it, but a flat pre-treatment event study makes it credible.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>The four pre-opening event-study leads run from −0.0275 to −0.0013 and the largest absolute &lt;em>t&lt;/em> among them is just 2.17 — close enough to flat to read as parallel trends holding before the parks open.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>Two boats on the same current sit at different points but drift in step. Only an engine — the treatment — should make one pull ahead. If they were already diverging before the engine fired, the comparison is broken.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>3. ATT&lt;/strong> $E[Y_i(1) - Y_i(0) \mid D_i = 1]$.
The Average effect of the Treatment on the Treated — the effect &lt;em>on the districts that got a park&lt;/em>, not on a random district. DiD, TWFE, and all three modern estimators here target the ATT, not the population-wide ATE.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>The +0.215 light effect is the ATT &lt;em>for the 17 park woredas&lt;/em>. It does not promise that placing a park in any random district would raise its lights that much — only that &lt;em>these&lt;/em> districts, given &lt;em>these&lt;/em> parks, ended up that much brighter.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>The bonus speed measured on the car that actually got the new engine — not a promise about any car you might pick off the street.&lt;/p>
&lt;/details>
&lt;/div>
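&lt;p>A worked form of the simplest case may help connect the ATT to parallel trends. The sketch below writes the 2×2 difference-in-differences contrast for one treated group $T$ and one control group $C$; the bar notation for group-by-period means is introduced here purely for illustration and is not taken from the paper.&lt;/p>
&lt;p>$$\widehat{\text{ATT}}_{2\times 2} = \big(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\big) - \big(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\big)$$&lt;/p>
&lt;p>Under parallel trends the second bracket stands in for the change the treated group would have experienced without a park, so the contrast recovers $E[Y_i(1) - Y_i(0) \mid D_i = 1]$ for the post period. Level differences between $T$ and $C$ cancel inside each bracket, which is why only trends need to match.&lt;/p>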
&lt;p>&lt;strong>4. TWFE bias, negative weights, and forbidden comparisons.&lt;/strong>
Under staggered timing, the two-way fixed-effects regression quietly uses &lt;em>already-treated&lt;/em> units as controls for &lt;em>later-treated&lt;/em> ones. Those &amp;ldquo;forbidden&amp;rdquo; comparisons can get negative weights and bias — even flip the sign of — the estimate when effects grow over time.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>Here the danger is tiny: the Goodman-Bacon decomposition shows the forbidden later-vs-earlier comparisons carry only 1.21% of the total weight (and average +0.0135), while clean treated-vs-never comparisons carry 95.42%.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>Grading a class on a curve where some students were secretly given the exam early and then used as the &amp;ldquo;average&amp;rdquo; everyone else is scored against. If only a couple of students got the early peek, the curve is barely distorted — which is the situation here.&lt;/p>
&lt;/details>
&lt;/div>
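&lt;p>The mechanics of a forbidden comparison can be seen in a deliberately tiny example. The sketch below is not the post&amp;rsquo;s data or its estimation code: it is a hypothetical noise-free panel of two units and three periods with no never-treated group, and the helper names &lt;code>double_demean&lt;/code> and &lt;code>twfe_coefficient&lt;/code> are made up for illustration. It computes the TWFE slope by hand through the two-way within transformation, which is valid here because the toy panel is balanced.&lt;/p>

```python
import numpy as np

# Hypothetical toy panel: 2 units x 3 periods, no noise, no never-treated group.
# Unit A is treated from period 1, unit B from period 2.
D = np.array([[0.0, 1.0, 1.0],   # unit A treatment indicator
              [0.0, 0.0, 1.0]])  # unit B treatment indicator

# Effects that grow with exposure: 1 in the first treated period, 2 in the second.
Y_growing = np.array([[0.0, 1.0, 2.0],
                      [0.0, 0.0, 1.0]])
# Constant effect of 1 in every treated period, for comparison.
Y_constant = D.copy()


def double_demean(x):
    """Remove unit means and period means (two-way within transformation)."""
    return x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + x.mean()


def twfe_coefficient(y, d):
    """Slope from regressing double-demeaned y on double-demeaned d (balanced panel)."""
    d_tilde = double_demean(d)
    y_tilde = double_demean(y)
    return float((d_tilde * y_tilde).sum() / (d_tilde ** 2).sum())


true_att_growing = float(Y_growing[D == 1].mean())  # average effect over treated cells
twfe_growing = twfe_coefficient(Y_growing, D)
twfe_constant = twfe_coefficient(Y_constant, D)

print("true ATT (growing effects):", round(true_att_growing, 3))
print("TWFE     (growing effects):", round(twfe_growing, 3))
print("TWFE     (constant effect):", round(twfe_constant, 3))
```

&lt;p>In this toy the average effect over treated cells is 4/3, yet TWFE returns 0.5 when effects grow, because unit A&amp;rsquo;s still-rising outcome is used as the control path for unit B. With a constant effect the same regression returns exactly 1, which is why the problem only bites when effects are dynamic.&lt;/p>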
&lt;p>&lt;strong>5. Event study.&lt;/strong>
Instead of one ATT, estimate one coefficient per year-relative-to-opening (event time $k$). Plotting them shows the &lt;em>dynamic path&lt;/em>: flat leads before opening (no anticipation) and rising lags after (the effect building up).&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>The light effect is +0.115 the year a park opens ($k = 0$), climbs to +0.193 at $k = +1$ and +0.219 at $k = +2$, and plateaus at +0.484 by $k = +4$ — a slow build, not an instant jump.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>A medical chart that plots a patient&amp;rsquo;s temperature day by day around the start of a drug, rather than reporting a single before/after average. The shape of the curve tells you when and how the drug works.&lt;/p>
&lt;/details>
&lt;/div>
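&lt;p>Event time is just calendar year minus opening year, so it is easy to build by hand. The sketch below uses a hypothetical two-woreda frame, not the post&amp;rsquo;s data; the column name &lt;code>event_time&lt;/code> and the choice of $k = -1$ as the omitted reference period are assumptions made for illustration (a common convention, not something the post states).&lt;/p>

```python
import pandas as pd

# Hypothetical mini-panel: one treated woreda (opens 2016) and one never-treated control.
toy = pd.DataFrame({
    "district_id": [1] * 5 + [2] * 5,
    "year": list(range(2014, 2019)) * 2,
    "first_treat": [2016] * 5 + [0] * 5,   # 0 marks never-treated, as in this post
})

# Event time k = calendar year minus opening year; left missing for never-treated units.
is_treated = toy["first_treat"] > 0
toy["event_time"] = (toy["year"] - toy["first_treat"]).where(is_treated)

# One indicator per event time for treated rows, omitting k = -1 as the reference period.
dummies = pd.get_dummies(toy["event_time"].dropna().astype(int), prefix="k")
dummies = dummies.drop(columns="k_-1")

print(toy)
print(dummies.columns.tolist())
```

&lt;p>Each remaining indicator corresponds to one point on the event-study plot: the leads sit at negative $k$ and the lags at non-negative $k$.&lt;/p>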
&lt;p>&lt;strong>6. Repeated-cross-section DiD.&lt;/strong>
When each survey round interviews &lt;em>different&lt;/em> households (no panel key), you cannot use household fixed effects. The effect is identified off district × round group means: compare treated vs control districts before vs after their park opens, absorbing district and region×round fixed effects.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>The DHS data are five rounds (2000, 2005, 2011, 2016, 2019) of fresh respondents. So the household regression uses &lt;code>| district_id + region_id^survey_round&lt;/code> (district and region-by-round fixed effects) with no household effect, and only coarse event &lt;em>phases&lt;/em> $\{-3, \dots, +1\}$.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>Polling a city&amp;rsquo;s mood with a fresh sample of pedestrians each year. You cannot track any one person over time, but you can still compare how &lt;em>neighborhoods&lt;/em> shifted relative to each other.&lt;/p>
&lt;/details>
&lt;/div>
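&lt;p>The group-means logic can be sketched without any regression machinery. The example below is a hypothetical eight-household cross-section with made-up outcomes and sampling weights; it collapses to one treated-by-post 2×2 and leaves out the district and region-by-round fixed effects the actual specification absorbs, so read it as an illustration of the idea rather than the post&amp;rsquo;s estimator.&lt;/p>

```python
import numpy as np
import pandas as pd

# Hypothetical repeated cross-section: different households in each period, with weights.
rcs = pd.DataFrame({
    "treated": [1, 1, 1, 1, 0, 0, 0, 0],
    "post":    [0, 0, 1, 1, 0, 0, 1, 1],
    "y":       [1.0, 3.0, 4.0, 6.0, 1.0, 1.0, 2.0, 2.0],
    "weight":  [1.0, 3.0, 1.0, 1.0, 2.0, 2.0, 1.0, 3.0],
})


def weighted_mean(group):
    """Survey-weighted mean of y within one treated-by-post cell."""
    return float(np.average(group["y"], weights=group["weight"]))


# Weighted and unweighted cell means, keyed by (treated, post).
cells_w = {(int(t), int(p)): weighted_mean(g)
           for (t, p), g in rcs.groupby(["treated", "post"])}
cells_u = {(int(t), int(p)): float(g["y"].mean())
           for (t, p), g in rcs.groupby(["treated", "post"])}


def did(cells):
    """Change in the treated cell means minus change in the control cell means."""
    return (cells[(1, 1)] - cells[(1, 0)]) - (cells[(0, 1)] - cells[(0, 0)])


did_weighted = did(cells_w)
did_unweighted = did(cells_u)
print("weighted DiD:", did_weighted, "| unweighted DiD:", did_unweighted)
```

&lt;p>No household appears twice, yet the contrast is still well defined because it only needs cell means. The weighted and unweighted answers differ in this toy, which is the reason the sampling weight matters in a complex survey.&lt;/p>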
&lt;p>&lt;strong>7. Survey weights and clustered/Conley standard errors.&lt;/strong>
The DHS is a complex sample, so regressions are weighted by the sampling weight. Standard errors are clustered on district (allowing a district&amp;rsquo;s errors to correlate over time) and, for the satellite panel, hardened with Conley spatial-HAC errors that also allow nearby districts to correlate.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>For the light ATT the cluster SE (0.0792) and the Conley-HAC SE (0.0799) are nearly identical and 2.43× the naive HC0 SE (0.0329) — yet the +0.215 estimate stays significant at &lt;em>t&lt;/em> = 2.69.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>Counting a milling crowd. If everyone keeps shuffling between seats, you have far fewer &lt;em>truly independent&lt;/em> heads than the rows suggest — honest standard errors admit that.&lt;/p>
&lt;/details>
&lt;/div>
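&lt;p>The arithmetic behind the example card is short enough to check by hand. The sketch below only reuses the numbers quoted there; the 95% interval uses a normal approximation (critical value 1.96), which is an assumption added here and not something the post reports.&lt;/p>

```python
# Numbers quoted in the example card above.
att = 0.215
se_hc0, se_cluster, se_conley = 0.0329, 0.0792, 0.0799

ratio_conley = se_conley / se_hc0      # how much wider the Conley SE is than naive HC0
ratio_cluster = se_cluster / se_hc0    # same comparison for the district-clustered SE
t_conley = att / se_conley             # t-statistic under the Conley SE

# Normal-approximation 95% interval (illustrative assumption).
ci_low = att - 1.96 * se_conley
ci_high = att + 1.96 * se_conley

print(f"Conley/HC0 = {ratio_conley:.2f}x, cluster/HC0 = {ratio_cluster:.2f}x")
print(f"t = {t_conley:.2f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

&lt;p>The point estimate never changes across the three rows; only the denominator of the t-statistic does, which is why the choice of standard error decides significance and not the effect size.&lt;/p>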
&lt;p>&lt;strong>8. SUTVA and spillovers.&lt;/strong>
The stable-unit-treatment-value assumption says one unit&amp;rsquo;s treatment does not affect another&amp;rsquo;s outcome. If a park lifts its &lt;em>neighbours&lt;/em>&amp;rsquo; lights, the never-treated controls are contaminated and the ATT is biased. A &lt;code>nearby&lt;/code> test checks for exactly this leakage.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">
&lt;summary>Example&lt;/summary>
&lt;p>The &lt;code>nearby&lt;/code> coefficient (control districts within 10 km of a park) is +0.0648 and insignificant (&lt;em>t&lt;/em> = 1.06), while the host effect stays +0.2712 — no measurable spillover, so SUTVA is plausible here.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">
&lt;summary>Analogy&lt;/summary>
&lt;p>Testing whether a new factory&amp;rsquo;s smoke drifts onto the neighbouring farm. If the farm&amp;rsquo;s crops are unchanged, you can fairly use it as a clean comparison for the factory&amp;rsquo;s own land.&lt;/p>
&lt;/details>
&lt;/div>
&lt;h2 id="3-setup-and-the-two-star-libraries">3. Setup and the two star libraries&lt;/h2>
&lt;p>Two specialist packages do the heavy lifting, and each gets a one-line introduction the first time it appears:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>&lt;a href="https://pyfixest.org/" target="_blank" rel="noopener">&lt;code>pyfixest&lt;/code>&lt;/a>&lt;/strong> runs fixed-effects regressions with a fast formula syntax modelled on R&amp;rsquo;s &lt;code>fixest&lt;/code>: everything left of the &lt;code>|&lt;/code> is estimated, everything right of it is &lt;em>absorbed&lt;/em> as fixed effects. It also ships an &lt;code>event_study&lt;/code> helper with the modern &lt;code>saturated&lt;/code> (Sun-Abraham) and &lt;code>did2s&lt;/code> (Borusyak/Gardner) estimators built in.&lt;/li>
&lt;li>&lt;strong>&lt;a href="https://github.com/igerber/diff-diff" target="_blank" rel="noopener">&lt;code>diff-diff&lt;/code>&lt;/a>&lt;/strong> is a teaching-oriented package for difference-in-differences. We use its &lt;code>DifferenceInDifferences&lt;/code>, &lt;code>CallawaySantAnna&lt;/code>, and &lt;code>BaconDecomposition&lt;/code> classes — the last two are exactly the staggered-robust tools this post needs.&lt;/li>
&lt;/ul>
&lt;pre>&lt;code class="language-python"># In Colab, install the two estimation libraries first:
# !pip install pyfixest==0.50.1 diff-diff==3.5.2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pyfixest as pf
import diff_diff as dd
np.random.seed(42) # reproducibility
# Site dark-theme palette for figures
STEEL_BLUE, WARM_ORANGE, TEAL = &amp;quot;#6a9bcc&amp;quot;, &amp;quot;#d97757&amp;quot;, &amp;quot;#00d4c8&amp;quot;
DARK_NAVY, GRID_LINE, LIGHT_TEXT = &amp;quot;#0f1729&amp;quot;, &amp;quot;#1f2b5e&amp;quot;, &amp;quot;#c8d0e0&amp;quot;
&lt;/code>&lt;/pre>
&lt;p>The satellite specifications need two small design helpers. The first builds the staggered &lt;code>first_treat&lt;/code> column the modern estimators require: treated woredas get their park&amp;rsquo;s opening year, and &lt;strong>never-treated controls get 0 — not &lt;code>NaN&lt;/code>&lt;/strong>, because a missing value would silently drop the 122 controls that every staggered estimator needs as its clean comparison group. The second builds the &amp;ldquo;with-trends&amp;rdquo; interactions that absorb the faster pre-existing urban trend of treated woredas (more on why in Section 6).&lt;/p>
&lt;pre>&lt;code class="language-python">def add_first_treat(d):
    &amp;quot;&amp;quot;&amp;quot;Treated woredas get their open_year; never-treated controls get 0.&amp;quot;&amp;quot;&amp;quot;
    out = d.copy()
    out[&amp;quot;first_treat&amp;quot;] = out[&amp;quot;open_year&amp;quot;].fillna(0).astype(int)
    return out


def add_trend_terms(d):
    &amp;quot;&amp;quot;&amp;quot;Centre time at 2012 and interact it with 2007 baseline characteristics,
    so each woreda can follow its own linear trend (the paper's even columns).&amp;quot;&amp;quot;&amp;quot;
    out = d.copy()
    out[&amp;quot;t&amp;quot;] = out[&amp;quot;year&amp;quot;] - 2012
    for c in [&amp;quot;urbanization_rate_2007&amp;quot;, &amp;quot;employment_rate_2007&amp;quot;,
              &amp;quot;log_pop_density_2007&amp;quot;, &amp;quot;share_christian_2007&amp;quot;, &amp;quot;share_amharic_2007&amp;quot;]:
        out[f&amp;quot;t_{c}&amp;quot;] = out[&amp;quot;t&amp;quot;] * out[c]
    return out


TREND_TERMS = [&amp;quot;t_urbanization_rate_2007&amp;quot;, &amp;quot;t_employment_rate_2007&amp;quot;,
               &amp;quot;t_log_pop_density_2007&amp;quot;, &amp;quot;t_share_christian_2007&amp;quot;,
               &amp;quot;t_share_amharic_2007&amp;quot;]
&lt;/code>&lt;/pre>
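&lt;p>To see why the 0-versus-&lt;code>NaN&lt;/code> choice matters, here is a minimal sketch on a hypothetical three-woreda frame (not the post&amp;rsquo;s data). It applies the same recode as &lt;code>add_first_treat&lt;/code> and then mimics what any step that drops missing values would do to the control row.&lt;/p>

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two treated woredas and one never-treated control (open_year missing).
toy = pd.DataFrame({"district_id": [1, 2, 3],
                    "open_year": [2008.0, 2016.0, np.nan]})

# Same recode as add_first_treat: a missing opening year becomes cohort 0.
toy["first_treat"] = toy["open_year"].fillna(0).astype(int)

# Rows that survive a drop-missing step on each version of the cohort column.
kept_with_zero = toy.dropna(subset=["first_treat"]).shape[0]  # control row survives
kept_with_nan = toy.dropna(subset=["open_year"]).shape[0]     # control row is lost

print(toy)
print("rows kept:", kept_with_zero, "vs", kept_with_nan)
```

&lt;p>With the 0 coding all three rows remain available as data; with the raw missing value the never-treated woreda disappears, and with it the clean comparison group.&lt;/p>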
&lt;p>With the tooling in place, the next step is to load the three data layers and understand why they are structured so differently.&lt;/p>
&lt;h2 id="4-the-three-datasets">4. The three datasets&lt;/h2>
&lt;p>Evaluating a place-based policy forces a measurement choice to the surface. National statistics would barely flinch at a few new factories, so we need &lt;em>sub-national&lt;/em> data — and at three different grains. We load all three straight from the post&amp;rsquo;s data folder on GitHub, so the code runs unchanged in Colab.&lt;/p>
&lt;pre>&lt;code class="language-python">BASE = (&amp;quot;https://raw.githubusercontent.com/cmg777/starter-academic-v501/&amp;quot;
        &amp;quot;master/content/post/python_did_industrial_park/data/&amp;quot;)
district = pd.read_csv(BASE + &amp;quot;industrial_park_district_panel.csv&amp;quot;)
household = pd.read_csv(BASE + &amp;quot;industrial_park_household_rcs.csv&amp;quot;)
individual = pd.read_csv(BASE + &amp;quot;industrial_park_individual_rcs.csv&amp;quot;)
print(&amp;quot;district panel :&amp;quot;, district.shape)
print(&amp;quot;household RCS :&amp;quot;, household.shape)
print(&amp;quot;individual RCS :&amp;quot;, individual.shape)
print(&amp;quot;treated woredas:&amp;quot;, district.loc[district.treated == 1, &amp;quot;district_id&amp;quot;].nunique())
print(&amp;quot;control woredas:&amp;quot;, district.loc[district.treated == 0, &amp;quot;district_id&amp;quot;].nunique())
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">district panel : (2224, 34)
household RCS : (13200, 13)
individual RCS : (17900, 22)
treated woredas: 17
control woredas: 122
&lt;/code>&lt;/pre>
&lt;p>The three layers have fundamentally different structures, and that distinction drives every downstream choice. The &lt;strong>district layer is a balanced panel&lt;/strong> — 139 woredas × 16 years (2005–2020) = &lt;strong>2,224 rows&lt;/strong> — so it supports a genuine panel event study with annual event time. The &lt;strong>household and individual layers are repeated cross-sections&lt;/strong>: five DHS rounds of &lt;em>different&lt;/em> respondents (13,200 households and 17,900 individuals), with &lt;strong>no within-respondent panel key&lt;/strong>, so they admit only coarse event phases and survey-weighted regressions, never unit fixed effects. The treatment split is small on the treated side — &lt;strong>17 park woredas against 122 matched controls&lt;/strong> — which is exactly why several effects below are borderline and why honest standard errors matter.&lt;/p>
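&lt;p>A balanced panel is a checkable property, not just a description. The sketch below runs the check on a hypothetical three-district stand-in that only borrows the column names &lt;code>district_id&lt;/code> and &lt;code>year&lt;/code> and the 2005–2020 span from the text; the same three lines could be pointed at the real &lt;code>district&lt;/code> frame.&lt;/p>

```python
import pandas as pd

# Hypothetical stand-in: every district_id appears exactly once in every year.
years = list(range(2005, 2021))                      # 16 annual observations
toy = pd.DataFrame([(d, y) for d in range(1, 4) for y in years],
                   columns=["district_id", "year"])

# Balanced means: each unit has all years, and no unit-year appears twice.
years_per_unit = toy.groupby("district_id")["year"].nunique()
is_balanced = bool((years_per_unit == len(years)).all())
no_duplicates = not toy.duplicated(["district_id", "year"]).any()

expected_rows = 139 * len(years)                     # woredas x years quoted in the text
print("balanced:", is_balanced, "| unique unit-years:", no_duplicates)
print("expected rows for 139 woredas:", expected_rows)
```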
&lt;h3 id="41-the-staggered-rollout">4.1 The staggered rollout&lt;/h3>
&lt;p>The single feature that makes this a &lt;em>staggered&lt;/em> design is that parks opened in different years. Tabulating the treated woredas by opening year shows the cohort structure that every modern estimator below keys on.&lt;/p>
&lt;pre>&lt;code class="language-python">cohorts = (district[district.treated == 1]
           .drop_duplicates(&amp;quot;district_id&amp;quot;)
           .groupby(&amp;quot;open_year&amp;quot;).size())
print(cohorts.rename(&amp;quot;n_treated_woredas&amp;quot;).to_string())
print(&amp;quot;total treated:&amp;quot;, int(cohorts.sum()))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">open_year
2008 1
2014 2
2015 2
2016 3
2017 3
2018 2
2019 2
2020 2
total treated: 17
&lt;/code>&lt;/pre>
&lt;p>The rollout is genuinely staggered: a single anchor woreda opens in &lt;strong>2008&lt;/strong> (the Eastern Industrial Park), then the main build-out runs &lt;strong>2014–2020&lt;/strong> with two to three woredas per year. This spread is what makes a naive before/after impossible, because there is no single &amp;ldquo;before&amp;rdquo;, and what makes the staggered-robust estimators in Section 6 necessary rather than decorative. Within the 2005–2020 panel it also leaves at least three treated woredas behind every post-opening event time through $k = +6$, so the dynamic path over that window is estimated off real data at each lag; longer lags rest on the single 2008 woreda.&lt;/p>
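&lt;p>How many treated woredas stand behind each event time follows directly from the cohort table and the panel span. The sketch below hard-codes the cohort sizes printed above and the 2005–2020 years; the helper &lt;code>woredas_at_event_time&lt;/code> and the window from $k = -4$ to $k = +7$ are illustrative choices, not part of the post&amp;rsquo;s code.&lt;/p>

```python
# Cohort sizes from the table above; panel years as stated in this post.
cohorts = {2008: 1, 2014: 2, 2015: 2, 2016: 3, 2017: 3, 2018: 2, 2019: 2, 2020: 2}
first_year, last_year = 2005, 2020


def woredas_at_event_time(k):
    """Treated woredas whose calendar year open_year + k falls inside the panel."""
    return sum(n for g, n in cohorts.items() if last_year >= g + k >= first_year)


# Illustrative window of event times.
support = {k: woredas_at_event_time(k) for k in range(-4, 8)}
for k, n in support.items():
    print(f"k = {k:+d}: {n} treated woredas")
```

&lt;p>Support thins out as $k$ grows because late cohorts have few post-opening years inside the panel, so the far lags of any event study here lean on the earliest cohorts.&lt;/p>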
&lt;h3 id="42-the-outcomes-and-a-transparent-word-on-the-data">4.2 The outcomes, and a transparent word on the data&lt;/h3>
&lt;p>The satellite layer carries two outcomes: &lt;code>ihs_light&lt;/code>, the inverse hyperbolic sine of nighttime luminosity (a log-like transform that handles zeros), and &lt;code>impervious_ratio&lt;/code>, the share of a woreda&amp;rsquo;s land that is built-up surface, observed only every five years. The household layer carries durable goods per capita, a housing-quality indicator, and the standardized wealth index. The individual layer carries non-agricultural employment plus, for women, decision-making power, savings-account ownership, and acceptance of domestic violence.&lt;/p>
&lt;pre>&lt;code class="language-python">for col, layer, df in [(&amp;quot;ihs_light&amp;quot;, &amp;quot;district&amp;quot;, district),
                       (&amp;quot;durable_goods_pc&amp;quot;, &amp;quot;household&amp;quot;, household),
                       (&amp;quot;nonag_employment&amp;quot;, &amp;quot;individual&amp;quot;, individual)]:
    s = df[col]
    print(f&amp;quot;{col:18s} ({layer:10s}) N={s.notna().sum():6d} &amp;quot;
          f&amp;quot;mean={s.mean():.3f} sd={s.std():.3f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">ihs_light (district ) N= 2224 mean=0.352 sd=0.715
durable_goods_pc (household ) N= 12207 mean=0.308 sd=0.487
nonag_employment (individual ) N= 17219 mean=0.343 sd=0.475
&lt;/code>&lt;/pre>
&lt;p>These means anchor every magnitude that follows. Durable goods average &lt;strong>0.308&lt;/strong> items per capita, so the +0.229 ATT we find later is a ~74% lift off that base; non-agricultural employment averages &lt;strong>0.343&lt;/strong>, so a +0.140 effect for women is a large move. Before modeling, though, one caveat must be stated plainly: &lt;strong>the data are synthetic&lt;/strong>. The data-generating process was tuned so that re-running the paper&amp;rsquo;s regressions recovers its coefficients (within about 0.02 on the headline cells), with the same signs and stars; spatial and serial shocks were injected so the standard errors behave realistically &lt;em>without moving the point estimates&lt;/em>. We hold ourselves to that in &lt;a href="#13-reproduction-audit-synthetic-data-vs-the-paper">Section 13&lt;/a>. With the measurement settled, let us look at the data before regressing it.&lt;/p>
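&lt;p>Expressing an effect as a share of the sample mean is a one-line calculation. The sketch below reuses only the means and effects quoted above; note that it divides the female employment effect by the pooled 0.343 mean, which assumes that mean covers all individuals and not women only.&lt;/p>

```python
# Sample means printed above and effects quoted in the text.
mean_durables, att_durables = 0.308, 0.229
mean_nonag, att_female_nonag = 0.343, 0.140

# Effect as a share of the sample mean.
lift_durables = att_durables / mean_durables
lift_nonag = att_female_nonag / mean_nonag   # pooled-mean base: an assumption, see lead-in

print(f"durables lift: {lift_durables:.0%}")
print(f"female non-ag employment lift vs pooled mean: {lift_nonag:.0%}")
```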
&lt;h2 id="5-exploratory-analysis-the-case-for-parallel-trends">5. Exploratory analysis: the case for parallel trends&lt;/h2>
&lt;p>Good causal work &lt;em>looks&lt;/em> at the data before it models it. The first and most important view plots treated and control group-mean light over time — the picture difference-in-differences was invented for. One subtlety drives how we draw it: because of the synthetic &lt;strong>bright-base device&lt;/strong> (treated park-cities are modelled as intrinsically much brighter than rural controls, a level difference the district fixed effect absorbs), plotting &lt;em>raw&lt;/em> light levels would put the two groups miles apart and hide the trends. So we plot light &lt;strong>indexed to each group&amp;rsquo;s own pre-2008 mean&lt;/strong> — baseline-normalized — which makes the &amp;ldquo;matched-then-diverge&amp;rdquo; picture read correctly.&lt;/p>
&lt;pre>&lt;code class="language-python"># baseline-normalize each group's mean light to its pre-2008 average
g = (district.assign(grp=np.where(district.treated == 1, &amp;quot;Treated&amp;quot;, &amp;quot;Control&amp;quot;))
     .groupby([&amp;quot;grp&amp;quot;, &amp;quot;year&amp;quot;])[&amp;quot;ihs_light&amp;quot;].mean().reset_index())
base = g[g.year &amp;lt; 2008].groupby(&amp;quot;grp&amp;quot;)[&amp;quot;ihs_light&amp;quot;].mean()
g[&amp;quot;normed&amp;quot;] = g.apply(lambda r: r.ihs_light - base[r.grp], axis=1)
# (full dark-theme styling is in script.py)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_did_industrial_park_01_parallel_trends.png" alt="Baseline-normalized group-mean IHS light: treated and control overlap before the rollout, then the treated woredas pull away.">&lt;/p>
&lt;p>Indexed to each group&amp;rsquo;s pre-2008 mean, the treated and control series sit on top of each other through the pre-rollout era — in 2008 the treated group is at &lt;strong>−0.0018&lt;/strong> and the control group at &lt;strong>−0.0030&lt;/strong>, essentially identical, the visual signature of parallel trends holding before treatment turns on. From the 2014 build-out onward the treated series climbs steadily (&lt;strong>+0.083 in 2014 → +0.186 in 2016 → +0.244 in 2017 → +0.237 in 2020&lt;/strong>) while the controls hover around zero with no trend. The eye already sees a matched pair of groups that diverge only after the parks open; the rest of the post is about measuring that divergence and trusting the measurement.&lt;/p>
&lt;p>The staggered structure is easier to see one cohort at a time. The &amp;ldquo;staircase&amp;rdquo; figure traces each opening-year cohort&amp;rsquo;s mean light against the flat never-treated baseline.&lt;/p>
&lt;p>&lt;img src="python_did_industrial_park_02_cohort_staircase.png" alt="Cohort staircase: each opening-year cohort turns up at its own park-opening date against a flat never-treated baseline.">&lt;/p>
&lt;p>Each cohort turns up at its &lt;em>own&lt;/em> opening year — the 2016 cohort lifts off in 2016, the 2018 cohort in 2018 — while the never-treated line stays flat and even drifts down slightly, sharpening the contrast. This is the staggered design made visual: there is no single treatment date, so any honest estimator must align each cohort to its own clock. The next view confirms a second design fact — that treatment is not scattered randomly across the map.&lt;/p>
&lt;p>&lt;img src="python_did_industrial_park_03_treatment_map.png" alt="Treatment map: the 17 treated woredas (orange) cluster spatially among the 122 matched controls (blue).">&lt;/p>
&lt;p>Plotting the 17 treated woredas (orange) and 122 controls (blue) by longitude and latitude shows the treated units are &lt;strong>spatially clustered&lt;/strong>, not randomly sprinkled — parks went to a handful of regions near cities and roads. Clustered treatment means a regional shock could hit several treated woredas at once, so their errors are unlikely to be independent. That is precisely the problem Conley spatial standard errors fix in Section 11. Finally, a distributional view shows the bright-base device head-on.&lt;/p>
&lt;p>&lt;img src="python_did_industrial_park_04_outcome_boxplots.png" alt="Outcome boxplots: treated woredas sit far above controls in level, and shift up further after their parks open.">&lt;/p>
&lt;p>The boxplots split IHS light by group and pre/post period. Treated woredas sit &lt;strong>far above&lt;/strong> controls in level — the synthetic bright base — and shift up further after opening, while controls barely move. The large level gap looks alarming but is harmless: the district fixed effect absorbs any time-invariant brightness, leaving the DiD coefficient untouched. With the intuition built, we can put the first number on the table.&lt;/p>
&lt;h2 id="6-from-a-naive-22-to-the-static-twfe-att">6. From a naive 2×2 to the static TWFE ATT&lt;/h2>
&lt;h3 id="61-the-naive-22-and-why-it-understates-the-effect">6.1 The naive 2×2 (and why it understates the effect)&lt;/h3>
&lt;p>The simplest possible estimate collapses the whole staggered design at the median opening year (2017), forms four treated/control × pre/post cell means, and takes the difference of differences. &lt;a href="https://github.com/igerber/diff-diff" target="_blank" rel="noopener">&lt;code>diff-diff&lt;/code>&lt;/a>&amp;rsquo;s &lt;code>DifferenceInDifferences&lt;/code> class returns it with a standard error.&lt;/p>
&lt;pre>&lt;code class="language-python">d = district.copy()
d[&amp;quot;post&amp;quot;] = (d.year &amp;gt;= 2017).astype(int) # collapse at the median opening year
cells = d.groupby([&amp;quot;treated&amp;quot;, &amp;quot;post&amp;quot;])[&amp;quot;ihs_light&amp;quot;].mean().unstack(&amp;quot;post&amp;quot;)
print(cells.round(4))
res = dd.DifferenceInDifferences(cluster=&amp;quot;district_id&amp;quot;).fit(
d, outcome=&amp;quot;ihs_light&amp;quot;, treatment=&amp;quot;treated&amp;quot;, time=&amp;quot;post&amp;quot;)
print(f&amp;quot;\nDiD ATT = {res.att:+.4f} (SE {res.se:.4f}, p = {res.p_value:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">post 0 1
treated
0 0.0990 0.0909
1 2.1308 2.3237
DiD ATT = +0.2011 (SE 0.0885, p = 0.0232)
&lt;/code>&lt;/pre>
&lt;p>Treated light rises &lt;strong>+0.1929&lt;/strong> post-opening while controls &lt;em>fall&lt;/em> &lt;strong>−0.0082&lt;/strong>, so the difference-in-differences is &lt;strong>+0.2011&lt;/strong> (SE 0.0885, p = 0.0232), significant at 5%, with the by-hand and &lt;code>diff-diff&lt;/code> estimates agreeing to four decimals. But this blended 2×2 &lt;strong>understates&lt;/strong> the dynamic effect: the park&amp;rsquo;s impact ramps up over roughly five years (the event study below reaches +0.48), so averaging the small early post-years with the large late ones pulls the mean toward 0.20. Collapsing at a single date also mis-times exposure under staggering: cohorts that opened before 2017 are already treated in the &amp;ldquo;pre&amp;rdquo; window, and cohorts that open after 2017 are still untreated for part of the &amp;ldquo;post&amp;rdquo; window, and both attenuate the contrast. The fix is to let the effect vary over time and to absorb confounders with fixed effects.&lt;/p>
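&lt;p>The by-hand arithmetic can be checked directly from the printed cell means. The sketch below is a minimal illustration that uses the four rounded values from the table above; the dictionary layout and the helper name &lt;code>did_2x2&lt;/code> are ours and are not part of &lt;code>diff-diff&lt;/code>.&lt;/p>

```python
# Cell means copied from the printed 2x2 table (rounded to four decimals).
cells = {
    ("control", "pre"): 0.0990, ("control", "post"): 0.0909,
    ("treated", "pre"): 2.1308, ("treated", "post"): 2.3237,
}


def did_2x2(c):
    """Difference-in-differences of four cell means."""
    treated_change = c[("treated", "post")] - c[("treated", "pre")]
    control_change = c[("control", "post")] - c[("control", "pre")]
    return treated_change - control_change


att = did_2x2(cells)
print(f"by-hand DiD from rounded cells = {att:+.4f}")
```

&lt;p>Because the inputs are rounded to four decimals, the result can differ from the regression output in the last digit; it is the unrounded cell means that match &lt;code>diff-diff&lt;/code> exactly.&lt;/p>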
&lt;h3 id="62-the-static-twfe-difference-in-differences">6.2 The static TWFE difference-in-differences&lt;/h3>
&lt;p>The workhorse specification adds two-way fixed effects. For woreda $d$ in year $t$:&lt;/p>
&lt;p>$$Y_{dt} = \beta \, D_{dt} + \alpha_d + \gamma_{r(d),t} + \varepsilon_{dt}$$&lt;/p>
&lt;p>In words, this says that the outcome $Y_{dt}$ (here &lt;code>ihs_light&lt;/code>) equals a park effect $\beta$ times the treatment indicator $D_{dt}$ (the &lt;code>treatment&lt;/code> column, which is 1 once a woreda&amp;rsquo;s park is open), plus a &lt;strong>woreda fixed effect&lt;/strong> $\alpha_d$ that absorbs anything permanent about a district (including its bright base), plus a &lt;strong>region-by-year fixed effect&lt;/strong> $\gamma_{r(d),t}$ that absorbs shocks common to a whole region in a given year, plus noise $\varepsilon_{dt}$. The coefficient $\beta$ is the &lt;strong>ATT&lt;/strong>, the average park effect on the treated woredas. The &amp;ldquo;with-trends&amp;rdquo; specification adds the &lt;code>t_*&lt;/code> trend interactions so that woredas with different baseline characteristics can follow different linear trends. In &lt;code>pyfixest&lt;/code>, the part after the &lt;code>|&lt;/code> lists the fixed effects to absorb:&lt;/p>
&lt;pre>&lt;code class="language-python">dt = add_trend_terms(district)
out_rows = []
for ycol, label in [(&amp;quot;ihs_light&amp;quot;, &amp;quot;IHS night-light&amp;quot;),
                    (&amp;quot;light_intensity&amp;quot;, &amp;quot;Raw night-light&amp;quot;),
                    (&amp;quot;impervious_ratio&amp;quot;, &amp;quot;Impervious ratio&amp;quot;)]:
    m0 = pf.feols(f&amp;quot;{ycol} ~ treatment | district_id + region^year&amp;quot;,
                  data=dt, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;district_id&amp;quot;})
    m1 = pf.feols(f&amp;quot;{ycol} ~ treatment + &amp;quot; + &amp;quot; + &amp;quot;.join(TREND_TERMS) +
                  &amp;quot; | district_id + region^year&amp;quot;,
                  data=dt, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;district_id&amp;quot;})
    out_rows.append((label, m0.coef()[&amp;quot;treatment&amp;quot;], m0.se()[&amp;quot;treatment&amp;quot;],
                     m1.coef()[&amp;quot;treatment&amp;quot;], m1.se()[&amp;quot;treatment&amp;quot;]))
for label, b0, se0, b1, se1 in out_rows:
    print(f&amp;quot;{label:18s} no-trends {b0:+.4f} ({se0:.4f}) &amp;quot;
          f&amp;quot;with-trends {b1:+.4f} ({se1:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">IHS night-light no-trends +0.2704 (0.1007) with-trends +0.2152 (0.0833)
Raw night-light no-trends +1.7316 (0.4807) with-trends +1.6181 (0.4540)
Impervious ratio no-trends +0.0292 (0.0042) with-trends +0.0263 (0.0037)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_did_industrial_park_05_twfe_forest.png" alt="Table 1 forest: a positive park ATT across all three satellite outcomes, no-trends vs with-trends.">&lt;/p>
&lt;p>The static TWFE regression recovers the paper&amp;rsquo;s headline: a park raises IHS nighttime light by &lt;strong>+0.2152&lt;/strong> with trend interactions (SE 0.0833, &lt;em>t&lt;/em> = 2.58, significant at 1%) and &lt;strong>+0.2704&lt;/strong> without them. Because the IHS transform behaves like a logarithm where light is well above zero, these read as roughly &lt;strong>21–27 log points&lt;/strong>, or about a 24–31% increase in luminosity once exponentiated. The drop from 0.27 to 0.21 when trends are added is a textbook differential-trend confound: treated woredas were already more urban in 2007 and trending up faster, so the time × urbanization interaction absorbs that slope and the with-trends estimate is the cleaner ATT. The impervious-surface ratio rises &lt;strong>+0.0263&lt;/strong> with trends (SE 0.0037, &lt;em>t&lt;/em> = 7.07), which is about 2.6 percentage points of built-up land, ~82% of its 0.032 mean, and the most precisely estimated satellite coefficient in the study. The raw-light coefficient runs high (+1.618 vs the paper&amp;rsquo;s 1.276), a documented synthetic artifact of the bright-base device that we flag again in the reproduction audit. With a static ATT in hand, we unfold it across event time.&lt;/p>
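&lt;p>The percent reading of an IHS coefficient rests on asinh(y) being close to log(2y) once y is well above zero. The sketch below illustrates that approximation and converts the two IHS coefficients into exact proportional changes with exp(b) − 1. The grid of y values is an arbitrary illustration, not data from the panel, and the conversion assumes the log approximation holds for the units driving the estimate.&lt;/p>

```python
import math

# asinh(y) = log(y + sqrt(y^2 + 1)); for large y this approaches log(2y),
# so differences in IHS behave like differences in logs.
approx_gap = {}
for y in (0.1, 1.0, 4.0, 50.0):
    approx_gap[y] = math.log(2 * y) - math.asinh(y)
    print(f"y={y:5.1f}  asinh={math.asinh(y):.4f}  log(2y)={math.log(2 * y):.4f}")

# Exact proportional change implied by a log-point coefficient b is exp(b) - 1.
pct = {}
for b in (0.2152, 0.2704):
    pct[b] = math.expm1(b)
    print(f"b={b:.4f}  log points={100 * b:.1f}  exact change={100 * pct[b]:.1f}%")
```

&lt;p>The gap between asinh(y) and log(2y) shrinks quickly as y grows, which is why the percent interpretation is more comfortable for bright treated woredas than for units whose light is close to zero, where the transform is nearly linear.&lt;/p>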
&lt;h2 id="7-the-event-study-the-dynamic-path">7. The event study: the dynamic path&lt;/h2>
&lt;p>A single ATT hides &lt;em>when&lt;/em> the effect arrives. The event study estimates one coefficient per year-relative-to-opening, normalized to the year before opening ($k = -1$). For woreda $d$ in year $t$, with cohort opening year $g$:&lt;/p>
&lt;p>$$Y_{dt} = \sum_{k \neq -1} \delta_k \, \mathbf{1}[t - g = k] + \alpha_d + \gamma_{r(d),t} + \varepsilon_{dt}$$&lt;/p>
&lt;p>In words, this says we replace the single treatment dummy with a &lt;em>set&lt;/em> of dummies, one for each event time $k$ (years since the park opened), each carrying its own coefficient $\delta_k$. The pre-opening coefficients ($k &amp;lt; 0$) should hug zero if parallel trends and no-anticipation hold; the post-opening coefficients ($k \geq 0$) trace how the effect builds. Here $\mathbf{1}[t - g = k]$ is an indicator equal to 1 when woreda $d$ is exactly $k$ years from its own opening, $\alpha_d$ and $\gamma_{r(d),t}$ are the same fixed effects as before, and the omitted $k = -1$ is the reference. We estimate the clean leads and lags with &lt;code>pyfixest&lt;/code>&amp;rsquo;s &lt;code>saturated&lt;/code> (Sun-Abraham) estimator, whose &lt;code>.aggregate()&lt;/code> collapses the cohort dimension to one effect per $k$.&lt;/p>
&lt;pre>&lt;code class="language-python">df = add_first_treat(district)
m = pf.event_study(df, yname=&amp;quot;ihs_light&amp;quot;, idname=&amp;quot;district_id&amp;quot;, tname=&amp;quot;year&amp;quot;,
gname=&amp;quot;first_treat&amp;quot;, estimator=&amp;quot;saturated&amp;quot;, att=True)
es = m.aggregate().reset_index()
es[&amp;quot;event_time&amp;quot;] = es[&amp;quot;period&amp;quot;].astype(float)
es = es[(es.event_time &amp;gt;= -5) &amp;amp; (es.event_time &amp;lt;= 5)].sort_values(&amp;quot;event_time&amp;quot;)
print(es[[&amp;quot;event_time&amp;quot;, &amp;quot;Estimate&amp;quot;, &amp;quot;Std. Error&amp;quot;, &amp;quot;Pr(&amp;gt;|t|)&amp;quot;]].round(4).to_string(index=False))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> event_time Estimate Std. Error Pr(&amp;gt;|t|)
-5.0 -0.0139 0.0176 0.4288
-4.0 -0.0013 0.0138 0.9226
-3.0 -0.0275 0.0127 0.0304
-2.0 -0.0135 0.0077 0.0791
0.0 0.1153 0.0295 0.0001
1.0 0.1928 0.0422 0.0000
2.0 0.2187 0.0641 0.0006
3.0 0.3138 0.0880 0.0004
4.0 0.4844 0.0463 0.0000
5.0 0.4697 0.0712 0.0000
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_did_industrial_park_06_event_study.png" alt="Event study: a flat pre-trend for k &amp;lt; 0, then a rising post-opening effect that plateaus by k = 4-5.">&lt;/p>
&lt;p>The figure tells the whole story in one arc. The four pre-opening leads &lt;strong>stay close to zero&lt;/strong>: they range from −0.0275 to −0.0013, small next to the post-opening effects of +0.12 to +0.48. One lead is individually significant at 5% ($k = -3$, &lt;em>t&lt;/em> = −2.17, p = 0.030), but the leads show no monotone drift toward the opening date, so we read them as a flat pre-trend rather than a violation. The jump comes strictly &lt;em>after&lt;/em> opening: the effect is already &lt;strong>+0.1153 at $k = 0$&lt;/strong> (p = 0.0001), climbs through +0.1928 ($k = +1$), +0.2187 ($k = +2$) and +0.3138 ($k = +3$), and plateaus at &lt;strong>+0.4844 ($k = +4$)&lt;/strong> and +0.4697 ($k = +5$). This rising-then-flattening dynamic is exactly &lt;em>why&lt;/em> the naive 2×2 (+0.2011) understated the long-run ATT: it averaged the small early years with the large late ones. The flat pre-period is the central piece of &lt;em>suggestive&lt;/em> support for parallel trends, though it is never a proof, since the assumption concerns the unobserved post-period counterfactual. A skeptic might still worry the +0.215 TWFE headline is an artifact of staggered timing; the next section confronts that worry directly.&lt;/p>
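&lt;p>One rough way to judge the leads jointly is to recompute their t-statistics from the printed table and compare the smallest p-value with a Bonferroni cutoff. The sketch below does that with the four lead rows; it is a conservative back-of-the-envelope check under the assumption that the printed p-values are valid, not a substitute for a joint Wald test on the fitted model.&lt;/p>

```python
# (estimate, standard error, p-value) for each lead, copied from the event-study table.
leads = {
    -5: (-0.0139, 0.0176, 0.4288),
    -4: (-0.0013, 0.0138, 0.9226),
    -3: (-0.0275, 0.0127, 0.0304),
    -2: (-0.0135, 0.0077, 0.0791),
}

t_stats = {k: est / se for k, (est, se, _) in leads.items()}
for k, t in t_stats.items():
    note = "exceeds 1.96" if abs(t) > 1.96 else ""
    print(f"k={k:+d}  t={t:+.2f}  {note}")

largest_abs_t = max(abs(t) for t in t_stats.values())

# Bonferroni: with four leads, a 5% family-wise test needs some p-value below 0.05 / 4.
alpha = 0.05
bonferroni_cutoff = alpha / len(leads)
min_p = min(p for _, _, p in leads.values())
any_lead_rejected = bonferroni_cutoff > min_p
print(f"cutoff={bonferroni_cutoff:.4f}  min p={min_p:.4f}  rejected={any_lead_rejected}")
```

&lt;p>The single lead that crosses 1.96 on its own does not clear the multiplicity-adjusted cutoff, which is consistent with reading the pre-period as flat.&lt;/p>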
&lt;h2 id="8-modern-staggered-estimators-the-negative-weights-teaching-moment">8. Modern staggered estimators: the negative-weights teaching moment&lt;/h2>
&lt;p>Here is the worry stated precisely. Under staggered adoption, the TWFE regression does not only compare treated woredas to never-treated ones. It &lt;em>also&lt;/em> uses &lt;strong>already-treated&lt;/strong> woredas as controls for &lt;strong>later-treated&lt;/strong> ones — a &amp;ldquo;forbidden comparison.&amp;rdquo; When treatment effects grow over time (as ours clearly do, from +0.12 to +0.48), those forbidden comparisons receive &lt;em>negative weights&lt;/em> and can bias TWFE, in extreme cases flipping its sign. The fix is a generation of estimators — &lt;strong>Sun-Abraham&lt;/strong>, &lt;strong>Borusyak/Gardner&lt;/strong>, and &lt;strong>Callaway-Sant&amp;rsquo;Anna&lt;/strong> — that only ever compare treated cohorts to clean (not-yet- or never-treated) controls. Each targets the same &lt;strong>ATT&lt;/strong>; if they agree with TWFE, the negative-weights problem is not biting.&lt;/p>
&lt;pre>&lt;code class="language-python">def stars(t):
    &amp;quot;&amp;quot;&amp;quot;Significance stars from a t-stat (10% / 5% / 1%).&amp;quot;&amp;quot;&amp;quot;
    a = abs(t)
    return &amp;quot;***&amp;quot; if a &amp;gt; 2.576 else &amp;quot;**&amp;quot; if a &amp;gt; 1.960 else &amp;quot;*&amp;quot; if a &amp;gt; 1.645 else &amp;quot;&amp;quot;
def cell(b, se):
    &amp;quot;&amp;quot;&amp;quot;Format a regression cell like '+0.2699*** (0.1005)'.&amp;quot;&amp;quot;&amp;quot;
    return f&amp;quot;{b:+.4f}{stars(b / se)} ({se:.4f})&amp;quot;
df = add_first_treat(district)
Y = &amp;quot;ihs_light&amp;quot;
# TWFE benchmark
m_twfe = pf.event_study(df, yname=Y, idname=&amp;quot;district_id&amp;quot;, tname=&amp;quot;year&amp;quot;,
gname=&amp;quot;first_treat&amp;quot;, estimator=&amp;quot;twfe&amp;quot;, att=True)
twfe_b, twfe_se = m_twfe.coef().iloc[0], m_twfe.se().iloc[0]
# Sun-Abraham (saturated): average the clean post-period (k = 0..5) effects
m_sa = pf.event_study(df, yname=Y, idname=&amp;quot;district_id&amp;quot;, tname=&amp;quot;year&amp;quot;,
gname=&amp;quot;first_treat&amp;quot;, estimator=&amp;quot;saturated&amp;quot;, att=True)
sa = m_sa.aggregate(); sa.index = sa.index.astype(float)
sa_post = sa[(sa.index &amp;gt;= 0) &amp;amp; (sa.index &amp;lt;= 5)]
sa_b = float(sa_post[&amp;quot;Estimate&amp;quot;].mean())
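# caution: this treats the six k-specific estimates as independent (covariances are ignored),
# so the resulting standard error is likely too small
sa_se = float(np.sqrt((sa_post[&amp;quot;Std. Error&amp;quot;].astype(float) ** 2).mean() / len(sa_post)))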
# Borusyak/Gardner imputation (did2s)
m_d2s = pf.event_study(df, yname=Y, idname=&amp;quot;district_id&amp;quot;, tname=&amp;quot;year&amp;quot;,
gname=&amp;quot;first_treat&amp;quot;, estimator=&amp;quot;did2s&amp;quot;, att=True)
d2s_b, d2s_se = m_d2s.coef().iloc[0], m_d2s.se().iloc[0]
# Callaway-Sant'Anna against the never-treated group
cs = dd.CallawaySantAnna(control_group=&amp;quot;never_treated&amp;quot;, cluster=&amp;quot;district_id&amp;quot;).fit(
df, outcome=Y, unit=&amp;quot;district_id&amp;quot;, time=&amp;quot;year&amp;quot;,
first_treat=&amp;quot;first_treat&amp;quot;, aggregate=&amp;quot;simple&amp;quot;)
print(f&amp;quot;TWFE ATT : {cell(twfe_b, twfe_se)}&amp;quot;)
print(f&amp;quot;Sun-Abraham ATT (avg k=0..5) : {cell(sa_b, sa_se)}&amp;quot;)
print(f&amp;quot;Borusyak/Gardner ATT (did2s) : {cell(d2s_b, d2s_se)}&amp;quot;)
print(f&amp;quot;Callaway-Sant'Anna ATT : {cell(cs.att, cs.se)}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">TWFE ATT : +0.2699*** (0.1005)
Sun-Abraham ATT (avg k=0..5) : +0.2991*** (0.0246)
Borusyak/Gardner ATT (did2s) : +0.3022*** (0.0907)
Callaway-Sant'Anna ATT : +0.2561*** (0.0763)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_did_industrial_park_07_estimator_comparison.png" alt="Four estimators, one estimand: TWFE, Sun-Abraham, Borusyak/Gardner and Callaway-Sant&amp;amp;rsquo;Anna all land in the ~0.21-0.30 band.">&lt;/p>
&lt;p>All four estimators target the same ATT and land in a tight band: TWFE &lt;strong>+0.2699&lt;/strong>, Sun-Abraham &lt;strong>+0.2991&lt;/strong>, Borusyak/Gardner &lt;strong>+0.3022&lt;/strong>, and Callaway-Sant&amp;rsquo;Anna &lt;strong>+0.2561&lt;/strong>, a spread of only &lt;strong>0.046 IHS units&lt;/strong> across methods that, in other settings, can diverge sharply. Each is significant at 1%. The TWFE benchmark here is estimated without the trend interactions, which is why it sits next to the +0.2704 no-trends estimate of Section 6.2 rather than the +0.215 with-trends headline. The estimators agree because a large never-treated comparison group (the 122 controls) dominates the comparisons, so even though the effect grows with event time, the conditions that make TWFE&amp;rsquo;s forbidden comparisons dangerous do not bind. This agreement is the methodological payoff: a reader worried that the TWFE estimate is a negative-weighting artifact can see three staggered-robust estimators reproduce it. To show &lt;em>why&lt;/em> they agree, we decompose the TWFE number itself.&lt;/p>
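&lt;p>The Sun-Abraham row is an average of the post-opening event-study coefficients, so it can be rebuilt from the Section 7 table. The sketch below repeats the point estimate and the standard-error formula used in the comparison code, using the six printed rows for k = 0 to 5; the variable names are ours.&lt;/p>

```python
import math

# (estimate, standard error) for k = 0..5, copied from the event-study table.
post = [(0.1153, 0.0295), (0.1928, 0.0422), (0.2187, 0.0641),
        (0.3138, 0.0880), (0.4844, 0.0463), (0.4697, 0.0712)]

n = len(post)
sa_b = sum(est for est, _ in post) / n

# Same formula as the comparison code: sqrt(mean(se^2) / n).
# It ignores covariances between the k-specific estimates (an independence assumption).
sa_se = math.sqrt(sum(se ** 2 for _, se in post) / n / n)

print(f"Sun-Abraham ATT (avg k=0..5) = {sa_b:+.4f} (SE {sa_se:.4f})")
```

&lt;p>The independence assumption behind this standard error is a likely reason it comes out far smaller than the standard errors of the other three estimators, so the Sun-Abraham precision is best compared with some caution.&lt;/p>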
&lt;p>The &lt;strong>Goodman-Bacon decomposition&lt;/strong> breaks the TWFE coefficient into the weighted average of every underlying 2×2 comparison, labeling each by type. &lt;code>diff-diff&lt;/code> does it in one call.&lt;/p>
&lt;pre>&lt;code class="language-python">bac = dd.BaconDecomposition().fit(df, outcome=Y, unit=&amp;quot;district_id&amp;quot;,
time=&amp;quot;year&amp;quot;, first_treat=&amp;quot;first_treat&amp;quot;)
bdf = bac.to_dataframe()
print(f&amp;quot;Goodman-Bacon: TWFE = {bac.twfe_estimate:+.4f} decomposes into &amp;quot;
f&amp;quot;{len(bdf)} 2x2 comparisons.&amp;quot;)
print(bdf.groupby(&amp;quot;comparison_type&amp;quot;)
.apply(lambda g: pd.Series({&amp;quot;total_weight&amp;quot;: g.weight.sum(),
&amp;quot;weighted_avg_estimate&amp;quot;: np.average(g.estimate, weights=g.weight)}))
.round(4))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Goodman-Bacon: TWFE = +0.2699 decomposes into 64 2x2 comparisons.
comparison_type total_weight weighted_avg_estimate
earlier_vs_later 0.0338 0.3370
later_vs_earlier 0.0121 0.0135
treated_vs_never 0.9542 0.2708
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_did_industrial_park_08_bacon_weights.png" alt="Goodman-Bacon decomposition: the clean treated-vs-never 2x2 comparisons carry nearly all the weight.">&lt;/p>
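&lt;p>The decomposition is an accounting identity: the weighted average of the comparison-type estimates should return the TWFE coefficient. The sketch below checks this with the three printed rows; since the inputs are rounded to four decimals, the rebuilt number is only expected to match to about three decimals.&lt;/p>

```python
# (total_weight, weighted_avg_estimate) per comparison type, from the printed decomposition.
bacon = {
    "earlier_vs_later": (0.0338, 0.3370),
    "later_vs_earlier": (0.0121, 0.0135),
    "treated_vs_never": (0.9542, 0.2708),
}

weight_sum = sum(w for w, _ in bacon.values())
twfe_rebuilt = sum(w * est for w, est in bacon.values()) / weight_sum
clean_share = bacon["treated_vs_never"][0] / weight_sum

print(f"weights sum to {weight_sum:.4f}")
print(f"rebuilt TWFE = {twfe_rebuilt:+.4f}")
print(f"treated-vs-never share = {clean_share:.1%}")
```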
&lt;p>The decomposition is reassuring. The &lt;strong>clean treated-vs-never-treated comparisons carry 95.42% of the total weight&lt;/strong> and average &lt;strong>+0.2708&lt;/strong> — essentially the headline. The &amp;ldquo;forbidden&amp;rdquo; later-vs-earlier comparisons (already-treated units used as controls, the ones that can flip TWFE&amp;rsquo;s sign) carry just &lt;strong>1.21% of the weight&lt;/strong> and contribute a near-zero +0.0135; clean earlier-vs-later comparisons add another 3.38% at +0.337. With at most ~1.2% of the weight on biased comparisons, TWFE is &lt;strong>barely contaminated&lt;/strong> here — the empirical reason the four estimators agreed. The general lesson is worth keeping: the negative-weights problem is real in principle but &lt;em>empirically negligible whenever a large never-treated pool dominates the weighting&lt;/em>, as the 122 PSM controls do. Having trusted the average, we can now ask where the effect is strongest.&lt;/p>
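&lt;p>Why a later-vs-earlier comparison can mislead is easiest to see in a toy case. The sketch below uses two hypothetical cohorts with no underlying trend and an effect that grows with exposure (all numbers are invented for illustration and do not come from the panel), then forms the 2×2 that uses the already-treated cohort as the control.&lt;/p>

```python
# Toy setup: the untreated outcome is 0 in every period, so any change is a treatment effect.
# Effects grow with exposure: 1.0 in the opening period (k = 0), 2.0 one period later (k = 1).
effect_by_exposure = {0: 1.0, 1: 2.0}


def outcome(period, opening):
    """Observed outcome for a cohort that opens in `opening`, observed in `period`."""
    return effect_by_exposure.get(period - opening, 0.0)


early, late = 1, 2   # hypothetical opening periods of the two cohorts
pre, post = 1, 2     # window in which the late cohort switches on

# Forbidden 2x2: the late cohort is "treated", the already-treated early cohort is the "control".
late_change = outcome(post, late) - outcome(pre, late)
early_change = outcome(post, early) - outcome(pre, early)
forbidden_did = late_change - early_change

true_effect = effect_by_exposure[0]   # the late cohort's actual effect at k = 0
print(f"forbidden 2x2 = {forbidden_did:+.1f}, true effect = {true_effect:+.1f}")
```

&lt;p>The control cohort&amp;rsquo;s own effect is still growing over the window, and the 2×2 subtracts that growth from the estimate. A near-zero later-vs-earlier average in a decomposition is what this mechanism would produce when effects ramp up, though here it matters little because that comparison type carries so little weight.&lt;/p>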
&lt;h2 id="9-heterogeneity-and-spillovers">9. Heterogeneity and spillovers&lt;/h2>
&lt;h3 id="91-where-parks-work-distance-and-roads">9.1 Where parks work: distance and roads&lt;/h3>
&lt;p>Place-based policy is, by definition, about place — so the effect should depend on &lt;em>where&lt;/em> the park sits. We interact the treatment with distance moderators (a negative interaction means the effect fades with distance) and road-density moderators (a positive interaction means roads amplify it), each on the with-trends spec.&lt;/p>
&lt;pre>&lt;code class="language-python">dt = add_trend_terms(district)
for mod in [&amp;quot;dist_addis_km&amp;quot;, &amp;quot;dist_state_capital_km&amp;quot;, &amp;quot;dist_nearest_city_km&amp;quot;,
            &amp;quot;primary_road_density&amp;quot;, &amp;quot;paved_road_density&amp;quot;]:
    m = pf.feols(f&amp;quot;ihs_light ~ treatment + treatment:{mod} + &amp;quot; +
                 &amp;quot; + &amp;quot;.join(TREND_TERMS) + &amp;quot; | district_id + region^year&amp;quot;,
                 data=dt, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;district_id&amp;quot;})
    b, se = m.coef()[f&amp;quot;treatment:{mod}&amp;quot;], m.se()[f&amp;quot;treatment:{mod}&amp;quot;]
    print(f&amp;quot;{mod:24s} interaction {b:+.5f} (se {se:.5f}, t {b/se:+.2f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">dist_addis_km interaction -0.00822 (se 0.00232, t -3.54)
dist_state_capital_km interaction -0.00862 (se 0.00406, t -2.13)
dist_nearest_city_km interaction -0.03352 (se 0.00684, t -4.90)
primary_road_density interaction +0.32640 (se 0.84748, t +0.39)
paved_road_density interaction +0.66945 (se 0.32174, t +2.08)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_did_industrial_park_09_heterogeneity.png" alt="Heterogeneity: the implied park effect fades the farther a woreda lies from Addis, its state capital, or the nearest city.">&lt;/p>
&lt;p>Location fundamentals sharply moderate park effectiveness, exactly as the paper argues. All three &lt;strong>distance interactions are negative&lt;/strong> (the park effect fades with distance from economic centers) and all three are significant: distance to nearest city (&lt;strong>−0.0335&lt;/strong>, &lt;em>t&lt;/em> = −4.90, the steepest decay), distance to Addis (&lt;strong>−0.0082&lt;/strong>, &lt;em>t&lt;/em> = −3.54), and distance to the state capital (&lt;strong>−0.0086&lt;/strong>, &lt;em>t&lt;/em> = −2.13). Both &lt;strong>road interactions are positive&lt;/strong> (denser roads amplify the effect), with paved-road density significant (&lt;strong>+0.6695&lt;/strong>, &lt;em>t&lt;/em> = 2.08) and primary-road density correctly signed but far from significant (+0.3264, &lt;em>t&lt;/em> = 0.39). That last result is an honest synthetic limitation: with only 17 treated woredas the mutually-correlated moderators cannot all be estimated precisely at once, so it is unsurprising that one of the two road interactions reads non-significant. The point estimates all carry the predicted sign; precision, not direction, is what the small treated sample cannot fully deliver. A related question is whether the park&amp;rsquo;s gain is truly &lt;em>new&lt;/em> or merely stolen from its neighbours.&lt;/p>
&lt;h3 id="92-spillovers-does-a-park-lift-its-neighbours">9.2 Spillovers: does a park lift its neighbours?&lt;/h3>
&lt;p>The spillover test adds a &lt;code>nearby&lt;/code> indicator — control woredas within 10 km of an operational park — to the Table 1 spec. If parks merely displace activity from neighbours, &lt;code>nearby&lt;/code> should be negative; if the gains are net-new, it should be zero.&lt;/p>
&lt;pre>&lt;code class="language-python">for ycol, label in [(&amp;quot;ihs_light&amp;quot;, &amp;quot;IHS night-light&amp;quot;),
                    (&amp;quot;light_intensity&amp;quot;, &amp;quot;Raw night-light&amp;quot;)]:
    m = pf.feols(f&amp;quot;{ycol} ~ treatment + nearby | district_id + region^year&amp;quot;,
                 data=district, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;district_id&amp;quot;})
    print(f&amp;quot;{label:18s} treatment {m.coef()['treatment']:+.4f} &amp;quot;
          f&amp;quot;nearby {m.coef()['nearby']:+.4f} (t {m.coef()['nearby']/m.se()['nearby']:+.2f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">IHS night-light treatment +0.2712 nearby +0.0648 (t +1.06)
Raw night-light treatment +1.7328 nearby +0.0927 (t +1.35)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_did_industrial_park_10_spillover.png" alt="Spillover test: treatment lifts the host woreda strongly, but the effect on neighbours is about zero.">&lt;/p>
&lt;p>The &lt;code>nearby&lt;/code> coefficient is &lt;strong>+0.0648 (SE 0.0610, &lt;em>t&lt;/em> = 1.06) for IHS light&lt;/strong> and &lt;strong>+0.0927 (&lt;em>t&lt;/em> = 1.35) for raw light&lt;/strong>, both small and statistically indistinguishable from zero, while the treatment coefficient stays large and significant (+0.2712). The reading is &lt;strong>no detectable spillover&lt;/strong>: the park lifts its host woreda by ~0.27 IHS, and the estimate for immediate neighbours shows no sign of displacement (which would appear as a negative coefficient), so the host&amp;rsquo;s gain looks like net-new activity. This also offers some reassurance on SUTVA: with no measurable geographic spillover, there is little sign that the never-treated controls are contaminated by proximity to a park, although with a standard error of 0.061 a modest positive spillover cannot be ruled out. Economically, the parks behave like relatively self-contained enclaves with weak local supplier linkages. So far the story is about lights and land, but did the parks change how people actually live?&lt;/p>
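&lt;p>How tightly the data bound the spillover can be gauged from a confidence interval on the &lt;code>nearby&lt;/code> coefficient. The sketch below builds a 95% interval from the reported IHS-light estimate and standard error under a normal approximation (an assumption; a cluster-based t reference distribution would be somewhat wider), and expresses the upper bound as a share of the host effect.&lt;/p>

```python
def normal_ci(estimate, se, z=1.96):
    """Two-sided confidence interval under a normal approximation."""
    return estimate - z * se, estimate + z * se


nearby_b, nearby_se = 0.0648, 0.0610   # IHS-light spillover coefficient and SE from the text
host_b = 0.2712                        # host-woreda treatment coefficient from the same model

ci_low, ci_high = normal_ci(nearby_b, nearby_se)
upper_share = ci_high / host_b

print(f"95% CI for nearby: [{ci_low:+.4f}, {ci_high:+.4f}]")
print(f"upper bound as a share of the host effect: {upper_share:.2f}")
```

&lt;p>An interval that includes zero supports the &amp;ldquo;indistinguishable from zero&amp;rdquo; reading, while its width shows how large a spillover would still be compatible with these estimates.&lt;/p>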
&lt;h2 id="10-household-welfare-and-womens-empowerment">10. Household welfare and women&amp;rsquo;s empowerment&lt;/h2>
&lt;h3 id="101-household-living-standards-table-5">10.1 Household living standards (Table 5)&lt;/h3>
&lt;p>We now switch to the DHS household repeated cross-section. Because each round samples &lt;em>different&lt;/em> households, there is no household panel key, so we use &lt;strong>no household fixed effect&lt;/strong> — the effect is identified off district × round group means, with district and region×round fixed effects and DHS survey weights. We report each outcome with and without household-size and head-age controls.&lt;/p>
&lt;pre>&lt;code class="language-python">for ycol, label in [(&amp;quot;durable_goods_pc&amp;quot;, &amp;quot;Durable goods p.c.&amp;quot;),
                    (&amp;quot;housing_quality&amp;quot;, &amp;quot;Housing quality&amp;quot;),
                    (&amp;quot;wealth_index&amp;quot;, &amp;quot;Wealth index&amp;quot;)]:
    m0 = pf.feols(f&amp;quot;{ycol} ~ treatment | district_id + region_id^survey_round&amp;quot;,
                  data=household, weights=&amp;quot;survey_weight&amp;quot;, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;district_id&amp;quot;})
    m1 = pf.feols(f&amp;quot;{ycol} ~ treatment + hh_size + age_head | &amp;quot;
                  &amp;quot;district_id + region_id^survey_round&amp;quot;,
                  data=household, weights=&amp;quot;survey_weight&amp;quot;, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;district_id&amp;quot;})
    print(f&amp;quot;{label:18s} no-controls {m0.coef()['treatment']:+.4f} &amp;quot;
          f&amp;quot;with-controls {m1.coef()['treatment']:+.4f} ({m1.se()['treatment']:.4f})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Durable goods p.c. no-controls +0.2489 with-controls +0.2286 (0.0284)
Housing quality no-controls +0.2484 with-controls +0.2480 (0.0193)
Wealth index no-controls +0.3875 with-controls +0.3825 (0.0461)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_did_industrial_park_11_household_forest.png" alt="Table 5 forest: households near a park gain durables, housing quality and wealth, with or without controls.">&lt;/p>
&lt;p>All three living-standards outcomes rise sharply and significantly. Durable goods per capita gain &lt;strong>+0.2286&lt;/strong> with controls (SE 0.0284, &lt;em>t&lt;/em> = 8.06), which against a 0.308 mean is a &lt;strong>~74% increase&lt;/strong>. Housing quality (an indicator for having electricity, piped water, a toilet, and a finished floor) rises &lt;strong>+0.2480&lt;/strong>, so the probability of clearing that bar jumps &lt;strong>~24.8 percentage points&lt;/strong> off a 30.7% base. The composite wealth index rises &lt;strong>+0.3825 standard deviations&lt;/strong> (SE 0.0461, &lt;em>t&lt;/em> = 8.29). Crucially, adding controls barely moves any estimate (durables 0.249 → 0.229, the others essentially unchanged), which suggests the district + region×round design already absorbs the main confounding from these observables: the covariates are only mildly correlated with treatment. As at the satellite level, the timing is clean.&lt;/p>
&lt;pre>&lt;code class="language-python"># RCS event study uses coarse phase dummies (no balanced unit x time grid).
# _rcs_event_study() is defined in the companion script.py.
es = _rcs_event_study(household, &amp;quot;durable_goods_pc&amp;quot;, controls=[&amp;quot;hh_size&amp;quot;, &amp;quot;age_head&amp;quot;])
print(es.round(4).to_string(index=False))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> event_phase estimate se p_value
-3.0 -0.0197 0.0482 0.6840
-2.0 0.0236 0.0329 0.4757
0.0 0.2606 0.0398 0.0000
1.0 0.1513 0.0387 0.0001
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_did_industrial_park_12_household_event_study.png" alt="Household durables RCS event study: flat pre-phases, then a jump at opening (phase 0).">&lt;/p>
&lt;p>Because the DHS data are repeated cross-sections, the household event study uses coarse &lt;em>phase&lt;/em> dummies rather than annual event time. The two pre-opening phases are flat and insignificant — phase −3 at &lt;strong>−0.0197&lt;/strong> (p = 0.68) and phase −2 at &lt;strong>+0.0236&lt;/strong> (p = 0.48), both straddling zero — so there is no differential pre-trend in household durables. The effect then jumps to &lt;strong>+0.2606 at phase 0&lt;/strong> (p &amp;lt; 0.0001) and stays strongly positive at +0.1513 at phase +1. This is the RCS counterpart to the satellite event study&amp;rsquo;s no-anticipation evidence, with the honest caveat that two pre-phases make a low-powered test. Now to the question the whole post has been building toward: who got the jobs?&lt;/p>
&lt;h3 id="102-employment-and-womens-empowerment-tables-67-the-climax">10.2 Employment and women&amp;rsquo;s empowerment (Tables 6–7): the climax&lt;/h3>
&lt;p>This is the analytical climax, and a textbook case for heterogeneity analysis. We estimate non-agricultural employment for the full sample, then split by sex, using the same survey-weighted RCS design.&lt;/p>
&lt;pre>&lt;code class="language-python">ctrl = &amp;quot;hh_size + age_head + age + age_sq&amp;quot;
for label, sub in [(&amp;quot;Full sample&amp;quot;, individual),
                   (&amp;quot;Women&amp;quot;, individual[individual.sex == 1]),
                   (&amp;quot;Men&amp;quot;, individual[individual.sex == 0])]:
    m = pf.feols(f&amp;quot;nonag_employment ~ treatment + {ctrl} | &amp;quot;
                 &amp;quot;district_id + region_id^survey_round&amp;quot;,
                 data=sub, weights=&amp;quot;survey_weight&amp;quot;, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;district_id&amp;quot;})
    b, se = m.coef()[&amp;quot;treatment&amp;quot;], m.se()[&amp;quot;treatment&amp;quot;]
    print(f&amp;quot;{label:12s} {b:+.4f} ({se:.4f}) t {b/se:+.2f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Full sample +0.0911 (0.0580) t +1.57 &amp;lt;-- NULL on average
Women +0.1404 (0.0468) t +3.00 &amp;lt;-- SIGNIFICANT for women
Men +0.0176 (0.0934) t +0.19
&lt;/code>&lt;/pre>
&lt;p>The &lt;strong>average&lt;/strong> non-agricultural employment effect is &lt;strong>+0.0911 (SE 0.0580, &lt;em>t&lt;/em> = 1.57) — insignificant&lt;/strong> — which, read alone, would suggest parks do not move employment at all. But pooling the sexes hides a strong gendered split: the &lt;strong>female&lt;/strong> effect is &lt;strong>+0.1404 (SE 0.0468, &lt;em>t&lt;/em> = 3.00, significant at 1%)&lt;/strong> — about a &lt;strong>14-percentage-point rise&lt;/strong> in women&amp;rsquo;s non-agricultural employment — while the &lt;strong>male&lt;/strong> effect is &lt;strong>+0.0176 (&lt;em>t&lt;/em> = 0.19), essentially zero&lt;/strong>. The parks, concentrated in textiles and garments, pull &lt;em>women&lt;/em> into factory wage work; the men were largely already off-farm, so the average washes out. A reader who quoted only the full-sample number would badly misread the study — the sex split &lt;em>is&lt;/em> the finding, not a footnote. The empowerment cascade follows the jobs.&lt;/p>
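&lt;p>A complementary check on the sex split is to ask how precisely the gap between the two subsample coefficients is estimated. The sketch below forms a z-statistic for the female-minus-male difference from the printed estimates. It assumes the two subsample estimates are independent, which is only a rough approximation because women and men share districts and clusters; a pooled regression with a treatment × sex interaction would be the more standard route.&lt;/p>

```python
import math

women_b, women_se = 0.1404, 0.0468   # women-only estimate and SE from the output above
men_b, men_se = 0.0176, 0.0934       # men-only estimate and SE from the output above

diff = women_b - men_b
# Standard error of a difference between two estimates, assuming they are independent.
diff_se = math.sqrt(women_se ** 2 + men_se ** 2)
z = diff / diff_se
# Two-sided p-value from the standard normal distribution.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"difference = {diff:+.4f} (SE {diff_se:.4f}), z = {z:+.2f}, p = {p_two_sided:.3f}")
```

&lt;p>The male coefficient is noisy, so the width of this gap estimate is driven mostly by the men-only standard error; the point estimates and the women-only significance remain the main evidence for the gendered pattern.&lt;/p>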
&lt;pre>&lt;code class="language-python">women = individual[individual.sex == 1]
for ycol, label in [(&amp;quot;decision_power&amp;quot;, &amp;quot;Decision power&amp;quot;),
                    (&amp;quot;savings_account&amp;quot;, &amp;quot;Savings account&amp;quot;),
                    (&amp;quot;dv_accept&amp;quot;, &amp;quot;Accepts DV&amp;quot;)]:
    m = pf.feols(f&amp;quot;{ycol} ~ treatment + {ctrl} | &amp;quot;
                 &amp;quot;district_id + region_id^survey_round&amp;quot;,
                 data=women, weights=&amp;quot;survey_weight&amp;quot;, vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;district_id&amp;quot;})
    b, se = m.coef()[&amp;quot;treatment&amp;quot;], m.se()[&amp;quot;treatment&amp;quot;]
    print(f&amp;quot;{label:18s} {b:+.4f} ({se:.4f}) t {b/se:+.2f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Decision power +0.1096 (0.0194) t +5.66
Savings account +0.3153 (0.0182) t +17.34
Accepts DV -0.2096 (0.0254) t -8.24
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_did_industrial_park_13_employment_empowerment.png" alt="The gender story: employment is null overall but large for women; women&amp;amp;rsquo;s decision power and savings rise while acceptance of domestic violence falls.">&lt;/p>
&lt;p>With factory jobs, women&amp;rsquo;s outcomes shift across the board (women only). Decision-making power rises &lt;strong>+0.1096&lt;/strong> (SE 0.0194, &lt;em>t&lt;/em> = 5.66), savings-account ownership rises &lt;strong>+0.3153&lt;/strong> (SE 0.0182, &lt;em>t&lt;/em> = 17.34) — enormous against a 6.3% base — and acceptance of domestic violence &lt;strong>falls −0.2096&lt;/strong> (SE 0.0254, &lt;em>t&lt;/em> = −8.24), a ~21-point reduction off a 63.5% base. Economic agency translates into household bargaining power and shifting gender norms. The event study below confirms the timing.&lt;/p>
&lt;p>&lt;img src="python_did_industrial_park_14_empowerment_event_study.png" alt="Female employment and decision-power RCS event study: women&amp;amp;rsquo;s gains appear at and after opening, not before.">&lt;/p>
&lt;p>The female-employment and decision-power event studies both sit near zero in the pre-phases and turn up at and after phase 0 (female employment jumps to +0.1311 at phase 0, p = 0.013), reinforcing the no-anticipation reading — women&amp;rsquo;s gains appear &lt;em>with&lt;/em> the park, not before it. The gender result is the substantive heart of the study; one robustness battery remains to decide whether to trust the satellite headline that anchors it.&lt;/p>
&lt;h2 id="11-robustness-conley-spatial-standard-errors-and-restricted-pools">11. Robustness: Conley spatial standard errors and restricted pools&lt;/h2>
&lt;p>Recall from the map that all 17 treated woredas cluster spatially. When treated units are packed together, a regional shock hits several at once, so their errors are not independent draws, and the naive standard error, which assumes independence, will be too small. The fix is a &lt;strong>Conley spatial-HAC&lt;/strong> standard error, which allows a district&amp;rsquo;s errors to correlate with its &lt;em>own&lt;/em> errors over time (serial correlation) and with those of &lt;em>nearby&lt;/em> districts in the same year (spatial correlation). The point estimate never changes; only the standard error does. We compute four standard errors for the with-trends light ATT and re-estimate it on restricted control pools.&lt;/p>
&lt;pre>&lt;code class="language-python"># four SEs for the with-trends IHS-light ATT (full Conley sandwich in script.py)
se_tab = conley_se_for_spec(add_trend_terms(district), &amp;quot;ihs_light&amp;quot;,
[&amp;quot;treatment&amp;quot;] + TREND_TERMS)
print(se_tab.loc[se_tab.term == &amp;quot;treatment&amp;quot;,
[&amp;quot;estimate&amp;quot;, &amp;quot;se_naive&amp;quot;, &amp;quot;se_clustered&amp;quot;, &amp;quot;se_conley&amp;quot;, &amp;quot;se_hac&amp;quot;]]
.round(4).to_string(index=False))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> estimate se_naive se_clustered se_conley se_hac
0.2152 0.0329 0.0792 0.0346 0.0799
&lt;/code>&lt;/pre>
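&lt;p>The full Conley sandwich behind the &lt;code>se_conley&lt;/code> and &lt;code>se_hac&lt;/code> columns lives in &lt;code>script.py&lt;/code> and is not reproduced here. As a rough illustration of the spatial ingredient only, the sketch below computes a cross-sectional Conley-type standard error on simulated data with NumPy. Everything in it is an assumption made for illustration: the simulated coordinates, the 50 km cutoff, the product Bartlett kernel, and the helper name &lt;code>conley_se&lt;/code>. It ignores the serial (HAC) dimension and the fixed effects, so it is not the tutorial&amp;rsquo;s &lt;code>conley_se_for_spec&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

# Simulated cross-section: hypothetical centroids (km), one regressor, one outcome
rng = np.random.default_rng(0)
n = 60
coords = rng.uniform(0, 100, size=(n, 2))
x = rng.normal(size=n)
y = 0.2 * x + rng.normal(size=n)

# OLS fit and the pieces of the sandwich
X = np.column_stack([np.ones(n), x])
bread = np.linalg.inv(X.T @ X)
beta = bread @ X.T @ y
resid = y - X @ beta
score = X * resid[:, None]

# Pairwise coordinate-wise distances between units
dx = np.abs(coords[:, None, 0] - coords[None, :, 0])
dy = np.abs(coords[:, None, 1] - coords[None, :, 1])

def conley_se(cutoff_km):
    # Product Bartlett kernel: weight 1 at zero distance, 0 beyond the cutoff
    kx = np.clip(1.0 - dx / cutoff_km, 0.0, None)
    ky = np.clip(1.0 - dy / cutoff_km, 0.0, None)
    kernel = kx * ky
    # Meat sums kernel-weighted cross products of the scores of nearby units
    meat = score.T @ (kernel @ score)
    return np.sqrt(np.diag(bread @ meat @ bread))

se_hc0 = conley_se(1e-9)   # no neighbours within the cutoff: plain HC0
se_50km = conley_se(50.0)  # residuals within 50 km are allowed to covary
print(se_hc0.round(4), se_50km.round(4))
&lt;/code>&lt;/pre>
&lt;p>Shrinking the cutoff toward zero switches off every cross-unit term, so the estimator collapses to the heteroskedasticity-robust HC0 sandwich; widening it lets the residuals of nearby units covary. The coefficient vector is computed once and never changes, which is the sense in which only the standard error moves.&lt;/p>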
&lt;p>The satellite headline survives honest standard errors. The most conservative standard error, the &lt;strong>Conley spatial-HAC SE in the &lt;code>se_hac&lt;/code> column, is 0.0799, or 2.43× the naive HC0 SE of 0.0329&lt;/strong>, yet the ATT of +0.2152 stays significant (&lt;em>t&lt;/em> = 2.69, significant at 1%). Notice that the cluster SE (0.0792) and the Conley-HAC SE (0.0799) are nearly identical: clustering at the district level already captures most of the dependence, so spatial correlation &lt;em>beyond&lt;/em> the district adds little here. The estimate is also stable when we change the comparison group, either by dropping the Addis Ababa region or by restricting controls to those far from any city.&lt;/p>
&lt;pre>&lt;code class="language-python">specs = {&amp;quot;Full sample&amp;quot;: district,
         &amp;quot;Drop Addis region&amp;quot;: district[district.region != &amp;quot;Addis Ababa&amp;quot;],
         &amp;quot;Controls &amp;gt;= 50km from city&amp;quot;: district[(district.treated == 1) |
                                                 (district.dist_nearest_city_km &amp;gt;= 50)]}
for name, sub in specs.items():
    m = pf.feols(&amp;quot;ihs_light ~ treatment + &amp;quot; + &amp;quot; + &amp;quot;.join(TREND_TERMS) +
                 &amp;quot; | district_id + region^year&amp;quot;,
                 data=add_trend_terms(sub), vcov={&amp;quot;CRV1&amp;quot;: &amp;quot;district_id&amp;quot;})
    b, se = m.coef()[&amp;quot;treatment&amp;quot;], m.se()[&amp;quot;treatment&amp;quot;]
    print(f&amp;quot;{name:28s} {b:+.4f} ({se:.4f}) N={m._N}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Full sample +0.2152 (0.0833) N=2224
Drop Addis region +0.1550 (0.0910) N=1984
Controls &amp;gt;= 50km from city +0.2143 (0.0854) N=1392
&lt;/code>&lt;/pre>
&lt;p>Dropping the Addis Ababa region pulls the estimate down to &lt;strong>+0.1550&lt;/strong> (still significant at 10%, N = 1,984), and restricting controls to those at least 50 km from a city holds it at &lt;strong>+0.2143&lt;/strong> (significant at 5%, N = 1,392). Combined with the Section 8 agreement of Sun-Abraham, Borusyak/Gardner, and Callaway-Sant&amp;rsquo;Anna, the satellite result is robust to both the standard-error specification and the choice of comparison group. With the evidence assembled, we can return to the opening question.&lt;/p>
&lt;h2 id="12-discussion">12. Discussion&lt;/h2>
&lt;p>&lt;strong>What we found.&lt;/strong> Yes — and the &amp;ldquo;for whom&amp;rdquo; matters as much as the &amp;ldquo;whether.&amp;rdquo; A park raises local nighttime light by about &lt;strong>+0.215 IHS&lt;/strong> (~21%) and built-up land by ~2.6 percentage points, with the effect building over five years to a +0.48 plateau and &lt;strong>no spillover&lt;/strong> to neighbours. Four estimators agree the staggered-DiD negative-weights problem is not biting (spread 0.046, with 95.4% clean Bacon weight). Households near a park gain durables (+0.229), housing quality (+0.248), and wealth (+0.383 SD). And the central result: average non-agricultural employment is an insignificant &lt;strong>+0.091&lt;/strong>, yet &lt;strong>women&amp;rsquo;s&lt;/strong> employment rises a significant &lt;strong>+0.140&lt;/strong>, lifting their decision power (+0.110), savings (+0.315), and lowering acceptance of domestic violence (−0.210). The parks reshaped the local economy, and they did so largely &lt;em>through women&lt;/em>.&lt;/p>
&lt;p>&lt;strong>So what?&lt;/strong> Two design lessons follow directly. First, on &lt;strong>site selection&lt;/strong>: the effect fades steeply with distance from cities (−0.0335 per km to the nearest city) and is amplified by paved roads (+0.6695). A park dropped in a remote, poorly-connected woreda would do far less — proximity to existing economic centers is first-order, so place-based policy should follow the roads. Second, on &lt;strong>sector and inclusion&lt;/strong>: because the employment and empowerment gains run through female-intensive sectors (textiles, garments), a policymaker who measured only the &lt;em>average&lt;/em> employment effect would conclude the parks failed on jobs and miss their largest social return. Evaluations of place-based policy should be sex-disaggregated by default.&lt;/p>
&lt;p>&lt;strong>Limitations and the observational caveat.&lt;/strong> Be appropriately humble. The data are &lt;strong>synthetic&lt;/strong> — calibrated to teach the methods, not to report new facts about Ethiopia. The treated group is tiny (17 woredas), so several effects are borderline; the primary-road interaction is correctly signed but imprecise, and the raw-light coefficient runs high. Most fundamentally, this is an &lt;strong>observational&lt;/strong> study: the parks were not randomly placed, so identification rests on &lt;strong>parallel trends&lt;/strong>, not randomization. The flat pre-trends and the null spillover support that assumption but never prove it. The adjustment here — district and region×year fixed effects, baseline-trend interactions, and the PSM-matched controls — is &lt;em>confounding control&lt;/em>, not the precision-only adjustment of a randomized experiment. The ATT we report is the effect on &lt;em>these&lt;/em> parks in &lt;em>this&lt;/em> setting; it travels only as far as that.&lt;/p>
&lt;h2 id="13-reproduction-audit-synthetic-data-vs-the-paper">13. Reproduction audit: synthetic data vs the paper&lt;/h2>
&lt;p>Because the data are synthetic, transparency demands we line our numbers up against the published ones. The data-generating process was tuned to match the paper coefficient by coefficient; signs and significance agree throughout, and the headline magnitudes land within about 0.02. We also disclose four documented gaps rather than paper over them.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Result&lt;/th>
&lt;th>This synthetic data&lt;/th>
&lt;th>Paper (reported)&lt;/th>
&lt;th style="text-align:center">Sign&lt;/th>
&lt;th style="text-align:center">Significance&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Table 1: IHS light, no trends&lt;/td>
&lt;td>+0.2704***&lt;/td>
&lt;td>≈ +0.265**&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 1: IHS light, with trends&lt;/td>
&lt;td>+0.2152***&lt;/td>
&lt;td>≈ +0.214**&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 1: raw light, with trends&lt;/td>
&lt;td>+1.6181***&lt;/td>
&lt;td>≈ +1.276**&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">partial (high)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 1: impervious, with trends&lt;/td>
&lt;td>+0.0263***&lt;/td>
&lt;td>≈ +0.028**&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 2: &lt;code>nearby&lt;/code> spillover (IHS)&lt;/td>
&lt;td>+0.0648 (ns)&lt;/td>
&lt;td>≈ 0 (ns)&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 3: distance to nearest city&lt;/td>
&lt;td>−0.0335***&lt;/td>
&lt;td>negative &amp;amp; sig.&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 4: paved-road density&lt;/td>
&lt;td>+0.6695**&lt;/td>
&lt;td>positive&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 4: primary-road density&lt;/td>
&lt;td>+0.3264 (ns)&lt;/td>
&lt;td>positive&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">partial (ns)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 5: durables (controls)&lt;/td>
&lt;td>+0.2286***&lt;/td>
&lt;td>≈ +0.226***&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 5: housing (controls)&lt;/td>
&lt;td>+0.2480***&lt;/td>
&lt;td>≈ +0.252***&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 5: wealth (controls)&lt;/td>
&lt;td>+0.3825***&lt;/td>
&lt;td>≈ +0.409*&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 6: employment, full sample&lt;/td>
&lt;td>+0.0911 (ns)&lt;/td>
&lt;td>≈ +0.110 (ns)&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 6: employment, women&lt;/td>
&lt;td>+0.1404***&lt;/td>
&lt;td>≈ +0.133***&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 6: employment, men&lt;/td>
&lt;td>+0.0176 (ns)&lt;/td>
&lt;td>≈ +0.015 (ns)&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 7: decision power&lt;/td>
&lt;td>+0.1096***&lt;/td>
&lt;td>≈ +0.103***&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 7: savings account&lt;/td>
&lt;td>+0.3153***&lt;/td>
&lt;td>≈ +0.318***&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Table 7: DV acceptance&lt;/td>
&lt;td>−0.2096***&lt;/td>
&lt;td>≈ −0.212***&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Staggered: TWFE / SA / BG / CS ATT&lt;/td>
&lt;td>+0.270 / +0.299 / +0.302 / +0.256&lt;/td>
&lt;td>&amp;ldquo;closely track baseline&amp;rdquo;&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;td style="text-align:center">✓&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;em>Stars: *** p &amp;lt; .01, ** p &amp;lt; .05, * p &amp;lt; .10.&lt;/em>&lt;/p>
&lt;p>Of the audited cells, the great majority &lt;strong>land on target&lt;/strong> in sign, significance, and magnitude (within ~0.02 on the headline coefficients). Four gaps are documented and bounded. (1) The &lt;strong>raw-light coefficient runs high&lt;/strong> (~1.6 vs 1.276): keeping treated woredas essentially always-lit (for a clean IHS event study with only 17 clusters) removes the zero-dilution that would otherwise pull the raw mean down — a deliberate bright-base device that &lt;em>protects&lt;/em> the on-target IHS coefficient. (2) The &lt;strong>primary-road interaction&lt;/strong> is correctly signed and on-magnitude but borderline non-significant — the 17-treated sample cannot make both road interactions precise at once. (3) &lt;strong>Light levels are not matched&lt;/strong>: treated woredas carry an intrinsically bright base (~4–5) and controls a dim one (~0.1), unlike the paper&amp;rsquo;s PSM-matched 0.94/0.87, which is exactly why the EDA figure is baseline-normalized. (4) The &lt;strong>decision-power mean&lt;/strong> (~0.88) sits a touch below the paper&amp;rsquo;s 0.899 because the linear-probability clipping ceiling caps the achievable effect. Everywhere else, direction and significance track the paper closely. The synthetic data reproduce the paper&amp;rsquo;s &lt;em>findings&lt;/em> — they are not, and are not claimed to be, the paper&amp;rsquo;s data.&lt;/p>
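&lt;p>The &amp;ldquo;within about 0.02&amp;rdquo; claim can be checked directly from the audit table. The short sketch below uses only the coefficient pairs printed above (the list layout and variable names are illustrative). It computes the absolute gap for each row that reports a numeric paper value and sets the raw-light row aside as the documented outlier.&lt;/p>
&lt;pre>&lt;code class="language-python"># (synthetic, paper) coefficient pairs copied from the audit table
pairs = [
    (0.2704, 0.265),    # IHS light, no trends
    (0.2152, 0.214),    # IHS light, with trends
    (0.0263, 0.028),    # impervious surface, with trends
    (0.2286, 0.226),    # durables
    (0.2480, 0.252),    # housing
    (0.3825, 0.409),    # wealth
    (0.0911, 0.110),    # employment, full sample
    (0.1404, 0.133),    # employment, women
    (0.0176, 0.015),    # employment, men
    (0.1096, 0.103),    # decision power
    (0.3153, 0.318),    # savings account
    (-0.2096, -0.212),  # DV acceptance
]

# Absolute synthetic-minus-paper gap for every audited row
gaps = [abs(s - p) for s, p in pairs]
largest_gap = max(gaps)

# Raw light is the documented outlier, so it is tracked separately
raw_light_gap = abs(1.6181 - 1.276)

print(round(largest_gap, 4), round(raw_light_gap, 4))
&lt;/code>&lt;/pre>
&lt;p>On these figures the largest gap outside raw light belongs to the wealth index, at 0.0265, slightly above the 0.02 rule of thumb, while the raw-light gap is about 0.34.&lt;/p>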
&lt;h2 id="14-summary-and-takeaways">14. Summary and takeaways&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Number to remember&lt;/th>
&lt;th>Value&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Light ATT (with trends)&lt;/td>
&lt;td>&lt;strong>+0.2152***&lt;/strong> (~21%)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Four-estimator spread&lt;/td>
&lt;td>&lt;strong>0.046 IHS units&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Clean Bacon weight&lt;/td>
&lt;td>&lt;strong>95.4%&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Wealth-index ATT&lt;/td>
&lt;td>&lt;strong>+0.383 SD&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Female employment ATT&lt;/td>
&lt;td>&lt;strong>+0.140***&lt;/strong> (vs +0.091 ns full sample)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Light SE: naive → Conley-HAC&lt;/td>
&lt;td>&lt;strong>0.0329 → 0.0799&lt;/strong>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ol>
&lt;li>&lt;strong>A park raises local activity ~21%, and the staggered-bias worry does not bite.&lt;/strong> The with-trends TWFE ATT is &lt;strong>+0.2152***&lt;/strong>. In the staggered comparison, the no-trends TWFE (+0.270), Sun-Abraham (+0.299), Borusyak/Gardner (+0.302), and Callaway-Sant&amp;rsquo;Anna (+0.256) all agree within &lt;strong>0.046&lt;/strong> because &lt;strong>95.4%&lt;/strong> of the Bacon weight comes from clean treated-vs-never-treated comparisons. When a large never-treated pool dominates, plain TWFE is barely contaminated.&lt;/li>
&lt;li>&lt;strong>The average hides the finding — split by sex.&lt;/strong> Full-sample non-ag employment is an insignificant &lt;strong>+0.091&lt;/strong>, but the &lt;strong>female&lt;/strong> effect is &lt;strong>+0.140***&lt;/strong> and the male effect is ~0. The empowerment cascade follows: decision power +0.110, savings +0.315, and acceptance of domestic violence −0.210, all highly significant. Heterogeneity analysis turned a null into the study&amp;rsquo;s headline.&lt;/li>
&lt;li>&lt;strong>Honest inference matters but does not overturn the result.&lt;/strong> With all 17 treated woredas clustered in space, the Conley-HAC SE (0.0799) is &lt;strong>2.43×&lt;/strong> the naive HC0 SE (0.0329), yet the ATT still clears significance (&lt;em>t&lt;/em> = 2.69). The small treated sample remains a limitation: it is why the primary-road interaction is imprecise and the raw-light coefficient lands off target.&lt;/li>
&lt;li>&lt;strong>Next step.&lt;/strong> Re-estimate the event study with a Callaway-Sant&amp;rsquo;Anna &lt;em>dynamic&lt;/em> aggregation to compare its lag-by-lag path against the &lt;code>saturated&lt;/code> one, add a sensitivity analysis (à la Rambachan-Roth) that asks how large a pre-trend violation would overturn the +0.215 ATT, and test whether labor-intensive parks drive the female-employment effect more than capital-intensive ones.&lt;/li>
&lt;/ol>
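&lt;p>The derived figures in the table and takeaways above follow from the reported estimates by simple arithmetic. The sketch below recomputes them; it is a check on the numbers printed in this tutorial, not new output from the data. The last two quantities rest on an assumption the tutorial does not spell out, namely that at sufficiently bright light levels the inverse hyperbolic sine behaves like a logarithm, so the coefficient can be read as log points or converted exactly.&lt;/p>
&lt;pre>&lt;code class="language-python">import math

# Staggered ATTs from the audit table, in the order TWFE, SA, BG, CS
atts = [0.270, 0.299, 0.302, 0.256]
spread = max(atts) - min(atts)

# With-trends light ATT and its naive and Conley-HAC standard errors
att, se_naive, se_hac = 0.2152, 0.0329, 0.0799
se_ratio = se_hac / se_naive
t_hac = att / se_hac

# Percent reading of the IHS coefficient, assuming asinh(y) is close to log(2y)
pct_log_points = 100 * att
pct_exact = 100 * math.expm1(att)

print(round(spread, 3), round(se_ratio, 2), round(t_hac, 2))
print(round(pct_log_points, 1), round(pct_exact, 1))
&lt;/code>&lt;/pre>
&lt;p>Read as log points, +0.2152 is the ~21% quoted throughout; the exact conversion under the same log approximation is closer to 24%. The approximation is weaker wherever light levels sit near zero.&lt;/p>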
&lt;h2 id="15-exercises">15. Exercises&lt;/h2>
&lt;ol>
&lt;li>&lt;strong>Drop the anchor cohort.&lt;/strong> Re-run the staggered estimators after excluding the single 2008 woreda, so all treated units come from the 2014–2020 build-out. Do the four ATTs still agree within 0.05, and does the Goodman-Bacon clean-weight share change? What does that tell you about how much one early cohort drives the comparison structure?&lt;/li>
&lt;li>&lt;strong>Stress-test the gender result.&lt;/strong> Add an interaction &lt;code>treatment:sex&lt;/code> to the &lt;em>full-sample&lt;/em> employment regression instead of splitting the data. Does the interaction coefficient recover the female-minus-male gap (≈ +0.123)? Why might the pooled-interaction and split-sample approaches give slightly different standard errors?&lt;/li>
&lt;li>&lt;strong>Move the collapse year.&lt;/strong> The naive 2×2 in Section 6.1 collapsed the design at the median opening year (2017). Recompute it collapsing at 2014 and at 2019. How much does the blended ATT move, and why does the choice of collapse year matter for a staggered design but not for a single-date one?&lt;/li>
&lt;/ol>
&lt;h2 id="16-references">16. References&lt;/h2>
&lt;ol>
&lt;li>Huang, G., Wang, M., &amp;amp; Xu, H. (2026). The socioeconomic impacts of industrial parks in Ethiopia. &lt;em>Journal of Urban Economics&lt;/em>. &lt;a href="https://doi.org/10.1016/j.jue.2026.103867" target="_blank" rel="noopener">https://doi.org/10.1016/j.jue.2026.103867&lt;/a>&lt;/li>
&lt;li>Callaway, B., &amp;amp; Sant&amp;rsquo;Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. &lt;em>Journal of Econometrics, 225&lt;/em>(2), 200–230. &lt;a href="https://doi.org/10.1016/j.jeconom.2020.12.001" target="_blank" rel="noopener">https://doi.org/10.1016/j.jeconom.2020.12.001&lt;/a>&lt;/li>
&lt;li>Sun, L., &amp;amp; Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. &lt;em>Journal of Econometrics, 225&lt;/em>(2), 175–199. &lt;a href="https://doi.org/10.1016/j.jeconom.2020.09.006" target="_blank" rel="noopener">https://doi.org/10.1016/j.jeconom.2020.09.006&lt;/a>&lt;/li>
&lt;li>Borusyak, K., Jaravel, X., &amp;amp; Spiess, J. (2024). Revisiting event-study designs: Robust and efficient estimation. &lt;em>Review of Economic Studies, 91&lt;/em>(6), 3253–3285. &lt;a href="https://doi.org/10.1093/restud/rdae007" target="_blank" rel="noopener">https://doi.org/10.1093/restud/rdae007&lt;/a>&lt;/li>
&lt;li>Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. &lt;em>Journal of Econometrics, 225&lt;/em>(2), 254–277. &lt;a href="https://doi.org/10.1016/j.jeconom.2021.03.014" target="_blank" rel="noopener">https://doi.org/10.1016/j.jeconom.2021.03.014&lt;/a>&lt;/li>
&lt;li>Conley, T. G. (1999). GMM estimation with cross-sectional dependence. &lt;em>Journal of Econometrics, 92&lt;/em>(1), 1–45. &lt;a href="https://doi.org/10.1016/S0304-4076%2898%2900084-0" target="_blank" rel="noopener">https://doi.org/10.1016/S0304-4076(98)00084-0&lt;/a>&lt;/li>
&lt;li>&lt;code>pyfixest&lt;/code> documentation — &lt;a href="https://pyfixest.org/" target="_blank" rel="noopener">https://pyfixest.org/&lt;/a>&lt;/li>
&lt;li>&lt;code>diff-diff&lt;/code> documentation — &lt;a href="https://github.com/igerber/diff-diff" target="_blank" rel="noopener">https://github.com/igerber/diff-diff&lt;/a>&lt;/li>
&lt;li>Ethiopia Demographic and Health Surveys (DHS), 2000–2019 — The DHS Program, ICF / Ethiopian Public Health Institute. &lt;a href="https://dhsprogram.com/" target="_blank" rel="noopener">https://dhsprogram.com/&lt;/a>&lt;/li>
&lt;li>Chen, Z., Yu, B., Yang, C., et al. (2021). An extended time series (2000–2018) of global NPP-VIIRS-like nighttime light data. &lt;em>Earth System Science Data, 13&lt;/em>(3), 889–906. &lt;a href="https://doi.org/10.5194/essd-13-889-2021" target="_blank" rel="noopener">https://doi.org/10.5194/essd-13-889-2021&lt;/a>&lt;/li>
&lt;li>Zhang, X., Liu, L., Zhao, T., et al. (2022). GISD30: Global 30-m impervious-surface dynamic dataset. &lt;em>Earth System Science Data, 14&lt;/em>(4), 1831–1856. &lt;a href="https://doi.org/10.5194/essd-14-1831-2022" target="_blank" rel="noopener">https://doi.org/10.5194/essd-14-1831-2022&lt;/a>&lt;/li>
&lt;/ol>
&lt;p>&lt;em>This tutorial is a teaching replication built on synthetic data; see the data note in Section 1 and the reproduction audit in Section 13. The companion &lt;code>script.py&lt;/code> regenerates every figure and table.&lt;/em>&lt;/p>
</description></item><item><title>Staggered Synthetic Difference-in-Differences (SDID) in Stata: Gender Quotas and Women in Parliament</title><link>https://carlos-mendez.org/post/stata_sdid_staggered/</link><pubDate>Sun, 07 Jun 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/stata_sdid_staggered/</guid><description>&lt;h2 id="abstract">Abstract&lt;/h2>
&lt;p>Most real-world policies are not adopted on a single clock — parliamentary gender quotas, minimum-wage laws, and carbon taxes arrive in different units in different years, a staggered-adoption design where naive two-way fixed-effects difference-in-differences quietly breaks by using already-treated units as controls and placing negative weights on some effects. This tutorial extends synthetic difference-in-differences (SDID) to staggered adoption and applies it in Stata to a question in political economy: do parliamentary gender quotas raise the share of women in national parliaments? It uses the &lt;code>quota_example&lt;/code> dataset distributed with the &lt;code>sdid&lt;/code> package (Bhalotra, Clarke, Gomes &amp;amp; Venkataramani, 2023) — a balanced panel of 119 countries observed annually from 1990 to 2015 (3,094 observations), in which 9 countries adopt a quota across 7 cohorts (2000, 2002, 2003, 2005, 2010, 2012, 2013) and 110 remain never-treated. The method estimates a separate, clean SDID per cohort against the never-treated donor pool, then aggregates the cohort effects into the overall ATT with non-negative treated-period-share weights, complemented by the &lt;code>sdid_event&lt;/code> event study and bootstrap, jackknife, and placebo inference. The overall ATT is +8.03 percentage points (SE 3.74, p = 0.032), robust to a log-GDP control (8.05 optimized, 8.06 projected), but the cohort effects swing from −3.5 to +21.8 points, with flat pre-adoption placebos supporting parallel synthetic trends and dynamic effects that appear immediately and persist for over a decade. The lesson is that a single headline number summarizes real heterogeneity, and that transparent, non-negative cohort weighting is essential when treatment timing is staggered.&lt;/p>
&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>In a &lt;a href="https://carlos-mendez.org/post/stata_sdid/">previous tutorial&lt;/a>, one unit — California — adopted one policy — Proposition 99 — in one year — 1989. That &lt;strong>block design&lt;/strong> is the textbook setting for synthetic difference-in-differences (SDID). But most real policies do not arrive on a single clock. Parliamentary gender quotas, minimum-wage laws, carbon taxes, and clean-air regulations are adopted by &lt;strong>different units in different years&lt;/strong>. This is the &lt;strong>staggered adoption&lt;/strong> design, and it is where naive panel methods quietly break.&lt;/p>
&lt;p>This tutorial extends SDID to staggered adoption and applies it in Stata to a real question in political economy: &lt;strong>do parliamentary gender quotas raise the share of women in national parliaments?&lt;/strong> We use the &lt;code>quota_example&lt;/code> dataset that ships with the &lt;code>sdid&lt;/code> package — 119 countries observed annually from 1990 to 2015, in which 9 countries adopt a gender quota across 7 different cohorts (2000, 2002, 2003, 2005, 2010, 2012, and 2013).&lt;/p>
&lt;p>The headline is a story about heterogeneity. The overall effect of quotas is about &lt;strong>+8 percentage points&lt;/strong> of women in parliament, but the cohort-by-cohort effects swing from &lt;strong>−3.5 to +21.8 points&lt;/strong>. A single number hides that range — and, as we will see, the naive two-way fixed-effects regression that most people reach for first can hide even more.&lt;/p>
&lt;details>
&lt;summary>&lt;b>Why does staggered timing break the naive regression?&lt;/b> (click to expand)&lt;/summary>
&lt;p>The workhorse for panel policy evaluation is the &lt;strong>two-way fixed-effects (TWFE)&lt;/strong> regression — unit dummies, time dummies, and a treatment dummy. With one adoption date it estimates a clean difference-in-differences. With &lt;em>staggered&lt;/em> timing and &lt;em>heterogeneous&lt;/em> effects, the same regression implicitly uses &lt;strong>already-treated units as controls for later adopters&lt;/strong> (&amp;ldquo;forbidden comparisons&amp;rdquo;). The result is a variance-weighted average of every 2×2 comparison in the panel, and some of those weights can be &lt;strong>negative&lt;/strong> — so the estimate can even take the wrong sign (Goodman-Bacon, 2021; de Chaisemartin &amp;amp; D&amp;rsquo;Haultfœuille, 2020). Staggered SDID sidesteps this by estimating a &lt;strong>separate, clean&lt;/strong> SDID effect for each adoption cohort and aggregating with transparent, non-negative weights.&lt;/p>
&lt;/details>
&lt;pre>&lt;code class="language-mermaid">graph TD
subgraph &amp;quot;Block design — predecessor (Prop 99)&amp;quot;
B1[&amp;quot;California&amp;lt;br/&amp;gt;adopts 1989&amp;quot;] --&amp;gt; BATT[&amp;quot;one ATT&amp;quot;]
B2[&amp;quot;other states&amp;lt;br/&amp;gt;never treated&amp;quot;] --&amp;gt; BATT
end
subgraph &amp;quot;Staggered design — this post (gender quotas)&amp;quot;
S1[&amp;quot;cohort 2000&amp;quot;] --&amp;gt; SATT[&amp;quot;aggregate ATT&amp;quot;]
S2[&amp;quot;cohort 2002&amp;quot;] --&amp;gt; SATT
S3[&amp;quot;cohorts 2003 to 2013&amp;quot;] --&amp;gt; SATT
SC[&amp;quot;110 never-treated&amp;lt;br/&amp;gt;controls&amp;quot;] -.donor pool.-&amp;gt; SATT
end
style B1 fill:#d97757,stroke:#141413,color:#fff
style B2 fill:#6a9bcc,stroke:#141413,color:#fff
style BATT fill:#00d4c8,stroke:#141413,color:#141413
style S1 fill:#d97757,stroke:#141413,color:#fff
style S2 fill:#d97757,stroke:#141413,color:#fff
style S3 fill:#d97757,stroke:#141413,color:#fff
style SC fill:#6a9bcc,stroke:#141413,color:#fff
style SATT fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
&lt;h3 id="11-learning-objectives">1.1 Learning objectives&lt;/h3>
&lt;p>By the end of this tutorial you will be able to:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Explain&lt;/strong> why staggered adoption breaks naive TWFE difference-in-differences, and how per-cohort SDID avoids the forbidden-comparison problem.&lt;/li>
&lt;li>&lt;strong>Derive&lt;/strong> the SDID estimator from first principles — unit weights $\omega$, time weights $\lambda$, and the weighted two-way fixed-effects objective — and the rule that aggregates cohort-specific effects $\hat{\tau}_a$ into one overall ATT.&lt;/li>
&lt;li>&lt;strong>Estimate&lt;/strong> the effect of gender quotas with &lt;code>sdid&lt;/code> on a staggered panel, add a covariate two different ways (&lt;code>optimized&lt;/code> vs &lt;code>projected&lt;/code>), and choose among bootstrap, jackknife, and placebo inference.&lt;/li>
&lt;li>&lt;strong>Read&lt;/strong> an SDID event-study plot produced by &lt;code>sdid_event&lt;/code>, distinguishing pre-trend placebo coefficients from post-period dynamic effects.&lt;/li>
&lt;/ul>
&lt;h2 id="2-key-concepts-at-a-glance">2. Key concepts at a glance&lt;/h2>
&lt;p>Each card gives a plain-language &lt;strong>definition&lt;/strong>, a concrete &lt;strong>example&lt;/strong> from this quota study, and an everyday &lt;strong>analogy&lt;/strong>. Open any term that is unfamiliar.&lt;/p>
&lt;details>
&lt;summary>&lt;b>1. ATT (average treatment effect on the treated)&lt;/b> — the question we actually answer.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> The effect of adopting a quota on the women-in-parliament share, &lt;em>in the countries that adopted one&lt;/em>, averaged over their post-adoption years. It is not the effect a quota would have everywhere — only where one was actually tried.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> Our headline ATT is &lt;strong>+8.0 percentage points&lt;/strong>: across the nine adopting countries, quotas raised women&amp;rsquo;s parliamentary share by about eight points relative to their no-quota counterfactual.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> Like asking &amp;ldquo;how much did the patients who &lt;em>took&lt;/em> the drug improve?&amp;rdquo; — not &amp;ldquo;how much would everyone improve?&amp;rdquo; You measure only the units that were actually treated.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>2. Synthetic control&lt;/b> — a made-to-order comparison country.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> A weighted blend of never-treated &amp;ldquo;donor&amp;rdquo; countries, built so its pre-adoption path mimics the treated cohort. It stands in for the unobservable counterfactual: what the cohort&amp;rsquo;s outcome &lt;em>would&lt;/em> have been without a quota.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> The 2002 cohort&amp;rsquo;s synthetic control mixes dozens of donors (Belgium, Paraguay, Cuba, …) so that, before 2002, the blend tracks the cohort&amp;rsquo;s trend — then keeps going as the cohort would have without the law.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> A stunt double cast to match the lead actor&amp;rsquo;s build and movement — close enough that, in the shots you cannot film the star, the double stands in convincingly.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>3. Unit weights (ω)&lt;/b> — how much each donor counts.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> Non-negative weights, one per donor country, summing to one, that build the synthetic control. Each cohort gets its own ω.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> In the 2000 cohort, 80 donors receive nonzero weight — Argentina ≈ 0.061, Guatemala ≈ 0.057, Austria ≈ 0.045 — a &lt;em>diffuse&lt;/em> blend rather than one or two stand-ins.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> A recipe calling for many ingredients in small, precise amounts: no single one dominates, so the dish survives a bad batch of any one ingredient.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>4. Time weights (λ)&lt;/b> — which "before" years matter.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> Non-negative weights on the pre-adoption years, summing to one, that decide which pre-periods define the baseline. They up-weight the years most like the post-period.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> For the 2002 cohort, λ concentrates on the late 1990s and 2001 rather than spreading evenly across 1990–2001 — the recent past is the relevant baseline.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> Forecasting tomorrow&amp;rsquo;s weather, you trust last week far more than the same date five years ago. Time weights formalize &amp;ldquo;recent and similar counts more.&amp;rdquo;&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>5. Adoption cohort (a)&lt;/b> — units that switch on together.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> The set of countries that first adopt a quota in the same calendar year. Staggered SDID runs one self-contained SDID per cohort, always against the never-treated controls.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> There are seven cohorts — 2000, 2002, 2003, 2005, 2010, 2012, 2013 — with two countries each in 2002 and 2003, and one in the rest.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> School graduating classes: the &amp;ldquo;class of 2002&amp;rdquo; and the &amp;ldquo;class of 2010&amp;rdquo; share a start date and are analyzed as groups, even though all attend the same school.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>6. Staggered adoption &amp;amp; the forbidden comparison&lt;/b> — why the naive regression breaks.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> Staggered adoption means units are treated at different times. The hazard: a two-way fixed-effects regression can use &lt;em>already-treated&lt;/em> units as controls for &lt;em>later&lt;/em> adopters — a &amp;ldquo;forbidden comparison&amp;rdquo; that places negative weights on some effects and can flip the sign.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> When the 2012 cohort adopts, a naive TWFE quietly treats the 2002 cohort — already treated, already changed — as part of its control group. Staggered SDID never does this: each cohort is compared only to the 110 never-treated countries.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> Timing a late runner against runners who already crossed the line and slowed to a walk — your &amp;ldquo;control&amp;rdquo; is contaminated because it has already run the race.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>7. Event time (relative period)&lt;/b> — every cohort on its own clock.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> Time measured relative to each cohort&amp;rsquo;s &lt;em>own&lt;/em> adoption year (… −2, −1, 0, +1 …), so cohorts that adopted in different calendar years can be lined up and averaged.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> Event time 0 is the year 2000 for the first cohort but 2013 for the last; re-centring lets us ask &amp;ldquo;what happens three years &lt;em>after&lt;/em> a quota?&amp;rdquo; across all cohorts at once.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> Comparing marathon runners by their own start gun, not the wall clock: a runner who started at 9:05 and one who started at 9:20 are both &amp;ldquo;at mile 10&amp;rdquo; measured from their own start.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>8. ATT aggregation&lt;/b> — from many cohort effects to one number.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> The overall ATT is a weighted average of the cohort effects, each weighted by its share of treated unit-by-post-period observations — earlier, longer-exposed, larger cohorts count more.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> The seven cohort effects span &lt;strong>−3.5 to +21.8&lt;/strong>; weighted by treated country-years they average to &lt;strong>+8.0&lt;/strong> (the plain unweighted mean would be ≈ 7.0).&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> A course grade that weights the final exam more than a pop quiz: the cohorts you observe for longer carry more of the final mark.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>9. Pre-trend placebo test&lt;/b> — the assumption you can see.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> Event-study coefficients for the &lt;em>pre-adoption&lt;/em> periods. If treated and synthetic-control countries moved in parallel before treatment, these sit near zero — a falsification check.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> For the 2002 cohort, all twelve pre-period placebos fall in &lt;strong>[−0.2, +0.8]&lt;/strong> points — flat, so we cannot reject parallel synthetic trends.&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> Checking a scale by weighing nothing first: if it does not read zero when empty, you distrust every later reading. Flat placebos are that &amp;ldquo;reads zero when empty&amp;rdquo; check.&lt;/p>
&lt;/details>
&lt;details>
&lt;summary>&lt;b>10. Bootstrap, jackknife, placebo&lt;/b> — three rulers for uncertainty.&lt;/summary>
&lt;p>&lt;strong>Definition.&lt;/strong> Three ways to attach a standard error to the ATT. With many treated units all three are available; they share one point estimate but report different spread.&lt;/p>
&lt;p>&lt;strong>Example.&lt;/strong> On the two-cohort subsample the ATT is &lt;strong>10.3&lt;/strong> for all three, but the SE is &lt;strong>4.7&lt;/strong> (bootstrap), &lt;strong>6.0&lt;/strong> (jackknife, most conservative), and &lt;strong>2.3&lt;/strong> (placebo, tightest).&lt;/p>
&lt;p>&lt;strong>Analogy.&lt;/strong> Measuring a table with a tape, a folding ruler, and a laser: they agree on the length but disagree on the error bars — the cautious carpenter reports the widest.&lt;/p>
&lt;/details>
&lt;h2 id="3-the-data-gender-quotas-across-119-countries">3. The data: gender quotas across 119 countries&lt;/h2>
&lt;p>We use &lt;code>quota_example.dta&lt;/code>, the balanced panel from Bhalotra, Clarke, Gomes &amp;amp; Venkataramani (2023) distributed with the &lt;code>sdid&lt;/code> package. The outcome is the percentage of seats held by women in the national parliament; the treatment is the adoption of a reserved-seat gender quota; the covariate is log GDP per capita.&lt;/p>
&lt;pre>&lt;code class="language-stata">webuse set www.damianclarke.net/stata/
webuse quota_example, clear
label variable quota &amp;quot;Parliamentary gender quota&amp;quot;
xtset country year
codebook country year quota womparl lngdp, compact
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Variable Obs Unique Mean Min Max Label
----------------------------------------------------------------------------
country 3094 119 . . . Country
year 3094 26 2002.5 1990 2015 Year
quota 3094 2 .0303814 0 1 =1 if country has a gender quota
womparl 3094 449 14.96531 0 63.8 Women in parliament
lngdp 2990 2956 9.154291 5.8701 11.61789 log(GDP)
----------------------------------------------------------------------------
&lt;/code>&lt;/pre>
&lt;p>The panel is &lt;strong>balanced&lt;/strong>: 119 countries times 26 years equals 3,094 observations, with no gaps in the outcome or treatment (&lt;code>lngdp&lt;/code> has 104 missing values, which will matter only when we add the covariate). The treatment indicator &lt;code>quota&lt;/code> equals one for just 3% of observations, a reminder that treated country-years are scarce. Crucially, &lt;code>quota&lt;/code> is &lt;strong>absorbing&lt;/strong> — once a country adopts a quota it stays treated — which SDID requires.&lt;/p>
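&lt;p>The balance and absorbing-treatment claims can be checked directly. The sketch below is not part of the original workflow; it assumes the data are loaded and &lt;code>xtset&lt;/code> as above, and the expected counts in the comments are read off the &lt;code>codebook&lt;/code> output rather than taken from a separate run.&lt;/p>
&lt;pre>&lt;code class="language-stata">* One row per country-year: errors out if the pair is not a unique key
isid country year
* Balanced panel: 119 countries x 26 years should give 3,094 rows
count
assert r(N) == 119 * 26
* Covariate gaps: 3,094 - 2,990 = 104 missing values expected in lngdp
count if missing(lngdp)
* Absorbing treatment: quota never switches back from 1 to 0 within a country
bysort country (year): assert quota &amp;gt;= quota[_n-1] if _n &amp;gt; 1
&lt;/code>&lt;/pre>
&lt;p>If every &lt;code>assert&lt;/code> passes silently, the panel has the rectangular, switch-on-once structure that the per-cohort procedure relies on.&lt;/p>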
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Role&lt;/th>
&lt;th>Symbol&lt;/th>
&lt;th>Description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>country&lt;/code>&lt;/td>
&lt;td>unit&lt;/td>
&lt;td>$i$&lt;/td>
&lt;td>119 countries (9 ever-treated, 110 never-treated)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>year&lt;/code>&lt;/td>
&lt;td>time&lt;/td>
&lt;td>$t$&lt;/td>
&lt;td>1990–2015 (26 years)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>womparl&lt;/code>&lt;/td>
&lt;td>outcome&lt;/td>
&lt;td>$Y_{it}$&lt;/td>
&lt;td>% women in the national parliament&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>quota&lt;/code>&lt;/td>
&lt;td>treatment&lt;/td>
&lt;td>$W_{it}$&lt;/td>
&lt;td>1 once a country has a quota, 0 before / never&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>lngdp&lt;/code>&lt;/td>
&lt;td>covariate&lt;/td>
&lt;td>$X_{it}$&lt;/td>
&lt;td>log GDP per capita&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>The estimand.&lt;/strong> Our target is the &lt;strong>average treatment effect on the treated (ATT)&lt;/strong>: the effect of adopting a quota on the women-in-parliament share &lt;em>in the countries that adopted one&lt;/em>, averaged over their post-adoption years. Formally,&lt;/p>
&lt;p>$$
\tau = \frac{1}{N_{tr}\, T_{post}} \sum_{i:\, W_i = 1}\ \sum_{t &amp;gt; T_{pre}} \left[\, Y_{it}(1) - Y_{it}(0) \,\right]
$$&lt;/p>
&lt;p>In words: for every treated country and every post-adoption year, take the gap between the share of women &lt;em>with&lt;/em> a quota, $Y_{it}(1)$, and the share that &lt;em>would have occurred without one&lt;/em>, $Y_{it}(0)$ — then average. The first term is observed; the second is the counterfactual that the synthetic control must impute, because we never see a quota-adopting country in the parallel world where it abstained.&lt;/p>
&lt;p>&lt;strong>An observational, not experimental, setting.&lt;/strong> Quotas are not randomly assigned. Countries that adopt them early may differ systematically — they may be wealthier, more democratic, or already on a rising trajectory of women&amp;rsquo;s representation. That is exactly why we need a method that builds a &lt;em>credible counterfactual&lt;/em> from comparison countries rather than assuming a simple before/after change would have held. Identification rests on assumptions we will keep visible: that treated and synthetic-control countries share a &lt;strong>common (synthetic) trend&lt;/strong> absent treatment, &lt;strong>no anticipation&lt;/strong> of the quota, &lt;strong>no spillovers&lt;/strong> across countries, and that adoption timing is not itself driven by the outcome&amp;rsquo;s future path.&lt;/p>
&lt;h3 id="31-the-staggered-structure">3.1 The staggered structure&lt;/h3>
&lt;p>Before modelling, let us see the timing directly. The adoption year is the first year a country is treated; we tabulate the cohorts.&lt;/p>
&lt;pre>&lt;code class="language-stata">bysort country (year): egen firsttreat = min(cond(quota==1, year, .))
preserve
keep country firsttreat
duplicates drop
tab firsttreat, missing
restore
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> firsttreat | Freq. Percent Cum.
------------+-----------------------------------
2000 | 1 0.84 0.84
2002 | 2 1.68 2.52
2003 | 2 1.68 4.20
2005 | 1 0.84 5.04
2010 | 1 0.84 5.88
2012 | 1 0.84 6.72
2013 | 1 0.84 7.56
. | 110 92.44 100.00
------------+-----------------------------------
Total | 119 100.00
&lt;/code>&lt;/pre>
&lt;p>Nine countries adopt a quota, spread across &lt;strong>seven cohorts&lt;/strong>; the 2002 and 2003 cohorts contain two countries each, the rest one. The remaining &lt;strong>110 countries are never treated&lt;/strong> — they form the donor pool from which every cohort&amp;rsquo;s synthetic control is built. This staircase of adoption dates is the defining feature of a staggered design, and the reason a single &amp;ldquo;post&amp;rdquo; dummy is too blunt.&lt;/p>
&lt;h2 id="4-exploratory-analysis-with-panelview">4. Exploratory analysis with &lt;code>panelview&lt;/code>&lt;/h2>
&lt;p>A staggered design is best understood by &lt;em>looking&lt;/em> at it. The &lt;code>panelview&lt;/code> command (Mou, Liu &amp;amp; Xu, 2023) draws two pictures we need: a heatmap of &lt;em>who is treated when&lt;/em>, and the raw outcome trajectories colored by treatment status.&lt;/p>
&lt;pre>&lt;code class="language-stata">ssc install panelview, replace
panelview womparl quota, i(country) t(year) type(treat) bytiming
panelview womparl quota, i(country) t(year) type(outcome)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_sdid_staggered_panelview_treat.png" alt="Treatment-timing heatmap: countries sorted by adoption year reveal the staggered staircase">&lt;/p>
&lt;p>The treatment heatmap (&lt;code>type(treat)&lt;/code>, sorted with &lt;code>bytiming&lt;/code>) makes the staggered structure unmistakable: the dark treated cells appear in the &lt;strong>top-right corner as a staircase&lt;/strong>, each step a different cohort switching on between 2000 and 2013, against a sea of never-treated controls. This is the visual opposite of a block design, where every treated cell would switch on in the same column.&lt;/p>
&lt;p>&lt;img src="stata_sdid_staggered_panelview_outcome.png" alt="Outcome trajectories: treated countries (orange) against the control spaghetti (blue)">&lt;/p>
&lt;p>The outcome plot (&lt;code>type(outcome)&lt;/code>) overlays all 119 women-in-parliament series, with the 9 treated countries in orange. Several treated countries start near the bottom of the distribution and climb steeply after their adoption year — a hint of a positive effect — but the climbs begin at different times, and a few treated countries barely move. No single &amp;ldquo;treated average&amp;rdquo; line could summarize this; we need cohort-specific counterfactuals.&lt;/p>
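&lt;pre>&lt;code class="language-stata">gen byte evertreat = !missing(firsttreat)   // 1 = country ever adopts a quota
preserve                                    // keep the full panel for later estimation
collapse (mean) womparl, by(evertreat year)
* ... reshape and plot ever- vs never-adopting means, then restore ...
&lt;/code>&lt;/pre>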
&lt;p>&lt;img src="stata_sdid_staggered_raw_trends.png" alt="Mean outcome: ever-adopting vs never-adopting countries">&lt;/p>
&lt;p>Collapsing to group means tells a cautionary tale. The ever-adopting countries (orange) start the 1990s &lt;strong>below&lt;/strong> the never-adopting countries (about 4% vs 10% women in parliament) and end &lt;strong>above&lt;/strong> them by 2015 (about 23% vs 22%). A naive eyeball difference-in-differences on these two lines would be badly confounded: the groups began at different levels and the &amp;ldquo;treated&amp;rdquo; line aggregates countries that switched on in seven different years. The raw means motivate the machinery to come — we must compare each cohort to a &lt;em>tailored&lt;/em> synthetic control, not to the grand average.&lt;/p>
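&lt;p>To see what that eyeball comparison would deliver, plug the rounded group means into a two-group, two-period difference-in-differences. This is only a back-of-the-envelope illustration built from the approximate values quoted above, not an estimate reported in this tutorial:&lt;/p>
&lt;p>$$
\widehat{\tau}_{\text{naive}} \approx (23 - 4) - (22 - 10) = 19 - 12 = 7 \ \text{percentage points}
$$&lt;/p>
&lt;p>The figure looks plausible, but it inherits both problems just described: it removes the initial level gap by assumption alone, and it uses the early 1990s as the pre-period for every adopter even though adoption dates run from 2000 to 2013.&lt;/p>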
&lt;h2 id="5-synthetic-difference-in-differences-from-first-principles">5. Synthetic difference-in-differences from first principles&lt;/h2>
&lt;p>Before tackling staggered timing, fix ideas with a single cohort. SDID (Arkhangelsky et al., 2021) is a &lt;strong>weighted two-way fixed-effects regression&lt;/strong>. It chooses an ATT, a constant, unit fixed effects, and time fixed effects to minimize a weighted sum of squared residuals:&lt;/p>
&lt;p>$$
\left(\hat{\tau}, \hat{\mu}, \hat{\alpha}, \hat{\beta}\right) = \arg\min_{\tau,\mu,\alpha,\beta} \sum_{i=1}^{N} \sum_{t=1}^{T} \left(Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\,\tau\right)^{2}\, \hat{\omega}_i\, \hat{\lambda}_t
$$&lt;/p>
&lt;p>In words: run a difference-in-differences regression, but weight each observation by a &lt;strong>unit weight&lt;/strong> $\hat{\omega}_i$ times a &lt;strong>time weight&lt;/strong> $\hat{\lambda}_t$. Here $\alpha_i$ is a country fixed effect, $\beta_t$ a year fixed effect, $W_{it}$ the treatment dummy, and $\tau$ the ATT we want. Set all weights equal and you recover ordinary DiD; the weights are what make SDID special. They are not free parameters — each solves its own optimization.&lt;/p>
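&lt;p>The equal-weights special case is easy to run as a benchmark. The sketch below assumes the &lt;code>firsttreat&lt;/code> variable built in Section 3.1 and the &lt;code>xtset&lt;/code> declaration from Section 3. It restricts the panel to one cohort plus the never-treated controls and fits ordinary two-way fixed effects. The choice of the 2002 cohort and the clustered standard errors are illustrative assumptions, and no output is shown because this regression is not reported in the tutorial.&lt;/p>
&lt;pre>&lt;code class="language-stata">preserve
* Single-cohort block design: the 2002 adopters plus the never-treated donors
keep if firsttreat == 2002 | missing(firsttreat)
* Uniform unit and time weights: country FE (fe) and year FE (i.year)
xtreg womparl quota i.year, fe vce(cluster country)
restore
&lt;/code>&lt;/pre>
&lt;p>The coefficient on &lt;code>quota&lt;/code> is the plain DiD estimate for that cohort; SDID replaces the implicit uniform weights with the optimized $\hat{\omega}_i$ and $\hat{\lambda}_t$.&lt;/p>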
&lt;p>The &lt;strong>unit weights&lt;/strong> are chosen so that a weighted blend of control countries tracks the treated cohort across the pre-period:&lt;/p>
&lt;p>$$
\hat{\omega} = \arg\min_{\omega_0,\, \omega \ge 0} \sum_{t=1}^{T_{pre}} \left(\omega_0 + \sum_{i=1}^{N_{co}} \omega_i\, Y_{it} - \frac{1}{N_{tr}} \sum_{i=1}^{N_{tr}} Y_{it}\right)^{2} + \zeta^{2}\, T_{pre}\, \lVert \omega \rVert^{2}
$$&lt;/p>
&lt;p>The squared term asks the synthetic control $\sum_i \omega_i Y_{it}$ (plus an intercept $\omega_0$) to match the treated average in every pre-adoption year, with the weights $\omega_i$ constrained to be non-negative and to sum to one. The intercept $\omega_0$ is the SDID twist: it lets the synthetic match the treated &lt;em>trend&lt;/em> without matching its &lt;em>level&lt;/em>, because any constant level gap is later absorbed by the unit fixed effect $\alpha_i$. The final term is a &lt;strong>ridge penalty&lt;/strong> with regularization strength $\zeta$; it spreads weight across many donors instead of concentrating it on a few, which stabilizes the estimate. (Synthetic control, by contrast, drops $\omega_0$ and the penalty and must match the level too.)&lt;/p>
&lt;p>The &lt;strong>time weights&lt;/strong> are the mirror image — they pick the pre-period years that best predict each control country&amp;rsquo;s post-period average:&lt;/p>
&lt;p>$$
\hat{\lambda} = \arg\min_{\lambda_0,\, \lambda \ge 0} \sum_{i=1}^{N_{co}} \left(\lambda_0 + \sum_{t=1}^{T_{pre}} \lambda_t\, Y_{it} - \frac{1}{T_{post}} \sum_{t=T_{pre}+1}^{T} Y_{it}\right)^{2} + \zeta_{\lambda}^{2}\, N_{co}\, \lVert \lambda \rVert^{2}
$$&lt;/p>
&lt;p>Years that look most like the post-period get the most weight, so the &amp;ldquo;before&amp;rdquo; comparison is built from the most relevant history rather than a flat average over possibly-irrelevant early years. The two weighting schemes together are what distinguish SDID from its cousins, as the table summarizes.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Unit weights $\omega$&lt;/th>
&lt;th>Time weights $\lambda$&lt;/th>
&lt;th>Unit FE $\alpha_i$&lt;/th>
&lt;th>Must match&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>DiD&lt;/strong>&lt;/td>
&lt;td>uniform&lt;/td>
&lt;td>uniform&lt;/td>
&lt;td>yes&lt;/td>
&lt;td>trend on &lt;em>all&lt;/em> controls&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Synthetic control&lt;/strong>&lt;/td>
&lt;td>optimized&lt;/td>
&lt;td>uniform&lt;/td>
&lt;td>&lt;strong>no&lt;/strong>&lt;/td>
&lt;td>level &lt;em>and&lt;/em> trend&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>SDID&lt;/strong>&lt;/td>
&lt;td>optimized&lt;/td>
&lt;td>optimized&lt;/td>
&lt;td>yes&lt;/td>
&lt;td>trend (level gap allowed)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="6-the-staggered-extension-per-cohort-effects-and-their-aggregation">6. The staggered extension: per-cohort effects and their aggregation&lt;/h2>
&lt;p>Staggered SDID is a disarmingly simple idea: &lt;strong>do the single-cohort analysis once per adoption cohort, then average.&lt;/strong> For each cohort $a$, keep only that cohort&amp;rsquo;s treated countries plus the never-treated controls, then solve the SDID problem above on that sub-panel to obtain its own $\hat{\omega}_a$, $\hat{\lambda}_a$, and cohort effect $\hat{\tau}_a$. Because each cohort is compared &lt;strong>only to never-treated controls&lt;/strong>, an already-treated unit is never used as a control for a later adopter, which is precisely the contamination that breaks naive TWFE.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
POOL[&amp;quot;110 never-treated&amp;lt;br/&amp;gt;controls (donor pool)&amp;quot;]
C1[&amp;quot;Cohort 2000&amp;lt;br/&amp;gt;+ controls&amp;quot;]
C2[&amp;quot;Cohort 2002&amp;lt;br/&amp;gt;+ controls&amp;quot;]
CD[&amp;quot;Cohorts 2003…2013&amp;lt;br/&amp;gt;+ controls&amp;quot;]
T1[&amp;quot;SDID &amp;amp;rarr; &amp;amp;tau;&amp;lt;sub&amp;gt;2000&amp;lt;/sub&amp;gt; = 8.4&amp;quot;]
T2[&amp;quot;SDID &amp;amp;rarr; &amp;amp;tau;&amp;lt;sub&amp;gt;2002&amp;lt;/sub&amp;gt; = 7.0&amp;quot;]
TD[&amp;quot;SDID &amp;amp;rarr; &amp;amp;tau;&amp;lt;sub&amp;gt;a&amp;lt;/sub&amp;gt;&amp;lt;br/&amp;gt;(&amp;amp;minus;3.5 … +21.8)&amp;quot;]
ATT[&amp;quot;Aggregate ATT = 8.0&amp;lt;br/&amp;gt;weighted by treated periods&amp;quot;]
POOL --&amp;gt; C1 --&amp;gt; T1 --&amp;gt; ATT
POOL --&amp;gt; C2 --&amp;gt; T2 --&amp;gt; ATT
POOL --&amp;gt; CD --&amp;gt; TD --&amp;gt; ATT
style POOL fill:#6a9bcc,stroke:#141413,color:#fff
style C1 fill:#d97757,stroke:#141413,color:#fff
style C2 fill:#d97757,stroke:#141413,color:#fff
style CD fill:#d97757,stroke:#141413,color:#fff
style T1 fill:#1f2b5e,stroke:#6a9bcc,color:#fff
style T2 fill:#1f2b5e,stroke:#6a9bcc,color:#fff
style TD fill:#1f2b5e,stroke:#6a9bcc,color:#fff
style ATT fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
&lt;p>The overall ATT aggregates the cohort effects with &lt;strong>non-negative&lt;/strong> weights equal to each cohort&amp;rsquo;s share of treated unit-by-post-period observations:&lt;/p>
&lt;p>$$
\widehat{ATT} = \sum_{a \in \mathcal{A}} \frac{N_{tr}^{a}\, T_{post}^{a}}{\sum_{b \in \mathcal{A}} N_{tr}^{b}\, T_{post}^{b}}\ \hat{\tau}_a
$$&lt;/p>
&lt;p>In words: a cohort counts in proportion to how many treated country-years it contributes. The 2000 cohort, treated for 16 years (2000–2015), carries more weight than the 2013 cohort, treated for only 3. This is the staggered generalization of single-cohort SDID, and — unlike TWFE — every weight is positive and interpretable. (When each cohort has one treated unit, this reduces to the post-period share $T_{post}^{a}/T_{post}$ from Clarke et al., 2024.)&lt;/p>
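&lt;p>To make the weights concrete, here is a worked count. It assumes the panel runs through 2015 (as the 16-year figure above implies) and uses the cohort sizes given in the inference section of this tutorial: two treated countries in each of the 2002 and 2003 cohorts and one in each of the other five. Terms are listed in cohort order (2000, 2002, 2003, 2005, 2010, 2012, 2013):&lt;/p>
&lt;p>$$
\sum_{b \in \mathcal{A}} N_{tr}^{b}\, T_{post}^{b} = 1 \cdot 16 + 2 \cdot 14 + 2 \cdot 13 + 1 \cdot 11 + 1 \cdot 6 + 1 \cdot 4 + 1 \cdot 3 = 94
$$&lt;/p>
&lt;p>Under these assumptions the 2000 cohort gets weight $16/94 \approx 0.17$, the 2002 and 2003 cohorts get $28/94 \approx 0.30$ and $26/94 \approx 0.28$, and the 2013 cohort only $3/94 \approx 0.03$.&lt;/p>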
&lt;h2 id="7-estimation-in-stata">7. Estimation in Stata&lt;/h2>
&lt;p>One command does the whole staggered procedure. We request bootstrap inference and a fixed seed for reproducibility.&lt;/p>
&lt;pre>&lt;code class="language-stata">sdid womparl country year quota, vce(bootstrap) seed(1213)
matrix list e(tau)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Synthetic Difference-in-Differences Estimator
-----------------------------------------------------------------------------
womparl | ATT Std. Err. t P&amp;gt;|t| [95% Conf. Interval]
-------------+---------------------------------------------------------------
quota | 8.03410 3.74040 2.15 0.032 0.70305 15.36516
-----------------------------------------------------------------------------
&lt;/code>&lt;/pre>
&lt;p>The overall &lt;strong>ATT is +8.03 percentage points&lt;/strong> (SE 3.74, $t=2.15$, $p=0.032$), with a 95% confidence interval of [0.70, 15.37] that excludes zero. Substantively: adopting a parliamentary gender quota raises the share of women in parliament by about &lt;strong>eight percentage points&lt;/strong> in the adopting countries — a large effect against a sample mean of 15%, and statistically distinguishable from no effect at the 5% level.&lt;/p>
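&lt;p>The $t$ statistic, $p$-value, and interval can be rebuilt from the point estimate and its standard error alone. The lines below are a minimal sketch that assumes the table uses a normal approximation (our assumption, although it matches the printed numbers); the scalar names &lt;code>att_hat&lt;/code> and &lt;code>se_hat&lt;/code> are illustrative and not part of &lt;code>sdid&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: rebuild t, p and the 95% CI from the ATT and its bootstrap SE
scalar att_hat = 8.03410
scalar se_hat  = 3.74040
scalar z_hat   = att_hat/se_hat
display &amp;quot;t  = &amp;quot; %5.2f z_hat                          // 2.15
display &amp;quot;p  = &amp;quot; %5.3f (2*(1 - normal(abs(z_hat))))   // 0.032
display &amp;quot;lo = &amp;quot; %7.4f (att_hat - invnormal(0.975)*se_hat)
display &amp;quot;hi = &amp;quot; %7.4f (att_hat + invnormal(0.975)*se_hat)
&lt;/code>&lt;/pre>
&lt;p>The last two lines should land on the reported bounds of roughly 0.70 and 15.37.&lt;/p>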
&lt;p>The single number, though, is the average of a very heterogeneous set of cohort effects, returned in &lt;code>e(tau)&lt;/code>:&lt;/p>
&lt;pre>&lt;code class="language-text">T[7,3]
Tau Std.Err. Time
r1 8.3888685 .68278345 2000
r2 6.9677465 .64102999 2002
r3 13.952256 9.1289943 2003
r4 -3.4505431 .75603453 2005
r5 2.7490355 .44799502 2010
r6 21.762716 .91589982 2012
r7 -.82032354 .83151601 2013
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_sdid_staggered_cohort_taus.png" alt="Cohort-specific SDID effects with 95% confidence intervals and the aggregate ATT">&lt;/p>
&lt;p>The cohort effects span an enormous range: from &lt;strong>−3.5 points&lt;/strong> (2005 cohort) to &lt;strong>+21.8 points&lt;/strong> (2012 cohort), with the 2003 cohort essentially uninformative (SE 9.13, a confidence interval that runs from −4 to +32). The teal line marks the aggregate ATT of 8.0. Notice that this aggregate is &lt;strong>not&lt;/strong> the simple average of the seven cohort effects, which would be about 7.1. It is the &lt;em>treated-period-weighted&lt;/em> average from the aggregation formula, which up-weights the 2000, 2002, and 2003 cohorts because they are exposed longest (and, for 2002 and 2003, because they contain two treated countries each). The lesson of the figure is that &amp;ldquo;+8 points on average&amp;rdquo; is a summary of real heterogeneity, not a universal constant; some quotas were transformative, others did nothing measurable.&lt;/p>
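&lt;p>The cohort intervals quoted above can be tabulated directly from &lt;code>e(tau)&lt;/code>. This is a minimal sketch, meant to be run immediately after the &lt;code>sdid&lt;/code> call; it assumes the matrix columns are the effect, its standard error, and the adoption year (as in the listing above) and uses a normal approximation for the intervals.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: 95% intervals for each cohort effect stored in e(tau)
matrix T = e(tau)
forvalues i = 1/`=rowsof(T)' {
    * column 1 = tau, column 2 = SE, column 3 = adoption year (assumed layout)
    display %4.0f T[`i',3] &amp;quot;  tau = &amp;quot; %6.2f T[`i',1] ///
        &amp;quot;  CI = [&amp;quot; %6.2f (T[`i',1] - invnormal(0.975)*T[`i',2]) ///
        &amp;quot;, &amp;quot; %6.2f (T[`i',1] + invnormal(0.975)*T[`i',2]) &amp;quot;]&amp;quot;
}
&lt;/code>&lt;/pre>
&lt;p>For the 2003 row this gives an interval of roughly −3.9 to +31.8, the &amp;ldquo;−4 to +32&amp;rdquo; quoted in the text.&lt;/p>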
&lt;p>To see the synthetic-control machinery underneath one cohort, the figure below plots the 2002 cohort against its synthetic control. Because SDID matches the pre-period &lt;em>trend&lt;/em> and lets the unit fixed effect absorb the &lt;em>level&lt;/em> gap, we anchor the synthetic to the treated cohort by its $\lambda$-weighted pre-period gap so the two align before adoption.&lt;/p>
&lt;p>&lt;img src="stata_sdid_staggered_cohort2002_path.png" alt="SDID counterfactual for the 2002 cohort (synthetic anchored to the treated pre-period)">&lt;/p>
&lt;p>The treated 2002 cohort (orange) and its anchored synthetic control (blue dashed) track each other closely &lt;strong>before 2002&lt;/strong> — the synthetic was built precisely to do so — and then diverge: the treated cohort climbs to roughly 15% women in parliament while the synthetic counterfactual reaches only about 9–10%. That post-2002 gap is the cohort effect, about +7 points, matching $\hat{\tau}_{2002}=6.97$ from &lt;code>e(tau)&lt;/code>.&lt;/p>
&lt;p>Which pre-period years anchor that comparison? The time weights $\hat{\lambda}_t$ for the 2002 cohort do not spread evenly over 1990–2001 — they concentrate on the years just before adoption.&lt;/p>
&lt;p>&lt;img src="stata_sdid_staggered_lambda.png" alt="SDID pre-period time weights (λ) for the 2002 cohort">&lt;/p>
&lt;p>The bars show SDID&amp;rsquo;s baseline for the 2002 cohort leaning on the late 1990s and 2001 — the pre-adoption years whose level most resembles the post-adoption period — rather than weighting all twelve pre-years equally as a plain difference-in-differences would. This is the time-weighting half of SDID at work: it builds the &amp;ldquo;before&amp;rdquo; from the most relevant history, which is also the baseline the event study below measures against.&lt;/p>
&lt;h2 id="8-adding-a-covariate-optimized-vs-projected">8. Adding a covariate: optimized vs projected&lt;/h2>
&lt;p>Does the quota effect simply reflect economic development — richer countries both grow GDP and elect more women? We can condition on log GDP per capita. The &lt;code>sdid&lt;/code> command offers two routes, and SDID needs a balanced panel, so we first drop the country-years with missing &lt;code>lngdp&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-stata">drop if missing(lngdp)
sdid womparl country year quota, vce(bootstrap) seed(2022) covariates(lngdp, optimized)
sdid womparl country year quota, vce(bootstrap) seed(1213) covariates(lngdp, projected)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">SDID + lngdp (optimized) ATT = 8.0515 SE = 3.0466
SDID + lngdp (projected) ATT = 8.0593 SE = 3.1191
&lt;/code>&lt;/pre>
&lt;p>The two methods differ in &lt;em>how&lt;/em> they estimate the covariate&amp;rsquo;s coefficient. The &lt;strong>optimized&lt;/strong> method (Arkhangelsky et al., 2021) folds the covariate adjustment into the SDID optimization itself, estimating it jointly with the weights; this is flexible but computationally heavy. The &lt;strong>projected&lt;/strong> method (Kranz, 2022) instead regresses the outcome on the covariate among the &lt;em>untreated&lt;/em> observations first, then runs SDID on the residuals, which is much faster and numerically more stable. Reassuringly, here the two estimates differ by less than 0.01: &lt;strong>8.05 and 8.06&lt;/strong>, essentially unchanged from the no-covariate estimate of 8.03. Controlling for income does &lt;strong>not&lt;/strong> explain away the quota effect; the result is robust to the most obvious confounder.&lt;/p>
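&lt;p>To see what the projected route does in spirit, the two steps can be written out by hand. This is a rough sketch of the idea, not &lt;code>sdid&lt;/code>&amp;rsquo;s internal code: it assumes the first-stage regression includes country and year fixed effects, and the names &lt;code>cid&lt;/code> and &lt;code>womparl_adj&lt;/code> are hypothetical. It should be run after the &lt;code>drop if missing(lngdp)&lt;/code> step.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch of the idea behind covariates(lngdp, projected)
encode country, generate(cid)                    // numeric id for factor-variable notation
* step 1: covariate coefficient from untreated observations only
regress womparl lngdp i.cid i.year if quota == 0
* step 2: remove the covariate's contribution from every observation
generate double womparl_adj = womparl - _b[lngdp]*lngdp
* step 3: ordinary SDID on the adjusted outcome
sdid womparl_adj country year quota, vce(bootstrap) seed(1213)
&lt;/code>&lt;/pre>
&lt;p>If the sketch mirrors the package&amp;rsquo;s procedure, the ATT should come out close to the projected estimate above; small numerical differences would not be surprising.&lt;/p>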
&lt;h2 id="9-the-event-study-with-sdid_event">9. The event study with &lt;code>sdid_event&lt;/code>&lt;/h2>
&lt;p>A single ATT — even per cohort — cannot tell us &lt;em>when&lt;/em> the effect appears, or whether treated and control countries were already diverging &lt;em>before&lt;/em> the quota. For that we need an &lt;strong>event study&lt;/strong>: the treatment effect traced out by years relative to adoption. The modern &lt;code>sdid_event&lt;/code> command (Ciccia, Clarke &amp;amp; Pailañir, 2024) computes exactly this for SDID, including pre-period &lt;strong>placebo&lt;/strong> estimates that serve as a parallel-trends test.&lt;/p>
&lt;p>The dynamic effect at event time $\ell$ is the treated-minus-synthetic gap in that period, &lt;em>net of the same gap at baseline&lt;/em>, where — characteristically for SDID — the baseline is the $\lambda$-weighted pre-period average rather than a single &amp;ldquo;year −1&amp;rdquo;:&lt;/p>
&lt;p>$$
\delta_{\ell} = \left(\bar{Y}_{\ell}^{tr} - \bar{Y}_{\ell}^{co}\right) - \left(\bar{Y}_{base}^{tr} - \bar{Y}_{base}^{co}\right), \qquad \bar{Y}_{base}^{g} = \sum_{t=1}^{T_{pre}} \hat{\lambda}_t\, \bar{Y}_t^{g}
$$&lt;/p>
&lt;p>&lt;code>sdid_event&lt;/code> handles the full staggered panel directly, returning a cohort-aggregated ATT plus dynamic effects. To read the dynamics transparently we focus the &lt;em>plot&lt;/em> on the 2002 cohort — the package authors&amp;rsquo; own worked example — which gives a clean event-time axis; the full-panel call confirms the same aggregated ATT (≈ 8.06).&lt;/p>
&lt;pre>&lt;code class="language-stata">ssc install sdid_event, replace
* full staggered panel: aggregated ATT + cohort-aggregated dynamic effects
sdid_event womparl country year quota, vce(bootstrap) brep(100) effects(8) placebo(5) covariates(lngdp)
* clean event study on the 2002 cohort, with all placebos
keep if quotaYear==2002 | quotaYear==.
sdid_event womparl country year quota, vce(placebo) brep(100) placebo(all) covariates(lngdp)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> | Estimate SE LB CI UB CI Switchers
-------------+------------------------------------------------------
ATT | 6.853472 3.372744 .2428928 13.46405 2
Effect_1 | 4.086404 1.191517 1.75103 6.421778 2
Effect_2 | 9.164442 1.522799 6.179756 12.14913 2
Effect_3 | 7.938504 2.182572 3.660663 12.21635 2
... |
Placebo_1 | -.218417 .470226 -1.14006 .703227 2
Placebo_2 | .242148 .884557 -1.491584 1.975880 2
... |
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_sdid_staggered_event_study.png" alt="Event-study SDID for the 2002 cohort: flat placebos before adoption, rising effects after">&lt;/p>
&lt;p>This plot rewards careful reading, and there are three things to look for.&lt;/p>
&lt;p>&lt;strong>First, the baseline is $\lambda$-weighted, not &amp;ldquo;the year before.&amp;rdquo;&lt;/strong> Unlike a textbook event study that normalizes to $t=-1$, SDID measures everything against the optimally weighted pre-period average. That is why the zero line is a &lt;em>weighted&lt;/em> baseline; do not read it as the single pre-adoption year.&lt;/p>
&lt;p>&lt;strong>Second, the points to the &lt;em>left&lt;/em> of zero are placebo tests.&lt;/strong> Every pre-adoption coefficient (&lt;code>Placebo_1&lt;/code> through &lt;code>Placebo_12&lt;/code>, event times −1 to −12) sits close to zero, ranging only from about −0.2 to +0.8. Because the treated cohort and its synthetic control moved in parallel &lt;em>before&lt;/em> 2002, we cannot reject the parallel-(synthetic-)trends assumption. This is the identifying assumption made visible, and here it survives the test.&lt;/p>
&lt;p>&lt;strong>Third, the points to the &lt;em>right&lt;/em> of zero are the dynamic ATT.&lt;/strong> The effect appears immediately at adoption (&lt;code>Effect_1&lt;/code> = +4.1 points at event time 0), more than doubles one year later (&lt;code>Effect_2&lt;/code> = +9.2), and then settles in the +6 to +9 range for over a decade. Quotas do not just shift the level once; they sustain a higher share of women in parliament. Aggregated by the same treated-period logic as before, these dynamic effects reproduce the cohort&amp;rsquo;s overall ATT of about +7 points, but the plot shows the &lt;em>shape&lt;/em> the single number conceals.&lt;/p>
&lt;h2 id="10-inference-bootstrap-jackknife-and-placebo">10. Inference: bootstrap, jackknife, and placebo&lt;/h2>
&lt;p>With one treated unit (California), the previous tutorial could only use placebo/permutation inference. With &lt;strong>nine&lt;/strong> treated units here, all three of &lt;code>sdid&lt;/code>&amp;rsquo;s variance estimators are on the table. To keep the comparison clean — jackknife needs more than one treated unit &lt;em>per adoption period&lt;/em> — we follow Clarke et al. (2024) and restrict to the two-country 2002 and 2003 cohorts by dropping the five single-country cohorts.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
Q1{&amp;quot;How many&amp;lt;br/&amp;gt;treated units?&amp;quot;}
Q1 --&amp;gt;|&amp;quot;One (e.g. California)&amp;quot;| PL1[&amp;quot;Placebo only&amp;lt;br/&amp;gt;jackknife undefined&amp;quot;]
Q1 --&amp;gt;|&amp;quot;Many (e.g. 9 quota adopters)&amp;quot;| Q2{&amp;quot;More controls than treated?&amp;lt;br/&amp;gt;no singleton cohorts?&amp;quot;}
Q2 --&amp;gt;|&amp;quot;Yes&amp;quot;| ALL[&amp;quot;All three available&amp;quot;]
Q2 --&amp;gt;|&amp;quot;Singleton cohorts&amp;quot;| PL2[&amp;quot;Placebo / bootstrap&amp;lt;br/&amp;gt;jackknife drops out&amp;quot;]
ALL --&amp;gt; BOOT[&amp;quot;bootstrap&amp;lt;br/&amp;gt;SE 4.7 (default)&amp;quot;]
ALL --&amp;gt; JACK[&amp;quot;jackknife&amp;lt;br/&amp;gt;SE 6.0 (most conservative)&amp;quot;]
ALL --&amp;gt; PLAC[&amp;quot;placebo&amp;lt;br/&amp;gt;SE 2.3 (homoskedastic)&amp;quot;]
style Q1 fill:#141413,stroke:#6a9bcc,color:#fff
style Q2 fill:#141413,stroke:#6a9bcc,color:#fff
style PL1 fill:#d97757,stroke:#141413,color:#fff
style PL2 fill:#d97757,stroke:#141413,color:#fff
style ALL fill:#00d4c8,stroke:#141413,color:#141413
style BOOT fill:#6a9bcc,stroke:#141413,color:#fff
style JACK fill:#6a9bcc,stroke:#141413,color:#fff
style PLAC fill:#6a9bcc,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-stata">drop if inlist(country,&amp;quot;Algeria&amp;quot;,&amp;quot;Kenya&amp;quot;,&amp;quot;Samoa&amp;quot;,&amp;quot;Swaziland&amp;quot;,&amp;quot;Tanzania&amp;quot;)
sdid womparl country year quota, vce(bootstrap) seed(1213)
sdid womparl country year quota, vce(placebo) seed(1213)
sdid womparl country year quota, vce(jackknife)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">method att se ci_l ci_u
bootstrap 10.33066 4.7291 1.0618 19.5995
placebo 10.33066 2.3404 5.7436 14.9178
jackknife 10.33066 6.0056 -1.4401 22.1014
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_sdid_staggered_inference.png" alt="Same ATT, three variance estimators">&lt;/p>
&lt;p>The point estimate is &lt;strong>identical&lt;/strong> across all three methods (10.33 points on this subsample) because the inference procedure changes only the &lt;em>standard error&lt;/em>, never the estimate. But the standard errors differ by a factor of about 2.6: &lt;strong>jackknife is the most conservative&lt;/strong> (SE 6.01, a confidence interval that crosses zero), &lt;strong>placebo is the tightest&lt;/strong> (SE 2.34) but rests on a homoskedasticity assumption and requires more controls than treated units, and &lt;strong>bootstrap sits in between&lt;/strong> (SE 4.73) and is the default. The practical takeaway: with only a handful of treated units, report the bootstrap as your headline but cross-check it; a result that is &amp;ldquo;significant&amp;rdquo; under placebo but not under jackknife deserves caution. (The subsample ATT of 10.3 is the treated-period-weighted average of the 2002 and 2003 cohort effects, 6.97 and 13.95. It is larger than the full-sample 8.0 because the five dropped single-country cohorts average well below that, pulled down by the negative 2005 and 2013 effects.)&lt;/p>
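&lt;p>The 10.33 figure can be checked by hand from the two cohort effects in &lt;code>e(tau)&lt;/code>. The sketch below assumes the panel ends in 2015 and that each of the two cohorts has two treated countries, as stated above; the scalar names are illustrative.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: the two-cohort ATT as a treated-period-weighted average
* weight = treated countries x post-adoption years
scalar w2002 = 2*(2015 - 2002 + 1)    // 28 treated country-years
scalar w2003 = 2*(2015 - 2003 + 1)    // 26 treated country-years
display %8.4f ((w2002*6.9677465 + w2003*13.952256)/(w2002 + w2003))    // about 10.33
&lt;/code>&lt;/pre>
&lt;p>Because each cohort is estimated against the never-treated controls only, dropping the other cohorts leaves these two cohort effects unchanged; only the aggregation weights differ from the full-sample case.&lt;/p>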
&lt;h2 id="11-robustness-and-discussion">11. Robustness and discussion&lt;/h2>
&lt;p>Three caveats keep the result honest. &lt;strong>Effect concentration:&lt;/strong> the +8 aggregate leans heavily on a few cohorts. The early 2000/2002/2003 cohorts carry about three-quarters of the aggregation weight (70 of 94 treated country-years), and the 2012 cohort, despite its small weight, contributes a +21.8 effect. Drop the 2012 cohort and the average falls noticeably. &lt;strong>Fragile counterfactuals:&lt;/strong> with only 110 controls and as few as one treated country per cohort, some synthetic controls are imprecise; the 2003 cohort&amp;rsquo;s standard error of 9.13 is the tell. &lt;strong>Identifying assumptions:&lt;/strong> SDID still requires no anticipation, an absorbing treatment, no cross-country spillovers, and that quota timing is not itself a response to the outcome&amp;rsquo;s trajectory; the flat event-study placebos support, but cannot prove, the parallel-trends part. Finally, &lt;code>quota_example&lt;/code> is a teaching subset of Bhalotra et al. (2023); these numbers illustrate the &lt;em>method&lt;/em>, not a final verdict on quota policy.&lt;/p>
&lt;h2 id="12-summary-and-key-takeaways">12. Summary and key takeaways&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Method.&lt;/strong> Staggered SDID estimates a &lt;em>separate, clean&lt;/em> synthetic difference-in-differences for each adoption cohort — comparing it only to never-treated controls — and aggregates the cohort effects $\hat{\tau}_a$ with non-negative, treated-period-share weights. This avoids the negative-weighting trap that contaminates naive two-way fixed-effects DiD under staggered timing.&lt;/li>
&lt;li>&lt;strong>Result.&lt;/strong> Gender quotas raise the share of women in parliament by an overall &lt;strong>ATT of +8.0 percentage points&lt;/strong> (SE 3.74, $p=0.032$), robust to a log-GDP control (8.05 optimized, 8.06 projected). Cohort effects range widely, from &lt;strong>−3.5 to +21.8 points&lt;/strong> — heterogeneity the single number hides.&lt;/li>
&lt;li>&lt;strong>Event study.&lt;/strong> The &lt;code>sdid_event&lt;/code> plot shows pre-adoption placebo coefficients near zero (parallel synthetic trends) and post-adoption effects that appear immediately and persist for over a decade — the dynamics behind the average.&lt;/li>
&lt;li>&lt;strong>Inference.&lt;/strong> With nine treated units, bootstrap, jackknife, and placebo are all available; they share one point estimate (10.3 on the two-cohort illustration) but report standard errors of 4.7, 6.0, and 2.3. Jackknife is the most conservative.&lt;/li>
&lt;li>&lt;strong>Bridge.&lt;/strong> The block design (Proposition 99, the &lt;a href="https://carlos-mendez.org/post/stata_sdid/">previous tutorial&lt;/a>) and the staggered design here are two faces of one estimator — the staggered version is just single-cohort SDID, done once per cohort and averaged.&lt;/li>
&lt;/ul>
&lt;h2 id="13-exercises">13. Exercises&lt;/h2>
&lt;ol>
&lt;li>&lt;strong>Re-aggregate by hand.&lt;/strong> Pull &lt;code>e(tau)&lt;/code> and each cohort&amp;rsquo;s treated unit-count and post-period length. Verify that the treated-period-weighted average of the seven $\hat{\tau}_a$ reproduces the overall ATT of 8.03, and show that it differs from the unweighted mean (≈ 7.1). Which cohorts move the aggregate the most?&lt;/li>
&lt;li>&lt;strong>Inference sensitivity.&lt;/strong> Re-run the full sample (all nine treated countries) with &lt;code>vce(bootstrap)&lt;/code> and then &lt;code>vce(placebo)&lt;/code> at &lt;code>reps(500)&lt;/code>. How much do the standard error and confidence interval move, and which would you report given only nine treated units?&lt;/li>
&lt;li>&lt;strong>Drop the outlier cohort.&lt;/strong> Re-estimate the overall ATT excluding the 2012 cohort (the +21.8 outlier). How far does the aggregate fall, and what does that tell you about how concentrated the average effect is?&lt;/li>
&lt;/ol>
&lt;h2 id="14-references">14. References&lt;/h2>
&lt;ol>
&lt;li>Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., &amp;amp; Wager, S. (2021). &lt;a href="https://doi.org/10.1257/aer.20190159" target="_blank" rel="noopener">Synthetic Difference-in-Differences&lt;/a>. &lt;em>American Economic Review&lt;/em>, 111(12), 4088–4118.&lt;/li>
&lt;li>Clarke, D., Pailañir, D., Athey, S., &amp;amp; Imbens, G. (2024). &lt;a href="https://doi.org/10.1177/1536867X241297184" target="_blank" rel="noopener">On Synthetic Difference-in-Differences and Related Estimation Methods in Stata&lt;/a>. &lt;em>The Stata Journal&lt;/em>, 24(4). Package: &lt;code>ssc install sdid&lt;/code>.&lt;/li>
&lt;li>Ciccia, D. (2024). &lt;a href="https://arxiv.org/abs/2407.09565" target="_blank" rel="noopener">A Short Note on Event-Study Synthetic Difference-in-Differences Estimators&lt;/a>. Package: &lt;code>ssc install sdid_event&lt;/code>.&lt;/li>
&lt;li>Bhalotra, S., Clarke, D., Gomes, J. F., &amp;amp; Venkataramani, A. (2023). &lt;a href="https://doi.org/10.1093/jeea/jvad043" target="_blank" rel="noopener">Maternal Mortality and Women&amp;rsquo;s Political Power&lt;/a>. &lt;em>Journal of the European Economic Association&lt;/em>. (Source of the &lt;code>quota_example&lt;/code> data.)&lt;/li>
&lt;li>Goodman-Bacon, A. (2021). &lt;a href="https://doi.org/10.1016/j.jeconom.2021.03.014" target="_blank" rel="noopener">Difference-in-Differences with Variation in Treatment Timing&lt;/a>. &lt;em>Journal of Econometrics&lt;/em>, 225(2), 254–277.&lt;/li>
&lt;li>de Chaisemartin, C., &amp;amp; D&amp;rsquo;Haultfœuille, X. (2020). &lt;a href="https://doi.org/10.1257/aer.20181169" target="_blank" rel="noopener">Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects&lt;/a>. &lt;em>American Economic Review&lt;/em>, 110(9), 2964–2996.&lt;/li>
&lt;li>Xu, Y., &amp;amp; Hua, L. &lt;a href="https://yiqingxu.org/packages/panelview_stata/" target="_blank" rel="noopener">panelView: Visualizing Panel Data&lt;/a>. Package: &lt;code>ssc install panelview&lt;/code>.&lt;/li>
&lt;/ol>
&lt;p>&lt;em>Related tutorials on this site:&lt;/em> &lt;a href="https://carlos-mendez.org/post/stata_sdid/">Synthetic Difference-in-Differences (the block design)&lt;/a> · &lt;a href="https://carlos-mendez.org/post/stata_did/">Difference-in-Differences&lt;/a>.&lt;/p>
&lt;h2 id="15-acknowledgments">15. Acknowledgments&lt;/h2>
&lt;p>This tutorial uses the &lt;code>sdid&lt;/code> command (Clarke, Pailañir, Athey &amp;amp; Imbens), the &lt;code>sdid_event&lt;/code> command (Ciccia, Clarke &amp;amp; Pailañir), and &lt;code>panelview&lt;/code> (Xu &amp;amp; Hua). The data, &lt;code>quota_example&lt;/code>, is distributed with &lt;code>sdid&lt;/code> and draws on Bhalotra, Clarke, Gomes &amp;amp; Venkataramani (2023). All estimates were produced by the companion &lt;code>analysis.do&lt;/code> and verified against Clarke et al. (2024). AI tools (Claude Code) assisted with drafting and figure preparation; all code was executed and every number checked by the author.&lt;/p>
</description></item></channel></rss>