<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>double-lasso | Carlos Mendez</title><link>https://carlos-mendez.org/tag/double-lasso/</link><atom:link href="https://carlos-mendez.org/tag/double-lasso/index.xml" rel="self" type="application/rss+xml"/><description>double-lasso</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>© 2018–2026 Carlos Mendez. All rights reserved.</copyright><lastBuildDate>Thu, 21 May 2026 00:00:00 +0000</lastBuildDate><image><url>https://carlos-mendez.org/media/icon_huedfae549300b4ca5d201a9bd09a3ecd5_79625_512x512_fill_lanczos_center_3.png</url><title>double-lasso</title><link>https://carlos-mendez.org/tag/double-lasso/</link></image><item><title>Double LASSO for Causal Inference: Does Abortion Reduce Crime?</title><link>https://carlos-mendez.org/post/r_double_lasso/</link><pubDate>Thu, 21 May 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/r_double_lasso/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>In 2001, John Donohue and Steven Levitt published one of the most controversial findings in modern economics: that the legalisation of abortion in the 1970s caused a sharp decline in U.S. crime rates in the 1990s. Their argument — that unwanted children are at higher risk of becoming criminals, and that abortion reduced the cohort at risk — was provocative on its own. But the empirical machinery behind it was textbook: a difference-in-differences regression on a 48-state panel with eight carefully chosen controls. Twenty-five years later, the question we ask here is not &lt;em>whether the substantive claim is true&lt;/em> — that debate goes well beyond any single regression — but &lt;em>whether the regression&amp;rsquo;s headline result survives&lt;/em> when, instead of eight hand-picked controls, we let the data choose from a library of &lt;strong>284 candidate covariates&lt;/strong> using a high-dimensional method called &lt;strong>Double LASSO&lt;/strong>.&lt;/p>
&lt;p>This post is a pedagogical replication of the empirical example in &lt;a href="#18-references">Fitzgerald, Lattimore, Robinson and Zhu&amp;rsquo;s (2026, &lt;em>Journal of Applied Econometrics&lt;/em>)&lt;/a> &amp;ldquo;Double LASSO: Replication and Practical Insights.&amp;rdquo; The paper&amp;rsquo;s primary contribution is methodological — it provides practical guidance on when Double LASSO (DL) helps for causal inference. We borrow its setting because it is one of the cleanest illustrations of the &lt;strong>n is small, p is large&lt;/strong> regime where DL is designed to shine: with 576 observations after first-differencing and 284 candidate controls, the ratio p / n is roughly one-half, exactly the regime the paper studies. Throughout, we treat the abortion-crime application as a &lt;em>case study&lt;/em> of the method, not as a primary causal claim about the substantive question.&lt;/p>
&lt;p>&lt;img src="r_double_lasso_estimates.png" alt="Forest plot of α̂ ± 95 % CI for all five estimators (First diff, OLS-full, PSL, DL-rigorous, DL-CV) facetted by outcome. LASSO methods land between the no-controls baseline and the kitchen-sink OLS.">&lt;/p>
&lt;p>The figure above is the post&amp;rsquo;s spoiler. Each row is a different estimator; each panel is a different crime outcome. The dashed vertical line is zero; to its left, the abortion-crime relationship is &lt;em>negative&lt;/em> (more abortion is associated with less crime). Two patterns jump out. First, the LASSO methods (PSL, DL-rigorous, and DL-CV) cluster sensibly near the original Donohue–Levitt baseline (First diff) for violent and property crime; second, &lt;strong>OLS with all 284 controls is uninterpretable&lt;/strong>: its murder estimate is +2.34 with confidence interval [1.73, 2.95], which would mean a unit increase in the abortion rate raises murder by 234 %. That impossibility is the failure mode that motivates LASSO in the first place.&lt;/p>
&lt;p>&lt;strong>Learning objectives.&lt;/strong> After working through this tutorial you will be able to:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Explain&lt;/strong> when high-dimensional methods like LASSO add value over plain OLS, and when they do not.&lt;/li>
&lt;li>&lt;strong>Implement&lt;/strong> the Belloni–Chernozhukov–Hansen Double LASSO procedure in R using &lt;code>hdm::rlasso&lt;/code> and &lt;code>glmnet::cv.glmnet&lt;/code>.&lt;/li>
&lt;li>&lt;strong>Distinguish&lt;/strong> the &lt;em>rigorous&lt;/em> and &lt;em>cross-validated&lt;/em> penalty rules for LASSO, and recognise which is appropriate for causal inference.&lt;/li>
&lt;li>&lt;strong>Compute&lt;/strong> state-clustered standard errors with the HC1 finite-sample correction by hand, and read the resulting sandwich matrix.&lt;/li>
&lt;li>&lt;strong>Diagnose&lt;/strong> the regime in which Double LASSO most helps (treatment well-predicted, outcome not), using the selection-count fingerprint |I_y| and |I_d|.&lt;/li>
&lt;li>&lt;strong>Critique&lt;/strong> the limits of DL: identification still requires conditional independence and parallel trends; LASSO does not invent variation that is not in the data.&lt;/li>
&lt;/ul>
&lt;h3 id="key-concepts-at-a-glance">Key concepts at a glance&lt;/h3>
&lt;p>The post leans on a small vocabulary repeatedly. The rest of the tutorial assumes you can move between these terms quickly. Each concept below has three parts. The &lt;strong>definition&lt;/strong> is always visible. The &lt;strong>example&lt;/strong> and &lt;strong>analogy&lt;/strong> sit behind clickable cards: open them when you need them, leave them collapsed for a quick scan. If a later section mentions &amp;ldquo;selection set&amp;rdquo; or &amp;ldquo;rigorous penalty&amp;rdquo; and the term feels slippery, this is the section to re-read.&lt;/p>
&lt;p>&lt;strong>1. LASSO&lt;/strong> $\hat\beta(\lambda) = \arg\min_\beta \frac{1}{2n}\|y - X\beta\|_2^2 + \lambda \sum_j \lvert\beta_j\rvert$. L1-penalised OLS: the absolute-value penalty produces &lt;em>exactly-zero&lt;/em> coefficients (variable selection).&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>In §6, LASSO of crime on 284 controls picks just 8 — the rest get shrunk to zero. The penalty knob $\lambda$ controls how aggressively.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>A budget that forces you to drop expensive items entirely, not just buy smaller portions.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>2. Penalty $\lambda$.&lt;/strong> The knob controlling shrinkage. Higher $\lambda$ pins more coefficients to zero. Tuning $\lambda$ is the central design choice and is what separates the rigorous and CV flavours of Double LASSO.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>The rigorous penalty for our data is around $\lambda \approx 0.1$; the CV-tuned &lt;code>lambda.min&lt;/code> is much smaller (~0.01) and keeps 143 coefficients.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>The volume knob on selection: turn it up and only the loudest signals get through.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>3. Post-Structural LASSO (PSL).&lt;/strong> One CV-LASSO with the treatment forced in via &lt;code>penalty.factor = 0&lt;/code>, then plain OLS on the selected support. The simplest one-LASSO causal estimator.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>§6: PSL keeps 3 controls for violent crime and gives $\hat\alpha = -0.157$ — close to the no-controls baseline of $-0.152$.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>Insurance + lottery: you guarantee one ticket (the treatment) and let chance pick the rest.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>4. Double LASSO (DL).&lt;/strong> Two LASSOs (y on X, d on X), union of selected controls, then post-OLS. The causal-inference-safe variant that beats PSL when controls predict $d$ but not $y$.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>§7: DL picks 8 controls for violent crime ($|I_y \cup I_d|$); $\hat\alpha = -0.096$, exactly matching the paper&amp;rsquo;s selection counts and within 0.01 of its point estimate.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>Two independent quality inspectors: you keep anything either flags as important.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>5. Selection sets $I_y$ and $I_d$.&lt;/strong> The indices of controls each LASSO step keeps. Their union $I_y \cup I_d$ is the support of the post-OLS regression. Their &lt;em>imbalance&lt;/em> is the empirical fingerprint of when DL adds value.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>For violent crime, $|I_y| = 0$ and $|I_d| = 8$. Crime is essentially unpredictable from the 284 controls; abortion is well-predicted. This is the regime where DL beats PSL.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>A movie&amp;rsquo;s lead vs. supporting cast — both lists matter but they answer different questions.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>6. Rigorous vs CV penalty.&lt;/strong> Two ways to pick $\lambda$. Rigorous: theory-based (Belloni et al. 2012) Bonferroni-style formula. CV: data-driven cross-validation minimising prediction MSE. Different objectives, different answers.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>Rigorous keeps 8 controls for violent crime; CV keeps 150. CV&amp;rsquo;s $\hat\alpha$ flips sign to $+0.019$. For causal inference, rigorous is the right choice.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>Two thermostats: one set by an engineer for system stability, the other by an algorithm chasing minimum heating cost.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>7. Post-OLS step.&lt;/strong> After LASSO selects a support, refit with plain (unshrunk) OLS to remove the shrinkage bias on $\hat\alpha$. LASSO is used only for &lt;em>selection&lt;/em>, never for the final estimate.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>All five estimators in this post — even the LASSO-based ones — produce their final $\hat\alpha$ from plain &lt;code>lm()&lt;/code> on the selected support. Without this step, $\hat\alpha$ would be biased toward zero by 10–20 %.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>LASSO is the casting director; the post-OLS is the actual film. The director picks who appears; the camera records what they do.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>8. State-clustered standard errors.&lt;/strong> HC1-adjusted sandwich variance with state-level clustering. Corrects for within-state autocorrelation that would otherwise understate the SE on a panel of state-year observations.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>§8: with $G = 48$ states, clustering inflates the SE by roughly 40 % over naïve heteroscedastic-robust. Without it, our confidence intervals would be too narrow and we would over-reject the null.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>An average across 48 dependent siblings, not 576 independent strangers.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>A note on tone. The post is calibrated for an empirical-economics graduate student who is comfortable with OLS, panel data, and clustered standard errors but has never used LASSO. Every R idiom (&lt;code>penalty.factor&lt;/code>, &lt;code>lambda.min&lt;/code>, &lt;code>rlasso&lt;/code>, &lt;code>intercept = FALSE&lt;/code>) gets one short line of plain English the first time it appears. Every coefficient gets a gloss of the form &amp;ldquo;a unit increase in the differenced abortion rate is associated with&amp;hellip;&amp;rdquo;. The paper&amp;rsquo;s footnote-4 framework (&amp;ldquo;DL helps when the treatment is predictable from the controls but the outcome is not&amp;rdquo;) is the organising principle and we anchor it to the actual selection counts we observe.&lt;/p>
&lt;hr>
&lt;h2 id="2-the-data">2. The data&lt;/h2>
&lt;p>We use the exact panel that &lt;a href="#18-references">Belloni, Chernozhukov and Hansen (2014)&lt;/a> compiled from &lt;a href="#18-references">Donohue and Levitt&amp;rsquo;s (2001)&lt;/a> original replication archive: &lt;strong>48 U.S. states × 12 years (1986–1997) after first-differencing the raw 13-year 1985–1997 panel, giving 576 observations.&lt;/strong> First-differencing absorbs state fixed effects (anything that does not vary over time within a state — culture, geography, long-run institutions). Year fixed effects are absorbed in a separate pre-processing step using the Frisch–Waugh–Lovell projection, which we say more about in §7. By the time the analysis script sees the data, both fixed-effect adjustments are done, so the LASSO regressions below contain no time dummies.&lt;/p>
&lt;p>The treatment $d$ is the &lt;strong>effective abortion rate&lt;/strong> — a weighted average of past abortion-to-birth ratios, lagged to match the ages at which crime is most prevalent. The three outcomes $y$ are state-level &lt;strong>violent crime, property crime, and murder rates&lt;/strong>, each first-differenced. The candidate-control matrix $X$ has &lt;strong>284 columns&lt;/strong>: it expands Donohue–Levitt&amp;rsquo;s original 8 controls into squares, two-way interactions, time interactions, lagged levels, within-state means, and initial-value × time-trend interactions, then screens for multicollinearity. The 284-control specification is the Belloni-et-al. extension we replicate.&lt;/p>
&lt;p>For reproducibility, the data lives in the post&amp;rsquo;s &lt;code>data/&lt;/code> folder and is loaded over HTTPS from the GitHub raw URL. No local Matlab files needed.&lt;/p>
&lt;p>&lt;strong>Code chunk 1 — Loading the data:&lt;/strong>&lt;/p>
&lt;pre>&lt;code class="language-r">BASE_URL &amp;lt;- &amp;quot;https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/r_double_lasso/data/&amp;quot;
read_remote &amp;lt;- function(filename, check.names = TRUE) {
  read.csv(paste0(BASE_URL, filename), check.names = check.names,
           stringsAsFactors = FALSE)
}
state &amp;lt;- read_remote(&amp;quot;levitt_state.csv&amp;quot;)$state
linear &amp;lt;- read_remote(&amp;quot;levitt_linear.csv&amp;quot;) # raw first differences
partialled &amp;lt;- read_remote(&amp;quot;levitt_partialled.csv&amp;quot;) # after year-FE partialling
ctrl_viol &amp;lt;- read_remote(&amp;quot;levitt_controls_viol.csv&amp;quot;, check.names = FALSE)
ctrl_prop &amp;lt;- read_remote(&amp;quot;levitt_controls_prop.csv&amp;quot;, check.names = FALSE)
ctrl_murd &amp;lt;- read_remote(&amp;quot;levitt_controls_murd.csv&amp;quot;, check.names = FALSE)
&lt;/code>&lt;/pre>
&lt;p>Six CSVs, six lines. The &lt;code>check.names = FALSE&lt;/code> argument preserves the original variable names — which include characters like &lt;code>^&lt;/code> and &lt;code>*&lt;/code> from the original Matlab code that R&amp;rsquo;s default sanitiser would mangle. The &lt;code>state&lt;/code> vector holds the cluster identifier for the state-clustered standard errors we compute later; it takes integer values 1 through 48 with each state appearing exactly 12 times (one per differenced year).&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>File&lt;/th>
&lt;th>Shape&lt;/th>
&lt;th>What it contains&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>levitt_state.csv&lt;/code>&lt;/td>
&lt;td>576 × 1&lt;/td>
&lt;td>State cluster id (1–48) for each observation&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>levitt_linear.csv&lt;/code>&lt;/td>
&lt;td>576 × 7&lt;/td>
&lt;td>Raw first-differences of the outcomes and treatment&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>levitt_partialled.csv&lt;/code>&lt;/td>
&lt;td>576 × 7&lt;/td>
&lt;td>Outcomes and treatment after year-FE absorption&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>levitt_controls_viol.csv&lt;/code>&lt;/td>
&lt;td>576 × 284&lt;/td>
&lt;td>Control matrix $Z_v$ for the violent-crime equation&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>levitt_controls_prop.csv&lt;/code>&lt;/td>
&lt;td>576 × 284&lt;/td>
&lt;td>Control matrix $Z_p$ for the property-crime equation&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>levitt_controls_murd.csv&lt;/code>&lt;/td>
&lt;td>576 × 284&lt;/td>
&lt;td>Control matrix $Z_m$ for the murder equation&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The dimensions matter for the LASSO methods that follow. We are in the &lt;strong>moderate-dimensional&lt;/strong> regime: $p = 284$ is large but smaller than $n = 576$, so OLS is technically feasible but unstable, and LASSO is the natural tool to discipline the variable selection.&lt;/p>
&lt;hr>
&lt;h2 id="3-five-estimators-in-plain-language">3. Five estimators in plain language&lt;/h2>
&lt;p>Five regression procedures appear in this post, each with a different attitude toward how many controls to keep. We summarise the cast here so you can navigate the rest of the article. The table below gives the recipe; the sections that follow walk through each one in detail.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Estimator&lt;/th>
&lt;th>Recipe in one sentence&lt;/th>
&lt;th>Number of controls used&lt;/th>
&lt;th>Section&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>First-difference OLS&lt;/strong>&lt;/td>
&lt;td>Regress differenced crime on differenced abortion with &lt;strong>no&lt;/strong> controls (the original Donohue–Levitt 2001 specification).&lt;/td>
&lt;td>0&lt;/td>
&lt;td>§4&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>OLS (full)&lt;/strong>&lt;/td>
&lt;td>Add all 284 controls and let the matrix algebra sort it out.&lt;/td>
&lt;td>284&lt;/td>
&lt;td>§5&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>PSL&lt;/strong> (Post-Structural LASSO)&lt;/td>
&lt;td>One LASSO with the treatment forced in via &lt;code>penalty.factor = 0&lt;/code>, then plain OLS on the selected support.&lt;/td>
&lt;td>3 / 12 / 0 (varies by outcome)&lt;/td>
&lt;td>§6&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>DL (rigorous)&lt;/strong>&lt;/td>
&lt;td>Two LASSOs (y on X, d on X) with the Belloni-et-al. theory-based penalty; refit OLS on the &lt;strong>union&lt;/strong> of selected variables.&lt;/td>
&lt;td>8 / 12 / 9&lt;/td>
&lt;td>§7&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>DL (CV)&lt;/strong>&lt;/td>
&lt;td>Same recipe as DL-rigorous but each LASSO uses 3-fold cross-validation to pick lambda.&lt;/td>
&lt;td>150 / 109 / 161&lt;/td>
&lt;td>§10&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Two pairs of estimators do most of the pedagogical work. First-diff vs. OLS-full is the &lt;em>control-count&lt;/em> contrast (no controls vs. too many controls), showing why we need disciplined selection. DL-rigorous vs. DL-CV is the &lt;em>penalty-rule&lt;/em> contrast (theory vs. data-driven), showing that the choice of lambda can flip a coefficient&amp;rsquo;s sign. PSL sits in between as the simplest one-LASSO benchmark — it gets reasonable numbers but it has a causal-inference blind spot that motivates the move to Double LASSO.&lt;/p>
&lt;hr>
&lt;h2 id="4-first-difference-ols--the-no-controls-baseline">4. First-difference OLS — the no-controls baseline&lt;/h2>
&lt;p>The original Donohue–Levitt 2001 specification regresses differenced crime on differenced abortion with no controls beyond first-differencing itself:&lt;/p>
&lt;p>$$
\Delta y_{st} = \alpha \, \Delta d_{st} + \varepsilon_{st}.
$$&lt;/p>
&lt;p>Here, $\Delta y_{st}$ is the change in the crime rate for state $s$ from year $t-1$ to $t$, $\Delta d_{st}$ is the change in the effective abortion rate, and $\varepsilon_{st}$ is the regression error. The parameter $\alpha$ is the &lt;strong>average partial effect of the differenced abortion rate on the differenced crime rate&lt;/strong>, identified under (i) conditional independence given the differenced trajectories and (ii) parallel trends in levels. We use state-clustered standard errors throughout (more on this in §8) because observations within a state are autocorrelated through governor effects, state policy waves, and business-cycle exposure.&lt;/p>
&lt;p>Running this regression for each of the three crime outcomes gives our baseline numbers:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Outcome&lt;/th>
&lt;th style="text-align:right">$\hat\alpha$&lt;/th>
&lt;th style="text-align:right">SE (state-clustered)&lt;/th>
&lt;th>95 % CI&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Violent crime&lt;/td>
&lt;td style="text-align:right">&lt;strong>−0.1521&lt;/strong>&lt;/td>
&lt;td style="text-align:right">0.0337&lt;/td>
&lt;td>[−0.218, −0.086]&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Property crime&lt;/td>
&lt;td style="text-align:right">&lt;strong>−0.1084&lt;/strong>&lt;/td>
&lt;td style="text-align:right">0.0219&lt;/td>
&lt;td>[−0.151, −0.065]&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Murder&lt;/td>
&lt;td style="text-align:right">&lt;strong>−0.2039&lt;/strong>&lt;/td>
&lt;td style="text-align:right">0.0667&lt;/td>
&lt;td>[−0.335, −0.073]&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>Reading the violent-crime coefficient:&lt;/strong> a one-unit increase in the differenced effective abortion rate is associated with a &lt;strong>0.152-unit decrease&lt;/strong> in the differenced violent-crime rate (both variables are on a per-100,000-population scale, scaled to roughly log-changes). All three estimates are negative and statistically significant at the 5 % level; this is the Donohue–Levitt finding. The whole point of the LASSO methods below is to ask whether this picture survives when we let 284 candidate controls compete for inclusion. The baseline gives us a clear target: any procedure that drives $\hat\alpha$ to zero, flips its sign, or blows up the standard error needs to be examined critically.&lt;/p>
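&lt;p>The regression and its state-clustered standard error can be sketched on simulated data. This is a minimal illustration, not the post&amp;rsquo;s script: the data are simulated (48 clusters of 12 periods with a true slope of $-0.15$), the helper name &lt;code>cluster_se&lt;/code> is hypothetical, and the small-sample factor $\frac{G}{G-1}\cdot\frac{n-1}{n-k}$ is the Stata-style convention, which we assume is what &amp;ldquo;HC1-adjusted&amp;rdquo; refers to here.&lt;/p>
&lt;pre>&lt;code class="language-r"># Simulated stand-in for the differenced panel (NOT the Donohue-Levitt data).
set.seed(1)
G &amp;lt;- 48; T_per &amp;lt;- 12
state &amp;lt;- rep(seq_len(G), each = T_per)
d &amp;lt;- rnorm(G * T_per) + 0.5 * rnorm(G)[state]   # regressor correlated within state
e &amp;lt;- rnorm(G * T_per) + 0.5 * rnorm(G)[state]   # error correlated within state
y &amp;lt;- -0.15 * d + e

# No-intercept OLS, mirroring the displayed equation.
fit &amp;lt;- lm(y ~ 0 + d)

# Cluster-robust sandwich variance computed by hand.
cluster_se &amp;lt;- function(fit, cluster) {
  X &amp;lt;- model.matrix(fit)
  u &amp;lt;- resid(fit)
  n &amp;lt;- nrow(X); k &amp;lt;- ncol(X)
  G &amp;lt;- length(unique(cluster))
  bread  &amp;lt;- solve(crossprod(X))
  scores &amp;lt;- rowsum(X * u, cluster)      # one row of summed scores per cluster
  meat   &amp;lt;- crossprod(scores)
  adj    &amp;lt;- (G / (G - 1)) * ((n - 1) / (n - k))
  sqrt(diag(adj * bread %*% meat %*% bread))
}

alpha_hat &amp;lt;- unname(coef(fit)[&amp;quot;d&amp;quot;])
se_hat    &amp;lt;- unname(cluster_se(fit, state))
ci        &amp;lt;- alpha_hat + c(-1, 1) * qnorm(0.975) * se_hat
&lt;/code>&lt;/pre>
&lt;p>The &amp;ldquo;bread&amp;rdquo; is $(X'X)^{-1}$ and the &amp;ldquo;meat&amp;rdquo; sums the score $x_i \hat u_i$ within each state before taking outer products, which is what lets errors be arbitrarily correlated inside a state.&lt;/p>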
&lt;hr>
&lt;h2 id="5-kitchen-sink-ols--why-we-cannot-just-add-everything">5. Kitchen-sink OLS — why we cannot just add everything&lt;/h2>
&lt;p>A natural reaction to &amp;ldquo;you only used 8 controls&amp;rdquo; is to add all 284 and let OLS sort it out. With $p = 284 &amp;lt; n = 576$ there are more observations than regressors, so OLS can be computed and the procedure runs. The output:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Outcome&lt;/th>
&lt;th style="text-align:right">$\hat\alpha$&lt;/th>
&lt;th style="text-align:right">SE&lt;/th>
&lt;th>95 % CI&lt;/th>
&lt;th>Sign matches baseline?&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Violent crime&lt;/td>
&lt;td style="text-align:right">&lt;strong>+0.0135&lt;/strong>&lt;/td>
&lt;td style="text-align:right">0.0911&lt;/td>
&lt;td>[−0.165, +0.192]&lt;/td>
&lt;td>no — flips sign&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Property crime&lt;/td>
&lt;td style="text-align:right">&lt;strong>−0.1950&lt;/strong>&lt;/td>
&lt;td style="text-align:right">0.0472&lt;/td>
&lt;td>[−0.287, −0.103]&lt;/td>
&lt;td>yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Murder&lt;/td>
&lt;td style="text-align:right">&lt;strong>+2.3426&lt;/strong>&lt;/td>
&lt;td style="text-align:right">0.3114&lt;/td>
&lt;td>[+1.732, +2.953]&lt;/td>
&lt;td>no — flips dramatically&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The violent-crime point estimate has flipped sign (+0.014 vs the baseline&amp;rsquo;s −0.152) and its confidence interval crosses zero; the murder estimate has exploded to &lt;strong>+2.34&lt;/strong>, which would mean a unit increase in the differenced abortion rate raises the murder rate by 234 %. This is not a plausible causal effect — it is a numerical artefact.&lt;/p>
&lt;p>To see why, recall the OLS estimator in matrix form:&lt;/p>
&lt;p>$$
\hat\beta_{\text{OLS}} = (X'X)^{-1} X' y, \qquad
\widehat{\operatorname{Var}}(\hat\beta_{\text{OLS}}) = \hat\sigma^{2} \, (X'X)^{-1}.
$$&lt;/p>
&lt;p>Here, $X$ is the $n \times p$ design matrix (the treatment plus 284 controls), $y$ is the $n \times 1$ outcome vector, and $\hat\sigma^2$ is the estimated residual variance. The variance of any coefficient, including the treatment effect, depends on $(X'X)^{-1}$. &lt;strong>When the columns of $X$ are nearly collinear, the smallest eigenvalues of $X'X$ approach zero and its inverse blows up.&lt;/strong> In our problem, R&amp;rsquo;s &lt;code>lm()&lt;/code> automatically drops 3 of the 284 columns as exact linear combinations of the others (so the regression uses 281 controls), but the remaining 281 are still close enough to collinear that the variance matrix is wildly inflated for some coefficients and the point estimates wander far from anything credible.&lt;/p>
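&lt;p>The variance blow-up is easy to reproduce in miniature. The sketch below uses simulated data, not the post&amp;rsquo;s control matrix, and the names &lt;code>var_mult&lt;/code>, &lt;code>x2_indep&lt;/code> and &lt;code>x2_near&lt;/code> are hypothetical. It compares the first diagonal entry of $(X'X)^{-1}$ when a second regressor is unrelated to the first and when it is almost a copy of it.&lt;/p>
&lt;pre>&lt;code class="language-r"># Simulated illustration: near-collinearity inflates the diagonal of (X'X)^{-1},
# which multiplies sigma^2 in every OLS coefficient variance.
set.seed(42)
n &amp;lt;- 576
x1       &amp;lt;- rnorm(n)
x2_indep &amp;lt;- rnorm(n)                    # unrelated to x1
x2_near  &amp;lt;- x1 + rnorm(n, sd = 0.01)    # almost a copy of x1

# First diagonal entry of (X'X)^{-1}: the variance multiplier for x1's coefficient.
var_mult &amp;lt;- function(X) diag(solve(crossprod(X)))[1]

v_indep &amp;lt;- var_mult(cbind(x1, x2_indep))
v_near  &amp;lt;- var_mult(cbind(x1, x2_near))

# Smallest eigenvalue of X'X in each design.
eig_indep &amp;lt;- min(eigen(crossprod(cbind(x1, x2_indep)), symmetric = TRUE)$values)
eig_near  &amp;lt;- min(eigen(crossprod(cbind(x1, x2_near)),  symmetric = TRUE)$values)

ratio &amp;lt;- unname(v_near / v_indep)       # how many times larger the variance is
&lt;/code>&lt;/pre>
&lt;p>In the near-collinear design the smallest eigenvalue collapses toward zero and &lt;code>ratio&lt;/code> is several orders of magnitude above one, even though nothing about the outcome has changed.&lt;/p>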
&lt;p>This is exactly the failure mode that LASSO is designed to fix. &lt;strong>The cure is variable selection: keep the controls that matter, drop the rest.&lt;/strong> The next two sections build up to the Double LASSO procedure, which automates this in a way that is honest about causal inference rather than just about prediction.&lt;/p>
&lt;hr>
&lt;h2 id="6-lasso-and-the-one-lasso-benchmark-psl">6. LASSO and the one-LASSO benchmark (PSL)&lt;/h2>
&lt;p>The Least Absolute Shrinkage and Selection Operator (&lt;a href="#18-references">Tibshirani 1996&lt;/a>) modifies the OLS minimisation by adding an L1 penalty on the coefficients:&lt;/p>
&lt;p>$$
\hat\beta_{\text{LASSO}}(\lambda) = \arg\min_{\beta \in \mathbb{R}^p} \;
\frac{1}{2n} \| y - X\beta \|_2^2 \, + \, \lambda \sum_{j=1}^p \lvert\beta_j\rvert.
$$&lt;/p>
&lt;p>The first term is the usual sum of squared residuals. The second is the penalty: it adds $\lambda$ times the sum of the &lt;em>absolute values&lt;/em> of the coefficients to whatever the residual sum is. Two things make this choice interesting. First, the absolute-value penalty has a corner at zero — unlike a squared penalty (which would give Ridge regression), LASSO can shrink coefficients &lt;strong>exactly&lt;/strong> to zero, performing variable selection at the same time as estimation. Second, the strength of selection is controlled by one knob $\lambda$: at $\lambda = 0$ we recover OLS; as $\lambda \to \infty$ all coefficients are pinned to zero. Choosing $\lambda$ is the central tuning question, and §10 below shows that this choice can dominate the answer.&lt;/p>
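&lt;p>To see where the exact zeros come from, consider a special case that is an assumption of this illustration rather than a feature of our data: the columns of $X$ are orthonormal after scaling, so that $X'X / n = I_p$. Writing $b_j$ for the $j$-th OLS coefficient, the objective above equals a constant plus $\tfrac{1}{2}\sum_j (\beta_j - b_j)^2 + \lambda \sum_j \lvert\beta_j\rvert$, so each coefficient solves its own one-dimensional problem:&lt;/p>
&lt;p>$$
\hat\beta_j(\lambda) = \arg\min_{\beta_j} \; \tfrac{1}{2}(\beta_j - b_j)^2 + \lambda \lvert\beta_j\rvert
= \operatorname{sign}(b_j)\,\max\bigl(\lvert b_j\rvert - \lambda,\; 0\bigr).
$$&lt;/p>
&lt;p>Any coefficient with $\lvert b_j\rvert \le \lambda$ is set exactly to zero, and every surviving coefficient is pulled toward zero by $\lambda$. The first effect is the variable selection; the second is the shrinkage bias that a post-OLS refit removes.&lt;/p>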
&lt;p>&lt;strong>Post-Structural LASSO (PSL)&lt;/strong> is the simplest LASSO-based causal estimator. Run one LASSO on $y$ regressed on $(d, X)$, but force the treatment $d$ to stay in by setting its coefficient&amp;rsquo;s penalty multiplier to zero. Then refit by plain OLS on the selected support. In R:&lt;/p>
&lt;p>&lt;strong>Code chunk 2 — Post-Structural LASSO (PSL = one CV-LASSO with the treatment forced in):&lt;/strong>&lt;/p>
&lt;pre>&lt;code class="language-r">library(glmnet)  # provides cv.glmnet()

psl_fit &amp;lt;- function(y, d, X, group, nfolds = 3) {
  X &amp;lt;- as.matrix(X)                     # glmnet needs a numeric matrix, not a data frame
  M &amp;lt;- cbind(d, X)                      # treatment first, then the controls
  # penalty.factor multiplies each coefficient's penalty by 0 or 1.
  # Putting 0 in the d slot pins d in: LASSO cannot shrink it away.
  pf &amp;lt;- c(0, rep(1, ncol(X)))
  cv &amp;lt;- cv.glmnet(M, y, alpha = 1, intercept = TRUE,
                  penalty.factor = pf, nfolds = nfolds)
  coefs &amp;lt;- as.numeric(coef(cv, s = &amp;quot;lambda.min&amp;quot;))[-1]  # drop intercept
  sel &amp;lt;- which(coefs[-1] != 0)          # drop d, keep the selected X-columns
  Xs &amp;lt;- X[, sel, drop = FALSE]
  ols_fit(y, d, Xs, group)                # defined outside this chunk: plain OLS + clustered SE
}
&lt;/code>&lt;/pre>
&lt;p>A few annotations on the R idioms. &lt;code>cv.glmnet&lt;/code> runs LASSO across a grid of $\lambda$ values and uses k-fold cross-validation to score each one. The fitted object stores two candidates: &lt;code>lambda.min&lt;/code> (the value that minimises the cross-validated MSE) and &lt;code>lambda.1se&lt;/code> (the largest $\lambda$ whose cross-validated error is within one standard error of that minimum, and hence a sparser model). We use &lt;code>lambda.min&lt;/code> to match Fitzgerald et al.&amp;rsquo;s footnote 2. The &lt;code>alpha = 1&lt;/code> argument selects pure LASSO (&lt;code>alpha = 0&lt;/code> would be Ridge, &lt;code>0 &amp;lt; alpha &amp;lt; 1&lt;/code> would be Elastic Net). &lt;code>nfolds = 3&lt;/code> likewise matches the paper.&lt;/p>
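&lt;p>The two stored penalties can be compared directly. The sketch below runs on simulated data (200 observations, 50 candidate regressors, two of which truly matter), so the selection counts it produces say nothing about the crime application; the variable names are hypothetical.&lt;/p>
&lt;pre>&lt;code class="language-r">library(glmnet)

# Simulated data: only the first two columns of X enter the outcome.
set.seed(123)
n &amp;lt;- 200; p &amp;lt;- 50
X &amp;lt;- matrix(rnorm(n * p), n, p)
y &amp;lt;- X[, 1] - 0.5 * X[, 2] + rnorm(n)

cv &amp;lt;- cv.glmnet(X, y, alpha = 1, nfolds = 3)

# Number of non-zero slopes at each penalty (position 1 is the intercept).
n_min &amp;lt;- sum(as.numeric(coef(cv, s = &amp;quot;lambda.min&amp;quot;))[-1] != 0)
n_1se &amp;lt;- sum(as.numeric(coef(cv, s = &amp;quot;lambda.1se&amp;quot;))[-1] != 0)

c(lambda.min = cv$lambda.min, lambda.1se = cv$lambda.1se)
c(kept.min = n_min, kept.1se = n_1se)
&lt;/code>&lt;/pre>
&lt;p>By construction &lt;code>lambda.1se&lt;/code> is never smaller than &lt;code>lambda.min&lt;/code>, so it typically keeps fewer variables. Because the folds are drawn at random, both values change with the seed.&lt;/p>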
&lt;p>The results:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Outcome&lt;/th>
&lt;th style="text-align:right">$\hat\alpha$&lt;/th>
&lt;th style="text-align:right">SE&lt;/th>
&lt;th style="text-align:right"># controls selected&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Violent crime&lt;/td>
&lt;td style="text-align:right">&lt;strong>−0.1567&lt;/strong>&lt;/td>
&lt;td style="text-align:right">0.0342&lt;/td>
&lt;td style="text-align:right">3&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Property crime&lt;/td>
&lt;td style="text-align:right">&lt;strong>−0.0683&lt;/strong>&lt;/td>
&lt;td style="text-align:right">0.0319&lt;/td>
&lt;td style="text-align:right">12&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Murder&lt;/td>
&lt;td style="text-align:right">&lt;strong>−0.2061&lt;/strong>&lt;/td>
&lt;td style="text-align:right">0.0514&lt;/td>
&lt;td style="text-align:right">0&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>For violent and property crime, PSL keeps a small set (3 and 12 of 284 controls) and gives sensible estimates: violent crime $-0.157$ (very close to the baseline&amp;rsquo;s $-0.152$), property crime $-0.068$ (somewhat attenuated from $-0.108$), murder $-0.206$ (essentially the baseline). The standard errors are smaller than the kitchen-sink OLS — the variable selection has paid off in precision. So why is this not the end of the story?&lt;/p>
&lt;p>&lt;strong>Because PSL has a causal-inference blind spot.&lt;/strong> LASSO selects controls based on how well they predict $y$. But a covariate can be a &lt;em>confounder&lt;/em> — biasing $\hat\alpha$ if omitted — even when it does not predict $y$ strongly. Imagine a variable that is highly correlated with the treatment $d$ but only weakly with $y$. PSL&amp;rsquo;s one LASSO will drop it (it does not improve prediction of $y$ much), and the post-OLS will inherit the omitted-variable bias. &lt;a href="#18-references">Belloni, Chernozhukov and Hansen (2014)&lt;/a> made exactly this point, and proposed Double LASSO as the fix.&lt;/p>
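&lt;p>The blind spot can be made concrete with a small simulation. Everything in the block is an assumption chosen for illustration: one confounder &lt;code>x&lt;/code> that predicts the treatment strongly but enters the outcome equation only weakly (coefficient 0.2), and a true treatment effect of 1. Dropping &lt;code>x&lt;/code>, as a prediction-driven selector might, shifts the estimate by the omitted-variable bias $0.2 \cdot \operatorname{Cov}(d, x) / \operatorname{Var}(d)$.&lt;/p>
&lt;pre>&lt;code class="language-r"># Simulated illustration of omitted-variable bias from a &amp;quot;weak for y, strong for d&amp;quot; control.
set.seed(7)
n &amp;lt;- 20000
x &amp;lt;- rnorm(n)
d &amp;lt;- 0.9 * x + rnorm(n, sd = 0.3)       # treatment is well predicted by x
y &amp;lt;- 1.0 * d + 0.2 * x + rnorm(n)       # x matters only a little for y given d

alpha_short &amp;lt;- unname(coef(lm(y ~ d))[&amp;quot;d&amp;quot;])      # x omitted
alpha_long  &amp;lt;- unname(coef(lm(y ~ d + x))[&amp;quot;d&amp;quot;])  # x included

# Population bias implied by the design: 0.2 * Cov(d, x) / Var(d).
bias_theory &amp;lt;- 0.2 * 0.9 / (0.9^2 + 0.3^2)

c(short = alpha_short, long = alpha_long, bias = alpha_short - 1)
&lt;/code>&lt;/pre>
&lt;p>The short regression overshoots the true effect by roughly &lt;code>bias_theory&lt;/code>, while the long regression recovers it. A second LASSO of $d$ on the controls would flag &lt;code>x&lt;/code> immediately, which is the idea behind the next section.&lt;/p>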
&lt;hr>
&lt;h2 id="7-double-lasso--the-causal-side-fix">7. Double LASSO — the causal-side fix&lt;/h2>
&lt;p>Double LASSO runs &lt;strong>two&lt;/strong> LASSOs, not one. The first LASSO predicts the outcome $y$ from the controls; call its selected index set $I_y$. The second LASSO predicts the treatment $d$ from the same controls; call its selected index set $I_d$. The final estimate of $\alpha$ comes from a plain OLS regression of $y$ on $d$ and the &lt;strong>union&lt;/strong> $I_y \cup I_d$, with state-clustered standard errors. The diagram below summarises the procedure.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">flowchart TD
A[&amp;quot;Data: outcome y, treatment d,&amp;lt;br/&amp;gt;controls X (p = 284)&amp;quot;] --&amp;gt; B[&amp;quot;Step 1: LASSO of y on X&amp;lt;br/&amp;gt;(no d on right-hand side)&amp;lt;br/&amp;gt;selected set I_y&amp;quot;]
A --&amp;gt; C[&amp;quot;Step 2: LASSO of d on X&amp;lt;br/&amp;gt;(no y on right-hand side)&amp;lt;br/&amp;gt;selected set I_d&amp;quot;]
B --&amp;gt; D[&amp;quot;Union: I_y &amp;amp;cup; I_d&amp;quot;]
C --&amp;gt; D
D --&amp;gt; E[&amp;quot;Step 3: post-OLS&amp;lt;br/&amp;gt;y ~ d + X[, union]&amp;lt;br/&amp;gt;with state-clustered SE&amp;quot;]
E --&amp;gt; F[&amp;quot;Causal estimate &amp;amp;alpha;&amp;amp;#770;&amp;quot;]
style A fill:#0f1729,stroke:#6a9bcc,color:#e8ecf2
style B fill:#1f2b5e,stroke:#00d4c8,color:#e8ecf2
style C fill:#1f2b5e,stroke:#00d4c8,color:#e8ecf2
style D fill:#1f2b5e,stroke:#d97757,color:#e8ecf2
style E fill:#0f1729,stroke:#6a9bcc,color:#e8ecf2
style F fill:#1f2b5e,stroke:#00d4c8,color:#e8ecf2
&lt;/code>&lt;/pre>
&lt;p>The intuition is rooted in the &lt;strong>Frisch–Waugh–Lovell theorem&lt;/strong>. To estimate $\alpha$ in the structural equation $y_i = \alpha\, d_i + x_i' \theta + \zeta_i$, FWL says we can residualise both $y$ and $d$ against the same set of controls and then regress the residualised $y$ on the residualised $d$. Concretely, let $M_X = I - X(X'X)^{-1}X'$ be the residual-maker matrix; then&lt;/p>
&lt;p>$$
\hat\alpha = \bigl(\tilde d' \tilde d\bigr)^{-1} \tilde d' \tilde y, \quad \text{where} \quad \tilde y = M_X y, \, \tilde d = M_X d.
$$&lt;/p>
&lt;p>The trick is that we do not need to use &lt;em>all&lt;/em> of $X$ in the residualisation. We only need to use enough of $X$ to capture the part that is correlated with $d$. Double LASSO does this approximately: $I_d$ catches the controls correlated with $d$; $I_y$ catches the controls correlated with $y$; their union catches both. Refitting OLS on $d$ plus the union approximates the FWL projection without committing to all 284 controls.&lt;/p>
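&lt;p>The FWL identity is easy to check numerically. The sketch below uses simulated data (the sample size, coefficients, and variable names are illustrative assumptions, not the abortion panel) and compares the coefficient on $d$ from the full regression with the one from the residual-on-residual regression.&lt;/p>
&lt;pre>&lt;code class="language-r">set.seed(1)                                   # illustrative simulated data only
n &amp;lt;- 200
X &amp;lt;- matrix(rnorm(n * 3), n, 3)               # three stand-in controls
d &amp;lt;- drop(X %*% c(0.8, 0.3, 0)) + rnorm(n)    # treatment depends on the controls
y &amp;lt;- -0.1 * d + drop(X %*% c(0.5, 0, 0.2)) + rnorm(n)

# Full regression of y on d and X (no intercept, mimicking partialled data)
alpha_full &amp;lt;- unname(coef(lm(y ~ d + X - 1))[&amp;quot;d&amp;quot;])

# FWL route: residualise y and d on X, then regress residual on residual
y_tilde &amp;lt;- resid(lm(y ~ X - 1))
d_tilde &amp;lt;- resid(lm(d ~ X - 1))
alpha_fwl &amp;lt;- sum(d_tilde * y_tilde) / sum(d_tilde^2)

all.equal(alpha_full, alpha_fwl)              # the two routes agree up to floating-point error
&lt;/code>&lt;/pre>
&lt;p>Double LASSO replaces the full $X$ in both residualisations with the selected union, which is why the agreement above is only approximate once columns are dropped.&lt;/p>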
&lt;p>The &amp;ldquo;rigorous&amp;rdquo; penalty rule chooses $\lambda$ from theory, not from CV. &lt;a href="#18-references">Belloni, Chen, Chernozhukov and Hansen (2012)&lt;/a> showed that the right scaling is&lt;/p>
&lt;p>$$
\lambda^{\text{rig}} = \frac{2 c \, \hat\sigma}{\sqrt{n}} \, \Phi^{-1}\!\left(1 - \frac{\gamma}{2 p}\right), \quad c = 1.1, \, \gamma = 0.05,
$$&lt;/p>
&lt;p>where $\hat\sigma$ is a pilot estimate of the residual standard deviation, $n$ is the sample size, $p$ is the number of candidate controls, and $\Phi^{-1}$ is the inverse standard-normal CDF. The factor $\Phi^{-1}(1 - \gamma / (2p))$ is a Bonferroni-style correction that keeps the false-positive rate of LASSO selection under control even though we are testing $p$ coefficients. The constants $c = 1.1$ and $\gamma = 0.05$ are the defaults Belloni et al. recommend and Fitzgerald et al. follow. The point of all this machinery is that, unlike CV, the rigorous penalty is &lt;em>not tuned to optimise prediction&lt;/em>. It is tuned so that &lt;strong>the LASSO selection error is asymptotically small relative to the estimation noise&lt;/strong> — which is the right calibration for causal inference, not for forecasting.&lt;/p>
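&lt;p>To get a feel for the magnitudes, the displayed formula can be evaluated directly. This is a minimal sketch: &lt;code>lambda_rig()&lt;/code> is a hypothetical helper name, and $\hat\sigma = 1$ is a placeholder assumption rather than a value from the paper. The &lt;code>hdm&lt;/code> package applies its own internal scaling and penalty loadings, so the number below should not be expected to match the &lt;code>lambda&lt;/code> stored in an &lt;code>rlasso&lt;/code> fit.&lt;/p>
&lt;pre>&lt;code class="language-r"># Rigorous penalty level from the displayed formula.
# sigma_hat = 1 is a placeholder; in practice it comes from a pilot fit.
lambda_rig &amp;lt;- function(n, p, sigma_hat = 1, c = 1.1, gamma = 0.05) {
  2 * c * sigma_hat / sqrt(n) * qnorm(1 - gamma / (2 * p))
}

lambda_rig(n = 576,  p = 284)   # dimensions used in this post
lambda_rig(n = 576,  p = 8)     # fewer candidates: smaller Bonferroni-style factor
lambda_rig(n = 5000, p = 284)   # more data: smaller penalty
&lt;/code>&lt;/pre>
&lt;p>The penalty grows only slowly with $p$ (through the normal quantile) but falls at rate $1/\sqrt{n}$, which is the calibration the paragraph above describes.&lt;/p>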
&lt;p>&lt;strong>Code chunk 3 — The two rigorous LASSOs:&lt;/strong>&lt;/p>
&lt;pre>&lt;code class="language-r">library(hdm)  # provides rlasso()

dl_rigorous_fit &amp;lt;- function(y, d, X, group) {
  pen &amp;lt;- list(c = 1.1, gamma = 0.05)
  fit_y &amp;lt;- rlasso(X, y, post = FALSE, intercept = FALSE, penalty = pen)  # y-equation
  fit_d &amp;lt;- rlasso(X, d, post = FALSE, intercept = FALSE, penalty = pen)  # d-equation
  # rlasso() stores the selected set as a logical vector over the columns of X,
  # so no intercept-offset arithmetic on coef() is needed
  Iy &amp;lt;- which(fit_y$index)
  Id &amp;lt;- which(fit_d$index)
  U &amp;lt;- sort(union(Iy, Id))  # union of the two selected sets
  list(Iy = Iy, Id = Id, U = U)
}
&lt;/code>&lt;/pre>
&lt;/code>&lt;/pre>
&lt;p>A few notes. &lt;code>rlasso()&lt;/code> from the &lt;code>hdm&lt;/code> package is the standard R implementation of the rigorous-penalty LASSO. &lt;code>intercept = FALSE&lt;/code> is correct here because the data has already been partialled for year fixed effects (so the column means are essentially zero); using &lt;code>intercept = TRUE&lt;/code> on already-partialled data tends to produce spurious selections. &lt;code>post = FALSE&lt;/code> returns the raw LASSO coefficients rather than the post-OLS refit — we run our own post-OLS in the next step so we can attach state-clustered standard errors.&lt;/p>
&lt;p>&lt;strong>Code chunk 4 — The post-OLS step:&lt;/strong>&lt;/p>
&lt;pre>&lt;code class="language-r">fit_dl &amp;lt;- dl_rigorous_fit(y, d, X, state)
Xs &amp;lt;- X[, fit_dl$U, drop = FALSE] # union of selected controls
final &amp;lt;- ols_fit(y, d, Xs, state) # plain OLS, state-clustered SE
&lt;/code>&lt;/pre>
&lt;p>We pass the union of selected controls to a helper &lt;code>ols_fit()&lt;/code> that calls &lt;code>lm()&lt;/code> on &lt;code>y ~ d + Xs - 1&lt;/code> (the &lt;code>- 1&lt;/code> suppresses the intercept on already-partialled data), pulls out the treatment coefficient, and computes a state-clustered standard error via the sandwich formula in the next section. &lt;strong>The final $\hat\alpha$ comes from unshrunk OLS&lt;/strong> — LASSO is used only to choose which controls to include.&lt;/p>
&lt;p>The results:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Outcome&lt;/th>
&lt;th style="text-align:right">$\hat\alpha$&lt;/th>
&lt;th style="text-align:right">SE&lt;/th>
&lt;th>95 % CI&lt;/th>
&lt;th style="text-align:right">|I_y|&lt;/th>
&lt;th style="text-align:right">|I_d|&lt;/th>
&lt;th style="text-align:right">Union&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Violent crime&lt;/td>
&lt;td style="text-align:right">&lt;strong>−0.0964&lt;/strong>&lt;/td>
&lt;td style="text-align:right">0.0514&lt;/td>
&lt;td>[−0.197, +0.004]&lt;/td>
&lt;td style="text-align:right">0&lt;/td>
&lt;td style="text-align:right">8&lt;/td>
&lt;td style="text-align:right">8&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Property crime&lt;/td>
&lt;td style="text-align:right">&lt;strong>−0.0314&lt;/strong>&lt;/td>
&lt;td style="text-align:right">0.0227&lt;/td>
&lt;td>[−0.076, +0.013]&lt;/td>
&lt;td style="text-align:right">3&lt;/td>
&lt;td style="text-align:right">9&lt;/td>
&lt;td style="text-align:right">12&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Murder&lt;/td>
&lt;td style="text-align:right">&lt;strong>−0.1662&lt;/strong>&lt;/td>
&lt;td style="text-align:right">0.0790&lt;/td>
&lt;td>[−0.321, −0.011]&lt;/td>
&lt;td style="text-align:right">0&lt;/td>
&lt;td style="text-align:right">9&lt;/td>
&lt;td style="text-align:right">9&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>Reading the violent-crime row.&lt;/strong> $\hat\alpha = -0.0964$ means a unit increase in the differenced effective abortion rate is associated with a 0.096-unit decrease in the differenced violent-crime rate, conditional on the 8 controls in the union. The 95 % confidence interval [−0.197, +0.004] barely contains zero — under this specification, the violent-crime effect drops one notch below significance at the 5 % level. The selection counts |I_y| = 0, |I_d| = 8 tell us something more interesting: the LASSO of crime on controls picked &lt;strong>zero&lt;/strong> controls (out of 284), while the LASSO of abortion on controls picked 8. We unpack the meaning of this asymmetry in the next section.&lt;/p>
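&lt;p>The intervals in the table can be rebuilt from the point estimates and standard errors. The sketch below assumes a normal critical value (the script may use a slightly different one) and then flags which intervals exclude zero.&lt;/p>
&lt;pre>&lt;code class="language-r"># Point estimates and clustered SEs copied from the table above
est &amp;lt;- c(violent = -0.0964, property = -0.0314, murder = -0.1662)
se  &amp;lt;- c(violent =  0.0514, property =  0.0227, murder =  0.0790)

z  &amp;lt;- qnorm(0.975)                        # normal critical value, about 1.96
ci &amp;lt;- cbind(lower = est - z * se, upper = est + z * se)
round(ci, 3)

# Which intervals exclude zero, i.e. are significant at the 5 % level?
excludes_zero &amp;lt;- ci[, &amp;quot;lower&amp;quot;] &amp;gt; 0 | ci[, &amp;quot;upper&amp;quot;] &amp;lt; 0
excludes_zero
&lt;/code>&lt;/pre>
&lt;p>Only the murder interval excludes zero; the violent-crime interval crosses it by a few thousandths, which is the &amp;ldquo;one notch below significance&amp;rdquo; reading given above.&lt;/p>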
&lt;hr>
&lt;h2 id="8-state-clustered-standard-errors">8. State-clustered standard errors&lt;/h2>
&lt;p>A digression on the standard errors. The 576 observations are not independent: they are 12 differenced years of data for each of 48 states, and within-state observations are autocorrelated through governor effects, state policy waves, and business-cycle exposure. Treating them as independent (the default &lt;code>vcov&lt;/code> for &lt;code>lm()&lt;/code>) would understate the uncertainty; on this panel the clustered standard errors are about 40 % larger than the naive ones. We use a cluster-robust sandwich estimator with the standard HC1 finite-sample adjustment (&lt;a href="#18-references">Cameron and Miller 2015&lt;/a>):&lt;/p>
&lt;p>$$
\hat V_{\text{cluster}} = \underbrace{\frac{n-1}{n-k}}_{\text{small-sample}} \cdot \underbrace{\frac{G}{G-1}}_{\text{cluster-count}} \cdot \underbrace{(X'X)^{-1}}_{\text{bread}} \cdot \underbrace{\left(\sum_{g=1}^G X_g' \hat e_g \hat e_g' X_g\right)}_{\text{meat}} \cdot \underbrace{(X'X)^{-1}}_{\text{bread}}.
$$&lt;/p>
&lt;p>The &amp;ldquo;sandwich&amp;rdquo; name comes from the structure: two slices of bread $(X'X)^{-1}$ around the meat $\sum_g X_g' \hat e_g \hat e_g' X_g$, the cluster-summed outer product of the within-cluster scores. The two front factors are the small-sample correction (Cameron and Miller 2015): $(n-1)/(n-k)$ adjusts for the degrees of freedom consumed by the regressors, and $G/(G-1)$ adjusts for the number of clusters. Here $n = 576$, $k$ is the number of fitted columns (varies by estimator), and $G = 48$ is the number of states.&lt;/p>
&lt;p>&lt;strong>Code chunk 5 — The &lt;code>cluster_se&lt;/code> function:&lt;/strong>&lt;/p>
&lt;pre>&lt;code class="language-r">cluster_se &amp;lt;- function(X, e, group) {
  X &amp;lt;- as.matrix(X)
  n &amp;lt;- length(e); k &amp;lt;- ncol(X); G &amp;lt;- length(unique(group))
  XX &amp;lt;- crossprod(X)
  bread &amp;lt;- tryCatch(solve(XX), error = function(err) MASS::ginv(XX))
  S &amp;lt;- matrix(0, k, k)
  for (g in unique(group)) {
    idx &amp;lt;- which(group == g)
    Xg &amp;lt;- X[idx, , drop = FALSE]; eg &amp;lt;- e[idx]
    Xe &amp;lt;- crossprod(Xg, eg)   # k x 1 score sum for cluster g
    S &amp;lt;- S + tcrossprod(Xe)   # outer product, accumulated
  }
  V &amp;lt;- ((n - 1) / (n - k)) * (G / (G - 1)) * (bread %*% S %*% bread)
  sqrt(diag(V))
}
&lt;/code>&lt;/pre>
&lt;/code>&lt;/pre>
&lt;p>Two implementation notes. First, when $X$ has near-collinear columns (the kitchen-sink OLS case in §5), &lt;code>solve()&lt;/code> can still return finite numbers, but they are unreliable. We fall back to a Moore–Penrose pseudoinverse via &lt;code>MASS::ginv()&lt;/code> if &lt;code>solve()&lt;/code> raises an error. This is the right behaviour for a pedagogical script; in production you would also check the condition number. Second, the cluster-count correction $G/(G-1)$ assumes the number of clusters $G$ is &amp;ldquo;large.&amp;rdquo; A rule of thumb is $G \geq 30$; with $G = 48$ states we are comfortably above that threshold.&lt;/p>
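&lt;p>The loop in &lt;code>cluster_se&lt;/code> can also be written without an explicit loop. The sketch below is one possible vectorised equivalent using base R&amp;rsquo;s &lt;code>rowsum()&lt;/code>; the name &lt;code>cluster_se_fast&lt;/code> is hypothetical, the pseudoinverse fallback is deliberately left out, and the toy data are simulated with the same 48-by-12 layout as the panel.&lt;/p>
&lt;pre>&lt;code class="language-r"># Vectorised sketch of the same sandwich; cluster_se_fast is a hypothetical name
cluster_se_fast &amp;lt;- function(X, e, group) {
  X &amp;lt;- as.matrix(X)
  n &amp;lt;- length(e); k &amp;lt;- ncol(X); G &amp;lt;- length(unique(group))
  bread  &amp;lt;- solve(crossprod(X))     # (X'X)^{-1}; no ginv() fallback in this sketch
  scores &amp;lt;- rowsum(X * e, group)    # G x k matrix: row g holds the score sum X_g' e_g
  meat   &amp;lt;- crossprod(scores)       # sum over g of the outer products of those rows
  V &amp;lt;- ((n - 1) / (n - k)) * (G / (G - 1)) * (bread %*% meat %*% bread)
  sqrt(diag(V))
}

# Toy check on simulated data: 48 clusters of 12 observations
set.seed(1)
g   &amp;lt;- rep(1:48, each = 12)
X   &amp;lt;- cbind(d = rnorm(576), x1 = rnorm(576))
y   &amp;lt;- drop(X %*% c(-0.1, 0.5)) + rnorm(48)[g] + rnorm(576)  # shared within-cluster shock
fit &amp;lt;- lm(y ~ X - 1)
cluster_se_fast(X, resid(fit), g)
&lt;/code>&lt;/pre>
&lt;p>Multiplying the matrix &lt;code>X&lt;/code> by the vector &lt;code>e&lt;/code> scales each row by its residual, so summing rows within a cluster gives the same $X_g' \hat e_g$ that the loop builds one cluster at a time.&lt;/p>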
&lt;p>The clustered standard errors are visible throughout the post: they are why the confidence intervals are wider than the heteroskedasticity-robust intervals you might compute from &lt;code>vcovHC(fit, type = &amp;quot;HC1&amp;quot;)&lt;/code>. On this panel, the inflation factor is roughly $\sqrt{1 + (\bar n_g - 1) \rho_e}$ where $\bar n_g = 12$ is the average cluster size and $\rho_e$ is the within-state error autocorrelation. A 40 % SE increase corresponds to $\rho_e \approx 0.09$ (solve $1 + 11\rho_e = 1.4^2$), a modest but not negligible level.&lt;/p>
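&lt;p>The inflation-factor arithmetic is short enough to script. The two helper names below are hypothetical; the formula is the one quoted in the paragraph above, with the cluster size fixed at 12.&lt;/p>
&lt;pre>&lt;code class="language-r"># Inflation factor from the text: sqrt(1 + (n_g - 1) * rho)
inflation &amp;lt;- function(rho, n_g = 12) sqrt(1 + (n_g - 1) * rho)

# Inverse: which within-cluster correlation produces a given SE inflation?
rho_for &amp;lt;- function(infl, n_g = 12) (infl^2 - 1) / (n_g - 1)

rho_for(1.40)              # correlation implied by a 40 % increase in the SE
inflation(rho_for(1.40))   # round trip back to the inflation factor
&lt;/code>&lt;/pre>
&lt;p>Because the cluster size enters multiplicatively, even a small within-state correlation is enough to move the standard errors noticeably when each state contributes 12 observations.&lt;/p>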
&lt;hr>
&lt;h2 id="9-when-does-double-lasso-help-most">9. When does Double LASSO help most?&lt;/h2>
&lt;p>Look back at the DL-rigorous table in §7. For violent crime and murder, |I_y| = 0 — the LASSO of &lt;em>crime&lt;/em> on controls picked &lt;strong>zero variables&lt;/strong> out of 284. For all three outcomes |I_d| is 8 or 9 — the LASSO of &lt;em>abortion&lt;/em> on controls picked a handful. This asymmetry is the empirical fingerprint of the situation in which Double LASSO most helps: the treatment is well-predicted by the controls, but the outcome is not. Fitzgerald et al. (2026) emphasise this in their footnote 4, paraphrased: &lt;em>DL is most useful when the outcome is hard to predict but the treatment is well-predicted, because that is when the second LASSO catches controls that the first one missed.&lt;/em>&lt;/p>
&lt;p>Why does this matter for causal inference? Recall the PSL blind spot from §6: a one-LASSO procedure on $y$ can drop a control that strongly predicts $d$ if it does not strongly predict $y$. Suppose the (unobserved) data-generating process is&lt;/p>
&lt;p>$$
y_i = \alpha \, d_i + x_i' \theta + \zeta_i, \quad d_i = x_i' \pi + v_i, \quad \zeta_i \perp v_i.
$$&lt;/p>
&lt;p>If a particular $x_j$ has a large $\pi_j$ but a small $\theta_j$, then $x_j$ is a strong confounder (it predicts $d$, and thus moves $\hat\alpha$ when omitted), but a weak predictor of $y$. PSL drops it; DL keeps it via the d-equation LASSO. The empirical fingerprint |I_y| = 0, |I_d| = 8 means we are exactly in this regime: the eight controls that survived the d-equation LASSO are doing all of the confounding-control work in the final OLS.&lt;/p>
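&lt;p>A small simulation makes the regime concrete. Everything below is an illustrative assumption (one confounder, $\pi_j = 2$, $\theta_j = 0.3$, true $\alpha = -0.10$, simulated normal data); it is not the abortion panel. With these values $x_j$ explains most of the variance of $d$ but only about 1 % of the variance of $y$, and the textbook omitted-variable formula puts the bias from dropping it at $\theta_j \operatorname{Cov}(d, x_j) / \operatorname{Var}(d) = 0.3 \times 2 / 5 = 0.12$.&lt;/p>
&lt;pre>&lt;code class="language-r">set.seed(42)                          # simulated data; all numbers are illustrative
n  &amp;lt;- 5000
xj &amp;lt;- rnorm(n)                        # the confounder x_j
d  &amp;lt;- 2.0 * xj + rnorm(n)             # large pi_j: x_j strongly predicts d
y  &amp;lt;- -0.10 * d + 0.3 * xj + rnorm(n) # small theta_j relative to the noise in y

# Short regression: x_j omitted, as after a y-only selection step that drops it
alpha_short &amp;lt;- unname(coef(lm(y ~ d - 1))[&amp;quot;d&amp;quot;])
# Long regression: x_j kept, as when the d-equation LASSO selects it
alpha_long  &amp;lt;- unname(coef(lm(y ~ d + xj - 1))[&amp;quot;d&amp;quot;])

c(short = alpha_short, long = alpha_long, truth = -0.10)
&lt;/code>&lt;/pre>
&lt;p>In this setup the short regression is pulled upward by roughly the size of the true effect, while the long regression stays near $-0.10$. The simulation only shows what omission costs; whether a given LASSO would actually drop $x_j$ depends on the penalty and the sample size.&lt;/p>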
&lt;p>A natural follow-up question: which eight controls? The paper&amp;rsquo;s §4 discussion (and our &lt;code>selection_diagnostic.csv&lt;/code> for the curious) names lagged prisoners per capita, lagged income per capita, and lagged unemployment as common selections across replications. These are exactly the variables Donohue and Levitt themselves controlled for in 2001 — DL has, in a sense, &lt;em>rediscovered&lt;/em> a sensible subset of the original eight controls from a candidate pool of 284, automatically.&lt;/p>
&lt;hr>
&lt;h2 id="10-rigorous-vs-cross-validated-penalty--a-sign-flip">10. Rigorous vs. cross-validated penalty — a sign flip&lt;/h2>
&lt;p>The second flavour of Double LASSO replaces the rigorous penalty with &lt;strong>3-fold cross-validation&lt;/strong>. The recipe is identical to §7 — two LASSOs, take the union, post-OLS — but each LASSO now uses &lt;code>cv.glmnet&lt;/code> to pick $\lambda$ by minimising out-of-sample mean-squared error on the prediction problem. The catch is that this choice optimises a different objective — prediction-MSE on $y$ alone, or on $d$ alone, is not the same thing as choosing the right controls for the causal estimate of $\alpha$.&lt;/p>
&lt;p>&lt;strong>Code chunk 6 — The CV-penalty Double LASSO:&lt;/strong>&lt;/p>
&lt;pre>&lt;code class="language-r">library(glmnet)  # provides cv.glmnet()

dl_cv_fit &amp;lt;- function(y, d, X, group, nfolds = 3) {
  cv_y &amp;lt;- cv.glmnet(X, y, alpha = 1, intercept = TRUE, nfolds = nfolds)  # y-equation
  cv_d &amp;lt;- cv.glmnet(X, d, alpha = 1, intercept = TRUE, nfolds = nfolds)  # d-equation
  # [-1] drops the intercept row that coef() returns for glmnet fits
  Iy &amp;lt;- which(as.numeric(coef(cv_y, s = &amp;quot;lambda.min&amp;quot;))[-1] != 0)
  Id &amp;lt;- which(as.numeric(coef(cv_d, s = &amp;quot;lambda.min&amp;quot;))[-1] != 0)
  U &amp;lt;- sort(union(Iy, Id))
  Xs &amp;lt;- X[, U, drop = FALSE]
  ols_fit(y, d, Xs, group)  # same post-OLS helper as in section 7
}
&lt;/code>&lt;/pre>
&lt;/code>&lt;/pre>
&lt;p>The figure below shows the d-equation LASSO paths for the violent-crime panel. Each curve is one of the 284 candidate controls; the horizontal axis is $\log(\lambda)$ (larger $\lambda$ means more shrinkage, so curves move toward zero as we go right). The dashed vertical line marks $\log(\lambda_{\min})$ — the CV-optimal penalty. Teal curves are nonzero at $\lambda_{\min}$; faint grey curves were shrunk to zero. The dramatic finding: &lt;strong>143 of 284 controls survive at the CV-optimal penalty&lt;/strong>, illustrating exactly the over-selection that motivates using the rigorous penalty in §7.&lt;/p>
&lt;p>&lt;img src="r_double_lasso_paths.png" alt="CV-LASSO coefficient paths for the d-equation (predicting the abortion rate from the 284 partialled controls), violent-crime panel: 143 of 284 controls survive at the CV-chosen lambda.min — illustrating the over-selection that motivates the rigorous penalty.">&lt;/p>
&lt;p>Compare to the rigorous penalty: |I_d| = 8. That is a nearly 18-fold difference (143 vs. 8) in the d-equation alone. The consequence for the final estimate is dramatic. Side-by-side:&lt;/p>
&lt;p>&lt;img src="r_double_lasso_methods_compare.png" alt="Rigorous-penalty vs. CV-penalty Double LASSO, side by side across the three outcomes: CV&amp;rsquo;s permissive selection moves coefficients dramatically.">&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Outcome&lt;/th>
&lt;th style="text-align:right">$\hat\alpha_{\text{rig}}$&lt;/th>
&lt;th style="text-align:right">$\hat\alpha_{\text{CV}}$&lt;/th>
&lt;th style="text-align:right">$\lvert I_y \cup I_d \rvert_{\text{rig}}$&lt;/th>
&lt;th style="text-align:right">$\lvert I_y \cup I_d \rvert_{\text{CV}}$&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Violent crime&lt;/td>
&lt;td style="text-align:right">−0.0964&lt;/td>
&lt;td style="text-align:right">&lt;strong>+0.0193&lt;/strong>&lt;/td>
&lt;td style="text-align:right">8&lt;/td>
&lt;td style="text-align:right">&lt;strong>150&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Property crime&lt;/td>
&lt;td style="text-align:right">−0.0314&lt;/td>
&lt;td style="text-align:right">&lt;strong>−0.1784&lt;/strong>&lt;/td>
&lt;td style="text-align:right">12&lt;/td>
&lt;td style="text-align:right">&lt;strong>109&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Murder&lt;/td>
&lt;td style="text-align:right">−0.1662&lt;/td>
&lt;td style="text-align:right">&lt;strong>−1.1128&lt;/strong>&lt;/td>
&lt;td style="text-align:right">9&lt;/td>
&lt;td style="text-align:right">&lt;strong>161&lt;/strong>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>For violent crime, the coefficient &lt;strong>flips sign&lt;/strong> (rigorous $-0.096$ vs. CV $+0.019$). For murder, the coefficient &lt;strong>grows nearly sevenfold&lt;/strong>: it stays negative but lands at an implausible $-1.11$. The reason is the same in both cases: CV&amp;rsquo;s $\lambda_{\min}$ keeps too many marginally-predictive controls, and each of them soaks up a bit of the treatment variation, leaving less for the post-OLS to identify $\alpha$ on.&lt;/p>
&lt;p>This is not a knock on CV in general. CV&amp;rsquo;s $\lambda_{\min}$ is exactly the right choice when the goal is &lt;strong>prediction&lt;/strong> — out-of-sample MSE on $y$, for example. But for causal inference on the treatment effect $\alpha$, the rigorous penalty is the better choice because it is tuned to the right asymptotic objective: keeping selection error small &lt;em>relative to estimation error&lt;/em>, not minimising prediction loss.&lt;/p>
&lt;hr>
&lt;h2 id="11-the-forest-plot">11. The forest plot&lt;/h2>
&lt;p>Stacking all five estimators against all three outcomes gives the headline figure:&lt;/p>
&lt;p>&lt;img src="r_double_lasso_estimates.png" alt="Forest plot of α̂ ± 95 % CI for all five estimators across all three crime outcomes. The dashed line is zero; bars to the left of it indicate a crime-reducing association.">&lt;/p>
&lt;p>A coherent story emerges for violent and property crime. For violent crime, PSL ($-0.157$) and DL-rigorous ($-0.096$) sit close to or between the two OLS extremes: First-difference OLS at $-0.152$ and Kitchen-sink OLS at $+0.014$. For property crime, both are attenuated relative to the baseline&amp;rsquo;s $-0.108$ (PSL $-0.068$, DL-rigorous $-0.031$). In each case PSL and DL-rigorous concentrate the data&amp;rsquo;s signal on the small set of controls that actually matter (3 to 12 of them), with tighter standard errors than OLS-full.&lt;/p>
&lt;p>For murder, the story is messier. Kitchen-sink OLS gives the nonsensical $+2.34$. DL-CV gives the implausible $-1.11$. But First-diff ($-0.20$), PSL ($-0.21$), and DL-rigorous ($-0.17$) cluster sensibly. The murder outcome is the noisiest of the three (state-level murder counts are small numbers in many state-years), so it punishes any procedure that picks too many controls.&lt;/p>
&lt;p>The variable-selection bar chart visualises the over-selection problem at a glance:&lt;/p>
&lt;p>&lt;img src="r_double_lasso_selection.png" alt="Variable selection across the two Double LASSO penalties: bars show the size of |I_y|, |I_d|, intersection, and union out of 284 candidate controls.">&lt;/p>
&lt;p>In every panel, the orange CV bars dwarf the teal rigorous bars. For violent crime: union 150 vs. 8. For property crime: 109 vs. 12. For murder: 161 vs. 9. Both methods follow the same three-step recipe and run on the same data; the only difference is how $\lambda$ is chosen. The chart makes the principal trade-off in high-dimensional causal inference visible: prediction-tuned penalties (CV) over-select; theory-tuned penalties (rigorous) deliberately under-select to leave the causal signal undisturbed.&lt;/p>
&lt;p>&lt;strong>Code chunk 7 — Building the forest plot (compressed):&lt;/strong>&lt;/p>
&lt;pre>&lt;code class="language-r">library(ggplot2)

# table2, LIGHT_TEXT, method_colors, method_levels and theme_site() are
# assumed to be defined earlier in the script
ggplot(table2, aes(x = estimate, y = method, color = method)) +
  geom_vline(xintercept = 0, color = LIGHT_TEXT, linetype = &amp;quot;dashed&amp;quot;) +
  geom_errorbar(aes(xmin = ci_lo, xmax = ci_hi), width = 0.25,
                orientation = &amp;quot;y&amp;quot;) +
  geom_point(size = 3.2) +
  facet_wrap(~ outcome, scales = &amp;quot;free_x&amp;quot;, ncol = 3) +
  scale_color_manual(values = method_colors, guide = &amp;quot;none&amp;quot;) +
  scale_y_discrete(limits = rev(method_levels)) +
  labs(x = expression(hat(alpha) ~ &amp;quot;(effect of effective abortion rate)&amp;quot;),
       y = NULL,
       caption = &amp;quot;Replication of Table 2 in Fitzgerald et al. (2026).&amp;quot;) +
  theme_site()
&lt;/code>&lt;/pre>
&lt;p>The full ggplot call (including the title, subtitle, and &lt;code>theme_site()&lt;/code> definition with the site palette) lives in &lt;code>analysis.R&lt;/code> at lines 600–620. We use &lt;code>geom_errorbar&lt;/code> with &lt;code>orientation = &amp;quot;y&amp;quot;&lt;/code> rather than the deprecated &lt;code>geom_errorbarh&lt;/code>; the orientation argument was added in ggplot2 3.3 and lets direction-sensitive geoms such as &lt;code>geom_errorbar&lt;/code> be drawn horizontally.&lt;/p>
&lt;hr>
&lt;h2 id="12-when-to-use-which-method">12. When to use which method?&lt;/h2>
&lt;p>The decision tree below offers practical guidance for a researcher facing a fresh dataset. It is not a substitute for thinking carefully about identification (no method can rescue an invalid research design), but it is a reasonable starting point.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">flowchart TD
Start[&amp;quot;You have n observations,&amp;lt;br/&amp;gt;p candidate controls,&amp;lt;br/&amp;gt;and want a causal &amp;amp;alpha;&amp;amp;#770;&amp;quot;] --&amp;gt; Q1{&amp;quot;p &amp;amp;ge; n?&amp;quot;}
Q1 --&amp;gt;|Yes| L[&amp;quot;LASSO methods required&amp;lt;br/&amp;gt;(OLS infeasible)&amp;quot;]
Q1 --&amp;gt;|No| Q2{&amp;quot;p / n &amp;amp;gt; 0.3?&amp;quot;}
Q2 --&amp;gt;|Yes, like this post&amp;lt;br/&amp;gt;p=284, n=576| L
Q2 --&amp;gt;|No| Q3{&amp;quot;n &amp;amp;ge; 5,000?&amp;quot;}
Q3 --&amp;gt;|Yes| O[&amp;quot;Plain OLS with all&amp;lt;br/&amp;gt;controls is fine&amp;quot;]
Q3 --&amp;gt;|No| L
L --&amp;gt; Q4{&amp;quot;Need valid causal&amp;lt;br/&amp;gt;inference, not just&amp;lt;br/&amp;gt;prediction?&amp;quot;}
Q4 --&amp;gt;|Yes| DL[&amp;quot;Double LASSO&amp;lt;br/&amp;gt;with rigorous penalty&amp;lt;br/&amp;gt;(this post's &amp;amp;sect;7)&amp;quot;]
Q4 --&amp;gt;|No| Pred[&amp;quot;DL-CV or PSL are&amp;lt;br/&amp;gt;both fine for prediction&amp;quot;]
style Start fill:#0f1729,stroke:#6a9bcc,color:#e8ecf2
style DL fill:#1f2b5e,stroke:#00d4c8,color:#e8ecf2
style Pred fill:#1f2b5e,stroke:#d97757,color:#e8ecf2
style O fill:#1f2b5e,stroke:#d97757,color:#e8ecf2
style L fill:#0f1729,stroke:#6a9bcc,color:#e8ecf2
style Q1 fill:#1f2b5e,stroke:#6a9bcc,color:#e8ecf2
style Q2 fill:#1f2b5e,stroke:#6a9bcc,color:#e8ecf2
style Q3 fill:#1f2b5e,stroke:#6a9bcc,color:#e8ecf2
style Q4 fill:#1f2b5e,stroke:#d97757,color:#e8ecf2
&lt;/code>&lt;/pre>
&lt;p>The thresholds are rough. Fitzgerald et al. (2026) section 3.2 shows DL&amp;rsquo;s advantage shrinks rapidly as $n$ grows at fixed $p$; by $n = 3{,}000$ in their Monte Carlo, OLS is essentially indistinguishable from DL. The $p / n &amp;gt; 0.3$ cutoff is informal (it corresponds to the regime where $(X'X)^{-1}$ starts having visible numerical instability), but it is a reasonable diagnostic.&lt;/p>
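&lt;p>For readers who prefer code to flowcharts, the decision tree can be written as a few lines of R. The function name &lt;code>choose_method()&lt;/code> and its return strings are hypothetical, and the cutoffs are the same rough ones as in the diagram, so treat this as a mnemonic rather than a rule.&lt;/p>
&lt;pre>&lt;code class="language-r"># Hypothetical helper mirroring the decision tree; cutoffs are the rough ones above
choose_method &amp;lt;- function(n, p, causal = TRUE) {
  # LASSO branch: p at least n, or p/n above 0.3, or a sample below 5,000
  needs_lasso &amp;lt;- p &amp;gt;= n || p / n &amp;gt; 0.3 || n &amp;lt; 5000
  if (!needs_lasso) return(&amp;quot;Plain OLS with all controls&amp;quot;)
  if (causal) &amp;quot;Double LASSO with rigorous penalty&amp;quot; else &amp;quot;DL-CV or PSL&amp;quot;
}

choose_method(n = 576,   p = 284)                   # the regime of this post
choose_method(n = 10000, p = 8)                     # classical OLS comfort zone
choose_method(n = 576,   p = 284, causal = FALSE)   # prediction only
&lt;/code>&lt;/pre>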
&lt;p>One more piece of intuition justifies the post-OLS refit step in DL (and PSL). LASSO&amp;rsquo;s coefficients on the variables it selects are shrunken toward zero by construction. If you used those shrunken coefficients to compute the residuals for $\alpha$, you would inherit a bias of the order&lt;/p>
&lt;p>$$
\hat\alpha_{\text{LASSO}} - \alpha = O_p\!\left(\frac{\lambda}{n}\right).
$$&lt;/p>
&lt;p>For our $\lambda^{\text{rig}}$ and $n = 576$, that bias is roughly 5–15 % of the treatment effect, large enough to matter. Refitting with plain OLS on the selected support &lt;strong>removes the shrinkage&lt;/strong>, so the coefficients are no longer pulled toward zero. This is why every method in this post uses LASSO for &lt;em>selection only&lt;/em> and post-OLS for &lt;em>estimation&lt;/em>. It is the load-bearing step in the whole machinery.&lt;/p>
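&lt;p>The shrinkage is easiest to see with a single standardised regressor, where the LASSO solution has a closed form (soft-thresholding of the OLS slope). The sketch below is an illustration under stated assumptions: simulated data, a true slope of 0.5, an arbitrary penalty of 0.1, and the objective normalised as $(1/2n)\,\text{RSS} + \lambda |b|$, which is a different scaling from the order statement above.&lt;/p>
&lt;pre>&lt;code class="language-r">set.seed(7)                                 # illustrative simulated data
n &amp;lt;- 576
x &amp;lt;- as.numeric(scale(rnorm(n))) * sqrt(n / (n - 1))   # standardise so mean(x^2) = 1
y &amp;lt;- 0.5 * x + rnorm(n)

b_ols  &amp;lt;- sum(x * y) / n                    # OLS slope when mean(x^2) = 1
lambda &amp;lt;- 0.1                               # assumed penalty level, illustration only

# One-regressor LASSO under this normalisation is soft-thresholding of b_ols
soft    &amp;lt;- function(b, lam) sign(b) * pmax(abs(b) - lam, 0)
b_lasso &amp;lt;- soft(b_ols, lambda)

# Post-OLS refit on the selected regressor undoes the shrinkage
b_post &amp;lt;- unname(coef(lm(y ~ x - 1)))

c(ols = b_ols, lasso = b_lasso, post_ols = b_post, shrinkage = b_ols - b_lasso)
&lt;/code>&lt;/pre>
&lt;p>The LASSO coefficient sits exactly one penalty unit closer to zero than the OLS slope, and the refit returns the OLS slope itself. Selection is the only thing the refit inherits from the LASSO step.&lt;/p>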
&lt;hr>
&lt;h2 id="13-caveats-and-identification">13. Caveats and identification&lt;/h2>
&lt;p>Six things to keep in mind when reading the headline estimates.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>This is a replication exercise, not a primary causal claim.&lt;/strong> Fitzgerald et al. (2026) is itself a replication paper studying Double LASSO as a &lt;em>method&lt;/em>. Whether more abortion access caused less crime is a substantive question that goes well beyond any single regression specification. We inherit the paper&amp;rsquo;s framing: this post is about DL behaviour on a particular dataset, not about endorsing the Donohue–Levitt 2001 substantive claim.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Identification rests on two assumptions.&lt;/strong> First, &lt;em>conditional independence given $X$&lt;/em>: the 284 partialled controls must capture every variable that influenced both the abortion rate and the crime rate in the 1980s. Second, &lt;em>parallel trends in levels&lt;/em>: state fixed effects are absorbed by first-differencing, year fixed effects by the partialling step in &lt;code>prepare_data.R&lt;/code>. Neither assumption is innocuous. Fitzgerald et al. section 3.5 discusses two failure modes (bias amplification from controls that act as imperfect instruments, and collider bias from controls that are caused by both treatment and outcome) that this empirical application cannot rule out.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>State-clustering relies on $G \geq 30$.&lt;/strong> Cluster-robust inference is justified asymptotically in $G$, the number of clusters. With $G = 48$ states we are above the rule of thumb. If you had only 5 or 10 clusters, the cluster-robust SE would be unreliable and you would need to switch to wild bootstrap or block bootstrap inference.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>CV LASSO is non-deterministic.&lt;/strong> &lt;code>cv.glmnet&lt;/code> randomly partitions the data into $K$ folds; without setting a seed, the variable-selection counts in §10 would vary by ±5 controls between runs and the headline coefficient by ±0.01. The script sets &lt;code>set.seed(20260520)&lt;/code> so the post&amp;rsquo;s numbers reproduce exactly. The rigorous LASSO is deterministic given the data and the penalty arguments.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>OLS-full and DL-rigorous standard errors diverge from the paper.&lt;/strong> Our SE on OLS-full violent crime is $0.091$ vs. the paper&amp;rsquo;s $0.875$; the gap stems from inverting near-singular $X'X$ via &lt;code>solve()&lt;/code> + &lt;code>MASS::ginv()&lt;/code> here vs. the paper&amp;rsquo;s &lt;code>matlib::inv(X'X * 1e8) * 1e8&lt;/code> rescaling. The audit appendix in &lt;code>results_report.md&lt;/code> walks through it: both implementations are mathematically valid and the qualitative cross-method comparison is unchanged.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The estimand is not population-weighted.&lt;/strong> Every state-year observation gets equal weight. State-clustered SEs do not re-weight observations; they only adjust the variance for within-state autocorrelation. A population-weighted version (weighting state-years by state adult population) would give a different — and arguably more policy-relevant — estimand. The paper does not weight, so neither do we.&lt;/p>
&lt;/li>
&lt;/ol>
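&lt;p>The seed dependence described in caveat 4 can be reproduced on simulated data. The sketch below is not the script from this post: the data are random draws with the same dimensions as the panel, and &lt;code>n_selected()&lt;/code> is a hypothetical helper that counts the controls kept at &lt;code>lambda.min&lt;/code> for a given seed.&lt;/p>
&lt;pre>&lt;code class="language-r">library(glmnet)

# Simulated stand-in data; dimensions echo the post but the values are made up
set.seed(123)
n &amp;lt;- 576; p &amp;lt;- 284
X &amp;lt;- matrix(rnorm(n * p), n, p)
d &amp;lt;- drop(X[, 1:5] %*% rep(0.5, 5)) + rnorm(n)

# Count controls kept at lambda.min for a given fold-assignment seed
n_selected &amp;lt;- function(seed) {
  set.seed(seed)
  cv &amp;lt;- cv.glmnet(X, d, alpha = 1, nfolds = 3)
  sum(as.numeric(coef(cv, s = &amp;quot;lambda.min&amp;quot;))[-1] != 0)
}

sapply(c(1, 2, 3), n_selected)   # counts can differ across seeds
sapply(c(1, 1, 1), n_selected)   # identical once the seed is fixed
&lt;/code>&lt;/pre>
&lt;p>Fixing the seed pins down the fold assignment, which is the only random ingredient in &lt;code>cv.glmnet&lt;/code> here.&lt;/p>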
&lt;hr>
&lt;h2 id="14-comparison-to-fitzgerald-et-al-2026">14. Comparison to Fitzgerald et al. (2026)&lt;/h2>
&lt;p>The headline numerical reproduction is &lt;strong>faithful at the variable-selection level&lt;/strong>. Our LASSO selections for the rigorous-penalty Double LASSO match the paper&amp;rsquo;s Table 2 &lt;em>exactly&lt;/em> across all three outcomes:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Outcome&lt;/th>
&lt;th style="text-align:right">|I_y| ours&lt;/th>
&lt;th style="text-align:right">|I_y| paper&lt;/th>
&lt;th style="text-align:right">|I_d| ours&lt;/th>
&lt;th style="text-align:right">|I_d| paper&lt;/th>
&lt;th style="text-align:right">Point ours&lt;/th>
&lt;th style="text-align:right">Point paper&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Violent crime&lt;/td>
&lt;td style="text-align:right">&lt;strong>0&lt;/strong>&lt;/td>
&lt;td style="text-align:right">0&lt;/td>
&lt;td style="text-align:right">&lt;strong>8&lt;/strong>&lt;/td>
&lt;td style="text-align:right">8&lt;/td>
&lt;td style="text-align:right">−0.0964&lt;/td>
&lt;td style="text-align:right">−0.104&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Property crime&lt;/td>
&lt;td style="text-align:right">&lt;strong>3&lt;/strong>&lt;/td>
&lt;td style="text-align:right">3&lt;/td>
&lt;td style="text-align:right">&lt;strong>9&lt;/strong>&lt;/td>
&lt;td style="text-align:right">9&lt;/td>
&lt;td style="text-align:right">−0.0314&lt;/td>
&lt;td style="text-align:right">−0.030&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Murder&lt;/td>
&lt;td style="text-align:right">&lt;strong>0&lt;/strong>&lt;/td>
&lt;td style="text-align:right">0&lt;/td>
&lt;td style="text-align:right">&lt;strong>9&lt;/strong>&lt;/td>
&lt;td style="text-align:right">9&lt;/td>
&lt;td style="text-align:right">−0.1662&lt;/td>
&lt;td style="text-align:right">−0.125&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Six selection-count cells, six exact matches. Point estimates agree to within about 0.04 on the largest absolute gap (murder, 0.041); the others are within 0.01. The first-differenced baselines and PSL estimates likewise reproduce the paper to within 0.005 on point estimates (PSL property crime is the exception: our $-0.068$ vs. the paper&amp;rsquo;s $-0.016$, attributable to the random fold assignment in 3-fold CV that the paper does not seed). The DL-CV row is our own addition: Fitzgerald et al. do not tabulate it for the empirical application (they study it in their Monte Carlo simulations), so we report it here as the second layer of our headline contrast.&lt;/p>
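&lt;p>The point-estimate gaps quoted above follow directly from the last two columns of the table; a quick check, with the values copied from the table:&lt;/p>
&lt;pre>&lt;code class="language-r"># DL-rigorous point estimates: this post vs. the paper (values from the table above)
ours  &amp;lt;- c(violent = -0.0964, property = -0.0314, murder = -0.1662)
paper &amp;lt;- c(violent = -0.104,  property = -0.030,  murder = -0.125)

gap &amp;lt;- abs(ours - paper)
round(gap, 4)
names(gap)[which.max(gap)]   # outcome with the largest absolute gap
&lt;/code>&lt;/pre>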
&lt;p>The complete row-by-row audit lives in &lt;code>results_report.md&lt;/code>&amp;rsquo;s appendix, with line citations to the paper&amp;rsquo;s manuscript markdown.&lt;/p>
&lt;hr>
&lt;h2 id="15-conclusion">15. Conclusion&lt;/h2>
&lt;p>Three takeaways worth carrying away from this post.&lt;/p>
&lt;p>First, &lt;strong>Double LASSO is a method, not a panacea&lt;/strong>. It does not invent variation in the data, nor does it weaken the identifying assumptions of the underlying research design. What it does is make high-dimensional control sets &lt;em>tractable&lt;/em> without committing to using all of them or to picking a subset by hand. On a dataset where conditional independence holds and the candidate-control set is rich enough to span the confounders, DL-rigorous reproduces the sign of the Donohue–Levitt 2001 headline, with attenuated magnitudes for violent and property crime, while disciplining the standard errors.&lt;/p>
&lt;p>Second, &lt;strong>the rigorous penalty matters&lt;/strong>. Switching from &lt;code>hdm::rlasso&lt;/code> to &lt;code>cv.glmnet&lt;/code> flipped our violent-crime coefficient from $-0.096$ to $+0.019$ and inflated the murder estimate to $-1.11$. The CV penalty is optimised for prediction-MSE; for causal inference we want the theory-driven penalty that controls &lt;em>selection-error relative to estimation error&lt;/em>. Practitioners moving from supervised-ML training to causal inference often default to CV without thinking; this post&amp;rsquo;s headline contrast is a reminder that the choice is not innocuous.&lt;/p>
&lt;p>Third, &lt;strong>the regime determines the methodology&lt;/strong>. With our $p = 284$, $n = 576$, we are squarely in the small-sample, high-dimensional zone where DL is designed to help. With $p = 8$ and $n = 5{,}000$, plain OLS would be perfectly fine — DL adds nothing when classical OLS is in its comfort zone. The decision tree in §12 is a starting point for picking the right tool for the dimensions you face.&lt;/p>
&lt;p>If you came in expecting either a definitive statement about abortion and crime or a magic ML cure for omitted-variable bias, you should leave with neither. What you should leave with is a clearer mental model of &lt;em>when&lt;/em> the high-dimensional toolkit earns its complexity: when the controls are many, the sample is moderate, and the treatment is predictable from the controls but the outcome is not.&lt;/p>
&lt;hr>
&lt;h2 id="16-exercises">16. Exercises&lt;/h2>
&lt;p>These exercises ask you to modify and re-run the &lt;code>analysis.R&lt;/code> script in this post. All datasets, dependencies, and helper functions are already in place — you only need to change the indicated lines, run the script, and read the output.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Change the CV seed.&lt;/strong> In &lt;code>analysis.R&lt;/code> line 86, change &lt;code>set.seed(20260520)&lt;/code> to &lt;code>set.seed(1)&lt;/code>, then &lt;code>set.seed(2)&lt;/code>, then &lt;code>set.seed(3)&lt;/code>. Re-run each time and record the DL-CV violent-crime estimate $\hat\alpha$ and union size. How much does the DL-CV point estimate vary across seeds? Does the &lt;em>rigorous&lt;/em> DL estimate change at all? Why does the seed matter for one but not the other?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Loosen and tighten the rigorous penalty.&lt;/strong> In the &lt;code>dl_rigorous_fit()&lt;/code> function (around &lt;code>analysis.R&lt;/code> line 431), the penalty parameters are &lt;code>c = 1.1, gamma = 0.05&lt;/code>. Change &lt;code>c&lt;/code> to &lt;code>0.9&lt;/code> (a looser penalty, so expect more variables to be kept) and then to &lt;code>1.5&lt;/code> (a tighter penalty, so expect fewer). Re-run each time and report the new $|I_y|$, $|I_d|$, and $\hat\alpha$ for violent crime. Does the headline $\hat\alpha$ survive both perturbations? Which side of $c = 1.1$ is more sensitive?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Drop two years of data.&lt;/strong> Subset the differenced panel to 1986–1995 only (10 years × 48 states = 480 observations) by filtering &lt;code>linear&lt;/code>, &lt;code>partialled&lt;/code>, and the three control matrices to remove the last two years. Re-run DL-rigorous on the violent-crime equation. How do the point estimate and its standard error change? What does this tell you about the signal-to-noise ratio in the $n = 576$ regime?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Substitute Ridge for LASSO.&lt;/strong> In the &lt;code>dl_cv_fit()&lt;/code> function (around &lt;code>analysis.R&lt;/code> line 487), change &lt;code>alpha = 1&lt;/code> to &lt;code>alpha = 0&lt;/code> to use Ridge (L2 penalty) instead of LASSO (L1). Re-run the DL-CV pipeline. What changes in the variable counts? Why does Ridge not produce a sparse $I_y$ or $I_d$ set? What does this tell you about why LASSO — and not Ridge — is the right tool for &lt;em>variable selection&lt;/em>, as distinct from prediction?&lt;/p>
&lt;/li>
&lt;/ol>
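&lt;p>For Exercise 4, a useful reference point is the textbook special case of an orthonormal design, where both estimators have closed forms under a suitable scaling of the penalty: LASSO soft-thresholds each OLS coefficient, while Ridge rescales it. The base-R sketch below illustrates that special case only. The coefficient values and the helper names &lt;code>soft_threshold&lt;/code> and &lt;code>ridge_shrink&lt;/code> are made up for illustration and are not part of &lt;code>analysis.R&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-r"># Hypothetical OLS coefficients under an orthonormal design (made-up values)
b_ols &lt;- c(2.0, 0.8, 0.3, -0.1, -1.5)
lambda &lt;- 0.5

# LASSO: soft-thresholding sets small coefficients exactly to zero
soft_threshold &lt;- function(b, lambda) sign(b) * pmax(abs(b) - lambda, 0)

# Ridge: proportional shrinkage moves coefficients toward zero but never onto it
ridge_shrink &lt;- function(b, lambda) b / (1 + lambda)

b_lasso &lt;- soft_threshold(b_ols, lambda)
b_ridge &lt;- ridge_shrink(b_ols, lambda)

print(data.frame(ols = b_ols, lasso = b_lasso, ridge = b_ridge))

sum(b_lasso != 0)  # 3: the two coefficients smaller than lambda are dropped
sum(b_ridge != 0)  # 5: every coefficient survives
&lt;/code>&lt;/pre>
&lt;p>Because Ridge keeps every coefficient nonzero, the selected sets $I_y$ and $I_d$ that Double LASSO takes the union of would contain all candidate controls.&lt;/p>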
&lt;hr>
&lt;h2 id="17-reproducing-this-analysis">17. Reproducing this analysis&lt;/h2>
&lt;p>Everything in this post — figures, tables, point estimates, standard errors — comes from a single self-contained R script (&lt;code>analysis.R&lt;/code>, 757 lines) that loads its data from six CSVs hosted in the post&amp;rsquo;s &lt;code>data/&lt;/code> folder on GitHub. The script does not need any Matlab files locally: the one-time Matlab → CSV conversion is handled by the companion &lt;code>prepare_data.R&lt;/code>, which is also in the post folder for the curious. The full reproduction recipe is:&lt;/p>
&lt;ol>
&lt;li>Clone the GitHub repository, or copy &lt;code>analysis.R&lt;/code> on its own: the script reads its data from GitHub and installs any missing packages.&lt;/li>
&lt;li>Run &lt;code>Rscript analysis.R 2&amp;gt;&amp;amp;1 | tee execution_log.txt&lt;/code> from the post folder.&lt;/li>
&lt;li>The script writes &lt;code>r_double_lasso_*.png&lt;/code> (four figures), &lt;code>results_table2.csv&lt;/code> (the Table 2 replication), and &lt;code>selection_diagnostic.csv&lt;/code> (variable-selection counts).&lt;/li>
&lt;/ol>
&lt;p>R packages used: &lt;a href="https://cran.r-project.org/package=glmnet" target="_blank" rel="noopener">&lt;code>glmnet&lt;/code>&lt;/a> (for &lt;code>cv.glmnet&lt;/code>), &lt;a href="https://cran.r-project.org/package=hdm" target="_blank" rel="noopener">&lt;code>hdm&lt;/code>&lt;/a> (for &lt;code>rlasso&lt;/code>), &lt;a href="https://cran.r-project.org/package=sandwich" target="_blank" rel="noopener">&lt;code>sandwich&lt;/code>&lt;/a> (for general-purpose vcov utilities), &lt;a href="https://cran.r-project.org/package=lmtest" target="_blank" rel="noopener">&lt;code>lmtest&lt;/code>&lt;/a>, &lt;a href="https://cran.r-project.org/package=MASS" target="_blank" rel="noopener">&lt;code>MASS&lt;/code>&lt;/a> (for &lt;code>ginv()&lt;/code>), &lt;a href="https://cran.r-project.org/package=ggplot2" target="_blank" rel="noopener">&lt;code>ggplot2&lt;/code>&lt;/a>, &lt;a href="https://cran.r-project.org/package=dplyr" target="_blank" rel="noopener">&lt;code>dplyr&lt;/code>&lt;/a>, &lt;a href="https://cran.r-project.org/package=tidyr" target="_blank" rel="noopener">&lt;code>tidyr&lt;/code>&lt;/a>, &lt;a href="https://cran.r-project.org/package=scales" target="_blank" rel="noopener">&lt;code>scales&lt;/code>&lt;/a>, &lt;a href="https://cran.r-project.org/package=patchwork" target="_blank" rel="noopener">&lt;code>patchwork&lt;/code>&lt;/a>. All are on CRAN; the script installs missing ones automatically.&lt;/p>
&lt;p>The runtime on Apple Silicon is roughly &lt;strong>90 seconds&lt;/strong> for the full pipeline, dominated by the CV calls in &lt;code>cv.glmnet&lt;/code> and &lt;code>dl_cv_fit&lt;/code>. The rigorous-LASSO step is essentially instant; the post-OLS clustered-SE calculations are negligible.&lt;/p>
&lt;p>A note on the seed. The line &lt;code>set.seed(20260520)&lt;/code> near the top of &lt;code>analysis.R&lt;/code> controls the random fold assignment for &lt;code>cv.glmnet&lt;/code>. Changing the seed will shift the DL-CV numbers by roughly ±0.01 on point estimates and ±5 in variable-selection counts. The DL-rigorous numbers do not depend on the seed.&lt;/p>
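&lt;p>The seed matters for DL-CV because cross-validation splits the sample into random folds. The base-R sketch below mimics that step without calling &lt;code>glmnet&lt;/code>. It assumes ten folds (the &lt;code>cv.glmnet&lt;/code> default) and draws the fold labels with &lt;code>sample()&lt;/code>, which is how &lt;code>cv.glmnet&lt;/code> assigns folds when no &lt;code>foldid&lt;/code> argument is supplied. The helper &lt;code>draw_folds&lt;/code> is hypothetical and does not appear in &lt;code>analysis.R&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-r">n &lt;- 576       # observations in the differenced panel
nfolds &lt;- 10   # assumed: the cv.glmnet default

# Hypothetical helper: draw one fold label per observation for a given seed
draw_folds &lt;- function(seed) {
  set.seed(seed)
  sample(rep(seq_len(nfolds), length.out = n))
}

folds_a &lt;- draw_folds(20260520)
folds_b &lt;- draw_folds(1)

# Same seed reproduces the split; a different seed reshuffles it
identical(folds_a, draw_folds(20260520))  # TRUE
mean(folds_a != folds_b)                  # share of rows whose fold label changes
&lt;/code>&lt;/pre>
&lt;p>Passing a fixed &lt;code>foldid&lt;/code> vector to &lt;code>cv.glmnet&lt;/code> is one way to hold the split constant across runs, which removes this source of variation without changing the penalty rule itself.&lt;/p>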
&lt;hr>
&lt;h2 id="18-references">18. References&lt;/h2>
&lt;p>&lt;strong>Academic references&lt;/strong> (each linked to the publisher DOI):&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Belloni, A., Chen, D., Chernozhukov, V. &amp;amp; Hansen, C.&lt;/strong> (2012). &lt;a href="https://doi.org/10.3982/ECTA9626" target="_blank" rel="noopener">&amp;ldquo;Sparse models and methods for optimal instruments with an application to eminent domain.&amp;rdquo;&lt;/a> &lt;em>Econometrica&lt;/em> 80(6): 2369–2429. The original derivation of the rigorous LASSO penalty.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Belloni, A., Chernozhukov, V. &amp;amp; Hansen, C.&lt;/strong> (2014). &lt;a href="https://doi.org/10.1093/restud/rdt044" target="_blank" rel="noopener">&amp;ldquo;Inference on treatment effects after selection among high-dimensional controls.&amp;rdquo;&lt;/a> &lt;em>Review of Economic Studies&lt;/em> 81(2): 608–650. The Double LASSO paper, including the empirical-application data we use in this post.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Cameron, A. C. &amp;amp; Miller, D. L.&lt;/strong> (2015). &lt;a href="https://doi.org/10.3368/jhr.50.2.317" target="_blank" rel="noopener">&amp;ldquo;A practitioner&amp;rsquo;s guide to cluster-robust inference.&amp;rdquo;&lt;/a> &lt;em>Journal of Human Resources&lt;/em> 50(2): 317–372. The reference for the HC1 finite-sample adjustment in §8.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Donohue III, J. J. &amp;amp; Levitt, S. D.&lt;/strong> (2001). &lt;a href="https://doi.org/10.1162/00335530151144050" target="_blank" rel="noopener">&amp;ldquo;The impact of legalized abortion on crime.&amp;rdquo;&lt;/a> &lt;em>Quarterly Journal of Economics&lt;/em> 116(2): 379–420. The original empirical paper. The substantive debate has continued for over two decades; this post does not weigh in on it.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Fitzgerald Sice, J., Lattimore, F., Robinson, T. &amp;amp; Zhu, A.&lt;/strong> (2026). &lt;a href="https://doi.org/10.15456/jae.2025335.0258270663" target="_blank" rel="noopener">&amp;ldquo;Double LASSO: Replication and Practical Insights.&amp;rdquo;&lt;/a> &lt;em>Journal of Applied Econometrics&lt;/em>, forthcoming. The source paper for this replication. The JAE DOI &lt;code>10.15456/jae.2025335.0258270663&lt;/code> is also the &lt;a href="http://qed.econ.queensu.ca/jae/datasets/" target="_blank" rel="noopener">replication archive identifier&lt;/a> where the Matlab/R code and data live.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Friedman, J., Hastie, T. &amp;amp; Tibshirani, R.&lt;/strong> (2010). &lt;a href="https://doi.org/10.18637/jss.v033.i01" target="_blank" rel="noopener">&amp;ldquo;Regularization paths for generalized linear models via coordinate descent.&amp;rdquo;&lt;/a> &lt;em>Journal of Statistical Software&lt;/em> 33(1): 1–22. The reference for the &lt;code>glmnet&lt;/code> package.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Tibshirani, R.&lt;/strong> (1996). &lt;a href="https://doi.org/10.1111/j.2517-6161.1996.tb02080.x" target="_blank" rel="noopener">&amp;ldquo;Regression shrinkage and selection via the LASSO.&amp;rdquo;&lt;/a> &lt;em>Journal of the Royal Statistical Society Series B&lt;/em> 58(1): 267–288. The original LASSO paper.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>R packages used:&lt;/strong>&lt;/p>
&lt;ol start="8">
&lt;li>&lt;a href="https://cran.r-project.org/package=glmnet" target="_blank" rel="noopener">&lt;strong>&lt;code>glmnet&lt;/code>&lt;/strong>&lt;/a> — CRAN package for cross-validated LASSO, Ridge, and Elastic Net via coordinate descent. Used here for &lt;code>cv.glmnet&lt;/code> (PSL and DL-CV).&lt;/li>
&lt;li>&lt;a href="https://cran.r-project.org/package=hdm" target="_blank" rel="noopener">&lt;strong>&lt;code>hdm&lt;/code>&lt;/strong>&lt;/a> — CRAN package for high-dimensional metrics, including the rigorous-penalty &lt;code>rlasso&lt;/code> function used in DL-rigorous.&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Data and replication archives:&lt;/strong>&lt;/p>
&lt;ol start="10">
&lt;li>
&lt;p>The CSV files for this post live in &lt;a href="https://github.com/cmg777/starter-academic-v501/tree/master/content/post/r_double_lasso/data" target="_blank" rel="noopener">&lt;code>content/post/r_double_lasso/data/&lt;/code>&lt;/a> on the site&amp;rsquo;s GitHub. They were extracted from the Matlab files in Fitzgerald et al.&amp;rsquo;s JAE replication archive by the companion script &lt;a href="https://github.com/cmg777/starter-academic-v501/blob/master/content/post/r_double_lasso/prepare_data.R" target="_blank" rel="noopener">&lt;code>prepare_data.R&lt;/code>&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The Donohue–Levitt (2001) original replication data is available via the QJE article&amp;rsquo;s &lt;a href="https://doi.org/10.1162/00335530151144050" target="_blank" rel="noopener">supplementary materials&lt;/a> and Steven Levitt&amp;rsquo;s &lt;a href="https://pricetheory.uchicago.edu/levitt/" target="_blank" rel="noopener">University of Chicago page&lt;/a>. Belloni, Chernozhukov and Hansen (2014) extended this dataset to the 284-control specification used here.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;hr>
</description></item></channel></rss>