Double LASSO for Causal Inference

Does abortion reduce crime? A disciplined replication

−0.096 · rigorous Double LASSO
+0.019 · cross-validated · sign flip
284 candidate controls

Carlos Mendez

Nagoya University (GSID)

June 11, 2026

The Tension

Act I

With 284 candidate controls, the answer depends on which ones you keep

Donohue & Levitt found more abortion access tracked less crime. Belloni–Chernozhukov–Hansen then expanded 8 controls into 284.

Keep too few and you risk confounding. Keep too many and the signal drowns. Which subset?

Five estimators, three crimes — wildly different answers from one dataset

\(\hat\alpha \pm 95\%\) CI · First-diff, kitchen-sink OLS, PSL, Double LASSO (rigorous), Double LASSO (CV). Dashed line = zero.

Where we’re going

  • The data: a 48-state, 12-year panel with 284 candidate controls
  • Five estimators, escalating discipline
  • Double LASSO — selecting on the outcome and the treatment
  • The lesson: theory-tuned vs prediction-tuned penalties

The Investigation

Act II

The lab: 48 states × 12 years, 576 rows, 284 candidate controls

  • Outcome — one of three crime rates (violent, property, murder)
  • Treatment — the “effective abortion rate”
  • Controls — 8 original covariates expanded to 284 (lags, interactions, trends)

State fixed effects absorbed by first-differencing; year effects partialled out (Frisch–Waugh–Lovell). \(p/n \approx 0.49\) — the high-dimensional regime where Double LASSO is meant to help.
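A minimal base-R sketch of the two preprocessing steps, run on a simulated toy panel rather than the abortion-crime data. The names `panel`, `first_diff`, `y_tilde`, and `d_tilde` are illustrative assumptions. It first-differences within state, partials year effects out of both sides, and checks that the Frisch–Waugh–Lovell route gives the same slope as a regression with year dummies.

```r
# Toy panel: 4 states x 5 years (simulated, not the abortion-crime data)
set.seed(1)
panel <- expand.grid(year = 1:5, state = 1:4)   # rows ordered by state, then year
panel$d <- rnorm(nrow(panel))
panel$y <- -0.1 * panel$d + panel$state + 0.3 * panel$year + rnorm(nrow(panel))

# First-difference within state: removes anything constant within a state
first_diff <- function(v, g) ave(v, g, FUN = function(z) c(NA, diff(z)))
panel$dy <- first_diff(panel$y, panel$state)
panel$dd <- first_diff(panel$d, panel$state)
diffed <- panel[!is.na(panel$dy), ]             # first year of each state drops out

# Partial year effects out of both sides (Frisch-Waugh-Lovell)
year_f  <- factor(diffed$year)
y_tilde <- resid(lm(diffed$dy ~ year_f))
d_tilde <- resid(lm(diffed$dd ~ year_f))

b_fwl  <- unname(coef(lm(y_tilde ~ d_tilde))["d_tilde"])
b_full <- unname(coef(lm(dy ~ dd + factor(year), data = diffed))["dd"])
all.equal(b_fwl, b_full)   # TRUE: both routes give the same slope

284 / (48 * 12)            # p/n for the real panel, about 0.493
```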

Five estimators ask the same question with escalating discipline

  • First-difference OLS — no controls (the Donohue–Levitt baseline)
  • Kitchen-sink OLS — all 284 controls at once
  • PSL — one LASSO, treatment forced in
  • Double LASSO (rigorous) — two LASSOs, theory-chosen penalty
  • Double LASSO (CV) — two LASSOs, cross-validated penalty

With zero controls, more abortion tracks less crime: −0.152

Outcome     \(\hat\alpha\)    SE       Sig. at 5%?
Violent     −0.152            0.034    yes
Property    −0.108            0.022    yes
Murder      −0.204            0.067    yes
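The significance column can be checked from the reported estimates and standard errors alone. The sketch below assumes a two-sided normal critical value (about 1.96); the original analysis may use a different reference distribution.

```r
# Estimates and standard errors copied from the table above
est <- c(violent = -0.152, property = -0.108, murder = -0.204)
se  <- c(violent =  0.034, property =  0.022, murder =  0.067)

t_ratio <- est / se
crit    <- qnorm(0.975)   # assumed normal critical value, about 1.96

round(t_ratio, 2)         # violent -4.47, property -4.91, murder -3.04
abs(t_ratio) > crit       # TRUE for all three outcomes
```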

This is the result the other four estimators stress-test, not one they generate.

Throw in all 284 controls and OLS claims abortion raises murder by 234%

+2.34

Kitchen-sink OLS, murder (\(\hat\alpha\)); violent crime flips sign to +0.014

Double LASSO selects on the outcome and the treatment, then runs OLS

\[\hat\beta(\lambda)=\arg\min_\beta\ \frac{1}{2n}\sum_{i=1}^{n}\big(y_i-x_i^\top\beta\big)^2+\lambda\sum_{j=1}^{p}|\beta_j|\]

Run it twice — once for \(y\) on \(X\) (set \(I_y\)), once for \(d\) on \(X\) (set \(I_d\)) — then OLS of \(y\) on \(d\) and the union \(I_y\cup I_d\).

The L1 penalty \(\lambda\sum_j|\beta_j|\) zeroes weak controls; the union keeps anything that predicts either side.
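Why the L1 penalty produces exact zeros is easiest to see in a special case. Assuming the columns of \(X\) are orthonormal in the sense \(X^\top X/n = I\), the minimiser of the objective above is the soft-thresholded marginal covariance, \(\hat\beta_j = \mathrm{sign}(z_j)\max(|z_j|-\lambda,\,0)\) with \(z_j = x_j^\top y/n\). The values of `z` below are made up for illustration.

```r
# Soft-thresholding: the LASSO solution when X'X/n = I (an assumption of this sketch)
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Hypothetical marginal covariances x_j'y/n for four candidate controls
z <- c(strong = 0.80, moderate = 0.30, weak = 0.05, noise = -0.02)

beta_hat <- soft_threshold(z, lambda = 0.10)
beta_hat                          # strong 0.7, moderate 0.2, weak 0, noise 0
names(beta_hat)[beta_hat != 0]    # "strong" "moderate": the selected set
```

Anything with \(|z_j| \le \lambda\) is set exactly to zero, so a larger \(\lambda\) keeps fewer controls and a smaller one keeps more.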

Six lines fit the rigorous Double LASSO in R

library(hdm); library(sandwich); library(lmtest)
Iy <- which(rlasso(X, y)$index)     # controls that predict crime
Id <- which(rlasso(X, d)$index)     # controls that predict abortion
S  <- union(Iy, Id)                  # the union is the Double LASSO safeguard
fit <- lm(y ~ d + X[, S, drop = FALSE])   # post-OLS on the selected support
coeftest(fit, vcov. = vcovCL, cluster = state)["d", ]   # state-clustered SE for d

Theory keeps 8 controls; cross-validation keeps 150

\(|I_y|\), \(|I_d|\), intersection, union out of 284 candidates — rigorous (teal) vs CV (orange).

Theory-tuned \(\lambda\) protects the causal signal; prediction-tuned \(\lambda\) flips it

Rigorous (theory)

  • \(\lambda\) from Belloni et al. theory
  • 8–12 controls kept
  • violent \(\hat\alpha = -0.096\)
  • selection matches the paper exactly

CV (prediction)

  • \(\lambda\) minimises prediction MSE
  • 109–161 controls kept
  • violent \(\hat\alpha = +0.019\) (sign flip)
  • murder \(\hat\alpha = -1.11\) (explodes)

Cross-validation’s \(\lambda\) is so small that 143 of 284 controls survive

Coefficient paths, \(d\)-equation (violent panel). Dashed line = \(\log(\lambda_{\min})\); 143 paths nonzero there.
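For comparison, a sketch of how the cross-validated variant could be fitted. It assumes the `glmnet` package and its `lambda.min` rule, which the slides do not confirm as the exact implementation. The data are simulated stand-ins and `cv_support` is a hypothetical helper.

```r
library(glmnet)

# Simulated stand-in data (hypothetical; not the abortion-crime panel)
set.seed(42)
n <- 200; p <- 50
X <- matrix(rnorm(n * p), n, p)
d <- X[, 1] + 0.5 * X[, 2] + rnorm(n)         # treatment depends on two controls
y <- -0.1 * d + X[, 2] + X[, 3] + rnorm(n)    # outcome depends on d and two controls

# Indices of controls with nonzero coefficients at the CV-chosen penalty
cv_support <- function(X, target) {
  cv_fit <- cv.glmnet(X, target)                            # 10-fold CV by default
  b <- as.matrix(coef(cv_fit, s = "lambda.min"))[-1, 1]     # drop the intercept
  which(b != 0)
}

Iy <- unname(cv_support(X, y))   # controls that predict the outcome
Id <- unname(cv_support(X, d))   # controls that predict the treatment
S  <- union(Iy, Id)

fit_cv <- lm(y ~ d + X[, S, drop = FALSE])    # post-OLS on the CV-selected union
coef(fit_cv)["d"]
length(S)                                     # how many controls survived
```

Because `lambda.min` targets out-of-sample prediction error, it typically retains many weakly predictive controls, which is the behaviour the coefficient paths illustrate.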

The Resolution

Act III

Rigorous Double LASSO restores a sensible −0.096 for violent crime

−0.096

\(\hat\alpha\), rigorous Double LASSO (SE 0.051) · close to the paper’s −0.104; selection counts match exactly
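As a worked check on the reported estimate and standard error, assuming a normal critical value of 1.96:

\[\hat\alpha \pm 1.96\,\mathrm{SE} = -0.096 \pm 1.96 \times 0.051 = -0.096 \pm 0.100 \;\Rightarrow\; [-0.196,\ 0.004]\]

Under that assumption the interval sits almost entirely below zero and only just includes it.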

Does LASSO make this causal? No — two assumptions still carry the weight

Objection. Machine-selecting controls can’t manufacture identification.

Response. Correct. \(\alpha\) is identified only under conditional independence given \(X\) and parallel trends. LASSO just chooses controls flexibly; it can’t rule out collider bias or bias amplification. The paper evaluates a method, not the abortion–crime claim.

Let the theory, not the cross-validator, choose your controls.