Double LASSO for Causal Inference

Does abortion reduce crime? A disciplined replication

−0.096 · rigorous Double LASSO

+0.019 · cross-validated · sign flip

284 · candidate controls

Carlos Mendez

Nagoya University (GSID)

July 8, 2026

The Tension

Act I

With 284 candidate controls, the answer depends on which ones you keep

Donohue & Levitt found that more abortion access tracked less crime. Belloni–Chernozhukov–Hansen then expanded their 8 controls into 284.

Keep too few and you risk confounding. Keep too many and the signal drowns. Which subset?

Variable selection drives the result: the same data give opposite answers depending on which controls you keep.

Five estimators, three crimes — wildly different answers from one dataset

\(\hat\alpha\) with 95% CI · first-difference OLS, kitchen-sink OLS, PSL, Double LASSO (rigorous), Double LASSO (CV). Dashed line = zero.

One dataset, five estimators — and the sign of the effect is up for grabs.

Where we’re going

The data: a 48-state, 12-year panel with 284 candidate controls
Five estimators, escalating discipline
Double LASSO — selecting on the outcome and the treatment
The lesson: theory-tuned vs prediction-tuned penalties

The Investigation

Act II

The lab: 48 states × 12 years, 576 rows, 284 candidate controls

Outcome — one of three crime rates (violent, property, murder)
Treatment — the “effective abortion rate”
Controls — 8 original covariates expanded to 284 (lags, interactions, trends)

\(p/n = 284/576 \approx 0.49\): the high-dimensional regime where Double LASSO is meant to help.
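A quick arithmetic check of the dimensions quoted on this slide. The residual-degrees-of-freedom line assumes that a kitchen-sink regression also estimates an intercept and the treatment \(d\); that detail is not stated on the slide.

```r
# Panel dimensions quoted on the slide
n_states <- 48L
n_years  <- 12L
p <- 284L                    # candidate controls after the expansion
n <- n_states * n_years      # state-year rows
ratio <- p / n
# Assumption: an intercept and the treatment d are estimated alongside all p controls
resid_df <- n - (p + 2L)
cat(sprintf("n = %d, p = %d, p/n = %.3f, residual df = %d\n", n, p, ratio, resid_df))
# prints: n = 576, p = 284, p/n = 0.493, residual df = 290
```

Using every control at once would therefore consume about half of the available degrees of freedom.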

Five estimators ask the same question with escalating discipline

First-difference OLS — no controls (the Donohue–Levitt baseline)
Kitchen-sink OLS — all 284 controls at once
PSL — one LASSO, treatment forced in
Double LASSO (rigorous) — two LASSOs, theory-chosen penalty
Double LASSO (CV) — two LASSOs, cross-validated penalty

Same question, escalating discipline — from zero controls to two theory-tuned LASSOs.

With zero controls, more abortion tracks less crime: −0.152

Outcome	\(\hat\alpha\)	SE	Sig. 5%?
Violent	−0.152	0.034	yes
Property	−0.108	0.022	yes
Murder	−0.204	0.067	yes

This is the baseline the other four estimators stress-test, not a result they generate.
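The first-difference baseline regresses the within-state change in crime on the within-state change in the treatment. The sketch below shows only that differencing step, on a hypothetical 3-state, 4-year toy panel with simulated series rather than the real data. The slide's actual specification may include further terms, such as time effects, which this sketch omits.

```r
# Hypothetical toy panel: 3 states x 4 years with simulated series (not the real data)
set.seed(1)
panel <- expand.grid(year = 1:4, state = c("A", "B", "C"))
panel <- panel[order(panel$state, panel$year), ]        # sort by year within state
state_level <- c(A = 2, B = -1, C = 0.5)                # hypothetical fixed state levels
panel$d <- rnorm(nrow(panel))                           # stand-in "effective abortion rate"
panel$y <- state_level[as.character(panel$state)] -
  0.15 * panel$d + rnorm(nrow(panel), sd = 0.1)         # hypothetical effect of -0.15

# Within-state first difference x_t - x_{t-1}; each state's first year becomes NA,
# and the fixed state levels cancel out
fd <- function(x, g) ave(x, g, FUN = function(v) c(NA, diff(v)))
panel$dy <- fd(panel$y, panel$state)
panel$dd <- fd(panel$d, panel$state)

# OLS of the differenced outcome on the differenced treatment, no controls;
# lm() drops the NA rows (default na.action = na.omit)
fit_fd <- lm(dy ~ dd, data = panel)
coef(summary(fit_fd))["dd", ]
```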

Throw in all 284 controls and OLS claims abortion raises murder by 234%

+2.34

Kitchen-sink OLS, murder (\(\hat\alpha\)); violent crime flips sign to +0.014
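One mechanism behind swings like this: when \(p/n\) is near 0.5 and the controls jointly predict \(d\), little residual variation in \(d\) is left to identify \(\hat\alpha\), so its standard error balloons. The sketch below uses a purely hypothetical simulated design in which 200 of the 284 controls drive \(d\). It illustrates the variance inflation only and does not reconstruct the +2.34 estimate.

```r
set.seed(42)
n <- 576; p <- 284                           # the slide's dimensions
X <- matrix(rnorm(n * p), n, p)              # hypothetical simulated controls
# Hypothetical design: 200 of the controls jointly explain most of d's variation
d <- drop(X[, 1:200] %*% rep(0.2, 200)) + rnorm(n, sd = 0.3)
y <- -0.1 * d + rnorm(n)                     # hypothetical true alpha = -0.1; X not in y

fit_small <- lm(y ~ d)                       # no controls
fit_sink  <- lm(y ~ d + X)                   # all 284 controls at once
se <- c(no_controls  = coef(summary(fit_small))["d", "Std. Error"],
        kitchen_sink = coef(summary(fit_sink))["d", "Std. Error"])
print(round(se, 3))
```

Both fits are unbiased here. The kitchen-sink standard error is roughly an order of magnitude larger, so a single draw can land far from the truth and even change sign.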

Double LASSO selects on the outcome and the treatment, then runs OLS

\[\hat\beta(\lambda)=\arg\min_\beta\ \frac{1}{2n}\sum_{i=1}^{n}\big(y_i-x_i^\top\beta\big)^2+\lambda\sum_{j=1}^{p}|\beta_j|\]

Run two LASSOs: \(y\) on \(X\) → \(I_y\), and \(d\) on \(X\) → \(I_d\).

Then OLS of \(y\) on \(d\) and the union \(I_y\cup I_d\).

The L1 penalty \(\lambda\sum_j|\beta_j|\) zeroes weak controls; the union keeps anything that predicts either side.
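The recipe above can be sketched in base R by solving the slide's LASSO objective with cyclic coordinate descent and soft-thresholding, then running post-OLS on the union. This is a minimal illustration under stated assumptions. It is not hdm's `rlasso`, which adds data-driven penalty loadings. The penalty here is hand-picked, and the data-generating process (X1 a confounder, X2 predicting only \(y\), X3 predicting only \(d\)) is hypothetical.

```r
# Soft-thresholding operator: S(z, t) = sign(z) * max(|z| - t, 0)
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Cyclic coordinate descent for the slide's objective
#   (1/(2n)) * sum((y - X b)^2) + lambda * sum(|b_j|)
# Assumes centred y and centred columns of X, so no intercept is penalised.
lasso_cd <- function(X, y, lambda, sweeps = 200) {
  n <- nrow(X); p <- ncol(X)
  b <- numeric(p)
  r <- y                                       # residual y - X b at b = 0
  s <- colSums(X^2) / n                        # (1/n) * sum_i x_ij^2
  for (k in seq_len(sweeps)) {
    for (j in seq_len(p)) {
      z <- sum(X[, j] * r) / n + s[j] * b[j]   # fit of x_j to the partial residual
      b_new <- soft(z, lambda) / s[j]
      r <- r - X[, j] * (b_new - b[j])         # keep the residual in sync
      b[j] <- b_new
    }
  }
  b
}

# Hypothetical DGP: X1 confounds (drives d and y), X2 only y, X3 only d
set.seed(7)
n <- 200; p <- 50
X <- scale(matrix(rnorm(n * p), n, p), scale = FALSE)
d <- X[, 1] + X[, 3] + rnorm(n)
y <- 0.5 * d + X[, 1] + X[, 2] + rnorm(n)          # true alpha = 0.5
lambda <- 0.2                                      # hand-picked, not the rigorous rule

Iy <- which(lasso_cd(X, y - mean(y), lambda) != 0) # controls that predict y
Id <- which(lasso_cd(X, d - mean(d), lambda) != 0) # controls that predict d
S  <- union(Iy, Id)                                # double-selection support
XS <- X[, S, drop = FALSE]
fit <- lm(y ~ d + XS)                              # post-OLS on the union
alpha_hat   <- coef(fit)[["d"]]
alpha_naive <- coef(lm(y ~ d))[["d"]]              # no controls: biased upward by X1
round(c(naive = alpha_naive, double_lasso = alpha_hat), 3)
```

The union matters because it keeps the confounder X1 whichever equation flags it. Omitting X1 from the post-OLS step would reproduce the upward bias visible in the naive estimate.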

Six lines fit the rigorous Double LASSO in R

library(hdm); library(sandwich); library(lmtest)   # assumes y, d, X (n x 284), state exist
Iy <- which(rlasso(X, y)$index)     # controls that predict crime
Id <- which(rlasso(X, d)$index)     # controls that predict abortion
S  <- union(Iy, Id)                  # the union is the Double LASSO safeguard
fit <- lm(y ~ d + X[, S, drop = FALSE])   # post-OLS on the selected support
coeftest(fit, vcov. = vcovCL, cluster = state)["d", ]   # state-clustered SE for d

Two rlasso fits, their union, one post-OLS — the entire estimator in six lines.

Theory keeps 8 controls; cross-validation keeps 150

\(|I_y|\), \(|I_d|\), intersection, union out of 284 candidates — rigorous (teal) vs CV (orange).

The rigorous penalty under-selects on purpose; CV’s 20× larger union chases prediction, not the causal signal.

Theory-tuned \(\lambda\) protects the causal signal; prediction-tuned \(\lambda\) flips it

Rigorous (theory)

\(\lambda\) from Belloni et al. theory
8–12 controls kept
violent \(\hat\alpha = -0.096\)
selection matches the paper exactly

CV (prediction)

\(\lambda\) minimises prediction MSE
109–161 controls kept
violent \(\hat\alpha = +0.019\) (sign flip)
murder \(\hat\alpha = -1.11\) (explodes)
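The theory-tuned penalty depends only on \(n\) and \(p\), not on out-of-sample fit. The sketch below computes it with the formula \(\lambda = 2c\sqrt{n}\,\Phi^{-1}(1-\gamma/(2p))\), using \(c = 1.1\) and \(\gamma = 0.1/\log n\). That formula and those constants are our understanding of the Belloni–Chernozhukov–Hansen rule and hdm's defaults, not something this deck states, so treat them as assumptions. The rule is written for the objective \((1/n)\,\mathrm{RSS} + (\lambda/n)\sum_j \psi_j|\beta_j|\). The helper name `rigorous_lambda` is hypothetical.

```r
# Assumed rigorous penalty rule (BCH form; c and gamma as we understand hdm's defaults)
# for the objective (1/n) * RSS + (lambda/n) * sum(psi_j * |beta_j|)
rigorous_lambda <- function(n, p, c = 1.1, gamma = 0.1 / log(n)) {
  2 * c * sqrt(n) * qnorm(1 - gamma / (2 * p))
}

n <- 576; p <- 284                      # the slide's panel
lam_bch <- rigorous_lambda(n, p)

# Halving that objective gives the slide's (1/(2n)) * RSS scaling, so the
# per-coefficient penalty there is psi_j * lam_bch / (2n). With standardised
# controls and homoscedastic noise, psi_j is roughly the noise s.d. sigma.
lam_slide_per_sigma <- lam_bch / (2 * n)
round(c(lambda_bch = lam_bch, lambda_slide_per_sigma = lam_slide_per_sigma), 3)
```

On the slide's scale this comes to roughly 0.18 noise standard deviations per coefficient. That floor is fixed in advance, whereas the cross-validated \(\lambda\) is free to fall much lower in pursuit of prediction error.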

Cross-validation’s \(\lambda\) is so small that 143 of 284 controls survive

Coefficient paths, \(d\)-equation (violent panel). Dashed line = \(\log(\lambda_{\min})\); 143 paths nonzero there.

Prediction-optimal is not selection-optimal — the causal target doesn’t want these 143 controls.

The Resolution

Act III

Rigorous Double LASSO restores a sensible −0.096 for violent crime

−0.096

\(\hat\alpha\), rigorous Double LASSO (SE 0.051) · within 0.01 of the paper’s estimate; selection counts match exactly

Does LASSO make this causal? No — two assumptions still carry the weight

Objection. Machine-selecting controls can’t manufacture identification.

Response. Correct. \(\alpha\) is identified only under conditional independence given X and parallel trends. LASSO just chooses controls flexibly; it can’t rule out collider bias or bias amplification. The paper evaluates a method, not the abortion–crime claim.

Let the theory, not the cross-validator, choose your controls.