Does 401(k) access really cause households to save more?
A simple comparison says eligible households have \$19,559 more in net financial assets than ineligible ones. But eligible households also earn \$15,368 more in income, on average. How much of that \$19,559 gap is the 401(k) — and how much is the income gap dressed up in 401(k) clothing? Double Machine Learning (DML) answers that question by using flexible ML models to strip out confounding before estimating the causal effect.
This app lets you turn the dials yourself. In four tabs you will: watch a naive estimate decompose into causal effect + bias as confounding grows; compare PLR, IRM, and IIVM estimators side-by-side; and explore the 12 real DML estimates from the 1991 SIPP pension dataset.
The confounding picture in one animation
Two coefficients shrink as you regularise: the steel-blue path is the naive estimate that ignores income; the orange path is the DML estimate that partials it out. Both start at the same place, but only the DML path lands on the truth.
Confounding Lab
Increase income's role as confounder and watch the naive estimate inflate while DML stays anchored to the true effect.
PLR vs IRM vs IIVM
Same simulated data, three estimators. See how PLR and IRM agree on the ATE, while IIVM (an IV-style method) recovers a larger LATE for compliers.
Forest Plot
The 12 real DML estimates from the post. Toggle estimands and methods. Hover to see SE, CI, and the role of each ML learner.
Glossary (open a card if a term is unfamiliar)
Confounder
ATE — Average Treatment Effect
LATE — Local Average Treatment Effect
PLR — Partially Linear Regression
IRM — Interactive Regression Model
IIVM — Interactive IV Model
Cross-fitting (K-fold)
Orthogonal score
Propensity score
Complier
Confounding Lab — see how naive estimates inflate
Simulated data with one treatment column and many candidate controls — the way a DML practitioner would think of the 401(k) problem. The true treatment coefficient is α = 0.5. The LASSO chooses how many controls to keep based on a single penalty parameter λ. Drag λ and watch the controls drop out one at a time — this is the nuisance-function step DML performs internally.
What to look for
- Sparsity grows with λ. Slide right: more controls are pinned to zero. Slide left: more re-enter. At λ ≈ 0 you recover OLS, which has no unique solution once p ≥ n.
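- Post-OLS α̂ tracks the true α more closely. Raw LASSO penalises the treatment coefficient along with the controls, so refitting the selected variables by unpenalised OLS removes that distortion. The post's PLR estimator relies on related logic: the penalised learner only fits the nuisance functions, and the treatment coefficient comes from an unpenalised final regression.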
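- The orange treatment line stays in. Try p = 100 and a large λ: 90+ controls disappear, but the orange line keeps a meaningful value. This is what partialling out protects in DML: the treatment coefficient is estimated in a separate, unpenalised step, so the penalty never removes it. Cross-fitting addresses a different problem, namely overfitting in the nuisance fits.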
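The select-then-refit idea in the bullets above can be sketched in a few lines. This is a minimal illustration, not the app's code: the coordinate-descent solver `lasso_cd`, the sample sizes, and the data-generating coefficients are all assumptions chosen for the example. Only α = 0.5 comes from the text. The treatment is penalised like every other column in the raw fit and is always kept in the OLS refit.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO for (1/2n)*||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]             # partial residual without column j
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]
    return beta

rng = np.random.default_rng(0)
n, p, alpha_true = 500, 50, 0.5                    # hypothetical sizes; alpha as in the text
controls = rng.normal(size=(n, p))
# Assumed design: only the first two controls drive both treatment and outcome.
d = 0.8 * controls[:, 0] + 0.5 * controls[:, 1] + rng.normal(size=n)
y = alpha_true * d + 1.0 * controls[:, 0] + 0.7 * controls[:, 1] + rng.normal(size=n)

# Centre every column so no intercept is needed; column 0 is the treatment.
Z = np.column_stack([d, controls])
Z = Z - Z.mean(axis=0)
y = y - y.mean()

results = {}
for lam in (0.01, 0.1, 0.3):
    beta = lasso_cd(Z, y, lam)
    kept = np.flatnonzero(beta[1:] != 0)           # controls that survive the penalty
    # Post-LASSO: unpenalised OLS on the treatment (always kept) plus surviving controls.
    cols = np.concatenate(([0], kept + 1))
    post = np.linalg.lstsq(Z[:, cols], y, rcond=None)[0]
    results[lam] = (len(kept), beta[0], post[0])
    print(f"lambda={lam:<5} controls kept={len(kept):>2}  "
          f"raw alpha={beta[0]:.3f}  post-OLS alpha={post[0]:.3f}")
```

Raising `lam` should shrink the number of surviving controls, while the post-OLS coefficient on the treatment should stay near 0.5 as long as the two true confounders are still selected.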
PLR vs IRM vs IIVM — three estimators, one truth
Same simulated data. PLR uses partialling out, IRM uses doubly-robust AIPW with propensity scores, IIVM uses instrumental variables. The three approaches make different assumptions and target different estimands. Tweak the sliders and watch how they agree (PLR ≈ IRM) — and how IIVM systematically picks up a larger effect (the LATE).
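As a rough illustration of the partialling-out step, the sketch below cross-fits a PLR estimate on simulated data. It makes several assumptions that are not in the post: plain OLS stands in for the ML nuisance learners, the data-generating process is linear, and the names `ols_fit_predict` and `plr_cross_fit` are hypothetical.

```python
import numpy as np

def ols_fit_predict(X_train, y_train, X_test):
    """Stand-in nuisance learner: OLS with an intercept (a real run would use Lasso, RF, ...)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef = np.linalg.lstsq(A, y_train, rcond=None)[0]
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def plr_cross_fit(y, d, X, n_folds=5, seed=0):
    """Residualise y and d on X out-of-fold, then regress residual on residual."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    y_res, d_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        y_res[test] = y[test] - ols_fit_predict(X[train], y[train], X[test])
        d_res[test] = d[test] - ols_fit_predict(X[train], d[train], X[test])
    theta = (d_res @ y_res) / (d_res @ d_res)
    # Plug-in standard error from the score psi = (y_res - theta * d_res) * d_res.
    psi = (y_res - theta * d_res) * d_res
    se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(d_res ** 2)
    return theta, se

rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 5))                        # column 0 plays the role of income
d = 0.8 * X[:, 0] + rng.normal(size=n)             # treatment depends on income
y = 0.5 * d + 1.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)   # true effect 0.5

naive = np.cov(d, y)[0, 1] / np.var(d, ddof=1)     # regression of y on d, ignoring X
theta, se = plr_cross_fit(y, d, X)
print(f"naive slope = {naive:.3f}")
print(f"PLR theta   = {theta:.3f} (se {se:.3f})")
```

Each observation's residual comes from a model fitted on the other folds, which is the point of cross-fitting: the nuisance fit never sees the data it is evaluated on.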
PLR / IRM ATE estimators
Partialling-out (PLR) and doubly-robust AIPW (IRM) target the ATE.
IIVM IV estimator
CV-tuned LASSO nuisances, IV-style scoring. Targets the LATE on compliers.
Why does this happen?
- PLR ≈ IRM under constant effects. When the true treatment effect is the same for every household, partialling out and AIPW give nearly identical answers — as the post's Table-2 results show (\$8,730 vs \$8,213 ATE).
- IIVM > PLR/IRM when compliers benefit more. The LATE captures the effect on marginal participants — exactly the population that responds to a policy change. In the real 401(k) data: \$11,746 LATE vs \$8,730 ATE.
- The naive gap (steel-blue line) is much larger than any DML estimate. That gap is mostly income confounding — the bias DML strips out.
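The second bullet can be illustrated with a stripped-down Wald estimator. This is a simplification, not the IIVM estimator itself: IIVM also adjusts for covariates through cross-fitted nuisance functions, which are omitted here. The sketch assumes randomised eligibility, one-sided non-compliance, and hypothetical effect sizes of 1.5 for compliers and 0.5 for everyone else.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
z = rng.integers(0, 2, size=n)            # eligibility (the instrument), randomised in this sketch
complier = rng.random(n) < 0.6            # assumed: 60% would participate if eligible
d = z * complier                          # participation requires eligibility
effect = np.where(complier, 1.5, 0.5)     # assumed: compliers benefit more
y = 2.0 + effect * d + rng.normal(size=n)

ate_true = effect.mean()                  # population-average effect of participation
itt = y[z == 1].mean() - y[z == 0].mean()           # effect of eligibility on the outcome
first_stage = d[z == 1].mean() - d[z == 0].mean()   # effect of eligibility on participation
wald_late = itt / first_stage             # effect of participation among compliers

print(f"population ATE     = {ate_true:.3f}")
print(f"Wald LATE estimate = {wald_late:.3f}")
```

Because the ratio rescales the eligibility effect by the share of compliers, it recovers the complier effect rather than the population average, so it exceeds the ATE whenever compliers benefit more.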
Bias vs. variance over many simulations
Single runs are noisy. Run the whole pipeline 100 times with fresh draws (same parameters, different ε and v) to see whether PLR/IRM and IIVM bias is systematic.
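A repeated-simulation check of this kind might look like the sketch below. It is not the app's pipeline: it assumes a single linear confounder (income), compares only the naive slope with a partialled-out slope, and uses hypothetical parameter values and a hypothetical function name, `one_run`.

```python
import numpy as np

def one_run(rng, n=1000, alpha=0.5, gamma=0.8, beta=1.0):
    """One fresh draw: returns (naive slope, partialled-out slope)."""
    income = rng.normal(size=n)
    d = gamma * income + rng.normal(size=n)               # fresh v
    y = alpha * d + beta * income + rng.normal(size=n)    # fresh epsilon
    income, d, y = income - income.mean(), d - d.mean(), y - y.mean()
    naive = (d @ y) / (d @ d)
    # Partial income out of both treatment and outcome, then regress residuals.
    d_res = d - income * (income @ d) / (income @ income)
    y_res = y - income * (income @ y) / (income @ income)
    return naive, (d_res @ y_res) / (d_res @ d_res)

rng = np.random.default_rng(7)
alpha_true = 0.5
draws = np.array([one_run(rng, alpha=alpha_true) for _ in range(100)])

naive_bias, adj_bias = draws.mean(axis=0) - alpha_true    # systematic error
naive_sd, adj_sd = draws.std(axis=0, ddof=1)              # run-to-run noise
print(f"naive:          bias = {naive_bias:+.3f}, sd = {naive_sd:.3f}")
print(f"partialled-out: bias = {adj_bias:+.3f}, sd = {adj_sd:.3f}")
```

A bias that stays far from zero across 100 runs while the standard deviation is small points to a systematic error rather than sampling noise.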
The post's forest plot — interactively
These numbers come straight from all_results.csv and naive_estimates.csv in the post's folder, the same data used to produce the grand-comparison figure. Toggle estimands and methods to compare. Hover a point for SE, 95% CI, and the number of covariates used.
What to look for
- Toggle "Naive (mean diff)" off to zoom into the DML estimates. The naive bars (\$19,559 for eligibility and \$27,372 for participation) compress the x-axis — they are more than double the corresponding DML estimates.
- PLR and IRM cluster tightly between \$7,800 and \$9,400 across 4 ML learners. This convergence is the robustness check: two distinct DML frameworks land on the same ATE.
- IIVM sits systematically higher (\$11,200 to \$12,300). This is the LATE-vs-ATE gap, not noise: compliers respond more strongly to eligibility than the average household.
The naive estimate, decomposed
For 401(k) eligibility, the naive mean difference of \$19,559 decomposes into two components:
- Causal effect (DML ATE) ≈ \$8,730 — what the 401(k) genuinely contributes.
- Confounding bias ≈ \$10,829 — pre-existing differences (income, education, age) that have nothing to do with the plan.
In percent terms, about 55% of the naive gap (\$10,829 of \$19,559) is confounding bias. The \$15,368 income gap between eligible and ineligible households explains most of it. This is what DML's cross-fitting and orthogonal scoring are designed to strip out.
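The decomposition is plain arithmetic on the two figures quoted above, as this short check shows. The variable names are illustrative only.

```python
naive_gap = 19_559    # naive mean difference for eligibility, in dollars (from the text)
dml_ate = 8_730       # DML ATE, in dollars (from the text)

confounding_bias = naive_gap - dml_ate
bias_share = confounding_bias / naive_gap

print(confounding_bias)               # 10829
print(round(100 * bias_share, 1))     # 55.4
```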
Connecting back to Tab 3
The simulated comparisons you just explored map directly onto the real pension data:
- PLR vs IRM agreement: the simulation shows they agree under constant effects; the data shows they agree at \$8,730 vs \$8,213.
- IIVM > ATE: the simulation shows IV-style scoring picks up the complier population; the data shows IIVM = \$11,746, about 35% larger than the \$8,730 ATE.
- Learner robustness: 4 ML learners (Lasso, RF, Tree, XGB) move each estimator by less than \$1,500 — a hallmark of well-functioning DML.
The policy takeaway from the post: expanding 401(k) eligibility raises net financial assets by roughly \$8,500 per newly eligible household, and by roughly \$12,000 for marginal participants, the population most affected by an eligibility expansion.