Why Causal Machine Learning?
The post asks a sharp empirical question: can mining and mineral
prices raise economic activity, and do local institutions decide which way
it goes? The naive answer — comparing mining districts to
non-mining districts directly — is biased downward by 56% for the basic
mining effect (0.109 observed vs. 0.250 true). EconML's
CausalForestDML closes that gap by residualizing both the
outcome and the treatment against confounders before estimating the effect.
This app lets you turn the dials yourself. Across the three interactive tabs that follow, you will see how LASSO-style shrinkage selects controls one at a time, watch a Monte-Carlo contest pit Naive against DML over 100 simulations, and explore the post's six pairwise contrasts alongside their ground-truth values.
The selection idea behind the Causal Forest
Inside CausalForestDML, each honest tree decides which
controls to keep and which to drop
when estimating local treatment effects. The same shrink-or-keep
principle is easiest to see in the LASSO's L1 penalty: as the penalty
grows, coefficients snap to zero one at a time. Ridge
regression's L2 penalty only shrinks them toward zero — never exactly to
zero. That snapping behaviour is what makes LASSO a selector, and the
causal forest inherits the same logic at every split.
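The snap-to-zero contrast can be sketched in a few lines. This is a minimal illustration under an assumed orthonormal design (X'X = I), where both penalties have closed forms: LASSO, minimizing ½‖y − Xb‖² + λ‖b‖₁, soft-thresholds each OLS coefficient, while ridge, minimizing ½‖y − Xb‖² + (λ/2)‖b‖², rescales it by 1/(1 + λ). The coefficient values and the penalty grid are hypothetical, chosen only for illustration.

```python
# Assumed setting: orthonormal design, so each coefficient can be treated separately.

def soft_threshold(b, lam):
    """LASSO solution for one OLS coefficient b under an orthonormal design."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # inside [-lam, lam] the coefficient is exactly zero


def ridge_shrink(b, lam):
    """Ridge solution for one OLS coefficient b under an orthonormal design."""
    return b / (1.0 + lam)  # shrinks toward zero but never reaches it


# Hypothetical OLS coefficients and penalty grid (illustration only).
ols_coefs = [2.0, -1.2, 0.6, 0.15]
penalties = [0.0, 0.25, 0.75, 1.5, 2.5]

lasso_path = {lam: [soft_threshold(b, lam) for b in ols_coefs] for lam in penalties}
ridge_path = {lam: [ridge_shrink(b, lam) for b in ols_coefs] for lam in penalties}

for lam in penalties:
    n_zero = sum(1 for c in lasso_path[lam] if c == 0.0)
    print(f"lambda={lam:<4} "
          f"lasso={[round(c, 3) for c in lasso_path[lam]]} "
          f"ridge={[round(c, 3) for c in ridge_path[lam]]} "
          f"lasso_zeros={n_zero}")
```

On this grid each step up in λ removes one more LASSO coefficient, smallest first, while every ridge coefficient stays nonzero.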
Shrinkage Lab
Slide the penalty knob λ and watch which controls survive. See how a few well-chosen covariates can recover a causal coefficient that an overfit OLS cannot.
Bias Contest
Naive difference-in-means vs. DML residualization, head-to-head. Run 100 simulations to see the bias-variance pattern that motivates the whole tutorial.
Forest Plot
The post's six pairwise treatment contrasts. Compare the Naive raw-mean estimate, the DML Causal Forest estimate, and the known Ground Truth on the same x-axis. Toggle outcomes and methods.
Glossary (open a card if a term is unfamiliar)
CATE — τ(x)
ATE — E[τ(X)]
GATE
Nuisance functions g₀, m₀
Cross-fitting
Honest splitting
Neyman orthogonality
BLB SEs
Shrinkage Lab — turn the penalty knob
The simulated data has one treatment variable and many candidate controls.
The true treatment coefficient is α = 0.5 (orange curve below). LASSO
chooses how many controls to keep based on a single penalty λ.
Drag the λ slider and watch the coefficients shrink to exactly
zero, one at a time. This is the same selection logic
CausalForestDML applies inside every honest tree when it
decides which X-features drive heterogeneity.
What to look for
- Sparsity grows with λ. Slide right: more controls are pinned to zero. Slide left: more re-enter. At λ ≈ 0 you recover OLS.
- The post-OLS α̂ (an OLS refit on the controls the LASSO selected) tracks the true α better than the raw LASSO α̂. The raw LASSO shrinks everything toward zero, including the treatment; refitting on the selected support undoes that shrinkage.
- The treatment column is forced to stay in. Even at very large λ, the orange curve keeps a meaningful value, which is exactly the role CausalForestDML reserves for the treatment T in its second stage.
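The three observations above can be reproduced with a small pure-Python sketch. It is not the app's code: the data-generating process, the coordinate-descent solver, and the helper names (`lasso_cd`, `select_then_refit`) are all assumptions made for illustration. In particular, the treatment is kept in the model by leaving it unpenalized, which is one common way to force a column in; the text does not say how the app does it.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(cols, y, lam, penalized, sweeps=100):
    """Coordinate descent for (1/2n)*RSS + lam * sum(|b_j| over penalized j)."""
    n, p = len(y), len(cols)
    b = [0.0] * p
    resid = list(y)  # residuals at b = 0
    scale = [sum(v * v for v in c) / n for c in cols]
    for _ in range(sweeps):
        for j, c in enumerate(cols):
            # Covariance of column j with the residual that excludes b_j.
            rho = sum(ci * ri for ci, ri in zip(c, resid)) / n + scale[j] * b[j]
            new = soft_threshold(rho, lam if penalized[j] else 0.0) / scale[j]
            delta = new - b[j]
            if delta != 0.0:
                resid = [ri - delta * ci for ri, ci in zip(resid, c)]
                b[j] = new
    return b


# Synthetic data (hypothetical DGP): x0 confounds d and y, x1 affects y only,
# x2..x5 are pure noise. No intercept is fitted because all variables have mean zero.
rng = random.Random(0)
n, n_controls, alpha = 300, 6, 0.5
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_controls)]
d = [0.8 * X[0][i] + rng.gauss(0, 1) for i in range(n)]
y = [alpha * d[i] + X[0][i] + 0.5 * X[1][i] + rng.gauss(0, 1) for i in range(n)]
cols = [d] + X                             # column 0 is the treatment
penalized = [False] + [True] * n_controls  # the treatment is never penalized


def select_then_refit(lam):
    """LASSO selection at lam, then an unpenalized refit on the kept columns."""
    raw = lasso_cd(cols, y, lam, penalized)
    keep = [0] + [j for j in range(1, len(cols)) if raw[j] != 0.0]
    refit = lasso_cd([cols[j] for j in keep], y, 0.0, [False] * len(keep))
    return raw, keep, refit[0]


for lam in (5.0, 0.5, 0.1):
    raw, keep, alpha_post = select_then_refit(lam)
    print(f"lambda={lam}: controls kept={[j - 1 for j in keep[1:]]}, "
          f"raw alpha={raw[0]:.3f}, post-OLS alpha={alpha_post:.3f}")
```

In this setup a very large λ drops every control but leaves the treatment coefficient in place. That coefficient then absorbs part of the omitted confounder's effect, which is why the refit is only trustworthy once the penalty is small enough for the confounder to be selected.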
Bias Contest — Naive vs. residualized estimation
Same simulated data. The only difference is how the treatment effect is
estimated. The Naive approach uses a noisy single-step
LASSO (akin to running OLS on all controls). The residualized
Double LASSO follows the same two-stage logic that
CausalForestDML applies internally: subtract off the
confounding signal first, then estimate α from what is left. The contest
measures how close each method gets to the known true α — over a single
run, and over 100 Monte-Carlo replications.
Residualized DML
Two LASSOs (one for y, one for d) on the union of selected controls — the residualization step from Tutorial Eq. (3).
Naive (CV-tuned)
Single CV-tuned LASSO on (y, X) — the same model fit prediction-style, not causality-style.
Why does residualization win?
- Naive prediction-style estimation optimises out-of-sample MSE on y alone. That objective is not the same as estimating the causal α correctly — and it routinely over-selects controls that soak up treatment variation.
- Residualization is Neyman-orthogonal. Errors in the first-stage nuisance estimates enter the causal estimate only at second order. This is exactly the property CausalForestDML exploits with its Gradient Boosting first stage.
- Run 100 simulations to see the average bias and spread. The residualized method centers on α̂ ≈ 0.5, the true value the DGP used. The naive method centers below it.
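The orthogonality claim can be written out for the constant-effect case. What follows is a sketch under the standard partially linear model, which is an assumption here: the post's Eq. (3) is not reproduced in this text, and the symbol ℓ₀ for the conditional mean of y is introduced only for this illustration (g₀, m₀, ε, and v follow the glossary and simulation notation).

```latex
% Assumed model: partially linear, constant treatment effect \alpha.
\begin{align*}
y &= \alpha\, d + g_0(X) + \varepsilon, \qquad \mathbb{E}[\varepsilon \mid d, X] = 0, \\
d &= m_0(X) + v, \qquad \mathbb{E}[v \mid X] = 0, \\
\ell_0(X) &:= \mathbb{E}[y \mid X] = \alpha\, m_0(X) + g_0(X).
\end{align*}

% Residual-on-residual estimator with fitted nuisances \hat{\ell}, \hat{m}:
\[
\hat{\alpha}
  = \frac{\sum_i \bigl(d_i - \hat{m}(X_i)\bigr)\bigl(y_i - \hat{\ell}(X_i)\bigr)}
         {\sum_i \bigl(d_i - \hat{m}(X_i)\bigr)^2}.
\]

% Score behind the estimator, with nuisance \eta = (\ell, m):
\[
\psi(W; \alpha, \eta)
  = \bigl(y - \ell(X) - \alpha\,(d - m(X))\bigr)\,\bigl(d - m(X)\bigr).
\]

% Neyman orthogonality: perturbing \eta_0 in any direction \delta = (\delta_\ell, \delta_m)
\[
\left.\frac{\partial}{\partial r}\,
  \mathbb{E}\bigl[\psi(W; \alpha, \eta_0 + r\,\delta)\bigr]\right|_{r=0}
  = \mathbb{E}\bigl[-\delta_\ell(X)\, v + \alpha\,\delta_m(X)\, v - \varepsilon\,\delta_m(X)\bigr]
  = 0,
\]
% because v and \varepsilon each have conditional mean zero given X.
```

Since the first-order term vanishes, an error in the fitted ℓ or m affects α̂ only through products of nuisance errors, which is the "second order" statement in the bullet above.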
Bias vs. variance over many simulations
Single runs are noisy. Run the whole pipeline 100 times with fresh draws (same parameters, different ε and v) to see whether the bias is systematic — the same diagnostic the post uses to defend the DML methodology.
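A stripped-down version of this diagnostic can be sketched as follows. It makes several simplifying assumptions that differ from the app: a single confounder, linear first stages in place of LASSO, no cross-fitting, and a naive estimator that simply regresses y on d while ignoring the confounder. The DGP coefficients and the function names are hypothetical.

```python
import random
import statistics


def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(x, y):
    """Residuals of y after removing its linear dependence on x."""
    b = slope(x, y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def one_replication(rng, n=500, alpha=0.5):
    """Draw fresh data and return (naive, residualized) estimates of alpha."""
    x = [rng.gauss(0, 1) for _ in range(n)]            # confounder
    d = [xi + rng.gauss(0, 1) for xi in x]             # treatment: x plus noise v
    y = [alpha * di - 0.5 * xi + rng.gauss(0, 1)       # outcome: effect, confounding, noise
         for di, xi in zip(d, x)]
    naive = slope(d, y)                                # ignores x entirely
    resid = slope(residualize(x, d), residualize(x, y))  # residual on residual
    return naive, resid


def summarize(estimates, truth):
    """Average bias and spread (sample SD) across replications."""
    return statistics.fmean(estimates) - truth, statistics.stdev(estimates)


rng = random.Random(2024)
alpha = 0.5
draws = [one_replication(rng, alpha=alpha) for _ in range(100)]
naive_bias, naive_sd = summarize([a for a, _ in draws], alpha)
resid_bias, resid_sd = summarize([b for _, b in draws], alpha)

print(f"naive:        bias={naive_bias:+.3f}  sd={naive_sd:.3f}")
print(f"residualized: bias={resid_bias:+.3f}  sd={resid_sd:.3f}")
```

In this DGP the naive slope converges to 0.5 − 0.5 · Cov(x, d)/Var(d) = 0.25, so its bias stays at roughly −0.25 however many replications are run, while the residualized estimate is centered on 0.5 with only sampling noise around it.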
The post's six pairwise contrasts — interactively
These numbers come straight from the post's tutorial_results/ate-table.csv
and execution log. Each row is one of the six pairwise treatment
contrasts: 1-0 (mining vs. none), 2-0, 3-0, and the within-mining
comparisons 2-1, 3-1, 3-2. First diff is the raw
difference of means (the naive estimator). DL (rigorous)
is the EconML CausalForestDML point estimate with its 90%
BLB confidence interval. DL (CV) here is repurposed to
show the known Ground Truth from the DGP — the bullseye each
estimator is trying to hit.
What to look for
- Compare First diff and DL (rigorous) on 1-0. The naive estimator gives 0.109; the DML forest gives 0.240 — almost exactly the true 0.250. That 0.13 gap is the confounding bias the residualization argument removes.
- Notice the price-effect contrasts (2-1, 3-1, 3-2). Hover the DML rows: 2-1 = 0.029 with a 90% CI that spans zero, while 3-1 = 0.220 is significant. The forest has recovered the non-linear price gradient (flat at low-to-medium, jumping at high) without being told to look for it.
- Where Naive over-estimates. For 3-1, the raw means give 0.413, about 38% above the true 0.300. Mining-at-high-prices districts are positively selected (better institutions, better geography). The DML pulls that bias out.
GATE by Executive Constraints (Finding 3)
The headline heterogeneity result: institutions moderate the mining margin (1-0 climbs from 0.175 at exec_constraints = 1 to 0.264 at exec_constraints = 6, a 0.089 span) but not the price margin (3-1 stays roughly flat across the same axis, a 0.045 span with no monotone pattern). The asymmetry is the structural prediction of Mehlum-Moene-Torvik (2006) and the empirical finding of Hodler-Lechner-Raschky (2023).
Connecting back to Tabs 2 and 3
The Naive-vs-DML gap you just saw on the real EconML run is the same gap the simulation in Tab 3 documents. Both confirm the post's central methodological claim:
- Naive (1-0): 0.109 — 56% below the true 0.250.
- DML Causal Forest (1-0): 0.240 — within 4% of the true 0.250.
- Selection bias does not disappear with more data alone. The 0.13 gap is structural; it requires residualization to remove. That's why CausalForestDML's two-stage architecture matters.
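The percentages quoted for the 1-0 contrast follow directly from the three reported numbers. A short check, using only the values stated above (the helper name `relative_error` is introduced here for illustration):

```python
# Values for the 1-0 contrast as reported in the text.
truth = 0.250
estimates = {"Naive": 0.109, "DML Causal Forest": 0.240}


def relative_error(estimate, truth):
    """Signed error as a fraction of the true effect."""
    return (estimate - truth) / truth


for name, est in estimates.items():
    print(f"{name}: {est:.3f} is {relative_error(est, truth):+.1%} relative to {truth:.3f}")

# Gap between the two estimators, the quantity the text rounds to 0.13.
gap = estimates["DML Causal Forest"] - estimates["Naive"]
print(f"DML minus Naive: {gap:.3f}")
```

The naive estimate comes out about 56% below the truth and the DML estimate 4% below it, matching the figures in the list above.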