EconML CausalForestDML — Interactive Lab

Why Causal Machine Learning?

The post asks a sharp empirical question: can mining and mineral prices raise economic activity, and do local institutions decide which way it goes? The naive answer — comparing mining districts to non-mining districts directly — is biased downward by 56% for the basic mining effect (0.109 observed vs. 0.250 true). EconML's CausalForestDML closes that gap by residualizing both the outcome and the treatment against confounders before estimating the effect.

This app lets you turn the dials yourself. After this overview, three tabs let you: see how LASSO-style shrinkage selects controls one at a time; watch a Monte-Carlo contest pit Naive against DML over 100 simulations; and explore the post's six pairwise contrasts alongside their ground-truth values.

The selection idea behind the Causal Forest

Inside CausalForestDML, each honest tree chooses which features to split on, and which to ignore, when estimating local treatment effects. The same keep-or-drop principle is easiest to see in the LASSO's L1 penalty: as the penalty grows, coefficients snap to zero one at a time. Ridge regression's L2 penalty only shrinks them toward zero, never exactly to zero. That snapping behaviour is what makes LASSO a selector. The causal forest does not use an L1 penalty, but its splits make an analogous keep-or-drop choice at every node.
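Under the simplifying assumption of an orthonormal design (X'X = I), both penalties have closed-form solutions, which makes the contrast easy to see. The sketch below applies them to a handful of made-up OLS coefficients `z`: the L1 solution soft-thresholds each coefficient and can land exactly on zero, while the L2 solution only rescales it. The helper names are illustrative, not part of any library.

```python
import numpy as np

def lasso_orthonormal(z, lam):
    """LASSO solution under an orthonormal design: soft-threshold the OLS coefficients."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ridge_orthonormal(z, lam):
    """Ridge solution under an orthonormal design: rescale the OLS coefficients."""
    return z / (1.0 + lam)

z = np.array([2.0, -1.2, 0.6, 0.3, -0.1])  # hypothetical OLS coefficients
for lam in (0.0, 0.5, 1.0, 2.5):
    kept = np.count_nonzero(lasso_orthonormal(z, lam))  # survivors under L1
    print(lam, lasso_orthonormal(z, lam), ridge_orthonormal(z, lam), kept)
```

Raising λ from 0.5 to 1.0 pins one more coefficient to exactly zero, while every ridge coefficient stays nonzero for any finite λ.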

Tab 2

Shrinkage Lab

Slide the penalty knob λ and watch which controls survive. See how a few well-chosen covariates can recover a causal coefficient that an overfit OLS cannot.

Tab 3

Bias Contest

Naive single-LASSO estimation vs. DML residualization, head-to-head. Run 100 simulations to see the bias-variance pattern that motivates the whole tutorial.

Tab 4

Forest Plot

The post's six pairwise treatment contrasts. Compare the Naive raw-mean estimate, the DML Causal Forest estimate, and the known Ground Truth on the same x-axis. Toggle outcomes and methods.

Glossary (open a card if a term is unfamiliar)

CATE — τ(x)

Conditional Average Treatment Effect — the average effect for units with covariate profile x. The CATE is a function, not a single number.

ATE — E[τ(X)]

Average Treatment Effect, the CATE averaged over the whole sample. The headline policy number.

GATE

Group Average Treatment Effect — the CATE averaged over a pre-specified subgroup (here, executive-constraint level). How you test "do institutions moderate the effect?"

Nuisance functions g₀, m₀

Two conditional means: g₀ = E[Y|X,W] and m₀ = E[T|X,W]. You don't care about their values; you only need them to be subtracted out.

Cross-fitting

Fit the nuisance models on all folds but one, then use them to residualize the held-out fold. Rotate until every fold has been held out once. This way no observation is residualized by a model that was trained on it.
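The cross-fitting recipe can be sketched in a few lines. This is a minimal illustration under stated assumptions: plain OLS nuisance models stand in for the post's Gradient Boosting first stage, the effect θ is homogeneous (so the final stage is a single residual-on-residual slope rather than a forest), and the DGP is hypothetical. `crossfit_residuals` and `dml_theta` are illustrative helpers, not EconML functions.

```python
import numpy as np

def crossfit_residuals(v, X, n_folds=5, seed=0):
    """Out-of-fold residuals v - E_hat[v | X], where E_hat is an OLS fit on the other folds."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(v)) % n_folds   # balanced random fold labels
    Xc = np.column_stack([np.ones(len(v)), X])  # add an intercept column
    resid = np.empty(len(v))
    for k in range(n_folds):
        held_out = folds == k
        # Fit on every fold except k, then residualize fold k only.
        coef, *_ = np.linalg.lstsq(Xc[~held_out], v[~held_out], rcond=None)
        resid[held_out] = v[held_out] - Xc[held_out] @ coef
    return resid

def dml_theta(y, t, X, n_folds=5):
    """Partialling-out estimate of theta from cross-fitted residuals."""
    y_res = crossfit_residuals(y, X, n_folds)  # Y - g0_hat(X)
    t_res = crossfit_residuals(t, X, n_folds)  # T - m0_hat(X); same seed, so same folds
    return float(y_res @ t_res / (t_res @ t_res))

# Hypothetical DGP: X confounds both T and Y; the true theta is 0.5.
rng = np.random.default_rng(42)
n, p, theta = 2000, 5, 0.5
X = rng.normal(size=(n, p))
t = X @ np.full(p, 0.8) + rng.normal(size=n)
y = theta * t + X @ np.full(p, 0.6) + rng.normal(size=n)

naive = float(np.polyfit(t, y, 1)[0])  # slope of y on t, ignoring X
print(f"naive slope {naive:.3f}, cross-fitted DML {dml_theta(y, t, X):.3f}")
```

In this DGP the naive slope should land near its population value 4.5 / 4.2 ≈ 1.07, while the cross-fitted estimate should land near 0.5.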

Honest splitting

Half of each tree's subsample picks the splits; the other half estimates the leaf treatment effects. This separation is what licenses valid CIs from the forest.
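A toy illustration of honesty, assuming the outcome and treatment have already been residualized and using a single split (a "stump") instead of a full tree. The split criterion here, n_L·θ_L² + n_R·θ_R², is a simplified heterogeneity score chosen for illustration, not EconML's exact splitting rule, and `honest_stump` is a hypothetical helper.

```python
import numpy as np

def leaf_effect(y_res, t_res):
    """Residual-on-residual slope: the treatment-effect estimate within one leaf."""
    return float(y_res @ t_res / (t_res @ t_res))

def honest_stump(x, y_res, t_res, seed=0, min_leaf=200):
    """One honest split on feature x: choose the threshold on one half, estimate on the other."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    split_half, est_half = idx[: len(x) // 2], idx[len(x) // 2 :]

    # 1) Pick the threshold using ONLY the split half.
    best, best_score = None, -np.inf
    for c in np.quantile(x[split_half], np.linspace(0.1, 0.9, 17)):
        left = split_half[x[split_half] <= c]
        right = split_half[x[split_half] > c]
        if min(len(left), len(right)) < min_leaf:
            continue
        score = (len(left) * leaf_effect(y_res[left], t_res[left]) ** 2
                 + len(right) * leaf_effect(y_res[right], t_res[right]) ** 2)
        if score > best_score:
            best, best_score = c, score

    # 2) Estimate the leaf effects using ONLY the estimation half.
    left = est_half[x[est_half] <= best]
    right = est_half[x[est_half] > best]
    return float(best), leaf_effect(y_res[left], t_res[left]), leaf_effect(y_res[right], t_res[right])

# Hypothetical residualized data: the effect is 0.2 for x <= 0.5 and 0.6 above it.
rng = np.random.default_rng(1)
n = 10_000
x = rng.uniform(size=n)
t_res = rng.normal(size=n)
tau = np.where(x <= 0.5, 0.2, 0.6)
y_res = tau * t_res + rng.normal(size=n)
print(honest_stump(x, y_res, t_res))
```

Because the leaf estimates come from observations that played no part in choosing the threshold, they are not inflated by the search over 17 candidate splits.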

Neyman orthogonality

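A property of the DML estimating equation: at the true nuisance functions, its expected value has zero derivative with respect to those functions, so small first-stage errors have no first-order effect on the causal estimate. This is why first-stage learners that converge more slowly than the parametric n^(-1/2) rate (faster than n^(-1/4) is enough) still give √n-consistent causal estimates.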
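For the partially linear setting described by the nuisance functions above, the orthogonality claim can be checked directly. Write Z = (X, W), D = (Y, T, Z), g₀(Z) = E[Y | Z] and m₀(Z) = E[T | Z], use the partialling-out score ψ, and perturb both nuisances in arbitrary directions h_g and h_m. This derivation assumes a single homogeneous θ rather than a CATE function.

```latex
\begin{aligned}
\psi(D;\theta,g,m) &= \bigl(Y - g(Z) - \theta\,(T - m(Z))\bigr)\bigl(T - m(Z)\bigr),\\
\partial_r\,\mathbb{E}\bigl[\psi(D;\theta_0,\,g_0 + r h_g,\,m_0 + r h_m)\bigr]\Big|_{r=0}
&= \mathbb{E}\Bigl[\bigl(\theta_0 h_m(Z) - h_g(Z)\bigr)\,\underbrace{\mathbb{E}\bigl[T - m_0(Z)\mid Z\bigr]}_{=0}\Bigr]
 - \mathbb{E}\Bigl[h_m(Z)\,\underbrace{\mathbb{E}\bigl[Y - g_0(Z) - \theta_0\,(T - m_0(Z))\mid Z\bigr]}_{=0}\Bigr] = 0,\\
\hat\theta &= \frac{\sum_i \bigl(Y_i - \hat g(Z_i)\bigr)\bigl(T_i - \hat m(Z_i)\bigr)}{\sum_i \bigl(T_i - \hat m(Z_i)\bigr)^2}.
\end{aligned}
```

Because the first derivative vanishes, errors in ĝ and m̂ enter θ̂ only through products of those errors, which is why each first-stage learner may converge more slowly than n^(-1/2).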

BLB SEs

Bootstrap of Little Bags, a sub-bootstrap variance estimator for forests. It produces the standard errors EconML reports when you pass inference=True.

Shrinkage Lab — turn the penalty knob

The simulated data has one treatment variable and many candidate controls. The true treatment coefficient is α = 0.5 (orange curve below). LASSO chooses how many controls to keep based on a single penalty λ. Drag the λ slider and watch the coefficients shrink to exactly zero, one at a time. CausalForestDML makes an analogous keep-or-drop choice inside every honest tree when it decides which X-features to split on to capture heterogeneity.

Sample size n: 200

More data ⇒ each control's coefficient is estimated more precisely.

Number of controls p: 40

About 15% of these have a true nonzero effect; the rest are noise.

Signal strength: 0.60

Magnitude of the truly relevant coefficients relative to the noise.

Penalty λ

Slide left for less shrinkage (more controls survive); right for more.

Readouts
controls kept (|I|): out of p candidates
α̂ from raw LASSO: shrunk toward zero
α̂ from post-OLS: refit on the selected support
true α: 0.50, held fixed for comparison

What to look for

Sparsity grows with λ. Slide right: more controls are pinned to zero. Slide left: more re-enter. At λ ≈ 0 you recover OLS.
The post-OLS α̂ tracks the true α better than the raw LASSO α̂. The raw LASSO shrinks every surviving coefficient toward zero, and that shrinkage distorts α̂ as well. Refitting OLS on the selected support undoes it.
The treatment column is forced to stay in. Even at very large λ, the orange curve keeps a meaningful value, mirroring the role CausalForestDML reserves for the treatment T in its second stage.
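The lab's mechanics can be reproduced with a short coordinate-descent LASSO. This is a minimal sketch under assumptions that are ours, not the app's: a hypothetical DGP in which 6 of 40 controls (15%) have coefficient 0.6 and also drive the treatment, an unpenalized treatment column, and three fixed λ values instead of a slider. In this particular DGP the raw α̂ drifts upward as λ grows, because shrunken controls under-adjust for confounding; the direction of that drift depends on the DGP. `lasso_cd` and `post_ols_alpha` are illustrative helpers.

```python
import numpy as np

def lasso_cd(Z, y, lam, penalized, n_sweeps=300):
    """Coordinate descent for (1/2n)||y - Z b||^2 + lam * sum over penalized j of |b_j|.
    Columns with penalized[j] == False are never shrunk, so they always stay in."""
    n, k = Z.shape
    b = np.zeros(k)
    col_sq = (Z ** 2).sum(axis=0) / n
    r = y.astype(float)  # residual y - Z b (a copy; b starts at zero)
    for _ in range(n_sweeps):
        for j in range(k):
            r += Z[:, j] * b[j]  # partial residual that excludes column j
            rho = Z[:, j] @ r / n
            thr = lam if penalized[j] else 0.0
            b[j] = np.sign(rho) * max(abs(rho) - thr, 0.0) / col_sq[j]  # soft-threshold
            r -= Z[:, j] * b[j]
    return b

def post_ols_alpha(Z, y, b):
    """Refit plain OLS on the treatment (column 0) plus every control the LASSO kept."""
    keep = np.union1d([0], np.flatnonzero(b))  # sorted, so column 0 comes first
    coef, *_ = np.linalg.lstsq(Z[:, keep], y, rcond=None)
    return float(coef[0])

# Hypothetical DGP: 6 of 40 controls matter for y and also drive the treatment d.
rng = np.random.default_rng(0)
n, p, alpha = 200, 40, 0.5
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:6] = 0.6
d = 0.3 * X[:, :6].sum(axis=1) + rng.normal(size=n)
y = alpha * d + X @ beta + rng.normal(size=n)

Z = np.column_stack([d, X])  # column 0 is the treatment
penalized = np.r_[False, np.ones(p, dtype=bool)]
for lam in (0.01, 0.2, 1.0):
    b = lasso_cd(Z, y, lam, penalized)
    print(f"lam={lam}: kept {np.count_nonzero(b[1:])} controls, "
          f"raw alpha {b[0]:.3f}, post-OLS alpha {post_ols_alpha(Z, y, b):.3f}")
```

At λ = 1.0 every control is dropped and α̂ absorbs the confounding; the post-OLS refit only helps while the relevant controls are still in the selected support.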

Bias Contest — Naive vs. residualized estimation

Same simulation design as the Shrinkage Lab. The only difference is how the treatment effect is estimated. The Naive approach fits a single CV-tuned LASSO, prediction-style, and reads α̂ off that one fit. The residualized Double LASSO follows the same two-stage logic that CausalForestDML applies internally: subtract off the confounding signal first, then estimate α from what is left. The contest measures how close each method gets to the known true α, over a single run and over 100 Monte-Carlo replications.

Sample size n: 200

Capped at 300 so the 100-sim run finishes quickly.

Number of controls p: 40

Capped at 50 for the 100-sim run.

Signal strength: 0.50

Common scale of the control coefficients in both the outcome equation and the treatment equation.

Confound asymmetry: 0.80

0 = controls predict y and d equally · 1 = controls predict d well and y barely. Values near 1 are the harder case for naive methods.

Residualized DML

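Two LASSOs (one for y, one for d) pick the control sets I_y and I_d; y and d are then both residualized on the union I_y ∪ I_d, and α̂ is the slope of the y-residual on the d-residual. This is the residualization step from Tutorial Eq. (3).

Readouts: α̂ · SE(α̂) · |I_y| · |I_d| · union |I_y ∪ I_d|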

Naive (CV-tuned)

Single CV-tuned LASSO on (y, X) — the same model fit prediction-style, not causality-style.

Readouts: α̂ · SE(α̂) · |I_y| · |I_d| · union |I_y ∪ I_d|

Why does residualization win?

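Naive prediction-style estimation optimises out-of-sample MSE on y alone. That objective is not the same as estimating the causal α correctly: it can drop controls that barely predict y but strongly predict d (the high-asymmetry case), leaving their confounding in α̂, and it can keep controls that soak up treatment variation.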
Residualization is Neyman-orthogonal. Errors in the first-stage nuisance estimates enter the causal estimate only at second order. This is exactly the property CausalForestDML exploits with its Gradient Boosting first stage.
Run 100 simulations to see the average bias and spread. The residualized method centers on α̂ ≈ 0.5 — the true value the DGP used. The naive method centers below it.
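The high-asymmetry failure can be reproduced in a single draw. This sketch assumes a hypothetical DGP in which 10 controls drive the treatment strongly (coefficient 1.0) but affect y only weakly (0.15), and it uses a fixed λ = 0.3 in place of the app's CV tuning. The single-selection baseline keeps only the controls that one LASSO of y on (d, X) selects, with d unpenalized; double selection also keeps every control that predicts d. Both finish with an OLS refit. In this DGP the naive bias is upward; its sign depends on how the confounders enter. `lasso_cd` and `ols_alpha` are illustrative helpers.

```python
import numpy as np

def lasso_cd(Z, y, lam, penalized, n_sweeps=200):
    """Coordinate-descent LASSO; columns with penalized[j] == False are never shrunk."""
    n, k = Z.shape
    b, r = np.zeros(k), y.astype(float)  # r is the residual y - Z b
    col_sq = (Z ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(k):
            r += Z[:, j] * b[j]  # partial residual without column j
            rho = Z[:, j] @ r / n
            thr = lam if penalized[j] else 0.0
            b[j] = np.sign(rho) * max(abs(rho) - thr, 0.0) / col_sq[j]
            r -= Z[:, j] * b[j]
    return b

def ols_alpha(y, d, X, controls):
    """OLS of y on an intercept, d, and the chosen control columns; returns d's coefficient."""
    M = np.column_stack([np.ones(len(y)), d, X[:, controls]])
    return float(np.linalg.lstsq(M, y, rcond=None)[0][1])

# Hypothetical high-asymmetry DGP: controls 0..9 predict d well and y barely.
rng = np.random.default_rng(7)
n, p, alpha, lam = 500, 40, 0.5, 0.3
X = rng.normal(size=(n, p))
d = X[:, :10].sum(axis=1) + rng.normal(size=n)
y = alpha * d + 0.15 * X[:, :10].sum(axis=1) + rng.normal(size=n)

# Single selection: one LASSO of y on (d, X), with d left unpenalized.
Z = np.column_stack([d, X])
b_y = lasso_cd(Z, y, lam, np.r_[False, np.ones(p, dtype=bool)])
I_single = np.flatnonzero(b_y[1:])

# Double selection: LASSO of y on X and LASSO of d on X, then take the union.
all_pen = np.ones(p, dtype=bool)
I_y = np.flatnonzero(lasso_cd(X, y, lam, all_pen))
I_d = np.flatnonzero(lasso_cd(X, d, lam, all_pen))
I_union = np.union1d(I_y, I_d)

print("single:", len(I_single), round(ols_alpha(y, d, X, I_single), 3))
print("double:", len(I_union), round(ols_alpha(y, d, X, I_union), 3))
```

Here the single-selection estimate should sit near its population value 7.0 / 11 ≈ 0.64, because the confounders it drops are exactly the ones that matter for d; double selection recovers them through the d-equation LASSO.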

Bias vs. variance over many simulations

Single runs are noisy. Run the whole pipeline 100 times with fresh draws (same parameters, different ε and v) to see whether the bias is systematic — the same diagnostic the post uses to defend the DML methodology.
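The replication diagnostic itself is a short loop. This minimal sketch assumes a hypothetical low-dimensional DGP and uses plain OLS partialling-out (Frisch-Waugh-Lovell) in place of the Double LASSO, so 100 replications run instantly; the naive estimator is the slope of y on d alone. `one_replication` is an illustrative helper.

```python
import numpy as np

def one_replication(rng, n=200, p=5, alpha=0.5):
    """Draw fresh (X, epsilon, v) and return (naive, residualized) estimates of alpha."""
    X = rng.normal(size=(n, p))
    d = X @ np.full(p, 0.5) + rng.normal(size=n)              # v: treatment noise
    y = alpha * d + X @ np.full(p, 0.4) + rng.normal(size=n)  # epsilon: outcome noise
    naive = np.polyfit(d, y, 1)[0]                            # ignores X entirely
    Xc = np.column_stack([np.ones(n), X])

    def fitted(v):
        # OLS fitted values of v on an intercept and X
        return Xc @ np.linalg.lstsq(Xc, v, rcond=None)[0]

    # Frisch-Waugh-Lovell: residualize y and d on X, then regress residual on residual.
    y_res, d_res = y - fitted(y), d - fitted(d)
    return naive, float(y_res @ d_res / (d_res @ d_res))

rng = np.random.default_rng(123)
draws = np.array([one_replication(rng) for _ in range(100)])  # shape (100, 2)
bias = draws.mean(axis=0) - 0.5
spread = draws.std(axis=0, ddof=1)
print(f"naive:        bias {bias[0]:+.3f}, sd {spread[0]:.3f}")
print(f"residualized: bias {bias[1]:+.3f}, sd {spread[1]:.3f}")
```

Because every replication shares one DGP, a naive mean that stays far from 0.5 while the residualized mean hugs it points to systematic bias rather than bad luck in a single draw. In this DGP the naive bias is upward, with population value 1 / 2.25 ≈ 0.44.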

The post's six pairwise contrasts — interactively

These numbers come straight from the post's tutorial_results/ate-table.csv and execution log. Each row is one of the six pairwise treatment contrasts: 1-0 (mining vs. none), 2-0, 3-0, and the within-mining comparisons 2-1, 3-1, 3-2. First diff is the raw difference of means (the naive estimator). DL (rigorous) is the EconML CausalForestDML point estimate with its 90% BLB confidence interval. DL (CV) here is repurposed to show the known Ground Truth from the DGP — the bullseye each estimator is trying to hit.

What to look for

Compare First diff and DL (rigorous) on 1-0. The naive estimator gives 0.109; the DML forest gives 0.240, within 0.010 of the true 0.250. The 0.13 gap between the two estimates is the confounding bias the residualization argument removes.
Notice the price-effect contrasts (2-1, 3-1, 3-2). Hover the DML rows: 2-1 = 0.029 with a 90% CI that spans zero, while 3-1 = 0.220 is significant. The forest has recovered the non-linear price gradient (flat at low-to-medium, jumping at high) without being told to look for it.
Where Naive over-estimates. For 3-1, the raw means give 0.413, about 38% above the true 0.300. Mining-at-high-prices districts are positively selected: better institutions, better geography. The DML estimate of 0.220 pulls that upward bias out, though it undershoots the truth.
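The percentages in these bullets follow from simple arithmetic on the quoted point estimates. A quick check, restricted to the 1-0 and 3-1 contrasts because those are the two with all three numbers given above:

```python
# Point estimates quoted above for the two contrasts with all three numbers available.
ESTIMATES = {
    "1-0": {"naive": 0.109, "dml": 0.240, "truth": 0.250},
    "3-1": {"naive": 0.413, "dml": 0.220, "truth": 0.300},
}

def relative_bias(estimate, truth):
    """Signed error as a fraction of the true effect."""
    return (estimate - truth) / truth

for contrast, e in ESTIMATES.items():
    for method in ("naive", "dml"):
        rb = relative_bias(e[method], e["truth"])
        print(f"{contrast} {method:>5}: {e[method] - e['truth']:+.3f} ({rb:+.0%})")
```

The same check puts the DML estimate for 3-1 about 27% below the truth, so on that contrast the forest removes the naive upward bias without fully closing the gap.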

Outcomes (treatment contrasts)

1-0 (Mining vs none) · 2-0 (Med price vs none) · 3-0 (High price vs none) · 2-1 (Med vs Low price) · 3-1 (High vs Low price) · 3-2 (High vs Med price)

Methods

First diff (Naive) · DL (rigorous): DML Causal Forest · DL (CV): Ground Truth

GATE by Executive Constraints (Finding 3)

The headline heterogeneity result: institutions moderate the mining margin (1-0 climbs from 0.175 at exec_constraints = 1 to 0.264 at exec_constraints = 6, a 0.089 span) but not the price margin (3-1 stays roughly flat across the same axis, a 0.045 span with no monotone pattern). The asymmetry is the structural prediction of Mehlum-Moene-Torvik (2006) and the empirical finding of Hodler-Lechner-Raschky (2023).
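A GATE is a group-wise average of per-unit CATE estimates. A minimal sketch with made-up CATE values and a hypothetical exec_constraints column; in EconML the per-unit estimates would come from the fitted model's `effect` method, but here they are simply hard-coded.

```python
import numpy as np

# Made-up per-district CATE estimates tau_hat(x_i) and each district's executive-constraints score.
tau_hat = np.array([0.10, 0.14, 0.18, 0.22, 0.26, 0.30])
exec_constraints = np.array([1, 1, 3, 3, 6, 6])

ate = tau_hat.mean()  # ATE: the CATE averaged over the whole sample
gate = {int(g): float(tau_hat[exec_constraints == g].mean())  # GATE: average within one group
        for g in np.unique(exec_constraints)}
span = max(gate.values()) - min(gate.values())  # spread of the GATE profile across groups
print(round(float(ate), 3), gate, round(span, 3))
```

The ATE equals the share-weighted average of the GATEs, so a flat GATE profile and a steep one can share the same headline ATE; the span is the quantity the finding above compares between the 1-0 and 3-1 margins.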

Connecting back to Tabs 2 and 3

The Naive-vs-DML gap you just saw in the post's actual EconML run is the same kind of gap the simulation in Tab 3 documents. Both confirm the post's central methodological claim:

Naive (1-0): 0.109 — 56% below the true 0.250.
DML Causal Forest (1-0): 0.240 — within 4% of the true 0.250.
Selection bias does not disappear with more data alone. The 0.13 gap is structural; it requires residualization to remove. That's why CausalForestDML's two-stage architecture matters.