<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>double machine learning | Carlos Mendez</title><link>https://carlos-mendez.org/tag/double-machine-learning/</link><atom:link href="https://carlos-mendez.org/tag/double-machine-learning/index.xml" rel="self" type="application/rss+xml"/><description>double machine learning</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>Carlos Mendez</copyright><lastBuildDate>Thu, 07 May 2026 00:00:00 +0000</lastBuildDate><image><url>https://carlos-mendez.org/media/icon_huedfae549300b4ca5d201a9bd09a3ecd5_79625_512x512_fill_lanczos_center_3.png</url><title>double machine learning</title><link>https://carlos-mendez.org/tag/double-machine-learning/</link></image><item><title>Causal Machine Learning and the Resource Curse with Python EconML</title><link>https://carlos-mendez.org/post/python_econml/</link><pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/python_econml/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>Can natural resource wealth be both a blessing and a curse? And can local institutions determine which way it goes? In this tutorial, we use &lt;strong>EconML&amp;rsquo;s &lt;code>CausalForestDML&lt;/code>&lt;/strong> to estimate &lt;strong>heterogeneous causal effects&lt;/strong> of mining and mineral prices on economic development &amp;mdash; and test whether institutional quality moderates those effects differently for mining versus price shocks.&lt;/p>
&lt;p>We use &lt;strong>simulated data with known ground-truth parameters&lt;/strong> so we can verify that the method recovers the correct answers. The simulated dataset mirrors the structure of Hodler, Lechner &amp;amp; Raschky (2023), who studied 3,800 Sub-Saharan African districts using a Modified Causal Forest. This tutorial focuses on the &lt;strong>DML methodology&lt;/strong>: how the Double Machine Learning framework separates nuisance estimation from causal effect estimation to produce valid, efficient heterogeneous treatment effect estimates.&lt;/p>
&lt;p>For the &lt;strong>economic narrative&lt;/strong> and a companion implementation in Stata 19, see &lt;a href="https://carlos-mendez.org/post/stata_cate2/">Causal Machine Learning and the Resource Curse with Stata 19&lt;/a>.&lt;/p>
&lt;h3 id="learning-objectives">Learning objectives&lt;/h3>
&lt;p>By the end of this tutorial, you will be able to:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Understand&lt;/strong> the Double Machine Learning (DML) framework and why residualization enables valid causal inference&lt;/li>
&lt;li>&lt;strong>Distinguish&lt;/strong> heterogeneity features (X) from nuisance controls (W) in &lt;code>CausalForestDML&lt;/code>&lt;/li>
&lt;li>&lt;strong>Configure&lt;/strong> &lt;code>CausalForestDML&lt;/code> for discrete multi-valued treatments with panel data&lt;/li>
&lt;li>&lt;strong>Estimate&lt;/strong> Average Treatment Effects (ATEs) and Group Average Treatment Effects (GATEs) with proper BLB inference&lt;/li>
&lt;li>&lt;strong>Interpret&lt;/strong> GATE patterns to identify which variables moderate treatment effects&lt;/li>
&lt;li>&lt;strong>Use&lt;/strong> EconML-specific tools like &lt;code>SingleTreeCateInterpreter&lt;/code> for data-driven subgroup discovery&lt;/li>
&lt;li>&lt;strong>Evaluate&lt;/strong> results against known ground-truth parameters&lt;/li>
&lt;/ol>
&lt;h2 id="the-dml-causal-forest">The DML Causal Forest&lt;/h2>
&lt;h3 id="what-is-a-conditional-average-treatment-effect">What is a Conditional Average Treatment Effect?&lt;/h3>
&lt;p>The &lt;strong>Conditional Average Treatment Effect&lt;/strong> (CATE) measures how a treatment effect varies across individuals with different characteristics:&lt;/p>
&lt;p>$$\tau(\mathbf{x}) = E\{Y_i(1) - Y_i(0) \mid \mathbf{X}_i = \mathbf{x}\}$$&lt;/p>
&lt;p>In words: $\tau(\mathbf{x})$ is the expected difference in potential outcomes for a unit with covariates $\mathbf{x}$. When $\tau(\mathbf{x})$ varies across $\mathbf{x}$, we have &lt;strong>treatment effect heterogeneity&lt;/strong> &amp;mdash; the treatment helps some units more than others.&lt;/p>
&lt;h3 id="the-partial-linear-model">The Partial Linear Model&lt;/h3>
&lt;p>EconML&amp;rsquo;s &lt;code>CausalForestDML&lt;/code> estimates CATEs within the &lt;strong>partially linear model&lt;/strong>:&lt;/p>
&lt;p>$$Y = T \cdot \tau(\mathbf{x}) + g_0(\mathbf{x}, \mathbf{w}) + \varepsilon, \qquad T = m_0(\mathbf{x}, \mathbf{w}) + v$$&lt;/p>
&lt;p>where $g_0(\cdot)$ and $m_0(\cdot)$ are flexible &lt;strong>nuisance functions&lt;/strong> estimated by machine learning (Gradient Boosting in our case), $\tau(\mathbf{x})$ is the heterogeneous treatment effect function we want to learn, $\mathbf{x}$ are the covariates that may moderate the treatment effect, and $\mathbf{w}$ are additional controls used only in the first-stage nuisance models.&lt;/p>
&lt;p>The key insight of &lt;strong>Double Machine Learning&lt;/strong> (Chernozhukov et al., 2018) is to &lt;strong>residualize&lt;/strong> both the outcome and the treatment before fitting the causal forest. Think of residualization like noise-canceling headphones: the first stage removes the &amp;ldquo;background noise&amp;rdquo; of confounders from both the outcome and treatment, so the causal forest only hears the &amp;ldquo;signal&amp;rdquo; of the treatment effect. This two-step approach has a property called &lt;strong>Neyman orthogonality&lt;/strong>: first-stage estimation errors have only a second-order impact on the causal estimates. This means the causal forest remains valid even when the nuisance models converge at slower-than-parametric rates.&lt;/p>
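&lt;p>To make the two-stage logic concrete, here is a minimal partialling-out sketch on toy data. This is illustrative only &amp;mdash; the variables and numbers are made up, and &lt;code>CausalForestDML&lt;/code> automates and generalizes these steps &amp;mdash; but it shows residualization removing confounding:&lt;/p>

```python
# Minimal DML sketch on toy data: cross-fit nuisance predictions,
# then regress outcome residuals on treatment residuals.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 3))                        # observed confounders
t = x[:, 0] + rng.normal(size=n)                   # treatment depends on x
y = 0.5 * t + 2.0 * x[:, 0] + rng.normal(size=n)   # true effect is 0.5

# First stage: cross-fitted predictions of Y and T from the confounders
y_hat = cross_val_predict(GradientBoostingRegressor(random_state=0), x, y, cv=5)
t_hat = cross_val_predict(GradientBoostingRegressor(random_state=0), x, t, cv=5)

# Second stage: residual-on-residual regression recovers the causal effect
y_res, t_res = y - y_hat, t - t_hat
tau_hat = (t_res @ y_res) / (t_res @ t_res)
print(round(tau_hat, 2))   # close to the true 0.5 despite confounding
```

&lt;p>A naive regression of &lt;code>y&lt;/code> on &lt;code>t&lt;/code> alone would be badly biased here (around 1.5 instead of 0.5), because treatment and outcome share the confounder that residualization strips out.&lt;/p>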
&lt;h3 id="three-levels-of-effects">Three levels of effects&lt;/h3>
&lt;p>The causal forest produces per-observation CATE estimates, which aggregate to three levels:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Level&lt;/th>
&lt;th>Name&lt;/th>
&lt;th>What it measures&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>CATE&lt;/strong>&lt;/td>
&lt;td>Conditional ATE&lt;/td>
&lt;td>Effect for an individual with covariates $\mathbf{x}$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>GATE&lt;/strong>&lt;/td>
&lt;td>Group ATE&lt;/td>
&lt;td>Average effect for a subgroup defined by a variable $Z$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>ATE&lt;/strong>&lt;/td>
&lt;td>Average TE&lt;/td>
&lt;td>Overall average across all units&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
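&lt;p>The aggregation is simple averaging. A toy illustration with hypothetical numbers: averaging per-observation CATEs within a subgroup gives a GATE, and averaging across all observations gives the ATE.&lt;/p>

```python
# Hypothetical per-observation CATEs aggregating to GATEs and the ATE
import pandas as pd

toy = pd.DataFrame({
    'cate':  [0.10, 0.20, 0.30, 0.40],          # per-observation effects
    'group': ['weak', 'weak', 'strong', 'strong'],
})
gates = toy.groupby('group')['cate'].mean()     # GATE per subgroup
ate = toy['cate'].mean()                        # overall ATE
print({g: round(v, 2) for g, v in gates.items()}, round(ate, 2))
# {'strong': 0.35, 'weak': 0.15} 0.25
```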
&lt;h3 id="dml-pipeline">DML pipeline&lt;/h3>
&lt;pre>&lt;code class="language-mermaid">flowchart LR
A[&amp;quot;&amp;lt;b&amp;gt;Panel Data&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;3,000 obs&amp;quot;]:::data
B[&amp;quot;&amp;lt;b&amp;gt;First Stage&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;GBM nuisance&amp;lt;br/&amp;gt;models&amp;quot;]:::first
C[&amp;quot;&amp;lt;b&amp;gt;Residualize&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Y&amp;amp;#771; = Y - g&amp;amp;#770;(X,W)&amp;lt;br/&amp;gt;T&amp;amp;#771; = T - m&amp;amp;#770;(X,W)&amp;quot;]:::resid
D[&amp;quot;&amp;lt;b&amp;gt;Causal Forest&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;500 honest trees&amp;quot;]:::forest
E[&amp;quot;&amp;lt;b&amp;gt;CATEs&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Per-observation&amp;lt;br/&amp;gt;effects&amp;quot;]:::cate
A --&amp;gt; B --&amp;gt; C --&amp;gt; D --&amp;gt; E
classDef data fill:#6a9bcc,stroke:#141413,color:#fff
classDef first fill:#d97757,stroke:#141413,color:#fff
classDef resid fill:#00d4c8,stroke:#141413,color:#141413
classDef forest fill:#141413,stroke:#d97757,color:#fff
classDef cate fill:#6a9bcc,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;h2 id="setup-and-configuration">Setup and configuration&lt;/h2>
&lt;p>We use &lt;code>CausalForestDML&lt;/code> from EconML with Gradient Boosting nuisance models. The ground-truth parameters are defined inline so the tutorial is fully self-contained.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from econml.dml import CausalForestDML
from sklearn.ensemble import (GradientBoostingRegressor,
GradientBoostingClassifier)
# Ground-truth ATEs from the data-generating process
TRUE_ATES = {
'1-0': 0.250, # Mining effect
'2-0': 0.300, # Mining + medium price
'3-0': 0.550, # Mining + high price
'2-1': 0.050, # Medium price premium (small)
'3-1': 0.300, # High price premium (large)
'3-2': 0.250, # High vs medium step
}
&lt;/code>&lt;/pre>
&lt;h2 id="load-the-simulated-data">Load the simulated data&lt;/h2>
&lt;p>The dataset simulates 300 districts across 8 countries observed over 10 years (2003&amp;ndash;2012), following the structure of Hodler, Lechner &amp;amp; Raschky (2023). Treatment has four levels: no mining (0), mining at low prices (1), medium prices (2), and high prices (3).&lt;/p>
&lt;pre>&lt;code class="language-python">DATA_URL = (&amp;quot;https://github.com/cmg777/starter-academic-v501&amp;quot;
&amp;quot;/raw/master/content/post/python_EconML/sim_resource_curse.csv&amp;quot;)
df = pd.read_csv(DATA_URL)
print(f&amp;quot;Dataset: {len(df):,} observations&amp;quot;)
print(f&amp;quot;Districts: {df['district_id'].nunique()}, &amp;quot;
f&amp;quot;Countries: {df['country_id'].nunique()}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Dataset: 3,000 observations
Districts: 300, Countries: 8
&lt;/code>&lt;/pre>
&lt;p>The dataset contains 3,000 district-year observations with a &lt;strong>heavily imbalanced&lt;/strong> treatment: 85% of observations are untreated (no mining), while each of the three mining groups comprises only 5% of the data. This imbalance makes causal inference challenging &amp;mdash; the causal forest must learn from relatively few treated observations.&lt;/p>
&lt;h2 id="descriptive-statistics">Descriptive statistics&lt;/h2>
&lt;h3 id="treatment-distribution">Treatment distribution&lt;/h3>
&lt;pre>&lt;code class="language-python">labels = {0: 'No mining', 1: 'Low prices',
2: 'Med prices', 3: 'High prices'}
for t, n in df['treatment'].value_counts().sort_index().items():
print(f&amp;quot; {t} ({labels[t]}): {n:,} ({n/len(df):.1%})&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> 0 (No mining): 2,550 (85.0%)
1 (Low prices): 150 (5.0%)
2 (Med prices): 150 (5.0%)
3 (High prices): 150 (5.0%)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_econml_treatment_dist.png" alt="Treatment distribution across the four groups">
&lt;em>Treatment distribution across the four groups. The 85/5/5/5 imbalance makes causal inference challenging.&lt;/em>&lt;/p>
&lt;p>The 85/5/5/5 split means the causal forest has 2,550 control observations but only 150 per treatment level. For within-mining comparisons (e.g., 3-1), only 300 observations contribute, making standard errors larger for price-effect estimates.&lt;/p>
&lt;h3 id="outcomes-by-treatment-group">Outcomes by treatment group&lt;/h3>
&lt;pre>&lt;code class="language-python">for t in sorted(df['treatment'].unique()):
mask = df['treatment'] == t
m_ntl = df.loc[mask, 'ntl_log'].mean()
m_conf = df.loc[mask, 'conflict'].mean()
print(f&amp;quot; {t} ({labels[t]}): NTL={m_ntl:.3f} Conflict={m_conf:.1%}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> 0 (No mining): NTL=-1.137 Conflict=10.7%
1 (Low prices): NTL=-1.028 Conflict=18.0%
2 (Med prices): NTL=-0.930 Conflict=18.0%
3 (High prices): NTL=-0.615 Conflict=28.0%
&lt;/code>&lt;/pre>
&lt;p>The raw means show a clear gradient: higher treatment levels are associated with higher NTL and higher conflict rates. But these raw comparisons are &lt;strong>biased&lt;/strong> because mining districts differ systematically from non-mining districts in geography, institutions, and economic development.&lt;/p>
&lt;h2 id="naive-comparison-why-we-need-causal-ml">Naive comparison: why we need causal ML&lt;/h2>
&lt;pre>&lt;code class="language-python">for comp in ['1-0', '2-1', '3-1']:
a, b = int(comp[0]), int(comp[2])
naive = df.loc[df['treatment']==a, 'ntl_log'].mean() - \
df.loc[df['treatment']==b, 'ntl_log'].mean()
truth = TRUE_ATES[comp]
print(f&amp;quot; {comp}: Naive={naive:.3f} Truth={truth:.3f} Bias={naive-truth:+.3f}&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> 1-0: Naive=0.109 Truth=0.250 Bias=-0.141
2-1: Naive=0.098 Truth=0.050 Bias=+0.048
3-1: Naive=0.413 Truth=0.300 Bias=+0.113
&lt;/code>&lt;/pre>
&lt;p>The naive 1-0 estimate of &lt;strong>0.109&lt;/strong> is severely biased downward from the true effect of &lt;strong>0.250&lt;/strong> &amp;mdash; a 56% underestimate. This happens because mining districts tend to have worse geographic and institutional characteristics that independently reduce development. The DML Causal Forest removes this &lt;strong>selection bias&lt;/strong> by residualizing both the outcome and the treatment against observed confounders before estimating the causal effect.&lt;/p>
&lt;h2 id="econml-estimation">EconML estimation&lt;/h2>
&lt;h3 id="configuration">Configuration&lt;/h3>
&lt;p>We separate covariates into two groups with distinct roles in the DML framework:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>X features&lt;/strong> (10 variables): Enter the causal forest and can drive treatment effect heterogeneity. These include &lt;code>exec_constraints&lt;/code>, &lt;code>quality_of_govt&lt;/code>, &lt;code>gdp_pc&lt;/code>, &lt;code>elevation&lt;/code>, &lt;code>temperature&lt;/code>, &lt;code>ruggedness&lt;/code>, &lt;code>distance_capital&lt;/code>, &lt;code>agri_suitability&lt;/code>, &lt;code>population&lt;/code>, and &lt;code>ethnic_frac&lt;/code>.&lt;/li>
&lt;li>&lt;strong>W controls&lt;/strong> (2 variables): Used only in the first-stage nuisance models (&lt;code>country_id&lt;/code>, &lt;code>year&lt;/code>). These absorb country and time fixed effects but do not enter the causal forest.&lt;/li>
&lt;/ul>
&lt;pre>&lt;code class="language-python">X_COLS = ['exec_constraints', 'quality_of_govt', 'gdp_pc',
'elevation', 'temperature', 'ruggedness',
'distance_capital', 'agri_suitability', 'population',
'ethnic_frac']
W_COLS = ['country_id', 'year']
&lt;/code>&lt;/pre>
&lt;h3 id="fitting-the-model">Fitting the model&lt;/h3>
&lt;pre>&lt;code class="language-python">Y = df['ntl_log'].values
T = df['treatment'].values
X = df[X_COLS].values
W = df[W_COLS].values
est_ntl = CausalForestDML(
model_y=GradientBoostingRegressor(n_estimators=200, max_depth=4,
random_state=42),
model_t=GradientBoostingClassifier(n_estimators=200, max_depth=4,
random_state=42),
discrete_treatment=True,
categories=[0, 1, 2, 3],
n_estimators=500,
min_samples_leaf=10,
honest=True, # Separate split/estimation samples
inference=True, # BLB confidence intervals
cv=5, # 5-fold cross-fitting
n_jobs=1,
random_state=42,
)
est_ntl.fit(Y, T, X=X, W=W, groups=df['district_id'].values)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> NTL: fitted in 25s
&lt;/code>&lt;/pre>
&lt;p>Several configuration choices deserve explanation. &lt;strong>Honest trees&lt;/strong> use separate subsamples for choosing splits versus estimating leaf values &amp;mdash; like having one team write the exam questions and a different team take the exam, this prevents the tree from &amp;ldquo;memorizing&amp;rdquo; the training data and enables valid confidence intervals. &lt;strong>GroupKFold&lt;/strong> via &lt;code>groups=district_id&lt;/code> ensures that cross-fitting (splitting data into K folds, training nuisance models on K-1 folds, and predicting on the held-out fold) does not split observations from the same district across folds, preventing data leakage in panel data. Note that this does &lt;strong>not&lt;/strong> provide clustered standard errors &amp;mdash; it only prevents within-district information from leaking across folds.&lt;/p>
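&lt;p>The grouping behaviour is easy to verify on toy data (the district IDs below are illustrative): with &lt;code>GroupKFold&lt;/code>, no district ever appears on both sides of a fold.&lt;/p>

```python
# GroupKFold keeps each district entirely inside one fold
import numpy as np
from sklearn.model_selection import GroupKFold

districts = np.repeat(np.arange(6), 10)   # 6 toy districts, 10 obs each
gkf = GroupKFold(n_splits=3)
for fold, (train_idx, test_idx) in enumerate(gkf.split(districts, groups=districts)):
    shared = set(districts[train_idx]).intersection(districts[test_idx])
    print(f'fold {fold}: test districts {sorted(set(districts[test_idx]))}, '
          f'overlap with train = {len(shared)}')   # overlap is always 0
```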
&lt;h3 id="causal-identification">Causal identification&lt;/h3>
&lt;p>The Causal Forest requires the &lt;strong>Conditional Independence Assumption&lt;/strong> (CIA): treatment assignment is independent of potential outcomes conditional on observed covariates $(X, W)$. In our simulated data, the CIA holds by construction because all confounders are observed. In real data, unobserved confounders (geological surveys, political connections) could bias the estimates.&lt;/p>
&lt;h2 id="average-treatment-effects">Average Treatment Effects&lt;/h2>
&lt;p>EconML&amp;rsquo;s &lt;code>ate_inference()&lt;/code> provides ATEs with proper confidence intervals via the &lt;strong>Bootstrap of Little Bags&lt;/strong> (BLB) method &amp;mdash; a computationally efficient bootstrap that resamples within subsets (&amp;ldquo;bags&amp;rdquo;) of the data to estimate uncertainty without refitting the entire forest. We compute all six pairwise treatment contrasts:&lt;/p>
&lt;pre>&lt;code class="language-python">comparisons = [
('1-0', 0, 1), ('2-0', 0, 2), ('3-0', 0, 3),
('2-1', 1, 2), ('3-1', 1, 3), ('3-2', 2, 3),
]
for comp_label, t0, t1 in comparisons:
res = est_ntl.ate_inference(X, T0=t0, T1=t1)
lo, hi = res.conf_int_mean(alpha=0.1)
print(f&amp;quot; {comp_label}: ATE={res.mean_point:.4f} &amp;quot;
f&amp;quot;SE={res.stderr_mean:.4f} 90%CI=[{lo:.3f}, {hi:.3f}]&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> 1-0: ATE=0.2398 SE=0.0702 90%CI=[0.124, 0.355]
2-0: ATE=0.2684 SE=0.0791 90%CI=[0.138, 0.399]
3-0: ATE=0.4598 SE=0.0811 90%CI=[0.327, 0.593]
2-1: ATE=0.0286 SE=0.1008 90%CI=[-0.137, 0.194]
3-1: ATE=0.2200 SE=0.1014 90%CI=[0.053, 0.387]
3-2: ATE=0.1914 SE=0.1092 90%CI=[0.012, 0.371]
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Finding 1: Mining increases economic development.&lt;/strong> All three mining-vs-no-mining contrasts (1-0, 2-0, 3-0) are positive and highly significant. The ATE for the basic mining effect (1-0) is &lt;strong>0.240&lt;/strong>, close to the ground truth of 0.250. Because the outcome is in logs, this corresponds to an increase of about 24 log points (roughly 27%) in nighttime lights from mining activity, after controlling for geographic and institutional confounders.&lt;/p>
&lt;p>&lt;strong>Finding 2: Price effects are non-linear.&lt;/strong> The contrast 2-1 (medium vs low prices) is &lt;strong>0.029&lt;/strong> and statistically insignificant ($p &amp;gt; 0.10$) &amp;mdash; medium prices add essentially nothing beyond the basic mining effect. But the contrast 3-1 (high vs low prices) is &lt;strong>0.220&lt;/strong> and significant at the 5% level. This asymmetry confirms that price effects &amp;ldquo;jump&amp;rdquo; only at high commodity prices. The DML Causal Forest correctly recovers this non-linearity from the data.&lt;/p>
&lt;h2 id="treatment-effect-heterogeneity-gates">Treatment effect heterogeneity (GATEs)&lt;/h2>
&lt;h3 id="computing-gates-manually">Computing GATEs manually&lt;/h3>
&lt;p>Unlike Stata 19&amp;rsquo;s &lt;code>cate&lt;/code> command, which computes GATEs automatically, EconML requires manual computation. We estimate individual-level CATEs via &lt;code>effect_inference()&lt;/code>, group observations by an institutional variable, and compute the mean CATE within each group:&lt;/p>
&lt;pre>&lt;code class="language-python">def compute_gate(est, df, z_var, t0, t1):
inf = est.effect_inference(X, T0=t0, T1=t1)
ite, ite_se = inf.point_estimate, inf.stderr
for z in sorted(df[z_var].unique()):
mask = df[z_var].values == z
gate = ite[mask].mean()
# Propagate BLB standard errors
gate_se = np.sqrt(np.mean(ite_se[mask]**2) / mask.sum())
&lt;/code>&lt;/pre>
&lt;p>The standard error formula &lt;code>sqrt(mean(se_i^2) / n)&lt;/code> propagates the per-observation BLB standard errors to the group level, capturing estimation uncertainty rather than just within-group heterogeneity.&lt;/p>
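&lt;p>The formula can be checked numerically. Treating the per-observation errors as independent (a simplification worth keeping in mind, since the same forest generates all the CATEs), it reduces to the usual standard error of a mean of independent estimates:&lt;/p>

```python
# Check: sqrt(mean(se_i^2) / n) equals sqrt(sum(se_i^2)) / n
import numpy as np

ite_se = np.array([0.10, 0.12, 0.08, 0.11])          # per-observation SEs
gate_se = np.sqrt(np.mean(ite_se**2) / ite_se.size)  # formula used above
alt = np.sqrt(np.sum(ite_se**2)) / ite_se.size       # SE-of-a-mean view
print(round(gate_se, 4), round(alt, 4))              # 0.0518 0.0518
```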
&lt;h3 id="gates-by-executive-constraints">GATEs by Executive Constraints&lt;/h3>
&lt;p>The mining effect (1-0) should vary with institutional quality, while the price effect (3-1) should be flat:&lt;/p>
&lt;p>&lt;img src="python_econml_gate_ntl_1v0_exec.png" alt="GATEs for NTL mining effect (1-0) by Executive Constraints">
&lt;em>GATEs for the mining effect (1-0) by executive constraints. The upward slope shows that stronger institutions amplify the economic benefits of mining.&lt;/em>&lt;/p>
&lt;p>&lt;img src="python_econml_gate_ntl_3v1_exec.png" alt="GATEs for NTL price effect (3-1) by Executive Constraints">
&lt;em>GATEs for the price effect (3-1) by executive constraints. The flat pattern confirms that institutions do not moderate price effects.&lt;/em>&lt;/p>
&lt;pre>&lt;code class="language-text"> 1-0 (Mining vs No Mining):
Exec. Constr. GATE 90% CI N
----------------------------------------------------
1 0.175 [0.168, 0.182] 300
2 0.255 [0.249, 0.262] 330
3 0.240 [0.236, 0.244] 720
4 0.242 [0.238, 0.246] 780
5 0.243 [0.237, 0.250] 420
6 0.264 [0.259, 0.269] 450
Range: 0.089
3-1 (High vs Low Prices):
Exec. Constr. GATE 90% CI N
----------------------------------------------------
1 0.242 [0.232, 0.252] 300
2 0.197 [0.187, 0.206] 330
3 0.217 [0.211, 0.224] 720
4 0.227 [0.221, 0.233] 780
5 0.224 [0.216, 0.231] 420
6 0.211 [0.204, 0.219] 450
Range: 0.045
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>Finding 3: Institutions moderate mining effects but NOT price effects.&lt;/strong> The mining effect GATEs (1-0) show a range of &lt;strong>0.089&lt;/strong> across executive constraint levels, with the lowest effect (0.175) at the weakest institutions (exec_constraints=1) rising to &lt;strong>0.264&lt;/strong> at the strongest (exec_constraints=6). The price effect GATEs (3-1) show a much narrower range of only &lt;strong>0.045&lt;/strong>, with no clear monotone pattern. This asymmetric pattern &amp;mdash; institutions shaping the mining-vs-no-mining margin but not the price margin &amp;mdash; is exactly the structural finding embedded in the DGP and reported in Hodler, Lechner &amp;amp; Raschky (2023).&lt;/p>
&lt;h3 id="gates-by-quality-of-government">GATEs by Quality of Government&lt;/h3>
&lt;p>The same pattern appears when we use a continuous institutional measure:&lt;/p>
&lt;p>&lt;img src="python_econml_gate_ntl_1v0_qog.png" alt="GATEs for NTL mining effect (1-0) by Quality of Government">
&lt;em>GATEs for the mining effect (1-0) by quality of government. The positive relationship cross-validates the executive constraints finding.&lt;/em>&lt;/p>
&lt;p>&lt;img src="python_econml_gate_ntl_3v1_qog.png" alt="GATEs for NTL price effect (3-1) by Quality of Government">
&lt;em>GATEs for the price effect (3-1) by quality of government. The flat pattern is consistent across institutional measures.&lt;/em>&lt;/p>
&lt;p>The mining effect (1-0) shows a positive relationship with quality of government, while the price effect (3-1) remains approximately flat across the institutional quality distribution. This cross-validates Finding 3 using a different institutional measure.&lt;/p>
&lt;h2 id="variable-importance">Variable importance&lt;/h2>
&lt;p>EconML computes feature importances as the normalized contribution of each variable to treatment effect heterogeneity across all forest splits:&lt;/p>
&lt;pre>&lt;code class="language-python">importances = est_ntl.feature_importances_
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> distance_capital 0.172
ethnic_frac 0.141
ruggedness 0.135
population 0.126
agri_suitability 0.120
temperature 0.120
elevation 0.120
gdp_pc 0.036
quality_of_govt 0.019
exec_constraints 0.010
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_econml_var_importance.png" alt="Feature importance for treatment effect heterogeneity">
&lt;em>Feature importance for treatment effect heterogeneity. Geographic variables dominate splitting frequency, but institutional variables are the true moderators.&lt;/em>&lt;/p>
&lt;p>Geographic variables dominate the importances because they have &lt;strong>continuous variation&lt;/strong> that the forest can split on finely. Institutional variables (&lt;code>exec_constraints&lt;/code>, &lt;code>quality_of_govt&lt;/code>) rank lower despite being the true moderators in the DGP &amp;mdash; they have limited discrete values (6 levels for executive constraints), so the forest cannot split on them as frequently. This illustrates an important caveat: feature importance measures &lt;strong>splitting frequency&lt;/strong>, not causal importance for moderation. The GATE analysis (which directly tests moderation) is more informative than feature importance for answering the question &amp;ldquo;which variables moderate treatment effects?&amp;rdquo;&lt;/p>
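&lt;p>The GATE-range diagnostic used above can be illustrated on synthetic CATEs (the numbers and the injected moderation below are made up): the spread of group means directly measures how strongly a candidate variable moderates the effect, regardless of how often the forest happened to split on it.&lt;/p>

```python
# Synthetic CATEs with injected moderation by an institutional variable
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
toy = pd.DataFrame({
    'cate': rng.normal(0.25, 0.05, 600),
    'exec_constraints': rng.integers(1, 7, 600),   # 6 discrete levels
})
toy['cate'] += 0.02 * toy['exec_constraints']      # stronger institutions, larger effect
gates = toy.groupby('exec_constraints')['cate'].mean()
print('GATE range:', round(gates.max() - gates.min(), 3))  # near the injected 0.02 * 5 = 0.10
```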
&lt;h2 id="cate-interpreter">CATE Interpreter&lt;/h2>
&lt;p>EconML provides a &lt;code>SingleTreeCateInterpreter&lt;/code> that fits a &lt;strong>shallow decision tree&lt;/strong> to the estimated CATEs, creating interpretable subgroups. This is an EconML-specific feature not available in Stata&amp;rsquo;s &lt;code>cate&lt;/code> command.&lt;/p>
&lt;pre>&lt;code class="language-python">from econml.cate_interpreter import SingleTreeCateInterpreter
intrp = SingleTreeCateInterpreter(max_depth=2, min_samples_leaf=100)
intrp.interpret(est_ntl, X)
intrp.plot(feature_names=X_COLS)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="python_econml_cate_tree.png" alt="Decision tree summarizing CATE heterogeneity for the mining effect">
&lt;em>Decision tree summarizing CATE heterogeneity for the mining effect (1-0). The shallow tree identifies data-driven subgroups with different treatment effects.&lt;/em>&lt;/p>
&lt;p>The interpreter tree identifies data-driven subgroups with different treatment effects using a depth-2 tree. This provides a complementary view to the GATE analysis: while GATEs test &lt;strong>pre-specified hypotheses&lt;/strong> about institutional moderation, the CATE interpreter discovers subgroups that the analyst may not have considered.&lt;/p>
&lt;h2 id="discussion">Discussion&lt;/h2>
&lt;h3 id="limitations">Limitations&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>No clustered standard errors&lt;/strong>: EconML does not support clustered SEs natively. We use &lt;code>GroupKFold&lt;/code> by district to prevent data leakage, but this does not account for within-district correlation in the standard errors. The &lt;a href="https://carlos-mendez.org/post/stata_cate2/">companion Stata tutorial&lt;/a> uses Stata 19&amp;rsquo;s &lt;code>cate&lt;/code> command which handles clustering directly.&lt;/li>
&lt;li>&lt;strong>Contemporaneous outcomes&lt;/strong>: The full paper uses treatment at time $t$ and outcome at $t+1$, strengthening causal identification. Our simulated data uses contemporaneous treatment and outcomes.&lt;/li>
&lt;li>&lt;strong>Simplified covariates&lt;/strong>: The real analysis uses 60+ covariates; we use 12. The simulated DGP guarantees that the CIA holds &amp;mdash; all confounders are observed by construction.&lt;/li>
&lt;/ul>
&lt;h3 id="assumptions">Assumptions&lt;/h3>
&lt;p>The CATE estimates rely on the &lt;strong>Conditional Independence Assumption&lt;/strong>: treatment is independent of potential outcomes given $(X, W)$. In observational settings, this assumption is untestable and may be violated by unobserved confounders. With simulated data, we know the assumption holds.&lt;/p>
&lt;h2 id="summary-and-next-steps">Summary and next steps&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>EconML&amp;rsquo;s CausalForestDML recovered all three ground-truth findings.&lt;/strong> The ATE for the basic mining effect (1-0 = 0.240) closely matches the true value of 0.250. Price effects are correctly identified as non-linear (2-1 = 0.029 n.s., 3-1 = 0.220 significant). GATE patterns confirm that institutions moderate mining effects (range = 0.089) but not price effects (range = 0.045).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The DML framework is the key methodological contribution.&lt;/strong> By residualizing both the outcome and treatment in a first stage, DML achieves Neyman orthogonality &amp;mdash; making the causal forest robust to errors in the nuisance models. This is particularly valuable when the outcome and treatment processes are complex.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Feature importance can be misleading for moderation analysis.&lt;/strong> Geographic variables dominate the forest&amp;rsquo;s splitting importances, but institutional variables are the true moderators. GATE analysis is more appropriate for testing specific moderation hypotheses.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The CATE interpreter provides data-driven subgroup discovery.&lt;/strong> Unlike pre-specified GATEs, the shallow decision tree finds the variables and thresholds that best separate high-effect from low-effect subgroups.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>For the economic story behind these findings and a parallel implementation using Stata 19&amp;rsquo;s built-in &lt;code>cate&lt;/code> command, see the companion tutorial: &lt;a href="https://carlos-mendez.org/post/stata_cate2/">Causal Machine Learning and the Resource Curse with Stata 19&lt;/a>.&lt;/p>
&lt;h2 id="exercises">Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Replace the nuisance models.&lt;/strong> Swap &lt;code>GradientBoostingRegressor&lt;/code> with &lt;code>RandomForestRegressor(n_estimators=200)&lt;/code>. Do the ATE and GATE estimates change? Why or why not (think about Neyman orthogonality)?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Vary the number of trees.&lt;/strong> Try &lt;code>n_estimators=100&lt;/code> vs &lt;code>n_estimators=1000&lt;/code>. How do the standard errors and GATE patterns change?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Test the GroupKFold assumption.&lt;/strong> Remove &lt;code>groups=df['district_id'].values&lt;/code> from the &lt;code>fit()&lt;/code> call. What happens to the confidence intervals?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Discretize quality of government.&lt;/strong> Create quartiles of &lt;code>quality_of_govt&lt;/code> and compute GATEs on the quartiles instead of raw values. Do the patterns become clearer?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Explore the CATE interpreter depth.&lt;/strong> Increase &lt;code>max_depth&lt;/code> from 2 to 4 in &lt;code>SingleTreeCateInterpreter&lt;/code>. Do the additional splits reveal meaningful subgroups or just noise?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://doi.org/10.1371/journal.pone.0284968" target="_blank" rel="noopener">Hodler, R., Lechner, M., &amp;amp; Raschky, P.A. (2023). Institutions and the resource curse: New insights from causal machine learning. &lt;em>PLoS ONE&lt;/em>, 18(6), e0284968.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1111/ectj.12097" target="_blank" rel="noopener">Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., &amp;amp; Robins, J. (2018). Double/Debiased Machine Learning for Treatment and Structural Parameters. &lt;em>The Econometrics Journal&lt;/em>, 21(1), C1&amp;ndash;C68.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1214/18-AOS1709" target="_blank" rel="noopener">Athey, S., Tibshirani, J., &amp;amp; Wager, S. (2019). Generalized Random Forests. &lt;em>The Annals of Statistics&lt;/em>, 47(2), 1148&amp;ndash;1178.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.nber.org/papers/w5398" target="_blank" rel="noopener">Sachs, J.D. &amp;amp; Warner, A.M. (1995). Natural Resource Abundance and Economic Growth. &lt;em>NBER Working Paper&lt;/em> No. 5398.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1111/j.1468-0297.2006.01045.x" target="_blank" rel="noopener">Mehlum, H., Moene, K., &amp;amp; Torvik, R. (2006). Institutions and the Resource Curse. &lt;em>The Economic Journal&lt;/em>, 116(508), 1&amp;ndash;20.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.pywhy.org/EconML/" target="_blank" rel="noopener">EconML Documentation &amp;mdash; PyWhy&lt;/a>&lt;/li>
&lt;/ol></description></item></channel></rss>