BMA + Double-Selection LASSO — Interactive Lab

A pedagogical companion to Taming Model Uncertainty in the EKC: BMA and Double-Selection LASSO with Panel Data

Two ways to tame model uncertainty

Suppose you have 12 candidate controls. That is $2^{12} = 4{,}096$ possible regressions. Which one should you trust? The post answers this with two complementary methods. Bayesian Model Averaging (BMA) averages across the whole model space, weighting each model by its posterior probability. Double-Selection LASSO (DSL) uses an L1 penalty to select controls twice, once for the outcome and once for each variable of interest, then runs a clean OLS on the union of the selected sets.

This app lets you turn the dials. In four tabs you will: see why LASSO is a variable selector (L1 vs L2); simulate a small EKC-style DGP and watch how BMA's posterior mean compares to DSL's selected-then-OLS estimate as you change sample size and noise; explore the post's forest plot comparing six estimators against the true DGP value; and inspect the Posterior Inclusion Probabilities from the post's BMA run.

L1 (LASSO) vs. L2 (Ridge) — why LASSO selects, Ridge does not

Both methods shrink coefficients toward zero. Only LASSO drives them exactly to zero. The animation below shows the same coefficient under the two penalties as λ grows: the orange L1 estimate hits zero abruptly, the steel-blue L2 estimate decays but never reaches zero. This is the reason DSL uses LASSO (and not Ridge) for its two selection steps.
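
The contrast can be made concrete for a single coefficient. The sketch below assumes an orthonormal design, where both penalties have closed forms; the function names are illustrative, and the exact scaling of λ depends on how the objective is normalised.

```python
def lasso_shrink(b_ols: float, lam: float) -> float:
    """Soft-thresholding: minimiser of 0.5*(b_ols - b)**2 + lam*abs(b)."""
    magnitude = max(abs(b_ols) - lam, 0.0)
    return magnitude if b_ols >= 0 else -magnitude


def ridge_shrink(b_ols: float, lam: float) -> float:
    """Proportional shrinkage: minimiser of 0.5*(b_ols - b)**2 + 0.5*lam*b**2."""
    return b_ols / (1.0 + lam)


# Trace one OLS coefficient of 1.0 as the penalty grows.
for lam in (0.0, 0.5, 1.0, 2.0):
    print(lam, lasso_shrink(1.0, lam), ridge_shrink(1.0, lam))
```

Once λ reaches the size of the OLS coefficient, the L1 estimate is exactly zero and stays there. The L2 estimate is divided by 1 + λ, so it shrinks but remains nonzero for any finite λ.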

Tab 2

BMA vs DSL Simulator

Generate a small EKC-style panel with a known truth. Slide sample size, noise, and signal asymmetry; compare how each method estimates the treatment effect α.

Tab 3

Forest Plot

The post's headline comparison. Six estimators × three GDP coefficients. Toggle outcomes and methods. Hover for SEs, CIs, and number of controls selected.

Tab 4

PIP Chart

Posterior Inclusion Probabilities for all 15 variables in the BMA run. Switch between the fixed-effects and pooled specifications to see how fixed effects cut the number of false positives from 5 to 0.

Glossary (open a card if a term is unfamiliar)

Model uncertainty
Many plausible regressions can be specified. With 12 controls there are 4,096 possible models. Standard practice picks one and ignores the rest; BMA and DSL refuse to.
BMA (Bayesian Model Averaging)
Averages coefficients across all candidate models, weighted by their posterior probabilities. Honest about which controls belong.
PIP (Posterior Inclusion Probability)
Total posterior weight on models that contain a given variable. PIP ≥ 0.80 = "strong evidence" by Raftery (1995).
PMP (Posterior Model Probability)
Bayesian weight on a single candidate model. Sums to 1 across all models in the space.
DSL (Double-Selection LASSO)
Belloni-Chernozhukov-Hansen (2014). LASSO on the outcome, LASSO on each variable of interest, take the union, then OLS. Valid inference on the treatment α.
LASSO penalty λ
Knob controlling shrinkage. Larger λ pins more coefficients to zero. At λ = 0, LASSO is just OLS.
EKC (Environmental Kuznets Curve)
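In its classic form, hypothesizes that pollution rises with development at low income and falls at high income (inverted-U). A cubic income term allows richer shapes; the post's DGP is an inverted-N, in which emissions fall at very low income, rise through middle income, and fall again at high income.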
Answer key (synthetic data)
Data generated from a known regression so the true coefficients are written down. In the post: 5 true predictors and 7 noise variables. Lets us grade each method.

BMA vs DSL — explore the bias-variance trade-off

The simulator generates a small EKC-flavoured DGP with one treatment (α = 0.5) and many candidate controls. BMA-style here is a rigorous (theory-driven) Double LASSO that mimics BMA's conservative selection. DSL (CV) uses cross-validated λ. Both produce an estimate of α; both are graded against the truth. The headline finding from the post applies here too: method agreement is reassuring only when the model is correctly specified.
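
The double-selection recipe itself is short. Below is a minimal sketch in plain Python, not the app's or the post's implementation: the DGP, the fixed penalty of 0.15 (neither the rigorous nor the cross-validated choice), and all function names are assumptions made for illustration.

```python
import random


def soft_threshold(z, lam):
    return z - lam if z > lam else z + lam if z < -lam else 0.0


def lasso(X, y, lam, sweeps=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    scale = [sum(c * c for c in col) / n for col in cols]
    b = [0.0] * p
    resid = list(y)                      # residual y - Xb, with b = 0 at the start
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            rho = sum(c * r for c, r in zip(col, resid)) / n + scale[j] * b[j]
            new = soft_threshold(rho, lam) / scale[j]
            if new != b[j]:
                resid = [r - c * (new - b[j]) for r, c in zip(resid, col)]
                b[j] = new
    return b


def ols(Z, y):
    """Solve the normal equations Z'Z b = Z'y by Gauss-Jordan elimination."""
    k = len(Z[0])
    A = [[sum(r[i] * r[j] for r in Z) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(Z, y))] for i in range(k)]
    for i in range(k):
        piv = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * c for a, c in zip(A[r], A[i])]
    return [A[i][k] / A[i][i] for i in range(k)]


def double_selection(y, d, X, lam_y, lam_d):
    I_y = {j for j, b in enumerate(lasso(X, y, lam_y)) if b != 0.0}  # step 1: outcome
    I_d = {j for j, b in enumerate(lasso(X, d, lam_d)) if b != 0.0}  # step 2: treatment
    union = sorted(I_y | I_d)
    Z = [[di] + [row[j] for j in union] for di, row in zip(d, X)]    # step 3: OLS on union
    return ols(Z, y)[0], I_y, I_d, union


# Hypothetical DGP: 8 mean-zero controls, true alpha = 0.5.
rng = random.Random(0)
n, p, alpha = 500, 8, 0.5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
d = [1.0 * r[0] + 0.8 * r[1] + rng.gauss(0, 1) for r in X]
y = [alpha * di + 0.2 * r[0] + 1.0 * r[2] + rng.gauss(0, 1) for di, r in zip(d, X)]

alpha_hat, I_y, I_d, union = double_selection(y, d, X, lam_y=0.15, lam_d=0.15)
print(alpha_hat, sorted(I_y), sorted(I_d), union)
```

The treatment-equation LASSO is what protects α̂: a control that drives d is kept in the final OLS even when its direct effect on y is small.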

  • Sample size: more data ⇒ each control's coefficient is estimated more precisely.
  • Candidate controls: like the post's 12 candidate controls; about 15% have a true nonzero effect.
  • Signal strength: magnitude of the truly relevant control coefficients. Small signals are hard for any method.
  • Asymmetry: 0 = controls predict outcome and treatment equally · 1 = controls predict treatment well, outcome barely (the hardest case for single-step selection).

BMA-style (rigorous λ)

Theory-driven penalty: conservative, robust to model uncertainty — mimics BMA's "decisive evidence" threshold.

Reported for each run: α̂, SE(α̂), |I_y|, |I_d|, union |I_y ∪ I_d|, and the penalties λ_y, λ_d.

DSL (CV)

Cross-validated penalty: minimises prediction MSE — often over-selects controls.

Reported for each run: α̂, SE(α̂), |I_y|, |I_d|, union |I_y ∪ I_d|, and the penalties λ_y, λ_d.

What to look for

  • Asymmetry = 0.8 (default) is close to the hardest case (asymmetry = 1). Controls predict the treatment well but the outcome barely, so a single-step selection that only looks at y will miss them and bias α. Watch DSL's union grow as you push asymmetry up.
  • Increase n. Both estimators converge to the true α = 0.5 as n grows. The BMA-style rigorous penalty shrinks variance faster because it admits fewer noisy controls.
  • Decrease signal. Small true effects are hard to detect, which is exactly why urban (true coef = +0.007) and democracy (true coef = −0.005) get PIPs below 0.80 in the post.

Bias vs. variance over 100 simulations

A single run is noisy. Run the whole pipeline 100 times with fresh draws (same parameters, different ε) to see whether the CV-style over-selection produces systematic bias.
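
The bias-versus-variance logic can be seen in a stripped-down stand-in for that experiment. The sketch below is not the app's pipeline: it assumes a single control x that drives the treatment strongly and the outcome weakly, and compares a regression that omits x (what a y-only selection step tends to produce) with one that keeps it. All parameter values and names are hypothetical.

```python
import random
import statistics


def slope_short(y, d):
    """OLS of y on d alone (no intercept; variables are mean-zero by construction)."""
    return sum(a * b for a, b in zip(d, y)) / sum(a * a for a in d)


def slope_long(y, d, x):
    """Coefficient on d from OLS of y on d and x, via the 2x2 normal equations."""
    sdd = sum(a * a for a in d)
    sxx = sum(a * a for a in x)
    sdx = sum(a * b for a, b in zip(d, x))
    sdy = sum(a * b for a, b in zip(d, y))
    sxy = sum(a * b for a, b in zip(x, y))
    return (sdy * sxx - sxy * sdx) / (sdd * sxx - sdx ** 2)


def one_run(rng, n=200, alpha=0.5, gamma=2.0, beta=0.3):
    x = [rng.gauss(0, 1) for _ in range(n)]
    d = [gamma * xi + rng.gauss(0, 1) for xi in x]            # x predicts d strongly
    y = [alpha * di + beta * xi + rng.gauss(0, 1)             # x predicts y weakly
         for di, xi in zip(d, x)]
    return slope_short(y, d), slope_long(y, d, x)


ALPHA = 0.5
rng = random.Random(1)
runs = [one_run(rng) for _ in range(100)]
short = [r[0] for r in runs]
long_ = [r[1] for r in runs]

bias_short = statistics.mean(short) - ALPHA
bias_long = statistics.mean(long_) - ALPHA
sd_short = statistics.stdev(short)
sd_long = statistics.stdev(long_)
print(bias_short, sd_short, bias_long, sd_long)
```

In this setup the omitted-control estimator has the smaller spread but is centred away from 0.5, while the estimator that keeps x is noisier but centred on the truth. Averaging over many draws is what separates the two effects.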

EKC coefficients — six estimators side by side

The estimates below come directly from the post's Stata output. Six estimators × three GDP coefficients (linear, squared, cubic). The True DGP row is the answer key: that is what the methods are trying to recover. Toggle methods and outcomes; hover for SE, 95% CI, and the number of controls each estimator selected.

What to look for

  • Toggle "BMA (pooled)" and "DSL (pooled)" off. The pooled estimates are 2-3× the true value because they omit country fixed effects. Once you hide them, the four FE-based estimators all cluster tightly around the true DGP.
  • BMA (FE) and Kitchen-Sink FE both land within 1% of the truth on β₁: −7.139 and −7.131, respectively, against a true value of −7.100. On these numbers Kitchen-Sink FE is marginally closer.
  • The pooled methods agree with each other on the wrong answer. BMA (pooled) gives β₁ = −21.26, DSL (pooled) gives −22.03. Method agreement is not a substitute for correct model specification.

Why do pooled estimates blow up?

Without country fixed effects, the GDP polynomial absorbs persistent cross-country differences in emissions levels that have nothing to do with income. The pooled β̂₁ estimates (−21.26 from BMA, −22.03 from DSL) are inflated by a factor of about 3 relative to the true −7.10. Both methods are also confidently wrong: their 95% intervals fail to cover the truth for any of the three GDP coefficients. The lesson: BMA and DSL are powerful, but they cannot fix a misspecified model.
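
The mechanism is easy to reproduce in miniature. The sketch below uses a hypothetical one-regressor panel, not the post's data: each country has a persistent effect that raises both the regressor and the outcome, and the within (demeaning) transformation stands in for country fixed effects. All names and parameter values are assumptions.

```python
import random


def slope(x, y):
    """OLS slope of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(values, groups):
    """Within transformation: subtract each group's mean."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


rng = random.Random(2)
TRUE_BETA, n_countries, n_years = 1.0, 50, 20
x, y, country = [], [], []
for c in range(n_countries):
    effect = rng.gauss(0, 1)               # persistent country-level heterogeneity
    for _ in range(n_years):
        xi = effect + rng.gauss(0, 1)      # regressor correlated with the country effect
        x.append(xi)
        y.append(TRUE_BETA * xi + 2.0 * effect + rng.gauss(0, 1))
        country.append(c)

beta_pooled = slope(x, y)
beta_fe = slope(demean_by_group(x, country), demean_by_group(y, country))
print(beta_pooled, beta_fe)
```

The pooled slope loads the country effect onto the regressor and overshoots the true value of 1.0, while the demeaned regression recovers it. No selection or averaging step applied afterwards can undo the pooled bias, because every candidate model shares it.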

Turning points: where does the EKC bend?

The cubic polynomial implies two turning points: a minimum, where emissions stop falling and begin to rise, and a maximum, where they start falling again. The true DGP values are $1,895 and $34,647 GDP per capita. The four FE methods cluster around $2,400 and $27,000: the right ballpark and the same sign pattern, but visibly off because small coefficient errors compound in the turning-point calculation. The pooled methods land far from the truth at both ends.

  • FE-based methods: minimum at $2,411–2,478 · maximum at $25,656–27,694
  • Pooled methods: minimum at $5,581–5,752 · maximum at $23,298–24,532
  • True DGP: minimum at $1,895 · maximum at $34,647
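
The turning points come from setting the derivative of the cubic to zero. The sketch below assumes the polynomial is in ln(GDP per capita), which is not stated here, and uses hypothetical values for β₂ and β₃: only β₁ = −7.100 is given above, and 0.81 and −0.03 are assumptions chosen because they reproduce the reported true turning points. The function name is illustrative.

```python
import math


def turning_points(b1, b2, b3):
    """Stationary points of f(x) = b1*x + b2*x**2 + b3*x**3, with x = ln(GDP per capita).

    Solves f'(x) = b1 + 2*b2*x + 3*b3*x**2 = 0 and classifies each root by
    the sign of f''(x) = 2*b2 + 6*b3*x. Returns GDP levels as (minimum, maximum).
    """
    disc = (2 * b2) ** 2 - 4 * (3 * b3) * b1
    if b3 == 0 or disc <= 0:
        raise ValueError("cubic has no pair of turning points")
    roots = [(-2 * b2 + s * math.sqrt(disc)) / (6 * b3) for s in (1, -1)]
    x_min = next(x for x in roots if 2 * b2 + 6 * b3 * x > 0)
    x_max = next(x for x in roots if 2 * b2 + 6 * b3 * x < 0)
    return math.exp(x_min), math.exp(x_max)


# b1 is the true value quoted above; b2 and b3 are assumed, not taken from the post.
gdp_min, gdp_max = turning_points(-7.1, 0.81, -0.03)
print(round(gdp_min), round(gdp_max))
```

Under those assumed coefficients the function returns roughly 1,895 and 34,647, in line with the True DGP row. Because each turning point is a nonlinear function of all three coefficients passed through an exponential, small coefficient errors translate into large dollar differences.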

BMA's signature output: Posterior Inclusion Probabilities

For each of the 15 variables in the post's BMA run, the PIP is the fraction of posterior probability mass on models that include it. The horizontal threshold (orange dashed line) is 0.80, Raftery's "strong evidence" cutoff. Switch between FE and pooled to see how fixed effects cut the false positives from 5 noise variables to zero.
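
The definition translates directly into a sum over models. The sketch below uses three hypothetical variables and made-up posterior weights purely to show the bookkeeping; in a real BMA run the weights come from each model's marginal likelihood and prior, which this example does not compute.

```python
from itertools import combinations

# Hypothetical controls, each with a made-up multiplicative weight: a model's
# unnormalised posterior weight is the product over the variables it includes.
factor = {"x1": 20.0, "x2": 1.5, "x3": 0.1}
variables = list(factor)

# Every subset of the candidates is one model: 2**3 = 8 here.
models = [frozenset(c) for k in range(len(variables) + 1)
          for c in combinations(variables, k)]


def weight(model):
    w = 1.0
    for v in model:
        w *= factor[v]
    return w


total = sum(weight(m) for m in models)
pmp = {m: weight(m) / total for m in models}       # posterior model probabilities

# PIP of a variable = total PMP of the models that contain it.
pip = {v: sum(p for m, p in pmp.items() if v in m) for v in variables}
flagged = [v for v in variables if pip[v] >= 0.80]
print(pip, flagged)
```

With these weights only x1 clears the 0.80 line. The same two lines that compute `pip` and `flagged` apply unchanged to a model space of any size, as long as the PMPs sum to 1.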

FE: 6 variables clear 0.80, 0 false positives, 2 false negatives (urban, democracy). Pooled: 12 of 15 variables clear 0.80, including 5 noise variables — fixed effects are not optional.

True positives (PIP ≥ 0.80)
true predictors correctly identified
False positives
noise variables incorrectly flagged
False negatives
weak true predictors missed
True negatives
noise correctly rejected

What to look for

  • With FE: fossil fuel, GDP terms, industry, and renewable energy cleanly clear the 0.80 line. All 7 noise variables sit near zero. Urban (true coef = +0.007) and democracy (true coef = −0.005) are too weak to detect.
  • Without FE: services, pop_density, credit, and trade all jump above 0.80 — they piggyback on omitted country heterogeneity. This is what the post calls "false precision from a misspecified model".
  • The signal-strength limit. Even BMA cannot find effects smaller than the sampling noise. The 2 false negatives are not a method failure — they are an honest accounting of evidence.

Connecting back to Tab 3

The four FE-based estimators in the forest plot agree on the GDP coefficients precisely because they all use the same FE structure that keeps the PIP false-positive rate at zero. The pooled estimators agree with each other on a biased answer because they share the same misspecification. PIPs and confidence intervals are only trustworthy when the model class is right.