BMA and Double-Selection LASSO

Two ways to tame model uncertainty

Suppose you have 12 candidate controls. That is $2^{12} = 4{,}096$ possible regressions. Which one should you trust? The post answers this with two complementary methods. Bayesian Model Averaging (BMA) averages across the whole model space, weighting each model by its posterior probability, which rewards fit and penalises complexity. Double-Selection LASSO (DSL) uses an L1 penalty to pick a parsimonious set of controls, twice, then runs a clean OLS on the union.

This app lets you turn the dials. In four tabs you will: see why LASSO is a variable selector (L1 vs L2); simulate a small EKC-style DGP and watch how BMA's posterior mean compares to DSL's selected-then-OLS estimate as you change sample size and noise; explore the post's forest plot comparing six estimators against the true DGP value; and inspect the Posterior Inclusion Probabilities from the post's BMA run.

L1 (LASSO) vs. L2 (Ridge) — why LASSO selects, Ridge does not

Both methods shrink coefficients toward zero. Only LASSO drives them exactly to zero. The animation below shows the same coefficient under the two penalties as λ grows: the orange L1 estimate reaches exactly zero at a finite λ and stays there, while the steel-blue L2 estimate decays but never reaches zero. This is the reason DSL uses LASSO (and not Ridge) for its two selection steps.
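The contrast can be sketched for a single coefficient. This is a minimal illustration, assuming the simplest textbook setting (one coefficient, unit-scale regressor), where the L1 solution is soft-thresholding and the L2 solution is proportional shrinkage. The function names and the value `b_ols = 1.0` are hypothetical and not taken from the app.

```python
def lasso_shrink(b: float, lam: float) -> float:
    """Soft-thresholding: minimiser of 0.5*(b - beta)**2 + lam*abs(beta)."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # exactly zero once lam >= |b|


def ridge_shrink(b: float, lam: float) -> float:
    """Proportional shrinkage: minimiser of 0.5*(b - beta)**2 + 0.5*lam*beta**2."""
    return b / (1.0 + lam)


b_ols = 1.0  # hypothetical unpenalised estimate
for lam in (0.0, 0.5, 1.0, 2.0):
    print(f"lambda={lam:.1f}  L1={lasso_shrink(b_ols, lam):.3f}  "
          f"L2={ridge_shrink(b_ols, lam):.3f}")
```

In this setting the L1 path is linear in λ until it hits zero, whereas the L2 path only approaches zero as λ grows without bound.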

Tab 2

BMA vs DSL Simulator

Generate a small EKC-style panel with a known truth. Slide sample size, noise, and signal asymmetry; compare how each method estimates the treatment α.

Tab 3

Forest Plot

The post's headline comparison. Six estimators × three GDP coefficients. Toggle outcomes and methods. Hover for SEs, CIs, and number of controls selected.

Tab 4

PIP Chart

Posterior Inclusion Probabilities for all 15 variables in the BMA run. Switch between the fixed-effects and pooled specifications to see how fixed effects cut the number of false positives from 5 to 0.

Glossary (open a card if a term is unfamiliar)

Model uncertainty

Many plausible regressions can be specified. With 12 controls there are 4,096 possible models. Standard practice picks one and ignores the rest; BMA and DSL refuse to.

BMA (Bayesian Model Averaging)

Averages coefficients across all candidate models, weighted by their posterior probabilities. Instead of conditioning on one chosen model, it reports how uncertain we are about which controls belong.

PIP (Posterior Inclusion Probability)

Total posterior weight on models that contain a given variable. The post treats PIP ≥ 0.80 as "strong evidence", citing Raftery (1995).

PMP (Posterior Model Probability)

Bayesian weight on a single candidate model. Sums to 1 across all models in the space.
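To make PMP, PIP, and the model-averaged coefficient concrete, here is a minimal sketch that enumerates every submodel of a small regression. It assumes a uniform prior over models and approximates each model's posterior weight by exp(−BIC/2); that approximation is a common shortcut chosen for illustration and is not necessarily what the post's BMA run uses. The function name `bma_bic` and the toy data are hypothetical.

```python
import itertools
import numpy as np


def bma_bic(X: np.ndarray, y: np.ndarray):
    """Enumerate all 2**p submodels of y ~ X (intercept always included).

    Returns (pmp, pip, post_mean):
      pmp       posterior model probabilities (BIC approximation, uniform prior)
      pip       posterior inclusion probability of each column of X
      post_mean model-averaged coefficient of each column (0 when excluded)
    """
    n, p = X.shape
    bics, coefs, included = [], [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            cols = list(subset)
            Z = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = float(np.sum((y - Z @ beta) ** 2))
            bics.append(n * np.log(rss / n) + Z.shape[1] * np.log(n))
            full = np.zeros(p)
            full[cols] = beta[1:]          # skip the intercept
            coefs.append(full)
            included.append([j in subset for j in range(p)])
    bics = np.array(bics)
    weights = np.exp(-0.5 * (bics - bics.min()))   # subtract min for stability
    pmp = weights / weights.sum()
    pip = pmp @ np.array(included, dtype=float)
    post_mean = pmp @ np.array(coefs)
    return pmp, pip, post_mean


# Toy data (made up): only the first of four regressors matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.0 * X[:, 0] + rng.normal(size=200)
pmp, pip, post_mean = bma_bic(X, y)
print("PIP:", np.round(pip, 3))
print("Posterior mean:", np.round(post_mean, 3))
```

Full enumeration is only feasible for small p. With the post's 12 controls it means 4,096 regressions, which is still manageable, but larger spaces need sampling over models.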

DSL (Double-Selection LASSO)

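Belloni-Chernozhukov-Hansen (2014). Run LASSO of the outcome on the controls, run LASSO of each variable of interest on the controls, take the union of the selected controls, then run OLS of the outcome on the variables of interest plus that union. Gives valid inference on the treatment α.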
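The four steps can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the post's implementation: the LASSO is a plain coordinate-descent solver (`lasso_cd`, a hypothetical helper), the penalties `lam_y` and `lam_d` are supplied by hand rather than chosen by a rigorous rule or cross-validation, and the standard error is the classical homoskedastic OLS formula.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO for (1/(2n))*||y - Xb||^2 + lam*||b||_1.

    Assumes the columns of X and the vector y are already centred.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()                 # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                # add back column j's contribution
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
            r -= X[:, j] * b[j]
    return b


def double_selection(y, d, X, lam_y, lam_d):
    """Return (alpha_hat, se, selected_controls) for a single treatment d."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    I_y = np.flatnonzero(lasso_cd(Xs, y - y.mean(), lam_y))  # step 1: y on controls
    I_d = np.flatnonzero(lasso_cd(Xs, d - d.mean(), lam_d))  # step 2: d on controls
    union = np.union1d(I_y, I_d)                             # step 3: union
    Z = np.column_stack([np.ones(len(y)), d, X[:, union]])   # step 4: OLS
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (len(y) - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)                    # homoskedastic SEs
    return float(beta[1]), float(np.sqrt(cov[1, 1])), union


# Toy data (made up): control 0 confounds the treatment; true alpha = 0.5.
rng = np.random.default_rng(42)
n, p = 300, 20
X = rng.normal(size=(n, p))
d = X[:, 0] + rng.normal(size=n)
y = 0.5 * d + 0.3 * X[:, 0] + rng.normal(size=n)
alpha_hat, se, selected = double_selection(y, d, X, lam_y=0.2, lam_d=0.2)
print(f"alpha_hat={alpha_hat:.3f}  se={se:.3f}  selected={selected}")
```

The final OLS is unpenalised on purpose: the LASSO steps are used only to choose controls, so the reported α̂ is not shrunk.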

LASSO penalty λ

Knob controlling shrinkage. Larger λ pins more coefficients to zero. At λ = 0, LASSO is just OLS.

EKC (Environmental Kuznets Curve)

Hypothesizes that pollution rises with development at low income, falls at high income, and may rise again at very high income (inverted-N).
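As a worked equation, a cubic EKC and its turning points can be written as follows. The notation is an assumption made for illustration: $P$ is the pollution outcome, $g$ is the income measure, and $\beta_1, \beta_2, \beta_3$ stand for the three GDP polynomial coefficients.

$$
P = \beta_1 g + \beta_2 g^2 + \beta_3 g^3 + \text{controls} + \varepsilon
$$

Setting the derivative with respect to $g$ to zero gives

$$
\beta_1 + 2\beta_2 g + 3\beta_3 g^2 = 0
\quad\Longrightarrow\quad
g^{*} = \frac{-\beta_2 \pm \sqrt{\beta_2^2 - 3\beta_1\beta_3}}{3\beta_3},
$$

so the curve bends twice only when $\beta_2^2 > 3\beta_1\beta_3$.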

Answer key (synthetic data)

Data generated from a known regression so the true coefficients are written down. In the post: 5 true predictors and 7 noise variables. Lets us grade each method.

BMA vs DSL — explore the bias-variance trade-off

The simulator generates a small EKC-flavoured DGP with one treatment (α = 0.5) and many candidate controls. It does not run BMA itself: the "BMA-style" estimator is a Double-Selection LASSO with a rigorous (theory-driven) penalty, used as a stand-in for BMA's conservative selection. DSL (CV) uses a cross-validated λ. Both produce an estimate of α; both are graded against the truth. The headline finding from the post applies here too: method agreement is reassuring only when the model is correctly specified.

Sample size n (default 200)

More data ⇒ each control's coefficient is estimated more precisely.

Number of candidate controls p (default 40)

Plays the role of the post's 12 candidate controls. In the simulator, about 15% of the p controls have a true nonzero effect.

Signal strength (default 0.50)

Magnitude of the truly-relevant control coefficients. Small signals are hard for any method.

Confounding asymmetry (default 0.80)

0 = controls predict outcome and treatment equally · 1 = controls predict treatment well, outcome barely (the hardest case for single-step selection).
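One way the four sliders could map onto a data-generating process is sketched below. The app's actual DGP is not shown on this page, so every functional form here is an assumption: standard-normal controls, a linear treatment and outcome equation, and an asymmetry parameter that scales down the controls' direct effect on the outcome. The names `simulate`, `share`, and `noise_sd` are hypothetical.

```python
import numpy as np


def simulate(n=200, p=40, signal=0.5, asymmetry=0.8,
             alpha=0.5, share=0.15, noise_sd=1.0, seed=0):
    """Draw (y, d, X, k) from a hypothetical linear DGP with known alpha.

    The first k = round(share * p) controls are truly relevant.
    asymmetry = 0: they enter the outcome and treatment equations equally.
    asymmetry = 1: they drive the treatment but have no direct outcome effect.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    k = max(1, round(share * p))
    beta_d = np.zeros(p)
    beta_y = np.zeros(p)
    beta_d[:k] = signal                        # controls -> treatment
    beta_y[:k] = signal * (1.0 - asymmetry)    # controls -> outcome (direct)
    d = X @ beta_d + rng.normal(scale=noise_sd, size=n)
    y = alpha * d + X @ beta_y + rng.normal(scale=noise_sd, size=n)
    return y, d, X, k


y, d, X, k = simulate()
# A naive regression of y on d alone ignores the confounders entirely.
naive_slope = np.cov(d, y)[0, 1] / np.var(d, ddof=1)
print(f"relevant controls: {k}, naive slope: {naive_slope:.3f} (true alpha = 0.5)")
```

Under this parameterisation the naive slope is biased away from 0.5 whenever asymmetry is below 1, because the relevant controls then affect both d and y.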

BMA-style (rigorous λ)

Theory-driven penalty: conservative, robust to model uncertainty — mimics BMA's "decisive evidence" threshold.

Reported outputs (filled in when the simulation runs): α̂, SE(α̂), |I_y|, |I_d|, union |I_y ∪ I_d|, and λ_y, λ_d.

DSL (CV)

Cross-validated penalty: minimises prediction MSE — often over-selects controls.

Reported outputs (filled in when the simulation runs): α̂, SE(α̂), |I_y|, |I_d|, union |I_y ∪ I_d|, and λ_y, λ_d.

What to look for

Asymmetry = 0.8 (default) is close to the hardest case. Controls predict the treatment but barely the outcome, so a single-step selection that only looks at y will miss them and bias α. Watch DSL's union grow as you push asymmetry up.
Increase n. Both estimators converge to the true α = 0.5 as n grows. The BMA-style rigorous penalty shrinks variance faster because it admits fewer noisy controls.
Decrease signal. Small true effects are hard to detect, which is exactly why urban (true coef = +0.007) and democracy (true coef = −0.005) get PIPs below 0.80 in the post.
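The first point above is the omitted-variable-bias mechanism, and a one-control worked example shows why it matters. The symbols $\beta$, $\pi$, $x$, and $v$ are introduced here for illustration only. Suppose

$$
y = \alpha d + \beta x + \varepsilon, \qquad d = \pi x + v,
$$

with $x$, $v$, and $\varepsilon$ mutually uncorrelated. If $x$ is left out of the final regression, the OLS slope on $d$ converges to

$$
\alpha + \beta\,\frac{\operatorname{Cov}(d, x)}{\operatorname{Var}(d)}
= \alpha + \frac{\beta\,\pi\,\sigma_x^2}{\pi^2\sigma_x^2 + \sigma_v^2}.
$$

A LASSO of $y$ on the controls alone can drop $x$ when $\beta$ is small. The second selection step, which regresses $d$ on the controls, picks $x$ up whenever $\pi$ is large, and that is the case in which the bias term would otherwise be non-negligible.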

Bias vs. variance over 100 simulations

A single run is noisy. Run the whole pipeline 100 times with fresh draws (same parameters, different ε) to see whether the CV-style over-selection produces systematic bias.
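Once the repeated runs are collected, bias and variance can be summarised as below. This is a small sketch with a hypothetical helper name, `bias_variance`; the four numbers in the example are made up to show the arithmetic and are not simulator output.

```python
from statistics import fmean, pvariance


def bias_variance(estimates, truth):
    """Summarise Monte Carlo draws of an estimator against a known truth.

    Uses the population variance so that mse == bias**2 + variance holds
    exactly (up to floating-point error).
    """
    estimates = list(estimates)
    bias = fmean(estimates) - truth
    variance = pvariance(estimates)
    mse = fmean([(e - truth) ** 2 for e in estimates])
    return {"bias": bias, "variance": variance, "mse": mse}


# Made-up alpha-hat draws, graded against the true alpha = 0.5.
summary = bias_variance([0.4, 0.5, 0.6, 0.7], truth=0.5)
print(summary)
```

Systematic over- or under-estimation shows up in `bias`, while run-to-run noise shows up in `variance`; the two combine into `mse`.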

BMA's signature output: Posterior Inclusion Probabilities

For each of the 15 variables in the post's BMA run, the PIP is the fraction of posterior probability mass on models that include it. The horizontal threshold (orange dashed line) is 0.80, the "strong evidence" cutoff the post attributes to Raftery (1995). Switch between FE and pooled to see how fixed effects cut the number of false positives from 5 noise variables down to zero.

Specification

Options: BMA with fixed effects · BMA pooled (no FE)

FE: 6 variables clear 0.80, 0 false positives, 2 false negatives (urban, democracy). Pooled: 12 of 15 variables clear 0.80, including 5 noise variables — fixed effects are not optional.

True positives (PIP ≥ 0.80): true predictors correctly identified

False positives: noise variables incorrectly flagged

False negatives: weak true predictors missed

True negatives: noise correctly rejected
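The four counts follow mechanically from the PIPs once the answer key is known. The sketch below shows the bookkeeping; the function name `classify_pips`, the variable names, and the PIP values are all hypothetical placeholders, not numbers from the post.

```python
def classify_pips(pips, truly_relevant, threshold=0.80):
    """Count TP / FP / FN / TN for a dict of {variable: PIP}.

    A variable is "flagged" when its PIP is at or above the threshold.
    `truly_relevant` is the set of variables with nonzero true coefficients.
    """
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for name, pip in pips.items():
        flagged = pip >= threshold
        relevant = name in truly_relevant
        if flagged and relevant:
            counts["TP"] += 1
        elif flagged and not relevant:
            counts["FP"] += 1
        elif relevant:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Made-up PIPs for four placeholder variables.
toy_pips = {"strong_true": 0.99, "weak_true": 0.12,
            "noise_a": 0.03, "noise_b": 0.91}
toy_counts = classify_pips(toy_pips, truly_relevant={"strong_true", "weak_true"})
print(toy_counts)
```

This grading is only possible with synthetic data, where the set of truly relevant variables is written down in advance.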

What to look for

With FE: fossil fuel, GDP terms, industry, and renewable energy cleanly clear the 0.80 line. All 7 noise variables sit near zero. Urban (true coef = +0.007) and democracy (true coef = −0.005) are too weak to detect.
Without FE: noise variables including services, pop_density, credit, and trade jump above 0.80 because they piggyback on omitted country heterogeneity. This is what the post calls "false precision from a misspecified model".
The signal-strength limit. Even BMA cannot find effects smaller than the sampling noise. The 2 false negatives are not a method failure — they are an honest accounting of evidence.

Connecting back to Tab 3

The four FE-based estimators in the forest plot agree on the GDP coefficients precisely because they all use the same FE structure that keeps the PIP false-positive rate at zero. The pooled estimators agree with each other on a biased answer because they share the same misspecification. PIPs and confidence intervals are only trustworthy when the model class is right.

BMA + Double-Selection LASSO — Interactive Lab


EKC coefficients — six estimators side by side

What to look for

Outcomes (GDP polynomial coefficient)

Methods

Why do pooled estimates blow up?

Turning points: where does the EKC bend?
