Two ways to tame model uncertainty
Suppose you have 12 candidate controls. Each one can be included or excluded, so there are $2^{12} = 4{,}096$ possible regressions. Which one should you trust? The post answers this with two complementary methods. Bayesian Model Averaging (BMA) averages across the whole model space, weighting each model by its posterior probability, that is, by how well the data support it. Double-Selection LASSO (DSL) uses an L1 penalty to pick a parsimonious set of controls twice, once for the outcome and once for the treatment, then runs a clean OLS on the union of the two selections.
This app lets you turn the dials. In four tabs you will: see why LASSO is a variable selector (L1 vs L2); simulate a small EKC-style DGP and watch how BMA's posterior mean compares to DSL's selected-then-OLS estimate as you change sample size and noise; explore the post's forest plot comparing six estimators against the true DGP value; and inspect the Posterior Inclusion Probabilities from the post's BMA run.
L1 (LASSO) vs. L2 (Ridge) — why LASSO selects, Ridge does not
Both methods shrink coefficients toward zero. Only LASSO drives them exactly to zero. The animation below shows the same coefficient under the two penalties as λ grows: the orange L1 estimate hits zero abruptly, the steel-blue L2 estimate decays but never reaches zero. This is the reason DSL uses LASSO (and not Ridge) for its two selection steps.
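The contrast between the two penalties can be sketched for a single coefficient. The sketch assumes the textbook orthonormal-design case, where the L1 problem $\tfrac12(b-\hat b)^2+\lambda|b|$ has the soft-thresholding solution and the L2 problem $\tfrac12(b-\hat b)^2+\tfrac{\lambda}{2}b^2$ has the solution $\hat b/(1+\lambda)$. The function names are illustrative and are not taken from the app.

```python
def lasso_shrink(b_ols: float, lam: float) -> float:
    """L1 solution for one coefficient: soft-thresholding of the OLS estimate."""
    magnitude = max(abs(b_ols) - lam, 0.0)  # exactly 0.0 once lam >= |b_ols|
    return magnitude if b_ols >= 0 else -magnitude


def ridge_shrink(b_ols: float, lam: float) -> float:
    """L2 solution for one coefficient: proportional shrinkage, never exactly zero."""
    return b_ols / (1.0 + lam)


# Trace both estimates for an OLS coefficient of 1.0 as the penalty grows.
for lam in (0.0, 0.5, 1.0, 2.0):
    print(f"lambda={lam:3.1f}  lasso={lasso_shrink(1.0, lam):.3f}  "
          f"ridge={ridge_shrink(1.0, lam):.3f}")
```

Once λ reaches the size of the OLS coefficient, the L1 estimate is exactly zero, while the L2 estimate only approaches zero as λ grows without bound.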
BMA vs DSL Simulator
Generate a small EKC-style panel with a known truth. Slide sample size, noise, and signal asymmetry; compare how each method estimates the treatment α.
Forest Plot
The post's headline comparison. Six estimators × three GDP coefficients. Toggle outcomes and methods. Hover for SEs, CIs, and number of controls selected.
PIP Chart
Posterior Inclusion Probabilities for all 15 variables in the BMA run. Switch between the fixed-effects and pooled specifications to see how fixed effects cut the number of false positives from 5 to 0.
Glossary (open a card if a term is unfamiliar)
Model uncertainty
BMA (Bayesian Model Averaging)
PIP (Posterior Inclusion Probability)
PMP (Posterior Model Probability)
DSL (Double-Selection LASSO)
LASSO penalty λ
EKC (Environmental Kuznets Curve)
Answer key (synthetic data)
BMA vs DSL — explore the bias-variance trade-off
The simulator generates a small EKC-flavoured DGP with one treatment (α = 0.5) and many candidate controls. The "BMA-style" estimator here is not a full BMA run: it is a Double LASSO with a rigorous (theory-driven) penalty, chosen to mimic BMA's conservative selection. DSL (CV) uses a cross-validated λ. Both produce an estimate of α, and both are graded against the truth. The headline finding from the post applies here too: method agreement is reassuring only when the model is correctly specified.
BMA-style (rigorous λ)
Theory-driven penalty: conservative, robust to model uncertainty — mimics BMA's "decisive evidence" threshold.
DSL (CV)
Cross-validated penalty: minimises prediction MSE — often over-selects controls.
What to look for
- Asymmetry = 0.8 (default) is the hardest case. Controls predict the treatment but barely the outcome — a single-step selection that only looks at y will miss them and bias α. Watch DSL's union grow as you push asymmetry up.
- Increase n. Both estimators converge to the true α = 0.5 as n grows. The BMA-style rigorous penalty shrinks variance faster because it admits fewer noisy controls.
- Decrease signal. Small true effects are hard to detect — exactly why urban (true coef = 0.007) and democracy (true coef = -0.005) get PIPs below 0.80 in the post.
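The double-selection idea behind both simulator estimators can be sketched in plain Python: run a LASSO of the outcome on the controls, run a second LASSO of the treatment on the controls, then run OLS of the outcome on the treatment plus the union of the selected controls. Everything specific below is an assumption for illustration and does not come from the app: the toy DGP, the fixed penalty `lam=0.2` (neither the rigorous nor the cross-validated λ), and the hand-rolled coordinate-descent and Gauss-Jordan solvers.

```python
import random


def soft(z, lam):
    """Soft-thresholding operator used by the LASSO update."""
    return z - lam if z > lam else z + lam if z < -lam else 0.0


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def lasso(cols, y, lam, sweeps=100):
    """Coordinate descent for (1/2n)||y - Wb||^2 + lam*||b||_1 on centered data."""
    n, b, resid = len(y), [0.0] * len(cols), list(y)
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            ss = sum(x * x for x in col) / n
            rho = sum(x * r for x, r in zip(col, resid)) / n + ss * b[j]
            new = soft(rho, lam) / ss
            if new != b[j]:
                resid = [r - x * (new - b[j]) for x, r in zip(col, resid)]
                b[j] = new
    return b


def ols(cols, y):
    """Least squares via Gauss-Jordan elimination on the normal equations."""
    p = len(cols)
    A = [[sum(a * c for a, c in zip(cols[i], cols[k])) for k in range(p)]
         + [sum(a * t for a, t in zip(cols[i], y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def double_selection(d, y, controls, lam):
    """Return (alpha_hat, indices of controls kept by either LASSO step)."""
    d, y = center(d), center(y)
    controls = [center(c) for c in controls]
    keep_y = {j for j, b in enumerate(lasso(controls, y, lam)) if b != 0.0}
    keep_d = {j for j, b in enumerate(lasso(controls, d, lam)) if b != 0.0}
    union = sorted(keep_y | keep_d)
    alpha_hat = ols([d] + [controls[j] for j in union], y)[0]
    return alpha_hat, union


# Hypothetical DGP: control 0 drives the treatment strongly but the outcome weakly.
rng = random.Random(0)
n, p = 200, 8
controls = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
d = [controls[0][i] + rng.gauss(0, 1) for i in range(n)]
y = [0.5 * d[i] + 0.1 * controls[0][i] + controls[1][i] + rng.gauss(0, 1)
     for i in range(n)]

alpha_hat, union = double_selection(d, y, controls, lam=0.2)
print(round(alpha_hat, 3), union)
```

In this toy setup the treatment-equation LASSO is what guarantees that control 0 enters the final regression, which is the asymmetry case described in the first bullet above.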
Bias vs. variance over 100 simulations
A single run is noisy. Run the whole pipeline 100 times with fresh draws (same parameters, different ε) to see whether the CV-style over-selection produces systematic bias.
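Separating bias from variance across replications can be sketched as follows. This is a stand-in, not the app's pipeline: instead of comparing the two LASSO penalties, it compares a regression that omits a confounder with one that controls for it, simply because that pair makes systematic bias easy to see. The DGP, the seed, and all function names are assumptions.

```python
import random
from statistics import fmean, pvariance


def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def residualize(v, w):
    """Residuals from regressing v on w, with an intercept."""
    b, mv, mw = slope(w, v), fmean(v), fmean(w)
    return [a - mv - b * (c - mw) for a, c in zip(v, w)]


def one_draw(rng, n=100, alpha=0.5):
    """One simulated dataset; returns (estimate omitting w, estimate controlling for w)."""
    w = [rng.gauss(0, 1) for _ in range(n)]
    d = [wi + rng.gauss(0, 1) for wi in w]
    y = [alpha * di + wi + rng.gauss(0, 1) for di, wi in zip(d, w)]
    short = slope(d, y)                                  # confounder left out
    full = slope(residualize(d, w), residualize(y, w))   # partialled out (FWL)
    return short, full


def summarize(estimates, truth):
    """Bias, variance and RMSE of a list of replicated estimates."""
    bias = fmean(estimates) - truth
    var = pvariance(estimates)
    return {"bias": bias, "variance": var, "rmse": (bias ** 2 + var) ** 0.5}


rng = random.Random(42)
draws = [one_draw(rng) for _ in range(100)]
short_summary = summarize([s for s, _ in draws], truth=0.5)
full_summary = summarize([f for _, f in draws], truth=0.5)
print("omits confounder:   ", short_summary)
print("controls confounder:", full_summary)
```

A bias that stays far from zero while the variance is small is the signature of a systematic error; averaging over more draws shrinks noise but leaves that bias in place.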
EKC coefficients — six estimators side by side
The estimates below come directly from the post's Stata output. Six estimators × three GDP coefficients (linear, squared, cubic). The True DGP row is the answer key: that is what the methods are trying to recover. Toggle methods and outcomes; hover for SE, 95% CI, and the number of controls each estimator selected.
What to look for
- Toggle "BMA (pooled)" and "DSL (pooled)" off. The pooled estimates are 2-3× the true value because they omit country fixed effects. Once you hide them, the four FE-based estimators all cluster tightly around the true DGP.
- Both BMA (FE) and Kitchen-Sink FE recover β₁ to within 1% of the truth: BMA (FE) gives −7.139 and Kitchen-Sink FE gives −7.131, against a true value of −7.100. On this coefficient Kitchen-Sink FE is marginally closer.
- The pooled methods agree with each other on the wrong answer. BMA (pooled) gives β₁ = −21.26, DSL (pooled) gives −22.03. Method agreement is not a substitute for correct model specification.
Outcomes (GDP polynomial coefficient)
Methods
Why do pooled estimates blow up?
Without country fixed effects, the GDP polynomial absorbs persistent cross-country differences in emissions levels that have nothing to do with income. The β̂₁ pooled estimates (−21.26 from BMA, −22.03 from DSL) are inflated by a factor of 3 relative to the true −7.10. And both methods are confidently wrong — their 95% intervals fail to cover the truth for any of the three GDP coefficients. The lesson: BMA and DSL are powerful, but they cannot fix a misspecified model.
Turning points: where does the EKC bend?
The cubic polynomial implies two turning points: a minimum, where emissions stop falling and start rising, and a maximum, where they stop rising and start falling again. The true DGP values are $1,895 and $34,647 GDP per capita. The four FE methods cluster around $2,400 and $27,000: the right ballpark and the same shape, with the remaining gap arising because small coefficient errors compound in the turning-point calculation. The pooled methods land far from the truth at both ends.
- FE-based methods: minimum at $2,411–2,478 · maximum at $25,656–27,694
- Pooled methods: minimum at $5,581–5,752 · maximum at $23,298–24,532
- True DGP: minimum at $1,895 · maximum at $34,647
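The turning points follow from setting the derivative of the cubic to zero. The sketch below assumes the GDP terms are powers of log GDP per capita, $x = \ln(\text{GDP})$, so the turning points solve $\beta_1 + 2\beta_2 x + 3\beta_3 x^2 = 0$ and are converted back to dollars with $e^{x}$. That functional form is an assumption, and the coefficients in the example are hypothetical values chosen to give round roots, not the post's estimates.

```python
import math


def turning_points(b1: float, b2: float, b3: float) -> tuple[float, float]:
    """Return (minimum, maximum) GDP levels of b1*x + b2*x**2 + b3*x**3, x = ln(GDP)."""
    disc = 4.0 * b2 ** 2 - 12.0 * b1 * b3  # discriminant of b1 + 2*b2*x + 3*b3*x**2
    if b3 == 0.0 or disc <= 0.0:
        raise ValueError("the cubic has no pair of distinct turning points")
    root = math.sqrt(disc)
    # The second derivative 2*b2 + 6*b3*x equals +root at the first solution
    # (a minimum) and -root at the second (a maximum).
    x_min = (-2.0 * b2 + root) / (6.0 * b3)
    x_max = (-2.0 * b2 - root) / (6.0 * b3)
    return math.exp(x_min), math.exp(x_max)


# Hypothetical coefficients whose turning points sit at ln(GDP) = 2 and 3.
gdp_min, gdp_max = turning_points(b1=-18.0, b2=7.5, b3=-1.0)
print(round(gdp_min, 2), round(gdp_max, 2))
```

Because the dollar values come from exponentiating a ratio of coefficients, a small error in any of the three estimates can move a turning point by thousands of dollars.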
BMA's signature output: Posterior Inclusion Probabilities
For each of the 15 variables in the post's BMA run, the PIP is the fraction of posterior probability mass on models that include it. The horizontal threshold (orange dashed line) is 0.80 — Raftery's "strong evidence" cutoff. Switch between FE and pooled to see how fixed effects collapse the false-positive rate from 5 noise variables down to zero.
Specification
FE: 6 variables clear 0.80, 0 false positives, 2 false negatives (urban, democracy). Pooled: 12 of 15 variables clear 0.80, including 5 noise variables — fixed effects are not optional.
What to look for
- With FE: fossil fuel, GDP terms, industry, and renewable energy cleanly clear the 0.80 line. All 7 noise variables sit near zero. Urban (true coef = +0.007) and democracy (true coef = −0.005) are too weak to detect.
- Without FE: noise variables, including services, pop_density, credit, and trade, jump above 0.80 because they piggyback on omitted country heterogeneity. This is what the post calls "false precision from a misspecified model".
- The signal-strength limit. Even BMA cannot find effects smaller than the sampling noise. The 2 false negatives are not a method failure — they are an honest accounting of evidence.
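The PIP definition used in this tab (posterior probability mass summed over the models that include a variable) can be sketched in a few lines. The example assumes a uniform prior over models and uses a made-up four-model space with hypothetical log marginal likelihoods; the variable names `x1` and `x2` are placeholders, not variables from the post.

```python
import math


def posterior_model_probs(log_evidence):
    """Turn log marginal likelihoods into PMPs, assuming a uniform model prior."""
    top = max(log_evidence.values())  # subtract the max for numerical stability
    weights = {m: math.exp(v - top) for m, v in log_evidence.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def inclusion_probs(pmp):
    """PIP of each variable: sum of PMPs over the models that contain it."""
    pip = {}
    for model, prob in pmp.items():
        for var in model:
            pip[var] = pip.get(var, 0.0) + prob
    return pip


# Hypothetical evidence for every subset of two candidate variables.
log_evidence = {
    frozenset(): math.log(1.0),
    frozenset({"x1"}): math.log(6.0),
    frozenset({"x2"}): math.log(1.0),
    frozenset({"x1", "x2"}): math.log(2.0),
}

pmp = posterior_model_probs(log_evidence)
pip = inclusion_probs(pmp)
for var in sorted(pip):
    flag = "clears 0.80" if pip[var] >= 0.80 - 1e-12 else "below 0.80"
    print(var, round(pip[var], 3), flag)
```

A variable can reach a high PIP without any single model dominating, because the PIP pools the evidence from every model that includes it.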
Connecting back to Tab 3
The four FE-based estimators in the forest plot agree on the GDP coefficients precisely because they all use the same FE structure that keeps the PIP false-positive rate at zero. The pooled estimators agree with each other on a biased answer because they share the same misspecification. PIPs and confidence intervals are only trustworthy when the model class is right.