BMA · LASSO · WALS — Interactive Lab

A pedagogical companion to Three Methods for Robust Variable Selection: BMA, LASSO, and WALS

Three Methods, One Question: Which Variables Truly Matter?

Imagine 12 candidate drivers of CO2 emissions and only 120 countries. There are 2^12 = 4,096 possible models you could fit. Pick one, and you implicitly declare the other 4,095 wrong. BMA, LASSO, and WALS each take a principled approach to this variable selection problem, but they encode different statistical philosophies: Bayesian averaging, frequentist shrinkage, and frequentist averaging.

This lab lets you experience the three methods hands-on. In four tabs you will: watch the L1 (LASSO) penalty force coefficients to exactly zero while L2 (Ridge) only shrinks them; turn the LASSO penalty knob yourself on simulated data; explore how each method's sensitivity depends on signal strength; and inspect the post's actual 12-variable × 3-method agreement map.

L1 (LASSO) vs. L2 (Ridge) — why LASSO selects, Ridge does not

Both methods shrink coefficients. Only LASSO drives them exactly to zero. The animation below shows the same coefficient under the two penalties as λ grows: the orange L1 estimate hits zero abruptly, the steel-blue L2 estimate asymptotes but never reaches zero. This is why LASSO doubles as a variable-selection device — and why the Laplace prior underlying WALS encodes the same skeptical-but-open-minded prior belief.
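A minimal sketch of that contrast, assuming a single coefficient in an orthonormal design so that both penalties have closed forms. The function names and the penalty scaling are illustrative choices, not taken from the post.

```python
def lasso_shrink(b_ols: float, lam: float) -> float:
    """Minimiser of 0.5*(b - b_ols)**2 + lam*|b| (soft-thresholding)."""
    magnitude = max(abs(b_ols) - lam, 0.0)
    return magnitude if b_ols >= 0 else -magnitude


def ridge_shrink(b_ols: float, lam: float) -> float:
    """Minimiser of 0.5*(b - b_ols)**2 + 0.5*lam*b**2 (proportional shrinkage)."""
    return b_ols / (1.0 + lam)


b_ols = 0.5  # illustrative unpenalised estimate
for lam in (0.0, 0.25, 0.5, 1.0, 5.0):
    l1 = lasso_shrink(b_ols, lam)
    l2 = ridge_shrink(b_ols, lam)
    print(f"lam={lam:4.2f}  L1={l1:.3f}  L2={l2:.3f}")
```

The L1 estimate is exactly zero once λ reaches |b_ols|, while the L2 estimate only divides by a growing factor and stays nonzero for any finite λ.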

Tab 2

LASSO Lab

Slide λ and watch coefficients snap to zero, one by one. Compare raw LASSO to post-LASSO OLS.

Tab 3

Sensitivity Simulator

Run 100 simulations to see how each method's true-positive rate depends on signal strength and sample size.

Tab 4

Method Agreement

The post's 12 variables × 3 methods, interactively. Filter by variable type and method to see who agrees on what.

Three takeaways the lab is designed to make obvious

  • Convergence is information. Four variables (log_gdp, trade_network, fossil_fuel, industry) are triple-robust — flagged by BMA, LASSO, and WALS. Triangulation across paradigms is the strongest claim you can make about variable importance.
  • BMA is conservative; LASSO and WALS are sensitive. BMA recovers 4/7 true predictors (57.1% sensitivity); LASSO and WALS recover 6/7 each (85.7%). The two variables that BMA misses, urban_pop and democracy, sit in BMA's borderline PIP zone (0.60–0.65).
  • All three keep specificity at 100%. None of the methods falsely flags a noise variable as robust. The cost of conservatism is missed true positives, not false alarms.

Glossary (open a card if a term is unfamiliar)

BMA
Bayesian Model Averaging. Averages over all 2^K = 4,096 candidate regression models (K = 12), weighting each by its posterior probability. Output: PIPs and posterior coefficient means.
PIP (Posterior Inclusion Probability)
For variable j, the sum of posterior probabilities of all models that include j. PIP ≥ 0.80 is the convention for "robust" (Raftery 1995).
LASSO
L1-penalised least squares. The absolute-value penalty produces exactly-zero coefficients — variable selection comes for free.
Penalty λ
The LASSO shrinkage knob. Larger λ pins more coefficients to zero. This is the main slider in Tab 2.
Post-LASSO
After LASSO selects a support, refit by plain OLS on the selected variables only. The refit removes the shrinkage bias from the magnitudes; LASSO is used only for selection.
WALS
Weighted Average Least Squares. Frequentist model averaging via a semi-orthogonal transformation and a Laplace prior. Closed-form, no MCMC, produces t-statistics.
|t| ≥ 2 threshold
WALS' analog of PIP ≥ 0.80. A variable with |t| ≥ 2 is classified as robust.
Methodological triangulation
Combining methods that share assumptions but differ mechanically. Variables flagged by all three are unusually credible.
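
To make the BMA and PIP entries concrete, here is a minimal sketch that enumerates every model for a small K and sums model weights into PIPs. It assumes a uniform prior over models and uses exp(−BIC/2) as a stand-in for each model's marginal likelihood; the post's actual priors may differ. The helper names (rss_ols, pip_by_enumeration) and the simulated data are hypothetical.

```python
import itertools
import math
import random


def rss_ols(columns, y):
    """Residual sum of squares from OLS of y on `columns` (no intercept)."""
    k, n = len(columns), len(y)
    if k == 0:
        return sum(v * v for v in y)
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(ci[t] * cj[t] for t in range(n)) for cj in columns]
         + [sum(ci[t] * y[t] for t in range(n))]
         for ci in columns]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [vr - f * vi for vr, vi in zip(a[r], a[i])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * c[t] for b, c in zip(beta, columns)) for t in range(n)]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))


def pip_by_enumeration(X_cols, y):
    """PIP_j = total weight of the models that include variable j."""
    n, K = len(y), len(X_cols)
    models, bics = [], []
    for mask in itertools.product([0, 1], repeat=K):  # all 2^K models
        cols = [X_cols[j] for j in range(K) if mask[j]]
        bic = n * math.log(rss_ols(cols, y) / n) + sum(mask) * math.log(n)
        models.append(mask)
        bics.append(bic)
    best = min(bics)  # subtract the minimum for numerical stability
    weights = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(weights)
    return [sum(w for m, w in zip(models, weights) if m[j]) / total
            for j in range(K)]


# Simulated data (illustrative): only column 0 has a true effect.
random.seed(0)
n, K = 120, 4
X_cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(K)]
y = [1.0 * X_cols[0][t] + random.gauss(0, 1) for t in range(n)]

pips = pip_by_enumeration(X_cols, y)
for j, v in enumerate(pips):
    print(f"x{j}: PIP = {v:.3f}  robust (PIP >= 0.80): {v >= 0.80}")
```

Full enumeration is feasible here because K = 4 gives 16 models; at K = 12 the same loop visits 4,096 models.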

LASSO Lab — turn the penalty knob yourself

Simulated data with one variable of interest and many candidate controls. The true coefficient of the highlighted (orange) variable is α = 0.5. LASSO chooses how many candidates to keep based on a single penalty parameter λ. Drag the λ slider and watch the coefficients shrink to exactly zero, one at a time. This is the geometric trick from §10 of the post made interactive.

More data ⇒ each coefficient estimated more precisely. The post uses n = 120.
About 15% of these have a true nonzero effect; the rest are noise (analog of the post's 5 noise variables).
Magnitude of the truly-relevant coefficients relative to noise. Small signal = post's "agriculture" case.
Slide left for less shrinkage (more variables survive); right for more (sparser model).
Readouts:
  • selected (|I|): number of candidates kept, out of all candidates
  • α̂ raw LASSO: shrunk toward zero (biased)
  • α̂ post-LASSO OLS: refit on selected support (unbiased)
  • true α: 0.50, held fixed for comparison

What to look for

  • Sparsity grows with λ. Slide right and noise variables snap to exactly zero. This is the L1-corner geometry from §10 of the post — Ridge (L2) would never zero them out.
  • The post-OLS α̂ tracks the true α more closely than raw LASSO α̂. This is exactly the Post-LASSO step (Belloni & Chernozhukov 2013) from §12 — LASSO shrinks the magnitude on purpose, so refit by OLS to undo the bias.
  • True predictors persist longer. Crank λ up: noise variables are pinned to zero first; the true predictors (with larger coefficients) survive longer. This is why LASSO works as a selection device.
  • Try p > n. Set p = 100, n = 50. OLS no longer has a unique solution, but LASSO still runs. That is the high-dimensional regime where shrinkage methods are designed to shine.
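
The behaviour in these bullets can be reproduced with a short sketch. It assumes the objective (1/(2n))·RSS + λ·Σ|b_j| solved by plain coordinate descent, no intercept, and simulated Gaussian data; the lab's own solver and data-generating process are not shown on the page, so every name and constant here is illustrative.

```python
import random


def soft(z, t):
    """Soft-thresholding operator: shrink z toward zero by t, clipping at zero."""
    magnitude = max(abs(z) - t, 0.0)
    return magnitude if z >= 0 else -magnitude


def lasso_cd(X_cols, y, lam, active=None, sweeps=200):
    """Coordinate descent for (1/(2n))*RSS + lam*sum|b_j| over `active` columns."""
    n, p = len(y), len(X_cols)
    active = list(range(p)) if active is None else list(active)
    beta = [0.0] * p
    resid = list(y)
    for _ in range(sweeps):
        for j in active:
            xj = X_cols[j]
            norm = sum(v * v for v in xj) / n
            # Covariance of column j with the partial residual (own fit added back).
            rho = sum(a * r for a, r in zip(xj, resid)) / n + norm * beta[j]
            new = soft(rho, lam) / norm
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * a for a, r in zip(xj, resid)]
                beta[j] = new
    return beta


# Simulated data (illustrative): column 0 has true coefficient alpha, the rest are noise.
random.seed(1)
n, p, alpha = 120, 20, 0.5
X_cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [alpha * X_cols[0][t] + random.gauss(0, 0.5) for t in range(n)]

raw = lasso_cd(X_cols, y, lam=0.1)
support = [j for j, b in enumerate(raw) if b != 0.0]
# Post-LASSO: with lam = 0 the same routine converges to OLS on the selected support.
post = lasso_cd(X_cols, y, lam=0.0, active=support)

print("selected:", support)
print(f"raw LASSO alpha-hat  = {raw[0]:.3f}")
print(f"post-LASSO alpha-hat = {post[0]:.3f}")
```

Reusing the routine with λ = 0 for the refit keeps the sketch dependency-free; a direct least-squares solve on the selected columns would give the same answer.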

Sensitivity Simulator — when do the methods agree?

The post's headline finding is that LASSO and WALS recover 6 of 7 true predictors (85.7%) while BMA recovers only 4 (57.1%). That gap is not an accident: it depends on signal strength, sample size, and how strongly noise variables are correlated with true predictors. Use the sliders to explore the boundary where methods start to disagree.

The post uses n = 120 (cross-section of countries).
The post uses p = 12 (7 true + 5 noise).
Effect size of true predictors. Small = post's borderline-PIP case (urban_pop, democracy).
How much candidates are correlated with the focal variable. Higher = harder selection problem.

Rigorous LASSO (theory-driven λ)

λ from Belloni et al.: λ = 2c · (σ̂ / √n) · Φ⁻¹(1 − γ/(2p)), with c = 1.1 and γ = 0.05. Analogue of WALS conservatism.
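
As a rough check on the scale of this penalty, the formula can be evaluated with the standard library alone. The function name rigorous_lambda and the choice σ̂ = 1 are assumptions made for illustration; in practice σ̂ is estimated from the data.

```python
import math
from statistics import NormalDist


def rigorous_lambda(sigma_hat: float, n: int, p: int,
                    c: float = 1.1, gamma: float = 0.05) -> float:
    """lambda = 2 * c * sigma_hat / sqrt(n) * Phi^{-1}(1 - gamma / (2 * p))."""
    quantile = NormalDist().inv_cdf(1.0 - gamma / (2.0 * p))
    return 2.0 * c * sigma_hat / math.sqrt(n) * quantile


# The post's dimensions (n = 120, p = 12) next to a wider candidate set.
for p in (12, 100):
    print(f"n=120, p={p:3d}: lambda = {rigorous_lambda(1.0, 120, p):.4f}")
```

The Φ⁻¹(1 − γ/(2p)) term is what makes the threshold Bonferroni-like: the penalty rises as p grows and falls as n grows.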

Readouts: α̂, SE(α̂), |I_y|, |I_d|, union |I_y ∪ I_d|, λ_y, λ_d

CV LASSO (data-driven λ)

λ from 3-fold cross-validation (lambda.min). Analogue of LASSO+CV from §11 of the post.

Readouts: α̂, SE(α̂), |I_y|, |I_d|, union |I_y ∪ I_d|, λ_y, λ_d

How this maps onto the post's three methods

  • Rigorous λ ≈ BMA + WALS: conservative, Bonferroni-style threshold. Misses small true effects but keeps specificity perfect.
  • CV λ ≈ LASSO+CV: data-driven and more liberal. Higher sensitivity but can over-select when noise variables predict the outcome marginally well.
  • Small signal × large p: this is the regime where BMA and the rigorous penalty are most conservative. The post's urban_pop (true β = 0.010) and democracy (β = 0.004) sit here.

Bias vs. variance over many simulations

Single runs are noisy. Run the whole pipeline 100 times with fresh draws (same parameters, different random shocks) to see whether one estimator is systematically more biased.
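
A stripped-down version of that experiment, assuming a single-regressor model y = αx + e so that both estimators have closed forms. In this setting OLS is also what the post-LASSO refit returns whenever the variable is selected. The constants (α = 0.5, λ = 0.1, n = 120) and the function name one_run are illustrative, not the lab's actual pipeline.

```python
import random
import statistics


def one_run(rng, n=120, alpha=0.5, lam=0.1):
    """One fresh draw; returns (OLS estimate, raw LASSO estimate) of alpha."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [alpha * xi + rng.gauss(0, 1) for xi in x]
    sxx = sum(xi * xi for xi in x) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) / n
    ols = sxy / sxx
    # LASSO for (1/(2n))*RSS + lam*|b|: soft-threshold sxy, then rescale.
    shrunk = max(abs(sxy) - lam, 0.0) / sxx
    lasso = shrunk if sxy >= 0 else -shrunk
    return ols, lasso


rng = random.Random(42)
runs = [one_run(rng) for _ in range(100)]  # same parameters, fresh shocks
ols_bias = statistics.mean([o for o, _ in runs]) - 0.5
lasso_bias = statistics.mean([l for _, l in runs]) - 0.5

print(f"mean bias, OLS refit: {ols_bias:+.3f}")
print(f"mean bias, raw LASSO: {lasso_bias:+.3f}")
```

Averaging over 100 draws separates the systematic downward pull of the L1 penalty from run-to-run noise, which a single run cannot do.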

Method Agreement — the post's 12 variables, three methods

These numbers come straight from the post's grand comparison table (§17). The top panel shows each variable's coefficient estimate under BMA, LASSO, Post-LASSO, and WALS. The bottom panel summarises confusion-matrix counts for each method (true positives / false positives / true negatives / false negatives). Toggle variable groups and methods to see who agrees on what.

What to look for

  • Triple-robust core. Keep all four methods checked and look at the top four variables (log_gdp, trade_network, fossil_fuel, industry). The estimates land in the same neighborhood under every method — these are the strongest claims the post can defend.
  • Borderline cases. urban_pop and democracy show clear method splits: LASSO and WALS keep them; BMA's PIP falls below the 0.80 threshold (n_methods = 2 in the data column). The disagreement is itself informative.
  • The noise variables (bottom 5). LASSO sets them to exact zero; BMA gives them PIPs < 0.15; WALS gives |t| < 1.5. All three methods correctly exclude them.
  • The missed true predictor: agriculture (β = 0.005). Its true effect is below the noise floor at n = 120. No method recovers it — an honest signal that small effects need larger samples, not better algorithms.

Variable groups

Methods

Method-level performance

How well does each method classify the 7 true predictors vs. 5 noise variables?

Method   Sensitivity   Specificity   Accuracy
BMA      0.571 (4/7)   1.000 (5/5)   0.750 (9/12)
LASSO    0.857 (6/7)   1.000 (5/5)   0.917 (11/12)
WALS     0.857 (6/7)   1.000 (5/5)   0.917 (11/12)
Triple-robust: 4 of 7 true predictors
Double-robust: 6 of 7 (adds urban_pop, democracy)
All-method-agreed: 5 noise correctly excluded
Universally missed: agriculture (β = 0.005)
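
The performance figures above follow directly from the confusion-matrix counts. The sketch below recomputes them; the counts are read off the reported fractions (for example, BMA's 4/7 sensitivity implies 4 true positives and 3 false negatives), and the function name is illustrative.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true predictors recovered
        "specificity": tn / (tn + fp),  # share of noise variables excluded
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# (TP, FP, TN, FN) per method: 7 true predictors and 5 noise variables.
counts = {"BMA": (4, 0, 5, 3), "LASSO": (6, 0, 5, 1), "WALS": (6, 0, 5, 1)}
results = {m: classification_metrics(*c) for m, c in counts.items()}

for method, m in results.items():
    print(f"{method:5s} sensitivity={m['sensitivity']:.3f} "
          f"specificity={m['specificity']:.3f} accuracy={m['accuracy']:.3f}")
```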

Why do BMA, LASSO, and WALS disagree on urban_pop and democracy?

All three methods encode skepticism that "most coefficients are probably zero," but they execute that skepticism differently. BMA hedges by averaging over every model — if the variable doesn't earn a place in most of the 4,096 models, its PIP stays below 0.80. LASSO with cross-validated λ lets the data choose how aggressive to be; with n = 120 and a moderate signal, it keeps a wider net than BMA. WALS uses the same Laplace prior as LASSO but for averaging instead of selection — and the |t| ≥ 2 threshold is more permissive than BMA's PIP ≥ 0.80. The disagreement on urban_pop and democracy is the genuine methodological story: real but moderate effects sit in a "fragile" zone that different paradigms classify differently.

Connecting back to Tab 2 and Tab 3

The bias-variance gap you saw in Tab 2 (raw LASSO vs Post-LASSO) and the conservatism gap in Tab 3 (rigorous λ vs CV λ) are the same forces that produced the BMA-vs-LASSO disagreement you see here. Raw LASSO shrinks the magnitude → use Post-LASSO for interpretation. Conservative λ misses small true effects → use multiple methods for triangulation. Whether to be conservative or sensitive is a domain judgment, not a statistical one.