BMA · LASSO · WALS Interactive Lab

Three Methods, One Question: Which Variables Truly Matter?

Imagine 12 candidate drivers of CO₂ emissions and only 120 countries. There are 2¹² = 4,096 possible models you could fit. Pick one, and you implicitly declare the other 4,095 wrong. BMA, LASSO, and WALS each take a principled approach to this variable selection problem, but they encode different statistical philosophies — Bayesian averaging, frequentist shrinkage, and frequentist averaging.

This lab lets you experience the three methods hands-on. In four tabs you will: watch the L1 (LASSO) penalty force coefficients to exactly zero while L2 (Ridge) only shrinks them; turn the LASSO penalty knob yourself on simulated data; explore how each method's sensitivity depends on signal strength; and inspect the post's actual 12-variable × 3-method agreement map.

L1 (LASSO) vs. L2 (Ridge) — why LASSO selects, Ridge does not

Both methods shrink coefficients. Only LASSO drives them exactly to zero. The animation below shows the same coefficient under the two penalties as λ grows: the orange L1 estimate hits zero abruptly, the steel-blue L2 estimate asymptotes but never reaches zero. This is why LASSO doubles as a variable-selection device — and why the Laplace prior underlying WALS encodes the same skeptical-but-open-minded prior belief.
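The contrast can be reproduced numerically for a single coefficient. The sketch below assumes an orthonormal design, where the L1 solution of ½(b − β)² + λ|β| is the soft-threshold rule and the L2 solution of ½(b − β)² + (λ/2)β² is proportional shrinkage. The starting value b_ols = 0.8 and the function names are illustrative and not taken from the lab.

```python
def soft_threshold(b, lam):
    """L1 (LASSO) solution for one coefficient under an orthonormal design."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # exactly zero once lam >= |b|


def ridge_shrink(b, lam):
    """L2 (Ridge) solution for the same setup: proportional shrinkage."""
    return b / (1.0 + lam)


b_ols = 0.8  # hypothetical unpenalised estimate
for lam in [0.0, 0.4, 0.8, 1.6, 10.0]:
    print(f"lam={lam:5.1f}  lasso={soft_threshold(b_ols, lam):.3f}  "
          f"ridge={ridge_shrink(b_ols, lam):.3f}")
```

The LASSO column reaches 0.000 as soon as λ equals the unpenalised estimate, while the Ridge column stays positive for every finite λ.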

Tab 2

LASSO Lab

Slide λ and watch coefficients snap to zero, one by one. Compare raw LASSO to post-LASSO OLS.

Tab 3

Sensitivity Simulator

Run 100 simulations to see how each method's true-positive rate depends on signal strength and sample size.

Tab 4

Method Agreement

The post's 12 variables × 3 methods, interactively. Filter by variable type and method to see who agrees on what.

Three takeaways the lab is designed to make obvious

Convergence is information. Four variables (log_gdp, trade_network, fossil_fuel, industry) are triple-robust — flagged by BMA, LASSO, and WALS. Triangulation across paradigms is the strongest claim you can make about variable importance.
BMA is conservative; LASSO and WALS are sensitive. BMA recovers 4/7 true predictors (57.1% sensitivity); LASSO and WALS recover 6/7 each (85.7%). The two variables that BMA misses but LASSO and WALS recover, urban_pop and democracy, sit in BMA's borderline PIP zone (0.60–0.65).
All three keep specificity at 100%. None of the methods falsely flags a noise variable as robust. The cost of conservatism is missed true positives, not false alarms.
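The percentages in these takeaways follow directly from the counts. As a small worked check, using the 7 true predictors and 5 noise variables of the post's design (the helper names are illustrative):

```python
def sensitivity(true_positives, n_true):
    """Share of truly relevant variables that a method flags as robust."""
    return true_positives / n_true


def specificity(false_positives, n_noise):
    """Share of noise variables that a method correctly leaves unflagged."""
    return 1.0 - false_positives / n_noise


N_TRUE, N_NOISE = 7, 5                       # the post's design: 7 true + 5 noise
recovered = {"BMA": 4, "LASSO": 6, "WALS": 6}
false_alarms = {"BMA": 0, "LASSO": 0, "WALS": 0}

for method in recovered:
    sens = sensitivity(recovered[method], N_TRUE)
    spec = specificity(false_alarms[method], N_NOISE)
    print(f"{method}: sensitivity={sens:.1%}, specificity={spec:.1%}")
# BMA: sensitivity=57.1%, specificity=100.0%
# LASSO: sensitivity=85.7%, specificity=100.0%
# WALS: sensitivity=85.7%, specificity=100.0%
```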

Glossary (open a card if a term is unfamiliar)

BMA

Bayesian Model Averaging. Averages over all 2^K = 4,096 candidate regression models, weighting each by its posterior probability. Output: PIPs and posterior coefficient means.

PIP (Posterior Inclusion Probability)

For variable j, the sum of posterior probabilities of all models that include j. PIP ≥ 0.80 is the convention for "robust" (Raftery 1995).
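To make the definition concrete, here is a minimal sketch that enumerates every model for a small simulated problem and sums the model weights per variable. It assumes a uniform prior over models and approximates each model's posterior weight by exp(−BIC/2); both are simplifying assumptions for illustration, not necessarily the priors used in the post. The data, K = 4, and all names are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, K = 120, 4
X = rng.normal(size=(n, K))
# Columns 0 and 1 are true predictors; columns 2 and 3 are pure noise.
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)


def bic(y, X_sub):
    """BIC of an OLS fit with intercept (Gaussian likelihood, up to a constant)."""
    n_obs = len(y)
    Z = np.column_stack([np.ones(n_obs), X_sub])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return n_obs * np.log(resid @ resid / n_obs) + Z.shape[1] * np.log(n_obs)


# All 2^K subsets of the K candidates, including the empty model.
models = [m for r in range(K + 1) for m in itertools.combinations(range(K), r)]
bics = np.array([bic(y, X[:, np.array(m, dtype=int)]) for m in models])

# Approximate posterior model probabilities under a uniform model prior.
weights = np.exp(-0.5 * (bics - bics.min()))
weights /= weights.sum()

# PIP_j = sum of the weights of all models that include variable j.
pip = np.array([sum(w for m, w in zip(models, weights) if j in m)
                for j in range(K)])
print(np.round(pip, 3))
```

With K = 12 the same loop would visit all 4,096 models, which is still cheap; for much larger K, enumeration is usually replaced by sampling over the model space.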

LASSO

L1-penalised least squares. The absolute-value penalty produces exactly-zero coefficients — variable selection comes for free.

Penalty λ

The LASSO shrinkage knob. Larger λ pins more coefficients to zero. This is the main slider in Tab 2.

Post-LASSO

After LASSO selects a support, refit by plain OLS on the selected variables. This removes the shrinkage bias from the magnitudes; LASSO is used only for selection.

WALS

Weighted Average Least Squares. Frequentist model averaging via a semi-orthogonal transformation and a Laplace prior. Closed-form, no MCMC, produces t-statistics.

|t| ≥ 2 threshold

WALS' analog of PIP ≥ 0.80. A variable with |t| ≥ 2 is classified as robust.

Methodological triangulation

Combining methods that answer the same question but rest on different statistical assumptions. Variables flagged by all three are unusually credible.

LASSO Lab — turn the penalty knob yourself

Simulated data with one variable of interest and many candidate controls. The true coefficient of the highlighted (orange) variable is α = 0.5. LASSO chooses how many candidates to keep based on a single penalty parameter λ. Drag the λ slider and watch the coefficients shrink to exactly zero, one at a time. This is the geometric trick from §10 of the post made interactive.

Sample size n 200

More data ⇒ each coefficient estimated more precisely. The post uses n = 120.

Number of candidates p 40

About 15% of these have a true nonzero effect; the rest are noise (analog of the post's 5 noise variables).

Signal strength 0.60

Magnitude of the truly-relevant coefficients relative to noise. Small signal = post's "agriculture" case.

Penalty λ —

Slide left for less shrinkage (more variables survive); right for more (sparser model).

selected (|I|): number of candidates kept, out of p
α̂ raw LASSO: shrunk toward zero (biased)
α̂ post-LASSO OLS: refit on selected support (unbiased)
true α: 0.50, held fixed for comparison

What to look for

Sparsity grows with λ. Slide right and noise variables snap to exactly zero. This is the L1-corner geometry from §10 of the post — Ridge (L2) would never zero them out.
The post-LASSO OLS α̂ tracks the true α more closely than the raw LASSO α̂. This is exactly the Post-LASSO step (Belloni & Chernozhukov 2013) from §12: LASSO shrinks the magnitude on purpose, so refit by OLS to undo the bias.
True predictors persist longer. Crank λ up: noise variables are pinned to zero first; the true predictors (with larger coefficients) survive longer. This is why LASSO works as a selection device.
Try p > n. Set p = 100, n = 50. OLS has no unique solution when p > n, but LASSO still runs. This is the high-dimensional regime that shrinkage methods are designed for.
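The first three observations can be checked offline with a small script. The sketch below is a plain NumPy coordinate-descent LASSO for the objective (1/2n)‖y − Xb‖² + λ‖b‖₁, followed by an OLS refit on the selected support. It is not the lab's own code: the simulated design (n = 200, p = 40, six relevant controls with coefficient 0.6, focal α = 0.5), the λ grid, and the function names are assumptions chosen to resemble the tab's defaults.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                                  # running residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                   # take coordinate j out of the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]                   # put the updated coordinate back
    return b


def raw_and_post(X, y, lam):
    """Return the raw LASSO fit and the OLS refit on its selected support."""
    b = lasso_cd(X, y, lam)
    support = np.flatnonzero(b)
    b_post = np.zeros_like(b)
    if support.size:
        b_post[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return b, b_post


# Simulated data (assumed setup): column 0 is the variable of interest.
rng = np.random.default_rng(1)
n, p = 200, 40
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardised columns
beta = np.zeros(p)
beta[0] = 0.5                                     # true alpha
beta[1:7] = 0.6                                   # six relevant controls (15% of 40)
y = X @ beta + rng.normal(size=n)
y = y - y.mean()

results = {}
for lam in (0.01, 0.1, 0.3):
    b, b_post = raw_and_post(X, y, lam)
    results[lam] = (b, b_post)
    print(f"lam={lam:.2f}  selected={np.count_nonzero(b):2d}  "
          f"alpha_raw={b[0]:.3f}  alpha_post={b_post[0]:.3f}")
```

Comparing the alpha_raw and alpha_post columns across the three λ values mirrors the two readouts in the tab; the selected count should fall as λ grows.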

Sensitivity Simulator — when do the methods agree?

The post's headline finding is that LASSO and WALS recover 6 of 7 true predictors (85.7%) while BMA recovers only 4 (57.1%). That gap is not an accident: it depends on signal strength, sample size, and how strongly noise variables are correlated with true predictors. Use the sliders to explore the boundary where methods start to disagree.

Sample size n 120

The post uses n = 120 (cross-section of countries).

Number of candidates p 12

The post uses p = 12 (7 true + 5 noise).

Signal strength 0.50

Effect size of true predictors. Small = post's borderline-PIP case (urban_pop, democracy).

Asymmetry (controls→treatment) 0.50

How strongly the candidates are correlated with the focal variable. Higher = harder selection problem.

Rigorous LASSO (theory-driven λ)

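λ from Belloni et al.: λ = 2 · 1.1 · (σ̂ / √n) · Φ⁻¹(1 − 0.05/(2p)), where σ̂ is the estimated noise standard deviation and Φ⁻¹ is the standard normal quantile function. Analogue of WALS conservatism.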

Reported: α̂, SE(α̂), |I_y|, |I_d|, union |I_y ∪ I_d|, and λ_y, λ_d.
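The theory-driven penalty for this card can be evaluated with the standard library alone. The sketch below plugs the tab's defaults (n = 120, p = 12) into the formula; σ̂ = 1.0 is a placeholder assumption, and the function name and the parameter names c and gamma are illustrative labels for the constants 1.1 and 0.05. The 1/√n scaling matches an objective normalised by n; other software conventions scale λ differently.

```python
from math import sqrt
from statistics import NormalDist


def rigorous_lambda(sigma_hat, n, p, c=1.1, gamma=0.05):
    """lambda = 2 * c * (sigma_hat / sqrt(n)) * Phi^{-1}(1 - gamma / (2p))."""
    z = NormalDist().inv_cdf(1.0 - gamma / (2.0 * p))  # standard normal quantile
    return 2.0 * c * sigma_hat / sqrt(n) * z


lam = rigorous_lambda(sigma_hat=1.0, n=120, p=12)   # sigma_hat = 1.0 is assumed
print(f"rigorous lambda = {lam:.4f}")

# The penalty grows slowly with p (through the quantile) and shrinks like 1/sqrt(n).
print(f"p = 100: {rigorous_lambda(1.0, 120, 100):.4f}")
print(f"n = 480: {rigorous_lambda(1.0, 480, 12):.4f}")
```

The γ/(2p) term is what gives the penalty its Bonferroni flavour: the quantile is pushed further into the tail as the number of candidates increases.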

CV LASSO (data-driven λ)

λ from 3-fold cross-validation (lambda.min). Analogue of LASSO+CV from §11 of the post.

Reported: α̂, SE(α̂), |I_y|, |I_d|, union |I_y ∪ I_d|, and λ_y, λ_d.
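For comparison, a data-driven λ can be chosen by 3-fold cross-validation, picking the grid value with the lowest average held-out error (the lambda.min rule). The sketch below uses a plain NumPy coordinate-descent LASSO without an intercept on mean-zero simulated data. The design (n = 120, p = 12, seven true predictors with coefficient 0.5), the λ grid, and the function names are assumptions, not the lab's implementation.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b


def cv_lambda_min(X, y, lambdas, k=3, seed=0):
    """Return the lambda with the lowest k-fold CV error, plus the CV curve."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    cv_mse = np.zeros(len(lambdas))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False                    # hold this fold out
        for i, lam in enumerate(lambdas):
            b = lasso_cd(X[train], y[train], lam)
            cv_mse[i] += np.mean((y[test_idx] - X[test_idx] @ b) ** 2) / k
    return lambdas[int(np.argmin(cv_mse))], cv_mse


# Simulated data (assumed setup): 7 true predictors and 5 noise variables.
rng = np.random.default_rng(2)
n, p = 120, 12
X = rng.normal(size=(n, p))
beta = np.r_[np.full(7, 0.5), np.zeros(5)]
y = X @ beta + rng.normal(size=n)

lambdas = np.geomspace(0.5, 0.005, 20)             # grid from heavy to light shrinkage
lam_min, cv_mse = cv_lambda_min(X, y, lambdas)
print(f"lambda.min = {lam_min:.4f}")
print("selected:", np.flatnonzero(lasso_cd(X, y, lam_min)))
```

Because lambda.min optimises prediction error rather than selection accuracy, it tends to sit at a smaller λ than the theory-driven value and can keep some noise variables.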

How this maps onto the post's three methods

Rigorous λ ≈ BMA + WALS: conservative, Bonferroni-style threshold. Misses small true effects but keeps specificity perfect.
CV λ ≈ LASSO+CV: data-driven and more liberal. Higher sensitivity but can over-select when noise variables predict the outcome marginally well.
Small signal × large p: this is the regime where BMA and the rigorous penalty are most conservative. The post's urban_pop (true β = 0.010) and democracy (β = 0.004) sit here.

Bias vs. variance over many simulations

Single runs are noisy. Run the whole pipeline 100 times with fresh draws (same parameters, different random shocks) to see whether one estimator is systematically more biased.
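The logic of the repeated-simulation check can be sketched in a few lines under a strong simplification: with an orthonormal design, the OLS estimate of α is N(α, σ²/n), raw LASSO is its soft-thresholded value, and post-LASSO returns the OLS value whenever the variable is selected. The parameter values (α = 0.5, n = 120, λ = 0.2, 100 draws) and the function name are assumptions; the lab's full pipeline is richer than this.

```python
import numpy as np


def simulate_bias(alpha=0.5, sigma=1.0, n=120, lam=0.2, n_sims=100, seed=0):
    """Monte Carlo bias and spread of raw vs. post-LASSO under an orthonormal design."""
    rng = np.random.default_rng(seed)
    # One OLS estimate of alpha per simulated dataset.
    b_ols = alpha + rng.normal(scale=sigma / np.sqrt(n), size=n_sims)
    # Raw LASSO: soft-threshold the OLS estimate.
    raw = np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)
    # Post-LASSO: keep the unshrunk OLS estimate whenever the variable is selected.
    post = np.where(raw != 0.0, b_ols, 0.0)
    return {name: (est.mean() - alpha, est.std(ddof=1))
            for name, est in (("raw", raw), ("post", post))}


res = simulate_bias()
for name, (bias, sd) in res.items():
    print(f"{name:>4}: bias={bias:+.3f}  sd={sd:.3f}")
```

In this setup the raw estimator's bias is close to −λ, while the post-LASSO bias is close to zero as long as the variable is selected in nearly every draw.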

Method Agreement — the post's 12 variables, three methods

What to look for

Variable groups

Methods

Method-level performance

Why do BMA, LASSO, and WALS disagree on urban_pop and democracy?

Connecting back to Tab 2 and Tab 3