MGWFER: Causal Spatially Varying Coefficients

Removing a time-invariant spatial confounder from Multiscale GWR

−92% local-slope RMSE cut

+0.82 correlation sign flip

0.9996 confounder recovered

Carlos Mendez

Nagoya University (GSID)

July 8, 2026

The Tension

Act I

When place secretly drives both \(x\) and \(y\), MGWR maps the confounder, not the effect

An unobserved attribute of place shifts the outcome and the covariate levels.

MGWR’s local slopes then absorb that contamination.

What looks like genuine spatial heterogeneity is omitted-variable bias wearing a map. Can we get the real coefficients back?

One dataset, six estimators, and a coefficient surface that flips sign

True coefficient surfaces (\(\beta_1\) dome, \(\beta_2\) gradient, \(\beta_3\) constant) versus the exponential confounder \(\alpha_i\), whose cross-sectional range is about 50× that of the slopes.

Where we’re going

The lab: a 225-unit, 3-period panel where place drives every covariate
Why the naive pooled fit gets the most-biased slope backwards
MGWFER’s one move — the within-transformation — and why it works
Stage 2: recovering the confounder itself as a per-unit quantity

The Investigation

Act II

The lab: 225 spatial units × 3 periods, with place wired into every covariate

Outcome — \(y_{it}\) built from three causally-active slopes plus a fixed effect
Confounder — \(sc_i\) (the spatial context), exponential, range \(2\) to \(52\)
Covariates — each one coupled to place: \(x_{k}=0.05\,sc_i+\nu_k\)

We simulate the paper’s DGP (Eqs. 39–45) verbatim on a 15×15 grid. The coupling makes the indirect channel \(sc\to x_k\) active — that is the whole point.
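A minimal sketch of a panel with this structure, not a reproduction of the paper's Eqs. 39–45. The surface shapes, the noise scales, and the exact exponential form of \(sc_i\) are assumptions, chosen only to match the stated 15×15 grid, the 2-to-52 confounder range, and the 0.05 coupling.

```python
import numpy as np

rng = np.random.default_rng(0)
SIDE, N_TIME, K = 15, 3, 4                      # 15x15 grid, 3 periods, 4 covariates
u, v = np.meshgrid(np.arange(SIDE), np.arange(SIDE))
u, v = u.ravel() / (SIDE - 1), v.ravel() / (SIDE - 1)   # coordinates scaled to [0, 1]
n = SIDE * SIDE

# Assumed confounder: exponential in u + v, rescaled to span 2..52
raw = np.exp(u + v)
sc = 2 + 50 * (raw - raw.min()) / (raw.max() - raw.min())

# Assumed coefficient surfaces: dome, gradient, constant, and a true zero
beta = np.column_stack([
    1 + 2 * (1 - ((u - 0.5) ** 2 + (v - 0.5) ** 2) / 0.5),  # beta_1: dome
    1 + u,                                                  # beta_2: gradient
    np.full(n, 1.5),                                        # beta_3: constant
    np.zeros(n),                                            # beta_4: no causal effect
])

# Covariates coupled to place: x_k = 0.05 * sc_i + nu_k, redrawn each period
x = 0.05 * sc[None, :, None] + rng.normal(0, 0.5, size=(N_TIME, n, K))
eps = rng.normal(0, 0.5, size=(N_TIME, n))
y = (x * beta[None]).sum(axis=2) + sc[None, :] + eps    # slopes + fixed effect + noise

# x_4 has no causal effect, yet it co-moves with y through sc
print(np.corrcoef(x[..., 3].ravel(), y.ravel())[0, 1])
```

Because every covariate loads on `sc`, the printed correlation is clearly positive even though the fourth slope is identically zero.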

Couple every covariate to place and \(x_4\) correlates 0.84 with \(y\) — with zero causal effect

Quantity	Value	Meaning
\(\mathrm{Cor}(x_k, sc)\)	0.84	every covariate tracks place
\(\mathrm{Cor}(x_4, y)\)	0.84	spurious — \(\beta_4\equiv 0\)

A regression that does not condition on \(sc\) will read this 0.84 as a real effect. That is the bias mechanism, made concrete.

Wooldridge in one line: OLS recovers \(\beta_k+\delta_k\), not \(\beta_k\)

\[y = \beta_0 + \textstyle\sum_k x_k\beta_k + sc + \varepsilon, \qquad sc = \delta_0 + \textstyle\sum_k x_k\delta_k + \eta\]

\[\Rightarrow\quad y = (\beta_0+\delta_0) + \textstyle\sum_k x_k(\beta_k+\delta_k) + (\varepsilon+\eta)\]

Hide \(sc\) in the error and project it on the covariates: the bias on each slope is exactly \(\delta_k\), the indirect contextual effect.
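The in-sample version of this identity can be checked numerically. The sketch below uses synthetic data (every number in it is an assumption) and verifies that the short-regression coefficients equal the long-regression coefficients plus \(\hat\gamma\,\hat\delta\), where \(\hat\gamma\) is the fitted coefficient on \(sc\) (equal to 1 in the model above).

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 500, 3
sc = rng.uniform(2, 52, n)                           # unobserved context (synthetic)
X = 0.05 * sc[:, None] + rng.normal(size=(n, K))     # covariates coupled to place
beta = np.array([1.5, 1.5, 0.0])                     # last slope is a true null
y = 1.0 + X @ beta + sc + rng.normal(size=n)

def ols(A, b):
    """Least-squares coefficients of b on [1, A]."""
    A1 = np.column_stack([np.ones(len(A)), A])
    return np.linalg.lstsq(A1, b, rcond=None)[0]

short = ols(X, y)                                    # omits sc
long_ = ols(np.column_stack([X, sc]), y)             # conditions on sc
delta = ols(X, sc)                                   # projection of sc on the covariates

# Omitted-variable identity: short = long + gamma_hat * delta
implied = long_[:K + 1] + long_[K + 1] * delta
print(np.round(short[1:], 2))
print(np.round(implied[1:], 2))
```

The two printed vectors agree to floating-point precision, and the slope on the null covariate is far from zero in the short regression.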

Six estimators, escalating discipline: only one removes the confounder and still maps the slopes

OLS / pooled OLS — global, no fix; the bias is on full display
Individual FE — global, within-transform; clean but no surface
MGWR (cross-section) / PMGWR — local surfaces, still contaminated
MGWFER — local surfaces and clean identification

Only MGWFER inherits the FE estimator’s identification while delivering a location-specific coefficient surface.

Globally, OLS overstates every nonzero slope ~4× and “detects” a null effect at \(p<10^{-13}\)

Coefficient	TRUE	Pooled OLS	Individual FE
\(\beta_1\)	1.50	6.14***	1.57***
\(\beta_3\)	1.50	5.79***	1.55***
\(\beta_4\)	0.00	4.16***	0.02 n.s.

OLS has nowhere to put \(sc\) except into the slopes — Wooldridge’s \(\hat\beta_k=\beta_k+\delta_k\). The within-transform neutralises it.

PMGWR’s local fit looks great (\(R^2=0.99\)) but \(\hat\beta_1\) is anti-correlated with truth

True vs PMGWR slopes: \(\beta_1\) scatters away from the 45° line and is anti-correlated (\(\mathrm{Cor}=-0.46\)); \(\beta_2,\beta_3\) sit well above identity.

MGWFER’s one move: subtract each unit’s mean, and the confounder vanishes exactly

\[\tilde{y}_{it} = y_{it} - \bar{y}_i = \textstyle\sum_k \beta_k(u_i,v_i)\,(x_{k,it}-\bar{x}_{k,i}) + (\varepsilon_{it}-\bar\varepsilon_i)\]

Since \(\alpha_i\) is the same in every period, \(\alpha_i-\alpha_i=0\) — demeaning cancels it to machine precision.

Like zeroing a kitchen scale: subtract the container’s weight (\(\alpha_i\)) so only the contents (the slopes) remain.
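A small sketch of the cancellation on a synthetic panel. It uses a single global slope for simplicity, so it illustrates the within-transform itself rather than MGWFER's local fit; all values are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_units, n_time = 225, 3
unit = np.repeat(np.arange(n_units), n_time)
alpha = np.repeat(rng.uniform(2, 52, n_units), n_time)   # time-invariant confounder
x = 0.05 * alpha + rng.normal(size=unit.size)            # covariate coupled to place
y = 1.5 * x + alpha + rng.normal(scale=0.5, size=unit.size)
df = pd.DataFrame({"unit": unit, "x": x, "y": y, "y_no_alpha": y - alpha})

# Within-transform: subtract each unit's own mean
cols = ["x", "y", "y_no_alpha"]
dm = df[cols] - df.groupby("unit")[cols].transform("mean")

# alpha_i is constant within a unit, so demeaned y is the same with or without it
print(np.abs(dm["y"] - dm["y_no_alpha"]).max())

pooled = np.polyfit(df["x"], df["y"], 1)[0]              # slope contaminated by alpha
within = (dm["x"] @ dm["y"]) / (dm["x"] @ dm["x"])       # no-intercept slope on demeaned data
print(round(pooled, 2), round(within, 2))
```

The first print is at rounding-error scale; the pooled slope is inflated well above the true 1.5, while the within slope lands close to it.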

Demeaning shrinks the outcome’s range from 61 to 14 — the confounder was most of the signal

Raw \(y\) range: \([-4.07,\ 57.41]\) — spread of \(\approx 61\)
Demeaned \(\tilde y\) range: \([-6.88,\ 6.92]\) — spread of \(\approx 14\)
The confounder spanned \([2.07,\ 51.55]\) — and is now gone

Most of the original variation was between units. Demeaning isolates the within-unit signal that identifies the slopes.
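The between/within split behind this statement is an exact sum-of-squares decomposition. A sketch on synthetic data, where the noise scale is an assumption and not the deck's value:

```python
import numpy as np

rng = np.random.default_rng(3)
n_units, n_time = 225, 3
alpha = rng.uniform(2, 52, n_units)                        # place effect, fixed over time
y = alpha[:, None] + rng.normal(scale=2.0, size=(n_units, n_time))

unit_mean = y.mean(axis=1, keepdims=True)
total_ss = ((y - y.mean()) ** 2).sum()
between_ss = (n_time * (unit_mean - y.mean()) ** 2).sum()  # variation across unit means
within_ss = ((y - unit_mean) ** 2).sum()                   # what demeaning keeps

print(f"between share: {between_ss / total_ss:.3f}")
print(f"within share:  {within_ss / total_ss:.3f}")
```

The two shares sum to one, and only the within part survives the transformation.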

A few lines fit MGWFER Stage 1: demean, standardise, MGWR with no intercept

from mgwr.gwr import MGWR
from mgwr.sel_bw import Sel_BW

# std() is a z-score helper defined elsewhere; cols is assumed to be ["y"] + xcols
um = panel_df.groupby("unit_id")[cols].transform("mean")  # unit means
y_w, X_w = y - um["y"], X - um[xcols]                       # within-transform
sel = Sel_BW(coords, std(y_w), std(X_w), multi=True,
             constant=False, time=N_TIME)                   # no intercept; time= assumes a panel-aware mgwr build
sel.search()                                                # bandwidths must be selected before MGWR()
mgwfer = MGWR(coords, std(y_w), std(X_w), sel, constant=False).fit()
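The snippet calls `std(...)` without showing it. One plausible reading, offered as an assumption and not as the deck's actual helper, is column-wise z-scoring that also returns the 2-D NumPy arrays mgwr expects:

```python
import numpy as np
import pandas as pd

def std(a):
    """Column-wise z-score returning a 2-D float array (hypothetical helper)."""
    arr = np.asarray(a, dtype=float)
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)        # mgwr expects y with shape (n, 1)
    return (arr - arr.mean(axis=0)) / arr.std(axis=0)

# Tiny demeaned inputs, made up for illustration
y_w = pd.Series([1.0, -1.0, 0.5, -0.5])
X_w = pd.DataFrame({"x1": [0.2, -0.2, 1.0, -1.0], "x2": [3.0, -3.0, 0.0, 0.0]})
print(std(y_w).shape, std(X_w).shape)
```

A column with zero variance would divide by zero here, so a real helper would need a guard for that case.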

After demeaning, \(\hat\beta_1\)’s correlation with truth flips from \(-0.46\) to \(+0.82\)

True vs MGWFER slopes: \(\beta_1\) now clusters tightly along the 45° line (\(\mathrm{Cor}=+0.82\)); \(\beta_2,\beta_3\) collapse onto identity.

RMSE falls 92–96% on every coefficient — and the sign flips on the worst one

−92%

\(\beta_1\) RMSE \(2.30\to 0.18\) vs PMGWR; correlation with truth \(-0.46\to +0.82\) (a sign reversal)
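For reference, a sketch of the two summary metrics and of the arithmetic behind the headline figure. The function names are illustrative; the two RMSE values are the ones reported on this slide.

```python
import numpy as np

def rmse(est, truth):
    """Root mean squared error between estimated and true local slopes."""
    return float(np.sqrt(np.mean((np.asarray(est) - np.asarray(truth)) ** 2)))

def corr(est, truth):
    """Pearson correlation between estimated and true local slopes."""
    return float(np.corrcoef(est, truth)[0, 1])

# Relative RMSE change for beta_1, PMGWR -> MGWFER
rmse_pmgwr, rmse_mgwfer = 2.30, 0.18
pct_change = 100 * (rmse_mgwfer - rmse_pmgwr) / rmse_pmgwr
print(f"{pct_change:.0f}%")   # -92%
```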

The Resolution

Act III

Stage 2 hands back the confounder itself — recovered at correlation 0.9996

\[\hat\alpha_i = \bar y_i - \textstyle\sum_{k} \hat\beta_{bwk}(u_i,v_i)\,\bar x_{k,i}\]

Once the slopes are clean, the leftover per-unit mean is the intrinsic contextual effect — no longer a nuisance, now an output.

In MGWR the role of place hid inside one intercept; in MGWFER it is explicit, per-unit, and significance-testable.
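The Stage 2 formula is plain per-unit arithmetic once local slopes are in hand. The sketch below uses synthetic data and plugs in the true slopes as a stand-in for Stage 1 estimates, so it shows the computation, not MGWFER's estimation accuracy.

```python
import numpy as np

rng = np.random.default_rng(4)
n_units, n_time, K = 225, 3, 3
alpha = rng.uniform(2, 52, n_units)                     # true per-unit effect
beta = rng.uniform(1, 3, size=(n_units, K))             # stand-in for Stage 1 local slopes
x = 0.05 * alpha[:, None, None] + rng.normal(size=(n_units, n_time, K))
eps = rng.normal(scale=0.5, size=(n_units, n_time))
y = (x * beta[:, None, :]).sum(axis=2) + alpha[:, None] + eps

# Stage 2: alpha_hat_i = ybar_i - sum_k beta_hat_k(u_i, v_i) * xbar_{k,i}
y_bar = y.mean(axis=1)                                  # shape (n_units,)
x_bar = x.mean(axis=1)                                  # shape (n_units, K)
alpha_hat = y_bar - (beta * x_bar).sum(axis=1)

print(np.corrcoef(alpha_hat, alpha)[0, 1])
```

With exact slopes, the only remaining error in `alpha_hat` is each unit's averaged noise, which shrinks as the number of periods grows.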

MGWFER reconstructs the confounder surface; PMGWR inverts it, MGWR_cs compresses it

Spatial-context surface (paper Fig. 5): true (top-left), MGWFER \(\hat\alpha_i\) near-identical (top-right), MGWR_cs compressed to \([2,22]\), PMGWR inverted to \([-11,10]\).

Recovered \(\hat\alpha_i\) correlates 0.9996 with the truth — every one of 225 units significant

0.9996

Pearson correlation of \(\hat\alpha_i\) with true \(sc_i\); RMSE 0.54 on a 2–52 scale; 225/225 units significant at 5%

Only MGWFER reads the true process scales — PMGWR collapses every bandwidth to 44–50

Bandwidths by covariate: PMGWR flattens all to 44–50; MGWFER differentiates [50, 91, 116, 62], the largest on the spatially-constant \(\beta_3\).

The strongest objection — does the within-transform make this causal?

Objection. Demeaning only removes time-invariant confounders — a one-trick pony. Real places change.

Response. True. MGWFER removes time-invariant confounding cleanly — and only that. It does not manufacture identification.

The stakes are real: on Georgia data, MGWFER flips poverty’s sign and makes the place effect 10× larger

MGWR / PMGWR

intrinsic effect \(\approx\pm 0.3\) (\(\pm 1.5\%\))
poverty coefficient: positive
“role of place” looks small

MGWFER

intrinsic effect \(\approx\pm 4\) (\(\pm 20\%\))
poverty coefficient: negative
place effect \(10\times\) larger

Let the within-transformation, not the bandwidth search, decide what place is doing.