Cross-Sectional Spatial Regression

Why spatial models? Crime does not stop at the border.

A neighborhood's crime rate depends not only on its own income and housing values but also on conditions in adjacent neighborhoods — through displacement (criminals move next door), diffusion (networks span borders), and shared exposure to risk factors. Ordinary least squares treats each neighborhood as independent and misses these spatial spillovers.

The Columbus crime data has Moran's I = 0.222 (p = 0.005), confirming that OLS residuals cluster geographically. This app lets you turn the dials yourself. Sweep ρ and λ in a SAR / SEM / SDM simulator, watch a shock ripple through the spatial multiplier (I − ρW)⁻¹, and compare direct/indirect/total effects across the full eight-model taxonomy on the post's actual estimates.

The shrinkage analogy — why ρ matters

Both ρ (spatial lag) and λ (spatial error) control how the "extra" effect of a shock shrinks as it propagates through neighbours. The animation below sweeps a penalty knob (here, the spatial parameter) from zero to one. The orange curve mimics the amplifying behaviour of the SAR-type multiplier 1 / (1 − ρ); the steel dashed curve shows the complementary direction, compression.

Tab 2

Spillover Animation

Drop a shock on one cell of a 7×7 lattice. Watch the spatial multiplier (I − ρW)⁻¹ propagate it. Vary ρ to see how a ρ = 0.43 world differs from a ρ = 0.80 world.

Tab 3

SAR / SEM / SDM Simulator

Generate fake Columbus-like crime data with known ρ, λ, and θ. Estimate ρ̂ and λ̂ back from the sample. See how SAR and SEM compete on the same data.

Tab 4

Direct / Indirect / Total Effects

The post's headline table, as a forest plot. Toggle the eight models and the two regressors. See why SDM and SDEM win — and why SAR forces the wrong sign on HOVAL spillovers.

The three key takeaways this app is built around

Spatial autocorrelation is real and substantively important. Moran's I = 0.222 (p = 0.005); ignoring it produces a misspecified OLS. The post's LM tests favour the error form (LM = 5.33 vs 3.40), but the full taxonomy is needed to disentangle local from global spillovers.
SDM and SDEM are the preferred specifications. Both allow the indirect-to-direct ratio to differ across regressors. The SAR forces a constant ratio (≈ 0.75), which the data does not support for HOVAL. SDEM gives a significant negative spillover of neighbour income on local crime (θ̂_W·INC = −1.20, p = 0.036).
The total income effect is roughly 40–58% larger than OLS. OLS estimates a coefficient of −1.60 per $1,000. SDM and SDEM give total effects of −2.26 to −2.52 (about 1.41 and 1.58 times the OLS coefficient), so OLS understates the income–crime relationship by ignoring the substantial spillover from neighbouring tracts' wealth.

Glossary (open a card if a term is unfamiliar)

Spatial weight matrix W

A row-standardised matrix encoding which units are neighbours. For Columbus, Queen contiguity: tracts that share any boundary point. Each row sums to 1; the diagonal is zero.
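As a rough illustration of the definition above, the sketch below builds a row-standardised contiguity matrix on a regular lattice. The helper name lattice_weights and the 7×7 grid are assumptions made for the example; the real Columbus W comes from tract boundaries, not a lattice.

```python
import numpy as np

def lattice_weights(side=7, queen=True):
    """Row-standardised contiguity matrix for a side x side lattice (illustrative helper)."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is never its own neighbour (zero diagonal)
                    if not queen and dr != 0 and dc != 0:
                        continue  # rook contiguity drops corner-only contact
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < side and 0 <= cc < side:
                        W[r * side + c, rr * side + cc] = 1.0
    # Divide each row by its neighbour count so that every row sums to 1
    return W / W.sum(axis=1, keepdims=True)

W = lattice_weights(7, queen=True)
print("shape:", W.shape)
print("row sums all 1:", np.allclose(W.sum(axis=1), 1.0))
print("neighbours of a corner cell:", np.count_nonzero(W[0]))
print("neighbours of the centre cell:", np.count_nonzero(W[24]))
```

Under Queen contiguity a corner cell has 3 neighbours and an interior cell has 8; passing queen=False gives the rook matrix used by the lattice animation.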

Spatial autocorrelation

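The tendency for nearby units to take similar values. Moran's I is the standard scalar measure: values near 0 indicate spatial randomness, values near 1 indicate strong clustering, and negative values indicate dispersion (neighbours tend to differ).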
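A minimal sketch of the statistic, assuming the usual formula I = (n / S0) · z'Wz / z'z, where z is the demeaned variable and S0 is the sum of all weights. The rook lattice and the two test patterns are illustrative choices, not the Columbus data.

```python
import numpy as np

def rook_weights(side=7):
    """Row-standardised rook-contiguity matrix for a side x side lattice (illustrative)."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    W[r * side + c, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def morans_i(y, W):
    """Moran's I = (n / S0) * z'Wz / z'z with z the demeaned values."""
    z = y - y.mean()
    return (len(y) / W.sum()) * (z @ W @ z) / (z @ z)

side = 7
W = rook_weights(side)
rows, cols = np.divmod(np.arange(side * side), side)
gradient = (rows + cols).astype(float)       # smooth trend: neighbours look alike
checker = ((rows + cols) % 2).astype(float)  # checkerboard: rook neighbours always differ

print("gradient:     I =", round(morans_i(gradient, W), 3))
print("checkerboard: I =", round(morans_i(checker, W), 3))
```

The smooth gradient gives a positive I, while the checkerboard, where every rook neighbour takes the opposite value, gives I = −1, the dispersed extreme.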

ρ (rho) — spatial lag of y

The coefficient on Wy in SAR. Measures global feedback: a shock to one tract propagates through the network and partially returns to itself. Estimated as 0.428 in the Columbus SAR.

θ (theta) — spatial lag of X

The coefficient on WX in SLX / SDM / SDEM. A local spillover: neighbour income or housing affects this tract directly, without feedback. SDEM gives θ̂_W·INC = −1.20 (p = 0.036).

λ (lambda) — spatial error

The coefficient on Wu in SEM / SAC / GNS. Captures spatially correlated unobservables. Substantive at 0.562 in SEM; weakens to 0.166 in SAC when ρ is also fitted.

Spatial multiplier (I − ρW)⁻¹

The inverse matrix that propagates a shock through the network when ρ ≠ 0. Diagonal entries amplify own effects; off-diagonals carry the spillover. For ρ = 0.428 and a row-standardised W, every row sums to 1 / (1 − ρ) ≈ 1.75, the total amplification of a shock applied to all tracts.

Direct effect

∂y_i/∂x_i: the change in tract i's crime from a change in tract i's own income, including feedback through neighbours via (I − ρW)⁻¹.

Indirect (spillover) effect

∂y_i/∂x_j: how tract i's crime responds when a different tract j changes its income. Zero in OLS and SEM by construction; substantial in SLX, SDM, SDEM.
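The direct and indirect effects defined above can be sketched as summaries of the matrix S = (I − ρW)⁻¹(βI + θW): the average diagonal is the direct effect, the average row sum is the total, and the difference is the indirect effect. This is a sketch under assumptions: a 7×7 rook lattice stands in for the Columbus W (so the numbers will not match the post), the function name effects is made up for the example, and the second SAR coefficient and the SDM coefficients are hypothetical values.

```python
import numpy as np

def rook_weights(side=7):
    """Row-standardised rook-contiguity matrix for a side x side lattice (illustrative)."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    W[r * side + c, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def effects(rho, beta, theta, W):
    """Average direct, indirect and total effects from S = (I - rho W)^-1 (beta I + theta W)."""
    n = W.shape[0]
    S = np.linalg.inv(np.eye(n) - rho * W) @ (beta * np.eye(n) + theta * W)
    direct = np.trace(S) / n   # mean own-tract response
    total = S.sum() / n        # mean row sum
    return direct, total - direct, total

W = rook_weights(7)
rho = 0.428

# SAR (theta = 0): two regressors with different coefficients
d1, i1, t1 = effects(rho, beta=-1.03, theta=0.0, W=W)
d2, i2, t2 = effects(rho, beta=-0.27, theta=0.0, W=W)   # hypothetical second coefficient
print("SAR indirect/direct ratios:", round(i1 / d1, 4), round(i2 / d2, 4))

# SDM: theta lets the ratio differ by regressor (hypothetical coefficients)
d3, i3, t3 = effects(rho, beta=-1.00, theta=-1.20, W=W)
print("SDM indirect/direct ratio:", round(i3 / d3, 4))
```

In the SAR case β only scales S, so it cancels from the indirect-to-direct ratio and every regressor gets the same value. Once θ enters, the ratio depends on θ/β and can differ across regressors.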

SAR vs SEM

SAR: spillovers in the outcome (substantive). SEM: spillovers in the error (nuisance). LM tests prefer SEM here (5.33 vs 3.40); SDM/SDEM tests cannot reject either restriction.

SDM vs SDEM

Both are flexible models nesting SLX. SDM has ρ (global feedback) plus WX. SDEM has λ (correlated errors) plus WX. Non-nested; both fit Columbus well, with comparable spillover estimates.

Spillover Animation — watch the multiplier propagate

A 7×7 lattice of "tracts" (49 cells, matching the Columbus n). Click any cell to drop a unit shock on it. The animation iteratively spreads it through a row-standardised rook spatial weight matrix at rate ρ. After many steps, the system stabilises at (I − ρW)⁻¹ times the initial shock. Slide ρ to see the difference between a weak (ρ = 0.2) and strong (ρ = 0.8) spatial world.

Spatial parameter ρ 0.43

The SAR estimate from Columbus is ρ̂ = 0.428. Try 0 (no spillover) to 0.9 (very strong) to feel the regime.

Iterations 15

Each step propagates the shock by ρ·W to immediate neighbours. ~15 steps suffice for ρ ≤ 0.6.
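The iteration described above amounts to summing the series shock + ρW·shock + (ρW)²·shock + …, which converges to (I − ρW)⁻¹ times the shock. The sketch below is one way to reproduce it, not the app's actual code; the helper name and the choice of the centre cell are assumptions.

```python
import numpy as np

def rook_weights(side=7):
    """Row-standardised rook-contiguity matrix for a side x side lattice (illustrative)."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    W[r * side + c, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

rho, steps, side = 0.43, 15, 7
W = rook_weights(side)
shock = np.zeros(side * side)
shock[24] = 1.0                      # unit shock on the centre cell

response = shock.copy()              # k = 0 term of the series
term = shock.copy()
for _ in range(steps):
    term = rho * (W @ term)          # next term: (rho W)^k applied to the shock
    response += term

# Steady state computed directly, for comparison with the truncated series
exact = np.linalg.solve(np.eye(side * side) - rho * W, shock)
print("largest gap after", steps, "steps:", np.abs(response - exact).max())
print("cells above 5% of the shock:", int((response > 0.05).sum()))
```

Because every row of W sums to 1, the truncation error after K steps is at most ρ^(K+1) / (1 − ρ), which is why a few more iterations are needed as ρ grows.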

Click any cell to drop a unit shock there

Heat colour shows the steady-state response of crime in each tract. Brighter = stronger spillover received from the shocked cell.

multiplier amplification: ≈ 1 / (1 − ρ)
cells with response > 5% of initial shock: count out of 49 tracts
max response in any cell: always on the shocked cell itself

What to look for

At ρ = 0, the shock stays in the originating cell. No spillover. This is OLS.
At ρ = 0.43 (the Columbus SAR estimate), the shock visibly bleeds into the 4 immediate neighbours, then their neighbours, decaying geometrically. The diagonal of (I − ρW)⁻¹ is about 1.13 — the own-tract amplification.
At ρ = 0.8 or above, the entire 7×7 grid lights up. Every tract is meaningfully affected by every other tract's shock. This is the regime where spatial models must be used.
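The multiplier amplification ≈ 1 / (1 − ρ) is the row sum of (I − ρW)⁻¹ when W is row-standardised, that is, the total (direct plus indirect) multiplier rather than the diagonal. At ρ = 0.43, this is ≈ 1.75. The own-tract amplification is the much smaller diagonal: the post's §5.1 estat impact output reports a direct effect of −1.10 for INC against a bare coefficient of −1.03, so the average diagonal is about 1.07.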
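The relationship between 1 / (1 − ρ) and the entries of (I − ρW)⁻¹ can be checked numerically. The sketch below assumes a row-standardised rook matrix on a 7×7 lattice; because W times a vector of ones returns the vector of ones, every row of the inverse sums to 1 / (1 − ρ), while the diagonal alone is smaller.

```python
import numpy as np

def rook_weights(side=7):
    """Row-standardised rook-contiguity matrix for a side x side lattice (illustrative)."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    W[r * side + c, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

rho = 0.43
W = rook_weights(7)
M = np.linalg.inv(np.eye(49) - rho * W)   # the spatial multiplier

row_sums = M.sum(axis=1)                  # total response to a shock applied everywhere
mean_diag = np.diag(M).mean()             # average own-tract amplification

print("1 / (1 - rho):      ", 1 / (1 - rho))
print("row sums (min, max):", row_sums.min(), row_sums.max())
print("average diagonal:   ", mean_diag)
```

The gap between the row sum and the diagonal is the part of the multiplier that lands on other tracts, which is the indirect effect.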

SAR / SEM / SDM Simulator — generate data, estimate it back

Below, we simulate fake Columbus-like data on a 7×7 lattice (n = 49) with known true parameters ρ (SAR-style feedback), λ (SEM-style error correlation), and θ (SLX-style covariate spillover). We then estimate ρ̂ and λ̂ back from the simulated sample to see what we recover. Slide ρ, λ, or θ and watch Moran's I shift — and watch the estimated parameters track (or fail to track) the truth.

true ρ (spatial lag) 0.40

Set 0 for OLS data, 0.40 for SDM-like, 0.80 for strong global feedback.

true λ (spatial error) 0.00

SEM-style nuisance. The post's SEM estimates λ̂ = 0.56; try 0.5.

true θ on W·INC -1.20

The post's SDEM estimates θ̂_W·INC = −1.20. SLX-channel: local spillover, no feedback.

sample noise σ 5.00

Larger σ = more noise. The Columbus regression Root MSE ≈ 10.
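One way to generate data with these four knobs is the general nesting form y = ρWy + α + βx + θWx + u with u = λWu + ε, solved for y. The sketch below is an assumption about how such a simulator could work, not the app's code; the function name simulate, the intercept, β, and the distribution of the income-like covariate are hypothetical.

```python
import numpy as np

def rook_weights(side=7):
    """Row-standardised rook-contiguity matrix for a side x side lattice (illustrative)."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    W[r * side + c, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def simulate(W, rho=0.40, lam=0.0, theta=-1.20, beta=-1.0,
             intercept=60.0, sigma=5.0, seed=0):
    """Draw one sample from y = rho*W y + intercept + beta*x + theta*W x + u, u = lam*W u + eps."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    I = np.eye(n)
    x = rng.normal(15.0, 5.0, size=n)                  # hypothetical income-like covariate
    eps = rng.normal(0.0, sigma, size=n)               # i.i.d. innovations
    u = np.linalg.solve(I - lam * W, eps)              # spatially correlated error
    mean_part = intercept + beta * x + theta * (W @ x)
    y = np.linalg.solve(I - rho * W, mean_part + u)    # reduced form for y
    return x, y, eps

W = rook_weights(7)
x, y, eps = simulate(W, rho=0.40, lam=0.30, theta=-1.20, sigma=5.0)
print("n =", y.size, " mean(y) =", round(y.mean(), 2), " sd(y) =", round(y.std(), 2))
```

Setting rho, lam and theta to zero reduces the draw to an ordinary linear model, which corresponds to the "OLS data" setting of the first slider.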

Truth: what we simulated

true ρ, true λ, true θ_W·INC, and Moran's I of y

Estimates: what we recovered

ρ̂ comes from a quick concentrated likelihood evaluated on a grid; λ̂ comes from the residual Moran-statistic mapping. These estimates are pedagogical and not as accurate as Stata's spregress with the ml option.

ρ̂ (SAR-like), λ̂ (SEM-like), OLS β̂_INC, and SDM β̂_INC
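A grid search over the concentrated SAR log-likelihood can be sketched as follows. For each candidate ρ, the regression coefficients and error variance are profiled out, leaving log|I − ρW| − (n/2)·log(σ̂²(ρ)) up to a constant. This illustrates the general technique and is not the app's implementation; the function name sar_rho_grid, the grid, and the simulated data (low noise, hypothetical coefficients) are assumptions.

```python
import numpy as np

def rook_weights(side=7):
    """Row-standardised rook-contiguity matrix for a side x side lattice (illustrative)."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    W[r * side + c, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def ols_residuals(X, v):
    """Residuals from regressing v on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ coef

def sar_rho_grid(y, X, W, grid):
    """Return the grid value of rho that maximises the concentrated SAR log-likelihood."""
    n = len(y)
    e0 = ols_residuals(X, y)        # residuals of y on X
    e1 = ols_residuals(X, W @ y)    # residuals of Wy on X
    best_rho, best_ll = None, -np.inf
    for rho in grid:
        e = e0 - rho * e1                                  # residuals of (y - rho*Wy) on X
        sigma2 = (e @ e) / n
        _, logdet = np.linalg.slogdet(np.eye(n) - rho * W) # Jacobian term log|I - rho W|
        ll = logdet - 0.5 * n * np.log(sigma2)             # constants dropped
        if ll > best_ll:
            best_rho, best_ll = rho, ll
    return best_rho

# Simulated SAR data with a known rho (hypothetical coefficients, low noise)
rng = np.random.default_rng(1)
W = rook_weights(7)
x = rng.normal(15.0, 5.0, size=49)
eps = rng.normal(0.0, 0.5, size=49)
true_rho = 0.6
y = np.linalg.solve(np.eye(49) - true_rho * W, 60.0 - 1.0 * x + eps)

X = np.column_stack([np.ones(49), x])
rho_hat = sar_rho_grid(y, X, W, np.arange(-0.90, 0.96, 0.01))
print("true rho:", true_rho, " estimated rho:", round(float(rho_hat), 2))
```

With noise as large as the simulator's default σ, the likelihood surface on n = 49 is much flatter, so ρ̂ will wander further from the truth than it does in this low-noise example.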

What to look for

Set ρ = 0 and λ = 0: Moran's I should hover near zero. OLS is correct. No spatial model needed.
Set ρ = 0.6, λ = 0: Strong SAR data. ρ̂ should track ρ. OLS β̂_INC looks biased compared to the SDM β̂_INC.
Set ρ = 0, λ = 0.6: Strong SEM data. ρ̂ from a misspecified SAR will be biased (it will catch some of λ). This is exactly the LM-test motivation in §4.3 of the post.
Set ρ = 0.4, λ = 0.4 simultaneously: The GNS regime. Both parameters become very hard to identify (large standard errors), reproducing the post's overparameterization warning in §8.3.

The post's headline numbers — interactively

What to look for

Regressor

Models to display

Why does SAR force a constant indirect/direct ratio?

Connecting back to Tab 3