Cross-Sectional Spatial Regression — Interactive Lab

A pedagogical companion to Cross-Sectional Spatial Regression in Stata: Crime in Columbus Neighborhoods.

Why spatial models? Crime does not stop at the border.

A neighborhood's crime rate depends not only on its own income and housing values but also on conditions in adjacent neighborhoods — through displacement (criminals move next door), diffusion (networks span borders), and shared exposure to risk factors. Ordinary least squares treats each neighborhood as independent and misses these spatial spillovers.

The OLS residuals of the Columbus crime regression have Moran's I = 0.222 (p = 0.005), confirming that they cluster geographically. This app lets you turn the dials yourself: sweep ρ, λ and θ in a SAR / SEM / SDM simulator, watch a shock ripple through the spatial multiplier (I − ρW)⁻¹, and compare direct, indirect and total effects across the full eight-model taxonomy using the post's actual estimates.

The shrinkage analogy — why ρ matters

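Both ρ (spatial lag) and λ (spatial error) control how a shock echoes through neighbours. Each round of propagation passes on only a fraction of the previous round (ρ for outcome shocks, λ for error shocks), so successive echoes shrink geometrically. Their sum, 1 + ρ + ρ² + ρ³ + ⋯ = 1 / (1 − ρ), nevertheless amplifies the total effect. The animation below sweeps the spatial parameter from zero to one. The orange curve traces the SAR-type multiplier 1 / (1 − ρ) (amplification); the steel dashed curve shows the complementary direction (compression).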
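The shrink-then-amplify point can be checked with plain arithmetic. A minimal sketch, assuming a single scalar ρ stands in for a row-standardised W (every row of W sums to 1, so each row sum of (ρW)^k is exactly ρ^k); `echo_sum` is a hypothetical helper name, not part of the app.

```python
def echo_sum(rho: float, rounds: int) -> float:
    """Sum of the first `rounds` echoes, 1 + rho + rho**2 + ..., of a unit shock."""
    return sum(rho ** k for k in range(rounds))


for rho in (0.2, 0.428, 0.8):
    last_echo = rho ** 14                 # size of the 15th echo: shrinks geometrically
    partial = echo_sum(rho, 15)           # accumulated amplification after 15 rounds
    print(f"rho={rho}: 15th echo {last_echo:.2e}, "
          f"sum {partial:.3f}, limit 1/(1-rho) = {1 / (1 - rho):.3f}")
```

Each individual echo gets smaller, yet the running total climbs toward 1 / (1 − ρ); for ρ = 0.8 fifteen rounds are still visibly short of the limit.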

Tab 2

Spillover Animation

Drop a shock on one cell of a 7×7 lattice and watch the spatial multiplier (I − ρW)⁻¹ propagate it. Vary ρ to see how a ρ = 0.43 world differs from a ρ = 0.80 world.

Tab 3

SAR / SEM / SDM Simulator

Generate fake Columbus-like crime data with known ρ, λ, and θ, then estimate ρ and λ back from the sample (ρ̂, λ̂). See how SAR and SEM compete on the same data.

Tab 4

Direct / Indirect / Total Effects

The post's headline table, as a forest plot. Toggle the eight models and the two regressors. See why SDM and SDEM win — and why SAR forces the wrong sign on HOVAL spillovers.

The three key takeaways this app is built around

  1. Spatial autocorrelation is real and substantively important. Moran's I = 0.222 (p = 0.005); ignoring it produces a misspecified OLS model. The post's LM tests favour the error form (LM = 5.33 vs 3.40), but the full taxonomy is needed to disentangle local from global spillovers.
  2. SDM and SDEM are the preferred specifications. Both allow the indirect-to-direct ratio to differ across regressors. The SAR forces a constant ratio (≈ 0.69, e.g. −0.76 / −1.10 for INC), which the data do not support for HOVAL. SDEM gives a significant negative spillover of neighbour income on local crime (θ̂W·INC = −1.20, p = 0.036).
  3. The total income effect is about 41–58% larger than OLS suggests. OLS estimates a coefficient of −1.60 per $1,000. SDM and SDEM give total effects of −2.52 and −2.26, respectively, so OLS understates the income–crime relationship by ignoring the substantial spillover from neighbouring tracts' wealth.
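The percentage gap in takeaway 3 follows from simple ratios of the quoted estimates. A quick check using only the numbers quoted in this app; `pct_larger` is a hypothetical helper.

```python
def pct_larger(total: float, ols: float) -> float:
    """How much larger (in %) a spatial total effect is than the OLS coefficient, in magnitude."""
    return (abs(total) / abs(ols) - 1.0) * 100.0


ols_inc = -1.60                                   # OLS coefficient on INC quoted above
for model, total in (("SDEM", -2.26), ("SDM", -2.52)):
    print(f"{model}: {pct_larger(total, ols_inc):.1f}% larger than OLS; "
          f"OLS captures {abs(ols_inc / total):.0%} of the total")
```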

Glossary

Spatial weight matrix W
A row-standardised matrix encoding which units are neighbours. For Columbus, W uses Queen contiguity: tracts that share any boundary point are neighbours. Each row sums to 1; the diagonal is zero.
Spatial autocorrelation
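The tendency for nearby units to take similar values. Moran's I is the standard scalar measure: values near its null expectation of −1/(n − 1) (essentially 0) indicate no spatial pattern, larger positive values indicate stronger clustering, and negative values indicate dispersion.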
ρ (rho) — spatial lag of y
The coefficient on Wy in SAR. Measures global feedback: a shock to one tract propagates through the network and partially returns to itself. Estimated as 0.428 in the Columbus SAR.
θ (theta) — spatial lag of X
The coefficient on WX in SLX / SDM / SDEM. A local spillover: neighbour income or housing affects this tract directly, without feedback. SDEM gives θ̂INC = −1.20 (p = 0.036).
λ (lambda) — spatial error
The coefficient on Wu in SEM / SAC / GNS. Captures spatially correlated unobservables. Sizeable (0.562) in SEM; weakens to 0.166 in SAC when ρ is also fitted.
Spatial multiplier (I − ρW)⁻¹
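The inverse matrix that propagates a shock through the network when ρ ≠ 0. Diagonal entries amplify own effects; off-diagonals carry the spillover. With row-standardised W, every row of the multiplier sums to 1 / (1 − ρ), so for ρ = 0.428 the total (direct plus indirect) amplification is ≈ 1.75; the diagonal alone is much closer to 1.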
Direct effect
∂y_i/∂x_i: the change in tract i's crime from a change in tract i's own income, including feedback through neighbours via (I − ρW)⁻¹.
Indirect (spillover) effect
∂y_i/∂x_j: how tract i's crime responds when a different tract j changes its income. Zero in OLS and SEM by construction; substantial in SLX, SDM, SDEM.
SAR vs SEM
SAR: spillovers in the outcome (substantive). SEM: spillovers in the error (nuisance). LM tests prefer SEM here (5.33 vs 3.40); SDM/SDEM tests cannot reject either restriction.
SDM vs SDEM
Both are flexible models nesting SLX. SDM has ρ (global feedback) plus WX. SDEM has λ (correlated errors) plus WX. Non-nested; both fit Columbus well, with comparable spillover estimates.
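The first glossary entries (W, Moran's I, the spatial multiplier) can be made concrete on a 7×7 grid like the one in the Spillover Animation. A minimal sketch: the real Columbus W is built from tract boundaries, which `queen=True` only mimics on a regular grid, and `lattice_weights` and `morans_i` are hypothetical helpers, not functions from the post or from Stata.

```python
import numpy as np


def lattice_weights(side: int = 7, queen: bool = False) -> np.ndarray:
    """Row-standardised contiguity matrix W for a side x side grid (hypothetical helper).

    Rook: cells sharing an edge. Queen: cells sharing an edge or a corner point.
    """
    n = side * side
    C = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, side)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) == (0, 0):
                    continue                      # no self-neighbours: diagonal stays 0
                if not queen and dr != 0 and dc != 0:
                    continue                      # rook ignores corner-only contacts
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    C[i, rr * side + cc] = 1.0
    return C / C.sum(axis=1, keepdims=True)       # row-standardise: each row sums to 1


def morans_i(z: np.ndarray, W: np.ndarray) -> float:
    """Moran's I = (n / S0) * z'Wz / z'z, with z centred and S0 the sum of all weights."""
    z = np.asarray(z, dtype=float) - np.mean(z)
    return float((z.size / W.sum()) * (z @ W @ z) / (z @ z))


W = lattice_weights(7)                                    # rook, 49 cells
rows = np.repeat(np.arange(7), 7).astype(float)           # value = row index: clustered
checker = (np.indices((7, 7)).sum(axis=0) % 2).ravel()    # checkerboard: dispersed
print(morans_i(rows, W), morans_i(checker, W), -1 / (49 - 1))

rho = 0.428
M = np.linalg.inv(np.eye(49) - rho * W)                   # spatial multiplier (I - rho W)^-1
print(M.sum(axis=1).min(), M.sum(axis=1).max(), 1 / (1 - rho))  # every row sums to 1/(1 - rho)
print(np.diag(M).mean())                                  # average own-tract amplification
```

The clustered pattern gives a Moran's I close to +1 and the checkerboard one close to −1. For the multiplier, the row sums all equal 1 / (1 − ρ) ≈ 1.75, while the average diagonal stays only slightly above 1.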

Spillover Animation — watch the multiplier propagate

A 7×7 lattice of "tracts" (49 cells, matching the Columbus n). Click any cell to drop a unit shock on it. The animation iteratively spreads it through a row-standardised rook spatial weight matrix at rate ρ. After many steps, the system stabilises at (I − ρW)⁻¹ times the initial shock. Slide ρ to see the difference between a weak (ρ = 0.2) and a strong (ρ = 0.8) spatial world.

The SAR estimate from Columbus is ρ̂ = 0.428. Try values from 0 (no spillover) to 0.9 (very strong) to get a feel for each regime.
Each step propagates the shock by ρ·W to immediate neighbours. ~15 steps suffice for ρ ≤ 0.6.
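The step-by-step spreading is a truncated Neumann series. After k steps the response is the sum of (ρW)^m e for m = 0..k, which converges to (I − ρW)⁻¹ e when |ρ| < 1. A minimal sketch on a 7×7 rook lattice, assuming the animation updates in exactly this way; `rook_weights` and `propagate` are hypothetical names.

```python
import numpy as np


def rook_weights(side: int) -> np.ndarray:
    """Row-standardised rook-contiguity W for a side x side grid (hypothetical helper)."""
    n = side * side
    C = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, side)
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < side and 0 <= cc < side:
                C[i, rr * side + cc] = 1.0
    return C / C.sum(axis=1, keepdims=True)


def propagate(W: np.ndarray, rho: float, shocked: int, steps: int) -> np.ndarray:
    """Response after `steps` rounds of r <- e + rho * W r, i.e. sum_{m=0..steps} (rho W)^m e."""
    e = np.zeros(W.shape[0])
    e[shocked] = 1.0                      # unit shock on one tract
    r = e.copy()
    for _ in range(steps):
        r = e + rho * (W @ r)             # one more ring of neighbours receives the echo
    return r


W = rook_weights(7)
centre, rho = 24, 0.43
steady = np.linalg.solve(np.eye(49) - rho * W, np.eye(49)[centre])   # (I - rho W)^-1 e
approx = propagate(W, rho, centre, steps=15)
print(np.abs(approx - steady).max())      # truncation error after 15 steps
print(steady[centre])                     # own-tract (direct) response, incl. feedback
print(steady.sum() - steady[centre])      # total spillover sent to all other tracts
print(int((steady > 0.05).sum()))         # cells above 5% of the initial shock
```

With row-standardised W the m-th term is at most ρ^m in any cell, so 15 steps leave a worst-case error below 10⁻³ at ρ = 0.6 but not at ρ = 0.9, consistent with the ~15-step guidance above.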

Click any cell to drop a unit shock there

Heat colour shows the steady-state response of crime in each tract. Brighter = stronger spillover received from the shocked cell.

Readouts: multiplier amplification (≈ 1 / (1 − ρ)); number of cells with a response above 5% of the initial shock (out of 49 tracts); maximum response in any cell (always on the shocked cell itself).

What to look for

  • At ρ = 0, the shock stays in the originating cell. No spillover. This is OLS.
  • At ρ = 0.43 (the Columbus SAR estimate), the shock visibly bleeds into the 4 immediate neighbours, then their neighbours, decaying geometrically. The diagonal of (I − ρW)⁻¹, the own-tract amplification, is only slightly above 1 (about 1.05 to 1.07 on this lattice).
  • At ρ = 0.8 or above, the entire 7×7 grid lights up. Every tract is meaningfully affected by every other tract's shock. This is the regime where ignoring spatial dependence is most costly.
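  • The multiplier amplification 1 / (1 − ρ) is the row sum of (I − ρW)⁻¹ for a row-standardised W: it measures the total (direct plus indirect) amplification, not the diagonal. At ρ = 0.43 it is ≈ 1.75. The diagonal is much smaller: the post's §5.1 estat impact output for INC reports a direct effect of −1.10 against a bare coefficient of −1.03, so the average diagonal is about 1.07.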

SAR / SEM / SDM Simulator — generate data, estimate it back

Below, we simulate fake Columbus-like data on a 7×7 lattice (n = 49) with known true parameters ρ (SAR-style feedback), λ (SEM-style error correlation), and θ (SLX-style covariate spillover). We then estimate ρ̂ and λ̂ back from the simulated sample to see what we recover. Slide ρ, λ, or θ and watch Moran's I shift — and watch the estimated parameters track (or fail to track) the truth.

ρ (spatial lag): set 0 for OLS data, 0.40 for SDM-like data, 0.80 for strong global feedback.
λ (spatial error): SEM-style nuisance. The post's SEM estimates λ̂ = 0.56; try 0.5.
θW·INC (spatial lag of INC): the SLX channel, a local spillover with no feedback. The post's SDEM estimates θ̂W·INC = −1.20.
σ (noise): larger σ means more noise. The Columbus regression Root MSE is ≈ 10.
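A minimal sketch of a data-generating process with all three channels switched on, written GNS-style so that each slider maps to one parameter. The single regressor x, its distribution N(15, 5²), the intercept 50 and the other defaults are assumptions, not the app's actual settings; `simulate_gns` and `rook_weights` are hypothetical names.

```python
import numpy as np


def rook_weights(side: int) -> np.ndarray:
    """Row-standardised rook-contiguity W for a side x side grid (hypothetical helper)."""
    n = side * side
    C = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, side)
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < side and 0 <= cc < side:
                C[i, rr * side + cc] = 1.0
    return C / C.sum(axis=1, keepdims=True)


def simulate_gns(W, rho, lam, theta, beta=-1.6, alpha=50.0, sigma=10.0, seed=0):
    """One draw from y = rho W y + alpha + beta x + theta W x + u, with u = lam W u + eps.

    Solved as y = (I - rho W)^-1 (alpha + beta x + theta W x + u), u = (I - lam W)^-1 eps.
    All defaults are hypothetical.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    I = np.eye(n)
    x = rng.normal(15.0, 5.0, size=n)             # income-like regressor (assumed scale)
    eps = rng.normal(0.0, sigma, size=n)          # sigma plays the role of the noise slider
    u = np.linalg.solve(I - lam * W, eps)         # SEM channel: spatially correlated errors
    y = np.linalg.solve(I - rho * W, alpha + beta * x + theta * (W @ x) + u)  # SAR + SLX
    return x, y


W = rook_weights(7)
x, y = simulate_gns(W, rho=0.4, lam=0.0, theta=-1.2)
```

Setting ρ = λ = θ = 0 collapses the draw to the OLS model y = α + βx + ε.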

Truth (what we simulated): true ρ, true λ, true θW·INC, and Moran's I of y.

Estimates (what we recovered): ρ̂ (SAR-like), λ̂ (SEM-like), OLS β̂INC, and SDM β̂INC.

ρ̂ comes from a quick concentrated likelihood evaluated on a grid; λ̂ comes from the residual Moran-statistic mapping. Both are pedagogical and less accurate than Stata's spregress with the ml option.
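A minimal sketch of the grid-search idea behind ρ̂ for the SAR model, assuming Gaussian errors. At each grid value of ρ, β and σ² are profiled out by OLS of (I − ρW)y on X, which leaves the concentrated log-likelihood −(n/2)·log σ̂²(ρ) + log|I − ρW| (constants dropped). This is not Stata's spregress implementation, and `sar_rho_grid` and `rook_weights` are hypothetical names.

```python
import numpy as np


def rook_weights(side: int) -> np.ndarray:
    """Row-standardised rook-contiguity W for a side x side grid (hypothetical helper)."""
    n = side * side
    C = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, side)
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < side and 0 <= cc < side:
                C[i, rr * side + cc] = 1.0
    return C / C.sum(axis=1, keepdims=True)


def sar_rho_grid(y, X, W, grid=None):
    """Grid-search ML estimate of rho in y = rho W y + X beta + eps (concentrated likelihood)."""
    if grid is None:
        grid = np.linspace(-0.95, 0.95, 191)      # step 0.01
    n = y.size
    I = np.eye(n)
    Wy = W @ y
    lls = []
    for rho in grid:
        Ay = y - rho * Wy                          # (I - rho W) y
        beta, *_ = np.linalg.lstsq(X, Ay, rcond=None)
        e = Ay - X @ beta
        _, logdet = np.linalg.slogdet(I - rho * W) # Jacobian term log|I - rho W|
        lls.append(-0.5 * n * np.log(e @ e / n) + logdet)
    return float(grid[int(np.argmax(lls))])


# Usage on simulated SAR data with true rho = 0.5 (hypothetical coefficients).
rng = np.random.default_rng(1)
W = rook_weights(7)
X = np.column_stack([np.ones(49), rng.normal(size=49)])
y = np.linalg.solve(np.eye(49) - 0.5 * W, X @ np.array([1.0, 2.0]) + rng.normal(size=49))
print(sar_rho_grid(y, X, W))
```

With n = 49, exact log-determinants via slogdet are cheap. Much larger problems usually rely on eigenvalue-based or sparse-matrix shortcuts for the same term.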

What to look for

  • Set ρ = 0 and λ = 0 (and θ = 0): Moran's I should hover near zero. OLS is correct and no spatial model is needed.
  • Set ρ = 0.6, λ = 0: Strong SAR data. ρ̂ should track ρ. OLS β̂INC looks biased compared to the SDM β̂INC.
  • Set ρ = 0, λ = 0.6: Strong SEM data. ρ̂ from a misspecified SAR will be biased (it will catch some of λ). This is exactly the LM-test motivation in §4.3 of the post.
  • Set ρ = 0.4, λ = 0.4 simultaneously: The GNS regime. Both parameters become very hard to identify (large standard errors), reproducing the post's overparameterization warning in §8.3.
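The LM-test motivation can be explored on simulated data. A minimal sketch of the classic LM-error and LM-lag statistics computed from OLS residuals, in their textbook (Anselin-style) forms, each asymptotically χ²(1) under its null. It does not reproduce the post's Stata output on Columbus (LM = 5.33 vs 3.40), and `lm_tests` and `rook_weights` are hypothetical names.

```python
import numpy as np


def rook_weights(side: int) -> np.ndarray:
    """Row-standardised rook-contiguity W for a side x side grid (hypothetical helper)."""
    n = side * side
    C = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, side)
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < side and 0 <= cc < side:
                C[i, rr * side + cc] = 1.0
    return C / C.sum(axis=1, keepdims=True)


def lm_tests(y, X, W):
    """LM-error (null lambda = 0) and LM-lag (null rho = 0) statistics from an OLS fit."""
    n = y.size
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = e @ e / n                                  # ML residual variance
    T = np.trace(W.T @ W + W @ W)
    lm_err = (e @ W @ e / s2) ** 2 / T
    WXb = W @ (X @ beta)
    M = np.eye(n) - X @ np.linalg.pinv(X)           # residual-maker: annihilates columns of X
    lm_lag = (e @ W @ y / s2) ** 2 / (WXb @ M @ WXb / s2 + T)
    return lm_err, lm_lag


# Usage: SEM data (rho = 0, lambda = 0.6) with hypothetical coefficients.
rng = np.random.default_rng(2)
W = rook_weights(7)
X = np.column_stack([np.ones(49), rng.normal(size=49)])
u = np.linalg.solve(np.eye(49) - 0.6 * W, rng.normal(size=49))
y = X @ np.array([1.0, 2.0]) + u
print(lm_tests(y, X, W))
```

Repeating the draw with different seeds shows how often each statistic exceeds the χ²(1) 5% critical value of 3.84 when the truth is SEM.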

The post's headline numbers — interactively

These estimates come straight from the post: §5 (SAR, SEM), §6 (SLX, SDM), §8 (SDEM, SAC, GNS), and the §9.2 comparison table. The y-axis lists the eight models (OLS plus seven spatial specifications). The facets are the three impact components: direct, indirect (spillover), and total. Toggle which models and which regressor (INC or HOVAL) to display.
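For reference, the eight specifications can be written in their usual textbook form using the glossary's symbols, where X includes the constant and WX conventionally excludes it. This is a sketch of the standard taxonomy; the post's exact Stata specification (for example, which regressors enter WX) may differ in detail.

```latex
\begin{aligned}
\text{OLS:}  \quad & y = X\beta + \varepsilon \\
\text{SAR:}  \quad & y = \rho W y + X\beta + \varepsilon \\
\text{SEM:}  \quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon \\
\text{SLX:}  \quad & y = X\beta + W X \theta + \varepsilon \\
\text{SDM:}  \quad & y = \rho W y + X\beta + W X \theta + \varepsilon \\
\text{SDEM:} \quad & y = X\beta + W X \theta + u, \qquad u = \lambda W u + \varepsilon \\
\text{SAC:}  \quad & y = \rho W y + X\beta + u, \qquad u = \lambda W u + \varepsilon \\
\text{GNS:}  \quad & y = \rho W y + X\beta + W X \theta + u, \qquad u = \lambda W u + \varepsilon \\[4pt]
\text{Impacts of } x_k\text{:} \quad & \frac{\partial y}{\partial x_k^{\top}} = (I - \rho W)^{-1}\left(\beta_k I + \theta_k W\right)
\end{aligned}
```

Set ρ = 0 for models without Wy and θ_k = 0 for models without WX. Direct effects average the diagonal of the impact matrix, and indirect effects average its off-diagonal row sums. This is why OLS and SEM have zero indirect effects: with ρ = 0 and θ_k = 0 the matrix is just β_k I.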

What to look for

  • Toggle OLS only: watch the indirect-effect facet collapse to zero. OLS has no spillover channel by construction. SEM is similar: zero indirect by construction (the error structure does not propagate to outcomes).
  • Toggle SLX, SDM, SDEM together: for INC, all three give large negative indirect effects (−1.20 to −1.50). For HOVAL, all three give small positive insignificant indirect effects. This agreement across non-nested models is the §9.1 robustness argument.
  • Switch the regressor to HOVAL: notice that SAR forces a negative indirect effect (−0.20), proportional to its direct effect. The SLX, SDM, and SDEM give positive indirect effects. The SAR's proportional-ratio constraint is doing harm here.
  • Compare total effects: for INC, OLS gives −1.60, while SDM and SDEM give −2.52 and −2.26. The spatial totals are 41–58% larger in magnitude, which is the OLS understatement the post highlights as the headline finding.


Why does SAR force a constant indirect/direct ratio?

In SAR, all spillovers come from the spatial multiplier (I − ρW)⁻¹. That matrix is the same for every regressor, and each regressor's bare coefficient only scales it, so the coefficient cancels out of the ratio of off-diagonal to diagonal terms: the indirect/direct ratio is identical for INC and HOVAL. In Columbus, this forces HOVAL's indirect effect to be ≈ 0.69 × HOVAL's direct effect, with the same sign (negative). But the SLX, SDM, and SDEM all estimate the HOVAL indirect effect freely, and find it small and positive. The data prefer the unrestricted ratio.
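The constant-ratio argument can be checked numerically with averaged impacts, the same kind of summary the post's estat impact tables report. A minimal sketch on a 7×7 rook lattice: the two regressors and their coefficients are hypothetical (chosen so that x2 loosely resembles HOVAL's pattern), not the post's estimates, and `average_impacts` and `rook_weights` are illustrative names.

```python
import numpy as np


def rook_weights(side: int) -> np.ndarray:
    """Row-standardised rook-contiguity W for a side x side grid (hypothetical helper)."""
    n = side * side
    C = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, side)
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < side and 0 <= cc < side:
                C[i, rr * side + cc] = 1.0
    return C / C.sum(axis=1, keepdims=True)


def average_impacts(W, rho, beta, theta=0.0):
    """Average direct, indirect and total impacts of one regressor.

    Impact matrix S = (I - rho W)^-1 (beta I + theta W): SAR if theta = 0, SDM otherwise.
    Direct = mean diagonal of S, total = mean row sum, indirect = total - direct.
    """
    n = W.shape[0]
    S = np.linalg.solve(np.eye(n) - rho * W, beta * np.eye(n) + theta * W)
    direct = np.trace(S) / n
    total = S.sum() / n
    return direct, total - direct, total


W = rook_weights(7)
rho = 0.43
# Hypothetical coefficients for two regressors (not the post's estimates).
for name, beta, theta in (("x1", -1.0, -0.8), ("x2", -0.3, 0.2)):
    d_sar, i_sar, _ = average_impacts(W, rho, beta)          # SAR: theta forced to 0
    d_sdm, i_sdm, _ = average_impacts(W, rho, beta, theta)   # SDM: theta estimated freely
    print(f"{name}: SAR indirect/direct = {i_sar / d_sar:.3f}, "
          f"SDM indirect/direct = {i_sdm / d_sdm:.3f}")
```

Under SAR both regressors share one ratio. Under SDM, x2 gets a negative direct effect but a positive indirect effect, a sign pattern SAR cannot produce.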

Connecting back to Tab 3

The DGP simulator in Tab 3 generates data with known ρ, λ, θ. When you set θ ≠ 0, an OLS estimator hides the spillover entirely. When you set ρ ≠ 0, a SAR recovers the spillover but forces it to be proportional across regressors. The forest plot above is the real-data counterpart: every row shows what each estimator finds when the truth is unknown but spatial.

  • INC total effect: OLS = −1.60; SAR = −1.86; SDM = −2.52; SDEM = −2.26. Spread of 0.92 across models.
  • INC direct effect: all spatial models cluster between −0.94 and −1.10. The direct effect is robust to specification choice.
  • INC indirect effect: 0 (OLS, SEM), −0.76 (SAR), −1.20 to −1.50 (SLX, SDM, SDEM). Identifying the indirect effect is the value-added of spatial econometrics.