Institutions, Settler Mortality, and IV

Why instrument institutions with settler mortality?

Cross-country plots show that rich countries have better property-rights institutions, but the slope cannot prove that institutions cause prosperity: reverse causality, omitted variables, and measurement error all bias naive OLS. Acemoglu, Johnson and Robinson (2001) propose a famous instrument: the mortality rate of European settlers circa 1500–1900. Mortality shaped which colonies became extractive rather than settler colonies, and thus what kind of institutions those countries inherited, but is assumed not to affect 1995 GDP except through institutions. That untestable assumption is the exclusion restriction.

On the AJR 64-country sample, IV gives β̂ = 0.944, which is 81% larger than the OLS slope of 0.522. The Wu-Hausman test rejects OLS at p < 0.0001. The first-stage F is 16.85: borderline strong, just above the Stock-Yogo 10% critical value of 16.38. This app lets you turn the dials. In four tabs you will inspect the identification DAG; slide instrument strength and watch IV move from precise to wild; race OLS against IV under simulated confounding; and toggle the post's 14 specifications side-by-side on a single forest plot.

The IV identification strategy at a glance

The diagram below is the heart of every IV paper. The instrument Z (settler mortality) is allowed to affect the endogenous regressor X (institutions); X is allowed to affect the outcome Y (log GDP); but Z must not have a direct arrow into Y. Unobserved confounders U can freely contaminate X and Y — that is the whole reason OLS is biased. The orange dot is a particle traveling the allowed Z → X → Y path. The dashed red arrow shows the path forbidden by the exclusion restriction.

Tab 2

Instrument Strength

Drag the first-stage slope (the effect of Z on X) from weak to strong. Watch the IV slope's confidence band collapse, and notice when OLS bias still dominates.

Tab 3

OLS vs IV Simulator

Confound treatment and outcome with shared unobservables. Run 100 simulations to compare the OLS and IV sampling distributions side-by-side.

Tab 4

Forest Plot — 14 specifications

OLS baseline, IV with logem4, IV with colonial/legal/religious/geographic/health controls, and IV with alternative instruments — all on one axis with 95% CIs.

Glossary (open a card if a term is unfamiliar)

Endogeneity

A regressor is endogenous when it is correlated with the error term. In our context, $\textit{avexpr}$ is endogenous: it is jointly determined with GDP, shares unobserved confounders with GDP, and is measured imperfectly. The Wu-Hausman test rejects OLS consistency at $p < 0.0001$ ($F = 24.22$).

Instrumental variable (Z)

A variable that affects the outcome $Y$ only through the endogenous regressor $X$. Three conditions: relevance (Z and X are correlated), exclusion (Z has no direct arrow into Y), exogeneity (Z is uncorrelated with the error term). Analogy: coin-flip eligibility for a drug trial, where the flip influences recovery only through whether the patient took the drug.

Two-Stage Least Squares (2SLS)

Stage 1: regress endogenous $X$ on instrument $Z$ to get $\hat X$. Stage 2: regress $Y$ on $\hat X$ to get the IV coefficient. Analogy: filtering muddy water through a sieve. The sieve catches the confounding; what passes through is the clean signal.

First stage and reduced form

The first stage regresses $X$ on $Z$. The reduced form regresses $Y$ directly on $Z$. With one instrument, the 2SLS coefficient equals the ratio $\hat\beta_{IV} = \hat\beta_{RF} / \hat\beta_{FS}$. In the post: $-0.573 / -0.607 = 0.944$.
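The ratio identity can be checked numerically. The sketch below is a minimal illustration on simulated data, not the post's code: the data-generating process, the seed, and the helper names `cov` and `slope` are assumptions made for the example. Only the two coefficients in the last line come from the text.

```python
import random
from statistics import fmean

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    return cov(x, y) / cov(x, x)

# Hypothetical DGP: U confounds X and Y; Z shifts X only.
rng = random.Random(0)
n, beta, pi, gamma = 2000, 0.94, 0.6, 0.6
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [pi * zi + gamma * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta * xi + gamma * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

fs = slope(z, x)                # first stage: X on Z
rf = slope(z, y)                # reduced form: Y on Z
x_hat = [fs * zi for zi in z]   # fitted X (intercept omitted; slopes are unaffected)
beta_2sls = slope(x_hat, y)     # second stage: Y on fitted X
beta_ratio = rf / fs            # reduced form divided by first stage

print(f"2SLS = {beta_2sls:.4f}, RF/FS = {beta_ratio:.4f}")

# The same ratio using the coefficients quoted in the text.
ajr_ratio = round(-0.573 / -0.607, 3)   # 0.944
```

The two-stage estimate and the ratio agree up to floating-point error because the second-stage regressor is just a rescaled copy of $Z$.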

Weak instrument

An instrument that only weakly predicts the endogenous regressor. Conventional rule (Staiger-Stock 1997): first-stage $F > 10$. Stock-Yogo (2005) tighten this to $F > 16.38$ for 10% maximal IV size distortion. Weak instruments produce IV estimates with huge SEs and substantial finite-sample bias — a radio antenna picking up mostly static.
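With a single instrument, the first-stage F is the squared t-statistic on the instrument. The following is a minimal sketch, assuming a homoskedastic first stage and a made-up data-generating process; the names `first_stage_f` and `simulate_first_stage` are hypothetical and do not come from the post.

```python
import random
from statistics import fmean

def first_stage_f(z, x):
    """F-statistic for H0: pi = 0 in x = a + pi*z + v (one instrument, so F = t^2)."""
    n = len(z)
    mz, mx = fmean(z), fmean(x)
    szz = sum((zi - mz) ** 2 for zi in z)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    pi_hat = szx / szz
    rss = sum((xi - mx - pi_hat * (zi - mz)) ** 2 for zi, xi in zip(z, x))
    sigma2 = rss / (n - 2)              # residual variance under homoskedasticity
    return pi_hat ** 2 * szz / sigma2   # squared t-statistic on pi_hat

def simulate_first_stage(pi, n, seed):
    """Hypothetical first stage: x = pi*z + noise, with standard normal z and noise."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi * zi + rng.gauss(0, 1) for zi in z]
    return z, x

STOCK_YOGO_10 = 16.38  # threshold quoted in the text

f_strong = first_stage_f(*simulate_first_stage(pi=1.0, n=500, seed=1))
f_weak = first_stage_f(*simulate_first_stage(pi=0.02, n=500, seed=1))
print(f"strong instrument: F = {f_strong:.1f}")
print(f"weak instrument:   F = {f_weak:.1f}")
```

Both calls reuse the same seed, so the only thing that differs between them is the first-stage slope.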

LATE vs ATE

Under heterogeneous effects, 2SLS does not identify the population ATE. Imbens-Angrist (1994): 2SLS identifies the Local Average Treatment Effect — the effect for "compliers", units whose treatment would change with the instrument. Our 0.944 applies to countries whose institutions would have been different had settler mortality been different — not to never-colonized countries.

Exclusion restriction

The untestable heart of every IV: the instrument $Z$ affects $Y$ only through $X$. If settler mortality also directly affects modern GDP via, say, a malaria channel, the exclusion restriction fails. The dashed red arrow in the DAG above.
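To see what a violation would cost, suppose (purely as an illustration) that the instrument had a direct effect $\lambda$ on the outcome. The symbol $\lambda$ and the linear form are assumptions, not part of the post. With first stage $X = \pi Z + v$ and $\operatorname{Cov}(Z, v) = \operatorname{Cov}(Z, \varepsilon) = 0$:

```latex
\begin{aligned}
Y &= \beta X + \lambda Z + \varepsilon, \qquad X = \pi Z + v,\\
\operatorname{plim}\,\hat\beta_{IV}
  &= \frac{\operatorname{Cov}(Z,Y)}{\operatorname{Cov}(Z,X)}
   = \frac{(\beta\pi + \lambda)\operatorname{Var}(Z)}{\pi\operatorname{Var}(Z)}
   = \beta + \frac{\lambda}{\pi}.
\end{aligned}
```

The bias $\lambda/\pi$ does not shrink with sample size, and a weaker first stage (smaller $|\pi|$) magnifies any given direct effect.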

Hansen J / Sargan test

With more instruments than endogenous regressors, the joint exogeneity of the instrument set is partially testable. If two instruments disagree on the causal effect, the test rejects. In Panel C of Table 8, Hansen J p-values of 0.18–0.79 across five alternative instrument pairs uniformly fail to reject, which is modest support for AJR's exclusion restriction.

Wu-Hausman endogeneity test

A formal test of whether OLS is consistent. Compares OLS and IV estimates: if the gap is large relative to standard errors, OLS is rejected as biased. $F = 24.22$, $p < 0.0001$ in the AJR main spec: the data say OLS is biased, so IV is empirically warranted.
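One common way to compute this test with a single endogenous regressor is the regression-based (control-function) form. Whether the post uses exactly this variant is an assumption. With $\hat v$ denoting the first-stage residual:

```latex
\begin{aligned}
\text{first stage:}\quad & X = \pi_0 + \pi Z + v, \qquad \hat v = X - \hat\pi_0 - \hat\pi Z,\\
\text{augmented regression:}\quad & Y = \alpha + \beta X + \delta\,\hat v + e,\\
\text{null hypothesis:}\quad & H_0:\ \delta = 0 \quad (X \text{ is exogenous, so OLS is consistent}).
\end{aligned}
```

With one endogenous regressor the F-statistic is the square of the t-statistic on $\hat\delta$, and the coefficient on $X$ in the augmented regression equals the 2SLS estimate.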

Instrument strength — when IV works and when it breaks

The engine of IV is the first-stage relationship $X = \pi Z + v$. If $\pi$ is large (strong instrument), Z extracts plenty of clean variation in X and the IV estimator is precise. If $\pi$ is small (weak instrument), the same noise gets amplified by the $1/\pi$ rescaling and the IV estimate can land further from the truth than OLS in finite samples. Drag the instrument strength slider and watch the teal IV line move from tightly tracking the true β to wildly swinging. The orange OLS line, by contrast, stays stubbornly biased in the same place: instrument strength does not help OLS.
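The $1/\pi$ rescaling can be made explicit. The derivation below assumes a linear outcome equation $Y = \beta X + \varepsilon$ with intercepts suppressed, where $\varepsilon$ absorbs the unobserved confounder; hats on Cov and Var denote sample moments, and $\hat\pi = \widehat{\operatorname{Cov}}(Z,X) / \widehat{\operatorname{Var}}(Z)$ is the estimated first-stage slope.

```latex
\begin{aligned}
\hat\beta_{IV}
  = \frac{\widehat{\operatorname{Cov}}(Z,Y)}{\widehat{\operatorname{Cov}}(Z,X)}
  = \beta + \frac{\widehat{\operatorname{Cov}}(Z,\varepsilon)}{\widehat{\operatorname{Cov}}(Z,X)}
  = \beta + \frac{1}{\hat\pi}\cdot
    \frac{\widehat{\operatorname{Cov}}(Z,\varepsilon)}{\widehat{\operatorname{Var}}(Z)} .
\end{aligned}
```

The last fraction is sampling noise: it averages out under exogeneity but is nonzero in any finite sample, and dividing it by a small $\hat\pi$ inflates it.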

Sample size n = 200

More countries → tighter OLS and tighter IV. The AJR sample has n = 64.

Instrument strength π = 0.80

First-stage slope from regressing $X$ on $Z$. AJR's estimate is $\hat\pi = -0.607$, so $|\hat\pi| \approx 0.6$.

Confounding strength γ = 0.60

How much the unobserved $U$ pushes both $X$ and $Y$. γ = 0 → OLS is unbiased; γ > 0 → OLS is biased.

True β = 0.94

The causal effect of $X$ on $Y$ we are trying to recover. Default = 0.94 (AJR's IV estimate).

Live readouts: OLS β̂ with SE; IV β̂ with SE; first-stage F (Stock-Yogo 10% critical value = 16.38); true β = 0.94, held fixed for comparison.

What to look for

OLS is biased whenever γ > 0. Slide γ from 0 to 1.5: the orange OLS line drifts well above the dashed true-β line, while the teal IV line stays roughly centred. The gap between IV and OLS is the diagnostic that something is contaminating OLS.
Strong instruments give precise IV. With $\pi = 1.5$ (very strong) the IV slope hugs the truth tightly. With $\pi = 0.1$ (very weak), the IV slope swings wildly and the first-stage F drops below 10 — read the SE, not the point estimate.
OLS is unaffected by π. The OLS line ignores the instrument entirely. Improving your instrument doesn't help OLS — only switching to IV does.
The "bias-precision trade-off" of IV. When γ is large but π is weak, IV may have more total error (bias² + variance) than OLS. This is the AJR caveat: in the Table 7 health-channel specs, first-stage F drops below 5 and the IV CIs widen to uselessness.
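The effect of instrument strength on IV precision can be reproduced offline. This is a minimal sketch under an assumed data-generating process, which may differ from the one the app uses; `iv_fit` and `draw` are hypothetical helper names, and the standard error is the conventional homoskedastic one for a single instrument.

```python
import math
import random
from statistics import fmean

def iv_fit(z, x, y):
    """Just-identified IV slope and its conventional (homoskedastic) standard error."""
    n = len(z)
    mz, mx, my = fmean(z), fmean(x), fmean(y)
    szz = sum((zi - mz) ** 2 for zi in z)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    b = szy / szx
    # Residuals use the original x, not the fitted x.
    rss = sum(((yi - my) - b * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) * szz) / abs(szx)
    return b, se

def draw(pi, gamma=0.6, beta=0.94, n=200, seed=0):
    """Hypothetical DGP: x = pi*z + gamma*u + v, y = beta*x + gamma*u + e."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi, ui = rng.gauss(0, 1), rng.gauss(0, 1)
        xi = pi * zi + gamma * ui + rng.gauss(0, 1)
        yi = beta * xi + gamma * ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y

# Same seed for every pi, so only the instrument strength changes.
results = {pi: iv_fit(*draw(pi)) for pi in (0.1, 0.4, 0.8, 1.5)}
for pi, (b, se) in results.items():
    print(f"pi = {pi:.1f}   IV slope = {b:7.2f}   SE = {se:.2f}")
```

Reusing the seed mirrors dragging a single slider: the shocks stay fixed while the first stage gets stronger, so changes in the standard error come from π alone.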

OLS vs IV — bias and variance over many simulations

A single estimate is noisy. The deeper question is: on average, which method finds the truth? Draw the same DGP 100 times with fresh random shocks, estimate β by both OLS and IV each time, and compare the two sampling distributions. The orange OLS histogram clusters around a biased centre (its mean sits away from the truth by an amount that grows with γ). The teal IV histogram clusters near the true β: wider, but right on target. Click "Run 100 simulations" below.

Sample size n = 120

Capped at 300 so 100 sims finish under a second.

Instrument strength π = 0.70

Higher π → tighter IV histogram.

Confounding γ = 0.80

Higher γ → bigger OLS bias.

True β = 0.94

Default = 0.94, the AJR main IV estimate.

OLS

Regress $Y$ on $X$ directly. Ignores the instrument.

Readouts: single β̂, mean β̂ over 100 sims, sd(β̂), bias.

IV (2SLS)

Use $\hat X = \hat\pi Z$ from the first stage as a clean stand-in for $X$.

Readouts: single β̂, mean β̂ over 100 sims, sd(β̂), bias.

Run the experiment

100 fresh datasets with the parameters above. Each gives one OLS β̂ and one IV β̂. The histogram below stacks them.

What to look for

OLS is biased on average. Its histogram centre drifts above the dashed true-β line by an amount that grows with γ. The bias does not vanish with more sims; it is systematic.
IV is approximately unbiased. Its histogram centres on the truth (subject to a small finite-sample bias that shrinks as $n \cdot \pi^2$ grows).
IV pays for unbiasedness with variance. The teal histogram is wider than the orange one. When π is small, the variance penalty can be huge — the histogram spreads across the chart.
The post's AJR result lives at the right edge of this trade-off. With $n = 64$ and $\pi \approx 0.6$, IV is roughly unbiased but the SE is large (0.176 vs OLS's 0.05). The 81% gap between OLS (0.522) and IV (0.944) is interpretable only because the instrument is borderline strong.
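The 100-draw experiment can be sketched in a few lines. The data-generating process below is an assumption chosen to match the sliders' meaning (a shared confounder $U$ scaled by γ, an instrument scaled by π); the app's own DGP may scale things differently, so magnitudes need not match its readouts. The names `one_draw` and `monte_carlo` are hypothetical.

```python
import random
from statistics import fmean, stdev

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def one_draw(rng, n, pi, gamma, beta):
    """One simulated dataset under the assumed DGP; returns (OLS slope, IV slope)."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi * zi + gamma * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi + gamma * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    ols = cov(x, y) / cov(x, x)   # regress Y on X directly
    iv = cov(z, y) / cov(z, x)    # reduced form over first stage
    return ols, iv

def monte_carlo(sims=100, n=120, pi=0.7, gamma=0.8, beta=0.94, seed=42):
    """Repeat the draw `sims` times and summarise both sampling distributions."""
    rng = random.Random(seed)
    draws = [one_draw(rng, n, pi, gamma, beta) for _ in range(sims)]
    ols, iv = zip(*draws)
    return {
        "ols_mean": fmean(ols), "ols_sd": stdev(ols), "ols_bias": fmean(ols) - beta,
        "iv_mean": fmean(iv), "iv_sd": stdev(iv), "iv_bias": fmean(iv) - beta,
    }

summary = monte_carlo()
for name, value in summary.items():
    print(f"{name:9s} {value: .3f}")
```

Calling `monte_carlo(pi=0.2)` or `monte_carlo(gamma=0.0)` is a quick way to see the IV spread widen or the OLS bias disappear under these assumptions.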
