Why IV? Why settler mortality?
A simple regression of log GDP on a country's institutional quality says that better institutions are associated with higher income — but it cannot tell you which way the arrow points. Maybe rich countries can afford better courts. Maybe geography or culture drives both. The naive slope is correlation, not causation.
Acemoglu, Johnson and Robinson (2001) solve the problem with an instrumental variable: the mortality rate of European settlers during 1500–1900. Their argument: places where Europeans died en masse became extractive colonies; places where they survived became settler colonies with property-rights protections that persist today. Mortality 300 years ago cannot directly determine 1995 GDP — but it can determine 1995 institutions, which determine GDP. That is the IV story in one sentence.
The IV causal diagram
Z is the instrument (settler mortality, logem4).
X is the endogenous regressor (modern institutions, avexpr).
Y is the outcome (log GDP, logpgp95).
U is the bundle of unobserved confounders (geography,
culture, human capital). The dashed red arrow is the
exclusion restriction: it is the assumption — not a tested
fact — that the instrument affects the outcome only through the
endogenous regressor.
First-Stage Lab
Drag instrument-strength and sample-size sliders. Watch the scatter, the first-stage F-statistic, and the IV confidence interval tighten or explode in real time.
OLS vs IV Showdown
Same data, two estimators. Run 100 simulations to see whether OLS recovers the truth, whether IV does, and how the gap depends on instrument strength.
Forest Plot
Twelve specifications from the post — OLS plus eleven IV variants. Toggle which families to show. Hover any point for SE, CI, and first-stage F.
Three takeaways the app foregrounds
1. IV > OLS by 81%
The 2SLS coefficient (0.944) is 81% larger than the OLS estimate (0.522). Classical measurement error in the institutional index attenuates OLS toward zero; IV de-noises it and reveals a steeper causal slope.
2. Weak instruments break IV
The first-stage F = 16.32 is borderline. Drop it below 10 (e.g. by adding geography controls) and confidence intervals balloon while point estimates wander. Strong instruments are the single most important diagnostic.
3. 2SLS = RF ÷ FS
The whole IV machinery is one division: reduced-form slope (−0.573) divided by first-stage slope (−0.607) gives 0.944 — exactly the 2SLS estimate. IV is just this ratio under the hood.
Glossary (open a card if a term is unfamiliar)
Endogenous regressor
avexpr.Instrument (Z)
logem4.First stage
Reduced form
2SLS
ivreg2 does both.Weak instrument
Exclusion restriction
LATE vs ATE
First-Stage Lab — how strong is the instrument?
Generate a fresh IV dataset under your chosen parameters. The left chart is the first stage (X on Z); the right is the reduced form (Y on Z). The bigger the first-stage slope, the stronger the instrument; the closer the ratio of slopes to the true effect, the better IV is doing.
What to look for
- Slide π toward 0: the left scatter flattens, F drops, and the 2SLS β̂ goes wild — the ratio RF/FS is unstable when the denominator is near zero.
- Slide γ up: the orange OLS estimate drifts away from the true β, while the teal IV estimate stays roughly on track. That is the whole point of IV.
- n = 64 (AJR's sample) produces F ≈ 10–20 for π = −0.6 — exactly the borderline-strong regime the post discusses. Try n = 200 and the F becomes comfortable.
- The 2SLS = RF/FS identity: divide the two right-hand statistics; you get the same number as IV β̂. The IV estimator is one division.
OLS vs IV Showdown — 100 simulations
A single dataset is noisy. To see whether OLS is systematically biased and IV is systematically unbiased, we need to repeat the experiment. Below, pick parameters and click Run 100 simulations: each simulation draws a fresh dataset under the same DGP, estimates both OLS and IV, and stacks the histograms.
OLS on the same data
IV (2SLS) on the same data
Distribution of β̂ across 100 simulated datasets
Why does this happen?
- OLS is biased by U. The orange histogram drifts away from the true β by an amount that scales with γ. Even after 100 datasets, OLS averages to the wrong number.
- IV is unbiased on average. The teal histogram centres on β. Its mean is correct even though any single estimate can be far off.
- IV pays for unbiasedness with variance. The teal histogram is wider than the orange one. That is the IV bargain: trade bias for variance. When the instrument is strong (large |π|), the trade-off is favourable; when it is weak, IV's variance explodes.
- Slide π toward 0 and rerun: the teal histogram becomes huge and skewed — the classic weak-instrument pathology that Tables 6–7 of the post warn about.
The post's forest plot — interactively
These twelve estimates come from the post's tables 2–8. Each is the
coefficient on avexpr from one regression spec; the orange
bar at the top is OLS and the eleven below are IV variants. Toggle which
families to show; hover any point for SE, CI, and the first-stage F.
Spec families
Estimators
What to look for
- The orange OLS bar sits at 0.522 with the tightest CI in the chart. Every IV variant has a wider CI but a larger point estimate — the post's headline that IV > OLS by 81% is visible in one glance.
- Health-control specs collapse toward OLS (0.55–0.69). The post (§9) flags this as the most uncomfortable robustness check: bad control or genuine exclusion violation? The data can't tell you, but the visual shrinkage is real.
- Hover any bar for the first-stage F. Notice that "+ resources" and "+ malaria" have F < 5 — their wide CIs are not a coincidence, they are weak-instrument pathology.
- Toggle "+ geography" off and the remaining IV variants all sit in the 0.55–1.08 range with F > 11 — the cleaner, more defensible estimates.
Connecting back to Tab 3
The OLS-vs-IV gap you watched grow with γ in the simulation is exactly the gap you see here on real data:
- OLS (orange): β̂ = 0.522, CI = [0.424, 0.620] — tight but biased.
- IV main (steel): β̂ = 0.944, CI = [0.599, 1.289] — wider but unbiased under AJR's exclusion restriction.
- The gap is 0.422 in log-GDP units — a country at
avexpr= 8 is predicted to earn 33% more under IV than under OLS, for the same institutional gap.
That is the post's punchline: institutional reform is roughly twice as valuable as naive cross-country regressions suggest — provided you trust the exclusion restriction.