IV in Stata — Interactive Lab

A pedagogical companion to "Do Institutions Cause Prosperity? An IV Tutorial in Stata"

Why IV? Why settler mortality?

A simple regression of log GDP on a country's institutional quality says that better institutions are associated with higher income — but it cannot tell you which way the arrow points. Maybe rich countries can afford better courts. Maybe geography or culture drives both. The naive slope is correlation, not causation.

Acemoglu, Johnson and Robinson (2001) address the problem with an instrumental variable: the mortality rate of European settlers during 1500–1900. Their argument: places where Europeans died en masse became extractive colonies; places where they survived became settler colonies with property-rights protections that persist today. The identifying assumption is that mortality centuries ago does not directly determine 1995 GDP, but it does shape 1995 institutions, which in turn determine GDP. That is the IV story in one sentence.

The IV causal diagram

Z is the instrument (settler mortality, logem4). X is the endogenous regressor (modern institutions, avexpr). Y is the outcome (log GDP, logpgp95). U is the bundle of unobserved confounders (geography, culture, human capital). The dashed red arrow is the exclusion restriction: it is the assumption — not a tested fact — that the instrument affects the outcome only through the endogenous regressor.
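The diagram can also be written as equations. What follows is a sketch only: it assumes linear relationships, drops intercepts, and borrows the lab's symbols π (first-stage slope), β (causal effect) and γ (confounder strength). The error terms v and ε, and the choice to let U load on both equations with the same γ, are illustrative assumptions rather than anything stated in the post.

```latex
\begin{aligned}
X &= \pi Z + \gamma U + v, \\
Y &= \beta X + \gamma U + \varepsilon, \\
&Z,\ U,\ v,\ \varepsilon \ \text{mutually uncorrelated}, \quad \pi \neq 0 . \\[6pt]
\frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, X)}
  &= \frac{\beta \pi \operatorname{Var}(Z)}{\pi \operatorname{Var}(Z)} = \beta, \\[6pt]
\frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
  &= \beta + \frac{\gamma^{2} \operatorname{Var}(U)}{\operatorname{Var}(X)} .
\end{aligned}
```

Under these assumptions the ratio of covariances with Z recovers β, while the OLS slope picks up an extra term that grows with γ. The exclusion restriction is what keeps Z out of the second equation.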

Tab 2

First-Stage Lab

Drag instrument-strength and sample-size sliders. Watch the scatter, the first-stage F-statistic, and the IV confidence interval tighten or explode in real time.

Tab 3

OLS vs IV Showdown

Same data, two estimators. Run 100 simulations to see whether OLS recovers the truth, whether IV does, and how the gap depends on instrument strength.

Tab 4

Forest Plot

Twelve specifications from the post — OLS plus eleven IV variants. Toggle which families to show. Hover any point for SE, CI, and first-stage F.

Three takeaways the app foregrounds

1. IV > OLS by 81%

The 2SLS coefficient (0.944) is 81% larger than the OLS estimate (0.522). AJR's interpretation is that classical measurement error in the institutional index attenuates OLS toward zero; IV strips out that noise and reveals a steeper causal slope.

2. Weak instruments break IV

The first-stage F = 16.32 is borderline: above the conventional threshold of 10, but not by much. Push it below 10 (e.g. by adding geography controls) and confidence intervals balloon while point estimates wander. Instrument strength is the single most important diagnostic.

3. 2SLS = RF ÷ FS

The whole IV machinery is one division: reduced-form slope (−0.573) divided by first-stage slope (−0.607) gives 0.944, exactly the 2SLS estimate. With one instrument and one endogenous regressor, IV is just this ratio under the hood.
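The two headline numbers can be checked by hand. A minimal sketch using only the coefficients quoted in the takeaways; the variable names are illustrative.

```python
# Coefficients as quoted in the takeaways above.
ols_beta = 0.522        # OLS slope of log GDP on avexpr
iv_beta = 0.944         # 2SLS slope
reduced_form = -0.573   # slope of log GDP on logem4
first_stage = -0.607    # slope of avexpr on logem4

# Takeaway 3: the 2SLS estimate is the reduced form divided by the first stage.
wald_ratio = reduced_form / first_stage

# Takeaway 1: how much larger IV is than OLS, in percent.
pct_larger = (iv_beta / ols_beta - 1) * 100

print(round(wald_ratio, 3))  # 0.944
print(round(pct_larger))     # 81
```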

Glossary (open a card if a term is unfamiliar)

Endogenous regressor
A right-hand-side variable that is correlated with the error term. OLS gives the wrong coefficient on it, even with infinite data. Here: avexpr.
Instrument (Z)
A variable that affects Y only through X. Three conditions: relevance (Z moves X), exclusion (no direct arrow Z → Y), exogeneity (Z ⊥ U). Here: logem4.
First stage
Regression of X on Z (and controls). Its slope tells you whether Z is relevant. Its F-statistic tells you whether Z is strong.
Reduced form
Regression of Y on Z directly. Its slope is the total effect of the instrument on the outcome. 2SLS = reduced form ÷ first stage.
2SLS
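Two-Stage Least Squares. Stage 1: predict X from Z. Stage 2: regress Y on the predicted X. In Stata, the built-in ivregress 2sls and the user-written ivreg2 (installable from SSC) each run both stages in one command and report correct standard errors, which a manual second-stage regression does not.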
Weak instrument
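Z is only weakly correlated with X. In finite samples a weak instrument gives large SEs, 2SLS estimates biased toward OLS, and confidence intervals with poor coverage; more data helps only insofar as it raises the first-stage F. Rule of thumb: first-stage F > 10.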
Exclusion restriction
The untestable assumption that Z does not appear in the outcome equation. It is the heart of every IV paper and has to be defended substantively, not statistically.
LATE vs ATE
Under heterogeneous effects, 2SLS identifies the Local Average Treatment Effect — the effect for compliers, not for everyone. Imbens-Angrist (1994).
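The "First stage", "Reduced form" and "2SLS" cards describe a procedure that is easy to run by hand on simulated data. The sketch below is not the post's Stata code: the data-generating process, the parameter values and the helper name `ols` are all assumptions chosen for illustration. It performs the two stages literally and confirms that the stage-2 slope equals reduced form ÷ first stage.

```python
import random

def ols(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical DGP: Z is the instrument, U the unobserved confounder.
rng = random.Random(0)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [-0.6 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [0.9 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1: regress X on Z and form the fitted values.
a1, pi_hat = ols(z, x)
x_hat = [a1 + pi_hat * zi for zi in z]

# Stage 2: regress Y on the fitted values of X.
_, beta_2sls = ols(x_hat, y)

# Reduced form: regress Y on Z directly, then take the ratio.
_, rf = ols(z, y)
beta_ratio = rf / pi_hat

print("two-stage estimate:", round(beta_2sls, 4))
print("RF / FS ratio:     ", round(beta_ratio, 4))
```

The two numbers agree up to floating-point error. Only the point estimate is reproduced this way; the standard errors from a hand-run second stage are not valid, which is one reason to use a dedicated IV command.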

First-Stage Lab — how strong is the instrument?

Generate a fresh IV dataset under your chosen parameters. The left chart is the first stage (X on Z); the right is the reduced form (Y on Z). The bigger the first-stage slope, the stronger the instrument; the closer the ratio of slopes to the true effect, the better IV is doing.

Sliders:
  • Sample size n. The AJR sample is 64 ex-colonies. Slide up to see how more data tightens both regressions.
  • First-stage slope π of X on Z. AJR's π̂ = −0.607. Slide toward 0 and the instrument becomes weak.
  • True effect β, the effect we are trying to recover. AJR's 2SLS estimate is 0.944. Slide to compare.
  • Confounder strength γ, the strength of the unobserved confounder U. Larger γ = more OLS bias. IV is supposed to be immune.

Readouts:
  • First-stage slope π̂ (vs true π)
  • First-stage F (rule of thumb: F > 10)
  • Reduced-form slope (Y on Z directly)
  • 2SLS β̂ = RF / FS (vs true β)
  • OLS β̂ (biased by U)

What to look for

  • Slide π toward 0: the left scatter flattens, F drops, and the 2SLS β̂ goes wild — the ratio RF/FS is unstable when the denominator is near zero.
  • Slide γ up: the orange OLS estimate drifts away from the true β, while the teal IV estimate stays roughly on track. That is the whole point of IV.
  • n = 64 (AJR's sample) produces F ≈ 10–20 for π = −0.6 — exactly the borderline-strong regime the post discusses. Try n = 200 and the F becomes comfortable.
  • The 2SLS = RF/FS identity: divide the two right-hand statistics; you get the same number as IV β̂. The IV estimator is one division.
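The readouts described above can be reproduced offline with a few lines of standard-library Python. This is a sketch under stated assumptions, not the app's actual code: the data-generating process (standard-normal Z, U and noise, with U entering both equations with weight γ) and the names `simple_ols` and `lab_readouts` are hypothetical. With a single instrument, the first-stage F is the squared t-statistic on Z, computed here with conventional (homoskedastic) standard errors.

```python
import math
import random

def simple_ols(x, y):
    """Slope and conventional SE from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se

def lab_readouts(n, pi, beta, gamma, seed=0):
    """Simulate one dataset from the assumed DGP and return the five readouts."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi * zi + gamma * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi + gamma * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    fs, fs_se = simple_ols(z, x)      # first stage: X on Z
    rf, _ = simple_ols(z, y)          # reduced form: Y on Z
    ols_beta, _ = simple_ols(x, y)    # naive OLS: Y on X
    return {
        "pi_hat": fs,
        "F": (fs / fs_se) ** 2,       # single instrument: F = t squared
        "rf": rf,
        "iv": rf / fs,                # 2SLS as the ratio RF / FS
        "ols": ols_beta,
    }

small = lab_readouts(n=64, pi=-0.6, beta=0.9, gamma=1.0)
large = lab_readouts(n=2000, pi=-0.6, beta=0.9, gamma=1.0)
for name, r in (("n=64", small), ("n=2000", large)):
    print(name, {k: round(v, 3) for k, v in r.items()})
```

Calling `lab_readouts` with `pi` closer to 0 shrinks F and makes the `iv` entry erratic, which is the first bullet above in code form.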

OLS vs IV Showdown — 100 simulations

A single dataset is noisy. To see whether OLS is systematically biased and IV is systematically unbiased, we need to repeat the experiment. Below, pick parameters and click Run 100 simulations: each simulation draws a fresh dataset under the same DGP, estimates both OLS and IV, and stacks the histograms.

Sliders:
  • Sample size n. Capped at 300 so 100 sims finish in < 300 ms.
  • First-stage slope π. Stronger = more reliable IV. The AJR baseline is π = −0.607.
  • True effect β. Held fixed across the 100 simulations.
  • Confounder strength γ. Larger γ widens the gap between OLS and IV.

OLS on the same data: mean β̂, bias (mean − true), sd(β̂), RMSE.

IV (2SLS) on the same data: mean β̂, bias (mean − true), sd(β̂), RMSE, mean first-stage F.

Distribution of β̂ across 100 simulated datasets

Why does this happen?

  • OLS is biased by U. The orange histogram drifts away from the true β by an amount that scales with γ. Even after 100 datasets, OLS averages to the wrong number.
  • IV is centred on β. The teal histogram sits around the true value even though any single estimate can be far off. Strictly, 2SLS is consistent rather than exactly unbiased in finite samples, but with a strong instrument its typical (median) estimate lands close to β.
  • IV pays for that with variance. The teal histogram is wider than the orange one. That is the IV bargain: trade bias for variance. When the instrument is strong (large |π|), the trade-off is favourable; when it is weak, IV's variance explodes.
  • Slide π toward 0 and rerun: the teal histogram becomes very wide and skewed, the classic weak-instrument pathology that Tables 6–7 of the post warn about.
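The repeated-sampling experiment behind these bullets can be sketched in standard-library Python. As before, the data-generating process and the function names (`one_draw`, `summarize`, `showdown`) are assumptions made for illustration, not the app's implementation. The median is reported alongside the mean because the mean of IV estimates can be dominated by a few extreme draws when the instrument is weak.

```python
import math
import random
import statistics

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def one_draw(n, pi, beta, gamma, rng):
    """Draw one dataset from the assumed DGP; return (OLS, IV) estimates."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi * zi + gamma * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi + gamma * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return slope(x, y), slope(z, y) / slope(z, x)

def summarize(estimates, truth):
    """Mean, bias, spread, RMSE and median of a list of estimates."""
    mean = statistics.fmean(estimates)
    sq_err = [(b - truth) ** 2 for b in estimates]
    return {
        "mean": mean,
        "bias": mean - truth,
        "sd": statistics.stdev(estimates),
        "rmse": math.sqrt(statistics.fmean(sq_err)),
        "median": statistics.median(estimates),
    }

def showdown(n=300, pi=-0.6, beta=0.9, gamma=1.0, sims=100, seed=1):
    """Run `sims` independent datasets and summarize both estimators."""
    rng = random.Random(seed)
    draws = [one_draw(n, pi, beta, gamma, rng) for _ in range(sims)]
    ols_est = [d[0] for d in draws]
    iv_est = [d[1] for d in draws]
    return summarize(ols_est, beta), summarize(iv_est, beta)

ols_stats, iv_stats = showdown()
print("OLS:", {k: round(v, 3) for k, v in ols_stats.items()})
print("IV: ", {k: round(v, 3) for k, v in iv_stats.items()})
```

Rerunning `showdown` with `pi=-0.05` is a quick way to see the weak-instrument case: the IV standard deviation and RMSE grow sharply while the OLS summary barely moves.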

The post's forest plot — interactively

These twelve estimates come from the post's tables 2–8. Each is the coefficient on avexpr from one regression spec; the orange bar at the top is OLS and the eleven below are IV variants. Toggle which families to show; hover any point for SE, CI, and the first-stage F.

Toggle groups: spec families; estimators.

What to look for

  • The orange OLS bar sits at 0.522 with the tightest CI in the chart. Every IV variant has a wider CI but a larger point estimate — the post's headline that IV > OLS by 81% is visible in one glance.
  • Health-control specs collapse toward OLS (0.55–0.69). The post (§9) flags this as the most uncomfortable robustness check: bad control or genuine exclusion violation? The data can't tell you, but the visual shrinkage is real.
  • Hover any bar for the first-stage F. Notice that "+ resources" and "+ malaria" have F < 5. Their wide CIs are no coincidence; they are weak-instrument pathology.
  • Toggle "+ geography" off and the remaining IV variants all sit in the 0.55–1.08 range with F > 11 — the cleaner, more defensible estimates.

Connecting back to Tab 3

The OLS-vs-IV gap you watched grow with γ in the simulation has a real-data counterpart here:

  • OLS (orange): β̂ = 0.522, CI = [0.424, 0.620] — tight but biased.
  • IV main (steel): β̂ = 0.944, CI = [0.599, 1.289] — wider but unbiased under AJR's exclusion restriction.
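  • The gap is 0.422 log points of GDP per point of avexpr. For a one-point difference in avexpr, IV implies an income ratio of exp(0.944) ≈ 2.57 against exp(0.522) ≈ 1.69 under OLS, so the IV prediction is about 52% higher (exp(0.422) ≈ 1.52) for the same institutional gap.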

That is the post's punchline: institutional reform is roughly twice as valuable as naive cross-country regressions suggest — provided you trust the exclusion restriction.