IV in Stata — Interactive Lab

Why IV? Why settler mortality?

A simple regression of log GDP on a country's institutional quality says that better institutions are associated with higher income — but it cannot tell you which way the arrow points. Maybe rich countries can afford better courts. Maybe geography or culture drives both. The naive slope is correlation, not causation.

Acemoglu, Johnson and Robinson (2001) address the problem with an instrumental variable: the mortality rate of European settlers during 1500–1900. Their argument: places where Europeans died en masse became extractive colonies; places where they survived became settler colonies with property-rights protections that persist today. The identifying assumption is that mortality centuries ago has no direct effect on 1995 GDP, but it does shape 1995 institutions, which in turn shape GDP. That is the IV story in brief.

The IV causal diagram

Z is the instrument (settler mortality, logem4). X is the endogenous regressor (modern institutions, avexpr). Y is the outcome (log GDP, logpgp95). U is the bundle of unobserved confounders (geography, culture, human capital). The dashed red arrow is the exclusion restriction: it is the assumption — not a tested fact — that the instrument affects the outcome only through the endogenous regressor.

Tab 2: First-Stage Lab

Drag instrument-strength and sample-size sliders. Watch the scatter, the first-stage F-statistic, and the IV confidence interval tighten or explode in real time.

Tab 3: OLS vs IV Showdown

Same data, two estimators. Run 100 simulations to see whether OLS recovers the truth, whether IV does, and how the gap depends on instrument strength.

Tab 4: Forest Plot

Twelve specifications from the post: OLS plus eleven IV variants. Toggle which families to show. Hover any point for SE, CI, and first-stage F.

Three takeaways the app foregrounds

1. IV > OLS by 81%

The 2SLS coefficient (0.944) is 81% larger than the OLS estimate (0.522). Classical measurement error in the institutional index attenuates OLS toward zero; IV de-noises it and reveals a steeper causal slope.
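The attenuation argument can be illustrated with a small simulation. This is a sketch on synthetic data, not the AJR sample: the regressor is observed with classical noise, there is no confounder, and every parameter value and variable name below is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, pi = 200_000, 0.94, -0.6        # illustrative values, not estimates

z = rng.normal(size=n)                   # instrument
x_true = pi * z + rng.normal(size=n)     # true regressor, variance pi^2 + 1
x_obs = x_true + rng.normal(size=n)      # observed with classical noise (variance 1)
y = beta * x_true + rng.normal(size=n)   # outcome depends on the true regressor

# OLS on the noisy regressor versus the IV (covariance-ratio) estimate
ols = np.cov(y, x_obs)[0, 1] / np.var(x_obs, ddof=1)
iv = np.cov(y, z)[0, 1] / np.cov(x_obs, z)[0, 1]

# Textbook attenuation: beta times the reliability ratio var(x_true) / var(x_obs)
expected_ols = beta * (pi**2 + 1) / (pi**2 + 2)

print(f"OLS {ols:.3f} (theory {expected_ols:.3f}), IV {iv:.3f}, true {beta}")
```

Because the noise in x_obs is unrelated to z, the instrument only picks up variation in the true regressor, which is why the IV estimate is not attenuated in this setup.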

2. Weak instruments break IV

The first-stage F = 16.32 is borderline. Push it below 10 (for example, by adding geography controls) and confidence intervals balloon while point estimates wander. Instrument strength is the single most important diagnostic to check.

3. 2SLS = RF ÷ FS

The whole IV machinery is one division: reduced-form slope (−0.573) divided by first-stage slope (−0.607) gives 0.944, exactly the 2SLS estimate. With one instrument and one endogenous regressor, IV is just this ratio under the hood.
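The arithmetic behind the takeaways can be checked in a few lines. The numbers are the coefficients quoted above; the variable names are just labels chosen for this sketch.

```python
# Coefficients quoted in the takeaways
ols = 0.522           # OLS slope
tsls = 0.944          # 2SLS slope
rf = -0.573           # reduced-form slope (Y on Z)
fs = -0.607           # first-stage slope (X on Z)

ratio = rf / fs               # reduced form divided by first stage
gap_pct = (tsls / ols - 1) * 100

print(round(ratio, 3))        # 0.944
print(round(gap_pct))         # 81
```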

Glossary (open a card if a term is unfamiliar)

Endogenous regressor

A right-hand-side variable that is correlated with the error term. OLS gives the wrong coefficient on it, even with infinite data. Here: avexpr.

Instrument (Z)

A variable that affects Y only through X. Three conditions: relevance (Z moves X), exclusion (no direct arrow Z → Y), exogeneity (Z ⊥ U). Here: logem4.

First stage

Regression of X on Z (and controls). Its slope tells you whether Z is relevant. Its F-statistic tells you whether Z is strong.
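A minimal sketch of the first-stage computation, assuming a single instrument, no controls, and homoskedastic errors (in that case the F-statistic is the squared t-statistic on Z). The function name and the synthetic data are hypothetical.

```python
import numpy as np

def first_stage(x, z):
    """Return (slope, F) from regressing x on an intercept and one instrument z."""
    n = len(x)
    zc = z - z.mean()
    slope = (zc @ (x - x.mean())) / (zc @ zc)
    intercept = x.mean() - slope * z.mean()
    resid = x - intercept - slope * z
    sigma2 = (resid @ resid) / (n - 2)       # homoskedastic error variance
    se = np.sqrt(sigma2 / (zc @ zc))         # standard error of the slope
    return slope, (slope / se) ** 2          # with one instrument, F = t^2

# Synthetic example with the sample size and slope used in the lab
rng = np.random.default_rng(0)
z = rng.normal(size=64)
x = -0.6 * z + rng.normal(size=64)
slope_hat, f_stat = first_stage(x, z)
print(f"slope {slope_hat:.3f}, F {f_stat:.1f}")
```

With controls or several instruments the F-statistic is a joint test on the excluded instruments, which this sketch does not cover.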

Reduced form

Regression of Y on Z directly. Its slope is the total effect of the instrument on the outcome. 2SLS = reduced form ÷ first stage.
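A short derivation shows why the ratio recovers the causal slope, assuming a linear model with a single instrument, no controls, and intercepts omitted. The error symbols u and v are introduced here for illustration only.

```latex
\begin{aligned}
X &= \pi Z + v && \text{(first stage)} \\
Y &= \beta X + u && \text{(structural equation)} \\
Y &= \beta\pi Z + (\beta v + u) && \text{(reduced form, by substitution)} \\
\frac{\text{RF slope}}{\text{FS slope}}
  &= \frac{\operatorname{Cov}(Y,Z)/\operatorname{Var}(Z)}{\operatorname{Cov}(X,Z)/\operatorname{Var}(Z)}
   = \frac{\beta\pi}{\pi} = \beta
\end{aligned}
```

The last step uses Cov(Z, u) = 0 (exclusion and exogeneity), Cov(Z, v) = 0 (true by construction of the first-stage projection), and π ≠ 0 (relevance).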

2SLS

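Two-Stage Least Squares. Stage 1: predict X from Z. Stage 2: regress Y on the predicted X. The community-contributed Stata command ivreg2 (like the built-in ivregress 2sls) does both in one call and reports the correct standard errors.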
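The two stages can be written out by hand to see that they reproduce the covariance ratio. This is a sketch on synthetic data with one instrument and no controls; the helper names and parameter values are assumptions for illustration.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage(y, x, z):
    """Manual 2SLS with an intercept and one instrument; returns the slope on x."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    x_hat = Z @ ols_fit(Z, x)                    # stage 1: fitted values of x
    X_hat = np.column_stack([np.ones(n), x_hat])
    return ols_fit(X_hat, y)[1]                  # stage 2: slope on fitted x

# Synthetic data with a confounder u that enters both x and y
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
u = rng.normal(size=n)
x = -0.6 * z + 0.5 * u + rng.normal(size=n)
y = 0.94 * x + 0.5 * u + rng.normal(size=n)

beta_2sls = two_stage(y, x, z)
beta_ratio = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]   # reduced form / first stage
print(beta_2sls, beta_ratio)                           # identical up to rounding
```

The point estimate from the manual second stage is right, but its OLS standard errors are not, because they treat the fitted X as data. That is the practical reason to use a dedicated IV command.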

Weak instrument

Z is only weakly correlated with X. In finite samples, a weak instrument gives huge SEs and 2SLS estimates that are biased toward OLS, and the usual confidence intervals become unreliable. Rule of thumb: first-stage F > 10.

Exclusion restriction

The untestable assumption that Z does not appear in the outcome equation. It is the heart of every IV paper and has to be defended substantively, not statistically.

LATE vs ATE

Under heterogeneous effects (and monotonicity), 2SLS identifies the Local Average Treatment Effect: the effect for compliers, the units whose X responds to Z, not for everyone. See Imbens and Angrist (1994).

First-Stage Lab — how strong is the instrument?

Generate a fresh IV dataset under your chosen parameters. The left chart is the first stage (X on Z); the right is the reduced form (Y on Z). The bigger the first-stage slope, the stronger the instrument; the closer the ratio of slopes to the true effect, the better IV is doing.

Sample size n = 64

The AJR sample is 64 ex-colonies. Slide up to see how more data tightens both regressions.

Instrument strength π = −0.60

First-stage slope of X on Z. AJR's π̂ = −0.607. Slide toward 0 and the instrument becomes weak.

True causal effect β = 0.94

The effect we are trying to recover. AJR's 2SLS estimate is 0.944. Slide to compare.

Confounding γ = 0.50

Strength of the unobserved confounder U. Larger γ = more OLS bias. IV is supposed to be immune.

Statistics shown for the generated dataset:

First-stage slope π̂ (vs true π)
First-stage F (rule of thumb: F > 10)
Reduced-form slope (Y on Z directly)
2SLS β̂ = RF / FS (vs true β)
OLS β̂ (biased by U)

What to look for

Slide π toward 0: the left scatter flattens, F drops, and the 2SLS β̂ goes wild — the ratio RF/FS is unstable when the denominator is near zero.
Slide γ up: the orange OLS estimate drifts away from the true β, while the teal IV estimate stays roughly on track. That is the whole point of IV.
n = 64 (AJR's sample) produces F ≈ 10–20 for π = −0.6 — exactly the borderline-strong regime the post discusses. Try n = 200 and the F becomes comfortable.
The 2SLS = RF/FS identity: divide the two right-hand statistics; you get the same number as IV β̂. The IV estimator is one division.
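The lab can be approximated offline with a few lines of NumPy. The app's actual data-generating process is not stated here, so the linear DGP below (standard-normal Z, U, and noise, with γ entering both equations) is an assumption, as are the function and key names.

```python
import numpy as np

def simulate(n=64, pi=-0.6, beta=0.94, gamma=0.5, seed=0):
    """Draw one dataset from an assumed linear DGP and return five statistics."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                         # instrument
    u = rng.normal(size=n)                         # unobserved confounder
    x = pi * z + gamma * u + rng.normal(size=n)    # endogenous regressor
    y = beta * x + gamma * u + rng.normal(size=n)  # outcome

    def slope(a, b):
        """OLS slope of a on b (intercept included implicitly)."""
        return np.cov(a, b)[0, 1] / np.var(b, ddof=1)

    fs = slope(x, z)                               # first-stage slope
    rf = slope(y, z)                               # reduced-form slope
    r2 = np.corrcoef(x, z)[0, 1] ** 2
    f_stat = r2 * (n - 2) / (1 - r2)               # first-stage F, one instrument
    return {"fs": fs, "F": f_stat, "rf": rf, "iv": rf / fs, "ols": slope(y, x)}

stats = simulate()               # baseline sliders
weak = simulate(pi=-0.05)        # instrument slid toward zero
for name, s in [("baseline", stats), ("weak", weak)]:
    print(name, {k: round(float(v), 3) for k, v in s.items()})
```

Changing the seed plays the role of generating a fresh dataset; comparing the two printed rows shows how the first-stage F and the IV estimate react when π moves toward zero.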

OLS vs IV Showdown — 100 simulations

A single dataset is noisy. To see whether OLS is systematically biased and IV is systematically unbiased, we need to repeat the experiment. Below, pick parameters and click Run 100 simulations: each simulation draws a fresh dataset under the same DGP, estimates both OLS and IV, and stacks the histograms.

Sample size n = 64

Capped at 300 so 100 sims finish in < 300 ms.

Instrument strength π = −0.60

Stronger = more reliable IV. The AJR baseline is π = −0.607.

True causal effect β = 0.94

Held fixed across the 100 simulations.

Confounding γ = 0.50

Larger γ widens the gap between OLS and IV.

OLS on the same data: mean β̂, bias (mean − true), sd(β̂), RMSE

IV (2SLS) on the same data: mean β̂, bias (mean − true), sd(β̂), RMSE, mean first-stage F

Chart: distribution of β̂ across 100 simulated datasets (OLS in orange, IV in teal)

Why does this happen?

OLS is biased by U. The orange histogram drifts away from the true β by an amount that scales with γ. Even after 100 datasets, OLS averages to the wrong number.
IV is centred on the truth. When the instrument is strong, the teal histogram centres on β: 2SLS is consistent, so its typical value is close to β even though any single estimate can be far off.
IV pays for lower bias with variance. The teal histogram is wider than the orange one. That is the IV bargain: trade bias for variance. When the instrument is strong (large |π|), the trade-off is favourable; when it is weak, IV's variance explodes.
Slide π toward 0 and rerun: the teal histogram becomes huge and skewed — the classic weak-instrument pathology that Tables 6–7 of the post warn about.
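The repeated-sampling experiment can be sketched as follows. As before, the linear DGP is an assumption rather than the app's documented one, and the function name and summary keys are hypothetical.

```python
import numpy as np

def run_sims(n=64, pi=-0.6, beta=0.94, gamma=0.5, sims=100, seed=0):
    """Repeat the assumed DGP `sims` times; summarise OLS and IV estimates."""
    rng = np.random.default_rng(seed)
    ols, iv = np.empty(sims), np.empty(sims)
    for s in range(sims):
        z = rng.normal(size=n)
        u = rng.normal(size=n)
        x = pi * z + gamma * u + rng.normal(size=n)
        y = beta * x + gamma * u + rng.normal(size=n)
        ols[s] = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
        iv[s] = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]   # RF / FS

    def summary(b):
        """Mean, bias, spread, and RMSE of the estimates around the true beta."""
        return {"mean": b.mean(), "bias": b.mean() - beta,
                "sd": b.std(ddof=1),
                "rmse": np.sqrt(np.mean((b - beta) ** 2))}

    return summary(ols), summary(iv)

ols_stats, iv_stats = run_sims()
print("OLS", ols_stats)
print("IV ", iv_stats)
```

Rerunning with a value of pi closer to zero should reproduce the wide, skewed IV distribution described above, while gamma = 0 removes the OLS bias in this setup.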
