Downloads
Each dataset is available as a labeled Stata .dta and its source file.
Download all data (ZIP) · stata_codebook.do
| Dataset | Grain | Rows × columns | Stata | Source |
|---|---|---|---|---|
| maketable1 | country (cross-section) | 376 × 11 | maketable1.dta | maketable1.dta |
| maketable2 | country (cross-section) | 163 × 9 | maketable2.dta | maketable2.dta |
| maketable3 | country (cross-section) | 376 × 11 | maketable3.dta | maketable3.dta |
| maketable4 | country (cross-section) | 163 × 10 | maketable4.dta | maketable4.dta |
| maketable5 | country (cross-section) | 163 × 12 | maketable5.dta | maketable5.dta |
| maketable6 | country (cross-section) | 163 × 29 | maketable6.dta | maketable6.dta |
| maketable7 | country (cross-section) | 163 × 15 | maketable7.dta | maketable7.dta |
| maketable8 | country (cross-section) | 163 × 12 | maketable8.dta | maketable8.dta |
Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.
Load directly in code
Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.
Stata
* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_iv/data/"
use "${BASE}maketable1.dta", clear
describe
notes
Python
!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_iv/data/"
df = pd.read_stata(BASE + "maketable1.dta")
# load every dataset at once
files = ["maketable1", "maketable2", "maketable3", "maketable4", "maketable5", "maketable6", "maketable7", "maketable8"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}
# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "maketable1.dta", "maketable1.dta")
df, meta = pyreadstat.read_dta("maketable1.dta")
To run this snippet, paste it into a blank Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb
R
# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_iv/data/"
df <- read_dta(paste0(BASE, "maketable1.dta"))
Overview & sources
Companion data for a hands-on Stata tutorial that replicates Acemoglu, Johnson and Robinson (2001), The Colonial Origins of Comparative Development. The study instruments modern property-rights institutions (avexpr) with European settler mortality during colonization (logem4) to recover the causal effect of institutions on income (logpgp95) across ex-colonies. The naive OLS slope is 0.522; the two-stage least-squares (2SLS) estimate is 0.944 — about 81% larger — implying that measurement error dominates OLS bias and that institutional reform is roughly twice as valuable as naive regressions suggest. The eight datasets here are AJR's original replication archive: one per table of the paper, each a country cross-section that progressively narrows from the full ~163-country world to the 64-country base sample of ex-colonies with valid settler-mortality data.
Each row is a country, identified by shortnam (3-letter country code); there is no time dimension. maketable1 and maketable3 carry the full 376-row archive; the others hold 163 rows. The base sample is selected with baseco==1 (in the original coding, this flag is missing for non-base countries rather than set to 0). Most datasets share a common spine (shortnam, avexpr, logpgp95, logem4), although per the variable tables on this page maketable3 omits shortnam and maketable2 omits logem4. Each dataset adds the specific controls its table needs: OLS controls (Table 2), first-stage/historical-institution variables (Tables 3 & 8), colonial / legal / religion controls (Table 5), geography & climate (Table 6), and modern health channels (Table 7).
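Because baseco is coded 1/missing rather than 1/0, selecting the complement of the base sample needs a missing-value test, not a comparison with 0. The sketch below illustrates this in pandas with a made-up four-row frame (the country codes and values are invented for illustration, not taken from the AJR files), assuming missing flags arrive as NaN after loading.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; real values come from the .dta files.
toy = pd.DataFrame({
    "shortnam": ["AAA", "BBB", "CCC", "DDD"],
    "baseco": [1.0, np.nan, 1.0, np.nan],  # 1 = base sample, missing otherwise
    "avexpr": [7.0, 5.5, 6.1, 6.2],
})

# Equality with 1 is False for missing values, so non-base rows drop out.
base = toy[toy["baseco"] == 1]

# Comparing against 0 selects nothing, because the flag is never 0.
non_base_wrong = toy[toy["baseco"] == 0]

# The complement of the base sample is the set of rows with a missing flag.
non_base = toy[toy["baseco"].isna()]

print(len(base), len(non_base_wrong), len(non_base))
```

The same logic applies in Stata, where `keep if baseco==1` retains only the flagged rows.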
Data sources
| Source | Provides | Reference / URL |
|---|---|---|
| Acemoglu, Johnson & Robinson (2001) | All eight datasets (maketable1–maketable8) — the original AJR replication archive, one per table of the paper | Acemoglu, D., Johnson, S., & Robinson, J. A. (2001). The Colonial Origins of Comparative Development: An Empirical Investigation. American Economic Review, 91(5), 1369–1401. https://www.aeaweb.org/articles?id=10.1257/aer.91.5.1369 |
| World Bank / Penn World Table | Log PPP GDP per capita 1995 (logpgp95); Hall–Jones GDP per worker (loghjypl) | Underlying income series compiled by AJR from World Bank and Hall & Jones (1999). |
| Curtin (settler mortality) | European settler/soldier mortality rates used to build extmort4 / logem4 | Mortality figures assembled by AJR from Curtin (1989, 1998) and related historical sources; see Albouy (2012) on imputation. |
| Method references | Estimators, weak-instrument and overidentification diagnostics | Imbens & Angrist (1994, LATE); Staiger & Stock (1997); Stock & Yogo (2005); Olea & Pflueger (2013); Baum, Schaffer & Stillman (2007, ivreg2). |
Cite this data
Please cite this dataset as follows.
APA
Mendez, C. (2026). Do Institutions Cause Prosperity? An IV Tutorial in Stata (AJR 2001 replication data) [Data set]. https://carlos-mendez.org/post/stata_iv/
Acemoglu, D., Johnson, S., & Robinson, J. A. (2001). The Colonial Origins of Comparative Development: An Empirical Investigation. American Economic Review, 91(5), 1369–1401.
Albouy, D. Y. (2012). The Colonial Origins of Comparative Development: An Investigation of the Settler Mortality Data. American Economic Review, 102(6), 3059–3076.
Imbens, G. W., & Angrist, J. D. (1994). Identification and Estimation of Local Average Treatment Effects. Econometrica, 62(2), 467–475.
BibTeX
@misc{mendez2026stataiv,
author = {Mendez, Carlos},
title = {Do Institutions Cause Prosperity? An IV Tutorial in Stata (AJR 2001 replication data)},
year = {2026},
howpublished = {\url{https://carlos-mendez.org/post/stata_iv/}},
note = {Data set}
}
@article{ajr2001colonial,
author = {Acemoglu, Daron and Johnson, Simon and Robinson, James A.},
title = {The Colonial Origins of Comparative Development: An Empirical Investigation},
journal = {American Economic Review},
volume = {91}, number = {5}, pages = {1369--1401}, year = {2001}
}
@article{albouy2012colonial,
author = {Albouy, David Y.},
title = {The Colonial Origins of Comparative Development: An Investigation of the Settler Mortality Data},
journal = {American Economic Review},
volume = {102}, number = {6}, pages = {3059--3076}, year = {2012}
}
@article{imbens1994late,
author = {Imbens, Guido W. and Angrist, Joshua D.},
title = {Identification and Estimation of Local Average Treatment Effects},
journal = {Econometrica},
volume = {62}, number = {2}, pages = {467--475}, year = {1994}
}
Variable explorer (all 55 variables)
| Variable | Type | Label | Definition | Units | In files | Source |
|---|---|---|---|---|---|---|
| africa | dummy | Africa dummy | 1 if the country is in Africa (continent control). | 0/1 | maketable2, maketable4, maketable7 | AJR (2001) |
| asia | dummy | Asia dummy | 1 if the country is in Asia (continent control). | 0/1 | maketable2, maketable4, maketable7 | AJR (2001) |
| avelf | continuous | Ethnolinguistic fractionalization | Average ethnolinguistic fractionalization index (probability two random people differ). | 0-1 | maketable6 | AJR (2001) |
| avexpr | continuous | Avg protection against expropriation risk | Average index of protection against expropriation of private investment, ~1985-95 (the endogenous regressor X, 'modern institutions'). | 0-10 scale | maketable1, maketable2, maketable3, maketable4, maketable5, maketable6, maketable7, maketable8 | AJR (2001), from Political Risk Services |
| baseco | dummy | Base-sample flag (1 = AJR base sample) | Indicator for the 64-country base sample of ex-colonies with valid settler-mortality data. | 1/missing | maketable1, maketable2, maketable4, maketable5, maketable6, maketable7, maketable8 | AJR (2001) |
| catho80 | continuous | Catholic share of population, 1980 (%) | Percent of population Catholic in 1980 (religion-composition control). | % (0-100) | maketable5 | WCE via AJR |
| cons00a | continuous | Constraint on executive in 1900 | Polity constraint-on-the-executive score in 1900 (historical-institution / alternative instrument). | 1-7 scale | maketable1, maketable3, maketable8 | Polity via AJR |
| cons1 | continuous | Constraint on executive, 1st year of independence | Polity constraint-on-the-executive score in the country's first year of independence. | 1-7 scale | maketable1, maketable3, maketable8 | Polity via AJR |
| democ00a | continuous | Democracy in 1900 | Polity democracy score in 1900 (a historical-institution / alternative-instrument variable). | 0-10 scale | maketable1, maketable3, maketable8 | Polity via AJR |
| democ1 | continuous | Democracy, 1st year of independence | Polity democracy score in the country's first year of independence (alternative instrument). | 0-10 scale | maketable8 | Polity via AJR |
| deslow | dummy | Desert (low) soil dummy | Soil/climate-zone indicator (1 if low-latitude desert). | 0/1 | maketable6 | AJR (2001) |
| desmid | dummy | Desert (mid) soil dummy | Soil/climate-zone indicator (1 if mid-latitude desert). | 0/1 | maketable6 | AJR (2001) |
| drystep | dummy | Dry-steppe soil dummy | Soil/climate-zone indicator (1 if dry steppe). | 0/1 | maketable6 | AJR (2001) |
| drywint | dummy | Dry-winter climate dummy | Soil/climate-zone indicator (1 if dry-winter climate). | 0/1 | maketable6 | AJR (2001) |
| edes1975 | continuous | European descent in 1975 (%) | Percent of the population of European descent in 1975. | % (0-100) | maketable6 | AJR (2001) |
| euro1900 | continuous | European settlers in 1900 (% of pop.) | Share of the population that was of European descent in 1900 (also used as an alternative instrument). | % (0-100) | maketable1, maketable3, maketable8 | AJR (2001) |
| excolony | dummy | Ex-colony dummy | 1 if the country was ever a European colony (FLOPS definition). | 0/1 | maketable3 | AJR (2001) |
| extmort4 | continuous | Corrected settler mortality rate | Annualized European settler/soldier mortality during colonization (raw level behind logem4). | deaths per 1,000 | maketable1, maketable3 | AJR (2001) |
| f_brit | dummy | British colony dummy | 1 if the country was a British colony (colonizer-identity control). | 0/1 | maketable5 | AJR (2001) |
| f_french | dummy | French colony dummy | 1 if the country was a French colony (colonizer-identity control). | 0/1 | maketable5 | AJR (2001) |
| goldm | continuous | Gold mineral measure | First of five mineral-resource measures (gold). | resource units | maketable6 | AJR (2001) |
| humid1 | continuous | Humidity indicator 1 (of 4) | First of four humidity indices used as climate controls. | index | maketable6 | AJR (2001) |
| humid2 | continuous | Humidity indicator 2 (of 4) | Second of four humidity indices (climate control). | index | maketable6 | AJR (2001) |
| humid3 | continuous | Humidity indicator 3 (of 4) | Third of four humidity indices (climate control). | index | maketable6 | AJR (2001) |
| humid4 | continuous | Humidity indicator 4 (of 4) | Fourth of four humidity indices (climate control). | index | maketable6 | AJR (2001) |
| imr95 | continuous | Infant mortality rate, 1995 | Infant mortality rate in 1995, deaths per 1,000 live births (modern health channel). | per 1,000 births | maketable7 | AJR (2001) |
| indtime | continuous | Years independent (1995 - first year) | Number of years a country had been independent by 1995. | years | maketable3, maketable8 | AJR (2001) |
| iron | continuous | Iron mineral measure | Iron mineral-resource measure (geology control). | resource units | maketable6 | AJR (2001) |
| landlock | dummy | Landlocked dummy | 1 if the country is landlocked (geography control). | 0/1 | maketable6 | AJR (2001) |
| lat_abst | continuous | Absolute latitude (scaled 0-1) | Absolute latitude of the capital, divided by 90 (a geography control). | 0-1 | maketable2, maketable3, maketable4, maketable5, maketable6, maketable7, maketable8 | AJR (2001) |
| latabs | continuous | Absolute latitude (McArthur-Sachs) | Absolute latitude (0-1 scaled), McArthur-Sachs version, used as a geography instrument. | 0-1 | maketable7 | McArthur & Sachs via AJR |
| leb95 | continuous | Life expectancy at birth, 1995 | Life expectancy at birth in 1995 (modern health channel). | years | maketable7 | AJR (2001) |
| logem4 | continuous | Log settler mortality | Natural log of European settler/soldier mortality during early colonization (the instrument Z). | log deaths per 1,000 | maketable1, maketable3, maketable4, maketable5, maketable6, maketable7, maketable8 | AJR (2001), from Curtin and related sources |
| loghjypl | continuous | Log GDP per worker (Hall-Jones) | Natural log of GDP per worker from Hall & Jones (1999); an alternative income measure. | log US$ | maketable1, maketable2, maketable4 | Hall & Jones (1999) via AJR |
| logpgp95 | continuous | Log GDP per capita, PPP, 1995 | Natural log of 1995 PPP GDP per capita (the outcome Y). | log US$ (PPP) | maketable1, maketable2, maketable3, maketable4, maketable5, maketable6, maketable7, maketable8 | AJR (2001), from World Bank |
| lt100km | continuous | Share of land within 100km of coast | Fraction of territory within 100 km of the coast (geography instrument). | 0-1 | maketable7 | McArthur & Sachs via AJR |
| malfal94 | continuous | Falciparum malaria index, 1994 | Index of falciparum-malaria prevalence in 1994 (modern health channel). | 0-1 | maketable7 | AJR (2001), from Gallup-Sachs |
| meantemp | continuous | Mean temperature (McArthur-Sachs) | Mean annual temperature, used as a geography instrument in overidentified specs. | deg C | maketable7 | McArthur & Sachs via AJR |
| muslim80 | continuous | Muslim share of population, 1980 (%) | Percent of population Muslim in 1980 (religion-composition control). | % (0-100) | maketable5 | WCE via AJR |
| no_cpm80 | continuous | Other-religion share, 1980 (%) | 100 minus Catholic, Protestant, and Muslim shares in 1980 (residual religion-composition control). | % (0-100) | maketable5 | WCE via AJR |
| oilres | continuous | Oil-reserves measure | Oil-reserves measure (resource control). | resource units | maketable6 | AJR (2001) |
| other | dummy | Other-continent dummy | 1 if the country is not in Asia, Africa, or the Americas (continent control). | 0/1 | maketable2 | AJR (2001) |
| rich4 | dummy | Neo-Europe dummy | 1 for the 'neo-Europes' (e.g. Australia, Canada, New Zealand, USA); dropped in some robustness columns. | 0/1 | maketable4 | AJR (2001) |
| shortnam | identifier | Country code (3-letter) | Three-letter country identifier; the row key in every dataset. | string | maketable1, maketable2, maketable4, maketable5, maketable6, maketable7, maketable8 | AJR (2001) |
| silv | continuous | Silver mineral measure | Silver mineral-resource measure (geology control). | resource units | maketable6 | AJR (2001) |
| sjlofr | dummy | French legal origin dummy | 1 if the country has French legal origin (legal-tradition control). | 0/1 | maketable5 | AJR (2001), legal-origins literature |
| steplow | dummy | Steppe (low) soil dummy | First of six soil/climate-zone indicators (1 if low-latitude steppe). | 0/1 | maketable6 | AJR (2001) |
| stepmid | dummy | Steppe (mid) soil dummy | Soil/climate-zone indicator (1 if mid-latitude steppe). | 0/1 | maketable6 | AJR (2001) |
| temp1 | continuous | Temperature indicator 1 (of 5) | First of five temperature indices used as climate controls. | index (deg C) | maketable6 | AJR (2001) |
| temp2 | continuous | Temperature indicator 2 (of 5) | Second of five temperature indices (climate control). | index (deg C) | maketable6 | AJR (2001) |
| temp3 | continuous | Temperature indicator 3 (of 5) | Third of five temperature indices (climate control). | index (deg C) | maketable6 | AJR (2001) |
| temp4 | continuous | Temperature indicator 4 (of 5) | Fourth of five temperature indices (climate control). | index (deg C) | maketable6 | AJR (2001) |
| temp5 | continuous | Temperature indicator 5 (of 5) | Fifth of five temperature indices (climate control). | index (deg C) | maketable6 | AJR (2001) |
| yellow | dummy | Yellow-fever vector dummy | 1 if the yellow-fever vector is present today (disease-environment control). | 0/1 | maketable7 | AJR (2001) |
| zinc | continuous | Zinc mineral measure | Zinc mineral-resource measure (geology control). | resource units | maketable6 | AJR (2001) |
Cross-file variable index
Which file each variable appears in (● = present).
| Variable | maketable1 | maketable2 | maketable3 | maketable4 | maketable5 | maketable6 | maketable7 | maketable8 |
|---|---|---|---|---|---|---|---|---|
| africa | | ● | | ● | | | ● | |
| asia | | ● | | ● | | | ● | |
| avelf | | | | | | ● | | |
| avexpr | ● | ● | ● | ● | ● | ● | ● | ● |
| baseco | ● | ● | | ● | ● | ● | ● | ● |
| catho80 | | | | | ● | | | |
| cons00a | ● | | ● | | | | | ● |
| cons1 | ● | | ● | | | | | ● |
| democ00a | ● | | ● | | | | | ● |
| democ1 | | | | | | | | ● |
| deslow | | | | | | ● | | |
| desmid | | | | | | ● | | |
| drystep | | | | | | ● | | |
| drywint | | | | | | ● | | |
| edes1975 | | | | | | ● | | |
| euro1900 | ● | | ● | | | | | ● |
| excolony | | | ● | | | | | |
| extmort4 | ● | | ● | | | | | |
| f_brit | | | | | ● | | | |
| f_french | | | | | ● | | | |
| goldm | | | | | | ● | | |
| humid1 | | | | | | ● | | |
| humid2 | | | | | | ● | | |
| humid3 | | | | | | ● | | |
| humid4 | | | | | | ● | | |
| imr95 | | | | | | | ● | |
| indtime | | | ● | | | | | ● |
| iron | | | | | | ● | | |
| landlock | | | | | | ● | | |
| lat_abst | | ● | ● | ● | ● | ● | ● | ● |
| latabs | | | | | | | ● | |
| leb95 | | | | | | | ● | |
| logem4 | ● | | ● | ● | ● | ● | ● | ● |
| loghjypl | ● | ● | | ● | | | | |
| logpgp95 | ● | ● | ● | ● | ● | ● | ● | ● |
| lt100km | | | | | | | ● | |
| malfal94 | | | | | | | ● | |
| meantemp | | | | | | | ● | |
| muslim80 | | | | | ● | | | |
| no_cpm80 | | | | | ● | | | |
| oilres | | | | | | ● | | |
| other | | ● | | | | | | |
| rich4 | | | | ● | | | | |
| shortnam | ● | ● | | ● | ● | ● | ● | ● |
| silv | | | | | | ● | | |
| sjlofr | | | | | ● | | | |
| steplow | | | | | | ● | | |
| stepmid | | | | | | ● | | |
| temp1 | | | | | | ● | | |
| temp2 | | | | | | ● | | |
| temp3 | | | | | | ● | | |
| temp4 | | | | | | ● | | |
| temp5 | | | | | | ● | | |
| yellow | | | | | | | ● | |
| zinc | | | | | | ● | | |
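A presence index like the one above can be rebuilt from the column lists of the loaded files. The following is a minimal sketch in plain Python; `columns_by_file` and `presence_index` are hypothetical names, and the column lists are abbreviated stand-ins for what `list(df.columns)` would return for each DataFrame.

```python
# Abbreviated, hand-typed column lists for three of the files (illustrative only);
# with real data each list would be list(df.columns) for that file's DataFrame.
columns_by_file = {
    "maketable1": ["shortnam", "avexpr", "logpgp95", "logem4", "baseco"],
    "maketable2": ["shortnam", "avexpr", "logpgp95", "africa", "baseco"],
    "maketable3": ["avexpr", "logpgp95", "logem4", "excolony"],
}

def presence_index(columns_by_file):
    """Map each variable name to {file name: True/False} across all files."""
    variables = sorted({v for cols in columns_by_file.values() for v in cols})
    return {
        v: {f: v in cols for f, cols in columns_by_file.items()}
        for v in variables
    }

index = presence_index(columns_by_file)

# Print one row per variable, one mark per file, in the order of the dict keys.
for var, row in index.items():
    marks = " ".join("●" if present else "·" for present in row.values())
    print(f"{var:10s} {marks}")
```

Passing the full dictionary of eight files would yield one row per variable and one column per file, in the same layout as the index table.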
Construction & formulas
The tutorial estimates the causal effect of institutions on income via
two-stage least squares (2SLS). The structural model has an endogenous
regressor X = avexpr correlated with the error, an outcome Y = logpgp95,
and an instrument Z = logem4:
- Structural equation: Y_i = α + β·X_i + U_i, with Cov(X_i, U_i) ≠ 0 (this endogeneity is what biases OLS).
- First stage (relevance): X_i = π_0 + π_1·Z_i + v_i. Settler mortality must move institutions (here π_1 = −0.607, Kleibergen–Paap F = 16.32).
- Reduced form: Y_i = γ_0 + γ_1·Z_i + e_i. This is the total effect of the instrument on the outcome (here γ_1 ≈ −0.573).
- 2SLS estimator (just-identified): β̂_2SLS = Cov(Y,Z) / Cov(X,Z) = γ̂_1 / π̂_1 = −0.573 / −0.607 = 0.944.
- Exclusion restriction: Z affects Y only through X, i.e. settler mortality circa 1700 influences 1995 GDP only by shaping inherited institutions. This is untestable in a just-identified model.
- Overidentification (Hansen J): with more instruments than endogenous regressors, the J-test checks whether the instruments agree on one causal effect; non-rejection is consistent with (but does not prove) joint exogeneity.
The 2SLS estimate identifies a Local Average Treatment Effect (LATE) for "complier" countries whose institutions would respond to a change in settler mortality (Imbens & Angrist 1994); under constant effects, LATE = ATE.
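The just-identified 2SLS estimator is simply the reduced-form slope divided by the first-stage slope. The sketch below checks that identity on simulated data (not the AJR sample) using only the standard library; the data-generating values (true β = 1, first-stage slope −0.6) and the helper names `cov` and `slope` are assumptions chosen for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def slope(y, x):
    """OLS slope of y on x in a regression with an intercept."""
    return cov(y, x) / cov(x, x)

# Simulated data: u is an unobserved confounder, z a valid instrument.
random.seed(0)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [-0.6 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 * xi + 2.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

pi_hat = slope(x, z)             # first stage: X on Z
gamma_hat = slope(y, z)          # reduced form: Y on Z
beta_iv = cov(y, z) / cov(x, z)  # 2SLS in the just-identified case
beta_ols = slope(y, x)           # biased here because Cov(X, U) != 0

print(beta_iv, gamma_hat / pi_hat, beta_ols)

# The same ratio with the estimates reported above.
print(round(-0.573 / -0.607, 3))  # 0.944
```

In this simulation the confounder pushes OLS away from the true β while the instrument-based ratio stays close to it; the direction of OLS bias in the AJR data is a separate empirical matter.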
The datasets
Each dataset is documented below with its full variable dictionary followed by a statistics table with data coverage.
maketable1
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
shortnam identifier | Country code (3-letter) | Three-letter country identifier; the row key in every dataset. | AJR country abbreviation (e.g. AUS, USA, NGA); used for scatter point labels (mlabel). | string | AJR (2001) | all files |
euro1900 continuous | European settlers in 1900 (% of pop.) | Share of the population that was of European descent in 1900 (also used as an alternative instrument). | Percent European in 1900, AJR. | % (0-100) | AJR (2001) | tables 1, 3, 8 |
avexpr continuous | Avg protection against expropriation risk | Average index of protection against expropriation of private investment, ~1985-95 (the endogenous regressor X — 'modern institutions'). | Mean over available years of the Political Risk Services expropriation-risk index, scaled 0 (worst) to 10 (best). | 0-10 scale | AJR (2001), from Political Risk Services | all files |
logpgp95 continuous | Log GDP per capita, PPP, 1995 | Natural log of 1995 PPP GDP per capita (the outcome Y). | log of World Bank PPP GDP per capita, 1995. | log US$ (PPP) | AJR (2001), from World Bank | all files |
cons1 continuous | Constraint on executive, 1st year of independence | Polity constraint-on-the-executive score in the country's first year of independence. | Polity index, 1 (low) to 7 (high) constraint. | 1-7 scale | Polity via AJR | tables 1, 3, 8 |
democ00a continuous | Democracy in 1900 | Polity democracy score in 1900 (a historical-institution / alternative-instrument variable). | Polity democracy index, 0 (low) to 10 (high). | 0-10 scale | Polity via AJR | tables 1, 3, 8 |
cons00a continuous | Constraint on executive in 1900 | Polity constraint-on-the-executive score in 1900 (historical-institution / alternative instrument). | Polity index, 1 (low) to 7 (high) constraint. | 1-7 scale | Polity via AJR | tables 1, 3, 8 |
extmort4 continuous | Corrected settler mortality rate | Annualized European settler/soldier mortality during colonization (raw level behind logem4). | Deaths per 1,000 mean strength, corrected/standardized by AJR (≈2.55 to 2,940). | deaths per 1,000 | AJR (2001) | tables 1, 3 |
logem4 continuous | Log settler mortality | Natural log of European settler/soldier mortality during early colonization (the instrument Z). | log of corrected annualized deaths per 1,000 (extmort4). | log deaths per 1,000 | AJR (2001), from Curtin and related sources | tables 1, 3, 4, 5, 6, 7, 8 |
loghjypl continuous | Log GDP per worker (Hall-Jones) | Natural log of GDP per worker from Hall & Jones (1999); an alternative income measure. | log of Hall–Jones output per worker. | log US$ | Hall & Jones (1999) via AJR | tables 1, 2, 4 |
baseco dummy | Base-sample flag (1 = AJR base sample) | Indicator for the 64-country base sample of ex-colonies with valid settler-mortality data. | 1 for base-sample countries; MISSING (not 0) otherwise. Restrict with keep if baseco==1. | 1/missing | AJR (2001) | tables 1, 2, 4, 5, 6, 7, 8 |
Distribution & statistics
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
| shortnam | 100% | 376 | 254 | n/a | n/a | n/a | n/a | n/a |
| euro1900 | 44% | 166 | 27 | 0 | 30.10 | 3.00 | 100.0 | 41.86 |
| avexpr | 34% | 129 | 84 | 1.64 | 6.99 | 7.00 | 10.00 | 1.83 |
| logpgp95 | 43% | 162 | 151 | 6.11 | 8.30 | 8.30 | 10.29 | 1.07 |
| cons1 | 24% | 92 | 6 | 1.00 | 3.63 | 3.00 | 7.00 | 2.39 |
| democ00a | 24% | 90 | 9 | 0 | 1.12 | 0 | 10.00 | 2.54 |
| cons00a | 26% | 96 | 6 | 1.00 | 1.85 | 1.00 | 7.00 | 1.79 |
| extmort4 | 25% | 94 | 44 | 2.55 | 215.0 | 85.00 | 2,940.0 | 398.1 |
| logem4 | 24% | 89 | 44 | 0.936 | 4.61 | 4.44 | 7.99 | 1.30 |
| loghjypl | 34% | 127 | 108 | -3.54 | -1.71 | -1.55 | 0 | 1.08 |
| baseco | 17% | 64 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
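The Coverage column reads as the share of rows with a non-missing value, which matches the figures shown: for avexpr, 129 of 376 rows is about 34%. A minimal pandas sketch of such a summary follows; `coverage_table` is a hypothetical helper and the toy frame is invented, so the page's own statistics may have been computed differently.

```python
import numpy as np
import pandas as pd

def coverage_table(df):
    """Per-column non-missing count, share of rows covered, and distinct values."""
    return pd.DataFrame({
        "N": df.notna().sum(),
        "Coverage": df.notna().mean(),
        "Distinct": df.nunique(),  # nunique() ignores missing values by default
    })

# Invented four-row frame for illustration only.
toy = pd.DataFrame({
    "avexpr": [7.0, np.nan, 5.5, np.nan],
    "baseco": [1.0, np.nan, 1.0, np.nan],
})
stats = coverage_table(toy)
print(stats)

# Coverage of avexpr in the table above: 129 non-missing values out of 376 rows.
print(round(129 / 376 * 100))  # 34
```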
maketable2
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
shortnam identifier | Country code (3-letter) | Three-letter country identifier; the row key in every dataset. | AJR country abbreviation (e.g. AUS, USA, NGA); used for scatter point labels (mlabel). | string | AJR (2001) | all files |
africa dummy | Africa dummy | 1 if the country is in Africa (continent control). | Binary continent indicator. | 0/1 | AJR (2001) | tables 2, 4, 7 |
lat_abst continuous | Absolute latitude (scaled 0-1) | Absolute latitude of the capital, divided by 90 (a geography control). | abs(latitude of capital) / 90. | 0-1 | AJR (2001) | tables 2, 3, 4, 5, 6, 7, 8 |
avexpr continuous | Avg protection against expropriation risk | Average index of protection against expropriation of private investment, ~1985-95 (the endogenous regressor X — 'modern institutions'). | Mean over available years of the Political Risk Services expropriation-risk index, scaled 0 (worst) to 10 (best). | 0-10 scale | AJR (2001), from Political Risk Services | all files |
logpgp95 continuous | Log GDP per capita, PPP, 1995 | Natural log of 1995 PPP GDP per capita (the outcome Y). | log of World Bank PPP GDP per capita, 1995. | log US$ (PPP) | AJR (2001), from World Bank | all files |
other dummy | Other-continent dummy | 1 if the country is not in Asia, Africa, or the Americas (continent control). | Binary continent indicator. | 0/1 | AJR (2001) | table 2 |
asia dummy | Asia dummy | 1 if the country is in Asia (continent control). | Binary continent indicator. | 0/1 | AJR (2001) | tables 2, 4, 7 |
loghjypl continuous | Log GDP per worker (Hall-Jones) | Natural log of GDP per worker from Hall & Jones (1999); an alternative income measure. | log of Hall–Jones output per worker. | log US$ | Hall & Jones (1999) via AJR | tables 1, 2, 4 |
baseco dummy | Base-sample flag (1 = AJR base sample) | Indicator for the 64-country base sample of ex-colonies with valid settler-mortality data. | 1 for base-sample countries; MISSING (not 0) otherwise. Restrict with keep if baseco==1. | 1/missing | AJR (2001) | tables 1, 2, 4, 5, 6, 7, 8 |
Distribution & statistics
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
| shortnam | 100% | 163 | 163 | n/a | n/a | n/a | n/a | n/a |
| africa | 100% | 163 | 2 | 0 | 0.307 | 0 | 1.00 | 0.463 |
| lat_abst | 99% | 162 | 96 | 0 | 0.296 | 0.267 | 0.722 | 0.190 |
| avexpr | 74% | 121 | 80 | 1.64 | 7.07 | 7.05 | 10.00 | 1.80 |
| logpgp95 | 91% | 148 | 138 | 6.11 | 8.30 | 8.27 | 10.29 | 1.11 |
| other | 100% | 163 | 2 | 0 | 0.025 | 0 | 1.00 | 0.155 |
| asia | 100% | 163 | 2 | 0 | 0.258 | 0 | 1.00 | 0.439 |
| loghjypl | 75% | 123 | 104 | -3.54 | -1.73 | -1.56 | 0 | 1.08 |
| baseco | 39% | 64 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
maketable3
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
lat_abst continuous | Absolute latitude (scaled 0-1) | Absolute latitude of the capital, divided by 90 (a geography control). | abs(latitude of capital) / 90. | 0-1 | AJR (2001) | tables 2, 3, 4, 5, 6, 7, 8 |
euro1900 continuous | European settlers in 1900 (% of pop.) | Share of the population that was of European descent in 1900 (also used as an alternative instrument). | Percent European in 1900, AJR. | % (0-100) | AJR (2001) | tables 1, 3, 8 |
excolony dummy | Ex-colony dummy | 1 if the country was ever a European colony (FLOPS definition). | Binary indicator from AJR/FLOPS coding. | 0/1 | AJR (2001) | table 3 |
avexpr continuous | Avg protection against expropriation risk | Average index of protection against expropriation of private investment, ~1985-95 (the endogenous regressor X — 'modern institutions'). | Mean over available years of the Political Risk Services expropriation-risk index, scaled 0 (worst) to 10 (best). | 0-10 scale | AJR (2001), from Political Risk Services | all files |
logpgp95 continuous | Log GDP per capita, PPP, 1995 | Natural log of 1995 PPP GDP per capita (the outcome Y). | log of World Bank PPP GDP per capita, 1995. | log US$ (PPP) | AJR (2001), from World Bank | all files |
cons1 continuous | Constraint on executive, 1st year of independence | Polity constraint-on-the-executive score in the country's first year of independence. | Polity index, 1 (low) to 7 (high) constraint. | 1-7 scale | Polity via AJR | tables 1, 3, 8 |
indtime continuous | Years independent (1995 - first year) | Number of years a country had been independent by 1995. | 1995 minus the first year of independence. | years | AJR (2001) | tables 3, 8 |
democ00a continuous | Democracy in 1900 | Polity democracy score in 1900 (a historical-institution / alternative-instrument variable). | Polity democracy index, 0 (low) to 10 (high). | 0-10 scale | Polity via AJR | tables 1, 3, 8 |
cons00a continuous | Constraint on executive in 1900 | Polity constraint-on-the-executive score in 1900 (historical-institution / alternative instrument). | Polity index, 1 (low) to 7 (high) constraint. | 1-7 scale | Polity via AJR | tables 1, 3, 8 |
extmort4 continuous | Corrected settler mortality rate | Annualized European settler/soldier mortality during colonization (raw level behind logem4). | Deaths per 1,000 mean strength, corrected/standardized by AJR (≈2.55 to 2,940). | deaths per 1,000 | AJR (2001) | tables 1, 3 |
logem4 continuous | Log settler mortality | Natural log of European settler/soldier mortality during early colonization (the instrument Z). | log of corrected annualized deaths per 1,000 (extmort4). | log deaths per 1,000 | AJR (2001), from Curtin and related sources | tables 1, 3, 4, 5, 6, 7, 8 |
Distribution & statistics
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
| lat_abst | 45% | 170 | 101 | 0 | 0.294 | 0.256 | 0.722 | 0.189 |
| euro1900 | 44% | 166 | 27 | 0 | 30.10 | 3.00 | 100.0 | 41.86 |
| excolony | 58% | 218 | 2 | 0 | 0.523 | 1.00 | 1.00 | 0.501 |
| avexpr | 34% | 129 | 84 | 1.64 | 6.99 | 7.00 | 10.00 | 1.83 |
| logpgp95 | 43% | 162 | 151 | 6.11 | 8.30 | 8.30 | 10.29 | 1.07 |
| cons1 | 24% | 92 | 6 | 1.00 | 3.63 | 3.00 | 7.00 | 2.39 |
| indtime | 24% | 92 | 47 | 5.00 | 77.66 | 37.00 | 195.0 | 61.49 |
| democ00a | 24% | 90 | 9 | 0 | 1.12 | 0 | 10.00 | 2.54 |
| cons00a | 26% | 96 | 6 | 1.00 | 1.85 | 1.00 | 7.00 | 1.79 |
| extmort4 | 25% | 94 | 44 | 2.55 | 215.0 | 85.00 | 2,940.0 | 398.1 |
| logem4 | 24% | 89 | 44 | 0.936 | 4.61 | 4.44 | 7.99 | 1.30 |
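The dictionary defines logem4 as the natural log of extmort4, and the reported extremes are consistent with that: taking logs of the extmort4 minimum and maximum reproduces the logem4 minimum and maximum. A quick check, using only the two summary values printed in the table:

```python
import math

# Minimum and maximum of extmort4 as reported in the statistics table.
extmort4_min, extmort4_max = 2.55, 2940.0

logem4_min = math.log(extmort4_min)
logem4_max = math.log(extmort4_max)

print(round(logem4_min, 3), round(logem4_max, 2))  # 0.936 7.99
```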
maketable4
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
shortnam identifier | Country code (3-letter) | Three-letter country identifier; the row key in every dataset. | AJR country abbreviation (e.g. AUS, USA, NGA); used for scatter point labels (mlabel). | string | AJR (2001) | all files |
africa dummy | Africa dummy | 1 if the country is in Africa (continent control). | Binary continent indicator. | 0/1 | AJR (2001) | tables 2, 4, 7 |
lat_abst continuous | Absolute latitude (scaled 0-1) | Absolute latitude of the capital, divided by 90 (a geography control). | abs(latitude of capital) / 90. | 0-1 | AJR (2001) | tables 2, 3, 4, 5, 6, 7, 8 |
rich4 dummy | Neo-Europe dummy | 1 for the 'neo-Europes' (e.g. Australia, Canada, New Zealand, USA); dropped in some robustness columns. | Binary indicator from AJR. | 0/1 | AJR (2001) | table 4 |
avexpr continuous | Avg protection against expropriation risk | Average index of protection against expropriation of private investment, ~1985-95 (the endogenous regressor X — 'modern institutions'). | Mean over available years of the Political Risk Services expropriation-risk index, scaled 0 (worst) to 10 (best). | 0-10 scale | AJR (2001), from Political Risk Services | all files |
logpgp95 continuous | Log GDP per capita, PPP, 1995 | Natural log of 1995 PPP GDP per capita (the outcome Y). | log of World Bank PPP GDP per capita, 1995. | log US$ (PPP) | AJR (2001), from World Bank | all files |
logem4 continuous | Log settler mortality | Natural log of European settler/soldier mortality during early colonization (the instrument Z). | log of corrected annualized deaths per 1,000 (extmort4). | log deaths per 1,000 | AJR (2001), from Curtin and related sources | tables 1, 3, 4, 5, 6, 7, 8 |
asia dummy | Asia dummy | 1 if the country is in Asia (continent control). | Binary continent indicator. | 0/1 | AJR (2001) | tables 2, 4, 7 |
loghjypl continuous | Log GDP per worker (Hall-Jones) | Natural log of GDP per worker from Hall & Jones (1999); an alternative income measure. | log of Hall–Jones output per worker. | log US$ | Hall & Jones (1999) via AJR | tables 1, 2, 4 |
baseco dummy | Base-sample flag (1 = AJR base sample) | Indicator for the 64-country base sample of ex-colonies with valid settler-mortality data. | 1 for base-sample countries; MISSING (not 0) otherwise. Restrict with keep if baseco==1. | 1/missing | AJR (2001) | tables 1, 2, 4, 5, 6, 7, 8 |
Distribution & statistics
| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
shortnam | – | 100% | 163 | 163 | – | – | – | – | – |
africa | | 100% | 163 | 2 | 0 | 0.307 | 0 | 1.00 | 0.463 |
lat_abst | | 99% | 162 | 96 | 0 | 0.296 | 0.267 | 0.722 | 0.190 |
rich4 | | 100% | 163 | 2 | 0 | 0.025 | 0 | 1.00 | 0.155 |
avexpr | | 74% | 121 | 80 | 1.64 | 7.07 | 7.05 | 10.00 | 1.80 |
logpgp95 | | 91% | 148 | 138 | 6.11 | 8.30 | 8.27 | 10.29 | 1.11 |
logem4 | | 53% | 87 | 44 | 0.936 | 4.60 | 4.44 | 7.99 | 1.30 |
asia | | 100% | 163 | 2 | 0 | 0.258 | 0 | 1.00 | 0.439 |
loghjypl | | 75% | 123 | 104 | -3.54 | -1.73 | -1.56 | 0 | 1.08 |
baseco | | 39% | 64 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
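The summary columns in the table above (Coverage, N, Distinct, Min, Mean, Median, Max, SD) can be recomputed from any of the files with a few lines of pandas. The sketch below is only an illustration: the helper name `summarize` and the `toy` frame are hypothetical and the values are made up, not AJR data. It assumes SD is the sample standard deviation (pandas' default, `ddof=1`), which appears consistent with the reported figures but is not stated on this page.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Per-variable summary for the numeric columns of df (hypothetical helper)."""
    num = df.select_dtypes("number")
    return pd.DataFrame({
        "Coverage": num.notna().mean(),  # share of rows that are non-missing
        "N": num.count(),                # number of non-missing rows
        "Distinct": num.nunique(),       # distinct non-missing values
        "Min": num.min(),
        "Mean": num.mean(),
        "Median": num.median(),
        "Max": num.max(),
        "SD": num.std(),                 # sample SD (ddof=1), an assumption
    })

# Made-up toy frame that mimics the 1/missing coding of baseco
toy = pd.DataFrame({
    "shortnam": ["AAA", "BBB", "CCC", "DDD"],
    "baseco": [1.0, float("nan"), 1.0, float("nan")],
    "africa": [0, 1, 1, 0],
})

stats = summarize(toy)
print(stats)
```

With a real file, `summarize(pd.read_stata(BASE + "maketable2.dta"))` would be the analogous call, where `BASE` is the URL prefix from the loading snippet.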
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
shortnam identifier | Country code (3-letter) | Three-letter country identifier; the row key in every dataset. | AJR country abbreviation (e.g. AUS, USA, NGA); used for scatter point labels (mlabel). | string | AJR (2001) | all files |
catho80 continuous | Catholic share of population, 1980 (%) | Percent of population Catholic in 1980 (religion-composition control). | Percent Catholic, World Christian Encyclopedia 1995. | % (0-100) | WCE via AJR | table 5 |
muslim80 continuous | Muslim share of population, 1980 (%) | Percent of population Muslim in 1980 (religion-composition control). | Percent Muslim, World Christian Encyclopedia 1995. | % (0-100) | WCE via AJR | table 5 |
lat_abst continuous | Absolute latitude (scaled 0-1) | Absolute latitude of the capital, divided by 90 (a geography control). | abs(latitude of capital) / 90. | 0-1 | AJR (2001) | tables 2, 3, 4, 5, 6, 7, 8 |
no_cpm80 continuous | Other-religion share, 1980 (%) | 100 minus Catholic, Protestant, and Muslim shares in 1980 (residual religion-composition control). | 100 - (Catholic + Protestant + Muslim) percent, 1980. | % (0-100) | WCE via AJR | table 5 |
f_brit dummy | British colony dummy | 1 if the country was a British colony (colonizer-identity control). | Binary indicator (FLOPS expansion). | 0/1 | AJR (2001) | table 5 |
f_french dummy | French colony dummy | 1 if the country was a French colony (colonizer-identity control). | Binary indicator (FLOPS expansion). | 0/1 | AJR (2001) | table 5 |
avexpr continuous | Avg protection against expropriation risk | Average index of protection against expropriation of private investment, ~1985-95 (the endogenous regressor X — 'modern institutions'). | Mean over available years of the Political Risk Services expropriation-risk index, scaled 0 (worst) to 10 (best). | 0-10 scale | AJR (2001), from Political Risk Services | all files |
sjlofr dummy | French legal origin dummy | 1 if the country has French legal origin (legal-tradition control). | Recoded French-legal-origin indicator. | 0/1 | AJR (2001), legal-origins literature | table 5 |
logpgp95 continuous | Log GDP per capita, PPP, 1995 | Natural log of 1995 PPP GDP per capita (the outcome Y). | log of World Bank PPP GDP per capita, 1995. | log US$ (PPP) | AJR (2001), from World Bank | all files |
logem4 continuous | Log settler mortality | Natural log of European settler/soldier mortality during early colonization (the instrument Z). | log of corrected annualized deaths per 1,000 (extmort4). | log deaths per 1,000 | AJR (2001), from Curtin and related sources | tables 1, 3, 4, 5, 6, 7, 8 |
baseco dummy | Base-sample flag (1 = AJR base sample) | Indicator for the 64-country base sample of ex-colonies with valid settler-mortality data. | 1 for base-sample countries; MISSING (not 0) otherwise. Restrict with keep if baseco==1. | 1/missing | AJR (2001) | tables 1, 2, 4, 5, 6, 7, 8 |
Distribution & statistics
| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
shortnam | – | 100% | 163 | 163 | – | – | – | – | – |
catho80 | | 99% | 162 | 114 | 0 | 30.04 | 10.60 | 97.30 | 35.61 |
muslim80 | | 99% | 162 | 95 | 0 | 25.29 | 2.05 | 99.80 | 36.88 |
lat_abst | | 99% | 162 | 96 | 0 | 0.296 | 0.267 | 0.722 | 0.190 |
no_cpm80 | | 98% | 160 | 154 | 0.100 | 32.71 | 21.00 | 100.0 | 32.19 |
f_brit | | 99% | 162 | 2 | 0 | 0.309 | 0 | 1.00 | 0.463 |
f_french | | 99% | 162 | 2 | 0 | 0.148 | 0 | 1.00 | 0.356 |
avexpr | | 74% | 121 | 80 | 1.64 | 7.07 | 7.05 | 10.00 | 1.80 |
sjlofr | | 99% | 162 | 2 | 0 | 0.469 | 0 | 1.00 | 0.501 |
logpgp95 | | 91% | 148 | 138 | 6.11 | 8.30 | 8.27 | 10.29 | 1.11 |
logem4 | | 53% | 87 | 44 | 0.936 | 4.60 | 4.44 | 7.99 | 1.30 |
baseco | | 39% | 64 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
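Because `no_cpm80` is defined as 100 minus the Catholic, Protestant, and Muslim shares, the Protestant share, which is not a separate column in this file, can be backed out by rearranging that identity. A minimal sketch, assuming the identity holds exactly for each country; the function name and the input values are made up for illustration:

```python
def implied_protestant_share(catho80: float, muslim80: float, no_cpm80: float) -> float:
    """Protestant share (%) implied by no_cpm80 = 100 - (Catholic + Protestant + Muslim)."""
    return 100.0 - (catho80 + muslim80 + no_cpm80)

# Made-up shares, not taken from the data
share = implied_protestant_share(catho80=40.0, muslim80=10.0, no_cpm80=30.0)
print(share)  # 20.0
```

Rounding in the source shares could make the implied value slightly negative for some countries, so it may be worth clipping at zero before use.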
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
shortnam identifier | Country code (3-letter) | Three-letter country identifier; the row key in every dataset. | AJR country abbreviation (e.g. AUS, USA, NGA); used for scatter point labels (mlabel). | string | AJR (2001) | all files |
avelf continuous | Ethnolinguistic fractionalization | Average ethnolinguistic fractionalization index (probability two random people differ). | Average of 5 fractionalization indicators (Easterly-Levine). | 0-1 | AJR (2001) | table 6 |
lat_abst continuous | Absolute latitude (scaled 0-1) | Absolute latitude of the capital, divided by 90 (a geography control). | abs(latitude of capital) / 90. | 0-1 | AJR (2001) | tables 2, 3, 4, 5, 6, 7, 8 |
temp1 continuous | Temperature indicator 1 (of 5) | First of five temperature indices used as climate controls. | AJR temperature indicator (degrees C scale). | index (deg C) | AJR (2001) | table 6 |
temp2 continuous | Temperature indicator 2 (of 5) | Second of five temperature indices (climate control). | AJR temperature indicator. | index (deg C) | AJR (2001) | table 6 |
temp3 continuous | Temperature indicator 3 (of 5) | Third of five temperature indices (climate control). | AJR temperature indicator. | index (deg C) | AJR (2001) | table 6 |
temp4 continuous | Temperature indicator 4 (of 5) | Fourth of five temperature indices (climate control). | AJR temperature indicator. | index (deg C) | AJR (2001) | table 6 |
temp5 continuous | Temperature indicator 5 (of 5) | Fifth of five temperature indices (climate control). | AJR temperature indicator. | index (deg C) | AJR (2001) | table 6 |
humid1 continuous | Humidity indicator 1 (of 4) | First of four humidity indices used as climate controls. | AJR humidity indicator. | index | AJR (2001) | table 6 |
humid2 continuous | Humidity indicator 2 (of 4) | Second of four humidity indices (climate control). | AJR humidity indicator. | index | AJR (2001) | table 6 |
humid3 continuous | Humidity indicator 3 (of 4) | Third of four humidity indices (climate control). | AJR humidity indicator. | index | AJR (2001) | table 6 |
humid4 continuous | Humidity indicator 4 (of 4) | Fourth of four humidity indices (climate control). | AJR humidity indicator. | index | AJR (2001) | table 6 |
steplow dummy | Steppe (low) soil dummy | First of six soil/climate-zone indicators (1 if low-latitude steppe). | Binary soil/climate-zone indicator. | 0/1 | AJR (2001) | table 6 |
deslow dummy | Desert (low) soil dummy | Soil/climate-zone indicator (1 if low-latitude desert). | Binary soil/climate-zone indicator. | 0/1 | AJR (2001) | table 6 |
stepmid dummy | Steppe (mid) soil dummy | Soil/climate-zone indicator (1 if mid-latitude steppe). | Binary soil/climate-zone indicator. | 0/1 | AJR (2001) | table 6 |
desmid dummy | Desert (mid) soil dummy | Soil/climate-zone indicator (1 if mid-latitude desert). | Binary soil/climate-zone indicator. | 0/1 | AJR (2001) | table 6 |
drystep dummy | Dry-steppe soil dummy | Soil/climate-zone indicator (1 if dry steppe). | Binary soil/climate-zone indicator. | 0/1 | AJR (2001) | table 6 |
drywint dummy | Dry-winter climate dummy | Soil/climate-zone indicator (1 if dry-winter climate). | Binary soil/climate-zone indicator. | 0/1 | AJR (2001) | table 6 |
edes1975 continuous | European descent in 1975 (%) | Percent of the population of European descent in 1975. | Percent European descent, 1975. | % (0-100) | AJR (2001) | table 6 |
avexpr continuous | Avg protection against expropriation risk | Average index of protection against expropriation of private investment, ~1985-95 (the endogenous regressor X — 'modern institutions'). | Mean over available years of the Political Risk Services expropriation-risk index, scaled 0 (worst) to 10 (best). | 0-10 scale | AJR (2001), from Political Risk Services | all files |
logpgp95 continuous | Log GDP per capita, PPP, 1995 | Natural log of 1995 PPP GDP per capita (the outcome Y). | log of World Bank PPP GDP per capita, 1995. | log US$ (PPP) | AJR (2001), from World Bank | all files |
landlock dummy | Landlocked dummy | 1 if the country is landlocked (geography control). | Binary indicator. | 0/1 | AJR (2001) | table 6 |
goldm continuous | Gold mineral measure | First of five mineral-resource measures (gold). | Resource-quantity measure, AJR (not a 0/1 dummy). | resource units | AJR (2001) | table 6 |
iron continuous | Iron mineral measure | Iron mineral-resource measure (geology control). | Resource-quantity measure, AJR. | resource units | AJR (2001) | table 6 |
silv continuous | Silver mineral measure | Silver mineral-resource measure (geology control). | Resource-quantity measure, AJR. | resource units | AJR (2001) | table 6 |
zinc continuous | Zinc mineral measure | Zinc mineral-resource measure (geology control). | Resource-quantity measure, AJR. | resource units | AJR (2001) | table 6 |
oilres continuous | Oil-reserves measure | Oil-reserves measure (resource control). | Resource-quantity measure, AJR. | resource units | AJR (2001) | table 6 |
logem4 continuous | Log settler mortality | Natural log of European settler/soldier mortality during early colonization (the instrument Z). | log of corrected annualized deaths per 1,000 (extmort4). | log deaths per 1,000 | AJR (2001), from Curtin and related sources | tables 1, 3, 4, 5, 6, 7, 8 |
baseco dummy | Base-sample flag (1 = AJR base sample) | Indicator for the 64-country base sample of ex-colonies with valid settler-mortality data. | 1 for base-sample countries; MISSING (not 0) otherwise. Restrict with keep if baseco==1. | 1/missing | AJR (2001) | tables 1, 2, 4, 5, 6, 7, 8 |
Distribution & statistics
| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
shortnam | – | 100% | 163 | 163 | – | – | – | – | – |
avelf | | 80% | 131 | 127 | 0 | 0.360 | 0.275 | 1.00 | 0.306 |
lat_abst | | 99% | 162 | 96 | 0 | 0.296 | 0.267 | 0.722 | 0.190 |
temp1 | | 98% | 159 | 32 | -4.00 | 18.91 | 21.00 | 32.00 | 8.14 |
temp2 | | 98% | 159 | 35 | -6.00 | 24.08 | 26.00 | 40.00 | 9.38 |
temp3 | | 98% | 159 | 24 | 23.00 | 38.25 | 38.00 | 49.00 | 5.07 |
temp4 | | 98% | 159 | 48 | -44.00 | -2.64 | 0 | 20.00 | 16.50 |
temp5 | | 98% | 159 | 25 | 1.00 | 14.22 | 15.00 | 26.00 | 6.59 |
humid1 | | 98% | 159 | 54 | 18.00 | 67.57 | 70.00 | 97.00 | 16.20 |
humid2 | | 98% | 159 | 30 | 54.00 | 86.22 | 88.00 | 98.00 | 7.37 |
humid3 | | 98% | 159 | 56 | 10.00 | 49.08 | 52.00 | 86.00 | 16.00 |
humid4 | | 98% | 159 | 40 | 35.00 | 73.36 | 74.00 | 92.00 | 10.19 |
steplow | | 98% | 159 | 2 | 0 | 0.208 | 0 | 1.00 | 0.407 |
deslow | | 98% | 159 | 2 | 0 | 0.189 | 0 | 1.00 | 0.392 |
stepmid | | 98% | 159 | 2 | 0 | 0.057 | 0 | 1.00 | 0.232 |
desmid | | 98% | 159 | 2 | 0 | 0.025 | 0 | 1.00 | 0.157 |
drystep | | 98% | 159 | 2 | 0 | 0.038 | 0 | 1.00 | 0.191 |
drywint | | 98% | 159 | 2 | 0 | 0.006 | 0 | 1.00 | 0.079 |
edes1975 | | 96% | 156 | 21 | 0 | 32.18 | 0 | 100.0 | 43.67 |
avexpr | | 74% | 121 | 80 | 1.64 | 7.07 | 7.05 | 10.00 | 1.80 |
logpgp95 | | 91% | 148 | 138 | 6.11 | 8.30 | 8.27 | 10.29 | 1.11 |
landlock | | 100% | 163 | 2 | 0 | 0.196 | 0 | 1.00 | 0.398 |
goldm | | 98% | 159 | 6 | 0 | 0.421 | 0 | 47.00 | 3.84 |
iron | | 98% | 159 | 11 | 0 | 0.314 | 0 | 16.00 | 1.60 |
silv | | 98% | 159 | 4 | 0 | 0.346 | 0 | 13.00 | 1.95 |
zinc | | 98% | 159 | 10 | 0 | 0.503 | 0 | 15.00 | 2.03 |
oilres | | 94% | 154 | 66 | 0 | 295,853 | 0 | 15,700,000 | 1,531,138 |
logem4 | | 53% | 87 | 44 | 0.936 | 4.60 | 4.44 | 7.99 | 1.30 |
baseco | | 39% | 64 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
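This file carries several families of controls (five temperature indices, four humidity indices, six soil/climate-zone dummies, five resource measures), so it can help to name the groups once and reuse them when building specifications. The sketch below is one possible way to organise them. The column names come from the dictionary above; the grouping labels, `CONTROL_GROUPS`, and `controls` are hypothetical helpers, not part of the original materials.

```python
# Column names are from the variable dictionary; group labels are illustrative.
CONTROL_GROUPS = {
    "temperature": [f"temp{i}" for i in range(1, 6)],    # temp1..temp5
    "humidity": [f"humid{i}" for i in range(1, 5)],      # humid1..humid4
    "soil_climate": ["steplow", "deslow", "stepmid", "desmid", "drystep", "drywint"],
    "resources": ["goldm", "iron", "silv", "zinc", "oilres"],
}

def controls(*groups: str) -> list[str]:
    """Flatten the requested control groups into a single list of column names."""
    cols: list[str] = []
    for g in groups:
        cols.extend(CONTROL_GROUPS[g])
    return cols

# Example: right-hand side of a formula with temperature and humidity controls
rhs = " + ".join(["avexpr"] + controls("temperature", "humidity"))
print(rhs)
```

The resulting list can also be used to subset a loaded frame, for example `df[["logpgp95", "avexpr"] + controls("resources")]`.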
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
shortnam identifier | Country code (3-letter) | Three-letter country identifier; the row key in every dataset. | AJR country abbreviation (e.g. AUS, USA, NGA); used for scatter point labels (mlabel). | string | AJR (2001) | all files |
africa dummy | Africa dummy | 1 if the country is in Africa (continent control). | Binary continent indicator. | 0/1 | AJR (2001) | tables 2, 4, 7 |
lat_abst continuous | Absolute latitude (scaled 0-1) | Absolute latitude of the capital, divided by 90 (a geography control). | abs(latitude of capital) / 90. | 0-1 | AJR (2001) | tables 2, 3, 4, 5, 6, 7, 8 |
malfal94 continuous | Falciparum malaria index, 1994 | Index of falciparum-malaria prevalence in 1994 (modern health channel). | Share of population at risk of falciparum malaria, 1994 (0-1). | 0-1 | AJR (2001), from Gallup-Sachs | table 7 |
avexpr continuous | Avg protection against expropriation risk | Average index of protection against expropriation of private investment, ~1985-95 (the endogenous regressor X — 'modern institutions'). | Mean over available years of the Political Risk Services expropriation-risk index, scaled 0 (worst) to 10 (best). | 0-10 scale | AJR (2001), from Political Risk Services | all files |
logpgp95 continuous | Log GDP per capita, PPP, 1995 | Natural log of 1995 PPP GDP per capita (the outcome Y). | log of World Bank PPP GDP per capita, 1995. | log US$ (PPP) | AJR (2001), from World Bank | all files |
logem4 continuous | Log settler mortality | Natural log of European settler/soldier mortality during early colonization (the instrument Z). | log of corrected annualized deaths per 1,000 (extmort4). | log deaths per 1,000 | AJR (2001), from Curtin and related sources | tables 1, 3, 4, 5, 6, 7, 8 |
asia dummy | Asia dummy | 1 if the country is in Asia (continent control). | Binary continent indicator. | 0/1 | AJR (2001) | tables 2, 4, 7 |
yellow dummy | Yellow-fever vector dummy | 1 if the yellow-fever vector is present today (disease-environment control). | Binary indicator. | 0/1 | AJR (2001) | table 7 |
baseco dummy | Base-sample flag (1 = AJR base sample) | Indicator for the 64-country base sample of ex-colonies with valid settler-mortality data. | 1 for base-sample countries; MISSING (not 0) otherwise. Restrict with keep if baseco==1. | 1/missing | AJR (2001) | tables 1, 2, 4, 5, 6, 7, 8 |
leb95 continuous | Life expectancy at birth, 1995 | Life expectancy at birth in 1995 (modern health channel). | Years; World Bank / WHO. | years | AJR (2001) | table 7 |
imr95 continuous | Infant mortality rate, 1995 | Infant mortality rate in 1995, deaths per 1,000 live births (modern health channel). | Deaths per 1,000 live births; World Bank. | per 1,000 births | AJR (2001) | table 7 |
meantemp continuous | Mean temperature (McArthur-Sachs) | Mean annual temperature, used as a geography instrument in overidentified specs. | Degrees Celsius; McArthur & Sachs. | deg C | McArthur & Sachs via AJR | table 7 |
lt100km continuous | Share of land within 100km of coast | Fraction of territory within 100 km of the coast (geography instrument). | Share 0-1; McArthur & Sachs. | 0-1 | McArthur & Sachs via AJR | table 7 |
latabs continuous | Absolute latitude (McArthur-Sachs) | Absolute latitude (0-1 scaled), McArthur-Sachs version, used as a geography instrument. | abs(latitude) / 90, McArthur & Sachs. | 0-1 | McArthur & Sachs via AJR | table 7 |
Distribution & statistics
| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
shortnam | – | 100% | 163 | 163 | – | – | – | – | – |
africa | | 100% | 163 | 2 | 0 | 0.307 | 0 | 1.00 | 0.463 |
lat_abst | | 99% | 162 | 96 | 0 | 0.296 | 0.267 | 0.722 | 0.190 |
malfal94 | | 96% | 157 | 52 | 0 | 0.294 | 5.00e-04 | 1.00 | 0.402 |
avexpr | | 74% | 121 | 80 | 1.64 | 7.07 | 7.05 | 10.00 | 1.80 |
logpgp95 | | 91% | 148 | 138 | 6.11 | 8.30 | 8.27 | 10.29 | 1.11 |
logem4 | | 53% | 87 | 44 | 0.936 | 4.60 | 4.44 | 7.99 | 1.30 |
asia | | 100% | 163 | 2 | 0 | 0.258 | 0 | 1.00 | 0.439 |
yellow | | 100% | 163 | 2 | 0 | 0.472 | 0 | 1.00 | 0.501 |
baseco | | 39% | 64 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
leb95 | | 37% | 60 | 59 | 37.24 | 62.08 | 65.70 | 78.98 | 11.43 |
imr95 | | 37% | 60 | 59 | 4.90 | 57.07 | 49.45 | 170.0 | 37.71 |
meantemp | | 37% | 60 | 56 | -0.200 | 23.13 | 24.47 | 29.30 | 4.96 |
lt100km | | 37% | 61 | 48 | 0 | 0.374 | 0.239 | 1.00 | 0.355 |
latabs | | 37% | 61 | 40 | 0 | 0.178 | 0.150 | 0.667 | 0.132 |
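Several variables are stored on transformed scales: `logem4` is the natural log of deaths per 1,000, and `lat_abst` and `latabs` are absolute latitude divided by 90. Undoing the transformation makes the summary statistics easier to read. A small worked sketch follows; the function names are hypothetical, and the inputs are the minimum of `logem4` and the maximum of `lat_abst` reported in the table above.

```python
import math

def settler_mortality(logem4: float) -> float:
    """Invert the log: deaths per 1,000 implied by a logem4 value."""
    return math.exp(logem4)

def latitude_degrees(lat_scaled: float) -> float:
    """Invert the 0-1 scaling: absolute latitude in degrees."""
    return lat_scaled * 90.0

min_mortality = settler_mortality(0.936)  # reported minimum of logem4
max_latitude = latitude_degrees(0.722)    # reported maximum of lat_abst

print(round(min_mortality, 2))  # 2.55
print(round(max_latitude, 1))   # 65.0
```

The first result agrees with the minimum of 2.55 reported for `extmort4`; small discrepancies at other points are expected because the table values are rounded.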
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
shortnam identifier | Country code (3-letter) | Three-letter country identifier; the row key in every dataset. | AJR country abbreviation (e.g. AUS, USA, NGA); used for scatter point labels (mlabel). | string | AJR (2001) | all files |
lat_abst continuous | Absolute latitude (scaled 0-1) | Absolute latitude of the capital, divided by 90 (a geography control). | abs(latitude of capital) / 90. | 0-1 | AJR (2001) | tables 2, 3, 4, 5, 6, 7, 8 |
euro1900 continuous | European settlers in 1900 (% of pop.) | Share of the population that was of European descent in 1900 (also used as an alternative instrument). | Percent European in 1900, AJR. | % (0-100) | AJR (2001) | tables 1, 3, 8 |
avexpr continuous | Avg protection against expropriation risk | Average index of protection against expropriation of private investment, ~1985-95 (the endogenous regressor X — 'modern institutions'). | Mean over available years of the Political Risk Services expropriation-risk index, scaled 0 (worst) to 10 (best). | 0-10 scale | AJR (2001), from Political Risk Services | all files |
logpgp95 continuous | Log GDP per capita, PPP, 1995 | Natural log of 1995 PPP GDP per capita (the outcome Y). | log of World Bank PPP GDP per capita, 1995. | log US$ (PPP) | AJR (2001), from World Bank | all files |
democ1 continuous | Democracy, 1st year of independence | Polity democracy score in the country's first year of independence (alternative instrument). | Polity democracy index, 0 (low) to 10 (high). | 0-10 scale | Polity via AJR | table 8 |
cons1 continuous | Constraint on executive, 1st year of independence | Polity constraint-on-the-executive score in the country's first year of independence. | Polity index, 1 (low) to 7 (high) constraint. | 1-7 scale | Polity via AJR | tables 1, 3, 8 |
indtime continuous | Years independent (1995 - first year) | Number of years a country had been independent by 1995. | 1995 minus the first year of independence. | years | AJR (2001) | tables 3, 8 |
democ00a continuous | Democracy in 1900 | Polity democracy score in 1900 (a historical-institution / alternative-instrument variable). | Polity democracy index, 0 (low) to 10 (high). | 0-10 scale | Polity via AJR | tables 1, 3, 8 |
cons00a continuous | Constraint on executive in 1900 | Polity constraint-on-the-executive score in 1900 (historical-institution / alternative instrument). | Polity index, 1 (low) to 7 (high) constraint. | 1-7 scale | Polity via AJR | tables 1, 3, 8 |
logem4 continuous | Log settler mortality | Natural log of European settler/soldier mortality during early colonization (the instrument Z). | log of corrected annualized deaths per 1,000 (extmort4). | log deaths per 1,000 | AJR (2001), from Curtin and related sources | tables 1, 3, 4, 5, 6, 7, 8 |
baseco dummy | Base-sample flag (1 = AJR base sample) | Indicator for the 64-country base sample of ex-colonies with valid settler-mortality data. | 1 for base-sample countries; MISSING (not 0) otherwise. Restrict with keep if baseco==1. | 1/missing | AJR (2001) | tables 1, 2, 4, 5, 6, 7, 8 |
Distribution & statistics
| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
shortnam | – | 100% | 163 | 163 | – | – | – | – | – |
lat_abst | | 99% | 162 | 96 | 0 | 0.296 | 0.267 | 0.722 | 0.190 |
euro1900 | | 94% | 154 | 26 | 0 | 30.47 | 1.95 | 100.0 | 42.39 |
avexpr | | 74% | 121 | 80 | 1.64 | 7.07 | 7.05 | 10.00 | 1.80 |
logpgp95 | | 91% | 148 | 138 | 6.11 | 8.30 | 8.27 | 10.29 | 1.11 |
democ1 | | 53% | 87 | 11 | 0 | 3.37 | 1.00 | 10.00 | 3.67 |
cons1 | | 54% | 88 | 6 | 1.00 | 3.59 | 3.00 | 7.00 | 2.41 |
indtime | | 54% | 88 | 46 | 5.00 | 77.17 | 37.00 | 195.0 | 62.14 |
democ00a | | 53% | 87 | 9 | 0 | 1.15 | 0 | 10.00 | 2.58 |
cons00a | | 56% | 91 | 6 | 1.00 | 1.86 | 1.00 | 7.00 | 1.82 |
logem4 | | 53% | 87 | 44 | 0.936 | 4.60 | 4.44 | 7.99 | 1.30 |
baseco | | 39% | 64 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
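The `baseco` row shows one distinct value and 39% coverage because the flag is 1 for the 64 base-sample countries and missing everywhere else. In pandas the missing values load as `NaN`, so the Stata filter `keep if baseco==1` has a direct equivalent, but comparing against 0 selects nothing. The sketch below uses a made-up `toy` frame (not AJR data) to show the difference; the variable names other than `baseco`, `shortnam`, and `logem4` are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the 1/missing coding of baseco
toy = pd.DataFrame({
    "shortnam": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "baseco": [1.0, np.nan, 1.0, np.nan, np.nan],
    "logem4": [4.4, np.nan, 5.6, 3.0, np.nan],
})

base = toy[toy["baseco"] == 1]        # equivalent of: keep if baseco==1
non_base = toy[toy["baseco"].isna()]  # countries outside the base sample
wrong = toy[toy["baseco"] == 0]       # empty: the flag is never coded 0

# Optional: recode to a conventional 0/1 indicator
toy["baseco01"] = toy["baseco"].fillna(0).astype(int)

print(len(base), len(non_base), len(wrong))  # 2 3 0
```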
Known limitations & caveats
- Cross-section, not panel. Every file is a country cross-section keyed on `shortnam`; there is no time dimension. Most variables are measured at a single point (e.g. GDP in 1995, religion shares in 1980).
- baseco coding. The base-sample flag `baseco` takes the value 1 for the 64 base-sample ex-colonies and is missing (not 0) for other countries. Restrict with `keep if baseco==1`.
- Imputed mortality (Albouy 2012). Albouy reports that 36 of the 64 base-sample countries are assigned settler-mortality rates taken from other countries. This weakens the case for the exclusion restriction and gives the Hansen J test low power against bias from the shared imputations.
- Weak instruments in robustness specs. Several specifications (e.g. Table 6 resource/geography columns, Table 7 health-channel overidentified columns) push the first-stage F below 5. For these, rely on the weak-IV-robust Anderson–Rubin confidence intervals rather than the point estimates.
- LATE, not ATE. The headline 0.944 is the effect for colonization-margin compliers, not a population-average treatment effect.
- Missing values. Coverage varies by variable (institutions, mortality, and many controls are missing for a subset of countries); the labeled `.dta` preserves AJR's original missing-data pattern byte-for-byte.
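To make the weak-instrument caveat concrete, here is a minimal sketch of a first-stage F statistic and an Anderson-Rubin confidence set for the simplest case: one endogenous regressor, one instrument, no other controls, and homoskedastic errors. It runs on simulated data rather than the AJR files. All function names are hypothetical, and the critical value 3.84 is the asymptotic 5% chi-squared(1) cutoff, used here as an approximation.

```python
import numpy as np

def ols_slope_t(y, z):
    """Slope and t-statistic from regressing y on a constant and one regressor z."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = resid @ resid / (n - 2)                 # homoskedastic error variance
    var_slope = sigma2 * np.linalg.inv(Z.T @ Z)[1, 1]
    return coef[1], coef[1] / np.sqrt(var_slope)

def first_stage_f(x, z):
    """With a single instrument, the first-stage F is the squared t-statistic."""
    _, t = ols_slope_t(x, z)
    return t ** 2

def anderson_rubin_stat(y, x, z, beta0):
    """Test H0: beta = beta0 by checking whether z predicts y - beta0 * x."""
    _, t = ols_slope_t(y - beta0 * x, z)
    return t ** 2

def ar_confidence_set(y, x, z, grid, crit=3.84):
    """Grid values of beta0 that the AR test does not reject at roughly 5%."""
    return np.array([b for b in grid if anderson_rubin_stat(y, x, z, b) <= crit])

# Simulated data with a strong instrument and true beta = 1 (not AJR data)
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
u = rng.normal(size=n)
v = 0.5 * u + rng.normal(size=n)   # correlation with u makes x endogenous
x = 1.0 * z + v
y = 1.0 * x + u

F = first_stage_f(x, z)
grid = np.linspace(-2.0, 4.0, 601)
cs = ar_confidence_set(y, x, z, grid)
print(f"first-stage F = {F:.1f}")
if cs.size:
    print(f"AR set spans roughly [{cs.min():.2f}, {cs.max():.2f}]")
```

When the instrument is weak, the set returned by `ar_confidence_set` can be very wide, unbounded within the grid, or disjoint, which is why it is a more honest summary than a point estimate with a conventional standard error. For the actual tables, which include controls and in some cases several instruments, a dedicated IV package would be the appropriate tool.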