Downloads
Each dataset is available as a labeled Stata .dta and its source file.
Download all data (ZIP) · stata_codebook.do
| Dataset | Grain | Rows | Stata | Source |
|---|---|---|---|---|
| EL_regional_conflict_replication | region-year | 96,591 × 14 | EL_regional_conflict_replication.dta | EL_regional_conflict_replication.dta |
Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.
Load directly in code
Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.
Stata
* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_iv_panel/data/"
use "${BASE}EL_regional_conflict_replication.dta", clear
describe
notes
Python
!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_iv_panel/data/"
df = pd.read_stata(BASE + "EL_regional_conflict_replication.dta")
# load every dataset at once
files = ["EL_regional_conflict_replication"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}
# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "EL_regional_conflict_replication.dta", "EL_regional_conflict_replication.dta")
df, meta = pyreadstat.read_dta("EL_regional_conflict_replication.dta")
Copy and paste this snippet into a Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb
R
# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/stata_iv_panel/data/"
df <- read_dta(paste0(BASE, "EL_regional_conflict_replication.dta"))
Overview & sources
Replication data for a hands-on Stata tutorial that estimates the causal effect of economic shocks on civil conflict by replicating Hodler & Raschky (2014, Economics Letters). The panel covers 96,591 region–year observations from 5,689 subnational administrative regions across African countries over 1994–2010. Conflict is a binary region-year outcome (1+ or 25+ battle deaths from UCDP); nighttime light intensity (DMSP) proxies local economic activity; and lagged log rainfall and the lagged Palmer Drought Severity Index serve as instruments. The endogenous light variable is instrumented because conflict suppresses light (reverse causality) and light measures activity with error (attenuation). OLS returns a near-zero coefficient; 2SLS using both weather instruments yields about −0.30 on conflict (1+ deaths), with first-stage F-statistics above the Stock–Yogo threshold and a Hansen J p-value of 0.93. This is real observational data, not simulated.
EL_regional_conflict_replication is an annual region panel (one row per region × year, objectid × year, 1994–2010) holding the conflict outcomes, the nighttime-light proxy, and the two weather instruments. Every estimation variable ships in two forms: the raw series and a detrended (_dt) series — residuals from a region-specific linear time trend, equivalent to including region-specific trends in the regression. The tutorial uses the detrended variables throughout, following the original paper.
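The detrending step described above can be sketched in Python. This is a minimal illustration, not the code that produced the shipped `_dt` columns: it assumes each region's trend is fitted by ordinary least squares with an intercept and a slope on `year`, and the helper name `detrend_within_region` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def detrend_within_region(df, col, unit="objectid", time="year"):
    """Residual of `col` from a separate linear trend fitted within each region.

    Assumption: the trend has an intercept and a slope on `time`.
    """
    out = pd.Series(np.nan, index=df.index, dtype=float)
    for _, g in df.groupby(unit):
        t = g[time].to_numpy(dtype=float)
        y = g[col].to_numpy(dtype=float)
        # Centre the year so the design matrix is well conditioned.
        X = np.column_stack([np.ones_like(t), t - t.mean()])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        out.loc[g.index] = y - X @ beta
    return out


# Toy panel (hypothetical values): two regions, five years each.
toy = pd.DataFrame({
    "objectid": [1] * 5 + [2] * 5,
    "year": list(range(1994, 1999)) * 2,
    "y": [0.0, 1.0, 2.0, 3.0, 4.5, 5.0, 4.0, 3.5, 2.0, 1.0],
})
toy["y_dt"] = detrend_within_region(toy, "y")
```

By construction the residuals average zero within each region and are uncorrelated with `year`, which is consistent with the near-zero means reported for the `_dt` columns. On the real file, comparing `detrend_within_region(df, "llnlight01")` with `llnlight01_dt` would be one way to check whether this assumption matches the shipped series.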
Data sources
| Source | Provides | Reference / URL |
|---|---|---|
| Hodler & Raschky (2014) | Replicated study; the assembled region-year panel and detrended (_dt) variables | Hodler, R. & Raschky, P.A. (2014). Economic shocks and civil conflict at the regional level. Economics Letters, 124(3), 530–533. |
| UCDP Georeferenced Event Dataset (GED) | Civil-conflict outcomes (battle deaths geocoded to regions; 1+ and 25+ death dummies) | Uppsala Conflict Data Program (UCDP), Georeferenced Event Dataset. https://ucdp.uu.se/ |
| DMSP-OLS nighttime lights | Nighttime light intensity — proxy for local economic activity (logged, lagged) | NOAA/NGDC DMSP-OLS Nighttime Lights Time Series. Henderson, Storeygard & Weil (2012), American Economic Review, 102(2), 994–1028. |
| Weather instruments (rainfall & PDSI) | Lagged log rainfall and the lagged Palmer Drought Severity Index (the instruments) | Rainfall (GPCC); Palmer Drought Severity Index (PDSI). Used as instruments per Miguel, Satyanath & Sergenti (2004), Journal of Political Economy, 112(4), 725–753. |
| Method references | IV / 2SLS estimators and weak-instrument diagnostics | Miguel, Satyanath & Sergenti (2004); Stock & Yogo (2005); Stock & Wright (2000); Hansen J overidentification test. |
Cite this data
Please cite this dataset as follows.
APA
Mendez, C. (2026). IV Estimation with Panel Data: Economic Shocks and Civil Conflict [Data set]. https://carlos-mendez.org/post/stata_iv_panel/
Hodler, R., & Raschky, P. A. (2014). Economic shocks and civil conflict at the regional level. Economics Letters, 124(3), 530–533.
BibTeX
@misc{mendez2026stataivpanel,
author = {Mendez, Carlos},
title = {IV Estimation with Panel Data: Economic Shocks and Civil Conflict},
year = {2026},
howpublished = {\url{https://carlos-mendez.org/post/stata_iv_panel/}},
note = {Data set}
}
@article{hodler2014economic,
author = {Hodler, Roland and Raschky, Paul A.},
title = {Economic shocks and civil conflict at the regional level},
journal = {Economics Letters},
volume = {124}, number = {3}, pages = {530--533}, year = {2014}
}
Variable explorer (all 14 variables)
| Variable | Type | Label | Definition | Units | In files | Source |
|---|---|---|---|---|---|---|
| countrycode | identifier | Country ISO3 code | Three-letter ISO country code of the region's country. | ISO 3166-1 alpha-3 | EL_regional_conflict_replication | Hodler & Raschky (2014) |
| countryname | identifier | Country name | Name of the country containing the region. | string | EL_regional_conflict_replication | Hodler & Raschky (2014) |
| l2lnrain01 | continuous | Ln rainfall (t-2), raw | Log rainfall lagged two years; instrument 1 for economic activity. | log mm | EL_regional_conflict_replication | Rainfall (GPCC) |
| l2lnrain01_dt | continuous | Ln rainfall (t-2), detrended | Detrended log rainfall; instrument 1 as used in the regressions. | deviation (log) | EL_regional_conflict_replication | Derived (this study) |
| l2meanpdsi | continuous | Palmer Drought Severity Index (t-2), raw | Lagged Palmer Drought Severity Index; instrument 2 (higher = less drought). | PDSI (index) | EL_regional_conflict_replication | PDSI |
| l2meanpdsi_dt | continuous | Palmer Drought Severity Index (t-2), detrended | Detrended PDSI; instrument 2 as used in the regressions. | deviation | EL_regional_conflict_replication | Derived (this study) |
| llnlight01 | continuous | Ln nighttime lights (t-1), raw | Log nighttime light intensity lagged one year; proxy for local economic activity (endogenous regressor). | log intensity | EL_regional_conflict_replication | DMSP-OLS nighttime lights |
| llnlight01_dt | continuous | Ln nighttime lights (t-1), detrended | Detrended log nighttime lights; the endogenous regressor used in all regressions. | deviation (log) | EL_regional_conflict_replication | Derived (this study) |
| objectid | identifier | Region identifier (panel ID) | Unique identifier for each subnational administrative region (the panel unit). | integer ID | EL_regional_conflict_replication | Hodler & Raschky (2014) |
| ucdp_25death_dummy | dummy | Conflict (25+ deaths), raw | 1 if the region-year had 25 or more conflict-related deaths, else 0 (outcome 2). | 0/1 | EL_regional_conflict_replication | UCDP GED |
| ucdp_25death_dummy_dt | continuous | Conflict (25+ deaths), detrended | Detrended conflict (25+ deaths): residual from a region-specific linear time trend. | deviation | EL_regional_conflict_replication | Derived (this study) |
| ucdp_death_dummy | dummy | Conflict (1+ deaths), raw | 1 if the region-year had at least one conflict-related death, else 0 (outcome 1). | 0/1 | EL_regional_conflict_replication | UCDP GED |
| ucdp_death_dummy_dt | continuous | Conflict (1+ deaths), detrended | Detrended conflict (1+ deaths): residual from a region-specific linear time trend. | deviation | EL_regional_conflict_replication | Derived (this study) |
| year | year | Calendar year | Annual time index (the panel time variable). | year | EL_regional_conflict_replication | Hodler & Raschky (2014) |
Cross-file variable index
Which file each variable appears in (● = present).
| Variable | EL_regional_conflict_replication |
|---|---|
| countrycode | ● |
| countryname | ● |
| l2lnrain01 | ● |
| l2lnrain01_dt | ● |
| l2meanpdsi | ● |
| l2meanpdsi_dt | ● |
| llnlight01 | ● |
| llnlight01_dt | ● |
| objectid | ● |
| ucdp_25death_dummy | ● |
| ucdp_25death_dummy_dt | ● |
| ucdp_death_dummy | ● |
| ucdp_death_dummy_dt | ● |
| year | ● |
Construction & formulas
The IV design isolates the causal effect of economic activity on conflict using weather as an external shifter. With region fixed effects `α_i`, region-specific trends `β_i·t`, and year fixed effects `γ_t`:
- Structural model (second stage): `Conflict_it = α_i + β_i·t + γ_t + δ · Light_(i,t−1) + ε_it`. The parameter of interest is `δ`, the causal effect of economic activity on conflict probability (a Local Average Treatment Effect for weather-driven compliers).
- First stage: `Light_(i,t−1) = α̃_i + β̃_i·t + γ̃_t + δ̃ · Weather_(i,t−2) + ε̃_it`, where `Weather_(i,t−2)` is lagged rainfall, lagged PDSI, or both.
- Instrument used for the endogenous regressor: lagged nighttime light `llnlight01` is instrumented by lagged log rainfall `l2lnrain01` and the lagged Palmer Drought Severity Index `l2meanpdsi` (weather at `t−2`).
- Exclusion restriction: weather at `t−2` affects conflict at `t` only through its effect on economic activity (light) at `t−1`, with no direct path. The two-year lag makes a direct effect implausible; with two instruments the Hansen J test checks joint validity (p = 0.93).
- Detrending: each `_dt` variable is the residual from regressing the raw series on `year` within each region, which is equivalent to including region-specific linear time trends. All regressions use the `_dt` variables.
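The two-stage logic above can be illustrated with a small simulation. This is a hedged sketch, not the tutorial's Stata estimation: the data are simulated, the outcome is continuous rather than binary, the true `δ = −0.30` is chosen only to echo the reported magnitude, and the function name `two_sls` and all simulated variable names are hypothetical. The inputs are assumed to be already demeaned and detrended, so no constant or fixed effects enter.

```python
import numpy as np


def two_sls(y, x, Z):
    """2SLS with one endogenous regressor x and instruments Z (no constant).

    Returns the second-stage coefficient and a homoskedastic first-stage F.
    """
    Z = np.asarray(Z, dtype=float).reshape(len(y), -1)
    # First stage: project the endogenous regressor on the instruments.
    pi, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ pi
    # Second stage: regress the outcome on the fitted values.
    delta = (x_hat @ y) / (x_hat @ x)
    # First-stage F for the joint significance of the instruments.
    k = Z.shape[1]
    resid = x - x_hat
    f_stat = ((x_hat @ x_hat) / k) / ((resid @ resid) / (len(y) - k))
    return delta, f_stat


rng = np.random.default_rng(0)
n = 20_000
rain, pdsi = rng.normal(size=n), rng.normal(size=n)  # simulated instruments
u = rng.normal(size=n)                               # unobserved confounder
light = 0.5 * rain + 0.3 * pdsi + u + rng.normal(size=n)
conflict = -0.30 * light + 2.0 * u + rng.normal(size=n)

delta_ols = (light @ conflict) / (light @ light)     # biased by the confounder
delta_iv, first_stage_f = two_sls(conflict, light, np.column_stack([rain, pdsi]))
```

In this simulation OLS is pulled away from the true coefficient by the confounder, while 2SLS recovers a value close to −0.30 because the instruments are, by construction, unrelated to the confounder. The real data need not behave this way; the exclusion restriction there is an assumption, not something the code can verify.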
The datasets
The single dataset, EL_regional_conflict_replication, is documented with a full variable dictionary and a statistics table that includes data coverage.
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
| objectid (identifier) | Region identifier (panel ID) | Unique identifier for each subnational administrative region (the panel unit). | Region code from the original replication; panel set via tsset objectid year. | integer ID | Hodler & Raschky (2014) | 5,689 regions |
| year (year) | Calendar year | Annual time index (the panel time variable). | Observation year, 1994–2010. | year | Hodler & Raschky (2014) | 1994–2010 |
| countrycode (identifier) | Country ISO3 code | Three-letter ISO country code of the region's country. | From the original panel (variable label 'ISO'). | ISO 3166-1 alpha-3 | Hodler & Raschky (2014) | 43 countries |
| countryname (identifier) | Country name | Name of the country containing the region. | From the original panel (variable label 'NAME_0'). | string | Hodler & Raschky (2014) | 43 countries |
| ucdp_death_dummy (dummy) | Conflict (1+ deaths), raw | 1 if the region-year had at least one conflict-related death, else 0 (outcome 1). | From UCDP geocoded battle deaths aggregated to the region-year. | 0/1 | UCDP GED | region-year |
| ucdp_25death_dummy (dummy) | Conflict (25+ deaths), raw | 1 if the region-year had 25 or more conflict-related deaths, else 0 (outcome 2). | From UCDP geocoded battle deaths aggregated to the region-year. | 0/1 | UCDP GED | region-year |
| llnlight01 (continuous) | Ln nighttime lights (t-1), raw | Log nighttime light intensity lagged one year; proxy for local economic activity (endogenous regressor). | Natural log of DMSP-OLS nighttime light intensity, lagged one year. | log intensity | DMSP-OLS nighttime lights | region-year |
| l2lnrain01 (continuous) | Ln rainfall (t-2), raw | Log rainfall lagged two years; instrument 1 for economic activity. | Natural log of annual rainfall (GPCC), lagged two years. | log mm | Rainfall (GPCC) | region-year |
| l2meanpdsi (continuous) | Palmer Drought Severity Index (t-2), raw | Lagged Palmer Drought Severity Index; instrument 2 (higher = less drought). | Mean PDSI for the region, lagged two years; the label reads '(Non) Drought'. | PDSI (index) | PDSI | region-year |
| ucdp_death_dummy_dt (continuous) | Conflict (1+ deaths), detrended | Detrended conflict (1+ deaths): residual from a region-specific linear time trend. | Residual of ucdp_death_dummy on year, fitted within each region (objectid). | deviation | Derived (this study) | region-year |
| ucdp_25death_dummy_dt (continuous) | Conflict (25+ deaths), detrended | Detrended conflict (25+ deaths): residual from a region-specific linear time trend. | Residual of ucdp_25death_dummy on year, fitted within each region (objectid). | deviation | Derived (this study) | region-year |
| llnlight01_dt (continuous) | Ln nighttime lights (t-1), detrended | Detrended log nighttime lights; the endogenous regressor used in all regressions. | Residual of llnlight01 on year, fitted within each region (objectid). | deviation (log) | Derived (this study) | region-year |
| l2lnrain01_dt (continuous) | Ln rainfall (t-2), detrended | Detrended log rainfall; instrument 1 as used in the regressions. | Residual of l2lnrain01 on year, fitted within each region (objectid). | deviation (log) | Derived (this study) | region-year |
| l2meanpdsi_dt (continuous) | Palmer Drought Severity Index (t-2), detrended | Detrended PDSI; instrument 2 as used in the regressions. | Residual of l2meanpdsi on year, fitted within each region (objectid). | deviation | Derived (this study) | region-year |
Distribution & statistics
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
| objectid | 100% | 96,591 | 5,689 | – | – | – | – | – |
| year | 100% | 96,591 | 17 | 1994 | 2002.0 | 2002 | 2010 | 4.90 |
| countrycode | 100% | 96,591 | 43 | – | – | – | – | – |
| countryname | 100% | 96,591 | 43 | – | – | – | – | – |
| ucdp_death_dummy | 100% | 96,591 | 2 | 0 | 0.046 | 0 | 1.00 | 0.208 |
| ucdp_25death_dummy | 100% | 96,591 | 2 | 0 | 0.014 | 0 | 1.00 | 0.119 |
| llnlight01 | 100% | 96,591 | 64,356 | -4.61 | -1.61 | -1.87 | 4.14 | 2.62 |
| l2lnrain01 | 100% | 96,591 | 58,543 | -4.61 | 3.83 | 4.10 | 6.09 | 1.48 |
| l2meanpdsi | 100% | 96,591 | 36,302 | -12.13 | -1.22 | -1.17 | 12.63 | 2.03 |
| ucdp_death_dummy_dt | 100% | 96,591 | 1,006 | -1.06 | -3.39e-10 | 0 | 1.01 | 0.161 |
| ucdp_25death_dummy_dt | 100% | 96,591 | 641 | -0.919 | -1.44e-10 | 0 | 0.983 | 0.098 |
| llnlight01_dt | 100% | 96,591 | 82,652 | -5.72 | -1.36e-11 | 0 | 5.36 | 0.411 |
| l2lnrain01_dt | 100% | 96,591 | 71,705 | -5.44 | 2.10e-11 | 0.003 | 2.50 | 0.187 |
| l2meanpdsi_dt | 100% | 96,591 | 37,353 | -6.66 | -1.53e-09 | -0.053 | 11.80 | 1.59 |
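The counts in the statistics table allow a quick balance check. If each of the 5,689 regions appeared in all 17 years, the panel would have 5,689 × 17 = 96,713 rows, which is 122 more than the 96,591 reported, so the panel appears to be slightly unbalanced. The sketch below shows one way to run the same check on a loaded DataFrame; the helper name `panel_summary` and the toy rows are hypothetical.

```python
import pandas as pd


def panel_summary(df, unit="objectid", time="year"):
    """Row count, key uniqueness and number of absent unit-period cells."""
    n_units = df[unit].nunique()
    n_periods = df[time].nunique()
    n_cells = len(df.drop_duplicates([unit, time]))
    return {
        "rows": len(df),
        "units": n_units,
        "periods": n_periods,
        "unique_key": not df.duplicated([unit, time]).any(),
        "missing_cells": n_units * n_periods - n_cells,
    }


# Arithmetic using the counts reported in the statistics table.
full_grid = 5_689 * 17           # rows in a fully balanced panel
gap = full_grid - 96_591         # region-years absent from the file

# Toy panel (hypothetical): region 2 has no row for 1995.
toy = pd.DataFrame({
    "objectid": [1, 1, 1, 2, 2],
    "year": [1994, 1995, 1996, 1994, 1996],
})
summary = panel_summary(toy)
```

Running `panel_summary(df)` on the real file should report `unique_key` as true if `objectid × year` identifies rows as the documentation states.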
Known limitations & caveats
- Real observational data. Outcomes are from UCDP (geocoded battle deaths), lights from DMSP-OLS, weather from rainfall/PDSI products — not simulated. Causal claims rest on the instrument's validity, not on randomization.
- Nighttime lights are a noisy proxy. Light measures economic activity with error; classical measurement error biases OLS toward zero (attenuation), which is why OLS (≈0.001) and 2SLS (≈−0.30) differ by orders of magnitude.
- Exclusion is untestable with one instrument. With a single instrument the exclusion restriction cannot be tested; it is defended by the two-year lag. The over-identified (two-instrument) model lets the Hansen J test probe joint validity.
- Conflict is a rare event. Only 4.6% of region-years have 1+ conflict deaths and 1.4% have 25+; estimates are interpreted relative to these low baselines.
- Within vs. between variation. Most variation in lights and rainfall is between regions; the fixed-effects estimator uses only within-region variation, so strong instruments are essential.
- Detrended (_dt) variables only. The analysis uses the residualized series; the raw series are provided for reference but are not the regression inputs.
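The attenuation point in the list above can be made concrete with a short simulation. This is only an illustration under classical measurement error: the data are simulated, the noise level is arbitrary, and the names `activity`, `light`, and `slope` are hypothetical. With a true coefficient `beta`, the OLS slope on the noisy proxy shrinks toward `beta × var(activity) / (var(activity) + var(noise))`.

```python
import numpy as np


def slope(x, y):
    """Bivariate OLS slope of y on x (with an intercept)."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)


rng = np.random.default_rng(1)
n = 50_000
beta = -0.30                                         # illustrative true effect
activity = rng.normal(size=n)                        # true activity, unobserved in practice
light = activity + rng.normal(scale=2.0, size=n)     # noisy proxy, noise variance 4
outcome = beta * activity + rng.normal(size=n)

b_true = slope(activity, outcome)                    # close to beta
b_proxy = slope(light, outcome)                      # shrunk toward zero
reliability = 1.0 / (1.0 + 2.0 ** 2)                 # var(activity) / var(light) = 0.2
```

Here the proxy-based slope lands near `beta × reliability`, about one fifth of the true effect. Attenuation alone shrinks a coefficient without changing its sign, so it is one of the two mechanisms named in this document, alongside reverse causality, and not a full account of the gap between OLS and 2SLS.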