Synthetic Control in R

What did conflict cost the Basque Country? A counterfactual built from data

−0.580 ATT · thousand 1986 USD/capita

2 of 16 active donors · 85% Catalonia

2 of 8 placebo rank · pseudo p = 0.25

Carlos Mendez

Nagoya University (GSID)

July 8, 2026

The Tension

Act I

We never see the Basque economy without the conflict

In 1970 the Basque Country entered decades of sustained terrorist activity. The natural question — what did it cost? — has no easy answer.

The path we observed is only half the story. The path without conflict — the counterfactual — was never recorded. How do you measure a road not taken?

One treated region, no clean comparison — until we build one

GDP per capita across Spanish regions, 1955–1997. The Basque Country (orange) sits among the richest regions throughout — so no single region is a clean comparison.

Where we’re going

The single-treated-unit problem — why difference-in-differences breaks
The recipe: a weighted blend of donor regions matched on pre-1970 fit
The headline gap — actual minus synthetic Basque (the ATT)
Two falsification tests: a Catalonia placebo and an in-space placebo

The Investigation

Act II

The estimand is the ATT: the gap from the counterfactual we never see

\[\alpha_{1t} = Y_{1t} - Y_{1t}^{N}, \quad t \geq 1970\]

Treatment effect = actual GDP minus the no-conflict counterfactual \(Y_{1t}^{N}\).

The fundamental problem: \(Y_{1t}^{N}\) is never observed. Synthetic control estimates it as a weighted average of donor regions.

The estimator replaces the counterfactual with a weighted donor recipe

\[\hat{\alpha}_{1t} = Y_{1t} - \sum_{j=2}^{17} w_j^{*}\, Y_{jt}, \quad t \geq 1970\]

Here unit 1 is the Basque Country and units \(j = 2, \dots, 17\) are the 16 donors. The donor weights \(w_j^{*}\) are non-negative, sum to one, and are chosen to match the Basque pre-treatment predictors.

If the pre-1970 match is good, the post-1970 gap is the most plausible estimate of the conflict’s cost.
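As a minimal sketch of the estimator above, the base R snippet below builds a synthetic series as a weighted average of donor outcomes and subtracts it from the treated series. The three-donor pool, the outcome values, and the weights are made-up illustrative numbers, not the Basque data.

```r
# Toy outcomes: rows are years, columns are donor units (hypothetical values).
years <- 1968:1972
Y0 <- cbind(donor_a = c(5.0, 5.2, 5.4, 5.6, 5.8),
            donor_b = c(4.0, 4.1, 4.2, 4.3, 4.4),
            donor_c = c(3.0, 3.0, 3.1, 3.1, 3.2))
Y1 <- c(4.8, 4.98, 5.0, 5.1, 5.2)   # treated unit (hypothetical values)

# Donor weights must be non-negative and sum to one.
w <- c(donor_a = 0.8, donor_b = 0.2, donor_c = 0.0)
stopifnot(all(w >= 0), isTRUE(all.equal(sum(w), 1)))

synthetic <- as.numeric(Y0 %*% w)   # weighted donor blend, year by year
gap       <- Y1 - synthetic         # estimated effect: actual minus synthetic

data.frame(year = years, actual = Y1, synthetic = synthetic, gap = gap)
```

In this toy example the gap is zero in 1968 and 1969 and negative from 1970 on, which is the pattern the estimator looks for.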

The lab: 18 regions, 43 years, 13 predictors, treatment in 1970

Outcome — gdpcap, real GDP per capita in 1986 thousand USD
Treated unit — the Basque Country (region 17), conflict onset 1970
Donor pool — the 16 other Spanish autonomous communities
Predictors — 13 covariates: education, investment, sector shares, density

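774 rows = 18 units × 43 years: the 17 autonomous communities plus the national aggregate (region 1), which is kept out of the donor pool. Pre-treatment window 1955–1969; post-treatment evaluation 1970–1997.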
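The panel dimensions and the two windows can be checked with a few lines of base R. This is only a skeleton of the panel index; the column names `regionno` and `year` are assumptions used for illustration, and no outcome data is loaded.

```r
# Skeleton of a balanced panel: one row per region-year.
regions <- 1:18
years   <- 1955:1997
panel   <- expand.grid(regionno = regions, year = years)

n_rows <- nrow(panel)               # 18 regions x 43 years
pre    <- years[years <  1970]      # pre-treatment window, 1955-1969
post   <- years[years >= 1970]      # post-treatment window, 1970-1997

c(rows = n_rows, pre_years = length(pre), post_years = length(post))
```

The pre-treatment window covers 15 years and the evaluation window 28 years, which together make up the 43 years in the panel.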

Two nested optimizations: inner picks the recipe, outer picks what matters

\[W^{*}(V) = \arg\min_{W \in \mathcal{W}} \lVert X_{1} - X_{0}W \rVert_{V}\]

The inner problem finds donor weights \(W\) matching the treated predictor profile \(X_1\), measured in a \(V\)-weighted norm.

\[V^{*} = \arg\min_{V} \, (Z_{1} - Z_{0}W^{*}(V))'(Z_{1} - Z_{0}W^{*}(V))\]

The outer problem picks predictor weights \(V\) so the induced recipe minimizes pre-1970 outcome error.
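The nesting can be illustrated with a small base R sketch. It is not how `synth()` is implemented: the package solves the inner problem with a constrained quadratic programming solver, whereas this sketch swaps in a softmax reparametrisation so that plain `optim()` keeps both \(W\) and \(V\) non-negative and summing to one. All data values and helper names (`softmax`, `inner_w`, `outer_loss`) are hypothetical.

```r
# Map an unconstrained vector onto the simplex (non-negative, sums to one).
softmax <- function(theta) {
  e <- exp(theta - max(theta))
  e / sum(e)
}

# Inner problem: donor weights W for a given diagonal V (vector v).
inner_w <- function(v, X1, X0) {
  loss <- function(theta) {
    d <- X1 - as.numeric(X0 %*% softmax(theta))
    sum(v * d^2)                      # squared V-weighted predictor distance
  }
  softmax(optim(rep(0, ncol(X0)), loss, method = "BFGS")$par)
}

# Outer problem: pre-treatment outcome error of the recipe induced by V.
outer_loss <- function(phi, X1, X0, Z1, Z0) {
  w <- inner_w(softmax(phi), X1, X0)
  mean((Z1 - as.numeric(Z0 %*% w))^2)
}

# Toy data: 3 standardised predictors, 4 donors, 5 pre-treatment years.
X0 <- matrix(c(1.0, -0.5, 0.8, -1.2,
               0.6, -0.2, 0.9, -1.0,
               1.1, -0.9, 0.3, -0.6), nrow = 3, byrow = TRUE)
X1 <- c(0.9, 0.7, 0.8)
Z0 <- matrix(c(5.0, 3.0, 4.6, 2.5,
               5.1, 3.1, 4.7, 2.6,
               5.3, 3.2, 4.9, 2.6,
               5.4, 3.2, 5.0, 2.7,
               5.6, 3.3, 5.2, 2.8), nrow = 5, byrow = TRUE)
Z1 <- c(4.9, 5.0, 5.2, 5.3, 5.5)

fit_v  <- optim(rep(0, nrow(X0)), outer_loss,
                X1 = X1, X0 = X0, Z1 = Z1, Z0 = Z0, method = "Nelder-Mead")
v_star <- softmax(fit_v$par)          # predictor weights V*
w_star <- inner_w(v_star, X1, X0)     # donor weights W*(V*)

round(v_star, 3)
round(w_star, 3)
```

Every evaluation of the outer loss triggers a full inner optimisation, which is why the search over \(V\) dominates the run time in practice.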

`dataprep()` then `synth()` — the whole estimation is two calls

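library(Synth)

# prepare_basque() is not shown on this slide; per the title, it wraps dataprep()
basque_dp <- prepare_basque(treated_id  = 17,
                            control_ids = c(2:16, 18))   # 16-region donor pool

# Fit the synthetic control without printing optimizer progress
run_synth_quiet <- function(dp) {
  synth(data.prep.obj = dp, optimxmethod = "BFGS", verbose = FALSE)
}
basque_synth <- run_synth_quiet(basque_dp)

basque_synth$solution.w        # the donor weights W*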
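The `dataprep()` call itself is hidden inside the helper. One way such a wrapper could look is sketched below, adapted from the example in the Synth package documentation. The function name `prepare_basque_sketch`, the predictor list, and the time windows are assumptions for illustration and may not match the specification behind the numbers reported in this deck.

```r
# Hypothetical stand-in for prepare_basque(); predictor set and windows are
# assumptions adapted from the Synth package example, not the deck's own spec.
prepare_basque_sketch <- function(treated_id = 17, control_ids = c(2:16, 18)) {
  data("basque", package = "Synth", envir = environment())
  odd_years <- seq(1961, 1969, by = 2)   # years with sector-share data

  Synth::dataprep(
    foo = basque,
    predictors = c("school.illit", "school.prim", "school.med",
                   "school.high", "school.post.high", "invest"),
    predictors.op = "mean",
    time.predictors.prior = 1964:1969,
    special.predictors = list(
      list("gdpcap", 1960:1969, "mean"),
      list("sec.agriculture", odd_years, "mean"),
      list("sec.energy", odd_years, "mean"),
      list("sec.industry", odd_years, "mean"),
      list("sec.construction", odd_years, "mean"),
      list("sec.services.venta", odd_years, "mean"),
      list("sec.services.nonventa", odd_years, "mean"),
      list("popdens", 1969, "mean")
    ),
    dependent = "gdpcap",
    unit.variable = "regionno",
    unit.names.variable = "regionname",
    time.variable = "year",
    treatment.identifier = treated_id,
    controls.identifier = control_ids,
    time.optimize.ssr = 1960:1969,       # window for the outer (V) problem
    time.plot = 1955:1997
  )
}
```

With the Synth package installed, `prepare_basque_sketch()` returns an object that can be passed to `synth()` as `data.prep.obj`. Defining the function does not require the package; calling it does.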

The optimizer keeps just 2 of 16 donors — a sparse, readable recipe

Diagnostic	Value
W weights sum to	1
Active donors (\(w > 0.01\))	2 of 16
Pre-treatment loss \(V\) (outcome MSPE)	0.0089
Pre-treatment loss \(W\) (predictor distance)	0.2467

Sparsity is typical: few donors resemble the treated unit, the rest get zero.

The synthetic Basque is 85% Catalonia and 15% Madrid

Region	Weight
Cataluna	0.851
Madrid (Comunidad De)	0.149
Every other region	0

Basque \(\approx\) 85% Catalonia + 15% Madrid — the only two comparably industrial, urban, wealthy regions.
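The two diagnostics (weights summing to one, two active donors) can be reproduced from the reported weights alone. In this sketch the two non-zero weights come from the table above; the names of the 14 zero-weight donors are placeholders.

```r
# Reported non-zero weights, plus the 14 remaining donors at zero.
w <- c("Cataluna" = 0.851, "Madrid (Comunidad De)" = 0.149)
w <- c(w, setNames(rep(0, 14), paste0("other_", 1:14)))  # placeholder names

weight_sum <- sum(w)          # should equal one
n_active   <- sum(w > 0.01)   # donors above the 0.01 threshold

c(donors = length(w), weight_sum = weight_sum, active = n_active)
names(w)[w > 0.01]            # which donors carry the recipe
```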

The match is excellent where it matters — pre-1970 GDP and education

Predictor	Treated	Synthetic	Donor mean
Pre-1970 GDP/capita	5.28	5.27	3.58
School (primary) %	85.9	82.3	80.9
Industry share %	45.1	37.6	22.4
Agriculture share %	6.84	6.18	21.4
Pop. density 1969	247	196	99.4

Outcome-relevant predictors match closely; density is the largest gap.
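To compare predictors measured in different units, the sketch below recomputes the gaps from the table as a share of the treated value. Normalising by the treated value is an illustrative choice made here, not a statistic reported by the package.

```r
# Values copied from the balance table above.
balance <- data.frame(
  predictor  = c("Pre-1970 GDP/capita", "School (primary) %", "Industry share %",
                 "Agriculture share %", "Pop. density 1969"),
  treated    = c(5.28, 85.9, 45.1, 6.84, 247),
  synthetic  = c(5.27, 82.3, 37.6, 6.18, 196),
  donor_mean = c(3.58, 80.9, 22.4, 21.4, 99.4)
)

# Relative gap to the synthetic unit and to the plain donor average.
balance$rel_gap_synth <- abs(balance$treated - balance$synthetic)  / balance$treated
balance$rel_gap_mean  <- abs(balance$treated - balance$donor_mean) / balance$treated

balance[order(-balance$rel_gap_synth),
        c("predictor", "rel_gap_synth", "rel_gap_mean")]
```

On every row the synthetic unit sits closer to the Basque Country than the unweighted donor mean does, and population density shows the largest relative gap.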

The Resolution

Act III

Before 1970 the lines are one; after 1970 they split apart

Actual Basque (orange) vs synthetic Basque (blue dashed). Pre-treatment window 1955–1969 shaded; the vertical line marks conflict onset in 1970.

The gap is the cost — peaking at −1.04 thousand USD in 1989

Estimated GDP gap (Basque minus synthetic Basque). Essentially zero before 1970, then negative — the deepest deficit of −1.04 thousand USD falls in 1989.

The conflict cost the Basque Country 0.580 thousand USD per capita per year

−0.580

\(\widehat{\mathrm{ATT}}\), average 1970–1997 (thousand 1986 USD/capita) · roughly an 8% income shortfall
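Written out, the headline number is the mean of the yearly gaps over the 28 post-treatment years, using the same \(\hat{\alpha}_{1t}\) as in the estimator:

\[\widehat{\mathrm{ATT}} = \frac{1}{28} \sum_{t=1970}^{1997} \hat{\alpha}_{1t} = -0.580\]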

A single placebo isn’t enough — Catalonia’s ratio is nearly as big

Catalonia placebo	Value
Pre-1970 MSPE	0.006
Post-1970 MSPE	0.391
Post/pre ratio	64.7

That is comparable to the Basque Country's own ratio of 60.1, so a single placebo run has limited inferential power.
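The post/pre ratio is simple to compute from a gap series. The helper below is a minimal sketch; `mspe_ratio` is a hypothetical name and the four-year gap series is made up for illustration.

```r
# Mean squared prediction error before and after treatment, and their ratio.
mspe_ratio <- function(gap, years, treat_year) {
  pre  <- mean(gap[years <  treat_year]^2)
  post <- mean(gap[years >= treat_year]^2)
  c(pre_mspe = pre, post_mspe = post, ratio = post / pre)
}

# Toy gap series (hypothetical): tight fit before 1970, clear deficit after.
years <- 1968:1971
gap   <- c(0.1, -0.1, -0.5, -0.7)

res <- mspe_ratio(gap, years, treat_year = 1970)
res
```

Dividing the rounded table entries (0.391 / 0.006) gives about 65, close to the reported 64.7; the small difference is presumably rounding in the displayed MSPEs.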

In the comparable-fit placebos, the Basque ranks 2 of 8

In-space placebo gap traces for the 8 comparable-fit regions. The Basque (orange) ranks 2 of 8 by post/pre MSPE ratio — at the loud edge of the chorus, not far outside it.
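The rank and pseudo p-value follow directly from the eight ratios. In the sketch below only the Catalonia and Basque ratios come from this deck; the other six values and their names are hypothetical placeholders chosen to sit below the Basque ratio.

```r
# Post/pre MSPE ratios for the 8 comparable-fit regions.
ratios <- c(Cataluna = 64.7, Basque = 60.1,                      # reported values
            region_c = 21.0, region_d = 15.5, region_e = 9.8,    # hypothetical
            region_f = 6.1,  region_g = 3.4,  region_h = 1.9)    # hypothetical

basque_rank <- rank(-ratios)[["Basque"]]              # 1 = largest ratio
pseudo_p    <- mean(ratios >= ratios[["Basque"]])     # share at least as extreme
min_p       <- 1 / length(ratios)                     # best attainable with 8 units

c(rank = basque_rank, pseudo_p = pseudo_p, min_p = min_p)
```

With eight units the pseudo p-value moves in steps of 0.125, so rank 2 corresponds to 0.25 and even rank 1 could not go below 0.125.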

The placebo is suggestive, not decisive — read it with the donor weights

Objection. If Catalonia tops the placebo ranking and is also 85% of the synthetic recipe, isn’t the Basque “effect” just the same Spanish industrial transition?

Response. A fair caveat. When a synthetic is built from one dominant donor, that donor naturally scores high in its own placebo: it has no close substitute to rebuild it. The result is consistent with a sizeable cost (rank 2 of 8, an 8% shortfall), but with only 8 comparable-fit regions drawn from a 16-region pool, resolution is limited: the smallest attainable pseudo p is 1/8 = 0.125. Report placebo and donor weights together, not separately.

Match the pre-treatment, build the counterfactual, read the gap.