Augmented Synthetic Control — Interactive Lab

One method, three doors: `augsynth`

The Augmented Synthetic Control Method builds a credible counterfactual for a treated unit from a weighted "recipe" of untreated donors, then adds an outcome model that corrects leftover bias when the pre-treatment fit is imperfect. The `augsynth` package exposes three entry points: `single_augsynth` (one treated unit), `multisynth` (many units, staggered adoption), and `augsynth_multiout` (one unit, many outcomes).

This lab reproduces the post's findings interactively, all client-side from a precomputed `results.json`. We first check that the method recovers a known effect on simulated data, then show exactly where plain SCM fails and augmentation rescues it, and finally replicate Papaioannou (2021) on the euro area.

—

mean recovery error, plain → ridge (simulated)

—

C05 effect: plain SCM gets the sign wrong; ridge fixes it

—

rank correlation, our ASCM vs Papaioannou (2021)

Does the method recover a known effect?

On simulated data the true treatment effect of unit C01 is a jump plus a gentle ramp we injected ourselves. The estimated treated-minus-synthetic gap (blue, with its conformal band) should track the true effect (white dashed) and sit at zero before treatment.

If an estimator cannot reproduce a known truth on simulated ground, do not trust it on real data.

Tab 2

Single & Suitability

Switch between a well-fit unit and one outside the donor hull. Watch plain SCM break and Ridge-ASCM rescue it.

Tab 3

Many Units

The pooled effect path from multisynth, on simulated data (vs truth) and on the 12 euro members.

Tab 4

Replication

Our per-country ASCM estimates vs the paper's reported TFP contributions, country by country.

Glossary (open a card if a term is unfamiliar)

Synthetic control

A weighted average of donor (untreated) units built to match the treated unit's pre-treatment path. The synthetic's post-treatment path is the missing counterfactual.
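To make the "recipe" concrete, here is a minimal sketch of how donor weights could be found: non-negative weights that sum to one and minimise the squared pre-treatment gap. It uses plain projected gradient descent on toy numbers; the function names and data are illustrative assumptions, not the solver `augsynth` uses.

```python
def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        running += ui
        t = (running - 1.0) / i
        if ui - t > 0:
            theta = t  # ends up as the threshold of the last index that passes
    return [max(x - theta, 0.0) for x in v]


def scm_weights(treated_pre, donors_pre, iters=2000):
    """Minimise ||treated_pre - sum_j w_j * donors_pre[j]||^2 over the simplex."""
    n, periods = len(donors_pre), len(treated_pre)
    # Conservative step size: 1 / (2 * trace of the donor Gram matrix).
    step = 1.0 / (2.0 * sum(x * x for d in donors_pre for x in d))
    w = [1.0 / n] * n
    for _ in range(iters):
        synth = [sum(w[j] * donors_pre[j][t] for j in range(n)) for t in range(periods)]
        resid = [s - y for s, y in zip(synth, treated_pre)]
        grad = [2.0 * sum(d * r for d, r in zip(donors_pre[j], resid)) for j in range(n)]
        w = project_to_simplex([wj - step * g for wj, g in zip(w, grad)])
    return w


# Toy pre-treatment paths (hypothetical): the treated unit is exactly
# 0.25 * donor 0 + 0.75 * donor 1, so it sits inside the donor hull.
donors = [[1.0, 2.0, 3.0, 4.0], [3.0, 1.0, 4.0, 2.0], [5.0, 5.0, 4.0, 6.0]]
treated = [2.5, 1.25, 3.75, 2.5]
weights = scm_weights(treated, donors)
```

Because the toy treated unit lies inside the hull, the recovered weights match the mix that generated it and the third donor gets essentially no weight.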

Augmentation (bias correction)

An outcome model (Ridge regression) that estimates and subtracts the part of the gap SCM could not close. Zero when pre-fit is perfect; nonzero when it is poor.
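A small sketch of the correction, under stated assumptions: a ridge regression of donor post-treatment outcomes on donor pre-treatment outcomes gives coefficients, and the correction is the leftover pre-treatment gap multiplied by those coefficients. The helper names, the hand-picked penalty `lam`, and the toy data are all hypothetical; this is not the package's own implementation.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_coefs(x_rows, y, lam):
    """Ridge coefficients on centred data: (X'X + lam * I)^-1 X'y."""
    n, p = len(x_rows), len(x_rows[0])
    x_bar = [sum(r[k] for r in x_rows) / n for k in range(p)]
    y_bar = sum(y) / n
    xc = [[r[k] - x_bar[k] for k in range(p)] for r in x_rows]
    yc = [v - y_bar for v in y]
    xtx = [[sum(r[i] * r[j] for r in xc) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(xc, yc)) for i in range(p)]
    return solve(xtx, xty)


def augmented_counterfactual(w, treated_pre, donors_pre, donors_post, lam):
    """Return (plain SCM counterfactual, ridge-augmented counterfactual)."""
    eta = ridge_coefs(donors_pre, donors_post, lam)
    p = len(treated_pre)
    synth_pre = [sum(wj * d[k] for wj, d in zip(w, donors_pre)) for k in range(p)]
    scm = sum(wj * y for wj, y in zip(w, donors_post))
    correction = sum((treated_pre[k] - synth_pre[k]) * eta[k] for k in range(p))
    return scm, scm + correction


# Toy data (hypothetical): the post outcome is exactly 2 * pre1 + pre2.
donors_pre = [[1.0, 1.0], [2.0, 1.0], [1.0, 3.0], [3.0, 2.0]]
donors_post = [3.0, 5.0, 5.0, 8.0]
uniform = [0.25] * 4
# A treated unit far outside the donor hull; its true counterfactual is 14.
scm_only, augmented = augmented_counterfactual(
    uniform, [5.0, 4.0], donors_pre, donors_post, lam=1e-9)
# A treated unit the weights already match: the correction vanishes.
scm_fit, augmented_fit = augmented_counterfactual(
    uniform, [1.75, 1.75], donors_pre, donors_post, lam=1e-9)
```

With a near-zero penalty the outcome model is almost exact on this toy data, so the augmented value lands on the true counterfactual while the weighted donor average alone falls well short.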

Pre-fit imbalance (scaled L2)

How far the synthetic is from the treated unit before treatment, scaled so 1.0 is the naive donor average. Small means a trustworthy match.
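As a rough sketch of the scaling described above: divide the pre-treatment L2 distance under the fitted weights by the same distance under equal weights on every donor. The function names and numbers are illustrative assumptions.

```python
import math


def l2_gap(treated_pre, donors_pre, w):
    """L2 distance between the treated pre-path and the weighted donor pre-path."""
    synth = [sum(wj * d[t] for wj, d in zip(w, donors_pre))
             for t in range(len(treated_pre))]
    return math.sqrt(sum((y - s) ** 2 for y, s in zip(treated_pre, synth)))


def scaled_l2(treated_pre, donors_pre, w):
    """Imbalance relative to the naive donor average (uniform weights)."""
    n = len(donors_pre)
    uniform = [1.0 / n] * n
    return l2_gap(treated_pre, donors_pre, w) / l2_gap(treated_pre, donors_pre, uniform)


# Hypothetical pre-treatment paths for two donors and one treated unit.
donors = [[1.0, 2.0, 3.0], [3.0, 2.0, 5.0]]
treated = [1.5, 2.0, 3.5]  # equals 0.75 * donor 0 + 0.25 * donor 1

naive = scaled_l2(treated, donors, [0.5, 0.5])      # the reference point
matched = scaled_l2(treated, donors, [0.75, 0.25])  # a perfect match
```

By construction the naive average scores 1.0 and a perfect match scores 0.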

ATT

Average treatment effect on the treated — the post-treatment gap between the actual unit and its synthetic counterfactual.
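In code the ATT is just the mean of the post-treatment gaps. The sketch below uses made-up numbers with a jump-plus-ramp effect like the one in the simulated panel; the values themselves are hypothetical.

```python
def att(actual_post, synthetic_post):
    """Mean post-treatment gap, plus the period-by-period gaps."""
    gaps = [a - s for a, s in zip(actual_post, synthetic_post)]
    return sum(gaps) / len(gaps), gaps


# Hypothetical post-treatment paths: the effect is a jump of 2.0
# followed by a ramp of 0.5 per period.
synthetic = [10.0, 10.5, 11.0, 11.5]
actual = [12.0, 13.0, 14.0, 15.0]
average_effect, effect_path = att(actual, synthetic)
```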

`single_augsynth` and the suitability test

One treated unit, fit against the donor pool. Unit C01 sits inside the donor hull, so plain SCM fits well and augmentation barely matters. Unit C05 was placed outside the hull on purpose — plain SCM cannot match its pre-period, and it takes the Ridge outcome model to recover the truth.

Treated unit: C01 (inside hull, easy) or C05 (outside hull, hard)

—

true average effect

—

plain SCM estimate

—

Ridge-ASCM estimate

—

pre-fit L2 (plain → ridge)

Actual vs synthetic control

—

Recovery error across all five treated units

Absolute distance between the estimate and the known truth, for plain SCM (orange) and Ridge-ASCM (teal). Augmentation helps most exactly where the fit is worst.

`multisynth`: many treated units at once

multisynth fits one synthetic control per treated unit and partially pools them, returning a pooled average effect and per-unit effects. Toggle between the simulated panel (where we can check against the known truth) and the real euro-area panel of 12 members.
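With staggered adoption, per-unit gaps have to be lined up by time since adoption before they can be averaged. The sketch below shows only that alignment step with a simple unweighted mean; the partial pooling of the weights themselves is not reproduced here, and the data layout and names are assumptions made for illustration.

```python
def pooled_event_path(gaps, adoption):
    """Average per-unit gaps by time relative to each unit's adoption year.

    gaps:     {unit: {year: treated-minus-synthetic gap}}
    adoption: {unit: first treated year}
    """
    by_relative_time = {}
    for unit, series in gaps.items():
        for year, gap in series.items():
            by_relative_time.setdefault(year - adoption[unit], []).append(gap)
    return {k: sum(v) / len(v) for k, v in sorted(by_relative_time.items())}


# Two hypothetical units that adopt in different years.
gaps = {
    "A": {1999: 0.0, 2000: 1.0, 2001: 2.0},
    "B": {2000: 0.0, 2001: 0.0, 2002: 3.0},
}
adoption = {"A": 2000, "B": 2002}
path = pooled_event_path(gaps, adoption)
```

Relative time 0 averages both units' first treated period; relative time 1 only has unit A, because unit B has not been observed that long after adoption.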

Panel: simulated (vs known truth) or euro area (12 members)

—

pooled average effect

—

true pooled effect

—

scaled global L2 imbalance

Pooled effect path

Per-unit and pooled effects with jackknife CIs

Point = estimate, bar = 95% jackknife confidence interval, orange tick = known truth. Teal = excludes zero (significant); grey = includes zero.

Per-unit recovery: jackknife vs wild bootstrap

The jackknife interval (used above) excludes zero for every unit; the more conservative wild bootstrap, which also carries the counterfactual-estimation uncertainty, does not. Same estimates, different verdict — the inference method matters.

Replicating Papaioannou (2021)

Did the euro raise total factor productivity? We fit a synthetic control for each of the 12 early euro members against 24 non-euro donors and compare our ASCM percentage effect (2000–2007) to the paper's reported contribution, country by country. Points on the 45-degree line agree.

—

Spearman rank correlation

—

Pearson correlation

—

synthetic Germany TFP effect, 2008–17

ASCM vs the paper: TFP % contribution, 2000–2007

Hover a point for the exact pair. France and the Netherlands land almost on the line; Germany and Ireland diverge in magnitude but not in sign.
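The two correlations answer different questions: Pearson asks whether the pairs fall on a straight line, Spearman only whether the ordering agrees. A minimal sketch of both, on made-up numbers rather than the lab's actual estimates:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical pairs: same ordering, very different magnitudes.
ours = [1.0, 2.0, 3.0, 4.0]
paper = [1.0, 4.0, 9.0, 100.0]
rho = spearman(ours, paper)
r = pearson(ours, paper)
```

Here the ordering agrees perfectly, so the rank correlation is 1, while the one large value pulls the Pearson correlation below 1. That is the pattern to expect when estimates diverge in magnitude but not in ranking.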

Country-by-country comparison

Greece and Portugal turn negative in 2008–17 in both our estimates and the paper's: the post-crisis reversal.

Inference: is the effect real, or could it be noise?

A point estimate is only half the story. augsynth ships a small toolbox of inference methods, and they do not all agree. This tab shows which headline results are statistically significant, explains the three tools the tutorial uses, and lets you feel what drives significance with a hands-on simulator.

Significance scoreboard (from the tutorial's results)

Each headline estimate with its confidence interval or p-value, and whether it is distinguishable from zero at the 5% level.

On simulated data, where we injected a real effect, the headline estimates are significant with one exception: C05, the unit outside the donor hull, is not. On real data, the euro-area pooled effect is not significant either, and we report it that way.

Three tools, matched to three estimators

jackknife+ — single_augsynth

Leaves out one donor at a time, refits, and builds a robust confidence interval for the average effect. Our primary interval for a single treated unit because it is stable even when the post-treatment window is long.

conformal — single_augsynth & augsynth_multiout

A permutation test that returns a p-value and a pointwise band. Powerful when the pre-period is long relative to the post-period (which is why the simulated panel starts in 1985), but lower-powered and noisier when the post-window is long. For multiple outcomes it returns a p-value per outcome.
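A simplified sketch of the permutation idea: take the residuals (treated minus counterfactual) under the null, compute a statistic on the post-treatment block, and compare it with the same statistic after every cyclic shift of the series. This leaves out the refitting under the null that a full conformal procedure involves, and the function name and data are hypothetical.

```python
def conformal_p_value(residuals, n_post):
    """Permutation p-value from cyclic shifts of the residual series.

    residuals: pre- then post-treatment residuals under the null, in time order
               (for a null effect tau0, subtract tau0 from the post-period gaps).
    n_post:    number of post-treatment periods at the end of the series.
    """
    total = len(residuals)

    def stat(u):
        # Mean absolute residual over the last n_post positions.
        return sum(abs(x) for x in u[-n_post:]) / n_post

    observed = stat(residuals)
    shifted = [stat(residuals[j:] + residuals[:j]) for j in range(total)]
    return sum(s >= observed for s in shifted) / total


# Hypothetical residuals: ten quiet pre-periods, two large post-periods.
resid = [0.1, -0.2, 0.1, 0.0, -0.1, 0.2, -0.1, 0.1, 0.0, -0.2, 3.0, 3.1]
p_effect = conformal_p_value(resid, n_post=2)

# No effect at all: every shift looks like the original.
p_null = conformal_p_value([0.1] * 12, n_post=2)
```

The smallest p-value this scheme can return is one over the total number of periods, which is one way to see why a long pre-period helps.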

jackknife — multisynth (primary)

The natural interval for the average effect across treated units. Tight and significant here for the pooled and per-unit effects.
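For intuition, here is a leave-one-unit-out jackknife for a plain average of per-unit effects. It is a sketch under simplifying assumptions: it drops each unit's effect from the average without refitting any weights, uses a normal critical value, and runs on made-up effects.

```python
import math


def jackknife_ci(effects, z=1.96):
    """Jackknife standard error and normal-approximation CI for the mean effect."""
    n = len(effects)
    total = sum(effects)
    estimate = total / n
    # Re-estimate the average with each unit left out in turn.
    leave_one_out = [(total - e) / (n - 1) for e in effects]
    loo_mean = sum(leave_one_out) / n
    se = math.sqrt((n - 1) / n * sum((v - loo_mean) ** 2 for v in leave_one_out))
    return estimate, se, (estimate - z * se, estimate + z * se)


# Hypothetical per-unit effects for five treated units.
estimate, se, (lower, upper) = jackknife_ci([2.0, 3.0, 4.0, 3.5, 2.5])
```

For a simple mean the jackknife standard error equals the usual standard error of the mean, so on these numbers the interval sits well above zero.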

wild bootstrap — multisynth (conservative)

Resamples to also propagate the counterfactual-estimation uncertainty, so it is wider. On these five units it does not exclude zero — the same estimates, a more cautious verdict.

Simulator: what makes an effect significant?

An illustrative treated-minus-control gap: flat before adoption, shifted by the true effect after. Move the sliders and watch the confidence interval widen or narrow and the verdict flip at the 5% line. (A teaching model — the real augsynth uses jackknife / conformal / bootstrap, not this normal approximation.)

True effect size 3.0

Bigger true effect → easier to detect.

Noise (σ) 1.0

More noise → wider interval, harder to detect.

Pre-treatment periods 15

More pre-periods → a better-pinned counterfactual → tighter interval.

estimate — 95% CI — p-value — —
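One way a teaching model like this could be written down is as a difference of pre- and post-period means with a normal approximation. The formula below is an assumption for illustration, not necessarily what the simulator computes, and `n_post` is a hypothetical parameter that has no slider on the page.

```python
import math


def simulator_verdict(estimate, sigma, n_pre, n_post=10, z_crit=1.96):
    """Normal-approximation CI and two-sided p-value for a mean-shift estimate."""
    # Standard error of (post-period mean) minus (pre-period mean).
    se = sigma * math.sqrt(1.0 / n_pre + 1.0 / n_post)
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return estimate - z_crit * se, estimate + z_crit * se, p_value


# Slider defaults from the page: effect 3.0, noise 1.0, 15 pre-periods.
low, high, p = simulator_verdict(3.0, 1.0, 15)

# Same effect with ten times the noise: the interval now straddles zero.
low_noisy, high_noisy, p_noisy = simulator_verdict(3.0, 10.0, 15)
```

Under this sketch, raising the noise or cutting the number of pre-periods inflates the standard error, which is what widens the interval and flips the verdict.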

What this tab teaches

Significance is not the point estimate. A large, accurate estimate can still be indistinguishable from zero if the interval is wide.
Three things widen the interval: more noise, fewer pre-treatment periods (a worse-pinned counterfactual), and a poorer pre-fit.
The inference method matters. The multisynth jackknife calls the pooled effect significant; the conservative wild bootstrap does not — on the same numbers.
Be honest on real data. The euro-area pooled effect is near zero and not significant; we report it as such rather than dressing it up.
