Synthetic Difference-in-Differences

Did Proposition 99 cut smoking? Three ways to draw the counterfactual

In 1988 California passed Proposition 99, raising cigarette taxes and funding anti-smoking programs. Cigarette sales fell — but sales were already falling everywhere. The hard part is the counterfactual: how many packs would Californians have smoked without the law? Three methods answer that question with three different recipes, and they disagree by a lot. Use the tabs above to build each counterfactual yourself.

Difference-in-differences (2×2, equal weights): −27.3

Synthetic control (unit weights only): −19.5

SDID (unit + time weights): −15.6

Each figure is the average treatment effect on the treated (ATT), in cigarette packs per capita per year, 1989–2000.

One regression, three weighting schemes

DiD, synthetic control, and SDID are not three unrelated tools. Each can be written as a weighted regression of packs per capita on a treatment dummy with year fixed effects. DiD and SDID also include state fixed effects; synthetic control, as originally formulated, omits them, which is why it has to match California's level and not just its trend. Beyond that, the only thing that changes is which observations get weight: which donor states stand in for California, and which pre-treatment years define the baseline.

DiD

Units: all 38 controls, equal weight
Years: all pre-years, equal weight

The control group is a plain average of every other state. Assumes California's untreated path would parallel that average — even though controls were higher and flatter before 1989.

Synthetic control

Units: a few donors, weights ω
Years: all pre-years, equal weight

Picks a sparse, weighted blend of donor states that tracks California's level before 1989. Here just 6 states get weight (Utah, Montana, Nevada, …).

SDID

Units: donors, weights ω
Years: recent pre-years, weights λ

Adds time weights on top of unit weights, concentrating the baseline on the years just before 1989 (1986–88) that look most like the post-period.
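For readers who prefer notation, the weighted regression behind the three cards can be sketched as follows. This follows the form given in Arkhangelsky et al. (2021); the symbols Y_it (packs per capita in state i, year t), W_it (the treatment dummy), μ, α_i (state effects), β_t (year effects), and τ (the effect) do not appear elsewhere on this page and are introduced here for illustration only.

```latex
\left(\hat{\tau}, \hat{\mu}, \hat{\alpha}, \hat{\beta}\right)
  = \arg\min_{\tau,\,\mu,\,\alpha,\,\beta}
    \sum_{i=1}^{N} \sum_{t=1}^{T}
    \left( Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\,\tau \right)^{2}
    \hat{\omega}_i \,\hat{\lambda}_t
```

Setting every ω and every λ to the same constant gives DiD. Synthetic control estimates ω, keeps λ constant, and in the paper's formulation drops the α_i terms. SDID estimates both ω and λ.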

Tab 2

Weighting-scheme explorer

Flip between DiD, SC, and SDID and watch California's counterfactual redraw itself. See why DiD overshoots and SDID gives the smallest estimate of the three.

Tab 3

Counterfactual & gap

The treated-vs-synthetic view, with the estimated effect (the gap) charted below and the ATT labelled on the post-1989 region.

Tab 4

Placebo inference

With one treated unit, you cannot rely on a standard error alone. Pretend each control was treated, collect 38 placebo effects, and see how extreme California really is.

Why does SDID give the smallest estimate?

DiD compares California to a flat average of higher-smoking states, so it reads off a large drop (−27.3). Synthetic control fixes the level mismatch with unit weights and shrinks the estimate to −19.5. SDID uses its own, more diffuse unit weights and adds time weights that down-weight distant pre-years, which look least like the post-period. That removes the last bit of trend mismatch and lands at −15.6 packs per capita, the smallest of the three. The three estimates are a story about how much each method trusts parallel trends.

Glossary — open a card if a term is unfamiliar

Counterfactual

What would have happened to California without Prop 99. It is never observed — every method estimates it from the control states.

ATT

Average treatment effect on the treated. Here, the average gap between California's observed smoking and its counterfactual over 1989–2000. All three methods target the same ATT; they disagree on the counterfactual.

Difference-in-differences (DiD)

Compares the before/after change in California to the before/after change in the control group. Uses every control state with equal weight.

Parallel-trends assumption

DiD is only valid if California and the controls would have moved in parallel absent the law. When pre-trends differ, the DiD estimate is biased — which is what SC and SDID try to fix.

Synthetic control (SC)

Builds a weighted blend of donor states ("synthetic California") that matches California's pre-1989 outcome path. Weights are sparse: most states get zero.

Unit weights (ω)

How much each donor state contributes to the synthetic counterfactual. SC keeps 6 donors; SDID spreads weight across many states (diffuse).

Time weights (λ)

SDID's extra ingredient: how much each pre-treatment year counts toward the baseline. Here almost all weight falls on 1986–88 (0.37, 0.21, and 0.43 after rounding).

SDID

Synthetic difference-in-differences (Arkhangelsky et al. 2021). A two-way fixed-effects regression weighted by both ω and λ. Robust to both level and trend mismatch.

Placebo inference

With one treated unit there is no replication, so we reassign "treatment" to each control state in turn and build a distribution of fake effects. The p-value is the share of placebos whose effect is at least as large in magnitude as California's.

Donor pool

The 38 control states eligible to form the synthetic counterfactual. The original study excluded states with their own big tobacco shocks, which is what leaves these 38; here all 38 are donors.

Weighting-scheme explorer — pick a method, redraw the counterfactual

California's observed smoking (thick orange) is fixed. What moves is the dashed counterfactual — the smoking path California "would have had". Each method draws it differently because each weights the donor states and pre-years differently. Click DiD, SC, or SDID and watch the counterfactual snap to a new shape.

Readouts for the selected method (default DiD: all controls, equal weight): the ATT in packs per capita (average effect, 1989–2000) and the number of donor states with weight, out of 38 controls.

What to look for

DiD's counterfactual barely bends. It is California's pre-level shifted by the controls' average change. Because controls fell less steeply, the implied counterfactual stays high — so the gap (and the −27.3 ATT) is the largest of the three.
SC hugs California tightly before 1989, then separates. The synthetic blend is built to match the pre-period path, so the post-1989 gap is a cleaner read on the policy — a smaller −19.5.
SDID's counterfactual sits closest to California's late-1980s trajectory. Time weights pull the baseline toward 1986–88, the years most like the post-period, giving the most conservative estimate, −15.6.
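The arithmetic behind the redraw can be sketched as a weighted double difference. The snippet below is a minimal illustration, not the page's actual implementation: the function names, the toy panel, and the weights are all hypothetical, and real SDID estimates ω and λ from the data, whereas here they are supplied by hand. With no weights passed, the function reduces to plain DiD.

```python
def weighted_mean(values, weights):
    """Weighted average; weights are normalised, so they need not sum to 1."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_did(treated_pre, treated_post, controls_pre, controls_post,
                 unit_weights=None, time_weights=None):
    """Weighted double difference for one treated unit.

    treated_pre / treated_post: yearly outcomes for the treated unit.
    controls_pre / controls_post: one list of yearly outcomes per control.
    unit_weights (omega): one weight per control; None means equal weights.
    time_weights (lambda): one weight per pre-year; None means equal weights.
    """
    omega = unit_weights or [1.0] * len(controls_pre)
    lam = time_weights or [1.0] * len(treated_pre)

    # Before/after change for the treated unit, against the weighted baseline.
    treated_change = (sum(treated_post) / len(treated_post)
                      - weighted_mean(treated_pre, lam))

    # The same change for each control, blended with the unit weights.
    control_changes = [
        sum(post) / len(post) - weighted_mean(pre, lam)
        for pre, post in zip(controls_pre, controls_post)
    ]
    return treated_change - weighted_mean(control_changes, omega)


# HYPOTHETICAL toy panel: 3 pre-years, 2 post-years, 2 control units.
treated_pre, treated_post = [100, 95, 90], [70, 60]
controls_pre = [[120, 118, 116], [110, 105, 100]]
controls_post = [[112, 110], [92, 88]]

# Equal weights everywhere: plain DiD.
did_estimate = weighted_did(treated_pre, treated_post,
                            controls_pre, controls_post)

# Hand-picked weights that favour the second control and the latest pre-years.
sdid_style_estimate = weighted_did(treated_pre, treated_post,
                                   controls_pre, controls_post,
                                   unit_weights=[0.25, 0.75],
                                   time_weights=[0.0, 0.25, 0.75])

print(did_estimate, sdid_style_estimate)
```

In this toy panel the weighted estimate is smaller in magnitude than the equal-weight one, the same direction of movement as on the chart, though the numbers themselves are invented.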

Synthetic control weights (sparse)

Only 6 donor states get any weight. The synthetic counterfactual is a blend of these states.

SDID weights (diffuse)

SDID spreads weight across many states — no single donor dominates. Top 12 shown.

Sparse vs diffuse — why it matters

Synthetic control's sparsity is a feature for transparency (you can name the handful of states standing in for California) but a liability for robustness — leaning on Utah at 39% means one idiosyncratic state can move the whole estimate. SDID's diffuse weights spread that risk across dozens of states, which is one reason its placebo distribution (Tab 4) is well behaved.
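To see why a 39% weight matters, note that a weighted-average counterfactual moves by the donor's weight times the donor's own movement. The sketch below works that through. The 10-pack shock is hypothetical, and the equal weight of 1/38 is only a stand-in for "diffuse"; it is not SDID's actual weight for any state.

```python
def counterfactual_shift(weight, donor_shock):
    """Change in a weighted-average counterfactual when one donor moves."""
    return weight * donor_shock


donor_shock = 10.0       # HYPOTHETICAL: one donor's sales move by 10 packs
utah_weight = 0.39       # Utah's synthetic-control weight, from the text
equal_weight = 1 / 38    # illustrative diffuse weight, NOT an SDID estimate

sparse_shift = counterfactual_shift(utah_weight, donor_shock)
diffuse_shift = counterfactual_shift(equal_weight, donor_shock)

print(round(sparse_shift, 2), round(diffuse_shift, 2))
```

Under these assumptions the same shock moves the sparse counterfactual by about 3.9 packs and the diffuse one by about 0.26 packs.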

Counterfactual & gap — the effect is the space between the lines

The treatment effect in any given year is simply California's observed smoking minus its counterfactual. Above, the two lines; below, the gap between them. Before 1989 the gap should hover near zero (a good counterfactual matches the past); after 1989 it opens up — that opening is the policy effect. Pick a method and read the gap.

Readouts for the selected method (default synthetic control, whose counterfactual is shown above): the ATT, which is the post-1989 average gap in packs per capita per year, and the gap in 2000, the final year. The effect grows over time.

What to look for

The pre-1989 gap is the credibility test. For SC and SDID it stays near zero before the law — the counterfactual reproduces the past. For DiD the pre-period gap wanders, a visible warning that parallel trends is shaky.
The effect widens every year. By 2000 California smokes roughly 26 packs per capita less than its synthetic twin — the cumulative payoff of a sustained tobacco-control program, not a one-time jump.
The dashed ATT line is the average of the post-1989 gap. Switch methods and watch it move from −15.6 (SDID) to −19.5 (SC) to −27.3 (DiD) — the same ranking you saw on the Concept tab.
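The three readings above (pre-period fit, a widening gap, and the ATT as the post-period average) reduce to a few lines of arithmetic. The following is a minimal sketch with made-up numbers; the function name gap_summary and every value in the toy series are hypothetical and are not the California data.

```python
def gap_summary(years, observed, counterfactual, first_post_year):
    """Return yearly gaps, pre-period RMSE, and the post-period average gap."""
    # Gap = observed minus counterfactual, year by year.
    gaps = {y: o - c for y, o, c in zip(years, observed, counterfactual)}

    pre = [g for y, g in gaps.items() if y < first_post_year]
    post = [g for y, g in gaps.items() if y >= first_post_year]

    # Credibility check: how far the counterfactual strays before treatment.
    pre_rmse = (sum(g * g for g in pre) / len(pre)) ** 0.5
    # ATT: the average gap over the post-treatment years.
    att = sum(post) / len(post)
    return gaps, pre_rmse, att


# HYPOTHETICAL series, for illustration only.
years = [1986, 1987, 1988, 1989, 1990, 1991]
observed = [100, 98, 90, 82, 77, 68]
counterfactual = [99, 99, 91, 88, 87, 82]

gaps, pre_rmse, att = gap_summary(years, observed, counterfactual, 1989)
print(pre_rmse, att, gaps[1991])
```

A small pre-period RMSE relative to the post-period gap is what "the counterfactual reproduces the past" looks like numerically.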

Placebo inference — is −15.6 bigger than chance?

With only one treated state, there is nothing to average over, so a standard error cannot be computed in the usual way. SDID's answer: pretend, one at a time, that each of the 38 control states got the "treatment" in 1989, estimate the effect, and collect all 38 fake (placebo) effects. If California's real −15.6 sits far out in the tail of that distribution, the result is unlikely to be noise. The orange line is California; the steel bars are the placebos.

California SDID ATT: −15.6 packs per capita

Permutation p-value: 0.026 (share of placebos as large)

Placebo std. error: 9.88 (spread of the fake effects)

Normal-approx 95% CI: [−35.0, 3.8] (includes 0, so it looks "insignificant")

Two tests, two verdicts — which do you trust?

Notice the tension. The rank-based permutation test gives p = 0.026: only one placebo state (out of 38) has an effect as large in magnitude as California's, so the result is significant at the 5% level. But the SE-based 95% interval [−35.0, 3.8] includes zero, which would normally be called "not significant".

Both are computed from the same placebos, so why disagree? With a single treated unit the standard error (9.88) is large and the normal approximation is crude — the interval is conservatively wide. The permutation test instead asks a sharper question: where does California rank among the placebos? It does not assume normality and, with one treated unit, is the more reliable read. The takeaway: report the placebo p-value, and treat the wide SE interval as a reminder that single-unit inference is inherently uncertain.
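Both verdicts can be reproduced with a few lines of arithmetic. The sketch below takes the ATT and standard error reported on this tab as given. The placebo list is hypothetical: 37 invented small effects plus one large one, chosen only so that exactly one of 38 exceeds California's effect in magnitude. The function names are illustrative as well.

```python
def permutation_p_value(att, placebo_effects):
    """Share of placebo effects at least as large in magnitude as the real one."""
    as_extreme = sum(1 for p in placebo_effects if abs(p) >= abs(att))
    return as_extreme / len(placebo_effects)


def normal_ci(att, std_error, z=1.96):
    """Normal-approximation interval: estimate plus or minus z standard errors."""
    return att - z * std_error, att + z * std_error


# Figures reported on this tab.
att = -15.6
std_error = 9.88

low, high = normal_ci(att, std_error)
print(round(low, 1), round(high, 1))   # -35.0 3.8

# HYPOTHETICAL placebo effects: 37 small invented values and one large one.
placebos = [(-1) ** i * (i % 10) for i in range(37)] + [-32.0]

p_value = permutation_p_value(att, placebos)
print(round(p_value, 3))               # 0.026, i.e. 1 out of 38
```

The interval depends on the spread of all the placebos, so one outlier widens it, while the rank-based p-value only counts how many placebos beat California.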

What to look for

California is in the left tail. Most placebo effects cluster near zero; California's −15.6 is more negative than all but one of them — the visual version of p = 0.026.
The shaded band is the SE-based 95% CI. It stretches past zero on the right, which is why a naive significance test would hesitate even though the rank is extreme.
A few placebos are sizeable too, but only one (Rhode Island, near −32) exceeds California's in magnitude. Large placebos come from states the model fits poorly; the permutation test counts them honestly rather than hiding them in a standard error.