Did Proposition 99 cut smoking? Three ways to draw the counterfactual
In 1988 California passed Proposition 99, raising cigarette taxes and funding anti-smoking programs. Cigarette sales fell — but sales were already falling everywhere. The hard part is the counterfactual: how many packs would Californians have smoked without the law? Three methods answer that question with three different recipes, and they disagree by a lot. Use the tabs above to build each counterfactual yourself.
DiD (2×2, equal weights): −27.3
SC (unit weights only): −19.5
SDID (unit + time weights): −15.6
Average treatment effect on the treated (ATT), in cigarette packs per capita per year, 1989–2000.
One regression, three weighting schemes
DiD, synthetic control, and SDID are not three unrelated tools. Each is the same two-way fixed-effects regression of packs per capita on a treatment dummy, fit with state fixed effects and year fixed effects. The only thing that changes is which observations get weight: which donor states stand in for California, and which pre-treatment years define the baseline.
DiD
Units: all 38 controls, equal weight
Years: all pre-years, equal weight
The control group is a plain average of every other state. Assumes California's untreated path would parallel that average — even though controls were higher and flatter before 1989.
Synthetic control
Units: a few donors, weights ω
Years: all pre-years, equal weight
Picks a sparse, weighted blend of donor states that tracks California's level before 1989. Here just 6 states get weight (Utah, Montana, Nevada, …).
SDID
Units: donors, weights ω
Years: recent pre-years, weights λ
Adds time weights on top of unit weights, concentrating the baseline on the years just before 1989 (1986–88) that look most like the post-period.
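The shared structure can be sketched in a few lines. With a single treated unit, the weighted regression reduces to a double difference: the treated unit's post-minus-pre change, minus the weighted average of the same change across the controls. This is a minimal sketch that follows the framing above, run on made-up numbers rather than the Proposition 99 data; the function name `weighted_double_difference` and its parameters are hypothetical.

```python
from statistics import fmean


def weighted_double_difference(treated, controls, n_pre, unit_w=None, time_w=None):
    """Weighted double difference for one treated unit.

    treated  : outcomes by year for the treated unit
    controls : list of outcome series, one per control unit
    n_pre    : number of pre-treatment years at the start of each series
    unit_w   : weights over controls (default: equal weights)
    time_w   : weights over pre-years (default: equal weights)
    """
    n_co = len(controls)
    unit_w = unit_w or [1.0 / n_co] * n_co
    time_w = time_w or [1.0 / n_pre] * n_pre

    def post_minus_pre(series):
        # Weighted pre-period baseline, plain average over the post-period.
        pre = sum(w * y for w, y in zip(time_w, series[:n_pre]))
        post = fmean(series[n_pre:])
        return post - pre

    treated_change = post_minus_pre(treated)
    control_change = sum(w * post_minus_pre(c) for w, c in zip(unit_w, controls))
    return treated_change - control_change


# Toy panel (illustrative only): 3 pre-years, 2 post-years, 2 controls.
treated = [100, 95, 90, 70, 60]
controls = [[120, 118, 116, 112, 110], [90, 86, 82, 76, 72]]

did = weighted_double_difference(treated, controls, n_pre=3)
sc_like = weighted_double_difference(treated, controls, n_pre=3, unit_w=[0.25, 0.75])
sdid_like = weighted_double_difference(
    treated, controls, n_pre=3, unit_w=[0.25, 0.75], time_w=[0.0, 0.0, 1.0]
)
print(did, sc_like, sdid_like)
```

Passing no weights gives the DiD recipe, unit weights alone give the SC recipe, and unit plus time weights give the SDID recipe. The weights here are chosen by hand; in practice each method estimates them from the pre-period data.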
Weighting-scheme explorer
Flip between DiD, SC, and SDID and watch California's counterfactual redraw itself. See why DiD gives the largest estimate and SDID the smallest.
Counterfactual & gap
The treated-vs-synthetic view, with the estimated effect (the gap) charted below and the ATT labelled on the post-1989 region.
Placebo inference
With one treated unit, you cannot rely on a standard error alone. Pretend each control was treated, collect 38 placebo effects, and see how extreme California really is.
Why does SDID land below both DiD and SC?
DiD compares California to a flat average of higher-smoking states, so it reads off a large drop (−27.3). Synthetic control fixes the level mismatch with unit weights and shrinks the estimate to −19.5. SDID uses unit weights of its own and also down-weights distant pre-years that look least like the post-period, removing the last bit of trend mismatch and landing at −15.6 packs per capita. The three estimates are a story about how much each method trusts parallel trends.
Glossary — open a card if a term is unfamiliar
Counterfactual
ATT
Difference-in-differences (DiD)
Parallel-trends assumption
Synthetic control (SC)
Unit weights (ω)
Time weights (λ)
SDID
Placebo inference
Donor pool
Weighting-scheme explorer — pick a method, redraw the counterfactual
California's observed smoking (thick orange) is fixed. What moves is the dashed counterfactual: the smoking path California would have followed without the law. Each method draws it differently because each weights the donor states and pre-years differently. Click DiD, SC, or SDID and watch the counterfactual snap to a new shape.
What to look for
- DiD's counterfactual barely bends. It is California's pre-level shifted by the controls' average change. Because controls fell less steeply, the implied counterfactual stays high — so the gap (and the −27.3 ATT) is the largest of the three.
- SC hugs California tightly before 1989, then separates. The synthetic blend is built to match the pre-period path, so the post-1989 gap is a cleaner read on the policy — a smaller −19.5.
- SDID's counterfactual sits closest to California's late-1980s trajectory. Time weights pull the baseline toward 1986–88, the years most like the post-period, giving the most conservative estimate, −15.6.
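The DiD bullet above can be made concrete: the counterfactual is the treated unit's pre-period level plus the controls' deviation from their own pre-period level. A minimal sketch on made-up numbers follows; `did_counterfactual` is a hypothetical helper, not part of any library.

```python
from statistics import fmean


def did_counterfactual(treated, control_avg, n_pre):
    """Shift the control-average path so its pre-period level matches the treated unit's."""
    treated_pre_level = fmean(treated[:n_pre])
    control_pre_level = fmean(control_avg[:n_pre])
    return [treated_pre_level + (c - control_pre_level) for c in control_avg]


# Toy series (illustrative only): 3 pre-years followed by 2 post-years.
treated = [100, 95, 90, 70, 60]
control_avg = [105, 102, 99, 94, 91]  # equal-weight average of the controls

counterfactual = did_counterfactual(treated, control_avg, n_pre=3)
print(counterfactual)
```

Because the counterfactual inherits the shape of the control average, it can only be as steep as the controls are. That is why a flatter control group leaves the DiD counterfactual high.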
Synthetic control weights (sparse)
Only 6 donor states get any weight. The synthetic counterfactual is a blend of these states.
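One way to obtain weights of this kind is to minimize the pre-period squared distance between the treated unit and the donor blend, with the weights constrained to be non-negative and sum to one. The sketch below does this with Frank-Wolfe steps and an exact line search. It is an outcome-only illustration on made-up numbers; the solver choice, the function names, and the data are all assumptions, not the method behind the chart.

```python
def blend(weights, donors_pre):
    """Weighted donor path, year by year."""
    n_years = len(donors_pre[0])
    return [sum(w * d[t] for w, d in zip(weights, donors_pre)) for t in range(n_years)]


def sse(weights, treated_pre, donors_pre):
    """Sum of squared pre-period gaps between the blend and the treated unit."""
    return sum((f - y) ** 2 for f, y in zip(blend(weights, donors_pre), treated_pre))


def synthetic_control_weights(treated_pre, donors_pre, n_iter=2000):
    n = len(donors_pre)
    w = [1.0 / n] * n  # start from equal weights
    for _ in range(n_iter):
        fitted = blend(w, donors_pre)
        resid = [f - y for f, y in zip(fitted, treated_pre)]
        # Gradient of the squared error with respect to each donor's weight (up to a factor of 2).
        grad = [sum(d[t] * resid[t] for t in range(len(resid))) for d in donors_pre]
        best = min(range(n), key=grad.__getitem__)
        # Moving toward the single donor `best` changes the fitted path by `direction`.
        direction = [donors_pre[best][t] - fitted[t] for t in range(len(fitted))]
        denom = sum(d * d for d in direction)
        if denom == 0.0:
            break
        step = -sum(r * d for r, d in zip(resid, direction)) / denom
        step = min(1.0, max(0.0, step))  # stay inside the simplex
        if step == 0.0:
            break
        w = [(1.0 - step) * wj for wj in w]
        w[best] += step
    return w


# Toy pre-period data (illustrative only): three donors, three pre-years.
donors_pre = [[90, 88, 86], [110, 105, 100], [150, 149, 148]]
treated_pre = [98.0, 94.8, 91.6]

weights = synthetic_control_weights(treated_pre, donors_pre)
print([round(w, 3) for w in weights])
```

Each step moves weight toward a single donor, and donors that do not help the pre-period fit see their weight shrink toward zero. This is consistent with the sparse pattern described above.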
SDID weights (diffuse)
SDID spreads weight across many states — no single donor dominates. Top 12 shown.
Sparse vs diffuse — why it matters
Synthetic control's sparsity is a feature for transparency (you can name the handful of states standing in for California) but a liability for robustness — leaning on Utah at 39% means one idiosyncratic state can move the whole estimate. SDID's diffuse weights spread that risk across dozens of states, which is one reason its placebo distribution (Tab 4) is well behaved.
Counterfactual & gap — the effect is the space between the lines
The treatment effect in any given year is simply California's observed smoking minus its counterfactual. Above, the two lines; below, the gap between them. Before 1989 the gap should hover near zero (a good counterfactual matches the past); after 1989 it opens up — that opening is the policy effect. Pick a method and read the gap.
What to look for
- The pre-1989 gap is the credibility test. For SC and SDID it stays near zero before the law — the counterfactual reproduces the past. For DiD the pre-period gap wanders, a visible warning that parallel trends is shaky.
- The effect widens every year. By 2000 California smokes roughly 26 packs per capita less than its synthetic twin — the cumulative payoff of a sustained tobacco-control program, not a one-time jump.
- The dashed ATT line is the average of the post-1989 gap. Switch methods and watch it move from −15.6 (SDID) to −19.5 (SC) to −27.3 (DiD) — the same ranking you saw on the Concept tab.
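The three quantities in this view (the yearly gap, the pre-period fit, and the ATT) are simple to compute once a counterfactual is in hand. Here is a minimal sketch on made-up numbers; `gap_summary` is a hypothetical helper, and using the pre-period root-mean-square gap as the fit measure is an assumption.

```python
from math import sqrt
from statistics import fmean


def gap_summary(observed, counterfactual, n_pre):
    """Return the yearly gap, the pre-period RMSE, and the ATT (mean post-period gap)."""
    gap = [o - c for o, c in zip(observed, counterfactual)]
    pre_rmse = sqrt(fmean([g * g for g in gap[:n_pre]]))
    att = fmean(gap[n_pre:])
    return gap, pre_rmse, att


# Toy series (illustrative only): 3 pre-years followed by 2 post-years.
observed = [100, 95, 90, 70, 60]
counterfactual = [98, 95, 92, 87, 84]

gap, pre_rmse, att = gap_summary(observed, counterfactual, n_pre=3)
print(gap, round(pre_rmse, 3), att)
```

A small pre-period RMSE relative to the post-period gap is the numerical counterpart of the "gap hovers near zero before 1989" check.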
Placebo inference — is −15.6 bigger than chance?
With only one treated state, there is no cross-unit variation from which to compute a standard error in the usual way. SDID's answer: pretend, one at a time, that each of the 38 control states got the "treatment" in 1989, estimate the effect, and collect all 38 fake (placebo) effects. If California's real −15.6 sits far out in the tail of that distribution, the result is unlikely to be noise. The orange line is California; the steel bars are the placebos.
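The placebo loop itself is short. The sketch below treats each control in turn as if it were treated and re-estimates the effect from the remaining controls. To stay self-contained it uses a plain equal-weight DiD as a stand-in for the SDID estimator, and the data are made up; both function names are hypothetical.

```python
from statistics import fmean


def did_effect(treated, controls, n_pre):
    """Equal-weight DiD: treated change minus the average control change."""
    def change(series):
        return fmean(series[n_pre:]) - fmean(series[:n_pre])
    return change(treated) - fmean([change(c) for c in controls])


def placebo_effects(controls, n_pre, estimator=did_effect):
    """Pretend each control was treated and estimate its 'effect' from the others."""
    effects = []
    for i, fake_treated in enumerate(controls):
        remaining = controls[:i] + controls[i + 1:]
        effects.append(estimator(fake_treated, remaining, n_pre))
    return effects


# Toy controls (illustrative only): 3 pre-years followed by 2 post-years.
controls = [
    [120, 118, 116, 112, 110],
    [90, 86, 82, 76, 72],
    [100, 99, 98, 95, 93],
]

placebos = placebo_effects(controls, n_pre=3)
print(placebos)
```

Any estimator with the same signature can be passed as `estimator`, so the same loop applies when the real SDID fit replaces the stand-in.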
Two tests, two verdicts — which do you trust?
Notice the tension. The rank-based permutation test gives p = 0.026: only one placebo state (out of 38) has an effect as large in magnitude as California's, so the result is significant at the 5% level. But the SE-based 95% interval [−35.0, 3.8] includes zero, which would normally be called "not significant".
Both are computed from the same placebos, so why disagree? With a single treated unit the standard error (9.88) is large and the normal approximation is crude — the interval is conservatively wide. The permutation test instead asks a sharper question: where does California rank among the placebos? It does not assume normality and, with one treated unit, is the more reliable read. The takeaway: report the placebo p-value, and treat the wide SE interval as a reminder that single-unit inference is inherently uncertain.
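Both verdicts can be reproduced with a few lines of arithmetic. The sketch below takes the estimate (−15.6) and the standard error (9.88) from the text. The placebo list is a made-up stand-in with one extreme value (about −32, the Rhode Island figure) and 37 small ones, since the real placebo effects are not listed here. Function names are hypothetical.

```python
def rank_p_value(estimate, placebo_effects):
    """Share of placebo effects at least as large in magnitude as the estimate."""
    as_extreme = sum(1 for p in placebo_effects if abs(p) >= abs(estimate))
    return as_extreme / len(placebo_effects)


def normal_ci(estimate, se, z=1.96):
    """Normal-approximation confidence interval: estimate plus or minus z * se."""
    return estimate - z * se, estimate + z * se


estimate = -15.6  # SDID ATT from the text
se = 9.88         # placebo standard error from the text

# Made-up stand-in for the 38 placebo effects: one extreme, 37 near zero.
placebos = [-32.0] + [((i % 7) - 3) * 1.5 for i in range(37)]

p_value = rank_p_value(estimate, placebos)
ci_low, ci_high = normal_ci(estimate, se)
print(round(p_value, 3), round(ci_low, 1), round(ci_high, 1))
```

The p-value depends only on how many placebos exceed California's effect in magnitude (1 of 38). The interval depends on their spread, so a single badly fitted state widens the interval far more than it moves the rank.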
What to look for
- California is in the left tail. Most placebo effects cluster near zero; California's −15.6 is more negative than all but one of them — the visual version of p = 0.026.
- The shaded band is the SE-based 95% CI. It stretches past zero on the right, which is why a naive significance test would hesitate even though the rank is extreme.
- Some placebos are large too. Rhode Island, near −32, is the one placebo more extreme than California. Placebos like this come from states the model fits poorly; the permutation test counts them honestly rather than hiding them in a standard error.