Six ways to fill in the missing counterfactual
In January 1989 California raised its cigarette tax by 25 cents per pack (Proposition 99). Per-capita cigarette sales then dropped from 116 packs in 1988 to 60 packs in 2000 — a 48% fall. But smoking was declining nationwide too. The whole post is built around one question: how much of that drop was caused by the tax, and how much would have happened anyway?
Every causal estimator on offer answers that question the same way: construct a counterfactual — what California's sales would have looked like without Proposition 99 — and report the gap between observed and counterfactual as the policy effect. What changes from method to method is how the counterfactual is built. This app lets you see all six constructions side-by-side, simulate the easy and the hard cases, and reproduce the post's headline forest plot.
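As a minimal sketch of that shared recipe, the snippet below takes an observed series and any method's counterfactual and reports the gap. The numbers are made up for illustration, and averaging the gap over the post-policy years is an assumption here; the post may summarise the gap differently.

```python
from statistics import mean

def policy_effect(observed, counterfactual):
    """Average gap (observed minus counterfactual) over the post-policy years."""
    if len(observed) != len(counterfactual):
        raise ValueError("series must cover the same years")
    gaps = [obs - cf for obs, cf in zip(observed, counterfactual)]
    return mean(gaps)

# Hypothetical post-policy values in packs per capita (not the post's data).
observed_post = [82.0, 77.0, 68.0]
counterfactual_post = [90.0, 88.0, 86.0]

effect = policy_effect(observed_post, counterfactual_post)
print(effect)  # a negative value means sales fell below the counterfactual
```

Every method in the app plugs a different `counterfactual_post` into this same subtraction.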
One observed series, six constructed counterfactuals
The orange curve is California's actual smoking history (1970–2000). The vertical dashed line is the 1989 policy threshold. Click any method button below to overlay that method's estimate of the no-Proposition-99 counterfactual. The shaded gap between the orange observed curve and the chosen dashed counterfactual is the policy effect that method reports.
Counterfactual Simulator
Generate a stylised policy panel. Set the control state's secular trend, then watch DiD-with-one-control collapse to noise while a multi-donor blend recovers the true effect.
Seven-Method Forest Plot
The post's headline figure. Toggle methods to compare the 13 to 28 pack consensus against the DiD-vs-Nevada and ITS-ARIMA outliers.
Bias & Variance Lab
Run 100 simulations with a true effect you set. Watch which methods are biased, which are noisy, and how the donor pool size changes Synthetic Control's variance.
Glossary (open a card if a term is unfamiliar)
Counterfactual
ATT — Average Treatment effect on the Treated
Parallel trends
Donor pool
RMSPE ratio
Fisher exact p-value
BSTS — Bayesian Structural Time Series
Posterior credible interval
Counterfactual Simulator — why one control is fragile, and many is robust
Simulate a stylised state-year panel where California is treated in year 0 and there are J donor states. You set the true ATT, the noise level, and the control-trend asymmetry, which controls how far the single chosen control state's trend departs from California's secular trend (0 mirrors it, 1 runs against it). Then watch the gap between (a) DiD against one neighbour and (b) a synthetic-control style weighted blend.
What to look for
- Push asymmetry to 0. The single control mirrors California's secular trend and DiD collapses to roughly zero (or to the true ATT only if you also lower the secular trend) — Nevada's fingerprint in §6.
- Push asymmetry to 1. The control state's trend flips against California's secular drift. DiD now overstates the effect because it subtracts a positive change instead of a negative one.
- Watch the teal SCM-style estimate. Even as the asymmetry slider whips back and forth, the blended estimate stays close to the true ATT, because averaging J donors washes out any single donor's idiosyncratic trend.
- Drop J to 5. The SCM-style blend gets noisier and starts to track the single-DiD estimate. This is the motivation for the "at least 20 donors" rule of thumb for Synthetic Control.
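The contrast in the bullets above can be sketched with plain arithmetic. This is not the app's simulator: the true ATT, the shared secular change, and the donor drifts are hypothetical values, the blend uses equal weights rather than fitted ones, and noise is left out so the mechanism is easy to read.

```python
def did_estimate(treated_change, control_change):
    """Difference-in-differences on pre-to-post changes."""
    return treated_change - control_change

true_att = -15.0   # hypothetical policy effect, packs per capita
secular = -10.0    # hypothetical change every state would see anyway

# Hypothetical idiosyncratic drifts; the first donor is the hand-picked
# control, which happens to fall almost as fast as the treated state.
donor_drifts = [-12.0, 3.0, -2.0, 4.0, 1.0, -3.0, 2.0, 5.0, -1.0, 4.0]

treated_change = secular + true_att
donor_changes = [secular + drift for drift in donor_drifts]

# (a) DiD against the single control absorbs that control's own drift.
did_single = did_estimate(treated_change, donor_changes[0])

# (b) An equal-weight blend averages the drifts, so they mostly cancel.
blend_change = sum(donor_changes) / len(donor_changes)
did_blend = did_estimate(treated_change, blend_change)

print(round(did_single, 1))  # -3.0
print(round(did_blend, 1))   # -15.1
```

The single-control estimate is off by exactly that control's drift (12 packs), while the blend is off only by the average drift (0.1 packs).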
The post's seven-method forest plot — interactively
These numbers come straight from table_cross_method.csv in the post's folder, the same data used to produce fig9_cross_method_forest.png. Toggle methods to compare. Five of the seven estimators agree on a 13 to 28 packs/capita reduction. DiD-vs-Nevada (−5.7, CI crosses zero) and ITS-ARIMA (+4.5) are the outliers, for completely different reasons.
What to look for
- Toggle the two outliers off (DiD vs Nevada and ITS-ARIMA) and watch the remaining five methods cluster between about −13 and −28 packs/capita.
- Compare Synthetic Control (teal, −18.8) and CausalImpact (steel, −12.8). Both build their counterfactual from many donor states. Their intervals overlap. They are the post's most defensible answers.
- RDD on time (−20.1) lands in the middle of the consensus group, but inherits all the pre-trend mis-specification risk from ITS — its tight CI is conditional on the segmented-regression model being right.
- Naive pre-post and ITS (growth curve) are nearly identical (≈ −27 vs −28). Both use only within-California information, so they cannot separate the policy effect from the nationwide secular decline.
Methods
Synthetic California's donor recipe
tidysynth chose convex weights that minimise pre-treatment (1970–1988) RMSE on the lagged outcomes and four covariates. The optimal mix turns out to be a five-state cocktail (Utah, Nevada, Montana, Colorado, Connecticut) absorbing 99.8% of the weight. Read this as "the synthetic California that best matches the real California's 1970–1988 trajectory is built from these five states in these proportions".
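To make "convex weights" and "pre-period RMSE" concrete, here is a small sketch. It does not run the optimisation that tidysynth performs; it only shows how a given weight vector turns donor series into a synthetic series and how the pre-period fit is scored. The donor names, series, and weights are all hypothetical.

```python
import math

def synthetic_series(donor_series, weights):
    """Weighted blend of donor series; weights must be convex."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n_years = len(next(iter(donor_series.values())))
    return [
        sum(weights[name] * donor_series[name][t] for name in weights)
        for t in range(n_years)
    ]

def rmse(a, b):
    """Root mean squared error between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical pre-period data (packs per capita), three years, three donors.
treated_pre = [100.0, 98.0, 95.0]
donors_pre = {
    "donor_a": [110.0, 108.0, 104.0],
    "donor_b": [90.0, 88.0, 86.0],
    "donor_c": [100.0, 97.0, 96.0],
}

equal = {"donor_a": 1 / 3, "donor_b": 1 / 3, "donor_c": 1 / 3}
fitted = {"donor_a": 0.5, "donor_b": 0.5, "donor_c": 0.0}  # hypothetical optimum

rmse_equal = rmse(treated_pre, synthetic_series(donors_pre, equal))
rmse_fitted = rmse(treated_pre, synthetic_series(donors_pre, fitted))
print(rmse_equal, rmse_fitted)
```

In this toy case the fitted weights reproduce the treated series exactly, and one donor gets zero weight, which mirrors how a handful of states can absorb nearly all the weight.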
Why does DiD vs Nevada collapse?
Nevada is geographically and culturally adjacent to California. Its own cigarette sales fell by 21.3 packs over 1984–1993 — almost as much as California's 27.0 pack drop. When DiD subtracts that Nevada change from California's change, almost all of California's drop is absorbed. Synthetic Control's response is to blend 38 donor states using data-driven weights, so no single similar-trending control can dominate.
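The subtraction behind that collapse is short enough to write out. The two changes are the figures quoted in the paragraph above; the variable names are only for illustration.

```python
# Changes in per-capita sales over 1984–1993, as quoted in the text.
ca_change = -27.0   # California
nv_change = -21.3   # Nevada

# DiD removes Nevada's change from California's change.
did_vs_nevada = ca_change - nv_change

# Share of California's drop that Nevada's own drop cancels out.
share_absorbed = nv_change / ca_change

print(round(did_vs_nevada, 1))   # -5.7
print(round(share_absorbed, 2))  # 0.79
```

Roughly four fifths of California's drop disappears in the subtraction, leaving the −5.7 shown in the forest plot.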
Why does ITS-ARIMA flip sign?
The AICc-selected ARIMA(1, 2, 0) model double-differences California's pre-period series. That picks up the acceleration of the late 1980s downward trend and extrapolates that acceleration aggressively. The model's counterfactual lands below California's observed post-period sales, implying the policy "raised" smoking by ≈ 4.5 packs. This is the textbook warning against single-best-AIC ITS without a comparison-unit reality check.
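The mechanism can be sketched with the point-forecast recursion of an ARIMA(1, 2, 0) model without a constant, where the AR(1) term acts on the second differences. The series, the coefficient `phi`, and the "observed" post-period values below are hypothetical; they are not the post's fitted model or data, and a real analysis would use a forecasting library rather than this hand-rolled recursion.

```python
def forecast_arima_120(y, phi, horizon):
    """Point forecasts for (1 - phi*B)(1 - B)^2 y_t = e_t, no constant."""
    if len(y) < 3:
        raise ValueError("need at least three observations")
    slope = y[-1] - y[-2]                # last first difference
    accel = slope - (y[-2] - y[-3])      # last second difference
    level = y[-1]
    forecasts = []
    for _ in range(horizon):
        accel = phi * accel              # AR(1) on the second differences
        slope = slope + accel            # slope keeps changing while accel != 0
        level = level + slope
        forecasts.append(level)
    return forecasts

# Hypothetical pre-period series whose decline is speeding up.
pre = [100.0, 98.0, 95.0, 91.0]

counterfactual = forecast_arima_120(pre, phi=0.5, horizon=3)
observed_post = [88.0, 85.0, 82.0]       # hypothetical observed values

gaps = [obs - cf for obs, cf in zip(observed_post, counterfactual)]
print(counterfactual)  # [86.5, 81.75, 76.875]
print(gaps)            # [1.5, 3.25, 5.125]
```

Because the forecast carries the steep final slope forward (and steepens it further when `phi` is positive), the counterfactual can fall faster than the observed series, and the estimated "effect" comes out positive.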
Bias & Variance Lab — run the methods 100 times
Single runs are noisy. Run the simulator from Tab 2 one hundred times with fresh random draws (same parameters) to see whether DiD-with-one-control is systematically biased — and whether the multi-donor blend's variance is small enough to be useful. Each simulation regenerates J donor states from scratch and applies all three estimators.
Run the simulation
Each click runs 100 fresh draws and tallies the three estimators' answers.
Connecting back to the post
- The orange Naive-pre-post histogram is centred on the true α only when the simulator's secular trend is zero; once a secular trend is included, it sits far from α. This is why the post calls Naive pre-post "descriptive, not causal".
- The orange DiD histogram is wide — its variance comes from picking a single noisy control unit. Bigger σ ⇒ wider DiD histogram. The post's Nevada DiD has the same fragility.
- The teal SCM-style histogram is tight and centred on the true α at J = 20+. Its donor-side standard deviation is roughly √J times smaller than DiD's (variance J times smaller), because averaging J donors washes out idiosyncratic donor noise. This is the principled justification for Synthetic Control over hand-picked DiD.
- Drop J to 5, run again. The teal histogram widens. At J = 5 the blend is still better than single-DiD, but the gain over DiD is small. This is the empirical justification for the "at least 20 donors" rule of thumb.
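The variance argument in these bullets can be checked with a few lines of Monte Carlo. This is a sketch under stated assumptions, not the app's code: donor changes are independent Gaussian noise with no trend, the blend uses equal weights, and the treated state's change is held fixed so that only donor-side noise is measured. It defaults to 2000 runs rather than the app's 100 so the summary statistics are stable.

```python
import random
from statistics import mean, pstdev

def run_lab(true_att, sigma, n_donors, n_sims=2000, seed=0):
    """Repeat the single-control and blended estimators on fresh donor draws."""
    rng = random.Random(seed)
    single, blend = [], []
    for _ in range(n_sims):
        # Each donor's pre-to-post change is pure noise around zero.
        donor_changes = [rng.gauss(0.0, sigma) for _ in range(n_donors)]
        treated_change = true_att  # assumption: no noise on the treated unit
        single.append(treated_change - donor_changes[0])
        blend.append(treated_change - mean(donor_changes))
    return single, blend

J = 20
single, blend = run_lab(true_att=-15.0, sigma=5.0, n_donors=J)

# Ratio of spreads; theory for independent donors says about sqrt(J).
sd_ratio = pstdev(single) / pstdev(blend)
print(round(mean(single), 2), round(mean(blend), 2), round(sd_ratio, 2))
```

With J = 20 the ratio should land near √20 ≈ 4.5. In a fuller simulation the treated state's own noise does not average away, so the gap between the two histograms is smaller than this donor-side ratio suggests.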