CATE with Stata 19 — Interactive Lab

A pedagogical companion to Conditional Average Treatment Effects (CATE) with Stata 19

The ATE is one number. The CATE is a function.

The textbook causal-inference workflow ends with a single number — the Average Treatment Effect. But policy makers, doctors, and managers rarely care only about the average. They want to know for whom the program works best, and for whom it does little. This question — how the treatment effect varies across the covariates — is captured by the Conditional Average Treatment Effect (CATE), τ(x) = E[Y(1) − Y(0) | X = x].

Stata 19's new cate command estimates τ(x) with cross-fit lasso + a causal forest, and seven postestimation tools turn the resulting function into pictures. This app lets you turn the dials yourself. In four tabs you will: see why an ATE can hide an enormous fan of effects; simulate a small CATE problem with a known truth and watch τ̂(x) recover it; explore the post's GATE-by-income and GATES-by-quartile bars interactively; and dig into the actual histogram of household-level effects across the 9,913 households in the assets3 dataset.

Why "the average hides the variation" — animation

The orange dashed line is the ATE: a single number (about \$8k for 401(k) eligibility on net assets). The teal curve is the true CATE: it dips, climbs, and jumps with income. The marker bouncing along the curve illustrates the problem — different households face very different treatment effects, but the ATE flattens it all into one headline.
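To make the point concrete, here is a minimal Python sketch with made-up numbers. The three income groups, their population shares, and their effects are illustrative assumptions, not values from the post. It only shows that the ATE is the share-weighted average of the group-level effects, so very different effects can sit behind one headline number.

```python
# Hypothetical population: three income groups with different true effects.
# Shares and effects are illustrative, not estimates from the assets3 data.
groups = [
    {"name": "low income", "share": 0.50, "tau": 2_000.0},
    {"name": "middle income", "share": 0.30, "tau": 8_000.0},
    {"name": "high income", "share": 0.20, "tau": 23_000.0},
]

# The ATE is the CATE averaged over the covariate distribution.
ate = sum(g["share"] * g["tau"] for g in groups)

for g in groups:
    ratio = g["tau"] / ate
    print(f"{g['name']:>13}: tau = {g['tau']:>8,.0f}  ({ratio:.2f} x ATE)")
print(f"{'ATE':>13}: {ate:>14,.0f}")
```

In this toy population half of the households gain a quarter of the ATE while a fifth gain almost three times as much, which is the kind of fan the animation illustrates.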

Tab 2

CATE Simulator

Set the true τ(x) function yourself (linear, U-shape, or sigmoid). Slide n, noise, and heterogeneity. Watch how a "regression on x" estimator chases the truth as the data piles in.

Tab 3

GATE / GATES Bars

The post's two headline figures, interactively. Toggle between prespecified income groups (GATE) and data-driven quartiles (GATES). Hover for SE, CI, and p-value.

Tab 4

IATE Explorer

Histogram + scatter of the 9,913 household-level effects exported from Stata. Switch the x-axis between age, education, and income to see the post's IATE plots reconstructed in D3.

Glossary (open a card if a term is unfamiliar)

ATE — Average Treatment Effect
E[Y(1) − Y(0)]. A single number. The headline policy effect. Estimated by the four ATE rows in Tab 3.
CATE — Conditional ATE
τ(x) = E[Y(1) − Y(0) | X = x]. A function of covariates. The ATE is its average. When CATE varies, the ATE is hiding something.
IATE — Individual ATE
τ̂ᵢ = τ̂(xᵢ). The CATE evaluated at one household's covariate profile. Visible in the Tab 4 histogram.
GATE — Group ATE
Average of τ(x) inside a prespecified group (e.g., income category). Tests subgroup hypotheses you wrote down in advance.
GATES — Sorted GATE
Average of τ(x) inside data-driven quartiles of predicted τ̂. The strongest moderation signal a beginner can read without naming the moderator.
PO — Partialing Out
Residualises Y and D on covariates with cross-fit lasso, then fits a causal forest on residuals. Robust to extreme propensity scores.
AIPW — Augmented IPW
Doubly-robust score combining outcome models and propensity weights. More efficient than PO when both nuisance models are well specified.
Heterogeneity test
χ²(1) test that τ(x) is constant. Stata 19's estat heterogeneity. Rejection licenses the CATE / GATE / GATES interpretation.
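The AIPW card can be made concrete with a small sketch. The formula is the standard textbook AIPW score; the function name, argument names, and numbers are illustrative assumptions, and nothing here is claimed to be Stata's internal code. The example also hints at why extreme propensity scores matter for AIPW: the outcome residual is divided by the propensity score, so a small score inflates it.

```python
def aipw_score(y, d, mu0, mu1, e):
    """Doubly-robust (AIPW) score for one unit.

    y   : observed outcome
    d   : treatment indicator (0 or 1)
    mu0 : predicted outcome without treatment, E[Y | X, D=0]
    mu1 : predicted outcome with treatment,    E[Y | X, D=1]
    e   : propensity score, P(D=1 | X), strictly between 0 and 1
    """
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    # Inverse-probability-weighted residual from the relevant outcome model.
    correction = d * (y - mu1) / e - (1 - d) * (y - mu0) / (1.0 - e)
    return (mu1 - mu0) + correction


# Hypothetical treated unit whose outcome is 500 above the model's prediction.
# With a small propensity score the residual is up-weighted by 1 / e.
score_extreme = aipw_score(y=10_500.0, d=1, mu0=4_000.0, mu1=10_000.0, e=0.1)
score_moderate = aipw_score(y=10_500.0, d=1, mu0=4_000.0, mu1=10_000.0, e=0.5)
print(score_extreme, score_moderate)
```

The same 500-dollar residual moves the score far more when e = 0.1 than when e = 0.5, which is one way to read the PO card's remark about extreme propensity scores.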

CATE Simulator — set the truth, watch the estimator chase it

Most causal inference courses skip this step. They teach you the estimator without showing what it is trying to recover. Here you set the true τ(x) function (the blue curve), drag sliders for sample size and noise, and watch the per-household estimates τ̂ᵢ (teal dots) scatter around the truth. The flat orange line is the ATE — the average of τ over the covariate distribution. When the blue curve is steep and the orange line is flat, that is the gap CATE was invented to detect.

Sample size (n): more data ⇒ τ̂ᵢ tightens around the true τ(x).
Noise (σ): outcome noise. Bigger σ ⇒ noisier τ̂ᵢ.
Heterogeneity (β): 0 = constant τ (ATE = CATE) · 2 = strong heterogeneity (steep curve).
Shape: pick the functional form for τ(x). The estimator does not know which.
true ATE = E[τ(x)]: average of the true CATE function
estimated ATE: mean of τ̂ᵢ across households
τ̂ range: min and max estimated effects
heterogeneity test (toy): F-stat on a linear projection of τ̂ on x
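For readers who want to reproduce the idea offline, the following is a minimal sketch of a simulator of this kind. The data-generating process, the randomized 50/50 treatment, and the estimator (one least-squares line per treatment arm, with τ̂(x) taken as the gap between the two lines) are all assumptions chosen for illustration; the app's own implementation may differ.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def simulate(n=2000, sigma=0.5, beta=1.5, seed=1):
    """Draw a randomized experiment with a known linear tau(x)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()                       # covariate on [0, 1)
        d = 1 if rng.random() < 0.5 else 0     # coin-flip treatment
        tau = 1.0 + beta * (x - 0.5)           # true CATE at this x
        y = 2.0 * x + d * tau + rng.gauss(0.0, sigma)
        rows.append((x, d, y, tau))
    return rows


def estimate_tau(rows):
    """Fit one line per arm; tau_hat(x) is the gap between the two lines."""
    a1, b1 = fit_line([x for x, d, y, _ in rows if d == 1],
                      [y for x, d, y, _ in rows if d == 1])
    a0, b0 = fit_line([x for x, d, y, _ in rows if d == 0],
                      [y for x, d, y, _ in rows if d == 0])
    return [(a1 - a0) + (b1 - b0) * x for x, _, _, _ in rows]


rows = simulate()
tau_hat = estimate_tau(rows)
true_ate = sum(r[3] for r in rows) / len(rows)
est_ate = sum(tau_hat) / len(tau_hat)
print(f"true ATE {true_ate:.3f}  estimated ATE {est_ate:.3f}")
print(f"tau_hat range {min(tau_hat):.3f} to {max(tau_hat):.3f}")
```

Re-running simulate with a smaller n or a larger sigma shows the estimates drifting away from the true line, mirroring the n and σ sliders.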

What to look for

  • Set heterogeneity β = 0. The blue curve flattens onto the orange ATE line, and the heterogeneity F-statistic should become small and statistically insignificant. This is the regime where the ATE is enough.
  • Set β = 1.5 and σ = 0.5. The teal dots track the blue curve precisely — high signal, low noise. This is the easy heterogeneity-detection regime.
  • Set σ = 3.0 with n = 100. The teal dots scatter wildly even though the truth is smooth. Cross-fitting and forests help here, but no method beats data: more n is the cleanest fix.
  • Switch the τ(x) shape to "U-shape". A linear-projection summary (like estat projection) would miss the curvature, because a straight line fitted through a symmetric U has almost no slope. This is why the post's R² of 0.0045 is not a problem: a low linear R² is compatible with substantial nonlinear heterogeneity.
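The U-shape point can be checked with a few lines of arithmetic. The sketch below computes the R² of a straight-line fit to two hypothetical τ(x) curves on an even grid; the specific coefficients are assumptions made for illustration. A symmetric U has essentially zero linear R² even though the effect varies a lot across x.

```python
def r_squared_linear(xs, ys):
    """R^2 from the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Evenly spaced covariate values on [0, 1].
xs = [i / 200 for i in range(201)]

tau_linear = [1.0 + 1.5 * (x - 0.5) for x in xs]        # straight line
tau_ushape = [1.0 + 6.0 * (x - 0.5) ** 2 for x in xs]   # symmetric U

r2_linear = r_squared_linear(xs, tau_linear)
r2_ushape = r_squared_linear(xs, tau_ushape)
spread_ushape = max(tau_ushape) - min(tau_ushape)

print(f"linear shape: R^2 = {r2_linear:.4f}")
print(f"U shape:      R^2 = {r2_ushape:.4f}, yet tau spans {spread_ushape:.2f}")
```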

Group-level effects — interactive bars from the post

These bars come straight from the post's PO model on the 9,913-household assets3 sample. Switch between the two group views: GATE sorts households into the prespecified Stata incomecat bins and averages the doubly-robust score in each; GATES lets the data sort households into data-driven quartiles of predicted τ̂, the strongest moderation signal a beginner can read without naming the moderator. A separate forest-plot view compares three different ATE estimators on the same data.
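The difference between the two group views can be sketched in a few lines. Everything below is a simplified illustration on synthetic data: the five income categories, the noise levels, and the helper names are assumptions, and the predicted effects are taken as given rather than cross-fit as they would be in a real analysis.

```python
import math
import random


def mean_and_se(values):
    """Sample mean and its standard error."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / (n - 1)
    return m, math.sqrt(var / n)


def gate(scores, labels):
    """Average the unit-level scores inside each prespecified group."""
    return {
        g: mean_and_se([s for s, lab in zip(scores, labels) if lab == g])
        for g in sorted(set(labels))
    }


def gates(scores, tau_hat, n_bins=4):
    """Average the scores inside quantile bins of the predicted effect."""
    order = sorted(range(len(tau_hat)), key=lambda i: tau_hat[i])
    labels = [0] * len(tau_hat)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(order)   # 0 = lowest predicted effect
    return gate(scores, labels)


# Synthetic data (hypothetical): the effect rises with a 0-4 income category.
rng = random.Random(7)
income_cat = [rng.randrange(5) for _ in range(4000)]
tau_hat = [1000.0 + 2000.0 * c + rng.gauss(0, 500) for c in income_cat]
scores = [t + rng.gauss(0, 8000) for t in tau_hat]   # noisy unit-level scores

by_income = gate(scores, income_cat)
by_quartile = gates(scores, tau_hat)
for name, table in (("GATE", by_income), ("GATES", by_quartile)):
    for g, (m, se) in table.items():
        print(f"{name} group {g}: {m:9.0f}  (SE {se:.0f})")
```

The only difference between the two functions is where the group labels come from: a covariate chosen in advance for GATE, the ranking of predicted effects for GATES.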

Pick a view

Joint heterogeneity test
Highest group / quartile: most-responsive subgroup
Lowest group / quartile: least-responsive subgroup
Top-to-bottom ratio: how much the average hides

What to look for

  • GATE view: the lowest-income category gains \$4,087 (significant), but income category 1 drops to \$1,399 (n.s.), a non-monotone wrinkle the post flags as a real finding. The top income category gains \$20,511: about five times the lowest-income category's gain and roughly 2.6 times the ATE of about \$7,937.
  • GATES view: the data-driven ladder is clean and monotone — \$17,279 → \$8,121 → \$3,444 → \$2,919. The bottom quartile is not statistically distinguishable from zero (orange dot). About a quarter of households appear to gain little or nothing from eligibility.
  • Forest plot view: three different ATE estimators (parametric AIPW, PO ML, AIPW ML) agree to within about \$200 of one another. The naive raw difference (\$19,557) is inflated by 2.4×: about 60% of it was selection, not causation.
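The ratios quoted in these bullets can be re-derived from the dollar figures in this lab. The short script below does only that arithmetic. It uses the approximate ATE of \$7,937 quoted in the IATE Explorer tab, so the selection share would shift slightly if one of the other ATE estimates were used instead.

```python
# Figures quoted in this lab (US dollars).
gates_quartiles = [17_279, 8_121, 3_444, 2_919]   # top to bottom quartile
ate = 7_937            # approximate ATE quoted in the IATE Explorer tab
naive_diff = 19_557    # raw treated-minus-control difference
top_gate = 20_511      # GATE for the top income category
low_gate = 4_087       # GATE for the lowest income category

spread = gates_quartiles[0] / gates_quartiles[-1]
quartile_mean = sum(gates_quartiles) / len(gates_quartiles)
selection_share = (naive_diff - ate) / naive_diff
top_vs_low = top_gate / low_gate

print(f"GATES top-to-bottom ratio: {spread:.1f}x")
print(f"mean of quartile effects:  {quartile_mean:,.2f}")
print(f"share of naive gap due to selection: {selection_share:.0%}")
print(f"top GATE / lowest GATE:    {top_vs_low:.1f}x")
```

The mean of the four quartile effects lands within a few dollars of the ATE, which is what one would expect when equal-sized quartiles partition the sample.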

Why GATES is more trustworthy than ad-hoc subgroups

A common temptation is to slice the data by every covariate and report the most extreme subgroup. That is p-hacking. GATES avoids it by using out-of-sample cross-fit predictions to assign each household to its quartile: each unit's bin is determined by a model fitted without that unit's own data. The resulting bins are honest in the same way a held-out test set is honest. The 5.9× spread in the GATES view (\$17,279 against \$2,919) is therefore a defensible number, not a fishing-expedition artefact.
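The cross-fitting idea can be sketched as follows. This is a simplified stand-in under stated assumptions: the per-arm linear fit replaces the lasso and causal forest, the data are simulated, and the function names are hypothetical. The point is only the bookkeeping, in which every unit's τ̂ comes from a model trained on the other folds before the quartile bins are formed.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def cross_fit_tau(xs, ds, ys, k=5, seed=0):
    """Out-of-fold tau_hat: each unit is predicted by a model fit without it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    tau_hat = [None] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a1, b1 = fit_line([xs[i] for i in train if ds[i] == 1],
                          [ys[i] for i in train if ds[i] == 1])
        a0, b0 = fit_line([xs[i] for i in train if ds[i] == 0],
                          [ys[i] for i in train if ds[i] == 0])
        for i in fold:                      # predict only the held-out units
            tau_hat[i] = (a1 - a0) + (b1 - b0) * xs[i]
    return tau_hat


def quartile_labels(values):
    """0 = lowest quarter of values, 3 = highest quarter."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * 4 // len(values)
    return labels


rng = random.Random(3)
xs = [rng.random() for _ in range(1000)]
ds = [1 if rng.random() < 0.5 else 0 for _ in xs]
ys = [2.0 * x + d * (1.0 + 1.5 * (x - 0.5)) + rng.gauss(0, 0.5)
      for x, d in zip(xs, ds)]

tau_hat = cross_fit_tau(xs, ds, ys)
bins = quartile_labels(tau_hat)
print([bins.count(q) for q in range(4)])
```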

The shape of individual effects — the real τ̂ᵢ from Stata

Stata exported one predicted treatment effect for each of the 9,913 households to iate_predictions.csv. The histogram below shows their distribution; the steel-blue and dashed lines mark the ATE (~\$7,937) and the median (~\$5,815), respectively. The mean sits well above the median, which is consistent with a right-skewed distribution of effects. Then switch the x-axis below to see how τ̂ᵢ varies with age, education, and income: the same picture as Stata's categraph iateplot figures, rebuilt in D3.

Histogram of household-level treatment effects τ̂ᵢ (n = 9,913)

Each bar is a count of households whose estimated effect falls in that dollar bin. Orange bars sit below zero: a small minority of households have negative estimated effects.

How τ̂ᵢ varies with one covariate

Steel-blue dots: 800 randomly sampled households. Teal line + band: bin mean ± 1.96 SE.
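A band of this kind can be computed as sketched below. The equal-width binning, the number of bins, and the synthetic age and τ̂ values are assumptions for illustration; the app's actual binning rule is not stated here.

```python
import math
import random


def binned_band(xs, ys, n_bins=10):
    """Per-bin mean of ys with a mean +/- 1.96 SE band, over equal-width x bins."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("x has no variation, so it cannot be binned")
    width = (hi - lo) / n_bins
    buckets = [[] for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        b = min(n_bins - 1, int((x - lo) / width))   # max(x) goes in last bin
        buckets[b].append(y)
    rows = []
    for b, vals in enumerate(buckets):
        if len(vals) < 2:
            continue                                  # SE undefined, skip bin
        m = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - m) ** 2 for v in vals) / (len(vals) - 1))
        se = sd / math.sqrt(len(vals))
        centre = lo + (b + 0.5) * width
        rows.append((centre, m, m - 1.96 * se, m + 1.96 * se))
    return rows


# Synthetic stand-in for 800 sampled households (hypothetical values).
rng = random.Random(11)
age = [rng.uniform(25, 64) for _ in range(800)]
tau_hat = [3000 + 150 * (a - 25) + rng.gauss(0, 4000) for a in age]

band = binned_band(age, tau_hat, n_bins=8)
for centre, m, lo_ci, hi_ci in band:
    print(f"age ~{centre:5.1f}: mean {m:8.0f}  band [{lo_ci:8.0f}, {hi_ci:8.0f}]")
```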

Top-vs-bottom-quartile profile (estat classification)

Stata's estat classification compares the mean of each covariate in the top-effect quartile against the bottom-effect quartile. The three covariates below all differ by a large margin (t-statistics in the 19–56 range). Income is the dominant moderator.
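A t-statistic for a top-versus-bottom comparison can be computed as in the sketch below. It uses the unequal-variance (Welch) form, which is one common choice; whether estat classification uses exactly this form is not stated here, and the two small samples are hypothetical.

```python
import math


def welch_t(a, b):
    """Two-sample t-statistic for a difference in means, unequal variances."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)


# Hypothetical covariate values (say, income in thousands) in each quartile.
top_quartile = [52, 55, 61, 58, 54]
bottom_quartile = [31, 29, 35, 33, 32]

t_stat = welch_t(top_quartile, bottom_quartile)
diff = sum(top_quartile) / 5 - sum(bottom_quartile) / 5
print(f"difference in means = {diff:.1f}, t = {t_stat:.2f}")
```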

Variable | Top quartile | Bottom quartile | Difference | t

Formal heterogeneity tests

The CATE story rests on the formal rejection of "no heterogeneity". Stata's estat heterogeneity tests the null that τ(x) is constant against the alternative that it varies. Both PO and AIPW reject at the 5% level, and the joint GATE test rejects at p = 0.001.
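To read a χ²(1) statistic such as the one the glossary attributes to estat heterogeneity, the upper-tail p-value has a closed form in terms of the complementary error function. The helper below is a generic sketch with a hypothetical function name. It applies only to one degree of freedom, so it would not fit a joint test across several groups, which generally has more.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    if stat < 0:
        raise ValueError("a chi-squared statistic cannot be negative")
    # For 1 df, P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


for stat in (1.0, 3.841, 6.635, 10.828):
    p = chi2_1df_pvalue(stat)
    verdict = "reject at 5%" if p < 0.05 else "do not reject at 5%"
    print(f"chi2 = {stat:6.3f}  p = {p:.4f}  {verdict}")
```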

Test | χ² | df | p-value | Verdict