RCT Interactive Lab — Cash Transfer Program

What does randomization actually do?

A randomized controlled trial assigns each household to treatment (offered the cash transfer) or control by a coin flip — here stratified by poverty status. The key promise is that, on average, treated and control groups look identical on every characteristic (observed or unobserved) except for the program. The animation below redraws the assignment every few seconds: notice how each new draw gives a fresh split, yet the totals stay close to 50/50 in every stratum.

This is what makes a simple difference in means an unbiased estimate of the treatment effect: each side of the comparison is a random sample from the same underlying population. The post's simple diff-in-means estimate is 0.116 in log monthly consumption (roughly an 11.6% increase), close to the true effect of 0.12 (roughly 12%).
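The difference-in-means comparison can be sketched in a few lines. This is a minimal illustration on made-up data, not the post's code: the baseline level of 10.0, the noise scale 0.45, and the names `diff_in_means` and `offered` are all assumptions.

```python
import math
import random
import statistics

def diff_in_means(y_treat, y_ctrl):
    """Return (estimate, standard error) for a two-sample difference in means."""
    estimate = statistics.fmean(y_treat) - statistics.fmean(y_ctrl)
    # Unequal-variance (Neyman) standard error: sqrt(s_t^2 / n_t + s_c^2 / n_c)
    se = math.sqrt(statistics.variance(y_treat) / len(y_treat)
                   + statistics.variance(y_ctrl) / len(y_ctrl))
    return estimate, se

# Hypothetical data (all values assumed): log consumption, true effect 0.12.
rng = random.Random(0)
n, alpha, sigma = 2000, 0.12, 0.45
offered = [rng.random() < 0.5 for _ in range(n)]
y = [10.0 + alpha * d + rng.gauss(0.0, sigma) for d in offered]

estimate, se = diff_in_means(
    [yi for yi, d in zip(y, offered) if d],
    [yi for yi, d in zip(y, offered) if not d],
)
print(f"estimate = {estimate:.4f}, SE = {se:.4f}")
```

With these assumed settings the standard error lands near 0.02, the same order of magnitude as the SE the post reports for its own data.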

Stratified random assignment

Each circle is one household. Orange = offered the transfer; steel = control. The two rows correspond to the two poverty strata (block randomization keeps the treated share near 50% within each).
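One way to produce such an assignment is block randomization within each stratum. The sketch below is an assumption about the mechanism, not the lab's actual code; `stratified_assign` and the 60/40 split of households are hypothetical.

```python
import random
from collections import defaultdict

def stratified_assign(strata, seed=None):
    """Block-randomize within each stratum: half treated (1), half control (0)."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for i, s in enumerate(strata):
        by_stratum[s].append(i)
    assignment = [0] * len(strata)
    for members in by_stratum.values():
        rng.shuffle(members)
        n_treat = len(members) // 2
        # Odd-sized stratum: the extra unit is treated with probability 1/2
        if len(members) % 2 == 1 and rng.random() < 0.5:
            n_treat += 1
        for i in members[:n_treat]:
            assignment[i] = 1
    return assignment

# Hypothetical sample: 60 poor and 40 non-poor households.
strata = ["poor"] * 60 + ["non-poor"] * 40
assign = stratified_assign(strata, seed=1)
for s in ("poor", "non-poor"):
    treated_share = sum(a for a, st in zip(assign, strata) if st == s) / strata.count(s)
    print(s, treated_share)
```

Each new seed gives a different split, but the treated share within an even-sized stratum is exactly one half by construction.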

Did randomization actually balance the groups?

Even with randomization, finite samples produce chance imbalances. The Standardized Mean Difference (SMD) puts every covariate on a common scale by dividing the gap in group means by the pooled standard deviation: |SMD| < 0.10 is the standard rule of thumb for acceptable balance. In this dataset the only variable that flirts with the threshold is female (SMD ≈ 0.093), which is close to but still below 0.10.

These are the actual baseline numbers from the post (Section 5). The female imbalance is why every covariate-adjusted estimator in Tabs 3 and 4 controls for gender.
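For reference, a common way to compute the SMD is sketched below. The pooled-SD convention (square root of the average of the two group variances) is an assumption about the post's definition, and the counts are made up for illustration; they are not the post's data.

```python
import math
import statistics

def smd(x_treat, x_ctrl):
    """Standardized mean difference: gap in means over the pooled standard deviation."""
    pooled_sd = math.sqrt(
        (statistics.variance(x_treat) + statistics.variance(x_ctrl)) / 2
    )
    return (statistics.fmean(x_treat) - statistics.fmean(x_ctrl)) / pooled_sd

# Made-up counts (not the post's data): female = 1 for 520 of 1000 treated
# households and for 474 of 1000 control households.
female_treat = [1] * 520 + [0] * 480
female_ctrl = [1] * 474 + [0] * 526

value = smd(female_treat, female_ctrl)
print(f"SMD = {value:.3f}, within threshold = {abs(value) < 0.10}")
```

A gap of under five percentage points in a binary covariate is already enough to approach the 0.10 threshold, which is why chance imbalances of this size are common in finite samples.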

Tab 2

Precision & covariates

If randomization already makes the comparison unbiased, why bother adjusting for covariates? See the variance-reduction story interactively.

Tab 3

RA / IPW / DR simulator

Generate a fresh RCT, then compare the three modern estimators side by side. Run 100 simulations to see the bias-variance picture.

Tab 4

All methods forest plot

The post's headline result — 12 estimators stacked against the true effect. Toggle methods to compare. Hover for SEs and CIs.

Glossary (click a card to expand)

RCT (Randomized Controlled Trial)

Treatment is assigned by a known random mechanism (here, stratified Bernoulli). Independence between assignment and potential outcomes is built in by design.

ATE vs ATT

ATE = average effect on the whole population. ATT = average effect on the treated. ATE = ATT under homogeneous effects, and also in expectation under full randomization even when effects vary. DiD inherently estimates the ATT only.

ITT (Intent-to-Treat)

The effect of being assigned to treatment, regardless of actual receipt. Section 10 of the post separates offer (ITT) from receipt.

Regression Adjustment (RA)

Fits two outcome models (one per arm). Predicts both potential outcomes for every household, then averages the differences. Vulnerable to outcome-model misspecification.

Inverse Probability Weighting (IPW)

Models the propensity Pr(treat=1 | X). Reweights observations by 1/p̂ (treated) or 1/(1−p̂) (control). Vulnerable to extreme weights.

Doubly Robust (AIPW / IPWRA)

Combines outcome and propensity models. Consistent if either one is correct. Standard recommendation in modern causal inference.

Difference-in-Differences (DiD)

Compares each household's change over time (treated minus control change). Removes time-invariant unobservables. Identifies ATT.
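A toy illustration of the DiD logic, with hand-picked numbers that are purely hypothetical: treated households start from a higher baseline, so the endline gap is misleading, while the difference in changes is not.

```python
import statistics

def did(y0, y1, d):
    """Difference-in-differences: mean change among treated minus mean change among controls."""
    change_t = [b - a for a, b, di in zip(y0, y1, d) if di == 1]
    change_c = [b - a for a, b, di in zip(y0, y1, d) if di == 0]
    return statistics.fmean(change_t) - statistics.fmean(change_c)

# Hypothetical panel: a common time trend of 0.05 and a true effect of 0.12.
d = [0, 0, 1, 1]
y0 = [9.0, 9.4, 9.8, 10.2]                       # baseline log consumption
y1 = [a + 0.05 + 0.12 * di for a, di in zip(y0, d)]  # endline log consumption

endline_gap = (statistics.fmean([y for y, di in zip(y1, d) if di == 1])
               - statistics.fmean([y for y, di in zip(y1, d) if di == 0]))
print(f"endline gap = {endline_gap:.2f}, DiD = {did(y0, y1, d):.2f}")
```

The endline gap mixes the treatment effect with the pre-existing level difference between groups; differencing each household against its own baseline removes that fixed component.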

DR-DiD (Sant'Anna & Zhao, 2020)

Extends doubly robust logic to panel data. Consistent if either the outcome model for ΔY or the propensity score is correctly specified.

Why adjust for covariates in an RCT?

Randomization already makes the comparison unbiased. So why bother with regression adjustment, IPW, or doubly robust estimators? The answer is precision. When household-level outcome variation can be partly explained by age, education, gender, and poverty, adjusting for those covariates removes that variation from the residual — shrinking the standard error of the treatment-effect estimate.

Slide the controls below to add more or stronger covariates. The orange curve is the sampling distribution of α̂ without controls; the steel curve shows what happens once covariates absorb part of the noise. Both are centered near the truth — but the steel curve is tighter.

Covariate explanatory power R² 0.30

How much of consumption variance the covariates explain. 0 = useless controls; 0.7 = very informative controls.

Sample size n 2000

Bigger n shrinks both SEs proportionally to 1/√n.

SE (no controls): simple diff-in-means
SE (with controls): RA / IPW / DR
Precision gain: SE reduction
Equivalent n-boost: free sample size

What to look for

At R² = 0 the two curves overlap exactly — useless controls cannot reduce variance.
As R² grows the steel curve tightens. The SE reduction at R² ≈ 0.3 is around 16% (1 − √0.7), roughly equivalent to a 43% larger sample for free; doubling the effective sample would take R² = 0.5.
Both curves stay centered near 0.12. Randomization makes the simple estimator unbiased; covariates do not remove bias (there was none), they remove noise.
This is why the post uses the phrase "precision improvement, not bias removal" — covariate adjustment in an RCT is fundamentally different from covariate adjustment in an observational study.
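The arithmetic behind the precision readouts can be sketched as follows, assuming the usual large-sample approximation that the adjusted SE equals the unadjusted SE times √(1 − R²). The function name `precision_gain` is hypothetical, and the lab's own formula may differ in small-sample details.

```python
import math

def precision_gain(r2):
    """Return (SE ratio, SE reduction, equivalent sample-size multiplier) for covariate R²."""
    if not 0 <= r2 < 1:
        raise ValueError("R² must lie in [0, 1)")
    se_ratio = math.sqrt(1 - r2)       # adjusted SE / unadjusted SE
    n_multiplier = 1 / (1 - r2)        # sample size needed to match without controls
    return se_ratio, 1 - se_ratio, n_multiplier

for r2 in (0.0, 0.3, 0.5, 0.7):
    ratio, reduction, mult = precision_gain(r2)
    print(f"R² = {r2:.1f}: SE reduction = {reduction:.1%}, equivalent n = {mult:.2f}x")
```

Because the SE scales with 1/√n, an SE ratio of √(1 − R²) corresponds to multiplying the sample size by 1/(1 − R²).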

What the post's numbers say

In the actual data, the simple diff-in-means gave α̂ = 0.1157 (SE = 0.0194). Adding covariates (age, edu, female, poverty) via regression adjustment gave α̂ = 0.1125 (SE = 0.0191) — a 1.5% SE reduction. The gain is modest here because the covariates only explain a small slice of consumption variance and the existing imbalance is small. In real RCTs with stronger pre-treatment predictors (e.g. baseline consumption itself), the gain can be much larger.

The three modern estimators side by side

The post fits Regression Adjustment, Inverse Probability Weighting, and the Doubly Robust IPWRA estimator on the same endline data. In a well-designed RCT they all give nearly identical answers (~0.113). But what if randomization were imperfect — if treated households had systematically different covariates? The simulator below introduces a tunable amount of covariate imbalance so you can see when the three methods diverge.
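To make the three estimators concrete, here is a minimal NumPy sketch on simulated randomized data. It is not the post's implementation: the data-generating process, the function names, and the use of AIPW (rather than Stata's IPWRA variant) for the doubly robust estimator are all assumptions.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X already includes an intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def logit_fit(X, d, iters=25):
    """Logistic regression by Newton-Raphson; returns fitted probabilities."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X @ beta))

def ra_ipw_aipw(y, d, x):
    """Return (RA, IPW, AIPW) estimates of the average treatment effect."""
    X = np.column_stack([np.ones(len(y)), x])
    # RA: one outcome model per arm, both predicted for every household
    m1 = X @ ols(X[d == 1], y[d == 1])
    m0 = X @ ols(X[d == 0], y[d == 0])
    ra = np.mean(m1 - m0)
    # IPW: normalized (Hajek) weights from the estimated propensity score
    p = logit_fit(X, d)
    w1, w0 = d / p, (1 - d) / (1 - p)
    ipw = np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
    # AIPW: RA prediction plus inverse-probability-weighted residual correction
    aipw = np.mean(m1 - m0 + d * (y - m1) / p - (1 - d) * (y - m0) / (1 - p))
    return ra, ipw, aipw

# Hypothetical randomized data: two covariates, true effect 0.12.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
d = rng.binomial(1, 0.5, size=n)
y = 10 + 0.12 * d + x @ np.array([0.3, -0.2]) + rng.normal(scale=0.4, size=n)

ra, ipw, aipw = ra_ipw_aipw(y, d, x)
print(f"RA = {ra:.4f}, IPW = {ipw:.4f}, AIPW = {aipw:.4f}")
```

Under randomization the true propensity is constant, so the three estimates differ only through small finite-sample corrections, which is the agreement described above.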

Sample size n 500

Capped at 1000 so the 100-sim run finishes in <300 ms.

True treatment effect α 0.12

Held fixed across all 100 sims for comparison.

Covariate strength γ 0.50

How strongly covariates predict the outcome. Larger γ ⇒ adjusted SE shrinks more.

Confounding strength δ 0.30

0 = perfect randomization · 1 = treatment heavily depends on X (think observational). The post is δ ≈ 0 territory.

What to look for

δ = 0 (full randomization). All four estimators (simple, RA, IPW, DR) cluster near the true α. This is the post's regime.
Increase δ. The simple estimator drifts away from the truth — confounding biases it. RA, IPW, and DR all still recover the truth, demonstrating the value of covariate adjustment in observational settings.
Increase γ. The confidence intervals shrink — covariates that strongly predict the outcome reduce residual variance.
RA, IPW, and DR usually agree closely when the outcome and propensity models are both roughly correct. The post emphasises that doubly robust is the safest default because it only needs one of the two models to be right.

Bias vs variance across many simulated RCTs

Single draws are noisy. Run the whole experiment 100 times (same parameters, fresh ε) to see whether the simple estimator's bias under confounding is systematic.
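The repeated-simulation idea can be sketched as below. The data-generating process is an assumption (the lab's actual one is not shown on this page): a single covariate, treatment probability logistic in δ·x, and outcome α·d + γ·x + noise. A linear regression on treatment and the covariate stands in here for the RA / IPW / DR estimators.

```python
import numpy as np

def simulate_once(rng, n=500, alpha=0.12, gamma=0.5, delta=0.3):
    """One simulated study; returns (simple, covariate-adjusted) effect estimates."""
    x = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-delta * x))   # delta = 0 gives a fair coin flip
    d = rng.binomial(1, p)
    y = alpha * d + gamma * x + rng.normal(scale=0.5, size=n)
    simple = y[d == 1].mean() - y[d == 0].mean()
    X = np.column_stack([np.ones(n), d, x])
    adjusted = np.linalg.lstsq(X, y, rcond=None)[0][1]  # coefficient on d
    return simple, adjusted

def monte_carlo(n_sims=100, seed=0, **kwargs):
    """Mean and standard deviation of both estimators across n_sims fresh draws."""
    rng = np.random.default_rng(seed)
    draws = np.array([simulate_once(rng, **kwargs) for _ in range(n_sims)])
    return draws.mean(axis=0), draws.std(axis=0, ddof=1)

for delta in (0.0, 0.3):
    means, sds = monte_carlo(delta=delta)
    print(f"delta = {delta}: simple {means[0]:.3f} (sd {sds[0]:.3f}), "
          f"adjusted {means[1]:.3f} (sd {sds[1]:.3f})")
```

Averaging over many draws separates the two issues: a mean that sits away from 0.12 indicates bias, while the spread across draws indicates variance.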

All 12 estimators in one picture

These numbers come straight from the post's Section 11 comparison table — the full pipeline of estimators applied to the same simulated RCT data. Hover any point to see its SE, CI, and the dataset it used. Toggle methods on and off to compare specific contrasts.

Method toggle

What to look for

Every confidence interval contains the true α = 0.12, so no method is statistically distinguishable from the truth. The differences are in precision and in what each method estimates (ATE vs ATT, offer vs receipt).
The cross-sectional methods cluster tightly (~0.113, SE 0.019). RA, IPW, and IPWRA agree to three decimal places — the convergence the post emphasizes for well-designed RCTs.
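DiD estimates are higher (~0.135–0.137, SE 0.027). The SEs are wider because differencing stacks the noise from two survey rounds, which raises variance unless a household's outcomes are strongly correlated over time. The post attributes the higher point estimates to DiD targeting the ATT rather than the ATE; under randomization the two coincide in expectation, so the gap is also within sampling error.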
Endogenous (etregress) jumps to 0.147. That's the effect of receiving the transfer, not of being offered it. It is naturally larger because only ~85% of households offered the transfer actually received it, so the offer effect is diluted by non-recipients.
DR-receipt comes back down to 0.117. The doubly-robust receipt estimator with baseline y₀ as a control gives the answer closest to the truth.
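The link between the offer effect and the receipt effect can be illustrated with the simple Wald ratio: the offer (ITT) effect divided by the difference in take-up between arms. This is a back-of-envelope sketch, not the etregress model used in the post, and it assumes that no control household received the transfer; `wald_estimate` is a hypothetical name.

```python
def wald_estimate(itt, takeup_offered, takeup_control=0.0):
    """Per-recipient effect implied by an offer (ITT) effect and take-up rates."""
    first_stage = takeup_offered - takeup_control
    if first_stage <= 0:
        raise ValueError("take-up must be higher among offered households")
    return itt / first_stage

# Offer effect and approximate compliance rate quoted on this page.
receipt_effect = wald_estimate(itt=0.1157, takeup_offered=0.85)
print(f"implied receipt effect = {receipt_effect:.4f}")
```

The ratio gives roughly 0.136, which sits between the offer effect and the etregress estimate; the remaining gap reflects the different modelling assumptions of the two approaches.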

Color key

Steel: cross-sectional offer effect (ATE/ATT using endline only)
Teal: panel DiD methods (uses both baseline and endline)
Orange: receipt effect (per-recipient, accounts for compliance)

Connecting back to the post

The post's Section 12 summary states: "The cash transfer program increased household consumption by approximately 11–14% across all estimation methods, close to the true effect of 12%. Every confidence interval contained the true value." This forest plot makes that claim visible at a glance — and lets you see which methods land where relative to the dashed teal line at α = 0.12.